AK_Analysis_Updated.Rmd

---
title: "Final_Project_Initial_Analysis"
author: "Arunima Kayath"
date: "April 13, 2019"
output: 
  pdf_document: 
    latex_engine: xelatex
---

```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = FALSE)
```

```{r}
library(data.table)
library(stargazer)
```


```{r cars}
setwd("C:/Users/Arunima/Documents/w241/Final_Project/AK_coding_experiment/")
d <- read.csv("CodingExperimentFinalResults.csv")
dt <- as.data.table(d)
head(dt)
colnames(dt)
```

```{r EDA}
#How many people participated in week 1 and 2, by source
dt[,.(sum(week1_attempted),sum(week2_attempted)), by = .(upwork,berkeley_student)]
dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) & (pre_task_control == 1),.N,by = .(upwork,berkeley_student)]


```

```{r}
#understand bias in attrition
#Average scores for attriders and non-attriders:

dt[week1_attempted == 1,.(.N,mean(w1_t1_score + w1_t2_score)), by = week2_attempted]

```

```{r}

#understand bias in didn't answer pre-treat correctly

table(Control = dt[week1_attempted == 1,pre_task_control], Treat = dt[week1_attempted == 1,pre_task_treat])
table(Control = dt[(week1_attempted == 1) & (upwork == 1),pre_task_control], Treat = dt[(week1_attempted == 1) & (upwork == 1),pre_task_treat])
table(Control = dt[(week1_attempted == 1) & (berkeley_student == 1),pre_task_control], Treat = dt[(week1_attempted == 1) & (berkeley_student == 1),pre_task_treat])

#Attrition / compliance by control groups
dt[week1_attempted == 1, .N, by = .(control_first)]
dt[week2_attempted == 1, .N, by = .(control_first)]
dt[(week2_attempted == 1) & (pre_task_control == 1) & (pre_task_treat == 1) & (pre_task_control == 1), .N, by = .(control_first)]
dt[(week2_attempted == 1) & (pre_task_control == 1) & (pre_task_treat == 1) & (pre_task_control == 1) & (w1_t3_score > 0 ) & (w2_t3_score > 0), .N, by = .(control_first)]

#Attrition / compliance by where they were recruited
dt[week1_attempted == 1, .N, by = .(upwork,berkeley_student)]
dt[week2_attempted == 1, .N, by = .(upwork,berkeley_student)]
dt[(week2_attempted == 1) & (pre_task_control == 1) & (pre_task_treat == 1) & (pre_task_control == 1), .N, by = .(upwork,berkeley_student)]
dt[(week2_attempted == 1) & (pre_task_control == 1) & (pre_task_treat == 1) & (pre_task_control == 1) & (w1_t3_score > 0 ) & (w2_t3_score > 0), .N, by = .(upwork,berkeley_student)]


dt[week2_attempted == 1, .N, by = .(control_first,pre_task_treat)]
dt[week1_attempted == 1,.(.N,mean(w1_t1_score + w1_t2_score)/2)]
dt[(week1_attempted == 1) & (subject_id != 28) & (subject_id != 57),.(.N,mean(w1_t1_score + w1_t2_score)/2), by = .(week2_attempted)]
dt[week1_attempted == 1,.(.N,mean(w1_t1_score + w1_t2_score)/2), by = .(week2_attempted)]
dt[(week1_attempted == 1) & (week2_attempted == 1),.(.N,mean(w1_t1_score + w1_t2_score)/2), by = pre_task_treat]
dt[(week1_attempted == 1) & (week2_attempted == 1) &(pre_task_treat == 1),.(.N,mean(w1_t1_score + w1_t2_score)/2), by = pre_task_control]
dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1),.(.N,mean(w1_t1_score + w1_t2_score)/2), by = .(pre_task_control,control_first)]

dt[(week1_attempted == 1) & (week2_attempted == 1) &(pre_task_treat == 1) & (pre_task_control == 1) & (w1_t3_score > 0) & (w2_t3_score > 0),.(.N,mean(w1_t1_score + w1_t2_score)/2)]
#dt[week1_attempted == 1,.(.N,mean(w1_t1_score + w1_t2_score + w2_t1_score + w1_t2_score), Attempted_week_2 = sum(week2_attempted)), by = .(pre_task_treat,pre_task_control)]

```


```{r}
#using question 1 as the benchmark (since that was same across test and control, what were scores across treatment and control for the different groups)

#controlling for those who participated in both

dt[(week1_attempted == 1) & (week2_attempted == 1) ,.(.N,w1_t1_avg = mean(w1_t1_score), w1_t1_mins = mean(w1_t1_mins),w2_t1_avg = mean(w2_t1_score), w2_t1_mins = mean(w2_t1_mins)), by = .(upwork,control_first)]

dt[(week1_attempted == 1) & (week2_attempted == 1),.(.N,w1_t1_avg = mean(w1_t1_score), w1_t1_mins = mean(w1_t1_mins),w2_t1_avg = mean(w2_t1_score), w2_t1_mins = mean(w2_t1_mins)), by = .(upwork,berkeley_student,control_first,pre_task_control,pre_task_treat)]

#controlling for those who participated in both, and registered treatment

dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1),.(.N,w1_t1_avg = mean(w1_t1_score), w1_t1_mins = mean(w1_t1_mins),w2_t1_avg = mean(w2_t1_score), w2_t1_mins = mean(w2_t1_mins)), by = .(control_first,pre_task_treat)]

dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1),.(.N,w1_t1_avg = mean(w1_t1_score), w1_t1_mins = mean(w1_t1_mins),w2_t1_avg = mean(w2_t1_score), w2_t1_mins = mean(w2_t1_mins)), by = .(upwork,control_first,pre_task_treat)]


#controlling for those who participated in both and registered for treatment and control
#overall
dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) & (pre_task_control == 1),.(.N,w1_t1_avg = mean(w1_t1_score), w1_t1_mins = mean(w1_t1_mins),w2_t1_avg = mean(w2_t1_score), w2_t1_mins = mean(w2_t1_mins)), by = control_first]

#by blocks
dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) & (pre_task_control == 1),.(.N,w1_t1_avg = mean(w1_t1_score), w1_t1_mins = mean(w1_t1_mins),w2_t1_avg = mean(w2_t1_score), w2_t1_mins = mean(w2_t1_mins)), by = .(upwork,control_first)]

dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) & (pre_task_control == 1),.(.N,w1_t1_avg = mean(w1_t1_score), w1_t1_mins = mean(w1_t1_mins),w2_t1_avg = mean(w2_t1_score), w2_t1_mins = mean(w2_t1_mins)), by = .(upwork,berkeley_student,control_first,pre_task_control,pre_task_treat)]


```
```{r}

#How many were able to finish Ques 3.
dt[(week1_attempted == 1) & (week2_attempted == 1) & ((w1_t3_score > 0) & (w2_t3_score > 0)),.(.N,sum(upwork),sum(upwork_20dollars),sum(berkeley_student), sum(control_first),w1_t1_score,w1_t1_mins,w1_t2_score,w1_t2_mins,w1_t3_score,w2_t3_score,w1_t3_mins,w1_score,(w1_t1_score + w1_t2_score)/2)]

```

```{r}
#creating total score, control and treatment score, score difference columns
dt$w1_t12sum_score = (dt$w1_t1_score + dt$w1_t2_score)/2
dt$w2_t12sum_score = (dt$w2_t1_score + dt$w2_t2_score)/2
dt$w1_t12sum_mins = (dt$w1_t1_mins + dt$w1_t2_mins)
dt$w2_t12sum_mins = (dt$w2_t1_mins + dt$w2_t2_mins)
dt$ctrl_t12_score = ifelse(dt$control_first == 1,dt$w1_t12sum_score, dt$w2_t12sum_score)
dt$treat_t12_score = ifelse(dt$control_first == 1, dt$w2_t12sum_score,dt$w1_t12sum_score)
dt$ctrl_t12_mins = ifelse(dt$control_first == 1,dt$w1_t12sum_mins, dt$w2_t12sum_mins)
dt$treat_t12_mins = ifelse(dt$control_first == 1, dt$w2_t12sum_mins,dt$w1_t12sum_mins)
dt$score_diff = dt$treat_t12_score - dt$ctrl_t12_score
dt$mins_diff = dt$treat_t12_mins - dt$ctrl_t12_mins

setnames(dt, old=c("w1_t12sum_score","w2_t12sum_score", "w1_t12sum_mins", "w2_t12sum_mins"), new=c("Week1_Average_Score","Week2_Average_Score","Week1_Total_Minutes","Week2_Total_Minutes"))
#names(dt) <- c("w1_t12sum_score":"Week1_Average_Score")

head(dt)


```
```{r}
dt$ctrl_t1_score = ifelse(dt$control_first == 1,dt$w1_t1_score,dt$w2_t1_score)
dt$treat_t1_score = ifelse(dt$control_first == 1,dt$w2_t1_score,dt$w1_t1_score)
dt$t1_score_diff = dt$treat_t1_score - dt$ctrl_t1_score
mean(dt$t1_score_diff, na.rm = TRUE)

```
```{r}
#T-test for whether we observe a significant difference
t.test(dt[(week1_attempted == 1) & (week2_attempted == 1),ctrl_t12_score],dt[(week1_attempted == 1) & (week2_attempted == 1),treat_t12_score], paired = TRUE)

t.test(dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) & (pre_task_control == 1),ctrl_t12_score],dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) & (pre_task_control == 1),treat_t12_score], paired = TRUE)

t.test(dt[(week1_attempted == 1) & (week2_attempted == 1) & (upwork == 1),ctrl_t12_score],dt[(week1_attempted == 1) & (week2_attempted == 1) & (upwork == 1),treat_t12_score], paired = TRUE)


t.test(dt[(week1_attempted == 1) & (week2_attempted == 1) & (upwork == 1) & (pre_task_treat == 1) & (pre_task_control == 1),ctrl_t12_score],dt[(week1_attempted == 1) & (week2_attempted == 1) & (upwork == 1) & (pre_task_treat == 1) & (pre_task_control == 1),treat_t12_score], paired = TRUE)

t.test(dt[(week1_attempted == 1) & (week2_attempted == 1) & (berkeley_student == 1),ctrl_t12_score],dt[(week1_attempted == 1) & (week2_attempted == 1) & (berkeley_student == 1),treat_t12_score], paired = TRUE)

t.test(dt[(week1_attempted == 1) & (week2_attempted == 1) & (berkeley_student == 1) & (pre_task_treat == 1) & (pre_task_control == 1),ctrl_t12_score],dt[(week1_attempted == 1) & (week2_attempted == 1) & (berkeley_student == 1)  & (pre_task_treat == 1) & (pre_task_control == 1),treat_t12_score], paired = TRUE)

```

```{r}
both_attempted = dt[(week1_attempted == 1) & (week2_attempted == 1)]
flattened_dt <- data.table(
  subject_id=c(
    both_attempted[, subject_id],
    both_attempted[, subject_id]
  ),
  t1t2_mins=c(
    both_attempted[, Week1_Total_Minutes],
    both_attempted[, Week2_Total_Minutes]
  ),
  t1t2_score=c(
    both_attempted[, (w1_t1_score + w1_t2_score) / 2],
    both_attempted[, (w2_t1_score + w2_t2_score) / 2]
  ),
  total_mins=c(
    both_attempted[, w1_total_mins],
    both_attempted[, w2_total_mins]
  ),
  score=c(
    both_attempted[, w1_score],
    both_attempted[, w2_score]
  ),
  treat=c(
    both_attempted[, abs(control_first - 1)],
    both_attempted[, control_first]
  ),
  upwork=c(both_attempted[, upwork], both_attempted[, upwork]),
  berkeley_student=c(both_attempted[, berkeley_student], both_attempted[, berkeley_student]),
  control_first=c(both_attempted[, control_first], both_attempted[, control_first]),
  years_experience=c(both_attempted[, years_experience], both_attempted[, years_experience]),
  age=c(both_attempted[, age], both_attempted[, age])
)

setnames(flattened_dt, old=c("t1t2_score", "t1t2_mins"), new=c("Average_Score","Total_Minutes"))

lmAllWithin <- lm(Average_Score ~ treat + factor(subject_id), data=flattened_dt)
lmUpworkWithin <- lm(Average_Score ~ treat + factor(subject_id), data=flattened_dt[upwork == 1])
lmBerkeleyWithin <- lm(Average_Score ~ treat + factor(subject_id), data=flattened_dt[berkeley_student == 1])
lmOthersWithin <- lm(Average_Score ~ treat + factor(subject_id), data=flattened_dt[(upwork == 0) & (berkeley_student == 0)])
lmAllMins <- lm(Total_Minutes ~ treat + factor(subject_id), data=flattened_dt)
lmUpworkMins <- lm(Total_Minutes ~ treat + factor(subject_id), data=flattened_dt[upwork == 1])
lmBerkeleyMins <- lm(Total_Minutes ~ treat + factor(subject_id), data=flattened_dt[berkeley_student == 1])
lmOthersMins <- lm(Total_Minutes ~ treat + factor(subject_id), data=flattened_dt[(upwork == 0) & (berkeley_student == 0)])
```

```{r results = 'asis'}
stargazer(lmAllWithin, lmAllMins, type = 'text', omit = 'subject_id', report = "vcsp*")
```
```{r}
stargazer(lmAllWithin, lmUpworkWithin, lmBerkeleyWithin, lmOthersWithin,column.labels   = c("All", "Upwork", "Berkeley Students", "Others"), type = 'text', omit = 'subject_id', report = "vcsp*")

```
```{r}
stargazer(lmAllMins, lmUpworkMins, lmBerkeleyMins, lmOthersMins,column.labels   = c("All", "Upwork", "Berkeley Students", "Others"), type = 'text', omit = 'subject_id', report = "vcsp*")

```

```{r}
#Within subjects, compliers only
both_attempted = dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) & (pre_task_control == 1)]
flattened_dt <- data.table(
  subject_id=c(
    both_attempted[, subject_id],
    both_attempted[, subject_id]
  ),
  t1t2_mins=c(
    both_attempted[, Week1_Total_Minutes],
    both_attempted[, Week2_Total_Minutes]
  ),
  t1t2_score=c(
    both_attempted[, (w1_t1_score + w1_t2_score) / 2],
    both_attempted[, (w2_t1_score + w2_t2_score) / 2]
  ),
  total_mins=c(
    both_attempted[, w1_total_mins],
    both_attempted[, w2_total_mins]
  ),
  score=c(
    both_attempted[, w1_score],
    both_attempted[, w2_score]
  ),
  treat=c(
    both_attempted[, abs(control_first - 1)],
    both_attempted[, control_first]
  ),
  upwork=c(both_attempted[, upwork], both_attempted[, upwork]),
  berkeley_student=c(both_attempted[, berkeley_student], both_attempted[, berkeley_student]),
  control_first=c(both_attempted[, control_first], both_attempted[, control_first]),
  years_experience=c(both_attempted[, years_experience], both_attempted[, years_experience]),
  age=c(both_attempted[, age], both_attempted[, age])
)

setnames(flattened_dt, old=c("t1t2_score", "t1t2_mins"), new=c("Average_Score","Total_Minutes"))

lmAllWithin <- lm(Average_Score ~ treat + factor(subject_id), data=flattened_dt)
lmUpworkWithin <- lm(Average_Score ~ treat + factor(subject_id), data=flattened_dt[upwork == 1])
lmBerkeleyWithin <- lm(Average_Score ~ treat + factor(subject_id), data=flattened_dt[berkeley_student == 1])
lmOthersWithin <- lm(Average_Score ~ treat + factor(subject_id), data=flattened_dt[(upwork == 0) & (berkeley_student == 0)])
lmAllMins <- lm(Total_Minutes ~ treat + factor(subject_id), data=flattened_dt)
lmUpworkMins <- lm(Total_Minutes ~ treat + factor(subject_id), data=flattened_dt[upwork == 1])
lmBerkeleyMins <- lm(Total_Minutes ~ treat + factor(subject_id), data=flattened_dt[berkeley_student == 1])
lmOthersMins <- lm(Total_Minutes ~ treat + factor(subject_id), data=flattened_dt[(upwork == 0) & (berkeley_student == 0)])
```

```{r}
stargazer(lmAllWithin, lmUpworkWithin, lmBerkeleyWithin, lmOthersWithin,column.labels   = c("All Compliers", "Upwork Compliers", "Berkeley Students Compliers", "Others Compliers"), type = 'text', omit = 'subject_id', report = "vcsp*")

```

```{r}
stargazer(lmAllMins, lmUpworkMins, lmBerkeleyMins, lmOthersMins,column.labels   = c("All Compliers", "Upwork  Compliers", "Berkeley Students  Compliers", "Others  Compliers"), type = 'text', omit = 'subject_id', report = "vcsp*")

```


```{r}
#T-test for whether we observe a significant difference
t.test(dt[(week1_attempted == 1) & (week2_attempted == 1),ctrl_t12_mins],dt[(week1_attempted == 1) & (week2_attempted == 1),treat_t12_mins], paired = TRUE)

t.test(dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) & (pre_task_control == 1),ctrl_t12_mins],dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) & (pre_task_control == 1),treat_t12_mins], paired = TRUE)


t.test(dt[(week1_attempted == 1) & (week2_attempted == 1) & (upwork == 1),ctrl_t12_mins],dt[(week1_attempted == 1) & (week2_attempted == 1) & (upwork == 1),treat_t12_mins], paired = TRUE)


t.test(dt[(week1_attempted == 1) & (week2_attempted == 1) & (upwork == 1) & (pre_task_treat == 1) & (pre_task_control == 1),ctrl_t12_mins],dt[(week1_attempted == 1) & (week2_attempted == 1) & (upwork == 1) & (pre_task_treat == 1) & (pre_task_control == 1),treat_t12_mins], paired = TRUE)

t.test(dt[(week1_attempted == 1) & (week2_attempted == 1) & (berkeley_student == 1),ctrl_t12_mins],dt[(week1_attempted == 1) & (week2_attempted == 1) & (berkeley_student == 1),treat_t12_mins], paired = TRUE)

t.test(dt[(week1_attempted == 1) & (week2_attempted == 1) & (berkeley_student == 1) & (pre_task_treat == 1) & (pre_task_control == 1),ctrl_t12_mins],dt[(week1_attempted == 1) & (week2_attempted == 1) & (berkeley_student == 1)  & (pre_task_treat == 1) & (pre_task_control == 1),treat_t12_mins], paired = TRUE)
```


```{r}
dt[(week1_attempted == 1) & (week2_attempted == 1) & (berkeley_student == 1),mean(ctrl_t12_score)]
dt[(week1_attempted == 1) & (week2_attempted == 1) & (berkeley_student == 1),mean(treat_t12_score)]
```

```{r}
dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) ,.(.N,mean(ctrl_t12_score),mean(treat_t12_score),mean(score_diff)), by = .(upwork)]

```


```{r}
lmWithinScore <- lm(score_diff ~ control_first + upwork + berkeley_student + years_experience + team_size + control_first + age, data = dt[(week1_attempted == 1) & (week2_attempted == 1)])
summary(lmWithinScore)

lmWithinMins <- lm(mins_diff ~ control_first + upwork + berkeley_student + years_experience + team_size + control_first + age, data = dt[(week1_attempted == 1) & (week2_attempted == 1)])
summary(lmWithinMins)

lmWithinScoreSub <- lm(score_diff ~ control_first + upwork + berkeley_student + years_experience + team_size + control_first + age, data = dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_control == 1) & (pre_task_treat == 1)])
summary(lmWithinScoreSub)

lmWithinMinsSub <- lm(mins_diff ~ control_first + upwork + berkeley_student + years_experience + team_size + control_first + age, data = dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_control == 1) & (pre_task_treat == 1)])
summary(lmWithinMinsSub)
```

Observations :
Control first is slighly statistically significant. This means that the order in which people saw test or control made a difference to their outcome. People who were in control first were likely to do a little worse. This could be because people who saw treatment first assumed treatment would continue ie there wasn't a sufficiently long washout period.

```{r results = 'asis'}
#targazer(drivers, header=FALSE, type="html", title="Summary statistics")
stargazer(lmw1Score,lmw2Score, lmw1Mins, lmw2Mins, type = "text")

```


```{r}

#Regression of week 1 score on variables including control_first (in this case a proxy for whether the person was in test or control for week 1)

dt$week1_pre_task = ifelse(dt$control_first,dt$pre_task_control,dt$pre_task_treat)
dt$Treat = 1-dt$control_first


lmw1ScoreSmall <- lm(Week1_Average_Score ~ Treat + upwork + berkeley_student, data = dt[(week1_attempted == 1)])
summary(lmw1ScoreSmall)

lmw1Score <- lm(Week1_Average_Score ~ Treat + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week1_attempted == 1)])
summary(lmw1Score)

lmw1ScoreSub <- lm(Week1_Average_Score ~ Treat + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week1_attempted == 1) & (week1_pre_task == 1)])
summary(lmw1ScoreSub)

lmw1ScoreSubSmall <- lm(Week1_Average_Score ~ Treat + upwork + berkeley_student, data = dt[(week1_attempted == 1) & (week1_pre_task == 1)])
summary(lmw1ScoreSubSmall)

lmw1MinsSmall <- lm(Week1_Total_Minutes ~ Treat + upwork + berkeley_student, data = dt[(week1_attempted == 1)])
summary(lmw1MinsSmall)

lmw1Mins <- lm(Week1_Total_Minutes ~ Treat + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week1_attempted == 1)])
summary(lmw1Mins)

lmw1MinsSub <- lm(Week1_Total_Minutes ~ Treat + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week1_attempted == 1) & (week1_pre_task == 1)])
summary(lmw1MinsSub)

lmw1MinsSubSmall <- lm(Week1_Total_Minutes ~ Treat + upwork + berkeley_student, data = dt[(week1_attempted == 1) & (week1_pre_task == 1)])
summary(lmw1MinsSub)

#lmw1Score <- lm(w1_t12sum_score ~ control_first + upwork + berkeley_student + years_experience + age, data = dt[(week1_attempted == 1) & (week2_attempted == 1)])
#summary(lmw1Score)

dt$week2_pre_task = ifelse(dt$control_first,dt$pre_task_treat,dt$pre_task_control)
dt$Treat = dt$control_first

lmw2ScoreSmall <- lm(Week2_Average_Score ~ Treat + upwork + berkeley_student, data = dt[(week2_attempted == 1)])
summary(lmw2ScoreSmall)

lmw2Score <- lm(Week2_Average_Score ~ Treat + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week2_attempted == 1)])
summary(lmw2Score)

lmw2ScoreSub <- lm(Week2_Average_Score ~ Treat + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week2_attempted == 1) & (week2_pre_task == 1)])
summary(lmw2ScoreSub)

lmw2ScoreSubSmall <- lm(Week2_Average_Score ~ Treat + upwork + berkeley_student, data = dt[(week2_attempted == 1) & (week2_pre_task == 1)])
summary(lmw2ScoreSub)

lmw2MinsSmall <- lm(Week2_Total_Minutes ~ Treat + upwork + berkeley_student, data = dt[(week2_attempted == 1)])
summary(lmw2MinsSmall)

lmw2Mins <- lm(Week2_Total_Minutes ~ Treat + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week2_attempted == 1)])
summary(lmw2Mins)

lmw2MinsSub <- lm(Week2_Total_Minutes ~ Treat + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week2_attempted == 1) & (week2_pre_task == 1)])
summary(lmw2MinsSub)

lmw2MinsSubSmall <- lm(Week2_Total_Minutes ~ Treat + upwork + berkeley_student, data = dt[(week2_attempted == 1) & (week2_pre_task == 1)])
summary(lmw2MinsSub)

#lmw2Score <- lm(w2_t12sum_score ~ control_first + upwork + berkeley_student + years_experience + age, data = dt[(week1_attempted == 1) & (week2_attempted == 1)])
#summary(lmw2Score)
```

```{r results = 'asis'}
#targazer(drivers, header=FALSE, type="html", title="Summary statistics")
stargazer(lmw1ScoreSmall,lmw2ScoreSmall, lmw1MinsSmall, lmw2MinsSmall, type = "text")

```

```{r results = 'asis'}
#targazer(drivers, header=FALSE, type="html", title="Summary statistics")
stargazer(lmw1ScoreSubSmall,lmw2ScoreSubSmall, lmw1MinsSubSmall, lmw2MinsSubSmall, type = "text")

```

```{r results = 'asis'}
#targazer(drivers, header=FALSE, type="html", title="Summary statistics")
stargazer(lmw1Score,lmw2Score, lmw1Mins, lmw2Mins, type = "text")

```

```{r results = 'asis'}
#targazer(drivers, header=FALSE, type="html", title="Summary statistics")
stargazer(lmw1ScoreSub,lmw2ScoreSub, lmw1MinsSub, lmw2MinsSub, type = "text")

```


```{r}

#Zooming into week 1, test 1 score since test was standard there.

dt$Control = dt$control_first
#ITT
lmw1t1Score <- lm(w1_t1_score ~ Control + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week1_attempted == 1)])
summary(lmw1t1Score)

#CACE
lmw1t1Score <- lm(w1_t1_score ~ Control + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week1_attempted == 1) & (week1_pre_task == 1)])
summary(lmw1t1Score)

#lmw1t1Score <- lm(w1_t1_score ~ control_first + upwork + berkeley_student + years_experience, data = dt[(week1_attempted == 1) & (week2_attempted == 1)])
#summary(lmw1t1Score)

#Zooming into week 2, test 1 score since test was standard there. Also platform orientation effects should be smaller here.
#ITT
dt$Control = 1-dt$control_first
lmw2t1Score <- lm(w2_t1_score ~ Control + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week2_attempted == 1)])
summary(lmw2t1Score)

#CACE
lmw2t1Score <- lm(w2_t1_score ~ Control + upwork + berkeley_student + years_experience + age + team_size, data = dt[(week2_attempted == 1) & (week2_pre_task == 1)])
summary(lmw2t1Score)

#lmw2t1Score <- lm(w2_t1_score ~ control_first + upwork + berkeley_student + years_experience + age, data = dt[(week1_attempted == 1) & (week2_attempted == 1)])
#summary(lmw2t1Score)

#Combined test 1 and 2 score since both were same for everyone.
#dt$t1_score = dt$w1_t1_score + dt$w2_t1_score

#lmt1Score <- lm(t1_score ~ control_first + upwork + berkeley_student + years_experience + age, data = dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1)])
#summary(lmt1Score)


```

Observations:
- Control first does not show up as a significant variable in predicting week 1 test 1 score, or week 2 test 1 score amongst compliers.


```{r Ques 3}

#Determining the effect of being in test or control or being first or second in order on the score for Q3, Q4

dt$Q3_score = ifelse(dt$w1_t2_id == 3,dt$w1_t2_score,dt$w2_t2_score)
dt$Q3_order = ifelse(dt$w1_t2_id == 3,'w1','w2')
dt$Q3_control = ifelse(dt$w1_t2_id == 3,ifelse(dt$control_first == 1,'C','T'),ifelse(dt$control_first == 1,'T','C'))

#dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1),.(.N,avg_Q3_score = mean(Q3_score))]

dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1),.(.N,avg_Q3_score = mean(Q3_score)),by = .(Q3_order,Q3_control)]

dt$Q4_score = ifelse(dt$w1_t2_id == 4,dt$w1_t2_score,dt$w2_t2_score)
dt$Q4_order = ifelse(dt$w1_t2_id == 4,'w1','w2')
dt$Q4_control = ifelse(dt$w1_t2_id == 4,ifelse(dt$control_first == 1,'C','T'),ifelse(dt$control_first == 1,'T','C'))

#dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1),.(.N,avg_Q4_score = mean(Q4_score),sum(Q4_score > 0))]
dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1),.(.N,avg_Q4_score = mean(Q4_score),sum(Q4_score > 0),sum(Q4_score == 100)),by = .(Q4_order,Q4_control)]

#head(dt[,.(w1_t2_id,w2_t2_id,w1_t2_score,w2_t2_score,Q4_score,Q4_order)])

lmQ3 <- lm(Q3_score ~ upwork + berkeley_student + years_experience + age + team_size + Q3_control + Q3_order + Q3_control*Q3_order,data = dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1)])
summary(lmQ3)

lmQ4 <- lm(Q4_score ~ upwork + berkeley_student + years_experience + age + team_size + Q4_control + Q4_order ,data = dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1)])
summary(lmQ4)


```

Observations:
- For Q3, control scores are very similar in week 1 and 2. However, treatment scores are very different in week 1 and 2. Based on the regression, neither being in control or order (week 1 or 2) is statistically significant for the score on this test. Interestingly, year_experience is mildly significant. Although team size shows up as significant for a couple of sizes, the pattern of that with team size doesn't seem intuitive.

- For Q4, average scores in week 2 were lower whether it was test or control. In digging in further (see people with score < 100 below), some people just had a score of 0 on the test. All others had a score of 100. So this is likely noise. Again, neither control vs treat, nor being in week 1 or 2 shows up as siginificant for predicting the score on Q4.


```{r}
dt[(week1_attempted == 1) & (week2_attempted == 1) & (pre_task_treat == 1) & (Q4_score <100),.(subject_id,Q4_score)]
```
```{r}
dt$total_1to4_score = dt$w1_t12sum_score + dt$w2_t12sum_score

lm1to4score <- lm(total_1to4_score ~ upwork + berkeley_student + years_experience + age + team_size,data = dt[(week1_attempted == 1) & (week2_attempted == 1)])

summary(lm1to4score)

```

```{r}
# Reference for doing a within subject analysis
# lm( y ~ treat + factor(ID))

```

### Observations:

Looking at the total score, years of experinece and team size are statistically significant in predicting total scores amongst those who took both the tests. People with more years of experienceshave slightly higher scores +10, while people who have worked in small teams have lower scores. Note that we didn't randomize these variables, so we can't make causal interpretations here.