# Exercise: Using Data from a SMART to Address Primary Aims about Embedded Adaptive Interventions


<br>
<font size=3>
    This material has been developed for [Getting SMART About Adaptive Interventions in Education](https://d3lab.isr.umich.edu/training/) led by [d3lab](https://d3lab.isr.umich.edu). 
    
    Notebooks were developed by [Nicholas J. Seewald](https://nickseewald.com). 
    SAS code originally written by Daniel Almirall, Inbal Nahum-Shani, and Susan A. Murphy.
    The code was translated into R by Audrey Boruvka and Nicholas J. Seewald.
</font>


### Exercise Tasks
- [Task 1: Create an indicator for whether an individual is consistent with (JASP+EMT, INTENSIFY)](#task-1)
- [Task 2: Create weights](#task-2)
- [Task 3: Create indicator variables for consistency with the AIs under study](#task-3)
- [Task 4: Compute sample size for a comparison of first-stage main effects](#task-4)

<hr>

In this series of practicum exercises, we'll be using *simulated* data in the context of the so-called autism SMART:
<img src="assets/autism-smart-diagram.jpg" alt="Autism SMART diagram" style="width: 500px;"/>

**First-Stage Coding**:
- JASP+EMT: A1 = 1
- JASP+EMT+SGD: A1 = -1

**Second-Stage Coding**:
- ADD SGD: A2 = 1 
- INTENSIFY: A2 = -1


## Function Definitions
The file `functions.R` contains code that will help us produce cleaner output from some of the models we'll fit in this module. Advanced R users are encouraged to look at this file to see how these functions work; otherwise, just know that this code will help us mimic SAS's estimate statements, which are used in the training slides.

In [1]:
library(geepack)
source('functions.R')

function 'estimate' loaded successfully.


As in the [Main Effects Practicum](01_MainEffects_Practicum.ipynb), we need to do some data management before we can get started. See that notebook for more details; here, just run the cell below to perform all necessary operations.

In [1]:
aut <- read.csv("assets/autism-simulated-dataset.csv")
names(aut) <- tolower(names(aut))
aut <- aut[order(aut$id), ]

aut$o11c <- with(aut, o11 - mean(o11))
aut$o12c <- with(aut, o12 - mean(o12))
aut$o21c <- with(aut, o21 - mean(o21))
aut$o22c <- with(aut, o22 - mean(o22))
aut$o11cnr <- aut$o12cnr <- NA
aut$o21cnr <- aut$o22cnr <- NA
aut$o11cnr[aut$r == 0] <- with(subset(aut, r == 0), o11 - mean(o11))
aut$o12cnr[aut$r == 0] <- with(subset(aut, r == 0), o12 - mean(o12))
aut$o21cnr[aut$r == 0] <- with(subset(aut, r == 0), o21 - mean(o21))
aut$o22cnr[aut$r == 0] <- with(subset(aut, r == 0), o22 - mean(o22))

aut$s <- ifelse(aut$a1 == 1 & aut$r == 0, 1, 0)

aut <- aut[order(aut$id), ]

## Part 1: Estimate the mean outcome under an embedded AI

We'll start by creating an indicator for the (JASP+EMT, INTENSIFY) adaptive intervention. The indicator, which we'll call $Z_1$, is defined as 
$$
Z_1 = \left\{ 
\begin{array}{lr}
    1  & \text{Individual consistent with (JASP+EMT, INTENSIFY)} \\
    -1 & \text{otherwise}
\end{array}
\right. .
$$

### <a name="task-1"></a> Task 1: Create an indicator for whether an individual is consistent with (JASP+EMT, INTENSIFY)
Below, we start code to create the indicator $Z_1$ described above. Fill in the blanks to finish the code.

In [3]:
aut$z1 <- -1
# Responders to JASP+EMT are consistent with (JASP+EMT, INTENSIFY)
aut$z1[aut$a1 == 1 & aut$r == 1] <- 1
# Slow responders to JASP+EMT who receive INTENSIFY are also consistent
aut$z1[aut$a1 == _____ & aut$r == _____ & aut$a2 == _____] <- 1

table(aut$z1)


 -1   1 
128  72 

When you are done, keep your cursor in the above cell and press `SHIFT`+`ENTER`. The table should show that **72** children are consistent with (JASP+EMT, INTENSIFY) (i.e., there are 72 1's in the table).

### <a name="task-2"></a> Task 2: Create weights
To estimate the mean outcome under (JASP+EMT, INTENSIFY), we need to construct weights that account for the imbalance (by design) in the numbers of responders and slow responders who are consistent with this AI. 

Remember that every child is randomized to a first-stage treatment with probability 1/2, so a responder is consistent with the AI(s) that begin with their first-stage treatment with probability 1/2. The same holds for slow responders to JASP+EMT+SGD: they are consistent with the single AI that begins with that intervention with probability 1/2. Slow responders to JASP+EMT are randomized a second time, so they are consistent with each of the two AIs that begin with JASP+EMT with probability 1/2 × 1/2 = 1/4. Each child's weight is the inverse of this probability. Therefore, we want to weight slow responders to JASP+EMT by 4, and all other children by 2. 
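The inverse-probability reasoning above can be sketched on a small hypothetical data frame. The names `toy`, `p1`, and `p2` are illustrative and not part of the exercise data, and the sketch assumes every randomization has probability 1/2.

```r
# Hypothetical toy data: one row per type of child (not the exercise data)
toy <- data.frame(
  a1 = c(1, 1, -1, -1),  # first-stage treatment (1 = JASP+EMT, -1 = JASP+EMT+SGD)
  r  = c(1, 0,  1,  0)   # 1 = responder, 0 = slow responder
)

# Assumed randomization probabilities (1/2 at each randomization)
p1 <- 1 / 2                                       # first stage: everyone is randomized
p2 <- ifelse(toy$a1 == 1 & toy$r == 0, 1 / 2, 1)  # second stage: only if re-randomized

# Weight = inverse probability of the treatment sequence actually received
toy$w <- 1 / (p1 * p2)
toy
```

Building the weight from the two randomization probabilities, rather than typing it in directly, makes it easier to adapt if a different SMART uses unequal randomization.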

Below, you'll create the weight variable, called `w`.

In [6]:
# Start by giving everyone a weight of 2
aut$w <- 2

# Give slow responders to JASP+EMT (A1 = 1) an appropriate weight
aut$w[aut$a1 == _____ & aut$r == _____] <- _____

table(aut$w)



When you've filled in the blanks above, keep your cursor in the cell and press `SHIFT`+ `ENTER` to run the code. If you've completed the task successfully, there will be **56** children with a weight of 4.

### Modeling
*You will need to have completed Tasks 1 and 2 to run the code below.*

In [7]:
## Run weighted regression

model3 <- geeglm(y ~ z1, weights = w, id = id, data = aut)

estimate(model3,
         rbind("Mean Y under AI #1 (JASP+EMT, INTENSIFY)" = c(1, 1)))

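To see why the contrast `c(1, 1)` returns the mean under the AI, here is a minimal sketch on hypothetical toy data. It uses base R's `lm()` purely as a stand-in for the point estimates; it does not reproduce the robust standard errors that `geeglm` and `estimate` report. The names `toy`, `fit`, and `contrast` are illustrative.

```r
# Hypothetical toy data (not the exercise data)
toy <- data.frame(
  y = c(10, 12, 14, 20, 22, 30),
  z = c( 1,  1,  1, -1, -1, -1),  # +1 = consistent with the AI, -1 = otherwise
  w = c( 2,  4,  2,  2,  4,  2)   # illustrative weights
)

# Weighted regression of y on the +1/-1 indicator
fit <- lm(y ~ z, weights = w, data = toy)

# c(1, 1) picks out intercept + slope, i.e. the fitted mean when z = 1
contrast <- c(1, 1)
mean_z1 <- sum(contrast * coef(fit))

# Compare with the weighted mean among rows with z = 1
check <- with(subset(toy, z == 1), weighted.mean(y, w))
c(contrast = mean_z1, weighted_mean = check)
```

Because the indicator is coded 1/-1, the same logic gives `c(1, -1)` as the contrast for the weighted mean among rows with `z == -1`.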


An alternative way to estimate the mean under (JASP+EMT, INTENSIFY) is to restrict the analysis to just children with `z1 == 1`, and then just estimate a weighted mean (i.e., fit an intercept-only model).

In [8]:
model3alternative <- geeglm(y ~ 1, weights = w, id = id, data = aut,
                            subset = z1 == 1)

summary(model3alternative)

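The intercept from a weighted, intercept-only fit on the consistent children is just their weighted mean, $\sum_i w_i y_i / \sum_i w_i$. A minimal sketch on hypothetical toy data, again using base R's `lm()` as a stand-in for the point estimate only (names such as `toy` and `fit0` are illustrative):

```r
# Hypothetical toy data (not the exercise data)
toy <- data.frame(
  y  = c(10, 12, 14, 20, 22, 30),
  z1 = c( 1,  1,  1, -1, -1, -1),
  w  = c( 2,  4,  2,  2,  4,  2)
)

# Intercept-only weighted fit, restricted to rows with z1 == 1
fit0 <- lm(y ~ 1, weights = w, data = toy, subset = z1 == 1)
intercept <- unname(coef(fit0))

# Weighted mean computed by hand on the same rows
consistent <- toy[toy$z1 == 1, ]
wmean <- sum(consistent$w * consistent$y) / sum(consistent$w)

c(intercept = intercept, weighted_mean = wmean)
```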


## Part 2: Compare the means of two embedded adaptive interventions
We will now compare the mean outcome had every child followed (JASP+EMT+SGD, INTENSIFY) with the mean outcome had every child followed (JASP+EMT, ADD SGD). The goal is to estimate both means simultaneously (i.e., with one regression), which also makes it straightforward to draw inferences about the difference in means.

Below, we use an intuitive (but less efficient) way to compare these two adaptive interventions. In the regression below, we'll use data only from participants who are consistent with one of the two AIs we're comparing.

### <a name="task-3"></a> Task 3: Create indicator variables for consistency with the AIs under study
To perform this single regression to compare mean outcomes under (JASP+EMT+SGD, INTENSIFY) and (JASP+EMT, Add SGD), we need to create indicator variables for whether or not each child was consistent with the appropriate AI.

**Notice: we can identify children who were consistent with (JASP+EMT+SGD, INTENSIFY) using only their first-stage treatment!**

In [None]:
# Create indicator z2 for consistency with (JASP+EMT, Add SGD)
## Give everyone -1 to start with (not consistent)
aut$z2 <- -1
## Change indicator to 1 if consistent
aut$z2[aut$a1 == 1 & aut$r == 1] <- 1
aut$z2[aut$a1 == 1 & aut$r == 0 & aut$a2 == 1] <- 1

# Create indicator z3 for consistency with (JASP+EMT+SGD, INTENSIFY)
## Give everyone -1 to start with (not consistent)
aut$z3 <- -1
## Change indicator to 1 if consistent
# aut$z3[________] <- 1

table(aut$z3)
table(aut$a1)

When you've filled in the above blank, keep your cursor in the cell and press `SHIFT` + `ENTER` to run the code. If you've done this correctly, you