# Assess Intervention Efficacy

## Step 6 - Methods for Efficacy in Engagement Profiles
As the EFFIP trial was historical, its SAP had already been written before this framework was proposed. However, an example of the text that would have been included for each of the two Estimand approaches is given below.

#### 1) Intercurrent Event Definition
The identified engagement profile will be converted into a series of active user definitions that can then be used in the Estimand framework. The active user definitions cannot be specified in advance, as the engagement profile will not be identified until after the trial has completed. To determine the most appropriate set of active user definitions, the identified engagement profiles will be explored visually across the set of engagement indicator variables used. These visualisations will show whether or not the groups follow a monotonic trend. If there is a monotonic trend, the binary active user definitions will be defined using thresholds; if there is not, each group will be analysed separately under its own active user definition. Indicator variables that summarise total activity will be visualised across the identified engagement profiles using boxplots, histograms and two-way scatter plots. Indicator variables that describe total intervention retention will be visualised using Kaplan-Meier plots.
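
The decision rule above can be sketched in code. This is a minimal illustration under stated assumptions, not the trial's implementation: the function name `active_user_definitions`, the use of within-profile medians to judge the trend, and the "rank at least k" form of the thresholds are all hypothetical choices made for the example. In practice the trend is judged visually, as described above.

```python
from statistics import median


def active_user_definitions(profile, indicators):
    """Derive binary active user definitions from engagement profiles.

    profile    : profile label for each participant (hypothetical integer codes)
    indicators : one row of engagement indicator values per participant
    Returns (monotonic, definitions), where definitions maps a name to a
    list of booleans, one per participant.
    """
    groups = sorted(set(profile))
    n_ind = len(indicators[0])
    # Median of every indicator within every profile (assumed summary).
    medians = {
        g: [median(row[j] for row, p in zip(indicators, profile) if p == g)
            for j in range(n_ind)]
        for g in groups
    }
    # Rank profiles on the first indicator, then check that the same
    # ordering holds (non-decreasing medians) on every indicator.
    ordered = sorted(groups, key=lambda g: medians[g][0])
    monotonic = all(
        medians[lo][j] <= medians[hi][j]
        for lo, hi in zip(ordered, ordered[1:])
        for j in range(n_ind)
    )
    definitions = {}
    if monotonic:
        # Threshold definitions: active if the profile rank is at least k.
        rank = {g: r for r, g in enumerate(ordered)}
        for k in range(1, len(ordered)):
            definitions[f"rank_at_least_{k}"] = [rank[p] >= k for p in profile]
    else:
        # No consistent ordering: one definition per profile.
        for g in groups:
            definitions[f"profile_{g}"] = [p == g for p in profile]
    return monotonic, definitions
```

With three profiles that order consistently from low to high engagement, this returns two threshold definitions (medium or higher, and high only). Otherwise it returns one definition per profile.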


#### 2) Principal Stratum Estimand
For the Principal Stratum Estimand, the trial population is divided into latent sub-populations (principal strata) based on values of the intercurrent event of interest. These latent sub-populations are generated using the active user definitions. The attributes of this Estimand for the EFFIP trial are given in Table 1 below.

*Table 1* - Estimand Attributes for the Principal Stratum Estimand.
| Estimand Attribute                | Description | 
|:--------------------------------- |:------------------------------- |
| Population                        | Carers of individuals affected by psychosis |
| Intervention                      | COPe-Support: A multi-component support intervention compared to a non-interactive web-based information resource as the control |
| Outcome                           | Mental Wellbeing using the WEMWBS questionnaire at 20 weeks post-randomisation |
| Intercurrent Event                | The selected active user definition |
| Strategies for Intercurrent Event | **Principal Strata – Estimates only from those meeting specified active user definition with post-baseline data** |
| Summary Measure                   | Adjusted mean difference at 20 weeks post randomisation |

This Estimand would address the below research question:
- Is the COPe-Support intervention superior to the control arm for improving wellbeing in those that would adhere to the intervention as specified in the active user definition?

The principal stratum Estimand will be estimated using complier average causal effect (CACE) analysis to assess the intervention effect within the principal stratum specified by the active user definition. CACE analysis is used because the intercurrent event is a post-randomisation variable and is therefore endogenous to the outcome. The effect is instead estimated using random intervention allocation, which is an exogenous (instrumental) variable.

The statistical model for the CACE analysis will be a two-stage least squares (2SLS) regression. The first stage, equation (1), will be a linear regression in which the endogenous variable (the active user definition) is estimated for both intervention arms using randomised allocation as the instrumental variable. The active user definition is not observed in the control arm, so it is estimated using baseline characteristic variables associated with group allocation. These variables will be selected using a data-driven approach, the group lasso, retaining the variables from the group lasso fit with the smallest mean squared error. In the second stage, equation (2), the estimated values from the first stage will be used in place of the observed values of the endogenous variable. These will then be used to estimate the intervention efficacy for those who met the active user definition. For COPe-Support, the second stage regression will use a linear mixed model.
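
One way to write the two stages is sketched below. The notation is an assumption introduced here for illustration and is not taken from the trial SAP: $Z_i$ is randomised allocation, $D_i$ is the active user indicator, $\mathbf{X}_i$ is the vector of baseline covariates, $t$ indexes the post-baseline time points, and $u_{c(i)}$ and $v_i$ are random intercepts for cohort and participant.

$$
D_i = \alpha_0 + \alpha_1 Z_i + \boldsymbol{\alpha}_2^{\top}\mathbf{X}_i + \nu_i \tag{1}
$$

$$
Y_{it} = \beta_0 + \beta_1 \hat{D}_i + \gamma_t + \delta_t \hat{D}_i + \boldsymbol{\beta}_2^{\top}\mathbf{X}_i + u_{c(i)} + v_i + \varepsilon_{it} \tag{2}
$$

Here $\hat{D}_i$ is the fitted value from (1), and $\gamma_t$ and $\delta_t$ are the time and time-by-$\hat{D}_i$ interaction terms, set to zero at the reference time point. Under this parameterisation the efficacy estimate at time $t$ is $\beta_1 + \delta_t$.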

In both stages of the regression, the models will include the randomisation stratification variable, gender, as well as the participant's baseline score, parent status (Y/N) and living with the cared-for individual status (Y/N). In the second stage only, the model will also include the estimated values from the first stage, time as a categorical variable (10, 20, 40 weeks) and an interaction between the time variable and the estimated values from the first stage. Additionally, as the second stage regression is a mixed model, it will include the participant identifier and cohort as hierarchical random intercepts.
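
The mechanics of the two stages can be illustrated with a deliberately simplified sketch. It assumes a single time point and ordinary least squares in both stages, so it omits the mixed model, the time interaction and the group lasso selection described above. The helper names `ols` and `cace_2sls` and the toy data are hypothetical. Standard errors taken naively from the second stage would not be valid and are not computed here.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def cace_2sls(y, active, arm, covariates):
    """Two-stage least squares with randomised arm as the instrument.

    y          : outcome for each participant
    active     : 1 if the active user definition was met, else 0
    arm        : 1 for intervention, 0 for control (the instrument)
    covariates : one row of baseline covariates per participant
    """
    # Stage 1: regress active user status on the instrument and covariates.
    stage1 = [[1.0, z] + list(x) for z, x in zip(arm, covariates)]
    alpha = ols(stage1, active)
    active_hat = [sum(a * v for a, v in zip(alpha, row)) for row in stage1]
    # Stage 2: regress the outcome on the fitted values and the covariates.
    stage2 = [[1.0, d] + list(x) for d, x in zip(active_hat, covariates)]
    beta = ols(stage2, y)
    return beta[1]  # coefficient on the fitted active user status


# Noise-free toy data (hypothetical) with a true effect of 3 in active users.
arm = [0, 0, 0, 0, 1, 1, 1, 1]
active = [0, 0, 0, 0, 1, 1, 0, 1]
baseline = [[1.0], [2.0], [3.0], [4.0], [1.0], [2.0], [3.0], [4.0]]
outcome = [2.0 + 3.0 * d + 0.5 * x[0] for d, x in zip(active, baseline)]
cace = cace_2sls(outcome, active, arm, baseline)
```

Because the toy outcome contains no noise, `cace` recovers the true effect of 3 up to floating-point error. With real data a dedicated instrumental variable routine would normally be used, so that the standard errors account for the first stage.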


#### 3) Hypothetical Estimand
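Under the hypothetical approach, the analysis investigates the hypothetical scenario in which all individuals had met the selected definition of an active user. The attributes of this Estimand for the EFFIP trial are given in Table 2 below.

*Table 2* - Estimand Attributes for the Hypothetical Estimand.
| Estimand Attribute                | Description |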
|:--------------------------------- |:------------------------------- |
| Population                        | Carers of individuals affected by psychosis |
| Intervention                      | COPe-Support: A multi-component support intervention compared to a non-interactive web-based information resource as the control |
| Outcome                           | Mental Wellbeing using the WEMWBS questionnaire at 20 weeks post-randomisation |
| Intercurrent Event                | The selected active user definition  |
| Strategies for Intercurrent Event | **Hypothetical – Estimates for all participants under the assumption all in the COPe-Support arm with post-baseline data met the specified active user definition** |
| Summary Measure                   | Adjusted mean difference at 20 weeks post randomisation |

This Estimand would address the below research question:
- Is the COPe-Support intervention superior to the control arm for improving wellbeing, if everyone had met the specified active user definition?

To implement the hypothetical Estimand, the primary outcome values for individuals in the COPe-Support arm who meet the specified active user definition will be kept, and post-baseline primary outcome values for all other individuals in the COPe-Support arm will be deleted. These missing observations will then be imputed using multiple imputation, performed separately by arm, to estimate the intervention efficacy under this hypothetical scenario. Some participants will be missing post-baseline primary outcome values for other reasons (e.g. loss to follow-up). The linear mixed model would impute these implicitly, so these values will instead be imputed explicitly as well. In the control arm, no data will be deleted and no other missing data will be imputed.
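
The deletion step can be sketched as follows. The record layout, the field names (`arm`, `active`, `wemwbs_post`) and the arm labels are hypothetical and chosen only for illustration. The imputation itself is not shown.

```python
def delete_non_active_outcomes(records):
    """Set post-baseline outcomes to missing for non-active intervention users.

    Each record is a dict with (hypothetical) keys:
      arm         : "intervention" or "control"
      active      : True/False in the intervention arm, None in control
      wemwbs_post : dict mapping week (10, 20, 40) to a score or None
    The input records are left unmodified.
    """
    result = []
    for rec in records:
        rec = dict(rec)
        rec["wemwbs_post"] = dict(rec["wemwbs_post"])
        if rec["arm"] == "intervention" and not rec["active"]:
            # Outcomes will later be imputed under the hypothetical scenario.
            rec["wemwbs_post"] = {week: None for week in rec["wemwbs_post"]}
        result.append(rec)
    return result
```

Control arm records pass through unchanged, matching the rule that no control arm data are deleted.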

To impute the data, the variables in the primary analysis model (cohort, parent status, living with the cared-for individual, gender and baseline WEMWBS score) will be used. Additionally, the imputation model will include auxiliary baseline variables that are associated with active user status for individuals in the intervention arm. These auxiliary variables will be selected using a data-driven approach, the group lasso, retaining the variables from the group lasso fit with the smallest mean squared error. For COPe-Support there were three post-baseline timepoints: 10 weeks, 20 weeks and 40 weeks. Values of the primary outcome at these timepoints will be imputed using chained equations, i.e. imputed values at 10 weeks will be used in the imputation of primary outcome values at 20 weeks, and so on. In total, 100 imputed datasets will be generated, and in each an estimate of the intervention efficacy will be obtained from a repeated measures mixed model with baseline score, post-baseline time point, intervention, a time point by intervention interaction, gender, parent status (Y/N) and living with the cared-for individual (Y/N) as fixed effects, and cohort and subject as random intercepts. Estimates from the imputed datasets will be combined using Rubin's rules.
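
As a reminder of what the pooling step does, Rubin's rules can be sketched in a few lines. The function name `rubins_rules` is hypothetical. The inputs are the per-dataset efficacy estimates and their squared standard errors.

```python
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool estimates from m imputed datasets using Rubin's rules.

    estimates : point estimate from each imputed dataset
    variances : squared standard error from each imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    pooled = mean(estimates)       # average of the point estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The pooled standard error is the square root of the total variance. Confidence intervals additionally need the Rubin degrees of freedom, which are not computed in this sketch.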

