# Design Stage

## Step 2 - Derive the Engagement Indicator Variables
In this second step of the design stage of the framework, where the engagement measures are assessed, the measures captured are “information rich”. Several indicator variables can therefore typically be derived alongside each measure itself. When deriving the engagement indicator variables, each indicator should describe precisely one aspect of an individual’s engagement with the intervention.

In this case study, the trial has already been completed and the data collected. This notebook therefore shows how the indicator variables were derived from the existing engagement data.

```stata
qui do Folders // Load folder global macros
qui cd "$RAW"

use weekly, clear
sort ID week
```

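```stata
%head
```

| Obs | ID | activated | cohort | week | wkpv | wkpost | ate | ptp | totrow |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 18 |
| 2 | 1 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 18 |
| 3 | 1 | 1 | 1 | 3 | 0 | 0 | 0 | 0 | 18 |
| 4 | 1 | 1 | 1 | 4 | 0 | 0 | 0 | 0 | 18 |
| 5 | 1 | 1 | 1 | 5 | 0 | 0 | 0 | 0 | 18 |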


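#### Deriving First-to-Last Duration Variables
Pageviews, posts to the peer-to-peer (PtP) forum, and posts to the ask-the-expert (AtE) forum are all measured weekly. For each of these, the duration is the number of weeks from the first to the last week with a non-zero count, inclusive (last - first + 1). The number of active weeks is also counted, because it is needed later to derive the intensity variables.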
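The following is a minimal sketch of this derivation on a small, hypothetical weekly dataset: two participants over five weeks, with invented values. It follows the same logic as the notebook. Only active weeks are kept, they are counted, and the duration is taken as last week minus first week plus one. Unlike the notebook, it uses `_N` and explicit subscripts within `by` groups rather than `egen`. Because it starts with `clear`, run it outside the main workflow.

```stata
* Hypothetical toy data: one row per participant-week
clear
input ID week wkpv
1 1 0
1 2 3
1 3 0
1 4 2
1 5 0
2 1 0
2 2 0
2 3 5
2 4 0
2 5 0
end

* Keep only weeks with a non-zero, non-missing count
keep if wkpv > 0 & wkpv < .

* Within each participant (ordered by week): number of active weeks,
* and weeks from the first to the last active week, inclusive
bysort ID (week): gen act_wkpv = _N
by ID: gen dur_wkpv = week[_N] - week[1] + 1

* Reduce to one row per participant
by ID: keep if _n == 1
keep ID act_wkpv dur_wkpv
list
```

Participant 1 is active in weeks 2 and 4, giving a duration of 3 weeks with 2 active weeks. Participant 2 is active only in week 3, giving 1 and 1.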

#### Deriving Weekly Variability
For the same weekly variables, week-to-week variability is summarised as the standard deviation (SD) of each participant's weekly values. The data are in long format (one row per participant per week). Stata's `collapse` command with `by(ID)` therefore reduces them to one SD per participant for each variable.
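Here is a standalone sketch of the SD summary, using invented values for two hypothetical participants over three weeks. `collapse (sd)` returns the sample standard deviation (n - 1 denominator). Every week contributes, including weeks with a count of zero.

```stata
* Hypothetical toy data in long format
clear
input ID week wkpv
1 1 2
1 2 2
1 3 2
2 1 0
2 2 4
2 3 8
end

* Sample SD of each participant's weekly counts, one row per participant
collapse (sd) sdwkpv = wkpv, by(ID)
list
```

Participant 1 has a constant weekly count, so the SD is 0. Participant 2's counts of 0, 4 and 8 give an SD of 4.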

```stata
* Binary indicator for each week with a non-zero, non-missing count
gen ftl_wkpv = (wkpv > 0 & wkpv < .)
gen ftl_ptp  = (ptp > 0 & ptp < .)
gen ftl_ate  = (ate > 0 & ate < .)

* Temporary files (deleted automatically when the session ends) for the derived variables
tempfile wkpv ptp ate
local inds wkpv ptp ate // variable stems to loop over
local n : word count `inds'

* Loop through all three variables
forvalues i = 1/`n' {
	preserve
	local ind : word `i' of `inds' // select the ith stem
	keep if ftl_`ind' == 1 // keep only weeks with activity
	bysort ID: egen act_`ind' = total(ftl_`ind') // number of active weeks (needed for the intensity variable)
	bysort ID (week): keep if _n == 1 | _n == _N // first and last active week
	by ID: gen first = week[1]
	by ID: gen last  = week[_N]
	gen dur_`ind' = (last - first) + 1 // weeks from first to last active week, inclusive
	egen pickone = tag(ID) // one row per participant
	keep if pickone
	keep ID dur_`ind' act_`ind'
	save "``ind''", replace
	restore
}

/* --- Deriving Variability in Pageviews, PtP and AtE --- */
tempfile variation
collapse (sd) sdwkpv = wkpv (sd) sdptp = ptp (sd) sdate = ate, by(ID)
save "`variation'", replace
```

```
(2,812 observations deleted)
(556 observations deleted)
(133 observations deleted)
file __000001.dta saved
(3,571 observations deleted)
(30 observations deleted)
(28 observations deleted)
file __000002.dta saved
(3,545 observations deleted)
(49 observations deleted)
(30 observations deleted)
file __000003.dta saved
file __000004.dta saved
```


#### Collating Total Variables
Total variables were already available among the engagement measures collected in the COPe-Support intervention. These are loaded first. The derived variables saved to temporary files above are then merged on by participant ID.
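This is a hedged sketch of the merge pattern, using two tiny hypothetical files. The first is a participant-level summary (the master). The second is a derived file that, like the duration files, contains only participants with some activity. `keep(1 3)` keeps unmatched master rows and matched rows, so participants absent from the derived file are retained with missing values. Both files have one row per `ID`, so `1:1` is the natural merge type. The sketch starts with `clear` and is meant to run on its own.

```stata
* Hypothetical participant-level summary (master)
clear
input ID pageviews
1 10
2 0
3 25
end
tempfile summary
save "`summary'"

* Hypothetical derived file: only participants with activity appear
clear
input ID dur_wkpv
1 4
3 7
end
tempfile derived
save "`derived'"

* One-to-one merge on ID, keeping master-only (1) and matched (3) rows
use "`summary'", clear
merge 1:1 ID using "`derived'", nogen keep(1 3)
list
```

Participant 2 is kept with `dur_wkpv` missing. Values like this are recoded later in the missing-data step.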

```stata
use summary, clear
order ID cohort, first

* Merge on duration and active-week counts (first to last)
merge 1:1 ID using "`wkpv'", nogen keep(1 3)
merge 1:1 ID using "`ptp'", nogen keep(1 3)
merge 1:1 ID using "`ate'", nogen keep(1 3)

* Merge on variability
merge 1:1 ID using "`variation'", nogen keep(1 3)
```


```
    Result                      Number of obs
    -----------------------------------------
    Not matched                            33
        from master                        33
        from using                          0

    Matched                               171
    -----------------------------------------

    Result                      Number of obs
    -----------------------------------------
    Not matched                           161
        from master                       161
        from using                          0

    Matched                                43
    -----------------------------------------

    Result                      Number of obs
    -----------------------------------------
    Not matched                           156
        from master                       156
        from using                          0

    Matched                                48
    -----------------------------------------

    Resul
```

```stata
ds // Review variable names
```

```
ID         totalmins  pageviews  ate        dur_wkpv   act_ate    sdptp
cohort     logindays  posts      totaldays  act_ptp    dur_ate    sdate
activated  loginwks   ptp        act_wkpv   dur_ptp    sdwkpv
```


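#### Deriving Intensity Variables
Intensity is the number of active weeks divided by the duration from the first to the last active week. It is the proportion of weeks within that window in which the participant was active.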

```stata
gen rate_wkpv = act_wkpv / dur_wkpv
gen rate_ptp = act_ptp / dur_ptp
gen rate_ate = act_ate / dur_ate
```

```
(33 missing values generated)
(161 missing values generated)
(156 missing values generated)
```
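Below is a small sketch of the intensity calculation on hypothetical participant-level values. The number of active weeks can never exceed the first-to-last window, so a defined rate is greater than 0 and at most 1. The rate is missing for participants with no active weeks, whose inputs are both missing. This is the source of the missing values reported above. The sketch clears the data in memory.

```stata
* Hypothetical participant-level inputs; ID 3 has no active weeks
clear
input ID act_wkpv dur_wkpv
1 2 3
2 1 1
3 . .
end

* Proportion of weeks active within the first-to-last window
gen rate_wkpv = act_wkpv / dur_wkpv
list
```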


#### Registration & Activation
After discussion with the Chief Investigator, the engagement measure called "activated" was judged to be a status indicator of whether a participant had registered for the intervention. The variable was therefore renamed `registered`, and a new variable was derived to capture activation status. To count as activated, a participant must have registered for the intervention and recorded some activity (total minutes greater than 0).
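Stata treats a missing numeric value (`.`) as larger than any number, so a condition like `totalmins > 0` is true when `totalmins` is missing. The hypothetical example below shows why the `totalmins < .` guard matters. Without it, participant 3 (registered, activity missing) would be classed as activated. The example starts with `clear`, so run it on its own.

```stata
* Hypothetical registration and activity data
clear
input ID registered totalmins
1 1 12.5
2 1 0
3 1 .
4 0 .
end

* Activated = registered and some recorded activity; "< ." excludes missing
gen activated = (registered == 1 & totalmins > 0 & totalmins < .)
list
```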

```stata
rename activated registered
gen activated = (registered == 1 & totalmins > 0 & totalmins < .)
lab define yn 0 "No" 1 "Yes", replace
lab values activated yn
order activated, after(registered)

tab registered activated // Shows 1 participant registered but not activated
```


```
Activation |       activated
    Status |        No        Yes |     Total
-----------+----------------------+----------
        No |        29          0 |        29
       Yes |         1        174 |       175
-----------+----------------------+----------
     Total |        30        174 |       204
```


#### Total Days
The intervention period was 18 weeks, but some individuals continued to use the intervention after this period. For these individuals, the total number of days using the intervention is censored at 126 days (18 weeks × 7 days). This is the maximum time they could have used the intervention during the study period.
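Here is a standalone sketch of the censoring rule on made-up values. As with activation, the `< .` guard keeps missing values missing, because `.` compares as greater than 126.

```stata
* Hypothetical total days of use
clear
input ID totaldays
1 40
2 126
3 150
4 .
end

* Cap at 126 days (18 weeks x 7 days); missing values stay missing
replace totaldays = 126 if totaldays > 18 * 7 & totaldays < .
list
```

`gen capped = min(totaldays, 126)` would not be equivalent here. Stata's `min()` ignores missing arguments, so it would return 126 for participant 4.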

```stata
/* Data were only extracted for the first 18 weeks, but total days was extracted
	for the whole time participants used the intervention. */
replace totaldays = 126 if totaldays > (18 * 7) & totaldays < .
sum totaldays
```

```
(20 real changes made)

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   totaldays |        181    66.49171     42.6976          1        126
```


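#### Resolve Missing Data
For participants who did not activate, some variables are missing. The reason for this missingness is known: they never used the intervention. These values are therefore true zeros rather than unknowns, so they are recoded to 0. The same reasoning applies to the derived variables of participants with no pageviews, no PtP posts or no AtE posts.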
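The following is a minimal sketch of the recode on hypothetical data. The notebook sets the listed variables to 0 for every non-activated participant. This version differs slightly: it replaces only values that are actually missing (`missing()`), leaving any non-missing value untouched. The two approaches agree whenever non-activated participants have only 0 or missing values. The sketch clears the data in memory.

```stata
* Hypothetical data: IDs 2 and 3 never activated
clear
input ID activated pageviews posts
1 1 20 2
2 0 . .
3 0 0 .
end

* For non-activated participants, missing counts are structural zeros
foreach v of varlist pageviews posts {
	replace `v' = 0 if activated == 0 & missing(`v')
}
list
```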

```stata
/* --- Replace missing with 0's where appropriate --- */

* When not activated
replace totalmins = 0 if activated == 0
replace pageviews = 0 if activated == 0
replace posts = 0 if activated == 0
replace ptp = 0 if activated == 0
replace ate = 0 if activated == 0
replace totaldays = 0 if activated == 0

* When other variables are 0
replace totalmins = 0 if pageviews == 0
replace logindays = 0 if pageviews == 0
replace loginwks = 0 if pageviews == 0
replace totaldays = 0 if pageviews == 0

foreach v of varlist rate_wkpv sdwkpv act_wkpv dur_wkpv {
	replace `v' = 0 if totalmins == 0
}

foreach v of varlist rate_ptp sdptp act_ptp dur_ptp {
	replace `v' = 0 if ptp == 0
}

foreach v of varlist rate_ate sdate act_ate dur_ate {
	replace `v' = 0 if ate == 0
}
```

```
(30 real changes made)
(13 real changes made)
(12 real changes made)
(0 real changes made)
(0 real changes made)
(30 real changes made)
(2 real changes made)
(32 real changes made)
(32 real changes made)
(2 real changes made)
(32 real changes made)
(0 real changes made)
(32 real changes made)
(32 real changes made)
(161 real changes made)
(0 real changes made)
(161 real changes made)
(161 real changes made)
(156 real changes made)
(0 real changes made)
(156 real changes made)
(156 real changes made)
```


#### Labelling and Tidy Up

```stata
lab var registered "Registered to Intervention"
lab var activated "Activated (exposed) to Intervention"
lab var dur_wkpv "Pageviews - Weeks between first and last count > 0"
lab var dur_ptp "PtP - Weeks between first and last count > 0"
lab var dur_ate "AtE - Weeks between first and last count > 0"
lab var act_wkpv "Pageviews - Activity across Total Duration"
lab var act_ptp "PtP - Activity across Total Duration"
lab var act_ate "AtE - Activity across Total Duration"
lab var rate_wkpv "Pageviews - Proportion of Weeks Active"
lab var rate_ptp "PtP - Proportion of Weeks Active"
lab var rate_ate "AtE - Proportion of Weeks Active"
lab var sdwkpv "(Variation) SD of Weekly Pageviews"
lab var sdptp "(Variation) SD of Weekly PtP Posts"
lab var sdate "(Variation) SD of Weekly AtE Posts"

drop act_*
sort ID
```

```stata
describe
```

```
Contains data from summary.dta
 Observations:           204
    Variables:            21                  27 Aug 2025 16:25
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
ID              long    %10.0g                Participant Identifier
cohort          float   %9.0g                 Study Cohort
registered      float   %9.0g      status     Registered to Intervention
activated       float   %9.0g      yn         Activated (exposed) to
                                                Intervention
totalmins       double  %10.0g                Total activity (minutes)
logindays       byte    %10.0g                No of login on different days
loginwks        byte    %10.0g                No of weeks with login
pageviews       int     %10.0g                Pa
```

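```stata
%head
```

| Obs | ID | cohort | registered | activated | totalmins | logindays | loginwks | pageviews | posts | ptp | ate | totaldays | dur_wkpv | dur_ptp | dur_ate | sdwkpv | sdptp | sdate | rate_wkpv | rate_ptp | rate_ate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 3.0 | 1 | 1 | 1 | 0 | 0 | 0 | 18 | 1 | 0 | 0 | 0.23570226 | 0 | 0.0 | 1.0 | 0 | 0.0 |
| 2 | 4 | 6 | 1 | 1 | 176.91667 | . | 6 | 206 | 2 | 0 | 2 | 93 | 10 | 0 | 2 | 22.708439 | 0 | 0.32338083 | 0.6 | 0 | 1.0 |
| 3 | 8 | 2 | 1 | 1 | 4.95 | 1 | 1 | 4 | 0 | 0 | 0 | 10 | 1 | 0 | 0 | 0.94280905 | 0 | 0.0 | 1.0 | 0 | 0.0 |
| 4 | 11 | 3 | 1 | 1 | 243.75 | 24 | 12 | 403 | 5 | 0 | 5 | 94 | 14 | 0 | 11 | 37.23872 | 0 | 0.66911316 | 0.8571429 | 0 | 0.2727273 |
| 5 | 13 | 6 | 1 | 1 | 44.783333 | . | 2 | 308 | 0 | 0 | 0 | 21 | 3 | 0 | 0 | 41.356255 | 0 | 0.0 | 1.0 | 0 | 0.0 |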
