In [1]:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(magrittr))
data <- read.csv("data/synthetic_data.csv")

# SYNOPSIS OF STUDY DESIGN AND PROCEDURES

## Analysis Considerations

### Types of analysis

We will compare our binary primary outcome, **entry to the smoking cessation service**,
between the intervention and control groups. The proportion of people entering the
smoking cessation service (within 6 months of receipt of the invitation letter)
will be reported, along with the difference between the intervention and control groups
and its 95% confidence interval. This will be our primary result.


In [2]:
# proportion entering
prop <- mean(data$enter)

cat("Proportion of people entering the smoking cessation service:", prop)

Proportion of people entering the smoking cessation service: 0.5056

In [3]:
# difference in means
avgt <- data %>% filter(treated == 1) %>% summarize(avg = mean(enter))
avgc <- data %>% filter(treated == 0) %>% summarize(avg = mean(enter))
diff <- as.numeric(avgt - avgc)

# standard error (unpooled): sample variance of the outcome in each arm
st <- data %>% filter(treated == 1) %>% summarize(var = var(enter)) %>% as.numeric()
sc <- data %>% filter(treated == 0) %>% summarize(var = var(enter)) %>% as.numeric()

# arm sizes: n_t treated, n - n_t controls
n_t <- sum(data$treated)
n <- nrow(data)
se <- sqrt(st / n_t + sc / (n - n_t))

cat("Difference in means with 95% CI:", diff, "±", 1.96 * se)

Difference in means with 95% CI: 0.009333333 ± 0.02829432
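
For reference, the interval computed above is the usual unpooled Wald interval for a difference in proportions. The notation here is introduced for illustration and is not taken from the protocol: $\hat{p}_T$ and $\hat{p}_C$ are the observed proportions entering the service in the intervention and control arms, $s_T^2$ and $s_C^2$ are the sample variances of the binary outcome in each arm, and $n_T$ and $n_C$ are the arm sizes.

$$
\hat{\Delta} = \hat{p}_T - \hat{p}_C, \qquad
\widehat{\mathrm{SE}}(\hat{\Delta}) = \sqrt{\frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}}, \qquad
95\%\ \mathrm{CI}:\ \hat{\Delta} \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\Delta})
$$

With the values printed above, $0.0093 \pm 0.0283$ corresponds to an interval of roughly $(-0.019,\ 0.038)$, which includes zero.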

However, adjustment
for baseline covariates is often advised, firstly to correct for any chance imbalances in
important baseline variables following randomisation, and secondly, because adjusting for
highly important baseline variables in an RCT can improve the precision of treatment effect
estimates even when the outcome measure is binary. 

Statistical testing for baseline
imbalances is not advised and instead key covariates should be selected prior to analysis
based on the likely magnitude of the association with the outcome measure (European
Agency for the Evaluation of Medicinal Products, 2003). 

We will therefore also perform a
multivariable logistic regression to take into account any imbalance that may occur in
important baseline characteristics known to predict smoking cessation outcomes between
the groups:
- gender
- age
- dependence score (cigs per day + time from waking)
- intention to quit
- determination to quit
- longest previous quit
- live with smokers
- deprivation (IMD score)
- previous NHS SSS attendance


Odds ratios will be quoted together with their 95% confidence intervals and exact P-values. 

In [4]:
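## compute dependence scores as in section 2.1.2
data <- data %>%
  mutate(
    # cigarettes per day: <= 10 scores 0, (10, 20] scores 1, (20, 30] scores 2, > 30 scores 3
    cigscore = case_when(
      cigsperday <= 10 ~ 0,
      cigsperday <= 20 ~ 1,
      cigsperday <= 30 ~ 2,
      cigsperday > 30 ~ 3
    ),
    # time from waking: > 60 scores 0, (30, 60] scores 1, (5, 30] scores 2, <= 5 scores 3
    wakescore = case_when(
      timefromwaking > 60 ~ 0,
      timefromwaking > 30 ~ 1,
      timefromwaking > 5 ~ 2,
      timefromwaking <= 5 ~ 3
    ),
    # total dependence score, 0 to 6
    depscore = cigscore + wakescore
  )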


In [5]:
## fit multivariable logistic regression for covariate adjustment
fit1 <- glm(enter ~ treated + gender + age + depscore + intquit +
              determquit + livesmoke + imd + prev,
            family = binomial,
            data = data)

In [6]:
# compute estimates for odds ratios and 95% CIs
suppressMessages(ci1 <- exp(confint(fit1)))
oddsr1 <- data.frame(list(names(coef(fit1)), exp(coef(fit1)),
                         ci1[,1], ci1[,2]))
names(oddsr1) <- c("coefficients", "estimate", "2.5%", "97.5%")
oddsr1

Unnamed: 0,coefficients,estimate,2.5%,97.5%
(Intercept),(Intercept),0.530444,0.1660724,1.692652
treated,treated,1.0376879,0.9264529,1.162293
gender,gender,0.9737083,0.8713133,1.088122
age,age,0.9975362,0.9919944,1.003105
depscore,depscore,0.9808867,0.8934839,1.076822
intquit,intquit,1.00216,0.9537233,1.053059
determquit,determquit,1.0023492,0.9636502,1.042604
livesmoke,livesmoke,0.9561738,0.8556823,1.068439
imd,imd,1.0004137,0.9998625,1.000966
prev,prev,0.9544624,0.8541433,1.066537
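
The plan above calls for exact P-values alongside the odds ratios and confidence intervals. A minimal sketch of how they could be added to such a table is given below. It runs on simulated data with only two covariates, so the `sim` data frame, the `or_table` name and its column names are assumptions made for illustration. It also swaps in Wald intervals from `confint.default()` so that the intervals and P-values come from the same approximation; the profile-likelihood `confint()` used in the notebook could be substituted.

```r
# Simulated stand-in for the study data (illustrative only, not the trial data).
set.seed(1)
n <- 500
sim <- data.frame(
  treated = rbinom(n, 1, 0.5),
  age     = round(runif(n, 18, 80))
)
sim$enter <- rbinom(n, 1, plogis(-0.5 + 0.3 * sim$treated))

fit <- glm(enter ~ treated + age, family = binomial, data = sim)

# Coefficient matrix from the model summary; the last column holds the
# two-sided Wald P-values, labelled "Pr(>|z|)" for a binomial GLM.
coefs <- summary(fit)$coefficients

# Wald 95% intervals on the log-odds scale, exponentiated to odds ratios.
ci <- exp(confint.default(fit))

or_table <- data.frame(
  term       = rownames(coefs),
  odds_ratio = unname(exp(coefs[, "Estimate"])),
  lower      = unname(ci[, 1]),
  upper      = unname(ci[, 2]),
  p_value    = unname(coefs[, "Pr(>|z|)"])
)

# Print unrounded P-values rather than thresholds such as "< 0.05".
print(or_table, digits = 4)
```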


We will account for the therapist effect (see section 1.10 above) by including the allocated
taster session in our model as a random effect nested within the SSS cluster. We have
chosen to nest within SSS rather than practice because the therapists were SSS-based rather
than practice-based.

For the analysis of the 7-day point prevalence abstinence at the 6-month follow-up we
will follow the same plan as described above.

If cessation shows an effect without attendance, we will examine differences in the
pattern of characteristics within each arm.
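
The nested random effect for the therapist described above could be fitted with a mixed-effects logistic regression. The sketch below is one possible approach, assuming the `lme4` package is available. The grouping variables `sss` and `session` are hypothetical names, the data are simulated, and for simplicity every simulated participant is assigned a session, which need not match the trial design.

```r
# Simulated clustered data: participants within taster sessions within SSS.
# All names and sizes here are illustrative assumptions.
set.seed(2)
n_sss <- 8
n_session <- 5
n_person <- 30
sim <- expand.grid(person  = seq_len(n_person),
                   session = seq_len(n_session),
                   sss     = seq_len(n_sss))

# Random intercepts for each SSS and for each session within an SSS.
u_sss     <- rnorm(n_sss, 0, 0.3)
u_session <- matrix(rnorm(n_sss * n_session, 0, 0.3), nrow = n_sss)

sim$treated <- rbinom(nrow(sim), 1, 0.5)
eta <- -0.3 + 0.2 * sim$treated +
  u_sss[sim$sss] + u_session[cbind(sim$sss, sim$session)]
sim$enter <- rbinom(nrow(sim), 1, plogis(eta))

fit_re <- NULL
if (requireNamespace("lme4", quietly = TRUE)) {
  # (1 | sss/session) expands to (1 | sss) + (1 | sss:session),
  # i.e. a session intercept nested within an SSS intercept.
  fit_re <- lme4::glmer(enter ~ treated + (1 | sss / session),
                        family = binomial, data = sim)
  # Fixed effects on the odds-ratio scale.
  print(exp(lme4::fixef(fit_re)))
} else {
  message("lme4 is not installed; skipping the mixed-model fit")
}
```

The baseline covariates from the adjusted model would be added to the fixed part of the formula in the same way as in `fit1`.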

In [7]:
output_table <- data.frame(list("Cigs per day (Baseline questionnaire Qs A4/6)"=c("5", "6 to 10", "11 to 20", "21 to 30", ">30"), 
                                "Score"=c(0, 0, 1, 2, 3),
                                "Time from waking (baseline questionnaire Qs B2)"=c(">2hrs", "1-2hrs","31-60 mins", "<30 mins", "<5 mins"), 
                                "Score"=c(0,0,1,2,3)))
names(output_table) <- c("Cigs per day",
                         "Score",
                        "Time from waking", "Score")

#### Unit of analysis considerations
In the multivariable analysis we will use the following categorisation of the covariates:

- Gender (binary): Baseline questionnaire D4 - male/female
- Age in years (continuous): Baseline questionnaire D6
- Dependence score (continuous score 0-6):

In [8]:
output_table

Cigs per day,Score,Time from waking,Score.1
5,0,>2hrs,0
6 to 10,0,1-2hrs,0
11 to 20,1,31-60 mins,1
21 to 30,2,<30 mins,2
>30,3,<5 mins,3


- Intention to quit (categorical): Baseline questionnaire B4: “Are you planning to quit:
within 2 weeks/30 days/6 months/not within 6 months?”
- Determination to quit: Baseline questionnaire B9: “How determined are you to quit for
good?” Likert scale 1 to 5
- Longest previous quit (categorical): Baseline questionnaire B3: “What is the longest you
have ever quit smoking for?” less than 24 hrs/1-6 days/1-4 weeks/>1 month
- Live with other smokers (binary): Baseline questionnaire D2 yes/no
- Deprivation (measured by IMD score) (continuous)
- Previous NHS SSS attendance (binary): Baseline questionnaire B7 ‘Have you attended
an NHS SSS ----?’ yes/no

#### Effect modification and subgroup analyses
To assess whether the intervention is more effective for any particular subgroup
of smokers, we will explore whether there is an interaction between treatment and gender, treatment
and age, and treatment and deprivation. This will be carried out for the primary outcome
(attendance) and for 7-day point prevalence abstinence at the 6-month follow-up.


In [9]:
## fit multivariable logistic regression with treatment interactions
fit2 <- glm(enter ~ treated + treated * gender + treated * depscore +
              gender + age + depscore + intquit +
              determquit + livesmoke + imd + prev,
            family = binomial,
            data = data)

In [10]:
# compute estimates for odds ratios and 95% CIs
suppressMessages(ci2 <- exp(confint(fit2)))
oddsr2 <- data.frame(list(names(coef(fit2)), exp(coef(fit2)),
                         ci2[,1], ci2[,2]))
names(oddsr2) <- c("coefficients", "estimate", "2.5%", "97.5%")
oddsr2

Unnamed: 0,coefficients,estimate,2.5%,97.5%
(Intercept),(Intercept),0.533444,0.1608458,1.767559
treated,treated,1.0232154,0.6172014,1.69619
gender,gender,0.9312349,0.7812966,1.109846
depscore,depscore,0.986209,0.8490991,1.145416
age,age,0.9975242,0.9919809,1.003094
intquit,intquit,1.0015919,0.9531518,1.052495
determquit,determquit,1.0023544,0.9636527,1.042612
livesmoke,livesmoke,0.9565514,0.8560103,1.068874
imd,imd,1.000416,0.9998647,1.000968
prev,prev,0.9545049,0.8541621,1.066608
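
The three interactions named in this section (treatment by gender, by age and by deprivation) could also be assessed jointly by comparing nested models with a likelihood-ratio test. The following is a minimal sketch on simulated data. The `sim` data frame and the object names `main_fit`, `int_fit` and `lrt` are assumptions for illustration, and the remaining baseline covariates are omitted to keep the example short.

```r
# Simulated stand-in data with no true effect modification (illustrative only).
set.seed(3)
n <- 2000
sim <- data.frame(
  treated = rbinom(n, 1, 0.5),
  gender  = rbinom(n, 1, 0.5),
  age     = round(runif(n, 18, 80)),
  imd     = runif(n, 0, 80)
)
sim$enter <- rbinom(n, 1, plogis(-0.2 + 0.1 * sim$treated))

# Main-effects model, then the same model plus the three interaction terms.
main_fit <- glm(enter ~ treated + gender + age + imd,
                family = binomial, data = sim)
int_fit  <- update(main_fit,
                   . ~ . + treated:gender + treated:age + treated:imd)

# Likelihood-ratio (deviance) test of all three interactions at once.
lrt <- anova(main_fit, int_fit, test = "Chisq")
print(lrt)

# Interaction coefficients as ratios of odds ratios.
int_terms <- grep(":", names(coef(int_fit)), value = TRUE)
print(exp(coef(int_fit)[int_terms]))
```

Each interaction can be tested separately by adding one term at a time in the `update()` call.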


#### Timing of analyses
Preliminary analyses will be done in January 2014. The final analysis will be done in April
2015.