## 16.2 Estimating the effect of treatment

In the kidney stones example the effect of treatment type on the outcome is confounded by stone size, because smaller stone size is associated both with an increased likelihood of getting lithotripsy rather than surgery, and with greater chance of a good outcome. In this situation we know that to discover the real ‘effect’ of treatment type on the outcome we have to control for stone size. But what exactly do we mean by the ‘treatment effect’? There are different ways of measuring the treatment effect and we could be interested in a marginal treatment effect or a conditional treatment effect.

To define the treatment effect we introduce some new notation.

### 16.2.1 ‘Seeing’ versus ‘doing’

To discuss what we mean by a treatment effect, let’s first consider the difference between an observational study and a randomized controlled trial. The study of kidney stone treatment described above is observational, so we ‘see’ which treatment individuals in the study population received in the ‘real world’. By contrast, had we performed a randomized trial to investigate the treatment effect, we would have intervened on the treatment each individual was assigned (through the randomization). Thus in an observational study we ‘see’ the treatment X, whereas in a randomized trial we ‘do’ the treatment X.

A treatment effect is defined in terms of how the distribution of the outcome would differ under the two treatments. Suppose that instead of ‘seeing’ X in the kidney stones study we had been able to ‘do’ X. We can then imagine the hypothetical situations in which everyone in the study population had been given surgery, denoted do(X = 0), or in which everyone had been given lithotripsy, denoted do(X = 1). The ‘do’ notation was introduced by Pearl (1995). Accessible introductions to this concept are given in the books by Pearl, Glymour and Jewell (2016) and Pearl & Mackenzie (2018).

The treatment effect can be defined in terms of the difference in the probability of the outcome under the two hypothetical interventions:


<center>$\Pr(Y = 1 \mid \mathrm{do}(X = 1)) - \Pr(Y = 1 \mid \mathrm{do}(X = 0))$</center>
<div style="text-align: right"> (1) </div>

This is referred to as the Average Causal Effect (ACE). It is a ‘marginal’ treatment effect because it is ‘marginalized’ or ‘averaged’ over the distribution of Z in the study population. Here the treatment effect is measured using a difference in probabilities (or risk difference). Alternatively we could measure the treatment effect using a ratio of probabilities (risk ratio)

    
<center>$\frac{\Pr(Y = 1 \mid \mathrm{do}(X = 1))}{\Pr(Y = 1 \mid \mathrm{do}(X = 0))}$</center>
<div style="text-align: right"> (2) </div>
    
or using an odds ratio

<center>$\frac{\Pr(Y = 1 \mid \mathrm{do}(X = 1)) \, \Pr(Y = 0 \mid \mathrm{do}(X = 0))}{\Pr(Y = 1 \mid \mathrm{do}(X = 0)) \, \Pr(Y = 0 \mid \mathrm{do}(X = 1))}$</center>
<div style="text-align: right"> (3) </div>
    
The quantity that we use to measure the treatment effect is called the ‘estimand’, meaning the quantity we aim to estimate.
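
The three measures in equations (1), (2) and (3) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `effect_measures` and the two probabilities are hypothetical and are not taken from the kidney stones data.

```python
from fractions import Fraction


def effect_measures(p1, p0):
    """Return the risk difference, risk ratio and odds ratio.

    p1 stands for Pr(Y = 1 | do(X = 1)) and p0 for Pr(Y = 1 | do(X = 0)).
    """
    risk_difference = p1 - p0                        # equation (1)
    risk_ratio = p1 / p0                             # equation (2)
    odds_ratio = (p1 * (1 - p0)) / (p0 * (1 - p1))   # equation (3)
    return risk_difference, risk_ratio, odds_ratio


# Hypothetical probabilities, chosen only to illustrate the arithmetic.
p1 = Fraction(1, 5)  # assumed Pr(Y = 1 | do(X = 1))
p0 = Fraction(1, 2)  # assumed Pr(Y = 1 | do(X = 0))

rd, rr, odds = effect_measures(p1, p0)
print(rd, rr, odds)  # prints: -3/10 2/5 1/4
```

The same pair of probabilities gives three different numbers, which is why the estimand has to be stated before any estimation is done.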
    
The question arises as to what population the marginal treatment effect refers to. It is the ‘population average treatment effect’, where the population is the study population. Therefore, in general the marginal treatment effect depends on the distribution of Z in the study population, because it is defined as averaged over Z. In a different study with a different distribution of Z the marginal treatment effect would be different.
    
We could also consider a treatment effect conditional on Z, which again can be quantified in terms
of a risk difference, risk ratio or odds ratio:
    
    
<center>$\Pr(Y = 1 \mid \mathrm{do}(X = 1), Z = z) - \Pr(Y = 1 \mid \mathrm{do}(X = 0), Z = z)$</center>
<div style="text-align: right"> (4) </div>

<center>$\frac{\Pr(Y = 1 \mid \mathrm{do}(X = 1), Z = z)}{\Pr(Y = 1 \mid \mathrm{do}(X = 0), Z = z)}$</center>
<div style="text-align: right"> (5) </div>

<center>$\frac{\Pr(Y = 1 \mid \mathrm{do}(X = 1), Z = z) \, \Pr(Y = 0 \mid \mathrm{do}(X = 0), Z = z)}{\Pr(Y = 1 \mid \mathrm{do}(X = 0), Z = z) \, \Pr(Y = 0 \mid \mathrm{do}(X = 1), Z = z)}$</center>
<div style="text-align: right"> (6) </div>
    
We can never observe the outcome Y under both hypothetical situations, one in which all individuals in the population of interest received treatment X = 0 and one in which all individuals received treatment X = 1, because in reality each individual can only receive one treatment at a given time. However, a randomized trial mimics this scenario through randomization, which makes the two treatment groups comparable. In a randomized trial we ‘do’ X on two comparable groups of individuals, and so the probabilities Pr(Y = 1|do(X = x)) (x = 0, 1) can be estimated directly from the data on Y and X because Pr(Y = 1|do(X = x)) = Pr(Y = 1|X = x) (x = 0, 1). In the observational study, in general Pr(Y = 1|do(X = x)) $\neq$ Pr(Y = 1|X = x) due to the confounding by Z.
    
In the observational study in this example, Z is the only confounder, and therefore after conditioning on Z we have Pr(Y = 1|do(X = x), Z = z) = Pr(Y = 1|X = x, Z = z) (x = 0, 1). This means that the conditional treatment effect (conditional on Z) can be estimated from the observational data using the observed probabilities Pr(Y = 1|X = x, Z = z), because after conditioning on Z there is no confounding of the association between X and Y.
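
As a rough illustration of the difference between the crude comparison and the comparison conditional on Z, the sketch below uses the ten illustrative records of Table 3 (Section 16.2.3). These are not the study data, and the helper names `prob_y1`, `crude` and `conditional` are hypothetical.

```python
from fractions import Fraction

# Illustrative individual-level records (y, x, z) copied from Table 3.
records = [
    (1, 0, 0), (1, 1, 0), (0, 1, 0), (1, 0, 1), (0, 1, 1),
    (0, 1, 0), (0, 1, 1), (1, 0, 1), (0, 0, 0), (1, 0, 1),
]


def prob_y1(rows):
    """Proportion of rows with Y = 1."""
    return Fraction(sum(y for y, _, _ in rows), len(rows))


def crude(x):
    """Observed Pr(Y = 1 | X = x), ignoring Z."""
    return prob_y1([r for r in records if r[1] == x])


def conditional(x, z):
    """Observed Pr(Y = 1 | X = x, Z = z)."""
    return prob_y1([r for r in records if r[1] == x and r[2] == z])


# Crude risk difference: confounded by Z in an observational study.
crude_rd = crude(1) - crude(0)

# Risk difference within each stratum of Z, as in equation (4).
conditional_rd = {z: conditional(1, z) - conditional(0, z) for z in (0, 1)}

print("crude risk difference:", crude_rd)
for z, value in conditional_rd.items():
    print(f"risk difference given Z = {z}:", value)
```

With only ten records the numbers themselves mean little; the point is that the crude difference and the stratum-specific differences are different quantities.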
   

### 16.2.2 Standardization

We have seen above how conditional treatment effects can be estimated from the observational data where there is confounding. Now let’s consider how we could estimate a marginal treatment effect, as defined in equations (1), (2) and (3). As noted above, the probabilities Pr(Y = 1|do(X = x)) cannot be estimated using Pr(Y = 1|X = x) from the observational data due to the confounding by Z.

Using the law of total probability the probability Pr(Y = 1|do(X = x)) can be written as

<center>$\Pr(Y = 1 \mid \mathrm{do}(X = x)) = \sum \limits_{z=0,1} \Pr(Y = 1 \mid \mathrm{do}(X = x), Z = z) \, \Pr(Z = z \mid \mathrm{do}(X = x)).$</center>
<div style="text-align: right"> (7) </div>
    
In the previous section we argued that the conditional probabilities Pr(Y = 1|do(X = x), Z = z) can be estimated easily from the observational data because Pr(Y = 1|do(X = x), Z = z) = Pr(Y = 1|X = x, Z = z). Now let’s think about the term Pr(Z = z|do(X = x)). It helps to look at the DAG in Figure 1 for this part. If we intervene on X, as implied by do(X = x), then this has no impact on Z because Z comes before X, i.e. it is not downstream from X. It follows that Pr(Z = z|do(X = x)) = Pr(Z = z). Using these results we can write the above expression as

<center>$\Pr(Y = 1 \mid \mathrm{do}(X = x)) = \sum \limits_{z=0,1} \Pr(Y = 1 \mid X = x, Z = z) \, \Pr(Z = z).$</center>
<div style="text-align: right"> (8) </div>
    
The probabilities Pr(Y = 1|X = x, Z = z) and Pr(Z = z) can both be estimated from the observational data and hence we can estimate the marginal treatment effects. This technique of expressing Pr(Y = 1|do(X = x)) in terms of the conditional probabilities Pr(Y = 1|X = x, Z = z) and the probabilities Pr(Z = z) is called ‘standardization’. It is an example of weighted averaging: the stratum-specific probabilities are weighted by Pr(Z = z). Standardization is also referred to as ‘marginalizing’ or ‘averaging’ over Z and it has a long history of use in epidemiology (e.g. see Rothman, Greenland & Lash 2008).
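
Equation (8) can be sketched as a short Python function. The example below applies it to the ten illustrative records of Table 3 (Section 16.2.3), which are not the study data; the function name `standardized_risk` is hypothetical.

```python
from fractions import Fraction

# Illustrative individual-level records (y, x, z) copied from Table 3.
records = [
    (1, 0, 0), (1, 1, 0), (0, 1, 0), (1, 0, 1), (0, 1, 1),
    (0, 1, 0), (0, 1, 1), (1, 0, 1), (0, 0, 0), (1, 0, 1),
]


def standardized_risk(x):
    """Equation (8): sum over z of Pr(Y = 1 | X = x, Z = z) * Pr(Z = z)."""
    total = Fraction(0)
    for z in (0, 1):
        # Outcomes of individuals with treatment x and stone size z.
        stratum = [r[0] for r in records if r[1] == x and r[2] == z]
        p_y_given_xz = Fraction(sum(stratum), len(stratum))
        # Pr(Z = z) is taken over everyone, whatever treatment they received.
        p_z = Fraction(sum(1 for r in records if r[2] == z), len(records))
        total += p_y_given_xz * p_z
    return total


risk_do_1 = standardized_risk(1)  # estimate of Pr(Y = 1 | do(X = 1))
risk_do_0 = standardized_risk(0)  # estimate of Pr(Y = 1 | do(X = 0))
print(risk_do_1, risk_do_0, risk_do_1 - risk_do_0)  # prints: 1/6 3/4 -7/12
```

Note that the weights Pr(Z = z) are the same for both values of x. This is what distinguishes the standardized risks from the crude probabilities Pr(Y = 1|X = x), which implicitly weight by Pr(Z = z|X = x).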
    
In the practical we will apply these methods to the Kidney data.

### 16.2.3 Estimating the treatment effect using logistic regression

So far we have not used any regression models. The probabilities involved in estimating treatment effects can of course be estimated by applying logistic regression to our data. Suppose the data were provided in individual-level format instead of in grouped format (Table 1). An illustration of how the data may appear in individual format is shown in Table 3.

The conditional probabilities Pr(Y = 1|X = x, Z = z) can be estimated using the logistic regression model

<center>$\Pr(Y = 1 \mid X = x, Z = z) = \frac{\exp(\beta_{0} + \beta_{X}x + \beta_{Z}z + \beta_{XZ}xz)}{1 + \exp(\beta_{0} + \beta_{X}x + \beta_{Z}z + \beta_{XZ}xz)}$</center>
<div style="text-align: right"> (9) </div>
    
This model includes an interaction term between X and Z. A model without the interaction term could be used, which would make the assumption that the effect of treatment X on the outcome Y (measured as a conditional odds ratio) is not modified by Z. In our example, this would be the assumption that the impact of lithotripsy on the outcome is the same in patients with large stone size or small stone size.
    
In R, after fitting a logistic regression model using ‘glm’ one can obtain estimates of the probabilities Pr(Y = 1|X = x, Z = z) using ‘predict’. The ‘margins’ command can be used to obtain marginal treatment effect estimates. We will try this out in the practical. 
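
To make the link between model (9), the conditional effects and standardization concrete, here is a minimal Python sketch. The coefficient values and Pr(Z = 1) are hypothetical, chosen only for illustration; they are not estimates from the kidney data, and the function names are assumptions of this sketch rather than part of any package.

```python
import math

# Hypothetical coefficients for model (9); NOT estimates from the kidney data.
beta_0, beta_x, beta_z, beta_xz = -2.0, 0.5, 1.0, -0.3
p_z1 = 0.4  # hypothetical Pr(Z = 1) in the study population


def predicted_prob(x, z):
    """Pr(Y = 1 | X = x, Z = z) under the logistic model in equation (9)."""
    lp = beta_0 + beta_x * x + beta_z * z + beta_xz * x * z
    return math.exp(lp) / (1 + math.exp(lp))


def log_odds(p):
    """Log odds corresponding to a probability p."""
    return math.log(p / (1 - p))


def conditional_log_or(z):
    """Log odds ratio for X = 1 versus X = 0 within the stratum Z = z."""
    return log_odds(predicted_prob(1, z)) - log_odds(predicted_prob(0, z))


def standardized_risk(x):
    """Equation (8) evaluated with model-based probabilities."""
    return predicted_prob(x, 0) * (1 - p_z1) + predicted_prob(x, 1) * p_z1


for z in (0, 1):
    print(f"Z = {z}: conditional log odds ratio = {conditional_log_or(z):.3f}")

marginal_rd = standardized_risk(1) - standardized_risk(0)
print(f"marginal risk difference = {marginal_rd:.3f}")
```

With the interaction term in the model, the conditional log odds ratio is beta_x when Z = 0 and beta_x + beta_xz when Z = 1. Setting beta_xz to zero makes the two equal, which is the no-effect-modification assumption described above.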

Table 3: Example of individual-level data for the kidney example for 10 individuals. Y: outcome (0 success, 1 failure), X: treatment (0 surgery, 1 lithotripsy), Z: stone size (0 small, 1 large).

| ID | Y | X | Z |
| ---: | :--- | :--- | :--- |
| 1 | 1 | 0 | 0 |
| 2 | 1 | 1 | 0 |
| 3 | 0 | 1 | 0 |
| 4 | 1 | 0 | 1 |
| 5 | 0 | 1 | 1 |
| 6 | 0 | 1 | 0 |
| 7 | 0 | 1 | 1 |
| 8 | 1 | 0 | 1 |
| 9 | 0 | 0 | 0 |
| 10 | 1 | 0 | 1 |

Confidence intervals for conditional treatment effect estimates can be obtained analytically in some cases. Confidence intervals for marginal treatment effect estimates obtained using standardization can be obtained either using approximations or using bootstrapping.
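
One simple way to bootstrap a standardized risk difference is the percentile method, sketched below in Python. The grouped counts are hypothetical (they are not the kidney data), the seed and the number of resamples are arbitrary choices, and other bootstrap intervals could equally be used.

```python
import random


def standardized_rd(rows):
    """Standardized risk difference: equation (8) for x = 1 minus x = 0."""
    n = len(rows)
    risks = {}
    for x in (0, 1):
        total = 0.0
        for z in (0, 1):
            stratum = [y for y, xi, zi in rows if xi == x and zi == z]
            if not stratum:
                return None  # undefined when an (x, z) cell is empty
            p_z = sum(1 for _, _, zi in rows if zi == z) / n
            total += sum(stratum) / len(stratum) * p_z
        risks[x] = total
    return risks[1] - risks[0]


# Hypothetical grouped counts: (x, z) -> (number with Y = 1, group size).
counts = {(0, 0): (12, 60), (1, 0): (21, 140), (0, 1): (52, 130), (1, 1): (21, 70)}

# Expand the grouped counts into individual-level rows (y, x, z).
data = [
    (1 if i < events else 0, x, z)
    for (x, z), (events, size) in counts.items()
    for i in range(size)
]

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
estimates = []
for _ in range(1000):
    resample = rng.choices(data, k=len(data))  # sample rows with replacement
    rd = standardized_rd(resample)
    if rd is not None:  # skip the (very unlikely) resamples with an empty cell
        estimates.append(rd)

# Percentile interval: approximate 2.5th and 97.5th percentiles.
estimates.sort()
lower = estimates[int(0.025 * len(estimates))]
upper = estimates[int(0.975 * len(estimates)) - 1]
point = standardized_rd(data)
print(f"RD = {point:.3f}, 95% percentile interval ({lower:.3f}, {upper:.3f})")
```

Each resample repeats the whole standardization, including re-estimating Pr(Z = z), so that the interval reflects the uncertainty in every component of equation (8).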
