### Simpson's Paradox in Experimental Data

#### Introduction

Simpson's Paradox is a phenomenon in which a trend that appears in several groups of data disappears or reverses when the groups are combined. One of the best-known examples is a study of gender bias in graduate admissions at UC Berkeley. The following simplified example is taken from Udacity's A/B Testing Course:


|     Department   | Men applied | Women applied | Men accepted | Women accepted |
|--------------|-------------|---------------|--------------|----------------|
| Department A | 825         | 108           | 512 (62%)    | 89 (82%)       |
| Department B | 417         | 375           | 137 (33%)    | 132 (35%)      |
| Total        | 1242        | 483           | 649 (52%)    | 221 (46%)      |



In this example, the acceptance rate for women is higher in both Department A and Department B. However, when the data is aggregated across both departments, the acceptance rate for women is lower: the trend has reversed. Why? Most women (375 of 483) applied to Department B, which has a much lower acceptance rate, and this pulls down the overall rate for women. Most men (825 of 1,242), by contrast, applied to the easier Department A. Here, department choice is a confounding variable: men and women applied to the two departments in very different proportions.
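The reversal can be checked directly from the counts in the table. The sketch below is illustrative only (the names `ADMISSIONS` and `acceptance_rates` are made up for this example); it aggregates by summing the raw counts across departments, which is what produces the pooled percentages in the Total row.

```python
# Counts from the simplified UC Berkeley table above, per department:
# (men applied, men accepted, women applied, women accepted)
ADMISSIONS = {
    "A": (825, 512, 108, 89),
    "B": (417, 137, 375, 132),
}


def acceptance_rates(data):
    """Return per-department (men, women) rates and the pooled (men, women) rates."""
    per_dept = {}
    totals = [0, 0, 0, 0]
    for dept, counts in data.items():
        men_app, men_acc, women_app, women_acc = counts
        per_dept[dept] = (men_acc / men_app, women_acc / women_app)
        # Aggregate by summing raw counts, not by averaging the rates.
        totals = [t + c for t, c in zip(totals, counts)]
    men_app, men_acc, women_app, women_acc = totals
    return per_dept, (men_acc / men_app, women_acc / women_app)


per_dept, overall = acceptance_rates(ADMISSIONS)
for dept, (men, women) in per_dept.items():
    print(f"Department {dept}: men {men:.0%}, women {women:.0%}")
print(f"Overall:      men {overall[0]:.0%}, women {overall[1]:.0%}")

# Share of each group's applications that went to the harder Department B.
total_men = sum(counts[0] for counts in ADMISSIONS.values())
total_women = sum(counts[2] for counts in ADMISSIONS.values())
print(f"Applied to B: men {ADMISSIONS['B'][0] / total_men:.0%}, "
      f"women {ADMISSIONS['B'][2] / total_women:.0%}")
```

Women win in each department, yet men win overall, because 78% of women's applications (versus 34% of men's) went to Department B.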

#### Simpson's Paradox in A/B Tests

I recently reviewed Udacity's A/B Testing Course. It briefly mentions Simpson's Paradox in the context of A/B tests and gives two reasons why it might appear:
> How can it happen?
>
> 1. Something went wrong in the experiment setup with your randomisation function
> 2. Change affects new users and experienced users differently

The first reason is straightforward. Simpson's Paradox arises from a confounding variable. In an experiment, the aim of randomisation is to spread every potential confounder evenly across the groups so that its effect washes out. Observing Simpson's Paradox in experimental data would therefore suggest that something went wrong with your randomisation function.
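To make the role of randomisation concrete, here is a small simulation on a hypothetical population of 100,000 new and 100,000 returning users, listed in sign-up order. It compares a fair coin-flip assignment with a deliberately broken "randomiser" that sends the first half of the list to control. Both functions are made up for illustration; real experiment platforms typically hash a user or cookie ID instead, which is not shown here.

```python
import random


def random_assignment(users, seed=0):
    """Assign each user to control or variant with a fair coin flip."""
    rng = random.Random(seed)
    arms = {"control": [], "variant": []}
    for is_returning in users:
        arms["control" if rng.random() < 0.5 else "variant"].append(is_returning)
    return arms


def broken_assignment(users):
    """A deliberately flawed 'randomiser': first half of the list goes to control."""
    half = len(users) // 2
    return {"control": users[:half], "variant": users[half:]}


def returning_share(arm):
    """Fraction of users in an arm who are returning users (the confounder)."""
    return sum(arm) / len(arm)


# Hypothetical population in sign-up order: new users (False) first,
# then returning users (True).
users = [False] * 100_000 + [True] * 100_000

for name, arms in [("random", random_assignment(users)), ("broken", broken_assignment(users))]:
    shares = {arm: round(returning_share(members), 3) for arm, members in arms.items()}
    print(name, shares)
```

With the coin flip, both arms end up with roughly half returning users, so user type cannot confound the comparison. The broken version puts every new user in control and every returning user in the variant, which is exactly the kind of setup error that lets a confounder drive the result.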

The second reason points to something else: the paradox can appear even when your randomisation function has worked correctly. Below is a mock dataset that shows how this can happen.

**Our control**

| User        | Number of Cookies | Sessions per Cookie | Sessions  | Conversions | CTR (Conversions / Sessions) |
|-------------|-------------------|---------------------|-----------|-------------|--------------------|
| New         | 100,000           | 1                   | 100,000   | 20,000      | 20.0%              |
| Returning   | 100,000           | 1.5                 | 150,000   | 20,000      | 13.3%              |
| Total       | 200,000           | -                   | 250,000   | 40,000      | 16.0%              |

**Our variant**

| User        | Number of Cookies | Sessions per Cookie | Sessions  | Conversions | CTR (Conversions / Sessions) |
|-------------|-------------------|---------------------|-----------|-------------|--------------------|
| New         | 100,000           | 1                   | 100,000   | 21,000      | 21.0%              |
| Returning   | 100,000           | 2.95                | 295,000   | 40,000      | 13.6%              |
| Total       | 200,000           | -                   | 395,000   | 61,000      | 15.4%              |

**Comparing the two arms:**


| User        | CTR Control | CTR Variant | Winner? |
|-------------|-------------|-------------|---------|
| New         | 20.0%       | 21.0%       | Variant |
| Returning   | 13.3%       | 13.6%       | Variant |
| Overall     | 16.0%       | 15.4%       | Control |

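The comparison table can be reproduced from the two mock tables. In this sketch (the `Segment` class and helper names are illustrative, not from the course), sessions are derived from cookies × sessions per cookie, segment CTR is conversions over sessions, and the overall CTR pools conversions and sessions across segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One row of the mock tables: cookies diverted, sessions per cookie, conversions."""
    cookies: int
    sessions_per_cookie: float
    conversions: int

    @property
    def sessions(self) -> int:
        # Sessions are the unit of analysis; round to avoid float noise (e.g. 2.95 * 100,000).
        return round(self.cookies * self.sessions_per_cookie)

    @property
    def ctr(self) -> float:
        return self.conversions / self.sessions


CONTROL = {"new": Segment(100_000, 1.0, 20_000), "returning": Segment(100_000, 1.5, 20_000)}
VARIANT = {"new": Segment(100_000, 1.0, 21_000), "returning": Segment(100_000, 2.95, 40_000)}


def overall_ctr(arm: dict) -> float:
    """Pooled CTR: total conversions divided by total sessions across segments."""
    conversions = sum(s.conversions for s in arm.values())
    sessions = sum(s.sessions for s in arm.values())
    return conversions / sessions


def winner(control_rate: float, variant_rate: float) -> str:
    return "Variant" if variant_rate > control_rate else "Control"


rows = [(name, CONTROL[name].ctr, VARIANT[name].ctr) for name in CONTROL]
rows.append(("overall", overall_ctr(CONTROL), overall_ctr(VARIANT)))
for name, c, v in rows:
    print(f"{name:<10} control {c:.1%}  variant {v:.1%}  winner {winner(c, v)}")
```

The variant wins in both segments, while the control wins on the pooled rate, matching the table.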


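In this example, something in the variant causes returning users to come back for an average of 2.95 sessions during the experiment, compared with 1.5 sessions for returning users in the control. The Number of Cookies column shows that the randomisation function has worked correctly: each arm received 100,000 new and 100,000 returning cookies. The reversal comes from the extra sessions. CTR is computed per session, so returning users make up 295,000 of 395,000 sessions (about 75%) in the variant but only 150,000 of 250,000 (60%) in the control. Because returning users have a lower CTR than new users in both arms, this heavier weighting drags the variant's overall CTR below the control's, even though the variant wins in each segment.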
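One way to see this is to write the overall CTR as a session-weighted average of the segment CTRs, where $S$ denotes sessions. The last line below is an illustrative re-weighting (not something from the course): it applies the control's session mix to the variant's segment CTRs, using the numbers from the tables.

$$
\mathrm{CTR}_{\text{overall}} = \frac{S_{\text{new}}}{S_{\text{new}} + S_{\text{ret}}}\,\mathrm{CTR}_{\text{new}} + \frac{S_{\text{ret}}}{S_{\text{new}} + S_{\text{ret}}}\,\mathrm{CTR}_{\text{ret}}
$$

$$
\begin{aligned}
\text{Control:}\quad & 0.4 \times 20.0\% + 0.6 \times 13.33\% \approx 16.0\% \\
\text{Variant:}\quad & 0.253 \times 21.0\% + 0.747 \times 13.56\% \approx 15.4\% \\
\text{Variant with the control's mix:}\quad & 0.4 \times 21.0\% + 0.6 \times 13.56\% \approx 16.5\%
\end{aligned}
$$

Holding the session mix fixed, the variant comes out ahead; the pooled comparison flips only because the variant shifted weight toward the lower-CTR returning segment.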

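This is possible because the unit of diversion (the cookie) differs from the unit of analysis (the session): randomisation controls how many cookies of each type land in each arm, but not how many sessions those cookies go on to generate. If the unit of diversion and the unit of analysis were the same, a working randomisation function would give both arms the same mix of new and returning users, so this reversal could not occur. Simpson's Paradox, in that case, would imply that the randomisation function did not work correctly.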
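As an illustration, the sketch below analyses the same mock data with a hypothetical cookie-level metric, conversions per cookie, so that the unit of analysis matches the unit of diversion. This metric is an assumption made for the example, not one used in the course. Because randomisation fixes the share of cookies in each segment, both arms pool their segments with identical weights, and a variant that wins every segment must also win overall.

```python
# (cookies, conversions) per segment, taken from the mock tables above.
control_segments = {"new": (100_000, 20_000), "returning": (100_000, 20_000)}
variant_segments = {"new": (100_000, 21_000), "returning": (100_000, 40_000)}


def conversions_per_cookie(segments):
    """Hypothetical cookie-level metric per segment, plus the pooled value."""
    rates = {name: conv / cookies for name, (cookies, conv) in segments.items()}
    total_cookies = sum(cookies for cookies, _ in segments.values())
    total_conversions = sum(conv for _, conv in segments.values())
    rates["overall"] = total_conversions / total_cookies
    return rates


def cookie_mix(segments):
    """Share of an arm's cookies in each segment: the weights used when pooling."""
    total = sum(cookies for cookies, _ in segments.values())
    return {name: cookies / total for name, (cookies, _) in segments.items()}


control = conversions_per_cookie(control_segments)
variant = conversions_per_cookie(variant_segments)
print("cookie mix:", cookie_mix(control_segments), cookie_mix(variant_segments))
for name in control:
    print(f"{name:<10} control {control[name]:.3f}  variant {variant[name]:.3f}")
```

Both arms have a 50/50 cookie mix, and the variant leads in each segment and overall (0.305 versus 0.200 conversions per cookie), so there is no reversal.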

#### Wrap-up

There you have it. Whilst Simpson's Paradox most commonly occurs in observational data, it can also appear in experimental data. Because it signals a confounding variable, your first suspicion should be that something went wrong with the randomisation function. However, we have also seen an example where the paradox appears even though randomisation worked: when a change affects new and experienced (returning) users differently, and the unit of analysis differs from the unit of diversion, the change can shift the mix of units being analysed enough to reverse the overall result.