# Causal Inference Learning

## Randomized Controlled Trials (RCTs)
- also called **A/B testing**
- example: an e-commerce site sends out an email, hoping that it will help increase purchase conversion
- but it is not known whether these emails will help or hurt
- test this using an RCT
- 5 steps
  - select users to participate using some uniform criteria
  - split the users randomly into 2 groups of equal size
  - send the email to one group (treatment group) and not to the other (control group)
  - monitor purchase conversion for each user over time
  - make a decision based on the test results
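
The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a real experiment framework: the helper names `assign_groups` and `conversion_rate`, the user IDs, and the set of purchasers are all hypothetical and hard-coded only to show the mechanics.

```python
import random

def assign_groups(user_ids, seed=0):
    """Randomly split users into two equal-sized groups (treatment, control)."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def conversion_rate(group, purchased):
    """Share of users in `group` who made a purchase."""
    return sum(1 for user in group if user in purchased) / len(group)

# Steps 1-2: select users and split them at random.
users = [f"user_{i}" for i in range(100)]
treatment, control = assign_groups(users)

# Steps 3-4: the email would go to `treatment` only; purchases are then monitored.
# Hypothetical observations: 12 treated and 8 control users purchased.
purchased = set(treatment[:12]) | set(control[:8])

# Step 5: compare conversion between the groups.
lift = conversion_rate(treatment, purchased) - conversion_rate(control, purchased)
print(f"treatment: {conversion_rate(treatment, purchased):.2f}")
print(f"control:   {conversion_rate(control, purchased):.2f}")
print(f"lift:      {lift:.2f}")
```

In practice the decision in step 5 would also take the sample size and a significance test into account, which this sketch leaves out.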

**Why RCTs**
- users are assigned randomly to the control and treatment groups
- so, on average, the only systematic difference between the two groups is that one receives the email and the other doesn't
- because randomization controls for the effects of other variables, the experiment results can be trusted
- this is why randomized tests can be used for inferring causality

**Scenarios when RCTs cannot be run**
- setting up the experiment is not possible
  - for example: measuring the efficacy of billboard ads
- the experiment takes too long
  - make inferences based on historically observed data instead
    - but make sure that other factors don't affect the observations

## Challenges to Causal Inference
- 3 main challenges
  - **Confounders**
    - before releasing a new medical product, a clinical trial is needed
    - such a test is possible
    - the treatment group is given the new product, and the control group is given a placebo or no treatment
    - if the treatment group recovers much more often than the control group, it suggests that the new product works
    - but suppose the average age of the treatment group is 25, compared to an average age of 70 in the control group
      - then the test does not give a definitive conclusion, because the younger group might have recovered more often regardless of the product
    - **age is a confounding variable**
    - age is not controlled and can have a causal effect on the outcome
    - confounding variables are a challenge in causal inference that uses prior data
  - **Selection Bias**
    - the chosen group is not a good representation of all users in the population
  - **Counterfactuals**
    - what would have happened if the person had not received the new medication?
    - the counterfactual cannot be observed directly, so it must be estimated for each individual in the test
    - so that an apples-to-apples comparison is done
    - strategies - for example: matching

## Causal Graphs and assumptions
**Why assumptions**
- prior data is adjusted so that it resembles a randomized controlled trial as closely as possible
- there will always be some confounders that cannot be controlled and that have unintended effects on the outcome, so assumptions are needed

**What are the assumptions**
- **Causal Markov Condition (Markov Assumption)**
  - Causal graphs are needed for the analysis
    - Causal graphs are graphs with directed edges that show causation
  - The condition states that, given its direct causes (parents) in the graph, a variable is independent of its non-descendants
  - If the graph is convoluted, simplify it to a directed acyclic graph (DAG) in which
    - Confounders have an effect on Treatment and Outcome (C->T, C->O)
    - Treatment has an effect on Outcome (T->O)
- **SUTVA (Stable Unit Treatment Value Assumption)**
  - a sample in the control group doesn't affect the samples in the treatment group, and vice versa
  - this assumption is required to rule out any interaction effect between units
  - people who receive the new medicine will not influence the outcomes of people who don't receive it
- **Ignorability**
  - there are no additional confounders that have an effect on both the treatment and the outcome
  - this is important because, if it fails, the cause cannot be established even when the treatment group does better
- https://stats.stackexchange.com/questions/474616/strong-ignorability-confusion-on-the-relationship-between-outcomes-and-treatmen
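
The simplified graph described under the Causal Markov Condition (C->T, C->O, T->O) can be written down and checked for cycles. The sketch below assumes Python 3.9+ for the standard-library `graphlib` module; the node names and the helper `is_dag` are illustrative choices, not part of any causal inference library.

```python
from graphlib import CycleError, TopologicalSorter

# Map each node to its direct causes (parents) in the causal graph.
parents = {
    "Confounder": [],
    "Treatment": ["Confounder"],             # C -> T
    "Outcome": ["Confounder", "Treatment"],  # C -> O, T -> O
}

def is_dag(parent_map):
    """Return True if the directed graph has no cycles."""
    try:
        list(TopologicalSorter(parent_map).static_order())
        return True
    except CycleError:
        return False

# A topological order lists every cause before its effects.
causal_order = list(TopologicalSorter(parents).static_order())
print(is_dag(parents))
print(causal_order)
```

A graph in which Treatment and Outcome cause each other would fail this check, which is one way to catch a causal graph that has not yet been simplified to a DAG.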

- **Measuring Average Treatment Effect**
  - Does the new medicine make people feel better?
    - take users as a column
    - treatment outcome column - the observed outcome (1 = improved, 0 = did not improve) for a person in the treatment group
    - control outcome column - the observed outcome (1 = improved, 0 = did not improve) for a person in the control group
    - calculate Mean(Treatment) = $\frac{\text{Count of people within treatment group who improved}}{\text{Total count of people belonging to treatment group}}$
    - calculate Mean(Control) = $\frac{\text{Count of people within control group who improved}}{\text{Total count of people belonging to control group}}$
    - calculate Effect = Mean(Treatment) - Mean(Control)
      - the sign tells us if the impact is positive or negative
    - to test for confounding effects
      - calculate Mean(Age | Treatment group, improved)
      - calculate Mean(Age | Control group, improved)
        - if the average age of the treatment group people who improved is substantially higher than the average age of the control group people who improved, that suggests age is a confounding variable
    - to solve this problem, **determine the counterfactual for every person taken into account**
    - would the people who took the medicine and improved have also improved without it, and would the people who did not take the medicine and got better have also improved with it?
    - this can be done using "matching"
    - find people of the same age who received the other treatment and use their outcome as the counterfactual estimate
    - the other technique is machine learning: build a model that takes the confounders (age) and the treatment as input and predicts the outcome, train it on the factual data, and use it to predict the counterfactuals
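
Matching can be sketched as a nearest-neighbour lookup on age. The records below are hypothetical and chosen only to keep the arithmetic easy to follow; `nearest_match` and `matched_ate` are illustrative helper names. Real matching would usually use more covariates and handle ties and poor matches explicitly.

```python
# Each record: (name, age, treated, outcome). Hypothetical data for illustration.
people = [
    ("A", 25, True, 1),
    ("B", 27, False, 0),
    ("C", 60, True, 0),
    ("D", 62, False, 0),
    ("E", 40, True, 1),
    ("F", 41, False, 1),
]

def nearest_match(person, candidates):
    """Return the candidate whose age is closest to the person's age."""
    return min(candidates, key=lambda c: abs(c[1] - person[1]))

def matched_ate(records):
    """Estimate the average treatment effect using age-matched counterfactuals."""
    treated = [r for r in records if r[2]]
    control = [r for r in records if not r[2]]
    effects = []
    for record in records:
        # Borrow the outcome of the closest-aged person in the other group.
        match = nearest_match(record, control if record[2] else treated)
        if record[2]:
            effects.append(record[3] - match[3])  # observed treated - estimated control
        else:
            effects.append(match[3] - record[3])  # estimated treated - observed control
    return sum(effects) / len(effects)

print(round(matched_ate(people), 3))
```

Each person's effect is always computed as treated outcome minus control outcome, so the sign of the estimate keeps the same meaning regardless of which group the person was actually in.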

## Example Scenario for Causal Inference

**Before applying counterfactuals**

| Person     | Treatment outcome   | Control outcome   | 
|:----------:|:-------------------:|:-----------------:|
|Ajay(26)    |1                    |                   |
|Sam(24)     |                     |1                  |
|Less(48)    |0                    |                   |
|Sid(35)     |                     |1                  |
|Clay(25)    |                     |0                  |
|Rhode(39)   |                     |0                  |
|Clyde(51)   |1                    |                   |
|Rondo(24)   |0                    |                   |
|Chrom(67)   |1                    |                   |
|Don(34)     |                     |0                  |

$$\text{Mean(Treatment)} = \frac{1+0+1+0+1}{5} = +0.6$$
$$\text{Mean(Control)} = \frac{1+1+0+0+0}{5} = +0.4$$
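$$\text{Effect} = 0.6 - 0.4 = +0.2$$
$$\text{Mean(Age | Treatment, improved)} = \frac{26+51+67}{3} = 48$$
$$\text{Mean(Age | Control, improved)} = \frac{24+35}{2} = 29.5$$

- this suggests that age is a confounding variable: the people who improved in the treatment group are much older on average than the people who improved in the control group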
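
The figures above can be reproduced directly from the table. The sketch below copies the ten rows as tuples; the helper names (`mean`, `outcome_rate`, `mean_age_of_improved`) are illustrative.

```python
# (name, age, group, outcome) copied from the "before" table.
observed = [
    ("Ajay", 26, "treatment", 1),
    ("Sam", 24, "control", 1),
    ("Less", 48, "treatment", 0),
    ("Sid", 35, "control", 1),
    ("Clay", 25, "control", 0),
    ("Rhode", 39, "control", 0),
    ("Clyde", 51, "treatment", 1),
    ("Rondo", 24, "treatment", 0),
    ("Chrom", 67, "treatment", 1),
    ("Don", 34, "control", 0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def outcome_rate(rows, group):
    """Share of people in `group` who improved."""
    return mean(outcome for _, _, g, outcome in rows if g == group)

def mean_age_of_improved(rows, group):
    """Average age of the people in `group` who improved."""
    return mean(age for _, age, g, outcome in rows if g == group and outcome == 1)

naive_effect = outcome_rate(observed, "treatment") - outcome_rate(observed, "control")
print(round(naive_effect, 2))
print(mean_age_of_improved(observed, "treatment"))
print(mean_age_of_improved(observed, "control"))
```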

**After applying counterfactuals**

| Person     | Treatment outcome   | Control outcome   | Individual treatment effect (ITE) |
|:----------:|:-------------------:|:-----------------:|:---:|
|Ajay(26)    |1                    |$\color{red}{\text{1}}$                  |0 |
|Sam(24)     |$\color{red}{\text{0}}$                  |1                  |-1 |
|Less(48)    |0                    |$\color{red}{\text{0}}$                  |0 |
|Sid(35)     |$\color{red}{\text{1}}$                    |1                  |0 |
|Clay(25)    |$\color{red}{\text{0}}$                    |0                  |0 |
|Rhode(39)   |$\color{red}{\text{1}}$                    |0                  |1 |
|Clyde(51)   |1                    |$\color{red}{\text{0}}$                  |1 |
|Rondo(24)   |0                    |$\color{red}{\text{1}}$                  |-1 |
|Chrom(67)   |1                    |$\color{red}{\text{1}}$                  |0 |
|Don(34)     |$\color{red}{\text{1}}$                    |0                  |1 |
    
- Individual Treatment Effect **(ITE)** - this is calculated for every single individual as treatment outcome minus control outcome
- Average Treatment Effect **(ATE)** - the average of the ITEs over all individuals
$$\text{ATE} = \frac{\text{Sum of ITE}}{\text{10}} = \frac{1}{10} = +0.1$$
- it suggests that the medicine does indeed help on average, even when accounting for age
- Can we conclude "give the medicine to everyone who is sick"? Does this statement hold true?
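
The ITE column and the ATE can be recomputed from the completed table. In this sketch the tuples hold both potential outcomes for each person (observed and estimated counterfactual), copied from the table; `individual_effects` is an illustrative helper name.

```python
# (name, age, treatment_outcome, control_outcome) copied from the "after" table.
potential_outcomes = [
    ("Ajay", 26, 1, 1),
    ("Sam", 24, 0, 1),
    ("Less", 48, 0, 0),
    ("Sid", 35, 1, 1),
    ("Clay", 25, 0, 0),
    ("Rhode", 39, 1, 0),
    ("Clyde", 51, 1, 0),
    ("Rondo", 24, 0, 1),
    ("Chrom", 67, 1, 1),
    ("Don", 34, 1, 0),
]

def individual_effects(rows):
    """ITE per person: outcome under treatment minus outcome under control."""
    return {name: y_treat - y_control for name, _, y_treat, y_control in rows}

ite = individual_effects(potential_outcomes)
ate = sum(ite.values()) / len(ite)  # ATE is the mean of the ITEs

print(ite)
print(ate)
```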

- Average treatment effect conditioned on age, or Conditional Average Treatment Effect **(CATE)**
  - this helps answer how the medicine affects people over/under the age of 35
$$\text{CATE(age >= 35)} = \frac{0+0+1+1+0}{5} = +0.4$$
$$\text{CATE(age <  35)} = \frac{0-1+0-1+1}{5} = -0.2$$

  - this demonstrates that the treatment affects different age groups differently
  - this is called **Treatment Heterogeneity**
  - it can be concluded that **this medicine does help older patients get better from the flu, but does not have a positive effect on younger people**
  - so prescribe the medicine only to older people who have the flu
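
The two CATE values can be checked by averaging the ITEs within each age band. The (age, ITE) pairs below are copied from the completed table; the `cate` helper and the use of a predicate function to define the subgroup are illustrative choices.

```python
# (age, ITE) pairs copied from the "after" table, in row order.
age_and_ite = [
    (26, 0), (24, -1), (48, 0), (35, 0), (25, 0),
    (39, 1), (51, 1), (24, -1), (67, 0), (34, 1),
]

def cate(rows, condition):
    """Average ITE over the people whose age satisfies `condition`."""
    selected = [effect for age, effect in rows if condition(age)]
    return sum(selected) / len(selected)

cate_older = cate(age_and_ite, lambda age: age >= 35)
cate_younger = cate(age_and_ite, lambda age: age < 35)

print(cate_older)
print(cate_younger)
```

With only five people per subgroup, these estimates are very noisy, so the example illustrates the calculation rather than a reliable clinical conclusion.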

