<a href="http://agrum.org" target="blank"><img src="http://agrum.gitlab.io/theme/img/logoAgrum.png" align="left" style="height:100px"/></a><a href="https://agrum.gitlab.io/pages/pyagrum.html" target="blank"><img src="https://agrum.gitlab.io/images/pyAgrum.png" align="right" style="height:75px"/></a><a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br />This pyAgrum notebook is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a><br/>
Author: <b>Aymen Merrouche</b> and Pierre-Henri Wuillemin.

<font size="+3" color="GREEN">**Walking Example**</font>
#### This notebook follows the example from "The Book Of Why" (Pearl, 2018), chapter 4, page 135.

## Confounding

In [1]:
from IPython.display import display, Math, Latex,HTML

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.causal as csl
import pyAgrum.causal.notebook as cslnb
import os

> In 1998 a study unveiled a correlation between physical exercise and longevity among nonsmoking retired men. Of course what we want to know is whether men who exercise more live longer, suggesting a causal relationship. Study measurements are to be found at the end of this notebook.

### We create the causal diagram:
We start from the simplest model, in which Walking directly influences Mortality:

In [2]:
# We create the causal diagram
we = gum.fastBN("Walking{casual|normal|intense}->Mortality{dead|alive}")

# We fill the CPTs
we.cpt("Walking")[:]=[151/707,379/707,177/707]
we.cpt("Mortality")[{"Walking":"casual"}]=[0.43,0.57]
we.cpt("Mortality")[{"Walking":"intense"}]=[0.215,0.785]
we.cpt("Mortality")[{"Walking":"normal"}]=[0.277,0.723]
                  
gnb.sideBySide(we,we.cpt("Walking")*we.cpt("Mortality"),we.cpt("Walking"),we.cpt("Mortality"),
               captions=["the BN","the joint distribution","the marginal for $Walking$","the CPT for $Mortality$"])

*Output: the BN (a single arc Walking → Mortality), followed by three tables.*

The joint distribution:

| Mortality | Walking = casual | Walking = normal | Walking = intense |
|---|---|---|---|
| dead | 0.0918 | 0.1485 | 0.0538 |
| alive | 0.1217 | 0.3876 | 0.1965 |

The marginal for $Walking$:

| casual | normal | intense |
|---|---|---|
| 0.2136 | 0.5361 | 0.2504 |

The CPT for $Mortality$:

| Walking | Mortality = dead | Mortality = alive |
|---|---|---|
| casual | 0.4300 | 0.5700 |
| normal | 0.2770 | 0.7230 |
| intense | 0.2150 | 0.7850 |


> The study showed that after 12 years, 43% of casual walkers had died, compared with only 21.5% of intense walkers.
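
The joint distribution and the overall death rate can be checked by hand with the chain rule, $P(Walking, Mortality) = P(Walking)\,P(Mortality \mid Walking)$. The following is a minimal sketch in plain Python (no pyAgrum needed); the dictionary names are illustrative and not part of the original notebook, and the numbers are the ones entered in the cell above.

```python
# Marginal P(Walking), built from the counts used in the cell above.
p_walking = {"casual": 151 / 707, "normal": 379 / 707, "intense": 177 / 707}

# CPT P(Mortality | Walking), as entered in the cell above.
p_mortality_given_walking = {
    "casual": {"dead": 0.43, "alive": 0.57},
    "normal": {"dead": 0.277, "alive": 0.723},
    "intense": {"dead": 0.215, "alive": 0.785},
}

# Chain rule: P(Walking, Mortality) = P(Walking) * P(Mortality | Walking).
joint = {
    (w, m): p_walking[w] * p_m
    for w, cpt_row in p_mortality_given_walking.items()
    for m, p_m in cpt_row.items()
}

# Summing Walking out of the joint gives the overall death rate.
p_dead = sum(p for (w, m), p in joint.items() if m == "dead")

for (w, m), p in joint.items():
    print(f"P(Walking={w}, Mortality={m}) = {p:.4f}")
print(f"P(Mortality=dead) = {p_dead:.4f}")
```

The printed joint entries should agree, up to rounding, with the joint distribution table displayed by `gnb.sideBySide`.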

## Causal effect of walking on mortality in this model:

In [3]:
weModele = csl.CausalModel(we)
cslnb.showCausalImpact(weModele,"Mortality",doing="Walking",values={})

| Walking | Mortality = dead | Mortality = alive |
|---|---|---|
| casual | 0.4300 | 0.5700 |
| normal | 0.2770 | 0.7230 |
| intense | 0.2150 | 0.7850 |


> Before jumping to any conclusions, we should consider the presence of possible confounders. We need to ask the following question: <b>what distinguishes intense walkers from casual walkers?</b><br>
Without abandoning the idea of a possible cause-and-effect relationship between walking and mortality, we introduce a third variable, a "confounder": a common cause of the two variables that could explain the correlation between them. Our aim is to separate the causal effect of walking on mortality (if there is a cause-and-effect relationship) from the bias induced by this third variable. For this purpose, we need to adjust for it.

In [4]:
weModele1 = csl.CausalModel(we, [("confounder", ["Walking","Mortality"])], True)
cslnb.showCausalImpact(weModele1, "Mortality", "Walking",values={"Walking":"intense"})
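
Why an unobserved common cause is a problem can be illustrated with plain arithmetic: two models can agree on everything we observe and still disagree on the effect of an intervention. The sketch below is not part of the original notebook. Scenario B is a deliberately extreme, hypothetical construction in which a hidden variable `U` fully determines Walking and Mortality depends on `U` only; all variable names are illustrative.

```python
# Observed quantities, taken from the model defined earlier in this notebook.
p_walking = {"casual": 151 / 707, "normal": 379 / 707, "intense": 177 / 707}
p_dead_given_walking = {"casual": 0.43, "normal": 0.277, "intense": 0.215}

# Scenario A (no confounder): Walking -> Mortality.
# Intervening and observing coincide: P(dead | do(w)) = P(dead | w).
effect_a = dict(p_dead_given_walking)
observed_a = {w: p_walking[w] * p_dead_given_walking[w] for w in p_walking}

# Scenario B (hypothetical extreme): a hidden U fully determines Walking (W = U)
# and Mortality depends on U only, with P(dead | U=u) set to P(dead | W=u).
p_u = dict(p_walking)
p_dead_given_u = dict(p_dead_given_walking)

# The observed P(W=w, dead) is unchanged, because W is a copy of U.
observed_b = {
    w: sum(p_u[u] * (1.0 if u == w else 0.0) * p_dead_given_u[u] for u in p_u)
    for w in p_walking
}

# Forcing W no longer changes U, so P(dead | do(w)) = sum_u P(u) * P(dead | u).
no_effect = sum(p_u[u] * p_dead_given_u[u] for u in p_u)
effect_b = {w: no_effect for w in p_walking}

for w in p_walking:
    print(f"{w:8s} observed A={observed_a[w]:.4f} B={observed_b[w]:.4f} "
          f"do() A={effect_a[w]:.4f} B={effect_b[w]:.4f}")
```

Both scenarios produce the same observed joint, yet in scenario B walking has no effect on mortality at all. Observational data alone cannot tell the two apart, which is why a confounder must be measured before it can be adjusted for.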

## Introducing age as a confounder:

> We want to measure the causal effect of walking on mortality. Confounding bias arises when a third variable, called a "confounding variable", influences both walking and mortality. <br>
An obvious confounder is <b>age</b>: younger subjects exercise more and have more time to live! (There are other confounders.)

### Let's use fictitious data:

In [5]:
wea = gum.fastBN("Age{cat1|cat2|cat3}->Walking{casual|normal|intense}->Mortality{dead|alive}<-Age{cat1|cat2|cat3}")
                 
gnb.sideBySide(wea,wea.cpt("Age"),wea.cpt("Walking"),wea.cpt("Mortality"),
               captions=["the BN","the marginal for $Age$","the CPT for $Walking$","the CPT for $Mortality$"])

*Output: the BN (arcs Age → Walking, Age → Mortality and Walking → Mortality), followed by three tables.*

The marginal for $Age$:

| cat1 | cat2 | cat3 |
|---|---|---|
| 0.4684 | 0.4240 | 0.1077 |

The CPT for $Walking$:

| Age | Walking = casual | Walking = normal | Walking = intense |
|---|---|---|---|
| cat1 | 0.2229 | 0.4060 | 0.3711 |
| cat2 | 0.5810 | 0.1943 | 0.2247 |
| cat3 | 0.0927 | 0.0543 | 0.8531 |

The CPT for $Mortality$:

| Age | Walking | Mortality = dead | Mortality = alive |
|---|---|---|---|
| cat1 | casual | 0.5068 | 0.4932 |
| cat1 | normal | 0.6067 | 0.3933 |
| cat1 | intense | 0.4296 | 0.5704 |
| cat2 | casual | 0.6565 | 0.3435 |
| cat2 | normal | 0.4428 | 0.5572 |
| cat2 | intense | 0.5420 | 0.4580 |
| cat3 | casual | 0.5440 | 0.4560 |
| cat3 | normal | 0.1602 | 0.8398 |
| cat3 | intense | 0.1847 | 0.8153 |


## Causal effect of walking on mortality with age as a confounder:

In [6]:
weModele2 = csl.CausalModel(wea)
cslnb.showCausalImpact(weModele2, "Mortality", "Walking",values={})

| Walking | Mortality = dead | Mortality = alive |
|---|---|---|
| casual | 0.5743 | 0.4257 |
| normal | 0.4891 | 0.5109 |
| intense | 0.4509 | 0.5491 |
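
These figures can be reproduced by hand with the back-door adjustment formula, $P(Mortality \mid do(Walking)) = \sum_{a} P(Age=a)\,P(Mortality \mid Age=a, Walking)$. The sketch below is plain Python with illustrative names that are not part of the original notebook. It assumes the CPT values displayed earlier for the fictitious model, copied at 4-decimal precision, so results may differ from pyAgrum's in the last digit. It also computes plain conditioning for comparison.

```python
# Values copied (rounded to 4 decimals) from the tables displayed earlier.
p_age = {"cat1": 0.4684, "cat2": 0.4240, "cat3": 0.1077}
p_walking_given_age = {
    "cat1": {"casual": 0.2229, "normal": 0.4060, "intense": 0.3711},
    "cat2": {"casual": 0.5810, "normal": 0.1943, "intense": 0.2247},
    "cat3": {"casual": 0.0927, "normal": 0.0543, "intense": 0.8531},
}
p_dead_given_age_walking = {
    "cat1": {"casual": 0.5068, "normal": 0.6067, "intense": 0.4296},
    "cat2": {"casual": 0.6565, "normal": 0.4428, "intense": 0.5420},
    "cat3": {"casual": 0.5440, "normal": 0.1602, "intense": 0.1847},
}


def p_dead_do(walking):
    """Back-door adjustment: sum_a P(a) * P(dead | a, walking)."""
    return sum(p_age[a] * p_dead_given_age_walking[a][walking] for a in p_age)


def p_dead_given(walking):
    """Plain conditioning: P(dead, walking) / P(walking)."""
    p_w = sum(p_age[a] * p_walking_given_age[a][walking] for a in p_age)
    p_dead_and_w = sum(
        p_age[a]
        * p_walking_given_age[a][walking]
        * p_dead_given_age_walking[a][walking]
        for a in p_age
    )
    return p_dead_and_w / p_w


for w in ("casual", "normal", "intense"):
    print(f"{w:8s} P(dead | do)={p_dead_do(w):.4f}  P(dead | obs)={p_dead_given(w):.4f}")
```

The adjusted values weight each age category by its share of the whole population, whereas plain conditioning weights it by its share among the walkers of the given type. With a confounder present the two quantities differ.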


> We adjusted for Age using the back-door criterion: Age blocks every back-door path from Walking to Mortality, so within each age category, setting Walking="intense" and conditioning on Walking="intense" have the same effect on Mortality.

## Conclusion:

> After adjusting for age, the study reports that 40.5% (43% unadjusted) of casual walkers died, whereas only 23.8% (21.5% unadjusted) of intense walkers died. The correlation induced by Age between the two variables is negligible. <br>
Even after adjusting for all plausible confounders and removing the corresponding confounding bias, Walking is still associated with Mortality. Unless we missed other confounders, <b>in which case the remaining uncertainty is proportional to the correlation induced by these hidden variables,</b> we can say that intentional walking prolongs life among the studied population.
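
To see how much the adjustment changes the picture, the gap between casual and intense walkers can be expressed as a risk difference and a risk ratio. This is a small arithmetic sketch using only the four percentages quoted above; the function names are illustrative and not part of the original notebook.

```python
# Death rates after 12 years, in percent, as quoted in the text above.
unadjusted = {"casual": 43.0, "intense": 21.5}
age_adjusted = {"casual": 40.5, "intense": 23.8}


def risk_difference(rates):
    """Percentage-point gap between casual and intense walkers."""
    return rates["casual"] - rates["intense"]


def risk_ratio(rates):
    """How many times more often casual walkers died than intense walkers."""
    return rates["casual"] / rates["intense"]


for label, rates in (("unadjusted", unadjusted), ("age-adjusted", age_adjusted)):
    print(f"{label:12s} difference = {risk_difference(rates):.1f} points, "
          f"ratio = {risk_ratio(rates):.2f}")
```

The gap shrinks from 21.5 to 16.7 percentage points after adjusting for age, and the risk ratio drops from 2.0 to about 1.7, so most of the difference in mortality between the two groups remains.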

> <b>In an observational study, systematically adjusting for confounding factors is necessary to measure the causal effect of a treatment on an outcome.</b>

## Study measurements both unadjusted and age-adjusted: 

![title](images/WalkingExampleInfo.jpeg)