# A causal look into the factors of world happiness [1-Graphical modeling]

## intro
In this part we fly over the basic concepts of *causal inference with causal graph modeling*. We start from common-sense assumptions expressed as graphs and we try to fit these representations to the dataset. The main concepts we are going to tackle are:
* **root nodes** and **conditional nodes**
* **mechanisms**
* **interventions**
* **colliders** and **y-shapes**
* **biases** in causal graphs

The objective is to get familiar with the principles of causal analysis: the different tools we have to infer the values of conditional nodes, how to fit the assumptions encoded in the graph to the dataset, and how to apply interventions on nodes. What we are doing here is testing different naive and educated guesses and comparing them. It is also possible to design the causal graph with algorithms for what is called causal discovery (for example, Fast Causal Inference, FCI) or with combinatorial processes. These other aspects will be tackled briefly in later posts.

At the time of writing, graphical causal modeling (`gcm`) is still an experimental feature of the `DoWhy` library.

Important: the dataset comes from Kaggle without any metadata about how the columns are computed! So I had to do some reverse engineering of the features in the dataset. 🛑 Here we are just trying to showcase some useful tools, not really to provide a thorough analysis; we *happily* leave that to teams of experts supported by software engineers.

In [1]:
from functools import partial

import pandas as pd
from tabulate import tabulate

def flatten(xss):
    "Flatten a list of lists into a single list"
    return [x for xs in xss for x in xs]

In [2]:
df = pd.read_csv('world-happiness-report-2021.csv')
df.drop(list(df.filter(regex='Explained')), axis=1, inplace=True)
df.head()

| | Country name | Regional indicator | Ladder score | Standard error of ladder score | upperwhisker | lowerwhisker | Logged GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | Ladder score in Dystopia | Dystopia + residual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Finland | Western Europe | 7.842 | 0.032 | 7.904 | 7.78 | 10.775 | 0.954 | 72.0 | 0.949 | -0.098 | 0.186 | 2.43 | 3.253 |
| 1 | Denmark | Western Europe | 7.62 | 0.035 | 7.687 | 7.552 | 10.933 | 0.954 | 72.7 | 0.946 | 0.03 | 0.179 | 2.43 | 2.868 |
| 2 | Switzerland | Western Europe | 7.571 | 0.036 | 7.643 | 7.5 | 11.117 | 0.942 | 74.4 | 0.919 | 0.025 | 0.292 | 2.43 | 2.839 |
| 3 | Iceland | Western Europe | 7.554 | 0.059 | 7.67 | 7.438 | 10.878 | 0.983 | 73.0 | 0.955 | 0.16 | 0.673 | 2.43 | 2.967 |
| 4 | Netherlands | Western Europe | 7.464 | 0.027 | 7.518 | 7.41 | 10.932 | 0.942 | 72.4 | 0.913 | 0.175 | 0.338 | 2.43 | 2.798 |


Let's rename the columns so that they match our graphs:

    Ladder score: Y
    Logged (natural) GDP per capita: S
    Social support: J
    Healthy life expectancy: X
    Freedom to make life choices: W

In [3]:
df = df[["Ladder score",
         "Logged GDP per capita",
         "Social support",
         "Healthy life expectancy",
         "Freedom to make life choices"]
].copy()
df.rename(columns={
    "Ladder score": "Y",
    "Logged GDP per capita": "S",
    "Social support": "J",
    "Healthy life expectancy": "X",
    "Freedom to make life choices": "W" 
}, inplace=True)
df.head(5)

| | Y | S | J | X | W |
|---|---|---|---|---|---|
| 0 | 7.842 | 10.775 | 0.954 | 72.0 | 0.949 |
| 1 | 7.62 | 10.933 | 0.954 | 72.7 | 0.946 |
| 2 | 7.571 | 11.117 | 0.942 | 74.4 | 0.919 |
| 3 | 7.554 | 10.878 | 0.983 | 73.0 | 0.955 |
| 4 | 7.464 | 10.932 | 0.942 | 72.4 | 0.913 |


In [4]:
df.describe()

| | Y | S | J | X | W |
|---|---|---|---|---|---|
| count | 149.0 | 149.0 | 149.0 | 149.0 | 149.0 |
| mean | 5.532839 | 9.432208 | 0.814745 | 64.992799 | 0.791597 |
| std | 1.073924 | 1.158601 | 0.114889 | 6.762043 | 0.113332 |
| min | 2.523 | 6.635 | 0.463 | 48.478 | 0.382 |
| 25% | 4.852 | 8.541 | 0.75 | 59.802 | 0.718 |
| 50% | 5.534 | 9.569 | 0.832 | 66.603 | 0.804 |
| 75% | 6.255 | 10.421 | 0.905 | 69.6 | 0.877 |
| max | 7.842 | 11.647 | 0.983 | 76.953 | 0.97 |


## hypothesis 0
Now that we have a selection of what we wanted as defined in [the previous blogpost](https://economyoftime.net/a-causal-look-into-the-factors-of-world-happiness-0-causal-graphs-b8f4dd1f13c3), we can start running experiments on our assumptions.

Let's start by trying to fit an uneducated guess, the "pauperistic" causal graph (0):

![cg0](./assets/Model01_00_pauperistic.png)

In [5]:
from dowhy import gcm
import networkx as nx

scm0 = gcm.StructuralCausalModel(
    nx.DiGraph([('S', 'Y')])
)
# we draw the mechanism for the root node S by using "a model that uniformly samples from data samples"
scm0.set_causal_mechanism(
    'S', gcm.EmpiricalDistribution())  ## alternative BayesianGaussianMixtureDistribution
scm0.set_causal_mechanism(
    'Y', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))

In [6]:
# we defined a statistical mechanism for each node according to a probabilistic distribution
scm0.causal_mechanism("Y")

<dowhy.gcm.fcms.AdditiveNoiseModel at 0x7ff52862bc70>

Let's fit the designed causal graph to the dataset:

In [7]:
gcm.fit(scm0, df[["S", "Y"]])

Fitting causal mechanism of node Y: 100%|██████████| 2/2 [00:00<00:00, 48.76it/s]


Now we are ready for some exploratory activities.

One of the most important tools to climb the Ladder of Causation is *intervention*. With a single covariate `S` that defines an effect `Y` via linear regression, all we can expect from observation alone is to spot more or less association between the two. To step up our understanding we need to intervene on the covariate (`do(S)` in causal analysis terms) and see what happens. With this basic one-covariate setup what we get is hardly a causal insight, as there is no conditionality involved: `S` just "transmits" its value to `Y` according to a mechanism defined statistically on a probability distribution.
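To make the idea of a "mechanism" and of `do(S)` more tangible, here is a minimal sketch in plain Python that does not use DoWhy. It assumes a linear additive noise model `Y = intercept + slope * S + N`, a root node that resamples observed values, and five made-up observations; the names `fit_line` and `draw` are purely illustrative and the numbers are not taken from the World Happiness data.

```python
import random

# Toy observations (hypothetical values, NOT the World Happiness data)
observed_s = [7.0, 8.0, 9.0, 10.0, 11.0]
observed_y = [3.1, 4.2, 4.9, 6.1, 6.8]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(observed_s, observed_y)
# The residuals play the role of the additive noise N in Y = f(S) + N
residuals = [y - (intercept + slope * s) for s, y in zip(observed_s, observed_y)]


def draw(rng, n, do_s=None):
    """Draw n (S, Y) pairs; when do_s is given, S is forced to that value."""
    rows = []
    for _ in range(n):
        # Root node: resample an observed value, unless do(S) overrides it
        s = rng.choice(observed_s) if do_s is None else do_s
        # Conditional node: linear function of its parent plus resampled noise
        rows.append((s, intercept + slope * s + rng.choice(residuals)))
    return rows


rng = random.Random(0)
observational = draw(rng, 1000)
interventional = draw(rng, 1000, do_s=9.06)
mean_y_do = sum(y for _, y in interventional) / len(interventional)
print(f"fitted line: Y = {intercept:.2f} + {slope:.2f} * S")
print(f"mean of Y under do(S=9.06): {mean_y_do:.2f}")
```

In this toy setup every interventional row has the same `S`, and the rows differ in `Y` only through the resampled noise, so the quantity worth reading is the distribution of `Y` (for example its mean) rather than any single row.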

Let's see what happens when we apply an **intervention**:

In [8]:
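def do_intervention_atomic(model, covariate, value):
    "Make an intervention by setting a covariate to a given VALUE"
    # NOTE: this draws 149 fresh samples from the fitted model;
    # row i of the result is not country i of the original dataframe
    return gcm.interventional_samples(
        model,
        {covariate: lambda x: value},
        num_samples_to_draw=149
    )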



print("Atomic Intervention: Set value S=n")
# take a random sample of 3 countries
sample_df01 = df[["S", "Y"]].sample(
    n=3, random_state=10101, ignore_index=False).copy()
sample_indices01 = sample_df01.index.copy()

table01 = []
for i in sample_indices01:
    row = [[i]]
    for v in (-1, 0, 5, 9.06, 12):  ## some fixed values to compare
        if v != -1: row.append(
            do_intervention_atomic(
                scm0,
                "S",
                v).iloc[i].to_list()  # <--
        )
        else: row.append(df[["S", "Y"]].iloc[i].to_list())
    table01.append(flatten(row))

print(tabulate(
    table01, 
    headers=["i", "S_orig","Y_orig","S=0", "Y_0", "S=5",
             "Y_5", "S=9.06", "Y_9.06", "S=12", "Y_12"]))

Atomic Intervention: Set value S=n
  i    S_orig    Y_orig    S=0        Y_0    S=5      Y_5    S=9.06    Y_9.06    S=12     Y_12
---  --------  --------  -----  ---------  -----  -------  --------  --------  ------  -------
 35     9.859     6.317      0  -0.763987      5  2.29662      9.06   5.57879      12  7.49462
 75    10.189     5.477      0  -0.538651      5  3.21426      9.06   5.38563      12  6.71837
 71     9.94      5.581      0  -1.59934       5  2.89779      9.06   6.22315      12  6.20904


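This is an "atomic intervention": the covariate is set to a fixed value.

🔭 At first sight the three rows seem to react quite differently to having their income reduced to 1$ (`S=0`, since `S` is a natural logarithm), even though they start from a similar original income. Keep in mind, though, that `gcm.interventional_samples` draws fresh samples from the fitted model, so row `i` of its output is not country `i` after the intervention, and the differences between rows mostly reflect the sampled noise. This is just a first impression anyway, obtained from a causal graph that we designed to be limited and flawed according to common sense.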
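The difference between a population-level intervention and a question about one specific country can be sketched in a few lines of plain Python. The coefficients and the noise pool below are assumptions made up for the illustration (they are not the values fitted by `gcm.fit`), and the function names are hypothetical. The second function follows the usual abduction, action, prediction recipe for counterfactuals in an additive noise model; DoWhy exposes `gcm.counterfactual_samples` for this kind of question, which is not used in this post.

```python
import random

# Assumed mechanism Y = intercept + slope * S + N (illustrative values only)
intercept, slope = -3.35, 0.93
noise_pool = [-0.9, -0.4, 0.0, 0.3, 1.0]  # assumed residuals of the fit


def interventional_y(rng, new_s):
    """Population-level draw: the noise is resampled, the unit is forgotten."""
    return intercept + slope * new_s + rng.choice(noise_pool)


def counterfactual_y(observed_s, observed_y, new_s):
    """Unit-level answer: keep the noise of the observed unit."""
    noise = observed_y - (intercept + slope * observed_s)  # abduction
    return intercept + slope * new_s + noise  # action + prediction


rng = random.Random(1)
country = {"S": 10.189, "Y": 5.477}  # observed values of row 75

draws = [interventional_y(rng, 9.06) for _ in range(3)]
cf = counterfactual_y(country["S"], country["Y"], 9.06)
print("three interventional draws:", [round(d, 3) for d in draws])
print("counterfactual for the country:", round(cf, 3))
```

Under these assumptions the counterfactual is a single deterministic number for the country, while each interventional draw depends on which noise value happened to be sampled.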


We can also perform "shift interventions" where we apply a function instead of a fixed value.

In [9]:
def do_intervention_shift(model, covariate, func):
    "Make an intervention by setting a covariate by a given FUNCTION"
    return gcm.interventional_samples(
        model,
        {covariate: lambda x: func(log_value=x)},
        num_samples_to_draw=1000
    )

def multiply_by(log_value, multiplier):
    from math import log, e
    return log((e ** log_value) * multiplier)

print("Shifting Intervention [POSSIBLY WRONG]: increase by a percentage")
table02 = []
for i in sample_indices01:
    row = [[i]]
    for v in (-1, 1.1, 1.2, 1.3):
        if v != -1: row.append(
            do_intervention_shift(
                scm0, 
                "S", 
                partial(multiply_by, multiplier=v)).iloc[i].to_list() ## <---
        )
        else: row.append(df[["S", "Y"]].iloc[i].to_list())
    table02.append(flatten(row))

print(tabulate(
    table02,
    headers=["i", "S_orig","Y_orig", "S_10%", "Y_10%",
             "S_20%", "Y_20%", "S_30%", "Y_30%"]))



Shifting Intervention [POSSIBLY WRONG]: increase by a percentage
  i    S_orig    Y_orig     S_10%    Y_10%     S_20%    Y_20%    S_30%    Y_30%
---  --------  --------  --------  -------  --------  -------  -------  -------
 35     9.859     6.317   8.03831  4.80073   7.65932  4.70675  8.45136  4.19171
 75    10.189     5.477  11.2123   6.9306    8.72432  4.72469  8.80036  4.61454
 71     9.94      5.581   9.67931  5.45519  10.2533   6.02233  8.80036  5.07191


We tried to increase the GDP value of the three random samples in 10% steps.
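Since `S` is the natural logarithm of GDP per capita, multiplying the income by a factor is the same as adding the logarithm of that factor to `S`. A small check of this identity, `log(m * e^x) = x + log(m)`, where `shift_by_log` is a hypothetical helper introduced only for the comparison:

```python
from math import exp, isclose, log


def multiply_by(log_value, multiplier):
    """Same transformation as above: scale the income, then take the log again."""
    return log(exp(log_value) * multiplier)


def shift_by_log(log_value, multiplier):
    """Equivalent additive form: log(m * e^x) = x + log(m)."""
    return log_value + log(multiplier)


for multiplier in (1.1, 1.2, 1.3):
    shifted = shift_by_log(10.189, multiplier)
    assert isclose(shifted, multiply_by(10.189, multiplier))
    print(f"x{multiplier}: S goes from 10.189 to {shifted:.3f}")
```

A 10% raise therefore moves `S` by only `log(1.1)`, roughly 0.095, which is small compared with the spread of `S` in the dataset (standard deviation around 1.16).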

🛑 These look like inconsistent outcomes: `S` does not even grow from one column to the next. One likely reason is that each call draws a fresh `S` before shifting it, so the shifted values cannot be compared with `S_orig` row by row. Maybe it is also another clue that the "pauperistic" causal graph (hypothesis 0, only `S` considered as a cause) cannot help in understanding why the effect looks the way it does? Or that the model just cannot work with this one-covariate setup? As with any other statistical process, it takes many clues to come to a useful result. Maybe we can disprove this hypothesis by showing that the others are better at explaining the effect (`Y`) that we observe.

## hypothesis 1

Let's now try to fit the "naive" causal graph (1), a graph in which all the covariates are effect modifiers (they all affect `Y` directly):


![naive](./assets/Model01_01_naive.png)

For consistency of observations we use the same mechanisms used for hypothesis 0; a more in-depth study of the best mechanisms to use is always advisable, though.

In [10]:
columns_order = ["S", "J", "X", "W", "Y"]

scm1 = gcm.StructuralCausalModel(nx.DiGraph([("S", "Y"), ("J", "Y"), ("X", "Y"), ("W", "Y")]))

scm1.set_causal_mechanism(
    'S', gcm.EmpiricalDistribution())
scm1.set_causal_mechanism(
    'J', gcm.EmpiricalDistribution())
scm1.set_causal_mechanism(
    'X', gcm.EmpiricalDistribution())
scm1.set_causal_mechanism(
    'W', gcm.EmpiricalDistribution())
scm1.set_causal_mechanism(
    'Y', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))

In [11]:
gcm.fit(scm1, df)

Fitting causal mechanism of node W: 100%|██████████| 5/5 [00:00<00:00, 94.16it/s]


Let's now see what happens if we do the same intervention as for hypothesis 0, comparing how the other covariates contribute to the score when we fix the value of `S`:

In [12]:
sample_df11 = df.sample(
    n=3, random_state=10101, ignore_index=False).copy()
sample_indices11 = sample_df11.index

original11 = df.iloc[sample_indices11[1]]
# let's try to set the logged GDP of the second sample to 9.06
intervention11_on_s = do_intervention_atomic(
    scm1,
    "S",
    9.06  ## <-- approx 1/3
).iloc[sample_indices11[1]]

table11 = [
    ["original", sample_indices11[1]] + original11[columns_order].to_list(),
    ["S=1/3", sample_indices11[1]] + intervention11_on_s[columns_order].to_list()]

print(tabulate(table11, headers=["action", "i"] + columns_order))


action      i       S      J       X      W        Y
--------  ---  ------  -----  ------  -----  -------
original   75  10.189  0.903  64.703  0.718  5.477
S=1/3      75   9.06   0.75   69.702  0.859  6.16992


Since we used the same sample for both hypotheses, we can compare the outcome of the same intervention on the same country under hypothesis 0 and hypothesis 1.


Atomic Intervention on index 75 in the context of **hypothesis 0**: Set value to 9.06 (~1/3 of the income)
```
i    S=9.06  Y_9.06  S=original  Y=original
---  ----  -------   ----------  ----------
75   9.06  4.08419   10.189      5.477

```
Atomic Intervention on index 75 in the context of **hypothesis 1**: Set value to 9.06 (~1/3 of the income)
```
i    S=9.06   Y       S=original  Y=original
---  ------  -------  ----------  ----------
75   9.06    5.48872  10.189      5.477
```

Trying to keep the rest fixed, we see that hypothesis 0 is much more "pessimistic" about the association of `S` with the effect, and this is attributed only to the mere presence of the other effect modifiers. 🛑 This is still not very explanatory about causal relations, but we can start understanding the basics of graphical causal reasoning. *For now* the intuition is: the better we can graphically explain the "problem", the clearer it will be where to look for meaningful associations. This may be called "graphical causal discovery". Both hypotheses look quite weak, as 0 is flawed by design and 1 results in an increase of `Y` after the income is reduced to 1/3 of its original value. **NOT EVEN CLOSE YET**: the associations are quite contradictory and there is not even a glimpse of an explanatory causal relation.

Let's move to a less naive causal graph.

## hypothesis 2

Let's now try to fit the "less naive" causal graph (2), a graph with effect modifiers and a mediator:


![less_naive](./assets/Model01_02_less_naive.png)

For consistency of observations we use the same mechanisms used for hypothesis 0; a more in-depth study of the best mechanisms to use is always advisable, though.

I bet this is going to look a little better but still not good enough. We will see down the road that we can make a lot of use of patterns in the graph:
* "colliders": in a hypothetical graph `A → C ← B`, `A` and `B` "collide" in `C`.
* "y-pattern": looks like `[A → C ← B, C → D]`.

So at some point we may want to spot and justify the presence of these kinds of patterns in our graph; this is what automated causal discovery algorithms try to do.
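What makes a collider special is its independence pattern: the two causes are independent of each other until we condition on their common effect. The simulation below is a minimal sketch of that behaviour on synthetic Gaussian data; the variables `a`, `b`, `c` mirror the hypothetical `A → C ← B` above, and the coefficients, the seed and the threshold `c > 1.0` are arbitrary choices made for the illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)
n = 20_000
a = [rng.gauss(0, 1) for _ in range(n)]  # first cause
b = [rng.gauss(0, 1) for _ in range(n)]  # second cause, independent of the first
c = [ai + bi + rng.gauss(0, 0.5) for ai, bi in zip(a, b)]  # collider A -> C <- B

# Without conditioning, A and B are (nearly) uncorrelated
corr_all = pearson(a, b)

# Selecting on the collider (keeping only rows with a high C) couples them
selected = [(ai, bi) for ai, bi, ci in zip(a, b, c) if ci > 1.0]
corr_selected = pearson([p[0] for p in selected], [p[1] for p in selected])

print(f"corr(A, B) on all rows:       {corr_all:+.3f}")
print(f"corr(A, B) on rows with C > 1: {corr_selected:+.3f}")
```

In this simulation the correlation on the full sample stays close to zero, while on the selected rows it turns clearly negative: once `C` is known to be high, a low `A` has to be compensated by a high `B`. Constraint-based discovery algorithms look for exactly this kind of signature.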

In [13]:
# we just need to change one of our edges compared to `scm1`
scm2 = gcm.StructuralCausalModel(nx.DiGraph([("J", "Y"), ("S", "X"), ("X", "Y"), ("W", "Y")]))

scm2.set_causal_mechanism(
    'S', gcm.EmpiricalDistribution())
scm2.set_causal_mechanism(
    'J', gcm.EmpiricalDistribution())
scm2.set_causal_mechanism(
    'X', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))  ## X is not a root node in this one
scm2.set_causal_mechanism(
    'W', gcm.EmpiricalDistribution())
scm2.set_causal_mechanism(
    'Y', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))

gcm.fit(scm2, df)

Fitting causal mechanism of node W: 100%|██████████| 5/5 [00:00<00:00, 136.37it/s]


In [14]:
sample_df21 = df.sample(n=3, random_state=10101, ignore_index=False).copy()
sample_indices21 = sample_df21.index

original21 = df.iloc[sample_indices21[1]]
# let's try to set the logged GDP of the second sample to 9.06
intervention21_on_s = do_intervention_atomic(
    scm2,
    "S",
    9.06
).iloc[sample_indices21[1]]

table21 = [
    ["original", sample_indices21[1]] + original21[columns_order].to_list(),
    ["S=1/3", sample_indices21[1]] + intervention21_on_s[columns_order].to_list()]

print(tabulate(table21, headers=["action", "i"] + columns_order))

action      i       S      J        X      W        Y
--------  ---  ------  -----  -------  -----  -------
original   75  10.189  0.903  64.703   0.718  5.477
S=1/3      75   9.06   0.691  66.7576  0.927  6.61317


Still not what we expect from an empirical perspective: we see a relevant movement of `X` (life expectancy) when the income is reduced to 1/3, but in the run shown above it moves up rather than down, and its effect on `Y` is still ambiguous. The "Ladder score" (a happiness score computed relative to a fictional baseline country that has the worst value for every feature) is still improving for sample 75 even with a lower income... How is it possible? We need to trust the data somehow, so the problem is probably in the causal graph representation; that is why we are working to improve it. Maybe we need to look for possible colliders or y-shapes?

## hypothesis 3
We are now going to try a hypothesis that contains a collider shape (hypothesis 3, "health and freedom"); later on we will try to exclude the income from the causal chain by using it as an effect instead of as an indirect cause as in the previous hypothesis.

![health_freedom](./assets/Model01_03_health_freedom.png)

There is one collider in `X` and one mediator in `W`.

In [15]:
# we have a little more complex graph
scm3 = gcm.StructuralCausalModel(nx.DiGraph([("J", "X"), ("S", "X"), ("S", "W"), ("W", "Y")]))

scm3.set_causal_mechanism(
    'S', gcm.EmpiricalDistribution())
scm3.set_causal_mechanism(
    'J', gcm.EmpiricalDistribution())
scm3.set_causal_mechanism(
    'X', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))
scm3.set_causal_mechanism(
    'W', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))  ## W is not a root node in this one
scm3.set_causal_mechanism(
    'Y', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))

gcm.fit(scm3, df)

Fitting causal mechanism of node Y: 100%|██████████| 5/5 [00:00<00:00, 156.90it/s]


In [16]:
sample_df31 = df.sample(
    n=3, random_state=10101, ignore_index=False).copy()
sample_indices31 = sample_df31.index

original31 = df.iloc[sample_indices31[1]]

# intervention
intervention31_on_s = do_intervention_atomic(
    scm3,
    "S",
    9.06
).iloc[sample_indices31[1]]

table31 = [
    ["original", sample_indices31[1]] + original31[columns_order].to_list(),
    ["S=1/3", sample_indices31[1]] + intervention31_on_s[columns_order].to_list()]

print(tabulate(table31, headers=["action", "i"] + columns_order))

action      i       S      J        X         W        Y
--------  ---  ------  -----  -------  --------  -------
original   75  10.189  0.903  64.703   0.718     5.477
S=1/3      75   9.06   0.934  63.3745  0.753987  5.53393


Here it starts to look a little closer to what we expected: the lower income is transmitted to "life expectancy", which goes down, while "freedom to choose" and "happiness" barely move. Still not good enough, but a little better. The "social support" cannot be affected by the intervention, as there is no direct or indirect path from `S` to `J`; the different `J` in the table only tells us that row 75 of the interventional sample is a new draw. Should we start to think that other factors have a meaningful impact compared to the flawed hypothesis 0? Can we attribute this minor improvement to the presence of the collider `S → X ← J`?

Let's try to intervene on another covariate, the "social support" `J`:

In [17]:
intervention32_on_j = do_intervention_atomic(
    scm3,
    "J",
    0.600
).iloc[sample_indices31[1]]

table32 = [
    ["original", sample_indices31[1]] + original31[columns_order].to_list(),
    ["J=0.6", sample_indices31[1]] + intervention32_on_j[columns_order].to_list()
]

print(tabulate(table32, headers=["action", "i"] + columns_order))

action      i       S      J        X         W        Y
--------  ---  ------  -----  -------  --------  -------
original   75  10.189  0.903  64.703   0.718     5.477
J=0.6      75   9.47   0.6    65.7821  0.622057  5.08605


As we can see, "social support" `J` looks like a quite relevant variable in this graph: lowering it goes together with lower values of `S`, `W` and `Y`, but life expectancy `X`, the only node `J` points to, does not seem to agree and moves up. Again, we need to trust the data, so it makes sense to say that this version of the causal graph is not good enough yet; things look like they are starting to move in the right direction though...

A social support of `0.6` is quite an extreme value according to the descriptive statistics for `J` (the first quartile is 0.75), so let's try the median instead:

In [18]:
intervention33_on_j = do_intervention_atomic(
    scm3,
    "J",
    df["J"].median()
).iloc[sample_indices31[1]]

table33 = [
    ["original", sample_indices31[1]] + original31[columns_order].to_list(),
    ["median J", sample_indices31[1]] + intervention33_on_j[columns_order].to_list()
]

print(tabulate(table33, headers=["action", "i"] + columns_order))

action      i       S      J        X         W        Y
--------  ---  ------  -----  -------  --------  -------
original   75  10.189  0.903  64.703   0.718     5.477
median J   75  10.238  0.832  67.2643  0.517891  3.97823


This starts to look cool! With a more realistic value for `J` the effect seems to go in the direction we expected. Lowering social support to the median value comes with a lower "freedom to choose" and a lower "happiness", while income and life expectancy move by a margin that could be within the statistical error. Can we say that this causal graph is marginally better at explaining the relations among the variables?

Let's try to set income and freedom to choose to their median values:

In [19]:
s_median = df["S"].median()
w_median = df["W"].median()

intervention34_on_s = do_intervention_atomic(
    scm3,
    "S",
    s_median
).iloc[sample_indices31[1]]
intervention35_on_w = do_intervention_atomic(
    scm3,
    "W",
    w_median
).iloc[sample_indices31[1]]

table34 = [
    ["original", sample_indices31[1]] + original31[columns_order].to_list(),
    [f"S={s_median}", sample_indices31[1]] + intervention34_on_s[columns_order].to_list(),
    [f"W={w_median}", sample_indices31[1]] + intervention35_on_w[columns_order].to_list()
]

print(tabulate(table34, headers=["action", "i"] + columns_order))

action      i       S      J        X         W        Y
--------  ---  ------  -----  -------  --------  -------
original   75  10.189  0.903  64.703   0.718     5.477
S=9.569    75   9.569  0.787  66.8134  0.894503  4.25943
W=0.804    75   9.629  0.71   65.9221  0.804     5.40623


There is still a lot of work to do here, but we start seeing some sense according to human expectations. These may be good or bad depending on whether, and which, **biases** are present in our interpretation of the relations in the dataset. We are still looking at only one sample, but the model considers the full dataset while doing its adjustments; we could compare the outcome of the interventions over the entire dataset and maybe tell more.

The presence of a collider and a y-shape can remove much of the doubt: if they are confirmed to fit the data well enough to belong in the causal graph, they exclude the presence of an unobserved confounder. A collider shape, if spotted using FCI, "has a unique independence relationship compared with other causal relationships. In fact, it is one of the “primitives” that constraint-based algorithm, like FCI, looks for. A feature specific to FCI even among constraint-based methods is its ability to discover latent (unobserved) confounders. This is enabled by another primitive, the “Y” structure. Four variables define a “Y” structure when they have the following causal relationships: W1 → X ← W2 and X → Y. Within the “Y” structure, both W1 and W2 are independent of Y conditional on X. This conditional independence helps rule out the possibility of an unmeasured confounder between X and Y. In other words, when FCI finds a “Y” structure in the graph, the causal relationship from X to Y is guaranteed to be unconfounded; otherwise, FCI assumes that possibly unobserved confounders exist."[1]

So if we can demonstrate in the dataset that our y-shape relation does exist among `S, J, X, Y`, we can rule out unobserved confounders [2]. Obviously this takes a great amount of attention, patience, and properly acquired and structured datasets from repeatable observations; requirements that we cannot establish here for this simple example dataset. Teams around the world in every discipline work to make this kind of analysis possible. We will try to apply FCI to the dataset down the road.
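The conditional independence behind the "Y" structure can be illustrated on synthetic data. The sketch below assumes linear Gaussian mechanisms with unit coefficients, so that a partial correlation close to zero can stand in for conditional independence; the variable names follow the quote (`W1`, `W2`, `X`, `Y`, with `S` and `J` playing the role of `W1` and `W2` in our graph), and the hidden confounder `U` is a hypothetical addition used only for contrast. This is not FCI, just the independence check that such an algorithm builds upon.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after a least-squares regression on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


def partial_corr(a, b, given):
    """Correlation of a and b once the linear effect of `given` is removed."""
    return pearson(residuals(a, given), residuals(b, given))


def simulate(rng, n, hidden_confounder):
    """W1 -> X <- W2 and X -> Y, optionally with a hidden U -> X and U -> Y."""
    w1s, xs, ys = [], [], []
    for _ in range(n):
        w1, w2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u = rng.gauss(0, 1) if hidden_confounder else 0.0
        x = w1 + w2 + u + rng.gauss(0, 1)
        y = x + u + rng.gauss(0, 1)
        w1s.append(w1)
        xs.append(x)
        ys.append(y)
    return w1s, xs, ys


rng = random.Random(7)
w1, x, y = simulate(rng, 20_000, hidden_confounder=False)
marginal = pearson(w1, y)
clean = partial_corr(w1, y, x)
w1, x, y = simulate(rng, 20_000, hidden_confounder=True)
confounded = partial_corr(w1, y, x)

print(f"corr(W1, Y), no hidden confounder:       {marginal:+.3f}")
print(f"corr(W1, Y | X), no hidden confounder:   {clean:+.3f}")
print(f"corr(W1, Y | X), with hidden confounder: {confounded:+.3f}")
```

Without the hidden variable, `W1` and `Y` are clearly associated but become (nearly) uncorrelated once `X` is accounted for. With a hidden `U` pointing to both `X` and `Y`, conditioning on `X` opens the path `W1 → X ← U → Y` and the partial correlation moves away from zero, which is why finding the independence is informative about the absence of such a confounder.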

## hypothesis 4
Let's finish this part of the series of blog posts with hypothesis 4, the "lockean" causal graph for the dataset.

![lockean](./assets/Model01_04_lockean.png)

We removed the collider here and we consider income as an effect instead of a cause, to check how adding an effect changes what we see. For sure, automated causal discovery algorithms (like FCI or other causal learning tools) will do and say better down the road.

In [20]:
scm4 = gcm.StructuralCausalModel(
    nx.DiGraph([("J", "W"), ("W", "Y"), ("W", "S"), ("X", "Y")])
)

scm4.set_causal_mechanism(
    'S', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))  ## S is now an effect
scm4.set_causal_mechanism(
    'J', gcm.EmpiricalDistribution())
scm4.set_causal_mechanism(
    'X', gcm.EmpiricalDistribution())  ## X is back to being a root node
scm4.set_causal_mechanism(
    'W', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))
scm4.set_causal_mechanism(
    'Y', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor()))

gcm.fit(scm4, df)

Fitting causal mechanism of node X: 100%|██████████| 5/5 [00:00<00:00, 120.15it/s]


In [21]:
# let's play the do(...) game for this last one
s_median = df["S"].median()
w_median = df["W"].median()

sample_df41 = df.sample(
    n=3, random_state=10101, ignore_index=False).copy()
sample_indices41 = sample_df41.index

original41 = df.iloc[sample_indices41[1]]


intervention41_on_s = do_intervention_atomic(
    scm4,
    "S",
    s_median
).iloc[sample_indices41[1]]
intervention42_on_w = do_intervention_atomic(
    scm4,
    "W",
    w_median
).iloc[sample_indices41[1]]

table41 = [
    ["original", sample_indices41[1]] + original41[columns_order].to_list(),
    [f"S={s_median}", sample_indices41[1]] + intervention41_on_s[columns_order].to_list(),
    [f"W={w_median}", sample_indices41[1]] + intervention42_on_w[columns_order].to_list()
]

print(tabulate(table41, headers=["action", "i"] + columns_order))

action      i         S      J       X         W        Y
--------  ---  --------  -----  ------  --------  -------
original   75  10.189    0.903  64.703  0.718     5.477
S=9.569    75   9.569    0.879  67.1    0.869331  6.50264
W=0.804    75   8.82944  0.821  68.25   0.804     6.50061


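These results are quite unexpected! How is it possible that `S` still affects the other covariates even without any outgoing edge? Most likely it does not: `do(S)` cannot propagate to nodes that are not descendants of `S`, and the row we read from `gcm.interventional_samples` is a fresh draw from the fitted model rather than country 75 itself. Read at face value, though, in this setup it looks to have an incredible effect on `X`, `W` and `Y`, exactly as Locke would have expected according to his philosophical points about the foundational role played by the rule of law (in our dataset `W`, the freedom to choose) in facilitating the ingenuity present in society.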
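A toy version of the "lockean" graph shows what an intervention on a node without children is expected to do. The sketch assumes made-up linear mechanisms with standard normal noise (the coefficients are arbitrary, not the fitted ones) and a hypothetical helper `sample_row`. Each row is generated from its own seed and noise is drawn for every node, so that the baseline and the intervened runs are aligned and differ only through the intervention.

```python
import random


def sample_row(rng, do=None):
    """One draw from a toy J -> W, W -> S, W -> Y, X -> Y graph (coefficients assumed)."""
    do = do or {}
    # Noise is drawn for every node, intervened or not, so that two runs
    # with the same seed consume the random stream in the same way.
    noise = {name: rng.gauss(0, 1) for name in ("J", "X", "W", "S", "Y")}
    row = {}
    row["J"] = do.get("J", noise["J"])  # root node
    row["X"] = do.get("X", noise["X"])  # root node
    row["W"] = do.get("W", 0.8 * row["J"] + noise["W"])
    row["S"] = do.get("S", 0.5 * row["W"] + noise["S"])  # leaf: no children
    row["Y"] = do.get("Y", 0.6 * row["W"] + 0.4 * row["X"] + noise["Y"])
    return row


seeds = range(100)
baseline = [sample_row(random.Random(seed)) for seed in seeds]
do_s = [sample_row(random.Random(seed), do={"S": 9.569}) for seed in seeds]
do_w = [sample_row(random.Random(seed), do={"W": 0.804}) for seed in seeds]

# do(S) on a leaf leaves every other node untouched
untouched_by_s = all(
    b[k] == d[k] for b, d in zip(baseline, do_s) for k in ("J", "X", "W", "Y")
)
# do(W) leaves the roots alone but changes its descendants S and Y
roots_kept_by_w = all(b[k] == d[k] for b, d in zip(baseline, do_w) for k in ("J", "X"))
s_changed_by_w = any(b["S"] != d["S"] for b, d in zip(baseline, do_w))

print("do(S) leaves J, X, W, Y unchanged:", untouched_by_s)
print("do(W) leaves J, X unchanged:", roots_kept_by_w)
print("do(W) changes S:", s_changed_by_w)
```

When the noise is held fixed in this way, forcing `S` changes nothing else, while forcing `W` moves only its descendants `S` and `Y`. Any movement of `J`, `X` or `W` after `do(S)` in a table therefore points to the rows being different draws, not to an effect of `S`.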

It is funny to see how many different alluring associations we saw in the same dataset. This is probably the first thing to understand: for a proper analysis we need to clear the room of biases and confounders, otherwise we risk losing the meaning of what the data can tell. One thing we can build upon is the importance of, and the attention to give to, **interventions, colliders and y-shapes in their role of eliminating confounders and the many other biases** that can make our data unusable or a generator of paradoxical outcomes.

There are different methods to establish a causally sound graph in accordance with observational data; for a list of tools see [here](https://github.com/rguo12/awesome-causality-algorithms).

Next time we will tackle something different.

# References
[1] https://www.nature.com/articles/s41598-020-59669-x <br/>
[2] Mani, S., Spirtes, P. & Cooper, G. F. A theoretical study of Y structures for causal discovery. CoRR abs/1206.6853 (2012).