# Collider Bias

Here is a simple mnemonic example to illustrate collider bias, of which M-bias is a closely related special case.

The idea is that people who make it to Hollywood must have high congeniality, where congeniality = talent + beauty. Funnily enough, conditioning on the set of actors or celebrities induces a negative correlation between talent and looks, even when the two are independent in the general population: among those selected, someone with modest talent must have made up for it with beauty, and vice versa. This simple example explains the anecdotal observation that "talent and beauty are negatively correlated" among celebrities.

In [None]:
!pip install pgmpy

In [None]:
import numpy as np
import statsmodels.formula.api as smf
import networkx as nx
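# Import from the package level, which is less sensitive to module
# reorganizations between pgmpy releases than the submodule paths.
from pgmpy.base import DAG
from pgmpy.models import BayesianNetwork
from pgmpy.inference import CausalInference
import matplotlib.pyplot as plt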

In [None]:
# Talent (T) and beauty (B) both cause congeniality (C), so C is a collider.
digraph = nx.DiGraph([('T', 'C'),
                      ('B', 'C')])
g = DAG(digraph)

nx.draw_planar(g, with_labels=True)
plt.show()
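
To make the term concrete: a collider is a node at which two arrows meet head to head, as C does on the path T → C ← B. The following is a minimal sketch in plain Python that lists such nodes from an edge list. The helper `find_colliders` is a hypothetical name introduced here for illustration and is not part of networkx or pgmpy.

```python
from itertools import combinations

# Same edges as the graph above: T -> C and B -> C.
edges = [("T", "C"), ("B", "C")]


def find_colliders(edges):
    """Return (parent_a, node, parent_b) triples where two arrows meet at node.

    Hypothetical helper for illustration only.
    """
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    triples = []
    for node, pars in parents.items():
        # Every pair of distinct parents forms a path a -> node <- b.
        for a, b in combinations(sorted(pars), 2):
            triples.append((a, node, b))
    return triples


colliders = find_colliders(edges)
print(colliders)
```

In this graph the only such triple has C in the middle, which is why conditioning on C is the step that creates the spurious association.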

In [None]:
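# Collider bias: talent and beauty are simulated as independent
np.random.seed(123)
num_samples = 1000000
talent = np.random.normal(size=num_samples)
beauty = np.random.normal(size=num_samples)
# Congeniality is caused by both talent and beauty, plus independent noise
congeniality = talent + beauty + np.random.normal(size=num_samples)
# Keep only the "celebrities": observations with positive congeniality
cond_talent = talent[congeniality > 0]
cond_beauty = beauty[congeniality > 0]
data = {"talent": talent, "beauty": beauty, "congeniality": congeniality, "cond_talent": cond_talent, "cond_beauty": cond_beauty}

# No conditioning: the slope on beauty should be close to zero
print(smf.ols("talent ~ beauty", data).fit().summary())
# Controlling for the collider induces a negative slope on beauty
print(smf.ols("talent ~ beauty + congeniality", data).fit().summary())
# Selecting on the collider (congeniality > 0) also induces a negative slope
print(smf.ols("cond_talent ~ cond_beauty", data).fit().summary())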
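
As a cross-check on the regressions above, the sketch below repeats the simulation using only NumPy and compares the estimates with what the data-generating process implies. With congeniality = talent + beauty + noise and all three components independent standard normals, the population regression of talent on beauty and congeniality has coefficients -0.5 and 0.5. The smaller sample size and the use of `np.random.default_rng` are choices made here for speed, not taken from the cell above, so the draws will differ from it.

```python
import numpy as np

rng = np.random.default_rng(123)
n = 200_000  # smaller than the sample above; an assumption to keep this quick

talent = rng.normal(size=n)
beauty = rng.normal(size=n)
congeniality = talent + beauty + rng.normal(size=n)

# Unconditional correlation: should be near zero by construction.
corr_all = np.corrcoef(talent, beauty)[0, 1]

# Correlation among the selected group (congeniality > 0): negative.
selected = congeniality > 0
corr_selected = np.corrcoef(talent[selected], beauty[selected])[0, 1]

# OLS of talent on an intercept, beauty, and congeniality via least squares.
X = np.column_stack([np.ones(n), beauty, congeniality])
coef, *_ = np.linalg.lstsq(X, talent, rcond=None)

print(f"corr(talent, beauty), everyone:  {corr_all:.3f}")
print(f"corr(talent, beauty), selected:  {corr_selected:.3f}")
print(f"slope on beauty, controlling for congeniality: {coef[1]:.3f}")
print(f"slope on congeniality: {coef[2]:.3f}")
```

The negative slope on beauty is produced entirely by conditioning on congeniality, since talent and beauty are generated independently.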

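We can also use the pgmpy package to illustrate collider bias: it lists the backdoor adjustment sets implied by the graph, so we can see whether conditioning on C is ever called for.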

In [None]:
inference = CausalInference(BayesianNetwork(g))
inference.get_all_backdoor_adjustment_sets('T', 'B')
# The result is empty: no adjustment is needed, and we should not condition on the collider C.