# Collider Bias & Survivorship (Success Stories)
**Hands‑on Notebook**

This notebook illustrates **collider bias** via a simple model of success stories:

- People differ in **risk-taking** and **luck**.
- Success depends on **both** risk and luck.
- We then look only at the **successful people** and inspect correlations.

This mimics real-world situations where we only hear from successful founders, artists, or influencers.
Conditioning on success (the collider) induces a spurious association between its two causes, risk and luck, even though they are unrelated in the full population.

In [None]:
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # fixed seed so the results are reproducible
N = 200_000

# Risk-taking: 0 (low) or 1 (high)
risk = rng.binomial(1, 0.3, size=N)

# Luck: continuous
luck = rng.normal(0, 1, size=N)

# Success probability depends on both risk and luck
logit = -2 + 2 * risk + 1.5 * luck
p_success = 1 / (1 + np.exp(-logit))
success = rng.binomial(1, p_success)

df = pd.DataFrame({"risk": risk, "luck": luck, "success": success})
df.head()

## 1. Correlations in the Full Population

First, inspect the relationship between `risk` and `luck` in the **entire population**.

In [None]:
df[["risk", "luck"]].corr()

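In the full population, `risk` and `luck` are **independent** by construction: they are drawn separately and neither depends on the other.
Their sample correlation should therefore be close to 0, differing from it only by sampling noise.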

This represents the idea that risk-taking and luck are separate traits.
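As a complementary check, the sketch below compares the distribution of luck across the two risk groups in the full population. It re-creates the two traits with the same settings as the setup cell so that it runs on its own; the names `population`, `full_corr`, and `luck_by_risk` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same simulation settings as the setup cell (seed, size, trait distributions).
rng = np.random.default_rng(1)
N = 200_000
risk = rng.binomial(1, 0.3, size=N)
luck = rng.normal(0, 1, size=N)
population = pd.DataFrame({"risk": risk, "luck": luck})

# Pearson correlation between the binary trait and the continuous trait.
full_corr = population["risk"].corr(population["luck"])

# Under independence, luck should look the same in both risk groups.
luck_by_risk = population.groupby("risk")["luck"].agg(["mean", "std", "count"])

print(f"corr(risk, luck) in the full population: {full_corr:.4f}")
print(luck_by_risk)
```

If the traits are independent, the two rows of `luck_by_risk` should show nearly identical means and standard deviations.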

## 2. Condition on the Collider: Look Only at Successful People

Now we restrict the data to those who **succeeded** (`success == 1`) and re-compute the correlation.

In [None]:
df_success = df[df["success"] == 1]
df_success[["risk", "luck"]].corr()

You should see that among successful people, `risk` and `luck` become **negatively correlated**.

Intuition:
- Low-risk people need a lot of luck to succeed, so the low-risk people who make it into the selected group tend to be unusually lucky.
- High-risk people can succeed with only average luck, because the risk term already raises their odds of success.
- Within the selected group (`success == 1`), having more of one trait is therefore associated with having less of the other.

This is **collider bias**:
we have conditioned on `success`, a collider between `risk` and `luck`.
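One way to confirm that the negative correlation comes from selecting on `success`, and not merely from looking at a smaller sample, is to compare it with a random subset of the same size. The sketch below is self-contained and reuses the notebook's simulation settings; `sim`, `selected`, and `random_subset` are illustrative names, and the `random_state=0` seed is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Re-create the simulated population with the notebook's settings.
rng = np.random.default_rng(1)
N = 200_000
risk = rng.binomial(1, 0.3, size=N)
luck = rng.normal(0, 1, size=N)
p_success = 1 / (1 + np.exp(-(-2 + 2 * risk + 1.5 * luck)))
success = rng.binomial(1, p_success)
sim = pd.DataFrame({"risk": risk, "luck": luck, "success": success})

# Selection on the collider: keep only the successful.
selected = sim[sim["success"] == 1]

# Control: a random subset of the same size, chosen without looking at success.
random_subset = sim.sample(n=len(selected), random_state=0)

corr_selected = selected["risk"].corr(selected["luck"])
corr_random = random_subset["risk"].corr(random_subset["luck"])

print(f"successful only : n={len(selected)}, corr={corr_selected:.3f}")
print(f"random subset   : n={len(random_subset)}, corr={corr_random:.3f}")
```

The random subset should keep the correlation near 0, while the success-selected subset should show a clearly negative value. The difference is due to how the rows were chosen, not how many were chosen.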

## 3. Inspect Group Means

Compare the average luck among low-risk and high-risk individuals, but **only among the successful**.

In [None]:
df_success.groupby("risk")["luck"].mean()

Typically, you should see that **among the successful**, high-risk individuals have, on average, *lower* luck than low-risk individuals.
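The gap in group means can also be derived from the model itself, without simulation. By Bayes' rule, the density of luck among successful people with a given risk level is proportional to the standard normal density times the success probability. The sketch below integrates this numerically on a grid; the function name `expected_luck_given_success` and the grid settings are assumptions made for illustration, while the coefficients (-2, 2, 1.5) are taken from the setup cell.

```python
import numpy as np

# Grid over luck; the standard normal has negligible mass outside [-8, 8].
luck_grid = np.linspace(-8.0, 8.0, 20_001)
luck_density = np.exp(-0.5 * luck_grid**2) / np.sqrt(2 * np.pi)


def expected_luck_given_success(risk_level):
    """Approximate E[luck | success = 1, risk = risk_level] under the notebook's model."""
    logit = -2 + 2 * risk_level + 1.5 * luck_grid
    p_success = 1 / (1 + np.exp(-logit))
    # Posterior weight of each luck value among the successful:
    # prior density times probability of success at that luck value.
    weights = luck_density * p_success
    # The grid is uniform, so the spacing cancels in the ratio of sums.
    return np.sum(luck_grid * weights) / np.sum(weights)


expected_low = expected_luck_given_success(0)
expected_high = expected_luck_given_success(1)

print(f"E[luck | success, low risk]  = {expected_low:.3f}")
print(f"E[luck | success, high risk] = {expected_high:.3f}")
```

Both values should be positive (the successful are luckier than average in either group), with the low-risk value larger. They should also be close to the simulated group means from the cell above.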

This can generate a misleading story:

> “Look at all the successful people — many of them took big risks and were not especially lucky. So risk-taking must be the key to success!”

But this ignores all the people who took similar risks and failed, and who therefore never appear among the success stories.
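To make the missing failures visible, the sketch below returns to the full simulated population and counts successes and failures within each risk group. It is self-contained and reuses the notebook's settings; `sim`, `outcomes`, and `luck_by_group` are illustrative names.

```python
import numpy as np
import pandas as pd

# Re-create the simulated population with the notebook's settings.
rng = np.random.default_rng(1)
N = 200_000
risk = rng.binomial(1, 0.3, size=N)
luck = rng.normal(0, 1, size=N)
p_success = 1 / (1 + np.exp(-(-2 + 2 * risk + 1.5 * luck)))
success = rng.binomial(1, p_success)
sim = pd.DataFrame({"risk": risk, "luck": luck, "success": success})

# Successes and failures per risk group, using everyone (not just the winners).
outcomes = sim.groupby("risk")["success"].agg(n_people="count", n_success="sum")
outcomes["n_failure"] = outcomes["n_people"] - outcomes["n_success"]
outcomes["success_rate"] = outcomes["n_success"] / outcomes["n_people"]

# Average luck by risk group and outcome (columns: 0 = failed, 1 = succeeded).
luck_by_group = sim.groupby(["risk", "success"])["luck"].mean().unstack("success")

print(outcomes)
print(luck_by_group)
```

In this model, roughly half of the high-risk people fail, and within each risk group the failures are on average less lucky than the successes. Both facts are invisible once the data is restricted to `success == 1`.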

## Exercise

1. Repeat the correlation analysis **only among failures** (`success == 0`).
   - What is the correlation between `risk` and `luck` there?