In [3]:
import pandas as pd 
df = pd.read_stata("lakisha_aer.dta")

In [6]:
import altair as alt

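# Years of experience for each group, split on the "race" column
black_years_exp = df[df["race"] == "b"]["yearsexp"].to_list()
white_years_exp = df[df["race"] == "w"]["yearsexp"].to_list()

# One column per group; pandas requires both lists to have the same length
exp = pd.DataFrame({"black": black_years_exp,
                    "white": white_years_exp})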

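# Overlay one step histogram per group. The fold reshapes the two columns
# into long form: a group label column and a value column.
alt.Chart(exp).transform_fold(
    ['black', 'white'],
    as_=['race', 'yearsexp']
).mark_area(
    opacity=0.3,
    interpolate='step'
).encode(
    alt.X('yearsexp:Q', bin=alt.Bin(maxbins=100), title='Years of experience'),
    alt.Y('count()', stack=None),
    alt.Color('race:N', title='Race')
)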

In [10]:
import numpy as np
np.mean(df[df["race"] == "b"]["yearsexp"]) - np.mean(df[df["race"] == "w"]["yearsexp"])

-0.026694045174538772

### Effect size

The "effect size" for two groups measures how far apart the means of the groups are. One common measure of effect size is Cohen's $d$, which is defined as

$d=\frac{\overline{x}_1 - \overline{x}_2}{s}$

where $\overline{x}_i$ is the mean of group $i$ and $s$ is the standard deviation of the dataset. Recall that the standard deviation is the square root of the variance.* 

Cohen's $d$ is one way to tell how far apart the means of the treatment and control groups are. In general, the bigger the effect size, the easier it is to tell that the difference between groups is unlikely to have been observed by chance, i.e., that the difference is "statistically significant" (more on those magic words to come).

*Note: some versions of Cohen's $d$ use a more complex formula for $s$; let's stick with the simple one for now.
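To make the formula concrete, here is a minimal sketch of the simple version of Cohen's $d$ on made-up numbers. The function name `cohens_d` and the toy lists are illustrative only, and the sketch assumes that $s$ is the sample standard deviation (dividing by $n - 1$) of all observations from both groups combined.

```python
import statistics

def cohens_d(group_1, group_2):
    """Cohen's d with s taken as the standard deviation of all
    observations from both groups combined (the simple version)."""
    combined = list(group_1) + list(group_2)
    # Assumption: sample standard deviation (divides by n - 1)
    s = statistics.stdev(combined)
    return (statistics.mean(group_1) - statistics.mean(group_2)) / s

# Made-up numbers, not taken from the dataset above
group_1 = [2, 4, 6, 8]
group_2 = [1, 3, 5, 7]

d = cohens_d(group_1, group_2)
print(d)
```

Swapping the order of the two groups flips the sign of $d$ but not its size, so it is worth stating which group is group 1 when reporting the result.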

1. What is Cohen's $d$ for years of experience between the white and black groups?
2. Why is it important to include $s$ in measuring effect size?

3. Simulate data for the two groups using different means and standard deviations and observe the faceted histograms. What happens to the histograms as Cohen's $d$ gets bigger or smaller?
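One possible starting point for the simulation in question 3 is sketched below. It assumes normally distributed groups, a fixed random seed, and $s$ computed as the standard deviation of the combined sample. The helper name `simulate_d` and the specific means and standard deviations are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def simulate_d(mean_1, mean_2, sd, n=1000):
    """Draw two normal samples of size n and return Cohen's d, with s
    taken as the standard deviation of the combined data."""
    group_1 = rng.normal(mean_1, sd, n)
    group_2 = rng.normal(mean_2, sd, n)
    s = np.std(np.concatenate([group_1, group_2]))
    return (group_1.mean() - group_2.mean()) / s

# Hypothetical settings: vary the gap between means, then the spread
d_small_gap = simulate_d(8.0, 8.5, sd=5.0)
d_big_gap = simulate_d(8.0, 12.0, sd=5.0)
d_big_gap_noisy = simulate_d(8.0, 12.0, sd=20.0)

print(d_small_gap, d_big_gap, d_big_gap_noisy)
```

Plotting each simulated pair with the same layered histogram used earlier should make it easier to see how the overlap between groups changes with $d$.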

In [13]:
! cp Bertrand* /Users/abramhandler/Desktop/