In [None]:
# === Environment Setup ===
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
from IPython.display import display, Markdown
from pathlib import Path

# --- Configuration ---
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({'font.size': 12, 'figure.figsize': (11, 7), 'figure.dpi': 130})

# --- Utility Functions ---
def note(msg): display(Markdown(f"<div class='alert alert-block alert-info'>📝 **Note:** {msg}</div>"))
def sec(title): print(f"\n{80*'='}\n| {title.upper()} |\n{80*'='}")
note("Environment initialized.")

# Appendix 1: Replication Exercise - Chetty et al. (2014)

---
### Introduction: The Geography of Intergenerational Mobility

This notebook provides a hands-on replication exercise for one of the most influential economics papers of the last decade: **"Where is the Land of Opportunity? The Geography of Intergenerational Mobility in the United States"** by Raj Chetty, Nathaniel Hendren, Patrick Kline, and Emmanuel Saez (2014).

The authors use a massive dataset of administrative tax records to measure intergenerational income mobility for 741 "commuting zones" (CZs) across the United States. Their key finding is that a child's chances of moving up the income ladder depend dramatically on the city or region where they grow up.

**Our Objective:** We will replicate a core result from the paper: the strong, negative correlation between a CZ's level of income inequality (as measured by the Gini coefficient) and its level of intergenerational mobility. This exercise will provide practical experience in data handling, regression analysis, and interpreting real-world economic results.

### 1. Data Acquisition

The authors have made their data publicly available through the [Equality of Opportunity Project](https://opportunityinsights.org/) (now Opportunity Insights). We will download the main dataset containing the CZ-level statistics.

**Key Variables of Interest:**
- `gini96`: The Gini coefficient in 1996, a measure of income inequality.
- `rank_slope`: The slope of the rank-rank regression of child income rank on parent income rank. A higher slope means lower relative mobility (a child's income rank is more closely tied to their parents' rank).
- `abs_mob_p25`: The absolute upward mobility for children with parents at the 25th percentile of the national income distribution. This is our primary measure of mobility.
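
To make the two mobility measures concrete, the following is a minimal sketch on synthetic data, not the authors' code. The income process, the persistence value of 0.4, and all variable names are assumptions chosen for illustration. It converts parent and child incomes to percentile ranks, fits the rank-rank line, and reads off the expected child rank for parents at the 25th percentile.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Synthetic log incomes (assumption): child income depends partly on parent income.
parent_log_inc = rng.normal(10.5, 0.8, n)
child_log_inc = 6.3 + 0.4 * parent_log_inc + rng.normal(0, 0.7, n)

def percentile_rank(x):
    """Map values to percentile ranks in (0, 100]; assumes no ties."""
    order = x.argsort().argsort()  # 0 for the lowest value, n - 1 for the highest
    return 100.0 * (order + 1) / len(x)

parent_rank = percentile_rank(parent_log_inc)
child_rank = percentile_rank(child_log_inc)

# OLS of child rank on parent rank; np.polyfit returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(parent_rank, child_rank, 1)

# Expected child rank for parents at the 25th percentile (absolute upward mobility).
abs_mob_p25 = intercept + 25 * slope

print(f"rank-rank slope: {slope:.3f}")
print(f"expected child rank at p25: {abs_mob_p25:.1f}")
```

In the real dataset these quantities are estimated separately within each CZ, which is what produces one `rank_slope` and one `abs_mob_p25` value per row.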

In [None]:
sec("Data Acquisition")

# Define local data path
data_dir = Path("data/chetty_2014")
data_file = data_dir / "cz_outcomes.csv"
data_url = 'https://opportunityinsights.org/wp-content/uploads/2018/10/cz_outcomes.csv'

# Ensure the data directory (and its parent) exists
data_dir.mkdir(parents=True, exist_ok=True)

if data_file.exists():
    note(f"Loading data from local file: {data_file}")
    df = pd.read_csv(data_file)
else:
    note(f"Local data not found. Downloading from {data_url}...")
    try:
        df = pd.read_csv(data_url)
        df.to_csv(data_file, index=False)
        note(f"Successfully downloaded and saved data to {data_file}. Shape: {df.shape}")
    except Exception as e:
        note(f"Failed to download data. Error: {e}")
        df = None

if df is not None:
    display(df[['czname', 'gini96', 'rank_slope', 'abs_mob_p25']].head())

### 2. Replicating the Core Result

We will now replicate the central finding that inequality is negatively correlated with mobility. We will do this by running a simple OLS regression of the mobility measure (`abs_mob_p25`) on the inequality measure (`gini96`). We will also create a binned scatter plot, a powerful visualization technique used throughout the paper.
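
The link between a regression slope and a correlation can be checked directly: once both variables are standardized to mean 0 and standard deviation 1, the OLS slope equals the Pearson correlation. The sketch below uses synthetic stand-ins for the inequality and mobility measures; the numbers are made up and only illustrate the identity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for a CZ-level inequality and mobility measure (not real data).
x = rng.normal(0.40, 0.08, 700)
y = 55 - 30 * x + rng.normal(0, 4, 700)

def zscore(v):
    """Standardize to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std()

slope_raw = np.polyfit(x, y, 1)[0]                  # slope in original units
slope_std = np.polyfit(zscore(x), zscore(y), 1)[0]  # slope after standardizing both
pearson_r = np.corrcoef(x, y)[0, 1]

print(f"raw slope:          {slope_raw:.2f}")
print(f"standardized slope: {slope_std:.3f}")
print(f"Pearson r:          {pearson_r:.3f}")
```

The raw slope depends on the units of each variable, while the standardized slope does not, which is why correlations are convenient for comparing predictors measured on different scales.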

In [None]:
sec("Regression Analysis: Mobility vs. Inequality")

# OLS of upward mobility on inequality, with heteroskedasticity-robust (HC1) standard errors
if df is not None:
    model = smf.ols('abs_mob_p25 ~ gini96', data=df).fit(cov_type='HC1')
    print(model.summary())
    coef, pval = model.params['gini96'], model.pvalues['gini96']
    note(f"The coefficient on `gini96` is {coef:.3f} (p = {pval:.3g}). A negative, statistically significant estimate matches the paper's result: higher inequality is associated with lower upward mobility.")

In [None]:
sec("Visualization: Binned Scatter Plot")

# Binned scatter plot: average mobility within quantile bins of the Gini coefficient
if df is not None:
    # Create 20 quantile bins (roughly equal numbers of CZs per bin) based on the Gini coefficient
    df['gini_bin'] = pd.qcut(df['gini96'], 20, labels=False, duplicates='drop')
    bin_means = df.groupby('gini_bin')[['abs_mob_p25', 'gini96']].mean()

    plt.figure(figsize=(12, 8))
    sns.scatterplot(x='gini96', y='abs_mob_p25', data=bin_means, s=100, label='CZ Bins')
    sns.regplot(x='gini96', y='abs_mob_p25', data=bin_means, scatter=False, color='red', label='Best-Fit Line')
    
    plt.title('Upward Mobility vs. Inequality Across the U.S.')
    plt.xlabel('Income Inequality (Gini Coefficient)')
    plt.ylabel('Absolute Upward Mobility (Parent at 25th Pctile)')
    plt.legend()
    plt.grid(True)
    plt.show()

### 3. Other Correlates of Mobility

While income inequality is a powerful predictor, the authors identified several other factors that are strongly correlated with a region's level of upward mobility. In their summary, they highlight five key factors:

1.  **Segregation:** Areas with higher levels of racial and economic segregation have lower mobility.
2.  **Income Inequality:** As we replicated, regions with a smaller middle class and a higher Gini coefficient have lower mobility.
3.  **School Quality:** Regions with higher-quality public schools (as measured by test scores and lower dropout rates) have higher mobility.
4.  **Social Capital:** Areas with higher levels of social capital—measured using factors like religious participation and civic engagement—have higher mobility.
5.  **Family Structure:** The strongest correlate among those the authors examine is the fraction of children living in single-parent households. Areas with fewer single parents have significantly higher upward mobility.
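
One simple way to compare correlates like these is to rank them by the absolute value of their correlation with the mobility measure. The sketch below does this on synthetic data. The column names, the effect sizes, and the helper `rank_correlates` are all hypothetical and do not correspond to columns in the authors' files.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 700

# Synthetic CZ-level covariates; names and magnitudes are illustrative only.
cz = pd.DataFrame({
    "segregation": rng.normal(size=n),
    "gini": rng.normal(size=n),
    "school_quality": rng.normal(size=n),
    "social_capital": rng.normal(size=n),
    "frac_single_parent": rng.normal(size=n),
})
cz["mobility"] = (
    -0.3 * cz["segregation"] - 0.4 * cz["gini"] + 0.4 * cz["school_quality"]
    + 0.4 * cz["social_capital"] - 0.7 * cz["frac_single_parent"]
    + rng.normal(scale=0.5, size=n)
)

def rank_correlates(df, outcome, covariates):
    """Pearson correlation of each covariate with the outcome, strongest first."""
    corr = df[covariates].corrwith(df[outcome])
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

covariates = [c for c in cz.columns if c != "mobility"]
ranked = rank_correlates(cz, "mobility", covariates)
print(ranked.round(2))
```

Because the synthetic outcome is built with the largest weight on `frac_single_parent`, that column ranks first here by construction. With real data the ordering is an empirical question.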

### 4. Discussion and Conclusion

This replication exercise confirms a key finding from Chetty et al. (2014): a strong negative relationship between income inequality and intergenerational mobility across the United States. The binned scatterplot is a particularly powerful tool for visualizing this robust relationship in a non-parametric way.

**Thinking Critically:**
- **Correlation vs. Causation:** It is crucial to remember that these findings are correlations. While inequality is a strong predictor of mobility, it may not be the direct causal factor. It could be that other factors, like segregation or family structure, cause both high inequality and low mobility. The authors' subsequent work delves deeper into the causal channels.
- **Policy Implications:** If you were a policymaker, how might these findings inform your decisions? Would you focus on reducing Gini coefficients directly, or would you target other factors like school quality or residential segregation? This paper sparked a significant debate on which policies are most effective at improving opportunities for disadvantaged children.
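
The correlation-versus-causation point can be illustrated with a small simulation. In the sketch below, a hypothetical confounder raises inequality and lowers mobility, while inequality has no direct effect on mobility at all. All variables and coefficients are made up for illustration. A regression of mobility on inequality alone still finds a strongly negative slope, which disappears once the confounder is included.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

# Hypothetical confounder that raises inequality and lowers mobility.
confounder = rng.normal(size=n)
inequality = 0.8 * confounder + rng.normal(scale=0.6, size=n)
# By construction, inequality has NO direct effect on mobility.
mobility = -1.0 * confounder + rng.normal(scale=0.6, size=n)

def ols(y, *regressors):
    """Return OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_naive = ols(mobility, inequality)[1]                   # omits the confounder
b_controlled = ols(mobility, inequality, confounder)[1]  # includes the confounder

print(f"slope on inequality, no controls:      {b_naive:.3f}")
print(f"slope on inequality, with confounder:  {b_controlled:.3f}")
```

In observational data the relevant confounders are rarely all observed, so adding controls narrows the set of alternative explanations without establishing causality.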