## Estimating the Effect of Having Children on Labor Supply Using a Sibling-Sex Composition Instrument

I follow [Angrist and Evans (1998)](http://piketty.pse.ens.fr/fichiers/enseig/ecoineg/articl/AngristEvans1998.pdf) and use 1980 census data to estimate the effect of having children on labor supply.

In particular, I exploit the fact that if the first two children in a family are of the same sex, the probability of having a third child is higher.
I use this exogenous variation in number of children to estimate the effect of having a third child on labor supply through an Instrumental Variables (IV) approach.
I estimate my results separately for all women, married women, and husbands.

The empirical specification is as follows:

First Stage:
$$
\text{ThreeOrMoreChildren}_i = \alpha_0 + \alpha_1 \cdot \text{SameSex}_i + \mathbf{X}_i' \beta + u_i
$$

Second Stage:
$$
\text{Y}_i = \gamma_0 + \gamma_1 \cdot \widehat{\text{ThreeOrMoreChildren}}_i + \mathbf{X}_i' \delta + \epsilon_i
$$

Where:
- $\text{ThreeOrMoreChildren}_i$ is a binary variable indicating whether family $i$ has three or more children.
- $\text{Y}_i$ is the outcome for a parent in family $i$. This can be an indicator for labor force participation, hours worked per week, weeks worked per year, or annual labor income.
- $\widehat{\text{ThreeOrMoreChildren}}_i$ is the predicted value from the first stage.
- $\text{SameSex}_i$ is the instrument for having three or more children.
- $\mathbf{X}_i$ is a vector of control variables: the mother's age, her age at the birth of her oldest child, race/ethnicity indicators, and indicators for the sex of the first and second child.

The coefficient of interest is $\gamma_1$, which captures the causal effect of having three or more children on labor supply.
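To make the two stages concrete, here is a minimal sketch on simulated data (not the census sample). The data-generating process, the simulated effect of -5, and all variable names are assumptions chosen purely for illustration. The sketch runs both OLS stages by hand with NumPy and compares the result with naive OLS and with the simple Wald ratio. Standard errors from a hand-rolled second stage are not valid, so only point estimates are reported.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulated data (illustrative assumptions, not the census sample)
same_sex = rng.integers(0, 2, size=n).astype(float)   # instrument
unobserved = rng.normal(size=n)                       # unobserved confounder
mom_age = rng.integers(21, 36, size=n).astype(float)  # one observed control

# Endogenous regressor: depends on the instrument and on the confounder
latent = 0.3 * same_sex + 0.8 * unobserved + rng.normal(size=n)
three_plus = (latent > 0.5).astype(float)

# Outcome: the simulated causal effect of a third child is -5
true_effect = -5.0
weeks = (20 + true_effect * three_plus + 0.2 * mom_age
         + 2 * unobserved + rng.normal(size=n))


def ols(X, y):
    """Return OLS coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


controls = np.column_stack([np.ones(n), mom_age])

# First stage: regress the endogenous regressor on the instrument and controls
first_stage_X = np.column_stack([same_sex, controls])
alpha = ols(first_stage_X, three_plus)
three_plus_hat = first_stage_X @ alpha

# Second stage: regress the outcome on the fitted values and controls
gamma = ols(np.column_stack([three_plus_hat, controls]), weeks)

# Naive OLS ignores the endogeneity and is biased by the confounder
ols_naive = ols(np.column_stack([three_plus, controls]), weeks)

# Wald ratio: reduced-form difference divided by first-stage difference
wald = ((weeks[same_sex == 1].mean() - weeks[same_sex == 0].mean())
        / (three_plus[same_sex == 1].mean() - three_plus[same_sex == 0].mean()))

print(f"First-stage coefficient on same_sex: {alpha[0]:.3f}")
print(f"2SLS estimate: {gamma[0]:.2f}")
print(f"Wald estimate: {wald:.2f}")
print(f"Naive OLS estimate: {ols_naive[0]:.2f}")
```

In this simulation the confounder raises both fertility and weeks worked, so naive OLS understates the magnitude of the negative effect, while the 2SLS and Wald estimates should both land near the simulated value.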

The data are publicly available and can be downloaded from the [Harvard Dataverse](https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/11288) (the filename is `m_d_806.tab`).

### Import Data and Packages

In [1]:
# Import packages
import pandas as pd
import numpy as np
from utils import run_iv_regressions, df_to_table  # Project helpers used below

# Import 1980 IPUMS USA census data
# (this path presumably points to a local CSV copy of `m_d_806.tab`)
df = pd.read_csv("data/ipums_1980_angrist.csv")

### Data Processing

1. **Variable Renaming**: Rename original variables for clarity, ensuring consistency across the dataset (e.g., `weeksm` becomes `mom_weeks_worked`).
   
2. **Missing Value Handling**: Replace `0` values with `NaN` for specific variables where 0 is not meaningful (e.g., `mom_age_married`, `mom_quarter_married`).

3. **Quarter Adjustments**: Adjust quarter variables (`mom_quarter_married`, `mom_quarter_birth`, `dad_quarter_birth`, `quarter_birth_oldest_child`) by subtracting 1 so that quarters are 0-indexed.

4. **Computed Variables**:
   - **Marriage Timing**: Calculate the year and quarter of marriage from the mother’s birth year, her age at marriage, and her quarters of birth and marriage.
   - **Age at First Birth**: Compute the mother’s and father’s ages at the birth of their first child using the oldest child’s age in quarters.
   - **Unmarried Birth Indicator**: Create a binary variable indicating if the birth of the first child occurs before marriage.

5. **Race/Ethnicity Indicators**: Create binary indicators for race/ethnicity categories, with `mom_other_race` defined as neither Black, Hispanic, nor White.

6. **Father's Birth Year**: Compute the father's birth year based on his age and adjust for the quarter of birth.

7. **Child Gender Indicators**: Create binary variables to indicate the sex of the first and second born, as well as whether both children are the same sex.

8. **Labor Force Participation**: Create binary indicators for whether the mother or father works based on weeks worked.

9. **CPI Adjustment**: Convert the mother's and father's labor income to 1995 dollars by multiplying by a CPI factor of 1.85.

10. **Constant Term**: Add a constant column to the dataset for regression purposes.
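
A few of the steps above (missing-value handling, quarter re-indexing, the same-sex indicator, and the CPI adjustment) can be illustrated on a toy DataFrame. The three rows below are made up for illustration and are not taken from the census file. The equality test for `same_sex` is a shortcut that assumes the sex variables only take the values 0 and 1.

```python
import numpy as np
import pandas as pd

# Toy rows (made up for illustration, not census data)
toy = pd.DataFrame({
    "mom_quarter_married": [0, 1, 4],   # 0 = not applicable, 1-4 = quarter
    "first_born_sex": [0, 1, 1],        # 0 = boy, 1 = girl
    "second_born_sex": [0, 1, 0],
    "mom_labor_income": [0.0, 1000.0, 2500.0],  # 1980 dollars
})

# Step 2 and 3: treat 0 as missing, then shift quarters from 1-4 to 0-3
toy["mom_quarter_married"] = toy["mom_quarter_married"].replace(0, np.nan) - 1

# Step 7: first two children share the same sex (assumes 0/1 coding)
toy["same_sex"] = (toy["first_born_sex"] == toy["second_born_sex"]).astype(int)

# Step 9: convert 1980 dollars to 1995 dollars
cpi_adjustment = 1.85
toy["mom_labor_income"] *= cpi_adjustment

print(toy)
```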


In [2]:
# Rename variables and select relevant columns
column_names = {
    "weeksm": "mom_weeks_worked",
    "hoursm": "mom_hours_worked",
    "income1m": "mom_labor_income",
    "income2m": "mom_self_employment_income",
    "weeksd": "dad_weeks_worked",
    "hoursd": "dad_hours_worked",
    "income1d": "dad_labor_income",
    "income2d": "dad_self_employment_income",
    "ageqk": "age_oldest_child_quarters",
    "ageq2nd": "age_second_child_quarters",
    "ageq3rd": "age_third_child_quarters",
    "ageq4th": "age_fourth_child_quarters",
    "ageq5th": "age_fifth_child_quarters",
    "agem": "mom_age",
    "yobm": "mom_year_birth",
    "qtrbthm": "mom_quarter_birth",
    "qtrbkid": "quarter_birth_oldest_child",
    "racem": "mom_race",
    "sexk": "first_born_sex",
    "sex2nd": "second_born_sex",
    "kidcount": "num_children",
    "agemar": "mom_age_married",
    "qtrmar": "mom_quarter_married",
    "aged": "dad_age",
    "qtrbthd": "dad_quarter_birth",
    "timesmar": "number_of_times_married",
    "marital": "marital_status",
    "faminc": "family_income",
}

df = df.rename(columns=column_names)
df = df[list(column_names.values())]

df["mom_age_married"] = df["mom_age_married"].replace(0, np.nan) # Replace 0 with NaN

# Adjust 'mom_quarter_married' variable to remove the 1-indexing and replace 0 with NaN
df["mom_quarter_married"] = df["mom_quarter_married"].replace(0, np.nan) - 1

# Adjust 'quarter_birth' variables to remove the 1-indexing
df["mom_quarter_birth"] -= 1
df["dad_quarter_birth"] -= 1
# Approximate the mother's birth year from her age in 1980 (overwrites the original `yobm` values)
df["mom_year_birth"] = 1980 - df["mom_age"]

# Compute 'year_married' based on timing of marriage and birth
df["year_married"] = df["mom_year_birth"] + df["mom_age_married"] + (df["mom_quarter_birth"] > df["mom_quarter_married"]).astype(int)
df["year_quarter_married"] = df["year_married"] + (df["mom_quarter_married"] / 4)
df["mom_age_first_birth"] = df["mom_age"] - (df["age_oldest_child_quarters"] / 4)
df["year_oldest_child_birth"] = df["mom_year_birth"] + df["mom_age_first_birth"]
df["quarter_birth_oldest_child"] -= 1  # Adjust for indexing
df["year_quarter_birth"] = df["year_oldest_child_birth"] + (df["quarter_birth_oldest_child"] / 4)
df["unmarried_birth"] = (df["year_quarter_married"] > df["year_quarter_birth"]).astype(int)

# Race indicators
df["mom_black"] = (df["mom_race"] == 2).astype(int)
df["mom_hispanic"] = (df["mom_race"] == 12).astype(int)
df["mom_white"] = (df["mom_race"] == 1).astype(int)
df["mom_other_race"] = 1 - df["mom_black"] - df["mom_hispanic"] - df["mom_white"]

# Compute father's year of birth (two-digit, e.g. 45 for 1945) and age at first birth
df["dad_year_birth"] = 79 - df["dad_age"]
df.loc[df["dad_quarter_birth"] == 0, "dad_year_birth"] = 80 - df["dad_age"]
df["dad_age_quarters"] = (4 * (80 - df["dad_year_birth"]) - df["dad_quarter_birth"])
df["dad_age_first_birth"] = (df["dad_age_quarters"] - df["age_oldest_child_quarters"]) // 4

# Child sex indicators (sex variables are coded 0 = boy, 1 = girl)
df["first_born_boy"] = (df["first_born_sex"] == 0).astype(int)
df["second_born_boy"] = (df["second_born_sex"] == 0).astype(int)
df["both_boys"] = ((df["first_born_sex"] == 0) & (df["second_born_sex"] == 0)).astype(int)
df["both_girls"] = ((df["first_born_sex"] == 1) & (df["second_born_sex"] == 1)).astype(int)
df["same_sex"] = (df["both_boys"] | df["both_girls"]).astype(int)
df["more_than_two_children"] = (df["num_children"] > 2).astype(int)

# In labor force indicators
df["mom_worked_indicator"] = (df["mom_weeks_worked"] > 0).astype(int)
df["dad_worked_indicator"] = (df["dad_weeks_worked"] > 0).astype(int)

cpi_adjustment = 1.85  # CPI adjustment factor from 1980 to 1995
df["mom_labor_income"] *= cpi_adjustment
df["dad_labor_income"] *= cpi_adjustment

df["constant"] = 1

### Restricting Samples

1. **Main Sample**: Filter the dataset to include only mothers aged 21 to 35 with at least two children, whose second child is older than 1 year (age in quarters > 4), and who gave birth to their first child at age 15 or older. The age restriction makes it likely that all of a mother's children still live in her household, so the sample of mothers observed with all their children is not highly selected.

2. **Married Sample**: Further filter the dataset to include only married couples where the father’s age is available, the parents have been married once, have no children born out of wedlock, and both parents were at least 15 years old at the birth of their first child.


In [3]:
df = df[(
    (df["mom_age"] >= 21)
    & (df["mom_age"] <= 35)
    & (df["num_children"] >= 2)
    & (df["age_second_child_quarters"] > 4)
    & (df["mom_age_first_birth"] >= 15)
)]

df_married = df[(
    df["dad_age"].notna()
    & (df["number_of_times_married"] == 1)
    & (df["marital_status"] == 0)
    & (df["unmarried_birth"] == 0)
    & (df["dad_age_first_birth"] >= 15)
    & (df["mom_age_first_birth"] >= 15)
)]

### Define Variables for IV

In [4]:
mom_outcomes = [
    "mom_weeks_worked",
    "mom_hours_worked",
    "mom_labor_income",
    "mom_worked_indicator",
]
dad_outcomes = [
    "dad_weeks_worked",
    "dad_hours_worked",
    "dad_labor_income",
    "dad_worked_indicator",
]
covariates = [
    "mom_age",
    "mom_age_first_birth",
    "mom_black",
    "mom_hispanic",
    "mom_other_race",
    "first_born_boy",
    "second_born_boy",
    "constant",
]

instrument = "same_sex"
instrumented = "more_than_two_children"
outcome_labels = ["Weeks Worked per Year", "Hours Worked per Week", "Labor Income", "Worked for Pay"]

### Run 2SLS Regressions and Display Results

In [5]:
# Run regressions for all women
results_all_women = run_iv_regressions(df, mom_outcomes, covariates, instrumented, instrument)
# Run regressions for married women
results_married_women = run_iv_regressions(df_married, mom_outcomes, covariates, instrumented, instrument)
# Run regressions for husbands (fathers)
results_husbands = run_iv_regressions(df_married, dad_outcomes, covariates, instrumented, instrument)

summary_table = pd.DataFrame({
    'Outcome': outcome_labels,
    'All Women': results_all_women,
    'Married Women': results_married_women,
    'Husbands': results_husbands
})

df_to_table(summary_table) # Display table

| Outcome | All Women | Married Women | Husbands |
|---|---|---|---|
| Weeks Worked per Year | -5.50*** (1.12) | -4.66*** (1.26) | 0.51 (0.61) |
| Hours Worked per Week | -4.56*** (0.95) | -4.15*** (1.05) | 0.58 (0.72) |
| Labor Income | -1711.45*** (467.77) | -1184.04* (504.42) | -629.94 (1212.21) |
| Worked for Pay | -0.12*** (0.03) | -0.09** (0.03) | 0.01 (0.01) |
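
The helpers `run_iv_regressions` and `df_to_table` are imported from `utils`, which is not shown in this notebook. The sketch below is a hypothetical stand-in, not the author's implementation, for what such a helper might do. It assumes a just-identified model (one instrument, one endogenous regressor), a constant already included in the covariates, conventional homoskedastic standard errors, and star thresholds of 0.05, 0.01, and 0.001 (chosen only because they are consistent with the table above). The names `iv_2sls`, `format_estimate`, and `run_iv_regressions_sketch` are made up for illustration.

```python
import math

import numpy as np
import pandas as pd


def iv_2sls(df, outcome, covariates, instrumented, instrument):
    """Just-identified IV; returns (coefficient, std. error) for `instrumented`."""
    data = df[[outcome, instrumented, instrument] + covariates].dropna()
    y = data[outcome].to_numpy(dtype=float)
    X = data[[instrumented] + covariates].to_numpy(dtype=float)  # regressors
    Z = data[[instrument] + covariates].to_numpy(dtype=float)    # instruments
    ZX_inv = np.linalg.inv(Z.T @ X)
    beta = ZX_inv @ (Z.T @ y)              # IV estimator (Z'X)^{-1} Z'y
    resid = y - X @ beta                   # residuals use the actual regressor
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * ZX_inv @ (Z.T @ Z) @ ZX_inv.T  # homoskedastic covariance (assumption)
    return beta[0], math.sqrt(cov[0, 0])


def format_estimate(coef, se):
    """Format as 'coef*** (se)' using a two-sided normal p-value."""
    p = math.erfc(abs(coef / se) / math.sqrt(2))
    # Star thresholds are an assumption, not taken from the original helper
    stars = "***" if p < 0.001 else "**" if p < 0.01 else "*" if p < 0.05 else ""
    return f"{coef:.2f}{stars} ({se:.2f})"


def run_iv_regressions_sketch(df, outcomes, covariates, instrumented, instrument):
    """Return one formatted estimate per outcome."""
    return [
        format_estimate(*iv_2sls(df, y, covariates, instrumented, instrument))
        for y in outcomes
    ]


# Usage on simulated data (illustrative only; simulated effect is -5)
rng = np.random.default_rng(1)
n = 100_000
z = rng.integers(0, 2, size=n)
u = rng.normal(size=n)
d = ((0.3 * z + u + rng.normal(size=n)) > 0.5).astype(int)
y = 20 - 5 * d + 3 * u + rng.normal(size=n)
sim = pd.DataFrame({"y": y, "d": d, "z": z, "constant": 1})

print(run_iv_regressions_sketch(sim, ["y"], ["constant"], "d", "z"))
```

The residuals are computed with the actual regressor rather than its first-stage fitted values, which is what makes these standard errors differ from those of a manually run second-stage OLS.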


### Results

These results suggest that having three or more children, rather than two, significantly reduces labor supply for women but has no statistically significant effect for husbands. For the sample of all women, weeks worked per year fall by 5.50, hours worked per week fall by 4.56, the probability of working for pay falls by 12 percentage points, and annual labor income declines by \$1,711.45 (in 1995 dollars). Married women show somewhat smaller reductions: 4.66 weeks, 4.15 hours, 9 percentage points, and \$1,184.04 in labor income. For husbands, none of the estimates is statistically distinguishable from zero.