# Impact of 401(k) on Financial Assets

## Data explanation
The data come from the 1991 Survey of Income and Program Participation (SIPP).

- **net_tfa** — *Net Total Financial Assets*.
  Calculated as the sum of all liquid and interest-earning assets (IRA balances, 401(k) balances, checking accounts, U.S. savings bonds, other interest-earning accounts, stocks, mutual funds, etc.) **minus** non-mortgage debts.

- **e401** — *401(k) Eligibility Indicator*.
  Equals 1 if the individual's employer offers a 401(k) plan; otherwise 0.

- **p401** — *401(k) Participation Indicator*.
  Equals 1 if the individual participates in a 401(k) plan; otherwise 0.

- **age** — *Age*.
  Age of the individual in years.

- **inc** — *Annual Income*.
  Annual income of the individual, measured in U.S. dollars for the year 1990.

- **educ** — *Years of Education*.
  Number of completed years of formal education.

- **fsize** — *Family Size*.
  Total number of persons living in the household.

- **marr** — *Marital Status*.
  Equals 1 if the individual is married; otherwise 0.

- **twoearn** — *Two-Earner Household*.
  Equals 1 if there are two wage earners in the household; otherwise 0.

- **db** — *Defined-Benefit Pension Plan*.
  Equals 1 if the individual is covered by a defined-benefit pension plan; otherwise 0.

- **pira** — *IRA Participation*.
  Equals 1 if the individual contributes to an Individual Retirement Account (IRA); otherwise 0.

- **hown** — *Home Ownership*.
  Equals 1 if the household owns its home; 0 if renting.

We download it with the `fetch_401K` function from `doubleml.datasets`.

This dataset has a limitation: the confounders were measured in the same year as the treatment and the outcome,
and they are treated as if they were demographic factors that do not change over time. Inference may therefore still be biased.

In [1]:
from doubleml.datasets import fetch_401K
df = fetch_401K(return_type='DataFrame')


In [2]:
# Drop 'tw' and 'nifa': they are not described above and are not used in this analysis
df = df.drop(columns=['tw', 'nifa'])
# Drop 401(k) eligibility: it acts as an instrument for participation, not as a confounder
df = df.drop(columns=['e401'])

df.columns

Index(['net_tfa', 'age', 'inc', 'fsize', 'educ', 'db', 'marr', 'twoearn',
       'p401', 'pira', 'hown'],
      dtype='object')

In [3]:
from causalis.data_contracts import CausalData
causal_data = CausalData(df=df, treatment='p401',
                         outcome='net_tfa',
                         confounders=['age', 'inc', 'fsize', 'educ', 'db', 'marr', 'twoearn', 'pira', 'hown'])

## EDA

In [5]:
causal_data

CausalData(df=(9915, 11), treatment='p401', outcome='net_tfa', confounders=['age', 'inc', 'fsize', 'educ', 'db', 'marr', 'twoearn', 'pira', 'hown'])

In [None]:
# 1) Outcome statistics by treatment group
from causalis.statistics.functions import outcome_stats
outcome_stats(causal_data)

Participants' average net financial assets are ≈28k higher, but this gap cannot be attributed
causally to participation alone. The classes are imbalanced: only 26% of individuals are treated.
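
The raw gap quoted above is a plain difference in group means. A minimal sketch of that calculation, using toy numbers that are purely illustrative (they are not taken from the SIPP data) and a hypothetical helper name `naive_gap`:

```python
from statistics import mean


def naive_gap(outcomes, treated):
    """Return (mean outcome of treated - mean outcome of control, treated share)."""
    y1 = [y for y, d in zip(outcomes, treated) if d == 1]
    y0 = [y for y, d in zip(outcomes, treated) if d == 0]
    return mean(y1) - mean(y0), len(y1) / len(treated)


# Toy values for illustration only, NOT the survey data
toy_net_tfa = [50_000, 30_000, 10_000, 5_000, 0, -5_000, 20_000, 10_000]
toy_p401 = [1, 1, 0, 0, 0, 0, 0, 0]

gap, share = naive_gap(toy_net_tfa, toy_p401)
print(f"naive gap: {gap:.0f}, treated share: {share:.0%}")
```

This number mixes the effect of participation with any differences in confounders between the groups, which is why it is not read as a causal effect.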

In [None]:
eda.outcome_hist()

In [None]:
eda.outcome_boxplot()

The outcome has a long right tail.

In [None]:
# Shows means of confounders for control/treated groups, absolute differences, and SMD values
confounders_balance_df = eda.confounders_means()
display(confounders_balance_df)

The treatment and control groups are unbalanced on all confounders except age and fsize. We nonetheless
keep age and fsize in the model to gain efficiency.
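
For reference, a standardized mean difference (SMD) can be sketched as below. This uses one common convention (difference in means divided by the square root of the average of the two group variances); whether causalis uses exactly this formula is an assumption, and `smd` is a hypothetical helper name. A frequent rule of thumb treats |SMD| above 0.1 as a sign of imbalance.

```python
from math import sqrt
from statistics import mean, variance


def smd(x_treated, x_control):
    """Standardized mean difference with a pooled (averaged-variance) standard deviation."""
    pooled_sd = sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


# Toy values for illustration only
toy_smd = smd([2, 4, 6], [1, 3, 5])
print(round(toy_smd, 3))
```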

In [None]:
# Propensity model fit
ps_model = eda.fit_propensity()

# ROC AUC - shows how predictable treatment is from confounders
roc_auc_score = ps_model.roc_auc
print("ROC AUC from PropensityModel:", round(roc_auc_score, 4))

In [None]:
# Positivity check - assess overlap between treatment groups
positivity_result = ps_model.positivity_check()
print("Positivity check from PropensityModel:", positivity_result)

In [None]:
# SHAP values - feature importance for treatment assignment from confounders
shap_values_df = ps_model.shap
display(shap_values_df)

In [None]:
# Propensity score overlap graph
ps_model.plot_m_overlap()

In [None]:
# Outcome model fit
outcome_model = eda.outcome_fit()

# RMSE and MAE of regression model
print(outcome_model.scores)

In [None]:
# 2) SHAP values - feature importance for outcome prediction from confounders
shap_outcome_df = outcome_model.shap
display(shap_outcome_df)

## Inference

In [None]:
from causalis.scenarios.unconfoundedness.ate import dml_ate

# Estimate Average Treatment Effect (ATE)
ate_result = dml_ate(causal_data, n_folds=4, alpha=0.05)

In [None]:
print(ate_result.get('coefficient'))
print(ate_result.get('p_value'))
print(ate_result.get('confidence_interval'))

The Average Treatment Effect is statistically significant: 11385 dollars, with a 95% confidence interval of (8674, 14096).
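
The reported interval is symmetric around the point estimate, which is consistent with a normal-approximation (Wald) interval. Under that assumption (it is not confirmed by the library output shown here), the implied standard error and z-statistic can be backed out from the three reported numbers:

```python
from statistics import NormalDist

# Numbers reported in the text above
estimate = 11385.0
ci_low, ci_high = 8674.0, 14096.0
alpha = 0.05

# Assumption: CI = estimate +/- z_crit * std_error
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
half_width = (ci_high - ci_low) / 2
std_error = half_width / z_crit
z_stat = estimate / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"implied SE: {std_error:.1f}, z: {z_stat:.2f}, two-sided p: {p_value:.2g}")
```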

## Refutation

### Overlap validation

In [None]:
from causalis.scenarios.unconfoundedness.refutation import *
rep = run_overlap_diagnostics(res=ate_result)
rep["summary"]

We find no evidence of a violation of the overlap (positivity) assumption.

### Score validation

In [None]:
from causalis.scenarios.unconfoundedness.refutation.score.score_validation import run_score_diagnostics
rep_score = run_score_diagnostics(res=ate_result)
rep_score["summary"]

psi_p99_over_med and psi_kurtosis are flagged RED, which is explained by the long right tail of the outcome. Otherwise, we find no evidence of anomalous score behavior.
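
To see why a heavy tail trips these two flags, here is a rough sketch. The metric names suggest the ratio of the 99th percentile to the median of the absolute score values, and the kurtosis of the scores; this reading is an assumption rather than the documented causalis definition, and the helper names and toy values are hypothetical.

```python
from statistics import mean, median, quantiles


def p99_over_median(values):
    """Ratio of the 99th percentile to the median of absolute values."""
    abs_vals = [abs(v) for v in values]
    p99 = quantiles(abs_vals, n=100, method="inclusive")[98]
    return p99 / median(abs_vals)


def kurtosis(values):
    """Non-excess kurtosis: fourth central moment over squared variance (3 for a normal)."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m4 = mean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2


# Toy score values for illustration only
light_tail = list(range(-50, 51))
heavy_tail = light_tail + [5000, -4000, 6000]  # a few extreme observations

for name, vals in [("light", light_tail), ("heavy", heavy_tail)]:
    print(name, round(p99_over_median(vals), 1), round(kurtosis(vals), 1))
```

A handful of extreme values is enough to inflate both statistics, even when the bulk of the scores is well behaved.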

### SUTVA

In [None]:
print_sutva_questions()

1.) Yes\
2.) Yes\
3.) No. The design is problematic\
4.) Yes\
In conclusion, the confounders were not measured before treatment, so treatment may have affected them.

### Unconfoundedness

In [None]:
from causalis.scenarios.unconfoundedness.refutation.uncofoundedness.uncofoundedness_validation import run_uncofoundedness_diagnostics

rep_uc = run_uncofoundedness_diagnostics(res=ate_result)
rep_uc['summary']

w_tail_ratio_treated and ess_treated_ratio are flagged RED. We consider this acceptable: these tests are unstable in small samples.
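
As background on the effective-sample-size flag, a minimal sketch of Kish's effective sample size for inverse-propensity weights of treated units is given below. That ess_treated_ratio is this quantity divided by the number of treated units is an assumption, and the helper names and propensity values are hypothetical.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def ipw_weights_treated(propensities):
    """Inverse-propensity weights 1/m(x) for treated units."""
    return [1.0 / m for m in propensities]


# Toy propensity scores for four treated units, for illustration only
balanced = ipw_weights_treated([0.5, 0.5, 0.5, 0.5])
one_extreme = ipw_weights_treated([0.5, 0.5, 0.5, 0.02])

ess_balanced = effective_sample_size(balanced)
ess_ratio_extreme = effective_sample_size(one_extreme) / len(one_extreme)
print(ess_balanced, round(ess_ratio_extreme, 3))
```

A single treated unit with a very small propensity score dominates the weights and pulls the ratio far below 1.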

In [None]:
from causalis.scenarios.unconfoundedness.refutation.uncofoundedness.uncofoundedness_validation import (
    sensitivity_analysis, sensitivity_benchmark
)

sensitivity_analysis(ate_result, cf_y=0.01, cf_d=0.01, rho=1.0, level=0.95, use_signed_rr=True)

Even with an unobserved confounder of this strength, the CI bounds stay above 0.

In [None]:
sensitivity_benchmark(ate_result, benchmarking_set=['inc'])

Even when the unobserved confounder is as strong as income ('inc'), the CI bounds of our estimate stay above 0.

## Conclusion

The design has a problem: the confounders were not measured before treatment, so treatment may have affected them.
However, the estimate is robust, which suggests that in real life participation in a 401(k) plan increases net financial assets. Keep in mind that the true CI bounds may differ from our estimate.