# Reproducibility Checklist

This notebook installs dependencies and re-runs the full ESG pipeline using the provided scripts. Use it to regenerate data, analysis outputs, and figures in a clean environment.


## Prerequisites
- Python 3.8+ and network access for Kaggle/Yahoo/FRED downloads.
- Kaggle API credentials in `~/.kaggle/kaggle.json`, **or** both `KAGGLE_USERNAME` and `KAGGLE_KEY` set as environment variables.
- Optional: `FRED_API_KEY` in `.env` for automated retrieval of the risk-free rate (a manual fallback is available).
- Run from the `notebooks/` directory so relative paths to `../scripts` and `../data` work.
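The prerequisites above can also be checked from Python before anything is downloaded. This is a minimal sketch under stated assumptions:

- `credential_status` is a hypothetical helper, not part of the project's scripts.
- It inspects only the kernel's own environment and does not parse `.env`. The download script appears to load `.env` itself, judging by its `[INFO] Loaded environment variables` log line.

```python
import os
from pathlib import Path


def credential_status(env=None, home=None):
    """Report which credential sources are visible to this kernel (hypothetical helper).

    Kaggle: ~/.kaggle/kaggle.json, or both KAGGLE_USERNAME and KAGGLE_KEY.
    FRED: FRED_API_KEY (optional). This function does not parse .env files.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    kaggle_file = (home / ".kaggle" / "kaggle.json").is_file()
    kaggle_env = bool(env.get("KAGGLE_USERNAME")) and bool(env.get("KAGGLE_KEY"))
    return {
        "kaggle_json": kaggle_file,
        "kaggle_env_vars": kaggle_env,
        "kaggle_ready": kaggle_file or kaggle_env,
        "fred_api_key": bool(env.get("FRED_API_KEY")),
    }


status = credential_status()
print(status)
if not status["kaggle_ready"]:
    print("No Kaggle credentials visible; the download step is likely to fail.")
if not status["fred_api_key"]:
    print("FRED_API_KEY not set in this kernel; the script may still read it from .env.")
```

Passing `env` and `home` explicitly makes the check easy to test without touching the real home directory.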


In [17]:
# Install project requirements (idempotent)
!python3 -m pip install -r ../requirements.txt




In [18]:
# Optional: verify Kaggle credentials are available
!ls -l ~/.kaggle/kaggle.json || echo "kaggle.json not found; ensure env vars are set"


ls: /Users/ayush/.kaggle/kaggle.json: No such file or directory
kaggle.json not found; ensure env vars are set


## Run pipeline scripts
Order: download → process → feature engineering → analysis → figures.
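The five steps can also be driven from Python instead of `!python` shell escapes. The sketch below runs them in the listed order and stops at the first failure; `run_pipeline` and its `runner` parameter are hypothetical. Using `sys.executable` runs each script under the same interpreter as the notebook kernel, which `!python` does not guarantee when several Python installations are on the `PATH`.

```python
import subprocess
import sys
from pathlib import Path

# Script order as listed in this notebook: download -> process -> features -> analysis -> figures.
PIPELINE = [
    "download_data.py",
    "process_data.py",
    "run_feature_engineering.py",
    "run_analysis.py",
    "create_diagnostic_plots.py",
]


def run_pipeline(scripts_dir="../scripts", scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order and stop at the first non-zero exit code.

    Returns the names of the scripts that finished successfully.
    `runner` defaults to subprocess.run and can be swapped out for testing.
    """
    completed = []
    for name in scripts:
        cmd = [sys.executable, str(Path(scripts_dir) / name)]
        result = runner(cmd)
        if result.returncode != 0:
            print(f"[FAILED] {name} exited with code {result.returncode}; stopping.")
            break
        completed.append(name)
    return completed
```

Calling `run_pipeline()` from the `notebooks/` directory would re-run everything. Stopping early avoids, for example, running the analysis on a stale `analysis_dataset.csv` after a failed processing step.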


In [19]:
# 1) Download raw data (Kaggle, Yahoo Finance, FRED)
!python ../scripts/download_data.py


[INFO] Loaded environment variables from /Users/ayush/Desktop/DATA 512/esg-stock-performace-impact/notebooks/../.env

ESG STOCK PERFORMANCE ANALYSIS - DATA DOWNLOAD


### STEP 1/4: Kaggle Dataset ###
Downloading S&P 500 ESG and Stocks Data from Kaggle

[INFO] Using Kaggle credentials from environment variables

📥 Downloading dataset: rikinzala/s-and-p-500-esg-and-stocks-data-2023-24
📁 Saving to: data/raw

Dataset URL: https://www.kaggle.com/datasets/rikinzala/s-and-p-500-esg-and-stocks-data-2023-24
License(s): GPL-3.0
Downloading s-and-p-500-esg-and-stocks-data-2023-24.zip to data/raw


[OK] Found: sp500_esg_data.csv (0.05 MB)
[OK] Found: sp500_price_data.csv (3.03 MB)

[OK] Kaggle dataset downloaded successfully!


### STEP 2/4: FRED Treasury Rate ###
Downloading U.S. 3-Month Treasury Rate from FRED

[FETCHING] Using FRED API to fetch data...
[DATE] Date range: 2023-09-01 to 2024-08-31

[OK] Data saved to: data/raw/DGS3MO.csv
[INFO] Records downloaded: 261
[STATS] Date range: 20
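FRED's DGS3MO series is an annualized yield quoted in percent, but the processing step needs a daily rate to compute excess returns. The sketch below shows one way to convert it and align it to stock dates. It rests on several assumptions:

- The saved CSV has a `DATE` column and a `DGS3MO` column.
- The annualization convention is 252 trading days.
- The rate is de-annualized geometrically. The scripts' actual convention is not shown here, and `rate / 100 / 252` is a common simpler alternative.

FRED CSV exports have historically marked missing days with `.`, and the master dataset's dates are timezone-aware UTC (`+00:00` in the log). The helpers handle both cases. `daily_risk_free` and `excess_returns` are hypothetical names.

```python
import pandas as pd

TRADING_DAYS = 252  # assumed annualization convention


def daily_risk_free(fred, rate_col="DGS3MO", date_col="DATE"):
    """Convert an annualized percent yield (FRED DGS3MO) to a daily decimal rate.

    Non-numeric entries (e.g. '.') become NaN. Uses geometric de-annualization;
    rate / 100 / TRADING_DAYS is a common simpler alternative.
    """
    annual = pd.to_numeric(fred[rate_col], errors="coerce") / 100.0
    daily = (1.0 + annual) ** (1.0 / TRADING_DAYS) - 1.0
    index = pd.DatetimeIndex(pd.to_datetime(fred[date_col]))
    return pd.Series(daily.to_numpy(), index=index, name="rf_daily").sort_index()


def excess_returns(returns, rf_daily):
    """Subtract the latest available daily risk-free rate from each stock return.

    `returns` must have a DatetimeIndex. Gaps in the rate series (holidays,
    missing values) are forward-filled.
    """
    rf = rf_daily.ffill()
    if returns.index.tz is not None and rf.index.tz is None:
        rf = rf.tz_localize(returns.index.tz)  # match tz-aware price dates
    rf = rf.reindex(returns.index, method="ffill")
    return returns - rf
```

Forward-filling uses the most recent published rate on days when FRED has no observation, so every trading day gets a rate without looking ahead.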

In [20]:
# 2) Process data (clean ESG, prices, returns; align dates; risk-free)
!python ../scripts/process_data.py



ESG STOCK PERFORMANCE ANALYSIS - DATA PROCESSING


### STEP 1/6: Clean ESG Data ###
Cleaning ESG Data

[LOADING] Loading data from: data/raw/sp500_esg_data.csv
[OK] Loaded 426 records

Columns: ['Symbol', 'Full Name', 'GICS Sector', 'GICS Sub-Industry', 'environmentScore', 'socialScore', 'governanceScore', 'totalEsg', 'highestControversy', 'percentile', 'ratingYear', 'ratingMonth', 'marketCap', 'beta', 'overallRisk']

[OK] Using 'Symbol' as ticker column

[PROCESSING] Standardizing ticker symbols...
[PROCESSING] Removing duplicate tickers...

[INFO] Identified ESG columns: ['environmentScore', 'socialScore', 'governanceScore']

[CHECKING] Checking for missing values...
	environmentScore: 0 (0.0%)
	socialScore: 0 (0.0%)
	governanceScore: 0 (0.0%)

[STATS] Rows with any missing ESG scores: 0 (0.0%)
	Strategy: Dropping rows with missing values (< 5% threshold)

[CHECKING] Validating ESG score ranges...
	environmentScore: 0.00 - 24.98
	socialScore: 0.76 - 22.48
	governanceScore: 2.96 - 19
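The cleaning steps in this log (standardize tickers, drop duplicates, drop rows with missing ESG scores when they fall below a 5% share, validate ranges) can be sketched as a single function. `clean_esg` is hypothetical, and several details are assumptions:

- The ticker rule is strip whitespace plus upper-case.
- Above the 5% threshold, rows are kept and flagged. This is a placeholder, because the log only shows the below-threshold branch.
- The range check assumes pillar scores should never be negative.

```python
import pandas as pd

ESG_COLS = ["environmentScore", "socialScore", "governanceScore"]


def clean_esg(df, ticker_col="Symbol", esg_cols=ESG_COLS, max_missing_share=0.05):
    """Hypothetical re-creation of the ESG cleaning steps logged above."""
    out = df.copy()
    # Standardize tickers (assumed rule: strip whitespace, upper-case).
    out[ticker_col] = out[ticker_col].astype(str).str.strip().str.upper()
    # Remove duplicate tickers, keeping the first occurrence.
    out = out.drop_duplicates(subset=ticker_col, keep="first")
    # Share of rows missing any ESG pillar score.
    missing = out[esg_cols].isna().any(axis=1)
    share = missing.mean()
    if share < max_missing_share:
        out = out[~missing]  # the "< 5% threshold" branch shown in the log
    else:
        # Placeholder: the log does not show what happens above the threshold.
        print(f"[WARN] {share:.1%} of rows lack ESG scores; rows kept for review.")
    # Range check, assuming scores should never be negative.
    negative = (out[esg_cols] < 0).any(axis=1)
    if negative.any():
        print(f"[WARN] {int(negative.sum())} rows have negative ESG scores.")
    return out.reset_index(drop=True)
```

Deduplicating before measuring the missing share means a ticker listed twice is not counted twice against the threshold.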

In [21]:
# 3) Feature engineering (performance & risk metrics, controls)
!python ../scripts/run_feature_engineering.py



ESG STOCK PERFORMANCE ANALYSIS - FEATURE ENGINEERING

### Loading Master Dataset ###
[OK] Loaded 106500 records from data/final/master_dataset.csv
	Tickers: 426
	Date range: 2023-09-05 00:00:00+00:00 to 2024-08-30 00:00:00+00:00


### STEP 1/4: Performance Metrics ###
Calculating Performance Metrics

[INFO] Input data:
	Total records: 106500
	Tickers: 426

[PROCESSING] Calculating metrics for each ticker...

[OK] Calculated metrics for 426 tickers

[STATS] Performance Metrics Summary:

Sharpe Ratio:
	Mean: 0.5715
	Median: 0.5999
	Std: 0.8359
	Min: -1.8942
	Max: 2.8160

Annualized Excess Return:
	Mean: 13.06%
	Median: 14.49%
	Min: -87.70%
	Max: 98.04%

Cumulative Return:
	Mean: 18.54%
	Median: 17.28%
	Min: -60.52%
	Max: 146.08%

[INFO] Sample results:
  Ticker  Sharpe_Ratio  ...  Cumulative_Return  Trading_Days
0      A      0.526943  ...           0.172340           250
1    AAL     -0.739090  ...          -0.276567           250
2   AAPL      0.726335  ...           0.208698         
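The metric names in this summary can be made concrete with a small per-ticker function. One definition can be checked against the data: in the `analysis_dataset.csv` preview, ticker A has `Mean_Daily_Excess_Return` 0.000575 and `Annualized_Excess_Return` 0.144821. That is consistent with multiplying the mean daily excess return by 252 (0.000575 × 252 ≈ 0.145). The rest is assumed:

- The Sharpe ratio is taken as mean over standard deviation of daily excess returns, scaled by √252.
- The cumulative return compounds raw daily returns.
- `Annualized_Volatility` is a hypothetical column name.

```python
import numpy as np

TRADING_DAYS = 252  # consistent with 0.000575 * 252 ≈ 0.145 for ticker A


def performance_metrics(returns, rf_daily):
    """Per-ticker metrics from daily simple returns and daily risk-free rates.

    `rf_daily` may be a scalar or an array aligned with `returns`.
    """
    r = np.asarray(returns, dtype=float)
    excess = r - np.asarray(rf_daily, dtype=float)
    mean_daily = excess.mean()
    std_daily = excess.std(ddof=1)  # sample standard deviation
    return {
        "Trading_Days": int(r.size),
        "Mean_Daily_Excess_Return": mean_daily,
        "Annualized_Excess_Return": mean_daily * TRADING_DAYS,
        "Annualized_Volatility": std_daily * np.sqrt(TRADING_DAYS),  # hypothetical name
        "Sharpe_Ratio": mean_daily / std_daily * np.sqrt(TRADING_DAYS),
        "Cumulative_Return": np.prod(1.0 + r) - 1.0,  # compounded raw returns
    }
```

For example, `performance_metrics([0.01, -0.01, 0.02, 0.0], 0.0)` gives a mean daily excess return of 0.005 and a cumulative return of about 0.0199.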

In [22]:
# 4) Analysis (OLS regressions for RQ1-3)
!python ../scripts/run_analysis.py



ESG STOCK PERFORMANCE ANALYSIS - STATISTICAL ANALYSIS

### Loading Analysis Dataset ###
[OK] Loaded 426 companies from data/final/analysis_dataset.csv

Columns: 33
Sample: ['Ticker', 'environmentScore', 'socialScore', 'governanceScore', 'totalEsg', 'ratingYear', 'ratingMonth', 'Trading_Days', 'Mean_Daily_Excess_Return', 'Annualized_Excess_Return']...


RESEARCH QUESTION 1
RQ1: ESG Score → Sharpe Ratio

[INFO] Model specification:
	DV: Sharpe_Ratio
	IV: totalEsg
	Controls: Log_Market_Cap + 11 sector dummies
	Sample size: 419 (dropped 7 due to missing values)

  return np.sqrt(eigvals[0]/eigvals[-1])
                            OLS Regression Results                            
Dep. Variable:           Sharpe_Ratio   R-squared:                       0.199
Model:                            OLS   Adj. R-squared:                  0.176
Method:                 Least Squares   F-statistic:                     8.417
Date:                Mon, 08 Dec 2025   Prob (F-statistic):           2.78e

In [23]:
# 5) Figures (diagnostic and result plots)
!python ../scripts/create_diagnostic_plots.py



REGRESSION DIAGNOSTICS VISUALIZATION

### Loading Data ###
Loaded 426 companies

RQ1: ESG Score → Sharpe Ratio

[INFO] Creating diagnostic plots for RQ1 Sharpe Ratio...
   Saved to: outputs/figures/diagnostics/rq1_sharpe_ratio_diagnostics.png
  return 1 - self.ssr/self.centered_tss

[INFO] Creating VIF plot for RQ1 Sharpe Ratio...
   Saved to: outputs/figures/diagnostics/rq1_sharpe_ratio_vif.png

RQ2: ESG Score ‚Üí Volatility

[INFO] Creating diagnostic plots for RQ2 Volatility...
   Saved to: outputs/figures/diagnostics/rq2_volatility_diagnostics.png
  return 1 - self.ssr/self.centered_tss

[INFO] Creating VIF plot for RQ2 Volatility...
   Saved to: outputs/figures/diagnostics/rq2_volatility_vif.png

DIAGNOSTIC PLOTS COMPLETE

Plots saved to: outputs/figures/diagnostics

Generated files:
  - rq1_sharpe_ratio_diagnostics.png (4-panel diagnostic)
  - rq1_sharpe_ratio_vif.png (multicollinearity)
  - rq2_volatility_diagnostics.png (4-panel diagnostic)
  - rq2_volatility_vif.png (multic

## Verify key artifacts
- Final analysis dataset: `../data/final/analysis_dataset.csv`
- Tables: `../outputs/tables/`
- Figures: `../outputs/figures/`
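These artifacts can be checked in one pass before inspecting the dataset itself. The sketch treats a file as present if it is non-empty and a directory as present if it contains at least one file anywhere below it. `check_artifacts` is a hypothetical helper, and the paths are the ones listed above, relative to `notebooks/`.

```python
from pathlib import Path

# Paths listed above, relative to notebooks/.
ARTIFACTS = {
    "analysis dataset": Path("../data/final/analysis_dataset.csv"),
    "tables": Path("../outputs/tables"),
    "figures": Path("../outputs/figures"),
}


def check_artifacts(artifacts=ARTIFACTS):
    """Map each artifact label to True if it exists and is non-empty."""
    report = {}
    for label, path in artifacts.items():
        if path.is_file():
            report[label] = path.stat().st_size > 0
        elif path.is_dir():
            # rglob also covers nested folders such as figures/diagnostics/
            report[label] = any(p.is_file() for p in path.rglob("*"))
        else:
            report[label] = False
    return report


for label, ok in check_artifacts().items():
    print(f"[{'OK' if ok else 'MISSING'}] {label}")
```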


In [24]:
import pandas as pd
from pathlib import Path

analysis_path = Path("../data/final/analysis_dataset.csv")
if analysis_path.exists():
    df = pd.read_csv(analysis_path)
    display(df.head())
    print(f"Rows: {len(df)}, Columns: {len(df.columns)}")
else:
    print("analysis_dataset.csv not found. Re-run scripts above and verify Kaggle/FRED access.")


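|   | Ticker | environmentScore | socialScore | governanceScore | totalEsg | ratingYear | ratingMonth | Trading_Days | Mean_Daily_Excess_Return | Annualized_Excess_Return | ... | Sector_Consumer Cyclical | Sector_Consumer Defensive | Sector_Energy | Sector_Financial Services | Sector_Healthcare | Sector_Industrials | Sector_Real Estate | Sector_Technology | Sector_Unknown | Sector_Utilities |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | A | 1.12 | 6.42 | 6.1 | 13.64 | 2023.0 | 9.0 | 250 | 0.000575 | 0.144821 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | AAL | 9.94 | 11.65 | 4.76 | 26.35 | 2023.0 | 9.0 | 250 | -0.001182 | -0.297876 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | AAPL | 0.46 | 7.39 | 9.37 | 17.22 | 2023.0 | 9.0 | 250 | 0.000646 | 0.162807 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | ABBV | 2.38 | 17.19 | 10.36 | 29.93 | 2023.0 | 9.0 | 250 | 0.000986 | 0.248576 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | ABT | 2.27 | 14.24 | 8.33 | 24.83 | 2023.0 | 9.0 | 250 | 0.000245 | 0.061825 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |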


Rows: 426, Columns: 33


## RQ Interpretations (quick reference)
- **RQ1 – Returns (Sharpe):** ESG coefficient -0.0043 (p=0.535, R²≈0.20) → no evidence that higher ESG scores improve risk-adjusted returns.
- **RQ2 – Volatility:** ESG coefficient +0.0016 (p=0.018, R²≈0.15) → higher ESG scores are associated with slightly higher volatility, contrary to de-risking claims.
- **RQ3 – Pillars:** Governance carries the volatility association (+0.0063, p=0.013); the Environment and Social pillars are not significant for either returns or volatility.
- **Implication:** ESG scores alone do not deliver near-term risk-adjusted outperformance in this window and may coincide with marginally higher risk; governance is the only pillar with a detectable link to volatility.
