# ESG Stock Performance Impact Analysis
## Evaluating the Relationship Between ESG Performance and Stock Market Returns

**Course:** DATA 512 — Human-Centered Data Science  
**Author:** Ayush Mall  
**Date:** December 2024  
**Repository:** `esg-stock-performace-impact`

This notebook contains the full written report and key code cells required for the final deliverable. It integrates narrative, data provenance, methods, findings, and reproducibility steps.


## Abstract
This study evaluates whether stronger Environmental, Social, and Governance (ESG) profiles for S&P 500 firms during September 2023–August 2024 are associated with superior risk-adjusted performance and lower risk. Using ESG scores from Sustainalytics (via Kaggle/Yahoo Finance), daily stock prices, and FRED risk-free rates, we construct a firm-level analysis dataset of 419 companies and estimate cross-sectional OLS models of Sharpe ratio and volatility, controlling for firm size and sector effects.

**Key Findings**
- **RQ1 (Risk-Adjusted Returns):** No statistically significant relationship between ESG scores and Sharpe ratios (coef = -0.0043, p = 0.535, R² = 0.204).
- **RQ2 (Volatility):** Higher ESG scores are associated with slightly **higher** volatility (coef = +0.0016, p = 0.018, R² = 0.149), contrary to de-risking claims.
- **RQ3 (Pillars):** Governance is the only pillar significantly associated with volatility (+0.0063, p = 0.013); the Environmental and Social pillars show no significant association.

ESG scores alone do not deliver near-term risk-adjusted performance premia in this window and may correlate with marginally higher risk, underscoring the need for deeper factor controls, longer horizons, and transparent evaluation of ESG investment claims.


## 1. Introduction
### 1.1 Motivation
Global ESG assets were projected to exceed $53 trillion by 2025, roughly a third of global assets under management (Bloomberg Intelligence, 2021). Asset managers often market ESG as delivering both impact and superior returns, yet the empirical evidence is mixed and the underlying methods are often opaque. This project provides a transparent, reproducible test of those claims for S&P 500 firms.

### 1.2 Research Objectives
1. Test whether higher ESG scores relate to superior risk-adjusted returns.  
2. Test whether higher ESG scores reduce stock return volatility.  
3. Identify which ESG pillar (E, S, G) drives any observed effects.

### 1.3 Scope
- Universe: S&P 500 constituents (as of Sept 2023)
- Window: Sept 2023 – Aug 2024 (~12 months)
- Design: Cross-sectional firm-level regressions

### 1.4 Why This Matters
- **Investors:** Need evidence on risk/return claims.  
- **Companies:** Need realistic expectations about ESG impacts on stock performance.  
- **Researchers/Policymakers:** Benefit from transparent, reproducible pipelines and documented limitations.


## 2. Background & Related Work
- **Meta-analyses:** NYU Stern & Rockefeller (2021) reviewed 1,000+ studies; among corporate-focused studies, ~58% found a positive ESG–performance relationship, 13% neutral, 21% mixed, and 8% negative.
- **Rating disagreement:** CFA Institute highlights large divergences across ESG vendors (e.g., Sustainalytics, MSCI, Refinitiv).
- **Data quality:** ESG scores rely on self-reported corporate data; methodology differences drive variance.
- **Contribution here:** Recent window (2023–24), full reproducibility, risk-adjusted focus (Sharpe), and pillar decomposition (E, S, G).


## 3. Research Questions
1. **RQ1:** Do companies with higher ESG scores earn higher risk-adjusted returns over the following 12 months?  
2. **RQ2:** Do higher ESG scores reduce stock return volatility?  
3. **RQ3:** Which ESG pillar (E, S, G) drives risk-adjusted returns and volatility the most?


## 4. Data & Licensing
- **S&P 500 ESG & prices (2023–24):** Kaggle dataset by R. Zala — GPL-3; underlying prices/metadata from Yahoo Finance via `yfinance`, for academic/research use. ESG scores from Sustainalytics as exposed in `yfinance` (`data/raw/sp500_esg_data.csv`, `sp500_price_data.csv`).
- **Risk-free rate:** FRED DGS3MO — Public Domain (`data/raw/DGS3MO.csv`).
- **Market & metadata:** Yahoo Finance via `yfinance` — academic/research use (`data/raw/sp500_index.csv`, `company_info.csv`).
- **Docs:** See `data/README.md` for full provenance, transformations, and licensing.
- **Final analysis dataset:** `data/final/analysis_dataset.csv` (419 firms, 33 variables) used for all regressions.


```python
# Quick data check: load the final analysis dataset and preview it
import pandas as pd
from pathlib import Path

analysis_path = Path('../data/final/analysis_dataset.csv')
if analysis_path.exists():
    df = pd.read_csv(analysis_path)
    display(df.head())  # display() is provided by the Jupyter/IPython session
    print(f"Rows: {len(df)}, Columns: {len(df.columns)}")
else:
    print("analysis_dataset.csv not found. Please run the pipeline or place the file in data/final/.")
```


| Unnamed: 0 | Ticker | environmentScore | socialScore | governanceScore | totalEsg | ratingYear | ratingMonth | Trading_Days | Mean_Daily_Excess_Return | Annualized_Excess_Return | ... | Sector_Consumer Cyclical | Sector_Consumer Defensive | Sector_Energy | Sector_Financial Services | Sector_Healthcare | Sector_Industrials | Sector_Real Estate | Sector_Technology | Sector_Unknown | Sector_Utilities |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A | 1.12 | 6.42 | 6.1 | 13.64 | 2023.0 | 9.0 | 250 | 0.000575 | 0.144821 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | AAL | 9.94 | 11.65 | 4.76 | 26.35 | 2023.0 | 9.0 | 250 | -0.001182 | -0.297876 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | AAPL | 0.46 | 7.39 | 9.37 | 17.22 | 2023.0 | 9.0 | 250 | 0.000646 | 0.162807 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | ABBV | 2.38 | 17.19 | 10.36 | 29.93 | 2023.0 | 9.0 | 250 | 0.000986 | 0.248576 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | ABT | 2.27 | 14.24 | 8.33 | 24.83 | 2023.0 | 9.0 | 250 | 0.000245 | 0.061825 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |


Rows: 426, Columns: 33
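
The preview reports 426 rows, while the report refers to 419 firms in the regressions. The notebook does not say where the difference comes from. One possibility, stated here only as an assumption, is that rows with missing model variables are dropped at estimation time. A minimal sketch of how that could be checked, using synthetic data and a hypothetical `count_complete_cases` helper:

```python
import numpy as np
import pandas as pd

def count_complete_cases(df: pd.DataFrame, columns: list) -> int:
    """Number of rows with no missing value in any of `columns` (hypothetical helper)."""
    return int(df[columns].notna().all(axis=1).sum())

# Synthetic stand-in for the analysis dataset; values are illustrative only.
demo = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC", "DDD"],
    "totalEsg": [13.6, 26.4, np.nan, 24.8],
    "Sharpe_Ratio": [0.9, -1.1, 0.4, np.nan],
    "Log_Market_Cap": [24.1, 22.8, 23.5, 25.0],
})

# Column names follow the model formulas; their presence in the CSV is assumed.
model_columns = ["totalEsg", "Sharpe_Ratio", "Log_Market_Cap"]
n_complete = count_complete_cases(demo, model_columns)
print(f"Rows loaded: {len(demo)}, complete cases: {n_complete}")
```

Running the same helper on `df` with the columns used in each model would show whether listwise deletion accounts for the gap.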


## 5. Methodology
### 5.1 Design
Cross-sectional OLS regressions at the firm level (one observation per company) using ESG scores measured at Sept 2023 to explain realized outcomes over Sept 2023–Aug 2024.

### 5.2 Models
- **Model 1 (RQ1):** `Sharpe_Ratio ~ totalEsg + Log_Market_Cap + Sector dummies`
- **Model 2 (RQ2):** `Volatility ~ totalEsg + Log_Market_Cap + Sector dummies`
- **Model 3a (RQ3):** `Sharpe_Ratio ~ environmentScore + socialScore + governanceScore + controls`
- **Model 3b (RQ3):** `Volatility ~ environmentScore + socialScore + governanceScore + controls`
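
The four specifications above could be estimated with the `statsmodels` formula API. The sketch below is a minimal illustration on synthetic data, not the project's `scripts/run_analysis.py`. The `fit_model` helper and the single categorical `Sector` column (expanded by `C(Sector)`) are assumptions; the analysis dataset itself stores sectors as pre-built dummy columns.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

CONTROLS = "Log_Market_Cap + C(Sector)"  # size control plus sector dummies
PILLARS = ["environmentScore", "socialScore", "governanceScore"]

def fit_model(data, outcome, esg_terms):
    """Fit `outcome ~ esg_terms + controls` by OLS (hypothetical helper)."""
    formula = f"{outcome} ~ {' + '.join(esg_terms)} + {CONTROLS}"
    return smf.ols(formula, data=data).fit()

# Synthetic firm-level data; values are illustrative only.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "environmentScore": rng.uniform(0, 15, n),
    "socialScore": rng.uniform(2, 18, n),
    "governanceScore": rng.uniform(3, 12, n),
    "Log_Market_Cap": rng.normal(24, 1, n),
    "Sector": rng.choice(["Technology", "Energy", "Utilities"], n),
})
# In the preview rows, totalEsg equals the sum of the three pillar scores.
demo["totalEsg"] = demo[PILLARS].sum(axis=1)
demo["Sharpe_Ratio"] = rng.normal(0.5, 1.0, n)   # pure noise outcome
demo["Volatility"] = rng.normal(0.25, 0.05, n)   # pure noise outcome

models = {
    "Model 1": fit_model(demo, "Sharpe_Ratio", ["totalEsg"]),
    "Model 2": fit_model(demo, "Volatility", ["totalEsg"]),
    "Model 3a": fit_model(demo, "Sharpe_Ratio", PILLARS),
    "Model 3b": fit_model(demo, "Volatility", PILLARS),
}
for name, res in models.items():
    print(name, f"R2={res.rsquared:.3f}", f"n={int(res.nobs)}")
```

Because `totalEsg` is the sum of the pillars, Models 1 and 2 constrain the three pillar coefficients to be equal, while Models 3a and 3b let them differ.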

### 5.3 Controls
- **Size:** Log market cap
- **Industry:** 11 sector dummies
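
As a rough illustration of how these controls might be built from company metadata, the sketch below derives `Log_Market_Cap` and `Sector_*` dummies with pandas. The raw column names `marketCap` and `sector`, the use of the natural log, and the `Unknown` fallback for missing sectors are assumptions (the last one suggested by the `Sector_Unknown` column in the preview).

```python
import numpy as np
import pandas as pd

# Synthetic company metadata; column names and values are illustrative only.
company_info = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC"],
    "marketCap": [2.5e11, 4.0e10, 1.2e12],
    "sector": ["Healthcare", None, "Technology"],
})

controls = company_info[["Ticker"]].copy()
# Size control: natural log of market capitalization (assumed transformation).
controls["Log_Market_Cap"] = np.log(company_info["marketCap"])

# Industry control: one 0/1 column per sector, missing sectors labeled "Unknown".
sector = company_info["sector"].fillna("Unknown")
dummies = pd.get_dummies(sector, prefix="Sector", dtype=int)
controls = pd.concat([controls, dummies], axis=1)
print(controls)
```

When the regression includes an intercept, one sector column has to be left out as the reference category, otherwise the dummies are perfectly collinear with the constant.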

### 5.4 Diagnostics
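- Heteroskedasticity: Breusch-Pagan test; HC3 robust standard errors are used when it rejects homoskedasticity
- Multicollinearity: variance inflation factors (VIF)
- Residual normality: Jarque-Bera; a rejection is less of a concern for inference with n > 400, since OLS coefficient estimates are approximately normal in large samples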

### 5.5 Rationale
- Standard cross-sectional design in finance for factor effects
- Risk-adjusted focus via Sharpe ratio; volatility for risk
- Transparent, reproducible pipeline (scripts + notebooks)
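
For reference, a minimal sketch of how per-firm Sharpe ratio and volatility could be derived from daily prices and the FRED 3-month rate. The `risk_metrics` helper, the 252-day annualization factor, the conversion of the annual percent yield to a daily rate by simple division, and computing volatility on excess returns are all assumptions; the preview rows are at least consistent with `Annualized_Excess_Return` being `Mean_Daily_Excess_Return` times 252.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # assumed annualization factor

def risk_metrics(prices: pd.Series, rf_annual_pct: pd.Series) -> dict:
    """Annualized excess return, volatility, and Sharpe ratio (hypothetical helper)."""
    daily_return = prices.pct_change()
    # The risk-free series is quoted in percent per year: align it to the
    # price dates, forward-fill gaps, and convert to a daily decimal rate.
    rf_daily = rf_annual_pct.reindex(prices.index).ffill() / 100.0 / TRADING_DAYS_PER_YEAR
    excess = (daily_return - rf_daily).dropna()
    mean_excess = excess.mean()
    annualized_excess = mean_excess * TRADING_DAYS_PER_YEAR
    volatility = excess.std(ddof=1) * np.sqrt(TRADING_DAYS_PER_YEAR)
    return {
        "Trading_Days": int(excess.count()),
        "Mean_Daily_Excess_Return": mean_excess,
        "Annualized_Excess_Return": annualized_excess,
        "Volatility": volatility,
        "Sharpe_Ratio": annualized_excess / volatility,
    }

# Synthetic prices and a flat illustrative yield; not real market data.
dates = pd.bdate_range("2023-09-01", periods=6)
prices = pd.Series([100.0, 101.0, 100.0, 102.0, 103.0, 102.0], index=dates)
rf = pd.Series(5.3, index=dates)

metrics = risk_metrics(prices, rf)
for key, value in metrics.items():
    print(key, value)
```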


## 6. Findings (Summary)
- **RQ1 (Sharpe ratio):** ESG coef = -0.0043, p = 0.535, R² = 0.204 → **no evidence** higher ESG improves risk-adjusted returns.
- **RQ2 (Volatility):** ESG coef = +0.0016, p = 0.018, R² = 0.149 → higher ESG associates with **slightly higher volatility**.
- **RQ3 (Pillars):** Governance is the only pillar significantly associated with volatility (+0.0063, p = 0.013); E and S are not significant, and no pillar is significant for Sharpe.
- **Economic magnitude:** Effects are small; the governance-related increase in volatility is modest but statistically significant at the 5% level.
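
To make the magnitude concrete, the reported coefficients can be scaled by a score difference. The sketch below is plain arithmetic on the reported estimates; the 10-point and 5-point gaps are arbitrary illustrative choices, and the interpretation depends on the units of `Volatility`, which are assumed here to be annualized and in decimal form.

```python
def implied_change(coef: float, delta_score: float) -> float:
    """Linear-model change in the outcome for a `delta_score` change in the regressor."""
    return coef * delta_score

esg_vol_coef = 0.001592  # RQ2 coefficient on totalEsg (analysis_summary.txt)
gov_vol_coef = 0.0063    # RQ3 coefficient on governanceScore

# Illustrative score gaps; chosen for the example, not taken from the report.
total_effect = implied_change(esg_vol_coef, 10)
gov_effect = implied_change(gov_vol_coef, 5)

print(f"10-point totalEsg gap: {total_effect:.4f} change in Volatility")
print(f"5-point governance gap: {gov_effect:.4f} change in Volatility")
```

Under the stated unit assumption, a 10-point gap in `totalEsg` corresponds to roughly 0.016 higher volatility, or about 1.6 percentage points.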

Key tables (already generated in `outputs/tables/`):
- `analysis_summary.txt`
- `rq1_results.txt`, `rq2_results.txt`
- `rq3_sharpe_results.txt`, `rq3_volatility_results.txt`

Key figures (in `outputs/figures/`):
- `esg_vs_sharpe.png`, `esg_vs_volatility.png`
- `correlation_heatmap.png`, `esg_distribution.png`, `esg_by_sector.png`
- Flowcharts: `data_processing_pipeline.png`, `data_pipeline_visualization.png`


```python
# Display regression summaries from outputs/tables
from pathlib import Path

def print_file(path):
    """Print a text file under a header line, or report that it is missing."""
    p = Path(path)
    if p.exists():
        print(f"\n=== {p.name} ===")
        print(p.read_text())
    else:
        print(f"Missing: {p}")

for f in [
    '../outputs/tables/analysis_summary.txt',
    '../outputs/tables/rq1_results.txt',
    '../outputs/tables/rq2_results.txt',
    '../outputs/tables/rq3_sharpe_results.txt',
    '../outputs/tables/rq3_volatility_results.txt'
]:
    print_file(f)
```



=== analysis_summary.txt ===
ESG STOCK PERFORMANCE ANALYSIS - SUMMARY OF FINDINGS

RQ1: Do companies with higher ESG scores earn higher risk-adjusted returns?
  ESG Coefficient: -0.004308
  P-value: 0.5350
  Significant at 5%: NO
  R-squared: 0.2038

RQ2: Do higher ESG scores reduce stock return volatility?
  ESG Coefficient: 0.001592
  P-value: 0.0185
  Significant at 5%: YES
  R-squared: 0.1485

RQ3: Which ESG pillar drives risk-adjusted returns and volatility?
  Sharpe Ratio - Dominant: G
  Volatility - Dominant: G


=== rq1_results.txt ===
                            OLS Regression Results                            
Dep. Variable:           Sharpe_Ratio   R-squared:                       0.204
Model:                            OLS   Adj. R-squared:                  0.180
Method:                 Least Squares   F-statistic:                     8.659
Date:                Sun, 30 Nov 2025   Prob (F-statistic):           9.73e-15
Time:                        15:50:37   Log-Likelihood

## 7. Discussion
- **Interpretation:** ESG scores do not deliver near-term risk-adjusted return premia and may coincide with marginally higher volatility, potentially reflecting sector composition or transition/governance transparency effects.
- **Implications for investors:** Use ESG for values/impact, not expected outperformance or de-risking; combine ESG with traditional factors; governance merits attention.
- **Implications for companies:** ESG initiatives may not translate to short-term stock benefits; governance quality matters for risk management; communicate costs/benefits transparently.
- **Research implications:** Extend horizons, add factor controls, compare ESG providers, explore non-linear/sector-specific effects.
- **Policy angle:** ESG ratings vary; governance transparency may increase short-term repricing; encourage methodological transparency from rating providers.


## 8. Limitations
- Single 12-month window; results may differ in bear markets or longer horizons.
- S&P 500 only; survivorship bias (constituents as of Sept 2023).
- Single ESG provider (Sustainalytics via Yahoo); rating dispersion unobserved.
- Low explained variance (R² ~15–20%); ESG and controls explain modest variation.
- Cross-sectional; associations only, no causal claims; no dynamic/panel effects.
- Volatility models show non-normal residuals; robust standard errors address heteroskedasticity but do not correct the underlying distributional issues.



## 9. Conclusion
In this S&P 500 sample, higher ESG scores did **not** improve risk-adjusted returns and were associated with slightly higher volatility, with governance as the only pillar showing a significant relationship (higher governance → modestly higher volatility). ESG appears financially neutral-to-slightly-risk-adding over this short window; investors should combine ESG with broader risk controls and test across periods. Future work should extend horizons, compare ESG providers, and incorporate richer factor controls.


## 10. Reproducibility
- One-click rerun: `notebooks/00_reproducibility.ipynb` (installs requirements, runs pipeline scripts, previews `analysis_dataset.csv`).
- Pipeline scripts: `scripts/download_data.py`, `scripts/process_data.py`, `scripts/run_feature_engineering.py`, `scripts/run_analysis.py`, `scripts/create_diagnostic_plots.py`.
- Data: Sample/analysis dataset at `data/final/analysis_dataset.csv` (main file); larger `master_dataset.csv` also available.
- Figures: All charts in `outputs/figures/` (PNG).
- Tables: Regression outputs in `outputs/tables/` (TXT/CSV).

**Quick start**
```bash
python scripts/download_data.py
python scripts/process_data.py
python scripts/run_feature_engineering.py
python scripts/run_analysis.py
python scripts/create_diagnostic_plots.py
```



## 11. References
- Zala, R. (2024). S&P 500 ESG and Stocks Data (2023-24). Kaggle. https://www.kaggle.com/datasets/rikinzala/s-and-p-500-esg-and-stocks-data-2023-24
- Federal Reserve Bank of St. Louis. DGS3MO. FRED. https://fred.stlouisfed.org/series/DGS3MO
- NYU Stern Center for Sustainable Business & Rockefeller Asset Management (2021). ESG and Financial Performance. https://www.stern.nyu.edu/sites/default/files/assets/documents/NYU-RAM_ESG-Paper_2021%20Rev_0.pdf
- CFA Institute. The Role and Rise of ESG Ratings. https://www.cfainstitute.org/insights/articles/the-role-and-rise-of-esg-ratings
- Bloomberg Intelligence (2021). ESG Assets May Hit $53 Trillion by 2025. https://www.bloomberg.com/professional/blog/esg-assets-may-hit-53-trillion-by-2025-a-third-of-global-aum/
- Sustainalytics (via Yahoo Finance). ESG Risk Ratings Methodology. https://www.sustainalytics.com/esg-ratings
- Yahoo Finance Terms of Service. https://policies.yahoo.com/us/en/yahoo/terms/products/finance/index.htm
