# Macroeconomic Predictor

This notebook explores the effects of macroeconomic factors on the stock market. For my analysis, I focus specifically on the S&P 500 index, since it broadly reflects the economic state of the largest United States companies.

In [1]:
import matplotlib.pyplot as plt
from alpaca.data import StockHistoricalDataClient
from alpaca.data.requests import StockBarsRequest
from alpaca.data.timeframe import TimeFrame, TimeFrameUnit
from datetime import datetime

from fredapi import Fred

import yfinance as yf

import pandas as pd
import numpy as np



## Task 1: Find S&P 500 Data

Based on my initial research, I believe that Alpaca's Python API should be able to provide all of the historical S&P 500 data we will need for this task. The request below uses SPY, an ETF that tracks the index, as a stand-in for the index itself.

In [2]:
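import os

# Read the Alpaca credentials from the environment rather than hardcoding them.
# ALPACA_API_KEY and ALPACA_SECRET_KEY are assumed variable names.
client = StockHistoricalDataClient(
    os.environ["ALPACA_API_KEY"], os.environ["ALPACA_SECRET_KEY"]
)

request_params = StockBarsRequest(
    symbol_or_symbols="SPY",
    timeframe=TimeFrame(1, TimeFrameUnit.Month),
    start=datetime(2000, 1, 1),
    end=datetime(2025, 3, 31),
)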

bars = client.get_stock_bars(request_params)

bars_df = bars.df
bars_df

symbol,timestamp,open,high,low,close,volume,trade_count,vwap
SPY,2016-01-01 05:00:00+00:00,200.49,201.90,181.0200,193.7208,3.818766e+09,13463245.0,191.091533
SPY,2016-02-01 05:00:00+00:00,192.53,196.68,181.0900,193.3500,2.982148e+09,11420093.0,189.635941
SPY,2016-03-01 05:00:00+00:00,195.01,210.55,194.4542,205.5200,2.415842e+09,8659580.0,201.753616
SPY,2016-04-01 04:00:00+00:00,204.35,210.92,203.0900,206.3308,1.986942e+09,6822081.0,207.013830
SPY,2016-05-01 04:00:00+00:00,206.92,210.69,202.7800,209.8400,1.894023e+09,6744944.0,206.421586
SPY,...,...,...,...,...,...,...,...
SPY,2024-11-01 04:00:00+00:00,571.32,603.35,567.8900,602.5500,9.017136e+08,8153583.0,591.063999
SPY,2024-12-01 05:00:00+00:00,602.97,609.07,580.9100,586.0800,1.059637e+09,8707117.0,597.726175
SPY,2025-01-01 05:00:00+00:00,589.39,610.78,575.3500,601.8200,9.966060e+08,9472819.0,594.647469
SPY,2025-02-01 05:00:00+00:00,592.67,613.23,582.4400,594.1800,8.703271e+08,9571250.0,600.010040


Unfortunately, Alpaca is not the right source for S&P 500 data: although the request starts in 2000, the returned bars begin in January 2016, so the data does not go anywhere near the index's inception. Instead, I will try the `SP500` series available through **fredapi** to see whether it reaches back to the index's origins.
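One way to make the coverage gap explicit is to compare the earliest timestamp in the returned frame with the requested start date. The following is a minimal sketch on a small synthetic frame shaped like `bars.df` (a `(symbol, timestamp)` MultiIndex, with values copied from the output above); `earliest_bar` is a hypothetical helper, not part of Alpaca's API.

```python
import pandas as pd


def earliest_bar(bars_df: pd.DataFrame) -> pd.Timestamp:
    """Return the earliest timestamp in a (symbol, timestamp)-indexed frame."""
    return bars_df.index.get_level_values("timestamp").min()


# Synthetic stand-in for bars.df; the two rows mirror the first rows shown above.
index = pd.MultiIndex.from_tuples(
    [
        ("SPY", pd.Timestamp("2016-01-01 05:00", tz="UTC")),
        ("SPY", pd.Timestamp("2016-02-01 05:00", tz="UTC")),
    ],
    names=["symbol", "timestamp"],
)
demo_bars = pd.DataFrame({"close": [193.7208, 193.35]}, index=index)

# Compare the first available bar with the start date used in the request.
requested_start = pd.Timestamp("2000-01-01", tz="UTC")
first_bar = earliest_bar(demo_bars)
gap_years = (first_bar - requested_start).days / 365.25
print(f"first bar: {first_bar}, gap vs. request: {gap_years:.1f} years")
```

Running the same helper on the real `bars_df` would show how much of the requested range is missing before deciding whether to look for another source.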

In [3]:
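import os

# FRED API keys are issued through a free account on the FRED website (fred.stlouisfed.org).
# The key is read from the environment; FRED_API_KEY is an assumed variable name.
fred = Fred(api_key=os.environ["FRED_API_KEY"])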

In [4]:
# Get S&P 500 index data (daily closes)
sp500 = fred.get_series('SP500')

# Convert to DataFrame
# TODO: Show the original DF and then try and limit it to closes
sp500 = sp500.to_frame(name='SP500_Close')
sp500.index = pd.to_datetime(sp500.index)
sp500

Date,SP500_Close
2015-07-29,2108.57
2015-07-30,2108.63
2015-07-31,2103.84
2015-08-03,2098.04
2015-08-04,2093.32
...,...
2025-07-22,6309.62
2025-07-23,6358.91
2025-07-24,6363.35
2025-07-25,6388.64


FRED is not the right source either: its `SP500` series starts on 2015-07-29, so it does not go back to the inception of the S&P 500. I have also found that **yfinance** might have the data needed to support the predictive task.

In [5]:
# S&P 500 Index (not an ETF)
sp500 = yf.download("^GSPC", start="1950-01-03", interval="1d")

# Keep only the close price
# sp500 = sp500[['Close']]
# sp500.rename(columns={'Close': 'SP500_Close'}, inplace=True)
sp500

[*********************100%***********************]  1 of 1 completed


Price,Close,High,Low,Open,Volume
Ticker,^GSPC,^GSPC,^GSPC,^GSPC,^GSPC
Date,,,,,
1950-01-03,16.660000,16.660000,16.660000,16.660000,1260000
1950-01-04,16.850000,16.850000,16.850000,16.850000,1890000
1950-01-05,16.930000,16.930000,16.930000,16.930000,2550000
1950-01-06,16.980000,16.980000,16.980000,16.980000,2010000
1950-01-09,17.080000,17.080000,17.080000,17.080000,2520000
...,...,...,...,...,...
2025-07-23,6358.910156,6360.640137,6317.490234,6326.899902,5642510000
2025-07-24,6363.350098,6381.310059,6360.569824,6368.600098,5282720000
2025-07-25,6388.640137,6395.819824,6368.529785,6370.009766,4470720000
2025-07-28,6389.770020,6401.069824,6375.790039,6397.689941,4565620000


The **yfinance** library's `^GSPC` ticker returns daily data back to the requested start date of 1950-01-03, far earlier than either Alpaca or FRED, so I will use this dataset and merge it with our macroeconomic factors DataFrame.

In [6]:
# data = fred.get_series('GDP', frequency='q')
# data = data.to_frame(name='GDP')
# data.index = pd.to_datetime(data.index)
# data = data.resample('MS').ffill()
# data

## Task 2: Find Macroeconomic Data

In [7]:
start_date = "1947-01-01"
series_dict = {
    "FedFunds": "FEDFUNDS",
    "Treasury10Y": "GS10",
    "M2MoneySupply": "M2SL",
    "InflationExpectation": "T10YIE",
    "ConsumerSentiment": "UMCSENT",
    "NonfarmPayrolls": "PAYEMS",
    "AvgHourlyEarnings": "AHETPI",
    "CapacityUtilization": "TCU",
    "HousingStarts": "HOUST",
    "MedianHousePrice": "MSPUS",
    "NetExports": "NETEXP",
    "DollarIndex": "TWEXB",
    "LeadingIndex": "USSLIND",
    "NationalActivityIndex": "CFNAI",
}

df_list = []

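for name, series_id in series_dict.items():
    print(f"Fetching {name} ({series_id})...")
    try:
        data = fred.get_series(series_id)
        # Normalize the index to datetimes before filtering on it.
        data.index = pd.to_datetime(data.index)
        data = data.loc[data.index >= pd.to_datetime(start_date)]
        data = data.to_frame(name=name)
        # Align every series to month-start frequency; gaps are forward-filled
        # from the most recent earlier observation.
        data = data.resample('MS').ffill()
        df_list.append(data)
    except Exception as e:
        # Report the failure and continue with the remaining series.
        print(f"Failed to fetch {series_id}: {e}")

# Combine all series column-wise on the shared month-start index.
fred_df = pd.concat(df_list, axis=1)
fred_df.index = fred_df.index.to_period('M').to_timestamp()
fred_df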

Fetching FedFunds (FEDFUNDS)...
Fetching Treasury10Y (GS10)...
Fetching M2MoneySupply (M2SL)...
Fetching InflationExpectation (T10YIE)...
Fetching ConsumerSentiment (UMCSENT)...
Fetching NonfarmPayrolls (PAYEMS)...
Fetching AvgHourlyEarnings (AHETPI)...
Fetching CapacityUtilization (TCU)...
Fetching HousingStarts (HOUST)...
Fetching MedianHousePrice (MSPUS)...
Fetching NetExports (NETEXP)...
Fetching DollarIndex (TWEXB)...
Fetching LeadingIndex (USSLIND)...
Fetching NationalActivityIndex (CFNAI)...


Date,FedFunds,Treasury10Y,M2MoneySupply,InflationExpectation,ConsumerSentiment,NonfarmPayrolls,AvgHourlyEarnings,CapacityUtilization,HousingStarts,MedianHousePrice,NetExports,DollarIndex,LeadingIndex,NationalActivityIndex
1947-01-01,,,,,,43535.0,,,,,10.875,,,
1947-02-01,,,,,,43557.0,,,,,10.875,,,
1947-03-01,,,,,,43607.0,,,,,10.875,,,
1947-04-01,,,,,,43499.0,,,,,11.294,,,
1947-05-01,,,,,,43638.0,,,,,11.294,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2025-03-01,4.33,4.28,21656.9,2.38,57.0,159275.0,30.97,77.6835,1355.0,423100.0,,,,0.15
2025-04-01,4.33,4.28,21804.5,2.33,52.2,159433.0,31.05,77.6002,1398.0,410800.0,,,,-0.41
2025-05-01,4.33,4.42,21883.6,2.25,52.2,159577.0,31.15,77.4766,1263.0,,,,,-0.16
2025-06-01,4.33,4.38,22020.8,2.34,,159724.0,31.24,77.6357,1321.0,,,,,-0.10
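
The repeated NetExports values in the table come from forward-filling a quarterly series onto a monthly grid. The sketch below reproduces that behavior on synthetic data (the numbers are copied from the NetExports and NonfarmPayrolls columns above; no FRED call is made). It also illustrates a side effect worth keeping in mind: the last quarterly observation is not carried past its own date, so months after it stay empty once the series is concatenated with a longer monthly one.

```python
import pandas as pd

# Synthetic quarterly series; values copied from the NetExports column above.
quarterly = pd.Series(
    [10.875, 11.294],
    index=pd.to_datetime(["1947-01-01", "1947-04-01"]),
    name="NetExports",
)

# Upsample to month-start frequency and carry each value forward
# until the next quarterly observation.
monthly = quarterly.resample("MS").ffill()
print(monthly)

# A monthly series that extends one month past the last quarterly observation.
payrolls = pd.Series(
    [43535.0, 43557.0, 43607.0, 43499.0, 43638.0],
    index=pd.date_range("1947-01-01", periods=5, freq="MS"),
    name="NonfarmPayrolls",
)

# After concatenation, NetExports has no value for 1947-05-01 because the
# resampled series ends at its last observation (1947-04-01).
aligned = pd.concat([monthly, payrolls], axis=1)
print(aligned)
```

If trailing gaps like this matter for modeling, one option is to forward-fill again after concatenation, at the cost of treating stale values as current.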


## Task 3: Join the Datasets

In [12]:
# Ensure both indexes are datetime and have same alignment
sp500.index = pd.to_datetime(sp500.index).to_period('M').to_timestamp()
fred_df.index = pd.to_datetime(fred_df.index).to_period('M').to_timestamp()
print(sp500.index[0])
print(fred_df.index)

1950-01-01 00:00:00
DatetimeIndex(['1947-01-01', '1947-02-01', '1947-03-01', '1947-04-01',
               '1947-05-01', '1947-06-01', '1947-07-01', '1947-08-01',
               '1947-09-01', '1947-10-01',
               ...
               '2024-10-01', '2024-11-01', '2024-12-01', '2025-01-01',
               '2025-02-01', '2025-03-01', '2025-04-01', '2025-05-01',
               '2025-06-01', '2025-07-01'],
              dtype='datetime64[ns]', length=943, freq='MS')


In [9]:
combined_df = fred_df.join(sp500, how='inner')
combined_df

MergeError: Not allowed to merge between different levels. (1 levels on the left, 2 on the right)
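
The error arises because `yf.download` returned columns with two levels (Price and Ticker, as shown in the output above), while `fred_df` has a single column level. A second issue would surface even after flattening: relabelling daily dates as month starts leaves many rows per month, so the daily prices need to be collapsed to one row per month first. The following is a sketch of one possible fix on synthetic data; the `*_demo` names are hypothetical, and taking the last close of each month is an assumption (the first close or the monthly mean would be equally defensible).

```python
import numpy as np
import pandas as pd

# Synthetic daily prices shaped like the yfinance output above:
# two column levels named Price and Ticker.
days = pd.bdate_range("2025-01-01", "2025-03-31")
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["^GSPC"]], names=["Price", "Ticker"]
)
values = np.column_stack(
    [np.linspace(5900.0, 5600.0, len(days)), np.full(len(days), 1e9)]
)
sp500_demo = pd.DataFrame(values, index=days, columns=columns)

# Synthetic monthly macro data with a single column level, like fred_df.
fred_demo = pd.DataFrame(
    {"FedFunds": [4.33, 4.33, 4.33]},
    index=pd.date_range("2025-01-01", periods=3, freq="MS"),
)

# 1) Drop the Ticker level so both frames have one column level.
flat = sp500_demo.droplevel("Ticker", axis=1)

# 2) Collapse daily rows to one row per month (last close of the month),
#    labelled with the month-start date to match fred_demo's index.
monthly_close = flat["Close"].resample("MS").last().to_frame("SP500_Close")

# 3) Join on the shared month-start index.
combined_demo = fred_demo.join(monthly_close, how="inner")
print(combined_demo)
```

Applied to the real data, the same three steps would replace the index relabelling in the previous cell, since `resample("MS")` already produces month-start labels.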

## Task 4: Modeling

In [None]:
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

XGBoostError: 
XGBoost Library (libxgboost.dylib) could not be loaded.
Likely causes:
  * OpenMP runtime is not installed
    - vcomp140.dll or libgomp-1.dll for Windows
    - libomp.dylib for Mac OSX
    - libgomp.so for Linux and other UNIX-like OSes
    Mac OSX users: Run `brew install libomp` to install OpenMP runtime.

  * You are running 32-bit Python on a 64-bit OS

Error message(s): ["dlopen(/Users/nathannakamura/Library/Python/3.9/lib/python/site-packages/xgboost/lib/libxgboost.dylib, 0x0006): Library not loaded: @rpath/libomp.dylib\n  Referenced from: <89AD948E-E564-3266-867D-7AF89D6488F0> /Users/nathannakamura/Library/Python/3.9/lib/python/site-packages/xgboost/lib/libxgboost.dylib\n  Reason: tried: '/opt/homebrew/opt/libomp/lib/libomp.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/opt/libomp/lib/libomp.dylib' (no such file), '/opt/homebrew/opt/libomp/lib/libomp.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/opt/libomp/lib/libomp.dylib' (no such file)"]
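
The message above points to a missing OpenMP runtime and suggests `brew install libomp` on macOS. Until that is installed, a guarded import is one way to keep the rest of the notebook usable. This is only a sketch: `HAS_XGBOOST` is a hypothetical flag name, and the broad `except Exception` is deliberate so that both a missing package and a failed native-library load are caught.

```python
try:
    import xgboost as xgb
except Exception as exc:
    # Covers ImportError (package not installed) and the XGBoostError raised
    # when the native library cannot be loaded.
    xgb = None
    print(f"xgboost unavailable, continuing without it: {exc}")

# Later cells can check this flag before building an XGBoost model.
HAS_XGBOOST = xgb is not None
print(f"HAS_XGBOOST = {HAS_XGBOOST}")
```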
