# Financial Market Analysis: SPY Price Prediction Using Macroeconomic Indicators and ARIMA

This notebook explores the relationship between the SPY (S&P 500 ETF) stock price and key macroeconomic indicators such as unemployment rate, interest rates, and inflation. We will:
- Fetch stock price data for SPY from Yahoo Finance.
- Fetch unemployment data from the Bureau of Labor Statistics (via the FRED API).
- Simulate additional macroeconomic indicators (interest rates and inflation).
- Visualize the relationship between stock price and macroeconomic factors.
- Use the ARIMA model to forecast future stock prices based on historical data.

This analysis can be extended to include real interest rates and inflation data from FRED, or other sources of macroeconomic information.  

---


In [1]:
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from fredapi import Fred
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

# Set your FRED API Key (replace with your actual key)
FRED_API_KEY = 'Your API KEY'  # Replace this with your FRED API key - go to https://fredaccount.stlouisfed.org, then sign up for an account and request your API key
# NB: it is a bad idea to post any of your API keys publically, use best practices for storing API Keys in Github repositories instead
fred = Fred(api_key=FRED_API_KEY)

### Download SPY Stock Data
We will use Yahoo Finance to download daily stock price data for SPY (S&P 500 ETF) over the period from January 2020 to January 2024.


In [8]:
def download_stock_data(ticker, start_date, end_date):
    stock = yf.Ticker(ticker)
    stock_data = stock.history(start=start_date, end=end_date)
    return stock_data

# Parameters
ticker = 'SPY'
start_date = '2020-09-27'
end_date = '2024-09-27'

# Download SPY stock data
stock_data = download_stock_data(ticker, start_date, end_date)
stock_data.head()


Unnamed: 0_level_0,Open,High,Low,Close,Volume,Dividends,Stock Splits,Capital Gains
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2020-09-28 00:00:00-04:00,314.424647,316.066493,313.414994,315.339935,64584600,0.0,0.0,0.0
2020-09-29 00:00:00-04:00,315.132285,315.887149,312.914832,313.622528,51304000,0.0,0.0,0.0
2020-09-30 00:00:00-04:00,314.30194,319.208643,314.103793,316.000427,104081100,0.0,0.0,0.0
2020-10-01 00:00:00-04:00,318.642506,319.633269,316.113679,318.029175,88698700,0.0,0.0,0.0
2020-10-02 00:00:00-04:00,312.990457,318.000944,312.509214,315.009735,89431100,0.0,0.0,0.0


### Download Unemployment Rate Data from FRED
The unemployment rate data will be fetched from FRED (Federal Reserve Economic Data) using the `fredapi` package. We will limit the data to the same time frame as the stock data.


In [9]:
# Set your valid FRED API Key (replace with your actual key)
FRED_API_KEY = '9fa08a80b2356aa126ed2254c6bb9e8c '  # Replace this with your FRED API key - go to https://fredaccount.stlouisfed.org, then sign up for an account and request your API key
# NB: it is a bad idea to post any of your API keys publically, use best practices for storing API Keys in Github repositories instead

fred = Fred(api_key=FRED_API_KEY)

def download_unemployment_data():
    unemployment_data = fred.get_series('UNRATE', observation_start='2020-09-27', observation_end='2024-09-27')
    unemployment_data = pd.DataFrame(unemployment_data, columns=['unemployment_rate'])
    unemployment_data.index = pd.to_datetime(unemployment_data.index)
    return unemployment_data

# Download unemployment data from FRED
unemployment_data = download_unemployment_data()
unemployment_data.tail(10)



Unnamed: 0,unemployment_rate
2023-11-01,3.7
2023-12-01,3.7
2024-01-01,3.7
2024-02-01,3.9
2024-03-01,3.8
2024-04-01,3.9
2024-05-01,4.0
2024-06-01,4.1
2024-07-01,4.3
2024-08-01,4.2


In [12]:
def download_unemployment_data():
    unemployment_data = fred.get_series('UNRATE', observation_start='2020-09-27', observation_end='2024-09-27')
    unemployment_data = pd.DataFrame(unemployment_data, columns=['unemployment_rate'])
    unemployment_data.index = pd.to_datetime(unemployment_data.index)
    return unemployment_data

# Download unemployment data from FRED
unemployment_data = download_unemployment_data()
unemployment_data.tail(10)


Unnamed: 0,unemployment_rate
2023-11-01,3.7
2023-12-01,3.7
2024-01-01,3.7
2024-02-01,3.9
2024-03-01,3.8
2024-04-01,3.9
2024-05-01,4.0
2024-06-01,4.1
2024-07-01,4.3
2024-08-01,4.2


### Generate Simulated Macroeconomic Data
We simulate additional macroeconomic data such as interest rates and inflation for the purpose of this analysis. These could be replaced with real data from other sources like FRED.


In [14]:
def download_macro_data():
    dates = pd.date_range('2020-09-27', '2024-09-27', freq='ME')
    macro_data = pd.DataFrame(index=dates)
    macro_data['interest_rate'] = [2 + (i % 10) / 10 for i in range(len(dates))]  # Simulated interest rate
    macro_data['inflation'] = [1.5 + (i % 5) / 10 for i in range(len(dates))]  # Simulated inflation
    return macro_data

# Download simulated macroeconomic data
macro_data = download_macro_data()
macro_data.head()


Unnamed: 0,interest_rate,inflation
2020-09-30,2.0,1.5
2020-10-31,2.1,1.6
2020-11-30,2.2,1.7
2020-12-31,2.3,1.8
2021-01-31,2.4,1.9


### Merge Stock Data, Macroeconomic Data, and Unemployment Rate
We will now merge the stock data, unemployment data, and simulated macroeconomic indicators into a single DataFrame, resampling the stock data to monthly values for easier comparison.


In [22]:
print("Stock Data Index Range:", stock_data.index.min(), "to", stock_data.index.max())
print("Macro Data Index Range:", macro_data.index.min(), "to", macro_data.index.max())
print("Unemployment Data Index Range:", unemployment_data.index.min(), "to", unemployment_data.index.max())


Stock Data Index Range: 2020-09-28 00:00:00 to 2024-09-26 00:00:00
Macro Data Index Range: 2020-09-30 00:00:00 to 2024-08-31 00:00:00
Unemployment Data Index Range: 2020-09-01 00:00:00 to 2024-08-01 00:00:00


In [None]:
print(stock_data.index.tz)  # If it prints None, it's tz-naive

In [24]:
# Aligning Data by findding the common range among them
merged_data = pd.merge(stock_data, macro_data, left_index=True, right_index=True, how='outer')
merged_data = pd.merge(merged_data, unemployment_data, left_index=True, right_index=True, how='outer')

In [28]:
# Drop Missing Values (typically you'd try all other options, but in this case we will drop)
merged_data.dropna(inplace=True)  # Remove rows with any NaN values


In [30]:
def merge_data(stock_data, macro_data, unemployment_data):
    # Convert all DatetimeIndexes to timezone-naive if necessary
    stock_data.index = stock_data.index.tz_localize(None)
    macro_data.index = macro_data.index.tz_localize(None)
    unemployment_data.index = unemployment_data.index.tz_localize(None)

    # Resample stock data to monthly averages
    stock_data_resampled = stock_data.resample('M').mean()

    # Merge stock data with macroeconomic and unemployment data (outer join to keep all data)
    merged_data = pd.merge(stock_data_resampled, macro_data, left_index=True, right_index=True, how='outer')
    merged_data = pd.merge(merged_data, unemployment_data, left_index=True, right_index=True, how='outer')

    return merged_data

# Merge all data
merged_data = merge_data(stock_data, macro_data, unemployment_data)
print(merged_data.tail(10))

                  Open        High         Low       Close        Volume  \
2024-04-30  507.656465  509.692718  503.879362  506.385384  7.240791e+07   
2024-05-01         NaN         NaN         NaN         NaN           NaN   
2024-05-31  518.775922  520.733132  516.748240  519.148117  5.242111e+07   
2024-06-01         NaN         NaN         NaN         NaN           NaN   
2024-06-30  536.929581  539.120503  534.973843  537.665858  4.678543e+07   
2024-07-01         NaN         NaN         NaN         NaN           NaN   
2024-07-31  550.744800  553.334584  548.065747  550.330164  4.720298e+07   
2024-08-01         NaN         NaN         NaN         NaN           NaN   
2024-08-31  544.537935  548.239312  540.933080  544.880518  5.657268e+07   
2024-09-30  558.921830  561.218318  555.059151  558.415877  5.218906e+07   

            Dividends  Stock Splits  Capital Gains  interest_rate  inflation  \
2024-04-30   0.000000           0.0            0.0            2.3        1.8   
202

  stock_data_resampled = stock_data.resample('M').mean()


In [31]:
print(stock_data.index.tz)  # If it prints None, it's tz-naive

None
