# Volatility In IBM


This notebook analyzes the volatility of IBM stock data using a GARCH model. We'll fetch the data, preprocess it, and fit the model to make predictions.

## Importing Libraries

In [47]:
# Import necessary libraries
import sqlite3
import os

import pandas as pd
import requests
from arch.univariate.base import ARCHModelResult
from settings import DB_NAME, MODEL_DIRECTORY

# AlphaVantage API Class

In [11]:
# Import the AlphaVantageAPI class
from alphavantageapi import AlphaVantageAPI

# Create an instance of the AlphaVantageAPI class
av = AlphaVantageAPI()

print("av type:", type(av))

av type: <class 'alphavantageapi.AlphaVantageAPI'>


In [16]:
# Define the stock symbol we want to retrieve data for
symbol = 'IBM'

# Use the AlphaVantageAPI object (av) to get daily time series data for the specified symbol
df_ibm = av.get_daily(symbol=symbol)

print("df_ibm shape:", df_ibm.shape)

df_ibm.head()

df_ibm shape: (6192, 5)


date,open,high,low,close,volume
2024-06-11,169.98,170.0,166.81,169.32,2951251.0
2024-06-10,169.55,170.76,168.88,170.38,3444684.0
2024-06-07,168.18,171.305,168.06,170.01,3475495.0
2024-06-06,167.38,168.44,166.8,168.2,2207263.0
2024-06-05,166.41,167.79,165.78,167.38,3049377.0


In [8]:
# Check if df_ibm is a pandas DataFrame
assert isinstance(df_ibm, pd.DataFrame)

# Ensure that the DataFrame has 5 columns
assert df_ibm.shape[1] == 5

# Verify that the index is a DatetimeIndex
assert isinstance(df_ibm.index, pd.DatetimeIndex)

# Confirm that the index name is "date"
assert df_ibm.index.name == "date"

# Check if the column names match the expected list
assert df_ibm.columns.to_list() == ['open', 'high', 'low', 'close', 'volume']

# Validate that all columns have float data types
assert all(df_ibm.dtypes == float)
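
The assertions above describe the contract that `get_daily` has to meet: a `DatetimeIndex` named `date` and five float columns. The `AlphaVantageAPI` class itself is not shown in this notebook, so the following is only a sketch of how the parsing step might look. The helper name `parse_daily` is hypothetical, and the sample payload assumes the JSON layout AlphaVantage uses for its `TIME_SERIES_DAILY` function.

```python
import pandas as pd


def parse_daily(payload: dict) -> pd.DataFrame:
    """Turn a TIME_SERIES_DAILY-style payload into a tidy DataFrame.

    Hypothetical helper: the real AlphaVantageAPI.get_daily is not shown here.
    """
    series = payload["Time Series (Daily)"]
    # One row per trading day, one column per price field.
    df = pd.DataFrame.from_dict(series, orient="index")
    # Parse the date strings and name the index "date".
    df.index = pd.to_datetime(df.index)
    df.index.name = "date"
    # Strip the numeric prefixes, e.g. "1. open" -> "open".
    df.columns = [col.split(". ", maxsplit=1)[1] for col in df.columns]
    # The API returns strings, so cast everything to float.
    return df[["open", "high", "low", "close", "volume"]].astype(float)


# Two rows copied from the output above, wrapped in the assumed JSON layout.
sample = {
    "Time Series (Daily)": {
        "2024-06-11": {"1. open": "169.98", "2. high": "170.0", "3. low": "166.81",
                       "4. close": "169.32", "5. volume": "2951251"},
        "2024-06-10": {"1. open": "169.55", "2. high": "170.76", "3. low": "168.88",
                       "4. close": "170.38", "5. volume": "3444684"},
    }
}
df_sample = parse_daily(sample)
```

Keeping the parsing separate from the HTTP request makes it possible to check the output contract without calling the API.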

# SQL Repository Class

To optimize our application's performance, we shouldn't retrieve data from the AlphaVantage API each time we need to explore or model our data. Instead, we'll store the data in a database. Given that our data is highly structured, with each DataFrame from AlphaVantage consistently containing the same five columns, a SQL database is an ideal choice.

We'll use SQLite for this purpose. For consistency, the database will always have the same name, which is specified in our .env file.

In [10]:
connection = sqlite3.connect(database=DB_NAME, check_same_thread=False)

print("connection type:", type(connection))

connection type: <class 'sqlite3.Connection'>


In [12]:
# Import the SQLRepository class
from sqlrepository import SQLRepository

# Create an instance of the SQLRepository class
repo = SQLRepository(connection=connection)

# Check if the repo object has a "connection" attribute
assert hasattr(repo, "connection")

# Verify that the value of the "connection" attribute is of type sqlite3.Connection
assert isinstance(repo.connection, sqlite3.Connection)

In [15]:
response = repo.insert_table(table_name=symbol, records=df_ibm, if_exists="replace")
print(response)

# Verify that the response variable holds a dictionary
assert isinstance(response, dict)

# Check if the response dictionary contains the expected keys
assert sorted(list(response.keys())) == ["records_inserted", "transaction_successful"]

{'transaction_successful': True, 'records_inserted': 6192}
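
The `SQLRepository` source is not included in this notebook. As a rough illustration of what `insert_table` and `read_table` could do, here is a minimal sketch built on `DataFrame.to_sql` and `pandas.read_sql`. The class name `SketchSQLRepository` is hypothetical, and the real implementation may differ, for example in how it handles errors.

```python
import sqlite3

import pandas as pd


class SketchSQLRepository:
    """Illustrative stand-in for SQLRepository; the real class is not shown."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection

    def insert_table(self, table_name: str, records: pd.DataFrame, if_exists: str = "fail") -> dict:
        # to_sql writes the index ("date") as a regular column by default.
        records.to_sql(name=table_name, con=self.connection, if_exists=if_exists)
        return {"transaction_successful": True, "records_inserted": len(records)}

    def read_table(self, table_name: str, limit=None) -> pd.DataFrame:
        # The table name is interpolated directly, so it must come from trusted code.
        sql = f"SELECT * FROM '{table_name}'"
        if limit is not None:
            sql += f" LIMIT {int(limit)}"
        # Restore the DatetimeIndex that to_sql flattened into a column.
        return pd.read_sql(sql=sql, con=self.connection, parse_dates=["date"], index_col="date")


# Round trip against an in-memory database so nothing is written to disk.
sketch_connection = sqlite3.connect(":memory:")
sketch_repo = SketchSQLRepository(sketch_connection)
frame = pd.DataFrame(
    {"open": [169.98, 169.55], "high": [170.0, 170.76], "low": [166.81, 168.88],
     "close": [169.32, 170.38], "volume": [2951251.0, 3444684.0]},
    index=pd.DatetimeIndex(["2024-06-11", "2024-06-10"], name="date"),
)
result = sketch_repo.insert_table("IBM", frame, if_exists="replace")
round_trip = sketch_repo.read_table("IBM", limit=1)
```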


In [18]:
df_IBM = repo.read_table(table_name=symbol, limit=4000)

# Check if df_IBM is a pandas DataFrame
assert isinstance(df_IBM, pd.DataFrame)

# Ensure that the DataFrame has 5 columns and 4000 rows
assert df_IBM.shape == (4000, 5)

# Verify that the index is a DatetimeIndex
assert isinstance(df_IBM.index, pd.DatetimeIndex)

# Confirm that the index name is "date"
assert df_IBM.index.name == "date"

# Check if the column names match the expected list
assert df_IBM.columns.to_list() == ['open', 'high', 'low', 'close', 'volume']

# Validate that all columns have float data types
assert all(df_IBM.dtypes == float)

# Print `df_IBM` info
print("df_IBM shape:", df_IBM.shape)
print()
print(df_IBM.info())
df_IBM.head()

df_IBM shape: (4000, 5)

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4000 entries, 2024-06-11 to 2008-07-22
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    4000 non-null   float64
 1   high    4000 non-null   float64
 2   low     4000 non-null   float64
 3   close   4000 non-null   float64
 4   volume  4000 non-null   float64
dtypes: float64(5)
memory usage: 187.5 KB
None


date,open,high,low,close,volume
2024-06-11,169.98,170.0,166.81,169.32,2951251.0
2024-06-10,169.55,170.76,168.88,170.38,3444684.0
2024-06-07,168.18,171.305,168.06,170.01,3475495.0
2024-06-06,167.38,168.44,166.8,168.2,2207263.0
2024-06-05,166.41,167.79,165.78,167.38,3049377.0


# Model Class

In [23]:
# Import the GarchModel class
from model import GarchModel

# Create an instance of the GarchModel class
model = GarchModel(symbol=symbol, repo=repo, model_directory=MODEL_DIRECTORY, use_new_data=False)

# Verify that the model's attributes are correctly set
assert model.symbol == symbol
assert model.repo == repo
assert not model.use_new_data
assert model.model_directory == MODEL_DIRECTORY

# Check that model doesn't have `data` attribute yet
assert not hasattr(model, "data")

In [25]:
# Wrangle data
model.wrangle_data(n_observations=4000)

# Check if the model object has a "data" attribute
assert hasattr(model, "data")

# Verify that the "data" attribute holds a pandas Series object
assert isinstance(model.data, pd.Series)

# Ensure the data Series has the expected shape (4000 rows)
assert model.data.shape == (4000,)

model.data.head()

date
2008-07-22    1.041505
2008-07-23   -0.369231
2008-07-24    0.370599
2008-07-25   -1.130769
2008-07-28   -1.773905
Name: return, dtype: float64
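
The output above is a series of daily percentage returns, sorted oldest first. `wrangle_data` is not shown here, so the following sketch only illustrates one way to get from the price table to such a series. The function name `wrangle_returns` is hypothetical. It assumes returns are percent changes of the `close` column, and that one extra price row is read so that exactly `n_observations` returns remain after the first (undefined) change is dropped.

```python
import pandas as pd


def wrangle_returns(df: pd.DataFrame, n_observations: int) -> pd.Series:
    """Compute the most recent n_observations daily percent returns."""
    # Sort oldest-first so each return compares a day with the previous day.
    closes = df["close"].sort_index(ascending=True)
    # Keep one extra row: the first percent change is always NaN.
    closes = closes.iloc[-(n_observations + 1):]
    returns = closes.pct_change() * 100
    returns.name = "return"
    return returns.dropna()


# Toy prices, stored newest-first like the AlphaVantage data.
prices = pd.DataFrame(
    {"close": [108.9, 99.0, 110.0, 100.0]},
    index=pd.DatetimeIndex(["2024-01-04", "2024-01-03", "2024-01-02", "2024-01-01"], name="date"),
)
toy_returns = wrangle_returns(prices, n_observations=2)
```

Expressing returns in percent rather than as fractions keeps the values on a scale that tends to be easier for the GARCH optimizer to work with.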

In [28]:
# Train the GARCH model with parameters p=1 and q=1
model.fit(p=1, q=1)

# Ensure model has a "model" attribute after training
assert hasattr(model, "model")

# Verify the trained model is of type ARCHModelResult
assert isinstance(model.model, ARCHModelResult)

# Check if the model parameter names are as expected
assert model.model.params.index.tolist() == ["mu", "omega", "alpha[1]", "beta[1]"]

# Print a summary of the trained GARCH model
model.model.summary()

Constant Mean - GARCH Model Results
Dep. Variable:,return,R-squared:,0.0
Mean Model:,Constant Mean,Adj. R-squared:,0.0
Vol Model:,GARCH,Log-Likelihood:,-6914.9
Distribution:,Normal,AIC:,13837.8
Method:,Maximum Likelihood,BIC:,13863.0
,,No. Observations:,4000
Date:,"Wed, Jun 12 2024",Df Residuals:,3999
Time:,16:26:09,Df Model:,1

Mean Model
,coef,std err,t,P>|t|,95.0% Conf. Int.
mu,0.0255,2.175e-02,1.173,0.241,"[-1.711e-02,6.813e-02]"

Volatility Model
,coef,std err,t,P>|t|,95.0% Conf. Int.
omega,0.1494,4.026e-02,3.712,2.060e-04,"[7.051e-02, 0.228]"
alpha[1],0.0969,2.371e-02,4.085,4.402e-05,"[5.039e-02, 0.143]"
beta[1],0.8342,3.328e-02,25.068,1.106e-138,"[ 0.769, 0.899]"
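
In a GARCH(1,1) model the conditional variance follows `sigma2[t] = omega + alpha * resid[t-1]**2 + beta * sigma2[t-1]`. Two quantities help when reading the coefficients: the persistence `alpha + beta`, which measures how slowly volatility shocks fade, and the long-run variance `omega / (1 - alpha - beta)`, which exists when the persistence is below 1. The short calculation below applies these standard formulas to the rounded coefficients from the summary above, so the results are approximate.

```python
import math

# Coefficients copied from the summary above (rounded to four decimals).
omega, alpha, beta = 0.1494, 0.0969, 0.8342

# Share of today's variance that carries over to tomorrow.
persistence = alpha + beta

# Level the variance forecast converges to when persistence < 1.
long_run_variance = omega / (1 - persistence)
long_run_volatility = math.sqrt(long_run_variance)

print(f"persistence: {persistence:.4f}")
print(f"long-run daily volatility: {long_run_volatility:.2f}%")
```

A persistence of about 0.93 means shocks decay gradually, and the implied long-run daily volatility is roughly 1.47 percent.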


In [29]:
# Predict volatility for the next 5 trading days (horizon=5) using the trained model
prediction = model.predict_volatility(horizon=5)

# Verify the prediction is returned as a dictionary
assert isinstance(prediction, dict)

# Ensure all keys in the prediction dictionary are strings
assert all(isinstance(k, str) for k in prediction.keys())

# Ensure all values in the prediction dictionary are floats
assert all(isinstance(v, float) for v in prediction.values())

prediction

{'2024-06-12T00:00:00': 1.1390845320453762,
 '2024-06-13T00:00:00': 1.1650803585321614,
 '2024-06-14T00:00:00': 1.1887719487589206,
 '2024-06-17T00:00:00': 1.2104123865448637,
 '2024-06-18T00:00:00': 1.2302178002464368}
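
The forecast rises day by day, which is what the GARCH(1,1) recursion predicts when the first-day variance is below its long-run level. Beyond the first step, the expected variance follows `sigma2[h+1] = omega + (alpha + beta) * sigma2[h]`. The sketch below rolls that recursion forward from the first reported value. It assumes that `predict_volatility` returns the square root of the variance forecast, which the numbers appear to be consistent with, and it uses the rounded coefficients from the model summary, so it only matches the output to about three decimals.

```python
import math

# Rounded coefficients from the fitted model's summary.
omega, alpha, beta = 0.1494, 0.0969, 0.8342

# First forecast value reported above (2024-06-12).
first_day = 1.1390845320453762


def roll_forward(sigma_1: float, horizon: int) -> list:
    """Extend a one-step volatility forecast to `horizon` steps."""
    variance = sigma_1 ** 2
    path = [sigma_1]
    for _ in range(horizon - 1):
        # Expected squared residual equals the variance, so alpha and beta combine.
        variance = omega + (alpha + beta) * variance
        path.append(math.sqrt(variance))
    return path


path = roll_forward(first_day, horizon=5)
print([round(value, 3) for value in path])
```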

In [33]:
# Save the trained model
filename = model.dump()

# Verify the filename is a string
assert isinstance(filename, str)

# Check if the model's symbol is present in the filename
assert model.symbol in filename

# Ensure the saved model file exists at the specified location
assert os.path.exists(filename)

# Print the filename (optional)
filename

'models\\2024-06-12T16_31_13.852610_IBM.pkl'
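
The filename above combines a timestamp (with colons replaced by underscores) and the ticker symbol. The `dump` and `load` methods are not shown in this notebook, so the following is only a sketch of how such persistence could work with the standard library's `pickle`. The function names `dump_model` and `load_latest` are hypothetical, and the real class may use a different serializer.

```python
import os
import pickle
from datetime import datetime
from glob import glob


def dump_model(model, symbol: str, model_directory: str) -> str:
    """Pickle `model` under a timestamped filename and return the path."""
    os.makedirs(model_directory, exist_ok=True)
    # Colons are not allowed in Windows filenames, so swap them for underscores.
    timestamp = datetime.now().isoformat().replace(":", "_")
    filepath = os.path.join(model_directory, f"{timestamp}_{symbol}.pkl")
    with open(filepath, "wb") as f:
        pickle.dump(model, f)
    return filepath


def load_latest(symbol: str, model_directory: str):
    """Load the most recently saved model for `symbol`."""
    pattern = os.path.join(model_directory, f"*_{symbol}.pkl")
    # ISO timestamps sort chronologically, so the last match is the newest.
    candidates = sorted(glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"No saved model for {symbol!r} in {model_directory!r}")
    with open(candidates[-1], "rb") as f:
        return pickle.load(f)
```

Putting the timestamp first in the filename means a plain alphabetical sort is enough to find the newest model for a symbol.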

In [34]:
model.load()

model.model.summary()

Constant Mean - GARCH Model Results
Dep. Variable:,return,R-squared:,0.0
Mean Model:,Constant Mean,Adj. R-squared:,0.0
Vol Model:,GARCH,Log-Likelihood:,-6914.9
Distribution:,Normal,AIC:,13837.8
Method:,Maximum Likelihood,BIC:,13863.0
,,No. Observations:,4000
Date:,"Wed, Jun 12 2024",Df Residuals:,3999
Time:,16:26:09,Df Model:,1

Mean Model
,coef,std err,t,P>|t|,95.0% Conf. Int.
mu,0.0255,2.175e-02,1.173,0.241,"[-1.711e-02,6.813e-02]"

Volatility Model
,coef,std err,t,P>|t|,95.0% Conf. Int.
omega,0.1494,4.026e-02,3.712,2.060e-04,"[7.051e-02, 0.228]"
alpha[1],0.0969,2.371e-02,4.085,4.402e-05,"[5.039e-02, 0.143]"
beta[1],0.8342,3.328e-02,25.068,1.106e-138,"[ 0.769, 0.899]"


# Main Class

In this section, we interact with the model through an application built with FastAPI, which exposes two endpoints: "/fit" and "/predict".

## "/fit" Path

Our first endpoint will allow users to fit a model to stock data by making a POST request to our server. Users can choose to use new data from AlphaVantage or existing data from our database. Upon making a request, users will receive a response indicating whether the operation was successful or if there was an error.

A crucial aspect of building an API is ensuring that users provide the correct parameters. Incorrect parameters can cause the app to crash. FastAPI, in combination with the pydantic library, excels at verifying that each request contains the appropriate parameters and data types. This is achieved through the use of specially defined data classes. For our "/fit" endpoint, which takes user input and returns a response, we need to define two classes: one for the input and one for the output.
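
As an illustration of the two data classes described above, here is a minimal sketch using pydantic's `BaseModel`. The class names `FitIn` and `FitOut` are hypothetical, since the application's source is not shown. The field names mirror the request and response that appear in this section.

```python
from pydantic import BaseModel, ValidationError


class FitIn(BaseModel):
    """Parameters a client sends to the /fit path (assumed field set)."""
    symbol: str
    use_new_data: bool
    n_observations: int
    p: int
    q: int


class FitOut(FitIn):
    """Echo of the input plus the outcome of the training run."""
    success: bool
    message: str


# A well-formed request passes validation.
fit_request = FitIn(symbol="IBM", use_new_data=False, n_observations=4000, p=1, q=1)

# A request with the wrong type is rejected before any model code runs.
rejected = False
try:
    FitIn(symbol="IBM", use_new_data=False, n_observations="many", p=1, q=1)
except ValidationError:
    rejected = True

fit_response = FitOut(
    symbol="IBM", use_new_data=False, n_observations=4000, p=1, q=1,
    success=True, message="Trained and saved model.",
)
```

Having the output class inherit from the input class is one way to echo the request parameters back to the client without repeating the field definitions.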

In [40]:
# URL of `/fit` path
url = "http://127.0.0.1:8008/fit"

# Data to send to path
payload = {
    'symbol': 'IBM',
    'use_new_data': False,
    'n_observations': 4000,
    'p': 1,
    'q': 1
}

# Response of post request
response = requests.post(url=url, json=payload)

# Inspect response
print("response code:", response.status_code)
response.json()

response code: 200


{'symbol': 'IBM',
 'use_new_data': False,
 'n_observations': 4000,
 'p': 1,
 'q': 1,
 'success': True,
 'message': 'Trained and saved models\\2024-06-12T17_17_30.066173_IBM.pkl. AIC: 13837.79103150345, BIC: 13862.967230063858'}

## "/predict" Path

For our "/predict" endpoint, users can make a POST request with the ticker symbol for which they want a prediction and the number of days they wish to forecast into the future. Our application will return a forecast or, in case of an error, a message explaining the issue.

The setup will be similar to our "/fit" endpoint. We'll begin by defining data classes for the input and output.

In [45]:
# URL of `/predict` path
url = "http://localhost:8008/predict"
# Data to send to path
json = {
    'symbol': 'IBM',
    'n_days': 5
}
# Response of post request
response = requests.post(url=url, json=json)
# Response JSON to be submitted to grader
submission = response.json()
# Inspect JSON
submission

{'symbol': 'IBM',
 'n_days': 5,
 'success': True,
 'forecast': {'2024-06-12T00:00:00': 1.1390845320453762,
  '2024-06-13T00:00:00': 1.1650803585321614,
  '2024-06-14T00:00:00': 1.1887719487589206,
  '2024-06-17T00:00:00': 1.2104123865448637,
  '2024-06-18T00:00:00': 1.2302178002464368},
 'message': ''}
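
The response above has the same keys whether or not the forecast succeeds, with `success` and `message` reporting the outcome. The endpoint's source is not shown, so the following sketch only illustrates how a handler could assemble such a response. The function `build_predict_response` and the `forecaster` callable are hypothetical names introduced for this example.

```python
def build_predict_response(symbol: str, n_days: int, forecaster) -> dict:
    """Wrap a forecasting call so errors become messages instead of crashes."""
    response = {"symbol": symbol, "n_days": n_days}
    try:
        forecast = forecaster(symbol, n_days)
    except Exception as err:
        # Report the problem to the client in the same response shape.
        response.update(success=False, forecast={}, message=str(err))
    else:
        response.update(success=True, forecast=forecast, message="")
    return response


def stub_forecaster(symbol: str, n_days: int) -> dict:
    """Stand-in for loading a saved model and predicting volatility."""
    if symbol != "IBM":
        raise FileNotFoundError(f"No saved model for {symbol}")
    return {"2024-06-12T00:00:00": 1.1390845320453762}


ok = build_predict_response("IBM", 1, stub_forecaster)
failed = build_predict_response("XYZ", 1, stub_forecaster)
```

Returning a consistent shape in both cases lets a client check `success` first and then read either `forecast` or `message`.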