---
title: "Option Pricing with a Neural Network"
author: "Xipeng Du"
date: "2024-03-13"
categories: [option pricing, project]
---


## Introduction: Basic Idea of Option Pricing

An option is a financial contract that gives the buyer the right to buy or sell a certain quantity of an asset at a specific strike price on or before the maturity date. In exchange, the buyer pays the seller a premium, which we call the "option price" here. Option pricing is the task of estimating the fair value of an option given its strike price, its time to maturity, and the risk of the underlying stock. Traditionally, traders use models such as the Black-Scholes model to estimate the price of an option, but these models have limitations, and their underlying assumptions do not always hold. Using machine learning, we build a model that estimates the fair value of an option from the current stock price, the trading volume, the option's Greeks, and the implied volatility. We use PyTorch to fit the model to past option price data for SPY, an ETF that tracks the S&P 500 index.
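
For reference, the kind of closed-form model mentioned above can be sketched in a few lines of standard-library Python. This is only an illustration of the Black-Scholes formula for European options, not part of our pipeline: the function names and the example inputs are made up, and the sketch assumes no dividends and a constant volatility and interest rate.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, years: float,
                       rate: float, vol: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * math.sqrt(years))
    d2 = d1 - vol * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def black_scholes_put(spot: float, strike: float, years: float,
                      rate: float, vol: float) -> float:
    """European put price obtained from the call price via put-call parity."""
    call = black_scholes_call(spot, strike, years, rate, vol)
    return call - spot + strike * math.exp(-rate * years)


# Illustrative inputs only: at-the-money option, one year to maturity
call_price = black_scholes_call(spot=100.0, strike=100.0, years=1.0, rate=0.05, vol=0.2)
put_price = black_scholes_put(spot=100.0, strike=100.0, years=1.0, rate=0.05, vol=0.2)
print(f"call: {call_price:.4f}, put: {put_price:.4f}")
```

The constant-volatility assumption is one of the limitations a data-driven model tries to work around.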


## Data Acquisition

The first step of our project is acquiring the data. We found a dataset [link] on Kaggle and traced it to its source, optiondx [link]. This website offers free text-based datasets of end-of-day option data. The following are its features. Then, realizing that we need stock prices to compute the target, we use the `yfinance` package, which calls Yahoo Finance's unofficial API. Its `yf.download` function downloads data for a given time range and gives us the adjusted closing price for each trading day.

In [None]:
# getting stock prices for target evaluation
import pandas as pd
import yfinance as yf

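# auto_adjust=False keeps the 'Adj Close' column, which newer yfinance releases drop by default
prices = yf.download(['SPY'], start="2023-06-01", end="2023-12-31", auto_adjust=False)
# squeeze/rename/to_frame yields a single 'Adj Close' column even if yfinance adds a ticker level
target = prices['Adj Close'].squeeze().rename('Adj Close').to_frame()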
target

## Data Cleaning

In [None]:
# importing necessary packages
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sqlite3
import math

In [None]:
# Read in the monthly end-of-day files from Jan 2023 to May 2023
months = [202301, 202302, 202303, 202304, 202305]
df_2023_h1 = pd.concat(
    [pd.read_csv(f'data/spy_eod_{m}.txt') for m in months],
    ignore_index=True,
)
# the raw column names carry stray whitespace
df_2023_h1.columns = df_2023_h1.columns.str.strip()

# keep only expiration dates from 2023-06-01 to 2023-12-31, the range covered by
# the downloaded SPY prices (the raw date strings start with a space, so the bounds do too)
df_2023_h1 = df_2023_h1[df_2023_h1['[EXPIRE_DATE]'] <= ' 2023-12-31']
df_2023_h1 = df_2023_h1[df_2023_h1['[EXPIRE_DATE]'] >= ' 2023-06-01']
# drop=True avoids keeping the old index as an extra 'index' column
df_2023_h1 = df_2023_h1.reset_index(drop=True)

In this step we strip the whitespace from the column names for easier access and remove the entries for which the target cannot be calculated, that is, options that expire outside the date range of our downloaded stock prices.
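
The effect of these two steps can be seen on a toy example. The rows below are made up and only mimic the comma-plus-space layout that the code above implies for the raw files. The sketch relies on ISO-formatted date strings sorting in chronological order, which is why a plain string comparison is enough for the filter.

```python
import io

import pandas as pd

# Made-up rows in a comma-plus-space layout (an assumption about the raw files)
raw = io.StringIO(
    "[QUOTE_DATE], [EXPIRE_DATE], [STRIKE]\n"
    " 2023-01-03, 2023-03-17, 400\n"
    " 2023-01-03, 2023-07-21, 410\n"
    " 2023-01-03, 2024-01-19, 420\n"
)
toy = pd.read_csv(raw)
print(list(toy.columns))  # column names after the first keep a leading space

# Strip the whitespace so columns can be accessed as '[EXPIRE_DATE]'
toy.columns = toy.columns.str.strip()

# The values still start with a space, so the string bounds do as well
in_range = (toy['[EXPIRE_DATE]'] >= ' 2023-06-01') & (toy['[EXPIRE_DATE]'] <= ' 2023-12-31')
kept = toy[in_range].reset_index(drop=True)
print(kept)
```

Only the row expiring in July 2023 survives the filter, because it is the only one whose expiration falls inside the price range.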

In [None]:
# convert the string dates to datetime64 (stripping the leading spaces first)
df_2023_h1['[QUOTE_DATE]'] = pd.to_datetime(df_2023_h1['[QUOTE_DATE]'].str.strip())
df_2023_h1['[EXPIRE_DATE]'] = pd.to_datetime(df_2023_h1['[EXPIRE_DATE]'].str.strip())

# merge our adj close stock data on EXPIRE_DATE, so that each option row
# gets the underlying's adjusted close on its expiration date
target['[EXPIRE_DATE]'] = pd.to_datetime(target.index)

# inner join: rows whose expiration date has no price in `target` are dropped
df_2023_h1 = pd.merge(df_2023_h1, target, on='[EXPIRE_DATE]')


## Setting targets

Next, we have to set a target for our machine learning model. We first used a naive estimate of the option price, then focused on the value of the option at expiration, with price = K - S, where K is the strike price and S is the price of the underlying at expiration.

Later, we set our target to the discounted version of this value, discounted price = (K - S) * e^(-rt), where r is the risk-free interest rate and t is the time to expiration in years. Note that at expiration a call pays max(S - K, 0) and a put pays max(K - S, 0), so K - S is the put payoff without the floor at zero.

In [None]:
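# Add new columns for the target: -rt, the price difference, and the discount factor
# drop stray index columns such as 'Unnamed: 0' before adding new ones
df_2023_h1 = df_2023_h1.loc[:, ~df_2023_h1.columns.str.contains('^Unnamed')].copy()
# time to expiration in seconds, from the UNIX timestamps
seconds_to_expiry = pd.to_numeric(df_2023_h1['[EXPIRE_UNIX]']) - pd.to_numeric(df_2023_h1['[QUOTE_UNIXTIME]'])
# -rt with r = 0.04 as the risk-free rate and t in years
df_2023_h1['-rt'] = -0.04 * seconds_to_expiry / (3600 * 24 * 365)
# K - S: strike minus the underlying's adjusted close at expiration
df_2023_h1['price_diff'] = df_2023_h1['[STRIKE]'] - df_2023_h1['Adj Close']
df_2023_h1['exp(-rt)'] = np.exp(df_2023_h1['-rt'])
df_2023_h1['discounted_price'] = df_2023_h1['price_diff'] * df_2023_h1['exp(-rt)']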
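
To make the target concrete, here is the same calculation for a single hypothetical contract, using only the standard library. The function name, the strike, the closing price, and the timestamps are invented for illustration; the 4% rate and the 365-day year mirror the cell above.

```python
import math

SECONDS_PER_YEAR = 3600 * 24 * 365


def discounted_price_diff(strike: float, price_at_expiry: float,
                          quote_unix: int, expire_unix: int,
                          rate: float = 0.04) -> float:
    """(K - S) * exp(-r * t), with t in years derived from UNIX timestamps."""
    years = (expire_unix - quote_unix) / SECONDS_PER_YEAR
    return (strike - price_at_expiry) * math.exp(-rate * years)


# Hypothetical contract: strike 450, underlying closes at 440 half a year later
quote_unix = 1_672_531_200                        # 2023-01-01 00:00:00 UTC
expire_unix = quote_unix + SECONDS_PER_YEAR // 2  # half a 365-day year later
value = discounted_price_diff(450.0, 440.0, quote_unix, expire_unix)
print(round(value, 4))  # 10 * exp(-0.02), roughly 9.80
```

The discount factor shrinks the raw difference of 10 only slightly here, since half a year at 4% is a small exponent.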

We keep the columns we need, replace blank entries with 0, and standardize all numerical feature columns with scikit-learn's `StandardScaler`.

In [None]:
import warnings
from sklearn.preprocessing import StandardScaler

# keep the timestamps, the option features, the adjusted close at expiration,
# and the target computed above
df_2023_h1 = df_2023_h1[['[QUOTE_UNIXTIME]', '[EXPIRE_UNIX]', '[QUOTE_DATE]', '[EXPIRE_DATE]', '[STRIKE]', '[UNDERLYING_LAST]',
       '[C_DELTA]', '[C_GAMMA]', '[C_VEGA]', '[C_THETA]', '[C_RHO]', '[C_IV]', '[C_VOLUME]', '[C_BID]', '[C_ASK]',
       '[P_DELTA]', '[P_GAMMA]', '[P_VEGA]', '[P_THETA]', '[P_RHO]', '[P_IV]', '[P_VOLUME]', '[P_BID]', '[P_ASK]',
       'Adj Close', 'discounted_price']]

# replace blank (whitespace-only) entries with 0
df_2023_h1 = df_2023_h1.replace(r'^\s*$', 0, regex=True)

# Basic normalization and standardization
# tentative list of columns to scale; the target and 'Adj Close' are left unscaled
numeric_cols = ['[QUOTE_UNIXTIME]', '[EXPIRE_UNIX]', '[STRIKE]', '[UNDERLYING_LAST]', '[C_DELTA]', '[C_GAMMA]', '[C_VEGA]',
       '[C_THETA]', '[C_RHO]', '[C_IV]', '[C_VOLUME]', '[C_BID]', '[C_ASK]', '[P_DELTA]', '[P_GAMMA]', '[P_VEGA]', '[P_THETA]',
       '[P_RHO]', '[P_IV]', '[P_VOLUME]', '[P_BID]', '[P_ASK]']
# run the scaling inside a block that ignores any warnings it raises
with warnings.catch_warnings():
    warnings.filterwarnings("ignore")
    scaler = StandardScaler()
    df_2023_h1[numeric_cols] = scaler.fit_transform(df_2023_h1[numeric_cols])
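
As a rough picture of what standardization does to one column, the sketch below rescales a handful of made-up strike prices to zero mean and unit variance in plain Python. The `standardize` helper is hypothetical and is not scikit-learn's implementation; it only uses the same population standard deviation (dividing by n rather than n - 1) and assumes the column is not constant.

```python
import statistics


def standardize(values: list[float]) -> list[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (divides by n)
    return [(v - mean) / std for v in values]


# Made-up strike prices, for illustration only
strikes = [400.0, 410.0, 420.0, 430.0, 440.0]
scaled = standardize(strikes)

print([round(s, 4) for s in scaled])
print(round(statistics.fmean(scaled), 4), round(statistics.pstdev(scaled), 4))
```

After scaling, every feature is on a comparable scale, which generally makes gradient-based training better behaved.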


## Data Structures




## Baseline model: Linear Regression

We need a baseline for the neural network to beat: any model that cannot beat a linear approximation of the stock market in the long run is a failure.
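
A minimal sketch of such a baseline, assuming ordinary least squares and RMSE as the comparison metric (both are our choices for illustration, not results from the project). The data below is synthetic and merely stands in for the scaled feature matrix and the discounted-price target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the features and target (not real option data)
n_samples, n_features = 200, 3
X = rng.normal(size=(n_samples, n_features))
true_weights = np.array([2.0, -1.0, 0.5])
y = X @ true_weights + 3.0 + rng.normal(scale=0.1, size=n_samples)

# Hold out the last 25% of the rows for evaluation
split = int(0.75 * n_samples)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]


def fit_linear(features: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Ordinary least squares; the first coefficient is the intercept."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def predict_linear(coef: np.ndarray, features: np.ndarray) -> np.ndarray:
    return coef[0] + features @ coef[1:]


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


coef = fit_linear(X_train, y_train)
baseline_rmse = rmse(y_test, predict_linear(coef, X_test))
# Even simpler reference point: always predict the training mean
mean_rmse = rmse(y_test, np.full_like(y_test, y_train.mean()))
print(f"linear baseline RMSE: {baseline_rmse:.3f}, predict-the-mean RMSE: {mean_rmse:.3f}")
```

Any later model would then be judged by whether its held-out error falls below `baseline_rmse` on the same split.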


## AutoGluon Models


## Neural Network Attempt 1




## Neural Network Attempt 2

