# Heston model machine learning (almost)

I tried fitting the Heston model to implied volatility data and ran into performance problems and suspicious risk-free rate values. Because fitting the parameters took so long, I abandoned this model in favor of SABR.

## Import data

In [7]:
import os
import pandas as pd
import datetime as dt


raw_data = pd.read_csv(os.path.join('/mnt/c/Users/Steve/implied_vol_machine_learning', 'options_20220824.csv'))

# For simplicity, let's only analyze call options
# Also clean out some bad data
call_data = raw_data.loc[(raw_data["Type"] == "call") & (raw_data["Ask"] < 99000.0)].copy()

# Add some columns
call_data.loc[:, "moneyness"] = call_data["Strike"] / call_data["UnderlyingPrice"]
call_data.loc[:, "implied_vol"] = call_data["IV"]
call_data.loc[:, "maturity"] = (pd.to_datetime(call_data["Expiration"]) - pd.to_datetime(call_data[" DataDate"])).dt.days / 365
call_data.loc[:, "ticker"] = call_data["UnderlyingSymbol"]
call_data.loc[:, "Mid"] = (call_data["Bid"]+call_data["Ask"])/2
print(f"Quote dates: {call_data[' DataDate'].unique()}")
print(f"Moneyness: min={call_data['moneyness'].min()}, max={call_data['moneyness'].max()}")
print(f"Maturity: min={call_data['maturity'].min()}, max={call_data['maturity'].max()}")

# To avoid errors later, limit the value of moneyness; first report what gets cut
deep_out_of_money = call_data.loc[call_data["moneyness"] > 2.5, ["Mid"]]
print(f"Deep out of money OptionValues: min={deep_out_of_money['Mid'].min()} max={deep_out_of_money['Mid'].max()}, count={deep_out_of_money.shape[0]}")
# Drop the columns we don't need for this exercise
model_input_data = call_data.loc[call_data["moneyness"] <= 2.5, ["ticker", "moneyness", "maturity", "implied_vol"]].copy()


Quote dates: ['08/24/2022 16:00']
Moneyness: min=0.005609846402405502, max=8750.0
Maturity: min=-0.0027397260273972603, max=5.315068493150685
Deep out of money OptionValues: min=0.0 max=68.7, count=34451
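
The negative minimum maturity above equals -1/365. One plausible explanation, assuming the `Expiration` column holds plain dates with no time of day, is that options expiring on the quote date are compared against the 16:00 quote timestamp, and the whole-day part of a -16 hour difference is -1. The sketch below reproduces that with the standard library only. The expiration format and the sample expirations are assumptions for illustration, and `maturity_by_date` is a hypothetical alternative rather than what the notebook does.

```python
from datetime import datetime

QUOTE_FMT = "%m/%d/%Y %H:%M"  # matches '08/24/2022 16:00' in the output above
EXPIRY_FMT = "%m/%d/%Y"       # assumed format of the Expiration column


def maturity_with_time(expiration, quote_date):
    """Mimic the notebook: timestamp difference, whole days only, over 365."""
    delta = datetime.strptime(expiration, EXPIRY_FMT) - datetime.strptime(quote_date, QUOTE_FMT)
    return delta.days / 365


def maturity_by_date(expiration, quote_date):
    """Alternative: compare calendar dates only, ignoring the 16:00 timestamp."""
    exp = datetime.strptime(expiration, EXPIRY_FMT).date()
    quote = datetime.strptime(quote_date, QUOTE_FMT).date()
    return (exp - quote).days / 365


quote = "08/24/2022 16:00"
expirations = ["08/24/2022", "08/26/2022", "09/16/2022"]  # made-up sample dates

with_time = [maturity_with_time(e, quote) for e in expirations]
by_date = [maturity_by_date(e, quote) for e in expirations]

# Keep only options that still have time to expiry
live = [e for e, m in zip(expirations, by_date) if m > 0]
```

Whichever convention is used, filtering on `maturity > 0` before fitting keeps expired or same-day options away from the volatility surface call.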


## Create functions

Two functions: the first produces a Heston implied volatility surface from the model parameters, maturity, and moneyness. The second evaluates the first over arrays of quotes and returns a single fitting error against the observed implied volatilities. Despite its name, `heston_vol_mse` returns the square root of the summed squared errors rather than a mean. One open question I have, unrelated to the exercise at hand, is how Heston can produce a volatility surface without any option prices. SABR is somehow capable of that as well. I'd love to learn how.
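
On the open question, my understanding (hedged, not checked against the QuantLib source) is that the model parameters alone determine a model price for every strike and maturity, and the implied volatility is then the Black-Scholes volatility that reproduces that price. Market quotes only enter when calibrating the parameters. The sketch below shows just the inversion step with the standard library. To stay short it uses a Black-Scholes price as a stand-in for the Heston price, and all function names are illustrative.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot, strike, t, r, q, vol):
    """Black-Scholes price of a European call with continuous dividend yield q."""
    d1 = (log(spot / strike) + (r - q + 0.5 * vol * vol) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * exp(-q * t) * norm_cdf(d1) - strike * exp(-r * t) * norm_cdf(d2)


def implied_vol(price, spot, strike, t, r, q, lo=1e-6, hi=5.0, tol=1e-10):
    """Invert bs_call by bisection; the call price is increasing in vol."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, t, r, q, mid) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Stand-in for a model price; with Heston this would come from the model's pricing formula
model_price = bs_call(1.0, 1.0, 1.0, 0.01, 0.0, 0.2)
recovered = implied_vol(model_price, 1.0, 1.0, 1.0, 0.01, 0.0)
```

Repeating the inversion over a grid of strikes and maturities gives a surface without any market price being involved.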

In [14]:
import QuantLib as ql
import numpy as np

today = ql.Date(24, 8, 2022)
calendar = ql.NullCalendar()
day_count = ql.Actual365Fixed()
spot_quote = ql.QuoteHandle(ql.SimpleQuote(1))

# Create model
# Not sure how to update some of these QuantLib objects, so just re-creating them every time
def heston_vol_surface(maturity, moneyness, r, y, v0, kappa, theta, rho, sigma):
    """Return Heston implied vols for paired maturities (in years) and moneyness values."""
    risk_free_ts = ql.YieldTermStructureHandle(ql.FlatForward(today, r, day_count))
    dividend_ts = ql.YieldTermStructureHandle(ql.FlatForward(today, y, day_count))
    # Note the argument order: ql.HestonProcess takes sigma (vol of vol) before rho
    heston_process = ql.HestonProcess(risk_free_ts, dividend_ts, spot_quote, v0, kappa, theta, sigma, rho)
    heston_model = ql.HestonModel(heston_process)
    # Local name differs from the function name to avoid shadowing it
    vol_surface = ql.HestonBlackVolSurface(ql.HestonModelHandle(heston_model))
    # Spot is fixed at 1, so moneyness (Strike / UnderlyingPrice) is passed as the strike
    return np.array([vol_surface.blackVol(float(mat), float(money)) for mat, money in zip(maturity, moneyness)])

# Kick the tires (basic test)
vol1 = heston_vol_surface(
    maturity = [0.5, 1.0, 1.0],  # one maturity per moneyness entry; zip() pairs them
    moneyness = [1.0, 1.0, 1.0],
    r = 0.01,
    y = 0.0,
    v0 = 0.01,
    kappa = 1.0,
    theta = 0.04,
    rho = -0.3,
    sigma = 0.4
)
print(f'Test value: {vol1}')

# Create error function (named MSE, but it returns the root of the summed squared errors)
def heston_vol_mse(model_params, maturity, moneyness, implied_vol):
    r, y, v0, kappa, theta, rho, sigma = model_params
    diff = heston_vol_surface(maturity, moneyness, r, y, v0, kappa, theta, rho, sigma) - implied_vol
    return np.sqrt((diff * diff).sum())

Test value: [0.11589544 0.13062149 0.13062149]
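
The quantity returned by `heston_vol_mse` is the square root of the summed squared errors. For a fixed set of quotes it has the same minimizer as a mean-based error, but it grows with the number of quotes, so values are not comparable across tickers. A minimal standard-library sketch of the difference follows. The function names and the toy numbers are illustrative, not taken from the data.

```python
from math import sqrt


def root_sum_squares(model, market):
    """Square root of the summed squared differences (what the notebook computes)."""
    return sqrt(sum((m - k) ** 2 for m, k in zip(model, market)))


def rmse(model, market):
    """Root mean squared error: the same quantity scaled by 1 / sqrt(n)."""
    return root_sum_squares(model, market) / sqrt(len(market))


# Two toy tickers, each off by exactly 0.01 vol on every quote
few_model, few_market = [0.21] * 4, [0.20] * 4
many_model, many_market = [0.21] * 100, [0.20] * 100

rss_few = root_sum_squares(few_model, few_market)
rss_many = root_sum_squares(many_model, many_market)
rmse_few = rmse(few_model, few_market)
rmse_many = rmse(many_model, many_market)
```

Here the root of summed squares is five times larger for the ticker with 100 quotes, while the RMSE is the same 0.01 for both.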


## Fit model parameters

Fitting a single ticker took 13 s or more (about 19 s in the run shown below), which was prohibitive for this exercise. Perhaps through asyncio, or by reproducing the QuantLib calculations myself in a form better suited to this purpose, I may be able to make Heston feasible. For now, I have abandoned it for SABR.

In addition, I found that when solving for the risk-free rate as I would any other parameter, the solver typically pushed it to one of its bounds. I attempted to get the risk-free rate independently (and failed), which you can see in RiskFreeRate.ipynb. For lack of a better option within the scope of this exercise, I just set the risk-free rate to zero. Note that in the cell below zero is only the starting value: `r` is still among the fitted parameters and free to move within its bounds.
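
If the risk-free rate should stay at zero instead of being fitted, one option is to wrap the objective so that the optimizer only sees the remaining parameters. The sketch below is a minimal version of that idea. `fix_params` is a hypothetical helper, not part of SciPy, and `toy_objective` is a stand-in for `heston_vol_mse` so the example runs without QuantLib.

```python
def fix_params(objective, fixed, n_params):
    """Pin some parameters of an objective and expose only the free ones.

    `fixed` maps a parameter index to the value it should keep.
    Returns the wrapped objective and the list of free indices.
    """
    free_idx = [i for i in range(n_params) if i not in fixed]

    def wrapped(free_values, *args):
        full = [0.0] * n_params
        for i, value in fixed.items():
            full[i] = value
        for i, value in zip(free_idx, free_values):
            full[i] = value
        return objective(full, *args)

    return wrapped, free_idx


# Toy objective standing in for heston_vol_mse: squared distance to a target
def toy_objective(params, target):
    return sum((p - t) ** 2 for p, t in zip(params, target))


# Pin the first parameter (playing the role of r) at 0.0
wrapped, free_idx = fix_params(toy_objective, fixed={0: 0.0}, n_params=3)
value = wrapped([2.0, 3.0], [1.0, 2.0, 3.0])
```

With the notebook's names, `wrapped` would be passed to `minimize` in place of `heston_vol_mse`, together with `[start_values[i] for i in free_idx]` and `[bounds[i] for i in free_idx]`.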

In [15]:
import time
from scipy.optimize import minimize

r = 0.0; y = 0.0; v0 = 0.01; kappa = 0.5; theta = 0.3; rho = -0.5; sigma = 0.8
start_values = [r, y, v0, kappa, theta, rho, sigma]
bounds = [(-1.0, 1.0), (-1.0, 1.0), (0.001, 1.0), (0.001, 10.0), (0.001, 1.0), (-1.0, 1.0), (0.001, 1.0)]

model_params_by_ticker = {}
error_tickers = []
start = time.time()
for ticker, ticker_data in model_input_data.groupby('ticker'):
    res = minimize(
        heston_vol_mse,
        start_values,
        args=(ticker_data['maturity'], ticker_data['moneyness'], ticker_data['implied_vol']),
        bounds=bounds,
        tol=1e-3,
        method="Powell",
    )
    if res.success:
        model_params_by_ticker[ticker] = res.x
    else:
        error_tickers.append(ticker)
    break  # stop after the first ticker; fitting every ticker was too slow
end = time.time()

print(end-start)
print(error_tickers)
print(model_params_by_ticker)

19.17981719970703
[]
{'A': array([-0.11961292, -0.24831387,  0.098184  ,  0.40740982,  0.25454315,
       -0.52045099,  0.6914804 ])}
