# Prototyping trading strategies in Python

This workshop introduces a simple idea for finding inefficient closing auctions. The idea is based on the assumption that some market participants enter their positions in the morning and exit them in the closing auction. If many participants hold the same positions (long or short), their exits might make the closing price inefficient. We could exploit that by entering a position in the closing auction and exiting the next day in the opening auction.

We will learn how to:

- download data from [finance.google.com](https://finance.google.com);
- transform data using the [Pandas](https://pandas.pydata.org) library;
- construct a simple strategy;
- test and optimise it on historical data;
- plot and interpret the results.

<hr>

# Disclaimer #1 (from our legal team with ❤️)

LEGAL NOTICE / DISCLAIMER

This presentation (the “Presentation”) has been prepared by WOOD & Company Financial Services, a.s., (“WOOD”) and Quantlane s.r.o. (“Quantlane”) solely for educational purposes. The Presentation may not be distributed, reproduced, or used for any other purposes. The Presentation is of purely educational nature and has no legal value.

No representation or warranty, express or implied, is made and no responsibility is or will be accepted by WOOD or Quantlane for the accuracy, reliability or completeness of any data used or presented in this Presentation. The Presentation does not purport to contain all information which may be material for the subject herein.

Neither the receipt of the Presentation, nor any information subsequently provided in connection with it shall be relied upon as constituting advice, especially investment advice, to the recipients. Neither WOOD and/or Quantlane, nor its respective directors, agents or employees, accept any liability to any person in relation to the distribution or possession of the Presentation.

The Presentation is neither an offer nor the solicitation of an offer to sell or purchase any investment. All estimates, opinions and other information contained herein are subject to change without notice and are provided in good faith but without legal responsibility or liability. Opinion may be personal to the author and may not reflect the opinions of WOOD or Quantlane.

# Disclaimer #2

**Don't trade this at home.** We are making a lot of simplifying assumptions here. The purpose of this workshop is to give you a taste of what it's like to verify trading ideas in Python. The purpose is _not_ to give you a working strategy, or even a working idea.

# Let's come up with a trading strategy 💡✨

## First, some background on stock market auctions
Every trading day stock markets enter regimes called _auctions_. There is usually one opening and one closing auction per stock per day. Sometimes there are scheduled intraday auctions, and sometimes unscheduled auctions are triggered by unusual market circumstances. 

### Opening auction
The opening auction lets market participants place buy and sell orders knowing that they will (potentially) be matched at some predefined time, e.g. 9:00 am. No trading is possible before the auction. Standard continuous trading usually starts immediately after the opening auction.

### Closing auction
There is a closing auction at the end of every trading day for each stock on each exchange. Every market participant can enter buy and sell orders, and all orders are matched at a certain, possibly randomised, time.

### Matching orders
Orders are matched at a price that will maximise the total quantity traded in the auction. Orders are filled if they have a counterparty:

- If you want to buy at a price below the auction price, you are not filled.
- If you want to buy at a price above the auction price, you are filled.
- If you want to buy at exactly the auction price, you may or may not be filled, depending on your queue position in the market.
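
The matching rule above can be sketched in a few lines of Python. This is a simplified illustration, not any exchange's actual algorithm: the order book, the function name `auction_price` and the tie-breaking rule (the lowest candidate price wins) are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Order = Tuple[float, int]  # (limit price, quantity)


def auction_price(buys: List[Order], sells: List[Order]) -> Tuple[Optional[float], int]:
    '''
    Return the price that maximises the matched quantity, together with that quantity.
    Ties are broken by taking the lowest candidate price (an arbitrary choice for this sketch).
    '''
    best_price: Optional[float] = None
    best_quantity = 0
    # Only limit prices that appear in the book need to be checked.
    for price in sorted({p for p, _ in buys + sells}):
        demand = sum(q for p, q in buys if p >= price)   # buyers willing to pay at least `price`
        supply = sum(q for p, q in sells if p <= price)  # sellers willing to accept at most `price`
        matched = min(demand, supply)
        if matched > best_quantity:
            best_price, best_quantity = price, matched
    return best_price, best_quantity


# Hypothetical order book.
buys = [(101.0, 100), (100.0, 200), (99.0, 300)]
sells = [(99.0, 150), (100.0, 100), (102.0, 400)]
price, quantity = auction_price(buys, sells)
print(price, quantity)
```

In this made-up book the auction price is 100 with 250 shares matched: the buy order at 101 is filled, the buy order at 99 is not, and the buy order at exactly 100 is only partially filled.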

### Recommended reading
- https://www.investopedia.com/articles/investing/091113/auction-method-how-nyse-stock-prices-are-set.asp
- https://www.nyse.com/publicdocs/nyse/markets/nyse/NYSE_Opening_and_Closing_Auctions_Fact_Sheet.pdf


## Our simple strategy

**Remember that there should be a fundamental, robust idea behind every strategy.** Blind data mining yields only short-lived results.

The fundamental idea is based on the assumption that some market participants (for example, momentum traders who do not want to hold positions overnight) enter their positions in the morning and exit them in the closing auction. If many participants hold the same positions (long or short), their exits might make the closing price inefficient. We could exploit that by entering a position in the closing auction and exiting the next day in the opening auction.

What we need for this strategy is an estimate of the efficient price. We decided to use the Volume Weighted Average Price (VWAP) over a time interval shortly before the closing time. Let $T$ be the closing time. Our VWAP spans from $T - \delta_1$ to $T - \delta_2$, where $\delta_1 > \delta_2$.

$$
VWAP = \frac{\sum_{\delta \in (\delta_2, \delta_1)} p_{T-\delta} \cdot v_{T-\delta}}{\sum_{\delta \in (\delta_2, \delta_1)} v_{T-\delta}}
$$

where $p_t$ and $v_t$ are the price and the volume traded at time $t$.

We do not use data in $(T-\delta_2, T)$ because the same market participants that might move the price in the closing auction might also move it in the last few moments before the auction.

To keep this tutorial simple, we pick a fixed $\delta_1$ and $\delta_2$.
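
As a rough illustration of the VWAP window, the sketch below computes it on a handful of made-up minute bars. The data, the column names and the helper `vwap_before_close` are assumptions for this example only; the notebook's own implementation appears in the data transformation section.

```python
import pandas as pd


def vwap_before_close(bars: pd.DataFrame, delta_1: int, delta_2: int) -> float:
    '''VWAP over bars strictly between T - delta_1 and T - delta_2 seconds, where T is the last bar's time.'''
    closing_time = bars['time'].iloc[-1]
    window = bars[
        (bars['time'] > closing_time - pd.Timedelta(seconds = delta_1))
        & (bars['time'] < closing_time - pd.Timedelta(seconds = delta_2))
    ]
    return (window['price'] * window['volume']).sum() / window['volume'].sum()


# Made-up minute bars from 15:55 to the close at 16:00.
bars = pd.DataFrame({
    'time': pd.date_range('2018-01-02 15:55', periods = 6, freq = 'min'),
    'price': [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
    'volume': [10, 20, 30, 40, 50, 60],
})
# The window is (15:56, 15:59), exclusive on both ends, so only the 15:57 and 15:58 bars count.
vwap = vwap_before_close(bars, delta_1 = 240, delta_2 = 60)
print(round(vwap, 4))
```

Only the two bars inside the window contribute, so the result is (102 · 30 + 103 · 40) / 70.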

The strategy enters a long trade when the price in the closing auction is below the VWAP by some margin, but not by too much. It enters short trades symmetrically, when the closing price is above the VWAP. It exits its positions in the next day's opening auction. These margins (the minimal and maximal price inefficiency) are our parameters of choice.
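
The entry rule can be written down as a small function. This is only a sketch: the name `classify_signal`, its parameter names and the prices are made up for illustration, and the vectorised version used later in the notebook works on whole dataframes instead.

```python
def classify_signal(closing_price: float, vwap: float, min_margin: float, max_margin: float) -> str:
    '''Return 'Buy', 'Sell' or 'None' from the relative gap between the closing price and the VWAP.'''
    relative_diff = closing_price / vwap - 1
    if -max_margin < relative_diff < -min_margin:
        return 'Buy'   # the close is below the VWAP by a moderate amount
    if min_margin < relative_diff < max_margin:
        return 'Sell'  # the close is above the VWAP by a moderate amount
    return 'None'      # the gap is either too small or too large


# Made-up prices with a 0.25% minimal and a 2.5% maximal margin.
print(classify_signal(99.0, 100.0, 0.0025, 0.025))   # Buy
print(classify_signal(101.0, 100.0, 0.0025, 0.025))  # Sell
print(classify_signal(100.1, 100.0, 0.0025, 0.025))  # None
print(classify_signal(95.0, 100.0, 0.0025, 0.025))   # None
```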

In [None]:
import inefficiency_example
inefficiency_example.generate_and_plot()

# Setting up

In [None]:
%load_ext autoreload
%autoreload 2

from IPython.display import display, HTML
display(HTML("<style>.container { min-width: 90% !important; }</style>"))

In [None]:
from typing import Any, Iterator, List, Set

import concurrent.futures
import csv
import datetime
import itertools
import os

from plotly.offline import init_notebook_mode, iplot
import numpy as np
import pandas as pd
import plotly.graph_objs as go
import requests
import scipy.optimize
import statsmodels.stats.api

In [None]:
init_notebook_mode()

# Getting data

There are several places to get free data. You can use finance.yahoo.com, finance.google.com and many more to get _daily_ (end-of-day) data. 

Getting free _minute_ data seems to be harder. We found free data for you only at finance.google.com, though the (undocumented) API only returns the last 15 business days.

In [None]:
def download_google_finance_data_for_instrument(instrument: str, lookback_days: int) -> List[str]:
    # Be aware that changing TIME_INTERVAL will most likely produce unpredictable (weird-looking) data.
    TIME_INTERVAL = 60
    url = 'http://finance.google.com/finance/getprices'
    params = {
        'i': TIME_INTERVAL,
        'p': f'{lookback_days}d',
        'f': 'd,o,h,l,c,v', # datetime OR timedelta, open, high, low, close, volume
        'q': instrument,
    }
    response = requests.get(url, params = params)
    response.raise_for_status()
    return response.text.splitlines()

In [None]:
aapl_test = download_google_finance_data_for_instrument('AAPL', 2)
aapl_test

In [None]:
def parse_google_finance_data_for_instrument(instrument: str, lines: List[str]) -> Iterator[List[Any]]:
    '''
    We expect each CSV row to be in the form [timestamp or timedelta, 'open', 'high', 'low', 'close', 'volume'].
    We skip all rows until we find the starting timestamp in the form 'a1512570600'.
    '''
    starting_datetime: datetime.datetime = None
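    for row in csv.reader(lines):
        if not row:
            # Skip blank lines instead of failing on row[0].
            continue
        if row[0].startswith('a'):
            # Absolute Unix timestamp, e.g. 'a1512570600'.
            starting_datetime = datetime.datetime.fromtimestamp(float(row[0][1:]))
            yield [starting_datetime, instrument] + row[1:]
        elif starting_datetime is not None and not row[0].startswith('TIMEZONE_OFFSET='):
            # Offset from the last absolute timestamp, in minutes (TIME_INTERVAL is 60 seconds).
            timedelta = datetime.timedelta(minutes = float(row[0]))
            yield [starting_datetime + timedelta, instrument] + row[1:]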
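
To make the time encoding handled by the parser more concrete, here is a small stand-alone sketch that decodes only the first field of each row. The sample fields and the helper `decode_times` are made up, and reading the offsets as multiples of the 60-second interval is an assumption that follows the parser above. The sketch uses UTC so that the output does not depend on the local timezone.

```python
import datetime
from typing import List

INTERVAL_SECONDS = 60  # assumed to match TIME_INTERVAL in the download function

# Made-up first fields of consecutive rows.
time_fields = ['a1512570600', '1', '2', 'a1512657000', '1']


def decode_times(fields: List[str]) -> List[datetime.datetime]:
    decoded = []
    block_start = None
    for field in fields:
        if field.startswith('a'):
            # Absolute Unix timestamp: the start of a new block of rows.
            block_start = datetime.datetime.fromtimestamp(int(field[1:]), tz = datetime.timezone.utc)
            decoded.append(block_start)
        elif block_start is not None:
            # Offset counted in intervals from the most recent absolute timestamp.
            decoded.append(block_start + datetime.timedelta(seconds = int(field) * INTERVAL_SECONDS))
    return decoded


for moment in decode_times(time_fields):
    print(moment.isoformat())
```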

In [None]:
aapl_test_parsed = list(parse_google_finance_data_for_instrument('AAPL', aapl_test))
aapl_test_parsed

In [None]:
def parsed_google_finance_data_to_dataframe(rows: Iterator[List[Any]]) -> pd.DataFrame:
    dataframe = pd.DataFrame(data = list(rows), columns = ('time', 'instrument', 'open', 'high', 'low', 'close', 'volume'))
    dataframe = dataframe.sort_values(['instrument', 'time'])
    dataframe[['open', 'high', 'low', 'close']] = dataframe[['open', 'high', 'low', 'close']].astype(float)
    dataframe['volume'] = dataframe['volume'].astype(int)
    return dataframe

In [None]:
parsed_google_finance_data_to_dataframe(aapl_test_parsed)

In [None]:
def get_google_finance_data_in_parallel(instruments: Set[str], lookback_days: int) -> pd.DataFrame:
    pipeline = lambda instrument: parse_google_finance_data_for_instrument(
        instrument,
        download_google_finance_data_for_instrument(instrument, lookback_days)
    )
    with concurrent.futures.ThreadPoolExecutor() as executor:
        rows = itertools.chain.from_iterable(executor.map(pipeline, instruments))
        # Key performance trick: we only convert to DataFrame once we have _all_ rows downloaded and parsed.
        return parsed_google_finance_data_to_dataframe(rows)

In [None]:
aapl_and_msft_test = get_google_finance_data_in_parallel({'AAPL', 'MSFT'}, 2)
aapl_and_msft_test

In [None]:
aapl_and_msft_test.to_csv('data/test.csv')

In [None]:
! head 'data/test.csv'

In [None]:
def load_offline_data(instruments: Set[str], lookback_days: int) -> pd.DataFrame:
    '''
    Loads data from any CSVs stored in the 'data' folder and transforms them 
    to the same type we get from :func:`parsed_google_finance_data_to_dataframe`.
    '''
    all_dataframes = [
        pd.read_csv(os.path.join('data', filename), index_col = 0)
        for filename in os.listdir('data')
        if filename.endswith('.csv')
    ]
    combined = pd.concat(all_dataframes)
    # Reorder columns if needed:
    combined = combined[['time', 'instrument', 'open', 'high', 'low', 'close', 'volume']]
    combined['time'] = combined['time'].map(pd.Timestamp)
    combined = combined.sort_values(['instrument', 'time']).drop_duplicates().reset_index(drop = True)
    
    cutoff_date = pd.Timestamp(datetime.date.today() - datetime.timedelta(days = lookback_days))
    filtered = combined[
        (combined['instrument'].isin(instruments))
        & (combined['time'] >= cutoff_date)
    ]
    return filtered

In [None]:
# load_offline_data({'AAPL'}, 1)

## Now, let's get the entire S&P 500

In [None]:
sp500_instruments = {
    'A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACN', 'ADBE', 'ADI', 'ADM', 'ADP', 'ADS', 'ADSK', 'AEE', 'AEP', 
    'AES', 'AET', 'AFL', 'AGN', 'AIG', 'AIV', 'AIZ', 'AJG', 'AKAM', 'ALB', 'ALGN', 'ALK', 'ALL', 'ALLE', 'ALXN',
    'AMAT', 'AMD', 'AME', 'AMG', 'AMGN', 'AMP', 'AMT', 'AMZN', 'ANDV', 'ANSS', 'ANTM', 'AON', 'AOS', 'APA', 'APC', 
    'APD', 'APH', 'ARE', 'ARNC', 'ATVI', 'AVB', 'AVGO', 'AVY', 'AWK', 'AXP', 'AYI', 'AZO', 'B', 'BA', 'BAC', 'BAX',
    'BBT', 'BBY', 'BCR', 'BDX', 'BEN', 'BHF', 'BHGE', 'BIIB', 'BK', 'BLK', 'BLL', 'BMY', 'BR', 'BSX', 'BWA', 'BXP',
    'C', 'CA', 'CAG', 'CAH', 'CAT', 'CB', 'CBG', 'CBOE', 'CBS', 'CCI', 'CCL', 'CDNS', 'CELG', 'CERN', 'CF', 'CFG',
    'CHD', 'CHK', 'CVX', 'CHRW', 'CHTR', 'CI', 'CINF', 'CL', 'CLX', 'CMA', 'CMCSA', 'CME', 'CMG', 'CMI', 'CMS', 'CNC', 
    'CNP', 'COF', 'COG', 'COL', 'COO', 'COP', 'COST', 'COTY', 'CPB', 'CRM', 'CSCO', 'CSRA', 'CSX', 'CTAS', 'CTL', 
    'CTSH', 'CTXS', 'CVS', 'CXO', 'D', 'DAL', 'DE', 'DFS', 'DG', 'DGX', 'DHI', 'DHR', 'DIS', 'DISCA', 'DISCK', 
    'DISH', 'DLPH', 'DLR', 'DLTR', 'DOV', 'DPS', 'DRE', 'DRI', 'DTE', 'DUK', 'DVA', 'DVN', 'DWDP', 'DXC', 'EA', 
    'EBAY', 'ECL', 'ED', 'EFX', 'EIX', 'EL', 'EMN', 'EMR', 'EOG', 'EQIX', 'EQR', 'EQT', 'ES', 'ESRX', 'ESS', 'ETFC', 
    'ETN', 'ETR', 'EVHC', 'EW', 'EXC', 'EXPD', 'EXPE', 'EXR', 'F', 'FAST', 'FB', 'FBHS', 'FCX', 'FDX', 'FE', 'FFIV',
    'FIS', 'FISV', 'FITB', 'FL', 'FLIR', 'FLR', 'FLS', 'FMC', 'FOX', 'FOXA', 'FRT', 'FTI', 'FTV', 'GD', 'GE', 'GGP',
    'GILD', 'GIS', 'GLW', 'GM', 'GOOG', 'GOOGL', 'GPC', 'GPN', 'GPS', 'GRMN', 'GS', 'GT', 'GWW', 'HAL', 'HAS', 'HBAN',
    'HBI', 'HCA', 'HCN', 'HCP', 'HD', 'HES', 'HIG', 'HLT', 'HOG', 'HOLX', 'HON', 'HP', 'HPE', 'HPQ', 'HRB', 'HRL', 
    'HRS', 'HSIC', 'HST', 'HSY', 'HUM', 'IBM', 'ICE', 'IDXX', 'IFF', 'ILMN', 'INCY', 'INFO', 'INTC', 'INTU', 'IP', 
    'IPG', 'IQV', 'IR', 'IRM', 'ISRG', 'IT', 'ITW', 'IVZ', 'JBHT', 'JCI', 'JEC', 'JNJ', 'JNPR', 'JPM', 'JWN', 'K',
    'KEY', 'KHC', 'KIM', 'KLAC', 'KMB', 'KMI', 'KMX', 'KO', 'KORS', 'KR', 'KSS', 'KSU', 'L', 'LB', 'LEG', 'LEN', 'LH',
    'LKQ', 'LLL', 'LLY', 'LMT', 'LNC', 'LNT', 'LOW', 'LRCX', 'LUK', 'LUV', 'LYB', 'M', 'MA', 'MAA', 'MAC', 'MAR', 
    'MAS', 'MAT', 'MCD', 'MCHP', 'MCK', 'MCO', 'MDLZ', 'MDT', 'MET', 'MGM', 'MHK', 'MKC', 'MLM', 'MMC', 'MMM', 'MNST',
    'MO', 'MON', 'MOS', 'MPC', 'MRK', 'MRO', 'MS', 'MSFT', 'MSI', 'MTB', 'MTD', 'MU', 'MYL', 'NAVI', 'NBL', 'NCLH',
    'NDAQ', 'NEE', 'NEM', 'NFLX', 'NFX', 'NI', 'NKE', 'NLSN', 'NOC', 'NOV', 'NRG', 'NSC', 'NTAP', 'NTRS', 'NUE',
    'NVDA', 'NWL', 'NWS', 'NWSA', 'O', 'OKE', 'OMC', 'ORCL', 'ORLY', 'OXY', 'PAYX', 'PBCT', 'PCAR', 'PCG', 'PCLN',
    'PDCO', 'PEG', 'PEP', 'PFE', 'PFG', 'PG', 'PGR', 'PH', 'PHM', 'PKG', 'PKI', 'PLD', 'PM', 'PNC', 'PNR', 'PNW',
    'PPG', 'PPL', 'PRGO', 'PRU', 'PSA', 'PSX', 'PVH', 'PWR', 'PX', 'PXD', 'PYPL', 'QCOM', 'QRVO', 'RCL', 'RE', 'REG',
    'REGN', 'RF', 'RHI', 'RHT', 'RJF', 'RL', 'RMD', 'ROK', 'ROP', 'ROST', 'RRC', 'RSG', 'RTN', 'SBAC', 'SBUX', 'SCG',
    'SCHW', 'SEE', 'SHW', 'SIG', 'SJM', 'SLB', 'SLG', 'SNA', 'SNI', 'SNPS', 'SO', 'SPG', 'SPGI', 'SRCL', 'SRE', 'STI',
    'STT', 'STX', 'STZ', 'SWK', 'SWKS', 'SYF', 'SYK', 'SYMC', 'SYY', 'T', 'TAP', 'TDG', 'TEL', 'TGT', 'TIF', 'TJX',
    'TMK', 'TMO', 'TPR', 'TRIP', 'TROW', 'TRV', 'TSCO', 'TSN', 'TSS', 'TWX', 'TXN', 'TXT', 'UA', 'UAA', 'UAL', 'UDR',
    'UHS', 'ULTA', 'UNH', 'UNM', 'UNP', 'UPS', 'URI', 'USB', 'UTX', 'V', 'VAR', 'VFC', 'VIAB', 'VLO', 'VMC', 'VNO', 
    'VRSK', 'VRSN', 'VRTX', 'VTR', 'VZ', 'WAT', 'WBA', 'WDC', 'WEC', 'WFC', 'WHR', 'WLTW', 'WM', 'WMB', 'WMT', 'WRK', 
    'WU', 'WY', 'WYN', 'WYNN', 'XEC', 'XEL', 'XL', 'XLNX', 'XOM', 'XRAY', 'XRX', 'XYL', 'YUM', 'ZBH', 'ZION', 'ZTS'
}

In [None]:
# raw_sp500_data_online = get_google_finance_data_in_parallel(sp500_instruments, 15)
# len(raw_sp500_data_online)

In [None]:
raw_sp500_data_offline = load_offline_data(sp500_instruments, 60)

In [None]:
# raw_sp500_data = pd.concat([raw_sp500_data_online, raw_sp500_data_offline]).drop_duplicates().sort_values(['time', 'instrument']).reset_index(drop = True)
raw_sp500_data = raw_sp500_data_offline

In [None]:
raw_sp500_data['date'] = raw_sp500_data['time'].map(lambda timestamp: timestamp.date())

In [None]:
raw_sp500_data.head()

# Data transformation

In [None]:
def transform_data(raw_minute_data: pd.DataFrame, delta_1: int, delta_2: int) -> pd.DataFrame:
    '''
    Computes the VWAP over the interval from ``delta_1`` to ``delta_2`` seconds before each day's close
    and joins the previous day's VWAP and closing price with the current day's opening price.
    '''
    aggregated = raw_minute_data.groupby(['date', 'instrument']).apply(
        lambda df_per_day_and_instrument: 
            summarise_data_per_day_and_instrument(
                minute_data = df_per_day_and_instrument, 
                delta_1 = delta_1, 
                delta_2 = delta_2
            )
    )
    return aggregated.reset_index().groupby('instrument').apply(prev_day_info).dropna()


def summarise_data_per_day_and_instrument(minute_data: pd.DataFrame, delta_1: int, delta_2: int) -> pd.Series:
    '''
    Extract the first (opening) price, last (closing) price, and VWAP from the interval
    (closing_time - delta_1 seconds, closing_time - delta_2) from ``minute_data``.
    
    We know that the ``minute_data`` dataframe only contains minute data for one instrument 
    and one day.
    
    :param minute_data: dataframe with at least the columns 'time', 'open', 'close' and 'volume'.
    :param delta_1: number of seconds before the closing time, has to be greater than delta_2
    :param delta_2: number of seconds before the closing time, has to be smaller than delta_1
    '''
    if delta_1 <= delta_2:
        raise ValueError('delta_1 must be greater than delta_2')

    delta_1 = pd.Timedelta(delta_1, unit = 's')
    delta_2 = pd.Timedelta(delta_2, unit = 's')    
    closing_time = minute_data.iloc[-1].time
    vwap_interval = minute_data.loc[
        (minute_data.time > closing_time - delta_1) 
        & (minute_data.time < closing_time - delta_2)
    ]
    vwap = (vwap_interval.close * vwap_interval.volume).sum() / vwap_interval.volume.sum()
    transformation = {
        'opening_price': minute_data.open.iloc[0],
        'vwap': vwap,
        'closing_price': minute_data.close.iloc[-1],
    }
    return pd.Series(transformation)


def prev_day_info(summarised_daily_data: pd.DataFrame) -> pd.DataFrame:
    '''
    Returns a dataframe that pairs each date with the VWAP and closing price 
    from the previous trading day.
    
    :param summarised_daily_data: dataframe over only one instrument
    '''
    transformation = {
        'date': summarised_daily_data.date,
        'instrument': summarised_daily_data.instrument,
        'prev_day_closing_price': summarised_daily_data.closing_price.shift(),
        'prev_day_vwap': summarised_daily_data.vwap.shift(),
        'opening_price': summarised_daily_data.opening_price,
    }
    return pd.DataFrame(transformation)
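
The key step in `prev_day_info` is `shift()`, which moves every value one row down so that each day sees the previous day's figures. The toy example below shows the same effect with `groupby(...).shift()`, a slightly different technique from the `apply` call used above. The data is made up.

```python
import pandas as pd

# Made-up daily summary for two instruments, already sorted by instrument and date.
daily = pd.DataFrame({
    'date': ['2018-01-02', '2018-01-03', '2018-01-02', '2018-01-03'],
    'instrument': ['AAPL', 'AAPL', 'MSFT', 'MSFT'],
    'closing_price': [172.0, 173.0, 86.0, 85.5],
})
# Shifting within each instrument keeps one instrument's prices from leaking into another's rows.
daily['prev_day_closing_price'] = daily.groupby('instrument')['closing_price'].shift()
# The first day of each instrument has no previous day, so it is dropped.
print(daily.dropna())
```

The first row of every instrument ends up with a missing previous close, which is why `transform_data` finishes with `dropna()`.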

In [None]:
transformed_sp500_data = transform_data(raw_sp500_data, delta_1 = 1800, delta_2 = 300)
transformed_sp500_data['relative_diff'] = transformed_sp500_data['prev_day_closing_price'] / transformed_sp500_data['prev_day_vwap'] - 1
transformed_sp500_data['over_night_change'] = transformed_sp500_data['opening_price'] / transformed_sp500_data['prev_day_closing_price'] - 1
transformed_sp500_data.head()

In [None]:
transformed_sp500_data.sort_values(['instrument', 'date']).head()

# Analysing potential profits

We look at profits in relative terms (returns).

Be aware that we do not account for dividends in the profit analysis below.
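
A quick worked example with made-up prices shows how the two relative quantities computed above translate into a per-trade return:

```python
# Made-up prices for one instrument.
prev_day_vwap = 100.00
prev_day_closing_price = 99.00
opening_price = 99.60

relative_diff = prev_day_closing_price / prev_day_vwap - 1       # close relative to the VWAP
over_night_change = opening_price / prev_day_closing_price - 1   # close-to-open return

# A long position entered at the close earns the overnight change; a short position earns its negative.
long_profit = over_night_change
short_profit = -over_night_change
print(f'relative_diff = {relative_diff:.4%}, long profit = {long_profit:.4%}')
```

Here the close is 1% below the VWAP, and buying at the close and selling at the open would have returned roughly 0.6% before fees.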

## Let's see what the usual overnight returns are

In [None]:
import returns_plot
returns_plot.plot_afternoon_returns_on_morning_returns_error_bar(
    results = transformed_sp500_data,
    epsilon = 0.001,
    width = 0.02,
    empirical_distribution_returns = True
)

## Now let's see what our expected profit would be if we only traded some of those situations

In [None]:
def generate_signals_and_calculate_profit(data: pd.DataFrame, min_relative_inefficiency: float, max_relative_inefficiency: float) -> None:
    data['signal'] = 'None'
    data.loc[
        (data.relative_diff > -max_relative_inefficiency) & (data.relative_diff < -min_relative_inefficiency), 
        'signal'
    ] = 'Buy'
    data.loc[
        (data.relative_diff > min_relative_inefficiency) & (data.relative_diff < max_relative_inefficiency), 
        'signal'
    ] = 'Sell'
    data['profit'] = np.nan
    data.loc[data.signal == 'Buy', 'profit'] = data[data.signal == 'Buy'].over_night_change
    data.loc[data.signal == 'Sell', 'profit'] = -data[data.signal == 'Sell'].over_night_change

### First, we pick the parameters by hand

In [None]:
min_relative_inefficiency = 0.0025
max_relative_inefficiency = 0.0250

In [None]:
test_data = transformed_sp500_data.copy()
generate_signals_and_calculate_profit(test_data, min_relative_inefficiency, max_relative_inefficiency)
test_data

In [None]:
def plot_profit_per_day(data: pd.DataFrame) -> None:
    profit_per_day = data[~pd.isnull(data['profit'])].sort_values('date').groupby('date')['profit'].mean()
    trace = go.Bar(
        x = profit_per_day.index,
        y = profit_per_day.values
    )
    iplot({'data': [trace], 'layout': {'title': 'Profit'}})

In [None]:
plot_profit_per_day(test_data)

### Second, we try to optimise the parameters

#### In Sample / Out Of Sample split

We keep the first two thirds of data In Sample and the remaining one third Out Of Sample.
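
The split is chronological, never shuffled, so that the Out Of Sample period lies strictly after everything the parameters were fitted on. A minimal sketch on made-up dates (the variable names are for illustration only):

```python
import datetime

# Made-up trading dates.
dates = [datetime.date(2018, 1, day) for day in (2, 3, 4, 5, 8, 9, 10, 11, 12)]

split_index = len(dates) * 2 // 3          # number of In Sample dates
in_sample_dates = dates[:split_index]      # the earliest two thirds
out_of_sample_dates = dates[split_index:]  # the most recent third

print(in_sample_dates[-1], out_of_sample_dates[0])
```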

In [None]:
unique_sorted_dates = list(transformed_sp500_data['date'].sort_values().drop_duplicates())
last_in_sample_date = unique_sorted_dates[round(len(unique_sorted_dates) * 2 / 3)]

In [None]:
in_sample_transformed_sp500_data = transformed_sp500_data[transformed_sp500_data['date'] <= last_in_sample_date]
out_of_sample_transformed_sp500_data = transformed_sp500_data[transformed_sp500_data['date'] > last_in_sample_date]

In [None]:
# pd.Timestamp(...) accepts both datetime.date and Timestamp values, so .date() is safe either way.
print('In Sample starting date: {} and ending date: {}'.format(
    pd.Timestamp(in_sample_transformed_sp500_data['date'].min()).date(),
    pd.Timestamp(in_sample_transformed_sp500_data['date'].max()).date()
))
print('Out Of Sample starting date: {} and ending date: {}'.format(
    pd.Timestamp(out_of_sample_transformed_sp500_data['date'].min()).date(),
    pd.Timestamp(out_of_sample_transformed_sp500_data['date'].max()).date()
))

#### Objective function

In [None]:
def calc_profit(min_relative_inefficiency: float, max_relative_inefficiency: float, data: pd.DataFrame, fee: float) -> float:
    '''
    Calculate total profit for the given input, adjusted for fees. Liquidity, short fees, risk etc. are blithely ignored.
    '''
    buy_profits = data[
        (data.relative_diff > -max_relative_inefficiency) & (data.relative_diff < -min_relative_inefficiency)
    ].over_night_change - fee
    sell_profits = -data[
        (data.relative_diff > min_relative_inefficiency) & (data.relative_diff < max_relative_inefficiency)
    ].over_night_change - fee
    
    if (buy_profits.count() + sell_profits.count()) > 25:
        return buy_profits.sum() + sell_profits.sum()
    else:
        # Ignore parameters for which we get too few observations.
        return float('NaN')

This is what the optimisation space looks like:

In [None]:
min_relative_inefficiencies: List[float] = []
max_relative_inefficiencies: List[float] = []
profits: List[float] = []

for min_relative_inefficiency in np.linspace(0, 0.02, num = 100):
    for max_relative_inefficiency in np.linspace(min_relative_inefficiency, 0.02, num = 100):
        min_relative_inefficiencies.append(min_relative_inefficiency)
        max_relative_inefficiencies.append(max_relative_inefficiency)
        profit = calc_profit(
            min_relative_inefficiency = min_relative_inefficiency, 
            max_relative_inefficiency = max_relative_inefficiency, 
            data = in_sample_transformed_sp500_data, 
            fee = 0.0025
        )
        profits.append(profit)

optimization_space = go.Scatter3d(
    x = min_relative_inefficiencies,
    y = max_relative_inefficiencies,
    z = profits,
    mode = 'markers',
    marker = {
        'size': 3,
        'line': {
            'color': 'rgba(217, 217, 217, 0.4)',
            'width': 0.1
        }
    }
)
layout = go.Layout(margin = {'l': 0, 'r': 0, 'b': 0, 't': 0})
iplot(go.Figure(data = [optimization_space], layout = layout))

#### Brute force optimisation

In [None]:
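def negative_in_sample_profit(x: np.ndarray) -> float:
    profit = calc_profit(
        min_relative_inefficiency = x[0], 
        max_relative_inefficiency = x[1], 
        data = in_sample_transformed_sp500_data,
        fee = 0.0025
    )
    # brute picks the grid minimum with argmin, which does not skip NaN, so map NaN to +inf.
    return np.inf if np.isnan(profit) else -profit


optimisation_result = scipy.optimize.brute(
    func = negative_in_sample_profit,
    ranges = (slice(0, 0.03, 0.001), slice(0, 0.03, 0.001)), 
    full_output = True,
    finish = scipy.optimize.fmin
)
optimisation_result[0]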
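
To see what `scipy.optimize.brute` does in isolation, the sketch below runs it on a toy objective that stands in for the negated profit. The objective and the grid are made up; `finish = None` is used here so that the raw grid minimum is returned without the local polishing step.

```python
import numpy as np
import scipy.optimize


def objective(x: np.ndarray) -> float:
    # Toy function with its minimum at (0.5, 1.5).
    return (x[0] - 0.5) ** 2 + (x[1] - 1.5) ** 2


# Evaluate the objective on a 0.25-spaced grid over [0, 2) x [0, 2) and return the best grid point.
best_point = scipy.optimize.brute(
    func = objective,
    ranges = (slice(0, 2, 0.25), slice(0, 2, 0.25)),
    finish = None,
)
print(best_point)
```

Because the grid minimum is selected with an argmin that does not skip NaN, an objective that can return NaN (as `calc_profit` does for parameter combinations with too few observations) is best mapped to `+inf` before it is handed to `brute`.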

#### Sequential Least Squares Programming optimisation

In [None]:
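constraints = (
    # An 'ineq' constraint requires fun(x) >= 0, so this enforces max_relative_inefficiency >= min_relative_inefficiency.
    {'type': 'ineq', 'fun': lambda x:  x[1] - x[0]},
)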

optimisation_result = scipy.optimize.minimize(
    fun = lambda param: -calc_profit(
        min_relative_inefficiency = param[0], 
        max_relative_inefficiency = param[1], 
        data = in_sample_transformed_sp500_data,
        fee = 0.0025
    ),
    x0 = (0.0050, 0.0200),
    method = 'SLSQP', 
    bounds = ((0, 0.03), (0, 0.03)),
    constraints = constraints,
)
optimisation_result.x
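
The sign convention of SLSQP constraints is easy to get backwards: an `'ineq'` constraint means `fun(x) >= 0`. The toy problem below (a made-up quadratic, not the profit function) has an unconstrained minimum at (1.0, 0.5) that violates `x[1] >= x[0]`, so the solver should settle on the boundary instead.

```python
import scipy.optimize

# For SLSQP, an 'ineq' constraint means fun(x) >= 0, so this enforces x[1] >= x[0].
constraints = ({'type': 'ineq', 'fun': lambda x: x[1] - x[0]},)

# Toy objective whose unconstrained minimum (1.0, 0.5) violates the constraint.
result = scipy.optimize.minimize(
    fun = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 0.5) ** 2,
    x0 = (0.0, 1.0),
    method = 'SLSQP',
    bounds = ((0, 2), (0, 2)),
    constraints = constraints,
)
print(result.x)
```

The constrained optimum lies on the line `x[0] == x[1]`, at approximately (0.75, 0.75).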

### Testing the optimised parameters on the Out Of Sample dataset

In [None]:
min_relative_inefficiency, max_relative_inefficiency = optimisation_result.x

In [None]:
out_of_sample_transformed_sp500_data = out_of_sample_transformed_sp500_data.copy()
generate_signals_and_calculate_profit(out_of_sample_transformed_sp500_data, min_relative_inefficiency, max_relative_inefficiency)
out_of_sample_transformed_sp500_data[out_of_sample_transformed_sp500_data['signal'] != 'None']

In [None]:
plot_profit_per_day(out_of_sample_transformed_sp500_data)

In [None]:
oos_profits = out_of_sample_transformed_sp500_data[~pd.isnull(out_of_sample_transformed_sp500_data.profit)].profit
oos_profits.describe()

We can calculate confidence intervals too, but be aware that we have few observations to draw from.

In [None]:
print('Central 95% range of the empirical distribution of profits. In percentages. OOS.')
print((oos_profits * 100).quantile([0.025, 0.975]).values)
print('\n95% confidence interval from normal distribution. In percentages. OOS.')
print(statsmodels.stats.api.DescrStatsW(oos_profits * 100).zconfint_mean(alpha = 0.05))
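
For intuition, a normal-approximation interval for the mean can also be computed by hand as the mean plus or minus a z-quantile times the standard error. The sketch below uses only the standard library and made-up returns; it illustrates the idea and is not a replacement for the `statsmodels` call above.

```python
import math
import statistics

# Made-up per-trade returns, in percent.
returns = [0.4, -0.2, 0.1, 0.3, -0.1, 0.2, 0.0, 0.5, -0.3, 0.1]

mean = statistics.mean(returns)
standard_error = statistics.stdev(returns) / math.sqrt(len(returns))
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval
lower, upper = mean - z * standard_error, mean + z * standard_error
print(f'mean = {mean:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})')
```

With this few observations the interval is wide and includes zero, which is a reminder of how little a short sample can tell us.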