<a href="https://colab.research.google.com/github/Camgamez/AlgorithmsUN2024II/blob/main/lab5/icgamezc_lab5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Quantiacs Laboratory

## Set Up the Environment

The following cells set up the Quantiacs tools on Colab. They will not be carried over to the Quantiacs strategy.

In [1]:
###DEBUG###

! pip install git+https://github.com/quantiacs/toolbox.git 2>/dev/null

# decrease height
from IPython.display import Javascript
display(Javascript('google.colab.output.setIframeHeight(0, true, {maxHeight: 100})'))

Collecting git+https://github.com/quantiacs/toolbox.git
  Cloning https://github.com/quantiacs/toolbox.git to /tmp/pip-req-build-3dpimwyw
  Resolved https://github.com/quantiacs/toolbox.git to commit 568159460bcffea00317cd4a8ef57ad1b0eb4141
  Preparing metadata (setup.py) ... done
Collecting scipy>=1.14.0 (from qnt==0.0.408)
  Downloading scipy-1.15.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     62.0/62.0 kB 1.7 MB/s eta 0:00:00
Collecting xarray==2024.6.0 (from qnt==0.0.408)
  Downloading xarray-2024.6.0-py3-none-any.whl.metadata (11 kB)
Collecting progressbar2<4,>=3.55 (from qnt==0.0.408)
  Downloading progressbar2-3.55.0-py2.py3-none-any.whl.metadata (11 kB)
Collecting cftime==1.6.4 (from qnt==0.0.408)
  Downloading cftime-1.6.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (8.7 kB)
Collecting plotly==5.22.0 (from qnt==0.0.408)
  Down

<IPython.core.display.Javascript object>

In [2]:
###DEBUG###
import os

os.environ['API_KEY'] = "de356974-b758-445a-9891-d13b0524d442"
os.environ['NONINTERACT'] = 'True'

## The Strategy

As per the lab's requirements, this part first trains a model on the market data and then sets up a strategy that can help improve the Sharpe ratio.

Let's begin by importing all the necessary libraries:

In [3]:
# This cell imports all the necessary libraries for data manipulation, model training and technical analysis.

import logging

import xarray as xr  # xarray for data manipulation

import qnt.data as qndata     # functions for loading data
import qnt.backtester as qnbt # built-in backtester
import qnt.ta as qnta         # technical analysis library
import qnt.stats as qnstats   # statistical functions

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

np.seterr(divide = "ignore")

from qnt.ta.macd import macd
from qnt.ta.rsi  import rsi
from qnt.ta.stochastic import stochastic_k, stochastic, slow_stochastic

from sklearn import linear_model
from sklearn.metrics import r2_score
from sklearn.metrics import explained_variance_score
from sklearn.metrics import mean_absolute_error
from sklearn.neural_network import MLPRegressor

NOTICE: The environment variable DATA_BASE_URL was not specified. The default value is 'https://data-api.quantiacs.io/'
NOTICE: The environment variable CACHE_RETENTION was not specified. The default value is '7'
NOTICE: The environment variable CACHE_DIR was not specified. The default value is 'data-cache'


Now let's load the data for the S&P 500 market. The Quantiacs contest requires working with data from January 1st, 2006; note that the cell below loads data starting on June 1st, 2006 (`min_date="2006-06-01"`).

We will not choose any specific asset, so we work with all the stocks in the market:

In [4]:
# loading S&P500 stock data
data = qndata.stocks.load_spx_data(min_date="2006-06-01")

Data loaded 34s


In the next block of code, we build the features the model learns from, using technical indicators:

In [47]:
def get_features(data):
    """Builds the features used for learning:
       * a moving-average crossover signal (fast SMA vs. slow SMA);
       * a trend indicator;
       * the moving average convergence divergence (signal line);
       * a volatility measure;
       * the stochastic oscillator;
       * the relative strength index;
       * the logarithm of the closing price.
       These features can be modified and new ones can be added easily.
    """
    # Moving averages: +1 when the fast SMA is above the slow SMA, otherwise -1
    close     = data.sel(field="close")
    sma_slow  = qnta.sma(close, 20)
    sma_fast  = qnta.sma(close, 5)
    weights   = xr.where(sma_slow < sma_fast, 1, -1)

    # trend:
    trend = qnta.roc(qnta.lwma(data.sel(field="close"), 60), 1)

    # moving average convergence divergence (MACD); only the signal line is used below:
    macd2_line, macd2_signal, macd2_hist = qnta.macd(data, 12, 26, 9)

    # volatility:
    volatility = qnta.tr(data.sel(field="high"), data.sel(field="low"), data.sel(field="close"))
    volatility = volatility / data.sel(field="close")
    volatility = qnta.lwma(volatility, 14)

    # the stochastic oscillator:
    k, d = qnta.stochastic(data.sel(field="high"), data.sel(field="low"), data.sel(field="close"), 14)

    # the relative strength index:
    rsi = qnta.rsi(data.sel(field="close"))

    # the logarithm of the closing price:
    price = data.sel(field="close").ffill("time").bfill("time").fillna(0) # fill NaN
    price = np.log(price)

    # combine the features:
    result = xr.concat(
        [weights, trend, macd2_signal.sel(field="close"), volatility,  d, rsi, price],
        pd.Index(
            ["weight", "trend",  "macd", "volatility", "stochastic_d", "rsi", "price"],
            name = "field"
        )
    )

    return result.transpose("time", "field", "asset")
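
The first feature above (`weight`) is a moving-average crossover signal. As a minimal sketch of that idea, here is a pure-Python version that does not use the Quantiacs toolbox; the helper names `sma` and `crossover_signal` and the toy price series are hypothetical and only serve as illustration:

```python
def sma(values, window):
    """Simple moving average; None until `window` points are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def crossover_signal(prices, fast=5, slow=20):
    """+1 where the fast SMA is above the slow SMA, otherwise -1.

    Days where an average is still undefined get -1, which mirrors how a
    comparison against NaN evaluates to False.
    """
    fast_ma = sma(prices, fast)
    slow_ma = sma(prices, slow)
    signal = []
    for f, s in zip(fast_ma, slow_ma):
        above = f is not None and s is not None and s < f
        signal.append(1 if above else -1)
    return signal


# Hypothetical toy series: a steady uptrend followed by a steady downtrend.
toy_prices = [100 + i for i in range(30)] + [129 - 2 * i for i in range(1, 31)]
toy_signal = crossover_signal(toy_prices)
print(toy_signal[19], toy_signal[-1])
```

During the uptrend the fast average sits above the slow one, so the signal is +1; once the downtrend is established it flips to -1.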

Once the features have been built, we display one of them (`trend`) for every asset, by date:

In [14]:
# displaying the features:
my_features = get_features(data)
display(my_features.sel(field="trend").to_pandas())

| time | NAS:AAL | NAS:AAPL | NAS:ABNB | NAS:ACGL | NAS:ADBE | NAS:ADI | NAS:ADP | NAS:ADSK | NAS:AEP | NAS:AKAM | ... | NYS:WMB | NYS:WMT | NYS:WRB | NYS:WST | NYS:WY | NYS:XOM | NYS:XYL | NYS:YUM | NYS:ZBH | NYS:ZTS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2006-06-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2006-06-02 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2006-06-05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2006-06-06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2006-06-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-01-16 | 0.569163 | -0.130814 | -0.066394 | -0.040792 | -0.389489 | -0.053140 | -0.004229 | -0.093188 | 0.044422 | -0.198263 | ... | 0.234734 | 0.089404 | 0.023133 | 0.150646 | -0.038029 | -0.094003 | -0.108694 | -0.193919 | 0.044165 | -0.090313 |
| 2025-01-17 | 0.544742 | -0.106117 | 0.007788 | -0.054083 | -0.361635 | 0.013001 | -0.005509 | -0.077692 | 0.061599 | -0.171672 | ... | 0.225143 | 0.106371 | -0.023777 | 0.134483 | -0.013623 | -0.060482 | -0.082505 | -0.213656 | 0.061768 | -0.149722 |
| 2025-01-21 | 0.597469 | -0.204481 | -0.041558 | -0.068457 | -0.310216 | 0.049379 | 0.015324 | -0.018030 | 0.082709 | -0.034909 | ... | 0.305123 | 0.140584 | -0.039390 | 0.209629 | 0.014603 | -0.081688 | -0.015147 | -0.201165 | 0.088853 | -0.110754 |
| 2025-01-22 | 0.579249 | -0.185700 | -0.054508 | -0.119008 | -0.297738 | 0.090028 | -0.005745 | 0.030345 | 0.072732 | -0.014830 | ... | 0.212376 | 0.139154 | -0.047693 | 0.231232 | -0.056225 | -0.134267 | -0.018488 | -0.185555 | 0.061156 | -0.121178 |


The following function (provided by the Colab notebook with the original strategy) labels each asset's price move over the next trading day as up (1), flat (0) or down (-1):

In [33]:
def get_target_classes(data):
    """ Target classes for predicting if price goes up or down."""

    price_current = data.sel(field="close")
    price_future  = qnta.shift(price_current, -1)

    class_positive = 1  # price goes up
    class_neutral = 0   # price change is within ±0.01%
    class_negative = -1 # price goes down
    price_change = (price_future - price_current) / price_current
    price_change = price_change.fillna(0)
    price_change = xr.where(price_change > 0.0001, class_positive, price_change)
    price_change = xr.where(price_change < -0.0001, class_negative, price_change)
    price_change = xr.where((price_change <= 0.0001) & (price_change >= -0.0001), class_neutral, price_change)
    price_change = price_change.transpose("time", "asset")

    return price_change
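
To make the labelling rule concrete, the following is a small pure-Python sketch of the same thresholding logic on a single price series. The function name `label_next_day_moves` and the toy closing prices are hypothetical; the 0.0001 threshold is the one used in the function above.

```python
def label_next_day_moves(closes, threshold=0.0001):
    """Label each day by the next day's relative price change.

    Returns 1 (up), -1 (down) or 0 (flat, or no next day available).
    """
    labels = []
    for i, current in enumerate(closes):
        if i + 1 >= len(closes):
            labels.append(0)  # the last day has no future price
            continue
        change = (closes[i + 1] - current) / current
        if change > threshold:
            labels.append(1)
        elif change < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Hypothetical closing prices: up, unchanged, down, then a tiny move below the threshold.
toy_closes = [100.0, 101.0, 101.0, 99.0, 99.005]
toy_labels = label_next_day_moves(toy_closes)
print(toy_labels)
```

Note that the label for a given day depends on the following day's close, so it must only ever be used as a training target, never as an input feature.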

In [34]:
# displaying the target classes:
my_targetclass = get_target_classes(data)
display(my_targetclass.to_pandas())

| time | NAS:AAL | NAS:AAPL | NAS:ABNB | NAS:ACGL | NAS:ADBE | NAS:ADI | NAS:ADP | NAS:ADSK | NAS:AEP | NAS:AKAM | ... | NYS:WMB | NYS:WMT | NYS:WRB | NYS:WST | NYS:WY | NYS:XOM | NYS:XYL | NYS:YUM | NYS:ZBH | NYS:ZTS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2006-06-01 | 0.0 | -1.0 | 0.0 | 1.0 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | -1.0 | ... | 1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 1.0 | 0.0 | -1.0 | 1.0 | 0.0 |
| 2006-06-02 | 0.0 | -1.0 | 0.0 | -1.0 | 1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | -1.0 | -1.0 | 0.0 |
| 2006-06-05 | 0.0 | -1.0 | 0.0 | -1.0 | 1.0 | -1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | -1.0 | -1.0 | -1.0 | 1.0 | -1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 |
| 2006-06-06 | 0.0 | -1.0 | 0.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | -1.0 | ... | -1.0 | 0.0 | 1.0 | -1.0 | -1.0 | -1.0 | 0.0 | 1.0 | 1.0 | 0.0 |
| 2006-06-07 | 0.0 | 1.0 | 0.0 | 1.0 | -1.0 | 1.0 | -1.0 | -1.0 | 1.0 | -1.0 | ... | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | 0.0 | -1.0 | 1.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-01-16 | -1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 1.0 | 1.0 | ... | -1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 1.0 | -1.0 |
| 2025-01-17 | 1.0 | -1.0 | -1.0 | -1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | -1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2025-01-21 | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | 1.0 | -1.0 | 1.0 | ... | -1.0 | 1.0 | -1.0 | 1.0 | -1.0 | -1.0 | -1.0 | 1.0 | -1.0 | -1.0 |
| 2025-01-22 | -1.0 | -1.0 | 1.0 | -1.0 | 0.0 | 1.0 | -1.0 | -1.0 | -1.0 | 1.0 | ... | 1.0 | 1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 1.0 | 1.0 |


The next two cells define `get_model()`, a constructor for the kind of model to be used. In this case we use a multilayer perceptron regressor (`MLPRegressor`), whose default solver is a stochastic gradient-based optimizer that works well with large datasets.

Then `get_model()` is used to create and train one model per asset:

In [35]:
def get_model():
    """This is a constructor for the ML model (a multilayer perceptron
       regressor) which can be easily modified for using different models.
    """

    # model = linear_model.BayesianRidge()
    model = MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), random_state=1, max_iter=25000, tol=0.0001)
    return model

In [36]:
# Create and train the models working on an asset-by-asset basis.

asset_name_all = data.coords["asset"].values

models = dict()

for asset_name in asset_name_all:

    # drop missing values:
    target_cur   = my_targetclass.sel(asset=asset_name).dropna("time", how="any")
    features_cur = my_features.sel(asset=asset_name).dropna("time", how="any")

    # align features and targets:
    target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join="inner")

    if len(feature_for_learn_df.time) < 10:
        # not enough points for training
        continue

    model = get_model()

    try:
        model.fit(feature_for_learn_df.values, target_for_learn_df)
        models[asset_name] = model

    except Exception:
        logging.exception("model training failed")

print(models)

{'NAS:AAL': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'NAS:AAPL': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'NAS:ABNB': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'NAS:ACGL': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'NAS:ADBE': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'NAS:ADI': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'NAS:ADP': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'NAS:ADSK': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'NAS:AEP': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'NAS:AKAM': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'NAS:ALGN': MLPRegressor(hidden_layer_sizes=(8, 4, 8, 4), max_iter=25000, random_state=1), 'N

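Here we inspect the learned weights of the model trained for one asset (`NAS:AAPL`). Note that `coefs_` holds the layer-to-layer weight matrices of the network, which only indirectly reflect how important each feature is: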

In [37]:
# Showing the learned weight matrices; the first matrix has one row per input feature:

importance = models["NAS:AAPL"].coefs_
print(importance)


[array([[-0.15888823,  0.21752375, -0.64838667, -0.24541528, -0.08784521,
        -0.43618406, -0.32425792, -0.32295647],
       [-0.14507247,  0.09704977, -0.06056719,  0.2471648 , -0.04951138,
         0.4921657 , -0.62271344,  0.16716478],
       [-0.144504  ,  0.04980399, -0.3780136 , -0.37494546,  0.05257744,
         0.64733339, -0.22776016,  0.15088392],
       [ 0.47108601,  0.4276378 , -0.56358726, -0.58098473, -0.07143466,
         0.48438436, -0.46137754, -0.12610467],
       [ 0.59566325,  0.04350736,  0.18744619, -0.24038088,  0.00683198,
         0.41411861, -0.58113411,  0.29456361],
       [ 0.63875676,  0.29602909, -0.32745171,  0.32499141, -0.12210045,
        -0.05833765,  0.51076542, -0.2932828 ],
       [-0.21728313, -0.43211349, -0.66322985,  0.16440625, -0.04541653,
        -0.30046713, -0.09535925, -0.58930173]]), array([[-6.66294727e-01,  3.28026488e-02,  2.38527822e-01,
         2.85543745e-02],
       [ 5.99042554e-01,  1.08713642e-01,  5.66770227e-01,
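
The printed arrays are easier to read once their expected shapes are known. As a rough sketch, assuming a fully connected network with the 7 features built earlier, the hidden layers `(8, 4, 8, 4)` and a single output, the shapes of the weight matrices can be derived as follows (the helper `mlp_weight_shapes` is hypothetical and not part of scikit-learn):

```python
def mlp_weight_shapes(n_features, hidden_layer_sizes, n_outputs=1):
    """Shapes (inputs, outputs) of each weight matrix in a fully connected network."""
    layer_sizes = [n_features, *hidden_layer_sizes, n_outputs]
    return [(layer_sizes[i], layer_sizes[i + 1]) for i in range(len(layer_sizes) - 1)]


# 7 input features, hidden layers (8, 4, 8, 4), one regression output.
shapes = mlp_weight_shapes(7, (8, 4, 8, 4))
print(shapes)
```

This matches the output above: the first array has 7 rows of 8 values (one row per input feature) and the second has rows of 4 values.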
      

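Now we use each trained model to predict on its asset's features and store the predictions as portfolio weights, then calculate the Sharpe ratio. Because the models are evaluated on the same data they were trained on, these figures are in-sample: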

In [38]:
# Performs prediction and generates output weights:

asset_name_all = data.coords["asset"].values
weights = xr.zeros_like(data.sel(field="is_liquid"))

for asset_name in asset_name_all:
    if asset_name in models:
        model = models[asset_name]
        features_all = my_features
        features_cur = features_all.sel(asset=asset_name).dropna("time", how="any")
        if len(features_cur.time) < 1:
            continue
        try:
            weights.loc[dict(asset=asset_name, time=features_cur.time.values)] = model.predict(features_cur.values)
        except KeyboardInterrupt as e:
            raise e
        except:
            logging.exception("model prediction failed")

print(weights)

<xarray.DataArray 'stocks_s&p500' (time: 4692, asset: 516)> Size: 19MB
array([[ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       ...,
       [-6.45552103e-02,  3.40896878e-01,  3.42462685e-02, ...,
         1.15848341e-01, -3.93524019e-03,  4.72009276e-02],
       [-5.91992067e-02,  2.88396707e-02,  8.99333023e-02, ...,
         1.09035092e-01, -1.47104487e-04,  2.25763755e-02],
       [-2.68766144e-02, -3.44854596e-02,  5.09035298e-02, ...,
         1.02080556e-01,  4.48506024e-03,  5.52511579e-02]])
Coordinates:
  * time     (time) datetime64[ns] 38kB 2006-06-01 2006-06-02 ... 2025-01-23
    field    <U9 36B 'is_liquid'
  * asset    (asset) <U9 19kB 'NAS:AAL' 'NAS:AAPL

In [39]:
def get_sharpe(stock_data, weights):
    """Calculates the Sharpe ratio"""
    rr = qnstats.calc_relative_return(stock_data, weights)
    sharpe = qnstats.calc_sharpe_ratio_annualized(rr).values[-1]
    return sharpe

sharpe = get_sharpe(data, weights)
sharpe

0.45123415004383083
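
For intuition about what this number measures, here is a minimal textbook-style sketch of an annualized Sharpe ratio computed from daily returns. It assumes a zero risk-free rate, 252 trading days per year and an arithmetic mean; the toolbox's `calc_sharpe_ratio_annualized` may use different conventions, so this is not meant to reproduce the value above. The function name and toy returns are hypothetical.

```python
import math
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized mean return divided by annualized volatility (risk-free rate assumed 0)."""
    mean_daily = statistics.fmean(daily_returns)
    vol_daily = statistics.pstdev(daily_returns)
    if vol_daily == 0:
        return float("nan")  # undefined when returns never vary
    annual_mean = mean_daily * periods_per_year
    annual_vol = vol_daily * math.sqrt(periods_per_year)
    return annual_mean / annual_vol


# Hypothetical daily relative returns of a strategy.
toy_returns = [0.01, -0.005, 0.007, 0.002, -0.003, 0.004]
toy_sharpe = annualized_sharpe(toy_returns)
print(round(toy_sharpe, 2))
```

Because both numerator and denominator scale with position size, doubling every return leaves the ratio unchanged; the ratio rewards consistency of returns rather than leverage.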

In [40]:
import qnt.graph as qngraph

statistics = qnstats.calc_stat(data, weights)

display(statistics.to_pandas().tail())

performance = statistics.to_pandas()["equity"]
qngraph.make_plot_filled(performance.index, performance, name="PnL (Equity)", type="log")

display(statistics[-1:].sel(field = ["sharpe_ratio"]).transpose().to_pandas())

# check for correlations with existing strategies:
qnstats.print_correlation(weights,data)



| time | equity | relative_return | volatility | underwater | max_drawdown | sharpe_ratio | mean_return | bias | instruments | avg_turnover | avg_holding_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-01-16 | 3.723361 | 0.007079 | 0.164569 | -0.0129 | -0.462762 | 0.444941 | 0.073224 | 0.714532 | 516.0 | 0.469194 | 4.179586 |
| 2025-01-17 | 3.738051 | 0.003945 | 0.164554 | -0.009005 | -0.462762 | 0.446265 | 0.073435 | 0.709464 | 516.0 | 0.469197 | 4.1796 |
| 2025-01-21 | 3.77482 | 0.009836 | 0.164551 | 0.0 | -0.462762 | 0.449606 | 0.073983 | 0.63747 | 516.0 | 0.469183 | 4.179586 |
| 2025-01-22 | 3.774191 | -0.000167 | 0.164533 | -0.000167 | -0.462762 | 0.449496 | 0.073957 | 0.634855 | 516.0 | 0.469174 | 4.179676 |
| 2025-01-23 | 3.793616 | 0.005147 | 0.16452 | 0.0 | -0.462762 | 0.451234 | 0.074237 | 0.63623 | 516.0 | 0.469164 | 4.181109 |


| field | 2025-01-23 |
| --- | --- |
| sharpe_ratio | 0.451234 |


NOTICE: The environment variable ENGINE_CORRELATION_URL was not specified. The default value is 'https://quantiacs.io/referee/submission/forCorrelation'
NOTICE: The environment variable STATAN_CORRELATION_URL was not specified. The default value is 'https://quantiacs.io/statan/correlation'
NOTICE: The environment variable PARTICIPANT_ID was not specified. The default value is '0'



Ok. This strategy does not correlate with other strategies.


In [41]:
"""R2 (coefficient of determination) regression score function."""
r2_score(my_targetclass, weights, multioutput="variance_weighted")

0.0022673039874484004

In [42]:
"""The explained variance score explains the dispersion of errors of a given dataset"""
explained_variance_score(my_targetclass, weights, multioutput="uniform_average")

0.003218861741050234

In [43]:
"""Mean absolute error: the average absolute difference between targets and predictions."""
mean_absolute_error(my_targetclass, weights)

0.8978431119198941
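
To show what these scores measure, here is a small pure-Python sketch of R² and mean absolute error for a single series. The notebook calls the scikit-learn functions across all assets at once, so this simplified single-output version is only illustrative; the function names and toy data are hypothetical.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical up/down labels and small-magnitude predictions with the right sign.
toy_targets = [1, -1, 1, -1]
toy_predictions = [0.1, -0.1, 0.1, -0.1]
print(r2(toy_targets, toy_predictions), mae(toy_targets, toy_predictions))
```

Even with every sign correct, predictions of small magnitude against labels of ±1 give a mean absolute error close to 1, so the error here says more about the scale of the predictions than about their direction.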

In [44]:
def train_model(data):
    """Create and train the model working on an asset-by-asset basis."""

    asset_name_all = data.coords["asset"].values
    features_all   = get_features(data)
    target_all     = get_target_classes(data)

    models = dict()

    for asset_name in asset_name_all:

        # drop missing values:
        target_cur   = target_all.sel(asset=asset_name).dropna("time", how= "any")
        features_cur = features_all.sel(asset=asset_name).dropna("time", how= "any")

        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join="inner")

        if len(feature_for_learn_df.time) < 10:
            # not enough points for training
            continue

        model = get_model()

        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)
            models[asset_name] = model

        except Exception:
            logging.exception("model training failed")

    return models

In [45]:
def predict_weights(models, data):
    """The model predicts if the price is going up or down.
       The prediction is performed for several days in order to speed up the evaluation."""

    asset_name_all = data.coords["asset"].values
    weights = xr.zeros_like(data.sel(field="close"))

    # compute the features once, not once per asset:
    features_all = get_features(data)

    for asset_name in asset_name_all:
        if asset_name in models:
            model = models[asset_name]
            features_cur = features_all.sel(asset=asset_name).dropna("time", how="any")

            if len(features_cur.time) < 1:
                continue

            try:
                weights.loc[dict(asset=asset_name, time=features_cur.time.values)] = model.predict(features_cur.values)

            except KeyboardInterrupt as e:
                raise e

            except:
                logging.exception("model prediction failed")

    return weights

In [46]:
# Calculate weights using the backtester:
weights = qnbt.backtest_ml(
    train                         = train_model,
    predict                       = predict_weights,
    train_period                  =  2 *365,  # the data length for training in calendar days
    retrain_interval              = 10 *365,  # how often we have to retrain models (calendar days)
    retrain_interval_after_submit = 1,        # how often retrain models after submission during evaluation (calendar days)
    predict_each_day              = False,    # Is it necessary to call prediction for every day during backtesting?
                                              # Set it to True if you suspect that get_features is looking forward.
    competition_type              = "stocks_s&p500",  # competition type
    lookback_period               = 365,                 # how many calendar days are needed by the predict function to generate the output
    start_date                    = "2005-06-01",        # backtest start date
    analyze                       = True,
    build_plots                   = True  # do you need the chart?
)

Run the last iteration...


Data loaded 74s
Data loaded 7s
Output cleaning...
fix uniq
ffill if the current price is None...
Check liquidity...
Fix liquidity...
Ok.
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.


NOTICE: The environment variable OUTPUT_PATH was not specified. The default value is 'fractions.nc.gz'


Write output: fractions.nc.gz


NOTICE: The environment variable OUT_STATE_PATH was not specified. The default value is 'state.out.pickle.gz'


State saved.
---
Run First Iteration...


Data loaded 9s
---
Run all iterations...
Load data...


Data loaded 38s
Data loaded 33s
Backtest...


Data loaded 38s
Output cleaning...
fix uniq
ffill if the current price is None...
Check liquidity...
Fix liquidity...
Ok.
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.


NOTICE: The environment variable OUTPUT_PATH was not specified. The default value is 'fractions.nc.gz'


Write output: fractions.nc.gz


NOTICE: The environment variable OUT_STATE_PATH was not specified. The default value is 'state.out.pickle.gz'


State saved.
---
Analyze results...
Check...
Check liquidity...
Ok.
Check missed dates...
Ok.
Check the sharpe ratio...
Period: 2006-01-01 - 2025-01-23
Sharpe Ratio = -1.119775783777139


ERROR! The Sharpe Ratio is too low. -1.119775783777139 < 0.7
Improve the strategy and make sure that the in-sample Sharpe Ratio more than 0.7.


---
Align...
Calc global stats...
---
Calc stats per asset...
Build plots...
---
Output:


| time | NAS:AAL | NAS:AAPL | NAS:ABNB | NAS:ACGL | NAS:ADBE | NAS:ADI | NAS:ADP | NAS:ADSK | NAS:AEP | NAS:AKAM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-01-08 | 0.0 | 0.003813 | 0.0 | 0.002471 | 0.013328 | 0.001431 | 0.001471 | -0.000927 | 0.001749 | 0.001272 |
| 2025-01-10 | 0.0 | 0.003803 | 0.0 | 0.001681 | 0.017008 | 0.002412 | 0.001467 | 0.000203 | 0.002164 | 0.001195 |
| 2025-01-13 | 0.0 | 0.003928 | 0.0 | 0.004213 | 0.017433 | 0.001093 | 0.001516 | -0.000571 | 0.002236 | 0.001071 |
| 2025-01-14 | 0.0 | 0.003978 | 0.0 | -0.000673 | 0.013718 | -0.002438 | 0.001535 | 0.001031 | 0.002264 | 0.000466 |
| 2025-01-15 | 0.0 | 0.004081 | 0.0 | 0.002258 | 0.004674 | -0.001698 | 0.001575 | -0.002344 | 0.002323 | 0.001521 |
| 2025-01-16 | 0.0 | 0.004108 | 0.0 | 0.002255 | 0.00268 | 8.1e-05 | 0.001585 | -0.000289 | 0.002338 | 0.002026 |
| 2025-01-17 | 0.0 | 0.004262 | 0.0 | 0.001698 | -0.001069 | 0.000769 | 0.001644 | 0.000546 | 0.002426 | 0.00222 |
| 2025-01-21 | 0.0 | 0.003573 | 0.0 | 0.002207 | 0.001154 | 0.000741 | 0.00169 | 0.000514 | 0.002493 | -0.001541 |
| 2025-01-22 | 0.0 | 0.004601 | 0.0 | 0.003601 | 0.00213 | 0.002616 | 0.001775 | 0.000195 | 0.002619 | 0.001388 |
| 2025-01-23 | 0.0 | 0.004648 | 0.0 | 0.000851 | 0.001128 | 0.002626 | 0.001794 | 0.002881 | 0.002646 | 0.001826 |


Stats:


| time | equity | relative_return | volatility | underwater | max_drawdown | sharpe_ratio | mean_return | bias | instruments | avg_turnover | avg_holding_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-01-08 | 0.080202 | -0.002771 | 0.109398 | -0.920706 | -0.92098 | -1.105045 | -0.120889 | 0.203435 | 475.0 | 0.500693 | 3.845118 |
| 2025-01-10 | 0.079909 | -0.003659 | 0.109389 | -0.920996 | -0.920996 | -1.106427 | -0.121031 | 0.259946 | 475.0 | 0.500666 | 3.845111 |
| 2025-01-13 | 0.079983 | 0.000931 | 0.109378 | -0.920922 | -0.920996 | -1.105943 | -0.120966 | 0.209947 | 475.0 | 0.500645 | 3.845079 |
| 2025-01-14 | 0.079854 | -0.001609 | 0.109367 | -0.92105 | -0.92105 | -1.106503 | -0.121015 | 0.161619 | 475.0 | 0.500632 | 3.845083 |
| 2025-01-15 | 0.079912 | 0.000727 | 0.109357 | -0.920992 | -0.92105 | -1.106104 | -0.12096 | 0.184507 | 475.0 | 0.500625 | 3.845142 |
| 2025-01-16 | 0.079873 | -0.000491 | 0.109346 | -0.921031 | -0.92105 | -1.106207 | -0.120959 | 0.166935 | 475.0 | 0.500625 | 3.845154 |
| 2025-01-17 | 0.079986 | 0.001408 | 0.109335 | -0.92092 | -0.92105 | -1.105524 | -0.120873 | 0.142895 | 475.0 | 0.500609 | 3.84514 |
| 2025-01-21 | 0.080385 | 0.004991 | 0.109331 | -0.920525 | -0.92105 | -1.103314 | -0.120627 | 0.219403 | 475.0 | 0.500581 | 3.845193 |
| 2025-01-22 | 0.080188 | -0.002454 | 0.109321 | -0.92072 | -0.92105 | -1.104214 | -0.120714 | 0.24232 | 475.0 | 0.500554 | 3.84542 |
| 2025-01-23 | 0.080415 | 0.002839 | 0.109313 | -0.920495 | -0.92105 | -1.102928 | -0.120564 | 0.278565 | 475.0 | 0.500535 | 3.849115 |


---


100% (4944 of 4944) |####################| Elapsed Time: 0:12:45 Time:  0:12:45
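
To get a feel for the `train_period` and `retrain_interval` settings used above, the sketch below lists the training windows that a simple walk-forward scheme would produce. This is only an illustration based on the parameter comments in the call; it is an assumption, not the actual implementation of `qnbt.backtest_ml`, and the function `retrain_schedule` is hypothetical.

```python
from datetime import date, timedelta


def retrain_schedule(start, end, train_period_days, retrain_interval_days):
    """List (train_start, train_end) windows for a simple walk-forward scheme.

    A model is retrained every `retrain_interval_days`, each time on the
    `train_period_days` that immediately precede the retraining date.
    """
    windows = []
    current = start
    while current <= end:
        windows.append((current - timedelta(days=train_period_days), current))
        current += timedelta(days=retrain_interval_days)
    return windows


# Same period as reported by the backtester, with the settings from the call above.
windows = retrain_schedule(date(2006, 1, 1), date(2025, 1, 23), 2 * 365, 10 * 365)
for train_start, train_end in windows:
    print(train_start, "->", train_end)
```

Under this reading, a retraining interval of ten years yields very few retraining dates over the whole backtest, so a shorter `retrain_interval` may be worth experimenting with when trying to improve the out-of-sample Sharpe ratio.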
