# Scoring notebook

This is the scoring notebook for the data-driven competition at CMF 2022. You can change cells that contain a `### YOUR CODE HERE` line; all other cells are read-only. You can, however, add new cells to organize your code conveniently.

In [None]:
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
from sklearn.metrics import mean_squared_error as mse
import matplotlib.pyplot as plt

Let us load the dataset. The columns of the test datasets (both public and private) are the same as those of the train dataset.

In [None]:
dataset = pd.read_csv('dataset.zip', index_col=0, header=[0, 1])
dataset.rename(
    columns={
        'Unnamed: 209_level_1': 'count',
        'Unnamed: 210_level_1': 'price',
    },
    level = 1,
    inplace = True
)
dataset.head()

In [None]:
class Dataloader():
    def __init__(
        self, 
        dataframe: pd.DataFrame, 
        window_size: int, 
        step_size: int,
        horizon: int,
        first_pred: int
    ):
        self.df = dataframe
        self.window_size = window_size
        self.step_size = step_size
        self.horizon = horizon
        self.first_pred = first_pred
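        # The earliest feature row is first_pred - horizon - window_size + 1; it must not be negative.
        assert self.first_pred - self.horizon - self.window_size + 1 >= 0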
        feat_idx = []
        target_idx = []
        for i in range(self.first_pred, self.df.shape[0], self.step_size):
            feat_idx.append(range(i-self.horizon-self.window_size+1, i-self.horizon+1))
            target_idx.append(i)
        self.feat_idx = feat_idx
        self.target_idx = target_idx
    
    def __len__(self):
        return len(self.feat_idx)
    
    def __iter__(self):
        self.iter = 0
        return self

    def __next__(self):
        if self.iter < len(self.feat_idx):
            feat = self.df.iloc[self.feat_idx[self.iter]]
            target = self.df.iloc[self.target_idx[self.iter], -1]
            self.iter += 1
            return feat, target
        else:
            raise StopIteration

Column **price** represents the price at moment **t**. The task is to predict **price** values at moment **t+60**.

The forecasting problem is defined as follows. Consider the multivariate time series of features (exogenous variables) $X_0, X_1, \dots$, where $X_i \in \mathbb{R}^d$, and the univariate time series of targets (endogenous variables) $y_0, y_1, \dots$, where $y_i \in \mathbb{R}$. The task is to predict $y_{T+h}$, where $T \in \{1000, 1001, \dots\}$ is the last available time stamp and $h = 60$ is the forecasting horizon, from the _sliding window_ of pairs $(X, y)_{T-N+1}, (X, y)_{T-N+2}, \dots, (X, y)_T$ with the selected window size $1 \leq N \leq 1000$. The objective is to minimize the mean squared error between predictions and targets.
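To make the indexing concrete, here is a minimal sketch of which rows end up in the window for a given target row. The helper `window_bounds` is hypothetical (it is not part of the notebook); it only mirrors the index arithmetic used in the `Dataloader` class above.

```python
from typing import Tuple


def window_bounds(target_idx: int, horizon: int, window_size: int) -> Tuple[int, int]:
    """Return the inclusive (first, last) row positions of the feature window.

    Hypothetical helper: target_idx plays the role of T + h,
    so the last available row is T = target_idx - horizon.
    """
    last = target_idx - horizon       # T
    first = last - window_size + 1    # T - N + 1
    return first, last


# The first scored target is row 1060 with horizon 60.
widest = window_bounds(1060, horizon=60, window_size=1000)
narrowest = window_bounds(1060, horizon=60, window_size=1)
print(widest, narrowest)
```

With the largest allowed window the features start at row 1, and with `window_size = 1` only row 1000 is used.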

Select a window size appropriate for your solution.

In [None]:
window_size = 1

In [None]:
assert 1 <= window_size <= 1000

The dataloader defines the forecasting problem with the selected window size.

**Remark**: the first 1060 observations in both test datasets will not be scored.

In [None]:
loader = Dataloader(
    dataframe=dataset, 
    window_size=window_size, 
    step_size=1, 
    horizon=60, 
    first_pred=1060)

for feat, target in loader:
    break
feat.shape, target

Define your forecasting model. You can install any necessary libraries with `!pip install ... `. The preinstalled packages are listed in [requirements](https://github.com/vpozdnyakov/EvalAI/blob/master/requirements/worker.txt). The CPU version of `torch==1.10.2` is also available. Do not train the model here; instead, download the weights of a pretrained model from your own cloud storage, e.g. from Google Drive with `gdown`, as follows:

```python
!pip install gdown==4.2.0 -q
import gdown
import torch

url = ...  # sharing link to your own weights file
gdown.download(url, 'model_scripted.pt', fuzzy=True)
model = torch.jit.load('model_scripted.pt')
```

You can change the template by adding additional methods, parameters, etc.

In [None]:
class ForecastingModel():
    def __init__(self):
        from sklearn.linear_model import LinearRegression
        params = [[[-0.0005384770380169127, -0.28350132504267106, 0.08173542855532759, -0.3533556673888432, -0.1825023020817937, 0.19240775895771112, 0.05842814654022345, 0.15589184067572892, -4.0010280924040586e-08, -0.0001560418797431816, -1.1194910087140958e-05, 3.531725891823081e-06, 6.363174597578372e-06, 5.507294423097976e-06, 1.14919389001785e-07, 1.3482157335226e-07, 1.6716546249357115e-06, 7.678973521230929e-05, 0.28135131273018776, -0.2471512015056614, 0.05398961536116234, 0.0010020714592310544, -0.103859796766626, 0.05918314572654251, 0.18913103291298525, 3.060710954431653e-08, 6.484978523007157e-05, 8.911404808897676e-07, -2.684842753048096e-06, -5.291599155613728e-06, -3.7812247085189463e-06, 1.5007968579144881e-06, 4.2780210114657085e-07, 1.444703553926674e-06, -0.00038915349018548914, -0.15842798281258108, 0.10941257716815127, 0.7888248655687403, 0.22769343628718924, -0.22593346557065577, 0.061720647968247735, 0.1398139306781656, 1.7677253771886114e-08, 0.00013291539395106355, -3.1147804856468397e-06, 1.2746090077841998e-07, -1.5185027710251653e-07, 3.828412932366243e-07, 5.341865633224675e-07, -1.2617212609769557e-07, 4.889758800363797e-07, -0.00010610426112480503, -0.45258946314784626, 0.17422322314411015, -0.0203399437335968, 0.06281398853740955, 0.0648509217209372, -0.0005981496922247556, 0.08501933545327886, 1.3411977597965752e-08, -0.0001698995892102078, -6.45503663562752e-07, -1.393201082694473e-07, 3.809091840382972e-07, -5.091388829425725e-07, -3.8974430971450147e-07, -6.880449741897277e-08, 1.6604769474681813e-07, 0.0007078406295252561, 0.40182212054020133, -0.22376893913637133, -0.47231873154221804, 0.06645793963706724, 0.10300136180046879, -0.062219509985710277, 0.0937961131568668, 2.2635667173975804e-08, 6.77014392188053e-05, -9.420240891511167e-07, -1.541225601309204e-07, 1.375844831549844e-06, -3.436281453828016e-07, -1.0039157509050822e-06, 1.4976822988732597e-07, 4.097994571738983e-07, -0.0003176873872619148, 0.32160901020646415, 
-0.12454792997109718, 0.6229762379974393, 0.058062551452090154, -0.49944341546025073, 0.052901458346109716, 0.2669684818179276, -8.5108378400367e-09, 0.00029107079868526337, -1.1462556463406447e-05, 3.078629020398377e-06, 8.50257098783131e-06, 8.759475457776993e-06, -1.0441604555043102e-06, 1.8069343689017248e-07, -1.1379588559599224e-06, 0.00023448165151132203, -0.34380825760686917, 0.14848044132150676, 0.34534224945183617, 0.13142753333269946, -0.038691876520365286, -0.07665568915612549, 0.07573799302941828, 3.154834378460425e-08, -0.0009297680691022945, -6.80718686777923e-07, -2.1293955702675338e-06, -6.734501524011116e-07, -1.3146534923022448e-06, 1.1637074204731007e-06, 3.822206090676322e-07, -1.581113965368891e-06, 0.0006548779882938217, 0.006733480073707004, -0.0049138947635682505, 0.007017170854113082, 0.057606340377457335, -0.21314800937128972, 0.10696066241883863, 0.2111198038039822, 1.884677217067754e-08, 0.0002757671798122637, -5.82164895043813e-06, 2.1441809075310736e-06, 2.980967497468942e-06, 2.6765840923093087e-06, -6.510601790049586e-07, -1.1155569609733185e-07, -1.7657771585570525e-06, 6.39325209333937e-05, 0.3254742556200609, 0.03637697169559885, 0.08516466841557169, 0.06409811944250465, -0.027415128960870738, -0.11834097821691957, -0.13071804924249464, -4.088533223089996e-08, 7.352573575121306e-05, -1.0419137162834269e-06, 9.469988786897643e-07, 1.6589943249495018e-06, 9.72721292492873e-07, -4.3865355849742604e-07, 3.0982291836928466e-07, -3.7005969327763566e-07, 5.160833553999496e-05, 0.0014113836227053985, -0.055087775431732336, 0.4477988483998684, -0.10039378681282562, -0.12270554570862333, 0.02836585063264479, 0.23743485597970188, 2.300329862703343e-08, -0.0001977352914587934, -4.118991583298981e-07, 2.2525024995223575e-07, -2.439956501589652e-06, -7.920617275253283e-07, 7.920860186430567e-07, -1.981434763487222e-08, -5.876902650567284e-07, -0.0012374938861763056, 0.0016416149319317085, -7.184799793955781, 2.895291919711242, 
-1.1189090648214244, 1.3749580985262484, 1.7332583785979025, 0.5341053694093885, 0.647269260651076, -0.0012374938854132036, 1.7580006328884795e-05, 5.205414163528553e-05, -0.00013053574702171422, 7.596054580065112e-05, 6.0578355621984146e-05, 5.5354268686951024e-05, -1.1035953491867877e-05, -8.868569403679572e-06, -2.559801988314585e-06, -0.0012975799890633144, -0.0006113355822128571, -0.5305832630992714, 0.42142251394756786, -1.147825202771184, 1.3577861252230532, 2.6837112485106425, -1.9165865941551998, -2.0157749554805195, -0.0012975799834342157, -4.715265640028825e-06, 0.0032049920152303437, 0.0001492836619500764, 4.783248644126079e-06, -0.0002238780778626205, -0.00024595439481092315, 3.313335294858813e-05, 3.491102358778894e-05, 2.273939211525208e-05, 4.036447588368075e-05, 0.05566480250667962]], [0.02875511759157945]]
        model2 = LinearRegression()
        model2.coef_ = np.array(params[0])
        model2.intercept_ = np.array(params[1])
        self.model2 = model2
        
    def forecast(self, feat):
        return self.model2.predict(feat)[0][0]
    

model = ForecastingModel()

In the **forecast** function you can do preprocessing, e.g. removing unnecessary data or aggregating it.

In [None]:
def forecast(feat):
    return model.forecast(feat.fillna(0))
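If you choose `window_size > 1`, the window has several rows, while a model such as the linear regression above expects a single row. One possible preprocessing step, shown here only as a sketch, is to forward-fill gaps and keep the most recent row. The helper name `collapse_window` and the toy columns `a` and `b` are assumptions made for illustration; they are not part of the competition template.

```python
import numpy as np
import pandas as pd


def collapse_window(feat: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reduce a window of rows to one row of features."""
    # Carry the last known value forward, then zero-fill gaps at the start.
    filled = feat.ffill().fillna(0)
    # Keep the most recent row as a 1-row DataFrame.
    return filled.iloc[[-1]]


# Toy window with three time steps and two made-up feature columns.
window = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, 2.0, np.nan]})
row = collapse_window(window)
print(row)
```

The result could then be passed to `model.forecast` in place of `feat.fillna(0)`; whether the last row, a mean, or another aggregate works best depends on your model.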

Scoring the model.

In [None]:
pred = []
target = []
for feat, _target in loader:
    pred.append(forecast(feat))
    target.append(_target)
mse(pred, target)
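The score above is the mean squared error, i.e. the average of the squared differences between targets and predictions. For reference, a minimal plain-Python sketch of the same quantity follows; `mean_squared_error_manual` is a hypothetical name, and the numbers are made up for illustration.

```python
def mean_squared_error_manual(targets, predictions):
    """Average of squared differences between two equally long sequences."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared_errors) / len(squared_errors)


# Made-up values: the squared errors are 0, 0.25 and 1.
score = mean_squared_error_manual([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(score)
```

Because each difference is squared, the result does not depend on the order in which targets and predictions are passed.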

Let us draw the forecast visualization.

In [None]:
plt.figure(figsize=(12, 4))
plt.plot(target, label='target')
plt.plot(pred, label='forecast')
plt.title('Price of the asset')
plt.legend()
plt.show()

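A close-up of 1000 consecutive forecasts. The slice below assumes at least 100,000 scored points; with fewer, it is shorter or empty.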

In [None]:
plt.figure(figsize=(12, 4))
plt.plot(target[-100000:-99000], label='target')
plt.plot(pred[-100000:-99000], label='forecast')
plt.title('Price of the asset')
plt.legend()
plt.show()