[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crunchdao/quickstarters/blob/master/competitions/mid-one/quickstarters/regression_attacker/regression_attacker.ipynb)

![Banner](https://raw.githubusercontent.com/crunchdao/quickstarters/refs/heads/master/competitions/mid-one/assets/banner.webp)

# Regression Attacker

This notebook demonstrates how to create an `Attacker` as described in [attacker.md](https://github.com/microprediction/midone/blob/main/midone/attackers/attacker.md). For more context, or to see how these attackers can be used in a new tournament, you may also want to glance at the [mean reversion attacker notebook](../mean_reversion_attacker/mean_reversion_attacker.ipynb).

Here we use the `river` package to maintain a running (online) linear regression that is updated one observation at a time.
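River models learn from one observation at a time through `learn_one(x, y)` and predict with `predict_one(x)`, where `x` is a dict that maps feature names to values. The pure-Python sketch below illustrates what an online regression does. It assumes plain stochastic gradient descent on squared error with a fixed learning rate and no intercept. It is not river's implementation, which supports configurable optimizers and regularization. The class name `OnlineLinearRegressionSketch` and the learning rate are hypothetical.

```python
class OnlineLinearRegressionSketch:
    """Hypothetical minimal online linear regression: SGD on squared error, no intercept."""

    def __init__(self, lr: float = 0.01):
        self.lr = lr
        self.weights: dict[str, float] = {}

    def predict_one(self, x: dict[str, float]) -> float:
        # Unseen features have weight 0, so an untrained model predicts 0.0
        return sum(self.weights.get(name, 0.0) * value for name, value in x.items())

    def learn_one(self, x: dict[str, float], y: float) -> None:
        # The gradient of 0.5 * (y_pred - y)**2 w.r.t. each weight is (y_pred - y) * value
        error = self.predict_one(x) - y
        for name, value in x.items():
            self.weights[name] = self.weights.get(name, 0.0) - self.lr * error * value


model = OnlineLinearRegressionSketch(lr=0.05)
for t in range(2000):
    lag = (t % 10) / 10.0                       # synthetic feature in [0, 0.9]
    model.learn_one({"lag_0": lag}, 2.0 * lag)  # true relationship: y = 2 * lag

print(round(model.weights["lag_0"], 3))  # → 2.0
```

An untrained model predicts 0, and the same holds for a river model whose intercept is frozen at zero. Decisions taken before the weights have settled are therefore unreliable, which is why the attacker below uses a burn-in period.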

## Setup

In [None]:
%pip install --upgrade midone river

In [None]:
# Get a new token here: https://hub.crunchdao.com/competitions/mid-one/submit/via/notebook

%pip install --upgrade crunch-cli
!crunch setup --notebook mid-one hello --token aaaabbbbccccddddeeeeffff

## Imports

In [1]:
import typing
import collections

import pandas
from midone import EPSILON, HORIZON, Attacker
from midone.accounting.pnlutil import add_pnl_summaries, zero_pnl_summary
from river import linear_model
from tqdm.auto import tqdm

In [None]:
import crunch

crunch = crunch.load_notebook()

## Creating a Regression-based Attacker
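We derive from `Attacker` and use `linear_model.LinearRegression` from the `river` package to maintain an online regression estimate of the value `HORIZON` steps ahead. The attacker then makes one of three decisions:

- **Buy** (return `1`) if the prediction exceeds the current value by more than `threshold * EPSILON`.
- **Sell** (return `-1`) if the prediction falls short of the current value by more than the same margin.
- **Hold** (return `0`) otherwise.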
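The rule can also be written in symbols. The notation is as follows:

- $x_t$ is the value at step $t$.
- $L$ is `num_lags` and $H$ is `HORIZON`.
- $\tau$ is `threshold` and $\varepsilon$ is `EPSILON`.
- `lag_i` holds $x_{t-L+1+i}$, so `lag_0` is the oldest value in the window.

The model predicts a weighted sum of the $L$ most recent values, with the intercept frozen at zero. It then trades on the gap between that prediction and the current value:

$$
\hat{x}_{t+H} = \sum_{i=0}^{L-1} w_i \, x_{t-L+1+i},
\qquad
d_t =
\begin{cases}
  1 & \text{if } \hat{x}_{t+H} - x_t > \tau\varepsilon, \\
  -1 & \text{if } \hat{x}_{t+H} - x_t < -\tau\varepsilon, \\
  0 & \text{otherwise.}
\end{cases}
$$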



In [6]:
class MyAttacker(Attacker):
    """
    An attacker that uses an online linear regression model to predict future values
    and makes trading decisions when the expected profit exceeds a multiple (threshold) of EPSILON.
    """

    def __init__(
        self,
        num_lags=5,
        threshold=1.0,
        burn_in=1000,
        **kwargs
    ):
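        """
        Initializes the attacker.

        Parameters:
        - num_lags (int): Number of lagged values to use as features.
        - threshold (float): Multiple of EPSILON that the expected profit must exceed before trading.
        - burn_in (int): Number of observations to process before making any non-zero decision.
        - **kwargs: Passed through to the Attacker base class.
        """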
        super().__init__(**kwargs)

        self.num_lags = num_lags
        self.threshold = threshold
        self.burn_in = burn_in

        # Online linear regression model
        self.model = linear_model.LinearRegression(
            # Initialize intercept to 0
            intercept_init=0.0,

            # Freeze the intercept (no learning)
            intercept_lr=0.0
        )

        # Queue to store input vectors and time indices
        self.input_queue = collections.deque()
        self.current_ndx = 0

    def tick(self, x):
        """
        Processes the new data point.

        - Updates the time index.
        - Maintains a queue of input vectors.
        - When the future value arrives after HORIZON steps, updates the model.

        Parameters:
        - x (float): The new data point.
        """
        # The history is maintained by the parent class; no need to call tick_history()

        self.current_ndx += 1
        X_t = self.get_recent_history(n=self.num_lags)
        if len(X_t) >= self.num_lags:
            self.input_queue.append({
                'ndx': self.current_ndx,
                'X': X_t
            })

        # Check if we can update the model with data from HORIZON steps ago
        while self.input_queue and self.input_queue[0]['ndx'] <= self.current_ndx - HORIZON:
            # Retrieve the input vector and its time index
            past_data = self.input_queue.popleft()
            X_past = past_data['X']

            # The target is the value HORIZON steps after the stored input (index past_data['ndx'] + HORIZON),
            # which is the current index, so the current data point x serves as y
            y = x

            # Prepare the feature dictionary in the form demanded by river package
            X_past_dict = {
                f'lag_{i}': value
                for i, value in enumerate(X_past)
            }

            # Update the model incrementally
            self.model.learn_one(X_past_dict, y)

    def predict(self, horizon=HORIZON):
        """
        Makes a prediction for HORIZON steps ahead and decides whether to buy, sell, or hold.

        Parameters:
        - horizon (int): The prediction horizon (should be HORIZON).

        Returns:
        - int: 1 for buy, -1 for sell, 0 for hold.
        """

        if self.current_ndx < self.burn_in:
            return 0   # Not enough data for model to be reliable

        # Ensure we have enough history to make a prediction
        if len(self.history) < self.num_lags:
            return 0  # Not enough history to make a prediction

        # Create the input vector using the most recent 'lag' values
        X_t = list(self.history)[-self.num_lags:]
        X_t_dict = {
            f'lag_{i}': value
            for i, value in enumerate(X_t)
        }

        # Predict the future value HORIZON steps ahead
        y_pred = self.model.predict_one(X_t_dict)

        # Get the last known value
        last_value = X_t[-1]

        # Calculate the expected profit
        expected_profit = y_pred - last_value

        # Decide based on whether expected profit exceeds a multiple of EPSILON
        if expected_profit > self.threshold * EPSILON:
            return 1  # Buy
        elif expected_profit < -self.threshold * EPSILON:
            return -1  # Sell
        else:
            return 0  # Hold

### Explanation

### `tick` Method

The `tick` method processes a new incoming data point and updates the attacker's state accordingly:

- **Increment the Time Index**: The method updates `self.current_ndx` to track the current observation index.
- **Maintain Input History**: Once at least `num_lags` values are available, it retrieves the most recent `num_lags` values and appends this input vector (`X_t`) to the `input_queue`, tagged with the current index.
- **Update the Model**: The method checks if it has received enough future data (after `HORIZON` steps) to use an earlier input vector as a training example. If so, it pairs the input vector from `HORIZON` steps ago with the current data point `x` (used as the target value `y`) and incrementally updates the online regression model.
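The pairing of an input vector with the value that arrives `HORIZON` steps later can be sketched on its own. `delayed_training_pairs` is a hypothetical stand-alone function, not part of midone or river. It assumes that the lag window at step $t$ already includes $x_t$, so each window is matched with $x_{t+H}$.

```python
import collections


def delayed_training_pairs(values, num_lags, horizon):
    """Yield (lag_window, target) pairs, where the target arrives `horizon` steps after the window closes."""
    history = []
    queue = collections.deque()  # (index, lag_window) pairs still waiting for their target
    for ndx, x in enumerate(values, start=1):
        history.append(x)
        if len(history) >= num_lags:
            # Slicing copies the window, so later appends cannot mutate stored windows
            queue.append((ndx, history[-num_lags:]))
        # Once `horizon` newer values have arrived, the current x is the target for the oldest window
        while queue and queue[0][0] <= ndx - horizon:
            _, window = queue.popleft()
            yield window, x


pairs = list(delayed_training_pairs([1, 2, 3, 4, 5, 6], num_lags=2, horizon=3))
print(pairs)  # → [([1, 2], 5), ([2, 3], 6)]
```

The window `[1, 2]` closes at index 2 and is paired with the value at index 5. This is the same delay that `tick` applies before calling `learn_one`.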

### `predict` Method

The `predict` method makes a decision based on the model’s prediction for the value `HORIZON` steps ahead:

- **Burn-in Check**: If fewer than `burn_in` data points have been processed, the method returns `0` (hold) without consulting the model.
- **Prepare Input Features**: It checks if there's enough history to form an input vector of `num_lags` values. If there is, it prepares a dictionary of lagged values (`X_t_dict`) to be used by the model.
- **Prediction**: The method predicts the value `HORIZON` steps ahead using the online regression model.
- **Decision Logic**: It calculates the expected profit as the predicted future value minus the last known value, compares it against the margin `threshold * EPSILON`, and returns:
  - `1` (buy) if the expected profit is greater than `threshold * EPSILON`,
  - `-1` (sell) if the expected profit is less than `-threshold * EPSILON`,
  - `0` (hold) otherwise, i.e. when the expected move is too small to act upon.
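The decision logic does not depend on the regression, so it can be checked in isolation. The function below is a stand-alone copy of the rule in `predict`. The name `decide` and the example value `epsilon=0.5` are illustrative only, since the real `EPSILON` is imported from midone.

```python
def decide(y_pred: float, last_value: float, threshold: float, epsilon: float) -> int:
    """Return 1 (buy), -1 (sell) or 0 (hold) from the predicted move, as in MyAttacker.predict."""
    expected_profit = y_pred - last_value
    if expected_profit > threshold * epsilon:
        return 1
    if expected_profit < -threshold * epsilon:
        return -1
    return 0


print(decide(10.0, 8.0, threshold=2.0, epsilon=0.5))  # → 1   (2.0 > 1.0)
print(decide(8.5, 8.0, threshold=2.0, epsilon=0.5))   # → 0   (0.5 is inside the +/-1.0 band)
print(decide(6.0, 8.0, threshold=2.0, epsilon=0.5))   # → -1  (-2.0 < -1.0)
```

The comparisons are strict, so a predicted move exactly equal to the margin results in a hold.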


## Run the attacker on mock data
We use `tick_and_predict` from the parent class as this will track profit and loss for us.
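`tick_and_predict` hides the bookkeeping. To make the role of `EPSILON` concrete, here is a **hypothetical** accounting scheme, not midone's actual code; see attacker.md for the real rules. It assumes every non-zero decision $d_t$ made at step $t$ is resolved at step $t+H$ with profit $d_t\,(x_{t+H} - x_t) - \varepsilon\,|d_t|$, so that `EPSILON` acts like a per-trade cost. Under that assumption a trade only pays off when the realized move exceeds `EPSILON`, which motivates waiting for a margin of `threshold * EPSILON`. The name `simulate_pnl` is hypothetical; the dictionary keys mirror those used later in this notebook.

```python
import collections


def simulate_pnl(values, decisions, horizon, epsilon):
    """Hypothetical PnL: a decision d at step t earns d * (x[t + horizon] - x[t]) - epsilon * |d|."""
    pending = collections.deque()  # (resolve_at_step, decision, entry_value)
    total_profit = 0.0
    num_resolved = 0
    for t, (x, d) in enumerate(zip(values, decisions)):
        # Resolve every decision whose horizon has elapsed, using the current value as the exit price
        while pending and pending[0][0] <= t:
            _, d_old, entry = pending.popleft()
            total_profit += d_old * (x - entry) - epsilon * abs(d_old)
            num_resolved += 1
        if d != 0:
            pending.append((t + horizon, d, x))
    return {"total_profit": total_profit, "num_resolved_decisions": num_resolved}


print(simulate_pnl([10, 11, 13, 12], [1, 0, -1, 0], horizon=1, epsilon=0.5))
# → {'total_profit': 1.0, 'num_resolved_decisions': 2}
```

Decisions that have not reached their horizon by the end of the data stay unresolved and are not counted.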

In [7]:
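attacker = MyAttacker()  # Always start from a freshly created attacker

data = [1, 3, 4, 2, 4, 5, 1, 5, 2, 5, 10] * 100
for x in data:
    y = attacker.tick_and_predict(x=x)

# With the default burn_in=1000, only roughly the last 100 of the 1,100 points can produce trades
attacker.pnl.summary()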

## Run the attacker on real data
We create a fresh attacker for every stream and record one profit-and-loss summary per stream. The resulting DataFrame therefore has one row per stream.
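The loop below stores one summary per stream. To get a single aggregate figure, profit per decision should be pooled from the summed totals rather than averaged across streams, because streams resolve different numbers of decisions. The sketch below is written in pure Python. It assumes each summary is a dict with the `total_profit` and `num_resolved_decisions` keys used in this notebook. The helper `pool_summaries` is hypothetical, and midone's `add_pnl_summaries` may combine additional fields. The input numbers are made up.

```python
def pool_summaries(summaries):
    """Sum totals across streams and recompute profit per decision from the pooled totals."""
    total_profit = sum(s["total_profit"] for s in summaries)
    num_resolved = sum(s["num_resolved_decisions"] for s in summaries)
    pooled = {"total_profit": total_profit, "num_resolved_decisions": num_resolved}
    if num_resolved > 0:
        pooled["profit_per_decision"] = total_profit / num_resolved
    return pooled


per_stream = [  # made-up summaries for two streams
    {"total_profit": 3.0, "num_resolved_decisions": 1},
    {"total_profit": 1.0, "num_resolved_decisions": 3},
]
print(pool_summaries(per_stream))
# → {'total_profit': 4.0, 'num_resolved_decisions': 4, 'profit_per_decision': 1.0}
```

Averaging the two per-stream ratios (3.0 and about 0.33) would give about 1.67 instead. The single-decision stream would then dominate the result.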

In [None]:
x_train, x_test = crunch.load_streams()

In [None]:
total_pnl = []

for stream in tqdm(x_train):
    attacker = MyAttacker(num_lags=2, threshold=2.0, burn_in=1000)
    pnl = zero_pnl_summary()

    for message in tqdm(stream, leave=False):
        attacker.tick_and_predict(x=message['x'])

    stream_pnl = attacker.pnl.summary()

    pnl = add_pnl_summaries(pnl, stream_pnl)
    if pnl['num_resolved_decisions'] > 0:
        pnl.update({
            'profit_per_decision': pnl['total_profit'] / pnl['num_resolved_decisions']
        })

    total_pnl.append(pnl)

total_pnl = pandas.DataFrame(total_pnl)
total_pnl

## CrunchDAO Code Interface

[Submitting to the CrunchDAO platform requires 2 functions, `train` and `infer`.](https://docs.crunchdao.com/competitions/code-interface) Any line that is neither inside a function nor an import is commented out when the notebook is processed.

The body of `infer` mirrors the loop above. In general, `train` must save any model that `infer` needs to load. Here, however, the regression is learned online inside `infer`, so `train` has nothing to save.
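`infer` is written as a generator. Its first bare `yield` signals that the attacker is ready, and each later `yield` returns one decision per incoming message. This notebook does not show how the CrunchDAO runner drives the generator. The sketch below therefore assumes a simple driver that primes it with `next()` and then collects one value per message. `toy_infer` and its sign-of-last-change rule are hypothetical stand-ins for `infer` and `MyAttacker`.

```python
import typing


def toy_infer(stream: typing.Iterator[dict]):
    """Hypothetical stand-in for infer(): hold on the first message, then follow the last change."""
    previous = None
    yield  # mark as ready, mirroring infer()
    for message in stream:
        x = message["x"]
        if previous is None or x == previous:
            yield 0
        else:
            yield 1 if x > previous else -1
        previous = x


messages = [{"x": 1.0}, {"x": 2.0}, {"x": 2.0}, {"x": 1.5}]
gen = toy_infer(iter(messages))
next(gen)  # advance to the readiness yield; nothing has been read from the stream yet
decisions = list(gen)
print(decisions)  # → [0, 1, 0, -1]
```

Because the loop only starts after the readiness `yield`, any setup done before it, such as constructing the attacker, happens before the first message is consumed.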

In [11]:
def train():
    """
    We do not recommend using the train function.
    
    Training should be done before running anything in the cloud environment.
    """

    pass  # no train

In [12]:
def infer(
    stream: typing.Iterator[dict],
):
    """
    Please do not modify the infer function; edit the MyAttacker class directly instead.

    The core of the attacker logic should be implemented through the attacker classes.
    """

    attacker = MyAttacker(num_lags=2, threshold=2.0, burn_in=1000)
    total_pnl = zero_pnl_summary()

    yield  # mark as ready

    for message in stream:
        yield attacker.tick_and_predict(x=message['x'])

    stream_pnl = attacker.pnl.summary()
    total_pnl = add_pnl_summaries(total_pnl, stream_pnl)

    if total_pnl['num_resolved_decisions'] > 0:
        total_pnl.update({
            'profit_per_decision': total_pnl['total_profit'] / total_pnl['num_resolved_decisions']
        })

    print(total_pnl)

In [None]:
crunch.test()

print("Download this notebook and submit it to the platform: https://hub.crunchdao.com/competitions/mid-one/submit/via/notebook")