<a href="https://colab.research.google.com/github/crunchdao/quickstarters/blob/master/competitions/mid-one/mean_reversion_attacker/mean_reversion_attacker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ![Cover](https://raw.githubusercontent.com/crunchdao/quickstarters/master/competitions/mid-one/cover.jpg)

# Mean Reversion Attacker Tutorial
This notebook demonstrates how to create an `Attacker` described in [attacker.md](https://github.com/microprediction/midone/blob/main/midone/attackers/attacker.md).


## How is `Attacker.predict` different to a forecast?

An `Attacker` tries to predict whether a time series will go up or down on average - though only when it has a strong opinion. To be precise, our attacker will consume a univariate sequence of numerical data points $x_1, x_2, \dots, x_t$ and try to exploit deviations from the [martingale property](https://en.wikipedia.org/wiki/Martingale_(probability_theory)), which is to say that we expect the series $x_t$ to satisfy:

$$ E[x_{t+k}] \approx x_t $$

roughly. Of course, there's no such thing in this world as a perfect martingale, and it is your job to indicate when

$$ E[x_{t+k}] > x_t + \epsilon $$

by returning a positive value when the `predict` method is called, or, conversely, to indicate when $E[x_{t+k}] < x_t - \epsilon$ by returning a negative value. Here $\epsilon$ can be interpreted as a trading cost. The attacker will *typically* return `0`, meaning that it thinks:

$$ x_t - \epsilon < E[x_{t+k}] < x_t + \epsilon $$

because trading opportunities are probably on the rare side - though obviously this is problem dependent. The default $\epsilon$ and $k$ (`horizon`) parameters are set [here](https://github.com/microprediction/midone/blob/main/midone/gameconfig.py).
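As a minimal sketch of this decision rule, assume you already have some estimate `expected_future` of $E[x_{t+k}]$ (obtaining it is the hard part). The hypothetical helper `decide` below, which is not part of midone, maps the three cases onto the values `predict` is expected to return:

```python
def decide(x_t: float, expected_future: float, epsilon: float) -> int:
    """Hypothetical helper: turn an estimate of E[x_{t+k}] into a decision.

    Returns +1 (buy) if the expected move beats the cost epsilon upward,
    -1 (sell) if it beats it downward, and 0 (hold) otherwise.
    """
    if expected_future > x_t + epsilon:
        return 1
    if expected_future < x_t - epsilon:
        return -1
    return 0


print(decide(x_t=10.0, expected_future=10.5, epsilon=1.0))  # 0: move smaller than the cost
print(decide(x_t=10.0, expected_future=12.0, epsilon=1.0))  # 1: buy
print(decide(x_t=10.0, expected_future=8.0, epsilon=1.0))   # -1: sell
```

The wide "hold" band of width $2\epsilon$ is what makes `0` the typical answer.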

## Setup

In [None]:
%pip install --upgrade midone

In [None]:
# Get a new token here: https://hub.crunchdao.com/competitions/mid-one/submit/via/notebook

%pip install --upgrade crunch-cli
!crunch setup --notebook mid-one hello --token aaaabbbbccccddddeeeeffff

## Imports

In [2]:
import json
import os
import typing

import numpy
import pandas
import scipy.optimize as opt
from midone import HORIZON, Attacker
from tqdm.auto import tqdm

In [None]:
import crunch

crunch = crunch.load_notebook()

## Step 1: Decide what state to maintain

Let's first implement the `tick` method. It should respond quickly to each incoming data point by updating a small, rapidly changing state. Here we maintain the current value and an exponentially weighted moving average of historical values. We store this state as plain instance attributes, but other styles that you might prefer are presented [here](https://github.com/microprediction/midone/blob/main/tests/colabexamples/README.md).

In [4]:
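class MyAttacker(Attacker):

    def __init__(
        self,
        a=0.01
    ):
        super().__init__()

        # state
        self.running_avg: typing.Optional[float] = None
        self.current_value: typing.Optional[float] = None

        # params: smoothing factor of the moving average (0 < a <= 1)
        self.a = a

    def tick(self, x: float):
        # Remember the latest value and maintain an exponential moving average of the data
        self.current_value = x

        if numpy.isnan(x):
            # Skip missing values so they do not poison the average
            return

        if self.running_avg is None:
            self.running_avg = x
        else:
            self.running_avg = (1 - self.a) * self.running_avg + self.a * x

    def predict(self, horizon=HORIZON) -> float:
        # No opinion until at least one valid value has been seen
        if (
            self.running_avg is None
            or self.current_value is None
            or numpy.isnan(self.current_value)
        ):
            return 0

        if self.current_value > self.running_avg + 2:
            return -1  # sell: far above its average, expect it to fall back

        if self.current_value < self.running_avg - 2:
            return 1  # buy: far below its average, expect it to rise back

        return 0  # hold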
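The update in `tick` is the standard exponential moving average recurrence $m_t = (1 - a)\, m_{t-1} + a\, x_t$, so an observation $j$ steps in the past carries a weight proportional to $(1-a)^j$. The standalone sketch below (plain Python, no midone; the names `ewma` and `half_life` are illustrative) reproduces the same update, including skipping NaNs, and computes how many steps it takes for that weight to halve. With the default `a=0.01` this is roughly 69 steps, so the average has a fairly long memory.

```python
import math
from typing import Iterable, Optional


def ewma(values: Iterable[float], a: float) -> Optional[float]:
    """Exponential moving average with the same update rule as MyAttacker.tick.

    NaN values are skipped; returns None if no valid value was seen.
    """
    running_avg: Optional[float] = None
    for x in values:
        if math.isnan(x):
            continue
        if running_avg is None:
            running_avg = x  # initialise with the first valid value
        else:
            running_avg = (1 - a) * running_avg + a * x
    return running_avg


def half_life(a: float) -> float:
    """Number of steps after which an observation's weight (1 - a)**j halves."""
    return math.log(0.5) / math.log(1 - a)


print(ewma([1.0, 3.0], a=0.5))                # 2.0
print(ewma([1.0, float("nan"), 3.0], a=0.5))  # 2.0, the NaN is ignored
print(round(half_life(0.01), 1))              # 69.0
```

Larger values of `a` shorten the memory, which is exactly the knob the training step below tunes.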

### Loading data

In [None]:
x_train, x_test = crunch.load_streams()
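Judging from how they are used below, `x_train` and `x_test` are collections of streams, where each stream is an iterable of message dictionaries carrying the observation under the key `'x'`. If you want to experiment offline without the platform, a stand-in with the same shape can be sketched as follows. The random-walk generator and the name `make_mock_streams` are assumptions for illustration only, and real messages may carry additional keys.

```python
import random
from typing import Dict, List


def make_mock_streams(n_streams: int, length: int, seed: int = 0) -> List[List[Dict[str, float]]]:
    """Hypothetical stand-in for the output of crunch.load_streams(): random-walk streams.

    Each stream is a list of messages shaped like {'x': value}.
    """
    rng = random.Random(seed)
    streams = []
    for _ in range(n_streams):
        x = 0.0
        stream = []
        for _ in range(length):
            x += rng.gauss(0.0, 1.0)  # a Gaussian random walk is a martingale: no edge to find
            stream.append({"x": x})
        streams.append(stream)
    return streams


mock_train = make_mock_streams(n_streams=3, length=1000)
print(len(mock_train), len(mock_train[0]), sorted(mock_train[0][0]))  # 3 1000 ['x']
```

Because a random walk is a martingale, any attacker should struggle to make a profit on it after costs, which makes it a useful sanity check.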

### Testing `tick`
We are halfway there. Let's check the state maintenance:

In [None]:
attacker = MyAttacker()

for message in x_train[0]:
    attacker.tick(message["x"])

print(f"After processing the entire stream, the current value is {attacker.current_value} and the moving average is {attacker.running_avg}")

## Making an `up` or `down` decision

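Next, we use `predict`, which implements a mean reversion strategy: it returns `-1` (sell) when the current value is more than 2 above the moving average, `1` (buy) when it is more than 2 below, and `0` (hold) otherwise. Let's check that if the current value is very high, the attacker predicts that it will fall: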

In [None]:
attacker = MyAttacker()

attacker.current_value = 10
attacker.running_avg = 5

# Should sell (-1)
attacker.predict()

## Run the attacker on mock data
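Let's put these together and run the attacker on a short, repeating mock series. The `tick_and_predict` method inherited from `Attacker` combines `tick` and `predict` for each data point.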

In [9]:
# Always start from a freshly constructed attacker
attacker = MyAttacker()

data = [1, 3, 4, 2, 4, 5, 1, 5, 2, 5, 10] * 100
for x in data:
    y = attacker.tick_and_predict(x=x, horizon=HORIZON)

## Run the attacker on real data

In [None]:
states = []
profits_and_losses = []

for stream in tqdm(x_train):
    attacker = MyAttacker()

    for message in tqdm(stream, leave=False):
        x = message['x']
        y = attacker.tick_and_predict(x=x, horizon=HORIZON)

    states.append({
        "current_value": attacker.current_value,
        "running_avg": attacker.running_avg,
    })

    profits_and_losses.append(attacker.pnl.summary())

states = pandas.DataFrame(states)
profits_and_losses = pandas.DataFrame(profits_and_losses)

print("After processing all streams, here are the current values and moving averages:")
states

## Check the attackers' profits and losses

In [None]:
profits_and_losses
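The exact accounting lives in midone's `pnl` object and is not reproduced here. As a rough mental model only, here is a hypothetical simplified version, assuming that a decision $y_t$ made at time $t$ earns $y_t (x_{t+k} - x_t)$ and pays $\epsilon |y_t|$ whenever it trades. The name `simple_pnl` is illustrative, and midone's real rules may differ in their details.

```python
from typing import Sequence


def simple_pnl(xs: Sequence[float], decisions: Sequence[float], k: int, epsilon: float) -> float:
    """Hypothetical, simplified profit: y_t * (x_{t+k} - x_t) minus epsilon per unit traded.

    Decisions whose horizon runs past the end of the data are ignored.
    """
    total = 0.0
    for t, y in enumerate(decisions):
        if y == 0 or t + k >= len(xs):
            continue  # no trade, or the outcome has not been observed yet
        total += y * (xs[t + k] - xs[t]) - epsilon * abs(y)
    return total


xs = [0.0, 1.0, 2.0, 3.0]
decisions = [1, 0, -1, 0]  # buy at t=0 (wins 1.0), sell at t=2 (loses 1.0)
print(simple_pnl(xs, decisions, k=1, epsilon=0.1))  # about -0.2: two trading costs, no net gain
```

The sign convention matches `predict`: a buy only pays off if the series rises by more than $\epsilon$ over the horizon.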

## Train (globally) using many streams

Let's create a function that evaluates the attacker for a given choice of the parameter `a` when it is run over the entire training set. It iterates over every training stream, resetting the attacker for each one, and simply pulls `x` out of each message.

In [None]:
# First define the objective as negative total profit and test it
def negative_attacker_profit(
    a,
    streams: typing.List[typing.Iterable[dict]],
    verbose=True
):
    """
    a: Parameter
    streams: Supplies a collection of individual streams
    """

    total_profit = 0

    for stream in streams:
        # Reset the attacker each stream
        attacker = MyAttacker(a=a)

        # Run it over the stream
        for message in stream:
            x = message['x']
            attacker.tick_and_predict(x=x, horizon=HORIZON)

        pnl = attacker.pnl.summary()
        total_profit += pnl['total_profit']

    if verbose:
        print(f'Using a={a} the total profit is {total_profit}')

    # So smaller is better for the optimizer
    return -total_profit

negative_attacker_profit(a=0.1, streams=x_train)

## CrunchDAO Code Interface

[Submitting to the CrunchDAO platform requires 2 functions, `train` and `infer`.](https://docs.crunchdao.com/competitions/code-interface) Any line that is not in a function or is not an import will be commented out when the notebook is processed.

These functions reuse the code above, but `train` must save the model (here, just the fitted parameter `a`) so that `infer` can read it back.

### The `train` function
The canonical way to write a training procedure uses the `streams` argument and iterates over all data points in all training streams.

In [None]:
def get_parameter_file_path(model_directory_path: str):
    return os.path.join(model_directory_path, 'params.json')


def train(
    streams: typing.List[typing.Iterable[dict]],
    model_directory_path: str
):
    def training_optimization_objective(a):
        return negative_attacker_profit(a=a, streams=streams)

    result = opt.minimize_scalar(
        training_optimization_objective,
        bounds=(0.001, 0.2),
        method='bounded',
        options={
            'maxiter': 5
        }
    )

    best_a = float(result.x)

    print(f"Optimal value of a: {best_a}")
    print(f"Maximum total profit: {-result.fun}")  # Re-negate to get the actual profit

    # Let's save the best parameter (creating the directory if it does not exist)
    os.makedirs(model_directory_path, exist_ok=True)
    parameter_file_path = get_parameter_file_path(model_directory_path)
    with open(parameter_file_path, 'w') as fd:
        json.dump({'a': best_a}, fd)
        print(f'Saved {parameter_file_path}')

    # Check we can load it again!
    with open(parameter_file_path, 'r') as fd:
        params = json.load(fd)
        print(params)


# Here is how you would use it on the training data
train(
    streams=x_train,
    model_directory_path="resources/"
)
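With `maxiter` set to 5, the bounded scalar search evaluates the objective only a handful of times, and a profit curve summed over many streams may be noisy or have several local optima. A cheap sanity check, swapped in here purely for comparison and not what `train` uses, is a plain grid search: evaluate the same objective at evenly spaced values of `a` within the same bounds. The helper name `grid_search` is hypothetical.

```python
from typing import Callable, List, Tuple


def grid_search(
    objective: Callable[[float], float], low: float, high: float, n: int
) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Evaluate objective at n evenly spaced points in [low, high] (requires n >= 2).

    Returns (best_a, best_value, all_evaluations), where best means smallest.
    """
    step = (high - low) / (n - 1)
    evaluations = [(low + i * step, objective(low + i * step)) for i in range(n)]
    best_a, best_value = min(evaluations, key=lambda pair: pair[1])
    return best_a, best_value, evaluations


# Toy objective with a known minimum at a = 0.05, standing in for the real one
best_a, best_value, _ = grid_search(lambda a: (a - 0.05) ** 2, low=0.0, high=0.2, n=21)
print(best_a, best_value)
```

On the real data you could pass, for example, `lambda a: negative_attacker_profit(a=a, streams=x_train, verbose=False)` with `low=0.001, high=0.2`. Each evaluation replays every training stream, so keep `n` small.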

### The `infer` function

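Your notebook should have an `infer` function that yields one prediction at a time: it first yields once with no value to signal that it is ready, then yields exactly one decision per incoming message.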

In [None]:
def infer(
    stream: typing.Iterator[dict],
    model_directory_path: str
):
    # Load the best parameters
    parameter_file_path = get_parameter_file_path(model_directory_path)
    with open(parameter_file_path, 'r') as fd:
        params = json.load(fd)
        a = params['a']

    # Instantiate your attacker
    attacker = MyAttacker(a=a)

    # Signals to the system that your attacker is initialized and ready.
    yield  # Leave this here.

    for message in stream:
        decision = attacker.tick_and_predict(x=message['x'], horizon=HORIZON)

        # Be sure to yield, even if the decision is zero.
        yield decision


# A quick test that indicates how your infer function will be used when you upload this notebook:
messages = [{'x': 2.0}] * 10
for y in infer(messages, model_directory_path="resources/"):
    # the first value is `None`, this is intended
    print(y)
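The priming `yield` is why the first value printed by the test loop is `None`. How the platform actually drives `infer` is not shown in this notebook. As an illustrative sketch only, assuming a runner that advances the generator once to prime it and then collects one decision per message, the pattern looks like this. `toy_infer` and `run_generator` are hypothetical names, and the toy rule simply returns the sign of each value in place of `tick_and_predict`.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def toy_infer(stream: Iterable[dict]) -> Iterator[Optional[int]]:
    """Hypothetical infer-style generator: prime once, then one decision per message."""
    yield  # signals readiness; the caller receives None here
    for message in stream:
        x = message["x"]
        yield (x > 0) - (x < 0)  # toy rule: sign of x


def run_generator(
    infer_fn: Callable[[Iterable[dict]], Iterator[Optional[int]]],
    messages: List[dict],
) -> List[Optional[int]]:
    """Drive an infer-style generator and return the decisions without the priming value."""
    generator = infer_fn(messages)
    primed = next(generator)  # consume the readiness signal
    assert primed is None
    return list(generator)  # exactly one decision per message


print(run_generator(toy_infer, [{"x": 2.0}, {"x": -1.0}, {"x": 0.0}]))  # [1, -1, 0]
```

Skipping a `yield` for a zero decision would shift every later decision onto the wrong message, which is why the comment in `infer` insists on always yielding.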

In [None]:
prediction = crunch.test()
display(prediction)

print("Download this notebook and submit it to the platform: https://hub.crunchdao.com/competitions/mid-one/submit/via/notebook")