[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crunchdao/quickstarters/blob/structural-break/competitions/structural-break/quickstarters/random-submission/random-submission.ipynb)

![Banner](https://raw.githubusercontent.com/crunchdao/quickstarters/refs/heads/structural-break/competitions/structural-break/assets/banner.webp)

# Setup

The first steps to get started are:
1. Get the setup command
2. Execute it in the cell below

### >> https://hub.crunchdao.io/competitions/structural-break/submit/notebook

![Reveal token](https://raw.githubusercontent.com/crunchdao/competitions/refs/heads/structural-break/documentation/animations/reveal-token.gif)

In [None]:
# Install the Crunch CLI
%pip install --upgrade crunch-cli

# Set up your local environment
!crunch setup --notebook structural-break hello --token aaaabbbbccccddddeeeeffff

In [None]:
# Staging only, will be removed for production
%env API_BASE_URL=https://api.hub.crunchdao.io/
%env WEB_BASE_URL=https://hub.crunchdao.io/
%env CRUNCH_COMPETITIONS_BRANCH=structural-break

# Your model

## Setup

In [2]:
import os
import random
import typing

# Import your dependencies
import joblib
import pandas as pd
import sklearn.metrics

In [None]:
import crunch

# Load the Crunch Toolings
crunch = crunch.load_notebook()

## Data

The data was downloaded when you set up your local environment and is now available in the `data/` directory.

In [None]:
# Load the training data, training labels and test datasets
X_train, y_train, X_test = crunch.load_data()

### `X_train`

Index:
- `id`: the ID of the dataset
- `time`: the time step within the dataset, sampled at regular intervals (arbitrary unit)

Columns:
- `value`: the time series data
- `period`: whether the row belongs to the **initial segment** (0) or the **extension segment** (1)

In [5]:
X_train

id,time,value,period
0,0,0.001858,0
0,1,-0.001664,0
0,2,-0.004386,0
0,3,0.000699,0
0,4,-0.002433,0
...,...,...,...
10000,1890,-0.005903,1
10000,1891,0.007295,1
10000,1892,0.003527,1
10000,1893,0.007218,1
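
Because `X_train` is indexed by (`id`, `time`), a single dataset can be pulled out by its `id` and then split on the `period` column. The snippet below is a minimal sketch on a small synthetic frame; `X_demo` and its values are made up for illustration and only mimic the layout described above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for X_train: two datasets (ids 0 and 1), five time steps each.
index = pd.MultiIndex.from_product([[0, 1], range(5)], names=["id", "time"])
X_demo = pd.DataFrame(
    {
        "value": np.linspace(-0.02, 0.02, num=10),
        "period": [0, 0, 0, 1, 1] * 2,
    },
    index=index,
)

# Select every row of one dataset by its id (first index level).
# The result is indexed by `time` only.
dataset = X_demo.loc[0]

# Split the series into its initial (0) and extension (1) segments.
initial = dataset.loc[dataset["period"] == 0, "value"]
extension = dataset.loc[dataset["period"] == 1, "value"]

print(len(initial), len(extension))
```

The same selection should work on the real `X_train`, assuming it follows the index and column layout listed above.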


### `y_train`

This is a simple `pandas.Series` that indicates whether each dataset `id` contains a structural breakpoint.

Index:
- `id`: the ID of the dataset

Value:
- `structural_breakpoint`: the value you need to predict

In [6]:
y_train

id
0         True
1         True
2        False
3         True
4        False
         ...  
9996     False
9997      True
9998     False
9999     False
10000     True
Name: structural_breakpoint, Length: 10001, dtype: bool
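
Since `y_train` is indexed by `id`, it can be joined directly onto any per-dataset summary computed from `X_train`. The sketch below assumes a hypothetical feature, `mean_shift` (the absolute difference between the segment means), and uses synthetic data; neither is part of the competition material.

```python
import pandas as pd

# Synthetic stand-ins for X_train and y_train (three datasets, four time steps each).
index = pd.MultiIndex.from_product([[0, 1, 2], range(4)], names=["id", "time"])
X_demo = pd.DataFrame(
    {
        "value": [0.0, 0.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0, -1.0, -1.0],
        "period": [0, 0, 1, 1] * 3,
    },
    index=index,
)
y_demo = pd.Series(
    [True, False, True],
    index=pd.Index([0, 1, 2], name="id"),
    name="structural_breakpoint",
)

# Mean of each segment per id; unstacking gives one column per period (0 and 1).
segment_means = X_demo.groupby(["id", "period"])["value"].mean().unstack("period")

# Hypothetical feature: absolute gap between the two segment means.
features = pd.DataFrame({"mean_shift": (segment_means[1] - segment_means[0]).abs()})

# Both objects share the `id` index, so a plain join aligns features and labels.
labelled = features.join(y_demo)
print(labelled)
```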

### `X_test`

This is a **`list` of `pandas.DataFrame`** objects, each in the same format as [`X_train`](#X_train).

It is provided as a list to encourage you to read the records **one by one**, __as this is mandatory in the [`infer()`](#infer) function__.

In [7]:
print("Number of datasets:", len(X_test))

Number of datasets: 101


In [8]:
X_test[0]

id,time,value,period
10001,0,-0.020657,0
10001,1,-0.005894,0
10001,2,-0.003052,0
10001,3,-0.000590,0
10001,4,0.009887,0
10001,...,...,...
10001,2517,0.005084,1
10001,2518,-0.024414,1
10001,2519,-0.014986,1
10001,2520,0.012999,1


## Implementation

### `train()`

In the training function, you build and train the model that will later make inferences on the test data. <br />
Your model must be stored in the `model_directory_path`.

In [None]:
def train(
    X_train: pd.DataFrame,
    y_train: pd.Series,
    model_directory_path: str,
):
    model = ...

    joblib.dump(model, os.path.join(model_directory_path, 'model.joblib'))
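
As an illustration of what could replace the `model = ...` placeholder, here is a minimal sketch that fits a logistic regression on two hand-made features. The names `extract_features` and `train_sketch`, the choice of features, the classifier and the synthetic data are all assumptions made for this example, not the method recommended by the competition.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def extract_features(dataset: pd.DataFrame) -> list:
    """Hypothetical features: gaps in mean and std between the two segments."""
    initial = dataset.loc[dataset["period"] == 0, "value"]
    extension = dataset.loc[dataset["period"] == 1, "value"]
    return [
        abs(extension.mean() - initial.mean()),
        abs(extension.std() - initial.std()),
    ]


def train_sketch(X_train: pd.DataFrame, y_train: pd.Series, model_directory_path: str):
    ids, rows = [], []
    # One feature row per dataset id.
    for dataset_id, dataset in X_train.groupby(level="id"):
        ids.append(dataset_id)
        rows.append(extract_features(dataset))

    # Align the labels with the order in which the ids were visited.
    model = LogisticRegression()
    model.fit(np.array(rows), y_train.loc[ids].to_numpy())

    joblib.dump(model, os.path.join(model_directory_path, "model.joblib"))


# Synthetic data: odd ids get a mean shift in the extension segment.
rng = np.random.default_rng(0)
frames, labels = [], {}
for dataset_id in range(20):
    has_break = dataset_id % 2 == 1
    values = rng.normal(0.0, 0.1, size=40)
    if has_break:
        values[20:] += 1.0
    frames.append(pd.DataFrame({
        "id": dataset_id,
        "time": range(40),
        "value": values,
        "period": [0] * 20 + [1] * 20,
    }))
    labels[dataset_id] = has_break

X_demo = pd.concat(frames).set_index(["id", "time"])
y_demo = pd.Series(labels, name="structural_breakpoint").rename_axis("id")

with tempfile.TemporaryDirectory() as model_directory_path:
    train_sketch(X_demo, y_demo, model_directory_path)
    saved = os.listdir(model_directory_path)

print(saved)
```

Whatever model you choose, the only requirement stated above is that it ends up stored under `model_directory_path`.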

### `infer()`

In the inference function, the trained model is loaded and used to make inferences on a sample of data that matches the characteristics of the training set.

#### Setup

Once your model is loaded, you must `yield` once, without a value, to signal to the runner that you are ready. <br />
After that you can start reading data from `X_test`.

#### Iteration

The datasets must be read **one by one** and each value must be returned with a `yield <value>`. <br />
If you try to skip this, you will get an error. <br />
All values are then concatenated into a prediction file.

**Warning: The datasets can only be iterated once!**

#### Cleanup

Code can be executed after the `for` loop if you need to persist state or do some cleanup.

In [10]:
def infer(
    X_test: typing.Iterable[pd.DataFrame],
    model_directory_path: str,
):
    model = joblib.load(os.path.join(model_directory_path, 'model.joblib'))

    yield  # mark as ready

    # X_test can only be iterated once.
    # Before getting the next dataset, you must predict the current one.
    for dataset in X_test:
        # prediction = model.predict(dataset)
        prediction = round(random.random(), 2)

        yield prediction  # send the prediction for the current dataset
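
To make the three phases concrete, the sketch below drives a generator of the same shape by hand. `run_infer` is a hypothetical driver written for this example, and `infer_sketch` uses `len` as a stand-in model; the real runner is part of the Crunch tooling and may behave differently.

```python
import typing


def infer_sketch(X_test: typing.Iterable[list], model_directory_path: str):
    # Setup: load the model (here, `len` is a stand-in for a real model).
    model = len

    yield  # mark as ready

    # Iteration: one prediction per dataset, in order.
    for dataset in X_test:
        yield float(model(dataset))

    # Cleanup: runs once the datasets are exhausted.
    print("cleanup done")


def run_infer(datasets: list) -> list:
    """Hypothetical driver, only meant to illustrate the protocol."""
    # Passing an iterator means the datasets can only be consumed once.
    generator = infer_sketch(iter(datasets), "unused-directory")

    # The first yield carries no value: it only signals readiness.
    ready = next(generator)
    if ready is not None:
        raise ValueError("the first yield must not return a value")

    # Every following yield is the prediction for the current dataset.
    predictions = []
    for prediction in generator:
        predictions.append(prediction)

    return predictions


predictions = run_infer([[1, 2, 3], [4, 5]])
print(predictions)
```

Each dataset produces exactly one prediction, in the order the datasets were read.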

## Local testing

To make sure your `train()` and `infer()` functions are working properly, you can call the `crunch.test()` function, which reproduces the cloud environment locally. <br />
Even if the reproduction is not perfect, it should quickly tell you whether your model is working properly.

In [None]:
crunch.test(
    # Uncomment to disable the training
    # force_first_train=False,

    # Uncomment to disable the determinism check
    # no_determinism_check=True,
)

## Results

Once the local tester is done, you can preview the result stored in `data/prediction.parquet`.

In [None]:
prediction = pd.read_parquet("data/prediction.parquet")
prediction

### Local scoring

You can call the function that the system uses to estimate your score locally.

In [None]:
# Load the targets
target = pd.read_parquet("data/y_test.reduced.parquet")["structural_breakpoint"].astype(float)

# Call the scoring function
sklearn.metrics.roc_auc_score(
    target,
    prediction,
)
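
ROC AUC depends only on how the predictions rank the positive datasets relative to the negative ones, not on their absolute values. The toy labels and scores below are made up to illustrate this; they are unrelated to the competition data.

```python
import sklearn.metrics

# Toy labels and scores: one of the four positive/negative pairs is ranked wrongly
# (the positive scored 0.35 sits below the negative scored 0.4).
toy_target = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

auc = sklearn.metrics.roc_auc_score(toy_target, toy_scores)
print(auc)  # 0.75, i.e. 3 of the 4 pairs are ordered correctly

# A constant prediction carries no ranking information and scores 0.5.
constant_auc = sklearn.metrics.roc_auc_score(toy_target, [0.5, 0.5, 0.5, 0.5])
print(constant_auc)  # 0.5
```

Random predictions, such as the ones produced by the `infer()` function above, should land close to 0.5 on average.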

# Submit your Notebook

To submit your work, you must:
1. Download your Notebook from Colab
2. Upload it to the platform
3. Create a run to validate it

### >> https://hub.crunchdao.io/competitions/structural-break/submit

![Download and Submit Notebook](https://raw.githubusercontent.com/crunchdao/competitions/refs/heads/structural-break/documentation/animations/download-and-submit-notebook.gif)