# Regression using Cyclic Boosting

First, install the cyclic-boosting package and its dependencies:

```sh
pip install cyclic-boosting
```

In [1]:
import logging

logging.getLogger().setLevel(logging.ERROR)
import warnings

warnings.filterwarnings("ignore")

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

Let's load the integration test dataset from the Blue-Yonder-OSS cyclic-boosting repository.

In [3]:
import urllib.request

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/Blue-Yonder-OSS/cyclic-boosting/main/tests/integration_test_data.csv",
    "data.csv",
)

path = "./data.csv"
df = pd.read_csv(path)

In [None]:
df.head()

# Prepare Data

The variable LAMBDA represents the potential demand, which is normally not observable in real data, so it should be dropped.

Categorical variables must be converted to int type and continuous variables to float type.

In [5]:
def drop_LAMBDA(df):
    df = df.drop(columns="LAMBDA")
    return df


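def convert_datatype(df, col):
    # Toggle the dtype of one column: float64 becomes int64, int64 becomes float64.
    # Columns with any other dtype are returned unchanged.
    if df[col].dtype == np.float64:
        df = df.astype({col: np.int64})
    elif df[col].dtype == np.int64:
        df = df.astype({col: np.float64})
    return df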


df = drop_LAMBDA(df)
df = convert_datatype(df, col="SCHOOL_HOLIDAY")
df.to_csv("./data_exam.csv", index=False)
df_train, df_test = train_test_split(df, test_size=0.3, shuffle=False)

In [None]:
df_train.head()
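The dtype rule stated above (categorical features as int, continuous features as float) can also be applied to several columns at once. The following is a minimal sketch, not part of the cyclic-boosting API: the helper `cast_feature_types` and the column names `is_holiday` and `price` are hypothetical and only illustrate the idea on a toy frame.

```python
import numpy as np
import pandas as pd


def cast_feature_types(df, categorical, continuous):
    """Return a copy with categorical columns as int64 and continuous ones as float64."""
    dtypes = {col: np.int64 for col in categorical}
    dtypes.update({col: np.float64 for col in continuous})
    return df.astype(dtypes)


# Toy frame with hypothetical columns, not the tutorial dataset.
toy = pd.DataFrame({"is_holiday": [0.0, 1.0, 0.0], "price": [10, 12, 11]})
toy = cast_feature_types(toy, categorical=["is_holiday"], continuous=["price"])
print(toy.dtypes)
```

Listing the columns explicitly avoids relying on the dtype that `read_csv` happened to infer, which is why this sketch takes two lists instead of toggling the current dtype.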

# Automated Machine Learning with Tornado
With Tornado, data preparation, feature property setting, hyperparameter tuning, model building, training, evaluation, and plotting are all performed automatically. Note that this may take a few minutes, so feel free to take a coffee break while it runs.

In [None]:
from cyclic_boosting.tornado import Generator, Manager, Tornado

data_deliverer = Generator.TornadoDataModule(df_train)
manager = Manager.ForwardSelectionManager(is_time_series=True, combination=2, dist="nbinom")
predictor = Tornado.ForwardSelectionModel(data_deliverer, manager)
predictor.fit(target="sales", criterion="COD", verbose=False)

A Tornado model supports both point estimation and probability estimation.

In [None]:
# mean point estimation
yhat = predictor.predict(df_test)
print(yhat[0])

# probability estimation with negative binomial distribution
proba = predictor.predict_proba(df_test.head(5), output="pmf")
proba.loc[0, :].plot()
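To make the probability output above more concrete, here is a small standalone sketch of a negative binomial PMF built from a predicted mean. It uses only the standard library and the textbook parameterization with a shape `r` and success probability `p`; the helper names and the values of `mu` and `r` are illustrative assumptions, and this is not necessarily how Tornado parameterizes the distribution internally.

```python
import math


def nbinom_pmf(k, r, p):
    """P(X = k) for a negative binomial with shape r > 0 and success probability p."""
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log1p(-p))


def success_prob_from_mean(mu, r):
    """Choose p so that the distribution has mean mu for the given shape r."""
    return r / (r + mu)


mu, r = 4.0, 2.0  # illustrative mean and shape, not fitted values
p = success_prob_from_mean(mu, r)

# Evaluate the PMF on a grid of counts wide enough to hold almost all the mass.
pmf = [nbinom_pmf(k, r, p) for k in range(200)]
mean = sum(k * q for k, q in enumerate(pmf))
print(round(sum(pmf), 6), round(mean, 6))
```

In this parameterization the variance is `mu + mu**2 / r`, so a smaller `r` gives a wider distribution around the same mean, which is what makes the negative binomial a common choice for overdispersed count data such as sales.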