# Regression using Cyclic Boosting

First, install the cyclic-boosting package and its dependencies.

```sh
!pip install cyclic-boosting
```

In [1]:
import pandas as pd
import numpy as np

Let's use the Walmart dataset from Kaggle as a test dataset.

For time-series data, the input must include a "date" column that records when each observation was made, and its name and format must be consistent. The "dayofweek" (day of the week) and "dayofyear" (day of the year) columns are created automatically if they are absent; if you supply them yourself, they must use exactly these names.

This dataset is recorded at weekly intervals.
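
As a minimal sketch of the calendar columns described above, the snippet below derives `dayofweek` and `dayofyear` from a parsed `date` column with pandas. The two sample dates are made up for illustration, and this shows how one could build the columns by hand, not how they are created internally.

```python
import pandas as pd

# Two illustrative dates in day-month-year format (assumed sample values).
sample = pd.DataFrame({"date": ["05-02-2010", "12-02-2010"]})
sample["date"] = pd.to_datetime(sample["date"], format="%d-%m-%Y")

# pandas counts Monday as 0, so a Friday is 4; dayofyear starts at 1 on 1 January.
sample["dayofweek"] = sample["date"].dt.dayofweek
sample["dayofyear"] = sample["date"].dt.dayofyear

print(sample)
```

If the columns are prepared this way, the names must match `dayofweek` and `dayofyear` exactly.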

In [2]:
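# Load the raw Kaggle file and rename the date column to the lowercase "date" described above.
df = pd.read_csv("./Walmart_data/Walmart.csv")
df = df.rename(columns={"Date": "date"})
# The raw dates are day-month-year strings, so parse them with an explicit format.
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")
# Save the prepared data so that it can be loaded from "./Walmart.csv" later.
df.to_csv("./Walmart.csv", index=False)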
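
To confirm that the records really are weekly, one option is to look at the most common gap between consecutive dates. The sketch below uses a few made-up dates so that it runs on its own; with the real data, `df["date"]` would take the place of `dates`. It is a plain pandas check and is not meant to mirror how the library detects the interval.

```python
import pandas as pd

# Illustrative weekly dates (assumed values); a date may repeat, e.g. once per store.
dates = pd.Series(
    pd.to_datetime(
        ["2010-02-05", "2010-02-05", "2010-02-12", "2010-02-19", "2010-02-26"]
    )
)

# Drop repeated dates, sort, and take the difference between neighbours.
steps = dates.drop_duplicates().sort_values().diff().dropna()

# The most frequent gap is a reasonable estimate of the sampling interval.
interval = steps.value_counts().idxmax()
print(interval)
```

A result of seven days indicates weekly data; any other value suggests the interval should be stated explicitly.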

# Automated Machine Learning with Tornado
Tornado automates data preparation, feature property setting, hyperparameter tuning, model building, training, evaluation, and plotting.

In [3]:
from cyclic_boosting.tornado import Generator, Manager, Trainer

# Wrap the prepared CSV in Tornado's data module.
data_deliverer = Generator.TornadoDataModule("./Walmart.csv")
manager = Manager.TornadoVariableSelectionModule()
trainer = Trainer.SqueezeTrainer(data_deliverer, manager)
# Run the automated search with "weekly_sales" as the target column.
trainer.run(target="weekly_sales", log_policy="compute_COD", verbose=False)

Data interval is 'weekly'. If not, give
    the data_interval option in the TornadoDataModule.

Auto analysis target ['temperature', 'fuel_price', 'cpi', 'unemployment']
    has_trend: ['temperature', 'fuel_price', 'cpi', 'unemployment']
    has_seasonality: ['temperature', 'fuel_price']
    has_up_monotonicity: []
    has_down_monotonicity: []
    has_linearity: []
    has_missing: []


iter: 5 / 35 Encountered negative change of loss. This might not be a problem, as long as the model converges. Check the LOSS changes in the analysis plots.
iter: 35 / 35 
TRUNCATED
['store', 'unemployment', 'cpi', 'dayofyear', 'temperature', 'holiday_flag', 'dayofweek', 'fuel_price', ('store', 'cpi'), ('store', 'unemployment'), ('store', 'dayofweek'), ('store', 'holiday_flag'), ('unemployment', 'dayofweek'), ('holiday_flag', 'unemployment'), ('temperature', 'dayofweek'), ('cpi', 'dayofweek'), ('holiday_flag', 'cpi'), ('dayofweek', 'dayofyear'), ('holiday_flag', 'dayofyear'), ('fuel_price', 'dayofweek'), ('temperature', 'fuel_price'), ('cpi', 'dayofyear'), ('temperature', 'dayofyear'), ('unemployment', 'dayofyear'), ('cpi', 'unemployment'), ('fuel_price', 'cpi'), ('temperature', 'unemployment'), ('fuel_price', 'unemployment'), ('temperature', 'cpi')]

iter: 0 / 28   ---- The best model was updated in iter 0 ----
    best_features['store']
    MD: 224.44074268557605, MAD: 450947.7551167

# Load the best model and make predictions

Get the best model path.

In [4]:
import pickle
from pathlib import Path

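# Collect the numeric suffix of each saved model directory (e.g. "model_20" -> 20).
# Sorting as integers avoids the lexicographic trap where "model_9" sorts after "model_20".
model_nos = sorted(
    int(p.name.split("_")[-1])
    for p in Path("./models/").glob("model_*")
    if p.name.split("_")[-1].isdigit()
)
model_path = f"./models/model_{model_nos[-1]}/model_{model_nos[-1]}.pkl"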
print(model_path)

./models/model_20/model_20.pkl


Make predictions with the best model.

In [5]:
data = {
    "store": [5],
    "holiday_flag": [0],
    "temperature": [48.3],
    "fuel_price": [2.976],
    "cpi": [211.9560305],
    "unemployment": [6.634],
    "dayofweek": [4],
    "dayofyear": [7],
}
X = pd.DataFrame(data)

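# Load the pickled estimator; the file is only needed while unpickling.
with open(model_path, "rb") as f:
    CB_est = pickle.load(f)

# Predict on a copy so that the input frame is left untouched.
yhat = CB_est.predict(X.copy())
print(yhat)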

[283081.02621612]
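
The `dayofweek` and `dayofyear` values in the input above were entered by hand. As a sketch, a small helper can derive them from a date so that the calendar features stay consistent. The name `build_row` is hypothetical and not part of cyclic-boosting, and the date 2011-01-07 is an assumption chosen only because it is a Friday on 7 January, which reproduces the values 4 and 7 used above.

```python
import pandas as pd


def build_row(date: str, **features) -> pd.DataFrame:
    """Return a one-row frame with calendar columns derived from `date` (hypothetical helper)."""
    ts = pd.Timestamp(date)
    row = {name: [value] for name, value in features.items()}
    row["dayofweek"] = [ts.dayofweek]  # Monday is 0
    row["dayofyear"] = [ts.dayofyear]  # 1 January is 1
    return pd.DataFrame(row)


# Assumed date; the remaining values are copied from the input above.
X_new = build_row(
    "2011-01-07",
    store=5,
    holiday_flag=0,
    temperature=48.3,
    fuel_price=2.976,
    cpi=211.9560305,
    unemployment=6.634,
)
print(X_new)
```

The resulting frame has the same columns as `X`, so it could be passed to `CB_est.predict` in the same way.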
