This notebook generates 4 different submissions, each one using a different forecasting model: LightGBM, SVR, random forest and SARIMAX.

Imports

In [24]:
from model import Forecast, FitPredict, Evaluation
from process_data import PreProcessData, PostProcessData

import pandas as pd
import random

Load the datasets. They are stored as class variables of PreProcessData for easy retrieval.

In [25]:
df_train = PreProcessData.train
df_test = PreProcessData.test
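The class-variable pattern used above can be sketched as follows. PreProcessDataSketch is a hypothetical stand-in, and toy dataframes replace whatever the real class loads (that part of the module is not shown here). The point is that class variables are created once, when the class body runs, and are shared by the class and all its instances.

```python
import pandas as pd


class PreProcessDataSketch:
    """Hypothetical stand-in for PreProcessData, showing only the class-variable pattern."""
    # Class variables are created once, when the class body is executed.
    # Toy frames are used here; how the real class fills them is not shown in this notebook.
    train = pd.DataFrame({'id': [0, 1], 'num_sold': [5.0, 7.0]})
    test = pd.DataFrame({'id': [2, 3]})

    def __init__(self, df_train, df_test):
        self.df_train = df_train
        self.df_test = df_test


# Read straight from the class: no instance is needed
df_train = PreProcessDataSketch.train
df_test = PreProcessDataSketch.test

# Instances see the very same objects
preprocessor = PreProcessDataSketch(df_train, df_test)
print(preprocessor.train is df_train)  # True
```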

Initialize classes

In [26]:
forecast = Forecast()
fit_predict = FitPredict()
evaluator = Evaluation()
preprocessor = PreProcessData(df_train, df_test)
postprocessor = PostProcessData()

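Load the models. These are references to factory methods of Forecast, not calls: each one builds a new forecaster when it is called with the forecast horizon, as done later in the loop.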

In [27]:
lgbm = forecast.create_lgbm_regressor_forecaster
svr = forecast.create_svr_regresor_forecaster
forest = forecast.create_random_forest_regresor_forecaster
sarimax = forecast.create_sarimax_forecaster
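The difference between storing a factory method and calling it can be shown with a toy example. ForecastSketch and NaiveForecaster are hypothetical stand-ins (the real forecasters built by Forecast are not shown here); NaiveForecaster simply repeats the last observed value. Each call to the stored method returns a fresh, unfitted object, which is why the loop can build a new forecaster for every series.

```python
class NaiveForecaster:
    """Hypothetical toy forecaster: repeats the last observed value."""

    def __init__(self, steps):
        self.steps = steps
        self.last_value = None

    def fit(self, y):
        self.last_value = y[-1]

    def predict(self):
        return [self.last_value] * self.steps


class ForecastSketch:
    """Hypothetical stand-in for model.Forecast: each method builds a new forecaster."""

    def create_naive_forecaster(self, steps):
        return NaiveForecaster(steps)


forecast = ForecastSketch()
naive = forecast.create_naive_forecaster  # a reference, not a call: nothing is built yet

f1 = naive(3)  # each call builds a fresh, unfitted forecaster
f2 = naive(3)
f1.fit([1.0, 2.0, 5.0])
print(f1.predict())   # [5.0, 5.0, 5.0]
print(f2.last_value)  # None: fitting f1 does not affect f2
```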

Models are stored in a list in order to loop over them.

In [28]:
model_list = [lgbm, svr, forest, sarimax]

Fit models

The class PreProcessData handles the preprocessing of the raw data and delivers a clean dictionary of individual series. Each key is a unique tuple of 'country', 'store' and 'product', and each value is the pair of train and test dataframes for that combination, indexed by date.

In [29]:
df_dic = preprocessor.clean_dic
df_dic

{('Canada',
  'Discount Stickers',
  'Holographic Goose'): [                id country              store            product  num_sold
  date                                                                      
  2010-01-01       0  Canada  Discount Stickers  Holographic Goose       0.0
  2010-01-02      90  Canada  Discount Stickers  Holographic Goose       0.0
  2010-01-03     180  Canada  Discount Stickers  Holographic Goose       0.0
  2010-01-04     270  Canada  Discount Stickers  Holographic Goose       0.0
  2010-01-05     360  Canada  Discount Stickers  Holographic Goose       0.0
  ...            ...     ...                ...                ...       ...
  2016-12-27  229680  Canada  Discount Stickers  Holographic Goose       0.0
  2016-12-28  229770  Canada  Discount Stickers  Holographic Goose       0.0
  2016-12-29  229860  Canada  Discount Stickers  Holographic Goose       0.0
  2016-12-30  229950  Canada  Discount Stickers  Holographic Goose       0.0
  2016-12-31  2300
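A dictionary with this layout can be built with pandas groupby. The sketch below is an assumption about how such a structure could be produced, not the actual PreProcessData code: build_series_dict is a hypothetical helper, the data are toy rows, and it assumes every (country, store, product) key in train also appears in test.

```python
import pandas as pd


def build_series_dict(train, test, keys=('country', 'store', 'product')):
    """Hypothetical helper: map each (country, store, product) tuple to [train_df, test_df]."""
    keys = list(keys)
    # Grouping by several columns yields tuple keys
    test_groups = {key: group for key, group in test.groupby(keys)}
    return {key: [group, test_groups[key]] for key, group in train.groupby(keys)}


# Toy data: two products, two dates each, indexed by date as in df_dic
train_dates = pd.date_range('2010-01-01', periods=2, freq='D', name='date').repeat(2)
test_dates = pd.date_range('2017-01-01', periods=2, freq='D', name='date').repeat(2)
products = ['Holographic Goose', 'Other Product', 'Holographic Goose', 'Other Product']

train = pd.DataFrame({'id': [0, 1, 2, 3], 'country': 'Canada', 'store': 'Discount Stickers',
                      'product': products, 'num_sold': [0.0, 10.0, 0.0, 12.0]}, index=train_dates)
test = pd.DataFrame({'id': [4, 5, 6, 7], 'country': 'Canada', 'store': 'Discount Stickers',
                     'product': products}, index=test_dates)

series = build_series_dict(train, test)
print(list(series))
```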

A dictionary maps each model to a short name, which is used in the name of its submission file.

In [30]:
name_models = {lgbm:'lgbm', forest:'forest', svr:'svr', sarimax:'sarimax'}
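Using bound methods as dictionary keys works because Python bound methods compare and hash equal when they wrap the same function and the same instance, even though every attribute access creates a new bound-method object. ForecastSketch below is a hypothetical stand-in for model.Forecast.

```python
class ForecastSketch:
    """Hypothetical stand-in for model.Forecast."""

    def create_lgbm_regressor_forecaster(self, steps):
        return ('lgbm', steps)


forecast = ForecastSketch()
lgbm = forecast.create_lgbm_regressor_forecaster
name_models = {lgbm: 'lgbm'}

# Every attribute access creates a new bound-method object...
again = forecast.create_lgbm_regressor_forecaster
print(again is lgbm)       # False
# ...but bound methods of the same instance and function are equal and hash equal,
# so the dictionary lookup still finds the entry.
print(again == lgbm)       # True
print(name_models[again])  # lgbm

# A bound method of a different instance is a different key.
other = ForecastSketch().create_lgbm_regressor_forecaster
print(other in name_models)  # False
```

In the notebook this matters only in that the same Forecast instance must be used both when building model_list and name_models, which is the case.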

Creating the Loop

Column to predict and length of the prediction

In [33]:
y_column = 'num_sold'

The length of the prediction (steps) is the length of each individual test dataframe in the dictionary. This assumes that all test dataframes have the same length, so a randomly chosen dictionary value is enough to isolate one of them and measure it.

In [34]:
dic_sample = random.choice(list(df_dic.values()))
train_sample, test_sample = dic_sample[0], dic_sample[1]
steps = len(test_sample)
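Because the single value of steps is reused for every series, it can be worth checking the equal-length assumption instead of relying on one random sample. common_horizon is a hypothetical helper, shown here on a toy dictionary with the same [train, test] layout as df_dic.

```python
import pandas as pd


def common_horizon(series_dict):
    """Hypothetical helper: return the shared test length, or fail loudly if lengths differ."""
    lengths = {len(test) for _train, test in series_dict.values()}
    if len(lengths) != 1:
        raise ValueError(f'test dataframes have different lengths: {sorted(lengths)}')
    return lengths.pop()


# Toy dictionary with the same [train, test] layout as df_dic
toy_dic = {
    ('Canada', 'Store A', 'Product 1'): [pd.DataFrame({'num_sold': [1.0, 2.0]}),
                                         pd.DataFrame({'id': [0, 1, 2]})],
    ('Canada', 'Store A', 'Product 2'): [pd.DataFrame({'num_sold': [3.0, 4.0]}),
                                         pd.DataFrame({'id': [3, 4, 5]})],
}
steps = common_horizon(toy_dic)
print(steps)  # 3
```

Applied to the real data, common_horizon(df_dic) would either return the same value as the random sample or raise an error pointing at the mismatch.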

For each model in the list:
    A new, empty submission is started
    For each (train, test) pair stored in the dictionary:
        A fresh forecaster is built and fit to train
        Predictions are obtained for the test horizon
        Predictions are added to test as the 'num_sold' column
        The resulting dataframe is added to the submission
    The submission is sorted by 'id', its index is reset, and only the 'id' and 'num_sold' columns are kept, as the submission format requires
    Once every pair in the dictionary has been added, the submission is saved to disk

Since there are 4 models in the list, 4 different submissions are generated.
In [35]:
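for model in model_list:
    print(model)
    pieces = []  # one dataframe of test rows + predictions per (country, store, product)
    for train, test in df_dic.values():
        # Build a fresh forecaster for every series so no fitted state is shared
        forecaster = model(steps)
        fit_predict.fit_forecaster(forecaster, train, y_column)
        predictions = fit_predict.get_predictions(forecaster, steps)
        # Attach the predictions to the test rows (aligned on the date index) as 'num_sold'
        test_w_preds = pd.concat([test, predictions], axis=1).rename(columns={'pred': 'num_sold'})
        pieces.append(test_w_preds)
        print('submission_updated')
    # Concatenate once, sort by id and keep only the columns required for submission
    submission = (
        pd.concat(pieces)
        .sort_values('id')
        .reset_index(drop=True)[['id', 'num_sold']]
    )
    submission.to_csv(f'submission_20250207_au_{name_models[model]}.csv', index=False)
    print('submission_saved')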

<bound method Forecast.create_lgbm_regressor_forecaster of <model.Forecast object at 0x00000266F2B43EF0>>
[LightGBM] [Info] Total Bins 0
[LightGBM] [Info] Number of data points in the train set: 1462, number of used features: 0
submission_updated
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.035508 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 279083
[LightGBM] [Info] Number of data points in the train set: 1462, number of used features: 1095
[LightGBM] [Info] Start training from score 699.651163
submission_updated
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.029684 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 278959
[LightGBM] [Info] Number of data points in the train set: 1462, number of used features: 1095
[LightGBM] [Info] Start training from score 577.499316
submission_updated
[LightGBM] [Info] Aut
...
submission_updated
submission_saved
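The core of the loop, attaching each series' predictions to its test rows and assembling a single submission sorted by id, can be illustrated on toy data. This sketch assumes, as the rename to 'num_sold' in the loop suggests, that get_predictions returns a Series (or column) named 'pred' indexed by the same dates as the test dataframe; the ids and values below are made up.

```python
import pandas as pd

dates = pd.date_range('2017-01-01', periods=3, freq='D', name='date')

# Two toy test series whose ids interleave, as they do across series in the real data
tests = [
    pd.DataFrame({'id': [0, 2, 4]}, index=dates),
    pd.DataFrame({'id': [1, 3, 5]}, index=dates),
]
# Toy predictions: one Series named 'pred' per series, indexed by the same dates
preds = [
    pd.Series([10.0, 11.0, 12.0], index=dates, name='pred'),
    pd.Series([20.0, 21.0, 22.0], index=dates, name='pred'),
]

pieces = []
for test, pred in zip(tests, preds):
    # axis=1 aligns on the date index; a mismatched index would leave NaNs in 'num_sold'
    pieces.append(pd.concat([test, pred], axis=1).rename(columns={'pred': 'num_sold'}))

submission = (
    pd.concat(pieces)
    .sort_values('id')
    .reset_index(drop=True)[['id', 'num_sold']]
)
print(submission)
```

Collecting the pieces in a list and concatenating once avoids copying the growing submission on every iteration, and sorting once at the end gives the same final order as sorting after each addition.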
