# 03 — ML-based Time Series Forecasting (LightGBM)

Goals of this notebook:

- Recast the time series as a supervised ML problem.
- Engineer features for time series data.
- Train a LightGBM model to forecast sales.
- Compare the ML model's quality against the baseline approaches.
- Analyze feature importance and interpret the results.


In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import lightgbm as lgb

from typing import List

plt.style.use("seaborn-v0_8")
plt.rcParams["figure.figsize"] = (12, 6)

pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 120)


## 1. Data Loading and Aggregation

As in previous notebooks, we aggregate sales data
to obtain total daily sales across all stores and categories.

This aggregated time series will be transformed
into a supervised learning dataset using lag-based features.
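
To make the idea of "lag-based features" concrete, here is a minimal sketch on a toy series. The helper name `add_lag_features`, the chosen lags (1 and 7), and the toy values are illustrative assumptions, not necessarily the feature set this notebook uses.

```python
import numpy as np
import pandas as pd

# Toy daily series standing in for the aggregated sales (values are made up).
toy = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "total_sales": np.arange(10, dtype=float) * 10.0,
})


def add_lag_features(frame: pd.DataFrame, target: str, lags: list) -> pd.DataFrame:
    """Return a copy of `frame` with one `lag_<k>` column per requested lag.

    Hypothetical helper for illustration: `lag_k` holds the target value
    observed k rows earlier, so each row only sees past information.
    """
    out = frame.copy()
    for k in lags:
        out[f"lag_{k}"] = out[target].shift(k)
    return out


# Rows without a full history (the first 7 here) contain NaN lags and are dropped.
supervised = (
    add_lag_features(toy, "total_sales", lags=[1, 7])
    .dropna()
    .reset_index(drop=True)
)

X = supervised[["lag_1", "lag_7"]]  # features: past values only
y = supervised["total_sales"]       # target: the current day's value
```

Each remaining row pairs today's value (`y`) with values from one and seven rows earlier (`X`), which is the tabular shape a gradient-boosting model expects. Note that `shift` moves by rows, so the lags correspond to calendar days only if the series has no missing dates.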


In [2]:
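DATA_PATH = "../data/raw/train.csv"

df = pd.read_csv(DATA_PATH)

# Parse dates and sort chronologically.
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")

# Sum sales over all rows sharing a date to get one total per day.
daily_sales = (
    df.groupby("date", as_index=False)["sales"]
    .sum()
    .rename(columns={"sales": "total_sales"})
)

# Inspect the first and last rows of the aggregated series.
daily_sales.head(), daily_sales.tail()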


(        date    total_sales
 0 2013-01-01    2511.618999
 1 2013-01-02  496092.417944
 2 2013-01-03  361461.231124
 3 2013-01-04  354459.677093
 4 2013-01-05  477350.121229,
            date    total_sales
 1679 2017-08-11  826373.722022
 1680 2017-08-12  792630.535079
 1681 2017-08-13  865639.677471
 1682 2017-08-14  760922.406081
 1683 2017-08-15  762661.935939)
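
The printed index runs from 0 to 1683, i.e. 1,684 rows, while 2013-01-01 through 2017-08-15 spans 1,688 calendar days, so four dates appear to be absent from the aggregated series. Because row-based lags assume evenly spaced observations, it may be worth checking for gaps first. The sketch below shows one possible check on a toy frame; the data is hypothetical, and how to fill the gaps (if at all) is left open.

```python
import pandas as pd

# Toy series with one date deliberately left out (hypothetical data).
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"])
toy = pd.DataFrame({"date": dates, "total_sales": [10.0, 12.0, 9.0, 11.0]})

# Every calendar day between the first and last observation.
full_range = pd.date_range(toy["date"].min(), toy["date"].max(), freq="D")

# Dates in the full calendar that the series does not contain.
missing_dates = full_range.difference(pd.DatetimeIndex(toy["date"]))

# Reindex onto the full calendar; absent days become NaN rows.
regular = (
    toy.set_index("date")
    .reindex(full_range)
    .rename_axis("date")
    .reset_index()
)
```

After reindexing, a row shift of `k` corresponds to exactly `k` calendar days, at the cost of NaN targets on the inserted dates.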