# Time Series Analysis: "The Final Project"

`End? No, the journey doesn't end here. Death is just another path. One that we all must take.
-J.R.R. Tolkien, The Return of the King`

---

## Libraries

In [6]:
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
import statsmodels.api as sm
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import lightgbm as lgb
from pandas.plotting import register_matplotlib_converters
from IPython.display import display
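from sklearn.model_selection import TimeSeriesSplit
from sklearn.multioutput import MultiOutputRegressor, RegressorChain
from tsa_tools import *  # local course helpers (e.g. timeSeriesFiltering, evaluate_methods)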

register_matplotlib_converters()
sns.set_style('darkgrid')

np.set_printoptions(precision=4)
pd.set_option('display.precision', 4)

---

## M5 Forecasting

For this "final project", we will be forecasting the <b><u>level 9</u></b> series (unit sales of all products, aggregated for each store and department).

Load `sales_train_evaluation.csv` and use observations `d_1` to `d_1913` for training and `d_1914` to `d_1941` for testing.

In [3]:
df_calendar = pd.read_csv('../data/m5/calendar.csv')
df_sales = pd.read_csv('../data/m5/sales_train_evaluation.csv')
df_weights = pd.read_csv('../data/m5/weights_validation.csv')
# display(df_calendar, df_sales, df_weights)

In [4]:
# Move the six ID columns into the index, transpose so that rows are days,
# and replace the d_1..d_1941 labels with calendar dates.
sales_by_day = (df_sales.set_index([*df_sales.columns[5::-1]]).T
                .set_index(pd.DatetimeIndex(df_calendar.date)[:1941]))
train_df = sales_by_day.iloc[:-28]  # d_1 to d_1913
test_df = sales_by_day.iloc[-28:]   # d_1914 to d_1941
# display(train_df, test_df)
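
The hold-out split above keeps the last 28 days for testing and everything before them for training. A minimal sketch of the same slicing on a made-up toy frame (the names `toy`, `toy_train`, and `toy_test` are illustrative only and are not part of the notebook):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the transposed sales frame: 40 daily rows, 2 series.
dates = pd.date_range("2016-01-01", periods=40, freq="D")
toy = pd.DataFrame(
    np.arange(80).reshape(40, 2), index=dates, columns=["a", "b"])

h = 28  # forecast horizon, same as in the notebook

# Everything except the last h rows is training data; the last h rows are test.
toy_train, toy_test = toy.iloc[:-h], toy.iloc[-h:]

print(toy_train.shape, toy_test.shape)
```

Because the split is positional, the training period always ends strictly before the test period begins, so no future observations leak into training.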

In [5]:
levels = {
    1: None,
    2: "state_id",
    3: "store_id",
    4: "cat_id",
    5: "dept_id",
    6: ["state_id", "cat_id"],
    7: ["state_id", "dept_id"],
    8: ["store_id", "cat_id"],
    9: ["store_id", "dept_id"],
    10: "item_id",
    11: ["state_id", "item_id"],
    12: ["store_id", "item_id"]
}

In [6]:
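# Sum the item-level columns within each (store_id, dept_id) pair.
# groupby(..., sort=False) keeps the original column order and also works on
# pandas versions where DataFrame.sum no longer accepts `level`.
trainOG_df_9 = train_df.T.groupby(level=levels[9], sort=False).sum().T
train_df_9 = timeSeriesFiltering(trainOG_df_9, lower=10)
test_df_9 = test_df.T.groupby(level=levels[9], sort=False).sum().T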
weights_df_9 = (df_weights
                .loc[df_weights['Level_id'] == 'Level9']
                .set_index(['Agg_Level_1', 'Agg_Level_2'])[['Weight']])
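
Level 9 sums all item-level series that share a store and a department. As a rough illustration, the sketch below performs that aggregation on a tiny made-up frame whose column `MultiIndex` is assumed to carry the level names `store_id`, `dept_id`, and `item_id`; the values and item names are invented.

```python
import pandas as pd

# Toy item-level frame: columns are (store_id, dept_id, item_id), rows are days.
cols = pd.MultiIndex.from_tuples(
    [("CA_1", "FOODS_1", "item_a"),
     ("CA_1", "FOODS_1", "item_b"),
     ("CA_1", "HOBBIES_1", "item_c"),
     ("TX_1", "FOODS_1", "item_a")],
    names=["store_id", "dept_id", "item_id"])
toy = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

# Group the columns by store and department and sum within each group.
# Transposing lets groupby operate on the (former) column levels.
agg = toy.T.groupby(level=["store_id", "dept_id"], sort=False).sum().T

print(agg)
```

Passing `sort=False` keeps the groups in their order of first appearance, which matters when the result is later matched against other frames by position rather than by label.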

---

## Part 5. King of the Hill (20 pts.)

Using whatever methods/models you desire, beat the best `WRMSSE` score in Q5.

<b><u>Do not tune your model using the test set!</u></b> If you do, you will not get points for this part.

### Q8. (10 pts.)

Describe your methodology here. 

Points will be awarded for <b>aesthetics</b> (e.g., use of diagrams), <b>ease of reading</b>, <b>clarity</b>, and <b>brevity</b>.

Points will be deducted for <b>excessively long</b> walls of text and descriptions.

<img src="Slide1.png" width="800" />
<img src="Slide2.png" width="800" />
<img src="Slide3.png" width="800" />
<img src="Slide4.png" width="800" />
<img src="Slide5.png" width="800" />
<img src="Slide6.png" width="800" />

### Q9. (10 pts.)

This part is for your actual code.

In [7]:
tscv = TimeSeriesSplit(n_splits=3, test_size=28)
methods = {
    "Seasonal Naive": {
        'meta': 'base',
        'model': BaseFuncModel(snaivef, m=7)},
    "SES": {
        'meta': 'stat',
        'model': StatsModelsWrapper(ETSModel, trend=None, seasonal=None)},
    "Holt's Linear": {
        'meta': 'stat',
        'model': StatsModelsWrapper(ETSModel, trend='add', seasonal=None)},
    "Additive Holt-Winters": {
        'meta': 'stat',
        'model': StatsModelsWrapper(
            ETSModel, seasonal_periods=7, trend='add', seasonal='add')},
    "RecursiveRegressor(LGBMRegressor)": {
        'meta': 'ml_recursive',
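        # NOTE: w and h are passed to LGBMRegressor here, not to
        # RecursiveRegressor; they may have been intended as
        # RecursiveRegressor arguments.
        'model': RecursiveRegressor(
            lgb.LGBMRegressor(random_state=1, w=84, h=28, n_jobs=-1))},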
    "MultiOutputRegressor(LGBMRegressor)": {
        'meta': 'ml_direct',
        'model': MultiOutputRegressor(
            lgb.LGBMRegressor(random_state=1, n_jobs=-1), n_jobs=-1)},
    "RegressorChain(LGBMRegressor)": {
        'meta': 'ml_direct',
        'model': RegressorChain(
            lgb.LGBMRegressor(random_state=1, n_jobs=-1))},
    "Combo(LGBMRegressor)": {
        'meta': 'combo',
        'model': [
            RecursiveRegressor(
                lgb.LGBMRegressor(random_state=1, w=84, h=28, n_jobs=-1)),
            MultiOutputRegressor(
                lgb.LGBMRegressor(random_state=1, n_jobs=-1), n_jobs=-1)
        ]}
}
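
To make the validation scheme concrete, the sketch below prints which positions `TimeSeriesSplit(n_splits=3, test_size=28)` assigns to each fold for a series of 1913 observations (the length of the training period). The variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_obs = 1913  # d_1 to d_1913
tscv = TimeSeriesSplit(n_splits=3, test_size=28)

folds = []
for train_idx, val_idx in tscv.split(np.arange(n_obs)):
    # Record the last training position and the validation window bounds.
    fold = (int(train_idx[-1]), int(val_idx[0]), int(val_idx[-1]))
    folds.append(fold)
    print("train: 0..%d  validate: %d..%d" % fold)
```

Each fold validates on a 28-step window that directly follows its training data, and the three windows together cover the last 84 observations of the training period. The hold-out test days never enter these folds.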

In [None]:
for col in train_df_9:
    print(col)
    methods = evaluate_methods(
        methods,
        X=train_df_9[col],
        XOG=trainOG_df_9[col],
        tscv=tscv,
        col=col,
        w=84,
        h=28)
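
The arguments `w=84` and `h=28` suggest a lookback window of 84 days and a horizon of 28 days. How `tsa_tools` builds its features is not shown here, so the following is only a sketch of one common way to turn a series into supervised windows; `make_windows` is a hypothetical helper, not a function from the notebook.

```python
import numpy as np

def make_windows(series, w, h):
    """Return (X, Y): each X row holds w consecutive values,
    each Y row holds the h values that immediately follow."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - w - h + 1
    X = np.stack([series[i:i + w] for i in range(n_rows)])
    Y = np.stack([series[i + w:i + w + h] for i in range(n_rows)])
    return X, Y

# Toy series 0, 1, ..., 199 so the windows are easy to check by eye.
X, Y = make_windows(np.arange(200), w=84, h=28)
print(X.shape, Y.shape)
```

With such windows, a direct (multi-output) strategy would fit all `h` columns of `Y`, while a recursive strategy would fit only the first column and feed its own predictions back in as inputs for later steps.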

In [10]:
rmse_scores = score_reveal(methods)
rmse_scores

| Store | Department | Seasonal Naive | SES | Holt's Linear | Additive Holt-Winters | RecursiveRegressor(LGBMRegressor) | MultiOutputRegressor(LGBMRegressor) | RegressorChain(LGBMRegressor) | Combo(LGBMRegressor) |
|---|---|---|---|---|---|---|---|---|---|
| CA_1 | HOBBIES_1 | 112.5315 | 102.1342 | 102.2342 | 89.0478 | 88.1818 | 91.9518 | 90.5094 | 89.2338 |
| CA_1 | HOBBIES_2 | 17.1085 | 13.3313 | 13.4002 | 11.8658 | 13.7598 | 14.0885 | 12.5092 | 13.3713 |
| CA_1 | HOUSEHOLD_1 | 124.7181 | 182.9850 | 183.3727 | 80.5718 | 77.1174 | 78.3647 | 76.8850 | 76.4375 |
| CA_1 | HOUSEHOLD_2 | 42.9325 | 54.9331 | 54.7421 | 31.1680 | 32.8088 | 34.2279 | 34.3215 | 33.1396 |
| CA_1 | FOODS_1 | 96.5581 | 115.3984 | 113.5375 | 92.5809 | 83.4848 | 71.9233 | 74.6178 | 75.9609 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| WI_3 | HOUSEHOLD_1 | 119.6917 | 151.5976 | 151.4656 | 100.3098 | 96.1385 | 96.7119 | 95.9121 | 93.5357 |
| WI_3 | HOUSEHOLD_2 | 28.4814 | 38.4835 | 38.4900 | 27.9321 | 26.6128 | 29.6885 | 27.0664 | 27.0069 |
| WI_3 | FOODS_1 | 85.1284 | 95.9297 | 96.3109 | 86.2128 | 67.6526 | 65.7408 | 64.0030 | 65.1575 |
| WI_3 | FOODS_2 | 192.7013 | 167.3461 | 166.0414 | 180.1888 | 109.7245 | 103.9821 | 111.8733 | 104.1259 |


In [11]:
best_models = rmse_scores.apply(lambda x: x.idxmin(), axis=1)
best_models

CA_1  HOBBIES_1        RecursiveRegressor(LGBMRegressor)
      HOBBIES_2                    Additive Holt-Winters
      HOUSEHOLD_1                   Combo(LGBMRegressor)
      HOUSEHOLD_2                  Additive Holt-Winters
      FOODS_1        MultiOutputRegressor(LGBMRegressor)
                                    ...                 
WI_3  HOUSEHOLD_1                   Combo(LGBMRegressor)
      HOUSEHOLD_2      RecursiveRegressor(LGBMRegressor)
      FOODS_1              RegressorChain(LGBMRegressor)
      FOODS_2        MultiOutputRegressor(LGBMRegressor)
      FOODS_3                       Combo(LGBMRegressor)
Length: 70, dtype: object

In [12]:
best_models.value_counts()

Additive Holt-Winters                  27
Combo(LGBMRegressor)                   10
RegressorChain(LGBMRegressor)          10
RecursiveRegressor(LGBMRegressor)      10
MultiOutputRegressor(LGBMRegressor)     9
SES                                     3
Holt's Linear                           1
dtype: int64
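
The selection step picks, for every series, the method with the lowest score and then counts how often each method wins. A small sketch with invented scores (the method label "LightGBM" and all numbers are placeholders, not results from this notebook):

```python
import pandas as pd

# Made-up validation scores: rows are series, columns are methods.
scores = pd.DataFrame(
    {"Seasonal Naive": [112.5, 17.1, 96.6],
     "Additive Holt-Winters": [89.0, 11.9, 92.6],
     "LightGBM": [88.2, 13.8, 71.9]},
    index=["HOBBIES_1", "HOBBIES_2", "FOODS_1"])

# For each row, return the column label holding the smallest value.
best = scores.idxmin(axis=1)

# Tally how many series each method wins.
counts = best.value_counts()

print(best)
print(counts)
```

`scores.idxmin(axis=1)` gives the same result as `scores.apply(lambda x: x.idxmin(), axis=1)` without the Python-level loop over rows.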

In [15]:
model = ensemble2(col_assignment=best_models.to_dict(), methods=methods, w=84, h=28)
model.fit(train_df_9)

In [None]:
df_pred_9_ensemble2 = model.predict(trainOG_df_9)
df_pred_9_ensemble2

|  | CA_1 / HOBBIES_1 | CA_1 / HOBBIES_2 | CA_1 / HOUSEHOLD_1 | CA_1 / HOUSEHOLD_2 | CA_1 / FOODS_1 | CA_1 / FOODS_2 | CA_1 / FOODS_3 | CA_2 / HOBBIES_1 | CA_2 / HOBBIES_2 | CA_2 / HOUSEHOLD_1 | ... | WI_2 / FOODS_1 | WI_2 / FOODS_2 | WI_2 / FOODS_3 | WI_3 / HOBBIES_1 | WI_3 / HOBBIES_2 | WI_3 / HOUSEHOLD_1 | WI_3 / HOUSEHOLD_2 | WI_3 / FOODS_1 | WI_3 / FOODS_2 | WI_3 / FOODS_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 437.5033 | 39.0957 | 763.9959 | 207.237 | 266.7877 | 509.9272 | 1925.101 | 320.4134 | 39.31 | 666.1571 | ... | 320.4731 | 964.3415 | 1868.7333 | 218.7019 | 29.3937 | 527.0088 | 159.2782 | 271.9992 | 436.9822 | 1605.361 |
| 1 | 392.2056 | 38.7922 | 654.6803 | 189.0628 | 242.2286 | 405.2741 | 1777.0478 | 349.0217 | 34.758 | 602.7528 | ... | 333.5456 | 721.1611 | 1640.876 | 208.3781 | 30.4984 | 537.2347 | 139.6441 | 294.1283 | 382.8208 | 1620.5203 |
| 2 | 423.5902 | 40.4461 | 619.8833 | 189.6468 | 283.8521 | 348.0841 | 1754.382 | 280.4951 | 41.5079 | 602.8867 | ... | 343.4532 | 710.5954 | 1680.4127 | 216.6801 | 31.0335 | 509.2187 | 132.3633 | 242.3489 | 359.9844 | 1562.3207 |
| 3 | 406.6826 | 39.1083 | 593.2084 | 197.6096 | 300.6551 | 372.3611 | 1844.6085 | 350.8923 | 37.9551 | 599.6201 | ... | 339.3687 | 802.8031 | 1688.3539 | 224.516 | 31.2705 | 552.2354 | 142.5043 | 229.3149 | 354.8844 | 1490.6989 |
| 4 | 453.2403 | 43.9026 | 688.9625 | 226.5993 | 344.9472 | 518.561 | 2073.3072 | 427.0443 | 41.1714 | 765.073 | ... | 380.7957 | 713.4821 | 2152.6675 | 285.1761 | 32.4132 | 661.9132 | 176.8324 | 278.5479 | 468.2096 | 1740.4981 |
| 5 | 580.0852 | 50.3332 | 1028.9997 | 290.3485 | 400.7362 | 592.585 | 2682.339 | 467.593 | 52.6185 | 1077.7876 | ... | 405.5013 | 835.2595 | 2311.1217 | 314.2479 | 33.0978 | 809.0336 | 206.358 | 337.2512 | 538.8162 | 2519.8603 |
| 6 | 516.6611 | 52.1988 | 1084.4108 | 280.3655 | 351.8076 | 662.2098 | 2926.0347 | 456.7799 | 48.7608 | 1117.9953 | ... | 349.6992 | 1033.3559 | 2383.4232 | 244.084 | 32.2626 | 841.5723 | 206.3033 | 283.7687 | 591.0831 | 2432.8251 |
| 7 | 461.42 | 39.1899 | 774.8963 | 207.5895 | 266.4044 | 510.1214 | 2140.1317 | 318.742 | 32.7714 | 667.6338 | ... | 320.8609 | 1306.3273 | 2591.0659 | 218.81 | 29.457 | 625.0542 | 155.6883 | 252.6107 | 579.6351 | 1792.5438 |
| 8 | 406.5608 | 38.8864 | 669.6527 | 189.4153 | 274.1547 | 474.5079 | 1948.6778 | 318.5709 | 37.6144 | 604.2295 | ... | 333.9334 | 1380.853 | 2449.1456 | 208.4862 | 30.5617 | 584.6945 | 136.9481 | 278.5121 | 720.1154 | 2050.8738 |
| 9 | 439.8678 | 40.5403 | 642.2546 | 189.9993 | 250.8226 | 491.0791 | 1899.5062 | 319.1154 | 39.6117 | 604.3635 | ... | 343.841 | 1586.7742 | 2655.3447 | 216.7883 | 31.0968 | 595.5331 | 141.5213 | 230.4024 | 612.6808 | 1902.4786 |


In [17]:
df_res_9_esemble2 = rateMyForecast(
    trainOG_df_9, test_df_9, df_pred_9_ensemble2)['RMSSE']
df_res_9_esemble2.index = pd.MultiIndex.from_tuples(
    df_res_9_esemble2.index, names=['Agg_Level_1', 'Agg_Level_2'])
df_res_9_esemble2

Agg_Level_1  Agg_Level_2
CA_1         HOBBIES_1      0.7412
             HOBBIES_2      0.6931
             HOUSEHOLD_1    0.3584
             HOUSEHOLD_2    0.5141
             FOODS_1        0.5867
                             ...  
WI_3         HOUSEHOLD_1    0.7166
             HOUSEHOLD_2    0.7675
             FOODS_1        1.6214
             FOODS_2        0.8506
             FOODS_3        0.4849
Name: RMSSE, Length: 70, dtype: float64
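
For reference, the RMSSE used in the M5 competition divides the mean squared forecast error by the mean squared one-step difference of the training series and takes the square root. The internals of `rateMyForecast` are not shown in this notebook, so the function below is only a hedged sketch of that definition; `rmsse` is a hypothetical helper, and it skips the M5 convention of ignoring the period before a series' first non-zero sale.

```python
import numpy as np

def rmsse(y_train, y_true, y_pred):
    """Root mean squared scaled error (simplified sketch)."""
    y_train = np.asarray(y_train, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Scale: in-sample mean squared error of the one-step naive forecast.
    scale = np.mean(np.diff(y_train) ** 2)
    # Out-of-sample mean squared error of the forecast being rated.
    mse = np.mean((y_true - y_pred) ** 2)
    return np.sqrt(mse / scale)

# Toy example: forecast the last training value (12) for both test steps.
y_train = [10, 12, 11, 13, 12]
y_true = [13, 14]
y_pred = [12, 12]
score = rmsse(y_train, y_true, y_pred)
print(score)
```

A value below 1 means the forecast errors are smaller, on this scale, than the in-sample errors of a one-step naive forecast.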

### THE RESULT

In [18]:
df_res_9_esemble2.multiply(weights_df_9.squeeze(), axis=0).sum()

0.7259427322254518
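
The final score is a weighted sum of the per-series RMSSE values, with weights matched to series by index label. A minimal sketch with invented numbers, showing that `Series.multiply` aligns on the `MultiIndex` even when the two inputs are ordered differently:

```python
import pandas as pd

names = ["Agg_Level_1", "Agg_Level_2"]

# Made-up per-series RMSSE values.
rmsse_toy = pd.Series(
    [0.5, 1.0, 0.8],
    index=pd.MultiIndex.from_tuples(
        [("CA_1", "FOODS_1"), ("CA_1", "HOBBIES_1"), ("TX_1", "FOODS_1")],
        names=names),
    name="RMSSE")

# Made-up weights that sum to 1, deliberately listed in a different order.
weights_toy = pd.Series(
    [0.25, 0.25, 0.5],
    index=pd.MultiIndex.from_tuples(
        [("TX_1", "FOODS_1"), ("CA_1", "HOBBIES_1"), ("CA_1", "FOODS_1")],
        names=names),
    name="Weight")

# Multiplication aligns on index labels, not on position.
wrmsse_toy = rmsse_toy.multiply(weights_toy).sum()
print(wrmsse_toy)
```

Because alignment is by label, a series missing from either input would produce `NaN` for that label, and `sum()` would silently skip it. Checking that both indexes contain the same labels before multiplying guards against that.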