## Time series - Demand - Forecasting
https://www.kaggle.com/c/demand-forecasting-kernels-only

This competition is provided as a way to explore different time series techniques on a relatively simple and clean dataset.

You are given 5 years of store-item sales data and asked to predict 3 months of sales for 50 different items at 10 different stores.
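Kaggle scores this competition with SMAPE (symmetric mean absolute percentage error). The exact definition is worth confirming on the competition's Evaluation page. Below is a minimal NumPy sketch. It assumes the common definition, in which a term counts as 0 when both the actual and the forecast are 0. The function name `smape` is hypothetical.

```python
import numpy as np


def smape(actual, forecast):
    """Symmetric MAPE in percent; terms where actual and forecast are both 0 count as 0."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    diff = np.abs(forecast - actual)
    # Where the denominator is 0 the difference is also 0, so write 0 instead of 0/0
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * ratio.mean()


# One forecast 10% above the actual: 100 * 10 / 105, roughly 9.52
print(smape([100], [110]))
```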

```python
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
from pathlib import Path
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
import plotly.express as px
from dateutil.relativedelta import relativedelta
from sklearn.model_selection import train_test_split
%matplotlib inline
```

```python
# Set data paths
time_series_train_path = Path('../data/time_series_train.csv')
time_series_test_path = Path('../data/time_series_test.csv')
```

```python
# Load dataframes; parse 'date' as datetime and index the test set by its 'id' column
train_df = pd.read_csv(time_series_train_path, parse_dates=['date'])
test_df = pd.read_csv(time_series_test_path, parse_dates=['date'], index_col=['id'])
```

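```python
train_df.head()
```

|   | date       | store | item | sales |
|---|------------|-------|------|-------|
| 0 | 2013-01-01 | 1     | 1    | 13    |
| 1 | 2013-01-02 | 1     | 1    | 11    |
| 2 | 2013-01-03 | 1     | 1    | 14    |
| 3 | 2013-01-04 | 1     | 1    | 13    |
| 4 | 2013-01-05 | 1     | 1    | 10    |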


```python
test_df.head()
```

| id | date       | store | item |
|----|------------|-------|------|
| 0  | 2018-01-01 | 1     | 1    |
| 1  | 2018-01-02 | 1     | 1    |
| 2  | 2018-01-03 | 1     | 1    |
| 3  | 2018-01-04 | 1     | 1    |
| 4  | 2018-01-05 | 1     | 1    |


```python
len(test_df)
```

```
45000
```
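The row counts match a complete store × item × day grid. The test set has 10 stores × 50 items × 90 days (1 January to 31 March 2018), giving 45,000 rows. The training set has 500 pairs × 1,826 days (2013 to 2017, including leap day 2016), giving 913,000 rows. The sketch below rebuilds both grids and checks the counts. It assumes store IDs 1..10 and item IDs 1..50, which the `head()` output suggests but does not prove.

```python
import pandas as pd

# Hypothetical reconstruction of the test grid: every store-item pair on every forecast day
test_dates = pd.date_range('2018-01-01', '2018-03-31', freq='D')
stores = range(1, 11)  # assumed store IDs 1..10
items = range(1, 51)   # assumed item IDs 1..50
test_grid = pd.MultiIndex.from_product([test_dates, stores, items],
                                       names=['date', 'store', 'item'])

# Same idea for the 5-year training period
train_dates = pd.date_range('2013-01-01', '2017-12-31', freq='D')

print(len(test_dates), len(test_grid))                    # 90 days, 45000 rows
print(len(train_dates), len(train_dates) * 10 * 50)       # 1826 days, 913000 rows
```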

## Data pre-processing

```python
# Add calendar features derived from the 'date' column
def add_date_parts(dataf):
    return dataf.assign(
        year=lambda d: d['date'].dt.year,
        month_number=lambda d: d['date'].dt.month,
        month_name=lambda d: d['date'].dt.month_name(),  # e.g. 'January'
        day_number=lambda d: d['date'].dt.dayofweek,     # Monday=0 ... Sunday=6
        day_name=lambda d: d['date'].dt.day_name(),      # e.g. 'Tuesday'
        # %W counts weeks from the year's first Monday (earlier days are week 0);
        # +1 makes the numbering start at 1. This is not the ISO week number.
        week=lambda d: d['date'].dt.strftime('%W').astype(int) + 1,
    )
```

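Because `%W` counts weeks from the first Monday of each year, the `week` column is not the ISO 8601 week number. The two schemes can disagree around New Year. If ISO weeks are preferred, pandas offers `Series.dt.isocalendar()` (pandas 1.1 or later). The sketch below compares both on a few dates chosen for illustration.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2013-01-01', '2016-01-01', '2017-01-01', '2017-01-02']))

comparison = pd.DataFrame({
    'date': dates,
    'day_name': dates.dt.day_name(),
    # Same rule as add_date_parts: %W week number (weeks start on Monday) + 1
    'week_pct_W': dates.dt.strftime('%W').astype(int) + 1,
    # ISO 8601 week: week 1 is the week containing the year's first Thursday
    'iso_week': dates.dt.isocalendar().week.astype(int),
})
print(comparison)
```

For example, 2017-01-01 (a Sunday) gets week 1 under the `%W + 1` rule. Under ISO numbering it falls in week 52 of ISO year 2016.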
```python
# Create final train and test dataframes with date parts added
train_df_final = train_df.pipe(add_date_parts)
test_df_final = test_df.pipe(add_date_parts)
```

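The notebook imports `train_test_split`, but a random split would put future days into the training data. For a forecasting task, a chronological holdout that mirrors the 3-month test horizon is usually a more faithful validation. The sketch below uses a synthetic single-series frame. The 2017-10-01 cut-off and the toy data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_df_final: one store-item series covering 2017
dates = pd.date_range('2017-01-01', '2017-12-31', freq='D')
toy = pd.DataFrame({
    'date': dates,
    'store': 1,
    'item': 1,
    'sales': np.random.default_rng(0).integers(5, 30, size=len(dates)),
})

# Hold out the final 3 months, mirroring the 3-month test horizon
cutoff = pd.Timestamp('2017-10-01')
train_part = toy[toy['date'] < cutoff]
valid_part = toy[toy['date'] >= cutoff]

print(train_part['date'].max(), valid_part['date'].min(), len(valid_part))
```

On the real data the same boolean masks can be applied to `train_df_final`. Every store-item series is then split at the same date.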
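```python
train_df_final.head()
```

|   | date       | store | item | sales | year | month_number | month_name | day_number | day_name  | week |
|---|------------|-------|------|-------|------|--------------|------------|------------|-----------|------|
| 0 | 2013-01-01 | 1     | 1    | 13    | 2013 | 1            | January    | 1          | Tuesday   | 1    |
| 1 | 2013-01-02 | 1     | 1    | 11    | 2013 | 1            | January    | 2          | Wednesday | 1    |
| 2 | 2013-01-03 | 1     | 1    | 14    | 2013 | 1            | January    | 3          | Thursday  | 1    |
| 3 | 2013-01-04 | 1     | 1    | 13    | 2013 | 1            | January    | 4          | Friday    | 1    |
| 4 | 2013-01-05 | 1     | 1    | 10    | 2013 | 1            | January    | 5          | Saturday  | 1    |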


```python
train_df_final.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 913000 entries, 0 to 912999
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   date          913000 non-null  datetime64[ns]
 1   store         913000 non-null  int64         
 2   item          913000 non-null  int64         
 3   sales         913000 non-null  int64         
 4   year          913000 non-null  int64         
 5   month_number  913000 non-null  int64         
 6   month_name    913000 non-null  object        
 7   day_number    913000 non-null  int64         
 8   day_name      913000 non-null  object        
 9   week          913000 non-null  int32         
dtypes: datetime64[ns](1), int32(1), int64(6), object(2)
memory usage: 66.2+ MB
```
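`info()` shows that `month_name` and `day_name` are stored as `object` strings repeated across 913,000 rows. Converting low-cardinality text columns like these to pandas' `category` dtype typically reduces memory. The sketch below uses a synthetic frame; the savings on the real data are not measured here. Note that the "66.2+ MB" figure excludes the string contents, which `memory_usage(deep=True)` counts.

```python
import pandas as pd

# Synthetic frame with the same kind of low-cardinality text columns
dates = pd.Series(pd.date_range('2013-01-01', '2017-12-31', freq='D'))
df = pd.DataFrame({'month_name': dates.dt.month_name(), 'day_name': dates.dt.day_name()})

# deep=True includes the memory used by the Python string objects
before = df.memory_usage(deep=True).sum()
df_cat = df.astype({'month_name': 'category', 'day_name': 'category'})
after = df_cat.memory_usage(deep=True).sum()

print(f'object: {before:,} bytes -> category: {after:,} bytes')
```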


## ARIMA

- ..
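ARIMA models are usually fit to one series at a time, and the "I" (integrated) part of the model corresponds to differencing. As a hedged starting point, the sketch below builds a synthetic daily series with a linear trend and a weekly pattern. That shape is an assumption, not a finding about this dataset. The code gives the series an explicit daily frequency and applies first and lag-7 seasonal differencing. Fitting the model itself, for example with `statsmodels.tsa.arima.model.ARIMA`, is not shown.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: linear trend + weekly pattern (assumed shape, for illustration)
idx = pd.date_range('2017-01-01', periods=56, freq='D')
weekly = np.tile([0, 1, 2, 3, 4, 8, 9], 8)
sales = pd.Series(10 + 0.5 * np.arange(56) + weekly, index=idx, name='sales').asfreq('D')

# First difference (d=1) removes the linear trend but keeps the weekly pattern
diff1 = sales.diff().dropna()

# Seasonal difference at lag 7 removes the weekly pattern; the trend leaves a constant
seasonal_diff = sales.diff(7).dropna()

print(seasonal_diff.unique())  # every value is 7 days * 0.5 per day = 3.5
```

On the real data, a single series could be extracted with something like `train_df_final.query('store == 1 and item == 1').set_index('date')['sales'].asfreq('D')` before differencing.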