## DM2 DMC | Facebook Prophet Baselines: A Single Model for each Item

Let's fit one Prophet model per item, using only that item's own daily sales numbers, to produce a very basic forecast.

### ToDos:

* Finish evaluation part
* Verify that predictions are the right ones in the list

### Imports

In [1]:
import pandas as pd
import numpy as np
from fbprophet import Prophet
import logging
import csv
import pickle

### Settings

In [2]:
# 'w': compute the forecasts and pickle them; 'r': load previously pickled forecasts
read_write_prophet_forecasts = 'r'

In [3]:
# 'w': fit the models and pickle them; 'r': load previously pickled models
read_write_prophet_fitted_models = 'r'

### Directories

In [4]:
input_file = 'C:/Users/JulianWeller/Desktop/2018_04_23_a_DM2_DMC_FB_Prophet_Date_Item.xlsx'

In [5]:
test_data_directory = 'C:/Users/JulianWeller/OneDrive - Julian Weller/01_MMDS/03_Semester/04_A_6_Data Mining II/03_DMC/02_Test_Data/DMC_2018_test/'

In [6]:
dump_directory = 'C:/Users/JulianWeller/Desktop/DM2_DMC_Working_Directory/'

### Loading the Data

In [7]:
df = pd.read_excel(input_file)

Let's split into train and test data. The last 31 rows (January 2018) are held out; this held-out part is not used below, as there is separate test data provided by the chair:

In [8]:
df_train = df.drop(df.tail(31).index)

In [9]:
df_train.tail()

Unnamed: 0,ds,10000XL ( 158-170 ),10001L,100033 (35-38 ),100034 ( 39-42 ),100035 ( 43-46 ),10006XL,10008XL,10013L,10013M,...,2286946,2286947,"2286947,5",22872M ( 140-152 ),22873L,228782XL,22878L,22878M,22878XL,22881S
87,2017-12-27,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,10
88,2017-12-28,0,0,0,0,0,0,0,0,0,...,0,0,10,0,0,10,30,10,20,0
89,2017-12-29,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
90,2017-12-30,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
91,2017-12-31,0,0,0,0,0,0,0,0,0,...,0,0,0,30,0,0,0,0,0,10


Not used, but here's the January data:

In [10]:
df_test = df.drop(df.head(92).index)

In [11]:
df_test.head()

Unnamed: 0,ds,10000XL ( 158-170 ),10001L,100033 (35-38 ),100034 ( 39-42 ),100035 ( 43-46 ),10006XL,10008XL,10013L,10013M,...,2286946,2286947,"2286947,5",22872M ( 140-152 ),22873L,228782XL,22878L,22878M,22878XL,22881S
92,2018-01-01,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,10,0,0
93,2018-01-02,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,10,0,0,0,10
94,2018-01-03,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
95,2018-01-04,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
96,2018-01-05,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Let's pull out the 'ds' (date) column that FB Prophet always needs:

In [12]:
df_ds = df_train.iloc[:,[0]]

In [13]:
df_ds.head()

Unnamed: 0,ds
0,2017-10-01
1,2017-10-02
2,2017-10-03
3,2017-10-04
4,2017-10-05


Let's create a list of dataframes, each containing a 'ds' and a 'y' column, so that we can hand each one straight to the prophet and ask for predictions:

In [14]:
list_of_dataframes = []

And let's keep track of the item keys:

In [15]:
list_of_item_keys = []

In [16]:
# Column 0 is 'ds'; every remaining column holds one item's daily sales
for i in range(1, df_train.shape[1]):
    df_to_be_appended = df_ds.join(df_train.iloc[:,[i]])
    
    list_of_item_keys.append(df_to_be_appended.columns[1])
    
    df_to_be_appended.columns = ['ds', 'y']
    
    list_of_dataframes.append(df_to_be_appended)
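
The reshaping above can also be sketched on a toy table. This is a minimal illustration, not the notebook's actual data: the frame `wide` and the item names `itemA` and `itemB` are made up, and a dictionary keyed by item name is used here in place of the two parallel lists, so that keys and frames cannot drift apart.

```python
import pandas as pd

# Toy wide table: one 'ds' column plus one column per item (item names are hypothetical).
wide = pd.DataFrame({
    "ds": pd.date_range("2017-10-01", periods=3, freq="D"),
    "itemA": [0, 10, 0],
    "itemB": [20, 0, 0],
})

# One two-column ('ds', 'y') frame per item, keyed by the item's column name.
frames_by_item = {
    col: wide[["ds", col]].rename(columns={col: "y"})
    for col in wide.columns.drop("ds")
}

print(frames_by_item["itemB"])
```

Each value in `frames_by_item` has the same 'ds'/'y' layout as the entries of `list_of_dataframes`.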

Here's an example of a list entry:

In [17]:
list_of_item_keys[6]

'10008XL'

In [18]:
list_of_dataframes[6].head()

Unnamed: 0,ds,y
0,2017-10-01,0
1,2017-10-02,0
2,2017-10-03,0
3,2017-10-04,0
4,2017-10-05,0


Silence the prophet's log output while it is fitting:

In [19]:
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

A list with the prophet's fitted models:

In [20]:
prophet_fitted_models = []

In [21]:
if read_write_prophet_fitted_models == 'w':
    for i in list_of_dataframes:
        prophet_fitted_models.append(Prophet().fit(i))

In [22]:
if read_write_prophet_fitted_models == 'w':
    with open(dump_directory + "prophet_fitted_models", "wb") as fp:
        pickle.dump(prophet_fitted_models, fp)

In [23]:
if read_write_prophet_fitted_models == 'r':
    with open(dump_directory + "prophet_fitted_models", "rb") as fp:
        prophet_fitted_models = pickle.load(fp)
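
The read/write switch used in the cells above could be folded into one small helper. The following is only a sketch of that pattern: the function name `load_or_compute` and the demo file name are hypothetical, and a plain list stands in for the fitted models.

```python
import os
import pickle
import tempfile


def load_or_compute(path, mode, compute):
    """Load a pickled object when mode is 'r'; otherwise compute, pickle, and return it."""
    if mode == "r":
        with open(path, "rb") as fp:
            return pickle.load(fp)
    if mode != "w":
        raise ValueError("mode must be 'r' or 'w'")
    result = compute()
    with open(path, "wb") as fp:
        pickle.dump(result, fp)
    return result


# Demo with a throwaway file; the list stands in for the fitted models.
cache_path = os.path.join(tempfile.mkdtemp(), "demo_cache")
written = load_or_compute(cache_path, "w", lambda: [1, 2, 3])
reloaded = load_or_compute(cache_path, "r", lambda: None)
```

In 'r' mode the `compute` callable is never invoked, which mirrors how the notebook skips the expensive fitting step once the pickle file exists.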

And another one with the prophet's forecasts:

In [24]:
prophet_forecasts = []

In [25]:
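if read_write_prophet_forecasts == 'w':
    for model in prophet_fitted_models:
        # Predict for the 92 training days plus 31 future days, then keep only the 31 January rows
        forecast = model.predict(model.make_future_dataframe(periods=31))[['ds', 'yhat']]
        prophet_forecasts.append(forecast.drop(forecast.head(92).index))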

In [26]:
if read_write_prophet_forecasts == 'w':
    with open(dump_directory + "prophet_forecasts", "wb") as fp:
        pickle.dump(prophet_forecasts, fp)

In [27]:
if read_write_prophet_forecasts == 'r':
    with open(dump_directory + "prophet_forecasts", "rb") as fp:
        prophet_forecasts = pickle.load(fp)

In [28]:
list_of_item_keys[6]

'10008XL'

In [29]:
prophet_forecasts[6]['yhat'].tolist()

[0.8036191913829713,
 0.03978470266827813,
 0.040921233734018014,
 0.8070217807380953,
 0.043319142347723494,
 0.04453057987808118,
 0.04589960625688036,
 0.8099201771842401,
 0.04608568846929151,
 0.047222219536841525,
 0.8133227665414946,
 0.04962012814985492,
 0.05083156568096536,
 0.052200592060517,
 0.8162211629881572,
 0.05238667427327903,
 0.05352320533966512,
 0.8196237523428969,
 0.05592111395352031,
 0.057132551483850685,
 0.05850157786415361,
 0.8225221487905243,
 0.05868766007726661,
 0.05982419114337528,
 0.8259247381478441,
 0.06222209975804399,
 0.06343353728673332,
 0.0648025636677903,
 0.8288231345933428,
 0.06498864588021153,
 0.06612517694812992]

### Loading the Test Data from the Chair

In [30]:
test_data = []

In [31]:
for i in range(0, 5):
    df = pd.read_csv(test_data_directory + 'test_' + str(i) + '.csv', dtype={'pid': int})
    
    df.drop('Unnamed: 0', axis=1, inplace=True)
    
    ################################
    # Adapted from Basil: datamining2/data/datasets_Basil/create_dataset_v0.3.ipynb
    # Build the item key by concatenating the first two columns as strings
    # (vectorized, which also avoids reusing the outer loop variable i)
    df['key'] = df.iloc[:, 0].astype(str) + df.iloc[:, 1].astype(str)
    ################################
    
    df.drop(df.columns[0], axis=1, inplace=True)
    df.drop(df.columns[0], axis=1, inplace=True)
    
    df['soldout_day'] = df['sold_out_date'].str[-2:]
    
    df.soldout_day = pd.to_numeric(df.soldout_day, errors='coerce')
    
    df.drop(df.columns[1], axis=1, inplace=True)
    
    test_data.append(df)

In [32]:
test_data[0].head()

Unnamed: 0,stock,key,soldout_day
0,1.0,10001L,7
1,1.0,100035 ( 43-46 ),30
2,4.0,10008XL,30
3,1.0,10013L,24
4,1.0,10013M,22


In [33]:
list_of_item_keys[6]

'10008XL'

In [34]:
test_data[0].loc[test_data[0]['key'] == list_of_item_keys[6]].iloc[0]['stock']

4.0

In [35]:
test_data[0].loc[test_data[0]['key'] == list_of_item_keys[6]].iloc[0]['soldout_day']

30

### Evaluation

In [36]:
################################
# Taken from Chung: datamining2/measures/objectives.py
def soldout_day(pred, stock):
    """
    Calculates first day that stock hits 0 in a certain month for an item
    :param pred: Array of predicted sales units for an item
    :param stock: Stock at beginning of month for an item
    :return: Day of month that stock reaches 0
    """
    soldout_day = len(pred)
    for day in range(len(pred)):
        stock -= pred[day]
        #print(stock)
        if stock <= 0:
            soldout_day = day+1
            break
    return soldout_day
################################
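
One thing to keep in mind: Prophet does not constrain its forecasts to be non-negative, so a negative `yhat` would add units back to the stock in the loop above. The following sketch shows one possible variant that treats negative predictions as zero sales. The name `soldout_day_clipped` is hypothetical, and clipping at zero is an assumption, not part of Chung's original function.

```python
from itertools import accumulate


def soldout_day_clipped(pred, stock):
    """Variant of soldout_day that counts negative predictions as zero sales."""
    # Running total of predicted sales, with negative values clipped to zero.
    cumulative = accumulate(max(p, 0.0) for p in pred)
    for day, sold in enumerate(cumulative, start=1):
        if sold >= stock:
            return day
    # Stock never runs out: fall back to the last day, as the original does.
    return len(pred)


example = soldout_day_clipped([1.5, -1.0, 0.6, 2.0], stock=2.0)
print(example)
```

With these toy numbers the clipped variant reports day 3, whereas the unclipped loop would only reach zero stock on day 4, because the negative second value puts one unit back.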

In [37]:
soldout_day(prophet_forecasts[6]['yhat'].tolist(), test_data[0].loc[test_data[0]['key'] == list_of_item_keys[6]].iloc[0]['stock'])

15

Absolute difference between the actual and the predicted sold-out day:

In [38]:
item_row = test_data[0].loc[test_data[0]['key'] == list_of_item_keys[6]].iloc[0]
abs(item_row['soldout_day'] - soldout_day(prophet_forecasts[6]['yhat'].tolist(), item_row['stock']))

15
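
To go from a single item to the whole test set, the same comparison could be repeated per key and averaged. The sketch below is one way to do that on toy data. The dictionary layout, the helper name `mean_abs_soldout_error`, and the choice of the mean absolute error as the aggregate are all assumptions; the notebook itself only computes the absolute difference for one item.

```python
def soldout_day(pred, stock):
    """First day (1-based) on which the stock reaches 0; len(pred) if it never does."""
    for day, units in enumerate(pred, start=1):
        stock -= units
        if stock <= 0:
            return day
    return len(pred)


def mean_abs_soldout_error(forecasts, stock_by_key, actual_day_by_key):
    """Average |actual - predicted| sold-out day over items present in the test data."""
    errors = []
    for key, pred in forecasts.items():
        if key not in stock_by_key:
            continue  # item has a forecast but does not appear in this test file
        predicted_day = soldout_day(pred, stock_by_key[key])
        errors.append(abs(actual_day_by_key[key] - predicted_day))
    return sum(errors) / len(errors) if errors else float("nan")


# Toy inputs (made up): item 'c' has a forecast but no test row and is skipped.
forecasts = {"a": [1.0, 1.0, 1.0], "b": [0.0, 0.0, 5.0], "c": [1.0]}
stock_by_key = {"a": 2.0, "b": 1.0}
actual_day_by_key = {"a": 3, "b": 3}

score = mean_abs_soldout_error(forecasts, stock_by_key, actual_day_by_key)
print(score)
```

Looking items up by key rather than by list position also addresses the ToDo about verifying that each prediction is matched to the right item.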