## Problem Statement

#### Demand forecasts are fundamental to planning and delivering products and services. Accurate demand forecasting helps manufacturers maintain appropriate stock levels, which reduces losses from unsold products and lowers opportunity cost (i.e., demand that goes unmet because too little stock is available). Despite this relevance, manufacturers have difficulty choosing the best forecasting model for their use case. In this project, historical sales data for multiple (25) items sold in 10 stores are provided, and participants are expected to build the model that best predicts future product demand, maximizing profit for the manufacturer. The task is to predict demand for the next 3 months at the item level (i.e., all stores combined).


## Minimum Requirements

#### The participant's end objective is to produce the model that gives the manufacturer the best predictions. The model must account for the seasonality of the items sold.

In [9]:
# Import required libraries
import time
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import io, base64, os, json, re, glob
import datetime
from datetime import timedelta
import pandas as pd
import numpy as np

In [10]:
# Read data from the CSV file into a DataFrame, using the date column as the index
data = pd.read_csv('/content/sale_data.csv',
                   low_memory=False,
                   parse_dates=['date'],
                   index_col=['date'])


In [11]:
data.head()

date,store,item,sales
2013-01-01,1,1,13
2013-01-02,1,1,11
2013-01-03,1,1,14
2013-01-04,1,1,13
2013-01-05,1,1,10


In [12]:
data.shape

(913000, 3)

In [13]:
data.columns

Index(['store', 'item', 'sales'], dtype='object')

In [14]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 913000 entries, 2013-01-01 to 2017-12-31
Data columns (total 3 columns):
 #   Column  Non-Null Count   Dtype
---  ------  --------------   -----
 0   store   913000 non-null  int64
 1   item    913000 non-null  int64
 2   sales   913000 non-null  int64
dtypes: int64(3)
memory usage: 27.9 MB


In [15]:
data.describe()

,store,item,sales
count,913000.0,913000.0,913000.0
mean,5.5,25.5,52.250287
std,2.872283,14.430878,28.801144
min,1.0,1.0,0.0
25%,3.0,13.0,30.0
50%,5.5,25.5,47.0
75%,8.0,38.0,70.0
max,10.0,50.0,231.0


In [16]:
# Check for null values in each column
data.isnull().sum()

store    0
item     0
sales    0
dtype: int64

In [17]:
# Count the unique values in each column
data.nunique()

store     10
item      50
sales    213
dtype: int64

In [18]:
data['item'].value_counts()

1     18260
38    18260
28    18260
29    18260
30    18260
31    18260
32    18260
33    18260
34    18260
35    18260
36    18260
37    18260
39    18260
2     18260
40    18260
41    18260
42    18260
43    18260
44    18260
45    18260
46    18260
47    18260
48    18260
49    18260
27    18260
26    18260
25    18260
24    18260
3     18260
4     18260
5     18260
6     18260
7     18260
8     18260
9     18260
10    18260
11    18260
12    18260
13    18260
14    18260
15    18260
16    18260
17    18260
18    18260
19    18260
20    18260
21    18260
22    18260
23    18260
50    18260
Name: item, dtype: int64

In [19]:
# Sorting data by date in ascending order
data = data.sort_values('date', ascending=True)
data.head(10)

date,store,item,sales
2013-01-01,1,1,13
2013-01-01,7,12,26
2013-01-01,7,46,27
2013-01-01,8,12,54
2013-01-01,9,12,35
2013-01-01,10,12,41
2013-01-01,6,46,23
2013-01-01,1,13,37
2013-01-01,2,13,51
2013-01-01,5,46,20


In [20]:
print("There are totally",data['item'].nunique(),"Unique Items.")

There are totally 50 Unique Items.


In [21]:
# Sum each day's sales across all stores, per item
data = data.groupby(['item', 'date'])['sales'].sum().reset_index()



In [22]:
data.head()

,item,date,sales
0,1,2013-01-01,133
1,1,2013-01-02,99
2,1,2013-01-03,127
3,1,2013-01-04,145
4,1,2013-01-05,149


In [23]:
# Forward-looking 90-day (about 3 months) sales total per item:
# each row's 'sales' becomes the sum over that date and the following 89 days
datas = []
for i in data.item.unique():
    # Copy the slice so the assignment below does not trigger SettingWithCopyWarning
    tmp = data.loc[data.item == i, :].copy()
    tmp['sales'] = tmp['sales'].rolling(90).sum().shift(-89)
    datas.append(tmp)


In [24]:
# Drop the last 89 rows of each item, which have no complete 90-day window ahead of them
for i in range(len(datas)):
    datas[i] = datas[i].dropna(axis=0)
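
The forward-looking total built above can be checked on a toy frame. The following is a minimal sketch, assuming made-up sales values and a 3-day window standing in for the 90-day horizon; it uses `groupby(...).transform` in place of the explicit loop, which is an alternative formulation and not the notebook's own code.

```python
import pandas as pd

# Toy daily sales for two hypothetical items (values are made up for illustration).
toy = pd.DataFrame({
    "item": [1] * 5 + [2] * 5,
    "date": list(pd.date_range("2013-01-01", periods=5)) * 2,
    "sales": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

window = 3  # stands in for the 90-day horizon used in the notebook

# rolling(window).sum() looks backwards; shift(-(window - 1)) re-aligns each
# total with the first day of its window, so it becomes a forward-looking sum.
toy["target"] = (
    toy.groupby("item")["sales"]
       .transform(lambda s: s.rolling(window).sum().shift(-(window - 1)))
)

# The last (window - 1) rows of every item have no complete window ahead of them.
toy = toy.dropna(subset=["target"]).reset_index(drop=True)
print(toy)
```

Grouping by item keeps one item's window from spilling into the next item's rows, which is the same guarantee the per-item loop provides.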

In [None]:
# Plot the 90-day forward sales total for a single item (all stores combined)

def plot_item(df_raw, i):
    plt.subplots(figsize=(16, 5))
    plt.grid()
    plt.xlabel("Year")
    plt.ylabel("90-day sales")
    plt.title('Item ' + str(i) + ' - 90-day forward sales total')
    plt.plot(df_raw['date'], df_raw['sales'])

item_input = int(input("Enter the item number to view its sales: "))
# Items are numbered from 1, while the list `datas` is indexed from 0
plot_item(datas[item_input - 1], item_input)

In [None]:

# Date features derived from each item's 'date' column
def create_date_features(datas):
    for i in range(len(datas)):
        datas[i]['year'] = datas[i].date.dt.year
        datas[i]['day_of_year'] = datas[i].date.dt.dayofyear
        datas[i]['month'] = datas[i].date.dt.month
        datas[i]['day_of_month'] = datas[i].date.dt.day
        # dt.weekofyear was removed in pandas 2.0; isocalendar() gives the ISO week
        datas[i]['week'] = datas[i].date.dt.isocalendar().week.astype(int)
        datas[i]['day_of_week'] = datas[i].date.dt.dayofweek
        # weekday is 0 (Monday) to 6 (Sunday), so // 5 is 1 only for Saturday and Sunday
        datas[i]["is_wknd"] = datas[i].date.dt.weekday // 5
        datas[i]['is_month_start'] = datas[i].date.dt.is_month_start.astype(int)
        datas[i]['is_month_end'] = datas[i].date.dt.is_month_end.astype(int)
    return datas

datas = create_date_features(datas)
#datas[0].head(10)
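
A quick way to sanity-check the calendar features is to run them on a handful of known dates. This is a minimal sketch: `add_date_features` is a hypothetical single-DataFrame helper written for illustration, and the four dates are chosen only because they straddle a weekend and an ISO week boundary.

```python
import pandas as pd

def add_date_features(df):
    """Return a copy of df with calendar features derived from its 'date' column.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    out = df.copy()
    d = out["date"].dt
    out["day_of_week"] = d.dayofweek                 # Monday=0 ... Sunday=6
    out["is_wknd"] = d.dayofweek // 5                # 1 for Saturday/Sunday, else 0
    out["week"] = d.isocalendar().week.astype(int)   # ISO 8601 week number
    out["is_month_start"] = d.is_month_start.astype(int)
    return out

# 2013-01-04 is a Friday, so the sample runs Friday, Saturday, Sunday, Monday.
sample = pd.DataFrame({"date": pd.date_range("2013-01-04", periods=4)})
features = add_date_features(sample)
print(features)
```

Returning a copy rather than mutating the input keeps the helper free of side effects, at the cost of some extra memory.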


In [None]:
# Sales per year (item 1); recent seaborn versions require x and y as keyword arguments
sns.barplot(x=datas[0]['year'], y=datas[0]['sales'])

#### The graph above indicates that sales increase year over year.

In [None]:
# Sales per month (item 1)
sns.barplot(x=datas[0]['month'], y=datas[0]['sales'])

#### The graph above clearly shows that sales peak in the middle of the year.
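
A mid-year peak like this can also be expressed numerically as a monthly seasonal index (each month's mean divided by the overall mean). The sketch below runs on synthetic data with a built-in yearly cycle, so its numbers are an assumption for illustration and not the notebook's results.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (not the notebook's data): a baseline of 100 plus a
# yearly sine cycle constructed to peak around the start of July.
dates = pd.date_range("2013-01-01", "2014-12-31", freq="D")
day = dates.dayofyear.to_numpy()
sales = 100 + 30 * np.sin(2 * np.pi * (day - 91) / 365.0)
series = pd.DataFrame({"date": dates, "sales": sales})

# Seasonal index: values above 1 mark months that sell more than average.
monthly_mean = series.groupby(series["date"].dt.month)["sales"].mean()
seasonal_index = monthly_mean / series["sales"].mean()
peak_month = int(seasonal_index.idxmax())

print(seasonal_index.round(3))
print("Peak month:", peak_month)
```

On the real data, the same two lines that compute `monthly_mean` and `seasonal_index` could be applied to any frame with `date` and `sales` columns.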

In [None]:
# Sales per week of the year (item 1)
sns.lineplot(x=datas[0]['week'], y=datas[0]['sales'])

In [None]:
# Sales per day of the year (item 1)
sns.lineplot(x=datas[0]['day_of_year'], y=datas[0]['sales'])

In [None]:
# Set the color palette
sns.set_palette(sns.color_palette("Paired"))
# Plot the data, specifying a different color for data points in
# each of the day categories (weekday and weekend)
ax = sns.lineplot(x='day_of_month', y='sales', data=datas[0], hue='is_wknd')
# Customize the axes and title
ax.set_title("Sales")
ax.set_xlabel("day")
ax.set_ylabel("total sales")
# Remove top and right borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()

### ML modeling

In [None]:
# Import the sklearn splitter and divide the item 1 data into training and test sets
from sklearn.model_selection import train_test_split
# The date is the only feature; for datetime64[ns] values, astype('object')
# yields integer nanoseconds since the epoch, which sklearn can treat as numeric
x = datas[0][['date']].values.astype('object')
y = datas[0]['sales']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.40, random_state=21)
print('Shape of Training Xs:{}'.format(x_train.shape))
print('Shape of Test Xs:{}'.format(x_test.shape))
print('Shape of Training y:{}'.format(y_train.shape))
print('Shape of Test y:{}'.format(y_test.shape))
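
`train_test_split` shuffles rows by default, so test dates end up interleaved with training dates. For a forecasting task, a chronological split is a common alternative. The sketch below assumes a frame with a `date` column; `time_ordered_split` is a hypothetical helper, not something the notebook defines.

```python
import pandas as pd

def time_ordered_split(df, test_fraction=0.4):
    """Split df chronologically: the earliest rows train, the latest rows test.

    Hypothetical helper for illustration; assumes df has a 'date' column.
    """
    df = df.sort_values("date").reset_index(drop=True)
    n_test = int(round(len(df) * test_fraction))
    cut = len(df) - n_test
    return df.iloc[:cut], df.iloc[cut:]

# Ten made-up daily rows, deliberately shuffled before splitting.
demo = pd.DataFrame({
    "date": pd.date_range("2013-01-01", periods=10),
    "sales": range(10),
}).sample(frac=1.0, random_state=0)

train, test = time_ordered_split(demo, test_fraction=0.4)
print(train["date"].max(), "<", test["date"].min())
```

With this kind of split the model is always scored on dates later than anything it was fitted on, which is closer to how a 3-month forecast would be used.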

In [None]:
y_train.isnull().sum()

### Decision Tree

In [None]:
from sklearn.tree import DecisionTreeClassifier
# A classifier treats every distinct 90-day sales total as a separate class label
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(x_train, y_train)

y_pred = classifier.predict(x_test)



In [None]:
from sklearn.metrics import r2_score
# Score on the full test set rather than only its first 278 rows
r2_score(y_test, y_pred)
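
Because the target is a numeric sales total and R² is a regression metric, a regressor is the more natural fit for this step. The following is a minimal sketch using `DecisionTreeRegressor` on synthetic data; the series, its parameters, and the variable names are assumptions for illustration and do not reproduce the notebook's results.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for one item's 90-day totals (not the notebook's data):
# a rising trend plus a yearly cycle and a little noise.
rng = np.random.default_rng(0)
day = np.arange(1000)
demand = (5000 + 2.0 * day
          + 400 * np.sin(2 * np.pi * day / 365)
          + rng.normal(0, 20, day.size))

# Same shuffled 60/40 split settings as the notebook's cell.
X = day.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, demand, test_size=0.40, random_state=21
)

# The regressor predicts a continuous value instead of one class per total.
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X_train, y_train)
score = r2_score(y_test, regressor.predict(X_test))
print(f"R^2 on the held-out rows: {score:.3f}")
```

`RandomForestRegressor` and `KNeighborsRegressor` are the corresponding regression counterparts in scikit-learn for the other two models.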

In [None]:
# Hold out 20% of the training data as a validation set
x_train1, x_test1, y_train1, y_test1 = train_test_split(x_train, y_train, test_size=0.20, random_state=60)
print('Shape of Training Xs:{}'.format(x_train1.shape))
print('Shape of Test Xs:{}'.format(x_test1.shape))
print('Shape of Training y:{}'.format(y_train1.shape))
print('Shape of Test y:{}'.format(y_test1.shape))

# Fit on the reduced training set only, so the validation rows stay unseen
classifier = DecisionTreeClassifier()
classifier.fit(x_train1, y_train1)
y_pred = classifier.predict(x_test1)
y_pred



In [None]:
from sklearn.metrics import r2_score
# Score on the full validation set
r2_score(y_test1, y_pred)

### Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10,criterion="entropy")
classifier.fit(x_train, y_train)

y_pred = classifier.predict(x_test)



In [None]:
from sklearn.metrics import r2_score
# Score on the full test set rather than only its first 278 rows
r2_score(y_test, y_pred)

In [None]:
# Hold out 30% of the training data as a validation set
x_train1, x_test1, y_train1, y_test1 = train_test_split(x_train, y_train, test_size=0.30, random_state=60)
print('Shape of Training Xs:{}'.format(x_train1.shape))
print('Shape of Test Xs:{}'.format(x_test1.shape))
print('Shape of Training y:{}'.format(y_train1.shape))
print('Shape of Test y:{}'.format(y_test1.shape))

# Validate the random forest (not a decision tree), fitted on the reduced training set only
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy")
classifier.fit(x_train1, y_train1)
y_pred = classifier.predict(x_test1)
y_pred



In [None]:
from sklearn.metrics import r2_score
# Score on the full validation set
r2_score(y_test1, y_pred)

### KNN

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5,metric="minkowski",p=2)
knn.fit(x_train, y_train)

y_pred = knn.predict(x_test)



In [None]:
from sklearn.metrics import r2_score
# Score on the full test set rather than only its first 278 rows
r2_score(y_test, y_pred)

In [None]:
# Hold out 10% of the training data as a validation set
x_train1, x_test1, y_train1, y_test1 = train_test_split(x_train, y_train, test_size=0.10, random_state=60)
print('Shape of Training Xs:{}'.format(x_train1.shape))
print('Shape of Test Xs:{}'.format(x_test1.shape))
print('Shape of Training y:{}'.format(y_train1.shape))
print('Shape of Test y:{}'.format(y_test1.shape))

# Validate the KNN model (not a decision tree), fitted on the reduced training set only
classifier = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
classifier.fit(x_train1, y_train1)
y_pred = classifier.predict(x_test1)
y_pred



In [None]:
from sklearn.metrics import r2_score
# Score on the full validation set
r2_score(y_test1, y_pred)
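
R² alone does not say how far off a forecast is in units sold. Error measures such as MAE, RMSE and MAPE can be reported next to it when comparing the three models. The sketch below uses a hypothetical helper, `forecast_errors`, and made-up numbers, so nothing here is a result from the notebook.

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """Return MAE, RMSE and MAPE (in percent) for two equal-length sequences.

    Hypothetical helper for illustration; MAPE assumes y_true contains no zeros.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                      # average miss, in units sold
    rmse = np.sqrt(np.mean(err ** 2))               # penalizes large misses more
    mape = 100 * np.mean(np.abs(err) / np.abs(y_true))
    return mae, rmse, mape

# Made-up actual and predicted 90-day totals.
mae, rmse, mape = forecast_errors([100, 200, 300, 400], [110, 190, 330, 360])
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  MAPE={mape:.2f}%")
```

MAE and RMSE are in the same units as the sales totals, which makes them easier to relate to stock levels than a unitless score.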

## Conclusions and Recommendations 

#### Forecasting future demand is a challenge that companies must face in order to make decisions that keep them competitive through better supply chain results. Demand forecasting is an essential activity for business planning, with several benefits: reduced waste, better allocation of resources, and increased sales and revenue. It thereby helps organizations to be in the right place, at the right time, with the right product.