# **DEEP LEARNING MODEL WITH TENSORFLOW AND KERAS**
## **ABOUT THE DATA:**
### About Food Demand Forecasting Challenge
- Demand forecasting is a key component of every growing online business. Without proper demand forecasting processes in place, it can be nearly impossible to have the right amount of stock on hand at any given time. A food delivery service has to deal with a lot of perishable raw materials, which makes it all the more important for such a company to accurately forecast daily and weekly demand.
 
- Too much inventory in the warehouse means more risk of wastage, while too little can lead to stock-outs and push customers toward competitors. This challenge offers a taste of demand forecasting on a real dataset.

### Problem Statement
- Your client is a meal delivery company that operates in multiple cities. They have various fulfillment centers in these cities for dispatching meal orders to their customers. The client wants you to help these centers with demand forecasting for upcoming weeks so that they can plan the stock of raw materials accordingly.

- The replenishment of the majority of raw materials is done on a weekly basis, and since the raw material is perishable, procurement planning is of utmost importance. Staffing of the centers is another area where accurate demand forecasts are really helpful. Given the following information, the task is to predict the demand for the next 10 weeks (Weeks: 146-155) for the center-meal combinations in the test set:
    - Historical data of demand for a product-center combination (Weeks: 1 to 145)
    - Product (Meal) features such as category, sub-category, current price and discount
    - Information for each fulfillment center, such as center area, city information, etc.
 

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['figure.figsize'] = 15,8
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split as splt
# for handling the outliers
from sklearn.ensemble import IsolationForest
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
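# Note: this wrapper ships only with older TensorFlow 2.x releases; newer releases
# removed it (SciKeras is the usual replacement).
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor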
from tensorflow.keras.optimizers import Adadelta
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from tensorflow.keras.models import load_model

## 1 IMPORTING THE DATA

In [2]:
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

In [3]:
print(train.head())

In [4]:
print(test.head())

## 2 FEATURE ENGINEERING
- **2.1. CHANGING THE DATA TYPE**

In [5]:
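# Keep only the categorical columns. X and Y are temporary names here for the
# train and test categorical subsets, not for features and target.
X = train.drop('id,week,num_orders,checkout_price,base_price'.split(','),axis=1)
Y = test.drop('id,week,checkout_price,base_price'.split(','),axis=1)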

In [6]:
X = X.astype(str)
Y = Y.astype(str)

In [7]:
# print(X.describe())
# print(Y.describe())

In [8]:
# Write the string-typed categorical columns back into the original frames
train['center_id,meal_id,emailer_for_promotion,homepage_featured'.split(',')] = X
test['center_id,meal_id,emailer_for_promotion,homepage_featured'.split(',')] = Y

In [9]:
# train.describe(include='all')

In [10]:
# test.describe(include='all')

- **2.2 CREATING DUMMY VARIABLES**

In [11]:
dataset = train
for i in 'center_id,meal_id,emailer_for_promotion,homepage_featured'.split(','):
    data = pd.get_dummies(dataset[i],drop_first=True)
    dataset = pd.concat([dataset,data],axis=1)
train = dataset
train.drop('center_id,meal_id,emailer_for_promotion,homepage_featured'.split(','),axis=1,inplace=True)

In [12]:
dataset = test
for i in 'center_id,meal_id,emailer_for_promotion,homepage_featured'.split(','):
    data = pd.get_dummies(dataset[i],drop_first=True)
    dataset = pd.concat([dataset,data],axis=1)
test = dataset
test.drop('center_id,meal_id,emailer_for_promotion,homepage_featured'.split(','),axis=1,inplace=True)
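
The two loops above encode `train` and `test` independently and without a column prefix. That works when both frames contain exactly the same categories, but dummy names from different source columns can collide (for example, two binary flags both produce a column named `1`), and a category missing from one frame changes its column count. The following is a minimal sketch of one way to guard against both, using toy frames with invented values; `train_demo`, `test_demo`, and `cat_cols` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Toy frames standing in for train/test; the values are invented for illustration.
train_demo = pd.DataFrame({"center_id": ["10", "11", "13"],
                           "emailer_for_promotion": ["0", "1", "0"]})
test_demo = pd.DataFrame({"center_id": ["10", "11", "11"],
                          "emailer_for_promotion": ["0", "0", "0"]})

cat_cols = ["center_id", "emailer_for_promotion"]

# prefix=... keeps dummy column names unique across source columns.
train_enc = pd.get_dummies(train_demo, columns=cat_cols, prefix=cat_cols, drop_first=True)
test_enc = pd.get_dummies(test_demo, columns=cat_cols, prefix=cat_cols, drop_first=True)

# Give the test frame exactly the training columns, in the same order;
# categories that never occur in the test data become all-zero columns.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(list(train_enc.columns))
print(list(test_enc.columns))
```

Aligning on the training columns matters here because the network's `input_dim` and the fitted scaler both assume a fixed feature layout.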

In [13]:
# train.info()

In [14]:
# test.info()

## 3 EXCLUDING THE OUTLIERS

In [15]:
new_train = train.drop(['id','week'],axis=1)

In [16]:
# # for handling the outliers
# # from sklearn.ensemble import IsolationForest

# clf = IsolationForest(max_samples = 100, random_state = 42)
# clf.fit(new_train)
# y_noano = clf.predict(new_train)
# y_noano = pd.DataFrame(y_noano, columns = ['Top'])
# y_noano[y_noano['Top'] == 1].index.values

# new_train = new_train.iloc[y_noano[y_noano['Top'] == 1].index.values]
# new_train.reset_index(drop = True, inplace = True)
# print("Number of Outliers:", y_noano[y_noano['Top'] == -1].shape[0])
# print("Number of rows without outliers:", new_train.shape[0])
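
The commented-out cell above outlines outlier removal with `IsolationForest`. As a hedged illustration of the same idea on synthetic data (the `demo` frame and its values are invented, and the parameters simply mirror the commented code), the filter can be sketched as:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic data: 200 ordinary points plus 5 obvious extremes (illustrative only).
rng = np.random.default_rng(42)
inliers = rng.normal(0, 1, size=(200, 2))
extremes = np.full((5, 2), 100.0)
demo = pd.DataFrame(np.vstack([inliers, extremes]), columns=["a", "b"])

clf = IsolationForest(max_samples=100, random_state=42)
labels = clf.fit_predict(demo)  # 1 = inlier, -1 = outlier

# Keep only the rows labelled as inliers and rebuild a clean index.
filtered = demo[labels == 1].reset_index(drop=True)
n_outliers = int((labels == -1).sum())

print("Number of outliers:", n_outliers)
print("Number of rows without outliers:", len(filtered))
```

Indexing with a boolean mask avoids the intermediate `y_noano` DataFrame used in the commented version while selecting the same rows.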

In [17]:
# new_train.info()

## 4 SEPARATING THE INDEPENDENT AND DEPENDENT VARIABLES

In [18]:
X = new_train.drop('num_orders',axis=1)
Y = new_train['num_orders']

In [19]:
# X.head()

In [20]:
new_test = test.drop(['id','week'],axis=1)

In [21]:
# new_test.head()

## 5 FEATURE SCALING

In [22]:
scale = StandardScaler()

In [23]:
data = X
# Fit the scaler on the training features and transform them
X = scale.fit_transform(X)
# Convert the scaled array back into a DataFrame
X = pd.DataFrame(X,columns=data.columns)
data = new_test
# Transform the test features with the scaler fitted on the training data
new_test = scale.transform(new_test)
# Convert the scaled array back into a DataFrame
new_test = pd.DataFrame(new_test,columns=data.columns)
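
To make the scaling step concrete, here is a small sketch of what standardization computes, written with plain pandas on invented numbers. The key point it illustrates is that the mean and standard deviation come from the training data only and are then reused for the test data, which is what `fit_transform` followed by `transform` does above. The frame names and values are assumptions for illustration.

```python
import pandas as pd

# Invented prices, only to show the arithmetic.
train_demo = pd.DataFrame({"checkout_price": [100.0, 200.0, 300.0]})
test_demo = pd.DataFrame({"checkout_price": [200.0, 400.0]})

# Statistics are estimated on the training data only.
mean = train_demo.mean()
std = train_demo.std(ddof=0)  # population standard deviation, as StandardScaler uses

train_scaled = (train_demo - mean) / std
test_scaled = (test_demo - mean) / std  # reuse the training statistics

print(train_scaled)
print(test_scaled)
```

Because the test set is scaled with training statistics, its columns will generally not have mean 0 and variance 1 exactly, and that is expected.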

In [24]:
# X.shape

In [25]:
# new_test.shape

In [26]:
# print(X.head())
# print(new_test.head())

## 6 SPLITTING THE X DATA INTO TRAIN AND TEST

In [27]:
# X_train, X_test, Y_train, Y_test = splt(X, Y, test_size=0.9781, random_state=42)
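
The unusual `test_size=0.9781` leaves only about 2% of the rows for training. A quick arithmetic sketch, assuming the 456548 rows reported in the full-data training log and scikit-learn's rule of rounding the test count up, shows where the 9998-sample training log in this notebook comes from:

```python
import math

n_samples = 456548  # rows reported in the full-data training log
test_size = 0.9781  # fraction passed to train_test_split in the commented cell

# scikit-learn rounds the test count up and gives the remainder to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print("train rows:", n_train)
print("test rows:", n_test)
```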

In [28]:
# X_train.info()

## 7 SAMPLE MODEL

In [29]:
def build_regressor():
    # Fully connected network: 130 input features -> five hidden layers of 66 units -> 1 output
    regressor = Sequential()
    regressor.add(Dense(units=66, kernel_initializer='normal',
                        activation='relu', input_dim=130))
    for _ in range(4):
        regressor.add(Dense(units=66, kernel_initializer='normal',
                            activation='relu'))
    # ReLU on the output keeps predictions non-negative, which suits order counts
    # and the log-based loss
    regressor.add(Dense(units=1, kernel_initializer='normal',
                        activation='relu'))
    regressor.compile(optimizer=Adadelta(), loss='mean_squared_logarithmic_error')
    return regressor

# Wrap the Keras model so it can be used with scikit-learn utilities
regressor = KerasRegressor(build_fn=build_regressor, batch_size=10, epochs=100)
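
The model is compiled with `mean_squared_logarithmic_error`. As a rough NumPy sketch of that loss (the helper name `msle` and the sample numbers are assumptions, and Keras additionally clips values to a small epsilon before taking logs), the computation looks like this:

```python
import numpy as np

def msle(y_true, y_pred):
    """Mean squared logarithmic error: mean((log(1 + y_true) - log(1 + y_pred))^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

# The loss depends on the ratio of prediction to target rather than the
# absolute gap: doubling a small or a large order count costs about the same.
small_error = msle([10.0], [20.0])
large_error = msle([1000.0], [2000.0])

print(small_error, large_error)
```

This ratio-based behaviour is why the loss is a common choice when order counts span several orders of magnitude.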

In [30]:
regressor.fit(x=X,y=Y)

Train on 456548 samples
[Training log: epochs 1 to 76 of 100 appear in the source, without loss values]

<tensorflow.python.keras.callbacks.History at 0x7efcea4dd0d0>

In [32]:
# regressor.fit(x=X_train,y=Y_train)

Train on 9998 samples
[Training log: epochs 1 to 76 of 100 appear in the source, without loss values]

<tensorflow.python.keras.callbacks.History at 0x7f977a322310>

In [38]:
# parameters = {'batch_size': [10,100],
#              'epochs': [100,500]}

In [39]:
# grid_search = GridSearchCV(estimator=regressor,
#                           param_grid=parameters,
#                           cv=8)

In [65]:
# grid_search = grid_search.fit(X=X_train,y=Y_train)
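
Before running the commented-out grid search it helps to count how many models it would train. The sketch below enumerates the parameter grid with the standard library only; the grid and `cv=8` are taken from the cells above, while the variable names are illustrative.

```python
from itertools import product

parameters = {"batch_size": [10, 100], "epochs": [100, 500]}
cv = 8

# Every combination of the listed values is one candidate configuration.
combos = [dict(zip(parameters, values)) for values in product(*parameters.values())]

# Each candidate is trained once per fold.
n_fits = len(combos) * cv

for combo in combos:
    print(combo)
print("cross-validation fits:", n_fits)
```

With `GridSearchCV`'s default `refit=True`, one more fit on the full training data follows for the best candidate, so the search is expensive for a network trained for hundreds of epochs.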

In [83]:
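# batch=100,epochs=200,genpact score=58
# Requires X_train/Y_train from the split cell above, which is currently commented out.
# KerasRegressor scores are the negated loss, so the values below are -MSLE.
msle = cross_val_score(estimator=regressor, X=X_train, y=Y_train, cv=8, n_jobs=-1)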

In [84]:
msle

array([-1.2108303 , -1.09141175, -1.11844567, -1.14719059, -1.16087241,
       -1.10070157, -1.05073974, -1.13713647])

In [34]:
# batch=10,epochs=100,genpact score=
msle2 = cross_val_score(estimator=regressor, X=X_train, y=Y_train, cv=8, n_jobs= -1)

In [35]:
msle2

array([-1.08299447, -1.07596189, -1.111733  , -1.05212329, -1.10924443,
       -1.09144832, -1.06011402, -1.11257951])
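
The cross-validation scores are negative because the wrapped regressor reports the negated loss. A short sketch of how the two runs can be summarized, using the fold scores printed above (the names `scores_a`, `scores_b` and the summary statistics chosen are illustrative):

```python
import numpy as np

# Fold scores copied from the two cross_val_score outputs above.
scores_a = np.array([-1.2108303, -1.09141175, -1.11844567, -1.14719059,
                     -1.16087241, -1.10070157, -1.05073974, -1.13713647])
scores_b = np.array([-1.08299447, -1.07596189, -1.111733, -1.05212329,
                     -1.10924443, -1.09144832, -1.06011402, -1.11257951])

# Flip the sign to recover the MSLE of each fold, then average.
mean_a = float((-scores_a).mean())
mean_b = float((-scores_b).mean())

for name, scores in [("run 1", scores_a), ("run 2", scores_b)]:
    fold_msle = -scores
    print(name,
          "mean MSLE:", fold_msle.mean(),
          "std:", fold_msle.std(),
          "sqrt of mean MSLE:", np.sqrt(fold_msle.mean()))
```

On these numbers the second run has the lower mean MSLE, though both were measured on the small training subset rather than the full data.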

### With Week and Different Tuning (Score: 103)

In [41]:
Y_pred = regressor.predict(new_test)



In [43]:
Y_pred

array([190.6174 , 196.05923, 167.09648, ..., 161.66206, 126.94097,
       153.99043], dtype=float32)

In [None]:
# id,num_orders

In [44]:
submission = pd.DataFrame(Y_pred,columns=['num_orders'])

In [45]:
submission = pd.concat([test['id'],submission],axis=1)
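
`pd.concat(..., axis=1)` aligns rows by index label, so it only pairs ids with predictions correctly when both objects share the same index. The sketch below, on invented values, shows the failure mode with a non-default index and a position-based alternative similar to the dictionary construction used later in this notebook; `test_demo` and `pred_demo` are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# A test frame whose index is not 0..n-1 (for example after filtering rows).
test_demo = pd.DataFrame({"id": [1028232, 1127204, 1212707]}, index=[5, 6, 7])
# Predictions shaped (n, 1), as a Keras model's predict() returns them.
pred_demo = np.array([[190.6], [196.1], [167.1]], dtype=np.float32)

# Label-based alignment: indexes 5..7 and 0..2 do not overlap, so rows multiply.
misaligned = pd.concat(
    [test_demo["id"], pd.DataFrame(pred_demo, columns=["num_orders"])], axis=1
)

# Position-based construction: one row per prediction, regardless of index.
submission_demo = pd.DataFrame({
    "id": test_demo["id"].to_numpy(),
    "num_orders": pred_demo.ravel(),
})

print(misaligned.shape)
print(submission_demo)
```

In this notebook `test` keeps its default index, so the `concat` call works as written; the position-based form is simply more robust if rows are ever dropped or reordered.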

In [46]:
submission.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32573 entries, 0 to 32572
Data columns (total 2 columns):
id            32573 non-null int64
num_orders    32573 non-null float32
dtypes: float32(1), int64(1)
memory usage: 381.8 KB


In [47]:
submission.describe()

| | id | num_orders |
|---|---|---|
| count | 32573.0 | 32573.0 |
| mean | 1248476.0 | 123.600601 |
| std | 144158.0 | 36.218822 |
| min | 1000085.0 | 44.30249 |
| 25% | 1123969.0 | 94.348747 |
| 50% | 1247296.0 | 120.853554 |
| 75% | 1372971.0 | 148.965805 |
| max | 1499996.0 | 243.347885 |


In [48]:
submission.head()

| | id | num_orders |
|---|---|---|
| 0 | 1028232 | 190.617401 |
| 1 | 1127204 | 196.059235 |
| 2 | 1212707 | 167.096481 |
| 3 | 1082698 | 140.126144 |
| 4 | 1400926 | 119.28215 |


In [49]:
test[['id']].head()

| | id |
|---|---|
| 0 | 1028232 |
| 1 | 1127204 |
| 2 | 1212707 |
| 3 | 1082698 |
| 4 | 1400926 |


In [50]:
test.shape

(32573, 132)

In [51]:
submission.shape

(32573, 2)

In [52]:
submission.to_csv('submission.csv',index=False)

### New Prediction (Y_pred2)

In [92]:
Y_pred2 = regressor.predict(new_test)



In [96]:
submission2 = pd.DataFrame(Y_pred2,columns=['num_orders'])

In [97]:
submission2 = pd.concat([test['id'],submission2],axis=1)

In [98]:
submission2.describe()

| | id | num_orders |
|---|---|---|
| count | 32573.0 | 32573.0 |
| mean | 1248476.0 | 207.850876 |
| std | 144158.0 | 233.581116 |
| min | 1000085.0 | 3.809295 |
| 25% | 1123969.0 | 51.215984 |
| 50% | 1247296.0 | 123.923637 |
| 75% | 1372971.0 | 277.77655 |
| max | 1499996.0 | 1996.601929 |


In [99]:
submission2.head()

| | id | num_orders |
|---|---|---|
| 0 | 1028232 | 218.885559 |
| 1 | 1127204 | 184.395081 |
| 2 | 1212707 | 101.682968 |
| 3 | 1082698 | 37.129036 |
| 4 | 1400926 | 32.776661 |


In [100]:
submission2.to_csv('submission2.csv',index=False)

## How To Save And Reload Model

- **SAVING THE MODEL**

In [38]:
# The trained wrapper is named regressor; its underlying Keras model is saved
# in HDF5 format with model.save.
# regressor.model.save('model.h5')

- **RELOADING THE MODEL**

In [52]:
# Reassign `model` after loading one of the saved models (model2, model3)
model = load_model('model3.h5')

## PREDICTIONS

- **PREDICTION 3**

In [34]:
Y_pred = model.predict(new_test)



In [49]:
submission = pd.DataFrame({'id':list(test['id']),
                           'num_orders':list(Y_pred.ravel())})

In [50]:
submission.head()

| | id | num_orders |
|---|---|---|
| 0 | 1028232 | 259.886139 |
| 1 | 1127204 | 158.027649 |
| 2 | 1212707 | 137.175491 |
| 3 | 1082698 | 26.660126 |
| 4 | 1400926 | 32.790962 |


In [51]:
submission.to_csv('submission3.csv',index=False)

- **PREDICTION 4**

In [53]:
Y_pred = model.predict(new_test)



In [54]:
submission = pd.DataFrame({'id':list(test['id']),
                           'num_orders':list(Y_pred.ravel())})

In [55]:
submission.to_csv('submission4.csv',index=False)