# Food Demand Forecasting
## Context
Your client is a meal delivery company that operates in multiple cities. It has fulfillment centers in these cities for dispatching meal orders to customers. The client wants help forecasting demand for the upcoming weeks so that the centers can plan their stock of raw materials accordingly.

Most raw materials are replenished weekly, and because they are perishable, procurement planning is critical. Accurate demand forecasts also help with staffing the centers. Given the following information, the task is to predict demand for the next 10 weeks (weeks 146 to 155) for the center-meal combinations in the test set:

- Historical demand for each meal-center combination (weeks 1 to 145)
- Meal features such as category, sub-category, current price and discount
- Fulfillment center information such as center area and city

## Content
Weekly Demand data (train.csv): Contains the historical demand data for all centers

fulfilment_center_info.csv: Contains information for each fulfillment center

meal_info.csv: Contains information for each meal being served

## Reference
https://www.kaggle.com/ghoshsaptarshi/av-genpact-hack-dec2018

# Read data

In [None]:
!pip install seaborn
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# read csv file
train = pd.read_csv('train.csv')
meal = pd.read_csv('meal_info.csv')
fulfilment_center = pd.read_csv('fulfilment_center_info.csv')

In [None]:
# combine data: attach meal and fulfillment center attributes to each demand row
train_merged = train.merge(meal, on=['meal_id'], how='left').merge(fulfilment_center, on=['center_id'], how='left')
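A left merge keeps exactly the rows of `train` only if `meal_id` and `center_id` are unique in the lookup tables. The following is a minimal sketch of that check on toy frames; the values are made up for illustration and the `toy_*` names are not part of the notebook. `validate` is an existing parameter of `DataFrame.merge`.

```python
import pandas as pd

# Toy stand-ins for train.csv, meal_info.csv and fulfilment_center_info.csv
# (all values are made up for illustration)
toy_train = pd.DataFrame({
    'id': [1, 2, 3],
    'week': [1, 1, 2],
    'center_id': [55, 55, 24],
    'meal_id': [1885, 1993, 1885],
    'num_orders': [177, 270, 189],
})
toy_meal = pd.DataFrame({
    'meal_id': [1885, 1993],
    'category': ['Beverages', 'Beverages'],
    'cuisine': ['Thai', 'Thai'],
})
toy_center = pd.DataFrame({
    'center_id': [55, 24],
    'center_type': ['TYPE_C', 'TYPE_B'],
    'op_area': [2.0, 3.6],
})

# validate='many_to_one' raises MergeError if a key is duplicated in the lookup table
toy_merged = (
    toy_train
    .merge(toy_meal, on='meal_id', how='left', validate='many_to_one')
    .merge(toy_center, on='center_id', how='left', validate='many_to_one')
)

# a left merge on unique keys must neither add nor drop rows
rows_preserved = len(toy_merged) == len(toy_train)
# keys without a match would show up as missing values in the joined columns
missing_after_merge = int(toy_merged[['category', 'center_type']].isna().sum().sum())
print(rows_preserved, missing_after_merge)
```

The same two checks can be run on `train_merged` to confirm that every `meal_id` and `center_id` found a match.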

In [None]:
# display
train_merged.head()

# Exploratory Data Analysis

## Target variable (num_orders)

In [None]:
# statistics of target variable
train_merged['num_orders'].describe()

In [None]:
# histogram of target variable (histplot replaces the deprecated distplot)
sns.histplot(train_merged['num_orders'], kde=True)

In [None]:
# skewness
train_merged['num_orders'].skew()

In [None]:
# kurtosis
train_merged['num_orders'].kurtosis()

In [None]:
# num_orders (target variable) is highly skewed, so a log transformation is applied.
train_merged['num_orders_log'] = np.log(train_merged['num_orders'])
# plot the transformed column directly; it already holds log(num_orders)
sns.histplot(train_merged['num_orders_log'], kde=True)

# skewness
print('skewness: {}'.format(np.log(train_merged['num_orders']).skew()))
# kurtosis
print('kurtosis: {}'.format(np.log(train_merged['num_orders']).kurtosis()))
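Since the model is fitted on `num_orders_log`, predictions have to be mapped back to order counts with the inverse function. A small sketch on made-up order counts follows; `np.log1p` and `np.expm1` are shown as an alternative pair that also tolerates zero counts, which is an assumption about possible data and not something the notebook uses.

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed order counts for illustration
orders = pd.Series([13, 26, 54, 120, 270, 810, 2400, 24000])

orders_log = np.log(orders)        # as in the notebook; requires strictly positive values
orders_log1p = np.log1p(orders)    # alternative: log(1 + x), defined for zero counts too

# invert each transform with its matching inverse
recovered = np.exp(orders_log)
recovered_1p = np.expm1(orders_log1p)

# the log transform pulls in the long right tail
skew_raw = orders.skew()
skew_log = orders_log.skew()
print('skewness raw: {:.2f}, log: {:.2f}'.format(skew_raw, skew_log))
```

Mixing the pairs (for example `np.exp` on a `log1p`-transformed target) would shift every prediction by one order, so the inverse should always match the forward transform.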

## Explanatory variables

In [None]:
train_merged.describe()

### Week

In [None]:
# lineplot
sns.lineplot(x=train_merged['week'], y=train_merged['num_orders_log'])

### checkout_price

In [None]:
# scatterplot
sns.scatterplot(x=train_merged['checkout_price'], y=train_merged['num_orders_log'])

### base_price

In [None]:
# scatterplot
sns.scatterplot(x=train_merged['base_price'], y=train_merged['num_orders_log'])

## fulfilment_center_info

### center_type

In [None]:
sns.catplot(x='center_type', y='num_orders_log', data=train_merged)

### op_area (km2)

In [None]:
sns.scatterplot(x=train_merged['op_area'], y=train_merged['num_orders_log'])

## meal_info

### Category

In [None]:
sns.catplot(x='category', y='num_orders_log', data=train_merged)
plt.xticks(rotation=80)

### cuisine

In [None]:
sns.catplot(x='cuisine', y='num_orders_log', data=train_merged)
plt.xticks(rotation=80)

In [None]:
sns.catplot(x='category', y='num_orders_log', hue='cuisine', data=train_merged)
plt.xticks(rotation=80)

# Prepare data

In [None]:
X = train_merged.copy()
X['category_cuisine'] = X['category'] + '_' + X['cuisine']
X.head()

In [None]:
X.info()

In [None]:
# label encoding for LightGBM: map each categorical value to an integer code
from sklearn import preprocessing

X_le = X.copy()

for column in ['week', 'category_cuisine','center_id', 'meal_id', 'city_code', 'region_code', 'center_type']:
    le = preprocessing.LabelEncoder()
    le.fit(X[column])
    X_le[column] = le.transform(X[column])
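`LabelEncoder` assigns codes in sorted order of the unique values, and the loop above discards each fitted encoder after use. To apply the same mapping to the test weeks later, the encoders need to be kept. A minimal sketch on toy data, where the `encoders` dictionary is a hypothetical helper and not part of the notebook:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with made-up values
toy = pd.DataFrame({
    'center_type': ['TYPE_C', 'TYPE_A', 'TYPE_B', 'TYPE_A'],
    'city_code': [647, 590, 647, 526],
})

encoders = {}          # hypothetical container: column name -> fitted encoder
toy_le = toy.copy()
for column in ['center_type', 'city_code']:
    le = LabelEncoder()
    toy_le[column] = le.fit_transform(toy[column])
    encoders[column] = le

# codes follow the sorted order of the original values
center_type_codes = toy_le['center_type'].tolist()
city_codes = toy_le['city_code'].tolist()

# reuse the fitted encoder on new rows, and map codes back to labels
new_codes = encoders['center_type'].transform(['TYPE_B', 'TYPE_C']).tolist()
original = encoders['center_type'].inverse_transform([0, 1, 2]).tolist()
print(center_type_codes, city_codes, new_codes, original)
```

`transform` raises a `ValueError` for a value the encoder did not see during `fit`, which is worth keeping in mind if the test set contains new centers or meals.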

In [None]:
X_le.head()

In [None]:
X_le.info()

In [None]:
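# drop the row id, the columns merged into category_cuisine, and the raw target;
# num_orders_log stays in the frame as the target
data = X_le.drop(['id', 'category', 'cuisine', 'num_orders'], axis=1)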

In [None]:
data.head()

In [None]:
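from sklearn.model_selection import train_test_split
# hold out 30% of the rows, then split that hold-out 70/30 into validation and test
# (roughly 70% train, 21% validation, 9% test overall)
data_train, data_dev = train_test_split(data, test_size=0.3, random_state=42)
data_valid, data_test = train_test_split(data_dev, test_size=0.3, random_state=42)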
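The split above draws rows at random, so the validation and test sets contain the same weeks as the training set. For a forecasting task, splitting by week is a common alternative. The sketch below compares both on a toy frame; the cut-off weeks `train_end` and `valid_end` are arbitrary assumptions, not values from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame: 145 weeks with 4 rows per week (values are made up)
toy = pd.DataFrame({
    'week': np.repeat(np.arange(1, 146), 4),
    'num_orders_log': np.linspace(3.0, 6.0, 145 * 4),
})

# random split, same calls as in the notebook
toy_train, toy_dev = train_test_split(toy, test_size=0.3, random_state=42)
toy_valid, toy_test = train_test_split(toy_dev, test_size=0.3, random_state=42)
shares = [round(len(part) / len(toy), 2) for part in (toy_train, toy_valid, toy_test)]
print('random split shares:', shares)

# time-based alternative: train on early weeks, validate and test on later ones
train_end, valid_end = 125, 135   # hypothetical cut-off weeks
time_train = toy[toy['week'] <= train_end]
time_valid = toy[(toy['week'] > train_end) & (toy['week'] <= valid_end)]
time_test = toy[toy['week'] > valid_end]
print(len(time_train), len(time_valid), len(time_test))
```

With the time-based split, every validation week lies after every training week, which mirrors how the model would be used on weeks 146 to 155.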

In [None]:
# write the splits without the DataFrame index column
data_train.to_csv('data_train.csv', index=False)
data_valid.to_csv('data_valid.csv', index=False)
data_test.to_csv('data_test.csv', index=False)