# <font color='blue'>REVENUE FORECASTING MODEL WALKTHROUGH</font> 

## 1. Introduction 

This generic solution playbook builds a demand forecast model using machine learning techniques. It covers the end-to-end ML life cycle: data preprocessing, filtering, feature engineering, hyperparameter tuning, and model training. The notebook also shows how several machine learning models can be applied to time series data, each giving a different level of forecast accuracy. The production-ready model is the one with the lowest RMSE.

Data: The solution uses 3 years of sales information at the product level.

Model: CNN-1D  
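
As a minimal sketch of the selection criterion mentioned above, RMSE is the square root of the mean squared difference between actual and forecast values. The helper name `rmse` and the sample numbers are illustrative assumptions, not values from this notebook's data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences (illustrative helper)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical forecasts from two candidate models for the same four periods.
actual_sales = [100.0, 120.0, 90.0, 110.0]
model_a_forecast = [98.0, 125.0, 95.0, 108.0]
model_b_forecast = [110.0, 100.0, 80.0, 130.0]

scores = {
    "model_a": rmse(actual_sales, model_a_forecast),
    "model_b": rmse(actual_sales, model_b_forecast),
}
best_model = min(scores, key=scores.get)  # the candidate with the lowest RMSE
```

With `sklearn` already imported in this notebook, the same quantity could also be obtained as the square root of `mean_squared_error(actual, predicted)`.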

## 2. Data Exploration 

The data is a retail company's sales information for grocery products at the UPC level, for which revenue has to be forecast. The details of the UPCs are explored below:

**Import libraries**

In [None]:
import numpy as np
import pandas as pd
import pandas_profiling as pdp

from datetime import datetime

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
plt.style.use('seaborn-notebook')
sns.set()

import os
import warnings
warnings.filterwarnings("ignore")

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.stattools import adfuller, acf, pacf
import statsmodels.formula.api as smf
import statsmodels.tsa.api as smt
import statsmodels.api as sm
import scipy.stats as scs
from pandas.plotting import autocorrelation_plot
import statistics as st
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf

import fbprophet
from fbprophet.plot import add_changepoints_to_plot
from fbprophet.plot import plot_yearly
from fbprophet.diagnostics import cross_validation
from fbprophet.diagnostics import performance_metrics
from fbprophet.plot import plot_cross_validation_metric

from keras.models import Sequential
from keras.layers import Flatten
from keras.layers import Dense
from keras.layers import Conv1D
from keras.layers import MaxPooling1D
from keras.layers import GRU
from keras.layers import concatenate

In [None]:
data = pd.read_csv("data.csv", parse_dates = ["date_of_sale"], error_bad_lines=False)
data.head()

In [None]:
data.drop('Unnamed: 0', axis=1, inplace=True)

In [None]:
data.departmentname.unique()

In [None]:
data.categoryname.unique()

In [None]:
data.upc.value_counts()

**For data preprocessing, the first step in data exploration is to capture the trend and seasonality for a few top UPCs. This notebook shows the findings for the top-selling UPC (second row in the output above; 25097000000).**

In [None]:
data = data[data.upc == 25097000000]
data.head()

**The data is not in chronological order, so sort it by the relevant date column, which in our case is date_of_sale**

In [None]:
data = data.sort_values(by='date_of_sale')
data.head(10)

**Set the date column as index**

In [None]:
data.set_index(data["date_of_sale"],inplace=True)

In [None]:
data.index = pd.to_datetime(data.index)

## 3. Data Preprocessing

### Bucketization 
To achieve consistency and observe comprehensible seasonality, trend, and residual patterns, the data can be aggregated (bucketized) on a daily or weekly basis at the product level.

In [None]:
daily_bucket = data["net_sales"].resample("D").sum()

In [None]:
daily_bucket.shape

In [None]:
daily_bucket.head()
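
To illustrate what bucketization does to irregular transaction data, here is a small sketch on a made-up series (the timestamps and amounts are hypothetical). It shows that `resample("D").sum()` adds up multiple sales on the same day and fills days without sales with 0, and that `resample("W").sum()` collapses the same rows into weekly totals.

```python
import pandas as pd

# Hypothetical toy transactions: two sales on Jan 1, none on Jan 2, one on Jan 3.
toy_sales = pd.Series(
    [5.0, 2.5, 4.0],
    index=pd.to_datetime(
        ["2020-01-01 09:00", "2020-01-01 17:30", "2020-01-03 12:00"]
    ),
)

# Daily buckets: same-day sales are summed, the empty day becomes 0.0.
daily_toy = toy_sales.resample("D").sum()

# Weekly buckets: by default each bucket is labelled with the Sunday that ends the week.
weekly_toy = toy_sales.resample("W").sum()

print(daily_toy)
print(weekly_toy)
```

The zero-filled days matter for the later decomposition and autocorrelation steps, which expect a regularly spaced series.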

**Graph of daily sales of best-selling product**

In [None]:
plt.plot(daily_bucket)
plt.title("Daily net sales over time")
fig = plt.gcf() 
fig.set_size_inches(20,10)
plt.show()

**Graph of the 7-day rolling mean and standard deviation of the daily bucket**

In [None]:
plt.figure(figsize=(16,6))
plt.plot(daily_bucket.rolling(window=7, center=False).mean(), label='Rolling Mean')
plt.plot(daily_bucket.rolling(window=7, center=False).std(), label='Rolling sd')
plt.legend()
plt.show()

**Graph showcasing the trend, seasonality and residual for sales**

In [None]:
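# seasonal_decompose supports "additive" or "multiplicative" models; additive is used here.
# Newer statsmodels releases name the freq argument period.
db_factors_weekly = sm.tsa.seasonal_decompose(daily_bucket, freq = 7, model = "additive")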
db_factors_weekly.plot();
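
The idea behind an additive decomposition (observed = trend + seasonal + residual) can be sketched without statsmodels. The helper below is a simplified, hypothetical illustration of the classical approach for an odd period, using a centered moving average for the trend. It is not the statsmodels implementation, and the toy series is made up.

```python
def additive_decompose(values, period):
    """Simplified classical additive decomposition for an odd period (illustrative)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(values)

    # Trend: centered moving average over one full period; undefined at the edges.
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]
        trend[t] = sum(window) / period

    # Seasonal: average detrended value for each position in the cycle,
    # shifted so the seasonal effects sum to zero over one period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(values[t] - trend[t])
    raw_cycle = [sum(b) / len(b) for b in buckets]
    offset = sum(raw_cycle) / period
    cycle = [r - offset for r in raw_cycle]
    seasonal = [cycle[t % period] for t in range(n)]

    # Residual: whatever trend and seasonality do not explain.
    resid = [
        None if trend[t] is None else values[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, resid

# Hypothetical series: linear trend plus a weekly pattern that sums to zero.
weekly_pattern = [3, -1, -2, 0, 1, -3, 2]
toy_series = [10 + 0.5 * t + weekly_pattern[t % 7] for t in range(28)]
trend, seasonal, resid = additive_decompose(toy_series, period=7)
```

On this noise-free toy series the recovered seasonal cycle matches `weekly_pattern` and the residuals are essentially zero. On real sales data the residual panel shows what the trend and weekly pattern leave unexplained.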

**Weekly bucketization of the data**

In [None]:
weekly_bucket = data['net_sales'].resample('W').sum()
weekly_bucket.head()

**Graph of total weekly sales for the selected UPC**

In [None]:
plt.figure(figsize=(16,8))
plt.title('Weekly Net Sales of UPC 25097000000')
plt.xlabel('Date of Sale')
plt.ylabel('Net Sales')
plt.plot(weekly_bucket)
plt.show()

**Graph showcasing the decomposition of sales when bucketized on a weekly basis**

In [None]:
# seasonal_decompose supports "additive" or "multiplicative" models; additive is used here.
sm.tsa.seasonal_decompose(weekly_bucket, freq = 2, model = "additive").plot();

## 4. Data Preparation 
### 4.1 Autocorrelation and Partial Autocorrelation to check the relationship between consecutive observations in the time series

**At small lags the graph shows an autocorrelation of about 0.73, which is useful for prediction purposes**

In [None]:
plt.figure(figsize=(16,8))
autocorrelation_plot(daily_bucket)
plt.show()

**The graph shows that the autocorrelation is significant at small lags. Slight peaks also recur about every 7 lags, suggesting weekly seasonality**

In [None]:
plot_acf(daily_bucket, lags = 50)
plt.show()
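
To make the reading of the ACF plot concrete, the sketch below computes the sample autocorrelation at a given lag by hand. The function name `autocorrelation` and the toy series are illustrative assumptions. The series repeats a made-up weekly pattern, so its autocorrelation peaks at lags that are multiples of 7, which is the kind of peak described above.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation at the given lag (illustrative helper)."""
    n = len(values)
    mean = sum(values) / n
    # Total variation around the mean (same denominator for every lag).
    denom = sum((v - mean) ** 2 for v in values)
    # Co-variation between the series and a copy of itself shifted by `lag`.
    numer = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    )
    return numer / denom

# Hypothetical series: ten repetitions of a weekly pattern.
weekly_pattern = [3, -1, -2, 0, 1, -3, 2]
toy_series = [weekly_pattern[t % 7] for t in range(70)]

acf_by_lag = {lag: autocorrelation(toy_series, lag) for lag in range(15)}
for lag, value in acf_by_lag.items():
    print(lag, round(value, 3))
```

Lag 0 always gives 1, and on this toy series lags 7 and 14 stand out clearly against the lags in between.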

**The graph shows a strong partial autocorrelation at lag = 1 and little thereafter**

In [None]:
plot_pacf(daily_bucket, lags = 50)
plt.show()