# Facebook Prophet

### Anomaly Detection for Indica Powder in East Region - Primary Data

This model detects anomalies for each distributor, based on that distributor's buying behaviour.

**Data Source**
1. Azure Data Warehouse
2. Excel - default **Actdte** calendar, merged in to add the months that have no data.

**Model Design**
1. Taking the raw data from the Azure database.
2. Grouping the number of items (**Billing_Qty**) bought by each distributor for each month.
3. Imputing zeroes for the months where there was no purchase.
4. Creating a dataframe with a Multi-Index (**Distributor_Code** and **Code**).
5. Renaming the columns **Actdte** and **Billing_Qty** to ds and y.
6. Fitting the FB Prophet model to the dataframe by looping over each Distributor_Code and Code pair (**keys**).
7. Detecting the anomalies from the forecast.
8. Exporting the data to .xlsx format.
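
Steps 2, 3 and 5 can be illustrated on toy data. This is only a sketch: the distributor `D1`, the product `P1`, the quantities and the three-month `calendar` frame are made-up stand-ins for the warehouse data and the Excel calendar.

```python
import pandas as pd

# Toy purchases for one distributor (hypothetical values, not from the warehouse)
raw = pd.DataFrame({
    "Distributor_Code": ["D1", "D1", "D1"],
    "Code": ["P1", "P1", "P1"],
    "Actdte": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-03-01"]),
    "Billing_Qty": [4.0, 6.0, 5.0],
})

# Step 2: total quantity per distributor, product code and month
monthly = (raw.groupby(["Distributor_Code", "Code", "Actdte"])["Billing_Qty"]
              .sum()
              .reset_index())

# Step 3: left-join onto a full month calendar and fill the missing months with zero
calendar = pd.DataFrame({"Actdte": pd.date_range("2020-01-01", "2020-03-01", freq="MS")})
series = calendar.merge(monthly[["Actdte", "Billing_Qty"]], on="Actdte", how="left").fillna(0)

# Step 5: Prophet expects the date column to be named ds and the value column y
series = series.rename(columns={"Actdte": "ds", "Billing_Qty": "y"})
print(series)
```

February has no purchase in the toy data, so it appears in `series` with `y` equal to zero.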

**Note**: 
1. To run the model on sample data, restrict **keys** to the sample and check that the grouped data for those keys is available in **newdata**.

(**Bold** is used for the column names and variables used in the notebook.)
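
The check mentioned in the note could look roughly like the sketch below. The frame `sample_keys` and the codes `D1`, `D2`, `D3` and `P1` are hypothetical; only the names `keys` and `newdata` and the two index columns come from the notebook.

```python
import pandas as pd

# Toy stand-in for the grouped data, indexed the same way as newdata
grouped = pd.DataFrame({
    "Distributor_Code": ["D1", "D1", "D2", "D2"],
    "Code": ["P1", "P1", "P1", "P1"],
    "Actdte": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-01-01", "2020-02-01"]),
    "Billing_Qty": [10.0, 12.0, 3.0, 4.0],
})
newdata = grouped.set_index(["Distributor_Code", "Code"])

# Hypothetical sample: D3 has no grouped data
sample_keys = pd.DataFrame({"Distributor_Code": ["D1", "D3"], "Code": ["P1", "P1"]})

# Keep only the (Distributor_Code, Code) pairs that exist in the newdata index
available_pairs = set(newdata.index)
mask = [pair in available_pairs
        for pair in zip(sample_keys["Distributor_Code"], sample_keys["Code"])]
sample_keys = sample_keys[mask].reset_index(drop=True)
print(sample_keys)
```

Filtering the sample first avoids a `KeyError` when the model loop looks up a pair that has no rows.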

# Importing packages

In [1]:
from ckpackages import azsql         #Custom Package for Cavinkare
from fbprophet import Prophet
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Data Import

### Azure DW

In [286]:
Query = """select *,substring(Product_Hierarchy,1,8) as 'Code' 
            from V_AN_PC_UNDERCUTTING_PRI
            where region = 'East' 
            and substring(Product_Hierarchy,1,8) in ('06131253','06131254') ;"""

In [287]:
pri_data = azsql.callstatement(Query)

### Excel

In [288]:
Actual_data = pd.read_excel(r"D:\Analytics\Undercutting\Azure\Data\Default Actdte.xlsx")

### Raw Data cleaning

In [289]:
Distributor = pri_data
Distributor = Distributor[['Distributor_Code','Code','Actdte','Billing_Qty']]

In [290]:
Distributor.head(2)

| | Distributor_Code | Code | Actdte | Billing_Qty |
|---|---|---|---|---|
| 0 | 2002742 | 6131253 | 2019-03-01 | 18.0 |
| 1 | 2002987 | 6131253 | 2020-02-01 | 1.0 |


In [291]:
print ((Distributor['Distributor_Code'].nunique()))

430


### Grouping up the data

In [292]:
Distributor     = Distributor.groupby(['Distributor_Code','Code','Actdte'])["Billing_Qty"].sum().reset_index()

### Removing groups with a single transaction, as an anomaly can't be detected from one transaction

In [293]:
# Count the monthly records for each (Distributor_Code, Code) pair
rmv_single_rows = Distributor.groupby(["Distributor_Code","Code"]).agg(Count = ("Actdte","count")).reset_index()
# Attach the count to every row and drop the pairs that have only one record
Distributor_Clean = pd.merge(Distributor,rmv_single_rows,on = ["Distributor_Code","Code"],how = "left")
Distributor_Clean = Distributor_Clean[Distributor_Clean["Count"] != 1]
len(Distributor_Clean)

4022
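
As an alternative sketch, the same filter can be written with `groupby(...).transform("count")`, which avoids the helper frame and the merge. The toy frame `monthly` and its codes are made up for illustration.

```python
import pandas as pd

# Toy grouped data: D1 has two monthly records, D2 has only one
monthly = pd.DataFrame({
    "Distributor_Code": ["D1", "D1", "D2"],
    "Code": ["P1", "P1", "P1"],
    "Actdte": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-01-01"]),
    "Billing_Qty": [10.0, 12.0, 3.0],
})

# Number of monthly records in each (Distributor_Code, Code) group, aligned to the rows
n_months = monthly.groupby(["Distributor_Code", "Code"])["Actdte"].transform("count")

# Keep only the groups with more than one monthly record
filtered = monthly[n_months > 1].reset_index(drop=True)
print(filtered)
```

Because `transform` returns one value per original row, the result can be used directly as a boolean mask.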

In [294]:
Distributor = Distributor_Clean[['Distributor_Code','Code','Actdte','Billing_Qty']].copy()

In [306]:
print ((Distributor['Distributor_Code'].nunique()))

380


In [310]:
Distributor.head(2)

| | Distributor_Code | Code | Actdte | Billing_Qty |
|---|---|---|---|---|
| 0 | 2002018 | 6131253 | 2018-05-01 | 8.0 |
| 1 | 2002018 | 6131253 | 2018-07-01 | 6.0 |


# Dataset for Prophet Model - newdata and keys

In [295]:
keys = Distributor[["Distributor_Code","Code"]].drop_duplicates().reset_index(drop=True)
keys.head(2)

| | Distributor_Code | Code |
|---|---|---|
| 0 | 2002018 | 6131253 |
| 1 | 2002019 | 6131253 |


In [309]:
keys["Distributor_Code"].nunique()

380

In [330]:
newdata = Distributor.set_index(["Distributor_Code","Code"])
newdata.index.names

FrozenList(['Distributor_Code', 'Code'])

In [336]:
newdata

| Distributor_Code | Code | Actdte | Billing_Qty |
|---|---|---|---|
| 0002002018 | 06131253 | 2018-05-01 | 8.0 |
| 0002002018 | 06131253 | 2018-07-01 | 6.0 |
| 0002002018 | 06131253 | 2018-10-01 | 5.0 |
| 0002002018 | 06131253 | 2018-12-01 | 10.0 |
| 0002002018 | 06131253 | 2019-06-01 | 15.0 |
| ... | ... | ... | ... |
| 0002008366 | 06131253 | 2020-06-01 | 25.0 |
| 0002008469 | 06131253 | 2020-05-01 | 59.0 |
| 0002008469 | 06131253 | 2020-06-01 | 40.0 |
| 0002008541 | 06131253 | 2020-05-01 | 1.0 |


### FB Prophet Model Fit

In [346]:
def fit_predict_model(dataframe, interval_width = 0.95, changepoint_range = 0.8):
    
    m = Prophet(daily_seasonality = False, 
                yearly_seasonality = False, 
                weekly_seasonality = False,
                seasonality_mode = 'multiplicative', 
                interval_width = interval_width,
                n_changepoints = 20,
                changepoint_range = changepoint_range)
    
    m = m.fit(dataframe)
    
    forecast = m.predict(dataframe)
    forecast['y'] = dataframe['y'].reset_index(drop = True)
    
#     print('Displaying Prophet plot')
#     fig1 = m.plot(forecast)
    return forecast

### Classifying Anomalies - Manual Formula (Not included in the Algorithm output)

In [347]:
def detect_anomalies(forecast):
    forecasted = forecast[['ds','trend', 'yhat', 'yhat_lower', 'yhat_upper', 'y']].copy()
    #forecast['fact'] = df['y']

    forecasted['anomaly'] = 0
    forecasted.loc[forecasted['y'] > forecasted['yhat_upper'], 'anomaly'] = 1
    forecasted.loc[forecasted['y'] < forecasted['yhat_lower'], 'anomaly'] = -1

    #anomaly importances
    forecasted['importance'] = 0.0
    forecasted.loc[forecasted['anomaly'] == 1, 'importance'] = (forecasted['y'] - forecasted['yhat_upper'])/forecasted['y']
    forecasted.loc[forecasted['anomaly'] ==-1, 'importance'] = (forecasted['yhat_lower'] - forecasted['y'])/forecasted['yhat_lower']
    
    return forecasted
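
To make the formula concrete, the sketch below applies the same rules as `detect_anomalies` to three hand-made rows. The numbers are invented for illustration and no Prophet model is fitted.

```python
import pandas as pd

# Hand-made forecast rows (hypothetical values): inside the band, above it, below it
forecast = pd.DataFrame({
    "y":          [10.0, 30.0, 2.0],
    "yhat_lower": [5.0, 5.0, 4.0],
    "yhat_upper": [20.0, 20.0, 20.0],
})

# +1 when the actual value is above the upper bound, -1 when below the lower bound
forecast["anomaly"] = 0
forecast.loc[forecast["y"] > forecast["yhat_upper"], "anomaly"] = 1
forecast.loc[forecast["y"] < forecast["yhat_lower"], "anomaly"] = -1

# Importance: distance outside the band, relative to y (high) or yhat_lower (low)
high = forecast["anomaly"] == 1
low = forecast["anomaly"] == -1
forecast["importance"] = 0.0
forecast.loc[high, "importance"] = (forecast["y"] - forecast["yhat_upper"]) / forecast["y"]
forecast.loc[low, "importance"] = (forecast["yhat_lower"] - forecast["y"]) / forecast["yhat_lower"]
print(forecast)
```

The second row gets an importance of (30 - 20) / 30, about 0.33, and the third row (4 - 2) / 4 = 0.5. For low anomalies the ratio divides by `yhat_lower`, so it is undefined when that bound is zero and changes sign when it is negative.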

### Model

In [348]:
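%%time
results = []    # one anomaly frame per (Distributor_Code, Code) pair
for _, key in keys.iterrows():
    dist_code = key["Distributor_Code"]
    code      = key["Code"]
    temp      = newdata.loc[dist_code].loc[code].copy()
    # Left-join onto the full month calendar so months without a purchase become 0
    temp      = pd.merge(Actual_data,temp,how ='left',on='Actdte').fillna(value = 0)
    temp      = temp.rename(columns = {"Actdte":"ds","Billing_Qty":"y"})
    pred      = fit_predict_model(temp)
    anomaly   = detect_anomalies(pred)
    anomaly["Distributor_Code"] = dist_code
    anomaly["Code"]             = code
    results.append(anomaly)
# DataFrame.append was removed in pandas 2.0; concatenate once after the loop instead
final_dataset = pd.concat(results)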

Wall time: 18min 18s
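
The per-group pattern can also be sketched with `groupby` iteration. In the sketch below `placeholder_model` is a hypothetical stand-in for `fit_predict_model` plus `detect_anomalies` (it flags values above mean + 1 standard deviation, which is not what Prophet does), and the toy series are assumed to be zero-filled and renamed to ds and y already.

```python
import pandas as pd

def placeholder_model(series):
    """Hypothetical stand-in for the Prophet fit: flags y above mean + 1 std."""
    out = series.copy()
    out["anomaly"] = (out["y"] > out["y"].mean() + out["y"].std()).astype(int)
    return out

# Toy zero-filled monthly series for two distributors (made-up values)
months = pd.date_range("2020-01-01", periods=4, freq="MS")
data = pd.DataFrame({
    "Distributor_Code": ["D1"] * 4 + ["D2"] * 4,
    "Code": ["P1"] * 8,
    "ds": list(months) * 2,
    "y": [5.0, 0.0, 50.0, 0.0, 3.0, 4.0, 0.0, 3.0],
})

results = []
# groupby yields each (Distributor_Code, Code) pair together with its rows
for (dist_code, code), grp in data.groupby(["Distributor_Code", "Code"]):
    scored = placeholder_model(grp[["ds", "y"]])
    scored["Distributor_Code"] = dist_code
    scored["Code"] = code
    results.append(scored)

# Concatenate once at the end instead of growing a frame inside the loop
final_dataset = pd.concat(results, ignore_index=True)
print(final_dataset)
```

Iterating over the groups directly removes the need for a separate `keys` frame and the Multi-Index lookup, at the cost of running every group rather than a chosen sample.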


### Output

In [349]:
final_dataset.to_excel(r"D:\Analytics\Undercutting\Azure\Data\Indica_East_data.xlsx",index = False)

In [304]:
keys.shape

(384, 2)

In [305]:
Distributor.Distributor_Code.nunique()

380