# Notebook Summary
**Goal:** 
Apply anomaly detection using Prophet for Betacom Data. 

For a column of interest, e.g. **peak_upload_speed** from Customer-DFW, we use past upload data to make prediction for a future data point, and detect whether the actual data falls outside prediction upperbond or lowerbond. In the end, graphs for labeled Anomalies will be plot.



**Prophet Algorithm Documentation:**

Link: https://facebook.github.io/prophet/docs/quick_start.html



**Input 3 Datasets:** 
- Gyan-Database: core_stats (RAN data)
- Gyan-Database: randomized core_stats
- Prometheus Database: **discard** now due to low data quality

**Output:**
1. Anomaly graphs generate for each column.
2. Prediction for featured Dataframe




**Notebook Outline:**
1. Functions using Prophet

 * fit_predict_model:
     - Build a Prophet Model
     - Fit the model
     - Make prediction 
 * detect_anomalies: 

2. Retrieve Data from CORE_stats customer data
3. Retrieve Data from CORE_stats randomized Data
4. Apply prediction and anomaly labeling for multiple columns and save graphs

## Prophet Functions

In [1]:
from prophet import Prophet


def fit_predict_model(dataframe, interval_width=0.99, changepoint_range=0.8):
    '''
        Input: 

        Output: a forecasted dataframe includes

    '''

    m = Prophet(daily_seasonality=True, yearly_seasonality=True, weekly_seasonality=True,
                seasonality_mode='multiplicative',
                interval_width=interval_width,
                changepoint_range=changepoint_range)
    m = m.fit(dataframe)
    forecast = m.predict(dataframe)
    forecast['fact'] = dataframe['y'].reset_index(drop=True)
    return forecast

def detect_anomalies(forecast):
    '''
    What it does:  based on rule: label anomaly data point based on whether the actual data is greater than the upper bond of prediction or smaller than the lower bond of the prediction.

    Input: forecast dataframe from Prophet model.
    Output: forecast dataframe with anomlies labeled. 

    '''
    forecasted = forecast[['ds', 'trend', 'yhat',
                           'yhat_lower', 'yhat_upper', 'fact']].copy()

    forecasted['anomaly'] = 0
    forecasted.loc[forecasted['fact'] >
                   forecasted['yhat_upper'], 'anomaly'] = 1
    forecasted.loc[forecasted['fact'] <
                   forecasted['yhat_lower'], 'anomaly'] = -1

    # anomaly importances
    forecasted['importance'] = 0
    forecasted.loc[forecasted['anomaly'] == 1, 'importance'] = \
        (forecasted['fact'] - forecasted['yhat_upper'])/forecast['fact']
    forecasted.loc[forecasted['anomaly'] == -1, 'importance'] = \
        (forecasted['yhat_lower'] - forecasted['fact'])/forecast['fact']

    return forecasted





def plot_anomalies(forecasted):
    '''

    '''
    import altair as alt
    interval = alt.Chart(forecasted).mark_area(interpolate="basis", color='#7FC97F').encode(
        x=alt.X('ds:T',  title='date'),
        y='yhat_upper',
        y2='yhat_lower',
        tooltip=['ds', 'fact', 'yhat_lower', 'yhat_upper']
    ).interactive().properties(
        title='Anomaly Detection'
    )

    fact = alt.Chart(forecasted[forecasted.anomaly == 0]).mark_circle(size=15, opacity=0.7, color='Black').encode(
        x='ds:T',
        y=alt.Y('fact', title='sales'),
        tooltip=['ds', 'fact', 'yhat_lower', 'yhat_upper']
    ).interactive()

    anomalies = alt.Chart(forecasted[forecasted.anomaly != 0]).mark_circle(size=30, color='Red').encode(
        x='ds:T',
        y=alt.Y('fact', title='PeakUpload Speed'),
        tooltip=['ds', 'fact', 'yhat_lower', 'yhat_upper'],
        size=alt.Size('importance', legend=None)
    ).interactive()

    return alt.layer(interval, fact, anomalies)\
              .properties(width=870, height=450)\
              .configure_title(fontSize=20)

## Load Core_stats from Database

In [8]:
import numpy as np
import mysql.connector
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv("Existing_rand_data_after_July15.csv")


In [10]:
df=df[30000:].reset_index(drop=True)

### Anomaly Detection Graph

In [11]:
rand_list = ['total_attached_user',
             'total_rejected_user', 'peak_upload_speed', 'peak_download_speed',
             'enodeb_shutdown_count', 'handover_failure_count',
             'bearer_active_user_count', 'bearer_rejected_user_count', 'total_users',
             'total_dropped_packets', 'enodeb_connected_count',
             'enodeb_connection_status']

#### Anomaly Graph V1

In [None]:
for item in rand_list:

    df_rand_prophet = df[["stats_timestamp", item]].rename(
        columns={"stats_timestamp": "ds", item: "y"})
    pred = fit_predict_model(df_rand_prophet)
    pred_anomalies = detect_anomalies(pred)
    print("Anomaly rate is: ",
          pred_anomalies["anomaly"].sum()/pred_anomalies.shape[0])

    chart = plot_anomalies(pred_anomalies[:5000])

    chart.save('Anomaly_graphs/V1_No_Distribution/Anomaly_{}.html'.format(item))

  components = components.append(new_comp)
  components = components.append(new_comp)
  components = components.append(new_comp)


Anomaly rate is:  -0.001984126984126984


In [None]:
for item in rand_list:

    df_rand_prophet = df[["stats_timestamp", item]].rename(
        columns={"stats_timestamp": "ds", item: "y"})
    pred = fit_predict_model(df_rand_prophet)
    pred_anomalies = detect_anomalies(pred)
    print("Anomaly rate is: ",
          pred_anomalies["anomaly"].sum()/pred_anomalies.shape[0])

    chart = plot_anomalies(pred_anomalies[5000:10000])

    chart.save('Anomaly_graphs/V1_Long_period_anomalies/Anomaly_{}.html'.format(item))

## Anomaly Graph V2

In [None]:
for item in rand_list:

    df_rand_prophet = df_rand[["stats_timestamp", item]].rename(
        columns={"stats_timestamp": "ds", item: "y"})
    pred = fit_predict_model(df_rand_prophet)
    pred_anomalies = detect_anomalies(pred)
    print("Anomaly rate is: ",
          pred_anomalies["anomaly"].sum()/pred_anomalies.shape[0])

    chart = plot_anomalies(pred_anomalies[:5000])
    
    chart.save('Anomaly_graphs/V1_No_Distribution/Anomaly_{}.html'.format(item))

## Insert Database Option

For items in Interested Columns, 

In [None]:
df_rand_prophet = df_rand[["stats_timestamp", item]].rename(
        columns={"stats_timestamp": "ds", item: "y"})
    pred = fit_predict_model(df_rand_prophet)
    pred_anomalies = detect_anomalies(pred)
    print("Anomaly rate is: ",
          pred_anomalies["anomaly"].sum()/pred_anomalies.shape[0])

    chart = plot_anomalies(pred_anomalies[:5000])

In [95]:

item= "peak_upload_speed"
df_rand_prophet = df_rand[["stats_timestamp", item]].rename(
    columns={"stats_timestamp": "ds", item: "y"})
pred = fit_predict_model(df_rand_prophet)
pred_anomalies = detect_anomalies(pred)
print("Anomaly rate is: ",
      pred_anomalies["anomaly"].sum()/pred_anomalies.shape[0])

chart = plot_anomalies(pred_anomalies[:5000])

anomaly_df = pred_anomalies[(pred_anomalies["anomaly"]==-1) | (pred_anomalies["anomaly"]==1)]

Anomaly rate is:  0.0033400133600534404


In [96]:
anomaly_df

Unnamed: 0,ds,trend,yhat,yhat_lower,yhat_upper,fact,anomaly,importance
675,2022-07-15 13:25:29,35991.326482,35991.326482,20500.508749,53302.046471,54253,1,0.017528
676,2022-07-15 13:27:29,35995.512518,35995.512518,18951.339387,53536.41559,56518,1,0.052755
679,2022-07-15 13:31:17,36003.465986,36003.465986,18133.423094,53664.654425,72806,1,0.262909
680,2022-07-15 13:33:17,36007.652022,36007.652022,18899.857879,52898.481748,59495,1,0.110875
681,2022-07-15 13:35:17,36011.838058,36011.838058,18774.739125,51968.900169,71954,1,0.277748
682,2022-07-15 13:37:17,36016.024094,36016.024094,20070.09678,53194.739013,62885,1,0.154095
683,2022-07-15 13:39:17,36020.21013,36020.21013,19994.654706,50469.516741,59641,1,0.153778
685,2022-07-15 13:43:17,36028.582202,36028.582202,18894.176614,50716.144441,68011,1,0.254295
686,2022-07-15 13:45:18,36032.803121,36032.803121,19278.213754,52925.689686,58661,1,0.09777
687,2022-07-15 13:47:08,36036.640321,36036.640321,20098.869585,53398.612698,67677,1,0.210978
