# Network AI-tomation Blog series. Interface Traffic Prediction

Javier Antich. 

javier.antich@gmail.com | javier.antich@nokia.com

https://www.linkedin.com/in/javier-antich-romaguera/

In [None]:
#Importing a bunch of useful and common libraries, that I will most likely be using across all the 
#examples and use cases.

%matplotlib inline 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
import random as random
from datetime import datetime
from pandas.plotting import scatter_matrix
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from statsmodels.tsa.seasonal import seasonal_decompose
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from sklearn.cluster import DBSCAN
import sklearn.utils
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
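# Prophet was published as "fbprophet" before version 1.0 and as "prophet" afterwards.
try:
    from prophet import Prophet
except ImportError:
    from fbprophet import Prophet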

We have a dataset of network traffic downloaded from Kaggle. 

https://www.kaggle.com/pattnaiksatyajit/network-analytics-time-series

While we do not have much context about this dataset, it is very suitable for what we want to illustrate. You can easily replace this data with your own and apply the same methodology.

In [None]:
traffic = pd.read_csv('../input/network-analytics-time-series/Network Analytics.csv')

The dataset has 25,631 entries of interface traffic data. Timestamps are spaced roughly, but not exactly, 5 minutes apart. This does not matter much, since we will ultimately resample the data to a coarser interval.
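A quick way to check how regular the sampling really is: take the difference between consecutive timestamps. The sketch below uses synthetic timestamps (nominal 5-minute polling with a few seconds of jitter, which is an assumption and not the Kaggle data); with the real dataset the same `diff()` call would be applied to the `Timestamp` column once it has been converted to datetime.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Timestamp column (assumption, not the real data):
# nominal 5-minute polling with up to 20 seconds of jitter on each sample.
rng = np.random.default_rng(0)
nominal = pd.date_range("2017-11-01", periods=12, freq="5min")
jitter = pd.to_timedelta(rng.integers(-20, 21, size=len(nominal)), unit="s")
timestamps = pd.Series(nominal + jitter)

# Time elapsed between each sample and the previous one.
intervals = timestamps.diff().dropna()

median_interval = intervals.median()
irregular = int((intervals != pd.Timedelta(minutes=5)).sum())

print("median interval:", median_interval)
print("intervals that are not exactly 5 minutes:", irregular, "of", len(intervals))
```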

In [None]:
traffic.info()

In [None]:
traffic.head()

We convert the Timestamp column to datetime format, so that pandas can use it as a time index.

In [None]:
traffic['Timestamp']=pd.to_datetime(traffic['Timestamp'])

In [None]:
traffic = traffic.set_index('Timestamp')

In [None]:
traffic.rename(columns={'OutboundUtilzation (%)':'traffic_out'},inplace=True)  

In [None]:
traffic.info()

In [None]:
x = traffic.index
y = traffic['traffic_out']

In [None]:
fig, ax = plt.subplots(figsize=(16, 7))

ax.plot(x, y)

fig.suptitle('Interface traffic evolution- outbound')
ax.set_ylabel('% utilization')

ax.grid(True, axis='y')

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

Because the data is sampled roughly every 5 minutes, it is too noisy to read easily. We resample it to 1-hour periods, using the mean value of each period.

In [None]:
traffic_1H = traffic.resample('1H').mean()

In [None]:
traffic_1H
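To make the effect of the resampling concrete, here is a minimal sketch on a tiny synthetic series (the values are made up for illustration): two hours of 5-minute samples collapse into two hourly rows, each holding the mean of its twelve samples. The alias `"60min"` is used instead of `"1H"` only because it is accepted by both older and newer pandas releases.

```python
import pandas as pd

# Two hours of synthetic 5-minute samples (assumption, not the Kaggle data):
# utilization is 10% during the first hour and 20% during the second.
idx = pd.date_range("2018-01-01 00:00", periods=24, freq="5min")
demo = pd.DataFrame({"traffic_out": [10.0] * 12 + [20.0] * 12}, index=idx)

# Group the samples into 1-hour bins and keep the mean of each bin.
hourly = demo.resample("60min").mean()

print(hourly)
```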

In [None]:
x = traffic_1H.index[1500:]
y = traffic_1H[1500:]

In [None]:
fig, ax = plt.subplots(figsize=(16, 7))

ax.plot(x, y)

fig.suptitle('Traffic evolution')
ax.set_ylabel('% utilization')

ax.grid(True, axis='y')

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

Notice the somewhat abnormal traffic values during the first few days of January 2018, which we will come back to later in the notebook.
For now, let's import Prophet from Facebook and build a model to forecast traffic.

In [None]:
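try:
    from prophet import Prophet  # package name since version 1.0
except ImportError:
    from fbprophet import Prophet  # older releases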

In [None]:
traffic_1H = traffic_1H.reset_index()

In [None]:
df_traffic= traffic_1H.rename(columns={'Timestamp': 'ds', 'traffic_out': 'y'})

In [None]:
model = Prophet()

In [None]:
model.fit(df_traffic)

In [None]:
future = model.make_future_dataframe(periods=20)

In [None]:
future.tail()
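Note that no `freq` argument is passed to `make_future_dataframe`, so Prophet falls back to its default daily frequency: the 20 future rows are 20 days, one timestamp per day, even though the history is hourly. The pandas-only sketch below illustrates the difference between 20 daily steps and the equivalent hourly grid. The last observed timestamp is hypothetical, and the `date_range` calls only approximate what Prophet generates internally.

```python
import pandas as pd

# Hypothetical last timestamp of the hourly history (assumption for illustration).
last_observed = pd.Timestamp("2018-01-10 23:00")

# 20 future steps at daily frequency: one timestamp per day.
daily = pd.date_range(start=last_observed, periods=20 + 1, freq="D")[1:]

# The same 20-day horizon at hourly granularity needs 20 * 24 steps.
hourly = pd.date_range(start=last_observed, periods=20 * 24 + 1, freq="60min")[1:]

print(len(daily), "daily timestamps, ending", daily[-1])
print(len(hourly), "hourly timestamps, ending", hourly[-1])
```

If an hourly forecast is preferred, the corresponding call would be `model.make_future_dataframe(periods=20 * 24, freq='H')`.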

In [None]:
forecast = model.predict(future)

In [None]:
forecast.tail()

In [None]:
model.plot(forecast)

The graph above shows the traffic values forecasted by Prophet. The forecast for the next 20 days looks a bit strange, since traffic goes down significantly. Below you can see the decomposition:

In [None]:
model.plot_components(forecast, weekly_start = 1);

The problem is that, for some reason, traffic dropped abnormally during those first few days of January, and Prophet has learnt this as a new trend, on which it has based the subsequent forecast. However, this looks more like a special event, maybe related to the Christmas or New Year vacation, that affected traffic locally; it does not necessarily translate into a new trend that is sustained over time. Prophet allows us to specify such special days explicitly, so the model can learn from them and even reuse that information in the future if the same type of event repeats.

In [None]:
special_days = pd.DataFrame({
  'holiday': 'new_year',
  'ds': pd.to_datetime(['2017-12-31']),
  'lower_window': 0,
  'upper_window': 10,
})
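The `lower_window` and `upper_window` columns extend the special event around its `ds` date. As far as I understand Prophet's behaviour, both bounds are inclusive, so `lower_window=0` and `upper_window=10` cover the date itself plus the 10 following days. The sketch below expands the window with plain pandas to make the covered dates explicit; the `covered` frame is only an illustration and is not something Prophet needs as input.

```python
import pandas as pd

# Same definition as in the cell above.
special_days = pd.DataFrame({
    "holiday": "new_year",
    "ds": pd.to_datetime(["2017-12-31"]),
    "lower_window": 0,
    "upper_window": 10,
})

# Expand each event into one row per covered day (both bounds inclusive).
rows = []
for row in special_days.itertuples(index=False):
    for offset in range(int(row.lower_window), int(row.upper_window) + 1):
        rows.append({"holiday": row.holiday, "ds": row.ds + pd.Timedelta(days=offset)})
covered = pd.DataFrame(rows)

print(covered["ds"].min(), "to", covered["ds"].max(), "-", len(covered), "days")
```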

We create a new model, using the special_days as input.

In [None]:
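# Build a second model that treats the New Year period as a special event,
# then fit it and forecast over the same future dates as before.
# (Resetting the index of special_days is not needed: it already has a default index.)
model2 = Prophet(holidays=special_days)
forecast2 = model2.fit(df_traffic).predict(future)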

In [None]:
model2.plot(forecast2)

The new model shows a much more natural set of forecasted values, providing continuity to the previous trend, and minimizing the impact over time of the special event captured. In the decomposition, it can be seen that the trend computed has not been impacted by the special event. 

In [None]:
model2.plot_components(forecast2, weekly_start=1);

So, now it is time for you to start playing with network metric forecasting, using Prophet or any other algorithm/methodology, as part of your Network AI-tomation journey.