## Representing SARIMA Forecast Using Plotly Interactive Graphs

*In this section we fit a SARIMA model to forecast ride demand for managers, as discussed in the previous modules. Here, however, the forecast results are presented in a more user-friendly way: an interactive graph that lets the user drill down as far as a single day of data. The main objective of this section is to give stakeholders a comprehensive overview of the demand forecast data and make it more interpretable. The aim is to answer as many questions as possible through interactivity, which lets the user drill down into or roll up the data plotted in the graph.*

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from statsmodels.tsa.stattools import adfuller

*Importing data from the drive*

In [None]:
from google.colab import drive

# Mount Google Drive so the CSV file can be read from Colab
drive.mount('/content/drive')
# ride = pd.read_csv("rideshare_kaggle_1.csv")
ride = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/rideshare_kaggle.csv")
# Convert the Unix timestamp (in seconds) to a datetime
ride['timestamp'] = pd.to_datetime(ride['timestamp'], unit='s')
ride.dropna(axis=0, inplace=True)
# Derive date, time and weekday columns from the timestamp
ride["Date"] = ride["timestamp"].dt.date
ride["Time"] = ride["timestamp"].dt.time
ride["Weekday"] = ride["timestamp"].dt.day_name()

Mounted at /content/drive
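
The timestamp handling above can be checked on a small example. The sketch below uses a hypothetical two-row frame (the epoch values are chosen for illustration and are not read from the CSV) to show what `unit='s'` and the `.dt` accessor produce.

```python
import pandas as pd

# Hypothetical toy frame; the real data comes from rideshare_kaggle.csv
toy = pd.DataFrame({"timestamp": [1543284023, 1544952607]})

# Interpret the integers as seconds since 1970-01-01 00:00:00 UTC
toy["timestamp"] = pd.to_datetime(toy["timestamp"], unit="s")

# Derive calendar fields with the .dt accessor
toy["Date"] = toy["timestamp"].dt.date
toy["Time"] = toy["timestamp"].dt.time
toy["Weekday"] = toy["timestamp"].dt.day_name()
print(toy)
```

Note that `.dt.date` yields plain Python `date` objects, which is why the next cell converts the `Date` column back to a pandas datetime before filtering on it.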


In [None]:
ride['Date'] = pd.to_datetime(ride['Date'], format='%Y-%m-%d')

In [None]:
filtered_df = ride.loc[(ride['Date'] == '2018-11-27')]

In [None]:
len(filtered_df)

70135

*Parsing the `datetime` column and dropping NULL values*

In [None]:
ride['datetime']=pd.to_datetime(ride['datetime'])
ride=ride.dropna()

In [None]:
ride['datetime']

0        2018-12-16 09:30:07
1        2018-11-27 02:00:23
2        2018-11-28 01:00:22
3        2018-11-30 04:53:02
4        2018-11-29 03:49:20
                 ...        
693065   2018-12-01 23:53:05
693066   2018-12-01 23:53:05
693067   2018-12-01 23:53:05
693069   2018-12-01 23:53:05
693070   2018-12-01 23:53:05
Name: datetime, Length: 637976, dtype: datetime64[ns]

In [None]:
grouped = ride.groupby(ride['datetime'].dt.floor('h'))['id'].count().reset_index()

grouped.rename(columns={'id': 'total_rides'}, inplace=True)

In [None]:
ride['datetime'].dt.floor('h')

0        2018-12-16 09:00:00
1        2018-11-27 02:00:00
2        2018-11-28 01:00:00
3        2018-11-30 04:00:00
4        2018-11-29 03:00:00
                 ...        
693065   2018-12-01 23:00:00
693066   2018-12-01 23:00:00
693067   2018-12-01 23:00:00
693069   2018-12-01 23:00:00
693070   2018-12-01 23:00:00
Name: datetime, Length: 637976, dtype: datetime64[ns]
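
To make the hourly grouping concrete, here is a minimal sketch on hypothetical toy rides. The column names `id` and `datetime` mirror the ones used above; the four rows are invented for illustration.

```python
import pandas as pd

# Hypothetical toy rides, for illustration only
toy = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "datetime": pd.to_datetime([
        "2018-11-26 03:05:00",
        "2018-11-26 03:40:00",
        "2018-11-26 04:10:00",
        "2018-11-26 04:59:59",
    ]),
})

# floor('h') truncates each timestamp to the start of its hour,
# so rides within the same hour share one group key
hourly = (
    toy.groupby(toy["datetime"].dt.floor("h"))["id"]
    .count()
    .reset_index()
    .rename(columns={"id": "total_rides"})
)
print(hourly)
```

Each output row is one hour with the number of rides that started in it, which is the shape the forecasting step expects.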

In [None]:
grouped.head()

             datetime  total_rides
0 2018-11-26 03:00:00           77
1 2018-11-26 04:00:00          390
2 2018-11-26 05:00:00          616
3 2018-11-26 06:00:00         1462
4 2018-11-26 07:00:00          925


*Setting timestamp as the index for corresponding ride demand*

In [None]:
grouped.set_index('datetime',inplace=True)

This is the final dataset to be used in SARIMA forecasting. The ride counts are cast to float before modelling.

In [None]:
grouped['total_rides']=grouped['total_rides'].astype(float)

An overview of the dataset we are about to use

In [None]:
grouped

                     total_rides
datetime
2018-11-26 03:00:00         77.0
2018-11-26 04:00:00        390.0
2018-11-26 05:00:00        616.0
2018-11-26 06:00:00       1462.0
2018-11-26 07:00:00        925.0
...                          ...
2018-12-18 15:00:00       1709.0
2018-12-18 16:00:00       1712.0
2018-12-18 17:00:00       1712.0
2018-12-18 18:00:00       1745.0


The code block below does the following:
1. Fits a SARIMA forecasting model to the dataset.
2. Forecasts ride demand over the held-out test period, i.e. the final 20% of the hourly observations, which ends on 18 December 2018.
3. Creates an interactive plot with the plotly library showing the forecast alongside the known values (both train and test data).
4. Adds interactivity to the graph by defining range-selector buttons and a range slider, which let the user choose the time window shown (last day, last week, last month, and so on).
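
For reference, the code that follows passes `order=(1, 1, 1)` and `seasonal_order=(0, 1, 1, 24)` to `SARIMAX`. Under the usual backshift notation (the symbols below are introduced here for illustration and do not appear in the notebook), that specification corresponds to:

$$
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{24})\, y_t \;=\; (1 + \theta_1 B)\,(1 + \Theta_1 B^{24})\, \varepsilon_t
$$

Here $y_t$ is the hourly ride count, $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi_1$ and $\theta_1$ are the non-seasonal AR and MA coefficients, $\Theta_1$ is the seasonal MA coefficient, and $\varepsilon_t$ is white noise. The factor $(1 - B)$ is the first difference and $(1 - B^{24})$ is the seasonal difference at a 24-hour period, which matches the daily cycle of hourly data.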

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Split into training and test sets
train_size = int(len(grouped) * 0.8)
train, test = grouped[:train_size], grouped[train_size:]

# Fit SARIMA model
model = SARIMAX(train, order=(1, 1, 1), seasonal_order=(0, 1, 1, 24))
model_fit = model.fit()

# Forecast over the test period (positions len(train) to len(grouped) - 1)
start_index = len(train)
end_index = len(train) + len(test) - 1
forecast = model_fit.predict(start=start_index, end=end_index)

# Evaluate model on the held-out test set
mse = mean_squared_error(test, forecast)
rmse = np.sqrt(mse)
print('Test RMSE: %.3f' % rmse)

# Create interactive plot
fig = make_subplots(rows=1, cols=1, shared_xaxes=True)

fig.add_trace(go.Scatter(x=grouped.index, y=grouped['total_rides'], name='Actual'), row=1, col=1)
fig.add_trace(go.Scatter(x=test.index, y=forecast, name='Forecast'), row=1, col=1)

fig.update_layout(title='Ride Demand Forecast',
                  xaxis_title='Timestamp',
                  yaxis_title='Ride Demand',
                  height=400)

# Range-selector buttons zoom the x-axis to a recent time window;
# the range slider below the plot lets the user drag the visible window.
fig.update_layout(xaxis=dict(
    rangeselector=dict(buttons=[
        dict(count=1, label="1d", step="day", stepmode="backward"),
        dict(count=7, label="1w", step="day", stepmode="backward"),
        dict(count=1, label="1m", step="month", stepmode="backward"),
        dict(count=6, label="6m", step="month", stepmode="backward"),
        dict(count=1, label="YTD", step="year", stepmode="todate"),
        dict(count=1, label="1y", step="year", stepmode="backward"),
        dict(step="all"),
    ]),
    rangeslider=dict(visible=True),
    type="date"))

fig.show()



A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.

Test RMSE: 240.850
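
RMSE is expressed in the same units as the series, here rides per hour. The sketch below recomputes it by hand with NumPy on hypothetical numbers (not the notebook's data) to show what the metric measures.

```python
import numpy as np

# Hypothetical actual and forecast values, for illustration only
actual = np.array([100.0, 120.0, 90.0, 110.0])
predicted = np.array([110.0, 115.0, 95.0, 100.0])

errors = actual - predicted      # [-10, 5, -5, 10]
mse = np.mean(errors ** 2)       # (100 + 25 + 25 + 100) / 4 = 62.5
rmse = np.sqrt(mse)              # square root puts the error back in ride units
print('RMSE: %.3f' % rmse)       # RMSE: 7.906
```

Because the errors are squared before averaging, a few large misses (for example at demand peaks) weigh more heavily than many small ones.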



No supported index is available. Prediction results will be given with an integer index beginning at `start`.
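
These warnings appear because the `DatetimeIndex` carries no frequency, which is also why the forecast comes back with an integer index. One possible remedy, sketched here on hypothetical toy data, is to put the series on a regular hourly grid before fitting. How to fill any missing hours is an assumption that depends on the data; the zero fill below is only one option.

```python
import pandas as pd

# Hypothetical hourly counts with one missing hour (05:00), for illustration only
idx = pd.to_datetime([
    "2018-11-26 03:00", "2018-11-26 04:00",
    "2018-11-26 06:00", "2018-11-26 07:00",
])
toy = pd.DataFrame({"total_rides": [77.0, 390.0, 1462.0, 925.0]}, index=idx)
print(toy.index.freq)  # None: no frequency is attached

# List the hours that are absent from the index
full_range = pd.date_range(toy.index.min(), toy.index.max(), freq="h")
missing = full_range.difference(toy.index)
print(missing)

# asfreq inserts the missing hours as NaN and attaches an hourly frequency
regular = toy.asfreq("h")

# Assumption: treat an hour with no recorded rides as zero demand
regular["total_rides"] = regular["total_rides"].fillna(0.0)
print(regular.index.freq)
```

If gaps reflect missing records rather than zero demand, interpolation may be a better fit than a zero fill.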



### The Interactive Graph

The interactive graph generated by the code appears above.
Users can view the data over different time windows (1 day, 1 week, 1 month, and so on), though some of these options add little here because the dataset used in the project spans only about three weeks (26 November to 18 December 2018).
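
If an actual roll-up is wanted (for example, one point per day instead of one per hour), the hourly series could be resampled before plotting. This is a minimal sketch on hypothetical data and is not part of the notebook's plotting code.

```python
import pandas as pd

# Hypothetical hourly demand over two days (48 points), for illustration only
idx = pd.date_range("2018-11-26 00:00", periods=48, freq="h")
hourly = pd.Series(range(48), index=idx, name="total_rides", dtype=float)

# Roll up: total rides per calendar day
daily = hourly.resample("D").sum()

# Drill down: the 24 hourly rows of a single day
one_day = hourly.loc["2018-11-27"]
print(daily)
```

A resampled series like `daily` could be added to the figure as its own trace, so that the daily and hourly views can be compared side by side.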

The Y-axis shows the ride demand at a given time, and the X-axis shows the timestamp.

Hovering over a plotted line shows the timestamp and ride demand at that point.

For convenience, a small window (the range slider) at the bottom of the graph can be used to adjust the time window over which the data is displayed.

Hovering over the top-right corner of the graph reveals a set of self-explanatory icons, with options to download the graph, zoom in or out, and reset the graph to its original state.

This graph aims to give decision makers a comprehensive overview of the data and to answer as many questions as possible pictorially.