<center>
    
# Exponential Smoothing to Predict Call Volume in a Call Center
    
</center>

This notebook uses one year of call logs (2019) from a call center. The call center managers want to measure performance against certain metrics, and they want to raise those metrics by improving the Workforce Management (WFM) process of scheduling and staffing agents. A machine learning approach is proposed as follows:

1. Predict the daily call volume
2. Split the daily call volume by hour

An Exponential Smoothing algorithm is used to make the predictions.

**Importing libraries and data**

In [1]:
from utils import *

import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (18,6)

In [2]:
import warnings
warnings.filterwarnings("ignore")
import logging
logging.disable(logging.CRITICAL)

**Data Preparation**

1. The files contain monthly call center logs. Calls are registered as soon as they arrive at the call center. Aggregating by second or by minute is not useful, since call center theory suggests that the daily behavior of a call center follows a Poisson model whose rate varies by hour.

2. Call center modeling theory also holds that call volume and arrival rates depend on the day of the week. The present analysis uses the day of the month instead, smoothing the usual weekly seasonality into a monthly one. In this way, missing-data imputation is less prone to bias.

3. A target variable `calls_volume` is created to record the hourly call volume. The hourly numerical values for abandoned calls, `prequeue`, `inqueue`, `agent_time`, `postqueue`, `total_time` and the global SLA achieved are used as covariates (a covariate can be thought of as a regressor).
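The hourly aggregation described in the list above can be sketched with plain pandas. This is only an illustration on synthetic timestamps: the column names `arrival` and `call_id` are hypothetical, and the notebook's own `get_yearly_frame` helper may build the frame differently.

```python
import numpy as np
import pandas as pd

# Synthetic call arrivals over two days (hypothetical data, not the real logs)
rng = np.random.default_rng(0)
start = pd.Timestamp("2019-01-01")
offsets = np.sort(rng.integers(0, 48 * 3600, size=500))  # seconds since start
calls = pd.DataFrame({
    "arrival": start + pd.to_timedelta(offsets, unit="s"),
    "call_id": np.arange(500),
})

# Count the calls that arrived in each hour; empty hours get a count of 0
calls_volume = (
    calls.set_index("arrival")["call_id"]
         .resample("h")
         .count()
         .rename("calls_volume")
)
print(calls_volume.head())
```

Counting per hour keeps the series at the granularity where the arrival rate is assumed to change, while still smoothing out second-level noise.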

In [None]:
# Load the yearly dataframe for prediction
yearly = get_yearly_frame()
yearly.sample(5, random_state=42)

## **Using Exponential Smoothing**

Consider the target variable `calls_volume` with one year of historical data. The frequency of this variable is hourly (`H`). The exponential smoothing algorithm does not use covariates, so for these predictions the feature variables `prequeue`, `inqueue`, `agent_time`, and `postqueue` are left out.
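To see why no covariates are involved, it may help to look at the simplest form of exponential smoothing, where each new level is a weighted average of the latest observation and the previous level. The function below is a minimal sketch written for this explanation, not the model used in this notebook: the Darts `ExponentialSmoothing` model relies on a Holt-Winters implementation that also handles trend and seasonality.

```python
import numpy as np

def simple_exponential_smoothing(y, alpha):
    """Return the smoothed level after each observation (illustrative helper)."""
    level = np.empty(len(y), dtype=float)
    level[0] = y[0]  # initialise the level with the first observation
    for t in range(1, len(y)):
        # New level = alpha * latest value + (1 - alpha) * previous level
        level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
    return level

y = np.array([10.0, 12.0, 11.0, 15.0])
level = simple_exponential_smoothing(y, alpha=0.5)
print(level)      # smoothed levels
print(level[-1])  # one-step-ahead forecast is the last level
```

Only the history of the series itself enters the recursion, which is why the feature columns can be dropped for this model.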

In [None]:
from darts import TimeSeries
from darts.models import ExponentialSmoothing

from darts.metrics import coefficient_of_variation, mae

Using the Darts library, create the time series for the `calls_volume` target variable.

In [None]:
# Get the dataframe for the year, correctly formatted
df = get_formatted_time_series_frame_from(data=yearly, year="2019")

# Split the dataframe into the target variable and the features
variable = df['calls_volume']
features = df[['prequeue', 'inqueue', 'agent_time', 'postqueue']] # 'total_time', 'abandon', 'sla'

# Get the darts Time Series objects
ts_variable = TimeSeries.from_series(variable,
                                     freq='H')
# ts_features = TimeSeries.from_dataframe(features, 
#                                         freq='H')

### **Use of Exponential Smoothing to forecast hourly calls volume - Self historical test**

As described above, the weekly seasonal behavior requires the model to look back one full week to predict accurately. This is the reason for choosing a seasonal period of $24\times 7=168$ hourly observations.
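One hedged way to check that a weekly period is reasonable is to compare the autocorrelation at a lag of one day with the autocorrelation at a lag of one week. The series below is synthetic (a daily shape scaled down on weekends), so the numbers only illustrate the idea and say nothing about the real call logs.

```python
import numpy as np
import pandas as pd

# Eight weeks of synthetic hourly volume (hypothetical data)
hours = pd.date_range("2019-01-01", periods=24 * 7 * 8, freq="h")
hour_of_day = hours.hour.to_numpy()
day_of_week = hours.dayofweek.to_numpy()

daily_shape = np.sin(np.pi * hour_of_day / 24) ** 2   # peak around midday
weekday_scale = np.where(day_of_week < 5, 1.0, 0.3)   # quieter weekends
rng = np.random.default_rng(1)
volume = pd.Series(
    100 * daily_shape * weekday_scale + rng.normal(0, 2, len(hours)),
    index=hours,
)

seasonal_periods = 24 * 7  # one week of hourly observations
acf_day = volume.autocorr(lag=24)
acf_week = volume.autocorr(lag=seasonal_periods)
print(f"lag 24: {acf_day:.3f}, lag {seasonal_periods}: {acf_week:.3f}")
```

When weekdays and weekends differ, the weekly lag lines up like with like, so its autocorrelation is higher than the daily one.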

In [None]:
model_exponentialsmoothing = ExponentialSmoothing(seasonal_periods=168)

Train the model on `ts_variable`.

In [None]:
model_exponentialsmoothing.fit(series=ts_variable)

Instead of making new predictions with the model, use it to generate a historical forecast. In this way, the performance of the algorithm is measured against known historical data (the code below uses the last 1000 hourly observations), which indicates how well the results are likely to generalize.
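The idea behind a historical forecast can be sketched as an expanding-window loop: at each origin, forecast a few steps ahead using only the data before that origin, and keep the last forecast point. The helpers below (`rolling_forecast`, `seasonal_naive`) are hypothetical stand-ins written for this sketch. They are not part of Darts, and the library's own procedure may differ in its details.

```python
import numpy as np

def seasonal_naive(history, horizon, period=24):
    """Forecast by repeating the values seen one period earlier.

    Assumes horizon <= period (illustrative helper).
    """
    n = len(history)
    return np.array([history[n - period + h] for h in range(horizon)])

def rolling_forecast(y, start, horizon, forecast_fn):
    """Expanding-window backtest keeping the last point of each forecast."""
    preds = {}
    for origin in range(start, len(y) - horizon + 1):
        history = y[:origin]                     # only data before the origin
        forecast = forecast_fn(history, horizon)
        preds[origin + horizon - 1] = forecast[-1]
    return preds

# A perfectly periodic toy series: ten identical 24-hour days
y = np.tile(np.arange(24.0), 10)
preds = rolling_forecast(y, start=48, horizon=7, forecast_fn=seasonal_naive)
print(len(preds), "forecast points")
```

Because every forecast only sees data before its origin, the resulting errors approximate what would have been observed had the model been used in production over that period.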

In [None]:
pred_series = model_exponentialsmoothing.historical_forecasts(series=ts_variable[-1000:], #[-336:],
                                                              forecast_horizon=7,
                                                              stride=1,
                                                              verbose=True)

**Visualizing the historical prediction**

Plot the last week of the actual data against the historical forecast. It shows an accuracy of around 63 % according to the R-RMSE metric.

Exponential smoothing is a proven method that is widely used by managers. Its high performance in terms of accuracy makes it the natural choice of forecasting algorithm.
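The notebook does not show how R-RMSE is computed. A common reading, assumed here, is the root mean squared error divided by the mean of the actual values and expressed as a percentage, which matches the definition of the `coefficient_of_variation` metric imported from Darts above. The function name `relative_rmse` is introduced only for this sketch, and how the notebook converts this error into an accuracy figure is not stated.

```python
import numpy as np

def relative_rmse(actual, predicted):
    """RMSE as a percentage of the mean actual value (assumed definition)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return 100.0 * rmse / np.mean(actual)

# Toy values: every prediction is off by exactly 10 calls
actual = np.array([100.0, 120.0, 80.0, 100.0])
predicted = np.array([110.0, 110.0, 90.0, 90.0])
score = relative_rmse(actual, predicted)
print(f"R-RMSE: {score:.1f} %")
```

Normalising by the mean makes the error comparable across series with different call volumes, for example hourly versus per-shift totals.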

In [None]:
plot_prediction_and_test(target=ts_variable[-168:],
                         prediction=pred_series[-168:]);

### **Use of Exponential Smoothing to forecast hourly calls volume - Testing with features**

Predictions made with exponential smoothing depend heavily on the most recent data. Since one week is the recommended seasonality from call center management theory, consider 1 % of the data as the test set (for a full year of hourly data, this is roughly 88 hours).

In [None]:
# Split the dataset into train / test
train_variable, test_variable = ts_variable.split_after(0.99)
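To get a feel for how much data a 99 % / 1 % chronological split leaves for testing, the arithmetic can be reproduced with plain pandas. This sketch assumes a complete year of hourly observations, and the exact boundary chosen by `split_after` in Darts may differ by one sample.

```python
import pandas as pd

# One full year of hourly timestamps (2019 has 365 days, so 8760 hours)
hours = pd.date_range("2019-01-01", "2019-12-31 23:00", freq="h")
series = pd.Series(range(len(hours)), index=hours)

# Chronological split: first 99 % for training, the remainder for testing
cut = int(len(series) * 0.99)
train, test = series.iloc[:cut], series.iloc[cut:]

print(len(train), "training hours")
print(len(test), "test hours, about", round(len(test) / 24, 1), "days")
```

Under this assumption the test window covers less than four days, so it is shorter than the one-week seasonal period used by the model.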

Use 168 periods (that is, 1 week) as the seasonal period for the model.

In [None]:
test_model = ExponentialSmoothing(seasonal_periods=168)

In [None]:
# Train the model on the corresponding training time series
test_model.fit(series=train_variable)

In [None]:
prediction = test_model.predict(n=len(test_variable))

**Visualizing the test prediction**

In [None]:
plot_prediction_and_test(target=test_variable, 
                         prediction=prediction);

### **Resampling the prediction**

Staff scheduling in the call center improves when the data is aggregated into 8-hour shifts. Just by resampling the prediction and test variables, the R-RMSE improves by almost 5 %.

In [None]:
r_train_variable = train_variable.resample(freq='8h',
                                           method='pad')
r_test_variable = test_variable.resample(freq='8h', 
                                         method='pad')
r_prediction = prediction.resample(freq='8h',
                                   method='pad')
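When down-sampling to 8-hour shifts, it is worth checking which value ends up in each bin. The pandas sketch below contrasts summing the hourly counts within a shift with keeping only the first hourly value of the shift. It uses toy data and plain pandas rather than Darts, so whether `method='pad'` behaves like one or the other should be verified against the Darts documentation for the installed version.

```python
import pandas as pd

# One day of toy hourly counts: 1, 2, ..., 24 calls per hour
hours = pd.date_range("2019-01-01", periods=24, freq="h")
hourly = pd.Series(range(1, 25), index=hours, dtype=float)

shift_total = hourly.resample("8h").sum()    # all calls within each shift
shift_first = hourly.resample("8h").first()  # only the first hour of each shift

print(shift_total.tolist())
print(shift_first.tolist())
```

For staffing purposes the per-shift total is usually the quantity of interest, so the two aggregations can lead to very different schedules.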

In [None]:
plot_prediction_and_test(target=r_test_variable,
                         prediction=r_prediction);

**Plotting the resampled series with train set included**

In [None]:
plot_predict(train_target=r_train_variable[-96:],
             test_target=r_test_variable,
             prediction=r_prediction,
             low_percentile=0.05);