# **Advanced Analytics Showcase - Advanced Time Series Forecasting**

![time_series_meme](images/time_series_meme.jpg)

## Learning Objectives

* Explain the key concepts of time series and the most common forecasting algorithm - SARIMA
* Recognize common use cases and potential business impact of time series forecasting
* Become familiar with key forecasting vocabulary and understand the language of a Data Analyst/Data Scientist
* Appreciate the advantages of ARIMA over Excel-based forecasting

![](images/intro_meme.jpg)


## Introduction to Python Jupyter Notebook

**What is a Jupyter Notebook**

What you are working with right now is a "Python Jupyter Notebook".

- Python Jupyter Notebook is a popular and powerful tool used by data scientists and analysts to conduct analytics projects.
- It allows users to create and share documents containing live code, equations, visualizations, and text.
- Jupyter Notebook provides an interactive environment where data scientists can explore data, build models, and communicate their findings effectively.
- The flexibility of Jupyter Notebook makes it easy to experiment with different data analysis techniques and share the results with stakeholders.

**How to run a Jupyter Notebook?**

- Click on a code cell to edit its contents.
- To execute the code, select the code cell and press "Shift" + "Enter" keys on the keyboard.
- Wait for the code to execute, which might take some time for longer cells.
- Once executed, the output will be displayed below the code cell.

Practice on the cell below

In [None]:
PRACTICE_TEXT = "REPLACE WITH YOUR NAME"
print("Hi {}! Welcome to the forecasting showcase session. You have learnt how to execute a Jupyter code cell!".format(PRACTICE_TEXT))

## Introduction to Time Series Forecasting

**Time series is anything observed sequentially over time.**
- Most commonly, time series data is observed at equally spaced successive intervals of time.
- The measurements in a time series are arranged in chronological order.
- A time series containing records of a single variable is termed univariate; one containing more than one variable is termed multivariate.

**Time Series Forecasting is a method of predicting future values of a variable based on its past values.**
- It involves analyzing patterns and trends in time series data to make predictions about future values.
- Common methods used for time series forecasting include exponential smoothing, ARIMA, neural networks, and regression analysis.
    - Excel's Forecast Sheet function uses an exponential smoothing algorithm.
- Time series forecasting is used in many industries to make informed decisions about resource allocation, production planning, and other business operations.
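
To make the exponential smoothing idea concrete, here is a minimal, dependency-free sketch of simple exponential smoothing, the most basic member of that family. It is an illustration only: Excel's Forecast Sheet uses a more elaborate variant that also handles trend and seasonality, and the function name, the `alpha` value, and the sales figures below are assumptions made up for this example.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_(t-1)
    The first level is initialised to the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [values[0]]
    for y in values[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels


# Hypothetical daily sales figures, for illustration only
sales = [100, 110, 105, 120, 115]
levels = simple_exponential_smoothing(sales, alpha=0.5)

# With simple exponential smoothing, the one-step-ahead forecast
# is simply the last smoothed level.
next_day_forecast = levels[-1]

print(levels)             # [100, 105.0, 105.0, 112.5, 113.75]
print(next_day_forecast)  # 113.75
```

A larger `alpha` makes the forecast react faster to recent observations, while a smaller `alpha` gives a smoother, slower-moving forecast.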


<img src="images/time-series-forecasting-example.png" alt="alt text">

## Possible Applications

Time series forecasting is most useful when time plays the most crucial role in determining the value of the variable of interest and few or no external variables affect that variable. Every time series forecast deals with data recorded successively over time.

- Sales forecasting for retail
- Inventory management forecasting
- Energy demand forecasting
- Traffic volume forecasting
- Workforce requirement forecasting
- Temperature forecasting
- Crop yield forecasting
- Stock price forecasting
- Customer demand forecasting
- Website traffic forecasting
- Disease outbreak forecasting
- Real estate market forecasting
- Airline passenger volume forecasting
- Mobile app usage forecasting
- Social media engagement forecasting
- Hotel occupancy forecasting
- Tourist arrivals forecasting

<table>
  <tr>
    <td><img src="images/discussion.png" height="150" width="150" /></td>
    <td>
      <p style="font-size: 18px;">Discussion: </p><br>
      <p style="font-size: 18px;">
        Are any of these use cases relevant to your company? Are there other relevant use cases that are not in this list? Can you use Excel Forecast Sheet to solve them?
      </p>
    </td>
  </tr>
</table>


## Advantages of Advanced Time Series Forecasting over Excel Forecasting

**Recap: Demand forecasting introduced in M1:**

<br>

<img src = "images/time_series_example.png">

**Challenge: Complex demand patterns**
- The demand pattern is very complex. Excel Forecast Sheet offers limited flexibility to customize for trend, seasonality, and cyclical patterns.
- You can fine-tune the parameters of an advanced forecasting model, or use different types of algorithms, to account for the different patterns in the data.

**Challenge: Include business context in the model**
- Multiple business factors could impact cement demand: promotions, holidays, competitor activities, etc. Excel Forecast Sheet cannot consume this information.
- Some forecasting algorithms, e.g., ARIMAX, can take in both the time series and external variables to make predictions.

**Challenge: Accurate forecasting results**
- Due to the limitations in model flexibility and input data, the accuracy of Excel Forecast Sheet is often less satisfactory.
- Many advanced forecasting algorithms are designed to handle different types of time series data, which can lead to much better performance.

The table below summarises the advantages of advanced time series forecasting over Excel forecasting:

|                                                                                | Advanced Forecasting | Excel Forecast Sheet |
|--------------------------------------------------------------------------------|----------------------|----------------------|
| Flexibility to be customized to account for different types of time series data | Yes                  | No                   |
| Ability to include business context and external variables                     | Yes                  | No                   |
| More accurate forecasting results                                              | Yes                  | No                   |
| Ability to handle outliers                                                     | Yes                  | No                   |
| Ability to handle large and complex datasets                                   | Yes                  | No                   |


## Key Concepts in Time Series Forecasting

### Components of Time Series
The forces that affect the values of the observations in a time series are called its components. There are four categories of components:
- Trend: a long-term increase or decrease in the data. It can be linear or non-linear.
- Seasonality: regular and predictable changes that repeat after a fixed period.
- Cyclic: rises and falls in the data that do not have a fixed period.
- Irregularity/Residuals: the part that remains after removing all of the above components from the time series.
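
One way to build intuition for these components is to construct a synthetic series from them. The sketch below assumes an additive model (observed = trend + seasonality + residual) with a weekly period. All numbers are invented for illustration, and the cyclic component is left out to keep the example short.

```python
import math
import random

random.seed(42)  # fixed seed so the illustration is reproducible

n_days = 28
period = 7  # assumed weekly seasonality

# Trend: a steady long-term increase
trend = [100 + 0.5 * t for t in range(n_days)]

# Seasonality: a pattern that repeats exactly every `period` days
seasonality = [10 * math.sin(2 * math.pi * t / period) for t in range(n_days)]

# Residual: small random noise that the other components do not explain
residual = [random.gauss(0, 1) for _ in range(n_days)]

# Additive model: the observed series is the sum of its components
observed = [tr + s + r for tr, s, r in zip(trend, seasonality, residual)]

print([round(v, 1) for v in observed[:7]])
```

Decomposition, covered later in this notebook, works in the opposite direction: it starts from the observed series and estimates the components.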

![](images/timeseries-components.PNG) <br>

### Evaluation metrics

- __Mean Absolute Error (MAE)__: The mean of |actual - predicted|. This metric is easy to interpret because it has the same unit of measurement as the original series. Range: [0, +inf)
- __Mean Squared Error (MSE)__: The mean of (actual - predicted)^2. Squaring the errors gives a higher penalty to large deviations. Range: [0, +inf)
- __Root Mean Squared Error (RMSE)__: The square root of the MSE, which brings the metric back to the same unit as the original series. It equals the standard deviation of the residuals when their mean is zero. Range: [0, +inf)
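
The three metrics can be computed in a few lines. The following is a minimal sketch using only the Python standard library; the function names and the `actual`/`predicted` values are made up for illustration and are not part of the helper library used later in this notebook.

```python
import math


def mean_absolute_error(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, in the same unit as the original series."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical values: the errors are -10, 5, 10 and 0
actual = [100, 120, 130, 110]
predicted = [110, 115, 120, 110]

print(mean_absolute_error(actual, predicted))      # 6.25
print(mean_squared_error(actual, predicted))       # 56.25
print(root_mean_squared_error(actual, predicted))  # 7.5
```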

**Use of various metrics**

- RMSE penalizes large errors more heavily, so it can be more appropriate in some cases, for example, if being off by 10 is more than twice as bad as being off by 5. But if being off by 10 is just twice as bad as being off by 5, then MAE is more appropriate.
- From an interpretation standpoint, MAE is clearly the winner. RMSE does not describe the average error alone and has other implications that are more difficult to tease out and understand.
- On the other hand, one distinct advantage of RMSE over MAE is that RMSE avoids taking the absolute value, which is undesirable in many mathematical calculations.
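
The difference in how the two metrics treat large errors can be seen with a small made-up example. Both sets of forecast errors below have the same MAE, but the set with one large miss has twice the RMSE. The error values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def mae(errors):
    """Mean absolute error computed directly from a list of errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error computed directly from a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


even_errors = [5, 5, 5, 5]    # always off by a moderate amount
spiky_errors = [0, 0, 0, 20]  # usually exact, with one large miss

print(mae(even_errors), mae(spiky_errors))    # 5.0 5.0
print(rmse(even_errors), rmse(spiky_errors))  # 5.0 10.0
```

If one large miss is much more costly to the business than several moderate ones, RMSE reflects that cost better. If all errors cost the same per unit, MAE is the more natural choice.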

<table><tr>
<td><img src = "images/mae.jpeg" height = "300" width = "300"/></td>
<td><img src = "images/mse.jpeg" height = "300" width = "300"/></td>
</tr></table>


## Forecasting Methods

### 4 Stages of Maturity in Forecasting

There are various forecasting methods available, ranging from basic statistical aggregation to complex neural networks, covering the entire spectrum of maturity. Although many small and medium-sized enterprises (SMEs) typically use simple methods at the basic level, such as Moving Averages or Excel Forecast Sheet, your company could obtain significant advantages by adopting intermediate or even advanced forecasting methods, which are not overly complicated to implement. <br>

![](images/stages_of_forecasting.png)

<br>

### Intermediate Stage Methods

<br>

| Methods |                                       Description                                       |                                                                Potential Use Cases                                                                |
|:-------:|:---------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------:|
| ARIMA   | Models patterns and trends in a single time series using its own past values and past forecast errors | Financial forecasting, sales forecasting, demand forecasting, predicting traffic volume, forecasting energy consumption                           |
| ARIMAX  | ARIMA that also takes into account outside factors that may affect the outcome          | Economic forecasting, stock market analysis, marketing campaign optimization, predicting customer churn, weather forecasting                      |
| SARIMA  | ARIMA that also accounts for regular patterns that repeat every season                  | Retail sales forecasting, airline passenger forecasting, hotel room occupancy forecasting, predicting electricity demand during different seasons |
| SARIMAX | Takes into account both outside factors and seasonal patterns that may affect the outcome | Retail sales forecasting with external factors such as promotions, predicting electricity demand with weather and holiday as exogenous variables  |

<br>

In summary:
- Choose ARIMA models when you are working with univariate time series data and do not have any known external factors that may impact the time series.
- Use ARIMAX models when you have external variables that can be used to improve the forecasting accuracy of the time series.
- Use SARIMA and SARIMAX models when your data exhibit seasonal patterns that need to be captured to make accurate forecasts.
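
The selection rule above can be written down as a tiny helper. This is only a rule-of-thumb sketch: the function name and its two boolean inputs are hypothetical, and in practice the choice should also be checked against the model's accuracy on a testing set.

```python
def suggest_model(has_seasonality, has_external_variables):
    """Map the two questions in the summary above to a model family."""
    if has_seasonality and has_external_variables:
        return "SARIMAX"
    if has_seasonality:
        return "SARIMA"
    if has_external_variables:
        return "ARIMAX"
    return "ARIMA"


# Daily sales with a weekly pattern and a public holiday indicator
print(suggest_model(has_seasonality=True, has_external_variables=True))  # SARIMAX
```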

## Apply Forecasting Like a Data Scientist

Now, let's revisit our FFC sales data and apply advanced forecasting methods to this dataset.

You have aggregated the sales transactions into daily intervals and derived date-based variables, e.g., day of week, weekend indicator, and Singapore public holiday indicator.


### Load required libraries

Python is an open-source programming language with many contributors creating libraries for different purposes. These libraries need to be loaded at the beginning of a script. After loading them, we can use the functions in these libraries.

In [None]:
# Note - all these lines in the code block starting with # signs are comments
# Comments provide explanations of the code and also instructions for execution

## Press Shift + Enter to load all the required libraries

!wget https://raw.githubusercontent.com/RISEBCG/DAB/main/M3A3-forecasting/forecasting_helper.py
from forecasting_helper import *

### Load the dataset

Python needs to load the data into memory before consuming or transforming it.

In [None]:
# Load the dataset prepared for forecasting.
# Press Shift + Enter

df = pd.read_csv('https://raw.githubusercontent.com/RISEBCG/DAB/main/M3A3-forecasting/ffc_sales_data.csv', parse_dates=['Date'], index_col=['Date'] )

In [None]:
# Let's take a glance at the data
# .head(n) shows the first n rows of the dataset (5 rows by default)
# Key in df.head(10) and press Shift + Enter to see the first 10 rows



Besides the daily sales, there are other features available in this dataset:
- Weekend: a binary indicator for whether the day is a weekend
- Public Holiday: a binary indicator for whether the day is a public holiday
- month_1 ... month_12: binary indicators for which month the day belongs to
- Weekday_1 ... Weekday_7: binary indicators for which day of the week the day belongs to (1 for Sunday, 7 for Saturday)
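
For readers curious how such indicators can be derived from a date, here is a hedged sketch using only the standard library. It is not the code that produced the dataset: the `calendar_features` function and the one-entry `PUBLIC_HOLIDAYS` set are assumptions for illustration, and a real holiday calendar would contain every public holiday in the period.

```python
from datetime import date

# Hypothetical holiday calendar with a single entry (Singapore National Day 2022)
PUBLIC_HOLIDAYS = {date(2022, 8, 9)}


def calendar_features(day):
    """Build the binary calendar indicators described above for one date."""
    # isoweekday() gives Monday=1 ... Sunday=7; convert to Sunday=1 ... Saturday=7
    weekday_number = day.isoweekday() % 7 + 1
    features = {
        "Weekend": int(weekday_number in (1, 7)),
        "Public Holiday": int(day in PUBLIC_HOLIDAYS),
    }
    for m in range(1, 13):
        features[f"month_{m}"] = int(day.month == m)
    for w in range(1, 8):
        features[f"Weekday_{w}"] = int(weekday_number == w)
    return features


saturday = calendar_features(date(2022, 8, 6))
print(saturday["Weekend"], saturday["Weekday_7"], saturday["month_8"])  # 1 1 1
```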

In [None]:
# Let's visualise this time series by a line chart
# Press Shift + Enter

fig = px.line(df, x=df.index, y='DailySales', title='Daily Sales')
fig.show()

### Time series decomposition

Through statistical methods, a time series can be split into its key components: trend, seasonality, and residuals. Note that the cyclic and seasonal components are combined in this exercise.
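
To show what such a decomposition does under the hood, here is a simplified sketch of a classical additive decomposition based on a centred moving average. It is not necessarily how the `decompose` helper in this notebook is implemented. The function name `moving_average_decompose`, the restriction to an odd period, and the synthetic data are all assumptions made for this illustration.

```python
def moving_average_decompose(values, period):
    """Split a series into trend, seasonal and residual parts (additive model).

    Simplified sketch: supports only an odd `period` so that the centred
    moving average window is symmetric around each point.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only supports an odd period")
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2

    # Trend: centred moving average; undefined (None) near the edges
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half : t + half + 1]
        trend[t] = sum(window) / period

    # Seasonal: average detrended value at each position in the cycle,
    # re-centred so that the seasonal effects sum to zero over one period
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(values[t] - trend[t])
    seasonal_means = [sum(b) / len(b) for b in buckets]
    centre = sum(seasonal_means) / period
    seasonal = [seasonal_means[t % period] - centre for t in range(n)]

    # Residual: whatever trend and seasonality do not explain
    residual = [
        None if trend[t] is None else values[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic series: a linear trend plus a repeating weekly pattern
weekly_pattern = [-3, -2, -1, 0, 1, 2, 3]
series = [50 + t + weekly_pattern[t % 7] for t in range(28)]
trend, seasonal, residual = moving_average_decompose(series, period=7)

print(trend[3:6])    # [53.0, 54.0, 55.0]
print(seasonal[:7])  # [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
```

Because the synthetic series contains no noise, the recovered residuals are zero wherever the trend is defined. Real sales data would leave a non-zero residual.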

In [None]:
# In this step, we decompose the time series into its key components
# Press Shift + Enter

decompose(df['DailySales'])

<table>
  <tr>
    <td><img src="images/discussion.png" height="150" width="150" /></td>
    <td>
      <p style="font-size: 18px;">Discussion: </p><br>
      <p style="font-size: 18px;">
        - Which type of methods are more suitable for FFC - ARIMA or seasonal ARIMA?<br>
        - What insights can you draw from the trend and seasonality components?
      </p>
    </td>
  </tr>
</table>



### Train a SARIMA model

- Based on the seasonality observed from the decomposition, it is recommended to utilize a SARIMA model for handling the seasonality.
- In machine learning, it is crucial to divide the data into training and testing sets. The model is trained on the training set and validated using the testing set. For time series, the split is chronological: the training set comes before the testing set.
- For this particular exercise, the SARIMA model will be trained on all the available data up until July 2022, and the model's performance will be evaluated using data from the remainder of 2022.
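
The idea behind a chronological split can be sketched without any libraries. The example below is not the implementation of `create_training_testing_data`. The `split_by_cutoff` function and the sample records are hypothetical, and whether the cutoff date itself belongs to the training or the testing set is an assumption (here it goes to the testing set).

```python
from datetime import date, timedelta


def split_by_cutoff(records, cutoff):
    """Split (date, value) pairs chronologically.

    Records dated before `cutoff` go to training; the rest go to testing.
    """
    train = [(d, v) for d, v in records if d < cutoff]
    test = [(d, v) for d, v in records if d >= cutoff]
    return train, test


# Hypothetical daily records from 28 June 2022 to 3 July 2022
start = date(2022, 6, 28)
records = [(start + timedelta(days=i), 100 + i) for i in range(6)]

train, test = split_by_cutoff(records, cutoff=date(2022, 7, 1))
print(len(train), len(test))  # 3 3
```

Unlike a random split, this keeps every training observation earlier than every testing observation, so the model is never evaluated on dates that precede its training data.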


In [None]:
# In this step, we will split the time series into training and testing dataset
# The cutoff date is 2022-07-01
# Replace the "CUTOFF_DATE" with "2022-07-01"
# Press Shift + Enter

train, test = create_training_testing_data(df, 'CUTOFF_DATE')

In [None]:
# replace "MODEL_TYPE" with "sarima"
# Press Shift + Enter

sarima = train_forecasting_model(train, type="MODEL_TYPE")

### Evaluate SARIMA

- We will compute Mean Absolute Error (MAE) and Mean Squared Error (MSE) from the testing set
- We will compare the model performance with Excel Forecast Sheet function, using the same training and testing sets

In [None]:
# replace model with sarima, and "MODEL_TYPE" with "sarima"
# Press Shift + Enter

sarima_outputs = evaluate_model_performance(test, model, type="MODEL_TYPE")

<table>
  <tr>
    <td><img src="images/discussion.png" height="150" width="150" /></td>
    <td>
      <p style="font-size: 18px;">Discussion: </p><br>
      <p style="font-size: 18px;">
        - Which forecasting is more accurate? SARIMA or Excel?<br>
        - How are SARIMA forecasts different from Excel forecasts?
      </p>
    </td>
  </tr>
</table>

### Train and Evaluate a SARIMAX model

The SARIMA model has demonstrated superior accuracy compared to Excel Forecast Sheet. However, can we improve the model's performance further? It's worth considering the external variables present in the dataset. By incorporating these variables into a SARIMAX model, we may achieve even better accuracy. Variables such as public holidays and weekends are particularly relevant to the sales context and should be considered.

In [None]:
# replace "MODEL_TYPE" with "sarimax"
# Press Shift + Enter
# This cell might take up to 3 minutes to complete

sarimax = train_forecasting_model(train, type="MODEL_TYPE")

In [None]:
# replace model with sarimax, and "MODEL_TYPE" with "sarimax"
# Press Shift + Enter

sarimax_outputs = evaluate_model_performance(test, model, type="MODEL_TYPE")

In [None]:
# Let's combine all the predictions together
# Press Shift + Enter

combine_viz(sarima_outputs, sarimax_outputs)

<table>
  <tr>
    <td><img src="images/discussion.png" height="150" width="150" /></td>
    <td>
      <p style="font-size: 18px;">Discussion: </p><br>
      <p style="font-size: 18px;">
        - Which forecasting is more accurate? SARIMAX or SARIMA?<br>
        - How are SARIMAX forecasts different from SARIMA forecasts?<br>
        - If you are the business owner of FFC, which model will you trust more?<br>
        - What other important business context variables can be included in the SARIMAX model?
      </p>
    </td>
  </tr>
</table>


## Summary
- Various industries leverage advanced forecasting methods to address diverse business problems.
- By handling complex patterns and external factors, advanced forecasting methods can deliver better accuracy than Excel-based forecasting.
- Valuable insights can be derived by decomposing time series data into its fundamental components.
- ARIMA, ARIMAX, SARIMA, and SARIMAX unlock significant potential value, yet they are readily accessible forecasting methods that are easy to implement.

## Additional Resources

In this showcase, we skipped several technical concepts in time series forecasting. If you are interested in learning more about these concepts, you can refer to the links below:

- [stationarity](https://otexts.com/fpp2/stationarity.html): the property of a time series whose mean and variance are constant over time. It is a common assumption for many forecasting algorithms
- [differencing](https://otexts.com/fpp2/stationarity.html): a technique to make the time series stationary
- [Dickey-Fuller test](https://www.machinelearningplus.com/time-series/augmented-dickey-fuller-test/): a hypothesis test for stationarity
- [auto-correlation](https://otexts.com/fpp2/autocorrelation.html): the degree of correlation between a time series and its past values
- [ACF/PACF plots](https://towardsdatascience.com/interpreting-acf-and-pacf-plots-for-time-series-forecasting-af0d6db4061c): visualisation for auto-correlation and partial auto-correlation
- [p,d,q,P,D,Q,M](https://otexts.com/fpp2/seasonal-arima.html): model parameters of a SARIMA model
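
As a small taste of two of these concepts, the sketch below shows first-order differencing and a lag-1 autocorrelation computed with the standard library only. The function names and the toy series are assumptions for illustration. The linked resources cover the proper statistical treatment.

```python
def difference(values, lag=1):
    """Subtract from each value the value observed `lag` steps earlier."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def autocorrelation(values, lag):
    """Correlation between a series and itself shifted by `lag` steps."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# A toy series with a steady upward trend (not stationary: its mean keeps rising)
trending = [10, 12, 14, 16, 18, 20]

# Differencing removes the trend and leaves a constant series
diffed = difference(trending)
print(diffed)  # [2, 2, 2, 2, 2]

# Neighbouring values of the trending series are positively correlated
print(autocorrelation(trending, lag=1))  # 0.5
```

Using `lag=7` in `difference` on daily data would be one way to remove a weekly seasonal pattern, which is related to the seasonal `D` parameter of a SARIMA model.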

<font style="font-family:Trebuchet MS;">

***

*This marks the end of this lesson*<br><br>

<div style="text-align: center"><font size="8"><font style="font-family:Trebuchet MS;">Happy Forecasting !!!</font></font></div>