# Time Series Forecasting: Classical Approach

## Introduction

This notebook explores two classical techniques, ARIMA (AutoRegressive Integrated Moving Average) and Prophet, to predict the number of cyberattacks a country may face in the following month.


## Table of Contents

1. [Time Series Visualization](#tsv)
2. [Time Series Component Analysis](#tsca)
   - 2.1 [Trend](#trend)
   - 2.2 [Seasonality](#seasonality)
   - 2.3 [Stationarity](#stationarity)




In [None]:
# Required imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
import seaborn as sns
from pmdarima.arima import auto_arima
from prophet import Prophet
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error
from utils import *
# To ignore warnings
import warnings
warnings.filterwarnings("ignore")

## Time Series Visualization
<a id='tsv'></a>
Let us read the data and visualize it as a time series. 

In [None]:
# Read data
df1 = pd.read_csv('../Data/21_november_to_april.csv')
df2 = pd.read_csv('../Data/22_april_to_november.csv')
df3 = pd.read_csv('../Data/22_november_to_april.csv')
df4 = pd.read_csv('../Data/23_april_to_november.csv')

# Concatenate dataframes
df = pd.concat([df1, df2, df3, df4], axis=0, ignore_index=True)

# Delete dataframes
del df1, df2, df3, df4

In [None]:
# Select some countries to analyze
df_Spain = select_country(df, 'Spain')
df_USA = select_country(df, 'United States')
df_Singapore = select_country(df, 'Singapore')
df_Germany = select_country(df, 'Germany')
df_Japan = select_country(df, 'Japan')

In [None]:
daily_count_Spain = visualize_ts(df_Spain)

In [None]:
daily_count_USA = visualize_ts(df_USA)

In [None]:
daily_count_Singapore = visualize_ts(df_Singapore)

In [None]:
daily_count_Germany = visualize_ts(df_Germany)

In [None]:
daily_count_Japan = visualize_ts(df_Japan)

## Time Series Component Analysis
<a id='tsca'></a>
Analyzing the components of a time series before applying classical forecasting models shows which structure those models have to capture. Each component (trend, seasonality, and residuals) carries distinct information, and together they form the basis for effective modeling. The components of a time series are:

**Trend**: A gradual shift or movement to relatively higher or lower values over a long period of time.
 - When the series shows a general upward movement, it is called an uptrend.
 - When the series shows a general downward movement, it is called a downtrend.
 - When there is no trend, it is called a horizontal or stationary trend.

**Seasonality**: Patterns of variation that repeat at specific time intervals. These can be weekly, monthly, yearly, etc. Seasonal changes indicate deviations from the trend in specific directions.

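**Residuals**: What remains once the trend and seasonality are removed. Residuals capture irregular, unusual events in the data, such as a sudden increase in a person's heart rate during exercise. They behave like random errors, and when they are purely random and uncorrelated they are referred to as “white noise.”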
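
As a compact summary of the three components, the two standard decomposition models can be written as follows. The notebook does not state which model its helper functions assume, so both are shown; the symbols $y_t$, $T_t$, $S_t$ and $R_t$ are notation introduced here for illustration.

$$
\begin{aligned}
\text{Additive:} \quad & y_t = T_t + S_t + R_t \\
\text{Multiplicative:} \quad & y_t = T_t \times S_t \times R_t
\end{aligned}
$$

Here $y_t$ is the observed value at time $t$, $T_t$ the trend, $S_t$ the seasonal component, and $R_t$ the residual. The additive form is usually preferred when the size of the seasonal swings does not grow with the level of the series.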

### Trend
<a id='trend'></a>
Techniques used to detect trends:

 - **Visual Inspection**: Plotting the time series data can often reveal the presence of a trend. A clear upward or downward movement over time suggests the presence of a trend component. Visual inspection allows you to observe the overall pattern and identify any deviations or changes in the series.

 - **Moving Averages**: Moving averages are widely used for trend analysis. They help smooth out short-term fluctuations in the data, making it easier to identify the underlying trend. Common types of moving averages include the simple moving average (SMA), weighted moving average (WMA), and exponential moving average (EMA).
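
The three moving averages mentioned above can be sketched with pandas as follows. This is a minimal illustration on synthetic data, not the implementation of the notebook's `trend` helper; the series, the 7-day window, and the linear weights are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Synthetic daily counts (an assumption for illustration, not the notebook's data):
# a linear upward trend plus Gaussian noise.
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", periods=120, freq="D")
counts = pd.Series(50 + 0.5 * np.arange(120) + rng.normal(0, 5, 120), index=dates)

window = 7  # hypothetical window length: one week of daily data

# Simple moving average: unweighted mean of the last `window` observations.
sma = counts.rolling(window=window).mean()

# Weighted moving average: linearly increasing weights, most recent day weighted most.
weights = np.arange(1, window + 1)
wma = counts.rolling(window=window).apply(
    lambda x: np.dot(x, weights) / weights.sum(), raw=True
)

# Exponential moving average: weights decay geometrically into the past.
ema = counts.ewm(span=window, adjust=False).mean()

smoothed = pd.DataFrame({"raw": counts, "sma": sma, "wma": wma, "ema": ema})
print(smoothed.tail())
```

A longer window smooths more aggressively but reacts more slowly to genuine changes in level, so the window length is a trade-off rather than a fixed choice.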

In [None]:
trend(daily_count_Spain)

In [None]:
trend(daily_count_USA)

In [None]:
trend(daily_count_Singapore)

In [None]:
trend(daily_count_Germany)

In [None]:
trend(daily_count_Japan)

None of the time series displays a distinct trend pattern, except for the United States, which shows a slight upward trend.

### Seasonality

<a id='seasonality'></a>

Techniques used to detect seasonality:

- **Autocorrelation Function (ACF) Plot**: The ACF plot shows the correlation between the time series and its lagged values. For a seasonal time series, the ACF plot often exhibits significant spikes at regular intervals, indicating the presence of seasonality.

- **Seasonal Decomposition**: Seasonal decomposition separates a time series into its individual components: trend, seasonality, and residual. A common method is STL (Seasonal and Trend decomposition using Loess). Isolating the seasonal component makes it easier to understand and analyze independently.

#### ACF & PACF

In [None]:
plot_acf_pacf('Spain', daily_count_Spain)

In [None]:
plot_acf_pacf('USA', daily_count_USA)

In [None]:
plot_acf_pacf('Singapore', daily_count_Singapore)

In [None]:
plot_acf_pacf('Germany', daily_count_Germany)

In [None]:
plot_acf_pacf('Japan', daily_count_Japan)

#### Seasonal Decomposition

In [None]:
decomposition_ts(daily_count_Spain)

In [None]:
decomposition_ts(daily_count_USA)

In [None]:
decomposition_ts(daily_count_Singapore)

In [None]:
decomposition_ts(daily_count_Germany)

In [None]:
decomposition_ts(daily_count_Japan)


Based on the ACF/PACF and decomposition plots, it is difficult to confirm a seasonal component in any of the time series. However, the USA appears to exhibit a subtle annual seasonality.

Furthermore, the series contain a noticeable noise component, which complicates prediction: the peaks in cyberattacks are difficult to explain from the time series alone.

### Stationarity
<a id='stationarity'></a>