# University Certificate in Artificial Intelligence (Hands on AI, Third Challenge, 2022-2023, UMONS)
# Introduction to time series analysis and forecasting




In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [20, 5]


## White noise time series



- Generate a time series with 500 observations from a white noise process with zero mean and unit standard deviation.



In [None]:
# Hint: use np.random.normal
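
One possible way to generate the series is sketched below. It uses NumPy's `default_rng` generator rather than the legacy `np.random.normal` call from the hint; the seed and the variable names are arbitrary choices made for this illustration.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(42)

# 500 i.i.d. draws from N(0, 1): zero mean, unit standard deviation.
white_noise = rng.normal(loc=0.0, scale=1.0, size=500)

# The sample mean and standard deviation should be close to 0 and 1.
print(white_noise.mean(), white_noise.std())
```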


- Plot the generated time series.


- Compute and plot the ACF for 50 lags. Did you expect to see such results? Why?

In [None]:
from statsmodels.graphics.tsaplots import plot_acf
# Hint: use plot_acf
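
To see what `plot_acf` is expected to show for white noise, the sample autocorrelations can also be computed by hand. The sketch below is a plain NumPy illustration, not the statsmodels implementation; `sample_acf` is a hypothetical helper. For white noise, roughly 95% of the lags should fall within $\pm 1.96/\sqrt{n}$.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (biased estimator, common denominator)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array(
        [np.sum(xc[k:] * xc[: len(x) - k]) / denom for k in range(nlags + 1)]
    )

rng = np.random.default_rng(0)
wn = rng.normal(size=500)

r = sample_acf(wn, nlags=50)

# Approximate 95% bounds for the autocorrelations of white noise.
bound = 1.96 / np.sqrt(len(wn))
n_outside = int(np.sum(np.abs(r[1:]) > bound))
print(f"lags outside +/-{bound:.3f}: {n_outside} of 50")
```

A handful of lags outside the bounds is compatible with white noise, since about 1 in 20 is expected to exceed them by chance.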



* Perform a Ljung-Box test for the first ten lags.

In [None]:
from statsmodels.stats.diagnostic import acorr_ljungbox
# Hint: use acorr_ljungbox
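
The Ljung-Box statistic behind `acorr_ljungbox` is $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, compared against a $\chi^2_h$ distribution. The sketch below computes it by hand with NumPy and SciPy as an illustration; `ljung_box` is a hypothetical helper, not the statsmodels function.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(x, h):
    """Return the Ljung-Box statistic Q and its p-value for lags 1..h."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    lags = np.arange(1, h + 1)
    r = np.array([np.sum(xc[k:] * xc[: n - k]) / denom for k in lags])
    q = n * (n + 2) * np.sum(r ** 2 / (n - lags))
    return q, chi2.sf(q, df=h)

rng = np.random.default_rng(0)
wn = rng.normal(size=500)

q_wn, p_wn = ljung_box(wn, h=10)             # white noise
q_rw, p_rw = ljung_box(np.cumsum(wn), h=10)  # random walk, strongly autocorrelated
print(q_wn, p_wn)
print(q_rw, p_rw)
```

A small p-value is evidence against the hypothesis that the first `h` autocorrelations are jointly zero.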


## Real-world time series

In [None]:

# Read the data file
DF = pd.read_csv("../data/data_train.csv", parse_dates = True)


DF['Day'] =  pd.to_datetime(DF['Day'], format='%Y-%m-%d')
DF.set_index("Day", inplace=True)
DF = DF.asfreq("D")
print(DF)


In [None]:
print(DF.shape)


- Compute the number of missing values per series.


In [None]:
# Hint: use isna()
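
A minimal sketch on a toy data frame (the column names and values are made up for the illustration): `isna()` returns a boolean frame, and summing it column-wise counts the missing values per series.

```python
import numpy as np
import pandas as pd

# Toy daily data with a few gaps (illustrative values only).
idx = pd.date_range("2022-01-01", periods=6, freq="D")
toy = pd.DataFrame(
    {
        "series-a": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
        "series-b": [np.nan, 2.0, 2.5, 3.0, 3.5, 4.0],
    },
    index=idx,
)

# True counts as 1, so the column-wise sum is the number of missing values.
missing_per_series = toy.isna().sum()
print(missing_per_series)
```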



- Replace the missing values with the method of your choice.


In [None]:
# Hint: you can use fillna()
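
As one possible choice among many, the sketch below interpolates linearly in time and then back-fills any gap left at the start of a series. The toy data are made up; whether interpolation suits the real series is a judgment call.

```python
import numpy as np
import pandas as pd

# Toy daily data with a few gaps (illustrative values only).
idx = pd.date_range("2022-01-01", periods=6, freq="D")
toy = pd.DataFrame(
    {
        "series-a": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
        "series-b": [np.nan, 2.0, 2.5, 3.0, 3.5, 4.0],
    },
    index=idx,
)

# Interior gaps: linear interpolation along the time index.
# Leading gaps have no earlier value to interpolate from, so back-fill them.
filled = toy.interpolate(method="time").bfill()
print(filled)
```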



- Select one series (among "series-001", "series-002", ..., "series-111") and plot it.


In [None]:
# Select one series (among "series-001", "series-002", ..., "series-111") and plot it
my_series = "series-003"
DF[my_series].plot()

In [None]:
# Extract calendar variables from dates (useful for seasonal plots)
DF["d"] = DF.index.day.to_numpy()
DF["m"] = DF.index.month.to_numpy()
DF["y"] = DF.index.year.to_numpy()
DF["w"] = DF.index.weekday.to_numpy()
DF["wy"] = DF.index.isocalendar().week.to_numpy()
DF.head()

- Generate a seasonal plot with the day of the week in the x-axis.

In [None]:
# Seasonal plots (Day of the week)
# Hint: You could generate a data frame with the weekly series and plot it
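
One way to follow the hint is to reshape the series into a table with one row per day of the week and one column per week. The sketch below does this on synthetic data; the `week_start` column is an assumption introduced here to identify each week unambiguously.

```python
import numpy as np
import pandas as pd

# Synthetic daily series over 4 full weeks, with higher values at weekends.
idx = pd.date_range("2022-01-03", periods=28, freq="D")  # 2022-01-03 is a Monday
rng = np.random.default_rng(1)
weekday = idx.weekday.to_numpy()
values = 10 + 3 * (weekday >= 5) + rng.normal(scale=0.1, size=28)

frame = pd.DataFrame(
    {
        "y": values,
        "w": weekday,
        # Monday of the week each date belongs to.
        "week_start": (idx - pd.to_timedelta(weekday, unit="D")).to_numpy(),
    }
)

# Rows: day of the week (0 = Monday); columns: one per week.
weekly = frame.pivot(index="w", columns="week_start", values="y")
print(weekly.round(2))
```

Calling `weekly.plot(legend=False)` on such a table draws one line per week against the day of the week.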





- Plot a histogram for each day of the week.

- Generate a seasonal plot with the day of the month in the x-axis.

In [None]:
# Seasonal plots (Day of the month)



- Produce lagged scatterplots for lags 1, 3 and 7. What do you observe? Add the diagonal for a better visualization.

In [None]:
# Lag plots
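
A lagged scatterplot at lag $k$ shows the pairs $(y_{t-k}, y_t)$. The sketch below builds these pairs with `shift` on a synthetic weekly-seasonal series and reports their correlation; the data and names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic series with a 7-day cycle plus noise.
idx = pd.date_range("2022-01-01", periods=140, freq="D")
rng = np.random.default_rng(2)
y = pd.Series(
    np.sin(2 * np.pi * np.arange(140) / 7) + rng.normal(scale=0.2, size=140),
    index=idx,
)

lag_corr = {}
for k in (1, 3, 7):
    # Align each observation with the value observed k days earlier.
    pairs = pd.DataFrame({"lagged": y.shift(k), "current": y}).dropna()
    lag_corr[k] = pairs["lagged"].corr(pairs["current"])

print(lag_corr)
```

Scattering `pairs["lagged"]` against `pairs["current"]` and overlaying the line $y = x$ gives the plot asked for; points close to the diagonal indicate strong positive dependence at that lag.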




## Autocorrelation
* Plot the autocorrelation function (ACF) for the first 20 lags, and interpret the results. 



* Recompute the ACF after applying a seasonal difference. 


In [None]:
# Hint: np.diff only takes lag-1 differences; for a seasonal (lag-7) difference use x[7:] - x[:-7] or Series.diff(7)
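
A seasonal difference with period $m$ is $y_t - y_{t-m}$. Note that `np.diff(x, n=7)` applies the first difference seven times, which is a different operation. The sketch below assumes a weekly period ($m = 7$) and uses synthetic data.

```python
import numpy as np

period = 7  # assumed weekly seasonality

# Synthetic series: linear trend plus a repeating weekly pattern.
t = np.arange(70)
x = 0.5 * t + np.tile([0, 1, 2, 3, 2, 1, 0], 10).astype(float)

# Seasonal difference: each value minus the value one period earlier.
seasonal_diff = x[period:] - x[:-period]

# The weekly pattern cancels; only the trend increment over 7 steps remains.
print(seasonal_diff[:5])
```

With a pandas series, `series.diff(7)` computes the same quantity and keeps the index, leaving `NaN` in the first seven positions.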


- Compute and print the ACF values for the first 20 lags.

In [None]:
from statsmodels.tsa.stattools import acf
# Hint: use acf


* Perform a Ljung-Box test for the series.

## Transformations

* Apply a Box-Cox transformation with $\lambda = 0.5$, $\lambda = 0.3$ and $\lambda = 0$. Plot the transformed series.

- Find the best value of $\lambda$ in the Box-Cox transformation, and plot the transformed series.

In [None]:
from scipy.stats import boxcox
eps = 0.0001
x = DF[my_series] + eps
# Hint: use boxcox on the x variable.
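
A short sketch of both uses of `scipy.stats.boxcox` on synthetic positive data: with `lmbda` given it returns the transformed array, and without it it also returns the value of $\lambda$ that maximizes the log-likelihood. The lognormal sample is an assumption made for the illustration.

```python
import numpy as np
from scipy.stats import boxcox

# Synthetic strictly positive data (Box-Cox requires x > 0).
rng = np.random.default_rng(3)
x = rng.lognormal(mean=2.0, sigma=0.5, size=300)

# Fixed lambdas: lambda = 0 corresponds to the natural logarithm.
transformed = {lam: boxcox(x, lmbda=lam) for lam in (0.5, 0.3, 0.0)}

# No lambda given: boxcox estimates it by maximum likelihood.
x_best, lam_best = boxcox(x)
print(f"estimated lambda: {lam_best:.3f}")
```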


## Time series decomposition 


* Decompose the time series into trend, seasonal and remainder components. Plot the different components. Does this help you understand the data?

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose
# Hint: use seasonal_decompose
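
To make the idea behind a classical additive decomposition concrete, the sketch below builds one by hand with pandas: a centered 7-day moving average for the trend, weekday averages of the detrended series for the seasonal component, and whatever is left as the remainder. It is a simplified illustration on synthetic data, assuming a weekly period, and is not the `seasonal_decompose` implementation.

```python
import numpy as np
import pandas as pd

period = 7  # assumed weekly seasonality

# Synthetic series: linear trend + zero-mean weekly pattern + noise.
idx = pd.date_range("2022-01-03", periods=84, freq="D")  # starts on a Monday
rng = np.random.default_rng(4)
weekly_pattern = np.array([0.0, 0.5, 1.0, 0.5, 0.0, -1.0, -1.0])
y = pd.Series(
    0.1 * np.arange(84) + np.tile(weekly_pattern, 12) + rng.normal(scale=0.05, size=84),
    index=idx,
)
weekday = y.index.weekday.to_numpy()

# Trend: centered moving average over one full period.
trend = y.rolling(window=period, center=True).mean()

# Seasonal: average detrended value per weekday, centered to sum to zero.
detrended = y - trend
seasonal_means = detrended.groupby(weekday).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = pd.Series(seasonal_means.to_numpy()[weekday], index=y.index)

# Remainder: what trend and seasonality do not explain.
remainder = y - trend - seasonal
print(seasonal_means.round(2))
```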



* Use the STL decomposition algorithm to decompose the series into trend, seasonal and remainder components.

In [None]:
from statsmodels.tsa.seasonal import STL
# Hint: Use STL with period = 7







- Plot the deseasonalized series, i.e. $z_t = y_t - s_t$.

In [None]:
# Deseasonalized data





## Forecasting

Split the time series into a training set and a test set, where the test set consists of the last 21 observations.

In [None]:
series = DF[my_series]

n_test = 21
series_train = series[:-n_test]
series_test = series[-n_test:]

plt.plot(series_train)
plt.plot(series_test)

* Compute the in-sample one-step-ahead predictions for simple forecasting methods (mean, naive, and seasonal naive).

In [None]:
## Mean forecasts



In [None]:

## Naive forecasts



In [None]:

## Seasonal naive forecasts
# For the first week, you can compute a naive forecast (non-seasonal)
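
The three sets of in-sample predictions can be sketched with `shift` on synthetic data, as below. Two assumptions are made here: the mean method uses the mean of the whole series as its fitted value at every time step, and the seasonal period is 7. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

period = 7  # assumed weekly seasonality

# Synthetic series with a weekly pattern.
idx = pd.date_range("2022-01-03", periods=56, freq="D")
rng = np.random.default_rng(5)
y = pd.Series(
    20 + np.tile([0, 1, 2, 1, 0, -2, -2], 8) + rng.normal(scale=0.3, size=56),
    index=idx,
)

# Mean method: the same fitted value everywhere.
fit_mean = pd.Series(y.mean(), index=y.index)

# Naive method: the previous observation.
fit_naive = y.shift(1)

# Seasonal naive: the observation one period earlier,
# falling back to the naive forecast during the first week.
fit_snaive = y.shift(period).fillna(fit_naive)

residuals = pd.DataFrame(
    {"mean": y - fit_mean, "naive": y - fit_naive, "snaive": y - fit_snaive}
)
print(residuals.describe().round(3))
```

The very first observation has no naive or seasonal naive prediction, so its residual is `NaN` for those two methods.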



* Plot a histogram of residuals for the three methods.

In [None]:
# 


* Compute the bias for each method. Which method has the highest bias?


In [None]:
#


* Compute the mean squared error (MSE).


In [None]:
#
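
With residuals $e_t = y_t - \hat{y}_t$, the bias is the mean of $e_t$ and the MSE is the mean of $e_t^2$. A tiny hand-checkable sketch with made-up numbers and naive fitted values:

```python
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 13.0])
# Naive fitted values: the previous observation (none for the first point).
y_fit = np.array([np.nan, 10.0, 12.0, 11.0])

resid = y_true - y_fit          # [nan, 2, -1, 2]

bias = np.nanmean(resid)        # (2 - 1 + 2) / 3
mse = np.nanmean(resid ** 2)    # (4 + 1 + 4) / 3
print(bias, mse)  # 1.0 3.0
```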


* Plot the ACF of the residuals for the first 20 lags. Which method has the best fit and why?


In [None]:
# Plot the ACF of the residuals for the first 20 lags. Which method has the best fit and why?


* For each method, plot the predictions and the true in-sample values.


* Implement a new forecasting method which computes the forecast for $y_{t}$ by taking the average of $y_{t-1}, y_{t-7}, y_{t-14}$.
For the first two weeks, you can use the seasonal naive forecasts.


In [None]:
# 
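
A possible sketch of this averaging method on synthetic data is shown below. It assumes a weekly period, and the fallback chain (seasonal naive for the first two weeks, naive for the first week) follows the suggestions given in the exercise; the names are illustrative.

```python
import numpy as np
import pandas as pd

period = 7  # assumed weekly seasonality

idx = pd.date_range("2022-01-03", periods=35, freq="D")
rng = np.random.default_rng(6)
y = pd.Series(
    20 + np.tile([0, 1, 2, 1, 0, -2, -2], 5) + rng.normal(scale=0.3, size=35),
    index=idx,
)

# Fallbacks for the start of the series.
fit_naive = y.shift(1)
fit_snaive = y.shift(period).fillna(fit_naive)

# Average of y_{t-1}, y_{t-7} and y_{t-14}; NaN whenever one of them is missing.
lags = pd.concat([y.shift(1), y.shift(period), y.shift(2 * period)], axis=1)
fit_avg = lags.mean(axis=1, skipna=False)

# First two weeks: use the seasonal naive predictions instead.
fit_avg = fit_avg.fillna(fit_snaive)
print(fit_avg.head(16).round(2))
```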




Compare this new method with the seasonal naive method (e.g. histogram of residuals and ACF plot).

In [None]:
#


* Compute $21$-step ahead out-of-sample forecasts for the different methods.
* Plot the forecasts and the true values.

In [None]:
# Out-of-sample forecasts

period = 7
T = len(series_train)
HORIZON = n_test

## Mean
meanf = series_train.mean()
f_mean = pd.DataFrame([meanf for h in range(0, HORIZON) ], index = series_test.index)

## Naive


## Seasonal naive
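
For the out-of-sample case, the naive forecast repeats the last training observation over the whole horizon, and the seasonal naive forecast repeats the last observed season. The sketch below illustrates this on synthetic data with a period of 7 and a horizon of 21; the names `f_naive` and `f_snaive` are assumptions introduced here.

```python
import numpy as np
import pandas as pd

period = 7
horizon = 21

# Synthetic series: weekly pattern plus a slight trend.
idx = pd.date_range("2022-01-03", periods=77, freq="D")
y = pd.Series(
    np.tile([0.0, 1.0, 2.0, 1.0, 0.0, -2.0, -2.0], 11) + 0.01 * np.arange(77),
    index=idx,
)
train, test = y.iloc[:-horizon], y.iloc[-horizon:]

# Mean: the training mean at every horizon.
f_mean = pd.Series(train.mean(), index=test.index)

# Naive: the last training observation at every horizon.
f_naive = pd.Series(train.iloc[-1], index=test.index)

# Seasonal naive: repeat the last full season until the horizon is covered.
last_season = train.iloc[-period:].to_numpy()
n_repeats = int(np.ceil(horizon / period))
f_snaive = pd.Series(np.tile(last_season, n_repeats)[:horizon], index=test.index)

print(pd.DataFrame({"mean": f_mean, "naive": f_naive, "snaive": f_snaive}).head(8))
```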


