### Time Series Workshop 
# 1. Introduction &#x1F60E;

In [None]:
%config InlineBackend.figure_format='retina'
%load_ext autoreload
%autoreload 2

from pathlib import Path

import matplotlib.pyplot as plt

from timeseries.data import load_sunspots

DATA_DIR = Path("..") / "data"

## What are time series?
* Time series are data points indexed in time order.
* Time series data are a collection of observations obtained through repeated measurements over time.
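
As a minimal sketch of the definition above, a time series can be represented in pandas as a `Series` with a `DatetimeIndex`. The dates and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical monthly observations; the numbers are invented for illustration.
index = pd.date_range("2020-01-01", periods=6, freq="MS")  # month-start timestamps
series = pd.Series([3.0, 5.0, 4.0, 6.0, 8.0, 7.0], index=index, name="value")

# The index is ordered in time, so we can slice by date (both endpoints included).
subset = series.loc["2020-02-01":"2020-04-01"]
print(subset)
```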

## Example: Sunspot Data
* Monthly counts of sunspots from the mid-18th century to the present
* Univariate time series
* Strong periodicity with a cycle of roughly 11 years (the full magnetic cycle spans about 22 years)
* No dominant trend
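
One way to check for such a periodicity is to look at the autocorrelation at different lags. The sketch below uses a synthetic sine wave with an assumed 132-month period plus noise as a stand-in. It does not use the real sunspot data, and the lag range is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a monthly series with an 11-year cycle (NOT the real sunspot data).
rng = np.random.default_rng(0)
months = np.arange(3000)
period = 12 * 11  # 132 months
values = np.sin(2 * np.pi * months / period) + rng.normal(scale=0.1, size=months.size)
series = pd.Series(values)

# Autocorrelation at each candidate lag; the strongest lag estimates the cycle length.
candidate_lags = range(60, 200)
autocorr = {lag: series.autocorr(lag=lag) for lag in candidate_lags}
best_lag = max(autocorr, key=autocorr.get)
print(f"Estimated period: {best_lag} months (about {best_lag / 12:.1f} years)")
```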

In [None]:
df = load_sunspots(DATA_DIR / "sunspots.csv")

_, ax = plt.subplots(1, 1, figsize=(12, 3))
_ = df.plot(ax=ax)
df.head()

Source: [wikipedia.org/Sunspot](https://en.wikipedia.org/wiki/Sunspot)

<img src="../images/sunspot.gif" width="100">



## What is forecasting?
* Predicting future values of a time series from values and events in the past and present.

## Forecasting vs. supervised machine learning
#### Supervised learning &#x1F440; 
* We know the values of predictor variables &#x2705; 
* We assume that future data looks the same as past data &#x2705; 
#### Forecasting &#x1F4C8; 
* We often don't know the values of predictor variables &#x274C;
* Sometimes we don't even have predictors &#x274C;
* Time series are dynamic: distributions change! &#x274C;
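
The last point can be illustrated with a small sketch on synthetic data (the trend and noise level are assumptions chosen for illustration). With a trending series, a shuffled split hides the shift between past and future, while a chronological split exposes it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical series with an upward trend, so its distribution shifts over time.
y = 0.1 * np.arange(n) + rng.normal(scale=1.0, size=n)

split = 150
# Chronological split: train on the past, test on the future.
chrono_gap = abs(y[:split].mean() - y[split:].mean())

# Shuffled split (common for i.i.d. data): mixes past and future observations.
shuffled = rng.permutation(y)
shuffled_gap = abs(shuffled[:split].mean() - shuffled[split:].mean())

print(f"mean gap, chronological split: {chrono_gap:.2f}")
print(f"mean gap, shuffled split:      {shuffled_gap:.2f}")
```

This is why forecasting models are usually evaluated on data that lies strictly after the training period.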



## Forecasting modelling approach
```mermaid
graph TD
Z(Forecasting) --> A(Specialised<br/>models)
Z(Forecasting) --> B(Off-the-shelf<br/>algorithms)
A --> C(Exponential smoothing,<br/>ARIMA)
A --> D(Prophet)
A --> E(Recurrent NNs)
A --> EE(...)
B --> F(Linear regression)
B --> G(Decision trees)
B --> H(...)
```
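
One common way to use an off-the-shelf algorithm such as linear regression for forecasting is to turn the series into a table of lagged values. The sketch below assumes this lag-feature reduction and a recursive multi-step strategy. The helper names are hypothetical, the data is a synthetic sine wave, and plain NumPy least squares stands in for a regression library.

```python
import numpy as np


def make_lag_matrix(y, n_lags):
    """Row i holds [y[i], ..., y[i + n_lags - 1]]; its target is y[i + n_lags]."""
    X = np.column_stack([y[i : len(y) - n_lags + i] for i in range(n_lags)])
    target = y[n_lags:]
    return X, target


def recursive_forecast(y, coef, n_lags, steps):
    """Predict one step at a time, feeding each prediction back in as a lag."""
    history = list(y[-n_lags:])
    preds = []
    for _ in range(steps):
        next_value = float(np.dot(coef, history[-n_lags:]))
        preds.append(next_value)
        history.append(next_value)
    return np.array(preds)


# Synthetic, noise-free sine wave with a period of 50 steps.
t = np.arange(300)
y = np.sin(2 * np.pi * t / 50)
y_train, y_test = y[:250], y[250:]

# Fit a linear model on two lags (no intercept) by least squares.
X, target = make_lag_matrix(y_train, n_lags=2)
coef, *_ = np.linalg.lstsq(X, target, rcond=None)

y_pred = recursive_forecast(y_train, coef, n_lags=2, steps=len(y_test))
max_error = np.max(np.abs(y_pred - y_test))
print(f"max absolute error: {max_error:.2e}")
```

The recursive loop shows the key difference from ordinary supervised learning: beyond the first step, the lagged predictors are not observed, so earlier predictions are used in their place.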

In [None]:
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.naive import NaiveForecaster

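# Hold out everything from 1966 onwards as the test set (chronological split).
SPLIT_DATE = "1966-01-01"

df_train = df[df.index < SPLIT_DATE]
df_test = df[df.index >= SPLIT_DATE]

# Seasonal baseline: sp is the seasonal period (11 years of monthly data) and
# window_length spans the last two such periods of the training data.
model = NaiveForecaster(strategy="mean", window_length=12 * 11 * 2, sp=12 * 11)
model.fit(df_train)

# Forecast at the absolute timestamps of the test set.
df_pred = model.predict(ForecastingHorizon(df_test.index, is_relative=False))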

_, ax = plt.subplots(1, 1, figsize=(12, 3))
_ = df_train.plot(ax=ax)
_ = df_test.plot(ax=ax)
_ = df_pred.plot(ax=ax)
_ = plt.legend(["train", "test", "pred"])
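
To make the baseline above more concrete, here is a rough NumPy sketch of a seasonal mean forecast: for each position in the cycle, average the values observed at that position over the last few cycles. It illustrates the idea only and is not sktime's implementation. The function name and toy data are hypothetical.

```python
import numpy as np


def seasonal_mean_forecast(y, sp, n_cycles, steps):
    """Forecast by averaging, per position in the cycle, over the last n_cycles cycles."""
    window = np.asarray(y[-sp * n_cycles :], dtype=float)
    # Rows are cycles, columns are positions within a cycle.
    per_position_mean = window.reshape(n_cycles, sp).mean(axis=0)
    # The first forecast step continues the cycle at position 0.
    return np.array([per_position_mean[h % sp] for h in range(steps)])


# Toy series that repeats every 3 steps.
y = np.tile([1.0, 2.0, 3.0], 4)
forecast = seasonal_mean_forecast(y, sp=3, n_cycles=2, steps=5)
print(forecast)
```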

## Before we get started: A warning about Jupyter &#x1F974; 

Throughout the workshop, we'll work with Jupyter notebooks. These are the de facto standard for exploratory work in data science.

They are also highly controversial. Why's that?

One simple reason: 

- Code cells can be executed in any order. This can cause a lot of confusion when jumping around the notebook.

So, one thing to look out for first:
 
- Always make your notebook cells **idempotent**! 
- That is, no matter how many times you execute a cell, you get the same result!
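
A small sketch of the difference, using a made-up DataFrame and hypothetical helper functions that stand in for notebook cells:

```python
import pandas as pd

raw = pd.DataFrame({"count": [10.0, 20.0, 30.0]})


def scale_in_place(df):
    """NOT idempotent: overwrites its input, so every run doubles the values again."""
    df["count"] = df["count"] * 2
    return df


def scale_copy(df):
    """Idempotent: leaves the input untouched and writes the result to a new column."""
    return df.assign(count_scaled=df["count"] * 2)


# Simulate running the non-idempotent "cell" twice on a working copy.
mutated = raw.copy()
scale_in_place(mutated)
scale_in_place(mutated)
print(mutated["count"].tolist())  # values were doubled twice

# Running the idempotent "cell" twice gives identical results.
first = scale_copy(raw)
second = scale_copy(raw)
print(first.equals(second))
```

A simple rule of thumb that follows from this: avoid overwriting a variable with a value derived from itself inside a cell.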

You'll save yourselves a lot of headaches! 