
# Welcome to the Time Series Workshop!


## Who Are We?

All data scientists, from different parts of ING

* Joris de Wind (1-1 Analytics)  

* Artur Usov (IAA) 

* Gertjan van den Bos (FAA) 

* Mehmet Kutluay (FAA) 


## Why Are We Doing This?

A lot of data in ING is time-stamped and regularly spaced. For instance: 
* Daily cash withdrawals
* Weekly transactions between corporate accounts 
* Monthly risk-weighted assets figures

However, there are not that many people (around us) using time series analysis.

So, we decided to organize this workshop!


## Wait.. wait.. What is a "Time Series"?

A time series is a series of data points that are indexed in time. For instance, the monthly total number of airline passengers:

<img src="figures/airpassengers.png" width="300">

The series itself holds quite a lot of interesting information, like: 
* The number of air passengers has largely increased over time 
* There are cyclical booms and busts

## What isn't a "Time Series"?

Just because observations are time-stamped, or relate to time in some way, does not automatically make them a time series. Some examples are:

* Duration Analysis $\rightarrow$ the variable of interest is the amount of time it took for something to happen 
 * Mainly used in medical research (e.g. the amount of time it took for patients to die from a certain illness)
 * Another example: amount of time it takes for characters to die in *Game of Thrones*, [see here](https://arxiv.org/ftp/arxiv/papers/1802/1802.04161.pdf)
* Continuous Time Models $\rightarrow$  there are no spaces between the time stamps 
 * We resort to calculus in order to start modelling the data generating process 
   * The change in a value between yesterday and today becomes the (partial or total) derivative of that value with respect to time 
   * Estimating the parameters of such a model has its own advantages and disadvantages, but this is outside the scope of this workshop

To make our definition clearer, we will be focusing on **regularly spaced** and **discrete** time-stamped data series.
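As a minimal sketch of what "regularly spaced" means in practice, the check below compares the gaps between consecutive time stamps. The function name `is_regularly_spaced` and the example dates are hypothetical, chosen only for illustration. Note that calendar months differ in length, so monthly data would need a check on the calendar period rather than on the raw gap.

```python
from datetime import date, timedelta


def is_regularly_spaced(timestamps):
    """Return True if all consecutive timestamps are the same distance apart."""
    if len(timestamps) < 3:
        return True  # zero or one gap: trivially regular
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return all(gap == gaps[0] for gap in gaps)


# Hypothetical daily observations (e.g. dates of daily cash withdrawal totals)
daily = [date(2018, 1, 1) + timedelta(days=i) for i in range(7)]

# The same dates with one day missing: no longer regularly spaced
with_gap = daily[:3] + daily[4:]

print(is_regularly_spaced(daily))
print(is_regularly_spaced(with_gap))
```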

## Time Series (Friendly) Analysis

A gentle first start is to visualize the trend and seasonality we see:

<img src="figures/AirPassengers_decompose.png" width="300">

However, we would like to dig further and uncover some insights about these air passengers: 
* Are the booms and busts occurring in the same months every year? 
* The distance between the peaks and troughs is getting wider; is this significant? 
* And if so, what does this tell us about air travel? Is the trend affected by time or not? 
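To give a feel for how a trend and a seasonal pattern can be separated, here is a rough sketch of a classical additive decomposition using a centered moving average. It is not necessarily how the figure above was produced. The synthetic series and the function names are assumptions made for illustration, and the air passengers data may well be better described by a multiplicative model.

```python
import math


def centered_moving_average(series, period):
    """Estimate the trend with a centered moving average (assumes an even period)."""
    half = period // 2
    trend = [None] * len(series)  # undefined at both ends of the series
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # The two end points of the window get half weight
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend


def seasonal_means(series, trend, period):
    """Average the detrended values for each position in the cycle."""
    buckets = [[] for _ in range(period)]
    for t, (y, m) in enumerate(zip(series, trend)):
        if m is not None:
            buckets[t % period].append(y - m)
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period
    return [m - centre for m in means]  # seasonal effects sum to zero


# Synthetic monthly data: linear trend plus a yearly cycle (illustration only)
series = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]

trend = centered_moving_average(series, period=12)
seasonal = seasonal_means(series, trend, period=12)

print(trend[6])
print(seasonal)
```

If the seasonal effects line up with the same positions in the cycle year after year, that is a first hint that the booms and busts occur in the same months.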

## Time Series (Not-so-Friendly) Analysis

Answering all of these questions requires more than visuals. So you will be exposed to some formulas and math!

Each point $y_t$ in this series can be written as a function of preceding values (up to $p$ periods), shocks ($\epsilon_t$), time-independent parameters ($\beta$) and time-dependent parameters ($\theta_t$): $$y_t = f(y_{t-1}, y_{t-2}, \dots, y_{t-p}, \beta, \theta_t, \epsilon_t) $$

Sometimes we can also add latent variables ($z_t$) to this framework:
$$y_t = f(y_{t-1}, y_{t-2}, \dots, y_{t-p}, z_{t-1}, z_{t-2}, \dots, z_{t-p}, \beta, \theta_t, \epsilon_t) $$

We can also add in other time series to this function, expanding the analysis from univariate to multivariate 
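One simple instance of the function $f$ above is a linear autoregression, where $y_t$ is a weighted sum of its own $p$ previous values plus a shock. The sketch below simulates such a process. The function name `simulate_ar`, the coefficient values and the zero starting values are assumptions for illustration only, and there are no time-dependent parameters or latent variables in this toy version.

```python
import random


def simulate_ar(coefficients, n_steps, noise_sd=1.0, seed=0):
    """Simulate y_t = sum_i coefficients[i] * y_{t-1-i} + eps_t, starting from zeros."""
    rng = random.Random(seed)
    p = len(coefficients)
    history = [0.0] * p  # assumed initial values y_{-p}, ..., y_{-1}
    for _ in range(n_steps):
        lagged = history[-1:-p - 1:-1]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
        shock = rng.gauss(0.0, noise_sd)  # eps_t
        history.append(sum(b * y for b, y in zip(coefficients, lagged)) + shock)
    return history[p:]


# Hypothetical AR(2) process: today depends on yesterday and the day before
series = simulate_ar([0.6, 0.3], n_steps=200)
print(series[:5])
```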


## Why not use Logistic Regression (or Other Cool Regressions)?

There are lots of issues in time series data that logistic and other standard regressions are unable to address. The main ones are:

* Autocorrelation $\rightarrow$ observations are dependent on each other (i.e. today is affected by yesterday)
* Seasonality $\rightarrow$ localized trends may re-occur periodically (e.g. more airline passengers in summer than winter)
* Latent Variables $\rightarrow$ throughout time, the above two properties of a time series may change due to a fundamental, but not perfectly measurable, change in the data generating process (e.g. air travel becoming more widespread)
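Autocorrelation can be measured directly. The sketch below computes the sample autocorrelation at a given lag on a synthetic series with a 12-period cycle. The data and the function name `autocorrelation` are made up for illustration. Values far from zero signal the kind of dependence between observations that standard regressions assume away.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` periods."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((y - mean) ** 2 for y in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Synthetic series with a yearly cycle in monthly data (illustration only)
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]

print(autocorrelation(series, 12))  # one full cycle apart: strongly positive
print(autocorrelation(series, 6))   # half a cycle apart: strongly negative
```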

The natural response can be: 

"But.. but.. if I put the numbers in and add all of these points as features, then my model runs!"

Of course it will run; the computer will process any numbers you give it. But the inferential value of your output may not be what you expected.


## Time Series Forecasting

After analyzing the time series, we can use the resulting information to do a forecast.

Below we fit an ARIMA model to the air passengers dataset and use insights from this model to forecast two years ahead.

<img src="figures/airpassengers_forecast.png" width="400">
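To illustrate the fit-then-forecast idea without any libraries, here is a deliberately simplified stand-in: an AR(1) model with intercept, fitted by least squares and iterated forward. This is not the ARIMA model behind the figure above. The function names and the noise-free toy data are assumptions for illustration only.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} + eps_t."""
    x = series[:-1]  # y_{t-1}
    y = series[1:]   # y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    phi = covariance / variance
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(last_value, c, phi, horizon):
    """Iterate the fitted equation forward, feeding each forecast back in."""
    forecasts = []
    current = last_value
    for _ in range(horizon):
        current = c + phi * current
        forecasts.append(current)
    return forecasts


# Noise-free toy data generated by y_t = 2 + 0.5 * y_{t-1} (illustration only)
series = [10.0]
for _ in range(19):
    series.append(2 + 0.5 * series[-1])

c, phi = fit_ar1(series)
forecasts = forecast_ar1(series[-1], c, phi, horizon=24)  # e.g. 24 months ahead

print(c, phi)
print(forecasts[-1])
```

With real, noisy data the estimates would not recover the true parameters exactly, and a forecast should come with uncertainty bands like the ones in the figure.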

## What Next?

We are going to be teaching you how to run time series models. In some cases we will be using these models to gather insights about the data, and also to see the (dis)advantages of using said model. In other cases we will be going one step further and doing a forecast.

What are the models?

1. Autoregressive Integrated Moving Average (ARIMA) $\rightarrow$ Gertjan

2. State Space Kalman Filtering $\rightarrow$ Joris

3. Neural Networks $\rightarrow$ Artur

Afterwards, we will show the Bayesian approach to statistical inference, which is applicable to any of these models!



## So it's all Work and No Play?

![some image](./figures/all_work_no_play.jpg)


## All Work and Some Play

| Topic                           | Time          |
|---------------------------------|---------------|
| Opening Remarks                 | 9:10 - 9:30   |
| Introduction                    | 9:30 - 9:45   |
| ARIMA                           | 9:45 - 11:00  |
| Over-Fitting Issues             | 11:00 - 11:45 |  
| Kalman Filters (theory)         | 11:45 - 12:15 |
| Lunch                           | 12:15 - 13:00 | 
| Kalman Filters (exercise)       | 13:00 - 14:00 |
| Neural Networks                 | 14:00 - 15:30 |  
| Bayesian Statistics             | 15:30 - 16:45 |
| Closing and Borrel :)           | 16:45 - 01:30 |

# Veel plezier!