# Introduction to Time-Series

## Topics introduced
1. **What is a time series**
2. **Data-generating process (DGP)**
    - **Components of a time-series**
    - **Decomposition of a Time Series**
    - **Stationary and non-stationary time-series**
3. **What can we forecast**
4. **Forecasting terminology and notation**

## What is a Time-Series

A time series is a sequence of observations taken sequentially in time.
Each data point in the series represents the value of a variable at a specific point in time. The time intervals can be yearly, quarterly, monthly, weekly, daily, hourly, etc. The data can be at any level of aggregation, such as the number of daily visitors to a website, the number of cars produced by a factory each month, or the number of flights taken each year.

### Types of Time Series
There are two types of time series:
- **Regular time-series**: The data points are equally spaced in time. For example, the daily closing value of the S&P 500 index.
- **Irregular time-series**: The data points are not equally spaced in time, because an observation is recorded whenever an event happens. For example, the individual transactions in a bank account, which are recorded whenever the account holder makes a payment.
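
Whether a series is regular can be checked directly from its timestamps: in a regular series every gap between consecutive observations is the same. The sketch below assumes timestamps are plain `datetime` objects; the helper `is_regular` and the sample dates are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def is_regular(timestamps):
    """Return True if every gap between consecutive timestamps is identical (hypothetical helper)."""
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return len(gaps) <= 1


# Hypothetical daily series: exactly one observation per day.
daily = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(7)]

# Hypothetical transaction log: an entry whenever a payment happens.
transactions = [
    datetime(2023, 1, 1, 9, 15),
    datetime(2023, 1, 1, 9, 47),
    datetime(2023, 1, 2, 14, 3),
    datetime(2023, 1, 5, 8, 30),
]

print(is_regular(daily))         # True
print(is_regular(transactions))  # False
```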


## Data-generating process (DGP)

The data-generating process (DGP) is the underlying mechanism that produced the observed time series. It is the true model we are trying to estimate, but it is usually unknown, so we have to approximate it from the data.
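
To make "estimating the DGP from data" concrete, here is a toy example in which we pick the DGP ourselves, assuming it is the AR(1) process $y_t = 0.7\,y_{t-1} + e_t$ with Gaussian noise $e_t$. We simulate data from it, then recover the coefficient using only the observed values. The function names and the value 0.7 are illustrative assumptions; real DGPs are rarely this simple.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate y_t = phi * y_{t-1} + e_t, with e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0, sigma))
    return y


def estimate_phi(y):
    """Least-squares estimate of phi, computed from the observed series alone."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


y = simulate_ar1(phi=0.7, n=2000)
print(round(estimate_phi(y), 3))  # close to, but not exactly, the true 0.7
```

In practice we only ever see `y`; the true `phi` is what we would like to know.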

### Components of a time-series

A time series can be thought of as a combination of four components: **trend, seasonality, cycles, and noise**.

**Trend Component**
This is the long-term increase or decrease in the level of the series; it does not have to be linear. For example, the number of daily visitors to a website may grow steadily over the years.

**Seasonal Component**
This is a pattern that repeats with a fixed and known period, such as a day, a week, or a year. For example, the number of daily visitors to a website may rise on weekends and fall on weekdays, repeating every week.

**Cyclic Component**
These are rises and falls that do not repeat at a fixed period: their length varies and is not known in advance, and they usually last longer than a seasonal period. For example, the number of daily visitors to a website may rise for a few months and then fall for a few months, with no fixed rhythm.

**Noise Component**
This is the random, unpredictable variation left over once the trend, seasonal, and cyclic components are accounted for, which is why it is also called the residual component. For example, the number of daily visitors to a website may differ from day to day for random reasons.
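
A minimal sketch of how the four components combine, assuming a hypothetical daily series built from a linear trend, a 7-day seasonal sine wave, a cycle, and Gaussian noise. The cycle is generated here by an AR(2) process with complex roots (coefficients 1.95 and -0.96, chosen for illustration), which produces swings of roughly 60 steps whose length varies, unlike the fixed-period seasonal term. All parameter values and the name `simulate_series` are assumptions.

```python
import math
import random


def simulate_series(n, seed=0):
    """Build a hypothetical series y_t = trend + seasonal + cycle + noise."""
    rng = random.Random(seed)
    trend = [100 + 0.5 * t for t in range(n)]                          # steady growth
    seasonal = [10 * math.sin(2 * math.pi * t / 7) for t in range(n)]  # fixed 7-day period

    # AR(2) with complex roots: rises and falls of varying, unknown length.
    cycle = [0.0, 0.0]
    for _ in range(n - 2):
        cycle.append(1.95 * cycle[-1] - 0.96 * cycle[-2] + rng.gauss(0, 1))
    cycle = cycle[:n]

    noise = [rng.gauss(0, 3) for _ in range(n)]                         # unpredictable residual
    y = [sum(parts) for parts in zip(trend, seasonal, cycle, noise)]
    return y, trend, seasonal, cycle, noise


y, trend, seasonal, cycle, noise = simulate_series(365)
print([round(v, 1) for v in y[:7]])
```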


### Decomposition of a Time Series
The process of separating a time series into its components is known as time series decomposition. The additive decomposition model is given by:

$$y_t = T_t + S_t + C_t + E_t$$

where $y_t$ is the observed value at time $t$, $T_t$ is the trend component at time $t$, $S_t$ is the seasonal component at time $t$, $C_t$ is the cyclic component at time $t$, and $E_t$ is the noise component at time $t$.
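
One common way to estimate the additive components is classical decomposition, sketched below from scratch. Note an assumption baked into this method: it estimates trend and cycle together as a single trend-cycle term, so $C_t$ is absorbed into the trend estimate. The function name and the toy series are hypothetical; in practice a library routine such as `seasonal_decompose` in `statsmodels.tsa.seasonal` would usually be used instead.

```python
def classical_additive_decomposition(y, period):
    """Split y into trend-cycle, seasonal, and remainder parts (additive model).

    Trend and remainder are None at both ends, where the centered
    moving average does not have a full window.
    """
    n = len(y)
    if n < 2 * period:
        raise ValueError("need at least two full seasonal periods")
    half = period // 2

    # 1. Trend-cycle: centered moving average (a 2 x m MA when the period is even).
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # 2. Seasonal: average the detrended values at each position in the cycle.
    sums, counts = [0.0] * period, [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    indices = [s / c for s, c in zip(sums, counts)]
    offset = sum(indices) / period
    indices = [s - offset for s in indices]  # seasonal effects sum to zero
    seasonal = [indices[t % period] for t in range(n)]

    # 3. Remainder: whatever the trend-cycle and seasonal parts do not explain.
    remainder = [
        y[t] - trend[t] - seasonal[t] if trend[t] is not None else None
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Hypothetical series: linear trend plus a period-4 seasonal pattern, no noise.
pattern = [3.0, -1.0, -4.0, 2.0]
y = [10 + 0.5 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, remainder = classical_additive_decomposition(y, period=4)
print([round(s, 6) for s in seasonal[:4]])  # [3.0, -1.0, -4.0, 2.0]
```

Because this toy series has no noise, the seasonal pattern is recovered exactly and the remainder is zero wherever the trend is defined.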

The multiplicative decomposition model is given by:

$$y_t = T_t \times S_t \times C_t \times E_t$$

where the symbols have the same meaning as in the additive model. The additive model suits series whose seasonal fluctuations stay roughly the same size as the level of the series changes; the multiplicative model suits series whose seasonal fluctuations grow or shrink in proportion to the level.
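
When every component is positive, taking logarithms turns the multiplicative model into an additive one, which is one reason a log transform is often applied before using additive methods:

$$\log y_t = \log T_t + \log S_t + \log C_t + \log E_t$$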

### Stationary and non-stationary time-series

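A time series is said to be (weakly) stationary if its statistical properties do not change over time: its mean and variance are constant, and the correlation between two observations depends only on how far apart they are (the lag), not on when they occur. A stationary time series is easier to model and forecast than a non-stationary one, because patterns estimated from the past carry over to the future. A series with a trend or seasonality is non-stationary, since its mean changes over time.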
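
A rough way to see the difference is to compare the mean and variance of the first and second halves of a series. The sketch below uses white noise (stationary) and a random walk built by summing that noise (non-stationary), and also shows that taking first differences of the random walk gives back the stationary noise. The helper `half_stats` and the seed are assumptions for illustration; this is an informal check, not a formal test (a formal option is the augmented Dickey-Fuller test, available as `adfuller` in `statsmodels.tsa.stattools`).

```python
import random
import statistics


def half_stats(y):
    """Mean and sample variance of the first and second halves of a series."""
    mid = len(y) // 2
    first, second = y[:mid], y[mid:]
    return {
        "mean": (round(statistics.mean(first), 2), round(statistics.mean(second), 2)),
        "variance": (round(statistics.variance(first), 2), round(statistics.variance(second), 2)),
    }


rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(1000)]  # white noise: stationary

walk = []                                       # random walk: non-stationary
level = 0.0
for e in noise:
    level += e
    walk.append(level)

diffed = [b - a for a, b in zip(walk, walk[1:])]  # first differences of the walk

for name, series in [("white noise", noise), ("random walk", walk), ("differenced walk", diffed)]:
    print(name, half_stats(series))
```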

## What can we forecast

The most basic assumption when we forecast a time series is that the future will be like the past, that is, the process that generated the historical data will keep operating in the same way. This idea is closely related to stationarity. If the future does not resemble the past, then past data, which is the only information we have, cannot tell us what comes next, and we cannot forecast the series.

Let's rank the time series from easiest to hardest to forecast:
1. **Deterministic time series**: The future is completely determined by the past, with no randomness. For example, the time of sunrise at a given location each day follows exactly from astronomical calculations.

2. **Stochastic time series**: The future is only partly determined by the past, because some randomness is involved. For example, the number of daily visitors to a website depends on past traffic, but it also fluctuates for random reasons.

3. **Chaotic time series**: The series follows deterministic rules but is extremely sensitive to initial conditions, so tiny differences in the current value grow into large differences later. For example, if the number of daily visitors to a website followed a chaotic process, having 1001 visitors today instead of 1000 could lead to a very different number of visitors a few weeks later. This sharply limits how far ahead we can forecast; weather is the classic example.

4. **Non-forecastable time series**: The past carries no useful information about the future. For example, the numbers drawn in a fair lottery cannot be forecast from previous draws.
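
Sensitivity to initial conditions is easy to demonstrate with the logistic map $x_{n+1} = r\,x_n(1 - x_n)$, a textbook deterministic system that is chaotic for $r = 4$. Here it stands in, as an assumption, for a chaotic DGP: two runs that start 0.0000001 apart stay close for a few steps and then diverge completely. The function name and starting values are illustrative.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate x_{n+1} = r * x_n * (1 - x_n); chaotic when r = 4."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1 - x))
    return xs


a = logistic_map(0.4)
b = logistic_map(0.4000001)  # differs only in the 7th decimal place

for n in (0, 5, 10, 20, 30, 40):
    print(n, round(abs(a[n] - b[n]), 6))
```

Both runs are fully deterministic, yet a forecast made from a slightly mismeasured starting value becomes useless after a couple of dozen steps.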

### Predictability of a time series
Three factors determine the predictability of a time series:
1. **Understanding the DGP**: The more we understand the DGP, the more predictable the time series is. For example, if we know that the number of daily visitors to a website is increasing over time, then we can forecast the number of daily visitors to the website in the future.
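2. **Amount of data**: The more data we have, the more reliably we can estimate the patterns in the DGP, provided the older data still reflects how the series behaves today.
3. **Adequately repeating patterns**: The more often a pattern repeats in the history (for example, many complete seasonal cycles rather than one or two), the easier it is to learn and extrapolate.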
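
The effect of the amount of data can be illustrated by simulation. Assuming, for illustration, that the DGP is an AR(1) process with coefficient 0.7, the sketch below estimates that coefficient from many independently simulated series of different lengths and reports the average estimation error. The helper names, the number of repetitions, and the seed are assumptions.

```python
import random
import statistics


def simulate_ar1(phi, n, rng):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal noise."""
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0, 1))
    return y


def estimate_phi(y):
    """Least-squares estimate of the AR(1) coefficient."""
    num = sum(cur * prev for cur, prev in zip(y[1:], y[:-1]))
    den = sum(prev * prev for prev in y[:-1])
    return num / den


def mean_abs_error(n, phi=0.7, reps=100, seed=1):
    """Average |estimate - true phi| over `reps` simulated series of length n."""
    rng = random.Random(seed)
    return statistics.mean(
        abs(estimate_phi(simulate_ar1(phi, n, rng)) - phi) for _ in range(reps)
    )


for n in (30, 300, 3000):
    print(n, round(mean_abs_error(n), 3))
```

The error shrinks as the series gets longer, which is one concrete sense in which more data makes a series more predictable.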

## Forecasting terminology and notation

- **Forecasting**: The prediction of future values of a time series using its known past values.
- **Multivariate forecasting**: Forecasting several time series together, where each series depends not only on its own past values but also on the other series. The aim is a model that captures the interrelationships between the variables and uses them to forecast the future values of all of them.
- **Explanatory forecasting**: Using other information, in addition to the past values of a time series, to predict its future values.
- **Backtesting**: Testing the accuracy of a forecasting model on past data, by producing forecasts for historical periods and comparing them with what actually happened.
- **In-sample and out-of-sample**: In-sample data is the data used to fit the model. Out-of-sample data is data held back from fitting and used to test the model; for time series, it should come after the in-sample period so that the model is never fit on data from the future.
- **Exogenous and endogenous variables**: Exogenous variables are determined outside the model; they influence the other variables but are not influenced by them. Endogenous variables are determined within the model and depend on the other variables.
- **Forecast combination**: Combining the forecasts from several forecasting models (for example, by averaging them) to produce a forecast that is often more accurate than any of the individual ones.
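
Several of these terms fit together in a simple backtest. The sketch below assumes a rolling-origin, one-step-ahead scheme: at each origin, everything before it is in-sample, the next value is out-of-sample, and the mean absolute error (MAE) is averaged over all origins. The two models (a naive "last value" forecast and a historical mean forecast) and their equal-weight average as a forecast combination are simple benchmarks chosen for illustration; the function names and the toy data are hypothetical.

```python
import statistics


def naive_forecast(history):
    """Forecast the next value as the last observed value."""
    return history[-1]


def mean_forecast(history):
    """Forecast the next value as the mean of all observed values."""
    return statistics.mean(history)


def combined_forecast(history):
    """Forecast combination: equal-weight average of the two models above."""
    return (naive_forecast(history) + mean_forecast(history)) / 2


def backtest(y, forecaster, initial_train):
    """Rolling-origin, one-step-ahead backtest returning the MAE."""
    errors = []
    for origin in range(initial_train, len(y)):
        history = y[:origin]  # in-sample: everything before the origin
        actual = y[origin]    # out-of-sample: the next observation
        errors.append(abs(actual - forecaster(history)))
    return statistics.mean(errors)


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical trending series
for name, model in [("naive", naive_forecast), ("mean", mean_forecast), ("combined", combined_forecast)]:
    print(f"{name}: MAE = {backtest(y, model, initial_train=3)}")
# naive: MAE = 1.0
# mean: MAE = 2.5
# combined: MAE = 1.75
```

On this steadily trending toy series the naive forecast wins and the combination lands in between; combining tends to pay off when the individual models make different kinds of errors, which is something a backtest on real data would have to confirm.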