## Stochastic Process Basics and Time Series

A stochastic process is a sequence of random variables arranged in order, usually indexed by time. Imagine a series of measurements taken one after another, where each measurement is random but follows some underlying probabilistic law. We denote the elements of this sequence by $ Y_t $, where the time index $ t $ runs from 1 to $ T $, the length of the sequence ($ T $ may also be infinite). The whole set $ \{ Y_1, Y_2, \dots, Y_T \} $ represents the process.

- A **time series** is one specific observed sequence (one sample) from this stochastic process.

- The process itself represents a model of how random values evolve over time, with probabilities governing their behavior.
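To make the process/series distinction concrete, the sketch below draws several realizations from one process. The process itself is an assumption chosen for illustration (a random walk with standard normal steps, not discussed in the text); each row of `paths` is one possible time series, and in practice we usually observe only one of them.

```python
import numpy as np

# Illustrative process (an assumption, not from the text): a random walk
# Y_t = Y_{t-1} + e_t started at 0, with independent standard normal steps e_t.
rng = np.random.default_rng(seed=0)
n_paths, T = 3, 100
steps = rng.standard_normal(size=(n_paths, T))
paths = np.cumsum(steps, axis=1)  # row i, column t-1 holds Y_t of realization i

# In practice we usually observe only one realization: that is our time series.
observed_series = paths[0]

print(paths.shape)  # (3, 100): three possible time series from the same process
print(observed_series[:5])
```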

## Stationarity

Stationarity is a critical property in time series analysis: the statistical characteristics of the process do not change over time. For *weak* (covariance) stationarity this means:

- The **mean** (average value) stays constant no matter when you look.
- The **variance** (spread or variability) remains the same.
- The **correlations** between values at different times depend only on the distance (lag) between those times, not on the actual time points.

Intuitively, if you pick a window of a given length anywhere in the series, the behavior inside it looks statistically the same as in any other window of the same length.

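- Formally, a process is *strictly* stationary if the joint distribution of $ (Y_{t_1}, \dots, Y_{t_k}) $ equals that of $ (Y_{t_1+h}, \dots, Y_{t_k+h}) $ for every $ k $, every choice of times $ t_1, \dots, t_k $, and every shift $ h $.
- In particular, the marginal distribution of $ Y_t $ (the distribution of each individual variable) is the same at every time; identical marginals alone, however, are not sufficient for stationarity.
- Stationary processes make modeling easier because their properties can be treated as time-invariant.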
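The window intuition can be checked numerically. This is a minimal sketch, assuming Gaussian white noise as the stationary example and a random walk as the non-stationary one (the random walk is an illustrative choice, not from the text): white-noise window means all sit near 0, random-walk window means wander, and the random walk's variance across many simulated paths grows with $ t $.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
T, window = 10_000, 1_000

# Stationary example: Gaussian white noise (constant mean 0, variance 1).
white_noise = rng.standard_normal(T)
# Non-stationary example (assumed for illustration): a random walk, Var(Y_t) = t.
random_walk = np.cumsum(rng.standard_normal(T))

def window_means(y, window):
    """Means of consecutive, non-overlapping windows of equal length."""
    blocks = y[: len(y) // window * window].reshape(-1, window)
    return blocks.mean(axis=1)

wn_means = window_means(white_noise, window)
rw_means = window_means(random_walk, window)
print("white noise:", np.round(wn_means, 2))  # all close to 0
print("random walk:", np.round(rw_means, 2))  # wander with no fixed level

# Var(Y_t) across many independent random walks grows with t.
many_walks = np.cumsum(rng.standard_normal((2_000, 400)), axis=1)
var_at_t = many_walks.var(axis=0)  # column t-1 estimates Var(Y_t)
print(np.round(var_at_t[[99, 399]], 1))  # roughly 100 and 400
```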

## Independence

Independence in this context means that each random variable $ Y_t $ in the sequence carries no information about any other $ Y_{t'} $. Formally, the variables are independent if their joint distribution equals the product of their individual (marginal) distributions.
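For variables with densities, the factorization and its consequence for conditional distributions can be written as follows. Here $ p $ denotes a joint or conditional density and $ p_t $ the marginal density of $ Y_t $; this notation is introduced only for this illustration.

$$
p(y_1, y_2, \dots, y_T) = \prod_{t=1}^{T} p_t(y_t),
\qquad
p(y_t \mid y_{t-1}, \dots, y_1) = \frac{p(y_1, \dots, y_t)}{p(y_1, \dots, y_{t-1})} = p_t(y_t).
$$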

- If a process is independent, past values do not give any information about future values.
- Mathematically, the distribution of $ Y_t $ given all past values is just the distribution of $ Y_t $ alone.
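- For forecasting, this means the past cannot improve on the unconditional distribution: the best one can do is predict from the distribution of $ Y_t $ itself (for example, its mean).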
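An empirical check of "the past gives no information": for IID draws, the frequency of a positive value is the same whether or not we condition on the previous value being positive. For contrast, the sketch also simulates a dependent process; the AR(1) model with $ \phi = 0.8 $ is an assumption chosen for illustration, and `positive_frequencies` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 100_000
eps = rng.standard_normal(n)

# IID process: the draws themselves.
iid = eps
# Dependent process for contrast (assumed example): AR(1) with phi = 0.8.
ar = np.empty(n)
ar[0] = eps[0]
for t in range(1, n):
    ar[t] = 0.8 * ar[t - 1] + eps[t]

def positive_frequencies(y):
    """Estimate P(Y_t > 0) and P(Y_t > 0 | Y_{t-1} > 0) from one series."""
    current, previous = y[1:], y[:-1]
    return np.mean(current > 0), np.mean(current[previous > 0] > 0)

print(positive_frequencies(iid))  # both about 0.5: the past adds no information
print(positive_frequencies(ar))   # conditional frequency clearly above 0.5
```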

## Combining Stationarity and Independence

- A process can be stationary but dependent, meaning its statistical properties remain stable but variables influence each other.
- It can also be independent but non-stationary, for example independent normal variables whose variance changes with $ t $.
- When the process is both (strictly) stationary and independent, the variables are **IID** (Independent and Identically Distributed).
- A famous example is **Gaussian white noise**, where the $ Y_t $ are independent and each follows a normal distribution with zero mean and constant variance $ \sigma^2 $ (standard Gaussian white noise has $ \sigma^2 = 1 $). This noise often serves as a building block for more complex time series models.
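A minimal NumPy sketch of both cases: Gaussian white noise (IID, hence stationary) and an independent but non-stationary series. The linearly growing standard deviation in the second series is an assumption picked only to make the non-stationarity visible.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
T = 50_000

# Gaussian white noise: IID N(0, sigma^2); sigma = 1 gives standard white noise.
sigma = 1.0
white_noise = rng.normal(loc=0.0, scale=sigma, size=T)

# Independent but non-stationary (illustrative): independent normals whose
# standard deviation grows linearly with t, so Var(Y_t) changes over time.
t = np.arange(1, T + 1)
growing_noise = rng.normal(loc=0.0, scale=0.001 * t)

print(white_noise.mean(), white_noise.var())  # close to 0 and 1
print(growing_noise[: T // 2].var(), growing_noise[T // 2 :].var())  # second is much larger
```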

## Autocorrelation and the Autocorrelation Matrix

To understand relationships between different times in the series, we measure correlations between pairs of variables $ Y_t $ and $ Y_{t+k} $, where $ k $ is called the lag.
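For reference, the correlation between $ Y_t $ and $ Y_{t+k} $ is the usual Pearson correlation. Writing $ \gamma(k) = \operatorname{Cov}(Y_t, Y_{t+k}) $ for a stationary process (notation introduced here), it depends on the lag alone:

$$
\rho(t, t+k) = \frac{\operatorname{Cov}(Y_t, Y_{t+k})}{\sqrt{\operatorname{Var}(Y_t)\,\operatorname{Var}(Y_{t+k})}},
\qquad
\rho(k) = \frac{\gamma(k)}{\gamma(0)} \quad \text{(stationary case)}.
$$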

- These correlation values form a matrix called the **autocorrelation matrix**, where the diagonal values are 1 (each variable perfectly correlates with itself).
- Values just above and below the diagonal measure correlations between adjacent time points, then further off-diagonals measure correlations between times farther apart.
- In stationary processes, correlations depend only on the lag, so all elements along any given diagonal are equal (e.g., all correlations between points one step apart are the same). A matrix with constant diagonals like this is called a Toeplitz matrix.
- This simplification makes the autocorrelation matrix highly structured and easier to analyze.
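As an illustration, assume a stationary AR(1) process $ Y_t = \phi Y_{t-1} + \varepsilon_t $ with $ |\phi| < 1 $ (a model not covered above, used here only as an example), whose autocorrelation at lag $ k $ is $ \phi^{|k|} $. The sketch builds its autocorrelation matrix with NumPy; `ar1_autocorrelation_matrix` is a hypothetical helper name.

```python
import numpy as np

def ar1_autocorrelation_matrix(phi: float, n: int) -> np.ndarray:
    """Autocorrelation matrix of a stationary AR(1) process, rho(k) = phi**|k|."""
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])  # |i - j| for every pair of time points
    return phi ** lags  # entries depend only on the lag, so the matrix is Toeplitz

R = ar1_autocorrelation_matrix(phi=0.6, n=4)
print(np.round(R, 3))
```

Each diagonal of `R` is constant: 1 on the main diagonal, 0.6 one step off it, 0.36 two steps off, and so on.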

## Autocorrelation Function (ACF)

Rather than working with the whole matrix, one usually summarizes it with the **Autocorrelation Function (ACF)**:

- The ACF gives the autocorrelation as a function of the lag $ k $; it is usually plotted as bars or stems against the lag (a correlogram).
- The value at lag 0 is always 1 because it compares each point with itself.
- For many processes, the ACF decays as lag increases, meaning more distant points in time become less correlated.
- Some processes may show strong correlations at particular lags, indicating periodic or repeating patterns.
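Both ACF shapes can be illustrated with two assumed example series: the theoretical ACF of an AR(1) process, $ \rho(k) = \phi^k $, which decays geometrically, and a noisy sine wave with period 12, whose lag correlations are strongly negative at half the period and strongly positive at the full period. The helper `lag_corr` is hypothetical and uses `np.corrcoef` as a simple stand-in for the sample ACF.

```python
import numpy as np

# Theoretical ACF of a stationary AR(1) process (assumed example): rho(k) = phi**k.
phi = 0.7
lags = np.arange(0, 11)
ar1_acf = phi ** lags  # 1.0 at lag 0, then decays geometrically

# A periodic series (assumed example): a sine wave with period 12 plus noise.
rng = np.random.default_rng(seed=4)
t = np.arange(600)
seasonal = np.sin(2 * np.pi * t / 12) + 0.3 * rng.standard_normal(t.size)

def lag_corr(y, k):
    """Pearson correlation between y_t and y_{t+k} (simple illustration)."""
    return np.corrcoef(y[:-k], y[k:])[0, 1]

print(np.round(ar1_acf, 3))
print(round(lag_corr(seasonal, 6), 2), round(lag_corr(seasonal, 12), 2))  # negative, positive
```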

## Computing and Interpreting the ACF from Data

- Given an observed time series, first compute its sample mean.
- Then compute the **sample autocovariance** at each lag.
- Divide these by the lag-0 autocovariance (the sample variance) to obtain the sample autocorrelations.
- This provides an empirical way to assess whether the data show temporal patterns or behave like white noise.
- For white noise, the true autocorrelations are zero beyond lag 0, so the sample autocorrelations should be small; with a 95% band, roughly 1 lag in 20 will still fall outside it by chance.
- Sample autocorrelations outside a significance band, commonly $ \pm 1.96/\sqrt{T} $ for an approximate 95% level under the white-noise hypothesis, suggest meaningful correlation at that lag.
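The steps above correspond to the following estimators, using the common $ 1/T $ divisor (other conventions divide by $ T - k $):

$$
\hat\gamma(k) = \frac{1}{T}\sum_{t=1}^{T-k} (Y_t - \bar Y)(Y_{t+k} - \bar Y),
\qquad
\hat\rho(k) = \frac{\hat\gamma(k)}{\hat\gamma(0)}.
$$

A from-scratch sketch in NumPy follows; `sample_acf` is a hypothetical helper, the simulated white noise is assumed test data, and the $ \pm 1.96/\sqrt{T} $ band is the usual large-sample approximation for white noise.

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelations rho_hat(0..nlags) using the 1/T divisor."""
    y = np.asarray(y, dtype=float)
    T = y.size
    dev = y - y.mean()  # step 1: subtract the sample mean
    gamma = np.array([np.dot(dev[: T - k], dev[k:]) / T  # step 2: autocovariances
                      for k in range(nlags + 1)])
    return gamma / gamma[0]  # step 3: normalize by the lag-0 autocovariance

rng = np.random.default_rng(seed=5)
T = 500
noise = rng.standard_normal(T)
rho_hat = sample_acf(noise, nlags=20)

band = 1.96 / np.sqrt(T)  # approximate 95% band under the white-noise hypothesis
outside = np.flatnonzero(np.abs(rho_hat[1:]) > band) + 1
print("band = +/- %.3f" % band)
print("lags outside the band:", outside)  # typically none or very few
```

As a hand check, `sample_acf([1, 2, 3, 4], 2)` gives `[1.0, 0.25, -0.3]`.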

In practice, you don't have to calculate these manually:

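- Python’s `statsmodels` package provides `acf()` (in `statsmodels.tsa.stattools`) to compute sample autocorrelations and `plot_acf()` (in `statsmodels.graphics.tsaplots`) to plot them with a confidence band.
- This visualization helps recognize properties of the data, such as randomness (no significant lags), periodicity (peaks at regular lags), and possible non-stationarity (an ACF that decays very slowly).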

## Summary

- A **stochastic process** models time-dependent random variables.
- **Stationarity** implies time-invariant statistics.
- **Independence** means the past carries no information about the future.
- **IID** (independent and identically distributed) processes are exactly those that are both stationary and independent; Gaussian white noise is a standard example.
- The **autocorrelation matrix** and **ACF** reveal temporal dependencies.
- Tools like Python’s `statsmodels` simplify computation and visualization of autocorrelations.

This foundation supports building and understanding forecasting models of time series data.
