# Week 7: Linear Systems Theory & AR Models

**Objective:** Understand the fundamentals of time series analysis, learn the structure of an Autoregressive (AR) model, and use Python to simulate, fit, and forecast with AR models.

## Step 1: Build Intuition

So far, we've looked at processes where the next state depends only on the current state (Markov Chains) or where events occur randomly in time (Poisson Process). Now, let's think about data that is collected sequentially over time, like the daily price of a stock, monthly sales figures, or a patient's heart rate measured every second. This is **time series data**.

An **Autoregressive (AR) model** is a simple yet powerful way to forecast future values of a time series. The intuition is straightforward: **the value of the series tomorrow is likely related to its value today, yesterday, and so on.**

Think about predicting tomorrow's temperature. A good guess would be that it will be similar to today's temperature. Maybe you can improve your forecast by also looking at yesterday's temperature. An AR model formalizes this idea by creating a linear relationship between the current value and its past values.

## Step 2: Understand the Core Idea

The name "Autoregressive" tells you everything you need to know:
- **Auto:** Means "self".
- **Regressive:** Refers to linear regression.

So, an autoregressive model is a linear regression of the time series against its own **past values**, which are called **lags**.

Key components:
1.  **The Order (p):** This is the number of previous time steps (lags) we use to predict the current value. An AR(1) model uses only the immediately preceding value (lag 1). An AR(2) model uses the two preceding values (lag 1 and lag 2).
2.  **Coefficients (\(\phi_1, \dots, \phi_p\)):** These are the weights that the regression model learns, one per lag. They tell us how influential each past value is.
3.  **White Noise (\(\epsilon_t\)):** This is a random error term. It represents the part of the series that cannot be predicted from past values. It's assumed to be a sequence of random shocks with a mean of 0 and constant variance that are uncorrelated across time.
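To make "a regression of the series on its own past" concrete, here is a minimal sketch using only NumPy. It assumes a toy AR(1) series with coefficient 0.7 (an illustrative value, not one used elsewhere in these notes), builds the lag-1 column by shifting the series, and estimates the intercept and coefficient with ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy AR(1) series: y[t] = 0.7 * y[t-1] + noise (0.7 is an illustrative choice)
n = 5000
phi_true = 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.standard_normal()

# Regression target: y[1], ..., y[n-1]; predictor: the same series shifted by one step
target = y[1:]
X = np.column_stack([np.ones(n - 1), y[:-1]])  # columns: intercept, lag 1

# Ordinary least squares: this is the "regressive" part of "autoregressive"
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, target, rcond=None)
print(f"intercept ~ {c_hat:.3f}, lag-1 coefficient ~ {phi_hat:.3f}")
```

The estimated lag-1 coefficient should land near 0.7 and the intercept near 0. What is left over after the regression is the estimate of the white noise term.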

## Step 3: Learn the Definitions and Formulas

**Definition: Autoregressive Model of order p, AR(p)**

A time series \(Y_t\) is described by an AR(p) model if it follows this equation:

$$ Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \epsilon_t $$

Where:
- \(Y_t\) is the value of the series at time \(t\).
- \(c\) is a constant (or intercept).
- \(\phi_1, \phi_2, ..., \phi_p\) are the model parameters (coefficients for the lags).
- \(Y_{t-1}, ..., Y_{t-p}\) are the past values (lags) of the series.
- \(\epsilon_t\) is the white noise error term at time \(t\).
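The equation can be read as a recipe for generating data one step at a time. The sketch below is one way to write that recursion in plain NumPy. The function name `simulate_ar`, the zero start-up values, and the `burn_in` length are assumptions made for illustration, and the noise is taken to be standard normal.

```python
import numpy as np

def simulate_ar(c, phis, n, burn_in=200, seed=0):
    """Simulate n values of Y_t = c + sum_i phis[i-1] * Y_{t-i} + eps_t."""
    rng = np.random.default_rng(seed)
    p = len(phis)
    total = n + burn_in
    y = np.zeros(total + p)            # first p entries are zero start-up values
    eps = rng.standard_normal(total + p)
    for t in range(p, total + p):
        # y[t-p:t][::-1] is (Y_{t-1}, ..., Y_{t-p}), matching (phi_1, ..., phi_p)
        y[t] = c + np.dot(phis, y[t - p:t][::-1]) + eps[t]
    return y[-n:]                      # drop start-up and burn-in values

series = simulate_ar(c=1.0, phis=[0.5], n=20000)
print(series.mean())  # theory for this AR(1): c / (1 - phi_1) = 2.0
```

Discarding the first `burn_in` values reduces the influence of the arbitrary zero starting point, so the retained series behaves like a draw from the long-run process.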

For the process to be **stationary** (meaning its statistical properties like mean and variance don't change over time), the roots of its characteristic equation \(1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p = 0\) must all lie outside the unit circle, that is, satisfy \(|z| > 1\). In simpler terms, the coefficients \(\phi_i\) must be constrained. For an AR(1) model, the single root is \(z = 1/\phi_1\), so the condition reduces to \(|\phi_1| < 1\).
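The root condition is easy to check numerically. The following sketch assumes a helper named `is_stationary` (a name chosen here for illustration) that builds the characteristic polynomial \(1 - \phi_1 z - \dots - \phi_p z^p\) and tests whether every root has modulus greater than 1.

```python
import numpy as np

def is_stationary(phis):
    """True if all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    # np.roots expects coefficients ordered from the highest power down to the constant
    poly = np.r_[-np.asarray(phis, dtype=float)[::-1], 1.0]
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.5]))       # AR(1) with |phi_1| < 1 -> True
print(is_stationary([1.0]))       # random walk, root on the unit circle -> False
print(is_stationary([0.6, 0.3]))  # AR(2) with roots near 1.08 and -3.08 -> True
```

For the AR(2) case the two roots solve \(0.3 z^2 + 0.6 z - 1 = 0\), and both have modulus above 1, so that process is stationary.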

## Step 4: Apply and Practice

Let's use the `statsmodels` library in Python, a widely used package for time series analysis. We will:
1.  Simulate data from a known AR(2) process.
2.  Fit an AR model to this data and see if we can recover the original parameters.
3.  Use the fitted model to make forecasts.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.ar_model import AutoReg, ar_select_order
from statsmodels.tsa.arima_process import ArmaProcess

plt.style.use('seaborn-v0_8-whitegrid')

### Part A: Simulating an AR Process

Let's create an AR(2) process with the equation:
$$ Y_t = 0.6 Y_{t-1} + 0.3 Y_{t-2} + \epsilon_t $$
Here, \(c=0\), \(\phi_1 = 0.6\), and \(\phi_2 = 0.3\).

In [None]:
# Coefficients of the AR lag polynomial 1 - 0.6 L - 0.3 L^2.
# Note: ArmaProcess expects lag-polynomial coefficients, so the signs of
# phi_1 and phi_2 are flipped relative to the AR equation above.
# The first element is the zero-lag coefficient (always 1).
ar_coeffs = [1, -0.6, -0.3]
ma_coeffs = [1] # No moving average component

# Create the AR(2) process
ar_process = ArmaProcess(ar_coeffs, ma_coeffs)

# Generate 200 data points from this process
np.random.seed(42)
data = ar_process.generate_sample(nsample=200)

# Plot the simulated data
plt.figure(figsize=(14, 6))
plt.plot(data)
plt.title('Simulated Data from an AR(2) Process')
plt.xlabel('Time Step')
plt.ylabel('Value')
plt.show()

### Part B: Fitting an AR Model to the Data

Now, let's pretend we don't know the true process that generated the data. We will use `statsmodels` to fit an AR model and see what it finds.

In [None]:
# The modern way to fit AR models is with the AutoReg class.
# We specify the data and the number of lags to include.
# Let's fit an AR(2) model since we know that's the true order.
model = AutoReg(data, lags=2)
model_fit = model.fit()

# Print the summary of the model fit
print(model_fit.summary())

**Interpretation of the Summary:**

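- **Coefficients:** Look at the `coef` column. In the run shown here, the coefficient for `y.L1` (lag 1) is `0.5862`, which is close to our true \(\phi_1=0.6\). The coefficient for `y.L2` (lag 2) is `0.2809`, close to our true \(\phi_2=0.3\).
- **P>|z|:** This is the p-value for the null hypothesis that the coefficient is zero. A small p-value (typically < 0.05) indicates that the coefficient is statistically significant. Here, both lags are significant, which matches the process we simulated.
- **const:** `AutoReg` includes an intercept by default, which is why this row appears. The estimate is `0.0818`, while the true value is 0. Check the row's p-value and confidence interval: an interval that contains 0 is consistent with no intercept.

The model approximately recovered the parameters of the underlying process. The estimates differ slightly from the true values because they come from a finite sample of 200 points.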
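In practice the true order is unknown, so it has to be chosen from the data. `statsmodels` provides the `ar_select_order` helper imported earlier for this purpose. The sketch below does not call that helper or reproduce its implementation. It is a hand-rolled comparison in plain NumPy, assuming the Bayesian Information Criterion (BIC) as the selection rule, a maximum lag of 6, and a freshly simulated series with the same coefficients as above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate the AR(2) process used in this section: phi_1 = 0.6, phi_2 = 0.3, c = 0
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.standard_normal()

def bic_for_order(y, p, max_lag):
    """BIC of an OLS-fitted AR(p), using the same targets y[max_lag:] for every p."""
    target = y[max_lag:]
    # Column k holds y[t-k] for each target y[t]
    lags = [y[max_lag - k:len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(target))] + lags)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    m = len(target)
    sigma2 = resid @ resid / m
    # Goodness of fit plus a penalty for each estimated parameter (p lags + intercept)
    return m * np.log(sigma2) + (p + 1) * np.log(m)

max_lag = 6
bics = {p: bic_for_order(y, p, max_lag) for p in range(1, max_lag + 1)}
best_p = min(bics, key=bics.get)
print("BIC by order:", {p: round(float(v), 1) for p, v in bics.items()})
print("Order with the lowest BIC:", best_p)
```

Every candidate order is fitted on the same set of target values so that the criteria are comparable. Lower BIC is better, and the penalty term discourages adding lags that barely improve the fit.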

### Part C: Forecasting with the Fitted Model

The most common use of a time series model is to forecast future values. Let's forecast the next 20 time steps.

In [None]:
# The end of our data is at index 199.
# We want to forecast from step 200 to 219.
start_forecast = len(data)
end_forecast = len(data) + 19

# Generate the forecast
forecast = model_fit.predict(start=start_forecast, end=end_forecast)

# Create a pandas series for easier plotting
data_series = pd.Series(data)
forecast_series = pd.Series(forecast, index=range(start_forecast, end_forecast + 1))

# Plot the original data and the forecast
plt.figure(figsize=(14, 7))
plt.plot(data_series, label='Observed Data')
plt.plot(forecast_series, label='Forecast', color='red', linestyle='--')
plt.title('AR(2) Model Forecast')
plt.xlabel('Time Step')
plt.ylabel('Value')
plt.legend()
plt.show()

**Interpretation of the Forecast:**

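The forecast (red dashed line) starts from the last observed values and is driven only by the fitted coefficients, because future shocks are replaced by their expected value of zero. For a stationary AR process, the forecast therefore decays toward the model's unconditional mean, \(c / (1 - \phi_1 - \dots - \phi_p)\), as the horizon grows. The informative part is the short-term dynamics, which depend on the most recent observations.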
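The convergence can be reproduced by hand. Setting \(\mu = c + \phi_1 \mu + \phi_2 \mu\) and solving gives \(\mu = c / (1 - \phi_1 - \phi_2)\) for an AR(2). The sketch below iterates the forecast recursion with future noise set to zero. The function name `ar_forecast`, the intercept of 0.1, and the two "observed" values are hypothetical and chosen only for illustration; they are not the fitted values from the summary above.

```python
import numpy as np

def ar_forecast(history, c, phis, steps):
    """Iterate Y_hat = c + sum_i phi_i * Y_{t-i}, with future noise set to 0."""
    p = len(phis)
    recent = list(history[-p:])        # last p observed values, oldest first
    out = []
    for _ in range(steps):
        # recent[-i] is the value i steps back, which pairs with phi_i
        nxt = c + sum(phi * recent[-i] for i, phi in enumerate(phis, start=1))
        out.append(nxt)
        recent.append(nxt)             # forecasts feed later forecasts
    return np.array(out)

# Hypothetical parameter values, chosen only for illustration
c, phis = 0.1, [0.6, 0.3]
path = ar_forecast(history=[2.0, 3.0], c=c, phis=phis, steps=200)
long_run_mean = c / (1 - sum(phis))    # 0.1 / 0.1, i.e. about 1.0
print(path[:3], path[-1], long_run_mean)
```

The first forecast is \(0.1 + 0.6 \cdot 3.0 + 0.3 \cdot 2.0 = 2.5\), and after many steps the path settles at the long-run mean of about 1.0.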

## Summary & Next Steps

In this notebook, we've covered the basics of Autoregressive models:
1.  An **AR(p) model** predicts the current value of a time series based on a linear combination of its `p` previous values (lags).
2.  The model's parameters tell us the influence of each lag.
3.  We can use libraries like `statsmodels` to easily fit these models to data.
4.  Once fitted, AR models are a powerful tool for short-term forecasting.

AR models are one building block of richer time series models. Combined with a Moving Average (MA) component they give ARMA, and adding differencing gives ARIMA. You will likely encounter these in advanced time series courses.

In **Weeks 8-9**, we will dive into some more advanced theory, introducing **Martingales and Brownian Motion**, which are fundamental to financial mathematics and physics.