# Chapter 3: AR(1) Time Series Simulation and Analysis

This notebook covers the simulation, analysis, and forecasting of AR(1) time series using Python's `statsmodels` library.

In [None]:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
import warnings
warnings.filterwarnings("ignore")


## Simulate AR(1) Time Series
You will simulate and plot a few AR(1) time series, each with a different parameter, $\phi$, using the arima_process module in statsmodels. In this exercise, you will look at an AR(1) model with a large positive $\phi$ and a large negative $\phi$, but feel free to experiment with your own parameters.

There are a few conventions when using the arima_process module that require some explanation. First, these routines were written generally enough to handle both AR and MA models. We will cover MA models next, so for now, just ignore the MA part. Second, when inputting the coefficients, you must include the zero-lag coefficient of 1, and the sign of the other coefficients is the opposite of what we have been using (to be consistent with the time series literature in signal processing). For example, for an AR(1) process with $\phi = 0.9$, the array representing the AR parameters would be ar = np.array([1, -0.9]).
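To make the sign convention concrete, the sketch below pushes a single unit shock through the lag polynomial [1, -0.9] by hand. It uses only NumPy; the helper name `filter_through_ar_poly` is hypothetical and is not part of statsmodels. The shock decays geometrically with ratio 0.9, which shows that the array [1, -0.9] corresponds to $\phi = +0.9$.

```python
import numpy as np

def filter_through_ar_poly(ar, eps):
    """Hypothetical helper: solve ar[0]*x[t] + ar[1]*x[t-1] = eps[t] for x.

    `ar` follows the lag-polynomial convention [1, -phi] described above.
    """
    x = np.zeros(len(eps))
    for t in range(len(eps)):
        prev = x[t - 1] if t > 0 else 0.0
        # Moving ar[1]*x[t-1] to the right-hand side flips its sign,
        # so ar[1] = -0.9 yields x[t] = 0.9*x[t-1] + eps[t]
        x[t] = (eps[t] - ar[1] * prev) / ar[0]
    return x

# A single unit shock at t=0, then no further noise
impulse = np.array([1.0, 0.0, 0.0, 0.0])
response = filter_through_ar_poly(np.array([1.0, -0.9]), impulse)
print(response)
```

Passing [1, 0.9] instead would give a response that alternates in sign, which is the $\phi = -0.9$ case.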

- Import the class ArmaProcess in the arima_process module.
- Plot the simulated AR processes:
  - Let ar1 represent an array of the AR parameters [1, $-\phi$] as explained above. For now, the MA parameter array, ma1, will contain just the lag-zero coefficient of one.
  - With parameters ar1 and ma1, create an instance of the class ArmaProcess(ar,ma) called AR_object1.
  - Simulate 1000 data points from the object you just created, AR_object1, using the method .generate_sample(). Plot the simulated data in a subplot.
  - Repeat for the other AR parameter.

In [None]:

# Simulate AR(1) with phi = +0.9
plt.figure(figsize=(10,6))
plt.subplot(2,1,1)
ar1 = np.array([1, -0.9])
ma1 = np.array([1])
AR_object1 = ArmaProcess(ar1, ma1)
simulated_data_1 = AR_object1.generate_sample(nsample=1000)
plt.plot(simulated_data_1)
plt.title("AR(1) with phi = 0.9")

# Simulate AR(1) with phi = -0.9
plt.subplot(2,1,2)
ar2 = np.array([1, 0.9])
ma2 = np.array([1])
AR_object2 = ArmaProcess(ar2, ma2)
simulated_data_2 = AR_object2.generate_sample(nsample=1000)
plt.plot(simulated_data_2)
plt.title("AR(1) with phi = -0.9")

plt.tight_layout()
plt.show()


## Compare ACF for Several AR Time Series
The autocorrelation function decays exponentially for an AR time series at a rate of the AR parameter. For example, if the AR parameter, $\phi = +0.9$, the first-lag autocorrelation will be 0.9, the second-lag will be $(0.9)^2 = 0.81$, the third-lag will be $(0.9)^3 = 0.729$, etc. A smaller AR parameter will have a steeper decay, and for a negative AR parameter, say -0.9, the decay will flip signs, so the first-lag autocorrelation will be -0.9, the second-lag will be $(-0.9)^2 = 0.81$, the third-lag will be $(-0.9)^3 = -0.729$, etc.
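The geometric decay can be tabulated directly. The sketch below assumes the population AR(1) autocorrelation $\rho_k = \phi^k$ and uses a hypothetical helper, `ar1_theoretical_acf`; it returns theoretical values, not the sample ACF that plot_acf estimates from data.

```python
import numpy as np

def ar1_theoretical_acf(phi, nlags):
    """Hypothetical helper: population ACF of a stationary AR(1), rho_k = phi**k."""
    return phi ** np.arange(nlags + 1)

acf_pos = ar1_theoretical_acf(0.9, 3)    # slow decay, all positive
acf_neg = ar1_theoretical_acf(-0.9, 3)   # same magnitudes, alternating sign
acf_small = ar1_theoretical_acf(0.3, 3)  # much steeper decay

for name, acf in [("phi=0.9", acf_pos), ("phi=-0.9", acf_neg), ("phi=0.3", acf_small)]:
    print(name, np.round(acf, 3))
```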

The object simulated_data_1 is the simulated time series with an AR parameter of +0.9, simulated_data_2 is for an AR parameter of -0.9, and simulated_data_3 (generated in the cell below) is for an AR parameter of +0.3.

Compute the autocorrelation function for each of the three simulated datasets using the plot_acf function with 20 lags (and suppress the confidence intervals by setting alpha=1).


In [None]:

# Simulate AR(1) with phi = 0.3
ar3 = np.array([1, -0.3])
ma3 = np.array([1])
AR_object3 = ArmaProcess(ar3, ma3)
simulated_data_3 = AR_object3.generate_sample(nsample=1000)

# Plot ACFs
fig, axes = plt.subplots(3, 1, figsize=(8, 12))
plot_acf(simulated_data_1, alpha=1, lags=20, ax=axes[0])
axes[0].set_title("ACF for AR(1) with phi = 0.9")
plot_acf(simulated_data_2, alpha=1, lags=20, ax=axes[1])
axes[1].set_title("ACF for AR(1) with phi = -0.9")
plot_acf(simulated_data_3, alpha=1, lags=20, ax=axes[2])
axes[2].set_title("ACF for AR(1) with phi = 0.3")

plt.tight_layout()
plt.show()


## Estimating an AR Model
You will estimate the AR(1) parameter, $\phi$, of one of the simulated series that you generated in the earlier exercise. Since the parameters are known for a simulated series, it is a good way to understand the estimation routines before applying them to real data.

For simulated_data_1 with a true $\phi$ of 0.9, you will print out the estimate of $\phi$. In addition, you will also print out the entire output that is produced when you fit a time series, so you can get an idea of what other tests and summary statistics are available in statsmodels.

- Import the class ARIMA from the module statsmodels.tsa.arima.model (the older ARMA class in statsmodels.tsa.arima_model has been removed from recent statsmodels releases).
- Create an instance of the ARIMA class called mod using the simulated data simulated_data_1 and the order (p,d,q) of the model, which for an AR(1) is order=(1,0,0).
- Fit the model mod using the method .fit() and save it in a results object called res.
- Print out the entire summary of results using the .summary() method.
- Print out just the parameter estimates, including the constant and $\phi$, using the .params attribute (no parentheses).

To fit the model, first create an instance of the class: mod = ARIMA(simulated_data_1, order=(1,0,0)) and then res = mod.fit(), followed by either res.summary() for a full summary or res.params for just the parameter estimates.
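As a rough cross-check on the fitted value, $\phi$ can also be approximated by regressing each observation on the previous one. The sketch below is a simplified least-squares estimate under assumed settings (seed 0, 5000 points, zero starting value); it is not the maximum-likelihood routine that statsmodels runs, so the two estimates will usually agree only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility
phi_true, n = 0.9, 5000

# Simulate x[t] = phi*x[t-1] + eps[t], starting from zero
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + eps[t]

# Least-squares slope of the demeaned series on its own first lag
x_c = x - x.mean()
phi_hat = np.dot(x_c[1:], x_c[:-1]) / np.dot(x_c[:-1], x_c[:-1])
print(f"true phi = {phi_true}, least-squares estimate = {phi_hat:.3f}")
```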


In [None]:

from statsmodels.tsa.arima.model import ARIMA

# Fit AR(1) model
mod = ARIMA(simulated_data_1, order=(1,0,0))
res = mod.fit()

# Print summary and estimated parameters
print(res.summary())
print("Estimated parameters:", res.params)


## Forecasting with an AR Model
In addition to estimating the parameters of a model, as you did in the last exercise, you can also do forecasting, both in-sample and out-of-sample, using statsmodels. The in-sample forecast is a forecast of the next data point using the data up to that point, and the out-of-sample forecast covers any number of data points in the future. These forecasts can be made using either the predict() method of the fitted results if you want the forecasts in the form of a series of data, or the plot_predict() function in statsmodels.graphics.tsaplots if you want a plot of the forecasted data. You supply the starting point for forecasting and the ending point, which can be any number of data points after the data set ends.

For the simulated series simulated_data_1 with $\phi = 0.9$, you will plot in-sample and out-of-sample forecasts.

- Import the class ARIMA from the module statsmodels.tsa.arima.model and the function plot_predict from statsmodels.graphics.tsaplots.
- Create an instance of the ARIMA class called mod using the simulated data simulated_data_1 and the order (p,d,q) of the model, which for an AR(1) is order=(1,0,0).
- Fit the model mod using the method .fit() and save it in a results object called res.
- Plot the in-sample and out-of-sample forecasts of the data by passing res to the plot_predict() function.
- Start the forecast 10 data points before the end of the 1000-point series at 990, and end the forecast 10 data points after the end of the series at point 1010.
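The shape of the out-of-sample forecast follows from the AR(1) recursion: each step ahead moves the forecast a fraction $\phi$ of the remaining distance from the mean. The sketch below assumes known parameters and illustrative input values; `ar1_forecast` is a hypothetical helper, not a statsmodels function.

```python
import numpy as np

def ar1_forecast(last_value, phi, mean, steps):
    """Hypothetical helper: h-step-ahead AR(1) forecasts, mean + phi**h * (last - mean)."""
    h = np.arange(1, steps + 1)
    return mean + phi ** h * (last_value - mean)

# Illustrative inputs: last observation 2.0, phi = 0.9, zero mean
forecasts = ar1_forecast(last_value=2.0, phi=0.9, mean=0.0, steps=3)
print(forecasts)
```

With $\phi = 0.9$ the forecasts shrink toward the mean by 10% per step, which is why the plotted out-of-sample path flattens out rather than continuing the recent trend.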


In [None]:

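# Forecasting: ARIMA results objects have no plot_predict() method,
# so use the plot_predict function from statsmodels.graphics.tsaplots
from statsmodels.graphics.tsaplots import plot_predict

# In-sample from position 990, out-of-sample through position 1010
plot_predict(res, start=990, end=1010)
plt.show()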


## Forecasting Interest Rates
You will now take the forecasting techniques you learned in the last exercise and apply them to real data rather than simulated data. You will revisit a dataset from the first chapter: the annual data of 10-year interest rates going back 56 years, which is in a Series called interest_rate_data. Being able to forecast interest rates is of enormous importance, not only for bond investors but also for individuals like new homeowners who must decide between fixed and floating rate mortgages.

You saw in the first chapter that there is some mean reversion in interest rates over long horizons. In other words, when interest rates are high, they tend to drop, and when they are low, they tend to rise over time. Currently they are below their long-run average, so they are expected to rise, and an AR model attempts to quantify how much they are expected to rise.

- Import the class ARIMA from the module statsmodels.tsa.arima.model and the function plot_predict from statsmodels.graphics.tsaplots.
- Create an instance of the ARIMA class called mod using the annual interest rate data and choosing the order for an AR(1) model, order=(1,0,0).
- Fit the model mod using the method .fit() and save it in a results object called res.
- Plot the in-sample and out-of-sample forecasts of the data using the plot_predict() function.
- Pass the arguments start=0 to start the in-sample forecast from the beginning, and choose end to be '2022' to forecast several years in the future.
- Note that the end argument 2022 must be in quotes here since it represents a date and not an integer position. The placeholder array in the cell below has no date index, so it uses the integer position end=70 instead.


In [None]:

# Example: Forecasting interest rates (replace with real data)
interest_rate_data = np.random.randn(56)  # Placeholder data

mod = ARIMA(interest_rate_data, order=(1,0,0))
res = mod.fit()

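# ARIMA results have no plot_predict() method; use the tsaplots function.
# The placeholder array has an integer index, so end=70 is a position
# (with a date-indexed Series, end could be a date string such as '2022').
from statsmodels.graphics.tsaplots import plot_predict
plot_predict(res, start=0, end=70)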
plt.legend(fontsize=8)
plt.show()


## Compare AR Model with Random Walk
Sometimes it is difficult to distinguish between a time series that is slightly mean reverting and a time series that does not mean revert at all, like a random walk. You will compare the ACF for the slightly mean-reverting interest rate series of the last exercise with a simulated random walk with the same number of observations.

You should notice when plotting the autocorrelation of these two series side-by-side that they look very similar.

- Import the plot_acf function from statsmodels.graphics.tsaplots.
- Create two axes for the two subplots.
- Plot the autocorrelation function for 12 lags of the interest rate series interest_rate_data in the top plot.
- Plot the autocorrelation function for 12 lags of the simulated random walk series simulated_data in the bottom plot.
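The difficulty of telling the two cases apart in a short sample can be explored numerically. The sketch below drives a random walk and a slightly mean-reverting AR(1) with the same shocks and prints their first few sample autocorrelations. The value $\phi = 0.95$, the seed, and the helper `sample_acf` are illustrative assumptions, not taken from the interest rate data.

```python
import numpy as np

def sample_acf(x, nlags):
    """Hypothetical helper: sample autocorrelations for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:len(x) - k]) / denom
                     for k in range(nlags + 1)])

rng = np.random.default_rng(0)  # arbitrary seed
n = 56                          # same length as the annual series in the text
eps = rng.standard_normal(n)

# Random walk: equivalent to an AR(1) with phi = 1, so no mean reversion
random_walk = np.cumsum(eps)

# Slightly mean-reverting AR(1) driven by the same shocks (phi assumed 0.95)
near_unit_root = np.zeros(n)
near_unit_root[0] = eps[0]
for t in range(1, n):
    near_unit_root[t] = 0.95 * near_unit_root[t - 1] + eps[t]

acf_rw = sample_acf(random_walk, 3)
acf_ar = sample_acf(near_unit_root, 3)
print("random walk   :", np.round(acf_rw, 2))
print("AR(1) phi=0.95:", np.round(acf_ar, 2))
```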

In [None]:

# Simulate a random walk
simulated_data = np.cumsum(np.random.randn(len(interest_rate_data)))

fig, axes = plt.subplots(2,1, figsize=(8, 8))
plot_acf(interest_rate_data, alpha=1, lags=12, ax=axes[0])
axes[0].set_title("Interest Rate Data")
plot_acf(simulated_data, alpha=1, lags=12, ax=axes[1])
axes[1].set_title("Simulated Random Walk Data")

plt.tight_layout()
plt.show()


## Estimate Order of Model: PACF
One useful tool to identify the order of an AR model is the Partial Autocorrelation Function (PACF). In this exercise, you will simulate two time series, an AR(1) and an AR(2), and calculate the sample PACF for each. You will notice that for an AR(1), the sample PACF should have a significant lag-1 value and values of roughly zero after that. For an AR(2), the sample PACF should have significant lag-1 and lag-2 values and values of roughly zero after that.

Just like you used the plot_acf function in earlier exercises, here you will use a function called plot_pacf in the statsmodels module.

- Import the modules for simulating data and for plotting the PACF.
- Simulate an AR(1) with $\phi = 0.6$ (remember that the sign for the AR parameter is reversed).
- Plot the PACF for simulated_data_1 using the plot_pacf function.
- Simulate an AR(2) with $\phi_1 = 0.6$, $\phi_2 = 0.3$ (again, reverse the signs).
- Plot the PACF for simulated_data_2 using the plot_pacf function.

The call to plot_pacf mirrors plot_acf: the first plot is plot_pacf(simulated_data_1, lags=20).
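One way to see why the PACF cuts off after lag p is its regression definition: the lag-k partial autocorrelation is the coefficient on the k-th lag when the series is regressed on its first k lags. The sketch below applies that definition with NumPy to an AR(2) simulated under assumed settings (seed 0, 5000 points). The helper `pacf_by_regression` is hypothetical, and plot_pacf may use a different estimation method, so its values can differ slightly.

```python
import numpy as np

def pacf_by_regression(x, nlags):
    """Hypothetical helper: lag-k PACF as the last OLS coefficient of
    x[t] regressed on x[t-1], ..., x[t-k]."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    out = [1.0]
    for k in range(1, nlags + 1):
        # Columns are x[t-1], ..., x[t-k] for t = k, ..., n-1
        lagged = np.column_stack([x[k - j:len(x) - j] for j in range(1, k + 1)])
        coefs, *_ = np.linalg.lstsq(lagged, x[k:], rcond=None)
        out.append(coefs[-1])
    return np.array(out)

# Simulate an AR(2) with phi1 = 0.6, phi2 = 0.3 (seed chosen arbitrarily)
rng = np.random.default_rng(0)
n = 5000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + eps[t]

pacf = pacf_by_regression(x, 4)
print(np.round(pacf, 2))
```

For this process the lag-2 value should land near $\phi_2 = 0.3$, and lags 3 and 4 should be close to zero.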


In [None]:

# Simulate AR(1) with phi=0.6
ar = np.array([1, -0.6])
ma = np.array([1])
AR_object = ArmaProcess(ar, ma)
simulated_data_1 = AR_object.generate_sample(nsample=5000)

# Plot PACF for AR(1)
plot_pacf(simulated_data_1, lags=20)
plt.show()

# Simulate AR(2) with phi1=0.6, phi2=0.3
ar = np.array([1, -0.6, -0.3])
AR_object = ArmaProcess(ar, ma)
simulated_data_2 = AR_object.generate_sample(nsample=5000)

# Plot PACF for AR(2)
plot_pacf(simulated_data_2, lags=20)
plt.show()


## Estimate Order of Model: Information Criteria
Another tool to identify the order of a model is to look at the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These measures compute the goodness of fit with the estimated parameters, but apply a penalty function on the number of parameters in the model. You will take the AR(2) simulated data from the last exercise, saved as simulated_data_2, and compute the BIC as you vary the order, p, in an AR(p) from 0 to 6.

- Import the ARIMA class for estimating the parameters and computing BIC.
- Initialize a numpy array BIC, which we will use to store the BIC for each AR(p) model.
- Loop through order p for p = 0,…,6.
- For each p, fit the data to an AR model of order p.
- For each p, save the value of BIC using the .bic attribute (no parentheses) of res.
- Plot BIC as a function of p (for the plot, skip p=0 and plot for p = 1,…,6).
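The trade-off between fit and penalty can be made explicit with a hand-rolled calculation. The sketch below fits each AR(p) by ordinary least squares and uses $\text{BIC} = n \ln(\text{RSS}/n) + k \ln n$, which equals the Gaussian BIC up to an additive constant. This is a simplified stand-in under stated assumptions (seed 0, 5000 points, a common estimation sample), not the exact likelihood-based value that the .bic attribute reports, so only the ranking across p is comparable.

```python
import numpy as np

def ar_bic_least_squares(x, p, max_p):
    """Hypothetical helper: BIC (up to a constant) of an AR(p) fitted by OLS.

    The first max_p observations are dropped for every p so that all
    orders are compared on the same sample.
    """
    x = np.asarray(x, dtype=float)
    y = x[max_p:]
    n = len(y)
    # Design matrix: a constant plus lags 1..p of the series
    cols = [np.ones(n)] + [x[max_p - j:len(x) - j] for j in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    # Fit term plus a penalty of ln(n) per estimated coefficient
    return n * np.log(rss / n) + X.shape[1] * np.log(n)

# Simulate an AR(2) with phi1 = 0.6, phi2 = 0.3 (seed chosen arbitrarily)
rng = np.random.default_rng(0)
n_obs = 5000
eps = rng.standard_normal(n_obs)
x = np.zeros(n_obs)
for t in range(2, n_obs):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + eps[t]

bic = np.array([ar_bic_least_squares(x, p, max_p=6) for p in range(7)])
print(np.round(bic, 1))
```

Going from p=1 to p=2 should lower the BIC sharply, while further lags add penalty with little improvement in fit.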

In [None]:

# Compute BIC for AR(p), p=0 to 6
BIC = np.zeros(7)
for p in range(7):
    mod = ARIMA(simulated_data_2, order=(p,0,0))
    res = mod.fit()
    BIC[p] = res.bic

# Plot BIC
plt.plot(range(1,7), BIC[1:7], marker='o')
plt.xlabel('Order of AR Model')
plt.ylabel('Bayesian Information Criterion')
plt.show()
