# Time Series Forecasting Techniques

Imagine you have been asked to analyze climate data. Your employer wants you to build a model that forecasts tomorrow's temperature.

* What would have the most explanatory power for tomorrow's weather?
    - Today's?

In [None]:
import pandas as pd
import statsmodels.api as sm

# Load the Vega dataset
df = pd.read_csv("https://raw.githubusercontent.com/stanfordnlp/plot-interface/master/public/data/sf-temps.csv")

# Set the date as the index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)


# Calculate monthly average temperature
monthly_avg_temp = df.resample('M').mean()
# Take the first difference (month-over-month change) and drop the leading NaN
monthly_avg_temp = monthly_avg_temp.diff()
monthly_avg_temp = monthly_avg_temp.dropna()
# Plot the differenced monthly series
monthly_avg_temp.plot()


df = monthly_avg_temp

In [None]:
from statsmodels.tsa.ar_model import AutoReg

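# Ensure the data is appropriate for modeling: drop rows with missing temperatures
# (assigning df['temp'].dropna() back to the column would realign on the index and keep the NaNs)
df = df.dropna(subset=['temp'])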

# Fit an Autoregressive Model (AR model)
ar_model = AutoReg(df['temp'], lags=3)
ar_result = ar_model.fit()

print(ar_result.summary())

# Autoregressive (AR) Models

## What is an AR Model?
- An **Autoregressive (AR) model** is a type of statistical model used for forecasting in time series data.
- It predicts future behavior based on past behavior.
- The model is termed "autoregressive" because it regresses the variable against itself.

## Formulation of an AR Model
- The AR model is defined as:

$Y_t = c + x_1 Y_{t-1} + x_2 Y_{t-2} + \dots + x_p Y_{t-p} + \epsilon_t$

where,
- $Y_t$ is the value at time $t$,
- $c$ is a constant,
- $x_1, x_2, \dots, x_p$ are the parameters of the model,
- $p$ is the order of the AR model,
- $\epsilon_t$ is white noise.
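
To make the formula concrete, here is a minimal sketch for the simplest case, $p = 1$. It simulates a series from $Y_t = c + x_1 Y_{t-1} + \epsilon_t$, recovers $c$ and $x_1$ by ordinary least squares, and makes a one-step forecast. The helper names (`simulate_ar1`, `fit_ar1`) and the parameter values are made up for illustration, and the code uses only the standard library so it runs without the dataset.

```python
import random


def simulate_ar1(c, x1, n, sigma=1.0, seed=0):
    """Simulate Y_t = c + x1 * Y_{t-1} + eps_t with Gaussian white noise."""
    rng = random.Random(seed)
    y = [c / (1.0 - x1)]  # start at the long-run mean of the process
    for _ in range(n - 1):
        y.append(c + x1 * y[-1] + rng.gauss(0.0, sigma))
    return y


def fit_ar1(y):
    """Estimate (c, x1) by least squares, regressing Y_t on Y_{t-1}."""
    prev, curr = y[:-1], y[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    x1_hat = cov / var
    c_hat = mean_curr - x1_hat * mean_prev
    return c_hat, x1_hat


# Hypothetical "true" parameters, chosen only for the demonstration
series = simulate_ar1(c=2.0, x1=0.7, n=5000)
c_hat, x1_hat = fit_ar1(series)

# One-step-ahead forecast: plug the last observed value into the fitted equation
next_value_forecast = c_hat + x1_hat * series[-1]

print(f"estimated c = {c_hat:.2f}, estimated x1 = {x1_hat:.2f}")
print(f"one-step forecast = {next_value_forecast:.2f}")
```

With enough data the estimates should land close to the values used in the simulation. `AutoReg` in statsmodels does the same kind of regression for any number of lags.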

## Moving Averages

Now imagine you're trying to understand the weather and predict what it will be like tomorrow. One way to do it is to look at how the weather has been changing over the past few days. The Moving Average (MA) model does something similar with data.

In an MA model, instead of looking directly at the data itself (like the temperature each day), we look at the "errors" made in previous predictions. "Error" here means how much we were off in our past predictions. For example, if we predicted yesterday's temperature would be 70°F, but it was actually 72°F, the error is 2°F.

The MA model says that the best prediction for today's temperature is a combination of these past errors. So, it's like saying, "I know I've been off by a few degrees in the past few days, let me use that information to make a better guess today."
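
As a worked illustration with assumed numbers: suppose the long-run average temperature is $\mu = 70$°F and the model puts a weight of $\theta_1 = 0.5$ on yesterday's error (both values are hypothetical, chosen only for the arithmetic). If yesterday's forecast was 2°F too low, so that $\epsilon_{t-1} = 2$, a one-lag MA forecast for today adds half of that error back onto the average:

$\hat{Y}_t = \mu + \theta_1 \epsilon_{t-1} = 70 + 0.5 \times 2 = 71$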

In [None]:
### The MA model in action: order=(0, 0, 1) means no AR lags, no differencing, one MA lag

ma_model = sm.tsa.ARIMA(df['temp'], order=(0, 0, 1))
ma_result = ma_model.fit()

# Display the summary of the MA model
print(ma_result.summary())

# Moving Average (MA) Models

## What is an MA Model?
- A **Moving Average (MA) model** is a statistical approach used for forecasting time series data.
- It predicts future values based on the errors (differences between actual and predicted values) of past predictions.

## How Does It Work?
- Instead of using past values of the series directly, the MA model uses the past forecast errors.
- A forecast error is the difference between the actual value and the forecasted value at a previous time point.

## Model Formulation
- The MA model is defined as:

$Y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q}$


where,
- $Y_t$ is the value at time $t$,
- $\mu$ is the mean of the series,
- $\epsilon_t$ is the forecast error at time $t$,
- $\theta_1, \theta_2, \dots, \theta_q$ are the parameters of the model,
- $q$ is the order of the MA model.
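
One consequence of this formulation is that an MA($q$) series is only correlated with itself up to lag $q$. The sketch below simulates an MA(1) series and compares its sample autocorrelation with the theoretical values: $\theta_1 / (1 + \theta_1^2)$ at lag 1 and zero at lag 2. The helper names and parameter values are made up for illustration, and only the standard library is used.

```python
import random


def simulate_ma1(mu, theta1, n, sigma=1.0, seed=1):
    """Simulate Y_t = mu + eps_t + theta1 * eps_{t-1} with Gaussian white noise."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(n + 1)]
    return [mu + eps[t] + theta1 * eps[t - 1] for t in range(1, n + 1)]


def autocorr(y, lag):
    """Sample autocorrelation of y at the given lag."""
    mean = sum(y) / len(y)
    dev = [v - mean for v in y]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, len(y)))
    den = sum(d * d for d in dev)
    return num / den


theta1 = 0.6  # hypothetical MA coefficient
ma_series = simulate_ma1(mu=10.0, theta1=theta1, n=20000)

theory_lag1 = theta1 / (1 + theta1 ** 2)
sample_lag1 = autocorr(ma_series, 1)
sample_lag2 = autocorr(ma_series, 2)

print(f"lag 1: sample = {sample_lag1:.3f}, theory = {theory_lag1:.3f}")
print(f"lag 2: sample = {sample_lag2:.3f}, theory = 0.000")
```

This cut-off after lag $q$ is one common hint, when looking at an autocorrelation plot, that an MA term may be appropriate.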

In [None]:

# Load the Vega dataset
df = pd.read_csv("https://raw.githubusercontent.com/stanfordnlp/plot-interface/master/public/data/co2-concentration.csv")

# Set the date as the index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Calculate monthly average CO2 concentration
monthly_avg_co = df.resample('M').mean()

# Take the first difference (month-over-month change)
monthly_avg_co = monthly_avg_co.diff()


In [None]:
monthly_avg_co.plot()

In [None]:
### An ARMA(1, 1) model in action: one AR lag plus one MA lag

arma_model = sm.tsa.ARIMA(monthly_avg_co['CO2'], order=(1, 0, 1))
arma_result = arma_model.fit()

# Display the summary of the ARMA(1, 1) model
print(arma_result.summary())

In [None]:
### An ARMA(2, 1) model in action: two AR lags plus one MA lag

arma_model = sm.tsa.ARIMA(monthly_avg_co['CO2'], order=(2, 0, 1))
arma_result = arma_model.fit()

# Display the summary of the ARMA(2, 1) model
print(arma_result.summary())

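# Autoregressive Integrated Moving Average (ARIMA) models

Now we can combine both models. An ARIMA model has an autoregressive part, which uses past values of the series, and a moving average part, which uses past forecast errors. The "integrated" part means the series is differenced before fitting, so that the model works on a stationary series.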
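
The "Integrated" part of the name refers to differencing: the model is fit to the changes between consecutive observations, and forecasts are turned back into levels by summing those changes. A minimal sketch of that round trip, using made-up numbers and hypothetical helper names (`difference`, `integrate`):

```python
from itertools import accumulate


def difference(y):
    """First difference: the change from each observation to the next."""
    return [b - a for a, b in zip(y, y[1:])]


def integrate(first_value, diffs):
    """Undo first differencing by cumulatively summing from the first value."""
    return list(accumulate(diffs, initial=first_value))


# Made-up levels, for illustration only
levels = [315.0, 316.5, 317.0, 318.5, 320.0]

diffs = difference(levels)
restored = integrate(levels[0], diffs)

print("differences:", diffs)
print("restored levels:", restored)
```

In `order=(p, d, q)`, the middle value `d` is the number of times this differencing step is applied inside the model.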

In [None]:
import pandas as pd

# Load the Vega dataset
df = pd.read_csv("https://raw.githubusercontent.com/stanfordnlp/plot-interface/master/public/data/co2-concentration.csv")

# Set the date as the index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Calculate monthly average CO2 concentration
monthly_avg_co = df.resample('M').mean()

# Take the first difference (month-over-month change)
monthly_avg_co = monthly_avg_co.diff()


In [None]:
import statsmodels.api as sm


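# The series was already differenced by hand above, so d=0 here:
# this fits an ARMA(1, 1) model to the month-over-month changes
arma_model = sm.tsa.ARIMA(monthly_avg_co['CO2'], order=(1, 0, 1))
arma_result = arma_model.fit()

# Display the summary of the fitted model
print(arma_result.summary())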

# Seasonal ARIMA (SARIMA)

In [None]:
import pandas as pd
import statsmodels.api as sm

# Load the Vega dataset
df = pd.read_csv("https://raw.githubusercontent.com/stanfordnlp/plot-interface/master/public/data/co2-concentration.csv")

# Set the date as the index and convert to datetime
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Resample to monthly average and take the first difference to make it stationary
monthly_avg_co = df.resample('M').mean()
monthly_avg_co_diff = monthly_avg_co.diff().dropna()

In [None]:
df.plot()

In [None]:

# Define the SARIMA model - adjust p, d, q and seasonal_order parameters as needed
# Parameters used here: p=2, d=1, q=1 for the non-seasonal part;
# P=1, D=1, Q=1, s=12 for the seasonal part (assuming yearly seasonality)
# The model differences internally (d=1, D=1), so it is fit on the
# undifferenced monthly series to avoid differencing twice
model = sm.tsa.SARIMAX(monthly_avg_co['CO2'],
                       order=(2, 1, 1),
                       seasonal_order=(1, 1, 1, 12))

# Fit the model
sarima_result = model.fit()

# Display the summary of the model
print(sarima_result.summary())
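
The seasonal setting `D=1` with `s=12` corresponds to seasonal differencing: each month is compared with the same month one year earlier, which removes a repeating yearly pattern. A minimal sketch on a synthetic series (the trend and the monthly pattern are made up, and `seasonal_difference` is a hypothetical helper):

```python
def seasonal_difference(y, period):
    """Subtract the value one full season earlier from each observation."""
    return [y[t] - y[t - period] for t in range(period, len(y))]


# Made-up yearly pattern (one value per month) on top of a steady upward trend
monthly_pattern = [0, 1, 3, 4, 5, 4, 2, 0, -2, -3, -2, -1]
synthetic = [2 * t + monthly_pattern[t % 12] for t in range(48)]

seasonal_diff = seasonal_difference(synthetic, period=12)

# The repeating pattern cancels out; only the trend's yearly increase remains
print(seasonal_diff[:5])
```

After seasonal differencing, what remains is the year-over-year change, which the non-seasonal part of the model can then handle.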
