Elementary Time Series Models
🔷 Objective:
Apply and visualize elementary time series forecasting models on a sample dataset (AirPassengers or synthetic sales data).
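
As a rough sketch of what the three elementary models compute, the plain-Python helpers below produce one-step-ahead forecasts by hand. The function names and the choice of the first observation as the initial smoothing level are assumptions made for illustration; library implementations such as statsmodels may initialize differently.

```python
def naive_forecast(y):
    """Forecast for time t is the observation at time t-1."""
    return [None] + list(y[:-1])


def moving_average_forecast(y, window):
    """Forecast for time t is the mean of the previous `window` observations."""
    out = [None] * window
    for t in range(window, len(y)):
        out.append(sum(y[t - window:t]) / window)
    return out


def ses_forecast(y, alpha, initial_level=None):
    """Simple exponential smoothing: level = alpha * y + (1 - alpha) * level."""
    level = y[0] if initial_level is None else initial_level
    out = []
    for obs in y:
        out.append(level)  # forecast issued before the observation is seen
        level = alpha * obs + (1 - alpha) * level
    return out


sales = [200, 220, 230, 250, 270]
print(naive_forecast(sales))
print(moving_average_forecast(sales, 3))
print(ses_forecast(sales, alpha=0.5))
```

Each forecast uses only past observations, which is what makes the error comparison between the models fair.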

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from sklearn.metrics import mean_squared_error

# Sample monthly sales data (you can replace with your own time series)
data = {
    'Month': pd.date_range(start='2022-01-01', periods=24, freq='MS'),  # 'MS' = month start; the 'M' alias is deprecated in recent pandas
    'Sales': [200, 220, 230, 250, 270, 300, 310, 320, 340, 360, 370, 390,
              400, 420, 440, 460, 480, 500, 520, 540, 560, 580, 600, 620]
}
df = pd.DataFrame(data)
df.set_index('Month', inplace=True)

# Plot original series
df['Sales'].plot(title='Original Sales Time Series', marker='o')
plt.grid()
plt.show()

# ---------- Naive Forecast ----------
naive_forecast = df['Sales'].shift(1)

# ---------- Moving Average ----------
window = 3
# Shift by one step so the forecast for month t uses only the previous `window` months
moving_avg_forecast = df['Sales'].rolling(window=window).mean().shift(1)

# ---------- Simple Exponential Smoothing ----------
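# The smoothing level is fixed (optimized=False), so set the initial level heuristically
model_ses = SimpleExpSmoothing(df['Sales'], initialization_method='heuristic').fit(smoothing_level=0.2, optimized=False)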
ses_forecast = model_ses.fittedvalues

# Plot all models
plt.figure(figsize=(12, 6))
plt.plot(df['Sales'], label='Original')
plt.plot(naive_forecast, label='Naive Forecast', linestyle='--')
plt.plot(moving_avg_forecast, label=f'Moving Avg ({window})', linestyle='-.')
plt.plot(ses_forecast, label='Simple Exp Smoothing', linestyle=':')
plt.title('Elementary Time Series Models')
plt.legend()
plt.grid()
plt.show()

# ---------- Evaluation (last 10 points) ----------
actual = df['Sales'][-10:]
naive_pred = naive_forecast[-10:]
ses_pred = ses_forecast[-10:]
ma_pred = moving_avg_forecast[-10:]

# Score all three forecasts on the same 10-point window (none of them has NaNs there)
print("Mean Squared Error:")
print("Naive:", mean_squared_error(actual, naive_pred))
print("Moving Average:", mean_squared_error(actual, ma_pred))
print("Simple Exponential Smoothing:", mean_squared_error(actual, ses_pred))


Program: Time Series Decomposition
🔷 Objective:
Decompose a time series dataset into trend, seasonal and residual components using additive or multiplicative models.
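
To make the two model forms concrete: the additive model assumes y = trend + seasonal + residual, while the multiplicative model assumes y = trend × seasonal × residual. The sketch below builds both from synthetic components (all values are assumptions chosen for illustration) and checks that taking logs turns the multiplicative form into an additive one.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(48)

trend = 100 + 2.0 * t                                # linear trend
seasonal_add = 10 * np.sin(2 * np.pi * t / 12)       # additive swing: same size every year
seasonal_mul = 1 + 0.1 * np.sin(2 * np.pi * t / 12)  # multiplicative factor: scales with the level
noise_add = rng.normal(0, 1, t.size)
noise_mul = np.exp(rng.normal(0, 0.01, t.size))      # strictly positive noise factor

y_additive = trend + seasonal_add + noise_add
y_multiplicative = trend * seasonal_mul * noise_mul

# Taking logs converts the multiplicative form into an additive one.
log_parts = np.log(trend) + np.log(seasonal_mul) + np.log(noise_mul)
print(np.allclose(np.log(y_multiplicative), log_parts))
```

A multiplicative model is usually the better fit when the seasonal swings grow with the level of the series; it requires strictly positive values.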

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Sample time series data (monthly)
data = {
    'Month': pd.date_range(start='2020-01-01', periods=24, freq='MS'),  # 'MS' = month start; the 'M' alias is deprecated in recent pandas
    'Sales': [250, 270, 300, 310, 330, 360, 400, 420, 410, 390, 380, 370,
              260, 280, 320, 340, 350, 370, 410, 430, 420, 400, 390, 380]
}

df = pd.DataFrame(data)
df.set_index('Month', inplace=True)

# Decompose the time series
result = seasonal_decompose(df['Sales'], model='additive', period=12)

# Plot the components
result.plot()
plt.suptitle("Time Series Decomposition", fontsize=16)
plt.tight_layout()
plt.show()

# Optional: Print components
print("Trend:\n", result.trend.dropna().head())
print("\nSeasonal:\n", result.seasonal.head(12))  # one season
print("\nResidual:\n", result.resid.dropna().head())


Elementary model evaluation techniques for the given data after fitting the models: BIAS, MAD, MAPE, MSE.

Below is a Python program that computes elementary evaluation metrics for time series forecasts, including:

BIAS (Forecast Bias)

MAD (Mean Absolute Deviation)

MAPE (Mean Absolute Percentage Error)

MSE (Mean Squared Error)
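
A minimal sketch of the four metrics as one function, assuming the sign convention error = actual - forecast (so a positive bias means the forecast runs low). The function name `forecast_metrics` is hypothetical.

```python
import numpy as np


def forecast_metrics(actual, forecast):
    """Return BIAS, MAD, MSE and MAPE (in percent) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    error = actual - forecast
    return {
        "BIAS": error.mean(),                                   # signed errors can cancel
        "MAD": np.abs(error).mean(),                            # average error magnitude
        "MSE": (error ** 2).mean(),                             # penalizes large errors more
        "MAPE": (np.abs(error) / np.abs(actual)).mean() * 100,  # undefined if any actual is 0
    }


metrics = forecast_metrics([100, 200, 400], [110, 190, 400])
print(metrics)
```

In this toy example the errors are -10, +10 and 0, so the bias is zero even though the forecasts are not perfect, which is why BIAS is read alongside MAD or MSE rather than on its own.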

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Simulated time series data (actual vs forecasted)
actual = np.array([120, 130, 125, 145, 150, 165, 160, 170, 180, 175])
forecast = np.array([118, 132, 128, 140, 148, 160, 158, 172, 178, 174])

# Convert to DataFrame for ease
df = pd.DataFrame({'Actual': actual, 'Forecast': forecast})

# Compute Errors
df['Error'] = df['Actual'] - df['Forecast']
df['Abs_Error'] = df['Error'].abs()
df['Squared_Error'] = df['Error'] ** 2
df['APE'] = (df['Abs_Error'] / df['Actual']) * 100

# Evaluation Metrics
bias = df['Error'].mean()
mad = df['Abs_Error'].mean()
mse = df['Squared_Error'].mean()
mape = df['APE'].mean()

# Output
print("BIAS (Forecast Bias):", round(bias, 2))
print("MAD (Mean Absolute Deviation):", round(mad, 2))
print("MSE (Mean Squared Error):", round(mse, 2))
print("MAPE (Mean Absolute Percentage Error):", round(mape, 2), "%")

# Optional: Plot actual vs forecast
plt.plot(df['Actual'], label='Actual', marker='o')
plt.plot(df['Forecast'], label='Forecast', marker='x')
plt.title("Actual vs Forecast")
plt.xlabel("Time Index")
plt.ylabel("Value")
plt.legend()
plt.grid()
plt.show()


Python program to check the stationarity of a time series using visual inspection and the Augmented Dickey-Fuller (ADF) test, a commonly used statistical method in time series analysis.

Stationarity is vital in time series modeling because many forecasting algorithms assume a stable mean and variance over time. Visual inspection serves as a quick check, while the Augmented Dickey-Fuller test formally tests the null hypothesis that the series has a unit root (is non-stationary). Making a series stationary through transformations such as differencing or taking logs generally improves model accuracy and interpretability, particularly when forecasting with ARIMA or similar models.



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

# Generate synthetic time series data (e.g., random walk)
np.random.seed(42)
n = 200
random_walk = np.cumsum(np.random.normal(0, 1, n))

# Convert to pandas Series
ts = pd.Series(random_walk)

# Plot the original time series
plt.figure(figsize=(10, 4))
plt.plot(ts, label='Original Time Series')
plt.title('Time Series Plot')
plt.xlabel('Time')
plt.ylabel('Value')
plt.grid(True)
plt.legend()
plt.show()

# Perform Augmented Dickey-Fuller test
result = adfuller(ts)

# Display results
print('ADF Statistic:', result[0])
print('p-value:', result[1])
print('Critical Values:')
for key, value in result[4].items():
    print(f'   {key}: {value}')

# Check stationarity
if result[1] < 0.05:
    print("\nConclusion: The time series is stationary (reject null hypothesis).")
else:
    print("\nConclusion: The time series is non-stationary (fail to reject null hypothesis).")


Python program that applies the Augmented Dickey-Fuller (ADF) test to check stationarity of a time series dataset

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

# Create sample non-stationary time series (random walk)
np.random.seed(42)
n = 100
steps = np.random.normal(loc=0, scale=1, size=n)
data = np.cumsum(steps)  # Random walk = non-stationary
time_series = pd.Series(data)

# Plot the time series
plt.figure(figsize=(10, 5))
plt.plot(time_series)
plt.title("Generated Random Walk (Non-Stationary Series)")
plt.xlabel("Time")
plt.ylabel("Value")
plt.grid(True)
plt.show()

# Apply Augmented Dickey-Fuller Test
result = adfuller(time_series)

# Display the results
print("ADF Statistic:", result[0])
print("p-value:", result[1])
print("Critical Values:")
for key, value in result[4].items():
    print(f"   {key}: {value}")

# Conclusion
if result[1] > 0.05:
    print("Conclusion: The series is likely NON-STATIONARY (p > 0.05)")
else:
    print("Conclusion: The series is likely STATIONARY (p <= 0.05)")


Python program to examine the ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) for a given time series using statsmodels.

**Explanation**
🔹 What is ACF?
The Autocorrelation Function (ACF) measures the correlation between a time series and its past values (lags). It captures direct and indirect dependencies.

🔹 What is PACF?
The Partial Autocorrelation Function (PACF) shows the direct correlation between the series and its lags, after removing the effect of intermediate lags.

🔹 AR Process and Interpretation:
ACF of an AR(p) process decays gradually.

PACF cuts off after lag p (e.g., AR(2) has PACF spikes at lag 1 and 2, then drops).

📊 Use Cases

| Use Case | Description |
| --- | --- |
| ARIMA Modeling | ACF and PACF guide the choice of the AR and MA orders (p, q) in ARIMA; a slowly decaying ACF also suggests that differencing (d) is needed. |
| Seasonal Analysis | Helps identify repeated patterns or seasonality in data. |
| Forecast Diagnostics | The ACF of the residuals is used to check model fit and independence. |
| Signal Processing | Helps understand temporal relationships in engineering signals. |



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Sample synthetic time series data (or load your own)
np.random.seed(42)
n = 100
time_series = pd.Series(np.random.normal(0, 1, n).cumsum())

# Plot the original time series
plt.figure(figsize=(10, 4))
plt.plot(time_series)
plt.title("Time Series Data")
plt.xlabel("Time")
plt.ylabel("Value")
plt.grid()
plt.tight_layout()
plt.show()

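# Plot ACF (pass the axes explicitly; otherwise plot_acf opens its own figure
# and a preceding plt.figure() call leaves an empty one behind)
fig, ax = plt.subplots(figsize=(10, 4))
plot_acf(time_series, lags=20, ax=ax)
ax.set_title("Autocorrelation Function (ACF)")
plt.tight_layout()
plt.show()

# Plot PACF
fig, ax = plt.subplots(figsize=(10, 4))
plot_pacf(time_series, lags=20, method='ywm', ax=ax)
ax.set_title("Partial Autocorrelation Function (PACF)")
plt.tight_layout()
plt.show()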


Python program to fit a time series model for a given univariate time series dataset using ARIMA, which is a standard technique in time series analysis.

Explanation of ARIMA Components:
AR (p): Number of lagged values of the series used as predictors.

I (d): Order of differencing needed to make the series stationary.

MA (q): Number of lagged forecast errors used as predictors.
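
To illustrate the "I" component, the sketch below applies first-order differencing (d = 1) to a few illustrative values and then integrates the differences back to the original levels. This is the same mapping that turns forecasts of the differenced series into forecasts of the levels.

```python
import numpy as np

y = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])  # illustrative values

# d = 1: model the period-to-period changes instead of the levels.
dy = np.diff(y)

# Integrating (cumulative sum from the first level) recovers the original series.
restored = np.concatenate(([y[0]], y[0] + np.cumsum(dy)))

print(dy)
print(np.allclose(restored, y))
```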

✅ Use Cases:
Stock price forecasting in finance

Sales prediction for retail and e-commerce

Temperature forecasting in meteorology

Demand forecasting in supply chain management





In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Step 1: Load the univariate time series data
# For example purposes, we'll use airline passenger data
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'
data = pd.read_csv(url, parse_dates=['Month'], index_col='Month')
series = data['Passengers']

# Step 2: Plot the data
plt.figure(figsize=(10, 4))
plt.plot(series, label="Original Series")
plt.title("Airline Passenger Time Series")
plt.xlabel("Time")
plt.ylabel("Passengers")
plt.grid(True)
plt.legend()
plt.show()

# Step 3: Check for stationarity using Augmented Dickey-Fuller test
result = adfuller(series)
print('ADF Statistic:', result[0])
print('p-value:', result[1])
if result[1] <= 0.05:
    print("The series is stationary")
else:
    print("The series is not stationary (differencing required)")

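# Step 4: Difference once to inspect the stationarized series
# (ARIMA applies this differencing itself through d=1, so the model below is fitted on the original series)
diff_series = series.diff().dropna()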

# Step 5: Fit ARIMA model (p=2, d=1, q=2 as a generic choice; can be tuned)
model = ARIMA(series, order=(2, 1, 2))
model_fit = model.fit()

# Step 6: Summary of model
print(model_fit.summary())

# Step 7: Forecasting (next 12 months; this ARIMA class returns forecasts on the
# original scale and does not take the legacy typ='levels' argument)
forecast = model_fit.forecast(steps=12)
print("Forecasted values:\n", forecast)

# Step 8: Plot forecast
plt.figure(figsize=(10, 4))
plt.plot(series, label="Original")
plt.plot(forecast, label="Forecast", linestyle='--')
plt.title("ARIMA Forecast")
plt.xlabel("Time")
plt.ylabel("Passengers")
plt.legend()
plt.grid()
plt.show()


Python program to test the significance of lag coefficients in a time series model (AR model) using the statsmodels library

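In the output below, a lag coefficient with p < 0.05 is treated as statistically significant at the 5% level.

Significant lags are the ones that carry predictive information about the current value.

Lags with p >= 0.05 add little to the autoregressive fit and are candidates for removal.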

🛠 Use Cases:
Financial Modeling: Stock return prediction using past values

Energy Demand Forecasting: Load forecasting based on previous time steps

Sales Forecasting: Using lagged sales data to forecast future values

Weather Prediction: Forecasting temperature or rainfall using past data

Signal Processing: Modeling autoregressive noise



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.ar_model import AutoReg
import statsmodels.api as sm

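# Load dataset (AirPassengers, monthly totals)
# statsmodels.datasets does not ship an airpassengers module, so read the CSV instead
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'
data = pd.read_csv(url, parse_dates=['Month'], index_col='Month')
ts = data['Passengers'].asfreq('MS')  # set an explicit monthly frequency for AutoReg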

# Plot the time series
ts.plot(title="Monthly Air Passengers")
plt.ylabel("Passengers")
plt.show()

# Fit AR model with selected lags
max_lag = 12
model = AutoReg(ts, lags=max_lag)  # old_names is deprecated; the default already uses the new names
result = model.fit()

# Summary of model coefficients
print(result.summary())

# Extract p-values of coefficients
print("\nLag Coefficient Significance:")
for lag, pval in zip(result.params.index, result.pvalues):
    significance = "Significant" if pval < 0.05 else "Not Significant"
    print(f"{lag}: p-value = {pval:.4f} → {significance}")


Python program that demonstrates how to apply an ARIMA/ARMA model to time series forecasting, including model fitting, evaluation, and visualization using a real-world dataset.

Use Cases of ARIMA/ARMA in Real World
Stock Price Prediction
Forecasting stock closing prices to support investment strategies.

Energy Consumption Forecasting
Predicting power usage in smart grid systems or buildings.

Weather Prediction
Estimating future temperature or rainfall based on historical data.

Retail Sales Forecasting
Helping businesses manage inventory by forecasting future sales.

Economic Indicators
Forecasting GDP, inflation, or unemployment trends for policy decisions.

In [None]:
%pip install pandas matplotlib statsmodels scikit-learn




In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings("ignore")

# Load dataset
# Using AirPassengers dataset (Monthly Airline Passengers from 1949 to 1960)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
data = pd.read_csv(url, parse_dates=['Month'], index_col='Month')

# Plot raw data
data.plot(title='Monthly Airline Passengers', figsize=(10, 4))
plt.ylabel('Passengers')
plt.grid()
plt.show()

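# Train-test split (use .loc for label slicing; copy the test set so a column can be added safely)
train = data.loc[:'1958', 'Passengers']
test = data.loc['1959':].copy()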

# Fit ARIMA model: order=(p,d,q)
# Order (2,1,2) is used here as a starting point; tune it with ACF/PACF plots or information criteria
model = ARIMA(train, order=(2, 1, 2))
model_fit = model.fit()

# Summary of the model
print(model_fit.summary())

# Forecast on test data
forecast = model_fit.forecast(steps=len(test))
test['Forecast'] = forecast.values

# Plot actual vs predicted
plt.figure(figsize=(10, 5))
plt.plot(train.index, train.values, label='Train')
plt.plot(test.index, test['Passengers'], label='Actual')
plt.plot(test.index, test['Forecast'], label='Forecast')
plt.legend()
plt.title("ARIMA Forecast vs Actual")
plt.grid()
plt.show()

# Evaluation metrics
mse = mean_squared_error(test['Passengers'], test['Forecast'])
mae = mean_absolute_error(test['Passengers'], test['Forecast'])
rmse = np.sqrt(mse)

print("Model Evaluation Metrics:")
print("Mean Absolute Error (MAE):", round(mae, 2))
print("Mean Squared Error (MSE):", round(mse, 2))
print("Root Mean Squared Error (RMSE):", round(rmse, 2))


Python program to examine the feasibility and implementation of the VAR (Vector Autoregression) model in time series analysis using multivariate time series data

**Explanation**
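Three quarterly macroeconomic series are loaded from the statsmodels macrodata dataset: real GDP (realgdp), real consumption (realcons), and real investment (realinv).

The Augmented Dickey-Fuller (ADF) test checks each series for stationarity.

The series are first-differenced, and the test is repeated on the differences.

The VAR model is fitted on the training data using the lag order selected by AIC.

The last 10 observations are held out, forecast, and compared with the actual values using plots and RMSE.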

📌 **Use Cases of VAR Model**
Macroeconomic Forecasting: Forecast GDP, inflation, and interest rates jointly.

Policy Impact Analysis: Evaluate the influence of monetary policy on different economic indicators.

Energy Market Analysis: Examine interactions between oil prices, gas demand, and stock indices.

Finance: Capture interdependence among stock returns, volatility, and exchange rates.

Marketing Analytics: Explore mutual effects of product pricing and sales across categories.



In [None]:
%pip install statsmodels pandas matplotlib


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller
from statsmodels.tools.eval_measures import rmse

# Load dataset (using macroeconomic indicators dataset)
from statsmodels.datasets import macrodata
data = macrodata.load_pandas().data

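# Build a quarterly datetime index (the data starts in 1959 Q1)
index = pd.date_range(start='1959-01-01', periods=len(data), freq='QS')
# Select the columns first, then attach the index; passing index= to pd.DataFrame
# would reindex the existing frame and fill every row with NaN
df = data[['realgdp', 'realcons', 'realinv']].copy()
df.index = index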

# Plot raw data
df.plot(title='Macroeconomic Time Series')
plt.xlabel("Year")
plt.ylabel("Values")
plt.grid()
plt.show()

# Stationarity check using Augmented Dickey-Fuller Test
def adf_test(series):
    result = adfuller(series)
    print(f"ADF Statistic for {series.name}: {result[0]}")
    print(f"p-value: {result[1]}")
    return result[1] > 0.05  # True if non-stationary

non_stationary = [col for col in df.columns if adf_test(df[col])]

# Differencing to achieve stationarity
df_diff = df.diff().dropna()

# Confirm stationarity
print("\nAfter differencing:")
for col in df_diff.columns:
    adf_test(df_diff[col])

# Splitting into train and test sets
n_obs = 10
train, test = df_diff[:-n_obs], df_diff[-n_obs:]

# Fit VAR model
model = VAR(train)
lag_order = model.select_order(maxlags=15)
print("\nSelected Lags by AIC:", lag_order.aic)

model_fitted = model.fit(lag_order.aic)

# Summary
print(model_fitted.summary())

# Forecast
forecast_input = train.values[-model_fitted.k_ar:]
forecast = model_fitted.forecast(y=forecast_input, steps=n_obs)

# Convert forecast to DataFrame
forecast_df = pd.DataFrame(forecast, index=test.index, columns=test.columns)

# Plot forecasts vs actual
for col in df.columns:
    plt.figure(figsize=(8, 4))
    plt.plot(df_diff[col][-n_obs:], label='Actual')
    plt.plot(forecast_df[col], label='Forecast', linestyle='--')
    plt.title(f"{col} Forecast vs Actual")
    plt.legend()
    plt.grid()
    plt.show()

# Calculate RMSE
for col in df.columns:
    error = rmse(test[col], forecast_df[col])
    print(f"RMSE for {col}: {error:.4f}")


Python program that applies the ARCH (Autoregressive Conditional Heteroskedasticity) model in time series analysis using the arch package.
**Explanation**:
A synthetic return series is simulated with ARCH effects.

arch_model(..., vol='ARCH', p=1) fits an ARCH(1) model.

The model estimates time-varying volatility.

The summary shows key metrics like AIC, BIC, and estimated coefficients.

A plot displays the conditional volatility, which changes over time.
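
The ARCH(1) recursion itself is short enough to write out. In the sketch below the conditional variance is sigma2_t = omega + alpha * r_{t-1}^2, with assumed parameters omega = 0.2 and alpha = 0.3; the sample variance of a long simulation should be close to the unconditional variance omega / (1 - alpha).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
omega, alpha = 0.2, 0.3  # assumed ARCH(1) parameters (alpha < 1 for a finite variance)

returns = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = omega / (1 - alpha)  # start at the unconditional variance
returns[0] = np.sqrt(sigma2[0]) * rng.normal()
for t in range(1, n):
    sigma2[t] = omega + alpha * returns[t - 1] ** 2  # conditional variance given yesterday's return
    returns[t] = np.sqrt(sigma2[t]) * rng.normal()

print("Sample variance:       ", returns.var())
print("Unconditional variance:", omega / (1 - alpha))
```

A large return on one day raises the next day's variance, which is what produces volatility clustering.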

**📊 Use Cases:**
Financial return modeling

Risk estimation (e.g., Value at Risk)

Detecting volatility clustering in assets

In [None]:
%pip install arch


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from arch import arch_model

# Simulate time series data (daily returns) from an ARCH(1) process:
#   sigma2_t = omega + alpha * r_{t-1}^2,   r_t = sqrt(sigma2_t) * eps_t
np.random.seed(42)
n = 1000
eps = np.random.normal(0, 1, n)
omega = 1e-4
alpha = 0.1
returns = np.zeros(n)
returns[0] = np.sqrt(omega / (1 - alpha)) * eps[0]  # start at the unconditional std
for i in range(1, n):
    var = omega + alpha * returns[i-1]**2
    returns[i] = np.sqrt(var) * eps[i]

returns = pd.Series(returns)

# Plot the returns
plt.figure(figsize=(10, 4))
plt.plot(returns)
plt.title('Simulated Financial Returns')
plt.xlabel('Time')
plt.ylabel('Return')
plt.grid()
plt.show()

# Fit ARCH(1) model
model = arch_model(returns, vol='ARCH', p=1)
results = model.fit(disp='off')

# Print model summary
print(results.summary())

# Plot conditional volatility
plt.figure(figsize=(10, 4))
plt.plot(results.conditional_volatility)
plt.title('Estimated Conditional Volatility (ARCH Model)')
plt.xlabel('Time')
plt.ylabel('Volatility')
plt.grid()
plt.show()
