# Appendix to Main Sections 4 and 5

## Selecting GARCH(1,1) and GJR-GARCH(1,1) Models for our analysis

### Selecting GARCH models:

* In academic practice it is uncommon to estimate a GARCH model using a train/validation/test split. Instead, the model is usually estimated on the full in-sample data (train + validation), because the GARCH parameters are determined optimally through maximum likelihood.
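* The GARCH model has economically interpretable parameters: $\omega$ the constant term, $\alpha$ the ARCH term and $\beta$ the GARCH term. For GARCH(1,1), $\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$, and the long-run variance is $\omega / (1 - \alpha - \beta)$. We find the parameter values which maximise the likelihood of seeing our data:
    * The estimated GARCH parameters are those under which our historical data are most probable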
* Thus, using QLIKE or MSE to select the best-performing model would risk overfitting: models with greater complexity tend to fit the data better, and QLIKE and MSE do not penalise this complexity.
* On the other hand, the AIC (Akaike Information Criterion, $\mathrm{AIC} = 2k - 2\ln(\hat{L})$) and the BIC (Bayesian Information Criterion, $\mathrm{BIC} = k \ln(n) - 2\ln(\hat{L})$), where $k$ is the number of estimated parameters, $n$ the number of observations and $\hat{L}$ the maximised likelihood, both penalise complexity and are therefore more appropriate.
* Indeed, as per pp. 41-43 of [Ruey S. Tsay - "Analysis of Financial Time Series" (2nd Edition, 2005)](https://pzs.dstu.dp.ua/DataMining/times/bibl/Tsay.pdf), AIC and BIC are commonly used measures to select the lags in autoregressive processes. We therefore use both measures.
* More details on information criteria for model selection: [MathWorks Information Criteria for Model Selection Documentation](https://www.mathworks.com/help/econ/information-criteria.html)
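
As a minimal sketch of how the two criteria trade fit against complexity, the snippet below evaluates the AIC and BIC formulas above for two models. The helper name `information_criteria` and all numerical inputs are hypothetical and chosen only for illustration; they are not taken from the fitted models.

```python
import math

def information_criteria(log_likelihood, k, n):
    """Return (AIC, BIC) given the maximised log-likelihood ln(L-hat),
    the number of estimated parameters k and the number of observations n."""
    aic = 2 * k - 2 * log_likelihood
    bic = k * math.log(n) - 2 * log_likelihood
    return aic, bic

# Hypothetical inputs (illustration only): the larger model fits slightly
# better (higher log-likelihood) but uses two more parameters.
n_obs = 1750
aic_small, bic_small = information_criteria(-3180.0, k=4, n=n_obs)
aic_large, bic_large = information_criteria(-3178.5, k=6, n=n_obs)

print(f"small model: AIC={aic_small:.2f}, BIC={bic_small:.2f}")
print(f"large model: AIC={aic_large:.2f}, BIC={bic_large:.2f}")
```

With these made-up inputs the small gain in log-likelihood does not offset the penalty for the two extra parameters, so both criteria prefer the smaller model. Because $\ln(n) > 2$ for any realistic sample size, the BIC penalises each extra parameter more heavily than the AIC.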

In [None]:
import arch
import numpy as np
import pandas as pd
import datetime
import yfinance as yf

In [2]:
Ticker = "MSFT"
# date format is "YYYY-MM-DD"
ins_start = "2015-01-01"
ins_end = "2019-12-31"
validation_start = "2020-01-01"
validation_end = "2021-12-31"
oos_start = '2022-01-01'
oos_end = '2025-11-01'

In [3]:
# Define a function to convert price data to continuously compounded daily returns

def ccr(Ticker, Start, End):
    """
    Gets stock price data from Yahoo Finance and outputs continuously compounded daily returns.

    Parameters
    ----------
    Ticker : str
        The ticker symbol of the stock to analyze (e.g., 'AZN.L' for AstraZeneca plc).
    Start : str
        The start date for fetching data in 'YYYY-MM-DD' format.
    End : str
        The end date for fetching data in 'YYYY-MM-DD' format.
    
    Returns
    -------
    StockReturns : pandas.DataFrame
        Continuously compounded daily returns in a single column named 'Adj Close'.

    Notes
    -----
    Returns are computed as r_t = ln(P_t / P_{t-1}) from auto-adjusted closing prices.

    Examples
    --------
    >>> ccr("AZN.L", "2020-01-01", "2023-12-31")
    """
    # Convert Start and End to datetime; Start is moved back by one calendar day
    Start = pd.to_datetime(Start) - pd.Timedelta(days=1)
    End = pd.to_datetime(End)

    # Fetch adjusted closing prices from Yahoo Finance (yfinance treats `end` as exclusive)
    Data = yf.download(Ticker, start=Start, end=End, auto_adjust=True, progress=False)['Close']
    
    # Calculate daily returns
    StockReturns = np.log(Data / Data.shift(1)).dropna()
    
    StockReturns = pd.DataFrame(StockReturns)
    
    StockReturns = StockReturns.set_axis(['Adj Close'], axis = 1)
        
    return StockReturns
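
The return calculation inside `ccr` can be checked without a network connection. The sketch below applies the same `np.log(P_t / P_{t-1})` step to a short, made-up price series (the prices and dates are hypothetical, not market data) and illustrates that log returns are additive over time.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices (made up for illustration, not market data)
prices = pd.Series(
    [100.0, 101.0, 99.5, 102.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
    name="Close",
)

# r_t = ln(P_t / P_{t-1}); the first observation has no predecessor and is dropped
log_returns = np.log(prices / prices.shift(1)).dropna()

# Log returns are additive: their sum equals the log return over the whole window
total = np.log(prices.iloc[-1] / prices.iloc[0])

print(log_returns)
print(f"Sum of daily log returns: {log_returns.sum():.6f}")
print(f"Log return over window:   {total:.6f}")
```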

In [4]:
from arch import arch_model

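def garch_arch_library(returns, p=1, q=1, o=0, dist='normal'):
    """
    Estimate a GARCH (o = 0) or GJR-GARCH (o > 0) model using the arch library.

    In arch, p is the lag order of the squared innovations (ARCH terms), o the lag
    order of the asymmetric terms and q the lag order of the lagged conditional
    variances (GARCH terms).
    """
    # Set up model; o=0 is arch_model's default, so a single call covers both cases
    model = arch_model(returns, vol='Garch', p=p, o=o, q=q, dist=dist)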
    
    # Fit
    fitted = model.fit(disp='off')
    
    # Extract parameters
    params = fitted.params
    
    # Print
    print(f"\n{'='*60}")
    print(f"ARCH Library: {'GJR-' if o>0 else ''}GARCH({p},{q}) with {dist}")
    print(f"{'='*60}")
    print(f"μ:  {params['mu']:.6f}")
    print(f"ω:  {params['omega']:.8f}")
    
    # Handle multiple alpha/beta for higher order models
    alpha_keys = [k for k in params.index if k.startswith('alpha')]
    beta_keys = [k for k in params.index if k.startswith('beta')]
    
    for key in alpha_keys:
        print(f"{key}:  {params[key]:.6f}")
    for key in beta_keys:
        print(f"{key}:  {params[key]:.6f}")
    
    if o > 0:
        gamma_keys = [k for k in params.index if k.startswith('gamma')]
        for key in gamma_keys:
            print(f"{key}:  {params[key]:.6f}")
    
    if dist == 't':
        print(f"ν:  {params['nu']:.2f}")
    
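    # Calculate persistence as the sum of the alpha and beta coefficients.
    # Note: for GJR models (o > 0) the usual persistence measure also adds gamma/2
    # (under a symmetric error distribution). That term is not included here, so the
    # printed persistence and 'long_run_var' are only exact for the symmetric GARCH case.
    persistence = sum(params[k] for k in alpha_keys) + sum(params[k] for k in beta_keys)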
    print(f"\nPersistence: {persistence:.4f}")
    print(f"AIC: {fitted.aic:.2f}")
    print(f"BIC: {fitted.bic:.2f}")
    print(f"{'='*60}\n")
    
    # Return fitted model object for forecasting
    return {
        'params': params,
        'aic': fitted.aic,
        'bic': fitted.bic,
        'variance': fitted.conditional_volatility**2,
        'residuals': fitted.resid,
        'model': fitted,  # Keep the fitted model for forecasting
        'long_run_var': fitted.params['omega'] / (1 - persistence)
    }
    
lags1 = []  # p (ARCH lags: lagged squared innovations)
lags2 = []  # q (GARCH lags: lagged conditional variances)
lags3 = []  # o (asymmetry lags; o = 0 gives a standard GARCH)

for p in range(1, 6):
    for q in range(1, 6):
        for o in range(0, 6):  # o = 0: standard GARCH, o >= 1: GJR-GARCH
            lags1.append(p)
            lags2.append(q)
            lags3.append(o)


# Load returns once over the full in-sample period (train + validation), rescaled × 100 to percent
returns = ccr(Ticker, ins_start, validation_end) * 100

results = []

for idx, (p_val, q_val, o_val) in enumerate(zip(lags1, lags2, lags3), 1):
    try:
        model_name = f"GARCH({p_val},{q_val})" if o_val == 0 else f"GJR-GARCH({p_val},{o_val},{q_val})"
        
        print(f"\n{'#'*70}")
        print(f"MODEL {idx}/{len(lags1)}: {model_name}")
        print(f"{'#'*70}")
        
        # arch library: returns are passed in percent (already scaled by 100 above)
        arch_res = garch_arch_library(returns, p=p_val, q=q_val, o=o_val, dist='normal')
        
        # Store comparison
        results.append({
            'model': model_name,
            'p': p_val, 'q': q_val, 'o': o_val,
            'arch_aic': arch_res['aic'],
            'arch_bic': arch_res['bic']
        })
        
        print(f"\nError Metrics:")
        print(f" Arch AIC: {arch_res['aic']:.2f}")
        print(f" Arch BIC: {arch_res['bic']:.2f}")
        
    except Exception as e:
        print(f"\nERROR for {model_name}: {e}")
        print(f"Skipping this combination...\n")
        continue

# Summary
print("\n" + "="*80)
print("SUMMARY TABLE")
print("="*80)
if results:
    summary = pd.DataFrame(results)
    print(summary.to_string(index=False))
    
    best_arch = summary.loc[summary['arch_aic'].idxmin()]
    
    best_arch_bic = summary.loc[summary['arch_bic'].idxmin()]
    
    best_2_arch = summary.loc[summary['arch_aic'].nsmallest(2).index[-1]]
    
    best_2_arch_bic = summary.loc[summary['arch_bic'].nsmallest(2).index[-1]]
    
    print(f"Best model (arch):      {best_arch['model']} (AIC={best_arch['arch_aic']:.2f})")
    print(f"Best model (arch):      {best_arch_bic['model']} (BIC={best_arch_bic['arch_bic']:.2f})")
    
    print(f"2nd Best model (arch):      {best_2_arch['model']} (AIC={best_2_arch['arch_aic']:.2f})")
    print(f"2nd Best model (arch):      {best_2_arch_bic['model']} (BIC={best_2_arch_bic['arch_bic']:.2f})")
else:
    print("No successful estimations!")


######################################################################
MODEL 1/12: GARCH(1,1)
######################################################################

ARCH Library: GARCH(1,1) with normal
μ:  0.139255
ω:  0.26398567
alpha[1]:  0.212369
beta[1]:  0.702124

Persistence: 0.9145
AIC: 6366.19
BIC: 6388.08


Error Metrics:
 Arch AIC: 6366.19
 Arch BIC: 6388.08

######################################################################
MODEL 2/12: GJR-GARCH(1,1,1)
######################################################################

ARCH Library: GJR-GARCH(1,1) with normal
μ:  0.111237
ω:  0.27228562
alpha[1]:  0.119266
beta[1]:  0.710911
gamma[1]:  0.154020

Persistence: 0.8302
AIC: 6358.02
BIC: 6385.39


Error Metrics:
 Arch AIC: 6358.02
 Arch BIC: 6385.39

######################################################################
MODEL 3/12: GJR-GARCH(1,2,1)
######################################################################

ARCH Library: GJR-GARCH(1,1) with normal
μ:  0.1117

Observations: We select the two best-performing GARCH models according to the $\mathrm{BIC}$, which are:
- GJR-GARCH(1,1,1)
- GARCH(1,1)
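
As a supplementary check on the two selected models, the sketch below recomputes persistence and the implied long-run variance from the parameter estimates printed in the output above. The persistence printed for GJR-GARCH(1,1,1) sums only $\alpha$ and $\beta$; the version here also adds $\gamma/2$, which is the usual measure for GJR models under the assumption of a symmetric error distribution. The helper names `persistence` and `long_run_variance` are introduced here for illustration only.

```python
def persistence(alpha, beta, gamma=0.0):
    """alpha + beta + gamma/2; the gamma/2 term assumes symmetric innovations."""
    return alpha + beta + 0.5 * gamma

def long_run_variance(omega, alpha, beta, gamma=0.0):
    """Unconditional variance omega / (1 - persistence); requires persistence < 1."""
    return omega / (1.0 - persistence(alpha, beta, gamma))

# Parameter estimates copied from the printed output above (returns in percent)
garch = dict(omega=0.26398567, alpha=0.212369, beta=0.702124)
gjr = dict(omega=0.27228562, alpha=0.119266, beta=0.710911, gamma=0.154020)

p_garch = persistence(garch["alpha"], garch["beta"])
p_gjr = persistence(gjr["alpha"], gjr["beta"], gjr["gamma"])

print(f"GARCH(1,1) persistence:       {p_garch:.4f}")
print(f"GJR-GARCH(1,1,1) persistence: {p_gjr:.4f}")
print(f"GARCH(1,1) long-run variance:       {long_run_variance(**garch):.4f}")
print(f"GJR-GARCH(1,1,1) long-run variance: {long_run_variance(**gjr):.4f}")
```

Because the returns were scaled by 100 before estimation, the long-run variances are expressed in squared percent per day.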