# 3.1 Background

Numerous empirical studies have examined the impact of major global crises on stock market volatility. Analyses of S&P 500 data during the 2008 global financial crisis show that this period was marked by historically high volatility, particularly among financial sector stocks. However, the market did not expect this elevated volatility to persist for long, and indeed it did not (Schwert, 2011). Similarly, studies of the COVID-19 pandemic documented unprecedented increases in conditional volatility. These effects were asymmetric: for instance, the negative impact of deaths was more pronounced than the positive effect of recoveries (Basuony et al., 2022). Such findings highlight the relevance of analyzing volatility persistence.

Financial volatility, defined as the conditional standard deviation of an underlying asset's returns (Tsay, 2010, p. 97), has long been a central topic in economic and financial research. Traditional time series models such as ARMA (autoregressive moving average) and ARIMA (autoregressive integrated moving average), popularized by Box and Jenkins (1970), are frequently used to model and forecast financial data. These models capture linear dependence in the mean of a time series but assume homoskedasticity, i.e., that the variance of the error term is constant. This assumption prevents them from modeling volatility clustering, the tendency of large price changes to be followed by large changes and small changes by small changes, which is a common feature of financial markets. This limitation motivated the development of models designed specifically to capture time-varying volatility.

Recognizing the limitations of the homoskedasticity assumption, Engle (1982) introduced the Autoregressive Conditional Heteroskedasticity (ARCH) model. It allows the conditional variance of a time series to change over time by modeling it as a function of past squared error terms. Engle's work marked a turning point in financial econometrics by establishing the importance of capturing time-varying volatility in financial data.

Bollerslev (1986) built on Engle's ideas by developing the generalized ARCH (GARCH) model, which adds lagged conditional variances to the variance equation, giving the variance itself an autoregressive structure. This extension captures long-lasting volatility effects with far fewer parameters than a high-order ARCH model and has proven a robust approach for modeling financial time series. Its ability to handle time-dependent changes in variance has made GARCH a standard tool in empirical financial analysis, especially in risk management and portfolio construction.
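To make the variance recursion concrete: in a GARCH(1, 1) model, $\varepsilon_t = \sigma_t z_t$ with $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$, and the unconditional variance is $\omega / (1 - \alpha - \beta)$ when $\alpha + \beta < 1$. The sketch below simulates this process with illustrative parameter values of our own choosing (they are not estimates from our data) to show the volatility clustering that motivates GARCH: the returns are nearly uncorrelated, while their squares are clearly positively autocorrelated.

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate eps_t = sigma_t * z_t, z_t ~ N(0, 1), with
    sigma2_t = omega + alpha * eps_{t-1}**2 + beta * sigma2_{t-1}."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.empty(n)
    sigma2 = np.empty(n)
    # Start the recursion at the unconditional variance omega / (1 - alpha - beta)
    sigma2[0] = omega / (1.0 - alpha - beta)
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2

def acf_lag1(x):
    """Lag-1 sample autocorrelation."""
    x = x - x.mean()
    return np.dot(x[1:], x[:-1]) / np.dot(x, x)

# Illustrative (hypothetical) parameters with persistence alpha + beta = 0.95
eps, sigma2 = simulate_garch11(n=20_000, omega=0.05, alpha=0.10, beta=0.85)
print("sample variance:", eps.var())                         # should be close to 0.05 / 0.05 = 1.0
print("lag-1 acf of returns:", acf_lag1(eps))                 # should be close to 0
print("lag-1 acf of squared returns:", acf_lag1(eps ** 2))    # clearly positive
```

With $\alpha + \beta = 0.95$, a volatility shock decays slowly, which is the kind of persistence the $\beta \sigma_{t-1}^2$ term is designed to capture.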

An important aspect of this research is understanding how global economic events affect market volatility. Structural breaks are sudden changes in the underlying behavior of financial data that often occur during events such as financial crises or pandemics. The Chow test can be used to test for a break at a specific, known date, allowing researchers to distinguish between the regimes before and after that date. If the break dates are not known in advance, the Quandt likelihood ratio (QLR) test, which computes the Chow statistic over a range of candidate break dates and takes the maximum, is more suitable (Brooks, 2019, p. 309). Identifying structural breaks is therefore essential both for distinguishing between volatility regimes and for improving model accuracy.
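The idea behind the QLR test can be sketched in a few lines: compute the Chow F statistic at every candidate break in the central part of the sample and take the maximum. The example below is an illustration only, not part of our analysis. It uses a hypothetical simulated series with a mean shift and a constant-only regression, and trims 15% at each end, a common choice that we assume here. Because the maximum of many F statistics is not F-distributed, the QLR statistic must be compared with its own tabulated critical values rather than with the F distribution.

```python
import numpy as np

def chow_f(y, X, split):
    """Chow F statistic for a break between observations split - 1 and split in y = X b + e (OLS)."""
    def ssr(yy, XX):
        coef, *_ = np.linalg.lstsq(XX, yy, rcond=None)
        resid = yy - XX @ coef
        return resid @ resid

    n, k = X.shape
    ssr_p = ssr(y, X)
    ssr_1 = ssr(y[:split], X[:split])
    ssr_2 = ssr(y[split:], X[split:])
    return ((ssr_p - (ssr_1 + ssr_2)) / k) / ((ssr_1 + ssr_2) / (n - 2 * k))

def qlr(y, X, trim=0.15):
    """QLR statistic: the largest Chow F over candidate breaks, excluding the first and last `trim` share of the sample."""
    n = len(y)
    candidates = range(int(trim * n), int((1 - trim) * n))
    f_stats = np.array([chow_f(y, X, s) for s in candidates])
    best = int(np.argmax(f_stats))
    return f_stats[best], candidates[best]

# Hypothetical series: the mean shifts from 0 to 1 after observation 300
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(1.0, 1.0, 200)])
X = np.ones((len(y), 1))  # constant-only regression, so a break is a change in the mean

sup_f, break_idx = qlr(y, X)
print(f"QLR (sup-F) = {sup_f:.1f}, estimated break at observation {break_idx}")
```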

Moreover, assessing the stationarity of the time series is critical. A time series is strictly stationary if its distribution does not change over time, that is, if the joint distribution of any set of its values is unchanged by a shift in time (Brooks, 2019, p. 332). Various statistical tests of stationarity have been developed, the Dickey-Fuller test being one of the most widely used. Its null hypothesis is that the series contains a unit root, which implies non-stationarity, so rejecting the null supports stationarity. However, this standard unit-root test does not perform well if the series contains several structural breaks (Brooks, 2019, p. 453).

# 3.2 Method

Our general procedure consists of fitting GARCH models to each event window and testing whether their parameters differ before and after the event. We start with an exploratory analysis of the log return series, testing for stationarity, changes in variance, ARCH effects and structural breaks in the mean, to understand its behavior.

The procedure is described below, together with the code and explanations.

In [1]:
import yfinance as yf
import numpy as np
from statsmodels.tsa.stattools import adfuller
from scipy.stats import f
import pandas as pd
import itertools
from statsmodels.tsa.arima.model import ARIMA
import warnings
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_arch
from arch import arch_model
from scipy.stats import chi2
warnings.filterwarnings("ignore")


In the cell below, we download the full daily price history of the S&P 500 index (Yahoo Finance ticker `^GSPC`) and convert the closing prices into daily log returns, $r_t = \ln(P_t / P_{t-1})$. This log return series is the foundation for all further work. We also define the events and their time windows in `event_windows`: each tuple contains the event name, the start of the window, the event date (used as the break date) and the end of the window. For the flash crash, the COVID-19 crash and the Russian invasion, the event date lies at the midpoint of a four-year window.

In [12]:
df = yf.Ticker("^GSPC").history(period="max")
df['daily_log_returns'] = np.log(df['Close'] / df['Close'].shift(1))
df = df.dropna()

log_returns = df['daily_log_returns']


event_windows = [
    ("dot_com_bubble",   "1997-01-01", "2000-03-10", "2003-01-01"),
    ("financial_crisis", "2006-01-01", "2008-09-15", "2011-01-01"),
    ("flash_crash",      "2008-05-06", "2010-05-06", "2012-05-06"),
    ("covid_crash",      "2018-03-11", "2020-03-11", "2022-03-11"),
    ("russia_invasion",  "2020-02-24", "2022-02-24", "2024-02-24"),
]

structured_periods = []
for name, start, event_date, end in event_windows:
    period_data = log_returns.loc[start:end]
    structured_periods.append({
        "event": name,
        "start": start,
        "event_date": event_date,
        "end": end,
        "returns": period_data
    })

for p in structured_periods:
    print(f"{p['event']} → {p['start']} to {p['end']} (event at {p['event_date']})")
    print(f"  Observations: {len(p['returns'])}\n")

dot_com_bubble → 1997-01-01 to 2003-01-01 (event at 2000-03-10)
  Observations: 1509

financial_crisis → 2006-01-01 to 2011-01-01 (event at 2008-09-15)
  Observations: 1259

flash_crash → 2008-05-06 to 2012-05-06 (event at 2010-05-06)
  Observations: 1009

covid_crash → 2018-03-11 to 2022-03-11 (event at 2020-03-11)
  Observations: 1009

russia_invasion → 2020-02-24 to 2024-02-24 (event at 2022-02-24)
  Observations: 1008



### Stationarity testing

To justify the use of time series models such as ARMA, we first check whether the return series is stationary. For this, we apply the Augmented Dickey-Fuller (ADF) test to the full sample of daily log returns.

In [13]:
from statsmodels.tsa.stattools import adfuller

result = adfuller(log_returns)  

print("ADF Statistic:", result[0])
print("p-value:", result[1])
print("Used lags:", result[2])
print("Number of observations:", result[3])
print("Critical values:")
for key, value in result[4].items():
    print(f"   {key}: {value}")


ADF Statistic: -22.505443248370327
p-value: 0.0
Used lags: 48
Number of observations: 24377
Critical values:
   1%: -3.4306182852169496
   5%: -2.8616585738095623
   10%: -2.566833113395068


The p-value is numerically zero, and the ADF statistic (-22.5) lies far below the 1% critical value (-3.43). We therefore reject the null hypothesis that the return series has a unit root and regard the return series as stationary.

### Overall level of variance

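To get an idea of how the overall level of variance changes around each event, we perform a two-sided F-test comparing the sample variance before the event with the sample variance after it, for every event window. The statistic is $F = s^2_{\text{before}} / s^2_{\text{after}}$, so $F < 1$ indicates higher variance after the event and $F > 1$ lower variance. The test results below suggest a significant change in variance for every event. Note that the F-test assumes independent, normally distributed observations, which daily returns do not satisfy, so these results should be read as indicative.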

In [3]:
def f_test_variance(x1, x2):
    s1_sq = np.var(x1, ddof=1)
    s2_sq = np.var(x2, ddof=1)
    F_stat = s1_sq / s2_sq
    df1 = len(x1) - 1
    df2 = len(x2) - 1
    # Two-sided p-value
    p_value = 2 * min(f.cdf(F_stat, df1, df2), 1 - f.cdf(F_stat, df1, df2))
    return s1_sq, s2_sq, F_stat, p_value

f_test_results = []
for name, start, event_date, end in event_windows:
    data_full = log_returns.loc[start:end]

    before = data_full.loc[start:event_date].iloc[:-1]  
    after  = data_full.loc[event_date:]

    s1, s2, F_stat, p_val = f_test_variance(before, after)
    f_test_results.append({
        "event": name,
        "start": start,
        "event_date": event_date,
        "end": end,
        "var_before": s1,
        "var_after": s2,
        "F_stat": F_stat,
        "p_value": p_val
    })

f_test_df = pd.DataFrame(f_test_results)
print("=== F-test variance results ===")
print(f_test_df)
print()

=== F-test variance results ===
              event       start  event_date         end  var_before  \
0    dot_com_bubble  1997-01-01  2000-03-10  2003-01-01    0.000146   
1  financial_crisis  2006-01-01  2008-09-15  2011-01-01    0.000101   
2       flash_crash  2008-05-06  2010-05-06  2012-05-06    0.000464   
3       covid_crash  2018-03-11  2020-03-11  2022-03-11    0.000117   
4   russia_invasion  2020-02-24  2022-02-24  2024-02-24    0.000280   

   var_after    F_stat       p_value  
0   0.000215  0.677315  9.206945e-08  
1   0.000419  0.241116  9.501963e-68  
2   0.000166  2.797905  2.220446e-16  
3   0.000249  0.468542  4.473343e-17  
4   0.000143  1.959486  8.215650e-14  
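As a sanity check on how the F statistic should be read, the sketch below applies the same two-sided F-test to hypothetical simulated returns whose standard deviation doubles after the "event". Since $F = s^2_{\text{before}} / s^2_{\text{after}}$, the statistic should come out near 0.25, i.e. below 1, like the dot-com, financial crisis and COVID-19 windows in the table above. Because the classical F-test is sensitive to non-normality, the sketch also runs the Brown-Forsythe test (`scipy.stats.levene` with `center="median"`) as a more robust alternative; this second test is our illustrative addition and was not used in the analysis.

```python
import numpy as np
from scipy.stats import f, levene

def f_test_variance(x1, x2):
    """Two-sided F-test of equal variances; F = var(x1) / var(x2)."""
    s1_sq, s2_sq = np.var(x1, ddof=1), np.var(x2, ddof=1)
    F_stat = s1_sq / s2_sq
    df1, df2 = len(x1) - 1, len(x2) - 1
    # f.sf(x) equals 1 - f.cdf(x) but keeps precision for very small tail probabilities
    p_value = 2 * min(f.cdf(F_stat, df1, df2), f.sf(F_stat, df1, df2))
    return F_stat, p_value

rng = np.random.default_rng(7)
before = rng.normal(0.0, 0.01, 1000)  # hypothetical pre-event returns, sd 1%
after = rng.normal(0.0, 0.02, 1000)   # hypothetical post-event returns, sd 2%

F_stat, p_f = f_test_variance(before, after)
# Brown-Forsythe variant of Levene's test (absolute deviations from the median)
bf_stat, p_bf = levene(before, after, center="median")
print(f"F = {F_stat:.3f} (p = {p_f:.2e})")
print(f"Brown-Forsythe W = {bf_stat:.1f} (p = {p_bf:.2e})")
```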



### ARMA, ARCH-LM (Engle)

Given that the return series is stationary, we want to build GARCH models to investigate its conditional heteroskedasticity. To do this, we first remove linear dependence from the conditional mean: we fit an ARMA model to the data and compute its residuals. Specifically, we use the `ARIMA` class from statsmodels with differencing order $d = 0$, so that it behaves as a regular ARMA model. No differencing is needed, as the log return series is already stationary.

For each event window, we fit ARMA(p, q) models for every combination of $p, q \in \{0, 1, 2, 3\}$ and select the order with the lowest AIC. This is repeated for all the events defined above.

After fitting the models, we investigate the presence of ARCH effects. ARCH effects mean that the magnitude of the residuals (for example, their squares) is correlated with the magnitude of earlier residuals, even if the residuals themselves are uncorrelated. To test for ARCH effects in each event window, we perform an ARCH-LM test, also known as Engle's ARCH test.

In [4]:

p_values = range(4)
q_values = range(4)

arch_lm_results = []
for name, start, event_date, end in event_windows:
    data = log_returns.loc[start:end]

    best_aic = np.inf
    best_order = None
    best_model = None

    for p, q in itertools.product(p_values, q_values):
        try:
            model = ARIMA(data, order=(p, 0, q)).fit()
            if model.aic < best_aic:
                best_aic = model.aic
                best_order = (p, q)
                best_model = model
        except:
            continue

    if best_model is None:
        arch_lm_results.append({
            "event": name,
            "p": None,
            "q": None,
            "LM_stat": None,
            "p_value": None,
            "ARCH_effects": "Model fitting failed"
        })
        continue

    #ARCH-LM test/Engle's test
    residuals = best_model.resid
    lm_test = het_arch(residuals)
    LM_stat = lm_test[0]
    LM_pvalue = lm_test[1]

    arch_lm_results.append({
        "event": name,
        "p": best_order[0],
        "q": best_order[1],
        "LM_stat": LM_stat,
        "p_value": LM_pvalue,
        "ARCH_effects": "Yes" if LM_pvalue < 0.05 else "No"
    })

arch_lm_df = pd.DataFrame(arch_lm_results)
print("=== ARMA best orders & ARCH-LM test ===")
print(arch_lm_df)
print()


=== ARMA best orders & ARCH-LM test ===
              event  p  q     LM_stat       p_value ARCH_effects
0    dot_com_bubble  0  0  106.888422  2.259278e-18          Yes
1  financial_crisis  3  0  398.177955  2.298807e-79          Yes
2       flash_crash  2  0  293.699580  3.335327e-57          Yes
3       covid_crash  3  2  334.970642  6.140149e-66          Yes
4   russia_invasion  0  3  305.627744  1.003831e-59          Yes
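Engle's ARCH-LM test, used above through statsmodels' `het_arch`, regresses the squared residuals on a constant and their own lags and uses $LM = nR^2$, which under the null of no ARCH effects is asymptotically $\chi^2$ with as many degrees of freedom as lags. The hand-written sketch below shows this mechanism on hypothetical simulated residuals. The choice of 5 lags is our assumption and need not match the default of `het_arch`, so the numbers are not comparable with the table above.

```python
import numpy as np
from scipy.stats import chi2

def arch_lm(resid, nlags=5):
    """Engle's ARCH-LM test. Regress e_t^2 on a constant and e_{t-1}^2, ..., e_{t-nlags}^2;
    under H0 (no ARCH effects) LM = n * R^2 is asymptotically chi2(nlags)."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[nlags:]
    # Column j holds the squared residual lagged j + 1 periods
    lagged = np.column_stack([e2[nlags - j - 1 : len(e2) - j - 1] for j in range(nlags)])
    X = np.column_stack([np.ones(len(y)), lagged])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef
    r2 = 1.0 - (u @ u) / np.sum((y - y.mean()) ** 2)
    lm = len(y) * r2
    return lm, chi2.sf(lm, nlags)

# Hypothetical residuals: a simulated GARCH(1, 1) and i.i.d. normal noise
rng = np.random.default_rng(3)
n, omega, alpha, beta = 5000, 0.05, 0.10, 0.85
z = rng.standard_normal(n)
eps = np.empty(n)
s2 = omega / (1 - alpha - beta)  # start at the unconditional variance
for t in range(n):
    eps[t] = np.sqrt(s2) * z[t]
    s2 = omega + alpha * eps[t] ** 2 + beta * s2

lm_garch, p_garch = arch_lm(eps)
lm_iid, p_iid = arch_lm(rng.standard_normal(n))
print(f"GARCH residuals:  LM = {lm_garch:.1f}, p = {p_garch:.2e}")
print(f"i.i.d. residuals: LM = {lm_iid:.1f}, p = {p_iid:.3f}")
```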



### Chow test for structural breaks


To assess the presence of structural breaks in the dynamics of daily returns around key financial events, we apply a **Chow F-test** using the ARMA orders selected in the previous step (the mean equations later used for the **GARCH** models). For each event, we estimate the selected ARMA model separately on three samples: the full event window, the pre-event period and the post-event period.

The Chow test evaluates whether fitting separate models to the pre- and post-event samples results in a statistically significant improvement in fit compared to a single model applied to the full period. The test statistic is computed as:

$$
F = \frac{ \left( SSR_{\text{full}} - (SSR_{\text{before}} + SSR_{\text{after}}) \right) / k }{ (SSR_{\text{before}} + SSR_{\text{after}}) / (n_1 + n_2 - 2k) }
$$

**Where:**
- $SSR_{\text{full}}$, $SSR_{\text{before}}$ and $SSR_{\text{after}}$ are the sums of squared residuals from the full-, pre- and post-event models, respectively.
- $k$ is the number of estimated parameters in the ARMA model (including the intercept).
- $n_1$ and $n_2$ are the numbers of observations in the pre- and post-event samples.

The **null hypothesis** is that the same ARMA process governs both sub-periods.  
A **significant F-statistic** indicates that the model parameters differ before and after the event, suggesting a structural break in the return dynamics.


In [18]:

def perform_chow_test_from_known_order(event_name, full_data, before_data, after_data, p, q):
    """Chow F-test for a break in an ARMA(p, q) mean equation at a known date."""
    try:
        model_full = ARIMA(full_data, order=(p, 0, q)).fit()
        model_before = ARIMA(before_data, order=(p, 0, q)).fit()
        model_after = ARIMA(after_data, order=(p, 0, q)).fit()

        SSR_p = np.sum(model_full.resid ** 2)
        SSR_1 = np.sum(model_before.resid ** 2)
        SSR_2 = np.sum(model_after.resid ** 2)
        print(f"[{event_name}] SSR full: {SSR_p:.4f}, before: {SSR_1:.4f}, after: {SSR_2:.4f}")
        print(f"Lengths → full: {len(full_data)}, before: {len(before_data)}, after: {len(after_data)}\n")

        n1 = len(before_data)
        n2 = len(after_data)
        k = p + q + 1  # intercept + AR + MA

        # The numerator is clipped at 0: the ARMA models are estimated by maximum
        # likelihood rather than least squares, so SSR_1 + SSR_2 can exceed SSR_p
        numerator = max(0, (SSR_p - (SSR_1 + SSR_2))) / k
        denominator = (SSR_1 + SSR_2) / (n1 + n2 - 2 * k)
        F_stat = numerator / denominator
        p_value = 1 - f.cdf(F_stat, k, (n1 + n2 - 2 * k))

        return {
            "F_stat": F_stat,
            "p_value": p_value,
            "significant": "Yes" if p_value < 0.05 else "No",
            "model_order": (p, q)
        }

    except Exception as e:
        return {"error": str(e)}


chow_results = []

for period in structured_periods:
    event = period["event"]
    full_data = period["returns"]
    # .loc slicing includes both endpoints, so the event date (when it is a
    # trading day) is part of both sub-samples
    before_data = full_data.loc[:period["event_date"]]
    after_data = full_data.loc[period["event_date"]:]

    # Reuse the ARMA order selected by AIC in the ARCH-LM step
    arma_row = arch_lm_df[arch_lm_df['event'] == event].iloc[0]
    p, q = int(arma_row['p']), int(arma_row['q'])

    result = perform_chow_test_from_known_order(event, full_data, before_data, after_data, p, q)
    result["event"] = event
    chow_results.append(result)

chow_df = pd.DataFrame(chow_results)
print(chow_df[["event", "model_order", "F_stat", "p_value", "significant"]])


[dot_com_bubble] SSR full: 0.2696, before: 0.1172, after: 0.1517
Lengths → full: 1509, before: 805, after: 705

[financial_crisis] SSR full: 0.3013, before: 0.0692, after: 0.2337
Lengths → full: 1259, before: 680, after: 580

[flash_crash] SSR full: 0.3095, before: 0.2250, after: 0.0827
Lengths → full: 1009, before: 505, after: 505

[covid_crash] SSR full: 0.1584, before: 0.0591, after: 0.1129
Lengths → full: 1009, before: 504, after: 506

[russia_invasion] SSR full: 0.2021, before: 0.1237, after: 0.0715
Lengths → full: 1008, before: 507, after: 502

              event model_order    F_stat       p_value significant
0    dot_com_bubble      (0, 0)  4.281926  3.868969e-02         Yes
1  financial_crisis      (3, 0)  0.000000  1.000000e+00          No
2       flash_crash      (2, 0)  2.000858  1.122111e-01          No
3       covid_crash      (3, 2)  0.000000  1.000000e+00          No
4   russia_invasion      (0, 3)  8.839006  5.153350e-07         Yes
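One way to keep the Chow numerator non-negative is to estimate the mean equation by ordinary least squares on non-overlapping sub-samples: OLS minimizes the SSR, so fitting each sub-sample separately can never give a larger total SSR than the pooled fit. The sketch below does this for a hypothetical AR(1) mean equation on simulated data with a break. It is an illustration under that assumption only: MA terms cannot be estimated by OLS, so it does not directly replace the ARMA(p, q) models above.

```python
import numpy as np
from scipy.stats import f

def ssr_ols(y, X):
    """Sum of squared OLS residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return r @ r

def chow_test_ar1(r, split):
    """Chow test for a break in r_t = c + phi * r_{t-1} + e_t, estimated by OLS.
    Observations t < split form the first sub-sample and t >= split the second,
    so the sub-samples do not overlap."""
    y, x = r[1:], r[:-1]
    X = np.column_stack([np.ones_like(x), x])
    s = split - 1  # position of r[split] within y, because y starts at r[1]
    n, k = X.shape
    ssr_p = ssr_ols(y, X)
    ssr_1 = ssr_ols(y[:s], X[:s])
    ssr_2 = ssr_ols(y[s:], X[s:])
    F = ((ssr_p - (ssr_1 + ssr_2)) / k) / ((ssr_1 + ssr_2) / (n - 2 * k))
    return F, f.sf(F, k, n - 2 * k), ssr_p, ssr_1 + ssr_2

# Hypothetical AR(1) series whose intercept and AR coefficient change at t = 500
rng = np.random.default_rng(5)
n, split = 1000, 500
r = np.zeros(n)
for t in range(1, n):
    c, phi = (0.0, 0.2) if t < split else (0.5, -0.2)
    r[t] = c + phi * r[t - 1] + rng.standard_normal()

F, p, ssr_full, ssr_sub = chow_test_ar1(r, split)
print(f"F = {F:.2f}, p = {p:.2e}, SSR full = {ssr_full:.1f}, SSR before + after = {ssr_sub:.1f}")
```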


### GARCH 

We have established the presence of ARCH effects in every event window, so fitting GARCH models is warranted. To study changes in the volatility structure, we fit GARCH models to the residuals of the ARMA(p, q) models (ARIMA(p, 0, q) in statsmodels) identified above, which serve as the "mean equation" and remove linear dependence from the returns. The ARMA orders were selected by minimizing the Akaike Information Criterion (AIC) over $p, q \in \{0, 1, 2, 3\}$, and the ARCH-LM test on their residuals was significant (p < 0.05) for all events.

For each event window, we then fit zero-mean GARCH(p, q) models with $p, q \in \{1, 2, 3\}$ to the ARMA residuals. Following the convention of the arch package, p is the number of lagged squared residuals (the alpha terms) and q the number of lagged conditional variances (the beta terms). We record the best specification according to both the AIC and the Bayesian Information Criterion (BIC).


In [5]:
garch_results = []
for _, row in arch_lm_df.iterrows():
    event = row["event"]
    p_val = row["p"]
    q_val = row["q"]

    # Skip if ARMA order not found
    if pd.isnull(p_val) or pd.isnull(q_val):
        continue

    p_val = int(p_val)
    q_val = int(q_val)

    (__, start, event_date, end) = next(x for x in event_windows if x[0] == event)
    data = log_returns.loc[start:end]

    try:
        arma_model = ARIMA(data, order=(p_val, 0, q_val)).fit()
        residuals = arma_model.resid
    except:
        continue

    best_aic = np.inf
    best_bic = np.inf
    best_aic_model = None
    best_bic_model = None
    best_aic_order = None
    best_bic_order = None

    for gp, gq in itertools.product(range(1, 4), range(1, 4)):
        try:
            garch_mod = arch_model(residuals, vol='Garch', p=gp, q=gq, mean='Zero')
            garch_fit = garch_mod.fit(disp="off")

            if garch_fit.aic < best_aic:
                best_aic = garch_fit.aic
                best_aic_order = (gp, gq)
                best_aic_model = garch_fit

            if garch_fit.bic < best_bic:
                best_bic = garch_fit.bic
                best_bic_order = (gp, gq)
                best_bic_model = garch_fit
        except:
            pass

    garch_results.append({
        "event": event,
        "ARMA(p,q)": (p_val, q_val),
        "GARCH(p,q) AIC": best_aic_order,
        "AIC": best_aic,
        "GARCH(p,q) BIC": best_bic_order,
        "BIC": best_bic,
        "model_AIC": best_aic_model,
        "model_BIC": best_bic_model
    })

garch_summary_df = pd.DataFrame([{
    "event": r["event"],
    "ARMA(p,q)": r["ARMA(p,q)"],
    "Best GARCH(p,q) AIC": r["GARCH(p,q) AIC"],
    "AIC": r["AIC"],
    "Best GARCH(p,q) BIC": r["GARCH(p,q) BIC"],
    "BIC": r["BIC"]
} for r in garch_results])

print("=== GARCH summary results ===")
print(garch_summary_df)
print()


=== GARCH summary results ===
              event ARMA(p,q) Best GARCH(p,q) AIC          AIC  \
0    dot_com_bubble    (0, 0)              (2, 3) -8896.810455   
1  financial_crisis    (3, 0)              (3, 1) -7677.692660   
2       flash_crash    (2, 0)              (3, 3) -5855.022640   
3       covid_crash    (3, 2)              (2, 1) -6475.197944   
4   russia_invasion    (0, 3)              (1, 2) -6199.471489   

  Best GARCH(p,q) BIC          BIC  
0              (1, 1) -8871.086035  
1              (3, 1) -7652.002295  
2              (2, 1) -5828.006177  
3              (1, 1) -6458.179392  
4              (1, 2) -6179.808595  



Positive directional derivative for linesearch
See scipy.optimize.fmin_slsqp for code meaning.

Positive directional derivative for linesearch
See scipy.optimize.fmin_slsqp for code meaning.

Positive directional derivative for linesearch
See scipy.optimize.fmin_slsqp for code meaning.



## Hypothesis Testing for GARCH Differences

### LR test

Given a **GARCH** model for each event window, we use a likelihood ratio (LR) test to explore differences in the volatility structure before and after each event. Specifically, the AIC-selected **GARCH(p, q)** model fitted on the entire window serves as the restricted model. Two models of the same order $(p, q)$ are then fitted separately on the pre- and post-event sub-samples, split at the event date, and serve as the unrestricted models.

The LR test statistic takes the form of:

$$
LR = -2 \left( \ell_{\text{restricted}} - \left( \ell_{\text{unrestricted}_1} + \ell_{\text{unrestricted}_2} \right) \right)
$$

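with degrees of freedom:

$$
df = (k_1 + k_2) - k_r
$$

where $k_1$ and $k_2$ are the numbers of estimated parameters in the two sub-sample models and $k_r$ is the number in the restricted model. For a zero-mean GARCH(p, q), $k_1 = k_2 = k_r = 1 + p + q$, so $df = 1 + p + q$.

Strictly speaking, however, this test is not formally valid, because the models are not truly nested: the ARMA mean equation is re-estimated on each sub-sample, so the unrestricted GARCH models are fitted to different residuals than the restricted model, and the event-date observation is included in both sub-samples. As a result, the test statistic is not guaranteed to follow a chi-squared distribution under the null. Nonetheless, the procedure can give an indication of structural breaks.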

### Wald test

To complement the LR test, we also apply the **Wald test** to detect structural breaks in volatility dynamics by comparing the **GARCH(p, q)** parameters before and after each event. Using the same **GARCH** specification (the AIC-selected order from the full-period model selection), we estimate separate models on the pre- and post-event samples.

The Wald statistic is computed as:

$$
W = (\theta_1 - \theta_2)^{\prime} \left[ \text{Var}(\theta_1) + \text{Var}(\theta_2) \right]^{-1} (\theta_1 - \theta_2)
$$

Where:
- $\theta_1$ and $\theta_2$ are the parameter vectors from the pre- and post-event models,
- $\text{Var}(\theta_1)$ and $\text{Var}(\theta_2)$ are their respective covariance matrices.

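Under the null hypothesis of parameter stability, and assuming the two sub-sample estimates are independent, the statistic asymptotically follows a chi-squared distribution with degrees of freedom equal to the number of compared parameters ($1 + p + q$ for a zero-mean GARCH(p, q)).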
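The Wald statistic can be checked by hand on a small example; the parameter values and covariance matrices below are invented for illustration. With diagonal covariance matrices, $W$ reduces to a sum of squared z-scores: $0.05^2 / 0.0002 + 0.05^2 / 0.0008 = 12.5 + 3.125 = 15.625$. The second call shows how a parameter with a near-zero standard error, for instance one estimated at a boundary, makes the summed covariance matrix nearly singular and inflates $W$ by orders of magnitude. This is one possible reading of the very large statistics in Table 6, not a verified diagnosis. The sketch solves the linear system instead of forming an explicit inverse and reports the condition number as a diagnostic.

```python
import numpy as np
from scipy.stats import chi2

def wald_stat(theta1, theta2, V1, V2):
    """W = d' (V1 + V2)^{-1} d with d = theta1 - theta2; returns (W, p-value, cond(V1 + V2))."""
    diff = np.asarray(theta1, dtype=float) - np.asarray(theta2, dtype=float)
    V = np.asarray(V1, dtype=float) + np.asarray(V2, dtype=float)
    # Solve V x = diff rather than inverting V explicitly
    W = float(diff @ np.linalg.solve(V, diff))
    return W, chi2.sf(W, len(diff)), np.linalg.cond(V)

# Hypothetical (alpha, beta) estimates before and after an event
theta1, theta2 = [0.10, 0.80], [0.15, 0.75]
V1 = np.diag([1e-4, 4e-4])
V2 = np.diag([1e-4, 4e-4])
W, p, cond = wald_stat(theta1, theta2, V1, V2)
print(f"W = {W:.3f}, p = {p:.2e}, cond = {cond:.1f}")  # W = 15.625, cond = 4.0

# Same estimates, but beta has a near-zero variance in both sub-samples
W_bad, p_bad, cond_bad = wald_stat(theta1, theta2, np.diag([1e-4, 1e-12]), np.diag([1e-4, 1e-12]))
print(f"W = {W_bad:.3e}, cond = {cond_bad:.1e}")
```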



In [6]:
def likelihood_ratio_test(event_name, full_model, arma_order, garch_order, full_data, break_date):
    try:
        before_data = full_data.loc[:break_date]
        after_data = full_data.loc[break_date:]

        arma_before = ARIMA(before_data, order=(arma_order[0], 0, arma_order[1])).fit()
        arma_after = ARIMA(after_data, order=(arma_order[0], 0, arma_order[1])).fit()

        resid_before = arma_before.resid
        resid_after = arma_after.resid

        garch_before = arch_model(resid_before, vol="Garch", p=garch_order[0], q=garch_order[1], mean="Zero").fit(disp="off")
        garch_after = arch_model(resid_after, vol="Garch", p=garch_order[0], q=garch_order[1], mean="Zero").fit(disp="off")

        ll_restricted = full_model.loglikelihood
        ll_unrestricted = garch_before.loglikelihood + garch_after.loglikelihood
        LR_stat = -2 * (ll_restricted - ll_unrestricted)
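        # NOTE: df is fixed at 3, the number of parameters (omega, alpha[1], beta[1])
        # of a zero-mean GARCH(1, 1). For a GARCH(p, q), the formula df = (k1 + k2) - k_r
        # gives 1 + p + q, i.e. len(garch_before.params)
        df = 3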
        p_value = 1 - chi2.cdf(LR_stat, df)

        return {
            "event": event_name,
            "ARMA(p,q)": arma_order,
            "GARCH(p,q)": garch_order,
            "LR_stat": LR_stat,
            "p_value": p_value,
            "significant": "Yes" if p_value < 0.05 else "No",
            "garch_before": garch_before,
            "garch_after": garch_after
        }

    except Exception as e:
        return {
            "event": event_name,
            "error": str(e)
        }


In [16]:
lr_results = []

for g in garch_results:
    event_name = g["event"]
    arma_order = g["ARMA(p,q)"]
    garch_order = g["GARCH(p,q) AIC"]
    full_model = g["model_AIC"]

    period = next(p for p in structured_periods if p["event"] == event_name)
    full_data = period["returns"]
    break_date = period["event_date"]

    result = likelihood_ratio_test(event_name, full_model, arma_order, garch_order, full_data, break_date)
    lr_results.append(result)

lr_df = pd.DataFrame(lr_results)

for result in lr_results:
    print(f"\n================== {result['event'].upper()} ==================")
    
    if 'error' in result:
        print(f"Error: {result['error']}")
        continue

    print(f"Significant difference in GARCH parameters? → {result['significant']}")
    print(f"LR statistic: {result['LR_stat']:.4f}, p-value: {result['p_value']:.4f}")
    
    print("\n--- GARCH Model (Before Event) ---")
    print(result["garch_before"].summary())
    
    print("\n--- GARCH Model (After Event) ---")
    print(result["garch_after"].summary())




Significant difference in GARCH parameters? → Yes
LR statistic: 14.1463, p-value: 0.0027

--- GARCH Model (Before Event) ---
                       Zero Mean - GARCH Model Results                        
Dep. Variable:                   None   R-squared:                       0.000
Mean Model:                 Zero Mean   Adj. R-squared:                  0.001
Vol Model:                      GARCH   Log-Likelihood:                2444.27
Distribution:                  Normal   AIC:                          -4876.54
Method:            Maximum Likelihood   BIC:                          -4848.40
                                        No. Observations:                  805
Date:                Tue, Apr 01 2025   Df Residuals:                      805
Time:                        13:23:34   Df Model:                            0
                              Volatility Model                              
                 coef    std err          t      P>|t|      95.0% Conf. Int.
---------

In [17]:
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from arch import arch_model
from scipy.stats import chi2

def garch_wald_test(garch_before, garch_after):
    b1 = garch_before.params
    b2 = garch_after.params
    V1 = garch_before.param_cov
    V2 = garch_after.param_cov


    diff = b1 - b2
    V = V1 + V2
    W = diff.values.T @ np.linalg.inv(V.values) @ diff.values
    df = len(diff)
    p = 1 - chi2.cdf(W, df)

    return {
        "Wald_stat": W,
        "df": df,
        "p_value": p,
        "significant": p < 0.05,
        "param_diff": diff
    }
wald_results = []

for g in garch_results:
    event_name = g["event"]
    garch_order = g["GARCH(p,q) AIC"]
    arma_order = g["ARMA(p,q)"]

    if garch_order is None or arma_order is None:
        continue

    period = next(p for p in structured_periods if p["event"] == event_name)
    full_data = period["returns"]
    before_data = full_data.loc[:period["event_date"]]
    after_data = full_data.loc[period["event_date"]:]


    try:
        arma_before = ARIMA(before_data, order=(arma_order[0], 0, arma_order[1])).fit()
        arma_after = ARIMA(after_data, order=(arma_order[0], 0, arma_order[1])).fit()

        resid_before = arma_before.resid
        resid_after = arma_after.resid

        garch_before = arch_model(resid_before, vol="Garch", p=garch_order[0], q=garch_order[1], mean="Zero").fit(disp="off")
        garch_after = arch_model(resid_after, vol="Garch", p=garch_order[0], q=garch_order[1], mean="Zero").fit(disp="off")

        wald_result = garch_wald_test(garch_before, garch_after)

        wald_results.append({
            "event": event_name,
            "ARMA(p,q)": arma_order,
            "GARCH(p,q)": garch_order,
            **wald_result
        })

    except Exception as e:
        wald_results.append({
            "event": event_name,
            "error": str(e)
        })

wald_df = pd.DataFrame(wald_results)

for result in wald_results:
    print(f"\n================== {result['event'].upper()} ==================")

    if 'error' in result:
        print(f"Error: {result['error']}")
        continue

    print(f"Significant difference in GARCH parameters? → {'Yes' if result['significant'] else 'No'}")
    print(f"Wald statistic: {result['Wald_stat']:.4f}, p-value: {result['p_value']}")
    print("\nParameter Differences:")
    print(result["param_diff"])



Significant difference in GARCH parameters? → Yes
Wald statistic: 32.5231, p-value: 1.295036102999525e-05

Parameter Differences:
omega      -0.000006
alpha[1]   -0.009218
alpha[2]    0.014101
beta[1]    -0.027344
beta[2]    -0.027344
beta[3]     0.064028
Name: params, dtype: float64

Significant difference in GARCH parameters? → Yes
Wald statistic: 181415896.5412, p-value: 0.0

Parameter Differences:
omega      -0.000006
alpha[1]   -0.034285
alpha[2]   -0.034285
alpha[3]   -0.034285
beta[1]     0.102335
Name: params, dtype: float64

Significant difference in GARCH parameters? → Yes
Wald statistic: 9415088.8297, p-value: 0.0

Parameter Differences:
omega       0.000002
alpha[1]    0.066667
alpha[2]   -0.147619
alpha[3]    0.018105
beta[1]     0.260000
beta[2]    -0.094245
beta[3]    -0.068952
Name: params, dtype: float64

Significant difference in GARCH parameters? → Yes
Wald statistic: 105076.2831, p-value: 0.0

Parameter Differences:
omega      -7.795405e-07
alpha[1]    1.242027e-01

# 3.3 Results

### 3.3.1 ADF

- **ADF Statistic:** -22.5054  
- **p-value:** 0.0000  
- **Number of Lags Used:** 48  
- **Number of Observations:** 24,377  

The p-value is numerically zero, so the unit-root null is rejected at any conventional level and the daily log return series can be treated as stationary.

### 3.3.2 F-test sample variance
| Event             | F-stat   | p-value        |
|------------------|----------|----------------|
| dot_com_bubble   | 0.677315 | 9.206945e-08   |
| financial_crisis | 0.241116 | 9.501963e-68   |
| flash_crash      | 2.797905 | 2.220446e-16   |
| covid_crash      | 0.468542 | 4.473343e-17   |
| russia_invasion  | 1.959486 | 8.215650e-14   |

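Table 1: F-test of equal variance before and after each event, with $F = s^2_{\text{before}} / s^2_{\text{after}}$. All p-values are far below 0.05, indicating that the overall level of variance differs between the two sub-periods of every event window. $F < 1$ (dot-com bubble, financial crisis, COVID-19 crash) means the variance increased after the event; $F > 1$ (flash crash, Russian invasion) means it decreased.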

### 3.3.3 ARCH-LM 

| Event             | p | q | LM_stat     | p_value        | ARCH_effects |
|------------------|---|---|-------------|----------------|---------------|
| dot_com_bubble   | 0 | 0 | 106.888422  | 2.259278e-18   | Yes           |
| financial_crisis | 3 | 0 | 398.177955  | 2.298807e-79   | Yes           |
| flash_crash      | 2 | 0 | 293.699580  | 3.335327e-57   | Yes           |
| covid_crash      | 3 | 2 | 334.970642  | 6.140149e-66   | Yes           |
| russia_invasion  | 0 | 3 | 305.627744  | 1.003831e-59   | Yes           |

Table 2: ARMA orders selected by AIC and ARCH-LM test on the ARMA residuals. All p-values are far below 0.05, indicating clear ARCH effects in every event window.

### 3.3.4 Chow test

| Event             | Model Order | F-stat     | p-value        | Significant |
|------------------|-------------|------------|----------------|-------------|
| dot_com_bubble   | (0, 0)      | 4.281926   | 3.868969e-02   | Yes         |
| financial_crisis | (3, 0)      | 0.000000   | 1.000000e+00   | No          |
| flash_crash      | (2, 0)      | 2.000858   | 1.122111e-01   | No          |
| covid_crash      | (3, 2)      | 0.000000   | 1.000000e+00   | No          |
| russia_invasion  | (0, 3)      | 8.839006   | 5.153350e-07   | Yes         |

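Table 3: The Chow test gives mixed results. The null hypothesis of a stable ARMA process is rejected at the 5% level for the dot-com bubble and the Russian invasion, but not for the flash crash. For the financial crisis and the COVID-19 crash, the statistic is exactly 0 because the numerator was clipped at zero: the sum of the sub-sample SSRs exceeds the full-sample SSR (0.0692 + 0.2337 = 0.3029 > 0.3013, and 0.0591 + 0.1129 = 0.1720 > 0.1584). This cannot happen with least-squares estimation on non-overlapping sub-samples, but it can here, because the ARMA models are estimated by maximum likelihood, which does not minimize the SSR, and because the event-date observation is included in both sub-samples. More generally, the Chow test is designed for linear regression models with i.i.d. residuals, so its statistic is not guaranteed to be F-distributed for ARMA models with conditionally heteroskedastic errors.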

### 3.3.5 GARCH fit

| Event             | ARMA(p,q) | Best GARCH(p,q) AIC | AIC           | Best GARCH(p,q) BIC | BIC           |
|------------------|-----------|----------------------|---------------|----------------------|---------------|
| dot_com_bubble   | (0, 0)    | (2, 3)               | -8896.810455  | (1, 1)               | -8871.086035  |
| financial_crisis | (3, 0)    | (3, 1)               | -7677.692660  | (3, 1)               | -7652.002295  |
| flash_crash      | (2, 0)    | (3, 3)               | -5855.022640  | (2, 1)               | -5828.006177  |
| covid_crash      | (3, 2)    | (2, 1)               | -6475.197944  | (1, 1)               | -6458.179392  |
| russia_invasion  | (0, 3)    | (1, 2)               | -6199.471489  | (1, 2)               | -6179.808595  |

Table 4: Best-fitting GARCH models for the residuals of each event's ARMA model, according to the AIC and BIC information criteria.

### 3.3.6 LR test
| Event            | LR statistic | p-value |
|------------------|--------------|---------|
| dot_com_bubble   | 14.1463      | 0.0027  |
| financial_crisis | 10.4966      | 0.0148  |
| flash_crash      | 9.5873       | 0.0224  |
| covid_crash      | 19.3544      | 0.0002  |
| russia_invasion  | 19.8915      | 0.0002  |

Table 5: LR test results on the GARCH models. All p-values are below 0.05, so the null hypothesis of identical GARCH parameters is rejected at the 5% level for every event. For the financial crisis and the flash crash, the null is not rejected at the 1% level, so the evidence for these two events is weaker. Note that the p-values were computed with 3 degrees of freedom, whereas the AIC-selected GARCH orders imply between 4 and 7 ($1 + p + q$), which would give larger p-values.

### 3.3.7 Wald test

| Event             | Wald Statistic     | p-value             |
|------------------|--------------------|---------------------|
| dot_com_bubble   | 32.5231            | 1.295036e-05        |
| financial_crisis | 181415896.5412     | 0.0                 |
| flash_crash      | 9415088.8297       | 0.0                 |
| covid_crash      | 105076.2831        | 0.0                 |
| russia_invasion  | 43024809.2626      | 0.0                 |

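Table 6: Wald test results. The null hypothesis of equal GARCH parameters is rejected for all events. However, except for the dot-com bubble, the statistics are extremely large (from about $10^5$ to $10^8$), which suggests near-singular covariance matrices, for instance from parameters estimated at or near the zero lower bound or from fits that did not fully converge. These p-values should therefore be interpreted with caution.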
