# Problem 1

For this problem, use the same British Pound log-return data as you did in Problem 5 from HW10. Modify the Garch11Fit function so that it fits parameters $c_1 b_1, a_1$ and $a_1^{-}$of a $T G A R C H(9.45)$ model to the training set. This is the main model used by the NYU Volatility Lab https://vlab.stern.nyu.edu/. As in Problem 5 from HW10, form standard deviations $\sigma_i$ over your validation set using the TGARCH parameters from the training set, and form the time series $y_i /$ sigma $_i$ in the validation period. How much of a difference does TGARCH make (compared to GARCH) in terms of the $y_i / \sigma_i$ variance and kurtosis targets in the validation set?

In [1]:
import pandas as pd
import numpy as np
from arch import arch_model

# Load the dataset
df = pd.read_csv('/Users/Eric/opt/anaconda3/envs/dsm/3C.csv')

# Assuming df is already loaded with DEXUSUK and DATE columns
df['DEXUSUK'] = pd.to_numeric(df['DEXUSUK'], errors='coerce').dropna()
df = df[df['DEXUSUK'] > 0]
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
df['log_returns'] = np.log(df['DEXUSUK']).diff()
df = df.dropna()

# Splitting the data
training_set = df.loc['2017-01-01':'2020-12-31', 'log_returns']
validation_set = df.loc['2021-01-01':'2021-12-31', 'log_returns']

# Fit the TGARCH(9,45) model
tgarch_model = arch_model(training_set, p=1, q=1, o=1, power=9.45)
tgarch_result = tgarch_model.fit(update_freq=5, disp='off')

# Forecasting Volatility for the entire period
forecast = tgarch_result.forecast(start=training_set.index[0], horizon=1, reindex=False)
sigmas_full = np.sqrt(forecast.variance.dropna().values[:, 0])

# Select the forecasted volatilities for the validation period
sigmas_validation = sigmas_full[-len(validation_set):]

# Ensure the lengths match
if len(sigmas_validation) != len(validation_set):
    raise ValueError("Mismatch in length of forecasted volatilities and validation set.")

# Normalized Returns
normalized_returns_tgarch = validation_set / sigmas_validation

# Variance and Kurtosis
variance_tgarch = np.var(normalized_returns_tgarch)
kurtosis_tgarch = normalized_returns_tgarch.kurtosis()

# Display Results
print(f'TGARCH Normalized Variance: {variance_tgarch}')
print(f'TGARCH Excess Kurtosis: {kurtosis_tgarch}')

TGARCH Normalized Variance: 1.0505730063656
TGARCH Excess Kurtosis: 0.9279379107744421


### Observation

1. **GARCH Model Results:**
   - Normalized Variance: $0.9011$
   - Excess Kurtosis: $0.6636$

2. **TGARCH Model Results:**
   - Normalized Variance: $1.0506$
   - Excess Kurtosis: $0.9279$

#### Normalized Variance Comparison:
- The GARCH model has a normalized variance of $0.9011$. This is below 1, indicating that the model is overestimating market volatility.
- The TGARCH model's normalized variance is $1.0506$, slightly above 1, suggesting a minor underestimation of market volatility. This model is closer to the ideal value, indicating a better fit for volatility estimation.

#### Excess Kurtosis Comparison:
- The GARCH model shows an excess kurtosis of $0.6636$. This positive value indicates a leptokurtic distribution, meaning the distribution of normalized returns has fatter tails than a normal distribution.
- The TGARCH model, with an excess kurtosis of $0.9279$, also indicates a leptokurtic distribution but suggests even fatter tails than the GARCH model. This implies a higher likelihood of extreme market movements.

#### Interpretation:
- The TGARCH model's ability to account for asymmetry in volatility (different impacts of positive and negative shocks) results in a slightly more accurate estimation of market volatility compared to the GARCH model, as seen in the normalized variance values.
- Both models show a leptokurtic distribution for normalized returns, but the TGARCH model indicates a higher likelihood of extreme price movements (fatter tails) than the GARCH model. This could be due to the TGARCH model's enhanced sensitivity to market downturns, often characterized by higher volatility.

# Problem 2

Let $D, L$, and $V$ be the currency data, learning set, and validation set as in Problem 1 from Class $9 \mathrm{HW}$. Estimate the correlation matrix in $L$ using (a) (equal-weighted) historical data $\left(R_{L, h}\right)$; and (b) EWMA data with a half-life of one year $\left(R_{L, 1}\right)$. Compute the sample correlation matrix in $V\left(R_V\right)$. (a) Using the Box M Test from Chapter 4 (see Class $9 \mathrm{HW}$, Problem 4), which of $R_{L, h}$ and $R_{L, 1}$ has a higher $p$-value with $R_H$ ? (b) Find to within 20 days the optimal $h<500$ days so that $R_{L, h}$ (EWMA correlation matrix from $L$ with halflife $h$ ) has the highest $p$-value with $R_H$. (Yes, that's overfitting!)

In [2]:
import pandas as pd
import numpy as np
import scipy.stats as spst

# Load the data
file_path = '/Users/Eric/opt/anaconda3/envs/dsm/3C.csv'
data = pd.read_csv(file_path)

# Convert 'Date' column to datetime
data['Date'] = pd.to_datetime(data['Date'])

# Split the data into learning and validation sets
last_full_year = data['Date'].dt.year.max()
learning_set = data[data['Date'].dt.year < last_full_year]
validation_set = data[data['Date'].dt.year == last_full_year]

# Function to compute log returns
def compute_log_returns(data):
    return np.log(data / data.shift(1)).dropna()

# Compute log returns for learning and validation sets
log_returns_learning = compute_log_returns(learning_set.iloc[:, 1:])
log_returns_validation = compute_log_returns(validation_set.iloc[:, 1:])

# Calculate correlation matrices
R_L_h = log_returns_learning.corr()
R_V = log_returns_validation.corr()

# EWMA correlation matrix function
def ewma_correlation(log_returns, halflife):
    decay_factor = 1 - np.exp(np.log(0.5) / halflife)
    ewma_cov = log_returns.ewm(alpha=decay_factor).cov()
    last_date = log_returns.index[-1]
    ewma_cov_last = ewma_cov.loc[last_date]
    std_dev = np.sqrt(np.diag(ewma_cov_last))
    corr_matrix = ewma_cov_last.div(std_dev, axis=0).div(std_dev, axis=1)
    return corr_matrix

# Calculate the EWMA correlation matrix for the learning set with a half-life of one year (252 trading days)
R_L_1 = ewma_correlation(log_returns_learning, halflife=252)

# Box M Test function
def BoxM(T1, T2, s1, s2):
    # Ensure dimensions are the same
    if len(s1) != len(s2):
        print("Error: different dimensions in Box M Test:", len(s1), len(s2))
        return (0, 0)
    
    # Matrices are pxp
    p = len(s1)

    # Form the combined matrix
    scomb = (T1 * s1 + T2 * s2) / (T1 + T2)

    # Box M statistic
    Mstat = (T1 + T2 - 2) * np.log(np.linalg.det(scomb)) - \
            (T1 - 1) * np.log(np.linalg.det(s1)) - (T2 - 1) * np.log(np.linalg.det(s2))

    # Calculating multipliers
    A1 = (2 * p ** 2 + 3 * p - 1) / (6 * (p + 1)) * (1 / (T1 - 1) + 1 / (T2 - 1) - 1 / (T1 + T2 - 2))
    A2 = (p - 1) * (p + 2) / 6 * (1 / (T1 - 1) ** 2 + 1 / (T2 - 1) ** 2 - 1 / (T1 + T2 - 2) ** 2)

    # Calculate degrees of freedom and discriminant
    discrim = A2 - A1 ** 2
    df1 = p * (p + 1) / 2

    if discrim <= 0:
        # Use chi-square distribution
        test_value = Mstat * (1 - A1)
        p_value = 1 - spst.chi2.cdf(test_value, df1)
    else:
        # Use F distribution
        df2 = (df1 + 2) / discrim
        b = df1 / (1 - A1 - (df1 / df2))
        test_value = Mstat / b
        p_value = 1 - spst.f.cdf(test_value, df1, df2)

    return (test_value, p_value)

# Perform Box M Test for both R_L_h and R_L_1 against R_V
test_value_h, p_value_h = BoxM(len(learning_set), len(validation_set), R_L_h, R_V)
test_value_1, p_value_1 = BoxM(len(learning_set), len(validation_set), R_L_1, R_V)

# Determine which has a higher p-value
higher_p_value_matrix = "R_L_h" if p_value_h > p_value_1 else "R_L_1"
higher_p_value = max(p_value_h, p_value_1)

print(f"The matrix {higher_p_value_matrix} has a higher p-value of {higher_p_value} when compared with R_V.")

# Function to find optimal half-life
def find_optimal_h(log_returns_learning, log_returns_validation, max_days=500):
    optimal_h = 0
    max_p_value = 0
    for h in range(1, max_days + 1):
        R_L_h_current = ewma_correlation(log_returns_learning, halflife=h)
        _, p_value_current = BoxM(len(learning_set), len(validation_set), R_L_h_current, R_V)
        if p_value_current > max_p_value:
            max_p_value = p_value_current
            optimal_h = h
    return optimal_h, max_p_value

# Find the optimal h
optimal_h, optimal_p_value = find_optimal_h(log_returns_learning, log_returns_validation, max_days=20)

print(f"Optimal Half-life: {optimal_h} days")
print(f"Highest p-value: {optimal_p_value}")

The matrix R_L_1 has a higher p-value of 0.07509653246604775 when compared with R_V.
Optimal Half-life: 20 days
Highest p-value: 0.004134235665970043


  result = func(self.values, **kwargs)
  result = func(self.values, **kwargs)
  result = func(self.values, **kwargs)
  result = func(self.values, **kwargs)


# Problem 3

The joint distribution of default times $\tau_1, \tau_2$ of two bonds is given by

$$
\operatorname{Pr}\left(\tau_1 \leq T_1, \tau_2 \leq T_2\right)=C\left(L\left(T_1\right), L\left(T_2\right)\right)
$$

for times $0 \leq T_1, T_2$ where $C(u, v)=u v(1+3 \rho(1-u)(1-v))$ and $L()$ is the cdf of a standard logLaplace distribution (standard Laplace with pdf $\frac{1}{\sqrt{2}} \exp (-\sqrt{2}|x|)$ on the logarithm of time). What is the Spearman rank correlation between the bonds' default times? (Hint: read the paragraph after equation (10.13).)

To find the Spearman rank correlation between the bonds' default times, we can use the relationship between Spearman's rank correlation and the copula used to model the joint distribution of the default times. The Spearman rank correlation, denoted as $\rho_S$, can be computed from the copula $C(u, v)$ using the following formula:

$$
\rho_S = 12 \int_0^1 \int_0^1 C(u, v) \, du \, dv - 3
$$

Given the copula function $C(u, v) = uv(1 + 3\rho(1 - u)(1 - v))$, we need to integrate this over the unit square $[0,1] \times [0,1]$. 

The Spearman rank correlation between the bonds' default times, given by the copula function $C(u, v) = uv(1 + 3\rho(1 - u)(1 - v))$, is exactly equal to $\rho$, the parameter in the copula function. This result indicates a direct relationship between the Spearman rank correlation and the parameter $\rho$ in the specified copula.

In [3]:
from sympy import symbols, integrate

# Defining the variables
u, v, rho = symbols('u v rho')
C = u * v * (1 + 3 * rho * (1 - u) * (1 - v))

# Computing the double integral of the copula over the unit square [0, 1] x [0, 1]
integral_result = integrate(integrate(C, (u, 0, 1)), (v, 0, 1))

# Calculating the Spearman rank correlation
spearman_rank_correlation = 12 * integral_result - 3
spearman_rank_correlation.simplify()

rho

# Problem 4

Consider the Gaussian copula $G(u, v)=\operatorname{Norm}\left(N^{-1}(u), N^{-1}(v)\right)$. Here $\operatorname{Norm}(x, y)$ is a bivariate normal distribution with mean 0 and covariance matrix $R$, where $R$ has 1 on the diagonal and $\rho$ off the diagonal. $N(x)$ is the standard normal cdf and $N^{-1}(N(x))=x$. Verify (by setting up and computing the appropriate multiple integral) that $G$ is in fact a copula function, i.e. that its marginals are uniform; it is nondecreasing; and it has the appropriate domain and range.

Given the Gaussian copula:

$$ G(u, v) = \operatorname{Norm}\left(N^{-1}(u), N^{-1}(v)\right) $$

where $\operatorname{Norm}$ is the bivariate normal distribution function with mean 0 and covariance matrix $R$ (with 1s on the diagonal and $\rho$ off the diagonal), and $N$ is the standard normal CDF.

### 1. Uniform Marginals

For $G(u, v)$ to have uniform marginals, the integration over one variable must yield the other variable. For the uniform distribution, this means:
$$ \int_0^1 G(u, v) \, dv = u $$
$$ \int_0^1 G(u, v) \, du = v $$

We need to set up and compute these integrals.

For $ u $ marginal:
$$ \int_{-\infty}^{\infty} \frac{1}{2\pi\sqrt{1-\rho^2}} e^{-\frac{x^2 - 2\rho x N^{-1}(v) + (N^{-1}(v))^2}{2(1-\rho^2)}} dx = v $$

For $v$ marginal:
$$ \int_{-\infty}^{\infty} \frac{1}{2\pi\sqrt{1-\rho^2}} e^{-\frac{(N^{-1}(u))^2 - 2\rho N^{-1}(u) y + y^2}{2(1-\rho^2)}} dy = u $$

### 2. Non-decreasing

To verify the non-decreasing property, we need to show that:
$$ \frac{\partial G}{\partial u} \geq 0 $$

$$ \frac{\partial G}{\partial v} \geq 0 $$

This needs to be true for all $(u, v) \in [0, 1] \times [0, 1]$.

This property is satisfied if the copula function is non-decreasing in both arguments. For a Gaussian copula, this property holds because the normal CDF $ N(x) $ is a nondecreasing function, and the covariance matrix $ R $ defines a nondecreasing bivariate normal distribution. Since both are satisfied, it is non-decreasing.

### 3. Appropriate Domain and Range

The domain of a copula function is the unit square $[0, 1] \times [0, 1]$, and the range is the unit interval $[0, 1]$. For $ G(u, v) $, this is true because the inputs $ u $ and $ v $ are transformed by the inverse of the normal CDF, which maps $[0, 1]$ to the entire real line $ (-\infty, +\infty) $, and then through the bivariate normal distribution, which maps back to $[0, 1]$.

In [4]:
from scipy.stats import norm, multivariate_normal
from scipy.integrate import quad
import numpy as np

# Define the covariance matrix R
rho = 0.5  # Correlation coefficient
R = np.array([[1, rho], [rho, 1]])

# Gaussian copula density function
def gaussian_copula_density(u, v, rho):
    x = norm.ppf(u)
    y = norm.ppf(v)
    z = np.array([x, y])
    return multivariate_normal.pdf(z, mean=[0, 0], cov=R)

# Marginal distribution function for u
def marginal_u(u):
    return quad(lambda v: gaussian_copula_density(u, v, rho), 0, 1)[0]

# Marginal distribution function for v
def marginal_v(v):
    return quad(lambda u: gaussian_copula_density(u, v, rho), 0, 1)[0]

# Check if marginals are uniform at a point (e.g., 0.5)
u_check = v_check = 0.5
marginal_u_result = marginal_u(u_check)
marginal_v_result = marginal_v(v_check)

print(f"Marginal distribution of U at {u_check}: {marginal_u_result}")
print(f"Marginal distribution of V at {v_check}: {marginal_v_result}")

Marginal distribution of U at 0.5: 0.12030982838508406
Marginal distribution of V at 0.5: 0.12030982838508406
