 ### **Question C | Extreme Value Theory**

>With the dataset provided for TD1 on Natixis prices, first calculate daily returns. You will then analyse
these returns using a specific method in the field of the EVT.

>a – Estimate the GEV parameters for the two tails of the distribution of returns, using the estimator of
Pickands. What can you conclude about the nature of the extreme gains and losses?

In [68]:
import pandas as pd
import numpy as np
from scipy.special import gamma  # Gamma function, needed for the mu and sigma estimates

# Load the data (whitespace-separated, no header row)
df = pd.read_csv("Natixis Stock.csv", header=None, sep=r"\s+", names=["Date", "Price"])
# Parse the dates and convert decimal commas to decimal points
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y", errors="coerce")
df["Price"] = df["Price"].str.replace(",", ".").astype(float)
# Sort by date in chronological order
df = df.sort_values("Date").reset_index(drop=True)

In [69]:
df["return"] = df["Price"].pct_change()  # Fractional change from the immediately previous row
df = df.iloc[1:]  # Drop the first row, whose return is NaN
returns = df["return"].values
df

| | Date | Price | return |
|---|---|---|---|
| 1 | 2015-01-05 | 5.424 | -0.035047 |
| 2 | 2015-01-06 | 5.329 | -0.017515 |
| 3 | 2015-01-07 | 5.224 | -0.019704 |
| 4 | 2015-01-08 | 5.453 | 0.043836 |
| 5 | 2015-01-09 | 5.340 | -0.020723 |
| ... | ... | ... | ... |
| 1018 | 2018-12-21 | 4.045 | -0.001481 |
| 1019 | 2018-12-24 | 4.010 | -0.008653 |
| 1020 | 2018-12-27 | 3.938 | -0.017955 |
| 1021 | 2018-12-28 | 4.088 | 0.038090 |


The Generalized Extreme Value distribution is the limit distribution of the maximum of $n$ i.i.d. random variables, according to the Fisher-Tippett-Gnedenko theorem.

For a sample of maxima, the GEV cumulative distribution function is defined as:

$$G(x) = \exp\left\{-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}$$

subject to $1 + \xi(x-\mu)/\sigma > 0$, where:

- $\mu$: location parameter
- $\sigma > 0$: scale parameter
- $\xi$: shape parameter
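
As a rough illustration of the formula above, here is a minimal sketch of the GEV CDF in plain Python. The function name `gev_cdf` and the parameter values are illustrative assumptions, not part of the analysis; the $\xi = 0$ case is handled with the Gumbel limit $\exp(-e^{-z})$, and points outside the support return 0 or 1.

```python
import math

def gev_cdf(x, xi, mu, sigma):
    """GEV CDF G(x). Hypothetical helper, for illustration only."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as xi -> 0
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Illustrative parameters (assumed, not the fitted values)
for xi in (0.2, 0.0, -0.2):
    print(f"xi={xi:+.1f}  G(0.05)={gev_cdf(0.05, xi, 0.03, 0.02):.4f}")
```

Whatever the value of $\xi$, $G(\mu) = e^{-1}$, which gives a quick sanity check of any implementation.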

As stated above, it is the maximum of $n$ i.i.d. random variables that converges in distribution to a GEV, not the individual observations. The GEV therefore cannot be fitted directly to the daily returns: we must first extract block maxima.
We choose a block size of 20 days, which corresponds to roughly one trading month (markets are closed on weekends).

In [70]:
# Block maxima function
def b_m(rend):
    size = 20  # 20 trading days, about one month
    n = len(rend)
    blocks = n // size  # Integer division keeps only complete blocks
    rend_on_size = rend[:blocks * size]  # Drop the trailing incomplete block
    rend_reshape = rend_on_size.reshape(blocks, size)  # One row per block
    maxima = np.max(rend_reshape, axis=1)  # Maximum of each block
    return maxima

# Losses are negated returns, so the left tail is studied as a right tail
losses = -returns
maxima_losses = b_m(losses)
gains = returns
maxima_gains = b_m(gains)

>Our objective is to estimate the three parameters $\xi$, $\mu$ and $\sigma$.

1) To estimate $\xi$, we use the Pickands Estimator. This estimator is based on order statistics, where $X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$ are the sorted observations.

For a given threshold parameter $k$, the Pickands estimator is defined as:

$$\hat{\xi}_P(k) = \frac{1}{\log(2)} \log \left( \frac{X_{n-k+1:n} - X_{n-2k+1:n}}{X_{n-2k+1:n} - X_{n-4k+1:n}} \right)$$

Where:
* $n$ is the number of observations (here, the number of block maxima)
* $X_{n-k+1:n}$ corresponds to the $k$-th largest value
* $k \to \infty$ and $k/n \to 0$ as $n \to \infty$


Once $\xi$ has been estimated, its sign determines the nature of the tail:
* $\xi > 0$: the GEV is of Fréchet type, with a heavy tail, typical of financial returns
* $\xi = 0$: the GEV is of Gumbel type, with a thin tail, as for normal or exponential distributions
* $\xi < 0$: the GEV is of Weibull type, with a short, bounded tail

>We have to choose a value for $k$. We take $k = n/4$, as indicated in *"Generalized Pickands estimators for the extreme value index"* by **Johan Segers**: https://www.sciencedirect.com/science/article/abs/pii/S037837580300377X
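
To see how the estimator depends on $k$, the sketch below evaluates it for every admissible $k$ on a synthetic sample. This is only an illustration under stated assumptions: the helper `pickands`, the sample size and the value `true_xi = 0.25` are hypothetical, and the sample is built from exact Pareto-type quantiles, so every $k$ recovers the same $\xi$. On real data the estimates would fluctuate with $k$.

```python
import math

def pickands(sorted_x, k):
    """Pickands estimate from an ascending sample; requires 4*k <= len(sorted_x)."""
    n = len(sorted_x)
    a = sorted_x[n - k]        # X_{n-k+1:n}, the k-th largest value
    b = sorted_x[n - 2 * k]    # X_{n-2k+1:n}
    c = sorted_x[n - 4 * k]    # X_{n-4k+1:n}
    return math.log((a - b) / (b - c)) / math.log(2)

# Synthetic ascending sample made of exact quantiles (assumed shape xi = 0.25)
true_xi = 0.25
n = 48
sample = [((n - i + 1) / (n + 1)) ** (-true_xi) for i in range(1, n + 1)]

# Estimate for each k from 1 up to n/4, the value used in this notebook
estimates = {k: pickands(sample, k) for k in range(1, n // 4 + 1)}
for k, xi_hat in estimates.items():
    print(f"k={k:2d}  xi_hat={xi_hat:.4f}")
```

With $k = n/4$ the third order statistic is $X_{1:n}$, the sample minimum, so the estimate uses the full range of the block maxima.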

In [71]:
def pickands_estimator(rend):  # rend holds the block maxima
    rend_sort = np.sort(rend)  # Sort in ascending order
    n = len(rend_sort)
    k = int(n / 4)  # Cast to int so that k can be used as an array index
    # Order statistics X_{n-k+1:n}, X_{n-2k+1:n}, X_{n-4k+1:n} (0-based indexing)
    X_n_k = rend_sort[n - k]
    X_n_2k = rend_sort[n - 2 * k]
    X_n_4k = rend_sort[n - 4 * k]
    # Pickands formula
    xi = (1 / np.log(2)) * np.log((X_n_k - X_n_2k) / (X_n_2k - X_n_4k))
    return xi

xi_losses=pickands_estimator(maxima_losses)
xi_gains=pickands_estimator(maxima_gains)

print(f"GEV ξ for left tail(losses)= {xi_losses:.4f}")
print(f"GEV ξ for right tail(gains)= {xi_gains:.4f}")

GEV ξ for left tail(losses)= 0.0978
GEV ξ for right tail(gains)= -0.4342


2) Now that we have estimated $\xi$, we use the empirical moments to estimate $\mu$ and $\sigma$:

For the GEV distribution we have:

* $E[X] = \mu + \sigma \frac{\Gamma(1-\xi) - 1}{\xi}$ for $\xi < 1$
* $Var(X) = \sigma^2 \frac{\Gamma(1-2\xi) - \Gamma^2(1-\xi)}{\xi^2}$ for $\xi < 1/2$

We solve this system to obtain $\mu$ and $\sigma$ from the empirical moments given the $\xi$ values.

* $\mu = E[X] - \sigma \frac{\Gamma(1-\xi) - 1}{\xi}$
* $\sigma = \sqrt{\frac{Var(X) \cdot \xi^2}{\Gamma(1-2\xi) - \Gamma^2(1-\xi)}}$
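
As a quick check of the two inversion formulas, the following sketch computes the theoretical mean and variance for known parameters and then recovers $\mu$ and $\sigma$ from them. The helper names `gev_moments` and `invert_moments` and the numeric values are illustrative assumptions; the formulas require $\xi \neq 0$ and $\xi < 1/2$.

```python
import math

def gev_moments(xi, mu, sigma):
    """Theoretical mean and variance of a GEV (needs xi != 0 and xi < 1/2)."""
    g1 = math.gamma(1 - xi)
    g2 = math.gamma(1 - 2 * xi)
    mean = mu + sigma * (g1 - 1) / xi
    var = sigma ** 2 * (g2 - g1 ** 2) / xi ** 2
    return mean, var

def invert_moments(mean, var, xi):
    """Solve the two moment equations for mu and sigma, given xi."""
    g1 = math.gamma(1 - xi)
    g2 = math.gamma(1 - 2 * xi)
    sigma = math.sqrt(var * xi ** 2 / (g2 - g1 ** 2))
    mu = mean - sigma * (g1 - 1) / xi
    return mu, sigma

# Round trip with assumed parameters (not the fitted values)
mean, var = gev_moments(0.1, 0.03, 0.02)
mu_hat, sigma_hat = invert_moments(mean, var, 0.1)
print(f"mu_hat={mu_hat:.6f}  sigma_hat={sigma_hat:.6f}")
```

Note that $\sigma$ has to be computed first, since the expression for $\mu$ depends on it.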


In [72]:
def estimate_mu_sigma(rend, xi):  # rend still holds the block maxima
    mean_x = np.mean(rend)
    var_x = np.var(rend)  # Population variance (ddof=0)
    # Gamma terms from the formulas above
    g1 = gamma(1 - xi)
    g2 = gamma(1 - 2 * xi)
    var = (g2 - g1**2) / xi**2  # Variance of the GEV when sigma = 1
    # Solve for sigma first, then for mu
    sigma = np.sqrt(var_x / var)
    mu = mean_x - sigma * (g1 - 1) / xi
    return mu, sigma

#Estimation for both tails
mu_losses, sigma_losses = estimate_mu_sigma(maxima_losses, xi_losses)
mu_gains, sigma_gains = estimate_mu_sigma(maxima_gains, xi_gains)


print(f"For the left tail, μ= {mu_losses:.6f} and σ={sigma_losses:.6f}")
print(f"For the right tail, μ= {mu_gains:.6f} and σ={sigma_gains:.6f}")

For the left tail, μ= 0.026091 and σ=0.016265
For the right tail, μ= 0.032956 and σ=0.017569


Our estimates show a marked asymmetry between the two tails of the distribution:

* The left tail (losses) has a positive shape parameter $\xi = 0.0978$ (with $\mu = 0.026091$ and $\sigma = 0.016265$), characteristic of a heavy-tailed Fréchet distribution. Extreme losses, although rare, can therefore be large and are not theoretically bounded: very large negative returns on the Natixis stock remain possible in periods of high volatility.

* The right tail (gains), by contrast, has a negative shape parameter $\xi = -0.4342$ (with $\mu = 0.032956$ and $\sigma = 0.017569$), corresponding to a short-tailed Weibull distribution. This suggests the existence of an upper bound on daily gains, which limits the potential for exceptional positive returns. While the Natixis stock can lose a lot in a single day, its capacity for extreme daily gains is more constrained.

This asymmetry is consistent with the risk profile of the Natixis stock and supports the use of Extreme Value Theory rather than Gaussian models, which assume symmetric tails and would underestimate the severity of potential crashes.

>b – Calculate the value at risk based on EVT for various confidence levels, with the assumption of iid
returns.

VaR corresponds to the quantile of the loss distribution:

$$\text{VaR}_{\alpha} = q_{1-\alpha}$$

For the GEV distribution, the quantile of order $p$ is $q_p = \mu + \frac{\sigma}{\xi}\left[(-\ln p)^{-\xi} - 1\right]$ for $\xi \neq 0$.
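
The quantile formula is simply the inverse of the GEV CDF. The sketch below checks this numerically for a few orders $p$: the helper names `gev_cdf` and `gev_quantile` and the parameter values are illustrative assumptions, not the fitted values, and both helpers assume $\xi \neq 0$.

```python
import math

def gev_cdf(x, xi, mu, sigma):
    """GEV CDF for xi != 0, evaluated inside the support."""
    t = 1.0 + xi * (x - mu) / sigma
    return math.exp(-t ** (-1.0 / xi))

def gev_quantile(p, xi, mu, sigma):
    """GEV quantile of order p for xi != 0 and 0 < p < 1."""
    return mu + (sigma / xi) * ((-math.log(p)) ** (-xi) - 1.0)

# Illustrative parameters (assumed)
xi, mu, sigma = 0.1, 0.03, 0.02
for p in (0.80, 0.90, 0.95, 0.99):
    q = gev_quantile(p, xi, mu, sigma)
    # Applying the CDF to the quantile should return p
    print(f"p={p:.2f}  q_p={q:.4f}  G(q_p)={gev_cdf(q, xi, mu, sigma):.4f}")
```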

As noted above, our GEV parameters describe block maxima. They must be rescaled to the daily level before computing a daily VaR.

With $n = 20$ the block size:
* ξ remains unchanged (the shape parameter is not affected by the rescaling)
* μ and σ change according to: $\mu_1 = \mu_n + \frac{\sigma_n}{\xi}(n^{-\xi} - 1)$ and $\sigma_1 = \sigma_n \cdot n^{-\xi}$
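
These rescaling formulas follow from max-stability: if the maximum of $n$ i.i.d. daily values has CDF $G_n$, then the daily CDF $G_1$ satisfies $G_1(x)^n = G_n(x)$. The sketch below checks this identity numerically. The helper names `gev_cdf` and `to_daily` and the parameter values are illustrative assumptions, and $\xi \neq 0$ is assumed throughout.

```python
import math

def gev_cdf(x, xi, mu, sigma):
    """GEV CDF for xi != 0, evaluated inside the support."""
    t = 1.0 + xi * (x - mu) / sigma
    return math.exp(-t ** (-1.0 / xi))

def to_daily(xi, mu_n, sigma_n, n):
    """Rescale block-level (mu_n, sigma_n) to the daily level; xi is unchanged."""
    factor = n ** (-xi)
    sigma_1 = sigma_n * factor
    mu_1 = mu_n + (sigma_n / xi) * (factor - 1.0)
    return mu_1, sigma_1

# Illustrative block-level parameters (assumed) and block size
xi, mu_n, sigma_n, n = 0.1, 0.03, 0.02, 20
mu_1, sigma_1 = to_daily(xi, mu_n, sigma_n, n)

for x in (0.0, 0.05, 0.10):
    block = gev_cdf(x, xi, mu_n, sigma_n)         # G_n(x)
    daily_pow = gev_cdf(x, xi, mu_1, sigma_1) ** n  # G_1(x)^n
    print(f"x={x:.2f}  G_n(x)={block:.6f}  G_1(x)^n={daily_pow:.6f}")
```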

In [85]:
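def scale(xi, mu_b, sigma_b):
    size = 20  # Block size used for the block maxima
    ratio = 1 / size
    # ratio**xi equals n**(-xi) in the formulas above
    sigma_daily = sigma_b * (ratio**xi)
    mu_daily = mu_b + (sigma_b / xi) * (ratio**xi - 1)
    return mu_daily, sigma_daily

# Quantile of order a of the GEV distribution (xi != 0)
def gev_q(a, xi, mu, sigma):
    return mu + (sigma / xi) * ((-np.log(a))**(-xi) - 1)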

# Transformation to daily levels
mu_losses_daily, sigma_losses_daily=scale(xi_losses, mu_losses, sigma_losses)
levels=[0.80, 0.90, 0.95, 0.99] #Levels of confidence

# Calculation of the VaR for each confidence level
for alpha in levels:
    var=gev_q(1-alpha, xi_losses, mu_losses_daily, sigma_losses_daily)
    print(f"The VaR EVT with {alpha*100:.0f}% confidence level for negative returns is: {-var*100:.2f}%") #By convention, VaR is positive

The VaR EVT with 80% confidence level for negative returns is: 2.18%
The VaR EVT with 90% confidence level for negative returns is: 2.59%
The VaR EVT with 95% confidence level for negative returns is: 2.88%
The VaR EVT with 99% confidence level for negative returns is: 3.34%
