 ### **Question C | Extreme Value Theory**

>With the dataset provided for TD1 on Natixis prices, first calculate daily returns. You will then analyse
these returns using a specific method in the field of the EVT.

>a – Estimate the GEV parameters for the two tails of the distribution of returns, using the estimator of
Pickands. What can you conclude about the nature of the extreme gains and losses?

In [None]:
import pandas as pd
import numpy as np

In [7]:
#We load the data 
df = pd.read_csv("Natixis Stock.csv", header=None, sep=r"\s+", names=["Date", "Price"])
df.head()

Unnamed: 0,Date,Price
0,02/01/2015,"5,621"
1,05/01/2015,"5,424"
2,06/01/2015,"5,329"
3,07/01/2015,"5,224"
4,08/01/2015,"5,453"


In [8]:
#Parsing dates correctly and handling the decimal format
df["Date"]=pd.to_datetime(df["Date"], format="%d/%m/%Y", errors="coerce")
df["Price"]=df["Price"].str.replace(",",".").astype(float)

In [9]:
#Some information about the dataset
print(df.head(5))
print(df.tail(5))
print(df.describe())
#Verification of data
print("Number of NaN prices:", df["Price"].isna().sum())
print("Number of NaN dates:", df["Date"].isna().sum())

        Date  Price
0 2015-01-02  5.621
1 2015-01-05  5.424
2 2015-01-06  5.329
3 2015-01-07  5.224
4 2015-01-08  5.453
           Date  Price
1018 2018-12-21  4.045
1019 2018-12-24  4.010
1020 2018-12-27  3.938
1021 2018-12-28  4.088
1022 2018-12-31  4.119
                                Date        Price
count                           1023  1023.000000
mean   2016-12-30 10:54:32.727272704     5.684600
min              2015-01-02 00:00:00     3.077000
25%              2016-01-02 00:00:00     4.927000
50%              2016-12-29 00:00:00     5.782000
75%              2017-12-28 12:00:00     6.532000
max              2018-12-31 00:00:00     7.744000
std                              NaN     1.021034
Number of NaN prices: 0
Number of NaN dates: 0


In [10]:
df["return"]=df["Price"].pct_change()
df = df.iloc[1:] #Drop the first row, whose return is NaN
df

Unnamed: 0,Date,Price,return
1,2015-01-05,5.424,-0.035047
2,2015-01-06,5.329,-0.017515
3,2015-01-07,5.224,-0.019704
4,2015-01-08,5.453,0.043836
5,2015-01-09,5.340,-0.020723
...,...,...,...
1018,2018-12-21,4.045,-0.001481
1019,2018-12-24,4.010,-0.008653
1020,2018-12-27,3.938,-0.017955
1021,2018-12-28,4.088,0.038090
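
As a quick check of the return definition, the sketch below recomputes the simple return $r_t = P_t / P_{t-1} - 1$ by hand for the first five prices shown above and compares it with `pct_change()`. The variable names are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# First five prices, copied from the dataset preview
prices = pd.Series([5.621, 5.424, 5.329, 5.224, 5.453])

# Simple return by hand: r_t = P_t / P_{t-1} - 1
manual_returns = (prices / prices.shift(1) - 1).dropna()

# Same quantity through pandas
pandas_returns = prices.pct_change().dropna()

print(manual_returns.round(6).tolist())
print(np.allclose(manual_returns, pandas_returns))
```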


To estimate the tail index $\xi$ (the shape parameter of the GEV distribution), we use the Pickands estimator. It is based on order statistics, where $X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$ denote the sorted observations.

For a given threshold parameter $k$, the Pickands estimator is defined as:

$$\hat{\xi}_P(k) = \frac{1}{\log(2)} \log \left( \frac{X_{n-k+1:n} - X_{n-2k+1:n}}{X_{n-2k+1:n} - X_{n-4k+1:n}} \right)$$

Where:
* $n$ is the number of observations
* $X_{n-k+1:n}$ is the $k$-th largest value
* $k$ must satisfy $4k \le n$, and for the estimator to be consistent, $k \to \infty$ and $k/n \to 0$ as $n \to \infty$
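
To make the index mapping concrete, here is a small sketch on a toy sample. It assumes 0-based NumPy indexing, where the 1-based order statistic $X_{n-k+1:n}$ sits at position `n - k` of the sorted array. The function name `pickands_toy` and the sample are illustrative only.

```python
import numpy as np

def pickands_toy(sample, k):
    """Pickands estimate from the k-th, 2k-th and 4k-th largest values."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    if k < 1 or 4 * k > n:
        raise ValueError("k must satisfy 1 <= k and 4k <= n")
    # 1-based X_{n-jk+1:n} is the 0-based element x[n - j*k]
    top_k, top_2k, top_4k = x[n - k], x[n - 2 * k], x[n - 4 * k]
    return np.log((top_k - top_2k) / (top_2k - top_4k)) / np.log(2)

# Toy sample 1, 2, ..., 16 with k = 2: the three order statistics are 15, 13 and 9
toy = np.arange(1, 17)
xi_toy = pickands_toy(toy, k=2)
print(xi_toy)
```

Here the ratio is $(15 - 13) / (13 - 9) = 0.5$, so the estimate is $\log(0.5)/\log(2) = -1$, the value expected for evenly spaced, uniform-like data with a bounded tail.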


Once the parameter is estimated, its sign determines the nature of the tail:
* $\xi > 0$: the GEV is of Fréchet type, with a heavy (power-law) tail, typical of financial returns
* $\xi = 0$: the GEV is of Gumbel type, with a thin, exponentially decaying tail, as for the normal or exponential distribution
* $\xi < 0$: the GEV is of Weibull type, with a short tail bounded by a finite endpoint
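
In practice an estimate is never exactly zero, so reading its sign requires some tolerance. The sketch below maps an estimate to the three cases above. The helper name `classify_tail` and the band `tol=0.05` are arbitrary assumptions for illustration, not a statistical test.

```python
import math

def classify_tail(xi, tol=0.05):
    """Map a tail-index estimate to a GEV type; tol is an arbitrary band around 0."""
    if math.isnan(xi):
        return "undefined"
    if xi > tol:
        return "Fréchet (heavy tail)"
    if xi < -tol:
        return "Weibull (bounded tail)"
    return "Gumbel (thin tail)"

for value in (0.4, 0.0, -0.4):
    print(value, "->", classify_tail(value))
```

A more careful reading would compare the estimate with its sampling uncertainty rather than with a fixed band.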

In [None]:
# Work on the return column only: multiplying the whole DataFrame by -1 also hits the Date column and raises a TypeError
sorted_returns = df['return'].sort_values(ascending=True)
# Loss tail: keep the negative returns and flip their sign so that large losses are large positive values
returns_negative = (-sorted_returns[sorted_returns < 0]).sort_values(ascending=True)

# Gain tail: positive returns, already sorted in ascending order
returns_positive = sorted_returns[sorted_returns > 0]


def estimateur_pickands(return_sort):
  # return_sort: pandas Series sorted in ascending order
  n = len(return_sort)
  k = int(np.log(n))  # k grows with n while k/n -> 0
  # The 1-based order statistic X_{n-k+1:n} sits at 0-based position n - k
  x_k = return_sort.iloc[n - k]
  x_2k = return_sort.iloc[n - 2 * k]
  x_4k = return_sort.iloc[n - 4 * k]

  pickands = (1 / np.log(2)) * np.log((x_k - x_2k) / (x_2k - x_4k))

  return pickands

gev_rend_positive = estimateur_pickands(returns_positive)
gev_rend_neg = estimateur_pickands(returns_negative)

print("GEV parameter xi for positive returns : ", gev_rend_positive)
print("GEV parameter xi for negative returns : ", gev_rend_neg)

In [None]:
returns = df['return'].values
gains = returns[returns > 0]
losses = -returns[returns < 0]  # flip the sign so that large losses are large positive values

def p_estimator(data, k):
    """
    Compute the Pickands estimator for a given k.
    data: array of observations (gains or losses)
    k: number of upper order statistics (requires 4k <= n)
    """
    n = len(data)
    # Outside the valid range the indices below would wrap around silently
    if k < 1 or 4 * k > n:
        return np.nan
    # Sort the data to obtain the order statistics
    sorted_data = np.sort(data)

    # Pickands formula (p. 180)
    # X_{n-k+1:n} is the k-th largest element, at 0-based position n - k
    val1 = sorted_data[n - k]
    val2 = sorted_data[n - 2*k]
    val3 = sorted_data[n - 4*k]

    numerator = val1 - val2
    denominator = val2 - val3

    # Avoid division by zero and the log of a non-positive value
    if denominator <= 0 or numerator <= 0:
        return np.nan

    return (1 / np.log(2)) * np.log(numerator / denominator)

# Estimate xi over a range of k to assess stability
k_values = range(1, len(losses) // 4)
xi_gains = [p_estimator(gains, k) for k in k_values]
xi_losses = [p_estimator(losses, k) for k in k_values]

# Report the result for a reasonable k (e.g. 10% of the loss observations)
k_target = int(len(losses) * 0.1)
xi_p_losses = p_estimator(losses, k_target)
xi_p_gains = p_estimator(gains, k_target)

print(f"Estimate of xi (losses) for k={k_target}: {xi_p_losses:.4f}")
print(f"Estimate of xi (gains) for k={k_target}: {xi_p_gains:.4f}")

Estimate of xi (losses) for k=51: 0.3233
Estimate of xi (gains) for k=51: -0.7601
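
Because a single value of $k$ can be misleading, one way to judge such estimates is to look at how they vary with $k$. The sketch below does this on simulated Pareto data with a known tail index ($\xi = 0.5$), so that the behaviour of the estimator can be checked before reading the Natixis figures. Everything here (the seed, the sample size, the grid of $k$ values and the `pickands` helper) is an assumption made for illustration and is independent of the notebook's variables.

```python
import numpy as np

def pickands(sorted_data, k):
    """Pickands estimate from an ascending-sorted array; NaN if k is out of range."""
    n = len(sorted_data)
    if k < 1 or 4 * k > n:
        return np.nan
    num = sorted_data[n - k] - sorted_data[n - 2 * k]
    den = sorted_data[n - 2 * k] - sorted_data[n - 4 * k]
    if num <= 0 or den <= 0:
        return np.nan
    return np.log(num / den) / np.log(2)

rng = np.random.default_rng(0)
true_xi = 0.5
# Inverse-transform sampling: (1 - U)^(-xi) is Pareto distributed with tail index xi
sample = np.sort((1.0 - rng.random(20_000)) ** (-true_xi))

# Hypothetical grid of k values, all satisfying 4k <= n
k_grid = np.arange(200, 2001, 50)
estimates = np.array([pickands(sample, k) for k in k_grid])

print("median estimate over the k grid:", np.nanmedian(estimates))
print("min / max over the k grid:", np.nanmin(estimates), np.nanmax(estimates))
```

Plotting `estimates` against `k_grid` (a Pickands plot) would show the same information. A range of $k$ over which the curve is roughly flat is usually taken as the region in which to read the estimate.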
