# Extreme Value Theory

Extreme Value Theory (EVT) is a branch of statistics and probability theory that deals with the analysis of extreme events in data. It provides tools for understanding how the largest or smallest values of a sample behave, whatever the underlying distribution. This notebook walks through the basics of EVT, with the accompanying mathematical expressions, and applies them to a small example.

The extremes in question are unusually high or low values, often associated with rare and catastrophic events. The primary focus of EVT is to model these events and to estimate how likely they are to occur in a given dataset.

EVT is commonly applied through the Block Maxima and Peaks Over Threshold approaches, along with mixed approaches that combine them. We focus on the Block Maxima approach for simplicity.

## Block Maxima

The Block Maxima approach is a fundamental concept within EVT. It involves dividing a dataset into non-overlapping blocks and extracting the maximum value from each block. The distribution of these block maxima is then analyzed using EVT techniques.

Mathematically, the Block Maxima approach can be summarized as follows:

Given a dataset of $n$ independent and identically distributed random variables

$$\{x_1, x_2, \ldots, x_n\}$$

split into $k$ non-overlapping blocks of size $m$, the block maxima sequence is

$$\{M_1, M_2, \ldots, M_k\}$$

where $M_i = \max(x_{i1}, x_{i2}, \ldots, x_{im})$ for $i = 1, 2, \ldots, k$, and $x_{ij}$ denotes the $j$-th observation in block $i$.

Here, $k$ represents the number of blocks, and $m$ represents the size of each block.

In [1]:
import numpy as np

In [2]:
# Generate synthetic data (you can replace this with your own dataset)
np.random.seed(42)
data = np.random.normal(0, 1, 1000)  # Replace with your data

In [3]:
# Define parameters for block maxima
block_size = 50  # Size of each block (m)
num_blocks = len(data) // block_size  # Number of blocks (k)

In [4]:
# Create an array to store block maxima
block_maxima = np.empty(num_blocks)

# Calculate block maxima
for i in range(num_blocks):
    start_idx = i * block_size
    end_idx = start_idx + block_size
    block_maxima[i] = np.max(data[start_idx:end_idx])

In [5]:
block_maxima

array([1.85227818, 1.56464366, 2.46324211, 2.72016917, 3.85273149,
       2.13303337, 2.09238728, 2.18980293, 2.06074792, 3.07888081,
       1.90941664, 2.27069286, 2.44575198, 2.5733598 , 1.75479418,
       2.63238206, 2.45530014, 2.52693243, 2.16325472, 1.80094043])
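
The loop above can also be written without explicit indexing. The following is a minimal sketch, assuming the same synthetic data and block size as the cells above, that reshapes the data into a $k \times m$ matrix and takes the maximum of each row. The names `trimmed`, `block_maxima_vec`, and `block_maxima_loop` are introduced here for illustration only.

```python
import numpy as np

# Same synthetic data and block size as in the cells above
np.random.seed(42)
data = np.random.normal(0, 1, 1000)
block_size = 50                         # m
num_blocks = len(data) // block_size    # k

# Drop any trailing observations that do not fill a complete block,
# then view the data as a (k, m) matrix and take the row-wise maximum.
trimmed = data[: num_blocks * block_size]
block_maxima_vec = trimmed.reshape(num_blocks, block_size).max(axis=1)

# Reference result from an explicit per-block computation, for comparison
block_maxima_loop = np.array(
    [data[i * block_size:(i + 1) * block_size].max() for i in range(num_blocks)]
)
print(np.array_equal(block_maxima_vec, block_maxima_loop))
```

The reshape only works when the trimmed length is an exact multiple of the block size, which is why the trailing remainder is dropped first.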

## The Generalized Extreme Value Distribution (GEV)

This section fits the Generalized Extreme Value (GEV) distribution to the maximum value of each block, following the Block Maxima approach.

The cornerstone of Extreme Value Theory is the GEV distribution, which describes the limiting distribution of suitably normalized block maxima as the block size grows. The GEV distribution is characterized by three parameters:
- location ($\mu$), 
- scale ($\sigma$), 
- and shape ($\xi$).

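Mathematically, the probability density function (pdf) of the GEV distribution is given by:

$$f(x; \mu, \sigma, \xi) = \frac{1}{\sigma}\, t(x)^{\xi + 1}\, e^{-t(x)}$$

where, with $z = (x - \mu)/\sigma$,

$$t(x) = \begin{cases} (1 + \xi z)^{-1/\xi} & \text{if } \xi \neq 0 \\ e^{-z} & \text{if } \xi = 0 \end{cases}$$

and the density is positive only where $1 + \xi z > 0$. The case $\xi = 0$ is the type I (Gumbel) extreme value distribution; $\xi > 0$ gives the heavy-tailed Fréchet type and $\xi < 0$ the bounded Weibull type.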
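
As a sanity check on the parameterization, here is a minimal sketch that evaluates the GEV density by hand, using the standard closed form written out in the code comments, and compares it with `scipy.stats.genextreme`. The helper `gev_pdf` and the parameter values are hypothetical and chosen for illustration. Note that SciPy's shape argument `c` has the opposite sign to $\xi$, so the comparison passes `-xi`.

```python
import numpy as np
from scipy.stats import genextreme

def gev_pdf(x, mu, sigma, xi):
    """GEV density in the (mu, sigma, xi) parameterization.

    f(x) = t**(xi + 1) * exp(-t) / sigma, with z = (x - mu) / sigma and
    t = (1 + xi * z)**(-1 / xi) for xi != 0, or t = exp(-z) for xi == 0.
    The density is zero wherever 1 + xi * z <= 0.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x - mu) / sigma
    if xi == 0.0:
        t = np.exp(-z)  # Gumbel case
        return t * np.exp(-t) / sigma
    arg = 1.0 + xi * z
    pdf = np.zeros_like(z)
    inside = arg > 0  # points inside the support
    t = arg[inside] ** (-1.0 / xi)
    pdf[inside] = t ** (xi + 1.0) * np.exp(-t) / sigma
    return pdf

# Illustrative values only, not fitted to any data
mu, sigma, xi = 0.5, 1.2, 0.2
x = np.linspace(-2.0, 6.0, 9)

manual = gev_pdf(x, mu, sigma, xi)
scipy_pdf = genextreme.pdf(x, -xi, loc=mu, scale=sigma)  # SciPy shape c = -xi
print(np.allclose(manual, scipy_pdf))
```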

In [6]:
from scipy.stats import genextreme

In [7]:
# Fit a Generalized Extreme Value (GEV) distribution to block maxima.
# SciPy's fit returns (c, loc, scale), where the shape c equals -xi
# in the usual GEV sign convention.
c, loc, scale = genextreme.fit(block_maxima)
mu, sigma, xi = loc, scale, -c

# Parameters of the GEV distribution
print("Estimated parameters (mu, sigma, xi):", mu, sigma, xi)

Estimated parameters (mu, sigma, xi): 2.098882046041276 0.3833962170762179 0.0183519383801833


In [8]:
# Calculate the return level for a given return period, measured in blocks
# (here, the level exceeded on average once every 100 blocks of 50 observations)
return_period = 100
return_level = genextreme.ppf(1 - 1 / return_period, c, loc=loc, scale=scale)

print(f"{return_period}-block return level:", return_level)

100-block return level: 3.939148176546465
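
The return level is the quantile of the fitted GEV distribution at probability $1 - 1/T$, where $T$ is the return period counted in blocks. Inverting the GEV distribution function gives a closed form, $z_T = \mu + \frac{\sigma}{\xi}\left[y^{-\xi} - 1\right]$ with $y = -\log(1 - 1/T)$, and $z_T = \mu - \sigma \log y$ when $\xi = 0$. Below is a minimal sketch that checks this formula against `genextreme.ppf`; the helper `gev_return_level` and the parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import genextreme

def gev_return_level(return_period, mu, sigma, xi):
    """Level exceeded on average once every `return_period` blocks."""
    p = 1.0 / return_period
    y = -np.log1p(-p)  # y = -log(1 - p)
    if xi == 0.0:
        return mu - sigma * np.log(y)  # Gumbel case
    return mu + sigma / xi * (y ** (-xi) - 1.0)

# Illustrative values only, not the fitted ones
mu, sigma, xi = 2.0, 0.4, 0.1

level = gev_return_level(100, mu, sigma, xi)
# SciPy's shape argument is c = -xi
check = genextreme.ppf(1 - 1 / 100, -xi, loc=mu, scale=sigma)
print(np.isclose(level, check))
```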


## Application

In [9]:
import yfinance as yf
import datetime
import pandas as pd

In [10]:
symbol = 'BTC-USD'
interval = '1m'
prepost = True
today = datetime.date.today()
today_date_str = today.strftime("%Y-%m-%d")

# NOTE: 7 days is the max allowed for 1-minute data
days = datetime.timedelta(7)
start_date = today - days
start_date_str = start_date.strftime("%Y-%m-%d")

# The date range is set by start/end, so no separate period argument is passed
df = yf.download(symbol, start=start_date_str, end=today_date_str, interval=interval, prepost=prepost)

[*********************100%***********************]  1 of 1 completed


In [11]:
df.head()

| Datetime | Open | High | Low | Close | Adj Close | Volume |
| --- | --- | --- | --- | --- | --- | --- |
| 2023-08-12 17:00:00+00:00 | 29448.470703 | 29448.470703 | 29448.470703 | 29448.470703 | 29448.470703 | 0 |
| 2023-08-12 17:01:00+00:00 | 29446.568359 | 29446.568359 | 29446.568359 | 29446.568359 | 29446.568359 | 0 |
| 2023-08-12 17:02:00+00:00 | 29446.740234 | 29446.740234 | 29446.740234 | 29446.740234 | 29446.740234 | 0 |
| 2023-08-12 17:03:00+00:00 | 29448.080078 | 29448.080078 | 29448.080078 | 29448.080078 | 29448.080078 | 0 |
| 2023-08-12 17:04:00+00:00 | 29447.566406 | 29447.566406 | 29447.566406 | 29447.566406 | 29447.566406 | 0 |


In [12]:
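# Minute-by-minute percentage returns
pct_return = df['Close'].pct_change().dropna()
# Worst (most negative) return within each hourly block
block_maxima_series = pct_return.groupby(pd.Grouper(freq = 'H')).min()

In [13]:
# Express the hourly worst returns as positive loss magnitudes,
# so that they can be treated as block maxima
block_maxima = np.abs(block_maxima_series.values)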
block_maxima

array([1.46489016e-04, 2.87416438e-04, 2.38095934e-04, 8.15523043e-05,
       1.70415509e-04, 1.02405581e-04, 1.45394519e-04, 1.17326182e-04,
       1.43419520e-04, 1.11137423e-04, 1.09780853e-03, 3.21699798e-04,
       2.67397993e-04, 1.79350257e-04, 1.50982737e-04, 6.99922723e-05,
       1.16015609e-04, 6.61831003e-05, 2.11640732e-04, 1.83768861e-04,
       2.35949155e-04, 5.47487779e-05, 2.45130314e-04, 3.31170031e-04,
       1.65960950e-04, 1.99687316e-04, 2.21743588e-04, 2.52013536e-04,
       5.68887497e-04, 9.09878287e-04, 7.89557609e-04, 4.38978018e-04,
       6.18430882e-04, 7.10527924e-04, 8.32469959e-04, 2.41886306e-04,
       1.81478116e-04, 4.54900410e-04, 3.65392632e-04, 4.47354416e-04,
       3.83950140e-04, 3.54051442e-04, 2.13990384e-04, 3.73737629e-04,
       3.83349496e-04, 5.85410088e-04, 1.07485490e-03, 5.27247022e-04,
       7.40857223e-04, 2.51605302e-03, 4.28067940e-04, 2.54634963e-04,
       3.56484781e-04, 1.62355434e-04, 2.37589675e-04, 3.40885826e-04,
      

In [14]:
# Fit a Generalized Extreme Value (GEV) distribution to the hourly loss maxima.
# SciPy's fit returns (c, loc, scale), where the shape c equals -xi
# in the usual GEV sign convention.
c, loc, scale = genextreme.fit(block_maxima)
mu, sigma, xi = loc, scale, -c

# Parameters of the GEV distribution
print("Estimated parameters (mu, sigma, xi):", mu, sigma, xi)

Estimated parameters (mu, sigma, xi): 0.00036167545103704885 0.00034155640134270323 0.9136511356159176


In [15]:
# Calculate the return level for a given return period, measured in blocks
# (here, the loss exceeded on average once every 100 hourly blocks)
return_period = 100
return_level = genextreme.ppf(1 - 1 / return_period, c, loc=loc, scale=scale)

# Report the loss as a negative return
print(f"{return_period}-hour return level:", -return_level)

100-hour return level: -0.024990832705387787
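
Because each block in this application is one hour, a return period of 100 corresponds to 100 hourly blocks rather than 100 years. The following is a minimal sketch of how return levels for other horizons could be read off the same fit. The three fitted numbers printed above are hard-coded here in SciPy's own `(c, loc, scale)` argument order; the horizons and the assumption of 24 hourly blocks per day are illustrative choices, not part of the original analysis.

```python
from scipy.stats import genextreme

# Fitted values copied from the output above, in SciPy's (c, loc, scale) order
c = -0.9136511356159176
loc = 0.00036167545103704885
scale = 0.00034155640134270323

# Return periods expressed in hourly blocks (assumes 24 blocks per day)
horizons = {"100 hours": 100, "30 days": 24 * 30, "1 year": 24 * 365}

return_levels = {}
for label, n_blocks in horizons.items():
    # Loss magnitude exceeded on average once every n_blocks hourly blocks
    return_levels[label] = genextreme.ppf(1 - 1 / n_blocks, c, loc=loc, scale=scale)
    # Report the loss as a negative return
    print(f"{label}: {-return_levels[label]:.4%}")
```

With only one week of data and a shape estimate this far from zero, the longer horizons extrapolate well beyond anything in the sample and should be treated with caution.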
