# Modelling Extreme Values for the Wind Farm Project

This notebook gives a brief example of what the toolbox offers for modelling extreme values. It is adapted from the tools provided on the ResourceCode website.

It relies on the `pyextremes` library, which is installed with the ResourceCode toolbox. Here we demonstrate two examples of univariate modelling, as shown in class. For more information, see https://georgebv.github.io/pyextremes/.

In [22]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plot
from pyextremes import (
    plot_mean_residual_life,
    plot_parameter_stability, 
    EVA
)
import resourcecode

from resourcecode.eva import (
    censgaussfit,
    get_fitted_models,
    get_gpd_parameters,
    run_simulation,
    huseby,
)
import warnings
warnings.filterwarnings("ignore")

We first load the data from the Bretagne Sud 1 location `126096` (coordinates: $(47.5882, -3.3215)$).

In [23]:
point_id, dist_m = resourcecode.data.get_closest_point(latitude=47.5882, longitude=-3.3215)
print(point_id, dist_m)

126096 238.69


In [24]:
client = resourcecode.Client()
data = client.get_dataframe_from_criteria(
    """
{
    "node": 126096,
    "start": 0,
    "end": 99999903600,
    "parameter": ["hs","uwnd","vwnd"]
}
"""
)

In [25]:
data.head()

| | hs | uwnd | vwnd |
|---|---|---|---|
| 1994-01-01 00:00:00 | 3.126 | 10.2 | -6.2 |
| 1994-01-01 01:00:00 | 3.2 | 9.6 | -5.9 |
| 1994-01-01 02:00:00 | 3.242 | 8.9 | -4.8 |
| 1994-01-01 03:00:00 | 3.28 | 9.2 | -4.1 |
| 1994-01-01 04:00:00 | 3.304 | 9.4 | -4.1 |


From the $u$ and $v$ components of the wind, calculate the wind speed and direction.

In [26]:
data["wspd"], data["wdir"] = resourcecode.utils.zmcomp2metconv(data.uwnd, data.vwnd)

In [27]:
data.head()

| | hs | uwnd | vwnd | wspd | wdir |
|---|---|---|---|---|---|
| 1994-01-01 00:00:00 | 3.126 | 10.2 | -6.2 | 11.936499 | 301.293039 |
| 1994-01-01 01:00:00 | 3.2 | 9.6 | -5.9 | 11.268097 | 301.574191 |
| 1994-01-01 02:00:00 | 3.242 | 8.9 | -4.8 | 10.111874 | 298.339132 |
| 1994-01-01 03:00:00 | 3.28 | 9.2 | -4.1 | 10.072239 | 294.020247 |
| 1994-01-01 04:00:00 | 3.304 | 9.4 | -4.1 | 10.255243 | 293.565396 |
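As a cross-check on the conversion, the sketch below recomputes speed and direction for the first row using the usual meteorological convention (the direction the wind blows from, in degrees clockwise from north). The helper name `uv_to_speed_dir` is hypothetical, and it is an assumption that `zmcomp2metconv` uses exactly this formula, although it agrees with the values printed above.

```python
import math

def uv_to_speed_dir(u, v):
    """Convert zonal (u) and meridional (v) wind components to speed
    and meteorological direction (degrees the wind comes FROM)."""
    speed = math.hypot(u, v)
    # atan2(u, v) is the bearing the wind blows TOWARDS; add 180 to get FROM.
    direction = (180.0 + math.degrees(math.atan2(u, v))) % 360.0
    return speed, direction

# First row of the table: uwnd = 10.2, vwnd = -6.2
speed, direction = uv_to_speed_dir(10.2, -6.2)
print(round(speed, 3), round(direction, 3))
```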


### Modelling univariate time series: Block maxima + GEVD (Generalized Extreme Value Distribution)

As an example, we show a **BM** (block maxima) model fitted to the $H_s$ time series. In this approach, the maximum value is identified within each "block" (a fixed period of time), and a GEVD is then fitted to these maxima to estimate the return values.

The same analysis can readily be applied to the other sea-state parameters.

After loading the data, apply the block maxima method with a block size of 1 year (365.2425 days). The `min_last_block=0.9` argument means that the final block is included in the analysis only if it is at least 90% full.

In [28]:
model = EVA(data.hs)
model.get_extremes(method="BM", block_size="365.2425D", min_last_block=0.9)

In [29]:
model.extremes.head()

date-time
1994-02-03 19:00:00    6.164
1995-09-07 14:00:00    7.448
1996-02-07 16:00:00    6.550
1997-11-09 12:00:00    5.766
1998-01-04 19:00:00    6.416
Name: hs, dtype: float64
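The block-maxima extraction above can be pictured with a minimal pure-Python sketch on a regularly sampled series. The function name `block_maxima` and the toy data are hypothetical, and the handling of the incomplete final block reflects an assumed reading of `min_last_block`, not the library's actual implementation.

```python
def block_maxima(values, block_len, min_last_block=0.9):
    """Return the maximum of each consecutive block of `block_len` samples.

    A trailing block shorter than `block_len` is kept only if it is at
    least `min_last_block` (a fraction between 0 and 1) full.
    """
    maxima = []
    for start in range(0, len(values), block_len):
        block = values[start:start + block_len]
        if len(block) / block_len < min_last_block:
            break  # incomplete final block: skip it
        maxima.append(max(block))
    return maxima

toy_series = [1, 3, 2, 5, 4, 6, 9]  # two full blocks of 3, then one sample
print(block_maxima(toy_series, block_len=3))
print(block_maxima(toy_series, block_len=3, min_last_block=0.3))
```

In the first call the final block is only one third full, so its value is ignored; lowering `min_last_block` lets it through.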

In [30]:
model.plot_extremes()

(<Figure size 768x480 with 1 Axes>, <Axes: xlabel='date-time', ylabel='hs'>)

In [31]:
model.fit_model()

The parameter `alpha` specifies the width of the confidence interval (default = 0.95).

In [32]:
model.plot_diagnostic(alpha=0.95)

(<Figure size 768x768 with 4 Axes>,
 (<Axes: title={'center': 'Return value plot'}, xlabel='Return period', ylabel='hs'>,
  <Axes: title={'center': 'Probability density plot'}, xlabel='hs', ylabel='Probability density'>,
  <Axes: title={'center': 'Q-Q plot'}, xlabel='Theoretical', ylabel='Observed'>,
  <Axes: title={'center': 'P-P plot'}, xlabel='Theoretical', ylabel='Observed'>))

The parameter `n_samples` sets the number of bootstrap samples used to estimate the confidence bounds.
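To make the roles of `alpha` and `n_samples` concrete, here is a rough sketch of a percentile bootstrap interval for a return value. It swaps in a method-of-moments Gumbel fit for simplicity, which is not the fitting method the notebook uses, and it only uses the five annual maxima printed earlier, so the numbers are illustrative. The function names are hypothetical, and it is an assumption that `pyextremes` follows a comparable resample-and-refit pattern internally.

```python
import math
import random
import statistics

def gumbel_return_value(sample, return_period):
    """Method-of-moments Gumbel fit, then the quantile at 1 - 1/T."""
    beta = statistics.stdev(sample) * math.sqrt(6) / math.pi
    mu = statistics.mean(sample) - 0.5772156649 * beta  # Euler-Mascheroni constant
    return mu - beta * math.log(-math.log(1 - 1 / return_period))

def bootstrap_interval(sample, return_period, alpha=0.95, n_samples=1000, seed=0):
    """Percentile interval from `n_samples` resamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        gumbel_return_value(rng.choices(sample, k=len(sample)), return_period)
        for _ in range(n_samples)
    )
    lower = estimates[int((1 - alpha) / 2 * (n_samples - 1))]
    upper = estimates[int((1 + alpha) / 2 * (n_samples - 1))]
    return lower, upper

annual_maxima = [6.164, 7.448, 6.550, 5.766, 6.416]  # first five BM extremes
print(bootstrap_interval(annual_maxima, return_period=10))
```

A larger `n_samples` gives smoother interval bounds at the cost of more refits; a larger `alpha` widens the interval.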

In [33]:
summary = model.get_summary(
        return_period=[1, 2, 5, 10, 25, 50, 100, 250, 500, 1000],
        alpha=0.95,
        n_samples=1000,
    )
print(summary)

               return value  lower ci   upper ci
return period                                   
1.0                    -inf       NaN        NaN
2.0                6.091837  5.878963   6.309792
5.0                6.698079  6.378760   6.975143
10.0               7.099463  6.662922   7.430999
25.0               7.606614  7.007089   8.048302
50.0               7.982847  7.263715   8.497280
100.0              8.356302  7.515419   8.950155
250.0              8.848017  7.845991   9.545445
500.0              9.219300  8.094291   9.992754
1000.0             9.590315  8.343333  10.434817
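The `-inf` in the first row is expected rather than a failure. With one-year blocks, the $T$-year return value is the quantile of the annual-maximum distribution at non-exceedance probability $1 - 1/T$, which is the 0 quantile when $T = 1$. The sketch below illustrates this with a Gumbel distribution whose two parameters are back-solved from the 2-year and 1000-year rows of the table. That the fitted model is of Gumbel type is an inference from the printed values, not something the notebook output states, and the function names are hypothetical.

```python
import math

def gumbel_quantile(mu, beta, p):
    """Quantile of a Gumbel distribution at non-exceedance probability p < 1."""
    if p <= 0.0:
        return -math.inf  # the Gumbel distribution has no lower bound
    return mu - beta * math.log(-math.log(p))

def reduced_variate(T):
    """Gumbel reduced variate for a return period of T blocks (T > 1)."""
    return -math.log(-math.log(1.0 - 1.0 / T))

# Back-solve location and scale from two rows of the summary table.
x2, x1000 = 6.091837, 9.590315
beta = (x1000 - x2) / (reduced_variate(1000) - reduced_variate(2))
mu = x2 - beta * reduced_variate(2)

for T in (1, 10, 100):
    print(T, gumbel_quantile(mu, beta, 1.0 - 1.0 / T))
```

The 10-year and 100-year values produced this way can be compared against the corresponding rows of the table as a consistency check.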


### Modelling univariate time series: Peaks over threshold (POT) + GPD (Generalized Pareto Distribution)

As an example, we show a **POT** (peaks over threshold) model fitted to the $H_s$ time series. This analysis first finds the values above a specified threshold, then groups them into clusters using a predefined minimum separation in time, and finally keeps the maximum value within each cluster.

The same analysis can readily be applied to the other sea-state parameters.
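The thresholding and declustering steps can be sketched in a few lines of pure Python for a regularly sampled series. The function name `decluster_peaks` and the toy data are hypothetical; this is a simplified illustration of the idea, not the library's implementation.

```python
def decluster_peaks(values, threshold, min_gap):
    """Return (index, value) of the largest exceedance in each cluster.

    Two exceedances belong to different clusters when they are more
    than `min_gap` time steps apart.
    """
    peaks = []
    current = None   # running maximum of the open cluster
    last_idx = None  # index of the most recent exceedance
    for i, v in enumerate(values):
        if v <= threshold:
            continue
        if last_idx is not None and i - last_idx > min_gap:
            peaks.append(current)  # close the previous cluster
            current = None
        if current is None or v > current[1]:
            current = (i, v)
        last_idx = i
    if current is not None:
        peaks.append(current)
    return peaks

toy_hs = [1, 4, 5, 1, 1, 1, 1, 6, 4.5, 1]
print(decluster_peaks(toy_hs, threshold=3, min_gap=3))
```

With hourly data, a 72-hour separation would correspond to `min_gap=72` in this sketch.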

We can first look at how the fitted model behaves as a function of the selected wave height threshold. The parameters `r` and `alpha` specify the minimum time distance (duration) between adjacent clusters and the confidence limits (default = 0.95), respectively.

The shape and modified scale parameters define the Generalized Pareto Distribution. They depend on the threshold value, but should be stable within a range of valid thresholds (e.g. less than ~3 m here).

In [34]:
plot_parameter_stability(ts=data.hs, r="72H", alpha=0.95);

The mean residual life plot shows the average excess over a given threshold. It should be approximately linear above the threshold for which the GPD model is valid (e.g. <~3 m).

In [35]:
plot_mean_residual_life(data.hs);
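The quantity behind the mean residual life plot is simple to compute by hand. For a GPD with shape $\xi < 1$, the mean excess is a linear function of the threshold, which is why linearity is used as a diagnostic. A minimal sketch follows; the function name `mean_excess` and the toy data are hypothetical.

```python
def mean_excess(values, threshold):
    """Average of (value - threshold) over all values above the threshold."""
    excesses = [v - threshold for v in values if v > threshold]
    if not excesses:
        return float("nan")  # no exceedances at this threshold
    return sum(excesses) / len(excesses)

toy_hs = [1, 2, 3, 4, 5]
for u in (2, 3, 4):
    print(u, mean_excess(toy_hs, u))
```

Evaluating `mean_excess` over a grid of thresholds and plotting the result against the threshold gives the curve whose linear range is inspected.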

The analysis is run for both $H_s$ and the wind speed, using a declustering window of 72 hours and the 0.98 quantile of each series as the threshold.

In [36]:
quant = 0.98
models = get_fitted_models(data[["hs", "wspd"]], quantile=quant, r="72H")

In [37]:
models

[                           Univariate Extreme Value Analysis                            
                                       Source Data                                       
 ----------------------------------------------------------------------------------------
 Data label:                            hs      Size:                             236,688
 Start:                       January 1994      End:                        December 2020
                                      Extreme Values                                     
 ----------------------------------------------------------------------------------------
 Count:                                189      Extraction method:                    POT
 Type:                                high      Threshold:                          3.828
                                          Model                                          
 ----------------------------------------------------------------------------------------
 Model:   

In [38]:
models[0].plot_diagnostic(alpha=0.95);

In [39]:
models[1].plot_diagnostic(alpha=0.95);

In [40]:
pd.DataFrame(get_gpd_parameters(models),columns=["mu","sigma","xi"],index=["Hs","Wspd"])

| | mu | sigma | xi |
|---|---|---|---|
| Hs | 3.828 | 1.325214 | -0.305804 |
| Wspd | 13.730623 | 2.680247 | -0.122354 |


In [41]:
summary_Hs = models[0].get_summary(
    return_period=[1, 2, 5, 10, 25, 50, 100],
    alpha=0.95,
    n_samples=1000,
)
summary_Wspd = models[1].get_summary(
    return_period=[1, 2, 5, 10, 25, 50, 100],
    alpha=0.95,
    n_samples=1000,
)
print(summary_Hs)
print(summary_Wspd)

               return value  lower ci  upper ci
return period                                  
1.0                5.771451  5.572208  5.978880
2.0                6.227978  5.986591  6.443090
5.0                6.700485  6.416664  6.936324
10.0               6.979559  6.647185  7.246600
25.0               7.268400  6.866842  7.616752
50.0               7.438997  6.990886  7.859313
100.0              7.577008  7.072680  8.070122
               return value   lower ci   upper ci
return period                                    
1.0               19.436488  18.778446  20.012010
2.0               20.753737  19.614968  21.511636
5.0               22.332120  20.483546  23.453966
10.0              23.413918  20.984015  24.773069
25.0              24.710173  21.589799  26.589661
50.0              25.598606  21.937883  27.977304
100.0             26.414799  22.191668  29.227659
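The $H_s$ return values above can be approximately reproduced from the GPD parameters with the standard POT return-level formula $x_T = u + \frac{\sigma}{\xi}\left[(\lambda T)^{\xi} - 1\right]$, where $\lambda$ is the mean number of declustered peaks per year. The sketch below takes $\lambda \approx 189/27$ from the model summary (189 peaks between January 1994 and December 2020). That rate is an approximation, the function name `gpd_return_level` is hypothetical, and the same check is not shown for wind speed because its peak count is not visible in the truncated output.

```python
import math

def gpd_return_level(threshold, sigma, xi, rate, return_period):
    """POT return level for a GPD fitted to threshold excesses.

    rate: mean number of declustered peaks per year.
    return_period: return period in years.
    """
    m = rate * return_period  # expected number of peaks in the period
    if abs(xi) < 1e-12:
        return threshold + sigma * math.log(m)  # exponential limit (xi -> 0)
    return threshold + sigma / xi * (m ** xi - 1.0)

rate_hs = 189 / 27  # roughly 7 peaks per year (approximate record length)
for T in (1, 10, 100):
    print(T, round(gpd_return_level(3.828, 1.325214, -0.305804, rate_hs, T), 3))
```

Because the fitted shape parameter is negative, the return levels flatten out towards the finite upper bound $u - \sigma/\xi$ as the return period grows.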
