# **Tutorial 5: Estimate Model Parameters - Maximum Likelihood, Moments, Bayesian**

**Week 2, Day 4, Extremes & Vulnerability**

**Content creators:** Matthias Aengenheyster, Joeri Reinders

**Content reviewers:** Dionessa Biton, Younkap Nina Duplex, Sloane Garelick, Zahra Khodakaramimaghsoud, Peter Ohue, Laura Paccini, Jenna Pearson, Agustina Pesce, Derick Temfack, Peizhen Yang, Cheng Zhang, Chi Zhang, Ohad Zivan

**Content editors:** Jenna Pearson, Chi Zhang, Ohad Zivan

**Production editors:** Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan

**Our 2023 Sponsors:** NASA TOPS

# **Tutorial Objectives:**

In the previous tutorials, we focused on fitting GEV distributions to our observational data.

In this tutorial, we will delve deeper into the fitting process itself, specifically how the model parameters are estimated. As we learned previously, the GEV distribution has three parameters: location, scale, and shape. By default, the `ef.fit_return_levels_sdfc` function uses [maximum likelihood estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE).

MLE is an estimator: a function that uses the available data to approximate the parameters. In MLE, we calculate the likelihood that a given parameter set produced the observed data and select the parameter set that maximizes it. This likelihood is based on the probability density function discussed earlier, which is determined by the parameter set.
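
To make the idea concrete, here is a minimal sketch of MLE for a GEV distribution. It is not how `ef.fit_return_levels_sdfc` or SDFC implement the fit; it assumes synthetic data drawn from a GEV with illustrative parameter values, uses `scipy.stats.genextreme` for the density (whose `c` argument has the opposite sign to the shape parameter used in this tutorial), and minimizes the negative log-likelihood with `scipy.optimize.minimize`. The names `negative_log_likelihood`, `sample`, and `start` are introduced here for illustration only.

```python
import numpy as np
from scipy import optimize, stats

# Synthetic "annual maxima" from a known GEV (illustrative values, not the tutorial data).
# scipy's genextreme uses c = -shape relative to the sign convention in this tutorial.
true_loc, true_scale, true_shape = 26.0, 7.0, 0.05
sample = stats.genextreme.rvs(
    -true_shape, loc=true_loc, scale=true_scale, size=2000, random_state=0
)


def negative_log_likelihood(params, x):
    """Negative GEV log-likelihood of the data x for params = (loc, scale, shape)."""
    loc, scale, shape = params
    if scale <= 0:
        return np.inf  # the scale must be positive
    logpdf = stats.genextreme.logpdf(x, -shape, loc=loc, scale=scale)
    if not np.all(np.isfinite(logpdf)):
        return np.inf  # some observation lies outside the support for these parameters
    return -np.sum(logpdf)


# Crude starting guess taken from the sample itself
start = np.array([np.mean(sample), np.std(sample), 0.1])
result = optimize.minimize(
    negative_log_likelihood, start, args=(sample,), method="Nelder-Mead"
)
mle_loc, mle_scale, mle_shape = result.x
print("loc, scale, shape:", mle_loc, mle_scale, mle_shape)
```

With a sample this large, the estimates should land close to the values used to generate the data; with roughly 100 years of observations, as in the precipitation record, the uncertainty is considerably larger.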

Thus, in our previous tutorials, we made two significant assumptions:

- We assumed that we were working with the GEV distribution.
- We assumed that the MLE method would yield the best parameter estimation.

In this tutorial, we will put the second assumption to the test by employing different parameter estimation methods and comparing the resulting parameter values and return levels.

By the end of this tutorial, you will be able to:

- Explain different methods of parameter estimation.
- Compute the parameter values obtained through different parameter estimation methods.

# **Setup**

In [3]:
# Installs

In [4]:
# !pip install -q condacolab
# import condacolab
# condacolab.install()
# #install dependencies - taken from <Yosmely Bermúdez> comments for Tutorial 6
# # We need this to install eigen which is needed for SDFC to install correctly
# !mamba install eigen numpy matplotlib seaborn pandas cartopy scipy texttable intake xarrayutils xmip cf_xarray intake-esm
# !pip install -v https://github.com/yrobink/SDFC/archive/master.zip#subdirectory=python
# !pip install https://github.com/njleach/mystatsfunctions/archive/master.zip

In [None]:
# @title Video 1: Speaker Introduction
#Tech team will add code to format and display the video

In [None]:
# @title Figure Settings
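import ipywidgets as widgets       # interactive display
import matplotlib.pyplot as plt    # needed by plt.style.use below; this cell runs before the imports cell
%config InlineBackend.figure_format = 'retina'
plt.style.use("https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/cma.mplstyle")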

## Imports and function definitions

In [5]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from scipy import stats

import extremes_functions as ef
from mystatsfunctions import OLSE,LMoments
import SDFC as sd

In [6]:
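def estimate_return_level(quantile, model):
    """Return level (GEV quantile) for a given non-exceedance probability."""
    # model.coef_ holds the fitted (location, scale, shape) parameters
    loc, scale, shape = model.coef_
    # closed-form GEV quantile function (requires shape != 0)
    level = loc - scale / shape * (1 - (-np.log(quantile))**(-shape))
    # equivalent scipy call; note the sign flip of the shape parameter
    # level = stats.genextreme.ppf(quantile,-shape,loc=loc,scale=scale)
    return level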
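
As a quick sanity check, the closed-form expression can be compared with the `scipy.stats.genextreme.ppf` call that appears as a comment in the function above. The sketch below assumes illustrative parameter values (not fitted ones) and a hypothetical helper `gev_quantile` that takes the parameters directly instead of a fitted model object.

```python
import numpy as np
from scipy import stats


def gev_quantile(quantile, loc, scale, shape):
    """Same closed form as estimate_return_level, with explicit parameters (shape != 0)."""
    return loc - scale / shape * (1 - (-np.log(quantile)) ** (-shape))


# Illustrative parameter values, chosen only for this check
loc, scale, shape = 26.0, 7.0, 0.05
quantile = 1 - 1 / 100  # non-exceedance probability of the 100-year event

closed_form = gev_quantile(quantile, loc, scale, shape)
# scipy's shape argument c has the opposite sign, hence -shape
scipy_level = stats.genextreme.ppf(quantile, -shape, loc=loc, scale=scale)

print(closed_form, scipy_level)
```

Both values should agree to floating-point precision, which confirms the sign convention: a positive shape in this tutorial corresponds to a negative `c` in scipy.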

# **Section 1**
We start by opening the precipitation data from Germany once more and fitting a GEV distribution to it using the **MLE** method. You can do this by passing `method='mle'` to the fit function. Take a look at the parameter values.

In [7]:
import os, pooch

fname = 'precipitationGermany_1920-2022.csv'
if not os.path.exists(fname):
    url = "https://osf.io/xs7h6/download"
    fname = pooch.retrieve(url, known_hash=None)

data = pd.read_csv(fname, index_col=0).set_index('years')

data.columns=['precipitation']
precipitation = data.precipitation

In [8]:
fit, model = ef.fit_return_levels_sdfc(precipitation.values,times=np.arange(1.1,1000),periods_per_year=1,kind='GEV',N_boot=10,full=True,model=True,method='mle')

In [9]:
model

+-----------+--------+------------+--------+----------------+----------------+
| GEV (mle) |  Link  |    Type    |  coef  | Quantile 0.025 | Quantile 0.975 |
| loc       | IdLink | Stationary | 26.354 | 25.222         | 27.521         |
+-----------+--------+------------+--------+----------------+----------------+
| scale     | IdLink | Stationary | 7.369  | 6.522          | 8.866          |
+-----------+--------+------------+--------+----------------+----------------+
| shape     | IdLink | Stationary | 0.047  | -0.045         | 0.149          |
+-----------+--------+------------+--------+----------------+----------------+

There are two additional methods for parameter estimation that we will explore and compare with the MLE approach: the **L-moments method** and a **Bayesian method**. 

The L-moments provide information about the shape of a probability distribution, similar to regular moments, and are derived from a linear combination of the order statistics (where "L" stands for linear). By utilizing L-moments, we can compute the parameters of the Generalized Extreme Value (GEV) distribution through a set of equations.
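
The "set of equations" can be sketched as follows. This is a minimal illustration rather than the implementation used by SDFC or `mystatsfunctions`: it assumes the standard unbiased sample estimators of the first three L-moments and Hosking's widely used approximation for the GEV shape parameter. Hosking's `k` has the opposite sign to the shape parameter reported in this tutorial. The function names and the synthetic sample are introduced here for illustration.

```python
import numpy as np
from scipy import stats
from scipy.special import gamma


def sample_l_moments(x):
    """First three sample L-moments, built from probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))  # order statistics
    n = x.size
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    # linear combinations of the probability-weighted moments
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0


def gev_from_l_moments(x):
    """GEV (loc, scale, shape) from L-moments via Hosking's approximation."""
    l1, l2, l3 = sample_l_moments(x)
    t3 = l3 / l2  # L-skewness
    c = 2 / (3 + t3) - np.log(2) / np.log(3)
    k = 7.8590 * c + 2.9554 * c**2  # Hosking's k; breaks down if k is exactly 0
    scale = l2 * k / ((1 - 2.0 ** (-k)) * gamma(1 + k))
    loc = l1 - scale * (1 - gamma(1 + k)) / k
    return loc, scale, -k  # flip the sign to match this tutorial's shape convention


# Synthetic check with illustrative parameters (scipy's c equals Hosking's k)
sample = stats.genextreme.rvs(-0.05, loc=26.0, scale=7.0, size=5000, random_state=1)
lm_loc, lm_scale, lm_shape = gev_from_l_moments(sample)
print("loc, scale, shape:", lm_loc, lm_scale, lm_shape)
```

No numerical optimization is involved: the parameters follow directly from three summary statistics of the sorted sample.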

On the other hand, the Bayesian technique involves incorporating prior knowledge or beliefs about the parameters into the estimation process. In Bayesian estimation, a prior distribution is specified for the parameters, representing the initial beliefs about their values. This prior distribution is combined with the likelihood function, which represents the probability of observing the data given the parameters, to obtain the posterior distribution using Bayes' theorem. 
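
The sketch below illustrates this idea with a random-walk Metropolis sampler. It is not the sampler implemented in SDFC. It assumes synthetic data, a hypothetical weakly informative normal prior, and hand-picked proposal step sizes, all introduced here for illustration. The acceptance test is done in log space, which avoids overflow when exponentiating large differences in log-posterior.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data from a known GEV (illustrative values; scipy uses c = -shape)
sample = stats.genextreme.rvs(-0.05, loc=26.0, scale=7.0, size=300, random_state=rng)

# Hypothetical prior: independent normals on (loc, scale, shape)
prior = stats.multivariate_normal(
    mean=[25.0, 8.0, 0.0], cov=np.diag([10.0**2, 5.0**2, 0.3**2])
)


def log_posterior(params):
    """Log prior plus log likelihood, up to an additive constant."""
    loc, scale, shape = params
    if scale <= 0:
        return -np.inf
    log_like = np.sum(stats.genextreme.logpdf(sample, -shape, loc=loc, scale=scale))
    return log_like + prior.logpdf(params)


n_steps = 4000
step_sizes = np.array([0.5, 0.35, 0.04])  # proposal standard deviations (hand-picked)
chain = np.empty((n_steps, 3))
current = np.array([np.mean(sample), np.std(sample), 0.0])
log_p_current = log_posterior(current)

for i in range(n_steps):
    proposal = current + step_sizes * rng.standard_normal(3)
    log_p_proposal = log_posterior(proposal)
    # accept with probability min(1, posterior ratio), evaluated in log space
    if np.log(rng.uniform()) < log_p_proposal - log_p_current:
        current, log_p_current = proposal, log_p_proposal
    chain[i] = current

posterior = chain[n_steps // 4:]  # discard the first quarter as burn-in
posterior_mean = posterior.mean(axis=0)
interval = np.quantile(posterior, [0.025, 0.975], axis=0)
print("posterior mean:", posterior_mean)
print("95% interval:", interval)
```

The result is a set of samples from the posterior, so point estimates and credible intervals both come from the same chain.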

To estimate the GEV parameters with these two methods, change the `method` argument of the fit function; the cells below pass `method='moments'` and `method='bayesian'`. Then calculate the 100-year return level using all three parameter sets (MLE, moments, and Bayesian).

In [10]:
fit_moments, model_moments = ef.fit_return_levels_sdfc(precipitation.values,times=np.arange(1.1,1000),periods_per_year=1,kind='GEV',N_boot=10,full=True,model=True,method='moments')

In [11]:
fit_moments

In [12]:
model_moments

+---------------+--------+------------+--------+----------------+----------------+
| GEV (moments) |  Link  |    Type    |  coef  | Quantile 0.025 | Quantile 0.975 |
| loc           | IdLink | Stationary | 26.526 | 25.824         | 28.227         |
+---------------+--------+------------+--------+----------------+----------------+
| scale         | IdLink | Stationary | 2.042  | 1.945          | 2.151          |
+---------------+--------+------------+--------+----------------+----------------+
| shape         | IdLink | Stationary | 0      | 0              | 0              |
+---------------+--------+------------+--------+----------------+----------------+

In [13]:
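# prior: multivariate normal centered on the MLE estimate, with covariance taken from the MLE bootstrap samples
prior = stats.multivariate_normal(mean= model.coef_, cov = np.cov(model.coefs_bootstrap.T), allow_singular=True)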

In [14]:
fit_bayes, model_bayes = ef.fit_return_levels_sdfc(precipitation.values,times=np.arange(1.1,1000),periods_per_year=1,kind='GEV',N_boot=10,full=True,model=True,method='bayesian',prior=prior,mcmc_init=model.coef_)

In [15]:
fit_bayes

In [16]:
model_bayes

+----------------+--------+------------+--------+----------------+----------------+
| GEV (bayesian) |  Link  |    Type    |  coef  | Quantile 0.025 | Quantile 0.975 |
| loc            | IdLink | Stationary | 26.296 | -4.711         | 22.791         |
+----------------+--------+------------+--------+----------------+----------------+
| scale          | IdLink | Stationary | 7.399  | -4.334         | 8.951          |
+----------------+--------+------------+--------+----------------+----------------+
| shape          | IdLink | Stationary | 0.053  | -4.127         | 3.365          |
+----------------+--------+------------+--------+----------------+----------------+

In [17]:
period = 100
quantile = 1-1/period

print('MLE: %.2f' % estimate_return_level(quantile,model))
print('Moments: %.2f' % estimate_return_level(quantile,model_moments))
print('Bayes: %.2f' % estimate_return_level(quantile,model_bayes))

MLE: 64.21
Moments: 35.92
Bayes: 64.80


In [14]:
period = 50
quantile = 1-1/period

print('MLE: %.2f' % estimate_return_level(quantile,model))
print('Moments: %.2f' % estimate_return_level(quantile,model_moments))
print('Bayes: %.2f' % estimate_return_level(quantile,model_bayes))

MLE: 57.92
Moments: 34.49
Bayes: 58.73


In [15]:
period = 500
quantile = 1-1/period

print('MLE: %.2f' % estimate_return_level(quantile,model))
print('Moments: %.2f' % estimate_return_level(quantile,model_moments))
print('Bayes: %.2f' % estimate_return_level(quantile,model_bayes))

MLE: 79.55
Moments: 39.21
Bayes: 81.44
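
The printed return levels can be approximately reproduced from the rounded coefficients in the three tables above. The sketch below assumes a hypothetical helper `gev_return_level` that takes a return period in years and falls back to the Gumbel limit when the shape is zero, which is needed for the moments row because its shape is reported as 0. Small differences from the printed values are expected because the table coefficients are rounded.

```python
import numpy as np


def gev_return_level(period, loc, scale, shape):
    """Return level for a return period in years, with the Gumbel limit for shape == 0."""
    quantile = 1 - 1 / period  # non-exceedance probability
    y = -np.log(quantile)
    if np.isclose(shape, 0.0):
        return loc - scale * np.log(y)  # limit of the GEV formula as shape -> 0
    return loc - scale / shape * (1 - y ** (-shape))


# Rounded (loc, scale, shape) values copied from the tables above
rounded_fits = {
    "MLE": (26.354, 7.369, 0.047),
    "Moments": (26.526, 2.042, 0.0),
    "Bayes": (26.296, 7.399, 0.053),
}

periods = (50, 100, 500)
levels = {
    name: [gev_return_level(p, *params) for p in periods]
    for name, params in rounded_fits.items()
}
for name, values in levels.items():
    print(name, dict(zip(periods, np.round(values, 2))))
```

The gap between the moments estimate and the other two widens with the return period, because both the smaller scale and the zero shape flatten the upper tail.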


In [16]:
gev = LMoments.gev()
gev.fit(precipitation.values)
gev.X, gev.a, gev.k

(26.453249703966794, 7.687907961826535, -0.01055113362649465)

## Question: 
What can you say about the parameters estimated by the different methods? 

In [2]:
# to_remove explanation
"""
The estimates depend on the method. MLE and the Bayesian fit give nearly identical parameters (location about 26.3, scale about 7.4, shape about 0.05) and comparable return levels (64.21 and 64.80 for the 100-year event). The moments fit returns a much smaller scale (2.04) and a shape of zero, so its 100-year return level (35.92) is substantially lower. The choice of estimation method can therefore significantly affect the outcome.
"""

# **Summary**
In this tutorial, we have learned about different methods of parameter estimation for the GEV distribution. We examined three approaches: Maximum Likelihood Estimation (MLE), L-moments, and a Bayesian method. By comparing these techniques, we have seen how different approaches to parameter estimation can yield different results. We also applied these methods to compute 50-, 100-, and 500-year return levels from precipitation data from Germany. This has helped us understand how the choice of estimation method affects the results of the modeling process.