# Parameter Estimation

## What is a Parameter Estimator?

A parameter estimator is a function of the sample approximating a parameter of the distribution.

Examples:

* sample mean is an estimator for the mean of the normal distribution
* sample variance is an estimator for the variance of the normal distribution
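
As a minimal sketch of these two estimators, the snippet below draws a synthetic normal sample and evaluates both on it. The true parameters, the sample size, and the seed are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed (assumption) for reproducibility
true_mean, true_std = 2.0, 0.5      # hypothetical "true" parameters
sample = rng.normal(loc=true_mean, scale=true_std, size=10_000)

mean_hat = np.mean(sample)          # estimator of the mean
var_hat = np.var(sample, ddof=1)    # sample variance (divides by N - 1)

print(mean_hat, var_hat)
```

Both values are functions of the sample only, and they land close to the parameters used to generate the data.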



## Method of Moments Estimator

In section [Parametric Distributions](./stats_parametric_distributions.ipynb) we used formulas based on sample statistics such as sample mean, variance, and skewness (centered moments) to obtain the model parameters. Those formulas were obtained by matching the sample moments to the theoretical moments of the distribution, and the procedure is called **method of moments**.
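
A minimal sketch of the method of moments, assuming a gamma model (the choice of distribution, the true parameters, and the seed are illustrative assumptions, not taken from the referenced section). For a gamma distribution with shape $k$ and scale $s$, the mean is $ks$ and the variance is $ks^2$; solving these two equations for the parameters gives $k = \text{mean}^2/\text{variance}$ and $s = \text{variance}/\text{mean}$.

```python
import numpy as np

rng = np.random.default_rng(1)          # fixed seed (assumption)
true_shape, true_scale = 3.0, 2.0       # hypothetical "true" parameters
sample = rng.gamma(shape=true_shape, scale=true_scale, size=50_000)

# sample moments
m = np.mean(sample)
v = np.var(sample)

# match sample moments to theoretical moments: mean = k*s, variance = k*s**2
shape_hat = m**2 / v
scale_hat = v / m

print(shape_hat, scale_hat)
```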

## Maximum Likelihood Estimator



Given an observed sample $x$ and a probability model with a density function $p_{\theta}(x)$, the likelihood function is defined as a function of the parameter:
$$\mathcal{L}(\theta \mid x) = p_{\theta}(x)$$

When the sample consists of $N$ independent observations $x_1,...,x_N$, the likelihood is:
$$\mathcal{L}(\theta \mid x) = \prod_{i=1}^N p_{\theta}(x_i)$$

When the random variable is discrete, the likelihood of a parameter coincides with the probability of observing the sample under that distribution:

$$P_{\theta}(X_1 = x_1, X_2 = x_2,..., X_N=x_N) = \prod_{i=1}^N p_{\theta}(x_i) = \mathcal{L}(\theta \mid x)$$

We want to find the parameter $\theta$ which maximizes this probability!

$$ \hat{\theta}_{MLE} = \operatorname*{arg\,max}_{\theta} \mathcal{L}(\theta \mid x)$$
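
As a worked illustration (the coin-flip model is an assumption chosen for simplicity), let $x_1,...,x_N \in \{0, 1\}$ be independent Bernoulli observations with success probability $\theta$, and let $k = \sum_{i=1}^N x_i$ be the number of successes. The likelihood is

$$\mathcal{L}(\theta \mid x) = \prod_{i=1}^N \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{k}(1-\theta)^{N-k}$$

Taking the logarithm does not change the maximizer, and setting the derivative to zero gives

$$\frac{d}{d\theta}\log \mathcal{L}(\theta \mid x) = \frac{k}{\theta} - \frac{N-k}{1-\theta} = 0 \quad\Longrightarrow\quad \hat{\theta}_{MLE} = \frac{k}{N}$$

so the maximum likelihood estimate is the observed fraction of successes.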

The same approach provides a good estimator when the variable is continuous: the likelihood is then built from density values instead of probabilities.

:::{Note}

In practice, computing the likelihood of a large sample requires multiplying many values, which can cause numerical underflow or overflow. Because the logarithm is strictly increasing, it is equivalent to maximize the log-likelihood, which is the sum of the log-densities at each observation:

$$ \hat{\theta}_{MLE} = \operatorname*{arg\,max}_{\theta} \log \mathcal{L}(\theta \mid x) = \operatorname*{arg\,max}_{\theta} \sum_{i=1}^N \log p_{\theta}(x_i)$$

Many probability densities involve exponentials, so taking the logarithm also simplifies the formulas.
:::
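
The numerical issue with the raw product can be seen in a small sketch. The data here are synthetic (a standard normal sample with an assumed size and seed), not the data used later in this section.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)                  # fixed seed (assumption)
sample = rng.normal(loc=0.0, scale=1.0, size=5_000)

# density of each observation under the standard normal model
densities = stats.norm.pdf(sample, loc=0.0, scale=1.0)

# the product of thousands of values below 1 underflows to exactly 0.0
likelihood = np.prod(densities)

# the sum of log-densities stays finite and can be compared across parameters
log_likelihood = np.sum(stats.norm.logpdf(sample, loc=0.0, scale=1.0))

print(likelihood)
print(log_likelihood)
```

Each standard normal density value is below 0.4, so the product of 5000 of them is far smaller than the smallest representable double, while the log-likelihood is an ordinary negative number.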











## Evaluating Estimators

**MSE** - Mean Squared Error

$$ MSE(\hat \theta) = \mathbb{E}_{\theta}|\hat \theta - \theta|^2$$

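It decomposes into the squared bias and the variance of the estimator:

$$MSE(\hat \theta) = bias(\hat\theta)^2 + Var(\hat\theta)$$

where $bias(\hat\theta) = \mathbb{E}_{\theta}[\hat\theta] - \theta$.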
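
The decomposition can be checked by simulation. The sketch below assumes a normal model and uses the variance estimator that divides by $N$ (which is biased); the true variance, sample size, number of trials, and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)          # fixed seed (assumption)
true_var = 4.0                          # hypothetical true parameter
n, n_trials = 10, 100_000

# each row is one sample of size n; each row yields one estimate
samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(n_trials, n))
estimates = np.var(samples, axis=1)     # ddof=0: divides by n, biased

mse = np.mean((estimates - true_var) ** 2)
bias = np.mean(estimates) - true_var
variance = np.var(estimates)

print(mse, bias**2 + variance)          # the two numbers agree
print(bias)                             # close to -true_var / n
```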

Properties of an estimator:
* **unbiased**: the expected value of the estimator is equal to the true parameter
* **consistent**: as sample size grows, the estimator converges to the true parameter
* **efficient**: has the minimum variance among unbiased estimators
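
Consistency can be illustrated with a short simulation. This is a sketch under assumed settings (normal data, hypothetical true parameters, arbitrary sample sizes and seed): the spread of the sample mean around the true mean shrinks as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(4)          # fixed seed (assumption)
true_mean, true_std = 1.0, 2.0          # hypothetical true parameters
n_trials = 2_000

spread = {}
for n in (10, 100, 1_000):
    # n_trials independent samples of size n, one sample mean per row
    samples = rng.normal(true_mean, true_std, size=(n_trials, n))
    estimates = samples.mean(axis=1)
    spread[n] = np.std(estimates)       # roughly true_std / sqrt(n)
    print(n, np.mean(estimates), spread[n])
```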


:::{note}
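Under regularity conditions, the MLE is consistent and asymptotically efficient: as the sample size grows, its bias vanishes and its variance approaches the minimum attainable by an unbiased estimator. In that sense it behaves like a **Minimum Variance Unbiased Estimator** for large samples.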
:::

## MLE Widget

In [12]:
import numpy as np
from scipy import stats

In [13]:
# loading steps

!wget --no-check-certificate 'https://docs.google.com/uc?export=download&confirm=t&id=1466snzjsXPVTlKnzkkCkdOgwoO5Zvutq' -O background.wav

from scipy.io import wavfile

# reading background data
bg_samplerate, bg_signal = wavfile.read('background.wav')

# trim the signal to a whole number of seconds, then split it into 0.1 s intervals
n_trimmed = len(bg_signal) - len(bg_signal) % bg_samplerate
n_intervals = n_trimmed // bg_samplerate * 10
bg_signal_split = np.split(bg_signal[:n_trimmed], n_intervals)

# calculate the RMS amplitude of each interval
RMS_split = [np.sqrt(np.mean(np.square(group.astype('float')))) for group in bg_signal_split]

# define the random variable: log10 of the RMS amplitude
X = np.log10(RMS_split)






In [14]:
# sample mean and standard deviation
mean = np.mean(X)
std = np.std(X)
print(mean)
print(std)

2.0959092368886436
0.0911751628524584


In [15]:

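# Gaussian log-likelihood at the sample mean and standard deviation
gaussian_likelihood = stats.norm.pdf(X, loc=mean, scale=std)
# L = np.prod(gaussian_likelihood)  # the raw product is numerically unstable for large samples
L = np.sum(np.log(gaussian_likelihood))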
print(L)


575.860191529744


:::{note}
If even one sample point lies outside the support of the model (the region where its probability density function is positive), the density evaluated at that point is zero and, because of the independence assumption, the whole likelihood is zero (the log-likelihood is $-\infty$). In a way that makes sense: if we observe a point which has probability zero under a specific model, then that model cannot have produced the observation. In practice, there will always be outliers that we may not want to fit.
:::
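
A small sketch of this effect, assuming an exponential model (whose support is $x \ge 0$) and made-up observations, one of which is negative:

```python
import numpy as np
from scipy import stats

# hypothetical observations; the last one lies outside the support x >= 0
obs = np.array([0.5, 1.2, 3.0, -0.1])

densities = stats.expon.pdf(obs, scale=1.0)
likelihood = np.prod(densities)                                 # 0.0
log_likelihood = np.sum(stats.expon.logpdf(obs, scale=1.0))     # -inf

print(densities)
print(likelihood, log_likelihood)
```

A single out-of-support point drives the likelihood to zero no matter how well the other observations fit.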

In [16]:
skewnorm_likelihood = stats.skewnorm.pdf(X, a=2, scale=0.06, loc=2)
L = np.sum(np.log(skewnorm_likelihood))
print(L)

-50.10634850074372

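Rather than guessing parameter values, the log-likelihood can be maximized numerically. The sketch below is not part of the original notebook: it uses a synthetic skew-normal sample as a stand-in for `X` so that it runs without the audio file, and the generating parameters, the starting values, the seed, and the log-scale parametrization are all assumptions.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)          # fixed seed (assumption)
# synthetic stand-in for X with hypothetical parameters
data = stats.skewnorm.rvs(a=2.0, loc=2.0, scale=0.1, size=2_000, random_state=rng)

def neg_log_likelihood(params, x):
    """Negative skew-normal log-likelihood; the scale is optimized on a log scale."""
    a, loc, log_scale = params
    return -np.sum(stats.skewnorm.logpdf(x, a=a, loc=loc, scale=np.exp(log_scale)))

# starting point built from simple sample statistics
start = np.array([1.0, np.mean(data), np.log(np.std(data))])
result = optimize.minimize(neg_log_likelihood, start, args=(data,), method="Nelder-Mead")

a_hat, loc_hat, scale_hat = result.x[0], result.x[1], np.exp(result.x[2])
print(a_hat, loc_hat, scale_hat)
print("log-likelihood at start:  ", -neg_log_likelihood(start, data))
print("log-likelihood at optimum:", -result.fun)
```

Optimizing the logarithm of the scale keeps the scale positive without explicit constraints. The `scipy.stats` distributions also provide a `fit` method that performs maximum likelihood estimation, for example `stats.skewnorm.fit(data)`.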

In [17]:
def plot_skewnorm_density_L(a, scale, loc):
  """Plot the histogram of X with a skew-normal density and show its log-likelihood."""
  h = plt.hist(X, bins=100, density=True, alpha=0.5)

  # evaluate the density at the histogram bin edges
  skewnorm_density = stats.skewnorm.pdf(h[1], a=a, scale=scale, loc=loc)

  # evaluate the density at the observations
  skewnorm_likelihood = stats.skewnorm.pdf(X, a=a, scale=scale, loc=loc)
  L = np.sum(np.log(skewnorm_likelihood))

  plt.plot(h[1], skewnorm_density)
  plt.title(f"Log-Likelihood {L:.10f}")

In [18]:
from ipywidgets import interact
import ipywidgets as widgets

import matplotlib.pyplot as plt

In [19]:
shape_slider = widgets.FloatSlider(
    value=2,
    min=0.5,
    max=10.0,
    step=0.5,
    description='Shape:',
    #disabled=False,
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='.1f',
)

In [None]:
scale_slider = widgets.FloatSlider(
    value=0.06,
    min=0.01,
    max=0.1,
    step=0.01,
    description='Scale:',
    #disabled=False,
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='.2f',
)

In [21]:
loc_slider = widgets.FloatSlider(
    value=1.9,
    min=1.5,
    max=2.5,
    step=0.01,
    description='Offset:',
    # disabled=False,
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='.2f',
)


In [None]:
out = interact(plot_skewnorm_density_L, a = shape_slider, scale = scale_slider, loc = loc_slider)
