## Introduction to Probabilistic Machine Learning

Family name:

Name:

Date: 

### The exponential distribution
Given an exponential random variable whose pdf is
$$
\text{exp\_pdf}\left(x, \lambda \right) = \frac{1}{Z(\lambda)} \exp\left(- \lambda x \right)
$$

[TODO] Derive using markdown + latex:
1. The normalization constant $ Z(\lambda)$

To normalize exp_pdf, we require that it integrate to one:

$$ \int_{- \infty}^{\infty} dx \ \text{exp\_pdf}(x, \lambda) = 1 $$

An exponential random variable takes only non-negative values, so the density vanishes for $x < 0$. Assuming $\lambda > 0$, the condition becomes:

$$ \frac{1}{Z(\lambda)}\int_{0}^{\infty} dx \, \exp(-\lambda x) = 1 \to \frac{1}{-\lambda Z(\lambda)} \int_{0}^{\infty} dx \, \left(-\lambda \exp(-\lambda x)\right) = $$

$$ = \frac{1}{-\lambda Z(\lambda)} \left[\exp(-\lambda x)\right]_{0}^{\infty} = \frac{1}{-\lambda Z(\lambda)} (0-1) = 1 \to Z(\lambda) = \frac{1}{\lambda} $$

Therefore, the pdf is:

$$
\text{exp\_pdf}\left(x, \lambda \right) = \lambda \exp\left(- \lambda x \right), \quad x \geq 0
$$

2. The cdf

$$
\text{exp\_cdf}(x, \lambda) = \int_{-\infty}^x dx' \, \text{exp\_pdf}(x', \lambda) = \int_{0}^x dx' \, \lambda \exp(-\lambda x') = 1 - \exp(-\lambda x), \quad x \geq 0
$$

3. The inverse cdf

$$
y = 1- \exp(-\lambda x) \to \exp(-\lambda x) = 1-y \to x = -\frac{1}{\lambda} \log(1-y)
$$

Hence the inverse of the cdf is:

$$
\text{exp\_cdf}^{-1}(y) = -\frac{1}{\lambda} \log(1-y), \quad y \in [0, 1)
$$
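
The three results can be checked numerically. The sketch below is only an illustration: the helper names `exp_pdf`, `exp_cdf` and `exp_inv` and the rate `lam = 1.5` are assumptions made for this example, not the functions of the course's `tools` module. It integrates the normalized pdf with a hand-written trapezoidal rule and verifies that the inverse cdf, written as $-\log(1-y)/\lambda$, undoes the cdf.

```python
import numpy as np

lam = 1.5  # assumed rate, chosen only for illustration


def exp_pdf(x, lam):
    """Normalized density lam * exp(-lam * x) for x >= 0, zero otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, lam * np.exp(-lam * x), 0.0)


def exp_cdf(x, lam):
    """Cumulative distribution 1 - exp(-lam * x) for x >= 0, zero otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, 1.0 - np.exp(-lam * x), 0.0)


def exp_inv(y, lam):
    """Inverse cdf, -log(1 - y) / lam, for y in [0, 1)."""
    y = np.asarray(y, dtype=float)
    return -np.log1p(-y) / lam


# Trapezoidal rule on a fine grid; the tail beyond x = 40 is negligible for lam = 1.5
x = np.linspace(0.0, 40.0, 400_001)
f = exp_pdf(x, lam)
area = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))

# Round trip: applying the cdf to the inverse cdf should return the input
y = np.linspace(0.0, 0.99, 100)
round_trip_error = np.max(np.abs(exp_cdf(exp_inv(y, lam), lam) - y))

print(area)              # should be very close to 1
print(round_trip_error)  # should be at the level of floating-point rounding
```

Using `np.log1p(-y)` instead of `np.log(1 - y)` is a small numerical refinement for values of `y` close to zero; both expressions implement the same formula.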

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from tools import exp_distribution, probabilistic_fit

The inverse method (inverse transform sampling) is a convenient way to generate random numbers that follow the exponential distribution. The idea is as follows.

The cdf returns the probability (a number in the interval $[0, 1]$) that a random variable $X$ takes a value less than or equal to $x$:

$$
\text{exp\_cdf}: \mathbb{R} \to [0,1]  \ \text{such that} \ \text{exp\_cdf}(x) = P(X\leq x)
$$

To obtain random numbers distributed according to $\text{exp\_pdf}$, we map points drawn uniformly from the interval $[0,1)$ through the inverse of the cdf, $\text{exp\_cdf}^{-1}$, defined as:

$$
\text{exp\_cdf}^{-1}: [0,1) \to [0, \infty)  \ \text{such that} \ \text{exp\_cdf}^{-1}(y) = -\frac{1}{\lambda} \log(1-y)
$$

In other words: applying this function to each point of a uniformly distributed sample on $[0,1)$ yields a sample of the same size that is exponentially distributed rather than uniform.
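
The mapping described above can be sketched in a few lines of NumPy. This is a minimal, self-contained illustration: `exp_inv_sketch` is a hypothetical stand-in written for this example and is not the `exp_distribution.exp_inv` function from the course's `tools` module, whose implementation is not shown here. The seed, rate and sample size are also arbitrary choices.

```python
import numpy as np


def exp_inv_sketch(u, lam):
    """Inverse cdf of the exponential distribution: -log(1 - u) / lam."""
    return -np.log1p(-np.asarray(u, dtype=float)) / lam


rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
lam = 1.5                       # assumed rate
u = rng.random(size=50_000)     # uniform sample on [0, 1)
x = exp_inv_sketch(u, lam)      # exponentially distributed with rate lam

# Sanity checks: sample mean against 1/lam, and the empirical cdf
# against the theoretical cdf at a single point x0
x0 = 1.0
print(x.mean(), 1.0 / lam)
print(np.mean(x <= x0), 1.0 - np.exp(-lam * x0))
```

Because the function is written with array operations, it acts on the whole uniform sample at once and needs no explicit loop or `np.vectorize` wrapper.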

In [None]:
# [TODO] Generate an iid sample of the exponential distribution using the method of the inverse 

rng = np.random.default_rng(123)
lam_true = 1.5
N = 100000

# Draw N uniformly distributed random numbers in the interval [0, 1):

X = rng.random(size=N)

# Vectorize the inverse cdf so that it can be applied to every value of the array:

vectorize_exp = np.vectorize(exp_distribution.exp_inv)

# Map the uniform sample through the inverse cdf, using the given lambda:

X_exp = vectorize_exp(X, lam=lam_true)

In [None]:
# [TODO] Plot the histogram of the sample and compare with pdf
n_bins = int(min(np.sqrt(N), 50))  # at most 50 bins

# Theoretical pdf evaluated over the range of the sample
x_theory = np.linspace(np.min(X_exp), np.max(X_exp), 100)
y_theory = lam_true * np.exp(-lam_true * x_theory)

plt.figure(figsize=(10, 5))
plt.hist(X_exp, density=True, bins=n_bins, label='Sample')
plt.xlabel('$x$')
plt.ylabel('Density')
plt.title(f'Histogram of the sample compared with the \n probability density function $f(x) = {lam_true} e^{{-{lam_true} x}}$.\n')
plt.plot(x_theory, y_theory, color='red', lw=2, label=f'$f(x) = {lam_true} e^{{-{lam_true} x}}$')
plt.legend();

## Maximum Likelihood estimate of $\lambda$

[TODO] Derive using markdown + latex:
1. The expression of the likelihood function for the iid sample $ \mathcal{D} = \left\{X_n \right\}_{n=1}^N$
2. The value of $\lambda$ that maximizes the likelihood
3. The expression of the posterior assuming the prior for lambda 

#### Solution:
The likelihood is
$$
\mathcal{L}(\lambda; \mathcal{D}) = P\left( \mathcal{D}  \vert \lambda \right) 
= P\left( \left\{X_n \right\}_{n=1}^N  \vert \lambda \right) 
= \prod_{n=1}^N \mathrm{exp\_pdf}\left( X_n ; \lambda \right) 
= \prod_{n=1}^N \left(\lambda \exp\left( - \lambda X_n \right)\right) 
= \lambda^N \exp\left(-\lambda \sum_{n=1}^N X_n\right).
$$

The corresponding log-likelihood  is
$$
\mathcal{LL}(\lambda; \mathcal{D})
= \log \mathcal{L}(\lambda; \mathcal{D})
= N \log \lambda - \lambda \sum_{n=1}^N X_n.
$$

Taking the derivative of this log-likelihood with respect to $\lambda$,
$$
\frac{d\mathcal{LL}(\lambda; \mathcal{D})}{d\lambda}
= \frac{N}{\lambda} - \sum_{n=1}^N X_n.
$$

Setting the derivative equal to zero gives
$$
\frac{N}{\lambda_{ML}^*} - \sum_{n=1}^N X_n = 0 
\quad \Longrightarrow \quad
\frac{N}{\lambda_{ML}^*} = \sum_{n=1}^N X_n.
$$
Hence the maximum likelihood estimator is
$$
\lambda_{ML}^* = \frac{N}{\sum_{n=1}^N X_n}
= \frac{1}{\hat{\mu}_X},
$$
where
$$
\hat{\mu}_X = \frac{1}{N}\sum_{n=1}^N X_n.
$$

The second derivative,
$$
\frac{d^2\mathcal{LL}(\lambda; \mathcal{D})}{d\lambda^2}
= -\frac{N}{\lambda^2} < 0,
$$
is negative, which confirms that this critical point (zero derivative) is a maximum.
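
The closed-form result can be checked on simulated data. The sketch below is an illustration under stated assumptions: the sample is drawn with NumPy's own `rng.exponential` (which is parameterized by the scale $1/\lambda$) rather than with the notebook's inverse-method sample, and the seed, rate and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
lam_true = 1.5                  # assumed true rate
X = rng.exponential(scale=1.0 / lam_true, size=10_000)  # scale = 1 / lambda

N = X.size
S = X.sum()


def log_likelihood(lam):
    """N log(lam) - lam * sum(X), as derived above."""
    return N * np.log(lam) - lam * S


def d_log_likelihood(lam):
    """First derivative of the log-likelihood: N / lam - sum(X)."""
    return N / lam - S


lam_ml = 1.0 / X.mean()  # closed-form estimate: one over the sample mean

print(lam_ml)                    # should be close to lam_true
print(d_log_likelihood(lam_ml))  # should be approximately zero
print(-N / lam_ml**2)            # second derivative, negative at the maximum
```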


### Numerical maximization of the likelihood

We shall now use a numerical method to maximize the likelihood and compare with the closed-form solution.
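
One possible way to do this is sketched here. It is not necessarily the method intended for this exercise: the course's `probabilistic_fit` module is not shown, so the sketch swaps in a hand-written golden-section search, a simple derivative-free method that needs only NumPy. The helper `golden_section_max`, the search bounds, the seed and the simulated sample are all assumptions made for illustration. The search maximizes the log-likelihood divided by $N$, which has the same maximizer and is better scaled.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
lam_true = 1.5                  # assumed true rate
X = rng.exponential(scale=1.0 / lam_true, size=10_000)
mean_x = X.mean()


def mean_log_likelihood(lam):
    """Log-likelihood divided by N: log(lam) - lam * mean(X)."""
    return np.log(lam) - lam * mean_x


def golden_section_max(f, a, b, tol=1e-8):
    """Maximize a unimodal function f on [a, b] by golden-section search."""
    invphi = (np.sqrt(5.0) - 1.0) / 2.0
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:        # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


lam_mle_analytic = 1.0 / mean_x
lam_mle_numerical = golden_section_max(mean_log_likelihood, 1e-6, 100.0)

print(lam_mle_numerical, lam_mle_analytic)
```

The search is valid here because the log-likelihood is strictly concave in $\lambda$, and therefore unimodal on any interval of positive values. A library optimizer would be the more common choice in practice.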

In [None]:
# [TODO]
lam_mle_analytic =   # Closed-form formula derived in previous cell
lam_mle_numerical =  # Numerical estimate from optimization

print(lam_mle_numerical, lam_mle_analytic)

## MAP estimate of $\lambda$
Assume the conjugate prior
$$
\lambda \sim \mathrm{Gamma}(\alpha,\beta),
$$
with density
$$
p(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta \lambda}.
$$

[TODO] Derive using markdown + latex:
1. the expression of the posterior assuming the Gamma prior for $\lambda$, 
2. the value of $\lambda$ that maximizes the posterior.  
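
A sketch of how this derivation could go, reusing the likelihood obtained in the maximum likelihood section and assuming $\alpha + N > 1$ so that the posterior mode lies at a positive value of $\lambda$. Dropping all factors that do not depend on $\lambda$,
$$
p\left(\lambda \vert \mathcal{D}\right)
\propto \mathcal{L}(\lambda; \mathcal{D}) \, p(\lambda)
\propto \lambda^N \exp\left(-\lambda \sum_{n=1}^N X_n\right) \lambda^{\alpha-1} e^{-\beta \lambda}
= \lambda^{\alpha + N - 1} \exp\left(-\left(\beta + \sum_{n=1}^N X_n\right) \lambda\right),
$$
which is the kernel of a Gamma density, so that
$$
\lambda \vert \mathcal{D} \sim \mathrm{Gamma}\left(\alpha + N, \ \beta + \sum_{n=1}^N X_n\right).
$$
Setting the derivative of the log-posterior with respect to $\lambda$ equal to zero gives
$$
\frac{\alpha + N - 1}{\lambda_{MAP}^*} - \left(\beta + \sum_{n=1}^N X_n\right) = 0
\quad \Longrightarrow \quad
\lambda_{MAP}^* = \frac{\alpha + N - 1}{\beta + \sum_{n=1}^N X_n}.
$$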

### Numerical maximization of the posterior 

We shall now use a numerical method to maximize the posterior.
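
One possible approach is sketched here, assuming the standard Gamma-exponential conjugacy result that the posterior is $\mathrm{Gamma}\left(\alpha + N, \beta + \sum_n X_n\right)$. The sketch uses Newton's method on the log-posterior, a choice made for this illustration and not necessarily the method intended for the exercise. The simulated sample, the seed and the starting point are assumptions; the hyperparameters are the ones used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
lam_true, N = 1.5, 100          # assumed true rate and sample size
X = rng.exponential(scale=1.0 / lam_true, size=N)

alpha, beta = 4.0, 0.1          # prior hyperparameters used in this notebook
a_post = alpha + N              # posterior shape
b_post = beta + X.sum()         # posterior rate


def grad(lam):
    """First derivative of the log-posterior (up to an additive constant)."""
    return (a_post - 1.0) / lam - b_post


def hess(lam):
    """Second derivative of the log-posterior."""
    return -(a_post - 1.0) / lam**2


lam = 1.0 / X.mean()            # start from the maximum likelihood estimate
for _ in range(50):
    step = grad(lam) / hess(lam)
    lam -= step                 # Newton update
    if abs(step) < 1e-12:
        break

lam_map_numerical = lam
lam_map_analytic = (a_post - 1.0) / b_post  # mode of the Gamma posterior

print(lam_map_analytic, lam_map_numerical)
```

For this objective the Newton update converges only when the starting point lies between zero and twice the posterior mode, which is why the sketch starts from the maximum likelihood estimate.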

In [None]:
# [TODO]
alpha = 4.0
beta = 0.1
lam_map_analytic =   # Closed-form formula derived in previous cell
lam_map_numerical =  # Numerical estimate from optimization


print(lam_map_analytic, lam_map_numerical)

### Questions
[TODO] Answer these questions using markdown + latex:
1. What are the values of the mean, the standard deviation, and the mode of the prior of $\lambda$?
2. How different are the ML and MAP estimates of $\lambda$?
    * When the sample size is $N = 100$, are they very different?
    * When the sample size is $N = 1000$, are they very different?
3. What do you conclude from the answers to the previous questions?
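
A small numerical experiment can help with these questions. The sketch below is an illustration under stated assumptions: it uses the textbook formulas for a Gamma distribution in the shape-rate parameterization (mean $\alpha/\beta$, standard deviation $\sqrt{\alpha}/\beta$, mode $(\alpha-1)/\beta$ for $\alpha > 1$), the standard conjugate MAP formula $(\alpha + N - 1)/(\beta + \sum_n X_n)$, and samples simulated with an arbitrary seed and an assumed true rate of 1.5.

```python
import numpy as np

alpha, beta = 4.0, 0.1  # prior hyperparameters used in this notebook

# Summary statistics of the Gamma(alpha, beta) prior (shape-rate form)
prior_mean = alpha / beta
prior_std = np.sqrt(alpha) / beta
prior_mode = (alpha - 1.0) / beta
print(prior_mean, prior_std, prior_mode)

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
lam_true = 1.5                  # assumed true rate

gap = {}
for N in (100, 1000):
    X = rng.exponential(scale=1.0 / lam_true, size=N)
    lam_ml = N / X.sum()                               # ML estimate
    lam_map = (alpha + N - 1.0) / (beta + X.sum())     # MAP estimate
    gap[N] = abs(lam_map - lam_ml)
    print(N, lam_ml, lam_map, gap[N])
```

The printed gaps show how the difference between the two estimates changes as the sample grows, which is the comparison the second question asks for.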


### Summary and conclusions 
[TODO] Summary and conclusions of what I have learned using markdown + latex:
1. [TODO: as many as necessary]
2. Fit a model pdf to an iid sample using the maximum likelihood method.
3. [TODO: as many as necessary]
4.
...
