Commit 1e3cc3b

Update MLE Lecture with More Details (#72)
* update mle lecture with more details
* remove capitalization
* revert user expressions
1 parent f1f16df commit 1e3cc3b

File tree

1 file changed: +33 -15 lines changed


lectures/mle.md

Lines changed: 33 additions & 15 deletions
@@ -11,7 +11,6 @@ kernelspec:
   name: python3
 ---
 
-
 # Maximum Likelihood Estimation
 
 ```{contents} Contents
@@ -47,15 +46,13 @@ Let's check the GPU we are running
 !nvidia-smi
 ```
 
-
 We will use 64 bit floats with JAX in order to increase the precision.
 
 ```{code-cell} ipython3
 jax.config.update("jax_enable_x64", True)
 ```
 
-
-
-## MLE with Numerical Methods (JAX)
+## MLE with numerical methods (JAX)
 
 Many distributions do not have nice, analytical solutions and therefore require
 numerical methods to solve for parameter estimates.
@@ -81,7 +78,6 @@ def logL(β):
     return -(β - 10) ** 2 - 10
 ```
 
-
 To find the value of $\frac{d \log \mathcal{L(\boldsymbol{\beta})}}{d \boldsymbol{\beta}}$, we can use [jax.grad](https://jax.readthedocs.io/en/latest/_autosummary/jax.grad.html) which auto-differentiates the given function.
 
 We further use [jax.vmap](https://jax.readthedocs.io/en/latest/_autosummary/jax.vmap.html) which vectorizes the given function i.e. the function acting upon scalar inputs can now be used with vector inputs.
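As a quick illustration of what these two transforms do with the toy log likelihood `logL` from the hunk above, the sketch below evaluates the auto-differentiated derivative on a small grid; the names `dlogL` and `β_grid` are illustrative, not taken from the lecture.

```python
import jax
import jax.numpy as jnp

def logL(β):
    # Toy log likelihood from the hunk above, maximised at β = 10
    return -(β - 10) ** 2 - 10

dlogL = jax.vmap(jax.grad(logL))     # derivative of logL, vectorised over β
β_grid = jnp.linspace(1.0, 20.0, 5)  # a few illustrative evaluation points

print(dlogL(β_grid))                 # -2(β - 10): positive left of 10, negative right
print(jax.grad(logL)(10.0))          # 0.0 at the maximiser, matching the plots below
```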
@@ -113,7 +109,6 @@ plt.axhline(c='black')
 plt.show()
 ```
 
-
 The plot shows that the maximum likelihood value (the top plot) occurs
 when $\frac{d \log \mathcal{L(\boldsymbol{\beta})}}{d \boldsymbol{\beta}} = 0$ (the bottom
 plot).
@@ -129,14 +124,29 @@ The Newton-Raphson algorithm finds a point where the first derivative is
 
 To use the algorithm, we take an initial guess at the maximum value,
 $\beta_0$ (the OLS parameter estimates might be a reasonable
-guess), then
+guess).
 
+Then we use the updating rule involving gradient information to iterate the algorithm until the error is sufficiently small or the algorithm reaches the maximum number of iterations.
 
 Please refer to [this section](https://python.quantecon.org/mle.html#mle-with-numerical-methods) for the detailed algorithm.
 
-Let's have a go at implementing the Newton-Raphson algorithm.
+Let's have a go at implementing the Newton-Raphson algorithm to calculate the maximum likelihood estimates of a Poisson regression.
 
-First, we'll create a `PoissonRegressionModel`.
+The Poisson regression has a joint pmf:
+
+$$
+f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \boldsymbol{\beta})
+= \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}
+$$
+
+$$
+\text{where}\ \mu_i
+= \exp(\mathbf{x}_i' \boldsymbol{\beta})
+= \exp(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_k x_{ik})
+$$
+
+We create a `namedtuple` to store the observed values
 
 ```{code-cell} ipython3
 PoissonRegressionModel = namedtuple('PoissonRegressionModel', ['X', 'y'])
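The added text above refers to "the updating rule involving gradient information" without spelling it out in this diff. For reference, the standard Newton-Raphson step on the log likelihood takes the form

$$
\boldsymbol{\beta}_{k+1} = \boldsymbol{\beta}_k - H^{-1}(\boldsymbol{\beta}_k) G(\boldsymbol{\beta}_k)
$$

where $G(\boldsymbol{\beta}_k)$ and $H(\boldsymbol{\beta}_k)$ are the gradient and Hessian of $\log \mathcal{L}(\boldsymbol{\beta})$ evaluated at $\boldsymbol{\beta}_k$.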
@@ -149,8 +159,18 @@ def create_poisson_model(X, y):
     return PoissonRegressionModel(X=X, y=y)
 ```
 
+The log likelihood function of the Poisson regression is
+
+$$
+\underset{\beta}{\max} \Big(
+\sum_{i=1}^{n} y_i \log{\mu_i} -
+\sum_{i=1}^{n} \mu_i -
+\sum_{i=1}^{n} \log y_i! \Big)
+$$
+
+The full derivation can be found [here](https://python.quantecon.org/mle.html#id2).
 
-At present, JAX doesn't have an implementation to compute factorial directly.
+The log likelihood function involves factorial, but JAX doesn't have a readily available implementation to compute factorial directly.
 
 In order to compute the factorial efficiently such that we can JIT it, we use

@@ -162,6 +182,8 @@ since [jax.lax.lgamma](https://jax.readthedocs.io/en/latest/_autosummary/jax.lax
 
 The following function `jax_factorial` computes the factorial using this idea.
 
+Let's define this function in Python
+
 ```{code-cell} ipython3
 @jax.jit
 def _factorial(n):
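The body of `_factorial` is cut off by this hunk. A minimal sketch of the idea it relies on, assuming the standard relation $n! = \Gamma(n + 1)$ and the `jax.lax.lgamma` primitive mentioned in the hunk header; the name `_factorial_sketch` is mine, and the real cell may round or cast the result differently.

```python
import jax
import jax.numpy as jnp

@jax.jit
def _factorial_sketch(n):
    # n! = Γ(n + 1); jax.lax.lgamma returns log Γ, so exponentiate it back.
    # n must be a floating-point value for lgamma.
    return jnp.exp(jax.lax.lgamma(n + 1.0))

factorial_sketch = jax.vmap(_factorial_sketch)
print(factorial_sketch(jnp.array([3.0, 5.0, 7.0])))  # approximately [6, 120, 5040]
```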
@@ -171,7 +193,7 @@ jax_factorial = jax.vmap(_factorial)
 ```
 
 
-Let's define the Poisson Regression's log likelihood function.
+Now we can define the log likelihood function in Python
 
 ```{code-cell} ipython3
 @jax.jit
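The rest of this code cell lies outside the hunk. Below is a sketch of how the log likelihood could be written from the formula above; the `(β, model)` signature matches the later call `G_poisson_logL(β_hat, poi)`, but the function name, the body, and the use of `jax.lax.lgamma` in place of `jax_factorial` are my assumptions.

```python
import jax
import jax.numpy as jnp

@jax.jit
def poisson_logL_sketch(β, model):
    # μ_i = exp(x_i' β); log L = Σ_i [ y_i log μ_i - μ_i - log(y_i!) ]
    y = model.y
    μ = jnp.exp(model.X @ β)
    log_y_factorial = jax.lax.lgamma(y + 1.0)  # log(y_i!) via the lgamma identity
    return jnp.sum(y * jnp.log(μ) - μ - log_y_factorial)
```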
@@ -198,7 +220,6 @@ G_poisson_logL = jax.grad(poisson_logL)
 H_poisson_logL = jax.jacfwd(G_poisson_logL)
 ```
 
-
 Our function `newton_raphson` will take a `PoissonRegressionModel` object
 that has an initial guess of the parameter vector $\boldsymbol{\beta}_0$.
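Only the tail of `newton_raphson` appears in the next hunk, although its signature shows up in that hunk's header. A sketch of the loop it likely wraps around the update rule is given below; the body is an assumption (the lecture's actual cell may report progress differently), and it relies on `G_poisson_logL` and `H_poisson_logL` from the hunk above.

```python
import jax.numpy as jnp

def newton_raphson_sketch(model, β, tol=1e-3, max_iter=100, display=True):
    i, error = 0, tol + 1
    while i < max_iter and error > tol:
        G = G_poisson_logL(β, model)        # gradient at the current β
        H = H_poisson_logL(β, model)        # Hessian at the current β
        β_new = β - jnp.linalg.solve(H, G)  # Newton-Raphson step
        error = jnp.max(jnp.abs(β_new - β))
        β = β_new
        if display:
            print(f"iteration {i}: β = {β}")
        i += 1
    return β
```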

@@ -239,7 +260,6 @@ def newton_raphson(model, β, tol=1e-3, max_iter=100, display=True):
     return β
 ```
 
-
 Let's try out our algorithm with a small dataset of 5 observations and 3
 variables in $\mathbf{X}$.

@@ -262,7 +282,6 @@ poi = create_poisson_model(X, y)
 β_hat = newton_raphson(poi, init_β, display=True)
 ```
 
-
 As this was a simple model with few observations, the algorithm achieved
 convergence in only 7 iterations.

@@ -272,7 +291,6 @@ The gradient vector should be close to 0 at $\hat{\boldsymbol{\beta}}$
 G_poisson_logL(β_hat, poi)
 ```
 
-
 ## MLE with `statsmodels`
 
 We’ll use the Poisson regression model in `statsmodels` to verify the results
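The verification mentioned here usually amounts to fitting the same Poisson regression by maximum likelihood with `statsmodels` and comparing the coefficients with `β_hat`. A sketch, assuming `X` and `y` are the arrays built in the earlier cells:

```python
import numpy as np
import statsmodels.api as sm

# Fit the same Poisson regression with statsmodels' built-in MLE routine
# and compare the estimated coefficients with β_hat from newton_raphson.
stats_poisson = sm.Poisson(np.asarray(y), np.asarray(X)).fit()
print(stats_poisson.summary())
```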
