# <center>Crash course 4: Generalized linear models</center>

### <center>Alfred Galichon (NYU & Sciences Po)</center>
## <center>'math+econ+code' masterclass on optimal transport and economic applications</center>
#### <center>With python code examples</center>
© 2018-2022 by Alfred Galichon. Past and present support from NSF grant DMS-1716489 and ERC grant CoG-866274 is acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/theteam).

**If you reuse material from this masterclass, please cite as:**<br>
Alfred Galichon, 'math+econ+code' masterclass on optimal transport and economic applications, January 2022. https://github.com/math-econ-code/mec_optim

## References
* McCullagh and Nelder (1989). Generalized Linear Models (2nd ed.). Chapman and Hall/CRC.
* Hastie, Tibshirani, and Friedman (2001). The Elements of Statistical Learning. Springer.
* The scikit-learn library. www.scikit-learn.org.

    

# Generalized linear models
## Setting

* In many settings, an economic model makes predictions about the
conditional mean of a dependent random variable $y$ given a vector of explanatory random
variables $x$.

* In the case of linear regression, we have
$$E\left[  y|x\right]  =x^{\top}\beta;$$
however, we shall encounter situations where a more general specification is useful.

* This leads us to *generalized linear models* (GLM), which are specified as

$$E\left[  y|x\right]  =g^{-1}\left(  x^{\top}\beta\right)$$

where $g$ is a continuous and strictly increasing function called the *link function*, defined on the set of possible values of the conditional mean (for instance $g=\ln$ on $(0,+\infty)$ in the Poisson case below).

* In addition, we shall often specify $Var\left(  y|x\right)  =V\left(
g^{-1}\left(  x^{\top}\beta\right)  \right)$ for a given *variance function* $V$.
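
As a minimal illustration of the two ingredients of a GLM (the link $g$ and the variance function $V$), the sketch below tabulates three common choices in numpy. The dictionary name `glm_families` and the Bernoulli/logit entry are illustrative assumptions, not part of any library API.

```python
import numpy as np

# Hypothetical table: family -> (link g, inverse link g^{-1}, variance function V(mu))
glm_families = {
    # OLS: identity link, constant variance (sigma^2 normalized to 1)
    "gaussian": (lambda mu: mu, lambda eta: eta, lambda mu: np.ones_like(mu)),
    # Poisson regression: log link, variance equal to the mean
    "poisson": (np.log, np.exp, lambda mu: mu),
    # Binary outcome: logit link, variance mu * (1 - mu)
    "bernoulli": (
        lambda mu: np.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + np.exp(-eta)),
        lambda mu: mu * (1.0 - mu),
    ),
}

x = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])  # each row is x_i^T
beta = np.array([0.2, -0.3])
eta = x @ beta  # linear index x_i^T beta

for name, (g, g_inv, V) in glm_families.items():
    mu = g_inv(eta)  # conditional mean E[y|x] = g^{-1}(x^T beta)
    print(f"{name:9s} E[y|x] = {mu.round(3)}  Var(y|x) = {V(mu).round(3)}")
```
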

We shall use `linear_model` from the scikit-learn library `sklearn`.

from sklearn import linear_model

## Example 1: ordinary least squares (OLS)




* In least squares (OLS), we have $$y=x^{\top}\beta+\epsilon$$
with $E\left[  \epsilon|x\right]  =0$, in which case $g\left(  z\right)  =z$.

* Additionally, assuming $E\left[  \epsilon^{2}|x\right]  =\sigma^{2}$, we
have 
$$Var\left(  y|x\right)  =\sigma^{2}.$$
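
A quick numerical check of the OLS case, as a sketch on simulated data (sample size, coefficients and $\sigma$ are arbitrary choices): `np.linalg.lstsq` solves the normal equations $X^\top(y-X\beta)=0$, and the residual variance estimates $Var(y|x)=\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([1.0, 2.0])
sigma = 0.5
y = X @ beta_true + sigma * rng.normal(size=n)  # y = x^T beta + eps, E[eps|x] = 0

# OLS solves the normal equations X^T (y - X beta) = 0
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])  # estimate of sigma^2
print(beta_hat, sigma2_hat)
```
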




### OLS regression in scikit-learn

The following example is taken from the `scikit-learn` documentation.

# example taken from https://scikit-learn.org/0.15/modules/linear_model.html
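clf = linear_model.LinearRegression()
# The two features are identical (collinear); the least-squares solver returns
# the minimum-norm solution, which splits the unit slope equally between them.
clf.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
clf.coef_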


array([0.5, 0.5])

## Example 2: Poisson regression



* Recall that a Poisson distribution with parameter $\theta\in(0,+\infty)$ has
probability mass function

$$\pi_{z|\theta}=\frac{e^{-\theta}\theta^{z}}{z!}$$

over $z\in\left\{  0,1,2,...\right\}$. Both its expectation and its variance equal
$\theta$.

* Assume that conditional on $x$, $y$ has a Poisson distribution of
parameter $\theta=\exp\left(  x^{\top}\beta\right)  $. Then
$$ E\left[  y|x\right]  =\exp\left(  x^{\top}\beta\right)$$
so in this case $g=\ln$.

* Note that we then get $$Var\left(  y|x\right)  =\exp\left(  x^{\top}\beta\right)$$
which may be overly restrictive (more on this later).
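
The fact that the Poisson distribution recalled above has expectation and variance both equal to $\theta$ can be checked numerically by summing its probability mass function over a truncated support. In this sketch the value $\theta=3.5$ and the truncation at 60 are arbitrary; the neglected tail mass is astronomically small.

```python
import math
import numpy as np

theta = 3.5
z = np.arange(60)  # truncated support {0, ..., 59}
# log pi_{z|theta} = -theta + z log(theta) - log(z!), computed in logs for stability
log_pmf = -theta + z * np.log(theta) - np.array([math.lgamma(k + 1) for k in z])
pmf = np.exp(log_pmf)

mean = np.sum(z * pmf)
var = np.sum((z - mean) ** 2 * pmf)
print(pmf.sum(), mean, var)
```
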


## Poisson regression



* The sample log-likelihood is
$$
\sum_{i}\left(  -\exp\left(  x_{i}^{\top}\beta\right)  +x_{i}^{\top}\beta y_{i}-\ln\left(  y_{i}!\right)  \right)
$$
and therefore, since the terms $\ln\left(  y_{i}!\right)$ do not depend on $\beta$, maximum likelihood yields the Poisson regression
$$
\max_{\beta}\left\{  \sum_{i}\left(  -\exp\left(  x_{i}^{\top}\beta\right)
+x_{i}^{\top}\beta y_{i}\right)  \right\}
$$


* First order conditions give
$$
\sum_{i}\left(  y_{i}-\exp\left(  x_{i}^{\top}\beta\right)  \right)  x_{i}=0.
$$
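
The first-order conditions can be solved by Newton's method. In the sketch below, the function name `poisson_mle` and the simulated design are illustrative assumptions; it uses the gradient $\sum_i(y_i-\exp(x_i^\top\beta))x_i$ and the Hessian $-\sum_i\exp(x_i^\top\beta)x_ix_i^\top$, halves the step if needed as a safeguard, and checks that the first-order conditions hold at the estimate.

```python
import numpy as np

def poisson_mle(X, y, tol=1e-10, max_iter=100):
    """Maximize sum_i [y_i x_i^T b - exp(x_i^T b)] by Newton's method with step halving."""
    def objective(b):
        eta = X @ b
        return y @ eta - np.exp(eta).sum()

    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)              # sum_i (y_i - exp(x_i^T b)) x_i
        info = X.T @ (mu[:, None] * X)     # minus the Hessian: sum_i exp(x_i^T b) x_i x_i^T
        step = np.linalg.solve(info, grad)
        t = 1.0
        # halve the step until the concave objective does not decrease
        while objective(beta + t * step) < objective(beta) and t > 1e-12:
            t /= 2
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))  # y | x ~ Poisson(exp(x^T beta))

beta_hat = poisson_mle(X, y)
foc = X.T @ (y - np.exp(X @ beta_hat))  # first-order conditions, numerically zero
print(beta_hat, foc)
```
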



### Poisson regression in scikit-learn

The following example is taken from the `scikit-learn` documentation.

# from https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PoissonRegressor.html

clf = linear_model.PoissonRegressor()
X = [[1, 2], [2, 3], [3, 4], [4, 3]]
y = [12, 17, 22, 21]
clf.fit(X, y)
print('Score       = ', clf.score(X, y))
print('Coef        = ', clf.coef_)
print('Intercept   = ', clf.intercept_)
print('Predictions =', clf.predict([[1, 1], [3, 4]]))


Score       =  0.9904855148891651
Coef        =  [0.12109212 0.15836976]
Intercept   =  2.0885914156053205
Predictions = [10.67658784 21.87505182]
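
Note that `PoissonRegressor` adds an L2 penalty on the coefficients (not on the intercept), with strength `alpha=1.0` by default, so the coefficients printed above are shrunk and are not the maximum-likelihood estimate of the previous section; `alpha=0` removes the penalty. As a cross-check, the sketch below computes the unpenalized estimate on the same data with plain numpy (a Newton iteration written for this illustration, with an explicit intercept column) and verifies the first-order conditions. Its numbers are expected to differ from the penalized output above.

```python
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 4], [4, 3]], dtype=float)
y = np.array([12, 17, 22, 21], dtype=float)
Z = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column

def objective(b):
    eta = Z @ b
    return y @ eta - np.exp(eta).sum()

b = np.zeros(Z.shape[1])
for _ in range(200):
    mu = np.exp(Z @ b)
    step = np.linalg.solve(Z.T @ (mu[:, None] * Z), Z.T @ (y - mu))  # Newton direction
    t = 1.0
    while objective(b + t * step) < objective(b) and t > 1e-12:
        t /= 2  # step halving keeps the iteration from overshooting
    b = b + t * step
    if np.max(np.abs(t * step)) < 1e-12:
        break

intercept, coef = b[0], b[1:]
foc = Z.T @ (y - np.exp(Z @ b))  # unpenalized first-order conditions
print("Intercept   = ", intercept)
print("Coef        = ", coef)
print("Predictions =", np.exp(intercept + np.array([[1, 1], [3, 4]]) @ coef))
```
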


## ML inference in Poisson regression (ctd)


* Recall that if $E_{P_{n}}\log p\left(  \beta,z\right)$ is the
log-likelihood of the sample, and setting $l\left(  \beta,z\right)  =\log
p\left(  \beta,z\right)$, the estimator $\beta_{n}$ and the true parameter $\beta$ satisfy

$$
E_{P_{n}}\left[  \partial_{\beta}l\left(  \beta_{n},z\right)  \right]  =0
$$

$$
E_{P}\left[  \partial_{\beta}l\left(  \beta,z\right)  \right]  =0
$$

thus

$$
E_{P}\left[  \partial_{\beta}l\left(  \beta_{n},z\right)  \right]
-E_{P}\left[  \partial_{\beta}l\left(  \beta,z\right)  \right]  =E_{P}\left[
\partial_{\beta}l\left(  \beta_{n},z\right)  \right]  -E_{P_{n}}\left[
\partial_{\beta}l\left(  \beta_{n},z\right)  \right]
$$

therefore, expanding the left-hand side to first order around $\beta$, and up to asymptotically negligible terms,

$$
E_{P}\left[  \partial_{\beta}^{2}l\left(  \beta,z\right)  \right]  \left(  \beta_{n}-\beta\right)  \approx-\frac{1}{\sqrt{n}}g_{n}\left(  \partial_{\beta}l\left(  \beta,z\right)  \right)
$$
where $g_{n}f=\sqrt{n}\left(  E_{P_{n}}f-E_{P}f\right)$.

* Thus
$$
\beta_{n}-\beta\approx-\frac{1}{\sqrt{n}}\left(  E_{P}\left[  \partial_{\beta}^{2}l\left(  \beta,z\right)  \right]  \right)  ^{-1}g_{n}\left(
\partial_{\beta}l\left(  \beta,z\right)  \right)
$$



* Hence
$$
V \left(\beta_{n}-\beta\right)   =\frac{1}{n}\left(  E_{P}\left[
\partial_{\beta}^{2}l\left(  \beta,z\right)  \right]  \right)  ^{-1}  \times E_{P}\left(  \partial_{\beta}l\left(  \beta,z\right)  \left(
\partial_{\beta}l\left(  \beta,z\right)  \right)  ^{\top}\right)  \times\left(  E_{P}\left[  \partial_{\beta}^{2}l\left(  \beta,z\right)
\right]  \right)  ^{-1}
$$


* And because, when the model is correctly specified, the information matrix equality holds at the true parameter,
$$
E_{P}\left(  \partial_{\beta}l\left(  \beta,z\right)  \left(  \partial_{\beta
}l\left(  \beta,z\right)  \right)  ^{\top}\right)  =-E_{P}\left[
\partial_{\beta}^{2}l\left(  \beta,z\right)  \right]  ,
$$
we have
$$
V\left(  \beta_{n}-\beta\right)  =\frac{1}{n}\left(  -E_{P}\left[
\partial_{\beta}^{2}l\left(  \beta,z\right)  \right]  \right)  ^{-1}.
$$
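
For the Poisson model, $-\partial^2_\beta l=\exp(x^\top\beta)xx^\top$, so the asymptotic variance is $\frac{1}{n}\left(E[\exp(x^\top\beta)xx^\top]\right)^{-1}$. The sketch below compares it with the Monte Carlo variance of $\beta_n$ across simulated samples. The design $x=(1,\xi)$ with $\xi\sim N(0,1)$ is an assumption chosen because, for $\beta=(a,b)$, the expectation then has the closed form $e^{a+b^2/2}\begin{pmatrix}1&b\\ b&1+b^2\end{pmatrix}$; the sample size, number of replications and parameter values are arbitrary.

```python
import numpy as np

def poisson_mle(X, y, n_iter=30):
    # Newton iterations started from a constant fit (log of the mean of y)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        beta = beta + np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    return beta

rng = np.random.default_rng(2)
a, b = 0.5, 0.3
n, reps = 1000, 400

# Closed form of E[exp(x^T beta) x x^T] for x = (1, xi), xi ~ N(0, 1)
info = np.exp(a + b**2 / 2) * np.array([[1.0, b], [b, 1.0 + b**2]])
V_theory = np.linalg.inv(info) / n

estimates = np.empty((reps, 2))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = rng.poisson(np.exp(X @ np.array([a, b])))
    estimates[r] = poisson_mle(X, y)

V_mc = np.cov(estimates, rowvar=False)     # Monte Carlo variance of beta_n
ratio = np.diag(V_mc) / np.diag(V_theory)  # should be close to 1
print(ratio)
```
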




## Estimation of GLM



* Actually, we don't need to assume that $y\sim Poisson\left(
\exp(x^{\top}\beta)\right)  $ to estimate $\beta$.

* Consider $X$ the matrix obtained by stacking the rows $x_{i}^{\top}$ on
top of each other. Compute

$$
\max_{\beta}\left\{  y^{\top}X\beta-1^{\top}\exp\left(  X\beta\right)
\right\}
$$

and, at the maximizer $\beta$, define $\overline{y}=\exp\left(  X\beta\right)$ as the predictor of $y$.
The first-order conditions give

$$
\sum_{i}y_{i}X_{ik}=\sum_{i}\overline{y}_{i}X_{ik}~\forall k
$$
and therefore $\beta$ is obtained by matching the predicted moments with the
observed ones:

$$
\mathbb{E}\left[  y_{i}x_{i}\right]  =\mathbb{E}\left[  \overline{y}_{i}x_{i}\right]  .
$$
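
To illustrate that the Poisson point estimate does not require Poisson data, the sketch below draws non-integer, Gamma-distributed outcomes with $E[y|x]=\exp(x^\top\beta)$ (an assumption made only for this example), solves the program above by Newton's method, and checks the moment-matching conditions $\sum_i y_iX_{ik}=\sum_i\overline{y}_iX_{ik}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
mean = np.exp(X @ beta_true)
y = rng.gamma(shape=2.0, scale=mean / 2.0)  # continuous y >= 0 with E[y|x] = exp(x^T beta)

beta = np.zeros(2)
beta[0] = np.log(y.mean())  # start from a constant fit
for _ in range(50):  # Newton iterations on y^T X b - 1^T exp(X b)
    ybar = np.exp(X @ beta)
    beta = beta + np.linalg.solve(X.T @ (ybar[:, None] * X), X.T @ (y - ybar))

ybar = np.exp(X @ beta)  # predictor of y
print(beta)
print(X.T @ y, X.T @ ybar)  # observed vs predicted moments
```
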


## Inference in GLM


* While the point estimate is unchanged with respect to the Poisson regression, the
inference changes as soon as one departs from the assumption that
$Var\left(  y|x\right)  =\exp\left(  x^{\top}\beta\right)$. Assume instead a general conditional variance $Var\left(  y|x\right)
=V\left(  y|x\right)$.

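* The estimation of $\beta$ is now viewed as an
*M-estimation* procedure
$$
\max_{\beta}\frac{1}{n}\sum_{i=1}^{n}F\left(  z_{i},\beta\right)  ,
$$
with $z_{i}=\left(  x_{i},y_{i}\right)$ and $F\left(  z_{i},\beta\right)  =y_{i}x_{i}^{\top}\beta-\exp\left(  x_{i}^{\top}\beta\right)$.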


* The derivation done for MLE applies, replacing $\partial_{\beta}l\left(
\beta,z_{i}\right)  =\partial_{\beta}\log p\left(  \beta,z_{i}\right)$ by
$\partial_{\beta}l\left(  \beta,z_{i}\right)  =\left(  y_{i}-\exp\left(
x_{i}^{\top}\beta\right)  \right)  x_{i}$, with the provision that the information matrix equality no longer holds: in general
$-E_{P}\left[  \partial_{\beta}^{2}l\left(  \beta,z\right)  \right]  \neq
E_{P}\left[  \partial_{\beta}l\left(  \beta,z\right)  \partial_{\beta}l\left(
\beta,z\right)  ^{\top}\right]$. Hence

$$
V\left(  \beta_{n}-\beta\right)   =\frac{1}{n}\left(  E_{P}\left[
\partial_{\beta}^{2}l\left(  \beta,z\right)  \right]  \right)  ^{-1} \times E_{P}\left(  \partial_{\beta}l\left(  \beta,z\right)  \left(
\partial_{\beta}l\left(  \beta,z\right)  \right)  ^{\top}\right) \times\left(  E_{P}\left[  \partial_{\beta}^{2}l\left(  \beta,z\right)
\right]  \right)  ^{-1}
$$


* We have
$$
E_{P}\left[  \partial_{\beta}^{2}l\left(  \beta,z\right)  \right]  =-E\left[
\exp\left(  x^{\top}\beta\right)  xx^{\top}\right]
$$
and, by the law of iterated expectations,

$$
E_{P}\left[  \partial_{\beta}l\left(  \beta,z\right)  \left(  \partial_{\beta
}l\left(  \beta,z\right)  \right)  ^{\top}\right]  =E\left[  \left(
y-\exp\left(  x^{\top}\beta\right)  \right)  ^{2}xx^{\top}\right]  =E\left[  V\left(  y|x\right)  xx^{\top}\right]  .
$$
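
The sketch below compares the two variance estimates on simulated overdispersed counts. As an illustrative assumption, $y$ is drawn as Poisson with a Gamma-distributed multiplicative shock of mean one and shape $k=1$, so that $V(y|x)=\mu+\mu^2/k$ with $\mu=\exp(x^\top\beta)$ exceeds the Poisson variance. The "Poisson" standard errors use $\frac{1}{n}\left(E[\exp(x^\top\beta)xx^\top]\right)^{-1}$ only, while the sandwich replaces $E[V(y|x)xx^\top]$ by its sample analogue with squared residuals.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 2000, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu_true = np.exp(X @ np.array([0.5, 0.3]))
y = rng.poisson(mu_true * rng.gamma(shape=k, scale=1.0 / k, size=n))  # overdispersed counts

# Poisson point estimate by Newton's method
beta = np.array([np.log(y.mean()), 0.0])
for _ in range(50):
    mu = np.exp(X @ beta)
    beta = beta + np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))

mu = np.exp(X @ beta)
H = X.T @ (mu[:, None] * X) / n               # estimate of E[exp(x^T beta) x x^T]
S = X.T @ (((y - mu) ** 2)[:, None] * X) / n  # estimate of E[(y - exp(x^T beta))^2 x x^T]
H_inv = np.linalg.inv(H)

V_poisson = H_inv / n               # valid only if Var(y|x) = exp(x^T beta)
V_sandwich = H_inv @ S @ H_inv / n  # robust to a general V(y|x)
se_poisson = np.sqrt(np.diag(V_poisson))
se_sandwich = np.sqrt(np.diag(V_sandwich))
print(se_poisson, se_sandwich)
```

With this data-generating process the sandwich standard errors come out noticeably larger, reflecting the extra variance that the Poisson formula ignores.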


## Poisson regression and duality


Consider $y\in\mathbb{R}_{+}^{n}$, $\beta\in\mathbb{R}^{k}$ and $X$ an $n\times k$ matrix.


> <span style="color:yellow"> **Theorem (Poisson duality)**. The primal problem
$$
\max_{\beta}\left\{  y^{\top}X\beta-1^{\top}\exp\left(  X\beta\right)
\right\}
$$
has dual
$$
\min_{\bar{y}\in\mathbb{R}_{+}^{n}}  \bar{y}^{\top}\left(  \ln\bar
{y}-1\right) \\
s.t.   X^{\top}\left(  y-\bar{y}\right)  =0.
$$</span>


**Proof**. Start from the latter expression and write the Lagrangian of this problem, with multiplier $\beta$ on the constraint; exchanging the min and the max gives

$$
\min_{\bar{y}\geq0}\max_{\beta}\left\{  \bar{y}^{\top}\left(  \ln\bar{y}-1\right)
-\left(  \bar{y}-y\right)  ^{\top}X\beta\right\}  =\max_{\beta}\left\{  y^{\top}X\beta+\min_{\bar{y}\geq0}\left\{  \bar{y}^{\top}\left(  \ln\bar{y}-1\right)  -\bar{y}^{\top}X\beta\right\}  \right\}
$$

The inner minimization has first-order condition $\ln\bar{y}=X\beta$, at which $\bar{y}^{\top}\left(  \ln\bar{y}-1\right)
-\bar{y}^{\top}X\beta=-\bar{y}^{\top}1=-1^{\top}\exp\left(  X\beta\right)$,
and hence the value is

$$
\max_{\beta}y^{\top}X\beta-1^{\top}\exp\left(  X\beta\right)  .
$$
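
The duality can be checked numerically, in a sketch on random data (sizes and distributions are arbitrary): solving the primal by Newton's method and setting $\bar{y}=\exp(X\beta)$ gives a point satisfying the dual constraint $X^\top(y-\bar{y})=0$ whose dual objective equals the primal value, and moving $\bar{y}$ along a direction $w$ with $X^\top w=0$ (which preserves feasibility) can only increase the dual objective.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 50, 3
X = rng.normal(size=(n, k)) / 2
y = rng.uniform(0.5, 3.0, size=n)  # any y in R_+^n, not necessarily integer

def primal(b):
    return y @ (X @ b) - np.exp(X @ b).sum()

def dual(ybar):
    return ybar @ (np.log(ybar) - 1)

# Solve the primal by Newton's method with step halving
beta = np.zeros(k)
for _ in range(100):
    mu = np.exp(X @ beta)
    step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    t = 1.0
    while primal(beta + t * step) < primal(beta) and t > 1e-12:
        t /= 2
    beta = beta + t * step

ybar = np.exp(X @ beta)  # candidate dual solution
print(primal(beta), dual(ybar), np.abs(X.T @ (y - ybar)).max())

# Feasible perturbation: w in the null space of X^T keeps X^T (y - ybar) = 0
U, _, _ = np.linalg.svd(X, full_matrices=True)
w = U[:, k:] @ rng.normal(size=n - k)
w *= 0.5 * ybar.min() / np.abs(w).max()  # keeps ybar + w strictly positive
print(dual(ybar + w) - dual(ybar))  # positive: ybar minimizes the dual
```
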

# Discrete choice models

## Multinomial logit model and logistic regression


* Consider the logit model, in which individual $i$ derives utility
$$
U_{ij}=\sum_{k}\Phi_{ij}^{k}\lambda_{k}+\varepsilon_{ij}$$
from alternative $j$, where the $\varepsilon_{ij}$ are iid Gumbel random variables, i.e. with c.d.f.
$\exp\left(  -\exp\left(  -x\right)  \right)$.

* The conditional probability that $i$ chooses $j$ is
$$
\pi_{ij}=\frac{\exp\left(  \sum_{k}\Phi_{ij}^{k}\lambda_{k}\right)  }{\sum
_{j^{\prime}}\exp\left(  \sum_{k}\Phi_{ij^{\prime}}^{k}\lambda_{k}\right)  }
$$
and therefore the conditional log-likelihood of observing $i$ choose $j$ is that of the logistic
regression
$$
l_{ij}\left(  \lambda\right)  =\log\pi_{ij}=\sum_{k}\Phi_{ij}^{k}\lambda
_{k}-\log\sum_{j^{\prime}}\exp\left(  \sum_{k}\Phi_{ij^{\prime}}^{k}\lambda_{k}\right)
$$


* As a result, if $J\left(  i\right)$ is the actual choice of $i$, and
$\hat{\pi}_{ij}=1\left\{  j=J\left(  i\right)  \right\}$, the log-likelihood of the logistic
regression can be expressed as

$$
l\left(  \lambda\right)  =\hat{\pi}^{\top}\Phi\lambda-\sum_{i}\log\sum_{j}\exp\left(  \left(  \Phi\lambda\right)  _{ij}\right)
$$


* This is *almost*, but *not quite*, the form of a GLM: notice the $\log$ in front of the second sum. To make the precise connection with GLM/Poisson regression, we
need to introduce *individual fixed effects*.



## Logistic regression as a GLM


* Introduce a fixed effect $u_{i}$ and let $\beta=\left(  \lambda^{\top
},u^{\top}\right)  ^{\top}$. We rewrite $\left(  \lambda,u\right)
\rightarrow\left(  \left(  \Phi\lambda\right)  _{ij}-u_{i}\right)  _{ij}$ in a
matrix form by defining

$$
X=%
\begin{pmatrix}
\Phi, -I_{n}\otimes 1_{n}%
\end{pmatrix}
$$
where $\otimes$ is the Kronecker product and we have
$$
X\beta=vec\left(  \left(  \left(  \Phi\lambda\right)  _{ij}-u_{i}\right)
_{ij}\right)  .
$$


* The Poisson regression of $\hat{\pi}_{ij}$ on $X$ yields
$$
\max_{\lambda,u}\left\{  -\sum_{ij}\exp\left(  \left(  \Phi\lambda\right)
_{ij}-u_{i}\right)  +\sum_{ij}\hat{\pi}_{ij}\left(  \left(  \Phi
\lambda\right)  _{ij}-u_{i}\right)  \right\}
$$
therefore, since $\sum_{j}\hat{\pi}_{ij}=1$ for each $i$,
$$
\max_{\lambda,u}\left\{  -\sum_{ij}\exp\left(  \left(  \Phi\lambda\right)
_{ij}-u_{i}\right)  +\sum_{ij}\hat{\pi}_{ij}\left(  \Phi\lambda\right)
_{ij}-\sum_{i}u_{i}\right\}  .
$$

* Taking first-order conditions in $u_{i}$ we get
$$
\sum_{j}\exp\left(  \left(  \Phi\lambda\right)  _{ij}-u_{i}\right)  =1
$$


* Therefore, $u_{i}=\log\sum_{j}\exp\left(  \left(  \Phi\lambda\right)
_{ij}\right)$, the first sum in the objective becomes a constant, and the problem becomes the MLE in the multinomial logit
model
$$
\max_{\lambda}\left\{  \sum_{ij}\hat{\pi}_{ij}\left(  \Phi\lambda\right)
_{ij}-\sum_{i}\log\sum_{j}\exp\left(  \left(  \Phi\lambda\right)
_{ij}\right)  \right\}  .
$$


* To summarize: 
> <span style="color:yellow">**Logistic regression = GLM + fixed effect**.</span>
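
The equivalence can be verified numerically. The sketch below uses simulated data with $m$ alternatives per individual (a notation introduced here) and assumes $vec$ stacks the entries with $i$ varying slowest, the ordering under which the fixed-effect block is $-I_n\otimes 1_m$. It fits the multinomial logit by Newton's method, sets $u_i=\log\sum_j\exp((\Phi\lambda)_{ij})$, and checks that $\beta=(\lambda,u)$ satisfies the first-order conditions $X^\top(\hat\pi-\exp(X\beta))=0$ of the Poisson regression.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, K = 300, 3, 2                      # individuals, alternatives, regressors
Phi = rng.normal(size=(n, m, K))         # Phi[i, j, k] = Phi_ij^k
lam_true = np.array([1.0, -0.5])

def choice_probs(lam):
    s = Phi @ lam                                  # (Phi lambda)_ij
    s = s - s.max(axis=1, keepdims=True)           # stabilize the softmax
    p = np.exp(s)
    return p / p.sum(axis=1, keepdims=True)

# Simulate choices J(i) and the indicators pi_hat_ij = 1{j = J(i)}
P_true = choice_probs(lam_true)
J = np.array([rng.choice(m, p=P_true[i]) for i in range(n)])
pi_hat = np.zeros((n, m))
pi_hat[np.arange(n), J] = 1.0

# Multinomial logit MLE by Newton's method
lam = np.zeros(K)
for _ in range(30):
    P = choice_probs(lam)
    grad = np.einsum("ij,ijk->k", pi_hat - P, Phi)
    phibar = np.einsum("ij,ijk->ik", P, Phi)       # mean of Phi_ij under P, per individual
    info = np.einsum("ij,ijk,ijl->kl", P, Phi, Phi) - phibar.T @ phibar
    lam = lam + np.linalg.solve(info, grad)

# Poisson design X = (Phi, -I_n kron 1_m), rows ordered (i, j) with i slowest
X = np.hstack([Phi.reshape(n * m, K), -np.kron(np.eye(n), np.ones((m, 1)))])
s = Phi @ lam
u = np.log(np.exp(s).sum(axis=1))                  # u_i = log sum_j exp((Phi lambda)_ij)
beta = np.concatenate([lam, u])
y = pi_hat.reshape(-1)
poisson_foc = X.T @ (y - np.exp(X @ beta))
print(lam, np.abs(poisson_foc).max())
```
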

## Discrete choice application 

See Jupyter notebook.

# Trade models

## Gravity equation


* The gravity model seeks to explain the trade flows $\hat{\pi}_{ij}$
from country $i$ to country $j$ using various measures of proximity between
these countries. (We assume $\hat{\pi}_{ii}=0$.)

* We denote
$$
\left\{
\begin{array}{l}
p_{i}=\sum_{j}\hat{\pi}_{ij}\\
q_{j}=\sum_{i}\hat{\pi}_{ij}
\end{array}
\right.
$$

the total volume of the exports of country $i$ and of the imports of country
$j$, respectively.

* We have the accounting equation
$$
\sum_{i}p_{i}=\sum_{ij}\hat{\pi}_{ij}=\sum_{j}q_{j}
$$
and, by rescaling, we can assume without loss of generality that these
quantities sum to one.

* The *gravity model* assumes
$$
E\left[  \hat{\pi}_{ij}|\Phi\right]  =\exp\left(  \left(  \Phi\lambda\right)
_{ij}-u_{i}-v_{j}\right)
$$

where $u_{i}$ and $v_{j}$ are resistance terms, or country-specific fixed
effects. This is a GLM with two-way fixed effects. We need to write the map $\left(
\lambda,u,v\right)  \rightarrow\left(  \left(  \Phi\lambda\right)  _{ij}-u_{i}-v_{j}\right)  _{ij}$ in matrix form, again using vectorization and
Kronecker products.

* Hence:
> <span style="color:yellow">**Gravity equation = GLM + two-way fixed effects**</span>



## Fixed effects and Kronecker products


* Set
$$
X=
\begin{pmatrix}
\Phi & -1_{n}\otimes I_{n} & -I_{n}\otimes1_{n}
\end{pmatrix}
$$


* Taking parameter $\beta=\left(  \lambda^{\top},u^{\top},v^{\top}\right)
^{\top}$, we have
$$
X\beta=vec\left(  \left(  \left(  \Phi\lambda\right)  _{ij}-u_{i}%
-v_{j}\right)  _{ij}\right)  .
$$


* Therefore rewrite our regression with $y_{ij}=\hat{\pi}_{ij}$, and
consider the Poisson regression

$$
\max_{\beta}\left\{  y^{\top}X\beta-1^{\top}\exp\left(  X\beta\right)
\right\}
$$
which becomes
$$
\max_{\lambda,u,v}\left\{  \sum_{ij}\hat{\pi}_{ij}\left(  \left(  \Phi
\lambda\right)  _{ij}-u_{i}-v_{j}\right)  -\sum_{ij}\exp\left(  \left(
\Phi\lambda\right)  _{ij}-u_{i}-v_{j}\right)  \right\}
$$
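
The Kronecker construction can be checked directly. In this sketch $vec$ is assumed to stack the columns of an $n\times n$ matrix (index $i$ varying fastest, numpy's `order="F"`), which is the ordering under which $-1_n\otimes I_n$ selects $u_i$ and $-I_n\otimes 1_n$ selects $v_j$; dimensions and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 4, 2
Phi = rng.normal(size=(n, n, K))   # Phi[i, j, k] = Phi_ij^k
lam, u, v = rng.normal(size=K), rng.normal(size=n), rng.normal(size=n)

one = np.ones((n, 1))
X = np.hstack([
    Phi.reshape(n * n, K, order="F"),   # column k is vec(Phi^k)
    -np.kron(one, np.eye(n)),           # picks out u_i
    -np.kron(np.eye(n), one),           # picks out v_j
])
beta = np.concatenate([lam, u, v])

target = Phi @ lam - u[:, None] - v[None, :]   # ((Phi lambda)_ij - u_i - v_j)_ij
print(np.allclose(X @ beta, target.reshape(-1, order="F")))
```
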


## Gravity as max-entropy


* By the Poisson duality theorem, the dual to this program is
$$ \min_{\pi_{ij}\geq0}\sum_{ij}\pi_{ij}\ln\pi_{ij}-\sum_{ij}\pi_{ij}\\
s.t.~ \sum_{j}\pi_{ij}=p_{i},~\sum_{i}\pi_{ij}=q_{j}\\
 \sum_{ij}\pi_{ij}\Phi_{ij}^{k}=\sum_{ij}\hat{\pi}_{ij}\Phi_{ij}^{k}
$$

* Since the margin constraints imply $\sum_{ij}\pi_{ij}=\sum_{i}p_{i}=1$, the term $\sum_{ij}\pi_{ij}$ is constant, and the previous program selects, among all the $\pi_{ij}$ that have the same margins and moments as
$\hat{\pi}$, the one that maximizes the entropy $-\sum_{ij}\pi_{ij}\ln\pi_{ij}$.
Rewrite it as

$$
\max_{\pi_{ij}\geq0}\left\{  -\sum_{ij}\pi_{ij}\ln\pi_{ij}\right\} \\
s.t.~  \sum_{j}\pi_{ij}=p_{i},~\sum_{i}\pi_{ij}=q_{j}\\
\sum_{ij}\pi_{ij}\Phi_{ij}^{k}=\sum_{ij}\hat{\pi}_{ij}\Phi_{ij}^{k}
$$
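
Putting the pieces together, the sketch below fits the gravity equation as a Poisson regression with two-way fixed effects on synthetic flows (generated with zero diagonal and multiplicative Gamma noise, an illustrative assumption). It uses the column-stacking $vec$ convention and drops the last $v_j$ column, because $u$ and $v$ are only identified up to a common constant. It then checks that the fitted $\pi$ satisfies the constraints of the max-entropy program: the margins $p,q$ and the moments of $\hat\pi$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, K = 6, 2
Phi = rng.normal(size=(n, n, K))
flows = np.exp(Phi @ np.array([0.8, -0.4])) * rng.gamma(4.0, 0.25, size=(n, n))
np.fill_diagonal(flows, 0.0)             # no domestic flows: pi_hat_ii = 0
pi_hat = flows / flows.sum()             # normalize so that flows sum to one
p, q = pi_hat.sum(axis=1), pi_hat.sum(axis=0)

one = np.ones((n, 1))
X = np.hstack([Phi.reshape(n * n, K, order="F"),
               -np.kron(one, np.eye(n)),           # u_i
               -np.kron(np.eye(n), one)[:, :-1]])  # v_j, last one normalized to 0
y = pi_hat.reshape(-1, order="F")

def objective(b):
    eta = X @ b
    return y @ eta - np.exp(eta).sum()

beta = np.zeros(X.shape[1])
for _ in range(200):  # Newton's method with step halving
    mu = np.exp(X @ beta)
    step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    t = 1.0
    while objective(beta + t * step) < objective(beta) and t > 1e-12:
        t /= 2
    beta = beta + t * step
    if np.max(np.abs(t * step)) < 1e-12:
        break

pi = np.exp(X @ beta).reshape(n, n, order="F")   # fitted flows
print(np.abs(pi.sum(axis=1) - p).max(), np.abs(pi.sum(axis=0) - q).max())
print(np.einsum("ij,ijk->k", pi, Phi), np.einsum("ij,ijk->k", pi_hat, Phi))
```
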


## Trade application

See Jupyter notebook.

# Matching models


* Becker (1973) describes the following model of the labor market, the
marriage market, and other matching markets. Consider a population with a
share $p_{i}$ of men of type $i$ and a share $q_{j}$ of women of type $j$,
assuming that men and women come in equal numbers. Assume that if $i$ and $j$
match, this generates a joint surplus (the sum of their utilities) $\Phi_{ij}$.


* Let $\pi_{ij}$ be the fraction of couples $ij$ that are formed at
equilibrium. Becker shows that the equilibrium maximizes the total surplus
$\sum_{ij}\pi_{ij}\Phi_{ij}$ among all feasible matchings, which are
those with
$$
\sum_{j}\pi_{ij}=p_{i}\text{ and }\sum_{i}\pi_{ij}=q_{j}.
$$

* Therefore, the equilibrium matching $\pi_{ij}$ should solve
$$
\max_{\pi_{ij}\geq0} \sum_{ij}\pi_{ij}\Phi_{ij}\\
s.t.~  \sum_{j}\pi_{ij}=p_{i}\text{ and }\sum_{i}\pi_{ij}=q_{j}.
$$

* Choo and Siow (2006) and Dupuy and Galichon (2015) consider a variant
of this model with entropic regularization

$$
\max_{\pi_{ij}\geq0} \sum_{ij}\pi_{ij}\Phi_{ij}-\sigma\sum_{ij}\pi_{ij}%
\ln\pi_{ij}\\
s.t.~  \sum_{j}\pi_{ij}=p_{i}\text{ and }\sum_{i}\pi_{ij}=q_{j}.
$$


* We shall see that we can parametrically estimate $\Phi$ in this model by
the same tools as for the gravity equation.
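
Both problems are easy to solve numerically on a small example. In the sketch below, uniform margins $p_i=q_j=1/n$ and a random $\Phi$ are illustrative assumptions. Becker's problem is solved by enumerating permutations, which suffices because with uniform margins the feasible set is a rescaled Birkhoff polytope whose extreme points are permutation matrices. The entropic problem is solved by iterative proportional fitting (IPFP, also known as Sinkhorn's algorithm), a standard method not described in the text: its solution has the form $\pi_{ij}=\exp((\Phi_{ij}-u_i-v_j)/\sigma)$, and the updates alternately choose $u$ and $v$ to match the two margins.

```python
import itertools
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

rng = np.random.default_rng(9)
n, sigma = 5, 1.0
Phi = rng.normal(size=(n, n))
p = q = np.full(n, 1.0 / n)

# Becker: an optimal matching is a permutation, so enumerate all n! of them
best_value = max(Phi[np.arange(n), list(perm)].sum() / n
                 for perm in itertools.permutations(range(n)))

# Entropic version: IPFP on the potentials u, v
u, v = np.zeros(n), np.zeros(n)
for _ in range(100_000):
    u = sigma * (logsumexp((Phi - v[None, :]) / sigma, axis=1) - np.log(p))  # match rows
    v = sigma * (logsumexp((Phi - u[:, None]) / sigma, axis=0) - np.log(q))  # match columns
    pi = np.exp((Phi - u[:, None] - v[None, :]) / sigma)
    if np.abs(pi.sum(axis=1) - p).max() < 1e-13:
        break

surplus = (pi * Phi).sum()
print(best_value, surplus)
```

Comparing the entropic objective at its maximizer and at Becker's permutation shows that, with these uniform margins, `best_value - surplus` is at most $\sigma\ln n$, so lowering `sigma` brings the two closer (at the cost of more IPFP iterations).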

## Back to gravity equation


* Consider the previous program
$$ \max_{\pi_{ij}\geq0}\left\{  -\sum_{ij}\pi_{ij}\ln\pi_{ij}\right\} \\
s.t.~ \sum_{j}\pi_{ij}=p_{i},~\sum_{i}\pi_{ij}=q_{j}\\
 \sum_{ij}\pi_{ij}\Phi_{ij}^{k}=\sum_{ij}\hat{\pi}_{ij}\Phi_{ij}^{k}%
$$

and rewrite as
$$
\max_{\pi_{ij}\geq0}\left\{  -\sum_{ij}\pi_{ij}\ln\pi_{ij}+\min_{\left(
\lambda_{k}\right)  }\left\{  \sum_{ijk}\left(  \pi_{ij}-\hat{\pi}%
_{ij}\right)  \Phi_{ij}^{k}\lambda_{k}\right\}  \right\} \\
s.t.~ \sum_{j}\pi_{ij}=p_{i},~\sum_{i}\pi_{ij}=q_{j}%
$$

* By the strong duality theorem, this is
$$
\min_{\left(  \lambda_{k}\right)  }\left\{  W\left(  \lambda\right)  -\sum
_{ijk}\hat{\pi}_{ij}\Phi_{ij}^{k}\lambda_{k}\right\}
$$
where we recover
$$
W\left(  \lambda\right)  =\max_{\pi_{ij}\geq0} \left\{  \sum_{ijk}\pi
_{ij}\Phi_{ij}^{k}\lambda_{k}-\sum_{ij}\pi_{ij}\ln\pi_{ij}\right\} \\
s.t.~ \sum_{j}\pi_{ij}=p_{i},~\sum_{i}\pi_{ij}=q_{j}
$$
which is the optimal value of the entropic matching problem above, with $\sigma=1$ and joint surplus $\Phi_{ij}=\sum_{k}\Phi_{ij}^{k}\lambda_{k}$.
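
This dual form suggests a direct estimation procedure: minimize $W(\lambda)-\sum_{ijk}\hat\pi_{ij}\Phi^k_{ij}\lambda_k$, whose gradient is $\sum_{ij}(\pi_{ij}(\lambda)-\hat\pi_{ij})\Phi^k_{ij}$ by the envelope theorem, with $\pi(\lambda)$ computed by IPFP at each step. The sketch below uses `scipy.optimize.minimize` with BFGS for the outer problem; the helper names `solve_entropic` and `objective` are hypothetical. As a self-check, $\hat\pi$ is taken to be the model's own prediction at a chosen $\lambda$, so the minimizer should recover that value; sizes and values are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def solve_entropic(S, p, q, tol=1e-13, max_iter=100_000):
    """pi maximizing sum pi*S - sum pi*log(pi) subject to margins p, q (IPFP, sigma = 1)."""
    u, v = np.zeros(len(p)), np.zeros(len(q))
    for _ in range(max_iter):
        u = logsumexp(S - v[None, :], axis=1) - np.log(p)
        v = logsumexp(S - u[:, None], axis=0) - np.log(q)
        pi = np.exp(S - u[:, None] - v[None, :])
        if np.abs(pi.sum(axis=1) - p).max() < tol:
            break
    return pi

rng = np.random.default_rng(10)
n, K = 4, 2
Phi = rng.normal(size=(n, n, K))
p, q = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(n))
lam_true = np.array([0.7, -1.2])
pi_hat = solve_entropic(Phi @ lam_true, p, q)   # synthetic "observed" matching
moments_hat = np.einsum("ij,ijk->k", pi_hat, Phi)

def objective(lam):
    S = Phi @ lam
    pi = solve_entropic(S, p, q)
    W = (pi * (S - np.log(pi))).sum()            # W(lambda)
    grad = np.einsum("ij,ijk->k", pi, Phi) - moments_hat
    return W - lam @ moments_hat, grad

res = minimize(objective, x0=np.zeros(K), jac=True, method="BFGS", options={"gtol": 1e-9})
print(res.x, lam_true)
```
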



## Matching application

* See Jupyter notebook.

# Dynamic models


## Rust's model of dynamic discrete choice


* In a dynamic discrete choice model (Rust 1987), the decision-maker in a
state $x$ chooses $j$ based not only on the short-term payoff $\phi_{xj}+\varepsilon
_{xj}$, where $\varepsilon_{xj}$ is Gumbel and $\phi_{xj}=\left(  \Phi\lambda\right)  _{xj}$, but also on the expected value of
the state $x^{\prime}$ reached at the next period.

* The probability of being in state $x^{\prime}$ conditional on being in
state $x$ and having chosen $j$ is $P_{x^{\prime}|xj}$.


* In a stationary equilibrium, one has
$$
\left\{
\begin{array}{l}
\pi_{j|x}=\exp\left(  \left(  \Phi\lambda\right)  _{xj}+\beta\sum_{x^{\prime}}P_{x^{\prime}|xj}u_{x^{\prime}}-u_{x}\right) \\
u_{x}=\log\sum_{j}\exp\left(  \left(  \Phi\lambda\right)  _{xj}+\beta
\sum_{x^{\prime}}P_{x^{\prime}|xj}u_{x^{\prime}}\right)
\end{array}
\right.
$$
where $\beta$ now denotes the discount factor and $u_{x}$ plays the role of the value function in state $x$.


* The MLE is therefore

$$
\max_{\lambda,u}  \sum_{i}\left(  \left(  \Phi\lambda\right)  _{x_{i}J_{i}}+\beta
\sum_{x^{\prime}}P_{x^{\prime}|x_{i}J_{i}}u_{x^{\prime}}-u_{x_{i}}\right)  \\
s.t.~   u_{x}=\log\sum_{j}\exp\left(  \left(  \Phi\lambda\right)
_{xj}+\beta\sum_{x^{\prime}}P_{x^{\prime}|xj}u_{x^{\prime}}\right)  ~\forall x
$$

where $J_{i}$ is the choice made by observation $i$ in state $x_{i}$. Solving this program by computing, for each value of $\lambda$, the fixed point $u$ of the constraint is Rust's "nested fixed point" algorithm: the "inner loop" looks for the fixed point $u_{x}$, while the "outer loop" maximizes the likelihood over $\lambda$.
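
A compact sketch of the nested fixed point approach, under illustrative assumptions not taken from the text: 4 states, 2 choices, randomly drawn $\Phi$ and transition probabilities $P_{x'|xj}$, a known discount factor of 0.9, and "data" given by the choice frequencies $\pi_{j|x}$ implied by a chosen $\lambda$ (one unit of weight per state), so the maximizer should recover it. The inner loop iterates $u\mapsto\log\sum_j\exp(\ldots)$, a contraction in the sup norm with modulus equal to the discount factor; the outer loop maximizes the log-likelihood with `scipy.optimize.minimize` (BFGS with finite-difference gradients). The helper names `value_function`, `choice_probs` and `neg_loglik` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(11)
nx, nj, K = 4, 2, 2
disc = 0.9                                     # discount factor (beta in the text)
Phi = rng.normal(size=(nx, nj, K))             # Phi[x, j, k]
P = rng.dirichlet(np.ones(nx), size=(nx, nj))  # P[x, j, x'] = P_{x'|xj}

def value_function(lam, tol=1e-13, max_iter=10_000):
    """Inner loop: fixed point u_x = log sum_j exp((Phi lam)_xj + disc * sum_x' P_{x'|xj} u_x')."""
    S = Phi @ lam
    u = np.zeros(nx)
    for _ in range(max_iter):
        w = S + disc * P @ u                   # choice-specific values, shape (nx, nj)
        m = w.max(axis=1)
        u_new = m + np.log(np.exp(w - m[:, None]).sum(axis=1))
        if np.abs(u_new - u).max() < tol:
            return u_new
        u = u_new
    return u

def choice_probs(lam):
    u = value_function(lam)
    return np.exp(Phi @ lam + disc * P @ u - u[:, None])   # pi_{j|x}

lam_true = np.array([1.0, -0.5])
freq = choice_probs(lam_true)   # synthetic data: observed choice frequencies per state

def neg_loglik(lam):
    """Outer loop objective: minus sum_{x,j} freq_xj log pi_{j|x}(lam)."""
    return -(freq * np.log(choice_probs(lam))).sum()

res = minimize(neg_loglik, x0=np.zeros(K), method="BFGS")
print(res.x, lam_true)
```
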
