# <center>Lecture 2: inverse optimal transport</center>
### <center>Alfred Galichon (NYU & Sciences Po)</center>
## <center>14th European Summer School in Financial Mathematics</center>
<center>© 2021 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274, and contributions by Jules Baudet, Pauline Corblet, Gregory Dannay, Julie Lenoir, and James Nesbit are acknowledged.</center>

A large part of this material is taken from the math+econ+code lecture series:<br>
https://www.math-econ-code.org/

# Part 1: structural estimation of matching models

## Learning objectives

* Matching with unobserved heterogeneities

* Estimation of matching models

## References

**[B]** Becker (1973). 'A Theory of Marriage: Part 1.' *Journal of Political Economy*.

**[COQ]** Chiappori, Oreffice and Quintana-Domeque (2012). 'Fatter Attraction: Anthropometric and Socioeconomic Matching on the Marriage Market'. *Journal of Political Economy*.

**[CS]** Choo and Siow (2006). 'Who Marries Whom and Why'. *Journal of Political Economy*.

**[CSW]** Chiappori, Salanié, and Weiss (2017). 'Partner Choice and the Marital College Premium'. *American Economic Review*.

**[DG]** Dupuy and Galichon (2014). 'Personality traits and the marriage market'. *Journal of Political Economy*.

**[GS]** Galichon and Salanié (2020). 'Cupid's Invisible Hand: Social Surplus and Identification in Matching Models'. Preprint (first version 2011).




# Motivation: models of matching since Gary Becker

* In the footsteps of Becker, empirical studies of the marriage market long focused on one-dimensional models, which assume that a single index is enough to capture interactions on the marriage market, and on positive assortative matching (PAM), which predicts that the equilibrium tends to match agents with higher indices with each other.

* However, it is desirable to move beyond PAM:
    * PAM holds loosely in the data, but never exactly
    * there are often many observed characteristics, and sorting cannot always be captured by a one-dimensional model
    * PAM is a theoretical prediction stemming from supermodularity assumptions on the surplus function, which do not necessarily hold
    * optimal transport provides tools to study multidimensional models


* However, a model of matching based on (unregularized) optimal transport cannot be taken to the data directly, because its predictions are far too strong: it predicts that some matchings never occur. This is counterfactual, since in the data one observes virtually every combination of types.

* Hence, we need to regularize the matching model, and we shall do so by introducing unobserved heterogeneity. The resulting model can be used for estimation and identification purposes. The first such model (with transfers) is the model by [CS]. We shall see a generalization of this model by [GS].


## Loading our libraries

We start by loading the libraries we will need. They are rather standard.

In [1]:
import pandas as pd
import numpy as np
import time

from scipy import optimize
# !python -m pip install -i https://pypi.gurobi.com gurobipy ## only if Gurobi not here
import gurobipy as grb

## A look at our data

* Our data are Choo and Siow's original data. Choo and Siow wanted to study the impact of the legalization of abortion by the Supreme Court's Roe v. Wade decision on the 'value of marriage'. Roe v. Wade decreased the role of marriage in covering out-of-wedlock pregnancies ('shotgun weddings').

* The decision did not, however, have a uniform effect across the United States, as a number of states (the reform states) had already legalized abortion. Choo and Siow thus use a difference-in-differences approach to compute the change in the value of marriage.

* Choo and Siow's data thus consist of the marriages between men and women in reform states (R) vs non-reform states (NR), in 1972 and in 1982. One should expect to see a larger drop in the value of marriage in NR states.

In [2]:
thepath = 'https://raw.githubusercontent.com/math-econ-code/mec_optim_2021-01/master/data_mec_optim/marriage-ChooSiow/'

n_singles = pd.read_csv(thepath+'n_singles.txt', sep='\t', header = None)
marr = pd.read_csv(thepath+'marr.txt', sep='\t', header = None)
navail = pd.read_csv(thepath+'n_avail.txt', sep='\t', header = None)

The data used by Choo and Siow are census data on marriages between age categories, from age 16 (row/column 0) to age 75 (row/column 59). They thus form 60x60 tables:

In [3]:
marr.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,50,51,52,53,54,55,56,57,58,59
0,22704,10954,3932,1550,672,414,190,114,78,64,...,0,0,0,0,0,0,0,0,0,0
1,40266,38980,16368,5714,2116,1101,691,427,260,127,...,0,0,0,0,0,0,0,0,0,0
2,39219,49753,40315,18001,5401,2291,1429,856,611,338,...,0,0,0,0,0,0,0,0,0,0
3,29500,44788,47212,39429,16730,6270,3211,1910,762,438,...,0,0,0,0,0,0,0,0,0,0
4,18952,32860,40103,43488,36265,15318,6076,2872,1552,1246,...,0,0,0,0,0,0,0,0,0,0


The data also includes the number of single individuals per age category:

In [4]:
n_singles.transpose().head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,50,51,52,53,54,55,56,57,58,59
0,1010132,907226,772448,597919,454792,341370,315797,253618,171614,152228,...,68383,69177,61358,70752,69112,60909,60749,65302,62922,61117
1,790793,671332,602900,492020,404882,313755,255414,195418,141283,128787,...,197456,201183,191519,218031,219157,199926,200052,202146,210315,202775


# Building the model

The analysis here follows [GS], who build on the logit model by [CS]. 

* Consider a heterosexual marriage matching market. The set of types (observable characteristics) is $\mathcal{X}$ for men, and $\mathcal{Y}$ for women. There are $n_{x}$ men of type $x$, and $m_{y}$ women of type $y$.

* Assume that if a man $i\in\mathcal{I}$ of type $x_{i}$ and a woman $j\in\mathcal{J}$ of type $y_{j}$ match, they get respective utilities \begin{align*} &  \alpha_{x_{i}y_{j}}+w_{ij}+\varepsilon_{iy_{j}}\\ &  \gamma_{x_{i}y_{j}}-w_{ij}+\eta_{x_{i}j} \end{align*} where $w_{ij}$ is the transfer from $i$ to $j$. If they remain single $i$ and $j$ get respectively $\varepsilon_{i0}$ and $\eta_{0j}$.

* The random utility vectors $\left(  \varepsilon_{y}\right)  $ and $\left(  \eta_{x}\right)  $ are drawn from probability distributions $\mathbf{P}_{x}$ and $\mathbf{Q}_{y}$, respectively. In the sequel we shall work with a finite number of agents of each type, and then we'll investigate the limit of these results.

* The fact that the preference terms $\varepsilon_{iy_{j}}$ and $\eta_{x_{i}j}$ depend on the partner only through the partner's observed type, and thus do not vary within observed types, is a very important implicit assumption called **separability**.


## Optimal matching


The matching surplus between $i$ and $j$ is therefore
$$\tilde{\Phi}_{ij}=\Phi_{x_{i}y_{j}}+\varepsilon_{iy_{j}}+\eta_{x_{i}j}$$
where $\Phi_{xy}=\alpha_{xy}+\gamma_{xy}$. The value of optimal matching is thus, under its dual form,
\begin{align*}
\min_{u_{i},v_{j}}  &  \sum_{i\in\mathcal{I}}u_{i}+\sum_{j\in\mathcal{J}}v_{j}\\
s.t.~  &  u_{i}+v_{j}\geq\Phi_{x_{i}y_{j}}+\varepsilon_{iy_{j}}+\eta_{x_{i}j}\\
&  u_{i}\geq\varepsilon_{i0}\\
&  v_{j}\geq\eta_{0j}
\end{align*}

Written like this, the lp has $\left\vert \mathcal{I}\right\vert +\left\vert \mathcal{J}\right\vert $ variables and $\left\vert \mathcal{I} \right\vert \times\left\vert \mathcal{J}\right\vert +\left\vert \mathcal{I} \right\vert +\left\vert \mathcal{J}\right\vert $ constraints. Assuming that there are $K$ individuals per type for each type, this is $K\left(  \left\vert \mathcal{X}\right\vert +\left\vert \mathcal{Y}\right\vert \right)  $ variables and $K^{2}\left(  \left\vert \mathcal{X}\right\vert \times\left\vert \mathcal{Y}\right\vert \right)  +K\left(  \left\vert \mathcal{X}\right\vert +\left\vert \mathcal{Y}\right\vert \right)  $ constraints.

The number of constraints is **quadratic** with respect to $K$. Fortunately, a little thinking about the implications of separability will help us reduce this complexity.


## A property of equilibrium

We have:

---

**Lemma**. Consider the set $\mathcal{I}_{xy}$ of men of type $x$ matched to women of type $y$ at equilibrium. If $\mathcal{I}_{xy}$ is nonempty, then $u_{i}-\varepsilon_{iy}$ is a constant across $\mathcal{I}_{xy}$.

---

**Proof**. For $i\in\mathcal{I}$ such that $x_{i}=x$,
\begin{align*}
u_{i}  &  =\max_{j\in\mathcal{J}}\left\{  \tilde{\Phi}_{ij}-v_{j},\varepsilon_{i0}\right\} \\
&  =\max_{y\in\mathcal{Y}}\left\{  U_{xy}+\varepsilon_{iy},\varepsilon_{i0}\right\}
\end{align*}
where $U_{xy}=\max_{j:y_{j}=y}\left\{  \Phi_{xy}+\eta_{xj}-v_{j}\right\}$. Thus $u_{i}\geq U_{xy}+\varepsilon_{iy}$, with equality on $\mathcal{I}_{xy}$. With similar notation, $v_{j}\geq V_{xy}+\eta_{xj}$, with equality on $\mathcal{J}_{xy}$. As a result, if $\mathcal{I}_{xy}$ is nonempty, then $U_{xy}+V_{xy}=\Phi_{xy}$ and, for all $i\in\mathcal{I}_{xy}$, $u_{i}=U_{xy}+\varepsilon_{iy}$, so that $u_{i}-\varepsilon_{iy}=U_{xy}$ does not depend on $i$.



## A simplification

In the sequel, we shall see that *adding* an auxiliary variable to
the previous lp will lead to *decreasing* the computational complexity of
the problem.

Observe that the first set of constraints is reexpressed by saying that,
for every $x\in\mathcal{X}$, $y\in\mathcal{Y}$,
$$
\min_{i:x_{i}=x}\left\{  u_{i}-\varepsilon_{iy}\right\}  +\min_{j:y_{j}
=y}\left\{  v_{j}-\eta_{xj}\right\}  \geq\Phi_{xy}.
$$


Hence, letting $U_{xy}=\min_{i:x_{i}=x}\left\{  u_{i}-\varepsilon_{iy}\right\}$ and $V_{xy}=\min_{j:y_{j}=y}\left\{  v_{j}-\eta_{xj}\right\}$, a solution of the previous lp should satisfy
$$
u_{i}=\max_{y\in\mathcal{Y}}\left\{  U_{x_{i}y}+\varepsilon_{iy},\varepsilon_{i0}\right\}  \text{ and }v_{j}=\max_{x\in\mathcal{X}}\left\{  V_{xy_{j}}+\eta_{xj},\eta_{0j}\right\}  .
$$



The problem rewrites as
\begin{align*}
\min_{u_{i},v_{j},U_{xy},V_{xy}}  &  \sum_{i\in\mathcal{I}}u_{i}+\sum_{j\in\mathcal{J}}v_{j}\\
s.t.~  &  U_{xy}+V_{xy}\geq\Phi_{xy}~\left[  \mu_{xy}\geq0\right] \\
&  u_{i}\geq U_{x_{i}y}+\varepsilon_{iy}~\left[  \mu_{iy}\right] \\
&  v_{j}\geq V_{xy_{j}}+\eta_{xj}~\left[  \mu_{xj}\right] \\
&  u_{i}\geq\varepsilon_{i0}~\left[  \mu_{i0}\right] \\
&  v_{j}\geq\eta_{0j}~\left[  \mu_{0j}\right]
\end{align*}

This problem has $K\left(  \left\vert \mathcal{X}\right\vert +\left\vert
\mathcal{Y}\right\vert \right)  +\left\vert \mathcal{X}\right\vert
\times\left\vert \mathcal{Y}\right\vert $ variables and $\left(  \left\vert
\mathcal{X}\right\vert \times\left\vert \mathcal{Y}\right\vert \right)
+K\left(  2\left\vert \mathcal{X}\right\vert \times\left\vert \mathcal{Y}
\right\vert +\left\vert \mathcal{X}\right\vert +\left\vert \mathcal{Y}
\right\vert \right)  $ constraints.

The number of constraints is now **linear** with respect to $K$.
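To make the comparison concrete, here is a minimal sketch that evaluates the two constraint counts stated above. The function name `constraint_counts` and the sizes used are illustrative assumptions, not part of the original notebook.

```python
def constraint_counts(nb_x, nb_y, K):
    """Constraint counts of the individual-level LP and of the simplified LP,
    using the formulas stated in the text (K individuals per type)."""
    individual = K**2 * nb_x * nb_y + K * (nb_x + nb_y)
    simplified = nb_x * nb_y + K * (2 * nb_x * nb_y + nb_x + nb_y)
    return individual, simplified


# Illustrative sizes: 60 types on each side, as in the age-category data
for K in (10, 100, 1000):
    individual, simplified = constraint_counts(60, 60, K)
    print(f"K={K}: individual-level={individual}, simplified={simplified}")
```

Multiplying $K$ by 10 multiplies the first count by roughly 100 and the second by roughly 10.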

## Consequences

**1. Lagrange multipliers:**
* The Lagrange multiplier $\mu_{xy}$ is interpreted as the number of matchings between types $x$ and $y$.

* The Lagrange multiplier $\mu_{iy}$ ($y\in\mathcal{Y}_{0}$, where $\mathcal{Y}_{0}=\mathcal{Y}\cup\{0\}$ includes the option of remaining single) is interpreted as a 0-1 indicator that man $i$ chooses type $y$.

* The Lagrange multiplier $\mu_{xj}$ ($x\in\mathcal{X}_{0}$, with $\mathcal{X}_{0}=\mathcal{X}\cup\{0\}$) is interpreted as a 0-1 indicator that woman $j$ chooses type $x$.

**2. Utilities:**
* Man $i$ solves a discrete choice problem $u_{i}=\max_{y\in\mathcal{Y}}\left\{  U_{x_{i}y}+\varepsilon_{iy},\varepsilon_{i0}\right\}$.

* Woman $j$ solves a discrete choice problem $v_{j}=\max_{x\in\mathcal{X}}\left\{  V_{xy_{j}}+\eta_{xj},\eta_{0j}\right\}$.

* $U_{xy}$ and $V_{xy}$ are related by $U_{xy}+V_{xy}\geq\Phi_{xy}$, with equality if $\mu_{xy}>0$.


## Large market limit

Now look at the limit of the previous markets when the number of market participants gets large, holding fixed the frequency of each type.

In the large population limit $n_{x}$ and $m_{y}$ are now interpreted as the mass distribution of respective types $x$ and $y$.

We shall from now on assume that $\mathbf{P}_{x}$ and $\mathbf{Q}_{y}$, the distributions of random utility vectors $\left( \varepsilon_{y}\right)  $ and $\left(  \eta_{x}\right)  $, have a density with full support. This will ensure that the Emax operators associated with the choice problems of the men and the women respectively \begin{align*} & G_x(U_{x.}) = \mathbb{E}_\mathbf{P} \left[\max_{y\in\mathcal{Y}
}\left\{  U_{xy}+\varepsilon_{iy},\varepsilon_{i0}\right\} \right]\text{, and }\\ & H_y(V_{.y}) = \mathbb{E}_\mathbf{Q} \left[\max_{x\in\mathcal{X}
}\left\{  V_{xy}+\eta_{xj},\eta_{0j}\right\} \right],\end{align*}as well as the corresponding entropies of choice $G_x^{\ast}$ and $H_y^{\ast}$ are continuously differentiable.
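As an illustration of the Emax operator, the sketch below assumes, for concreteness, that the $\varepsilon$'s are i.i.d. centered Gumbel (the logit case studied later in this lecture), in which case $G_x$ has the closed form $\log\left(1+\sum_{y}e^{U_{xy}}\right)$. The utilities in `U` and the number of draws are hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
U = np.array([0.5, -0.2, 1.0])      # hypothetical systematic utilities U_xy for one type x
nb_draws = 200_000

# One Gumbel shock per option y, plus one for singlehood (last column).
# loc = -(Euler's constant) centers the shocks at mean zero.
eps = rng.gumbel(loc=-np.euler_gamma, scale=1.0, size=(nb_draws, U.size + 1))
payoffs = np.column_stack((U + eps[:, :-1], eps[:, -1]))

# Emax: simulated average of the maximum vs. the logit closed form
emax_simulated = payoffs.max(axis=1).mean()
emax_closed_form = np.log(1.0 + np.exp(U).sum())

# The gradient of the Emax operator gives the choice probabilities (logit formula)
freq_simulated = np.bincount(payoffs.argmax(axis=1), minlength=U.size + 1) / nb_draws
prob_closed_form = np.append(np.exp(U), 1.0) / (1.0 + np.exp(U).sum())

print(emax_simulated, emax_closed_form)
print(freq_simulated, prob_closed_form)
```

The two printed pairs should agree up to simulation noise.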

Under these assumptions, the problem becomes
\begin{align*}
\min_{U,V} ~&  G\left(  U\right)  +H\left(  V\right) \\
s.t.~  &  U_{xy}+V_{xy}\geq\Phi_{xy}~\left[  \mu_{xy}\right]
\end{align*}
where
\begin{align*}
G\left(  U\right)   &  =\sum_{x\in\mathcal{X}}n_{x}\mathbb{E}_{\mathbf{P}%
}\left[  \max_{y\in\mathcal{Y}}\left\{  U_{xy}+\varepsilon_{iy},\varepsilon
_{i0}\right\}  \right] \\
H\left(  V\right)   &  =\sum_{y\in\mathcal{Y}}m_{y}\mathbb{E}_{\mathbf{Q}%
}\left[  \max_{x\in\mathcal{X}}\left\{  V_{xy}+\eta_{xj},\eta_{0j}\right\}
\right]
\end{align*}


By first order conditions,
$$
\frac{\partial G\left(  U\right)  }{\partial U_{xy}}=\mu_{xy}=\frac{\partial H\left(  V\right)  }{\partial V_{xy}}
$$
and, by the full-support assumption, $\mu_{xy}>0$ for every $x\in\mathcal{X}$ and $y\in\mathcal{Y}$.



## Social planner's problem

The primal problem corresponding to the problem above is
$$
\max_{\mu_{xy}\geq0}\sum_{\substack{x\in\mathcal{X}\\y\in\mathcal{Y}}}\mu
_{xy}\Phi_{xy}-\mathcal{E}\left(  \mu\right)
$$
where
$$
\mathcal{E}\left(  \mu\right)  =G^{\ast}\left(  \mu\right)  +H^{\ast}\left(
\mu\right)
$$


Recall that $G^{\ast}\left(  \mu\right)  =\max_{U}\left\{  \sum_{xy}\mu_{xy}U_{xy}-G\left(  U\right)  \right\}  $ is the Legendre transform of $G$, and similarly for $H^{\ast}$.

## Identification of the matching surplus



---

**Theorem.** By first order conditions, we get the identification formula for $\Phi$:
$$
\Phi_{xy}=\frac{\partial G^{\ast}\left(  \mu\right)  }{\partial\mu_{xy}}
+\frac{\partial H^{\ast}\left(  \mu\right)  }{\partial\mu_{xy}}.
$$

---

This means that the surplus function is identified *nonparametrically* given the matching patterns $\mu$ and assuming a fixed distribution of unobserved heterogeneity.

Hence only the joint surplus $\Phi_{xy}=\alpha_{xy}+\gamma_{xy}$ is identified. However, if the transfers $\hat{w}_{xy}$ are observed too (e.g. wages in a labour market), then $U_{xy}=\alpha_{xy}+w_{xy}$ and $V_{xy}=\gamma_{xy}-w_{xy}$, so that $\alpha$ and $\gamma$ are separately identified by
$$
\left\{
\begin{array}
[c]{c}%
\hat{\alpha}_{xy}=\frac{\partial G^{\ast}\left(  \mu\right)  }{\partial\mu_{xy}}-\hat{w}_{xy}\\
\hat{\gamma}_{xy}=\frac{\partial H^{\ast}\left(  \mu\right)  }{\partial\mu_{xy}}+\hat{w}_{xy}%
\end{array}
\right.
$$


## Choo and Siow's logit model

In Choo and Siow's model [CS], the heterogeneities in tastes are Gumbel distributed, and we have
$$
\mathcal{E}\left(  \mu\right)  =2\sum_{\substack{x\in\mathcal{X}%
\\y\in\mathcal{Y}}}\mu_{xy}\log\mu_{xy}+\sum_{x\in\mathcal{X}}\mu_{x0}\log
\mu_{x0}+\sum_{y\in\mathcal{Y}}\mu_{0y}\log\mu_{0y}.
$$
Note that $\mathcal{E}\left(  \mu\right)  < + \infty$ if and only if $\mu
\in\mathcal{M}\left(  n,m\right)  $.


By first order conditions above, Choo-Siow's TU-logit model implies the
following matching function:
$$
\mu_{xy}=M_{xy}\left(  \mu_{x0},\mu_{0y}\right)  :=\sqrt{\mu_{x0}}\sqrt
{\mu_{0y}}\exp\left(  \frac{\Phi_{xy}}{2}\right) 
$$

This is a gravity equation of sorts. The full link with gravity equations is explored in the next lecture. 

Equivalently, using $\mu_{x0}=n_{x}-\sum_{y}\mu_{xy}$ and $\mu_{0y}=m_{y}-\sum_{x}\mu_{xy}$, we have $\partial\mathcal{E}\left(  \mu\right)  /\partial\mu_{xy}=2\log\mu_{xy}-\log\mu_{x0}-\log\mu_{0y}$, which implies that $\Phi_{xy}$ is estimated by *Choo and Siow's identification formula*
$$
\hat{\Phi}_{xy}=\log\frac{\hat{\mu}_{xy}^{2}}{\hat{\mu}_{x0}\hat{\mu}_{0y}}.
$$
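The identification formula and the matching function are inverses of each other, which the following sketch checks numerically. The matching counts below are hypothetical random numbers, not the Choo and Siow data.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_xy = rng.uniform(0.5, 2.0, size=(3, 4))   # hypothetical masses of (x, y) couples
mu_x0 = rng.uniform(0.5, 2.0, size=3)        # hypothetical masses of single men
mu_0y = rng.uniform(0.5, 2.0, size=4)        # hypothetical masses of single women

# Choo and Siow's identification formula: Phi = log(mu_xy^2 / (mu_x0 * mu_0y))
Phi_hat = 2 * np.log(mu_xy) - np.log(mu_x0)[:, None] - np.log(mu_0y)[None, :]

# Matching function: mu_xy = sqrt(mu_x0 * mu_0y) * exp(Phi / 2)
mu_back = np.sqrt(np.outer(mu_x0, mu_0y)) * np.exp(Phi_hat / 2)

print(np.abs(mu_back - mu_xy).max())
```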

## Solving equilibrium in the Choo-Siow model

Write down the equilibrium equations in the TU-logit model:
$$
\left\{
\begin{array}
[c]{c}
\sum_{y\in\mathcal{Y}}\sqrt{\mu_{x0}}\sqrt{\mu_{0y}}\exp\left(  \frac
{\Phi_{xy}}{2}\right)  +\mu_{x0}=n_{x}\\
\sum_{x\in\mathcal{X}}\sqrt{\mu_{x0}}\sqrt{\mu_{0y}}\exp\left(  \frac
{\Phi_{xy}}{2}\right)  +\mu_{0y}=m_{y}%
\end{array}
\right.
$$


Setting $a_{x}=\sqrt{\mu_{x0}}$, $b_{y}=\sqrt{\mu_{0y}}$, and
$K_{xy}=\exp\left(  \Phi_{xy}/2\right)  $, this rewrites as
$$
\left\{
\begin{array}
[c]{c}
\sum_{y\in\mathcal{Y}}K_{xy}a_{x}b_{y}+a_{x}^{2}=n_{x}\\
\sum_{x\in\mathcal{X}}K_{xy}a_{x}b_{y}+b_{y}^{2}=m_{y}%
\end{array}
\right.$$

which is a variant of the equations seen previously, modified to accommodate unmatched agents.

We can easily adapt the IPFP to this setting. The IPFP consists in iteratively solving quadratic equations (superscripts denote iteration counts, not powers):
$$
\left\{
\begin{array}
[c]{l}
a_{x}^{2t+1}=\sqrt{n_{x}+\left(  \sum_{y\in\mathcal{Y}}b_{y}^{2t}%
K_{xy}/2\right)  ^{2}}-\sum_{y\in\mathcal{Y}}b_{y}^{2t}K_{xy}/2\\
b_{y}^{2t+2}=\sqrt{m_{y}+\left(  \sum_{x\in\mathcal{X}}a_{x}^{2t+1}%
K_{xy}/2\right)  ^{2}}-\sum_{x\in\mathcal{X}}a_{x}^{2t+1}K_{xy}/2
\end{array}
\right.
$$
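The iteration above can be sketched as follows. This is a minimal illustration on hypothetical random inputs; the function name `choo_siow_ipfp`, the starting values, the tolerance and the stopping rule are assumptions made for the example.

```python
import numpy as np

def choo_siow_ipfp(Phi, n, m, tol=1e-10, max_iter=10_000):
    """Solve the TU-logit equilibrium equations by alternately solving
    the quadratic equations in a_x = sqrt(mu_x0) and b_y = sqrt(mu_0y)."""
    K = np.exp(Phi / 2)
    a, b = np.sqrt(n), np.sqrt(m)          # arbitrary positive starting values
    for _ in range(max_iter):
        s = K @ b / 2                       # (sum_y K_xy b_y) / 2
        a_new = np.sqrt(n + s**2) - s       # positive root of a^2 + 2 s a = n
        r = K.T @ a_new / 2                 # (sum_x K_xy a_x) / 2
        b_new = np.sqrt(m + r**2) - r       # positive root of b^2 + 2 r b = m
        err = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
        a, b = a_new, b_new
        if err < tol:
            break
    mu_xy = a[:, None] * K * b[None, :]
    return mu_xy, a**2, b**2

rng = np.random.default_rng(0)
Phi = rng.normal(size=(4, 5))               # hypothetical surplus matrix
n = rng.uniform(1.0, 2.0, size=4)           # hypothetical masses of men by type
m = rng.uniform(1.0, 2.0, size=5)           # hypothetical masses of women by type

mu_xy, mu_x0, mu_0y = choo_siow_ipfp(Phi, n, m)
print(np.abs(mu_xy.sum(axis=1) + mu_x0 - n).max())
print(np.abs(mu_xy.sum(axis=0) + mu_0y - m).max())
```

Both printed residuals should be close to zero, meaning that the population constraints hold at the computed matching.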

## Dual problem

The dual problem is given by
$$
\min_{u,v}\left\{
\begin{array}
[c]{c}%
\sum_{x}n_{x}u_{x}+\sum_{y}m_{y}v_{y}\\
+2\sum_{xy}\sqrt{n_{x}m_{y}}\exp\left(  \frac{\Phi_{xy}-u_{x}-v_{y}}{2}\right)
\\
+\sum_{x}n_{x}\exp\left(  -u_{x}\right)  +\sum_{y}m_{y}\exp\left(
-v_{y}\right)
\end{array}
\right\}
$$


**Remarks.**

* This problem is an unconstrained convex optimization problem, so this formulation will be quite useful.

* If $\left(  u,v\right)  $ is a solution, then $u_{x}=-\log\mu_{0|x}=-\log\left(  \mu_{x0}/n_{x}\right)  $ and $v_{y}=-\log\mu_{0|y}=-\log\left(  \mu_{0y}/m_{y}\right)  $.

* Note that the IPFP algorithm just seen can be interpreted as a (blockwise) *coordinate descent* method applied to the dual problem.
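As a sketch of how the dual formulation can be used, the code below minimizes the dual objective with a generic quasi-Newton routine and recovers the equilibrium matching from $u$ and $v$. The inputs are hypothetical random numbers, and the choice of BFGS with a tight gradient tolerance is an assumption made for the example, not the method used in the lecture.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(2)
Phi = rng.normal(size=(3, 4))               # hypothetical surplus matrix
n = rng.uniform(1.0, 2.0, size=3)           # hypothetical masses of men by type
m = rng.uniform(1.0, 2.0, size=4)           # hypothetical masses of women by type
nbx, nby = Phi.shape
root_nm = np.sqrt(np.outer(n, m))

def dual(uv):
    """Dual objective and its gradient, with uv = (u, v) stacked."""
    u, v = uv[:nbx], uv[nbx:]
    mu = root_nm * np.exp((Phi - u[:, None] - v[None, :]) / 2)
    val = n @ u + m @ v + 2 * mu.sum() + n @ np.exp(-u) + m @ np.exp(-v)
    grad_u = n - mu.sum(axis=1) - n * np.exp(-u)
    grad_v = m - mu.sum(axis=0) - m * np.exp(-v)
    return val, np.concatenate((grad_u, grad_v))

res = optimize.minimize(dual, x0=np.zeros(nbx + nby), jac=True,
                        method='BFGS', options={'gtol': 1e-8})
u, v = res.x[:nbx], res.x[nbx:]

# Recover singles and couples from the dual solution
mu_x0, mu_0y = n * np.exp(-u), m * np.exp(-v)
mu_xy = np.sqrt(np.outer(mu_x0, mu_0y)) * np.exp(Phi / 2)
print(np.abs(mu_xy.sum(axis=1) + mu_x0 - n).max())
```

The first order conditions of the dual are exactly the population constraints, so the printed residual should be close to zero.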


### Another application: estimation of affinity matrix

Dupuy and Galichon (2014) [DG] focus on cross-dimensional interactions

\begin{align*}
\phi_{xy}^{A}=\sum_{p,q}A_{pq}\xi_{x}^{p}\xi_{y}^{q}
\end{align*}

and estimate "affinity matrix" $A$ on a dataset of married individuals where the "big 5" personality traits are measured.
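The bilinear surplus above is a matrix product, as the following sketch shows. It assumes that $\xi_{x}^{p}$ denotes the $p$-th observed characteristic of type $x$ (and similarly for $\xi_{y}^{q}$); all sizes and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
nb_men, nb_women, nb_traits = 6, 5, 3             # hypothetical sizes
xi_x = rng.normal(size=(nb_men, nb_traits))       # xi_x[x, p]: p-th trait of man type x
xi_y = rng.normal(size=(nb_women, nb_traits))     # xi_y[y, q]: q-th trait of woman type y
A = rng.normal(size=(nb_traits, nb_traits))       # hypothetical affinity matrix

# Phi_A[x, y] = sum_{p,q} A[p, q] * xi_x[x, p] * xi_y[y, q]
Phi_A = xi_x @ A @ xi_y.T
print(Phi_A.shape)
```

An off-diagonal entry $A_{pq}$ with $p\neq q$ then measures the complementarity between trait $p$ of the man and trait $q$ of the woman.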

$A$ is estimated by

\begin{align*}
\min_{u,v}\min_{A}\left\{
\begin{array}
[c]{c}%
\sum_{x}p_{x}u_{x}+\sum_{y}q_{y}v_{y}\\
+\sum_{xy}\exp\left(  \sum_{p,q}A_{pq}\xi_{x}^{p}\xi_{y}^{q}-u_{x}%
-v_{y}\right) \\
-\sum_{x,y,p,q}\hat{\pi}_{xy}A_{pq}\xi_{x}^{p}\xi_{y}^{q}%
\end{array}
\right\}  .
\end{align*}

Dupuy, Galichon and Sun (2016) consider the case when the space of characteristics is high-dimensional. More on this soon.

### Estimation of affinity matrix: results

|  Husbands   \ Wives             | Education | Height | BMI   | Health | Consc. | Extra. | Agree | Emotio | Auto. | Risk  |
|-------------------|-----------|--------|-------|--------|--------|--------|-------|--------|-------|-------|
| Education         | 0.46      | 0      | -0.06 | 0.01   | -0.02  | 0.03   | -0.01 | -0.03  | 0.04  | 0.01  |
| Height            | 0.04      | 0.21   | 0.04  | 0.03   | -0.06  | 0.03   | 0.02  | 0      | -0.01 | 0.02  |
| BMI               | -0.03     | 0.03   | 0.21  | 0.01   | 0.03   | 0      | -0.05 | 0.02   | 0.01  | -0.02 |
| Health            | -0.02     | 0.02   | -0.04 | 0.17   | -0.04  | 0.02   | -0.01 | 0.01   | 0     | 0.03  |
| Conscientiousness | -0.07     | -0.01  | 0.07  | 0      | 0.16   | 0.05   | 0.04  | 0.06   | 0.01  | 0.01  |
| Extraversion      | 0         | -0.01  | 0     | 0.01   | -0.06  | 0.08   | -0.04 | -0.01  | 0.02  | -0.06 |
| Agreeableness     | 0.01      | 0.01   | -0.06 | 0.02   | 0.1    | -0.11  | 0     | 0.07   | -0.07 | -0.05 |
| Emotional         | 0.03      | -0.01  | 0.04  | 0.06   | 0.19   | 0.04   | 0.01  | -0.04  | 0.08  | 0.05  |
| Autonomy          | 0.03      | 0.02   | 0.01  | 0.02   | -0.09  | 0.09   | -0.04 | 0.02   | -0.1  | 0.03  |
| Risk              | 0.03      | -0.01  | -0.03 | -0.01  | 0      | -0.02  | -0.03 | -0.03  | 0.08  | 0.14  |

Affinity matrix. Source: Dupuy and Galichon (2014). Note: Bold coefficients are significant at the 5 percent level.

In [5]:
nbCateg = 25

muhat_x0 = n_singles[0].iloc[0:nbCateg]
muhat_0y = n_singles[1].iloc[0:nbCateg]
muhat_xy = marr.iloc[0:nbCateg:,0:nbCateg]

In [6]:
Nh = muhat_xy.values.sum()+muhat_x0.sum()+muhat_0y.sum()

In [7]:
2*muhat_xy.values.sum()+muhat_x0.sum()+muhat_0y.sum()

14885023

In [8]:
muhat_xy = muhat_xy / Nh 
muhat_x0 = muhat_x0 / Nh 
muhat_0y = muhat_0y / Nh

In [9]:
n_x = muhat_xy.sum(axis = 1)+muhat_x0
m_y = muhat_xy.sum(axis = 0)+muhat_0y

In [10]:
nbX = nbCateg
nbY = nbCateg

xs = np.repeat(range(1,nbX+1),nbY).reshape(nbX,nbY)/25
ys = np.repeat(range(1,nbY+1),nbX).reshape(nbX,nbY).T/25

phi1_xy = -((xs-ys)**2).flatten()
# Basis functions of the surplus: squared age gap, interacted with functions of the couple's ages
phimat = np.column_stack((
    phi1_xy,
    phi1_xy * (((xs+ys)/2)**2).flatten(),
    phi1_xy * (((xs+ys-2)/2)**2).flatten(),
    phi1_xy * ((xs+ys-1)**2).flatten(),
))

In [11]:
nbK = phimat.shape[1]
phimat_mean = phimat.mean(axis = 0)
phimat_stdev = phimat.std(axis = 0, ddof = 1)
phimat = ((phimat - phimat_mean).T/phimat_stdev[:,None]).T

In [12]:
def ObjFunc(uvlambda):
    # Unpack the parameter vector into (u_x, v_y, lambda)
    u_x = uvlambda[0:nbX]
    v_y = uvlambda[nbX:(nbX+nbY)]
    l = uvlambda[(nbX+nbY):(nbX+nbY+nbK)]
    
    Phi_xy = phimat.dot(l.reshape(nbK,1)).reshape(nbX, nbY)
    # u_x varies along rows (x) and v_y along columns (y)
    mu_xy = np.exp((Phi_xy - u_x[:,None] - v_y[None,:])/2)
    mu_x0 = np.exp(-u_x)
    mu_0y = np.exp(-v_y)
    
    val = (np.sum(n_x*u_x) + np.sum(m_y*v_y) - np.sum(muhat_xy.values*Phi_xy)
           + 2*np.sum(mu_xy) + np.sum(mu_x0) + np.sum(mu_0y))
    
    return val

In [13]:
def grad_ObjFunc(uvlambda):
    # Unpack the parameter vector into (u_x, v_y, lambda)
    u_x = uvlambda[0:nbX]
    v_y = uvlambda[nbX:(nbX+nbY)]
    l = uvlambda[(nbX+nbY):(nbX+nbY+nbK)]
    
    Phi_xy = phimat.dot(l.reshape(nbK,1)).reshape(nbX, nbY)
    # u_x varies along rows (x) and v_y along columns (y)
    mu_xy = np.exp((Phi_xy - u_x[:,None] - v_y[None,:])/2)
    mu_x0 = np.exp(-u_x)
    mu_0y = np.exp(-v_y)
    
    # Sum over y (axis 1) for the men's margins, over x (axis 0) for the women's margins
    grad_u = n_x - np.sum(mu_xy, axis = 1) - mu_x0
    grad_v = m_y - np.sum(mu_xy, axis = 0) - mu_0y
    grad_lambda = (mu_xy-muhat_xy.values).flatten()[:,None].T.dot(phimat)
    
    grad = np.concatenate((grad_u,grad_v,grad_lambda.flatten()))
    
    return grad
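Before handing an analytical gradient to an optimizer, it can be worth checking it against finite differences. The sketch below does this with `scipy.optimize.check_grad` on a small self-contained version of the objective above; the sizes, the random basis functions `phi` and the names `obj` and `grad` are hypothetical and chosen only for the check.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(3)
nbx, nby, nbk = 3, 4, 2                          # hypothetical small sizes
phi = rng.normal(size=(nbx * nby, nbk))          # hypothetical basis functions, one row per (x, y)
muhat = rng.uniform(0.01, 0.05, size=(nbx, nby)) # hypothetical observed matching
n = muhat.sum(axis=1) + rng.uniform(0.05, 0.1, size=nbx)
m = muhat.sum(axis=0) + rng.uniform(0.05, 0.1, size=nby)

def unpack(theta):
    return theta[:nbx], theta[nbx:nbx + nby], theta[nbx + nby:]

def obj(theta):
    u, v, lam = unpack(theta)
    Phi = (phi @ lam).reshape(nbx, nby)
    mu = np.exp((Phi - u[:, None] - v[None, :]) / 2)
    return (n @ u + m @ v - (muhat * Phi).sum()
            + 2 * mu.sum() + np.exp(-u).sum() + np.exp(-v).sum())

def grad(theta):
    u, v, lam = unpack(theta)
    Phi = (phi @ lam).reshape(nbx, nby)
    mu = np.exp((Phi - u[:, None] - v[None, :]) / 2)
    g_u = n - mu.sum(axis=1) - np.exp(-u)
    g_v = m - mu.sum(axis=0) - np.exp(-v)
    g_lam = phi.T @ (mu - muhat).ravel()
    return np.concatenate((g_u, g_v, g_lam))

theta0 = rng.normal(scale=0.1, size=nbx + nby + nbk)
# Norm of the difference between the analytical and finite-difference gradients
grad_error = optimize.check_grad(obj, grad, theta0)
print(grad_error)
```

A small value of `grad_error` indicates that the analytical gradient agrees with its numerical approximation at `theta0`.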

In [14]:
# Start from a float vector of zeros
outcome = optimize.minimize(ObjFunc, method = 'CG', jac = grad_ObjFunc, x0 = np.zeros(nbX+nbY+nbK))

In [15]:
uvlambdahat =  outcome['x']
lambdahat = uvlambdahat[(nbX+nbY):(nbX+nbY+nbK)]
print(outcome)
print("")
print(ObjFunc(uvlambdahat))
print(lambdahat)

     fun: 7.677025691402801
     jac: array([ 4.88075801e-06,  5.10780308e-08, -5.40814185e-08, -9.67674359e-09,
       -3.27666407e-08, -5.03933471e-07, -1.01204451e-07, -6.63029291e-08,
       -6.00233391e-08, -5.64389548e-07,  9.76712940e-07,  1.22520487e-06,
        1.24756462e-06,  1.27584707e-06,  6.88245950e-07,  2.75181310e-07,
       -1.86412480e-07,  3.47244011e-07,  7.76230370e-07,  4.02940556e-08,
       -6.83570777e-07, -1.13239671e-06, -1.48543343e-06, -1.18186292e-06,
       -1.98637193e-06,  1.71723629e-06,  8.02963383e-07,  2.59413708e-07,
       -1.08526144e-07, -7.80600168e-07, -1.05835123e-06,  4.97217929e-07,
        1.18610346e-07,  1.95591308e-06,  2.88970690e-07,  5.54150734e-07,
        3.99675416e-07,  2.02151631e-06,  5.66475382e-07,  1.15734675e-06,
        2.94606296e-07,  1.68397986e-07,  1.32991993e-07,  1.41378739e-07,
       -3.87513667e-08, -2.92367924e-07, -1.25325454e-06, -1.78125071e-06,
       -1.47750538e-06, -4.15938391e-07, -1.59006568e-07,  5.7

In [16]:
print(lambdahat)

[-1.36756245 -6.4086041   4.60390741 -1.3049116 ]


# Part 2: the gravity equation

###  Learning objectives

* Regularized optimal transport

* The gravity equation

* Generalized linear models

* Pseudo-Poisson maximum likelihood estimation

### References

* Anderson and van Wincoop (2003). "Gravity with Gravitas: A Solution to the Border Puzzle". *American Economic Review*.

* Head and Mayer (2014). "Gravity Equations: Workhorse, Toolkit and Cookbook". *Handbook of International Economics*.

* Choo and Siow (2006). "Who Marries Whom and Why". *Journal of Political Economy*.

* Gourieroux, Trognon, Monfort (1984). "Pseudo Maximum Likelihood Methods: Theory". *Econometrica*.

* McCullagh and Nelder (1989). *Generalized Linear Models*. Chapman and Hall/CRC.

* Santos Silva and Tenreyro (2006). "The Log of Gravity". *Review of Economics and Statistics*.

* Yotov et al. (2011). *An advanced guide to trade policy analysis*. WTO.

* Guimares and Portugal (2012). "Real Wages and the Business Cycle: Accounting for Worker, Firm, and Job Title Heterogeneity". *AEJ: Macro*.

* Dupuy and Galichon (2014). "Personality traits and the marriage market". *Journal of Political Economy*.

* Dupuy, Galichon and Sun (2019). "Estimating matching affinity matrix under low-rank constraints". *Information and Inference*.

* Carlier, Dupuy, Galichon and Sun "SISTA: learning optimal transport costs under sparsity constraints." *Communications on Pure and Applied Mathematics* (forthcoming).

## Motivation

The gravity equation is a very useful tool for explaining trade flows by various measures of proximity between countries.

A number of regressors have been proposed. They include: geographic distance, common official language, common colonial past, share of common religions, etc.

The dependent variable is the volume of exports from country $i$ to country $n$, for each pair of countries $\left(  i,n\right)$.

Today, we shall see a close connection between gravity models of international trade and separable matching models.

---

To start with, let's load some of the libraries we shall need.

In [17]:
import string  # no alias, so that the built-in `str` is not shadowed
import math
import sys

And let's load our data, which come from the book *An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model*, by Yotov et al. We will estimate the gravity model using optimal transport as well as using Poisson regression.

In [18]:
thepath = 'https://raw.githubusercontent.com/math-econ-code/mec_optim_2021-01/master/data_mec_optim/gravity_wtodata/'
tradedata = pd.read_csv(thepath +'1_TraditionalGravity_from_WTO_book.csv', sep=',')
#tradedata = pd.read_csv(os.path.join(thepath ,'1_TraditionalGravity_from_WTO_book.csv'), sep=',')
tradedata = tradedata[['exporter', 'importer','year', 'trade', 'DIST','ln_DIST', 'CNTG', 'LANG', 'CLNY']]
tradedata.sort_values(['year','exporter','importer'], inplace = True)
tradedata.reset_index(inplace = True, drop = True)

nbt = len(tradedata['year'].unique())
nbi = len(tradedata['importer'].unique())
nbk = 4

tradedata.head()

Unnamed: 0,exporter,importer,year,trade,DIST,ln_DIST,CNTG,LANG,CLNY
0,ARG,ARG,1986,61288.590263,533.90824,6.280224,0,0,0
1,ARG,AUS,1986,27.764874,12044.574134,9.39637,0,0,0
2,ARG,AUT,1986,3.559843,11751.146521,9.371706,0,0,0
3,ARG,BEL,1986,96.102567,11305.285764,9.333026,0,0,0
4,ARG,BGR,1986,3.129231,12115.572046,9.402246,0,0,0


Consistent with common practice, we only look at the flows of trade between pairs of distinct countries:

In [19]:
# Zero out the pairwise regressors for own-country (importer == exporter) observations
same_country = tradedata['importer'] == tradedata['exporter']
tradedata.loc[same_country, ['DIST', 'ln_DIST', 'CNTG', 'LANG', 'CLNY']] = 0

Let's prepare the data so that we can use it. We want to construct 
* $D_{ni,t}^k$ which is the $k$th pairwise discrepancy measure between importer $n$ and exporter $i$ at time $t$

* $X_{n,t}$ total value of expenditure of importer $n$ at time $t$
 
* $Y_{i,t}$ total value of production of exporter $i$ at time $t$

In [20]:
Xhatnit = []
Dnikt = []

years = tradedata['year'].unique()
for t, year in enumerate(years):
    
    tradedata_year = tradedata[tradedata['year']==year]
    
    Xhatnit.append(tradedata_year.pivot(index = 'exporter', columns = 'importer', values ='trade').values)
    np.fill_diagonal(Xhatnit[t],0)
    
    Dnikt.append(tradedata_year[[ 'ln_DIST', 'CNTG', 'LANG', 'CLNY']].values)
    
Xnt = np.zeros((nbi,nbt))
Yit = np.zeros((nbi,nbt))

for t in range(nbt):
    Xnt[:,t] = Xhatnit[t].sum(axis = 1)
    Yit[:,t] = Xhatnit[t].sum(axis = 0)
    
totalmass_t = sum(Xhatnit).sum(axis=(0,1))/nbt
pihat_nit = Xhatnit/totalmass_t
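To see what the pivot and the two marginal sums do, here is a toy example with two hypothetical countries `A` and `B`; the numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'exporter': ['A', 'A', 'B', 'B'],
    'importer': ['A', 'B', 'A', 'B'],
    'trade':    [10.0, 1.0, 2.0, 20.0],
})

# Rows are indexed by exporter, columns by importer
flows = toy.pivot(index='exporter', columns='importer', values='trade').to_numpy(copy=True)
np.fill_diagonal(flows, 0)             # drop own-country flows

total_by_exporter = flows.sum(axis=1)  # sum over importers (columns)
total_by_importer = flows.sum(axis=0)  # sum over exporters (rows)
print(flows)
print(total_by_exporter, total_by_importer)
```

With this layout, `axis=1` sums give one total per exporter and `axis=0` sums give one total per importer.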

Let's standardize the data and construct some useful objects:

In [21]:
meanD_k = np.asmatrix([mat.mean(axis = 0) for mat in Dnikt]).mean(axis = 0)
sdD_k = np.asmatrix([mat.std(axis = 0,ddof = 1) for mat in Dnikt]).mean(axis = 0)

Dnikt = [(mat - meanD_k)/sdD_k for mat in Dnikt]

p_nt = Xnt/totalmass_t
q_nt = Yit/totalmass_t
IX = np.repeat(1, nbi).reshape(nbi,1)
tIY = np.repeat(1, nbi).reshape(1,nbi)

f_nit = []
g_nit = []

for t in range(nbt):
    f_nit.append(p_nt[:,t].reshape(nbi,1).dot(tIY))
    g_nit.append(IX.dot(q_nt[:,t].reshape(1,nbi)))

### Regularized optimal transport

Consider the optimal transport duality

\begin{align*}
\max_{\pi\in\mathcal{M}\left(  P,Q\right)  }\sum_{xy}\pi_{xy}\Phi_{xy}=\min_{u_{x}+v_{y}\geq\Phi_{xy}}\sum_{x\in\mathcal{X}}p_{x}u_{x}+\sum_{y\in\mathcal{Y}}q_{y}v_{y}
\end{align*}

Now let's assume that we are adding an entropy to the primal objective function. For any $\sigma>0$, we get

\begin{align*}
&  \max_{\pi\in\mathcal{M}\left(  P,Q\right)  }\sum_{xy}\pi_{xy}\Phi_{xy}-\sigma\sum_{xy}\pi_{xy}\ln\pi_{xy}\\
&  =\min_{u,v}\sum_{x\in\mathcal{X}}p_{x}u_{x}+\sum_{y\in\mathcal{Y}}q_{y}v_{y}+\sigma\sum_{xy}\exp\left(  \frac{\Phi_{xy}-u_{x}-v_{y}-\sigma}{\sigma}\right)
\end{align*}

The latter problem is an unconstrained convex optimization problem. But the most efficient numerical computation technique is often coordinate descent, i.e. alternate between minimization in $u$ and minimization in $v$.

### Iterated fitting

Minimizing with respect to $u$ yields

\begin{align*}
e^{-u_{x}/\sigma}=\frac{p_{x}}{\sum_{y}\exp\left(  \frac{\Phi_{xy}-v_{y}-\sigma}{\sigma}\right)  }
\end{align*}

and minimizing with respect to $v$ yields

\begin{align*}
e^{-v_{y}/\sigma}=\frac{q_{y}}{\sum_{x}\exp\left(  \frac{\Phi_{xy}-u_{x}-\sigma}{\sigma}\right)  }
\end{align*}
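Alternating the two updates above can be sketched as follows. The function name `ipfp`, the inputs, the tolerance and the stopping rule are assumptions made for the example; the sketch also makes no attempt to guard against overflow when $\Phi/\sigma$ is large.

```python
import numpy as np

def ipfp(Phi, p, q, sigma=1.0, tol=1e-12, max_iter=10_000):
    """Alternate the closed-form updates of A_x = exp(-u_x/sigma) and B_y = exp(-v_y/sigma)."""
    K = np.exp((Phi - sigma) / sigma)
    B = np.ones_like(q)
    for _ in range(max_iter):
        A = p / (K @ B)                    # update in u, given v
        B = q / (K.T @ A)                  # update in v, given u
        # After the v-update the column margins are exact; check the row margins
        if np.abs(A * (K @ B) - p).max() < tol:
            break
    pi = A[:, None] * K * B[None, :]
    return pi, -sigma * np.log(A), -sigma * np.log(B)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(4, 5))              # hypothetical surplus matrix
p = np.full(4, 1 / 4)                      # hypothetical margins with equal total mass
q = np.full(5, 1 / 5)
sigma = 0.5

pi, u, v = ipfp(Phi, p, q, sigma=sigma)
primal = (pi * Phi).sum() - sigma * (pi * np.log(pi)).sum()
dual = p @ u + q @ v + sigma * pi.sum()
print(primal, dual)
```

At convergence the margins of `pi` match `p` and `q`, and the primal and dual values coincide.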

This procedure is called the "iterative proportional fitting procedure" (IPFP), also known as "matrix scaling", "RAS algorithm", "Sinkhorn-Knopp algorithm", "Kruithof's method", "Furness procedure", "biproportional fitting procedure", or "Bregman's procedure". See the survey in Idel (2016).

It is perhaps the most often reinvented algorithm in applied mathematics, and it was recently rediscovered in a machine learning context.

### Econometrics of matching

The goal is to estimate the matching surplus $\Phi_{xy}$. For this, take a linear parameterization

\begin{align*}
\Phi_{xy}^{\beta}=\sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}.
\end{align*}

Following Choo and Siow (2006), Galichon and Salanié (2011) introduce logit heterogeneity in individual preferences and show that the equilibrium now maximizes the *regularized Monge-Kantorovich problem*

\begin{align*}
W\left(  \beta\right)  =\max_{\pi\in\mathcal{M}\left(  P,Q\right)  }\sum_{xy}\pi_{xy}\Phi_{xy}^{\beta}-\sigma\sum_{xy}\pi_{xy}\ln\pi_{xy}
\end{align*}

By duality, $W\left(  \beta\right)  $ can be expressed

\begin{align*}
W\left(  \beta\right)  =\min_{u,v}\sum_{x}p_{x}u_{x}+\sum_{y}q_{y}v_{y}+\sigma\sum_{xy}\exp\left(  \frac{\Phi_{xy}^{\beta}-u_{x}-v_{y}-\sigma}{\sigma}\right)
\end{align*}

and, without loss of generality, one can set $\sigma=1$ and drop the additive constant $-\sigma$ in the $\exp$.

### Estimation

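We observe the actual matching $\hat{\pi}_{xy}$. Note that $\partial W/ \partial\beta_{k}=\sum_{xy}\pi_{xy}\phi_{xy}^{k}$; hence $\beta$ is estimated by matching these moments with their empirical counterparts $\sum_{xy}\hat{\pi}_{xy}\phi_{xy}^{k}$, that is, by solving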

<a name='objFun'></a>
\begin{align*}
\min_{u,v,\beta}\sum_{x}p_{x}u_{x}+\sum_{y}q_{y}v_{y}+\sum_{xy}\exp\left(\Phi_{xy}^{\beta}-u_{x}-v_{y}\right)  -\sum_{xy,k}\hat{\pi}_{xy}\beta_{k}\phi_{xy}^{k}
\end{align*}

which is still a convex optimization problem.
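The estimation problem can be sketched on synthetic data as follows. The basis functions `phi`, the true parameter `beta_true` and the fixed effects are hypothetical, and the use of BFGS is an assumption made for the example. The synthetic matching is generated from the model itself, so the minimizer should recover `beta_true`.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(4)
nbx, nby, nbk = 5, 6, 2                          # hypothetical sizes
phi = rng.normal(size=(nbx, nby, nbk))           # hypothetical basis functions phi^k_xy
beta_true = np.array([0.8, -0.5])                # hypothetical true parameter

# Synthetic "observed" matching generated by the model, normalized to total mass one
u0, v0 = rng.normal(size=nbx), rng.normal(size=nby)
pihat = np.exp(phi @ beta_true - u0[:, None] - v0[None, :])
pihat /= pihat.sum()
p, q = pihat.sum(axis=1), pihat.sum(axis=0)

def objective(theta):
    """Objective in (u, v, beta) and its gradient."""
    u, v, beta = theta[:nbx], theta[nbx:nbx + nby], theta[nbx + nby:]
    Phi = phi @ beta
    pi = np.exp(Phi - u[:, None] - v[None, :])
    val = p @ u + q @ v + pi.sum() - (pihat * Phi).sum()
    grad = np.concatenate((p - pi.sum(axis=1),
                           q - pi.sum(axis=0),
                           np.einsum('xy,xyk->k', pi - pihat, phi)))
    return val, grad

res = optimize.minimize(objective, np.zeros(nbx + nby + nbk), jac=True,
                        method='BFGS', options={'gtol': 1e-8})
beta_hat = res.x[nbx + nby:]
print(beta_hat, beta_true)
```

Note that $u$ and $v$ are only determined up to a common additive shift ($u+c$, $v-c$), which does not affect the estimate of $\beta$.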

This is actually the negative of the log-likelihood of a Poisson regression with $x$ and $y$ fixed effects, where we assume

\begin{align*}
\pi_{xy}|xy\sim Poisson\left(  \exp\left(  \sum_{k=1}^{K}\beta_{k}\phi
_{xy}^{k}-u_{x}-v_{y}\right)  \right)  .
\end{align*}

### Poisson regression with fixed effects

Let $\theta=\left(  \beta,u,v\right)  $ and $Z=\left(  \phi,D^{x},D^{y}\right)  $ where $D_{x^{\prime}y^{\prime}}^{x}=1\left\{  x=x^{\prime}\right\}  $ and $D_{x^{\prime}y^{\prime}}^{y}=1\left\{  y=y^{\prime}\right\}$ are $x$-and $y$-dummies. Let $m_{xy}\left(  Z;\theta\right)  =\exp\left(\theta^{\intercal}Z_{xy}\right)  $ be the parameter of the Poisson distribution.

The conditional log-likelihood of $\hat{\pi}_{xy}$ given $Z_{xy}$ is, up to a term that does not depend on $\theta$,

\begin{align*}
l_{xy}\left(  \hat{\pi}_{xy};\theta\right)   &  =\hat{\pi}_{xy}\log m_{xy}\left(  Z;\theta\right)  -m_{xy}\left(  Z;\theta\right) \\
&  =\hat{\pi}_{xy}\left(  \theta^{\intercal}Z_{xy}\right)  -\exp\left(\theta^{\intercal}Z_{xy}\right) \\
&  =\hat{\pi}_{xy}\left(  \sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}\right)  -\exp\left(  \sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}\right)
\end{align*}

Summing over $x$ and $y$, the sample log-likelihood is

\begin{align*}
\sum_{xy}\hat{\pi}_{xy}\sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-\sum_{x}p_{x}u_{x}-\sum_{y}q_{y}v_{y}-\sum_{xy}\exp\left(  \sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}\right)
\end{align*}

which is the negative of the [objective function](#objFun): maximizing the likelihood is equivalent to solving the minimization problem.
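The estimation problem can be tried out on a toy example. The sketch below is one possible approach under stated assumptions, not the lecture's implementation: it profiles out $u$ and $v$ by IPFP for a given $\beta$, then takes gradient steps on $\beta$ using the moment gap $\sum_{xy}(\pi_{xy}^{\beta}-\hat{\pi}_{xy})\phi_{xy}^{k}$. The names `ipfp` and `estimate_beta`, the step size, and the simulated basis functions are all hypothetical. The "observed" matching is generated without noise from a known `beta_true`, so the estimate should recover it.

```python
import numpy as np

def ipfp(Phi, p, q, tol=1e-13, max_iter=2_000):
    """Inner loop: matching pi_xy = exp(Phi_xy - u_x - v_y) with margins p, q."""
    K = np.exp(Phi)
    v = np.zeros_like(q)
    for _ in range(max_iter):
        u = np.log(K @ np.exp(-v) / p)      # fit the row margins
        v = np.log(K.T @ np.exp(-u) / q)    # fit the column margins
        pi = K * np.exp(-u[:, None] - v[None, :])
        if np.max(np.abs(pi.sum(axis=1) - p)) < tol:
            break
    return pi

def estimate_beta(phi, pi_hat, step=0.5, tol=1e-9, max_iter=5_000):
    """Outer loop: gradient descent on beta, with u and v profiled out by IPFP.

    phi has shape (K, nx, ny); pi_hat has shape (nx, ny).
    """
    p, q = pi_hat.sum(axis=1), pi_hat.sum(axis=0)
    beta = np.zeros(phi.shape[0])
    for _ in range(max_iter):
        pi = ipfp(np.tensordot(beta, phi, axes=1), p, q)
        # gradient_k = sum_xy (pi_xy - pi_hat_xy) * phi^k_xy
        grad = np.tensordot(phi, pi - pi_hat, axes=2)
        if np.max(np.abs(grad)) < tol:
            break
        beta = beta - step * grad
    return beta

# Hypothetical toy data: two random basis functions on a 6 x 6 grid of types
rng = np.random.default_rng(0)
nx, ny = 6, 6
phi = rng.uniform(-1, 1, size=(2, nx, ny))
beta_true = np.array([1.0, -0.5])
p = np.full(nx, 1 / nx)
q = np.full(ny, 1 / ny)

# "Observed" matching generated by the model itself, without sampling noise
pi_hat = ipfp(np.tensordot(beta_true, phi, axes=1), p, q)
beta_hat = estimate_beta(phi, pi_hat)
print("true beta:     ", beta_true)
print("estimated beta:", beta_hat)
```

The fixed step size is chosen conservatively for basis functions bounded by one in absolute value and a matching of total mass one; other scalings would call for a different step or a line search.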

### From Poisson to pseudo-Poisson

If $\pi_{xy}|xy$ is Poisson, then $\mathbb{E}\left[\pi_{xy}\right]=m_{xy}\left(  Z_{xy};\theta\right)  =\mathbb{V}ar\left(  \pi_{xy}\right)  $. While it makes sense to assume the former equality, the latter is a rather strong assumption.

For estimation purposes, $\hat{\theta}$ is obtained by

\begin{align*}
\max_{\theta}\sum_{xy}l\left(  \hat{\pi}_{xy};\theta\right)  =\sum_{xy}\left(\hat{\pi}_{xy}\left(  \theta^{\intercal}Z_{xy}\right)  -\exp\left(\theta^{\intercal}Z_{xy}\right)  \right)
\end{align*}

However, for inference purposes, one should not assume that the Poisson distribution holds. Instead, the pseudo-maximum likelihood estimator satisfies

\begin{align*}
\sqrt{N}\left(  \hat{\theta}-\theta\right)  \Longrightarrow\mathcal{N}\left(  0,\left(A_{0}\right)  ^{-1}B_{0}\left(  A_{0}\right)  ^{-1}\right)
\end{align*}

where $N=\left\vert \mathcal{X}\right\vert \times\left\vert \mathcal{Y}\right\vert $ and $A_{0}$ and $B_{0}$ are estimated by

\begin{align*}
\hat{A}_{0}  &  =-N^{-1}\sum_{xy}D_{\theta\theta}^{2}l\left(  \hat{\pi}_{xy};\hat{\theta}\right)  =N^{-1}\sum_{xy}\exp\left(  \hat{\theta}^{\intercal}Z_{xy}\right)  Z_{xy}Z_{xy}^{\intercal}\\
\hat{B}_{0}  &  =N^{-1}\sum_{xy}\left(  \hat{\pi}_{xy}-\exp\left(  \hat{\theta}^{\intercal}Z_{xy}\right)  \right)  ^{2}Z_{xy}Z_{xy}^{\intercal}.
\end{align*}
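The sandwich formula can be sketched in a few lines of NumPy. This is an illustration under assumptions rather than the lecture's code: the regressors are generic and of full column rank (with both $x$- and $y$-dummies, one dummy has to be dropped, since each set of dummies sums to the constant), the counts are simulated with overdispersion, and the helper names `fit_poisson` and `sandwich_cov` are hypothetical. The function returns $\hat{A}_{0}^{-1}\hat{B}_{0}\hat{A}_{0}^{-1}/N$, the approximate covariance of $\hat{\theta}$ itself.

```python
import numpy as np

def fit_poisson(Z, y, n_iter=100, tol=1e-10):
    """Poisson pseudo-maximum likelihood by Newton's method with step halving."""
    theta = np.zeros(Z.shape[1])

    def loglik(th):
        return np.sum(y * (Z @ th) - np.exp(Z @ th))

    for _ in range(n_iter):
        m = np.exp(Z @ theta)
        score = Z.T @ (y - m)
        if np.max(np.abs(score)) < tol:
            break
        hess = (Z * m[:, None]).T @ Z        # minus the Hessian of the log-likelihood
        step = np.linalg.solve(hess, score)
        t = 1.0
        while loglik(theta + t * step) < loglik(theta) and t > 1e-8:
            t /= 2                            # halve until the likelihood does not decrease
        theta = theta + t * step
    return theta

def sandwich_cov(Z, y, theta):
    """Robust covariance of theta_hat: A^{-1} B A^{-1} / N."""
    N = Z.shape[0]
    m = np.exp(Z @ theta)
    A = (Z * m[:, None]).T @ Z / N
    B = (Z * ((y - m) ** 2)[:, None]).T @ Z / N
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv / N

# Hypothetical simulated data: correct conditional mean, but variance > mean
rng = np.random.default_rng(0)
N = 500
Z = np.column_stack([np.ones(N), rng.uniform(-1, 1, size=(N, 2))])
theta_true = np.array([0.5, 1.0, -0.5])
mean = np.exp(Z @ theta_true)
y = rng.poisson(mean * rng.gamma(shape=2.0, scale=0.5, size=N))

theta_hat = fit_poisson(Z, y)
V_robust = sandwich_cov(Z, y, theta_hat)
m_hat = np.exp(Z @ theta_hat)
V_poisson = np.linalg.inv((Z * m_hat[:, None]).T @ Z)  # valid only if variance = mean

print("theta_hat:          ", theta_hat)
print("robust std. errors: ", np.sqrt(np.diag(V_robust)))
print("Poisson std. errors:", np.sqrt(np.diag(V_poisson)))
```

When the squared residuals happen to equal the fitted means, $\hat{B}_{0}=\hat{A}_{0}$ and the sandwich collapses to the usual Poisson covariance, which is the sense in which the Poisson variance assumption is a special case.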

## The gravity equation

The "structural gravity equation" (Anderson and van Wincoop, 2003), as exposited in the handbook chapter by Head and Mayer (2014), is:

\begin{align*}
X_{ni}=\underset{S_{i}}{\underbrace{\frac{Y_{i}}{\Omega_{i}}}}\underset{M_{n}}{\underbrace{\frac{X_{n}}{\Psi_{n}}}}\Phi_{ni}
\end{align*}

where $n$ is the importer, $i$ is the exporter, $X_{ni}$ is the trade flow from $i$ to $n$, $Y_{i}=\sum_{n}X_{ni}$ is the value of production, $X_{n}=\sum_{i}X_{ni}$ is the importer's expenditure, and $\Phi_{ni}$ is the bilateral accessibility of $n$ to $i$.

$\Omega_{i}$ and $\Psi_{n}$ are "multilateral resistances", satisfying the set of implicit equations

\begin{align*}
\Psi_{n}=\sum_{i}\frac{\Phi_{ni}Y_{i}}{\Omega_{i}}\text{ and }\Omega_{i}=\sum_{n}\frac{\Phi_{ni}X_{n}}{\Psi_{n}}
\end{align*}

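These are exactly the same equations as those of regularized OT: $\Phi_{ni}$ plays the role of the exponentiated surplus, and $S_{i}=Y_{i}/\Omega_{i}$ and $M_{n}=X_{n}/\Psi_{n}$ play the roles of the exponentiated fixed effects that are adjusted to fit the margins.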
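To make the connection concrete, the implicit equations for the multilateral resistances can be iterated as a fixed point, exactly as in IPFP. The sketch below uses made-up accessibilities, productions and expenditures (with total production equal to total expenditure); the function name `multilateral_resistances` is hypothetical, and self-trade is allowed here for simplicity. The resulting flows should reproduce both sets of margins.

```python
import numpy as np

def multilateral_resistances(Phi, Y, X, tol=1e-13, max_iter=10_000):
    """Fixed-point iteration on (Psi, Omega); Phi[n, i] is the accessibility of n to i."""
    Omega = np.ones_like(Y)
    for _ in range(max_iter):
        Psi = Phi @ (Y / Omega)           # Psi_n = sum_i Phi_ni Y_i / Omega_i
        Omega_new = Phi.T @ (X / Psi)     # Omega_i = sum_n Phi_ni X_n / Psi_n
        converged = np.max(np.abs(Omega_new / Omega - 1)) < tol
        Omega = Omega_new
        if converged:
            break
    Psi = Phi @ (Y / Omega)               # make Psi consistent with the final Omega
    return Psi, Omega

# Hypothetical toy economy with 4 countries
rng = np.random.default_rng(0)
nb = 4
Phi = np.exp(-rng.uniform(0, 2, size=(nb, nb)))
Y = rng.uniform(1, 2, size=nb)            # production of each exporter i
X = rng.uniform(1, 2, size=nb)            # expenditure of each importer n
X = X * Y.sum() / X.sum()                 # world production = world expenditure

Psi, Omega = multilateral_resistances(Phi, Y, X)
# flows[n, i] = (X_n / Psi_n) * (Y_i / Omega_i) * Phi_ni
flows = (X / Psi)[:, None] * (Y / Omega)[None, :] * Phi

print("max error on exporter margins:", np.abs(flows.sum(axis=0) - Y).max())
print("max error on importer margins:", np.abs(flows.sum(axis=1) - X).max())
```

Note that $(\Psi,\Omega)$ is only determined up to a common scale factor, since multiplying $\Omega$ by a constant and dividing $\Psi$ by the same constant leaves the flows unchanged.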

## Explaining trade

Parameterize $\Phi_{ni}=\exp\left(  \sum_{k=1}^{K}\beta_{k}D_{ni}^{k}\right)  $, where the $D_{ni}^{k}$ are $K$ pairwise measures of distance between $n$ and $i$. We have

\begin{align*}
X_{ni}=\exp\left(  \sum_{k=1}^{K}\beta_{k}D_{ni}^{k}-s_{i}-m_{n}\right)
\end{align*}

where fixed effects $s_{i}=-\ln S_{i}$ and $m_{n}=-\ln M_{n}$ are adjusted by

\begin{align*}
\sum_{n}X_{ni}=Y_{i}\text{ and }\sum_{i}X_{ni}=X_{n}.
\end{align*}

Standard choices of $D_{ni}^{k}$'s:

* Logarithm of bilateral distance between $n$ and $i$

* Indicator of contiguous borders; of common official language; of colonial ties

* Trade policy variables: presence of a regional trade agreement; tariffs

* Could include many other measures of proximity, e.g. measure of genetic/cultural distance, intensity of communications, etc.

We estimate this model with a nested procedure. For a fixed $\beta$, an inner loop solves the matching problem by IPFP. An outer loop then updates $\beta$ by gradient descent, where the gradient is the gap between the model moments and the empirical moments.

In [22]:
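sigma = 1               # regularization parameter (normalized to 1)
maxiterIpfp = 1000      # maximum number of IPFP iterations (inner loop)
maxiter = 500           # maximum number of gradient iterations (outer loop)
tolIpfp = 1e-12         # tolerance of the inner loop
tolDescent = 1e-12      # tolerance of the outer loop
t_s = 0.03              # step size of the gradient descent on beta
iterCount = 0
contIter = True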

v_it = np.zeros((nbi, nbt))
beta_k = np.repeat(0, nbk)

thegrad = np.repeat(0, nbk)
pi_nit = []

theval_old = -math.inf

ptm = time.time()
while(contIter):
    
    #print("Iteration", iterCount)
    
    for t in range(nbt):
        
        #print("Year", t)

        D_ij_k = Dnikt[t]

        Phi = D_ij_k.dot(beta_k.reshape(nbk,1)).reshape(nbi,nbi)

        contIpfp = True
        iterIpfp = 0

        v = v_it[:, t].reshape(1,nbi)
        f = f_nit[t]
        g = g_nit[t]

        # kernel of the regularized problem; the zero diagonal rules out flows from a country to itself
        K = np.exp(Phi/sigma)
        np.fill_diagonal(K,0)

        fK = np.multiply(f,K)
        gK = np.multiply(g,K)

        while(contIpfp):

            iterIpfp = iterIpfp + 1

            # update u given v, then v given u
            u = sigma * np.log(np.sum(np.multiply(gK,np.exp((-IX.dot(v))/sigma)), axis = 1)).flatten()
            vnext = sigma * np.log(np.sum(np.multiply(fK,np.exp((-u.T.dot(tIY))/sigma)), axis = 0))
            # largest violation of the constraint along axis 1 once v has been updated
            error = np.max(np.abs(np.sum(np.multiply(gK,np.exp((-IX.dot(vnext) - u.T.dot(tIY))/sigma)), axis = 1) - 1))

            if (error < tolIpfp or iterIpfp >= maxiterIpfp):
                contIpfp = False
            v = vnext

        v_it[:,t] = np.asarray(v)[0]

        fgK = np.multiply(f,gK)
        pi_nit.append(np.multiply(fgK,np.exp((-IX.dot(v) - u.T.dot(tIY))/sigma)))

        # contribution of year t to the gradient: model moments minus observed moments
        thegrad = thegrad + (pi_nit[t]-pihat_nit[t]).flatten(order = 'F').dot(D_ij_k)

    # gradient descent step on beta
    beta_k = beta_k - t_s * thegrad

    # value used only in the stopping rule of the outer loop
    nonzero_pi_nit = np.concatenate(pi_nit).ravel()[np.where(np.concatenate(pi_nit).ravel()>0, True, False)]
    theval = float(np.sum(np.multiply(thegrad,beta_k), axis = 1)) - sigma * float(np.sum(np.multiply(nonzero_pi_nit, np.log(nonzero_pi_nit)),axis=(0,1)))

    iterCount = iterCount + 1

    if (iterCount > maxiter or np.abs(theval - theval_old) < tolDescent):
        contIter = False

    theval_old = theval
    thegrad = np.repeat(0, nbk)
    pi_nit = []
    
diff = time.time() - ptm
print('Time elapsed = ', diff, 's.')

Time elapsed =  12.56294560432434 s.


In [23]:
beta_k = beta_k/sdD_k
print(beta_k)

[[-0.84092368  0.43744866  0.2474767  -0.22249036]]


We recover the PPML estimates in Table 1, p. 42, of [Yotov et al.'s book](https://www.wto.org/english/res_e/booksp_e/advancedwtounctad2016_e.pdf).