# <center>Block 8: A short tutorial on convex analysis</center>
### <center>Alfred Galichon (NYU)</center>
## <center>`math+econ+code' masterclass on matching models, optimal transport and applications</center>
<center>© 2018-2019 by Alfred Galichon. Support from NSF grant DMS-1716489 is acknowledged. James Nesbit contributed.</center>


### References

* [OTME], Ch. 6

* Rockafellar (1970). *Convex analysis*. Princeton.

## Theory

### Legendre-Fenchel transforms

Assume that $P$ and $Q$ have a convex support with nonempty interior. Recall that if a dual minimizer $\left(  u,v\right)$ exists, $u$ and $v$ are related by

<a name='relations'></a>
\begin{align*}
v\left(  y\right)   &  =\max_{x\in\mathbb{R}^{d}}\left\{  x^{\intercal}y-u\left(  x\right)  \right\}\\
u\left(  x\right)   &  =\max_{y\in\mathbb{R}^{d}}\left\{  x^{\intercal}y-v\left(  y\right)  \right\}  \label{expr_u}%
\end{align*}

(we can always assign the value $+\infty$ to $u$ outside of the support of $P$, and likewise to $v$ outside of the support of $Q$).

This expression is a fundamental tool in convex analysis: it is called the *Legendre-Fenchel transform*, which is defined in general by:

---
**Definition: Legendre-Fenchel transform**

The Legendre-Fenchel transform of $u$ is defined by

\begin{align*}
u^{\ast}\left(  y\right)  =\sup_{x\in\mathbb{R}^{d}}\left\{  x^{\intercal }y-u\left(  x\right)  \right\}.
\end{align*}

---

The Legendre-Fenchel transform has the following properties:

---
**Proposition**

The following holds:

1. $u^{\ast}$ is convex.

2. $u_{1}\leq u_{2}$ implies $u_{1}^{\ast}\geq u_{2}^{\ast}$.

3. (Fenchel's inequality): $u\left(  x\right)  +u^{\ast}\left(  y\right) \geq x^{\intercal}y$.

4. $u^{\ast\ast}\leq u$, with equality iff $u$ is convex (and lower semicontinuous, which is automatic when $u$ is convex and finite on $\mathbb{R}^{d}$).

---

As an immediate corollary of 4., we get the fundamental result:

---
**Proposition**

If $u$ is convex (and lower semicontinuous), then $u=\left(  u^{\ast}\right)  ^{\ast}$. Conversely, if $u=\left(  u^{\ast}\right)  ^{\ast}$, then $u$ is convex.

---
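Properties 3 and 4 can be illustrated numerically. The sketch below is not part of the original notes: it approximates the Legendre-Fenchel transform by a maximum over a finite grid (the helper `legendre_on_grid`, the grids and the double-well test function are all illustrative choices), and compares $u^{\ast\ast}$ with $u$ for a non-convex and for a convex function.

```python
import numpy as np

def legendre_on_grid(values, grid, dual_grid):
    """Grid approximation of the transform: for each dual point y,
    take the max over grid points x of x*y - u(x)."""
    # Rows index the dual points, columns index the primal points.
    return np.max(np.outer(dual_grid, grid) - values[None, :], axis=1)

x = np.linspace(-2.0, 2.0, 401)     # primal grid (arbitrary choice)
y = np.linspace(-30.0, 30.0, 1201)  # dual grid, wide enough to contain the slopes of u on the x grid

# A non-convex (double-well) function: its biconjugate lies strictly below it near 0.
u_nonconvex = (x**2 - 1.0) ** 2
u_nonconvex_bistar = legendre_on_grid(legendre_on_grid(u_nonconvex, x, y), y, x)

# A convex function: its biconjugate coincides with it, up to grid error.
u_convex = x**2 / 2
u_convex_bistar = legendre_on_grid(legendre_on_grid(u_convex, x, y), y, x)

i0 = np.argmin(np.abs(x))           # index of the grid point closest to x = 0
print("u(0), u**(0) for the double well:", u_nonconvex[i0], u_nonconvex_bistar[i0])
print("max |u** - u| for x^2/2:", np.max(np.abs(u_convex_bistar - u_convex)))
```

On this grid, the biconjugate of the double well is flat between the two wells, which is the convex envelope one expects from property 4.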

---
**Example**

1. For $u\left(  x\right)  =\left\vert x\right\vert ^{2}/2$, one gets $u^{\ast}\left(  y\right)  =\left\vert y\right\vert ^{2}/2$.

2. For $u\left(  x\right)  =\sum_{i}\lambda_{i}x_{i}^{2}/2$, $\lambda_{i}>0$, one gets $u^{\ast}\left(  y\right)  =\sum_{i}\lambda_{i}^{-1}y_{i}^{2}/2$.

3. The entropy function

\begin{align*}
u\left(x\right)  =\left\{
\begin{array}
[c]{c}
\sum_{i=1}^{d}x_{i}\ln x_{i}\text{ for }x\geq0\text{, }\sum_{i=1}^{d}x_{i}=1\\
+\infty\text{ otherwise}
\end{array}
\right.
\end{align*}

has a Legendre transform which is the log-partition function, a.k.a. logit function

\begin{align*}
u^{\ast}\left(  y\right)  =\ln\left(  \sum_{i=1}^{d}e^{y_{i}}\right)  .
\end{align*}

---
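The closed forms in the examples can be checked by brute force. The following sketch, which is an illustration added here under its own assumptions (the value of `lam`, the test points and the grids are arbitrary), maximizes $x^{\intercal}y-u\left(x\right)$ over a fine grid for the quadratic case in dimension one and for the entropy with $d=2$, where the simplex is parametrized by $\left(t,1-t\right)$.

```python
import numpy as np

# Example 2 in dimension one: u(x) = lam * x^2 / 2, expected conjugate y^2 / (2 * lam).
lam = 3.0
x = np.linspace(-10.0, 10.0, 20001)
y = np.array([-2.0, 0.5, 4.0])
conj_quadratic = np.max(np.outer(y, x) - lam * x**2 / 2, axis=1)
closed_form_quadratic = y**2 / (2 * lam)

# Example 3 with d = 2: points of the simplex are (t, 1 - t) with 0 < t < 1.
t = np.linspace(1e-6, 1 - 1e-6, 100001)
entropy = t * np.log(t) + (1 - t) * np.log(1 - t)
y2 = np.array([0.3, -1.2])
conj_entropy = np.max(t * y2[0] + (1 - t) * y2[1] - entropy)
log_partition = np.log(np.sum(np.exp(y2)))

print(conj_quadratic, closed_form_quadratic)
print(conj_entropy, log_partition)
```

Both comparisons agree up to the discretization error of the grids.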

### Subdifferentials

We now restate the demand sets of workers and firms in terms of subdifferentials of convex functions. For this, let us recall the basic economic interpretation of the [relations](#relations), which we had previously spelled out: the expression for [$v$](#relations) captures the problem of a firm of type $y$, which hires the worker $x$ who offers the best trade-off between production if hired by $y$ (that is, $\Phi\left(  x,y\right)=x^{\intercal}y$) and wage $u\left(  x\right)$. Thus, firm $y$ will be willing to match with any worker within the set of maximizers of [$v$](#relations), while worker $x$ will be willing to match with any firm within the set of maximizers of [$u$](#relations). The sets of maximizers of [$v$](#relations) and of [$u$](#relations) are called the *subdifferentials* of $v$ and $u$.

---
<a name='subdifferential'></a>
**Definition: Subdifferential**

Let $u:\mathbb{R}^{d}\rightarrow\mathbb{R}$. The subdifferential of $u$ at $x$, denoted $\partial u\left(  x\right)  $, is the set of $y\in\mathbb{R}^{d}$ such that $\forall\tilde{x}\in\mathbb{R}^{d}$, $u\left(  \tilde{x}\right)  \geq u\left(  x\right)  +y^{\intercal}\left(\tilde{x}-x\right)$.

---

The definition does *not* require $u$ to be convex; however, if $u$ is convex, the [definition](#subdifferential) immediately implies that 

\begin{align*}
\partial u\left(  x\right)  =\arg\max_{y}\left\{  x^{\intercal}y-u^{\ast }\left(  y\right)  \right\}  ,\label{subdiffConvex}
\end{align*}

hence the subdifferential of a convex function is always nonempty (while the subdifferential of a non-convex function can be empty in general).

When $u$ is differentiable and convex, then

\begin{align*}
\partial u\left(  x\right)  =\left\{  \nabla u\left(  x\right)  \right\}  .
\end{align*}

---
**Example**

When $u\left(  x\right)  =\left\vert x\right\vert $, one has $\partial u\left(  x\right)  =\left\{  -1\right\}  $ if $x<0$, $\left\{  +1\right\}  $ if $x>0$, and $\left[  -1,+1\right]  $ if $x=0$.

---

It also follows that if $u$ is a convex function, the following statements are
equivalent:

<a name='fenchel'></a>
\begin{align*}
\text{(i)}  &  \text{ }u\left(  x\right)  +u^{\ast}\left(  y\right) =x^{\intercal}y\\
\text{(ii)}  &  \text{ }y\in\partial u\left(  x\right)\\
\text{(iii)}  &  \text{ }x\in\partial u^{\ast}\left(  y\right).
\end{align*}

Going back to our worker-firm example, this has a straightforward economic interpretation. If worker $x$ chooses firm $y$, then $y$ maximizes $x^{\intercal}\tilde{y}-u^{\ast}\left(  \tilde{y}\right)$ over $\tilde{y}$, thus $y\in\partial u\left(  x\right)  $. This means that while worker $x$'s equilibrium wage $u\left(  x\right)  $ is in general greater than or equal to the value $x^{\intercal}y-u^{\ast}\left(  y\right)$ she can extract from firm $y$, those two values necessarily coincide if $x$ and $y$ are willing to match, in which case $u\left(  x\right)  +u^{\ast}\left(  y\right)=x^{\intercal}y$.
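The equivalence between (i) and (ii) can be checked by hand on the absolute value example. The sketch below is an added illustration: it uses the standard fact, not stated in the text, that the conjugate of $\left\vert x\right\vert$ is $0$ on $\left[-1,1\right]$ and $+\infty$ elsewhere, and the list of test pairs is arbitrary.

```python
import math

def u(x):
    return abs(x)

def u_star(y):
    # Conjugate of the absolute value: zero on [-1, 1], +infinity elsewhere.
    return 0.0 if abs(y) <= 1.0 else math.inf

def in_subdifferential(y, x):
    # Subdifferential of |x|: {-1} if x < 0, {+1} if x > 0, [-1, 1] if x = 0.
    if x < 0:
        return y == -1.0
    if x > 0:
        return y == 1.0
    return -1.0 <= y <= 1.0

def fenchel_gap(x, y):
    # Nonnegative by Fenchel's inequality; zero exactly when (i) holds.
    return u(x) + u_star(y) - x * y

pairs = [(0.0, -1.0), (0.0, 0.3), (2.0, 1.0), (2.0, 0.5),
         (-3.0, -1.0), (-3.0, 1.0), (1.0, 2.0)]
for x, y in pairs:
    print(x, y, "gap is zero:", fenchel_gap(x, y) == 0.0,
          "y in subdifferential:", in_subdifferential(y, x))
```

For every pair in the list, the Fenchel gap vanishes exactly when $y\in\partial u\left(x\right)$.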

These considerations allow us to relate the solutions to the primal and dual problems. Recall that in the finite-dimensional case, the primal and the dual problems are related by a complementary slackness condition. In the present case, let $\left(  X,Y\right)  \sim\pi$ be a solution to the primal problem, and $\left(  u,u^{\ast}\right)$ be a solution to the dual problem. Then almost surely $X$ and $Y$ are willing to match, which, by the previous discussion, implies that

<a name='galCS'></a>
\begin{align*}
u\left(  X\right)  +u^{\ast}\left(  Y\right)  =X^{\intercal}Y,
\end{align*}

or equivalently $Y\in\partial u\left(  X\right)  $ or in turn $X\in\partial u^{\ast}\left(  Y\right)  $. In other words, the support of $\pi$ is included in the set $\left\{  \left(  x,y\right)  :u\left(  x\right)  +u^{\ast}\left( y\right)  =x^{\intercal}y\right\}  $. This condition is the natural generalization of the complementary slackness condition in the finite-dimensional case. Unsurprisingly, taking the expectation with respect to $\pi$ of the [equality](#galCS) yields the equality between the value of the dual problem on the left-hand side, and the value of the primal problem on the right-hand side.

### Gradient of convex functions

More can be said when $u$ is differentiable at $x$. In that case, it is not hard to show that $\partial u\left(  x\right)  =\left\{  \nabla u\left( x\right)  \right\}  $, i.e. the subdifferential contains only one point, which is $\nabla u\left( x\right)  =\left(  \partial u\left(  x\right)  /\partial x_{i}\right)_{i}$, the vector of partial derivatives of $u$, or gradient of $u$. Similarly, if $u^{\ast}$ is differentiable at $y$, then $\partial u^{\ast}\left(  y\right) =\left\{  \nabla u^{\ast}\left(  y\right)\right\}  $. Hence, if $u$ and $u^{\ast}$ are differentiable, then the equivalence between [(ii)](#fenchel) and [(iii)](#fenchel) implies that $y=\nabla u\left(  x\right)  $ if and only if $x=\nabla u^{\ast}\left(  y\right)$, that is

<a name='invGradient'></a>
\begin{align*}
\left(  \nabla u\right)  ^{-1}=\nabla u^{\ast}.
\end{align*}

Alternatively, the relation [above](#invGradient) can be seen as a duality between first-order conditions and the envelope theorem. First-order conditions in the firm's problem [$v$](#relations) imply that if worker $x$ is chosen by firm $y$, then $\nabla u\left(  x\right)  =y$, while the envelope theorem implies that the gradient in $y$ of the firm's indirect profit $u^{\ast}\left(y\right)  $ is given by $\nabla u^{\ast}\left(  y\right)  =x$, where $x$ is the worker chosen by $y$. Thus the first-order conditions and the envelope theorem are 'conjugate' in the sense of convex analysis.

---
**Example**

When $u\left(  x\right)  =\sum_{i}\lambda_{i}x_{i}^{2}/2$, $\lambda_{i}>0$, recall that $u^{\ast}\left(  y\right)  =\sum_{i}\lambda_{i}^{-1}y_{i}^{2}/2$. Define $\Lambda=\mathrm{diag}\left(  \lambda\right)  $. One has $\nabla u\left( x\right)  =\Lambda x$ and $\nabla u^{\ast}\left(  y\right)  =\Lambda^{-1}y$.

---
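The quadratic example lends itself to a quick numerical check of both the inverse-gradient relation and its envelope-theorem reading. This is a minimal sketch added for illustration: the weights `lam`, the random seed and the number of competitors are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.5, 2.0, 4.0])    # illustrative positive weights lambda_i

def u(x):
    return 0.5 * np.sum(lam * x**2)

def grad_u(x):
    return lam * x                  # Lambda x

def grad_u_star(y):
    return y / lam                  # Lambda^{-1} y

# Inverse-gradient relation: applying grad u* after grad u returns x.
x = rng.normal(size=3)
y = grad_u(x)
x_back = grad_u_star(y)

# First-order condition: x should maximize x~'y - u(x~) over all x~.
competitors = x + rng.normal(scale=0.5, size=(1000, 3))
best_value = x @ y - u(x)
competitor_values = competitors @ y - 0.5 * np.sum(lam * competitors**2, axis=1)

# The maximal value is u*(y), whose closed form is sum_i y_i^2 / (2 lambda_i).
u_star_closed_form = 0.5 * np.sum(y**2 / lam)

print(np.allclose(x_back, x), np.max(competitor_values) <= best_value)
print(best_value, u_star_closed_form)
```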

### Hessians of convex functions

Assume both $u$ and $u^{\ast}$ are strictly convex and twice differentiable. Then it can be shown that their Hessians are invertible at all points, and that if $y=\nabla u\left(  x\right)$, then

\begin{align*}
D^{2}u^{\ast}\left(  y\right)  =\left(  D^{2}u\left(  x\right)  \right)
^{-1}.
\end{align*}

This can be obtained by differentiating the relationship $\nabla u^{\ast}\left(  y\right)  =\left(  \nabla u\right)  ^{-1}\left(  y\right)$.
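The differentiation step can be spelled out as follows. This is only a sketch, under the added assumption that $\nabla u$ and $\nabla u^{\ast}$ are continuously differentiable, so that the chain rule applies; $I_{d}$ denotes the $d\times d$ identity matrix.

\begin{align*}
\nabla u^{\ast}\left(  \nabla u\left(  x\right)  \right)   &  =x\text{ for all }x,\\
D^{2}u^{\ast}\left(  \nabla u\left(  x\right)  \right)  D^{2}u\left(  x\right)   &  =I_{d}\text{ (differentiating both sides in }x\text{)},\\
D^{2}u^{\ast}\left(  y\right)   &  =\left(  D^{2}u\left(  x\right)  \right)  ^{-1}\text{ where }y=\nabla u\left(  x\right)  .
\end{align*}

In the quadratic example $u\left(  x\right)  =\sum_{i}\lambda_{i}x_{i}^{2}/2$, this reads $D^{2}u=\Lambda$ and $D^{2}u^{\ast}=\Lambda^{-1}$, where $\Lambda$ is the diagonal matrix with entries $\lambda_{i}$.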

## Exercises

### Gradients and subdifferentials

---
**Exercise**

Compute the Legendre-Fenchel transforms of the following functions:

1. $u\left(  x\right)  =x^{\intercal}\Sigma x/2$, where $\Sigma$ is a positive definite matrix. Show that $u^{\ast}\left(  y\right)  =y^{\intercal }\Sigma^{-1}y/2$.

2. $u\left(  x\right)  =\frac{1}{p}\left\Vert x\right\Vert^{p}$, where $p>1$ and $\left\Vert .\right\Vert $ is the Euclidean norm. Show that $u^{\ast}\left(  y\right)  =\frac{1}{q}\left\Vert y\right\Vert ^{q}$, where $q>1$ is such that $1/p+1/q=1$.

3. $u\left(  x\right)  =1\left\{  x\in\left[  0,1\right]  \right\}$.

---
---
**Exercise**

Give the subdifferentials of the following functions from $\mathbb{R}$ to $\mathbb{R}$:

1. $u\left(  x\right)  =\max\left(  x,0\right)$.

2. $u\left(  x\right)  =\max\left(  f\left(  x\right)  ,g\left(  x\right)\right)  $, where both $f$ and $g$ are convex and differentiable.

3. $u\left(  x\right)  =\max_{1\leq i\leq n}\left\{  a_{i}x+b_{i}\right\}$, where $a_{1}<a_{2}<...<a_{n}$.

4. $u\left(  x\right)  =-x^{2}$.

---

### Entropy function

Consider the entropy function

\begin{align*}
u\left(  x\right)  =\left\{
\begin{array}
[c]{c}%
\sum_{i=1}^{d}x_{i}\ln x_{i}\text{ for }x\geq0\text{, }\sum_{i=1}^{d}x_{i}=1\\
+\infty\text{ otherwise}
\end{array}
\right.  .
\end{align*}

As it is defined on the simplex, it is not a differentiable function from $\mathbb{R}^{d}$ to $\mathbb{R}$. Instead, let us take $x_{d}=1-\sum_{i=1}^{d-1}x_{i}$, and let us view $u$ as a function $\tilde{u}$ from $\mathbb{R}^{d-1}$ to $\mathbb{R}$. We define

\begin{align*}
\tilde{u}\left(  x\right)  =\sum_{i=1}^{d-1}x_{i}\ln x_{i}+\left(1-\sum_{i=1}^{d-1}x_{i}\right)  \ln\left(  1-\sum_{i=1}^{d-1}x_{i}\right)
\end{align*}

if $x\geq0$, $\sum_{i=1}^{d-1}x_{i}\leq1$, $\tilde{u}\left(  x\right)=+\infty$ otherwise.

---
**Exercise**

Show that:

1. The Legendre transform of $\tilde{u}$ is a function from $\mathbb{R}^{d-1}$ to $\mathbb{R}$ given by

\begin{align*}
\tilde{u}^{\ast}\left(  y\right)  =\ln\left(  \sum_{i=1}^{d-1}e^{y_{i}}+1\right).
\end{align*}

2. The gradient of $\tilde{u}$ is a vector in $\mathbb{R}^{d-1}$ given by

\begin{align*}
\nabla\tilde{u}\left(  x\right)  =\left(  \ln\left(  \frac{x_{i}}{1-\sum_{j=1}^{d-1}x_{j}}\right)  \right)  _{1\leq i\leq d-1}
\end{align*}

3. The gradient of $\tilde{u}^{\ast}$ is a vector in $\mathbb{R}^{d-1}$ given by
\begin{align*}
\nabla\tilde{u}^{\ast}\left(  y\right)  =\left(  \frac{e^{y_{i}}}{\sum_{j=1}^{d-1}e^{y_{j}}+1}\right)  _{1\leq i\leq d-1}
\end{align*}

4. Compute $D^{2}\tilde{u}$ and $D^{2}\tilde{u}^{\ast}$.

---
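Before attempting the proofs, the formulas of items 1 to 3 can be sanity-checked numerically. The sketch below is an added illustration, not a solution: the point `x` (with $d=4$), the step size and the helper `finite_difference_gradient` are arbitrary choices, and agreement at one point proves nothing in general.

```python
import numpy as np

def u_tilde(x):
    last = 1.0 - np.sum(x)                      # the eliminated coordinate x_d
    return np.sum(x * np.log(x)) + last * np.log(last)

def u_tilde_star(y):
    return np.log(np.sum(np.exp(y)) + 1.0)

def grad_u_tilde(x):
    return np.log(x / (1.0 - np.sum(x)))

def grad_u_tilde_star(y):
    return np.exp(y) / (np.sum(np.exp(y)) + 1.0)

def finite_difference_gradient(f, z, step=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(z)
    for j in range(z.size):
        shift = np.zeros_like(z)
        shift[j] = step
        grad[j] = (f(z + shift) - f(z - shift)) / (2 * step)
    return grad

x = np.array([0.2, 0.3, 0.1])                   # interior point, so x_d = 0.4
y = grad_u_tilde(x)

x_back = grad_u_tilde_star(y)                   # should recover x
fd_grad_u = finite_difference_gradient(u_tilde, x)
fd_grad_u_star = finite_difference_gradient(u_tilde_star, y)
fenchel_gap = u_tilde(x) + u_tilde_star(y) - x @ y   # should vanish since y is the gradient at x

print(x_back, fd_grad_u - y, fd_grad_u_star - x, fenchel_gap)
```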

TO ADD: Constant elasticity of substitution (CES): if $f\left(  x\right)  =\left(  \sum_{i}x_{i}^{(\alpha-1)/\alpha}\right)  ^{\alpha/(\alpha-1)}$ for $x\geq0$, and $f\left(  x\right)  =+\infty$ otherwise, then, assuming $r=(\alpha-1)/\alpha>1$, $f^{\ast}\left(  y\right)  =0$ if $\sum_{i=1}^{d}y_{i}^{1-\alpha}\leq1$, and $f^{\ast}\left(  y\right)  =+\infty$ otherwise.

TO ADD: Let us compute $f^{\ast}\left(  y\right)  =\max_{x}\left\{  x^{\intercal}y-f\left(  x\right)  \right\}  $ for $r>1$. Note that for $k>0$, $f\left(  kx\right)  =kf\left(  x\right)  $; thus if there is some $x$ such that $f\left(  x\right)  <x^{\intercal}y$, then one can replace $x$ by $kx$ with $k>0$ arbitrarily large, which implies that $f^{\ast}\left(  y\right)  =+\infty$. On the other hand, if $f\left(  x\right)  \geq x^{\intercal}y$ for all $x$, then $f^{\ast}\left(  y\right)  =0$, the value $0$ being attained at $x=0$. As a result, we look for an $x$, normalized so that $\sum_{i}x_{i}^{r}=1$, such that $f\left(  x\right)  <x^{\intercal}y$. To do this, compute

\begin{align*}
\max_{x\geq0}\left\{  x^{\intercal}y-f\left(  x\right)  :\sum_{i=1}^{d}x_{i}^{r}=1\right\},
\end{align*}

which by first-order conditions yields $y_{i}=\lambda^{\frac{r-1}{r}}x_{i}^{r-1}$ (where $\lambda^{\frac{r-1}{r}}/r$ is the Lagrange multiplier of the constraint). Hence, $\lambda=\sum_{i}y_{i}^{\frac{r}{r-1}}$, and since $f\left(  x\right)  =1$ on the constraint set, the value of the problem is $\lambda^{\frac{r-1}{r}}\sum_{i}x_{i}^{r}-1=\left(  \sum_{i}y_{i}^{\frac{r}{r-1}}\right)  ^{\frac{r-1}{r}}-1$. Therefore, $f^{\ast}\left(  y\right)  =0$ if $\sum_{i=1}^{d}y_{i}^{\frac{r}{r-1}}\leq1$, and $f^{\ast}\left(  y\right)  =+\infty$ otherwise. Hence for $r>1$,

\begin{align*}
f\left(  x\right)  =\sup_{y\geq0}\left\{  \sum_{i=1}^{d}x_{i}y_{i}:\sum_{i=1}^{d}y_{i}^{\frac{r}{r-1}}=1\right\},
\end{align*}

and a similar logic shows that when $r<1$,

\begin{align*}
f\left(  x\right)  =\inf_{y\geq0}\left\{  \sum_{i=1}^{d}x_{i}y_{i}:\sum_{i=1}^{d}y_{i}^{\frac{r}{r-1}}=1\right\}  .
\end{align*}
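The $\sup$ representation for $r>1$ can be checked numerically. The sketch below is an added illustration with arbitrary choices ($r=3$, $d=4$, the random seed): it builds the candidate maximizer suggested by the first-order conditions, $y_{i}$ proportional to $x_{i}^{r-1}$, and compares it with random feasible points.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 3.0
q = r / (r - 1.0)                   # the exponent r/(r-1) appearing in the constraint

def f(x):
    return np.sum(x**r) ** (1.0 / r)

x = rng.uniform(0.1, 2.0, size=4)

# Candidate maximizer from the first-order conditions, rescaled to satisfy the constraint.
y_opt = x ** (r - 1.0) / f(x) ** (r - 1.0)
constraint_value = np.sum(y_opt**q)   # should equal 1
attained_value = x @ y_opt            # should equal f(x)

# Random nonnegative competitors, rescaled so that sum_i y_i^q = 1.
y_rand = rng.uniform(0.0, 1.0, size=(5000, 4))
y_rand /= np.sum(y_rand**q, axis=1, keepdims=True) ** (1.0 / q)
competitor_values = y_rand @ x

print(constraint_value, attained_value, f(x), np.max(competitor_values))
```

No feasible competitor exceeds $f\left(  x\right)  $, which is consistent with $f^{\ast}\left(  y\right)  =0$ on the constraint set.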