## References


1. Kaare B. Petersen and Michael S. Pedersen, The Matrix Cookbook, 2012.
2. Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press, 2009.
3. Mathematics Stack Exchange, "Show that the dual norm of the spectral norm is the nuclear norm", https://math.stackexchange.com/questions/1158798/show-that-the-dual-norm-of-the-spectral-norm-is-the-nuclear-norm

# 2 Derivative

Definition. Suppose $g: \mathbb R^n\rightarrow \mathbb R^m$ is a mapping (vector-valued function) and $x\in {\rm int}({\rm dom}(g))$ is an interior point of its domain. The function $g$ is differentiable at $x$ if there exists a matrix $Dg(x)\in \mathbb R^{m\times n}$ such that 
$$\lim_{z\in {\rm dom}(g),\ z\neq x,\ z\rightarrow x}\frac{\Vert g(z) - g(x) - Dg(x)(z-x)\Vert}{\Vert z -x\Vert } = 0.$$

We then call $Dg(x)$ the derivative or Jacobian of $g$ at $x$, and write 

$$\frac{\partial }{\partial x}g(x) = Dg(x).$$

The derivative $Dg(x)$ is also determined entrywise by its $m\times n$ partial derivatives:

$$Dg(x)_{ij} = \frac{\partial g_i(x)}{\partial x_j} = \lim_{t\rightarrow 0}\frac{g_i(x+te_j)-g_i(x)}{t},\quad i=1,\dotsc,m,\ j=1,\dotsc,n.$$
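As a quick numerical illustration of the partial-derivative formula, the sketch below approximates the Jacobian by central differences. The helper name `numerical_jacobian`, the step `eps`, and the test map `g` are assumptions made for this example and are not part of the notes. Rows index the outputs $g_i$ and columns index the inputs $x_j$, matching $Dg(x)\in\mathbb R^{m\times n}$.

```python
import numpy as np

def numerical_jacobian(g, x, eps=1e-6):
    """Approximate Dg(x) column by column with central differences."""
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(g(x)).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = 1.0
        # Column j holds the partial derivatives with respect to x_j.
        J[:, j] = (np.atleast_1d(g(x + eps * e)) - np.atleast_1d(g(x - eps * e))) / (2 * eps)
    return J

# Hypothetical test map g: R^2 -> R^3, chosen only for illustration.
g = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])
x0 = np.array([1.0, 2.0])

# Hand-computed Jacobian of the test map at x0.
J_exact = np.array([[x0[1], x0[0]],
                    [np.cos(x0[0]), 0.0],
                    [0.0, 2 * x0[1]]])
print(np.allclose(numerical_jacobian(g, x0), J_exact, atol=1e-6))
```

Such a finite-difference check is only a sanity test of a hand-derived Jacobian; it says nothing about differentiability itself.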

### Examples

#### Chain Rule

Suppose $h: \mathbb R^{n}\rightarrow \mathbb R^p$ is the composition of $f:\mathbb R^n\rightarrow \mathbb R^m, g:\mathbb R^m \rightarrow \mathbb R^p$, i.e. $h(x) = g(f(x))$. Then, 

$$Dh(x) = Dg(f(x))Df(x)$$
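The chain rule can be spot-checked numerically. The sketch below is only an illustration: the helper `jac` (a central-difference Jacobian) and the maps `f`, `g` are hypothetical choices, not taken from the notes.

```python
import numpy as np

def jac(func, x, eps=1e-6):
    # Central-difference Jacobian; rows index outputs, columns index inputs.
    cols = [(func(x + eps * e) - func(x - eps * e)) / (2 * eps) for e in np.eye(x.size)]
    return np.stack(cols, axis=1)

# Illustrative maps f: R^2 -> R^3 and g: R^3 -> R^2.
f = lambda x: np.array([x[0] + x[1] ** 2, x[0] * x[1], np.exp(x[0])])
g = lambda u: np.array([u[0] * u[2], np.sin(u[1])])
h = lambda x: g(f(x))

x0 = np.array([0.3, -0.7])
lhs = jac(h, x0)                      # Dh(x), a 2 x 2 matrix
rhs = jac(g, f(x0)) @ jac(f, x0)      # Dg(f(x)) Df(x), (2 x 3) times (3 x 2)
print(np.allclose(lhs, rhs, atol=1e-6))
```

Note the order of the factors: the outer Jacobian is evaluated at $f(x)$ and multiplies from the left.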

#### Linear Transformation

$$\frac{\partial }{\partial x}Ax = A,\quad \frac{\partial}{\partial x}x^TA = A^T$$

#### Quadratic Form

$$\frac{\partial }{\partial x}x^TAx = x^T(A+A^T)$$
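A minimal numerical check of the quadratic-form derivative, assuming a random (not necessarily symmetric) matrix `A` and a central-difference approximation; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))       # deliberately not symmetric
x = rng.standard_normal(n)

q = lambda v: v @ A @ v               # the quadratic form x^T A x
eps = 1e-6
# Central differences along each coordinate direction e_j.
fd = np.array([(q(x + eps * e) - q(x - eps * e)) / (2 * eps) for e in np.eye(n)])
analytic = x @ (A + A.T)              # the row vector x^T (A + A^T)
print(np.allclose(fd, analytic, atol=1e-6))
```

When $A$ is symmetric the derivative reduces to $2x^TA$.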

#### Functions

$$\frac{\partial }{\partial x}[f_1(x_1),\dotsc,f_n(x_n)]^T = {\rm diag}[f_1'(x_1),\dotsc,f'_n(x_n)]$$

#### Hadamard Product

If $f,g:\ \mathbb R^n\rightarrow \mathbb R^m$, then 
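$$\frac{\partial }{\partial x}\left(f(x)\odot g(x)\right) = {\rm diag}(g(x))\left(\frac{\partial }{\partial x}f(x)\right) + {\rm diag}(f(x))\left(\frac{\partial }{\partial x}g(x)\right).$$

Proof. Entrywise,
$$\left[\frac{\partial }{\partial x}\left(f(x)\odot g(x)\right)\right]_{ij}
= \frac{\partial f_i(x)g_i(x)}{\partial x_j}=\frac{\partial f_i(x)}{\partial x_j}g_i(x)
+f_i(x)\frac{\partial g_i(x)}{\partial x_j},
$$
so row $i$ of $\frac{\partial}{\partial x}f(x)$ is scaled by $g_i(x)$ and row $i$ of $\frac{\partial}{\partial x}g(x)$ by $f_i(x)$, which is left multiplication by the diagonal matrices.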
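The entrywise identity in the proof can be checked numerically. In the sketch below the helper `jac` and the maps `f`, `g` are hypothetical examples; scaling row $i$ of a Jacobian by $g_i(x)$ or $f_i(x)$ is written with NumPy broadcasting.

```python
import numpy as np

def jac(func, x, eps=1e-6):
    # Central-difference Jacobian; rows index outputs, columns index inputs.
    cols = [(func(x + eps * e) - func(x - eps * e)) / (2 * eps) for e in np.eye(x.size)]
    return np.stack(cols, axis=1)

# Illustrative maps f, g: R^2 -> R^3.
f = lambda x: np.array([np.sin(x[0]), x[0] * x[1], x[1] ** 2])
g = lambda x: np.array([np.exp(x[1]), x[0] + x[1], np.cos(x[0])])

x0 = np.array([0.5, -1.2])
lhs = jac(lambda x: f(x) * g(x), x0)  # Jacobian of the Hadamard product
# Entry (i, j): (df_i/dx_j) * g_i + f_i * (dg_i/dx_j).
rhs = g(x0)[:, None] * jac(f, x0) + f(x0)[:, None] * jac(g, x0)
print(np.allclose(lhs, rhs, atol=1e-6))
```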

#### Scalar Multiplication

In particular, if $f:\ \mathbb R^n\rightarrow \mathbb R^m$ while $g:\ \mathbb R^n\rightarrow\mathbb  R$ is scalar-valued, then one can replicate $g$ into the vector $g(x)\mathbf 1\in\mathbb R^m$ and see that 
$$\frac{\partial }{\partial x}\left(f(x)g(x)\right) = \left(\frac{\partial }{\partial x}f(x)\right)g(x) + f(x) \left(\frac{\partial }{\partial x}g(x)\right).$$




## First-order Condition

When $m = 1$, i.e. $f: \mathbb R^n \rightarrow \mathbb R$, and $f$ is differentiable, the transpose of the derivative $Df(x)$ is called the gradient. It is a column vector, so $\nabla f: \mathbb R^n\rightarrow \mathbb R^n$. 

$$\nabla f(x) = Df(x)^T = \left[\frac{\partial f(x)}{\partial x_1},\dotsc,\frac{\partial f(x)}{\partial x_n}\right]^T$$


## Second-order Condition

$f: \mathbb R^n \rightarrow \mathbb R$ is twice differentiable at $x$ iff there exists a symmetric matrix $\nabla ^2f(x)\in \mathbb S^n$ such that $\nabla ^2f(x) = D\nabla f(x)$. 
$$\nabla ^2f(x)_{ij} = \frac{\partial ^2f(x)}{\partial x_i\partial x_j}$$

We call the matrix $\nabla ^2f(x)$ the Hessian of $f$ at $x$.

### Taylor's Theorem

* Suppose the function $f$ is continuously differentiable and $x\neq y\in {\rm dom}(f)$. Then there exists some $\alpha \in (0,1)$, depending on $x,y$, such that

$$f(y) = f(x) + \nabla f(x+\alpha (y-x))^T(y-x)$$

Proof: Consider the function $g(t) = f(x+t(y-x))$. By Lagrange's mean value theorem we learn that there exists some $\alpha \in (0,1)$ such that $g(1) = g(0) + g'(\alpha)$. If we denote $w = x+\alpha(y-x)$, we have

$$f(y) = f(x) + \frac{\partial}{\partial \alpha}f(w)
= f(x) + \sum_{i=1}^n\frac{\partial f}{\partial w_i}\frac{\partial w_i}{\partial \alpha}
= f(x) + \nabla f(w)^T(y-x).
$$

<br>

* Further, suppose the function $f$ is twice continuously differentiable and $x,y\in {\rm dom}(f)$. Then there exists some $\alpha \in (0,1)$, depending on $x,y$, such that

$$f(y) = f(x) + \nabla f(x)^T(y-x)+\frac12 (y-x)^T \nabla^2f(x+\alpha(y-x))(y-x).$$

Proof: Similarly, consider $g(t) = f(x+t (y-x))$. The single-variable Taylor theorem with Lagrange remainder shows that $g(1) = g(0) + g'(0) + \frac12 g''(\alpha)$ for some $\alpha \in (0,1)$. It was already shown above that $g'(0) = \nabla f(x)^T(y-x)$, so it remains to compute $g''(\alpha)$. Denote $w = x + \alpha (y-x)$. Since $\frac{\partial}{\partial \alpha}f(w) = \nabla f(w)^T(y-x)$, we have

$$g''(\alpha) = \frac{\partial}{\partial \alpha}\nabla f(w)^T(y-x)=\frac{\partial}{\partial w}
\left( \nabla f(w)^T(y-x)\right)\frac{\partial w}{\partial \alpha}=(y-x)^T\nabla^2 f(w)(y-x).$$

<br>

* Further, suppose the function $f$ is twice continuously differentiable and its Hessian is $L$-Lipschitz (a bound on the "third order derivative"), i.e. $\Vert \nabla^2 f(x) - \nabla^2 f(y)\Vert \leqslant L \Vert x - y\Vert$ for all $x,y\in {\rm dom}(f)$. Then for all $x,y\in {\rm dom}(f)$ we have 
$$f(y) \leqslant f(x) + \nabla f(x)^T(y-x)+\frac 12 (y-x)^T\nabla^2f(x)(y-x) + \frac L6\Vert y - x\Vert^3.$$

Proof: We estimate the truncation error by an integral: 
$$\begin{aligned}f(y) - f(x) - \nabla f(x)^T(y-x) &= \int_0^1 \nabla f(x+t(y-x))^T(y-x) dt - \nabla f(x)^T(y-x)\\&
=\int_0^1 \left(\nabla f(x+t(y-x))^T - \nabla f(x)^T\right)(y-x)dt\\
&= \int_0^1 \int_0^t (y-x)^T\nabla ^2f(x+u(y-x))(y-x)dudt\\
&= \iint_{0\leqslant u\leqslant t\leqslant 1}(y-x)^T\nabla ^2f(x+u(y-x))(y-x)dudt.\end{aligned}$$
Note that 
$$ (y-x)^T\left(\nabla ^2f(x+u(y-x)) - \nabla^2f(x)\right) (y-x)
\leqslant  \Vert \nabla ^2f(x+u(y-x)) - \nabla^2f(x)\Vert\cdot \Vert y - x\Vert^2
\leqslant uL\Vert y - x\Vert^3,$$

so 
$$\begin{aligned}  f(y) - f(x) - \nabla f(x)^T(y-x)  
&\leqslant \iint_{0\leqslant u\leqslant t\leqslant 1}\left((y-x)^T\nabla ^2f(x)(y-x)+uL\Vert y - x\Vert^3\right)dudt \\
&= \frac 12 (y-x)^T\nabla ^2f(x)(y-x)+\frac L6\Vert y - x\Vert^3.
\end{aligned}$$
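The cubic upper bound can be tested on a concrete function. The sketch below assumes the illustrative choice $f(x) = \sum_i \sin x_i$, whose Hessian ${\rm diag}(-\sin x_i)$ is Lipschitz with $L = 1$ in the spectral norm (since $|\sin a - \sin b|\leqslant |a-b|$); this example is not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative function with a 1-Lipschitz Hessian.
f = lambda x: np.sum(np.sin(x))
grad = lambda x: np.cos(x)
hess = lambda x: np.diag(-np.sin(x))
L = 1.0

ok = True
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    d = y - x
    # Second-order model plus the cubic correction term L/6 * ||y - x||^3.
    bound = f(x) + grad(x) @ d + 0.5 * d @ hess(x) @ d + L / 6 * np.linalg.norm(d) ** 3
    ok = ok and f(y) <= bound + 1e-12
print(ok)
```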

### Convexity

**Theorem** Continuously differentiable $f$ is convex iff 
$$f(y)\geqslant f(x) + \nabla f(x)^T(y-x)$$

holds for all $x,y\in \mathbb R^n$. In particular, the function is strictly convex iff the inequality is strict for all $x\neq y$.

Proof: 

$\Rightarrow$ side: For all $x,y\in \mathbb R^n$ and arbitrary $\alpha \in [0,1]$, convexity of $f$ gives
$$g(\alpha)= f(x+\alpha (y-x)) - (1-\alpha)f(x) - \alpha f(y)\leqslant 0,$$
whose derivative is 
$$g'(\alpha) = \nabla f(x+\alpha (y-x))^T(y-x) + f(x) - f(y).$$

Since $g(0)=0$ and $g(\alpha)\leqslant 0$ on $[0,1]$, we must have $g'(0)\leqslant 0$, which is the claimed inequality. In particular, for a strictly convex function $f$ and $x\neq y$, 
$$\frac12 (f(x)+f(y)) > f\left(\frac{x+y}{2}\right) \geqslant f(x)+ \frac 12\nabla f(x)^T(y-x),$$
and rearranging gives $f(y) > f(x)+\nabla f(x)^T(y-x)$.


$\Leftarrow$ side: Let $w = \alpha x+(1-\alpha)y$ with $\alpha\in[0,1]$. The assumption gives
$$f(x)\geqslant f(w)+\nabla f(w)^T(x -w),\quad f(y)\geqslant f(w)+\nabla f(w)^T(y-w).$$

Since $\alpha (x-w) + (1-\alpha) (y-w)=0$, weighting the two inequalities by $\alpha$ and $1-\alpha$ and adding them yields
$$\alpha f(x)+(1-\alpha) f(y)\geqslant f(w).$$

The same argument with strict inequalities (for $x\neq y$ and $\alpha\in(0,1)$) shows that the function is strictly convex if the condition is strict.
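The first-order condition is easy to probe numerically. The sketch below uses log-sum-exp, whose convexity is proved later in these notes, with its gradient $e^z/(\mathbf 1^Te^z)$; the sampling scheme and tolerance are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(2)

f = lambda x: np.log(np.sum(np.exp(x)))            # log-sum-exp
grad = lambda x: np.exp(x) / np.sum(np.exp(x))     # its gradient (softmax)

gaps = []
for _ in range(1000):
    x, y = rng.standard_normal(6), rng.standard_normal(6)
    # Gap between f(y) and the tangent plane at x; nonnegative for convex f.
    gaps.append(f(y) - f(x) - grad(x) @ (y - x))
print(min(gaps) >= -1e-12)
```

A single negative gap would disprove convexity, whereas nonnegative gaps on random samples are only supporting evidence.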
 
<br>

**Theorem** A twice continuously differentiable function $f$ defined on a convex domain is convex iff

$$\nabla ^2f(x)\succeq 0,\quad \forall x\in {\rm dom}(f).$$

If $\nabla ^2f(x)\succ 0,\ \forall x\in {\rm dom}(f)$ , $f$ is strictly convex but the converse does not hold.

Proof: $\Rightarrow$ side: Fix $x,y\in{\rm dom}(f)$ and $\alpha\in(0,1]$. Taylor's theorem shows that there exists some $\beta \in(0,\alpha)$ such that
$$f(x+\alpha (y-x))=f(x)+\alpha \nabla f(x)^T(y-x) + \frac{\alpha ^2}{2}(y-x)^T\nabla ^2f(x+\beta (y-x))(y-x).$$

Yet the previous theorem, applied to the points $x$ and $x+\alpha(y-x)$, states that 
$$f(x+\alpha (y-x))\geqslant f(x) + \alpha \nabla f(x)^T(y-x).$$

Combining the two yields
$$(y-x)^T\nabla ^2f(x+\beta (y-x))(y-x)\geqslant 0.$$

Taking $\alpha\rightarrow 0_+$ (and hence $\beta\rightarrow 0_+$), continuity of the Hessian gives
$$(y-x)^T\nabla ^2f(x)(y-x)\geqslant 0.$$

Now let $x\in {\rm int}({\rm dom}(f))$ and suppose, for contradiction, that there exists some $r$ with $r^T\nabla ^2f(x)r < 0$. Since rescaling $r$ does not change the sign of this quadratic form, we may take $r$ small enough that $y=x+r\in {\rm dom}(f)$, which contradicts the inequality above. Hence $\nabla ^2f(x)$ is positive semidefinite in the interior. For boundary points, take the limit and conclude by continuity of the Hessian.

<br>

$\Leftarrow$ side: Combining the condition with Taylor's theorem immediately yields $f(y)\geqslant f(x)+\nabla f(x)^T(y-x)$, and the first theorem finishes the proof. A positive definite Hessian gives the strict inequality for $x\neq y$ and therefore strict convexity.

### Convex Functions

Here are some examples of convex functions. 

1. $f(x) = a^Tx$ because its Hessian is zero. It is at the same time concave.
2. Semidefinite quadratic forms $f(x) = x^TQx= \Vert x\Vert_Q^2$ where $Q\succeq 0$ is half of its Hessian.
3. $f(x) = e^x$ where $x\in \mathbb R$.
4. $f(x) = x^a$ where $x\in \mathbb R_{++}$ and $a\in (-\infty, 0]\cup [1,+\infty)$.
5. $f(x) = -\log x$ and $f(x) = x\log x$ on $x\in \mathbb R_{++}$.
6. $f(x) = |x|^p$ where $x\in \mathbb R$ and $p\geqslant 1$.
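7. $f(x,Y) = x^TY^{-1}x:\ \mathbb R^n\times \mathbb S_{++}^n \rightarrow \mathbb R$.
   
Proof. The epigraph of $f(x,Y)$ is convex because, for $Y\succ 0$, the Schur complement gives the equivalence $$\{(x,Y,t):\ t\geqslant x^TY^{-1}x\}
=\left\{(x,Y,t):\ \left[\begin{matrix}Y & x\\ x^T & t\end{matrix}\right]\succeq 0\right\},$$ and the right-hand side is described by a linear matrix inequality, hence convex.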
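The epigraph characterization can be sanity-checked on random data. In the sketch below, the way `Y` is generated and the margin of `0.5` around the threshold are assumptions made to keep the test numerically clear-cut.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
agree = True
for _ in range(500):
    B = rng.standard_normal((n, n))
    Y = B @ B.T + 0.1 * np.eye(n)            # positive definite by construction
    x = rng.standard_normal(n)
    q = x @ np.linalg.solve(Y, x)            # x^T Y^{-1} x
    t = q + rng.choice([-0.5, 0.5])          # strictly below or above the threshold
    M = np.block([[Y, x[:, None]],
                  [x[None, :], np.array([[t]])]])
    in_epigraph = t >= q
    is_psd = np.linalg.eigvalsh(M).min() >= -1e-9
    agree = agree and (in_epigraph == is_psd)
print(agree)
```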

## Convexity-Preserving Operations

Applying the following convexity-preserving operations to convex functions yields convex functions. 

### Nonnegative Combinations

If $f_1,f_2,\dotsc, f_n$ are convex functions and the weights $\alpha_1,\alpha_2,\dotsc, \alpha_n$ are nonnegative, then 
$f(x) = \sum_{i=1}^n \alpha_i f_i(x)$ is convex. Moreover, if for each $\theta\in \mathcal A$ the function $f(x,\theta)$ is convex with regard to $x$ and $w(\theta )$ is nonnegative, then
$$f(x) = \int_{\mathcal A} w(\theta )f(x,\theta )d\theta $$
is also convex. 

Proof. Trivial from the definition of convex functions.


### Affine Composition

Given $A\in \mathbb R^{n\times m}, b\in \mathbb R^n$, if $g(x)$ is convex, then $f(x) = g(Ax+b)$ is also convex.

### Pointwise Maximum

If $f_1,\dotsc, f_n$ are convex, then $f(x) = \max_i f_i(x)$ is also convex. Moreover, if for each $\theta \in \mathcal A$ the function $f_\theta(x )$ is convex with regard to $x$, then $f(x) = \sup_\theta  f_\theta(x )$ is also convex.

Proof. For $\alpha\in [0,1]$ and $x,y\in \bigcap_{\theta} {\rm dom}( f_\theta)$,
$$\begin{aligned}
\alpha f(x) + (1-\alpha) f(y) &= \alpha \sup_\theta f_\theta(x) + (1-\alpha) \sup_\theta f_\theta(y)\\
&\geqslant \sup_\theta \left(\alpha f_\theta(x) + (1-\alpha) f_\theta(y)\right)\\
&\geqslant \sup_\theta f_\theta(\alpha x+(1-\alpha)y) = f(\alpha x+(1-\alpha)y).
\end{aligned}$$

#### Examples

1. Support function of set $C$: $S_C(x) = \sup_{y\in C}y^Tx$.
2. Farthest distance to a set $C$: $f(x) = \sup_{y\in C}\Vert x -y\Vert$.
3. 2-norm of a matrix $\Vert A\Vert_2 = \sigma_1(A) =  \sup_{\Vert x\Vert_2 = 1} \Vert Ax\Vert_2$.
4. Sum of the leading $k$ eigenvalues of a Hermitian matrix (eigenvalues in ascending order), $\sum_{i=n+1-k}^{n} \lambda_i(A) = \sup_{Q^*Q = I_k}{\rm tr}(Q^*AQ)$.
5. Ky Fan norm (sum of the leading $k$ singular values) of a matrix, $\sum_{i=1}^k \sigma_i(A) = \sup_{Q^*Q = I_k}{\rm tr}(Q^*\left[\begin{matrix} 0 & A^T\\A & 0\end{matrix}\right]Q)$.

Note that example $3$ is the special case $k=1$ of the Ky Fan norm in example $5$, which in turn is example $4$ applied to the Hermitian block matrix $\left[\begin{matrix} 0 & A^T\\A & 0\end{matrix}\right]$, whose nonzero eigenvalues are $\pm\sigma_i(A)$.
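The variational formula for the sum of the leading $k$ singular values can be verified numerically. The sketch below assumes a random real matrix `A` and $k\leqslant\min(m,n)$; the sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k = 5, 4, 2
A = rng.standard_normal((m, n))

sigma = np.linalg.svd(A, compute_uv=False)              # singular values, descending
H = np.block([[np.zeros((n, n)), A.T],
              [A, np.zeros((m, m))]])                   # symmetric block matrix
w, V = np.linalg.eigh(H)                                # eigenvalues, ascending

ky_fan = sigma[:k].sum()
top_eigs = w[-k:].sum()                                 # sum of the k largest eigenvalues
Q = V[:, -k:]                                           # maximizer with orthonormal columns
attained = np.trace(Q.T @ H @ Q)

# Any other Q with orthonormal columns should give a value that is no larger.
Q_rand, _ = np.linalg.qr(rng.standard_normal((m + n, k)))
rand_val = np.trace(Q_rand.T @ H @ Q_rand)

print(np.isclose(ky_fan, top_eigs), np.isclose(attained, ky_fan), rand_val <= ky_fan + 1e-12)
```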

### Composition with Monotonic Functions

Given $g:\ \mathbb R^n\rightarrow \mathbb R^m$ and $h:\ \mathbb R^m\rightarrow \mathbb R$, let their composition be
$$f(x) = h(g(x))=h(g_1(x),g_2(x),\dotsc,g_m(x)):\ \mathbb R^n\rightarrow \mathbb R.$$ 

Then, 
1. If every $g_i$ is convex, and $h$ is convex and non-decreasing in each argument, then $f$ is convex. 
2. If every $g_i$ is concave, and $h$ is convex and non-increasing in each argument, then $f$ is convex.

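Proof. We only prove the first. For all $x,y\in {\rm dom}(g)$ and $\alpha \in [0,1]$, we have
$$\alpha h(g(x)) + (1-\alpha)h(g(y))
\geqslant h(\alpha g(x)+(1-\alpha)g(y))
\geqslant h( g(\alpha x+(1-\alpha)y)).$$
The first inequality is the convexity of $h$. The second holds because each $g_i$ is convex, so $\alpha g(x)+(1-\alpha)g(y)\geqslant g(\alpha x+(1-\alpha)y)$ entrywise, and $h$ is non-decreasing in each argument.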



#### Examples 

1. $-\log g(x)$ is convex if $g(x)$ is concave (and positive).
2. $\log \sum_{i=1}^n e^{g_i(x)}$ is convex if $g_i$ are convex.

Proof. It suffices to prove that $f(z) = \log \sum_{i=1}^n e^{z_i}$ is convex, since it is non-decreasing in each argument. Let $c = [1,\dotsc, 1]^T$. By the chain rule,
$$\nabla f(z)^T = \left( \frac{\partial}{\partial c^T e^z}\log c^T e^z \right) \cdot \left(
\frac{\partial}{\partial  e^z}c^Te^z\right) \cdot \left( \frac{\partial}{\partial z}e^z\right)
=\frac{1}{c^T e^z} c^T {\rm diag}(e^z)=\frac{(e^z)^T}{c^T e^z}.
$$
$$\nabla^2 f(z) = \nabla \frac{e^z}{c^Te^z}=\frac{1}{(c^Te^z)^2}\left({\rm diag}[e^{z_i}\sum_{j= 1}^ne^{z_j}] 
- [e^{z_i}e^{z_j}]\right).$$
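The Hessian is positive semidefinite by Gershgorin's circle theorem: the matrix in parentheses is symmetric with nonnegative diagonal entries and each of its rows sums to zero, so it is (weakly) diagonally dominant.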

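```python
import numpy as np

# Spot-check the convexity inequality for log-sum-exp at a random pair of points.
f = lambda x: np.log(np.sum(np.exp(x)))
x = np.random.randn(10)
y = np.random.randn(10)
alpha = np.random.random()
print(alpha * f(x) + (1 - alpha) * f(y), '>=', f(alpha * x + (1 - alpha) * y))
```

Output of one run (the numbers vary with the random draw):

```
2.1491623212020623 >= 2.045361140510942
```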
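The Hessian formula derived in the proof can also be checked directly. The sketch below builds the matrix from the formula and inspects its smallest eigenvalue and row sums; the function name `lse_hessian` and the tolerances are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(9)

def lse_hessian(z):
    # Hessian of log-sum-exp: (diag(e_i * s) - e e^T) / s^2 with e = exp(z), s = sum(e).
    e = np.exp(z)
    s = e.sum()
    return (np.diag(e * s) - np.outer(e, e)) / s ** 2

min_eigs = []
row_sums = []
for _ in range(200):
    Hz = lse_hessian(rng.standard_normal(8))
    min_eigs.append(np.linalg.eigvalsh(Hz).min())
    row_sums.append(np.abs(Hz.sum(axis=1)).max())

# The smallest eigenvalue is zero up to rounding (the all-ones vector is in the null space).
print(min(min_eigs) >= -1e-12, max(row_sums) <= 1e-12)
```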


### Partial Minimization

If $f(x,y)$ is jointly convex in $(x,y)$ and $C$ is a **convex subset** of the domain, then 
$$g(x) = \inf_{y\in C}f(x,y)$$
is convex.

Proof. For every $\epsilon > 0$, there exist $y_1,y_2\in C$ such that
$g(x_1)\leqslant f(x_1,y_1)\leqslant g(x_1)+\epsilon$ and 
$g(x_2)\leqslant f(x_2,y_2)\leqslant g(x_2)+\epsilon$. 

Hence, using the joint convexity of $f$ and then $\alpha y_1+(1-\alpha)y_2\in C$ (by convexity of $C$),
$$\begin{aligned}\alpha g(x_1) + (1-\alpha)g(x_2)&
\geqslant \alpha(f(x_1,y_1)-\epsilon) + (1-\alpha)(f(x_2,y_2)-\epsilon)\\ &
\geqslant f(\alpha x_1+(1-\alpha)x_2, \ \alpha y_1+(1-\alpha)y_2) - \epsilon\\ &
\geqslant \inf_{y\in C}f(\alpha x_1+(1-\alpha)x_2,y) - \epsilon.\end{aligned}
$$
Letting $\epsilon\rightarrow 0$, we obtain 
$$\alpha g(x_1) + (1-\alpha)g(x_2)  
\geqslant \inf_{y\in C}f(\alpha x_1+(1-\alpha)x_2,y) = g(\alpha x_1+(1-\alpha)x_2).$$

#### Examples 

1. Distance to a convex set ${\rm dist}_S(x) = \inf_{y\in S}\Vert x - y\Vert$.
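As an illustration of the distance example, the sketch below takes the convex set to be the box $[-1,1]^n$, an assumption chosen because the infimum is then attained at a coordinatewise clip, and tests the convexity inequality on random points. The helper name `dist_to_box` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

def dist_to_box(x, lo=-1.0, hi=1.0):
    # The infimum over the box is attained at the projection, a coordinatewise clip.
    return np.linalg.norm(x - np.clip(x, lo, hi))

ok = True
for _ in range(1000):
    x1, x2 = 3 * rng.standard_normal(4), 3 * rng.standard_normal(4)
    a = rng.random()
    lhs = dist_to_box(a * x1 + (1 - a) * x2)
    rhs = a * dist_to_box(x1) + (1 - a) * dist_to_box(x2)
    ok = ok and lhs <= rhs + 1e-12
print(ok)
```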

### Perspective Function

If $f(x):\ \mathbb R^n \rightarrow \mathbb R$ is convex, then its perspective, the function 
$$g(x,t) = tf\left(\frac{x}{t}\right):\ \mathbb R^n\times \mathbb R_{++}\rightarrow \mathbb R$$ 
is convex.

Proof. Let $x = x_1+\alpha (x_2 - x_1)$ and $t = t_1 + \alpha (t_2 - t_1)$ with $\alpha\in[0,1]$. Since 
$(1-\alpha)t_1 + \alpha t_2 = t$, the two weights below are nonnegative and sum to one, so we get
$$g(x,t) = tf\left(\frac{x}{t}\right)
=tf\left(\frac{(1-\alpha)t_1}{t}\cdot \frac{x_1}{t_1}+\frac{\alpha t_2}{t}\cdot \frac{x_2}{t_2} \right)
\leqslant 
(1-\alpha)t_1 f\left(\frac{x_1}{t_1}\right)+\alpha t_2 f\left(\frac{x_2}{t_2}\right)
=(1-\alpha)g(x_1,t_1)+\alpha g(x_2,t_2).
$$


#### Examples 

1. Relative entropy $g(x,t) = t\log \frac {t}{x}$ on $x,t>0$ is convex, as the perspective of $-\log x$.
2. Quadratic over linear $g(x,t) = \frac{x^Tx}{t}$ on $t>0$ is convex, as the perspective of $x^Tx$.
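The perspective construction is short to express in code. The sketch below builds the perspective of $x^Tx$ with a hypothetical helper `perspective` and tests joint convexity in $(x,t)$ on random samples with $t>0$; the sampling ranges are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def perspective(f):
    # Returns g(x, t) = t * f(x / t), intended for t > 0.
    return lambda x, t: t * f(x / t)

quad = lambda x: x @ x
g = perspective(quad)                 # equals x^T x / t (quadratic over linear)

ok = True
for _ in range(1000):
    x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
    t1, t2 = rng.uniform(0.1, 5.0, size=2)
    a = rng.random()
    lhs = g(a * x1 + (1 - a) * x2, a * t1 + (1 - a) * t2)
    rhs = a * g(x1, t1) + (1 - a) * g(x2, t2)
    ok = ok and lhs <= rhs + 1e-9
print(ok)
```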

### Linear Restriction

A function $f(x):\ \mathbb R^n\rightarrow \mathbb R$ is convex iff for any $x_0\in {\rm dom}(f)$ and $h\in \mathbb R^n$, its restriction to a line $$ f_{x_0,h}(t) = f(x_0 + th):\ \mathbb R\rightarrow \mathbb R$$
is convex in $t$.

Proof. $\Rightarrow$ is trivial. As for the  $\Leftarrow$ side:

$$f(x+\alpha (y-x)) = f_{x,y-x}(\alpha) \leqslant (1-\alpha)f_{x,y-x}(0) +\alpha f_{x,y-x}(1) =
(1-\alpha)f(x) + \alpha f(y).$$

### Conjugate

If a function $f$ is convex, then its **conjugate** defined by
$$f^*(y) = \sup_{x\in {\rm dom}(f)} (y^Tx - f(x))$$
is also convex.

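Proof: For each fixed $x$, the function $y^Tx - f(x)$ is affine, hence convex, with regard to $y$, so the pointwise supremum is convex. Note that this argument does not use the convexity of $f$.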
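A conjugate can be approximated by taking the supremum over a grid. The sketch below assumes the illustrative function $f(x) = \frac12 x^2$ on $\mathbb R$, whose conjugate is $f^*(y) = \frac12 y^2$; the grid bounds and resolution are arbitrary choices for this example.

```python
import numpy as np

xs = np.linspace(-10.0, 10.0, 20001)      # grid standing in for dom(f)
f = lambda x: 0.5 * x ** 2

def conjugate(y):
    # Grid approximation of sup_x (y * x - f(x)).
    return np.max(y * xs - f(xs))

ys = np.linspace(-3.0, 3.0, 13)
approx = np.array([conjugate(y) for y in ys])
print(np.allclose(approx, 0.5 * ys ** 2, atol=1e-6))

# Midpoint convexity of the computed conjugate on the equally spaced grid of y values.
print(np.all(approx[1:-1] <= 0.5 * (approx[:-2] + approx[2:]) + 1e-9))
```

The grid must contain the maximizer $x = y$ (here $|y|\leqslant 3$ lies well inside $[-10,10]$), otherwise the approximation underestimates $f^*$.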

### Dual Norm
Define the dual norm of a norm $\Vert \cdot \Vert$ by 
$$\Vert x\Vert_* = \sup \{z^Tx:\quad \Vert z\Vert\leqslant 1\} = \sup \{|z^Tx|:\quad \Vert z\Vert\leqslant 1\}=\sup_{z\neq 0}\frac{z^Tx}{\Vert z\Vert}.$$

Then one can verify that 
* Dual norm $\Vert \cdot\Vert_*$ is indeed a norm.
* For arbitrary $x,y$ we have $x^Ty\leqslant \Vert x\Vert \Vert y\Vert_*$.
* In a finite dimensional space, the dual of the dual is the original norm, $\Vert \cdot \Vert_{**} = \Vert \cdot \Vert$.
* The conjugate of a norm is related to its dual norm: if $f(x) = \Vert x \Vert$, then 
$$f^*(x) = \left\{\begin{array}{ll}0 & \Vert x\Vert_*\leqslant 1,\\ 
+\infty  & \Vert x\Vert_*>1.\end{array}\right.$$

Proof: 
* For the first one, $\Vert x\Vert_*\geqslant 0$ is ensured because one can easily choose $z$ (for example $z=0$) such that $z^Tx\geqslant 0$ and thus $\sup z^Tx\geqslant 0$. Moreover $\Vert x\Vert_* = 0$ iff $x = 0$, since for $x\neq 0$ the choice $z = x/\Vert x\Vert$ gives $z^Tx>0$. Homogeneity is trivial. The triangle inequality can be obtained by 
$$\Vert x\Vert_*+\Vert y\Vert_*\geqslant \sup \{z^Tx + z^Ty:\quad \Vert z\Vert \leqslant 1\} =\Vert x+y\Vert_*.$$

* The second one is trivial from the definition. 

* For the third, on the one hand we have 
$$\Vert x\Vert_{**} = \sup \{z^Tx:\quad \Vert z\Vert_*\leqslant 1\}
\leqslant \sup \{\Vert x\Vert \Vert z\Vert_*:\quad \Vert z\Vert_*\leqslant 1\}\leqslant\Vert x\Vert.$$
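On the other hand, let $x\neq 0$. The unit ball $\{u:\ \Vert u\Vert\leqslant 1\}$ is convex and $x/\Vert x\Vert$ lies on its boundary, so by the supporting hyperplane theorem there exists $z\neq 0$ with $z^Tu\leqslant z^Tx/\Vert x\Vert$ for all $\Vert u\Vert\leqslant 1$. Taking the supremum over $u$ gives $\Vert z\Vert_* = z^Tx/\Vert x\Vert$, and after rescaling $z$ so that $\Vert z\Vert_* = 1$ we get $z^Tx = \Vert x\Vert$, hence $\Vert x\Vert_{**}\geqslant \Vert x\Vert$.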

* Lastly, if $\Vert x\Vert_*\leqslant 1$, then $z^Tx  \leqslant \Vert x\Vert_* \Vert z\Vert \leqslant \Vert z\Vert$ and hence 
  $$f^*(x) = \sup_z \{z^Tx - \Vert z\Vert\}\leqslant 0.$$
Equality is achieved at $z = 0$. Conversely, if $\Vert x\Vert_*>1$, there exists some $z\neq 0$ such that 
$\frac{z^Tx}{\Vert z\Vert}=\Vert x\Vert_* >1$, so $z^Tx - \Vert z\Vert > 0$. Hence, letting $r\rightarrow +\infty$ we obtain
$$(rz)^Tx - \Vert rz\Vert = r\left(z^Tx - \Vert z \Vert\right)\rightarrow +\infty.$$

#### Examples 


* For **vector norms**, Lp norms $\Vert \cdot\Vert_p$ and $\Vert \cdot \Vert_q$ are dual if $\frac 1p+\frac1q = 1$.

Proof: It is guaranteed by Hölder's inequality, $\Vert z\Vert_p\Vert x\Vert_q \geqslant z^Tx$, and thus the dual norm $\Vert x\Vert_{p,*}$ of $\Vert \cdot\Vert_p$ satisfies
$$\Vert x\Vert_{p,*}\leqslant \sup\{\Vert x\Vert_q\Vert z\Vert_p:\quad \Vert z\Vert_p\leqslant 1\}=\Vert x\Vert_q$$
with equality at $z = {\rm sign}(x)\odot |x|^\frac qp/\Vert |x|^\frac qp\Vert_p$ (powers taken entrywise) or $x = 0$.
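The duality between $\Vert\cdot\Vert_p$ and $\Vert\cdot\Vert_q$ can be illustrated numerically. The sketch below assumes $p = 3$ and a random vector `x`, builds the maximizer ${\rm sign}(x)\odot|x|^{q/p}$ normalized in the $p$-norm, and compares against random feasible $z$; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
p = 3.0
q = p / (p - 1.0)                     # conjugate exponent, 1/p + 1/q = 1
x = rng.standard_normal(6)

# Candidate maximizer of z^T x over the unit p-norm ball.
z_star = np.sign(x) * np.abs(x) ** (q / p)
z_star /= np.linalg.norm(z_star, ord=p)
attained = z_star @ x

# Random points on the unit p-norm sphere should never do better.
Z = rng.standard_normal((2000, 6))
Z /= np.linalg.norm(Z, ord=p, axis=1, keepdims=True)
best_random = (Z @ x).max()

print(np.isclose(attained, np.linalg.norm(x, ord=q)), best_random <= attained + 1e-12)
```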

* As matrix norms, $\Vert \cdot\Vert_2$ and the nuclear norm are dual, where the inner product of two matrices is defined by $A\bullet B =\sum a_{ij}b_{ij}= {\rm tr}(A^TB)$. (The nuclear norm is the sum of the singular values.)

Proof: [[3](https://math.stackexchange.com/questions/1158798/show-that-the-dual-norm-of-the-spectral-norm-is-the-nuclear-norm)] On the one hand, let $A = UDV^T$ be the SVD, where $D$ is the diagonal matrix of the singular values of $A$, and let $\Vert A\Vert_*$ denote the dual of the spectral norm. Then 
$$\begin{aligned}\Vert A\Vert_* &= \sup_{\Vert B\Vert_2\leqslant 1}A\bullet B
=\sup_{\Vert B\Vert_2\leqslant 1}{\rm tr}(VDU^TB)
=\sup_{\Vert B\Vert_2\leqslant 1}{\rm tr}(D(U^TBV))
\leqslant \sup_{\Vert C \Vert_2\leqslant 1}{\rm tr}(DC)\\
&=\sup_{\Vert C\Vert_2\leqslant 1} {\rm diag}(D)^T{\rm diag}(C).
\end{aligned}$$

Now recall that every entry of $C$ is bounded in absolute value by $\Vert C\Vert_2\leqslant 1$ and the diagonal of $D$ is nonnegative, so $\sup_{\Vert C\Vert_2\leqslant 1} {\rm diag}(D)^T{\rm diag}(C)\leqslant {\rm tr}(D)$. 

On the other hand, equality is achieved at $B= UV^T$, since ${\rm tr}(VDU^TUV^T) = {\rm tr}(D)$, so we conclude that the dual of the spectral norm is the nuclear norm.
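The duality between the spectral and nuclear norms can be checked on a random matrix. The sketch below assumes a thin SVD $A = UDV^T$ and the candidate maximizer $B = UV^T$, which has spectral norm one; the matrix size and the random comparison set are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD, A = U diag(s) Vt
nuclear = s.sum()                                  # sum of singular values
B_star = U @ Vt                                    # feasible point with spectral norm 1
attained = np.trace(A.T @ B_star)

# Random matrices scaled to spectral norm 1 should not exceed the nuclear norm.
vals = []
for _ in range(1000):
    B = rng.standard_normal(A.shape)
    B /= np.linalg.norm(B, 2)
    vals.append(np.trace(A.T @ B))

print(np.isclose(attained, nuclear), max(vals) <= nuclear + 1e-12)
```

NumPy also exposes the nuclear norm directly as `np.linalg.norm(A, 'nuc')`, which can serve as a cross-check.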