
MATHEMATICS FOR NEURAL NETWORK

ZicolinPower edited this page Apr 30, 2023 · 1 revision

Problem 1. Let $C : \mathbb{R}^{d} \to \mathbb{R}$ be a differentiable function and $x_{0} \in \mathbb{R}^{d}$. Solve the optimization problem

\begin{align*} \min_{y \in \mathbb{R}^d: ||y||=1} y^T \cdot \nabla C(x_0) \end{align*}

Solution:

For any unit vector $y \in \mathbb{R}^d$, the dot product

\begin{align*} y^T \cdot \nabla C(x_0) \end{align*}

is maximized when $y$ points in the same direction as $\nabla C(x_0)$, and minimized when $y$ points in the opposite direction.

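This is because, for a unit vector $y$, the dot product is the signed length of the projection of $\nabla C(x_0)$ onto $y$. We assume $\nabla C(x_0) \neq 0$; otherwise the objective is zero for every $y$ and every unit vector is optimal.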
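The projection argument can be made precise in one line. As a short worked step, write $\theta$ for the angle between $y$ and $\nabla C(x_0)$ (notation introduced here for illustration) and assume $\nabla C(x_0) \neq 0$. Since $||y|| = 1$,

\begin{align*} y^T \cdot \nabla C(x_0) = ||y||\,||\nabla C(x_0)||\cos\theta = ||\nabla C(x_0)||\cos\theta \geq -||\nabla C(x_0)||, \end{align*}

with equality exactly when $\cos\theta = -1$, that is, when $y$ points opposite to $\nabla C(x_0)$.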

To obtain the optimal $y$ explicitly, we use a Lagrange multiplier:

We define the Lagrangian $$L(y,\lambda) = y^T \cdot \nabla C(x_0)+\lambda(||y||-1),$$ with $\lambda\in \mathbb{R}$ and $y\in \mathbb{R}^d$.

Its partial derivatives are $$\frac{\partial L(y,\lambda)}{\partial y} = \nabla C(x_0)+\lambda\frac{\partial || y ||}{\partial y}=\nabla C(x_0)+\lambda\frac{y}{|| y ||}$$ $$\frac{\partial L(y,\lambda)}{\partial \lambda} = ||y|| - 1$$

Now we set both derivatives to zero, $$\frac{\partial L(y,\lambda)}{\partial y} =\frac{\partial L(y,\lambda)}{\partial \lambda} =0$$ Setting $\frac{\partial L(y,\lambda)}{\partial \lambda} =0$ yields $||y|| =1$.

Substituting $||y|| = 1$ into $\frac{\partial L(y,\lambda)}{\partial y}=0$ gives $$y=-\frac{\nabla C(x_0)}{\lambda}.$$

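Next, we calculate the value of $\lambda$.

Substituting this $y$ into $||y|| =1$ gives $$\left|\left|-\frac{\nabla C(x_0)}{\lambda}\right|\right|=1.$$ Since $\lambda$ is a scalar, we can write $$\frac{1}{|\lambda|}|| \nabla C(x_0) ||=1$$ $$\implies \lambda = \pm||\nabla C(x_0) ||.$$ Taking $\lambda = ||\nabla C(x_0)||$ gives the objective value $-||\nabla C(x_0)||$, which is the minimum, while $\lambda = -||\nabla C(x_0)||$ gives the value $+||\nabla C(x_0)||$, which is the maximum. We therefore take $\lambda = ||\nabla C(x_0)||$.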
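As a small sanity check of this step, the sketch below evaluates both multipliers that satisfy the norm constraint, $\lambda = \pm||\nabla C(x_0)||$, on a concrete vector. The helper name `stationary_points` and the gradient `[3.0, 4.0]` are illustrative assumptions, not part of the original problem.

```python
import math


def stationary_points(g):
    """Return (lam, y, value) for both multipliers lam = +||g|| and lam = -||g||.

    g plays the role of grad C(x_0); it must be nonzero.
    """
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        raise ValueError("gradient is zero: every unit vector is optimal")
    points = []
    for lam in (norm, -norm):
        # Stationarity in y with ||y|| = 1: g + lam * y = 0, so y = -g / lam.
        y = [-gi / lam for gi in g]
        # Objective value y^T g at this stationary point.
        value = sum(yi * gi for yi, gi in zip(y, g))
        points.append((lam, y, value))
    return points


g = [3.0, 4.0]  # illustrative gradient (assumption), with ||g|| = 5
for lam, y, value in stationary_points(g):
    print(f"lambda={lam:+.1f}  y={y}  objective={value:+.3f}")
```

Both candidates are unit vectors, but only the positive multiplier yields the negative objective value, which is why that root is the one kept for the minimization.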

Therefore, the solution to the optimization problem is

\begin{align*} y^{*} = - \frac{\nabla C(x_0)}{||\nabla C(x_0)||}, \end{align*}

and the optimal value is

\begin{align*} -||\nabla C(x_0)||. \end{align*}
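The closed-form answer can also be checked numerically. The sketch below is a minimal illustration under stated assumptions: the cost $C(x) = \sum_i x_i^2$ (so $\nabla C(x) = 2x$), the point `x0`, and all helper names are hypothetical choices made for this example. It compares the objective at the normalized negative gradient with the objective at many randomly sampled unit vectors.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def random_unit_vector(d, rng):
    """Sample a direction on the unit sphere by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        n = norm(v)
        if n > 0.0:
            return [c / n for c in v]


def grad_C(x):
    # Hypothetical cost C(x) = sum_i x_i^2 (assumption), so grad C(x) = 2x.
    return [2.0 * c for c in x]


rng = random.Random(0)           # fixed seed so the run is reproducible
x0 = [1.0, -2.0, 0.5]            # illustrative point (assumption)
g = grad_C(x0)

# Closed-form minimizer: the unit vector opposite to the gradient.
y_star = [-c / norm(g) for c in g]
best_value = dot(y_star, g)

# Smallest objective value seen over random unit directions.
sampled_min = min(dot(random_unit_vector(len(g), rng), g) for _ in range(10_000))

print("objective at minimizer:", best_value)
print("minus gradient norm:   ", -norm(g))
print("best random direction: ", sampled_min)
```

No sampled direction should fall below the value at the normalized negative gradient, which is the steepest-descent direction used by gradient descent.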
