# Newton

The GD methods I have presented are also called first-order methods, because the solution is based on the first derivative of the function. Newton's method is a second-order method, meaning the solution requires taking into account the second derivative.

Recall that, up to this point, we have always found local minimum points by setting the derivative of the loss function to zero and solving the resulting equation. (In many cases, we have then treated that solution as the point where the loss function attains its minimum value.) There is a famous algorithm for solving equations of the form $f(x) = 0$, called Newton's method.

### Newton's method for $f(x) = 0$

![newton](./img/NewtonIteration_Ani.gif)

The idea of solving $f(x) = 0$ using Newton's method is as follows. Starting from an initial point $x_0$ that is close to the solution $x^*$, we construct the tangent line (or tangent plane in higher dimensions) to the graph $y = f(x)$ at $x_0$. The point $x_1$ where this tangent line intersects the $x$-axis is taken as a better approximation of the solution $x^*$. The same step is then repeated from $x_1$ to obtain $x_2$, and so on, until $f(x_t) \approx 0$.

This geometric interpretation of Newton's method leads to a formula that can be iterated. The equation of the tangent line at $x_t$ is:

$$
y = f'(x_t)(x - x_t) + f(x_t)
$$


The intersection of this tangent line with the $x$-axis is found by setting $y = 0$ and solving for $x$. Assuming $f'(x_t) \neq 0$, this yields the formula:

$$
x = x_t - \frac{f(x_t)}{f'(x_t)} \triangleq x_{t+1}
$$
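The iteration above can be sketched in a few lines of Python. This is only a minimal illustration, not a robust solver: the function name `newton_root`, the tolerance `tol`, the iteration cap `max_iter`, and the test function $f(x) = x^2 - 2$ are all my own choices for the example.

```python
def newton_root(f, df, x0, tol=1e-10, max_iter=50):
    """Approximate a root of f by Newton's iteration, starting from x0.

    f  : the function whose root we want
    df : its first derivative
    tol, max_iter : illustrative stopping parameters (assumptions)
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:          # stop once f(x_t) is approximately 0
            break
        dfx = df(x)
        if dfx == 0:               # the tangent line is horizontal: no intersection
            raise ValueError("derivative is zero; Newton step is undefined")
        x = x - fx / dfx           # x_{t+1} = x_t - f(x_t) / f'(x_t)
    return x


# Example: solve x^2 - 2 = 0 starting near the positive root.
root = newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root)  # close to sqrt(2), about 1.4142
```

Starting from $x_0 = 1$, only a handful of iterations are needed, which reflects the fast convergence of Newton's method near a solution.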

### Newton's Method for Finding Local Minima

Applying this method to solve $f'(x) = 0$, we have:

$$
x_{t+1} = x_t - \left(f''(x_t)\right)^{-1} f'(x_t)
$$
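As a small sketch of this update, the code below applies it to the convex function $f(x) = e^x - 2x$, whose minimum is at $x = \ln 2$. The function, the name `newton_minimize_1d`, and the stopping parameters are assumptions chosen for illustration. Note that the iteration only looks for a point where $f'(x) = 0$, so on a non-convex function it could just as well land on a local maximum.

```python
import math


def newton_minimize_1d(df, d2f, x0, tol=1e-10, max_iter=50):
    """Find a stationary point of f by applying Newton's method to f'(x) = 0.

    df  : first derivative f'
    d2f : second derivative f''
    tol, max_iter : illustrative stopping parameters (assumptions)
    """
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:           # f'(x_t) is approximately 0
            break
        x = x - g / d2f(x)         # x_{t+1} = x_t - f'(x_t) / f''(x_t)
    return x


# Example: f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2 and f''(x) = exp(x).
x_min = newton_minimize_1d(lambda x: math.exp(x) - 2,
                           lambda x: math.exp(x),
                           x0=0.0)
print(x_min, math.log(2))  # the two values should agree closely
```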

In higher dimensions, where $\theta$ is a vector, the update rule becomes:

$$
\theta = \theta - \mathbf{H}(J(\theta))^{-1} \nabla_\theta J(\theta)
$$

where $\mathbf{H}(J(\theta))$ is the second derivative of $J(\theta)$, called the Hessian matrix. When $\theta$ is a vector with $d$ components, the Hessian is a $d \times d$ matrix, and $\mathbf{H}(J(\theta))^{-1}$ is its inverse.
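The vector update can be sketched with NumPy as follows. This is a hedged example rather than a reference implementation: the objective $J(\theta) = e^{\theta_1 + \theta_2} + \theta_1^2 + \theta_2^2$, the names `grad`, `hessian`, and `newton_minimize_nd`, and the stopping parameters are all assumptions made for illustration. Instead of forming $\mathbf{H}^{-1}$ explicitly, the sketch solves the linear system $\mathbf{H} d = \nabla_\theta J(\theta)$ with `np.linalg.solve`, which gives the same step.

```python
import numpy as np


def grad(theta):
    """Gradient of J(theta) = exp(theta_1 + theta_2) + theta_1^2 + theta_2^2."""
    s = np.exp(theta[0] + theta[1])
    return np.array([s + 2 * theta[0], s + 2 * theta[1]])


def hessian(theta):
    """Hessian of the same J; it is positive definite for every theta."""
    s = np.exp(theta[0] + theta[1])
    return np.array([[s + 2, s],
                     [s, s + 2]])


def newton_minimize_nd(grad, hessian, theta0, tol=1e-10, max_iter=50):
    """Newton iteration theta <- theta - H^{-1} grad, using a linear solve."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:                # gradient is approximately 0
            break
        step = np.linalg.solve(hessian(theta), g)  # step = H^{-1} g
        theta = theta - step
    return theta


theta_star = newton_minimize_nd(grad, hessian, theta0=[1.0, -0.5])
print(theta_star, np.linalg.norm(grad(theta_star)))
```

By symmetry of this particular $J$, the two components of the result should be equal, and the gradient norm at the returned point should be very close to zero.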

### Limitations of Newton's method

- The initial point $x_0$ must be close to the solution $x^*$; otherwise, the iteration may fail to converge.
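To see why the starting point matters, here is a small experiment on $f(x) = \arctan(x)$, whose only root is $x^* = 0$. The choice of function, the helper name `newton_iterates`, and the two starting points are assumptions picked for illustration; this is a standard textbook example in which the iteration converges from a nearby start but moves further away at every step from a start that is only slightly farther out.

```python
import math


def newton_iterates(f, df, x0, n_steps):
    """Return the list [x_0, x_1, ..., x_n] produced by Newton's iteration."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))   # x_{t+1} = x_t - f(x_t) / f'(x_t)
    return xs


def f(x):
    return math.atan(x)


def df(x):
    return 1.0 / (1.0 + x * x)


near = newton_iterates(f, df, x0=1.0, n_steps=5)  # start close to the root
far = newton_iterates(f, df, x0=1.5, n_steps=5)   # start a little farther away

print([abs(x) for x in near])  # magnitudes shrink toward 0
print([abs(x) for x in far])   # magnitudes grow at every step
```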