# Gauss-Newton

Gauss-Newton is an algorithm for solving [non-linear least squares](https://en.wikipedia.org/wiki/Non-linear_least_squares) problems. It is an [SQP](https://en.wikipedia.org/wiki/Sequential_quadratic_programming) method, which means that it is an iterative algorithm and that it operates on a [QP](https://en.wikipedia.org/wiki/Quadratic_programming) subproblem at each iteration.

## Introduction: 

Consider a residual function $r(\theta)$ that gives the error of each data point as a function of the model parameters $\theta$. This function may be non-linear, and its behavior may be hard to predict. Since the residual represents error, the goal is to minimize its overall size, measured here by the sum of squared residuals $f(\theta) = \frac{1}{2} r(\theta)^T r(\theta)$. The idea of Gauss-Newton is to approximate this objective locally with a quadratic, built from the Taylor series expansion of a multivariable function. From basic calculus, setting the derivative of a function to zero yields a stationary point, which may be a local or global minimum or maximum (or a saddle point) of that function.

### Implementation:

The following is the second-order Taylor series expansion of a multivariable function $f$ about a point $a$:

$$ T(x) = f(a) + (x-a)^T \nabla f(a) + \frac{1}{2!}(x-a)^T (\nabla^2 f(a))(x-a)   $$

Next, substitute $a = \theta$ and $x = \theta + p$, and the expression becomes:

$$ T(\theta + p) = f(\theta) + p^T \nabla f(\theta) + \frac{1}{2!} p^T (\nabla^2 f(\theta)) p $$

Recall that to minimize a function with respect to a parameter $p$, one takes the derivative with respect to $p$, sets it equal to zero, and solves for $p$. In this case, we are solving for the step $p$ that takes the quadratic approximation to its minimum.

$$ 0 = \nabla f(\theta) + \nabla^2 f(\theta) p$$
$$ p = -(\nabla^2 f(\theta))^{-1} \nabla f(\theta)  $$

With $f(\theta) = \frac{1}{2} r(\theta)^T r(\theta)$ and $J$ the Jacobian of $r$ with respect to $\theta$, the gradient of $f$ is given exactly, and the Hessian approximately, by:

$$ \nabla f = J^T r $$
$$ \nabla^2 f \approx J^T J $$
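
As a sketch of where these expressions come from, assume the objective is half the sum of squared residuals (the factor $\frac{1}{2}$ is a convention assumed here so that the gradient carries no extra constant), with $J$ the Jacobian of $r$ with respect to $\theta$:

$$ f(\theta) = \frac{1}{2} r(\theta)^T r(\theta) = \frac{1}{2} \sum_i r_i(\theta)^2 $$
$$ \nabla f = \sum_i r_i \nabla r_i = J^T r $$
$$ \nabla^2 f = J^T J + \sum_i r_i \nabla^2 r_i $$

Gauss-Newton drops the second term of the Hessian, which is small when the residuals are small or nearly linear in $\theta$, leaving $\nabla^2 f \approx J^T J$.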

Substituting these into the expression for $p$ yields the following step:

$$ p = -(J^T J)^{-1} J^T r $$
$$ \theta_{k+1} = \theta_{k} + p $$

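It follows that if one iteratively updates $\theta$ by $p$, one would expect to approach a minimum of the error, subject to the limitations discussed below. The following is some basic code to implement this concept in MATLAB/Octave, for the case where $J$ is constant (a residual that is linear in $\theta$):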

```matlab
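function theta = gaussNewton(J, y, theta, itLimit)
% Gauss-Newton iteration for a residual that is linear in theta.
% J is the (constant) Jacobian of the residual r with respect to theta,
% so the model prediction is -J*theta and r = y - (-J*theta).

for i = 1:itLimit
    r = y - (-J*theta);   % Residual at the current theta
    g = J' * r;           % Gradient of f = 0.5*(r'*r)
    Hinv = pinv(J' * J);  % Pseudo-inverse of the Hessian approximation J'*J
    p = -Hinv * g;        % Gauss-Newton step
    theta = theta + p;    % Update theta
end

end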
```
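
The function above keeps $J$ fixed across iterations. For a residual that is nonlinear in $\theta$, both $r$ and $J$ would be re-evaluated at each iterate. The following is a minimal sketch of that case, assuming a hypothetical exponential model $y = a e^{bt}$ fitted to noise-free synthetic data. The names `rfun`, `Jfun`, and `thetaTrue`, the starting guess, and the stopping tolerance are illustrative assumptions, not part of the original code:

```matlab
% Hypothetical example: fit y = a*exp(b*t) to noise-free synthetic data.
t = (0:0.5:4)';
thetaTrue = [2; -0.5];                      % assumed "true" parameters [a; b]
y = thetaTrue(1) * exp(thetaTrue(2) * t);

% Residual (model minus data) and its Jacobian with respect to theta = [a; b].
rfun = @(th) th(1) * exp(th(2) * t) - y;
Jfun = @(th) [exp(th(2) * t), th(1) * t .* exp(th(2) * t)];

theta = [1.5; -0.3];                        % initial guess, assumed reasonably close
for k = 1:20
    r = rfun(theta);                        % residual at the current theta
    J = Jfun(theta);                        % Jacobian re-evaluated every iteration
    p = -((J' * J) \ (J' * r));             % Gauss-Newton step
    theta = theta + p;                      % update theta
    if norm(p) < 1e-10, break; end          % stop once the step is negligible
end
disp(theta')
```

Solving the linear system with `\` avoids forming an explicit inverse. `pinv`, as used in the function above, is an alternative when $J^T J$ may be rank deficient.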



#### Limitations: 

It is important to note that this method is inherently imperfect for the following reasons:
1. A Taylor series expansion is an approximation. It becomes less accurate the further $x$ is from $a$ (that is, the larger the step $p$).
2. This method relies on an initial guess $\theta_0$. The performance of the algorithm depends on that initial guess.
3. $J^T J$ is an approximation of the Hessian, and $J$ itself is often approximated as well.
4. This method often operates on nonlinear objective functions that are not convex.

Practically speaking, these limitations mean the following:
1. With a poor initial guess, the algorithm may not converge. 
2. With a poor initial guess, the algorithm may not reach a desired global optimum. 
3. With a highly nonlinear function, the algorithm may be incapable of reaching the desired global optimum.
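
The limitations above note that the Jacobian itself is often approximated. One common way to do so is with forward differences. The following is a minimal sketch, assuming a hypothetical two-parameter residual `rfun` and a step size `h` chosen for illustration. It compares the approximation against the analytic Jacobian `Jexact`:

```matlab
% Hypothetical residual with a known analytic Jacobian, for comparison only.
rfun = @(th) [th(1)^2 - th(2); th(1) * th(2) - 1];
theta = [1; 2];
h = 1e-6;                                   % assumed finite-difference step size

r0 = rfun(theta);
J = zeros(numel(r0), numel(theta));
for j = 1:numel(theta)
    e = zeros(size(theta));
    e(j) = h;                               % perturb one parameter at a time
    J(:, j) = (rfun(theta + e) - r0) / h;   % forward-difference column j
end

Jexact = [2 * theta(1), -1; theta(2), theta(1)];
disp(norm(J - Jexact))                      % approximation error
```

The error of this approximation depends on `h`: too large a step increases truncation error, while too small a step amplifies rounding error.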




