# Non-linear Systems

(This is where iterative methods become especially useful.)

### Outline
- Review Jacobians
- Motivating example - clustered data
- Nonlinear systems of equations
- Picard's method
- Newton's method
- Inexact Newton methods
- Line search
- Semi-smooth Newton's method

## Jacobians

**Gradient**:  Let $f: \mathbb{R}^m \rightarrow \mathbb{R}$.  The gradient of $f$ at $x$ in $\mathbb{R}^m$ (if it exists) is a vector $g$ such that
$$\lim_{h \rightarrow 0} \frac{\left | f(x+h)-f(x) - \langle g,h \rangle \right |}{\| h \|} = 0$$

*Example*: take $f(z)=z_1 z_2 z_3 \dots z_m$.  At a point $x$, the gradient of $f$, call it $g$, has elements
$$\begin{align}
g_1 &=x_2 x_3 \dots x_m \\
g_n &=x_1 x_2 \dots x_{n-1} x_{n+1} \dots x_m \\
g_m &= x_1 x_2 \dots x_{m-1}
\end{align}$$

In [5]:
Fz <- function(z) {prod(z)}
x <- c(1,2,3)
Fz(x)

In [11]:
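g <- function(z) {
    # Entry i of the gradient of prod(z) is the product of all elements except z[i].
    nz <- length(z)
    gz <- rep(NA,nz)
    for (i in 1:nz) {
        gz[i] <- prod(z[-i])   # z[-i] drops the i-th element
    }
    return(gz)
}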
print(g(x))

[1] 6 3 2
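
As a sanity check, the analytic gradient can be compared against a central finite-difference approximation. This is only a sketch: the names `prod_f`, `prod_grad`, `fd_grad`, and `x_prod`, and the step size `eps`, are illustrative choices that do not come from the notes.

```r
# Hypothetical helper names; none of these appear in the original notes.
prod_f <- function(z) prod(z)

# Analytic gradient: entry i is the product of all entries except z[i].
prod_grad <- function(z) {
    sapply(seq_along(z), function(i) prod(z[-i]))
}

# Central finite-difference approximation of the gradient of f at z.
fd_grad <- function(f, z, eps = 1e-6) {
    sapply(seq_along(z), function(i) {
        e <- rep(0, length(z))
        e[i] <- eps                      # perturb only coordinate i
        (f(z + e) - f(z - e)) / (2 * eps)
    })
}

x_prod <- c(1, 2, 3)
print(prod_grad(x_prod))
# Largest discrepancy between the analytic and numerical gradients.
print(max(abs(prod_grad(x_prod) - fd_grad(prod_f, x_prod))))
```

The discrepancy should be at the level of rounding error, because $f$ is linear in each coordinate separately.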


The gradient $g$ points in the direction of steepest increase of $f$ at the point $x$, and $\| g \|$ is the rate of increase in that direction.
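
The steepest-increase property can be illustrated numerically. For a differentiable $f$, the directional derivative along a unit vector $u$ is $\langle g,u \rangle$, which by Cauchy-Schwarz is at most $\| g \|$. The sketch below samples random unit directions for the product example at $x=(1,2,3)$; the seed, the sample size, and the variable names are arbitrary assumptions.

```r
# Gradient of prod(z) at (1, 2, 3), as printed above.
g0 <- c(6, 3, 2)
g_norm <- sqrt(sum(g0^2))

set.seed(1)                              # arbitrary seed, for reproducibility
U <- matrix(rnorm(3 * 1000), ncol = 3)   # 1000 random directions, one per row
U <- U / sqrt(rowSums(U^2))              # normalise each row to unit length

slopes <- U %*% g0                       # directional derivative along each row
# No sampled direction beats the gradient direction (small tolerance for rounding).
print(max(slopes) <= g_norm + 1e-12)
# Directional derivative along g / ||g|| itself equals ||g||.
print(sum(g0 * g0) / g_norm)
```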

Going back to our definition, we have
$$\begin{align}
& \lim_{h \rightarrow 0} \frac{\left | f(x+h)-f(x) - \langle g,h \rangle \right |}{\| h \|} \\
= & \lim_{h \rightarrow 0} \frac{\left | (x_1+h_1)(x_2+h_2)\dots (x_m+h_m) - 
 x_1x_2\dots x_m - \sum_{k=1}^m h_k \prod_{i \neq k} x_i \right |}{\sqrt{\sum_i h_i^2}} \\
= & \lim_{h \rightarrow 0} \frac{\left | \sum_{k} \sum_{i \neq k} O(h_k h_i) \right |}{\sqrt{\sum_i h_i^2}}
\end{align} $$

Why does this go to zero?
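
One possible argument, sketched under the assumption that $\| h \| \le 1$: after expanding the product, the zeroth-order and first-order terms cancel, so every remaining term contains at least two factors of $h$. Since $|h_k| \le \| h \|$ for every $k$, each such term is bounded by a constant times $\| h \|^2$. Writing $C$ for a constant that depends only on $x$ and $m$ (a symbol introduced here for illustration),

$$\frac{\left | f(x+h)-f(x) - \langle g,h \rangle \right |}{\| h \|} \le \frac{C \| h \|^2}{\| h \|} = C \| h \| \rightarrow 0 \quad \text{as } h \rightarrow 0$$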

**Homework**:  find a function whose partial derivatives exist, but whose gradient does not exist at some point.

A standard example, from the [Calculus subwiki](https://calculus.subwiki.org/wiki/Existence_of_partial_derivatives_not_implies_differentiable#Example_of_a_function_for_which_the_partial_derivatives_exist_but_it_is_not_continuous), is

$$f(x,y)=\frac{xy}{x^2+y^2} \text{ for } (x,y) \neq (0,0), \qquad f(0,0)=0$$

Away from the origin, the gradient is given by
$$\nabla{f}=\left [ \frac{y}{x^2+y^2} - \frac{2x^2y}{(x^2+y^2)^2} ,
\frac{x}{x^2+y^2} - \frac{2xy^2}{(x^2+y^2)^2} \right ]$$

In [12]:
Fz <- function(z) {
    x <- z[1]
    y <- z[2]
    return(x*y/(x^2+y^2))
}
x <- c(0,1)
Fz(x)

In [13]:
g <- function(z) {
    x <- z[1]
    y <- z[2]
    g1 <- y/(x^2+y^2)-2*x^2*y/(x^2+y^2)^2
    g2 <- x/(x^2+y^2)-2*x*y^2/(x^2+y^2)^2
    return(c(g1,g2))
}
print(g(x))

[1] 1 0


In [16]:
x <- c(0,0)
print(Fz(x))
print(g(x))

[1] NaN
[1] NaN NaN
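
The `NaN` values arise because the formulas evaluate $0/0$ at the origin. The sketch below assumes the usual convention $f(0,0)=0$ and works directly from the limit definitions. Both partial derivatives at the origin come out as zero, yet $f$ stays at $1/2$ along the line $y=x$, so $f$ is not continuous there and has no gradient in the sense defined earlier. The names `f0`, `steps`, `dfdx`, `dfdy`, and `diag_vals` are illustrative only.

```r
# f0 extends xy / (x^2 + y^2) by setting f0(0, 0) = 0 (an assumed convention).
f0 <- function(z) {
    if (all(z == 0)) return(0)
    z[1] * z[2] / (z[1]^2 + z[2]^2)
}

steps <- 10^-(1:6)                       # shrinking step sizes

# Difference quotients along the coordinate axes at the origin.
dfdx <- sapply(steps, function(s) (f0(c(s, 0)) - f0(c(0, 0))) / s)
dfdy <- sapply(steps, function(s) (f0(c(0, s)) - f0(c(0, 0))) / s)
print(dfdx)                              # all zero: the partial in x exists
print(dfdy)                              # all zero: the partial in y exists

# Values of f0 along the diagonal y = x, approaching the origin.
diag_vals <- sapply(steps, function(s) f0(c(s, s)))
print(diag_vals)                         # stays at 0.5, not f0(0, 0) = 0
```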


### Jacobians

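**Jacobian**: Let $f: \mathbb{R}^m \rightarrow \mathbb{R}^n$.  The *Jacobian* of $f$ at a point $x \in \mathbb{R}^m$ (if it exists) is the $n \times m$ matrix $J$ such that
$$\lim_{h \rightarrow 0} \frac{\left \| f(x+h)-f(x) - Jh \right \|_{(n)}}{\| h \|_{(m)}} = 0 $$
where $\| \cdot \|_{(n)}$ and $\| \cdot \|_{(m)}$ denote norms on $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively.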

**Example**: Consider the function $f: \mathbb{R}^m \rightarrow \mathbb{R}^{m-1}$ whose components are given by $f_k(z)=z_kz_{k+1}$ for $k=1,\dots,m-1$.

Then the *Jacobian* elements at $x$ are given by:

$$ J_{ij} = \left \{ \begin{array}{rl} 0, & i\neq j, i+1 \neq j \\
x_{i+1}, & i=j \\
x_i, & i+1=j
\end{array} \right. $$


In [19]:
Fz <- function(z) {
    # Component i is the product of neighbouring entries z[i] and z[i+1].
    nz <- length(z)
    f <- rep(NA, nz-1)
    for (i in 1:(nz-1)){
        f[i] <- z[i]*z[i+1]
    }
    return(f)
}
x <- c(1:5)
print(Fz(x))

[1]  2  6 12 20


In [25]:
J <- function(z) {
    nz <- length(z)
    g <- matrix(0, ncol=nz, nrow=nz-1)
    for (i in 1:(nz-1)){
        g[i,i] <- z[i+1]
        g[i,i+1] <- z[i]
    }
    return(g)
}
print(J(x))

     [,1] [,2] [,3] [,4] [,5]
[1,]    2    1    0    0    0
[2,]    0    3    2    0    0
[3,]    0    0    4    3    0
[4,]    0    0    0    5    4
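
The analytic Jacobian can be checked against finite differences in the same spirit as the limit definition. This is a minimal sketch: `pair_f`, `pair_J`, `fd_jacobian`, and `x_pair` are hypothetical names, and `eps` is an arbitrary step size.

```r
# f_k(z) = z_k * z_{k+1}, written in vectorised form.
pair_f <- function(z) z[-length(z)] * z[-1]

# Analytic Jacobian: row i holds z[i+1] in column i and z[i] in column i+1.
pair_J <- function(z) {
    nz <- length(z)
    Jm <- matrix(0, nrow = nz - 1, ncol = nz)
    for (i in 1:(nz - 1)) {
        Jm[i, i] <- z[i + 1]
        Jm[i, i + 1] <- z[i]
    }
    Jm
}

# Column j of the Jacobian is approximated by a central difference in z[j].
fd_jacobian <- function(f, z, eps = 1e-6) {
    sapply(seq_along(z), function(j) {
        e <- rep(0, length(z))
        e[j] <- eps
        (f(z + e) - f(z - e)) / (2 * eps)
    })
}

x_pair <- 1:5
# Largest entrywise discrepancy between the analytic and numerical Jacobians.
print(max(abs(pair_J(x_pair) - fd_jacobian(pair_f, x_pair))))
```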


**Homework**:  What is the relationship between the Jacobian and the gradient?  Row $i$ of the Jacobian is the gradient of the component function $f_i$, written as a row vector.
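
Written out, and assuming each component $f_i: \mathbb{R}^m \rightarrow \mathbb{R}$ is differentiable at $x$, this relationship can be stated as

$$J(x) = \begin{bmatrix} \nabla f_1(x)^T \\ \vdots \\ \nabla f_n(x)^T \end{bmatrix}, \qquad J_{ij} = \frac{\partial f_i}{\partial x_j}(x)$$

In the example above, $f_i(z)=z_iz_{i+1}$ has a gradient with $x_{i+1}$ in position $i$, $x_i$ in position $i+1$, and zeros elsewhere, which is exactly row $i$ of the printed matrix.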