In [1]:
from sympy import *
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
import mpld3
mpld3.enable_notebook()
from sympy.plotting import plot
import ipywidgets as widgets
from ipywidgets import interact

In [2]:
%%html
<style> table {display: block} </style>

# Question 1

Solve the system

$$x - \frac{\sin(y)}{2} = 0.4,$$

$$y + \cos(1+x) = 0$$

by simple iteration!

Perform twelve steps, starting from $x_0 = 0.2, y_0 = -0.3$!

Prove the local convergence to the solution $x_* \approx 0.24, y_* \approx -0.32$!

How many steps guarantee an accuracy better than an $l_2$-error of $10^{-6}$?


## Solution

#### Note: Whenever simple iteration is mentioned, fixed-point iteration should come to mind

We proceed as always: reformulate the problem in fixed-point form and check whether the assumptions of Banach's fixed-point theorem hold.

$x = \overbrace{0.4 + \frac{\sin(y)}{2}}^{g_1}$

$y = \overbrace{- \cos(1+x)}^{g_2}$

$\rightarrow 
\begin{equation*}
\begin{pmatrix}
x^{(k+1)} \\
y^{(k+1)} \\
\end{pmatrix}
=
\begin{pmatrix}
0.4 + \frac{\sin(y^{(k)})}{2} \\
- \cos(1+x^{(k)}) \\
\end{pmatrix}
\end{equation*}$
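The twelve requested steps can be carried out with a few lines of plain Python. This is a minimal sketch of the iteration above; the helper name `fixed_point_steps` is an assumption and not part of the original notebook.

```python
import math


def fixed_point_steps(x0, y0, n_steps):
    """Iterate x <- 0.4 + sin(y)/2, y <- -cos(1 + x) simultaneously."""
    x, y = x0, y0
    history = [(x, y)]
    for _ in range(n_steps):
        # both components are computed from the previous iterate
        x, y = 0.4 + math.sin(y) / 2, -math.cos(1 + x)
        history.append((x, y))
    return history


history = fixed_point_steps(0.2, -0.3, 12)
for k, (xk, yk) in enumerate(history):
    print(f"k={k:2d}  x={xk:.6f}  y={yk:.6f}")
```

The printed iterates should drift towards the fixed point near $(0.24, -0.32)$, with the error shrinking by a roughly constant factor per step.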


#### Proof of local convergence

Proving local convergence means showing that the assumptions of Banach's theorem hold in a suitable neighbourhood of the solution. In the single-variable case, we checked this by differentiating the function and bounding the derivative on the chosen interval.

For multivariable systems of equations, the derivative is replaced by the Jacobian matrix and we refer to Ostrowski's theorem.

##### Ostrowski's Theorem
Let $f : \mathbb{R}^n \rightarrow \mathbb{R}^n$ be differentiable at a fixed point $x^*$. If the spectral radius of the Jacobian matrix of $f$ satisfies $\rho(J(x^*)) < 1$, then the iteration $x^{(k+1)} = f(x^{(k)})$ is locally convergent to $x^*$. The spectral radius of a square matrix is the largest absolute value of its eigenvalues.

$\begin{equation*}
J = 
\begin{pmatrix}
\frac{\partial g_1}{\partial x} & \frac{\partial g_1}{\partial y} \\
\frac{\partial g_2}{\partial x} & \frac{\partial g_2}{\partial y} 
\end{pmatrix} = 
\begin{pmatrix}
0 & \frac{\cos(y)}{2} \\
\sin(1+x) & 0 
\end{pmatrix}
\overbrace{\rightarrow}^{@x^* = (0.24, -0.32)^T} 
\begin{pmatrix}
0 & 0.4746 \\
0.9458 & 0 
\end{pmatrix}
\end{equation*}$
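The Jacobian and its spectral radius can be cross-checked symbolically. The sketch below assumes SymPy and NumPy are available; the names `xs`, `ys`, `J_sym` and `J_num` are illustrative choices, not part of the original notebook.

```python
import numpy as np
import sympy as sp

xs, ys = sp.symbols("x y")
g = sp.Matrix([0.4 + sp.sin(ys) / 2, -sp.cos(1 + xs)])

# symbolic Jacobian of the iteration map, evaluated near the fixed point
J_sym = g.jacobian([xs, ys])
J_num = np.array(J_sym.subs({xs: 0.24, ys: -0.32}).tolist(), dtype=float)

# spectral radius = largest eigenvalue in absolute value
spectral_radius = max(abs(np.linalg.eigvals(J_num)))

print(J_sym)
print(J_num)
print("spectral radius:", spectral_radius)
```

A spectral radius below 1 is exactly the condition required by Ostrowski's theorem.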

Before proceeding, let's take a little detour
***
##### Vector Norms
Norms measure the size of a vector and therefore the distance between two vectors. They are defined as follows:

Let $x = (x_1, x_2, \cdots, x_n)$ be a vector. The $k$-norm of the vector is given as

$\Vert x\Vert_k = \left(\sum_{i=1}^n \vert x_i \vert^k\right)^{\frac{1}{k}} $
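As a quick illustration, the definition can be coded directly and compared with `numpy.linalg.norm`. The helper `k_norm` is a hypothetical name, and the vector is the first-step difference $(0.0522, -0.0624)$ that appears later in this solution.

```python
import numpy as np


def k_norm(x, k):
    """k-norm of a vector, written out exactly as in the definition."""
    return float(np.sum(np.abs(x) ** k) ** (1.0 / k))


v = np.array([0.0522, -0.0624])

for k in (1, 2, 3):
    # the hand-written formula and NumPy's built-in should agree
    print(k, k_norm(v, k), np.linalg.norm(v, ord=k))
```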

*******************
##### Matrix Norms

For any matrix $A$ in $\mathbb{R}^{n\times n}$, some common matrix norms are defined as follows:

1-norm: $\Vert A\Vert_1 = \max_j \sum_{i=1}^n \vert a_{ij} \vert$ (maximum absolute column sum)

2-norm: $\Vert A\Vert_2 = \sqrt{\rho(A^TA)}$

Frobenius norm: $\Vert A\Vert_F = \left(\sum_{i,j=1}^n \vert a_{ij} \vert^2\right)^{\frac{1}{2}} = \sqrt{\mathrm{tr}(A^TA)} = \sqrt{\mathrm{tr}(AA^T)}$

$\infty$-norm: $\Vert A\Vert_\infty = \max_i \sum_{j=1}^n \vert a_{ij} \vert$ (maximum absolute row sum)

*******************
For any of these matrix norms $\Vert\cdot\Vert_k$, we have $\rho(A) \le \Vert A\Vert_k$.
*******************

Here $\Vert J\Vert_{\infty} = 0.9458 < 1$, so $\rho(J) < 1$ and the iteration is locally convergent.
***
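The norms of the evaluated Jacobian can be checked numerically. This is a small sketch using `numpy.linalg.norm`; the matrix entries are the rounded values from the text, and the dictionary `norms` is an illustrative name.

```python
import numpy as np

# Jacobian of the iteration map at the fixed point (rounded values)
J = np.array([[0.0, 0.4746],
              [0.9458, 0.0]])

norms = {
    "1": np.linalg.norm(J, 1),         # maximum absolute column sum
    "2": np.linalg.norm(J, 2),         # largest singular value
    "fro": np.linalg.norm(J, "fro"),   # Frobenius norm
    "inf": np.linalg.norm(J, np.inf),  # maximum absolute row sum
}
rho = max(abs(np.linalg.eigvals(J)))

for name, value in norms.items():
    print(f"norm {name}: {value:.4f}")
print(f"spectral radius: {rho:.4f}")
```

The spectral radius comes out noticeably smaller than each norm, which is consistent with $\rho(A) \le \Vert A\Vert_k$.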

#### To calculate the number of steps that guarantees an accuracy better than an $l_2$-error of $10^{-6}$
We use 

$\Vert x^{(k)} - x^{(*)}\Vert_2 \le \frac{L^k}{1-L} \Vert x^{(1)} - x^{(0)}\Vert_2$

where the contraction constant $L$ is taken here as the matrix $2$-norm of the Jacobian at $x^*$

$L = \Vert J\Vert_2 = \sqrt{\rho(J^TJ)} = \sqrt{\rho\left(
\begin{pmatrix}
0 & 0.9458 \\
0.4746 & 0 
\end{pmatrix}
\begin{pmatrix}
0 & 0.4746 \\
0.9458 & 0 
\end{pmatrix} \right)} = 
\sqrt{\rho\left(
\begin{pmatrix}
0.8945 & 0 \\
0 & 0.2253 
\end{pmatrix} \right)} = \sqrt{0.8945} = 0.9458$

The first iteration step gives

$\Vert x^{(1)} - x^{(0)}\Vert_2 = \left\Vert \begin{pmatrix}
0.0522 \\
-0.0624 
\end{pmatrix} \right\Vert_2 = 0.0814$

Therefore, we require

$\Vert x^{(k)} - x^{(*)}\Vert_2 \le \frac{L^k}{1-L} \Vert x^{(1)} - x^{(0)}\Vert_2 \le 10^{-6} \rightarrow \frac{0.9458^k}{1-0.9458}\times 0.0814 \le 10^{-6}$

$0.9458^k \le 10^{-6}\times 0.6658 \rightarrow k \log (0.9458) \le \log(10^{-6} \times 0.6658)$

Dividing by $\log(0.9458) < 0$ flips the inequality:

$\rightarrow k \ge \frac{\log(10^{-6} \times 0.6658)}{\log(0.9458)} \rightarrow k \ge 255.22 \rightarrow k \ge 256$
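The step count from the a priori estimate can be reproduced directly. The sketch below plugs in the rounded values $L = 0.9458$ and $\Vert x^{(1)} - x^{(0)}\Vert_2 = 0.0814$ from the text; the variable names are illustrative.

```python
import math

L = 0.9458           # contraction constant (2-norm of the Jacobian)
first_step = 0.0814  # ||x^(1) - x^(0)||_2
eps = 1e-6           # required accuracy

# L^k / (1 - L) * first_step <= eps, solved for k;
# log(L) < 0, so dividing by it turns "<=" into ">="
k_real = math.log(eps * (1 - L) / first_step) / math.log(L)
k_min = math.ceil(k_real)

print(k_real, k_min)
```

This bound is a guarantee, not a prediction: the iteration usually reaches the target accuracy in far fewer steps, because the estimate uses the norm $L$ rather than the smaller spectral radius.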

# Question 2

Solve the previous problem by Newton's method after reduction to a single scalar equation $f(x) = 0$.

Compare the classical Newton's method, the simplified method and the two variants of Regula-Falsi, considering the numbers of steps needed to reach a threshold of $|f(x)| \lt 10^{-9}$.

## Solution

$x = 0.4 + \frac{\sin(y)}{2}, y = - \cos(1+x) \rightarrow x = 0.4 + \frac{\sin(-\cos(1+x))}{2}$

$\rightarrow f(x) = x -0.4 - \frac{\sin(-\cos(1+x))}{2} = x - 0.4 + \frac{\sin(\cos(1+x))}{2} = 0$

Using Newton's method:

$x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)})}{f'(x^{(k)})} = x^{(k)} - \frac{x^{(k)} -0.4 + \frac{\sin(\cos(1+x^{(k)}))}{2}}{1 -\frac{1}{2}\sin(1+x^{(k)})\cos(\cos(1+x^{(k)}))}$

$\rightarrow$ It needs 4 steps to reach $\vert f(x)\vert < 10^{-9} $ for $x^{(0)} = 0$

$\rightarrow x^* = 0.2408 \rightarrow y^* = -\cos(1+x^*) = -0.324$

### Regula Falsi 1:

$x^{(k+1)} = x^{(k)} - f(x^{(k)}) \frac{x^{(k)} - x^{(0)}}{f(x^{(k)}) - f(x^{(0)})}$ $\rightarrow$ Requires two initial points $x^{(1)} = 0.1$ and $x^{(0)} = 0$

$k=1 \rightarrow x^{(2)} = x^{(1)} - f(x^{(1)}) \frac{x^{(1)}-x^{(0)}}{f(x^{(1)}) - f(x^{(0)})}$

Needs 7 steps. (Depending on the initial points)

### Regula Falsi 2:

$x^{(k+1)} = x^{(k)} - f(x^{(k)}) \frac{x^{(k)} - x^{(k-1)}}{f(x^{(k)}) - f(x^{(k-1)})}$ $\rightarrow$ Requires two initial points $x^{(1)} = 0.1$ and $x^{(0)} = 0$

$k=1 \rightarrow x^{(2)} = x^{(1)} - f(x^{(1)}) \frac{x^{(1)}-x^{(0)}}{f(x^{(1)}) - f(x^{(0)})}$

$k=2 \rightarrow x^{(3)} = x^{(2)} - f(x^{(2)}) \frac{x^{(2)}-x^{(1)}}{f(x^{(2)}) - f(x^{(1)})}$

$\rightarrow$ Needs 4 steps to reach $\vert f(x) \vert < 10^{-9} $ (Depending on two initial points)

### The simplified method: 

$x^{(k+1)} = x^{(k)} - f(x^{(k)}) \frac{x^{(1)} - x^{(0)}}{\underbrace{f(x^{(1)}) - f(x^{(0)})}_{\text{Derivative is only calculated at starting point}}}$  Requires two initial points

$k=1, x^{(2)} = x^{(1)} - f(x^{(1)}) \frac{x^{(1)}-x^{(0)}}{f(x^{(1)}) - f(x^{(0)})}$

$k=2, x^{(3)} = x^{(2)} - f(x^{(2)}) \frac{x^{(1)}-x^{(0)}}{f(x^{(1)}) - f(x^{(0)})}$

$\rightarrow$ Needs 5 steps (depends on the initial points)
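The four update rules can be compared side by side with a short script. This is a sketch under stated assumptions: $f$ is taken as $x$ minus the composed right-hand side, i.e. $f(x) = x - 0.4 + \frac{1}{2}\sin(\cos(1+x))$, the starting points are $x^{(0)} = 0$ and $x^{(1)} = 0.1$, and the helper `run` and the dictionary `methods` are hypothetical names. The step counts it prints depend on the starting points and on how a "step" is counted, so they need not match the numbers quoted above.

```python
import math


def f(x):
    # scalar equation obtained by eliminating y
    return x - 0.4 + 0.5 * math.sin(math.cos(1 + x))


def df(x):
    # derivative of f, needed only by the classical Newton method
    return 1 - 0.5 * math.sin(1 + x) * math.cos(math.cos(1 + x))


def run(update, start, tol=1e-9, max_steps=100):
    """Apply `update` to the iterate history until |f(x)| < tol."""
    xs = list(start)
    steps = 0
    while abs(f(xs[-1])) >= tol and steps < max_steps:
        xs.append(update(xs))
        steps += 1
    return xs[-1], steps


x0, x1 = 0.0, 0.1
m0 = (f(x1) - f(x0)) / (x1 - x0)  # slope fixed once for the simplified method

methods = {
    "newton": (lambda xs: xs[-1] - f(xs[-1]) / df(xs[-1]), [x0]),
    "regula falsi 1": (
        lambda xs: xs[-1] - f(xs[-1]) * (xs[-1] - xs[0]) / (f(xs[-1]) - f(xs[0])),
        [x0, x1],
    ),
    "regula falsi 2": (
        lambda xs: xs[-1] - f(xs[-1]) * (xs[-1] - xs[-2]) / (f(xs[-1]) - f(xs[-2])),
        [x0, x1],
    ),
    "simplified": (lambda xs: xs[-1] - f(xs[-1]) / m0, [x0, x1]),
}

results = {name: run(update, start) for name, (update, start) in methods.items()}
for name, (root, steps) in results.items():
    print(f"{name:15s} root={root:.6f} steps={steps}")
```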

Let's see visually how the different Regula Falsi variants approximate Newton's method.

In [3]:
# first, a function to show the calculated gradient at each iteration step


def newton_gradient_at_step(x, y, m):
    c = y - m * x

    y1 = m * x + c
    x2 = -c / m
    y2 = 0

    return [x, x2], [y1, y2]


def regula_falsi(x0, xn, f):
    y0 = f.evalf(subs={x: x0})
    yn = f.evalf(subs={x: xn})

    # calculate gradient and intercept
    m = (yn - y0) / (xn - x0)
    c = y0 - m * x0

    x2 = -c / m

    return [x0, x2], [y0, 0]

In [4]:
def newton_1D(ax, f, x0, n, s0=1, plot=True, gradient=False):
    # derivative of f; s0 is a damping factor applied to every Newton step
    fp = f.diff(x)

    xn = x0
    y = f.evalf(subs={x: x0})

    xn_list = [x0]
    yn_list = [y]

    for i in range(n):
        xn = xn - s0 * f.evalf(subs={x: xn}) / (fp.evalf(subs={x: xn}))
        yn = f.evalf(subs={x: xn})

        # update convergence values
        xn_list.append(xn)
        yn_list.append(yn)

        if gradient:
            # newton gradient
            xg, yg = newton_gradient_at_step(xn, yn, fp.evalf(subs={x: xn}))
            ax.plot(xg, yg, 'g--')

            # regula falsi 1
            xg_rf1, yg_rf1 = regula_falsi(x0, xn, f)
            ax.plot(xg_rf1, yg_rf1, 'k--')

            # regula falsi 2
            xg_rf1, yg_rf1 = regula_falsi(xn_list[i], xn_list[i + 1], f)
            ax.plot(xg_rf1, yg_rf1, 'y--')

            # simplified
            # calculate gradient once
            x0 = xn_list[0]
            x1 = xn_list[1]
            y0 = f.evalf(subs={x: x0})
            y1 = f.evalf(subs={x: x1})

            m = (y1 - y0) / (x1 - x0)
            xg_s, yg_s = newton_gradient_at_step(xn, yn, m)
            ax.plot(xg_s, yg_s, 'c--')

    if plot:
        # plot function
        z = np.linspace(-2, 2, 100)
        fpoints = evaluate_func(z, f)
        ax.plot(z, fpoints, 'b')

        ax.plot(xn_list, yn_list, 'ro--')

    plt.show()
    return xn, xn_list


def create_figure():
    # create figure
    fig, ax = plt.subplots(1, 1)
    fig.set_figheight(6)
    fig.set_figwidth(6)

    # some display settings
    ax.spines['left'].set_position('zero')
    ax.spines['right'].set_color('none')
    ax.spines['bottom'].set_position('zero')
    ax.spines['top'].set_color('none')

    return fig, ax


def evaluate_func(z, f):
    fpoints = [f.evalf(subs={x: i}) for i in z]

    return fpoints

In [10]:
# make use of the functions
plt.close()
x = symbols("x")
f = (1 - (x)**2)
x0 = 2
n = 2
s0 = 1.1

# create widgets for interaction
x0 = widgets.FloatSlider(min=-2, max=3, value=2, step=.001)
s0 = widgets.FloatSlider(min=0, max=1, value=1, step=.001)

# create figure
fig, ax = create_figure()


@interact
def interactive(x0=x0, s0=s0):
    # remove all lines and collections left over from the previous draw
    for artist in list(plt.gca().lines) + list(plt.gca().collections):
        artist.remove()

    # newton
    x_n, xn_list = newton_1D(ax, f, x0, n, s0, True, True)

    # recompute the ax.dataLim
    ax.relim()
    # update ax.viewLim using the new dataLim
    ax.autoscale_view()


# Question 3

Give a graphical interpretation of the system

$$4x_1 - x_2 + x_1^2 + x_2^2 - 4 = 0$$

$$-8x_2+e^{x_1} = 0$$


## Solution

The shape of the second equation is, I hope, easier to guess than that of the first. Everyone knows the good old exponential curve. Rearranging a bit results in

$x_2 = \frac{1}{8}e^{x_1}$

The less intuitive curve to decipher is the first equation, but we have a clue as to what it may look like. That's right, it's one of the conic sections (circle, ellipse, parabola or hyperbola). But how do we know? We know by the presence of the squares in the equation. Let's rearrange a bit to


$x_1^2 + 4x_1 + x_2^2 - x_2  - 4 = 0$

We see that both square terms have positive signs, hinting at an ellipse or a circle. We write the general equation of an ellipse rather than that of a circle because, well, a circle is a special ellipse. We place the ellipse at an arbitrary centre $(a, b)$.

$\left(\frac{x_1-a}{h}\right)^2 + \left(\frac{x_2-b}{k}\right)^2 - r^2 = 0$

$\rightarrow k^2x_1^2 - 2ak^2x_1 + h^2x_2^2 - 2bh^2x_2  - (hrk)^2 + (ak)^2 + (bh)^2 = 0$

Comparing to our original equation, we find that

$k = h = 1, a = -2, b = \frac{1}{2}, r = \sqrt{\frac{33}{4}}$

Since $k = h = 1$, the curve is a circle of radius $r = \sqrt{\frac{33}{4}}$ centred at $(-2, \frac{1}{2})$, and the equation simplifies to 

$(x_1+2)^2 + (x_2-\frac{1}{2})^2 = \frac{33}{4}$
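The completed-square form can be verified symbolically. The sketch below assumes SymPy; it expands the circle equation and subtracts the original first equation, so a zero difference confirms the centre and radius found above. The names `original`, `circle` and `difference` are illustrative.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")

original = 4 * x1 - x2 + x1**2 + x2**2 - 4
circle = (x1 + 2) ** 2 + (x2 - sp.Rational(1, 2)) ** 2 - sp.Rational(33, 4)

# both expressions describe the same curve if their difference vanishes
difference = sp.expand(original - circle)
radius = sp.sqrt(sp.Rational(33, 4))

print(difference)
print(float(radius))
```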

### Plotting

In [9]:
plt.close()
x, y = symbols("x y")

# for display purposes, r^2 = 9 (radius 3) is used instead of r^2 = 33/4. Remember the floating-point problem under the square root
y1 = (9 - (x + 2)**2)**0.5 + 0.5
y2 = -(9 - (x + 2)**2)**0.5 + 0.5
y3 = 1 / 8 * exp(x)

z = np.linspace(-2 - 3, -2 + 3, 100)
z2 = np.linspace(-6, 2, 100)
y1p = [y1.evalf(subs={x: i}) for i in z]
y2p = [y2.evalf(subs={x: i}) for i in z]
y3p = [y3.evalf(subs={x: i}) for i in z2]

# create figure
fig, ax = create_figure()

# some display settings
ax.set_xlim(-6, 2)
ax.set_ylim(-3, 5)
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")

# plotting
ax.plot(z, y1p, 'b')
ax.plot(z, y2p, 'b')
ax.plot(z2, y3p, 'g')


# Question 4

Find all solutions to the previous system!

Apply simple iteration (Banach's principle) in suitable form, starting from $(1, 1)^T$, to find a solution around $x = (0.87, 0.3)^T$.

Using other initial guesses (e.g. $x^{(0)} = (-3, -2)^T$ or $x^{(0)} = (-5, 0)^T$) and applying Newton's method, find a different solution!

## Solution 

### Simple Iteration

The set of equations is rearranged to the fixed-point iteration form


$\rightarrow 
\begin{equation*}
\begin{pmatrix}
x_1^{(k+1)} \\
x_2^{(k+1)} \\
\end{pmatrix}
=
\begin{pmatrix}
\frac{1}{4}\left(x_2^{(k)} - (x_1^{(k)})^2 - (x_2^{(k)})^2 + 4\right) \\
\frac{1}{8} e^{x_1^{(k)}} \\
\end{pmatrix}
\end{equation*}$

Bear in mind that the alternative form $x_1^{(k+1)} = \ln(8x_2^{(k)})$ does not work. Always keep the convergence conditions of simple iteration in mind and do a quick check before iterating.

$(x_1^{(0)}, x_2^{(0)}) = (1, 1) \cdots (x_1^{(19)}, x_2^{(19)}) = (0.86509, 0.2969) $ (Residual $< 10^{-6}$)
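The simple iteration can be sketched in plain Python. The stopping rule below (largest absolute residual of the two original equations below $10^{-6}$) is an assumption, since the text does not say how the residual was measured, so the step count may differ slightly from the 19 quoted above. The helper names `g` and `residual` are illustrative.

```python
import math


def g(x1, x2):
    """Fixed-point map for the system, in the form used above."""
    return 0.25 * (x2 - x1**2 - x2**2 + 4), math.exp(x1) / 8


def residual(x1, x2):
    """Largest absolute residual of the two original equations."""
    return max(abs(4 * x1 - x2 + x1**2 + x2**2 - 4),
               abs(-8 * x2 + math.exp(x1)))


x1, x2 = 1.0, 1.0
steps = 0
while residual(x1, x2) >= 1e-6 and steps < 200:
    x1, x2 = g(x1, x2)  # both components use the previous iterate
    steps += 1

print(steps, x1, x2)
```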

### Newton method

$X^{(k+1)} = X^{(k)} - J^{-1}(X^{(k)}) F(X^{(k)})$

where $X = 
\begin{pmatrix} x_1 \\
x_2 
\end{pmatrix}, 
F(X) = \begin{pmatrix}
4x_1 - x_2 + x_1^2 + x_2^2 - 4 \\
-8x_2+e^{x_1}\\
\end{pmatrix}$

$J(X) = 
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2}
\end{pmatrix}
=
\begin{pmatrix}
4+ 2x_1 & -1+2x_2 \\
e^{x_1} & -8
\end{pmatrix}
$

$J^{-1}(X) = \frac{1}{-8(4+ 2x_1)-e^{x_1}(-1+2x_2)}
\begin{pmatrix}
-8 & 1-2x_2 \\
-e^{x_1} & 4 + 2x_1
\end{pmatrix}
$

For $(x_1^{(0)}, x_2^{(0)}) = (-3, -2)$, 

 $
\begin{pmatrix}
x_1^{(1)}\\
x_2^{(1)}
\end{pmatrix} =
\begin{pmatrix}
-3 \\ -2
\end{pmatrix} - 
\begin{pmatrix}
-0.4923 & 0.3077\\
-0.0031 & -0.1231
\end{pmatrix}
\begin{pmatrix}
-1 \\
16.0498
\end{pmatrix} = 
\begin{pmatrix}
-8.431\\
-0.0276
\end{pmatrix}$

$
\begin{pmatrix}
x_1^{(7)}\\
x_2^{(7)}
\end{pmatrix} =
\begin{pmatrix}
-4.8286 \\ 
0.001
\end{pmatrix}$

For $(x_1^{(0)}, x_2^{(0)}) = (-5, 0)$,

$
\begin{pmatrix}
x_1^{(4)}\\
x_2^{(4)}
\end{pmatrix} =
\begin{pmatrix}
-4.8286 \\ 
0.001
\end{pmatrix}$
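The Newton iteration for the system can be sketched with NumPy. Instead of forming $J^{-1}$ explicitly, the linear system $J\,\Delta = F$ is solved at each step, which gives the same iterates. The function names `F`, `jacobian` and `newton` and the tolerance are assumptions made for this example.

```python
import numpy as np


def F(X):
    x1, x2 = X
    return np.array([4 * x1 - x2 + x1**2 + x2**2 - 4,
                     -8 * x2 + np.exp(x1)])


def jacobian(X):
    x1, x2 = X
    return np.array([[4 + 2 * x1, -1 + 2 * x2],
                     [np.exp(x1), -8.0]])


def newton(X0, tol=1e-10, max_steps=50):
    """Newton's method; returns the final iterate and the steps used."""
    X = np.array(X0, dtype=float)
    for step in range(1, max_steps + 1):
        # solve J * delta = F rather than inverting J
        X = X - np.linalg.solve(jacobian(X), F(X))
        if np.linalg.norm(F(X)) < tol:
            return X, step
    return X, max_steps


root_a, steps_a = newton([-3.0, -2.0])
root_b, steps_b = newton([-5.0, 0.0])
print(root_a, steps_a)
print(root_b, steps_b)
```

Both starting points should lead to the same root on the left side of the circle, with the closer guess $(-5, 0)$ needing fewer steps.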

# Question 5

Consider the minimization problem

$$f(x) = \frac{1}{2} \sum_{i=1}^n (x_1 + x_2 e^{-x_3t_i} - s_i)^2 = \min $$

by solving the three-dimensional nonlinear system $\nabla f(x) = 0$!

The given data are $n = 4, t = (0, 1, 2, 3), s= (0.2, 0.6, 0.8, 0.9)$.

Find a reasonable initial guess for $x\in \mathbb{R}^3$ and give a graphical interpretation of the solution!

## Solution

So much is going on in the question. Let's take a step back, shall we?
***
### Curve fitting
Let's try to understand the expression that is being minimized. For this we look at the simple problem of fitting a straight line to a handful of given points, say 4.

<img align="right" width="500" height="400" src="https://blog.mbedded.ninja/mathematics/curve-fitting/linear-curve-fitting/linear_curve_fitting_graph_of_points_and_line_hub1a21f251f8992c40bc48170862c1506_29573_600x0_resize_catmullrom_2.png">

How do we approach this problem? We could start by drawing as many lines as possible but how do we determine the best of these lines? Yeah, that's right, a metric of some sort.

For such problems, we seek a straight line $y = mx+c$ such that some quantity is minimized. This quantity is the sum of the deviations of the proposed line from the given data points. The deviation is measured by some suitable metric (a norm).

We define a straight line 

$$\tilde y = mx+c$$

The problem is therefore to find such a $\tilde y$ such that the error is minimized.

$$\min \sum_{i=1}^n \Vert \tilde y - y_i\Vert_n$$

$$\rightarrow \min \sum_{i=1}^n \Vert mx_i + c - y_i\Vert_n$$

The term $mx_i + c - y_i$ is commonly known as the residual, so we designate it as $r_i(\textbf{x})$ for brevity.

Comparing this to the question, the problem is to fit the curve $x_1 + x_2 e^{-x_3t}$ to the point pairs $(t_i, s_i)$. The aim is therefore to find the $x_1, x_2, x_3$ that minimize the metric

$$\sum_{i=1}^n (x_1 + x_2 e^{-x_3t_i} - s_i)^2 \rightarrow \min$$

If we replace $x_1 + x_2 e^{-x_3t_i} - s_i$ with $r_i(\textbf{x})$, we have

$$\sum_{i=1}^n \Vert r_i(\textbf{x})\Vert_2^2 = \sum_{i=1}^n r_i(\textbf{x})^2 = \langle r(\textbf{x}), r(\textbf{x})\rangle = r(\textbf{x})^Tr(\textbf{x})$$

In other words, the square of the 2-norm is to be minimized. Therefore, $f(\textbf{x}) = \frac{1}{2} r(\textbf{x})^Tr(\textbf{x})$
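To make the objective concrete, the residual vector and $f(\textbf{x}) = \frac{1}{2} r^Tr$ can be evaluated with NumPy for the given data. This is a minimal sketch; the function names `residuals` and `objective` and the trial point $(0, 1, 1)$ are choices made for illustration.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
s = np.array([0.2, 0.6, 0.8, 0.9])


def residuals(x):
    """r_i(x) = x1 + x2 * exp(-x3 * t_i) - s_i for all data points."""
    x1, x2, x3 = x
    return x1 + x2 * np.exp(-x3 * t) - s


def objective(x):
    """f(x) = 1/2 * r(x)^T r(x)."""
    r = residuals(x)
    return 0.5 * r @ r


print(residuals([0.0, 1.0, 1.0]))
print(objective([0.0, 1.0, 1.0]))
```

The smaller the value of `objective`, the closer the curve passes to the four data points.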



### Minimization
The problem is a minimization problem: we minimize the function $f(x)$ defined above. We already know something about minimization, sort of, yeah? We've definitely encountered problems requiring us to find the minimum or maximum of a function.

What can we recall? At a turning point (minimum or maximum), the derivative of the function is zero.

$$f'(x) = 0$$

so our problem is simple. We are no strangers to differentiating multivariate functions.

$$\nabla f(\textbf{x}) = \nabla \left(\frac{1}{2} r(\textbf{x})^Tr(\textbf{x})\right) = 0$$

From matrix calculus, $\nabla (r^Tr) = 2 r^T \nabla r$. This can be derived with the chain rule, but one has to be careful with the order of the factors.

Therefore,

$$\nabla f(\textbf{x}) = r^T \nabla r = 0$$

where $\textbf{x}$ has been dropped from $r_i$ for brevity.

We're finally at the point where it's business as usual using Newton's method to iterate to a solution. If we let $\nabla f(\textbf{x})$ be $g(\textbf{x})$, then our Newton iteration is written as

$$\textbf{x}^{(k+1)} = \textbf{x}^{(k)} - J^{-1}(\textbf{x}^{(k)})g(\textbf{x}^{(k)})$$

$$J(\textbf{x}^{(k)}) = \nabla (\nabla f(\textbf{x})) = \nabla(r^T \nabla r)$$

Using chain rule,

$$J(\textbf{x}^{(k)}) = (\nabla r)^T \nabla r + r^T \nabla^2 r$$

$$\rightarrow \textbf{x}^{(k+1)} = \textbf{x}^{(k)} - ((\nabla r(\textbf{x}^{(k)}))^T \nabla r(\textbf{x}^{(k)}) + r^T(\textbf{x}^{(k)}) \nabla^2 r(\textbf{x}^{(k)}))^{-1} r^T(\textbf{x}^{(k)}) \nabla r(\textbf{x}^{(k)})$$

Well, would you look at that! A bit messy, isn't it? It's a good thing then that there is an approximation for it.

$$\rightarrow \textbf{x}^{(k+1)} = \textbf{x}^{(k)} - ((\nabla r(\textbf{x}^{(k)}))^T \nabla r(\textbf{x}^{(k)}))^{-1} r^T(\textbf{x}^{(k)}) \nabla r(\textbf{x}^{(k)})$$

The term $r^T(\textbf{x}^{(k)}) \nabla^2 r(\textbf{x}^{(k)})$ is omitted because it is negligible for small-residual problems. This formulation is known as the Gauss-Newton method. https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm

It is usually seen in this form 

$$x^{(k+1)} = x^{(k)} - (\nabla^T r(x^{(k)}) \nabla r(x^{(k)}))^{-1} \nabla r(x^{(k)})^T r(x^{(k)})$$

The form only depends on how you choose to write your vectors, column or row.

### Application

We're done with all the hard work. Now, for the application we only need to find $\nabla r$

For our problem: 
$
\begin{equation*}
r(x) = 
\begin{pmatrix}
x_1 + x_2e^{-x_3t_1} - s_1\\
x_1 + x_2e^{-x_3t_2} - s_2\\
x_1 + x_2e^{-x_3t_3} - s_3\\
x_1 + x_2e^{-x_3t_4} - s_4
\end{pmatrix}_{4 \times 1}
\end{equation*}$

$
\begin{equation*}
\nabla r(x) = 
\begin{pmatrix}
1&e^{-x_3t_1} & -x_2t_1e^{-x_3t_1}\\
1&e^{-x_3t_2} & -x_2t_2e^{-x_3t_2}\\
1&e^{-x_3t_3} & -x_2t_3e^{-x_3t_3}\\
1&e^{-x_3t_4} & -x_2t_4e^{-x_3t_4}
\end{pmatrix}_{4 \times 3} = 
\begin{pmatrix}
1& 1& 0\\
1& e^{-x_3}& -x_2e^{-x_3}\\
1& e^{-2x_3}& -2x_2e^{-2x_3}\\
1&e^{-3x_3} & -3x_2e^{-3x_3}
\end{pmatrix}_{4 \times 3}
\end{equation*}$

Taking an initial guess of $(x_1^{(0)}, x_2^{(0)}, x_3^{(0)})^T = (0, 1, 0)$, we have that $\nabla r$ evaluates to

$
\begin{equation*}
\nabla r(x) = 
\begin{pmatrix}
1& 1& 0\\
1& 1& -1\\
1& 1& -2\\
1& 1& -3
\end{pmatrix}_{4 \times 3}
\end{equation*}$

The resulting matrix is rank deficient because its first two columns are identical, so $(\nabla r)^T \nabla r$ is singular and the method fails. Taking $(x_1^{(0)}, x_2^{(0)}, x_3^{(0)})^T = (0, 1, 1)$ instead, we have

$
\begin{equation*}
\nabla r(x) = 
\begin{pmatrix}
1& 1& 0\\
1& e^{-1}& -1e^{-1}\\
1& e^{-2}& -2e^{-2}\\
1& e^{-3}& -3e^{-3}
\end{pmatrix}
\end{equation*}$

$x^{(1)} = 
\begin{pmatrix}
0\\
1\\
1
\end{pmatrix} - 
\left(
\begin{pmatrix}
1 & 1& 1& 1\\
1 & e^{-1}& e^{-2}& e^{-3}\\
0& -1e^{-1}& -2e^{-2}& -3e^{-3}
\end{pmatrix}
\begin{pmatrix}
1& 1& 0\\
1& e^{-1}& -1e^{-1}\\
1& e^{-2}& -2e^{-2}\\
1& e^{-3}& -3e^{-3}
\end{pmatrix}
\right)^{-1}
\begin{pmatrix}
1 & 1& 1& 1\\
1 & e^{-1}& e^{-2}& e^{-3}\\
0& -1e^{-1}& -2e^{-2}& -3e^{-3}
\end{pmatrix}
\begin{pmatrix}
1 - 0.2\\
e^{-1} - 0.6\\
e^{-2} - 0.8\\
e^{-3} - 0.9
\end{pmatrix}
$


$x^{(1)} = 
\begin{pmatrix}
0.9738\\
-0.7736\\
1.2459
\end{pmatrix}
\cdots
x^{(*)} = 
\begin{pmatrix}
1\\
-0.8\\
0.6931
\end{pmatrix}
$
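The hand calculation can be reproduced with a compact NumPy version of the Gauss-Newton iteration. This is a sketch, not the notebook's own implementation: it solves the normal equations $(\nabla r)^T \nabla r\,\Delta = (\nabla r)^T r$ with `numpy.linalg.solve`, starts from the guess $(0, 1, 1)$ used above, and uses no damping or line search. The function names and the stopping tolerance are assumptions.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
s = np.array([0.2, 0.6, 0.8, 0.9])


def residuals(x):
    x1, x2, x3 = x
    return x1 + x2 * np.exp(-x3 * t) - s


def jacobian(x):
    """4x3 matrix of partial derivatives of r with respect to x1, x2, x3."""
    _, x2, x3 = x
    e = np.exp(-x3 * t)
    return np.column_stack([np.ones_like(t), e, -x2 * t * e])


def gauss_newton(x0, tol=1e-10, max_steps=50):
    x = np.array(x0, dtype=float)
    for step in range(1, max_steps + 1):
        J, r = jacobian(x), residuals(x)
        # normal equations: (J^T J) delta = J^T r
        delta = np.linalg.solve(J.T @ J, J.T @ r)
        x = x - delta
        if np.linalg.norm(delta) < tol:
            break
    return x, step


x_fit, n_steps = gauss_newton([0.0, 1.0, 1.0])
print(x_fit, n_steps)
```

Note that $x_3^* = 0.6931 \approx \ln 2$, so the fitted curve $1 - 0.8 \cdot 2^{-t}$ passes through all four data points and the residual at the solution is zero.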

### Animation

Run the code below to see an animation of the minimization process.

In [7]:
import time


# function to plot curve at each step
def plot(x1, x2, x3, ax):
    # plot original points
    ti = [0, 1, 2, 3]
    si = [0.2, 0.6, 0.8, 0.9]

    # calculate curve points
    t_list = np.linspace(0, 5, 20)
    fpoints = [x1 + x2 * exp(-x3 * t) for t in t_list]

    ax.clear()
    # plot original points
    ax.scatter(ti, si)

    # plot curve
    ax.plot(t_list, fpoints)

    time.sleep(1)

In [8]:
# build the residual vector r and its Jacobian nabla_r symbolically; the points (ti, si) and the fitted curve are plotted inside the loop below

x1, x2, x3 = symbols("x1 x2 x3")
r1 = x1 + x2 - 0.2
r2 = x1 + x2 * exp(-x3) - 0.6
r3 = x1 + x2 * exp(-2 * x3) - 0.8
r4 = x1 + x2 * exp(-3 * x3) - 0.9

r = Matrix([[r1], [r2], [r3], [r4]])
nabla_r = Matrix([[r1.diff(x1), r1.diff(x2),
                   r1.diff(x3)], [r2.diff(x1),
                                  r2.diff(x2),
                                  r2.diff(x3)],
                  [r3.diff(x1), r3.diff(x2),
                   r3.diff(x3)], [r4.diff(x1),
                                  r4.diff(x2),
                                  r4.diff(x3)]])

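# Gauss-Newton iteration
n = 6
x = np.array([[0], [1], [0.5]])

# despite its name, J holds the symbolic Gauss-Newton step
# (nabla_r^T nabla_r)^{-1} nabla_r^T r, not the Jacobian itself
J = (nabla_r.T * nabla_r).inv() * (nabla_r.T * r)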

# create figure
plt.close()
fig = plt.figure()
ax = fig.add_subplot(111)
plt.ion()

fig.show()
fig.canvas.draw()
plt.ion()

for i in range(n):
    rhs = np.array(
        J.evalf(subs={
            x1: x[0][0],
            x2: x[1][0],
            x3: x[2][0]
        }).tolist())
    x = x - rhs

    plot(x[0][0], x[1][0], x[2][0], ax)
    fig.canvas.draw()
