Chapter 9 Numerical Optimization

Newton's Method In Optimization

Newton's method is an iterative method for finding the roots of a differentiable function https://latex.codecogs.com/svg.latex?F. Applying Newton's method to the derivative https://latex.codecogs.com/svg.latex?f^\prime of a twice-differentiable function https://latex.codecogs.com/svg.latex?f, i.e., finding the solutions of https://latex.codecogs.com/svg.latex?f^\prime(x)=0, gives us the critical points of https://latex.codecogs.com/svg.latex?f (minima, maxima, or saddle points).

The second-order Taylor expansion of https://latex.codecogs.com/svg.latex?{\displaystyle f:\mathbb {R} \to \mathbb {R} } around https://latex.codecogs.com/svg.latex?{\displaystyle x_{k}} is:

https://latex.codecogs.com/svg.latex?{\displaystyle f(x_{k}+t)\approx f(x_{k})+f'(x_{k})t+{\frac {1}{2}}f''(x_{k})t^{2}.}

Setting the derivative of this approximation with respect to https://latex.codecogs.com/svg.latex?t to zero:

https://latex.codecogs.com/svg.latex?{\displaystyle \displaystyle 0={\frac {\rm {d}}{{\rm {d}}t}}\left(f(x_{k})+f'(x_{k})t+{\frac {1}{2}}f''(x_{k})t^{2}\right)=f'(x_{k})+f''(x_{k})t,}

This will give us:

https://latex.codecogs.com/svg.latex?{\displaystyle t=-{\frac {f'(x_{k})}{f''(x_{k})}}.}

We place the next point https://latex.codecogs.com/svg.latex?x_{k+1} where the derivative of this second-order approximation is zero, so https://latex.codecogs.com/svg.latex?x_{k+1}=x_{k}+t

Putting everything together:

https://latex.codecogs.com/svg.latex?{\displaystyle x_{k+1}=x_{k}-{\frac {f'(x_{k})}{f''(x_{k})}}.}

We iterate this update until the change between successive iterates is small enough.
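As a concrete illustration, here is a minimal sketch of the iteration in C++; the test function https://latex.codecogs.com/svg.latex?f(x)=x^4-3x^3+2, the starting point, and the tolerance are arbitrary choices for the example:

#include <cmath>
#include <iostream>

int main()
{
    // Example function: f(x) = x^4 - 3x^3 + 2
    // f'(x)  = 4x^3 - 9x^2
    // f''(x) = 12x^2 - 18x
    auto fPrime  = [](double x) { return 4.0 * x * x * x - 9.0 * x * x; };
    auto fSecond = [](double x) { return 12.0 * x * x - 18.0 * x; };

    double x = 4.0;          // initial guess x_0
    double epsilon = 1e-8;   // stop when the step is small enough

    for (int k = 0; k < 100; ++k)
    {
        double t = -fPrime(x) / fSecond(x);  // t = -f'(x_k) / f''(x_k)
        x = x + t;                           // x_{k+1} = x_k + t
        if (std::abs(t) < epsilon)
            break;
    }

    // For this f the iteration converges to the minimizer x = 2.25
    std::cout << "critical point: x = " << x << std::endl;
    return 0;
}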

Gauss-Newton Algorithm

The Gauss-Newton algorithm is a modification of Newton's method for minimizing a sum of squared function values (non-linear least squares). Unlike Newton's method, it does not require second derivatives, which can be challenging to compute.

Starting with an initial guess https://latex.codecogs.com/svg.latex?{\displaystyle {\boldsymbol {\beta }}^{(0)}} for the minimum, the method proceeds by the iterations

https://latex.codecogs.com/svg.latex?{\displaystyle {\boldsymbol {\beta }}^{(s+1)}={\boldsymbol {\beta }}^{(s)}-\left(\mathbf {J_{f}} ^{\mathsf {T}}\mathbf {J_{f}} \right)^{-1}\mathbf {J_{f}} ^{\mathsf {T}}\mathbf {f} \left({\boldsymbol {\beta }}^{(s)}\right)}

The matrix https://latex.codecogs.com/svg.latex?{\displaystyle \left(\mathbf {J_{f}} ^{\mathsf {T}}\mathbf {J_{f}} \right)^{-1}\mathbf {J_{f}} ^{\mathsf {T}}} is called the (Moore-Penrose) pseudoinverse of https://latex.codecogs.com/svg.latex?\mathbf {J_{f}}. To compute this matrix we use the following property:

Let https://latex.codecogs.com/svg.latex?A be an https://latex.codecogs.com/svg.latex?m\times n matrix with the SVD:

https://latex.codecogs.com/svg.latex?A = U \Sigma V^T

and the pseudoinverse defined as

https://latex.codecogs.com/svg.latex?A^+ = (A^T A)^{-1} A^T

Then, we can write the pseudoinverse as:

https://latex.codecogs.com/svg.latex?A^+ = V \Sigma^{-1} U^T

Proof:

https://latex.codecogs.com/svg.latex?\begin{align} A^+ &= (A^TA)^{-1}A^T\\ &=(V\Sigma U^TU\Sigma V^T)^{-1} V\Sigma U^T \\ &=(V\Sigma^2 V^T)^{-1} V\Sigma U^T \\ &=(V^T)^{-1} \Sigma^{-2} V^{-1} V\Sigma U^T \\ &= V \Sigma^{-2}\Sigma U^T \\ &= V\Sigma^{-1}U^T \end{align}

Refs: 1
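This SVD form is also how the pseudoinverse is usually computed in practice, since inverting https://latex.codecogs.com/svg.latex?A^TA directly can be ill-conditioned. A minimal Eigen sketch follows; the helper name pseudoInverse and the tolerance are assumptions chosen to match the pseudoInverse call used in the example below, and the repository's actual helper may differ:

#include <Eigen/Dense>

// Moore-Penrose pseudoinverse via SVD: A+ = V S+ U^T, where S+ inverts the
// non-zero singular values and leaves (near-)zero ones at zero.
Eigen::MatrixXd pseudoInverse(const Eigen::MatrixXd &A, double tolerance = 1e-9)
{
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::VectorXd sigma = svd.singularValues();
    Eigen::VectorXd sigmaInv(sigma.size());
    for (int i = 0; i < sigma.size(); ++i)
        sigmaInv(i) = (sigma(i) > tolerance) ? 1.0 / sigma(i) : 0.0;
    return svd.matrixV() * sigmaInv.asDiagonal() * svd.matrixU().transpose();
}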

Example of Gauss-Newton, Inverse Kinematics Problem

In the following we solve the inverse kinematics of a 3-link planar robot.

vector_t q=q_start;      // current joint angles, initialized with the starting configuration
vector_t delta_q(3);     // update for the three joint angles

double epsilon=1e-3;     // stop once the squared pose error is below this threshold
int i=0;

while( (distanceError(goal,forward_kinematics(q)).squaredNorm()>epsilon)  && (i<200)  )
{
    // Jacobian of the end-effector pose with respect to the joint angles
    Eigen::MatrixXd jacobian=numericalDifferentiationFK(q);
    Eigen::MatrixXd j_pinv=pseudoInverse(jacobian);

    // pose error between the goal and the current end-effector pose
    Eigen::VectorXd delta_p=transformationMatrixToPose(goal)-transformationMatrixToPose(forward_kinematics(q));

    // Gauss-Newton step: delta_q = J+ * delta_p
    delta_q=j_pinv*delta_p;
    q=q+delta_q;
    i++;
}
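The numericalDifferentiationFK helper approximates the Jacobian of the end-effector pose with respect to the joint angles. One way such a helper could look, sketched under the assumption that vector_t is an Eigen vector typedef and that transformationMatrixToPose returns a fixed-size pose vector (the repository's actual implementation may differ):

// Central-difference approximation of J = d(pose)/d(q) for the 3-link arm.
Eigen::MatrixXd numericalDifferentiationFK(const vector_t &q)
{
    const double h = 1e-6;  // finite-difference step (assumed)
    Eigen::VectorXd pose0 = transformationMatrixToPose(forward_kinematics(q));
    Eigen::MatrixXd J(pose0.size(), q.size());

    for (int j = 0; j < q.size(); ++j)
    {
        vector_t q_plus = q, q_minus = q;
        q_plus(j)  += h;
        q_minus(j) -= h;
        // column j: how the pose changes when joint j moves
        J.col(j) = (transformationMatrixToPose(forward_kinematics(q_plus)) -
                    transformationMatrixToPose(forward_kinematics(q_minus))) / (2.0 * h);
    }
    return J;
}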

Full source code here

Quasi-Newton Method

Curve Fitting

Suppose you have a function https://latex.codecogs.com/svg.latex?y = f(x, \boldsymbol \beta) with https://latex.codecogs.com/svg.latex?n unknown parameters https://latex.codecogs.com/svg.latex?\boldsymbol \beta=(\beta_1,\beta_2,...,\beta_n) and a set of sample data (possibly contaminated with noise) from that function, and you want to find the unknown parameters such that the residuals https://latex.codecogs.com/svg.latex?\boldsymbol r=(r_1,r_2,...,r_m) (the differences between the output of your function and the sample data) become minimal. We construct a new function that is the sum of squares of all residuals:

https://latex.codecogs.com/svg.latex?{\displaystyle S({\boldsymbol {\beta }})=\sum _{i=1}^{m}r_{i}({\boldsymbol {\beta }})^{2}.}

where the residuals are:

https://latex.codecogs.com/svg.latex?{\displaystyle r_{i}({\boldsymbol {\beta }})=y_{i}-f\left(x_{i},{\boldsymbol {\beta }}\right).}

We start with an initial guess https://latex.codecogs.com/svg.latex?\boldsymbol {\beta }^{(0)} and then proceed by the iterations:

https://latex.codecogs.com/svg.latex?{\displaystyle {\boldsymbol {\beta }}^{(s+1)}={\boldsymbol {\beta }}^{(s)}-\left(\mathbf {J_{r}} ^{\mathsf {T}}\mathbf {J_{r}} \right)^{-1}\mathbf {J_{r}} ^{\mathsf {T}}\mathbf {r} \left({\boldsymbol {\beta }}^{(s)}\right)}

Example of Substrate Concentration Curve Fitting

Let's say we have the following dataset:

| i    | 1     | 2     | 3     | 4      | 5      | 6      | 7      |
|------|-------|-------|-------|--------|--------|--------|--------|
| [S]  | 0.038 | 0.194 | 0.425 | 0.626  | 1.253  | 2.500  | 3.740  |
| Rate | 0.050 | 0.127 | 0.094 | 0.2122 | 0.2729 | 0.2665 | 0.3317 |

And we have the following function to fit to the data:

https://latex.codecogs.com/svg.latex?y=\frac{\beta_1 \times x}{\beta_2+x}

So our residual vector https://latex.codecogs.com/svg.latex?\mathbf r_{7\times1} is:

https://latex.codecogs.com/svg.latex?\left\{\begin{matrix} r_1=y_1 - \frac{\beta_1 \times x_1}{\beta_2+x_1}\\ r_2=y_2 - \frac{\beta_1 \times x_2}{\beta_2+x_2}\\ r_3=y_3 - \frac{\beta_1 \times x_3}{\beta_2+x_3}\\ r_4=y_4 - \frac{\beta_1 \times x_4}{\beta_2+x_4}\\ r_5=y_5 - \frac{\beta_1 \times x_5}{\beta_2+x_5}\\ r_6=y_6 - \frac{\beta_1 \times x_6}{\beta_2+x_6}\\ r_7=y_7 - \frac{\beta_1 \times x_7}{\beta_2+x_7} \end{matrix}\right.

and the Jacobian https://latex.codecogs.com/svg.latex?\mathbf J_{7\times2} has the entries:

https://latex.codecogs.com/svg.latex?\left\{\begin{matrix} \frac{\partial r_i}{\partial \beta_1}= \frac{-x_i}{\beta_2+x_i} \\ \frac{\partial r_i}{\partial \beta_2} = \frac{\beta_1 x_i}{(\beta_2+x_i)^2} \end{matrix}\right.
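These partial derivatives could be used to build the Jacobian analytically; a small sketch is given below (the function name analyticJacobian and the argument layout are illustrative assumptions). The example that follows instead lets Eigen::NumericalDiff approximate the same matrix numerically.

// Analytic Jacobian of the residuals r_i = y_i - beta1*x_i/(beta2 + x_i)
// with respect to (beta1, beta2); one row per data point.
Eigen::MatrixXd analyticJacobian(const Eigen::VectorXd &x, const Eigen::VectorXd &beta)
{
    Eigen::MatrixXd J(x.size(), 2);
    for (int i = 0; i < x.size(); ++i)
    {
        double denom = beta(1) + x(i);
        J(i, 0) = -x(i) / denom;                     // dr_i/dbeta1
        J(i, 1) = beta(0) * x(i) / (denom * denom);  // dr_i/dbeta2
    }
    return J;
}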

Now let's implement it with Eigen. First, a functor that calculates https://latex.codecogs.com/svg.latex?\mathbf r given https://latex.codecogs.com/svg.latex?\boldsymbol {\beta }:

// Computes the residual vector r(beta) for the model y = beta1*x/(beta2 + x);
// each row of Points holds one (x, y) sample.
struct SubstrateConcentrationFunctor : Functor<double>
{
    SubstrateConcentrationFunctor(Eigen::MatrixXd points): Functor<double>(points.cols(),points.rows())
    {
        this->Points = points;
    }

    int operator()(const Eigen::VectorXd &z, Eigen::VectorXd &r) const
    {
        double x_i,y_i,beta1,beta2;
        for(int i = 0; i < this->Points.rows(); ++i)
        {
            x_i=this->Points.row(i)(0);
            y_i=this->Points.row(i)(1);
            beta1=z(0);
            beta2=z(1);
            r(i) =y_i-(beta1*x_i)/(beta2+x_i);   // r_i = y_i - beta1*x_i/(beta2 + x_i)
        }

        return 0;
    }
    Eigen::MatrixXd Points;

    int inputs() const { return 2; } // number of model parameters: beta1, beta2
    int values() const { return this->Points.rows(); } // number of observations
};
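The Functor<double> base class used above is not shown in this snippet; it is the generic boilerplate commonly paired with Eigen's unsupported NumericalDiff and NonLinearOptimization modules, and the full source presumably defines it before the functor. A typical version, included here as an assumption for completeness, looks like this:

// Generic functor base: stores the number of inputs (parameters) and values
// (residuals) and provides the typedefs Eigen's numerical-diff machinery expects.
template<typename _Scalar, int NX = Eigen::Dynamic, int NY = Eigen::Dynamic>
struct Functor
{
    typedef _Scalar Scalar;
    enum { InputsAtCompileTime = NX, ValuesAtCompileTime = NY };
    typedef Eigen::Matrix<Scalar, InputsAtCompileTime, 1> InputType;
    typedef Eigen::Matrix<Scalar, ValuesAtCompileTime, 1> ValueType;
    typedef Eigen::Matrix<Scalar, ValuesAtCompileTime, InputsAtCompileTime> JacobianType;

    int m_inputs, m_values;

    Functor() : m_inputs(InputsAtCompileTime), m_values(ValuesAtCompileTime) {}
    Functor(int inputs, int values) : m_inputs(inputs), m_values(values) {}

    int inputs() const { return m_inputs; }
    int values() const { return m_values; }
};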

Now we have to set the data:

//the last column in the matrix should be "y"
Eigen::MatrixXd points(7,2);

points.row(0)<< 0.038,0.050;
points.row(1)<<0.194,0.127;
points.row(2)<<0.425,0.094;
points.row(3)<<0.626,0.2122;
points.row(4)<<1.253,0.2729;
points.row(5)<<2.500,0.2665;
points.row(6)<<3.740,0.3317;

Let's create an instance of our functor and compute the derivatives numerically:

SubstrateConcentrationFunctor functor(points);
Eigen::NumericalDiff<SubstrateConcentrationFunctor,Eigen::NumericalDiffMode::Central> numDiff(functor);

Set the initial values for beta:

Eigen::VectorXd beta(2);
beta<<0.9,0.2;

And the main loop:

//βᵏ⁺¹=βᵏ- (JᵀJ)⁻¹Jᵀr(βᵏ)
Eigen::MatrixXd J(7,2);   // Jacobian of the residuals, filled by numerical differentiation
Eigen::VectorXd r(7);     // residual vector

for(int i=0;i<10;i++)
{
    numDiff.df(beta,J);                // J = ∂r/∂β at the current β
    std::cout<<"J: \n" << J<<std::endl;
    functor(beta,r);                   // r(β)
    std::cout<<"r: \n" << r<<std::endl;
    beta=beta-(J.transpose()*J).inverse()*J.transpose()*r;   // Gauss-Newton update
}
std::cout<<"beta: \n" << beta<<std::endl;

Non Linear Least Squares

Non Linear Regression

Levenberg Marquardt

The Levenberg-Marquardt algorithm (LMA), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems, mainly curve-fitting problems. The LMA finds only a local minimum (which may not be the global minimum). It interpolates between the Gauss-Newton algorithm and gradient descent. The LMA is more robust than the GNA: in many cases it finds a solution even if it starts very far from the final minimum. For well-behaved functions and reasonable starting parameters, however, the LMA tends to be slower than the GNA. The LMA can also be viewed as Gauss-Newton using a trust-region approach.
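Concretely, where Gauss-Newton solves https://latex.codecogs.com/svg.latex?{\displaystyle \left(\mathbf {J} ^{\mathsf {T}}\mathbf {J} \right){\boldsymbol {\delta }}=-\mathbf {J} ^{\mathsf {T}}\mathbf {r}} for the update https://latex.codecogs.com/svg.latex?\boldsymbol \delta, the LMA solves the damped system

https://latex.codecogs.com/svg.latex?{\displaystyle \left(\mathbf {J} ^{\mathsf {T}}\mathbf {J} +\lambda \mathbf {I} \right){\boldsymbol {\delta }}=-\mathbf {J} ^{\mathsf {T}}\mathbf {r}}

where the damping factor https://latex.codecogs.com/svg.latex?\lambda is decreased when a step reduces https://latex.codecogs.com/svg.latex?S(\boldsymbol \beta) (so the method behaves like Gauss-Newton) and increased otherwise (so it behaves like gradient descent).

Eigen ships a Levenberg-Marquardt solver in its unsupported modules. Below is a minimal sketch that reuses the SubstrateConcentrationFunctor from the curve-fitting example above; the setup is an assumption about how the pieces fit together, not code from this repository:

#include <unsupported/Eigen/NonLinearOptimization>
#include <unsupported/Eigen/NumericalDiff>

// Reuse the functor and the (x, y) data matrix from the curve-fitting example.
SubstrateConcentrationFunctor functor(points);
Eigen::NumericalDiff<SubstrateConcentrationFunctor> numDiff(functor);
Eigen::LevenbergMarquardt<Eigen::NumericalDiff<SubstrateConcentrationFunctor>> lm(numDiff);

Eigen::VectorXd beta(2);
beta << 0.9, 0.2;   // same initial guess as before

// minimize() iterates until Eigen's internal convergence criteria are met
// and returns a LevenbergMarquardtSpace::Status code.
int status = lm.minimize(beta);
std::cout << "status: " << status << "\nbeta:\n" << beta << std::endl;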

<< Previous Home