# Chapter 4. Numerical Computation

* Deep Learning Theory [1]
* 김무성

# Contents
* 4.1 Overflow and Underflow
* 4.2 Poor Conditioning
* 4.3 Gradient-Based Optimization
* <font color="red">4.4 Constrained Optimization</font>
* 4.5 Example: Linear Least Squares

# 4.4 Constrained Optimization

#### constrained optimization

* Sometimes we do not want to maximize or minimize a function $f(x)$ over all possible values of $x$.
* Instead, we may wish to find the maximal or minimal value of $f(x)$ for values of $x$ in some set $S$.

<img src="http://www.bath.ac.uk/mech-eng/constraintmodelling/images/conopt.jpg" width=300 />

#### feasible

Points $x$ that lie within the set $S$ are called <font color="red">feasible points</font> in constrained optimization terminology.

<img src="http://flylib.com/books/3/287/1/html/2/images/10fig06.jpg" width=400 />

#### small solution (norm constraint)

* We often wish to find a solution that is small in some sense.
* A common approach in such situations is to impose a norm constraint, such as
    - $\|x\| \le 1$.

#### simple approach

* One simple approach to constrained optimization is to <font color="red">modify gradient descent to take the constraint into account</font>.
* step size
    - If we use a <font color="red">small constant step size</font> $\epsilon$, we can take gradient descent steps and then project the result back into $S$.
* line search
    - If we use a line search, 
        - we can search only over step sizes $\epsilon$ that yield new $x$ points that are feasible,
        - or we can project each point on the line back into the constraint region.
        <img src="http://image.slidesharecdn.com/131110gradientdescentmethod-141128132442-conversion-gate01/95/gradient-descent-method-24-638.jpg?cb=1417181134" width=600 />
        <img src="http://image.slidesharecdn.com/131110gradientdescentmethod-141128132442-conversion-gate01/95/gradient-descent-method-31-638.jpg?cb=1417181134" width=600 />
        
* tangent space
    - When possible, this method can be made more efficient <font color="red">by projecting the gradient into the tangent space of the feasible region</font> before taking the step or beginning the line search.
    <img src="http://www.frontiersin.org/files/Articles/82010/fphy-02-00019-HTML/image_m/fphy-02-00019-g003.jpg" width=600 />
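
The constant-step-size variant above can be sketched in a few lines of Python. This is a minimal illustration, not code from [1]: the feasible set is assumed to be the unit $L^2$ ball, and the objective, the helper names (`project_onto_unit_ball`, `projected_gradient_descent`), the step size, and the iteration count are all hypothetical choices made for the example.

```python
import math

def project_onto_unit_ball(x):
    """Euclidean projection onto S = {x : ||x||_2 <= 1} (assumed feasible set)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= 1.0:
        return list(x)          # already feasible, nothing to do
    return [v / norm for v in x]  # rescale onto the boundary

def projected_gradient_descent(grad, x0, step_size=0.1, num_steps=200):
    """Take a plain gradient step, then project the result back into S."""
    x = project_onto_unit_ball(x0)
    for _ in range(num_steps):
        g = grad(x)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
        x = project_onto_unit_ball(x)
    return x

# Hypothetical objective: f(x) = (x1 - 2)^2 + (x2 - 1)^2
TARGET = (2.0, 1.0)

def grad_f(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, TARGET)]

x_star = projected_gradient_descent(grad_f, [0.0, 0.0])
print(x_star)
```

For this objective the unconstrained minimizer $(2, 1)$ lies outside the ball, so the iterates settle on the boundary point in the direction of $(2, 1)$.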

### unconstrained optimization problem

#### unconstrained optimization ->  constrained optimization problem
A more sophisticated approach is to design a different, unconstrained optimization problem whose solution can be <font color="red">converted into a solution to the original, constrained optimization problem</font>.
* For example, 
    - if we want to minimize 
        - $f(x)$ for $x \in \mathbb{R}^2$ 
            - with $x$ constrained to have exactly unit $L^2$ norm, 
    - we can instead minimize 
        - $g(\theta) = f([\cos\theta, \sin\theta]^T)$ 
            - with respect to $\theta$, 
    - then return $[\cos\theta, \sin\theta]$ as the solution to the original problem.
    
This approach requires creativity:
* the transformation between optimization problems <font color="red">must be designed specifically for each case we encounter</font>.
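
The unit-circle reparametrization above can be sketched as follows. This is only an illustration under stated assumptions: the objective $f$, the helper names (`to_circle`, `grad_g`), the starting angle, the step size, and the iteration count are hypothetical and are not taken from [1]. The constraint never appears explicitly, because every value of $\theta$ maps to a point with unit $L^2$ norm.

```python
import math

# Hypothetical objective on R^2: f(x) = (x1 - 2)^2 + (x2 - 1)^2
def grad_f(x):
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)]

def to_circle(theta):
    """Map the free parameter theta to a point with unit L2 norm."""
    return [math.cos(theta), math.sin(theta)]

def grad_g(theta):
    """Chain rule: g'(theta) = grad f(x) . dx/dtheta, with x = [cos, sin]."""
    x = to_circle(theta)
    gx = grad_f(x)
    return gx[0] * (-math.sin(theta)) + gx[1] * math.cos(theta)

# Ordinary (unconstrained) gradient descent on theta.
theta = 0.0
for _ in range(200):
    theta -= 0.1 * grad_g(theta)

# Convert the unconstrained solution back to the original problem.
x_star = to_circle(theta)
print(theta, x_star)
```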

#### Karush–Kuhn–Tucker (KKT)

##### Reference
* [2] Lagrange Multipliers and the Karush-Kuhn-Tucker conditions - http://www.csc.kth.se/utbildning/kth/kurser/DD3364/Lectures/KKT.pdf

The Karush–Kuhn–Tucker (KKT) approach provides a very general solution to constrained optimization. With the KKT approach, we introduce a new function called the <font color="red">generalized Lagrangian or generalized Lagrange function</font>.

<img src="figures/lag.png" width=600 />

#### KKT multipliers & generalized Lagrangian 

We introduce new variables $\lambda_i$ and $\alpha_j$, one for each constraint; these are called the KKT multipliers. The generalized Lagrangian is then defined as

<img src="figures/cap4.4.1.png" width=600 />    

<img src="figures/cap4.4.2.png" width=600 />    
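
For readers following without the figures, the definition can be written out as below. This assumes the convention used in [1], where the feasible set is described by equality constraints $g^{(i)}(x) = 0$ and inequality constraints $h^{(j)}(x) \le 0$; the toy instance at the end is a hypothetical example, not taken from the source.

$$
S = \{\, x \mid \forall i,\ g^{(i)}(x) = 0 \ \text{and} \ \forall j,\ h^{(j)}(x) \le 0 \,\}
$$

$$
L(x, \lambda, \alpha) = f(x) + \sum_i \lambda_i \, g^{(i)}(x) + \sum_j \alpha_j \, h^{(j)}(x)
$$

$$
\min_x \ \max_{\lambda} \ \max_{\alpha,\ \alpha \ge 0} L(x, \lambda, \alpha)
$$

Whenever $x$ is feasible, the inner maximization returns $f(x)$; whenever a constraint is violated, it returns $\infty$, so no infeasible point can be optimal. As a toy instance, minimizing $f(x) = (x_1 - 2)^2 + (x_2 - 1)^2$ subject to the single inequality constraint $h(x) = x_1^2 + x_2^2 - 1 \le 0$ gives

$$
L(x, \alpha) = (x_1 - 2)^2 + (x_2 - 1)^2 + \alpha \,(x_1^2 + x_2^2 - 1), \qquad \alpha \ge 0 .
$$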

To perform constrained maximization, we can construct the generalized Lagrange function of $-f(x)$, which leads to this optimization problem:

<img src="figures/cap4.4.3.png" width=600 />    

We may also convert this to a problem with maximization in the outer loop:

<img src="figures/cap4.4.4.png" width=600 />    

#### active

* The inequality constraints are particularly interesting. 
* We say that a constraint $h^{(i)}(x)$ is active if $h^{(i)}(x^*) = 0$. 
* If a constraint is not active, then the solution to the problem found using that constraint would remain at least a local solution if that constraint were removed. 
* It is possible that an inactive constraint excludes other solutions.

<img src="figures/cap4.4.5.png" width=300 />    
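
A one-dimensional toy problem may make the distinction concrete. The sketch below is an assumption-laden illustration, not an example from [1]: the objective $f(x) = (x - 2)^2$, the constraint $h(x) = x - c \le 0$, and the helper `solve_1d` are all hypothetical. Because $f$ is convex with its unconstrained minimizer at $x = 2$, the constrained minimizer is simply $2$ clipped to the feasible interval.

```python
def solve_1d(upper_bound):
    """Minimize f(x) = (x - 2)^2 subject to h(x) = x - upper_bound <= 0.

    Returns the constrained minimizer x* and the constraint value h(x*).
    """
    x_star = min(2.0, upper_bound)   # clip the unconstrained minimizer
    h_at_x_star = x_star - upper_bound
    return x_star, h_at_x_star

x_a, h_a = solve_1d(1.0)   # constraint x <= 1 cuts off the minimizer
x_b, h_b = solve_1d(3.0)   # constraint x <= 3 leaves the minimizer feasible

is_active_a = (h_a == 0.0)   # h(x*) = 0: the constraint is active
is_active_b = (h_b == 0.0)   # h(x*) < 0: the constraint is inactive

print(x_a, is_active_a)
print(x_b, is_active_b)
```

In the second case the solution coincides with the unconstrained minimizer, so removing the constraint would leave it unchanged.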

#### KKT conditions

A simple set of properties describes the optimal points of constrained optimization problems. These properties are called the Karush–Kuhn–Tucker (KKT) conditions. They are necessary conditions, but not always sufficient conditions, for a point to be optimal.
* The gradient of the generalized Lagrangian is zero.
* All constraints on both $x$ and the KKT multipliers are satisfied.
* The inequality constraints exhibit “complementary slackness”:
    <img src="figures/cap4.4.6.png" width=150 />
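
The three conditions can be checked numerically on a small problem. The sketch below is a hedged illustration rather than anything from [1]: it assumes the hypothetical problem of minimizing $f(x) = (x_1 - 2)^2 + (x_2 - 1)^2$ subject to $h(x) = x_1^2 + x_2^2 - 1 \le 0$, with generalized Lagrangian $L(x, \alpha) = f(x) + \alpha\, h(x)$, and it tests a candidate pair $(x, \alpha)$ that was derived by hand.

```python
import math

# Hypothetical problem: minimize f(x) = (x1 - 2)^2 + (x2 - 1)^2
# subject to h(x) = x1^2 + x2^2 - 1 <= 0 (no equality constraints).
t = (2.0, 1.0)
norm_t = math.sqrt(t[0] ** 2 + t[1] ** 2)

# Candidate solution and multiplier, derived by hand:
# grad_x L = 2(x - t) + 2*alpha*x = 0 together with ||x|| = 1
# gives x = t / ||t|| and alpha = ||t|| - 1.
x = (t[0] / norm_t, t[1] / norm_t)
alpha = norm_t - 1.0

def h(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

# Gradient of the generalized Lagrangian with respect to x.
grad_L = [2.0 * (x[i] - t[i]) + 2.0 * alpha * x[i] for i in range(2)]

tol = 1e-12
stationarity = all(abs(g) < tol for g in grad_L)   # gradient of L is zero
feasibility = h(x) <= tol and alpha >= 0.0         # constraints on x and alpha
slackness = abs(alpha * h(x)) < tol                # alpha * h(x) = 0

print(stationarity, feasibility, slackness)
```

Here the constraint is active, so complementary slackness holds with a strictly positive multiplier; for an inactive constraint it would instead require $\alpha = 0$.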

# References
* [1] 4 Numerical Computation (Deep Learning Book) - http://www.deeplearningbook.org/contents/numerical.html
* [2] Lagrange Multipliers and the Karush-Kuhn-Tucker conditions - http://www.csc.kth.se/utbildning/kth/kurser/DD3364/Lectures/KKT.pdf