# Convex Optimization Project
## Support Vector Machines solvers

Given $m$ data points $x_i \in \mathbb{R}^n$ with labels $y_i \in \{-1,1\}$, write a function to solve the classification problem

$$ \begin{array}{ll}
\mathrm{minimize} & \frac12 {||w||}_2^2 + C \mathbf{1}^Tz \\
\mathrm{subject\ to} & y_i(w^Tx_i) \geq 1 - z_i, \quad \forall i \in \{1,\ldots,m\} \\
& z \succcurlyeq 0
\end{array} $$

in the variables $w \in \mathbb{R}^n$, $z \in \mathbb{R}^m$, and its dual (warning: this problem is a bit different from the one in exercise 1).

Solving this problem trains a classifier vector $w$ such that, up to some errors,

$$ \begin{array}{ll}
w^Tx_i > 0 & \mathrm{when}\ y_i = 1 \\
w^Tx_i < 0 & \mathrm{when}\ y_i = -1.
\end{array} $$

This classifier can then be used to classify new points $x$ as positive or negative according to the sign of the scalar product $w^Tx$.
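As a minimal sketch of this decision rule (the names `classify`, `w_example` and `points` are illustrative and not part of the project code), the predicted label is simply the sign of $w^Tx$:

```python
import numpy as np

def classify(w, x):
    """Return +1 or -1 for each row of x according to the sign of w^T x."""
    scores = np.atleast_2d(x) @ w
    # A score of exactly 0 is assigned to the positive class: an arbitrary choice.
    return np.where(scores >= 0, 1, -1)

# Hypothetical classifier and points, for illustration only.
w_example = np.array([1.0, -2.0])
points = np.array([[3.0, 1.0], [0.0, 1.0]])
labels = classify(w_example, points)  # scores are 1 and -2
```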

In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


- Use the barrier method to solve both primal and dual problems.

In [3]:
def newton(f, gradient_and_hessian, ɛ, x0, α=0.45, β=0.8):
    """ Newton descent method.
    @param f is the function to minimize
    @param gradient_and_hessian is a function that returns the gradient and the hessian at x
    @param ɛ is the required absolute precision
    @param x0 is a strictly feasible point, i.e., f(x0) < +inf
    @param α is a parameter for the backtracking line search
    @param β is a parameter for the backtracking line search
    @return an array of [x, f(x)] pairs, one per iterate (the last pair holds the solution)
    """    
    x = x0
    values = [[copy(x), f(x)]]
    while True:
        # Direction computation.
        gradient, hessian = gradient_and_hessian(x)
        hessian_inv = inv(hessian)
        dx = -hessian_inv.dot(gradient)
        
        # Stopping criterion.
        λsquare = gradient.dot(-dx)
        if λsquare/2 <= ɛ:
            break
        
        # Backtracking line search.
        t = 1.0
        f0 = f(x)
        Δ = -λsquare
        while f(x + t*dx) > f0 + α*t*Δ:
            t *= β
        
        # Update.
        # Rebind x rather than updating in place, so the caller's x0 is not modified.
        x = x + t*dx
        values.append([copy(x), f(x)])
    
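    # Each row pairs an array with a scalar, so recent NumPy versions need an explicit object dtype.
    return array(values, dtype=object)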

In [6]:
def barrier_method(objective, objective_gh, m, constraints, constraints_gh, ɛ, x0, t0=1.0, μ=10.0, *newton_params):
    """ Barrier method for a convex problem with inequality constraints f_i(x) <= 0.
    @param objective is the function to minimize
    @param objective_gh is a function which returns the gradient and the hessian of the objective
    @param m is the number of (inequality) constraints
    @param constraints is a function which returns the array of the slacks -f_i(x), which must stay > 0
    @param constraints_gh is a function which returns the gradient and the hessian of the barrier -sum(log(-f_i(x)))
    @param ɛ is the required absolute precision
    @param x0 is a strictly feasible point, i.e., objective(x0) < +inf and constraints(x0) > 0
    @param t0 is the initial value of the barrier parameter t
    @param μ is the factor by which t is multiplied at each outer iteration
    @param *newton_params are the additional newton method parameters
    @return an array of [x, objective(x)] pairs, one per outer iteration
    """
    x = x0
    t = t0
    values = []
    
    def f(x):
        cons = constraints(x)
        if any(cons <= 0):
            return float("inf")
        return objective(x) - sum(log(cons))/t
    def gradient_and_hessian(x):
        obj_g, obj_h = objective_gh(x)
        cons_g, cons_h = constraints_gh(x)
        return obj_g + cons_g/t, obj_h + cons_h/t
    
    while True:
        x = newton(f, gradient_and_hessian, ɛ, x, *newton_params)[-1][0]
        values.append([copy(x), objective(x)])
        # Stopping criterion.
        if m/t <= ɛ:
            break
        t *= μ
    
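    # Each row pairs an array with a scalar, so recent NumPy versions need an explicit object dtype.
    return array(values, dtype=object)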
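The stopping test `m/t <= ɛ` relies on the standard bound that a point on the central path with parameter $t$ is at most $m/t$ suboptimal. The sketch below illustrates this on a toy problem chosen only for this example (minimize $x$ subject to $x \geq 1$, so $m = 1$ and $p^\star = 1$), where the centering step has a closed form and no Newton iteration is needed:

```python
import numpy as np

def central_point(t):
    """Minimizer of x - log(x - 1)/t, found by setting the derivative 1 - 1/(t*(x - 1)) to zero."""
    return 1.0 + 1.0 / t

m, t, mu, eps = 1, 1.0, 10.0, 1e-6
p_star = 1.0
gaps = []
while True:
    x_t = central_point(t)          # centering step (closed form for this toy problem)
    gaps.append(x_t - p_star)       # suboptimality, equal to m/t here
    if m / t <= eps:                # same stopping criterion as barrier_method
        break
    t *= mu
```

For this toy problem the suboptimality equals $m/t$ exactly, so `gaps` shrinks by a factor `mu` at every outer iteration.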

To solve the original problem with the barrier method, we solve the following centering problem for increasing values of $t$:

$$ \begin{array}{ll}
\mathrm{minimize} & \frac12 {||w||}_2^2 + C \mathbf{1}^Tz - \frac1t \left[ \sum_{i=1}^m \log \left( y_i(w^Tx_i) - 1 + z_i \right) + \sum_{i=1}^m \log(z_i) \right]
\end{array} $$

This problem can be rewritten as:

$$ \begin{array}{ll}
\mathrm{minimize} & \frac12{||w||}_2^2 + C\mathbf{1}^Tz - \frac1t \sum_{i=1}^{2m} \log(b_i - a_i^Tv)
\end{array} $$

where $v = (w,z) \in \mathbb{R}^{n+m}$, $(e_i)_{1\leq i\leq m}$ is the canonical basis of $\mathbb{R}^m$, and

$$ b_i = \left\{ \begin{array}{ll} -1 & \text{if}\ i \leq m \\ 0 & \text{if}\ i > m \end{array}\right. \qquad a_i = \left\{\begin{array}{ll} -(y_ix_i,e_i) & \text{if}\ i \leq m \\ -(0,e_{i-m}) & \text{if}\ i > m \end{array}\right. $$

so that $b_i - a_i^Tv = y_i(w^Tx_i) - 1 + z_i$ for $i \leq m$ and $b_i - a_i^Tv = z_{i-m}$ for $i > m$.

$$ f(v) = \frac12 {||w||}_2^2 + C \mathbf{1}^Tz - \frac1t \sum_{i=1}^{2m} \log(b_i - a_i^Tv) $$

$$ \nabla f(v) = (w,C\mathbf{1}) + \frac1t\sum_{i=1}^{2m} \frac{a_i}{b_i - a_i^Tv} $$

$$ H_f(v) = \left( \begin{array}{cc}
I_n & 0 \\
0 & 0
\end{array} \right) + \frac1t \sum_{i=1}^{2m} \frac{a_i a_i^T}{\left( b_i - a_i^Tv \right)^2} $$
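A quick way to gain confidence in these formulas is to compare them with central finite differences. The sketch below does so on a small random instance. The sizes, the seed, the step `h` and the names `A`, `f`, `grad`, `hess` are choices made for this illustration, and `A`, `b` are built so that $b_i - a_i^Tv$ equals the argument of each logarithm ($y_i(w^Tx_i) - 1 + z_i$, then $z_i$).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, C, t = 5, 2, 1.0, 3.0
x = rng.normal(size=(m, n))
y = rng.choice([-1.0, 1.0], size=m)

# Row i of A is a_i; b - A @ v gives the 2m arguments of the logarithms.
A = -np.vstack((np.hstack((y[:, None] * x, np.eye(m))),
                np.hstack((np.zeros((m, n)), np.eye(m)))))
b = np.concatenate((-np.ones(m), np.zeros(m)))

def f(v):
    w, z = v[:n], v[n:]
    return 0.5 * w @ w + C * z.sum() - np.log(b - A @ v).sum() / t

def grad(v):
    s = b - A @ v
    return np.concatenate((v[:n], C * np.ones(m))) + (A / s[:, None]).sum(axis=0) / t

def hess(v):
    s = b - A @ v
    H = np.zeros((n + m, n + m))
    H[:n, :n] = np.eye(n)
    # A^T diag(1/s^2) A is the sum of a_i a_i^T / s_i^2.
    return H + (A / s[:, None] ** 2).T @ A / t

v = np.concatenate((np.zeros(n), 2.0 * np.ones(m)))  # strictly feasible: w = 0, z_i = 2
h = 1e-6
E = np.eye(n + m)
num_grad = np.array([(f(v + h * e) - f(v - h * e)) / (2 * h) for e in E])
num_hess = np.array([(grad(v + h * e) - grad(v - h * e)) / (2 * h) for e in E])
grad_error = np.abs(num_grad - grad(v)).max()
hess_error = np.abs(num_hess - hess(v)).max()
```

Both `grad_error` and `hess_error` should be tiny compared with the entries of the gradient and Hessian; a large value would point to a sign or indexing mistake.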

In [10]:
def svm_barrier(x, y, c):
    """ Solve the SVM classifier problem.
    @param x is an array of shape (m,n): the m points in dimension n
    @param y is an array of shape m and of values in {-1,1}: the labels
    @param c is the penalty parameter C applied to the slack variables
    @return (w,z) where w is the classifier vector (array of shape n) and z the slack vector (array of shape m)
    """
    m, n = x.shape
    # w = 0 and z_i = 2 is a strictly feasible point.
    v0 = concatenate((zeros(n), 2*ones(m)))
    # Rows of a are built so that the slacks are a.dot(v) + b:
    # y_i*(w^T x_i) + z_i - 1 for the first m rows, z_i for the last m rows.
    a = concatenate((
            concatenate(((y*x.T).T, eye(m)), axis=1), 
            concatenate((zeros((m, n)), eye(m)), axis=1)), 
            axis=0)
    b = concatenate((-ones(m), zeros(m)))
    
    # Outer products a_i a_i^T, stored as an array of shape (2m, n+m, n+m).
    scalars = array([outer(a[i], a[i]) for i in range(2*m)])
    def objective(v):
        w, z = v[:n], v[n:]
        return sum(w**2)/2 + c*sum(z)
    def objective_gh(v):
        w, z = v[:n], v[n:]
        gradient = concatenate((w, c*ones(m)))
        hessian = concatenate((
            concatenate((eye(n), zeros((n, m))), axis=1),
            zeros((m, n+m)),
        ), axis=0)
        return gradient, hessian
    def constraints(v):
        return a.dot(v) + b
    def constraints_gh(v):
        # Gradient and hessian of -sum(log(slacks)); barrier_method applies the 1/t factor.
        div = constraints(v)
        gradient = -sum((a.T / div).T, axis=0)
        hessian = sum((scalars.T / (div**2)).T, axis=0)
        return gradient, hessian
    
    ɛ = 1e-10
    # The problem has 2m inequality constraints.
    values = barrier_method(objective, objective_gh, 2*m, constraints, constraints_gh, ɛ, v0)
    v = values[-1][0]
    return v[:n], v[n:]

- Test your code on random clouds of points (e.g. generate two classes of data points by picking two bivariate Gaussian samples with different moments).

In [None]:
def data_points(n):
    """ Generates a dataset composed of two bivariate gaussian samples with different means.
    @param n is the number of points in each class
    @return """
    σ = 
    μ1 = 
    μ2 = 
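The cell above leaves the distribution parameters unspecified. As one possible sketch, the function below draws the two classes from bivariate Gaussians; the function name `sample_two_gaussians`, the means, the shared identity covariance and the seed are all assumptions made for this illustration, not values taken from the project.

```python
import numpy as np

def sample_two_gaussians(n, mean_pos=(2.0, 2.0), mean_neg=(-2.0, -2.0),
                         cov=np.eye(2), seed=0):
    """Draw n points per class; returns x of shape (2n, 2) and labels y in {-1, 1}."""
    rng = np.random.default_rng(seed)
    x_pos = rng.multivariate_normal(mean_pos, cov, size=n)  # class +1
    x_neg = rng.multivariate_normal(mean_neg, cov, size=n)  # class -1
    x = np.vstack((x_pos, x_neg))
    y = np.concatenate((np.ones(n), -np.ones(n)))
    return x, y

x, y = sample_two_gaussians(50)
```

Moving the means closer together or enlarging the covariance makes the classes overlap, which is useful for studying the effect of $C$.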

- Try various values of $C > 0$ and measure out-of-sample performance (i.e. classification errors on points the algorithm has not seen).
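Measuring out-of-sample performance needs a held-out test set and an error count. A minimal sketch follows, with hypothetical helper names (`error_rate`, `train_test_split`) and an arbitrary 30% test fraction; the classifier `w` would come from a solver such as `svm_barrier` run on the training part only.

```python
import numpy as np

def error_rate(w, x, y):
    """Fraction of rows of x whose predicted sign of w^T x differs from the label y."""
    predictions = np.where(x @ w >= 0, 1, -1)
    return np.mean(predictions != y)

def train_test_split(x, y, test_fraction=0.3, seed=0):
    """Shuffle the samples and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = order[:n_test], order[n_test:]
    return x[train], y[train], x[test], y[test]

# Toy check with a hand-picked classifier (illustrative values only).
x_demo = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 0.5], [-2.0, -1.0]])
y_demo = np.array([1, 1, -1, 1])
w_demo = np.array([1.0, 0.0])
demo_error = error_rate(w_demo, x_demo, y_demo)  # one of the four points is misclassified
```

Repeating the split, training for each value of $C$ on the training part and reporting `error_rate` on the test part gives the curve asked for here.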

- Plot duality gap versus iteration number as well as a separation example in 2D (you may add a constant coefficient to the data points $x$ to allow classifiers that do not go through the origin).
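The constant coefficient mentioned above can be added by appending a column of ones to the data, so that the last entry of $w$ acts as an offset. The sketch below uses hypothetical helper names (`add_constant_feature`, `boundary_x2`) and also shows how to recover the separating line for a 2D plot. Note that with this trick the offset is penalized inside ${||w||}_2^2$ like any other coefficient.

```python
import numpy as np

def add_constant_feature(x, value=1.0):
    """Append a constant column, so that w^T (x, value) = w[:-1]^T x + value * w[-1]."""
    return np.hstack((x, value * np.ones((x.shape[0], 1))))

def boundary_x2(w, x1):
    """Second coordinate of the line w[0]*x1 + w[1]*x2 + w[2] = 0 (requires w[1] != 0)."""
    return -(w[0] * x1 + w[2]) / w[1]

x_demo = np.array([[0.0, 1.0], [2.0, 3.0]])
x_aug = add_constant_feature(x_demo)

# Hypothetical classifier on augmented 2D data: the line x1 + 2*x2 - 4 = 0.
w_demo = np.array([1.0, 2.0, -4.0])
line = boundary_x2(w_demo, np.array([0.0, 4.0]))
```

For the duality gap plot, the barrier method gives the bound (number of constraints)$/t$ after each centering step, which can be recorded at every outer iteration.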

(Optional) Use CVX (MATLAB or OCTAVE) or CVXOPT (python), as well as LIBSVM and/or LIBLINEAR to check your results and compare performance.

(Optional) Use the coordinate descent method to solve the dual. Plot duality gap versus iteration number and compare performance with the barrier method for various problem sizes (vary the number of samples and record the total CPU time required by each code to reduce the gap by a factor ${10}^{-3}$).
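One possible starting point is sketched below, assuming the standard Lagrange dual of the problem above: maximize $\mathbf{1}^T\alpha - \frac12 {||\sum_i \alpha_i y_i x_i||}_2^2$ subject to $0 \leq \alpha_i \leq C$ (there is no equality constraint because the primal has no offset term). Each coordinate is minimized exactly and clipped to $[0, C]$. The function name, the fixed number of sweeps and the demo data are choices made for this illustration.

```python
import numpy as np

def dual_coordinate_descent(x, y, c, n_sweeps=50):
    """Cyclic coordinate descent on the box-constrained dual; returns w, alpha and the gap per sweep."""
    m, n = x.shape
    alpha = np.zeros(m)
    w = np.zeros(n)                      # maintained equal to sum_i alpha_i y_i x_i
    q_diag = (x ** 2).sum(axis=1)        # Q_ii = ||x_i||^2
    gaps = []
    for _ in range(n_sweeps):
        for i in range(m):
            if q_diag[i] == 0.0:
                continue                 # a zero point does not affect w
            g = y[i] * (x[i] @ w) - 1.0  # derivative of the negated dual w.r.t. alpha_i
            new_alpha = min(max(alpha[i] - g / q_diag[i], 0.0), c)
            w += (new_alpha - alpha[i]) * y[i] * x[i]
            alpha[i] = new_alpha
        # Primal value at w with the best slacks z_i = max(0, 1 - y_i w^T x_i), and dual value at alpha.
        primal = 0.5 * w @ w + c * np.maximum(0.0, 1.0 - y * (x @ w)).sum()
        dual = alpha.sum() - 0.5 * w @ w
        gaps.append(primal - dual)
    return w, alpha, gaps

# Illustrative data: two Gaussian clouds around (3, 3) and (-3, -3).
rng = np.random.default_rng(0)
x_demo = np.vstack((rng.normal(3.0, 1.0, size=(20, 2)),
                    rng.normal(-3.0, 1.0, size=(20, 2))))
y_demo = np.concatenate((np.ones(20), -np.ones(20)))
w_cd, alpha_cd, gaps_cd = dual_coordinate_descent(x_demo, y_demo, c=1.0)
```

The list `gaps_cd` can be plotted on a semilog scale against the sweep index; by weak duality every entry is nonnegative up to rounding.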

(Optional) Use the logarithmic barrier code you wrote in HW1 to solve a small random instance of the primal problem using the ACCPM algorithm. Plot an upper bound on the distance to optimality in semilog scale and try various constraint dropping strategies. Compare convergence with the two other methods.