# Support Vector Machines
## Course recap
In this lab, we implement the **Support Vector Machines** (SVM) algorithm.

Consider a training set $ D = \left\{ \left(x^{(i)}, y^{(i)}\right), x^{(i)} \in \mathcal{X}, y^{(i)} \in \mathcal{Y}, i \in \{1, \dots, n \}  \right\}$, where $\mathcal{Y} = \{ 1, \dots, k\}$. Recall from lecture 7 that the SVM minimizes the following cost function $J$, where $\theta_j$ is the model (weight vector) of class $j$ and $\Delta$ is the margin:
$$
\begin{split}
J(\theta_1, \theta_2, \dots, \theta_k) 
	&= \sum_{i = 1}^n L_i \\
	&= \sum_{i = 1}^n \sum_{j \neq y^{(i)}} \max(0, \theta_j^Tx^{(i)} - \theta_{y^{(i)}}^T x^{(i)} + \Delta)
\end{split}
$$
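
To make the formula concrete, here is a minimal sketch that evaluates the loss $L_i$ of one sample from a list of class scores. The helper name `hinge_loss_from_scores` and the example scores are illustrative assumptions, not part of the lab.

```python
def hinge_loss_from_scores(scores, y, delta):
    """Multiclass hinge loss of one sample.

    scores[j - 1] is the score theta_j^T x of class j; y is the true label in {1, ..., k}.
    (Hypothetical helper, named here for illustration only.)
    """
    true_score = scores[y - 1]
    loss = 0.0
    for j, s in enumerate(scores, start=1):
        if j != y:  # the sum runs over the wrong classes only
            loss += max(0.0, s - true_score + delta)
    return loss


# Three classes, true label 1, margin delta = 1:
# class 2 violates the margin (2.5 - 3.0 + 1.0 = 0.5), class 3 does not (-1.0 - 3.0 + 1.0 < 0).
example_loss = hinge_loss_from_scores([3.0, 2.5, -1.0], y=1, delta=1.0)
print(example_loss)  # 0.5
```

Only the classes whose score comes within $\Delta$ of the true class score contribute to the loss.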

## Defining the training set
Let us define the variables `X` and `Y`, which contain the feature vectors and the labels of the training set. As before, each feature vector starts with a constant 1 for the intercept.

In [117]:
k_classes = 2
X = [[1., 50.], [1., 76.], [1., 26.], [1., 102.]]
Y = [1, 2, 1, 1]

In this simple example, the dimensionality is $d = 1$ (which means 2 features: don't forget the intercept!) and the number of samples is $n = 4$.
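
The intercept column can also be added programmatically. A minimal sketch, assuming the raw one-dimensional measurements are available as a plain list (`raw_features` and `add_intercept` are hypothetical names introduced for this example):

```python
def add_intercept(raw_features):
    """Prepend the constant 1.0 to each one-dimensional sample."""
    return [[1.0, float(value)] for value in raw_features]


raw_features = [50, 76, 26, 102]  # the values stored in `X`, without the intercept
X_with_intercept = add_intercept(raw_features)
print(X_with_intercept)  # [[1.0, 50.0], [1.0, 76.0], [1.0, 26.0], [1.0, 102.0]]
```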

## Prediction function
**Exercise**: Define a function `score` that takes *a feature vector* $x$ and *a model* $\theta$ (the weight vector of one class) as parameters and returns the score:
$$ h(x) = \theta^T x = \sum_{j = 0}^d \theta_j x_j$$

In [118]:
def score(x, theta):
    d = len(x)
    thetaTx = 0
    for idx in range(d): # accumulate theta_j * x_j over all the features, intercept included
        thetaTx += x[idx] * theta[idx]
    return thetaTx

## Defining the cost function
### Cost function on a single sample
**Exercise**: Define a function `cost_function` that takes as parameters *the feature vector* $x$ and *the actual label* $y$ of a single sample, *the list of per-class models* `thetas`, and *the margin* $\Delta$, and returns the value of the cost function for this sample. Recall from lectures 1 and 2 that it is given by:
$$ L_i = \sum_{j \neq y^{(i)}} \max(0, \theta_j^Tx^{(i)} - \theta_{y^{(i)}}^T x^{(i)} + \Delta) $$

In [119]:
def cost_function(x, y, thetas, delta):
    k_classes = len(thetas)
    thetayTx = score(x, thetas[y - 1]) # score of the actual class (labels start at 1, indices at 0)
    loss = 0
    for j in range(k_classes): # iterate over the classes, not over the features
        if j != y - 1: # skip the actual class
            thetajTx = score(x, thetas[j])
            loss += max(0, thetajTx - thetayTx + delta)
    return loss

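Now that we can compute the loss of a single training sample, we can compute the total cost.

**Exercise**: Define a function `cost_function_total` that takes the training set `X`, `Y`, the models `thetas`, and the margin `delta`, and returns the total cost $J$, i.e. the sum of the losses over all the samples.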

In [120]:
def cost_function_total(X, Y, thetas, delta):
    cost = 0 # initialize the cost with 0
    n = len(Y)
    for i in range(n): # iterate over the training set
        x = X[i] # get the ith feature vector
        y = Y[i] # get the ith label
        cost += cost_function(x, y, thetas, delta) # add the cost of the current sample to the total cost
    return cost

In [121]:
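def initialize_thetas(X, k_classes):
    d = len(X[0]) # number of features, intercept included
    # build one independent list per class: [theta] * k_classes would repeat the same list object
    return [[0.0] * d for _ in range(k_classes)]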

In [122]:
thetas_0 = initialize_thetas(X, 2)
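
When building one vector per class, as `initialize_thetas` does, one Python pitfall is worth keeping in mind: multiplying a list of lists repeats references to the same inner list, so updating one class would silently update all of them. The sketch below, with illustrative variable names, contrasts this with a list comprehension.

```python
d, k = 2, 2

aliased = [[0.0] * d] * k                    # k references to ONE inner list
independent = [[0.0] * d for _ in range(k)]  # k distinct inner lists

aliased[0][1] = 5.0
independent[0][1] = 5.0

print(aliased)                   # [[0.0, 5.0], [0.0, 5.0]]
print(independent)               # [[0.0, 5.0], [0.0, 0.0]]
print(aliased[0] is aliased[1])  # True
```

The same remark applies to any list of per-class gradients that is later filled in place.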

In [123]:
def predict(x, thetas):
    k_classes = len(thetas)
    prediction = 0
    highest_score = score(x, thetas[prediction]) # initialize with the first class
    for idx_class in range(1, k_classes):
        class_score = score(x, thetas[idx_class])
        if class_score > highest_score:
            highest_score = class_score # remember the best score seen so far
            prediction = idx_class
    return prediction + 1 # labels start at 1

In [124]:
predict(X[0], thetas_0)

1

## Gradient
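
The code in this section relies on the (sub)gradient of the per-sample loss $L_i$ with respect to each $\theta_j$, which is not restated in this notebook. Differentiating the expression of $L_i$ term by term, with $\mathbb{1}[\cdot]$ denoting the indicator function, should give the two formulas below: one for the actual class and one for every other class.

$$
\begin{split}
\nabla_{\theta_{y^{(i)}}} L_i
	&= - \left( \sum_{j \neq y^{(i)}} \mathbb{1}\left[\theta_j^Tx^{(i)} - \theta_{y^{(i)}}^T x^{(i)} + \Delta > 0\right] \right) x^{(i)} \\
\nabla_{\theta_j} L_i
	&= \mathbb{1}\left[\theta_j^Tx^{(i)} - \theta_{y^{(i)}}^T x^{(i)} + \Delta > 0\right] x^{(i)} \quad \text{for } j \neq y^{(i)}
\end{split}
$$

Where a margin term is exactly zero, the loss is not differentiable; the strict inequality simply selects one valid subgradient.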

In [125]:
def sum_vectors(x1, x2):
    d = len(x1)
    sum_vector = list(x1) # copy x1 so that the caller's list is not modified in place
    for idx in range(d):
        sum_vector[idx] += x2[idx]
    return sum_vector

In [126]:
def gradients(x, y, thetas, delta):
    d = len(x)
    k_classes = len(thetas)
    # initialize a list of k_classes independent gradients with zeros everywhere
    grads = [[0.0] * d for _ in range(k_classes)]
    true_score = score(x, thetas[y - 1]) # score of the actual class (labels start at 1)
    for idx_class in range(k_classes): # iterate over all the classes to compute the gradient for each class
        # there are 2 formulas: one for the true class (given by 'y') and another one for the other classes
        if idx_class + 1 == y: # if idx_class is the actual class
            p = 0
            for j in range(k_classes):
                if j + 1 != y: # count the other classes that violate the margin
                    if score(x, thetas[j]) - true_score + delta > 0:
                        p += 1
            for idx in range(d):
                grads[idx_class][idx] = - p * x[idx]
        else: # if idx_class is not the actual class
            if score(x, thetas[idx_class]) - true_score + delta > 0:
                for idx in range(d):
                    grads[idx_class][idx] = x[idx]
            # we do not need an else statement here because the gradient would be equal to 0 in this case,
            # and the gradient has been initialized with zeros
    return grads

In [127]:
print(gradients(X[0], Y[0], thetas_0, 4.0))

[[-1.0, -50.0], [1.0, 50.0]]

In [128]:
def gradient_total(X, Y, thetas, delta):
    n = len(Y) # number of training samples
    d = len(X[0]) # number of features, intercept included
    k_classes = len(thetas)
    grads_sum = [[0.0] * d for _ in range(k_classes)] # one independent accumulator per class
    for i in range(n):
        x = X[i]
        y = Y[i]
        grads = gradients(x, y, thetas, delta)
        for j in range(k_classes):
            grads_sum[j] = sum_vectors(grads[j], grads_sum[j])
    return grads_sum

In [129]:
gradient_total(X, Y, thetas_0, 4.0)

[[-2.0, -102.0], [2.0, 102.0]]
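
A common way to validate a hand-written gradient is to compare it with central finite differences of the loss. The following self-contained sketch does this for the first training sample. It re-implements the per-sample loss and gradient compactly under hypothetical names (`sample_loss`, `sample_gradient`, `numerical_gradient`) rather than reusing the notebook's functions, and the step `eps` is an arbitrary choice.

```python
def sample_loss(x, y, thetas, delta):
    """Multiclass hinge loss of one sample; labels y start at 1."""
    scores = [sum(t * v for t, v in zip(theta, x)) for theta in thetas]
    return sum(max(0.0, s - scores[y - 1] + delta)
               for j, s in enumerate(scores) if j != y - 1)


def sample_gradient(x, y, thetas, delta):
    """Analytic subgradient of sample_loss with respect to every theta_j."""
    scores = [sum(t * v for t, v in zip(theta, x)) for theta in thetas]
    grads = [[0.0] * len(x) for _ in thetas]
    for j, s in enumerate(scores):
        if j != y - 1 and s - scores[y - 1] + delta > 0:
            for idx, v in enumerate(x):
                grads[j][idx] += v      # wrong class that violates the margin
                grads[y - 1][idx] -= v  # the true class gets the opposite sign
    return grads


def numerical_gradient(x, y, thetas, delta, eps=1e-6):
    """Central finite differences, one coordinate of one theta_j at a time."""
    grads = [[0.0] * len(x) for _ in thetas]
    for j in range(len(thetas)):
        for idx in range(len(x)):
            plus = [list(t) for t in thetas]
            minus = [list(t) for t in thetas]
            plus[j][idx] += eps
            minus[j][idx] -= eps
            grads[j][idx] = (sample_loss(x, y, plus, delta)
                             - sample_loss(x, y, minus, delta)) / (2 * eps)
    return grads


x, y, delta = [1.0, 50.0], 1, 4.0
thetas = [[0.0, 0.0], [0.0, 0.0]]
analytic = sample_gradient(x, y, thetas, delta)
numeric = numerical_gradient(x, y, thetas, delta)
max_gap = max(abs(a - b)
              for row_a, row_b in zip(analytic, numeric)
              for a, b in zip(row_a, row_b))
print(analytic)        # [[-1.0, -50.0], [1.0, 50.0]]
print(max_gap < 1e-4)  # True
```

The check is only meaningful away from the kinks of the hinge loss, which is the case here since the margin term equals 4 at zero-initialized models.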

In [130]:
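def axpb(a, x, b):
    d = len(x)
    sum_vector = list(b) # copy b so that the caller's list is not modified in place
    for idx in range(d):
        sum_vector[idx] += a * x[idx] # computes a * x + b coordinate by coordinate
    return sum_vector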

In [133]:
def gradient_descent(X, Y, delta, learning_rate):
    k_classes = len(set(Y))
    thetas = initialize_thetas(X, k_classes)
    for i_iter in range(5):
        grads = gradient_total(X, Y, thetas, delta)
        for j in range(k_classes):
            thetas[j] = axpb(-learning_rate, grads[j], thetas[j]) # theta_j <- theta_j - learning_rate * gradient_j
        cost = cost_function_total(X, Y, thetas, delta)
        print("iteration " + str(i_iter) + ", cost = " + str(cost))
    return thetas

In [134]:
gradient_descent(X, Y, 5.0, 0.1)
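
The features are not rescaled here (the second coordinate ranges from 26 to 102), so the choice of learning rate matters a lot. The following self-contained sketch compares the total cost after a single update from zero-initialized models for two learning rates. It re-implements the cost and gradient compactly under hypothetical names (`total_cost`, `total_gradient`, `cost_after_one_step`), and the value 0.0001 is an arbitrary choice for illustration.

```python
X = [[1., 50.], [1., 76.], [1., 26.], [1., 102.]]
Y = [1, 2, 1, 1]
DELTA = 5.0


def scores_of(x, thetas):
    """Scores theta_j^T x of every class for one sample."""
    return [sum(t * v for t, v in zip(theta, x)) for theta in thetas]


def total_cost(thetas):
    """Sum of the multiclass hinge losses over the training set."""
    cost = 0.0
    for x, y in zip(X, Y):
        s = scores_of(x, thetas)
        cost += sum(max(0.0, s_j - s[y - 1] + DELTA)
                    for j, s_j in enumerate(s) if j != y - 1)
    return cost


def total_gradient(thetas):
    """Sum of the per-sample subgradients, one vector per class."""
    grads = [[0.0] * len(X[0]) for _ in thetas]
    for x, y in zip(X, Y):
        s = scores_of(x, thetas)
        for j, s_j in enumerate(s):
            if j != y - 1 and s_j - s[y - 1] + DELTA > 0:
                for idx, v in enumerate(x):
                    grads[j][idx] += v
                    grads[y - 1][idx] -= v
    return grads


def cost_after_one_step(learning_rate):
    """Total cost after one gradient step starting from all-zero models."""
    thetas = [[0.0, 0.0], [0.0, 0.0]]
    grads = total_gradient(thetas)
    stepped = [[t - learning_rate * g for t, g in zip(theta, grad)]
               for theta, grad in zip(thetas, grads)]
    return total_cost(stepped)


initial_cost = total_cost([[0.0, 0.0], [0.0, 0.0]])
print(initial_cost)                                # 20.0
print(cost_after_one_step(0.1) > initial_cost)     # True
print(cost_after_one_step(0.0001) < initial_cost)  # True
```

With this data, one step at 0.1 overshoots and raises the cost, whereas the smaller step lowers it. Rescaling the second feature would be another way to make larger learning rates usable.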