# CS229 Problem Set 2
## Problem 2: Spam Classification

### Problem Definition

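Train an SVM on the spam data using a Gaussian kernel, update the parameters with stochastic gradient descent (SGD), and report the classification error on the test set (`data/MATRIX.TEST`) for each training-set size.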

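**Objective (regularized hinge loss):** here $K^{(i)}$ is the $i$-th row of the training kernel matrix $K$, $[z]_{+}=\max(0,z)$, and $\theta=(\alpha,b)$.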
$$
J(\theta)=\frac{1}{m} \sum_{i=1}^{m}\left\{\left[1-y^{(i)} b-y^{(i)} K^{(i)} \alpha\right]_{+}+\frac{m \lambda}{2} \alpha^{T} K \alpha\right\}
$$
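To make the objective concrete, here is a minimal NumPy sketch that evaluates $J(\theta)$ exactly as written above. It assumes $K$ is a dense $m\times m$ array, labels are in $\{-1,+1\}$, and $\alpha$ is a 1-D vector; `svm_objective` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def svm_objective(K, y, alpha, b, lam):
    """J(theta) = (1/m) sum_i { [1 - y_i b - y_i K^(i) alpha]_+ + (m lam / 2) alpha^T K alpha }.

    K: (m, m) kernel matrix, y: (m,) labels in {-1, +1}, alpha: (m,) weights.
    """
    m = K.shape[0]
    hinge = np.maximum(0.0, 1.0 - y * b - y * (K @ alpha))  # per-sample hinge terms
    reg = 0.5 * m * lam * (alpha @ K @ alpha)               # same regularizer inside every term
    return np.mean(hinge + reg)                              # average over the m terms
```

Because the regularizer is identical in every term, averaging simply adds $\frac{m\lambda}{2}\alpha^T K\alpha$ once to the mean hinge loss. Evaluating it every few thousand steps is one way to monitor SGD progress.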
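**Stochastic (sub)gradient:** each SGD step samples one index $i$ and differentiates only the $i$-th term $J_i(\theta)$ inside the braces:
$$
\begin{aligned}
\nabla_{\alpha} J_i(\theta) &= \nabla_{\alpha}\left[1-y^{(i)} b-y^{(i)} K^{(i)} \alpha\right]_{+}+\nabla_{\alpha} \frac{m \lambda}{2} \alpha^{T} K \alpha \\
&= \begin{cases} -y^{(i)} K^{(i)T} & \text{if } y^{(i)}\left(K^{(i)} \alpha+b\right)<1 \\ 0 & \text{otherwise} \end{cases} \; + \; m \lambda K \alpha
\end{aligned}
$$
**Parameter update:** at step $t$ the code moves against this gradient with step size $1/\sqrt{t}$, $\alpha_{t}=\alpha_{t-1}-\frac{1}{\sqrt{t}}\nabla_{\alpha} J_{i_t}(\theta)$, runs $T=40m$ steps, and returns the averaged iterate $\bar{\alpha}=\frac{1}{T}\sum_{t=1}^{T}\alpha_t$.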
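A minimal sketch of the per-sample subgradient, using the exact regularizer term $m\lambda K\alpha$ (an $O(m^2)$ product per call). `sample_subgradient` and `sample_term` are hypothetical names; the sketch compares the analytic gradient with a central finite difference at a random point, which lies away from the hinge kink with overwhelming probability.

```python
import numpy as np

def sample_subgradient(K, y, alpha, b, lam, i):
    """Subgradient w.r.t. alpha of the i-th term J_i of the objective.

    K: (m, m) symmetric kernel matrix, y: (m,) labels in {-1, +1}, alpha: (m,).
    Uses the exact regularizer gradient m * lam * K @ alpha.
    """
    m = K.shape[0]
    margin = y[i] * (K[i] @ alpha + b)
    # hinge part: -y_i K^(i)T inside the margin, 0 outside
    hinge_grad = -y[i] * K[i] if margin < 1 else np.zeros(m)
    return hinge_grad + m * lam * (K @ alpha)

def sample_term(K, y, alpha, b, lam, i):
    """The i-th term J_i itself, used for the finite-difference check."""
    m = K.shape[0]
    hinge = max(0.0, 1 - y[i] * b - y[i] * (K[i] @ alpha))
    return hinge + 0.5 * m * lam * (alpha @ K @ alpha)

# Sanity check on a small random Gaussian kernel (tau^2 = 1 here)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
sq = np.sum(X ** 2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / 2.0)
y = np.array([1, -1, 1, 1, -1])
alpha = 0.1 * rng.normal(size=5)
b, lam, i, eps = 0.2, 0.3, 2, 1e-6

numeric = np.array([
    (sample_term(K, y, alpha + eps * e, b, lam, i)
     - sample_term(K, y, alpha - eps * e, b, lam, i)) / (2 * eps)
    for e in np.eye(5)
])
print(np.allclose(sample_subgradient(K, y, alpha, b, lam, i), numeric, atol=1e-5))
```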

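**SVM testing:** for a test sample $x^{(j)}$, let $K$ be the test-set kernel matrix, whose entry $K(j,l)$ is the Gaussian kernel between $x^{(j)}$ and the $l$-th training sample. Predict $y^{(j)}=1$ when $K(j,:)\,\alpha>0$ and $y^{(j)}=0$ otherwise. The code below maps the labels $\{0,1\}$ to $\{-1,+1\}$ and drops the intercept, i.e. it uses $b=0$.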
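The prediction rule is one matrix-vector product followed by a threshold. A minimal sketch, assuming `K_test` is a dense $(m_{\text{test}}\times m_{\text{train}})$ NumPy array and $b=0$; `svm_predict` is a hypothetical helper name.

```python
import numpy as np

def svm_predict(K_test, alpha):
    """Predict {0, 1} labels: 1 where K_test[j, :] @ alpha > 0, else 0.

    K_test: (m_test, m_train) kernel between test and training samples.
    alpha: (m_train,) weights learned on the training set.
    """
    return (K_test @ alpha > 0).astype(int)

K_test = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(svm_predict(K_test, np.array([1.0, -2.0])))  # decision values 1, -2, -0.5
```

A decision value of exactly 0 is assigned to class 0, matching the "otherwise" branch of the rule.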

```python
from read_Matrix import read_matrix   # local helper module (not shown) that loads the MATRIX.* files
import numpy as np
import time
```

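```python
def svm_train(filename):
    _, _, Y_label, matrix = read_matrix(filename)
    Y_label = np.array(Y_label) * 2 - 1        # map {0, 1} labels to {-1, +1}

    m = matrix.shape[0]
    matrix = 1. * (matrix > 0)                 # binarize: 1 if the word occurs

    # Gaussian kernel K_ij = exp(-||x_i - x_j||^2 / (2 tau^2)), using
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j and broadcasting
    tau = 8
    square_matrix = np.sum(matrix ** 2, axis=1).reshape(m, 1)
    gram = matrix.dot(matrix.T)
    kernel_matrix = np.exp(-(square_matrix + square_matrix.T - 2 * gram) / (2 * tau ** 2))
    m_lambda = 1 / (64 * m)                    # regularization strength lambda

    num_iters = 40 * m
    alpha = np.zeros((m, 1))
    avg_alpha = np.zeros((m, 1))
    for t in range(1, num_iters + 1):
        ind = np.random.randint(m)             # pick one training sample uniformly at random
        k_col = kernel_matrix[:, ind].reshape((m, 1))   # K is symmetric: column ind = row ind
        margin = Y_label[ind] * (kernel_matrix[ind, :] @ alpha).item()
        hinge_active = float(margin < 1)
        # hinge subgradient plus a one-column estimate of the regularizer gradient:
        # m * K[:, ind] * alpha[ind] is an unbiased estimate of K @ alpha
        grad_alpha = -hinge_active * Y_label[ind] * k_col + m * m_lambda * k_col * alpha[ind, 0]
        alpha = alpha - grad_alpha / np.sqrt(t)   # step size 1/sqrt(t)
        avg_alpha += alpha
    avg_alpha /= num_iters                     # return the averaged iterate
    return avg_alpha, square_matrix, matrix
```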
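Computing the regularizer gradient $K\alpha$ exactly costs $O(m^2)$ per SGD step. One way to keep each step at $O(m)$ is to replace it with a single column: with `ind` drawn uniformly from $\{0,\dots,m-1\}$, $m\,K_{:,\text{ind}}\,\alpha_{\text{ind}}$ has expectation $\sum_j K_{:,j}\alpha_j = K\alpha$. The sketch below checks this unbiasedness by averaging over every possible `ind`; `one_column_estimate` is a hypothetical name, and the random symmetric matrix stands in for a real kernel.

```python
import numpy as np

def one_column_estimate(K, alpha, ind):
    """m * K[:, ind] * alpha[ind]: a one-sample estimate of K @ alpha."""
    m = K.shape[0]
    return m * K[:, ind] * alpha[ind]

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
K = A @ A.T                      # any symmetric matrix works for this check
alpha = rng.normal(size=4)

# Averaging over every possible ind (the uniform distribution used by SGD)
# recovers K @ alpha exactly, so the estimate is unbiased.
expected = np.mean([one_column_estimate(K, alpha, j) for j in range(4)], axis=0)
print(np.allclose(expected, K @ alpha))   # True
```

The price of the cheaper step is extra variance in the gradient, which the decaying $1/\sqrt{t}$ step size and iterate averaging help to damp.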

```python
def svm_test(alpha, square_train, matrix_train):
    _, _, Y_test, matrix_test = read_matrix("data/MATRIX.TEST")
    Y_test = np.array(Y_test) * 2 - 1          # map {0, 1} labels to {-1, +1}
    matrix_test = 1. * (matrix_test > 0)       # binarize, as in training
    m_test = matrix_test.shape[0]

    # Gaussian kernel between test rows and training rows (tau must match training)
    tau = 8
    square_test = np.sum(matrix_test ** 2, axis=1).reshape((m_test, 1))
    gram_test = matrix_test.dot(matrix_train.T)
    kernel_test = np.exp(-(square_test + square_train.T - 2 * gram_test) / (2 * tau ** 2))
    decision = kernel_test.dot(alpha)          # K(j, :) alpha for each test sample j

    # a sample is counted as misclassified when decision * label <= 0
    # (a decision value of exactly 0 also counts as an error)
    margins = decision * Y_test.reshape((m_test, 1))
    return np.mean(margins <= 0)
```
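Both `svm_train` and `svm_test` build the Gaussian kernel from the expansion $\|x-z\|^2=\|x\|^2+\|z\|^2-2\,x\cdot z$. A minimal standalone sketch of the same computation, assuming dense NumPy arrays with one sample per row; `gaussian_kernel` is a hypothetical helper name, and the clip at zero is an added guard against tiny negative distances from floating-point rounding, which the functions above do not include.

```python
import numpy as np

def gaussian_kernel(A, B, tau=8.0):
    """K[j, l] = exp(-||A[j] - B[l]||^2 / (2 tau^2)) for all row pairs of A and B."""
    sq_a = np.sum(A ** 2, axis=1)[:, None]     # (m_a, 1)
    sq_b = np.sum(B ** 2, axis=1)[None, :]     # (1, m_b)
    sq_dist = sq_a + sq_b - 2 * A @ B.T        # broadcasting gives an (m_a, m_b) matrix
    sq_dist = np.maximum(sq_dist, 0.0)         # clip rounding noise below zero
    return np.exp(-sq_dist / (2 * tau ** 2))
```

With `A` the binarized test matrix and `B` the binarized training matrix, `gaussian_kernel(A, B) @ alpha` gives the decision values thresholded in `svm_test`.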

```python
def main():
    train_sizes = [50, 100, 200, 400, 800, 1400]
    for size in train_sizes:
        start = time.time()
        avg_alpha, square_train, matrix_train = svm_train(f"data/MATRIX.TRAIN.{size}")
        error = svm_test(avg_alpha, square_train, matrix_train) * 100   # test error in percent
        elapsed = time.time() - start
        print(f"Size:{size}  error:{error}  times:{elapsed}")

main()
```

Output (test error in percent, wall-clock time in seconds):

```text
Size:50  error:1.875  times:0.2608492374420166
Size:100  error:0.5  times:0.3048243522644043
Size:200  error:1.875  times:0.4667317867279053
Size:400  error:0.25  times:1.9708657264709473
Size:800  error:0.0  times:2.3656392097473145
Size:1400  error:0.0  times:6.1504597663879395
```
