# <center>Logistic Regression (Binary Classification)</center>

## Model
Let:

$$s(x)=\frac{1}{1+e^{-w^T x}}$$

$$P(Y=1|x)=s(x)$$

$$P(Y=0|x)=1-s(x)$$

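$$
y=
    \begin{cases}
        1, \quad & \text{if P(Y=1|x)>P(Y=0|x)}\\
        0, \quad & \text{if P(Y=1|x) $\leq$ P(Y=0|x)}\\
    \end{cases}
$$

Because $P(Y=0|x)=1-P(Y=1|x)$, this is the same as thresholding $P(Y=1|x)$ at 0.5, and since $s(x)>0.5$ exactly when $w^Tx>0$, it is also the same as thresholding $w^Tx$ at 0: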

$$
y=
    \begin{cases}
        1, \quad & \text{if P(Y=1|x)>0.5}\\
        0, \quad & \text{if P(Y=1|x) $\leq$ 0.5}\\
    \end{cases}
$$

$$
y=
    \begin{cases}
        1, \quad & \text{if $w^Tx$ > 0}\\
        0, \quad & \text{if $w^Tx$ $\leq$ 0}\\
    \end{cases}
$$
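
The equivalence of these decision rules can be checked numerically. The following is a minimal sketch; the weight vector `w` and the sample matrix `X` are made-up values for illustration and do not come from the text.

```python
import numpy as np

def sigmoid(z):
    # s(x) evaluated at z = w^T x
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and samples, chosen only for illustration.
w = np.array([0.5, -1.0, 2.0])
X = np.array([[1.0, 2.0, 0.5],
              [1.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 0.5, 0.0]])   # last row gives w^T x = 0 exactly

z = X @ w
p1 = sigmoid(z)      # P(Y=1|x)
p0 = 1.0 - p1        # P(Y=0|x)

rule_a = (p1 > p0).astype(int)    # compare the two class probabilities
rule_b = (p1 > 0.5).astype(int)   # threshold P(Y=1|x) at 0.5
rule_c = (z > 0).astype(int)      # threshold w^T x at 0

print(rule_a, rule_b, rule_c)
```

All three rules assign the same labels, including the boundary case $w^Tx=0$, which falls into class 0.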

## Strategy
Treat each sample as an event. Writing $p=P(Y=1|x)=s(x)$, the probability of that event is:

$$
P(y|x)=
    \begin{cases}
        p, \quad & \text{if y = 1}\\
        1-p, \quad & \text{if y = 0}\\
    \end{cases}
$$

This piecewise form is awkward to compute with, so merge the two cases into a single expression:

$$P(y_i|x_i)=p^{y_i}(1-p)^{1-y_i}$$
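
As a quick sanity check, the merged expression can be compared against the piecewise definition. The function names below are hypothetical, introduced only for this sketch.

```python
def prob_piecewise(y, p):
    # p if y = 1, 1 - p if y = 0
    return p if y == 1 else 1 - p

def prob_merged(y, p):
    # p^y * (1 - p)^(1 - y): one of the two exponents is always zero
    return p ** y * (1 - p) ** (1 - y)

for p in (0.1, 0.5, 0.9):
    for y in (0, 1):
        print(y, p, prob_piecewise(y, p), prob_merged(y, p))
```

For $y\in\{0,1\}$ exactly one factor survives, so the two forms agree.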

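Likelihood function (the joint probability of the whole sample set, assuming the samples are independent). Here $p$ in the $i$-th factor stands for $s(x_i)$:

$$P_{\text{total}}=\prod_{i=1}^N p^{y_i} (1-p)^{1-y_i}$$

Log-likelihood function:

$$\begin{align}
J(w)&=\ln (P_{\text{total}})\\
&=\sum_{i=1}^N \ln (p^{y_i} (1-p)^{1-y_i})\\
&=\sum_{i=1}^N (y_i \ln (p) + (1-y_i) \ln (1-p))\\
\end{align}$$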

Maximum likelihood estimation:
$$w=\arg \max_w J(w)$$

Gradient descent finds a minimum, however, so rewrite the problem as:
$$w=\arg \min_w (-J(w))$$

Let

$$L(w)=-J(w) = \sum_{i=1}^N (-y_i \ln (p) - (1-y_i) \ln (1-p))$$

$$w=\arg \min_w L(w)$$
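
The loss $L(w)$ translates almost directly into NumPy. This is a minimal sketch, assuming `X` already carries a leading column of ones for the bias; the function name and the toy data are hypothetical.

```python
import numpy as np

def negative_log_likelihood(w, X, y):
    # p_i = s(x_i) = 1 / (1 + exp(-w^T x_i))
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    # L(w) = sum_i ( -y_i ln p_i - (1 - y_i) ln(1 - p_i) )
    return float(np.sum(-y * np.log(p) - (1 - y) * np.log(1 - p)))

# Hypothetical toy data: first column is the constant 1 (bias).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0, 0, 1, 1])

loss_zero = negative_log_likelihood(np.zeros(2), X, y)            # every p_i = 0.5
loss_good = negative_log_likelihood(np.array([-3.0, 2.0]), X, y)  # boundary at x = 1.5

print(loss_zero, loss_good)
```

With $w=0$ every $p_i$ equals 0.5, so the loss is $N\ln 2$; a weight vector that separates the two classes gives a smaller value.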

## Algorithm
$$\frac{\partial (w^T x)}{\partial w} = x$$

$$p=\frac{1}{1+e^{-w^T x}}$$

$$\begin{align}
\frac{\partial p}{\partial w}&= (\frac{1}{1+e^{-w^T x}})'\\
&= ((1+e^{-w^T x})^{-1})'\\
&= -1 \cdot (1+e^{-w^T x})^{-2} \cdot e^{-w^T x} \cdot (-x)  \\
&= \frac{1}{1+e^{-w^T x}} \cdot \frac{e^{-w^T x}}{1+e^{-w^T x}} \cdot x\\
&= p(1-p)x\\
\end{align}$$

It also follows that

$$\frac{\partial (1-p)}{\partial w} = - \frac{\partial p}{\partial w} = - p (1-p) x$$

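Differentiating $L(w)$ term by term, with $p=s(x_i)$ in the $i$-th term:

$$\begin{align}
\frac{\partial L(w)}{\partial w} &= \sum_{i=1}^N \left(-y_i \frac{\partial \ln (p)}{\partial w} - (1-y_i) \frac{\partial \ln (1-p)}{\partial w}\right)\\
&= \sum_{i=1}^N \left(-y_i \frac{1}{p} \frac{\partial p}{\partial w} - (1-y_i) \frac{1}{1-p} \frac{\partial (1-p)}{\partial w}\right)\\
&= \sum_{i=1}^N \left(-y_i \frac{1}{p} p (1-p) x_i - (1-y_i) \frac{1}{1-p} (- p (1-p) x_i)\right)\\
&= \sum_{i=1}^N (-y_i (1-p) x_i + (1-y_i) p x_i)\\
&= \sum_{i=1}^N (p - y_i) x_i\\
&= \sum_{i=1}^N (\frac{1}{1+e^{-w^T x_i}} - y_i) x_i\\
\end{align}$$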
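
A finite-difference check is a simple way to confirm the closed-form gradient $\sum_i (p - y_i)x_i$. The sketch below uses randomly generated data with an arbitrary fixed seed; all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    p = sigmoid(X @ w)
    return np.sum(-y * np.log(p) - (1 - y) * np.log(1 - p))

def analytic_grad(w, X, y):
    # sum_i (p_i - y_i) x_i, written as a matrix product
    return X.T @ (sigmoid(X @ w) - y)

def numeric_grad(w, X, y, eps=1e-6):
    # Central finite differences, one coordinate of w at a time.
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (loss(w + step, X, y) - loss(w - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)   # fixed seed, arbitrary choice
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)
w = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_grad(w, X, y) - numeric_grad(w, X, y)))
print(max_diff)
```

The largest difference between the two gradients should be tiny, limited only by the finite-difference approximation.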

Batch gradient descent

$$w_{t+1} = w_t - \lambda \frac{\partial L(w)}{\partial w} = w_t - \lambda \sum_{i=1}^N (\frac{1}{1+e^{-w_t^T x_i}} - y_i) x_i$$

where $\lambda$ is the learning rate.
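
The batch update can be sketched in vectorized form as follows. This is an illustrative version, not the implementation used later in this notebook; the function name, the default `lr` and `steps`, the zero initialization, and the toy data are all assumptions.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, steps=500):
    # Start from w = 0 (an arbitrary choice for this sketch).
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        # Full gradient over all N samples: sum_i (p_i - y_i) x_i
        w = w - lr * (X.T @ (p - y))
    return w

# Hypothetical toy data: first column is the constant 1 (bias).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])

w = batch_gradient_descent(X, y)
pred = (X @ w > 0).astype(int)
print(w, pred)
```

Every step touches all $N$ samples, which is what the stochastic variant avoids.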

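Stochastic gradient descent

Each step picks a single sample $(x_i,y_i)$ uniformly at random and multiplies its gradient by $N$, which gives an unbiased estimate of the full gradient:
$$w_{t+1} = w_t - \lambda N (\frac{1}{1+e^{-w_t^T x_i}} - y_i) x_i$$

This removes the sum. Because $\lambda N$ is a constant, it can be replaced by a single constant $\lambda$, so the update becomes:
$$w_{t+1} = w_t - \lambda (\frac{1}{1+e^{-w_t^T x_i}} - y_i) x_i$$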
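
The unbiasedness claim can be verified directly: averaging $N(p_i - y_i)x_i$ over a uniformly chosen index $i$ recovers the full gradient. The sketch below uses random data with an arbitrary fixed seed; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, arbitrary choice
N = 50
X = rng.normal(size=(N, 3))
y = rng.integers(0, 2, size=N)
w = rng.normal(size=3)

p = 1.0 / (1.0 + np.exp(-(X @ w)))
per_sample = (p - y)[:, None] * X      # row i is (p_i - y_i) x_i
full_grad = per_sample.sum(axis=0)     # batch gradient

# Expected value of N * (p_i - y_i) x_i when i is uniform over the N samples
expected_estimate = (N * per_sample).mean(axis=0)

print(full_grad, expected_estimate)
```

The two vectors match up to floating-point rounding, so the single-sample gradient is correct on average even though each individual step is noisy.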



```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split


class LogisticRegressionClassifier:
    def __init__(self, epochs=200, learning_rate=0.01):
        self.epochs = epochs
        self.learning_rate = learning_rate
        self.w = None

    def fit(self, X, y):
        # Prepend a column of ones so that w[0] acts as the bias term.
        X = np.hstack((np.ones((len(X), 1)), X))
        self.w = np.random.randn(X.shape[1])
        for epoch in range(self.epochs):
            # One update per sample, visiting the samples in order.
            for i in range(len(X)):
                error = self._sigmoid(np.dot(self.w, X[i])) - y[i]
                self.w -= self.learning_rate * error * X[i]

    def predict(self, X):
        # Add the same bias column that fit() uses.
        X = np.hstack((np.ones((len(X), 1)), X))
        # Threshold at 0: predict 1 when w^T x > 0, i.e. P(Y=1|x) > 0.5.
        return (np.dot(X, self.w) > 0).astype(int)

    def score(self, X_test, y_test):
        # Accuracy: fraction of samples whose prediction matches the label.
        return float(np.mean(self.predict(X_test) == y_test))

    def _sigmoid(self, x):
        return 1 / (1 + np.exp(-x))


def create_data():
    # Keep only the first two iris classes (labels 0 and 1).
    data = datasets.load_iris()
    X = data.data
    y = data.target
    X = X[y < 2]
    y = y[y < 2]
    return X, y


X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y)

model = LogisticRegressionClassifier()
model.fit(X_train, y_train)

print(f'Score:{model.score(X_test, y_test)}')
```


Score:1.0
