## Note
### 1. Linear regression model
- x: feature matrix
- w: weight vector, i.e. the parameter vector of the linear model
- b: intercept
$$y=x\cdot w+b$$
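
As a minimal sketch of this formula, the linear score of every sample can be computed with one matrix-vector product. The array values below are made up purely for illustration.

```python
import numpy as np

# Illustrative values only: 3 samples, 2 features
x = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])
w = np.array([0.5, -1.0])  # weight vector, one entry per feature
b = 0.25                   # intercept

# y = x . w + b, evaluated for all samples at once
y = x @ w + b
print(y)
```

Each row of `x` yields one entry of `y`, so `y` has as many entries as there are samples.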

### 2. Binary logistic regression model
- Y: label
- The left-hand side is the conditional probability that Y=1
$$P(Y=1|x)=\frac{1}{1+e^{-(x\cdot w+b)}}$$
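
The formula can be sketched as a small function that maps the linear score through the sigmoid. The helper name `predict_proba` and the sample values are assumptions for illustration, not part of the notes.

```python
import numpy as np

def predict_proba(x, w, b):
    """Return P(Y=1|x) for each row of x (hypothetical helper)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Illustrative values only: linear scores of 0, 3 and -3
x = np.array([[0.0, 0.0],
              [2.0, 1.0],
              [-2.0, -1.0]])
w = np.array([1.0, 1.0])
b = 0.0

p = predict_proba(x, w, b)
print(p)
```

A score of 0 maps to a probability of exactly 0.5, and scores of equal magnitude but opposite sign map to probabilities that sum to 1.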

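### 3. Maximum likelihood and the loss function
- Let:
$$P(Y=1|x)=p(x)$$
$$P(Y=0|x)=1-p(x)$$
- Maximum likelihood estimation chooses w and b so that the predicted probabilities approach the labels
    - x_i: the i-th sample
    - y_i: the true label of sample i
$$(\hat{w},\hat{b})=\arg\max_{w,b} \prod_{i=1}^{n} [p(x_{i})]^{y_{i}}\,[1-p(x_{i})]^{1-y_{i}}$$
- Loss function (log loss): taking the negative logarithm of the likelihood and averaging over the samples turns the maximization into a minimization of LogLoss
    - n: number of samples
$$LogLoss=-\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}\ln p(x_{i})+(1-y_{i})\ln(1-p(x_{i}))\right)$$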
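
To make the link between the likelihood and the log loss concrete, here is a small sketch with made-up probabilities. The function name `mean_log_loss` is an assumption for illustration.

```python
import numpy as np

def mean_log_loss(p, y):
    """Average negative log-likelihood of 0/1 labels y under probabilities p."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Illustrative values only
y = np.array([1, 0, 1])
p_good = np.array([0.9, 0.1, 0.8])  # probabilities close to the labels
p_bad = np.array([0.4, 0.6, 0.3])   # probabilities far from the labels

# Likelihood of the labels under p_good: product over samples
likelihood = np.prod(p_good**y * (1 - p_good)**(1 - y))

print(likelihood, mean_log_loss(p_good, y), mean_log_loss(p_bad, y))
```

The log loss equals the negative log of the likelihood divided by n, so a higher likelihood always corresponds to a lower log loss.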

### 4. Gradient descent
$$g=\frac{\partial LogLoss}{\partial w}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\,(p(x_{i})-y_{i})$$

- Iteratively update w to minimize the loss function
    - j: iteration index
    - lambda: learning rate
    - g^j: the gradient evaluated at w^j
$$w^{j+1}=w^{j}-\lambda g^{j}$$
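
The gradient formula and the update rule can be sketched on a toy problem. This sketch assumes the intercept is folded into w through a leading column of ones; the data, the learning rate and the helper names are made up for illustration. A finite-difference check is included as one way to confirm that the analytic gradient matches the loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(X, y, w):
    """Log loss for 0/1 labels y."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(X, y, w):
    """Analytic gradient: (1/n) * sum_i x_i * (p(x_i) - y_i)."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

# Toy data (illustrative): first column of ones plays the role of the intercept
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.zeros(2)

# Compare the analytic gradient with a central finite difference at the start
analytic = gradient(X, y, w)
eps = 1e-6
numeric = np.array([
    (loss(X, y, w + eps * e) - loss(X, y, w - eps * e)) / (2 * eps)
    for e in np.eye(2)
])

# Gradient descent: w <- w - lambda * g
lam = 0.1
losses = [loss(X, y, w)]
for _ in range(200):
    w = w - lam * gradient(X, y, w)
    losses.append(loss(X, y, w))

print(analytic, numeric, losses[0], losses[-1])
```

If the learning rate is small enough, the recorded loss should decrease from one iteration to the next.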

### 5. Final logistic regression prediction model
$$LogisticModel=\left \{ 
\begin{aligned}
1 \quad & \frac{1}{1+e^{-(x\cdot w+b)}}>0.5 \\
0 \quad & \frac{1}{1+e^{-(x\cdot w+b)}}\leq 0.5
\end{aligned}
\right.$$
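
A minimal sketch of this decision rule, assuming labels are coded 0/1 as in section 3. The helper name `predict_label_from_x` and the values are illustrative. Because the sigmoid exceeds 0.5 exactly when its argument is positive, thresholding the probability at 0.5 is the same as checking the sign of the linear score.

```python
import numpy as np

def predict_label_from_x(x, w, b, threshold=0.5):
    """Return 1 where P(Y=1|x) > threshold, else 0 (hypothetical helper)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p > threshold).astype(int)

# Illustrative values only: linear scores of 2, 0 and -2
x = np.array([[2.0], [0.0], [-2.0]])
w = np.array([1.0])
b = 0.0

labels = predict_label_from_x(x, w, b)

# Equivalent rule on the linear score itself
labels_from_score = (x @ w + b > 0).astype(int)
print(labels, labels_from_score)
```

A sample whose probability is exactly 0.5 falls on the "less than or equal" branch and is labeled 0.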

## Logistic regression in Python

In [4]:
import numpy as np


def Ken_Logistic(Xtrain, Ytrain, Xtest, n_iter=100000, lam=0.0001):
    """Fit logistic regression by batch gradient descent.

    Returns the weights (intercept first) and the predicted 0/1 labels
    for the training and test sets.
    """
    # Prepend a column of ones so that w_[0] acts as the intercept b
    Xtrain_ = np.hstack((np.ones((Xtrain.shape[0], 1)), Xtrain))
    Xtest_ = np.hstack((np.ones((Xtest.shape[0], 1)), Xtest))
    Ytrain_ = np.asarray(Ytrain, dtype=float)
    w_ = np.ones(Xtrain_.shape[1])  # initial weights, all ones

    # Gradient descent: update w with learning rate lam
    for _ in range(n_iter):
        gd = Xtrain_.T @ (sigmoid(Xtrain_, w_) - Ytrain_) / Xtrain_.shape[0]
        w_ = w_ - lam * gd

    Y_pred_train = sigmoid(Xtrain_, w_)
    Y_pred_test = sigmoid(Xtest_, w_)
    return w_, predict_label(Y_pred_train), predict_label(Y_pred_test)


def sigmoid(X, w):
    # Predicted probability P(Y=1|x) for each row of X
    return 1 / (1 + np.exp(-(X @ w)))


def predict_label(Y_pred, threshold=0.5):
    # Label 1 if the probability exceeds the threshold, otherwise 0
    return (Y_pred > threshold).astype('int')



## Testing on the breast cancer dataset

In [2]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the data
bc = load_breast_cancer()
X = bc.data
Y = bc.target

# Split into training and test sets
Xtrain,Xtest,Ytrain,Ytest = train_test_split(X,Y,test_size=0.3,random_state=45)

# Standardize the features (fit the scaler on the training set only)
std = StandardScaler().fit(Xtrain)
Xtrain = std.transform(Xtrain)
Xtest = std.transform(Xtest)

lr = LogisticRegression().fit(Xtrain,Ytrain)
lr.score(Xtrain,Ytrain),lr.score(Xtest,Ytest)


(0.9874371859296482, 0.9883040935672515)

In [5]:
w,Y_pred_train,Y_pred_test = Ken_Logistic(Xtrain,Ytrain,Xtest)
accuracy_score(Ytrain,Y_pred_train),accuracy_score(Ytest,Y_pred_test)

(0.9597989949748744, 0.9649122807017544)