## Logistic Regression

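I. The logistic regression algorithm:  
    1. Purpose: a classic **binary classification** algorithm;  
    2. Choosing a machine learning algorithm: try logistic regression first, then move on to more complex algorithms;  
    3. Decision boundary: it is linear in the features, and becomes nonlinear in the original inputs when nonlinear feature terms (for example polynomial terms) are added.  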
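
As a small illustration of item 3, the sketch below uses hand-picked parameters rather than learned ones, an assumption made purely for the demo. With the hypothetical feature map $(1, x_{1}^{2}, x_{2}^{2})$ and $\theta = (-1, 1, 1)$, the boundary $\theta^{T}X = 0$ is the unit circle, which is nonlinear in $(x_{1}, x_{2})$. It relies on the sigmoid function introduced in the next section.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_features(points):
    """Map (x1, x2) to (1, x1^2, x2^2); a hypothetical feature map for this demo."""
    points = np.asarray(points, dtype=float)
    return np.column_stack([np.ones(len(points)), points[:, 0] ** 2, points[:, 1] ** 2])

# Hand-picked parameters (an assumption, not fitted): theta^T X = x1^2 + x2^2 - 1
theta = np.array([-1.0, 1.0, 1.0])

# Two points inside the unit circle, then two points outside it
points = np.array([[0.0, 0.0], [0.5, 0.5], [1.5, 0.0], [-1.0, 1.0]])
probs = sigmoid(quadratic_features(points) @ theta)
labels = (probs >= 0.5).astype(int)
print(labels)  # points inside the circle are labeled 0, points outside are labeled 1
```

The model is still linear in the transformed features; only the boundary drawn in the original $(x_{1}, x_{2})$ plane is curved.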

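II. The sigmoid function:  
    1. Prediction function: $h_{\theta}(x) = g(\theta^{T}X) = \frac{1}{1+e^{-\theta^{T}X}}$, where $\theta^{T}X = \theta_{0}+\theta_{1}X_{1}+...+\theta_{n}X_{n}$;  
    2. Classification task: $p(y=1|x;\theta) = h_{\theta}(x)$ and $p(y=0|x;\theta) = 1-h_{\theta}(x)$ combine into $p(y|x;\theta) = (h_{\theta}(x))^{y}(1-h_{\theta}(x))^{1-y}$;  
    3. Explanation: for a binary task with labels (0,1), the combined form reduces to $y = 0\rightarrow p(y|x;\theta) = 1-h_{\theta}(x)$ and $y = 1\rightarrow p(y|x;\theta) = h_{\theta}(x)$, because the other factor has exponent 0 and equals 1;  
    4. Formula and interpretation: in $g(z) = \frac{1}{1+e^{-z}}$ the input $z$ can be any real number and the output lies in the open interval (0,1), so any input is mapped into (0,1). As in linear regression, the linear part $\theta^{T}X$ yields a real-valued prediction; passing that value through the sigmoid function **converts a value into a probability**, and thresholding the probability completes the classification task.  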
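
The points above can be checked numerically. The following is a minimal sketch, not a reference implementation; the helper name `bernoulli_prob` is introduced here for illustration only. It evaluates the sigmoid at a few inputs and confirms that the combined expression $(h_{\theta}(x))^{y}(1-h_{\theta}(x))^{1-y}$ gives $h_{\theta}(x)$ for $y=1$ and $1-h_{\theta}(x)$ for $y=0$.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})"""
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_prob(y, h):
    """p(y | x; theta) = h^y * (1 - h)^(1 - y) for a label y in {0, 1}."""
    return h ** y * (1.0 - h) ** (1 - y)

z = np.array([-5.0, 0.0, 5.0])  # example scores theta^T X
h = sigmoid(z)

print(h)                      # every value lies strictly between 0 and 1
print(bernoulli_prob(1, h))   # same as h
print(bernoulli_prob(0, h))   # same as 1 - h
```

Note the symmetry $g(-z) = 1 - g(z)$: the outputs for $z=-5$ and $z=5$ sum to 1, and $g(0)$ is exactly 0.5, which is why 0.5 is the natural decision threshold.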

III. Logistic regression and the likelihood function:  
    1. Likelihood function: $L(\theta) = \prod_{i=1}^m{p(y_{i}|x_{i}; \theta)} = \prod_{i=1}^m{(h_{\theta}(x_{i}))^{y_{i}}(1-h_{\theta}(x_{i}))^{1-y_{i}}}$;  
    2. Log-likelihood: $\ln L(\theta) = \sum_{i=1}^m{(y_{i}\ln h_{\theta}(x_{i})+(1-y_{i})\ln(1-h_{\theta}(x_{i})))}$;  
    3. Maximizing the log-likelihood calls for gradient ascent; introducing $J(\theta) = -\frac{1}{m}\ln L(\theta)$ turns it into a gradient descent task;  
    4. Derivation, using $g'(z) = g(z)(1-g(z))$ and writing $x_{i}^{j}$ for the $j$-th feature of sample $i$:  
    $$
    \frac{\partial J(\theta)}{\partial \theta_{j}} = -\frac{1}{m}\sum_{i=1}^m{(y_{i}\frac{\frac{\partial h_{\theta}(x_{i})}{\partial \theta_{j}}}{h_{\theta}(x_{i})}-(1-y_{i})\frac{\frac{\partial h_{\theta}(x_{i})}{\partial \theta_{j}}}{1-h_{\theta}(x_{i})})} = -\frac{1}{m}\sum_{i=1}^m{(\frac{y_{i}}{g(\theta^{T}X_{i})}-\frac{1-y_{i}}{1-g(\theta^{T}X_{i})})\frac{\partial g(\theta^{T}X_{i})}{\partial \theta_{j}}}
    $$  
    $$
     = -\frac{1}{m}\sum_{i=1}^m{(\frac{y_{i}}{g(\theta^{T}X_{i})}-\frac{1-y_{i}}{1-g(\theta^{T}X_{i})})g(\theta^{T}X_{i})(1-g(\theta^{T}X_{i}))\frac{\partial \theta^{T}X_{i}}{\partial \theta_{j}}} = -\frac{1}{m}\sum_{i=1}^m{(y_{i}(1-g(\theta^{T}X_{i}))-(1-y_{i})g(\theta^{T}X_{i}))x_{i}^{j}} = -\frac{1}{m}\sum_{i=1}^m{(y_{i}-g(\theta^{T}X_{i}))x_{i}^{j}}
    $$  
    5. Parameter update: $\theta_{j}^{'} = \theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^m{(h_{\theta}(x_{i})-y_{i})x_{i}^{j}}$, where $\alpha$ is the learning rate;  
    6. Softmax for multi-class classification, with one parameter vector $\theta_{j}$ per class:  
    $$
    h_{\theta}(x^{(i)}) = 
    \begin{bmatrix}
    p(y^{(i)}=1|x^{(i)};\theta)\\
    p(y^{(i)}=2|x^{(i)};\theta)\\
    ...\\
    p(y^{(i)}=k|x^{(i)};\theta)
    \end{bmatrix}
    =
    \frac{1}{\sum_{j=1}^{k}{e^{\theta_{j}^{T}X^{(i)}}}}
    \begin{bmatrix}
    e^{\theta_{1}^{T}X^{(i)}}\\
    e^{\theta_{2}^{T}X^{(i)}}\\
    ...\\
    e^{\theta_{k}^{T}X^{(i)}}
    \end{bmatrix}
    $$  
    7. Advantages: fast to train and to predict with, and the model is highly interpretable.
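
To make item 6 concrete, here is a minimal sketch of the standard softmax, which exponentiates each class score $\theta_{j}^{T}X$ and then normalizes so the $k$ outputs sum to 1. The matrix `Theta` and the input `x` are made-up illustrative values, and subtracting the maximum score before exponentiating is a common numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(scores):
    """Turn k class scores theta_j^T x into k probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    # Shifting by the max leaves the ratios unchanged and avoids overflow in exp
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)

# Hypothetical parameters: k = 3 classes (rows), 2 features (columns)
Theta = np.array([[1.0, -1.0],
                  [0.0, 0.5],
                  [-1.0, 2.0]])
x = np.array([2.0, 1.0])

probs = softmax(Theta @ x)  # class scores are [1.0, 0.5, 0.0]
print(probs, probs.sum())

# With k = 2 and the second score fixed at 0, softmax reduces to the sigmoid
z = 0.7
two_class = softmax(np.array([z, 0.0]))
print(two_class[0], 1.0 / (1.0 + np.exp(-z)))
```

The last two printed values agree, which shows that binary logistic regression is the two-class special case of softmax.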

In [36]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

In [39]:
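# Load the dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split into training and test sets: the former trains the model, the latter evaluates how well it generalizes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()  # rescales each feature to mean 0 and standard deviation 1
X_train = scaler.fit_transform(X_train)  # compute and store the training mean and standard deviation, then standardize X_train with them
X_test = scaler.transform(X_test)  # reuse the training statistics on the test set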

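# Initialize parameters
n_features = X_train.shape[1]
w = np.zeros(n_features)
b = 0.0

# Define the sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Define the prediction function (returns the probability of class 1)
def predict(X, w, b):
    z = np.dot(X, w) + b
    return sigmoid(z)

# Gradient descent hyperparameters
alpha = 0.01  # learning rate
n_iterations = 1000  # number of iterations
m = X_train.shape[0]  # number of training samples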

# Train with gradient descent
for i in range(n_iterations):
    # Forward pass: compute the predicted probabilities
    y_pred = predict(X_train, w, b)
    # Compute the gradients
    error = y_pred - y_train
    dw = (1 / m) * np.dot(X_train.T, error)
    db = (1 / m) * np.sum(error)
    # Update the parameters
    w = w - alpha * dw
    b = b - alpha * db
    # Print the loss every 100 iterations (1e-8 guards against log(0))
    if i % 100 == 0:
        loss = -(1 / m) * np.sum(y_train * np.log(y_pred + 1e-8) + (1 - y_train) * np.log(1 - y_pred + 1e-8))
        print(f"Iteration{i}:Loss = {loss:.4f}")

# Predict on the test set
y_test_pred_prob = predict(X_test, w, b)
y_test_pred = (y_test_pred_prob >= 0.5).astype(int)

# Compute the accuracy
accuracy = accuracy_score(y_test, y_test_pred)
print(f"\nTest Accuracy:{accuracy:.4f}")

Iteration0:Loss = 0.6931
Iteration100:Loss = 0.2543
Iteration200:Loss = 0.1917
Iteration300:Loss = 0.1633
Iteration400:Loss = 0.1465
Iteration500:Loss = 0.1350
Iteration600:Loss = 0.1267
Iteration700:Loss = 0.1202
Iteration800:Loss = 0.1150
Iteration900:Loss = 0.1107

Test Accuracy:0.9825
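
One way to gain confidence in the gradient formulas used in the training loop (`dw` and `db`) is to compare them with finite differences of the loss. The sketch below does this on small synthetic data, which is an assumption made to keep the check self-contained; it does not use the breast cancer dataset, and the function names are introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(X, y, w, b):
    """J = -(1/m) * sum(y*ln(h) + (1-y)*ln(1-h))"""
    h = sigmoid(X @ w + b)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def analytic_grad(X, y, w, b):
    """Same formulas as dw and db in the training loop."""
    error = sigmoid(X @ w + b) - y
    return X.T @ error / len(y), np.mean(error)

def numeric_grad(X, y, w, b, eps=1e-6):
    """Central finite differences of the loss with respect to w and b."""
    dw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        dw[j] = (loss(X, y, w + step, b) - loss(X, y, w - step, b)) / (2 * eps)
    db = (loss(X, y, w, b + eps) - loss(X, y, w, b - eps)) / (2 * eps)
    return dw, db

# Synthetic data (an assumption for this check only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
w = rng.normal(size=3)
b = 0.1

dw_a, db_a = analytic_grad(X, y, w, b)
dw_n, db_n = numeric_grad(X, y, w, b)
print(np.max(np.abs(dw_a - dw_n)), abs(db_a - db_n))  # both differences should be tiny
```

The same `loss` function also explains the first printed line of the training log: with `w = 0` and `b = 0` every prediction is 0.5, so the loss is $\ln 2 \approx 0.6931$ regardless of the data.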
