## Logistic Regression

Given data $X = \{x_1, x_2, \dots\}$ with labels $Y = \{y_1, y_2, \dots\}$, consider a binary classification task, i.e. $y_i \in \{0, 1\}$ for $i = 1, 2, \dots$

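### Hypothesis function

The hypothesis function is the basic model:
$$ h_{\theta}(x) = g({\theta}^Tx) $$

Here ${\theta}^Tx = w^Tx+b$ (the bias $b$ is absorbed into $\theta$ by appending a constant 1 to $x$), and $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function, which acts as the activation function.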
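As a minimal sketch of the hypothesis function, the snippet below evaluates $h_\theta(x)$ with NumPy. The helper names `sigmoid`, `add_bias_column` and `hypothesis`, and the numeric values of `theta` and `X`, are illustrative assumptions rather than part of the original notebook. The direct formula is used for readability; for very negative inputs it can emit an overflow warning, although the result is still correct.

```python
import numpy as np


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def add_bias_column(X):
    """Append a constant-1 feature so that theta = [w, b] gives theta^T x = w^T x + b."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))])


def hypothesis(theta, X_aug):
    """h_theta(x) = g(theta^T x), evaluated for every row x of X_aug."""
    return sigmoid(X_aug @ theta)


# Illustrative values only: w = (1, -1), b = 0.5
theta = np.array([1.0, -1.0, 0.5])
X = np.array([[1.0, 1.0], [3.0, 0.0]])
probs = hypothesis(theta, add_bias_column(X))
print(probs)  # one probability per row, each strictly between 0 and 1
```

Each output can be read as the predicted probability that the corresponding row belongs to the positive class.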

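### Loss function
The loss function, also called the cost function, measures how well the model fits the data. Here it is derived by maximum likelihood estimation.

The likelihood of the $m$ training samples is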
$$
L(\theta) = \prod_{i=1}^{m} p(y_i \mid x_i)
$$

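For example, a sample $x_1$ with label $y_1 = 1$ is a positive example, and a sample $x_2$ with label $y_2 = 0$ is a negative example.

The probability of a positive example is defined as:
$$
p(y_i = 1 \mid x_i) = h_\theta(x_i)
$$
and the probability of a negative example as:
$$
p(y_i = 0 \mid x_i) = 1 - h_\theta(x_i)
$$
By the principle of maximum likelihood estimation, the objective is: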
$$
\theta^{*} = \arg\max_{\theta} L(\theta)
$$
To simplify the computation, take the natural logarithm, which does not change the maximizer:

$$
\theta^{*} = \arg\max_{\theta} \ln L(\theta)
= \arg\min_{\theta} \bigl( -\ln L(\theta) \bigr)
$$
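The simplification is easier to follow once the two class probabilities are merged into a single expression. Because $y_i \in \{0, 1\}$, they can be written as one Bernoulli probability, which makes the log-likelihood explicit:

$$
p(y_i \mid x_i) = h_\theta(x_i)^{y_i} \bigl(1 - h_\theta(x_i)\bigr)^{1 - y_i}
$$

$$
\ln L(\theta) = \sum_{i=1}^{m} \Bigl( y_i \ln h_\theta(x_i) + (1 - y_i) \ln \bigl(1 - h_\theta(x_i)\bigr) \Bigr)
$$

From the sigmoid definition, $\ln h_\theta(x_i) = \theta^{T} x_i - \ln(1 + e^{\theta^{T} x_i})$ and $\ln\bigl(1 - h_\theta(x_i)\bigr) = -\ln(1 + e^{\theta^{T} x_i})$, so each summand collapses to $y_i \theta^{T} x_i - \ln(1 + e^{\theta^{T} x_i})$.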

Simplifying further gives the commonly used form of the loss function:

$$
-\ln L(\theta)
= \ell(\theta)
= \sum_{i=1}^{m} \left( - y_i \theta^{T} x_i + \ln(1 + e^{\theta^{T} x_i}) \right)
$$
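The loss can be evaluated directly from this expression. The sketch below is one possible implementation, not the notebook's own: the function name `nll_loss` and the small arrays are illustrative assumptions, and `X_aug` is assumed to already contain a trailing column of ones for the bias. `np.logaddexp(0, z)` computes $\ln(1 + e^{z})$ without overflowing for large $z$.

```python
import numpy as np


def nll_loss(theta, X_aug, y):
    """Negative log-likelihood: sum_i ( -y_i * theta^T x_i + ln(1 + exp(theta^T x_i)) )."""
    z = X_aug @ theta
    # np.logaddexp(0, z) == ln(exp(0) + exp(z)) == ln(1 + exp(z)), computed stably
    return float(np.sum(-y * z + np.logaddexp(0.0, z)))


# Illustrative data: one feature plus a bias column of ones
X_aug = np.array([[0.5, 1.0], [-1.5, 1.0], [2.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])

# With theta = 0 every prediction is 0.5, so each sample contributes ln 2
loss_at_zero = nll_loss(np.zeros(2), X_aug, y)
print(loss_at_zero)
```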

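### Solving: gradient descent

The loss $\ell(\theta)$ is convex, so iterative methods such as gradient descent or Newton's method can be used to find its minimum.

For gradient descent, with learning rate $\eta$, the update is: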

$$
\theta^{t+1} = \theta^{t} - \eta \frac{\partial \ell(\theta)}{\partial \theta}
$$

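where

$$
\frac{\partial \ell(\theta)}{\partial \theta}
= \sum_{i=1}^{m} \left( - y_i x_i + \frac{e^{\theta^T x_i} x_i}{1 + e^{\theta^T x_i}} \right)
= \sum_{i=1}^{m} x_i(-y_i + h_\theta(x_i))
= \sum_{i=1}^{m} x_i \cdot (-\,\text{error}_i)
$$

Here $\text{error}_i = y_i - h_\theta(x_i)$ is the difference between the label and the predicted probability.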


Equivalently, the update can be written as gradient ascent on the log-likelihood, which is slightly more convenient because the error term appears without a sign flip:

$$
\theta^{t+1} = \theta^{t} + \eta \left( -\frac{\partial \ell(\theta)}{\partial \theta} \right)
$$

where

$$
-\frac{\partial \ell(\theta)}{\partial \theta}
= \sum_{i=1}^{m} \left( y_i x_i - \frac{e^{\theta^T x_i} x_i}{1 + e^{\theta^T x_i}} \right)
= \sum_{i=1}^{m} x_i (y_i - h_\theta(x_i))
= \sum_{i=1}^{m} x_i \cdot \text{error}_i
$$
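One way to gain confidence in the gradient formula is to compare it with a finite-difference approximation of the loss. The sketch below does this on random data; the function names, the random data, and the step size `eps` are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(theta, X_aug, y):
    """Negative log-likelihood of logistic regression."""
    z = X_aug @ theta
    return float(np.sum(-y * z + np.logaddexp(0.0, z)))


def loss_gradient(theta, X_aug, y):
    """Analytic gradient: sum_i x_i * (h_theta(x_i) - y_i) = -X^T error."""
    error = y - sigmoid(X_aug @ theta)
    return -X_aug.T @ error


def numerical_gradient(f, theta, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
X_aug = np.hstack([rng.normal(size=(20, 3)), np.ones((20, 1))])  # bias column last
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=4)

analytic = loss_gradient(theta, X_aug, y)
numeric = numerical_gradient(lambda t: loss(t, X_aug, y), theta)
print(np.max(np.abs(analytic - numeric)))  # should be very close to zero
```

Stepping along `-analytic` is a gradient-descent step on the loss, and it is the same move as a gradient-ascent step on the log-likelihood.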

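### Pseudocode
The training algorithm is as follows:
- Input: training data $X = \{x_1, x_2, \dots, x_n\}$ and training labels $Y = \{y_1, y_2, \dots, y_n\}$, both in matrix form.
- Output: the trained model parameters $\theta$, or equivalently $h_{\theta}(x)$
- Initialize the model parameters $\theta$, the number of iterations `n_iters`, and the learning rate $\eta$
- FOR `i_iter` in range(`n_iters`)
   - FOR i in range(n)   $\rightarrow n=len(X)$
      - error = $y_i - h_{\theta}(x_i)$
      - grad = error * $x_i$
      - $\theta \leftarrow \theta + \eta \cdot grad$          $\rightarrow$ gradient ascent
   - END FOR
- END FOR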
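The pseudocode above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the notebook's reference implementation: the function names, the default `n_iters` and `eta`, and the synthetic two-cluster data are all hypothetical, and samples are visited in a fixed order as in the pseudocode (no shuffling).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, n_iters=50, eta=0.1):
    """Per-sample gradient ascent on the log-likelihood, following the pseudocode."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # fold the bias b into theta
    theta = np.zeros(X_aug.shape[1])
    n = len(X_aug)
    for _ in range(n_iters):
        for i in range(n):
            error = y[i] - sigmoid(X_aug[i] @ theta)
            grad = error * X_aug[i]
            theta = theta + eta * grad  # gradient ascent step
    return theta


def predict(theta, X):
    """Label a sample 1 when the predicted probability is at least 0.5."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    return (sigmoid(X_aug @ theta) >= 0.5).astype(int)


# Synthetic, well-separated toy data (an assumption made for illustration)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

theta = train_logistic_regression(X, y)
accuracy = np.mean(predict(theta, X) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Updating after every sample keeps the code close to the pseudocode; summing the per-sample gradients before each update would instead give the batch formula from the previous section.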



In [None]:
import sys
from pathlib import Path
curr_path = str(Path().absolute())
parent_path = str(Path().absolute().parent)
p_parent_path = str(Path().absolute().parent.parent)
sys.path.append(p_parent_path)
print(f"Project root: {p_parent_path}")

#### scikit-learn version

In [None]:
import numpy as np
from sklearn.datasets import fetch_openml

# Load the dataset (as_frame=False returns NumPy arrays instead of a pandas DataFrame)
mnist = fetch_openml('mnist_784', as_frame=False)
X, y = mnist['data'], mnist['target']
X_train = np.array(X[:60000], dtype = float)
y_train = np.array(y[:60000], dtype = float)
X_test = np.array(X[60000:], dtype = float)
y_test = np.array(y[60000:], dtype = float)

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

In [None]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(penalty = 'l1', solver = 'saga', tol = 0.1)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
print("Test score with l1 penalty: {:.4f}".format(score))

#### PyTorch version

In [None]:
from torch.utils.data import DataLoader
from torchvision import datasets
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
import numpy as np

In [None]:
train_dataset = datasets.MNIST(root = p_parent_path + '/datasets', train = True, transform = transforms.ToTensor(), download = False)
test_dataset = datasets.MNIST(root = p_parent_path + '/datasets', train = False, transform = transforms.ToTensor(), download = False)

batch_size = len(train_dataset)
train_loader = DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)
test_loader = DataLoader(dataset = test_dataset, batch_size = batch_size, shuffle = True)
X_train, y_train = next(iter(train_loader))
X_test, y_test = next(iter(test_loader))

# Show the first 100 images
images, labels = X_train[:100], y_train[:100]
# Arrange the images into a grid that is 10 images wide
img = torchvision.utils.make_grid(images, nrow = 10)
# plt.imshow() expects (height, width, channels), whereas img is (channels, height, width),
# so transpose to move the channel axis to the last dimension
img = img.numpy().transpose(1, 2, 0)
print(images.shape)
print(labels.reshape(10, 10))
print(img.shape)
plt.imshow(img)
plt.show()

In [None]:
X_train, y_train = X_train.cpu().numpy(), y_train.cpu().numpy()  # convert tensors to NumPy arrays
X_test, y_test = X_test.cpu().numpy(), y_test.cpu().numpy()

In [None]:
X_train = X_train.reshape(X_train.shape[0], 784)
X_test = X_test.reshape(X_test.shape[0], 784)

In [None]:
# solver selects the optimizer: lbfgs is a quasi-Newton method, sag is stochastic average gradient
model = LogisticRegression(solver = 'lbfgs', max_iter = 400)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

In [None]:
# Append a column of ones to each matrix so the bias is folded into theta
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
x_train = np.asmatrix(X_train)  # np.mat was removed in NumPy 2.0
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])  # sized from X_test, not X_train
x_test = np.asmatrix(X_test)

# MNIST has ten labels (0-9). For a binary task, map digit 0 to class 1 and
# every other digit to class 0, i.e. the task is "is this digit a 0?"
y_train = (y_train == 0).astype(int)
y_test = (y_test == 0).astype(int)

In [None]:
# solver selects the optimizer: lbfgs is a quasi-Newton method, sag is stochastic average gradient
model = LogisticRegression(solver = 'lbfgs', max_iter = 100)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))