# Logistic Regression With CrossEntropy Loss
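A NumPy implementation of logistic regression with cross-entropy loss, covering the forward pass, the backward pass, the loss computation, and the training loop.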

In [1]:
!pip install numpy


In [2]:
import numpy as np

## Cross Entropy Loss
Cross-entropy loss for binary classification:
$$
\mathcal{L} = -(y_{true}\log(y_{pred}) + (1-y_{true})\log(1-y_{pred}))
$$
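Derivative of the cross-entropy loss with respect to $y_{pred}$:
$$
\frac{\partial \mathcal{L}}{\partial y_{pred}} = -\left(\frac{y_{true}}{y_{pred}} - \frac{1-y_{true}}{1-y_{pred}}\right) = \frac{y_{pred} - y_{true}}{y_{pred}(1 - y_{pred})}
$$
For binary classification this derivative does not need its own function. With $y_{pred} = \sigma(z)$, the sigmoid derivative $\sigma^\prime(z) = y_{pred}(1 - y_{pred})$ cancels the denominator, so the gradient with respect to the pre-activation $z$ is
$$
\frac{\partial \mathcal{L}}{\partial z} = \frac{y_{pred} - y_{true}}{y_{pred}(1 - y_{pred})} \cdot y_{pred}(1 - y_{pred}) = y_{pred} - y_{true}
$$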
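
The cancellation between the sigmoid and the cross-entropy denominator can be checked numerically. The following is a minimal sketch, not part of the original notebook: it compares `sigmoid(z) - y_true` against a central finite difference of the per-sample loss with respect to the logit `z`. The helper `bce_from_logit`, the sample values, and the step size `h` are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_logit(y_true, z):
    # Per-sample binary cross-entropy, written as a function of the logit z
    # (hypothetical helper used only for this check)
    y_pred = sigmoid(z)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Arbitrary logits and labels chosen for illustration
z = np.array([-2.0, -0.5, 0.3, 1.7])
y_true = np.array([0.0, 1.0, 0.0, 1.0])

# Analytic gradient after the cancellation: dL/dz = y_pred - y_true
analytic = sigmoid(z) - y_true

# Central finite-difference approximation of dL/dz
h = 1e-6
numeric = (bce_from_logit(y_true, z + h) - bce_from_logit(y_true, z - h)) / (2 * h)

max_abs_diff = np.max(np.abs(analytic - numeric))
print("max |analytic - numeric| =", max_abs_diff)
```

If the two gradients agree up to finite-difference error, the backward pass can use `y_pred - y_true` directly as the gradient at the output layer's pre-activation.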

In [None]:
def binary_crossentropy_loss(y_true, y_pred):
    # Guard against log(0)
    epsilon = 1e-7
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon) # clip restricts y_pred to [epsilon, 1 - epsilon]
    # Mean loss over the batch
    loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return loss
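
To illustrate what the clipping buys, here is a small usage sketch under assumed inputs. It repeats the loss function so the block runs on its own, then evaluates it on three hand-made prediction sets (the arrays and the `epsilon` keyword argument are illustrative, not from the original notebook). Without the clip, predictions of exactly 0 or 1 would produce `log(0)`.

```python
import numpy as np

def binary_crossentropy_loss(y_true, y_pred, epsilon=1e-7):
    # Same computation as the notebook's function; epsilon is exposed
    # as a parameter here only for illustration
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([[1.0], [0.0], [1.0], [0.0]])

confident_right = np.array([[1.0], [0.0], [1.0], [0.0]])  # exact 0/1 predictions
uncertain = np.full((4, 1), 0.5)                          # always predicts 0.5
confident_wrong = np.array([[0.0], [1.0], [0.0], [1.0]])  # exact but inverted

loss_right = binary_crossentropy_loss(y_true, confident_right)
loss_uncertain = binary_crossentropy_loss(y_true, uncertain)
loss_wrong = binary_crossentropy_loss(y_true, confident_wrong)

print(loss_right, loss_uncertain, loss_wrong)
```

The uncertain case evaluates to `log(2)`, and the confidently wrong case is capped near `-log(epsilon)` instead of becoming infinite.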

## Sigmoid

In [None]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(fx):
    # fx is the sigmoid output sigmoid(x), not the pre-activation x
    return fx * (1 - fx)
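
For large negative inputs, `np.exp(-x)` overflows and NumPy emits a runtime warning, even though the final value still rounds to 0. One possible alternative, sketched below as an assumption rather than the notebook's method, rewrites the sigmoid as `exp(-log(1 + exp(-x)))` using `np.logaddexp`. The name `stable_sigmoid` is hypothetical. The block also checks `sigmoid_derivative` against a finite difference, passing in the activation as its `fx` argument expects.

```python
import numpy as np

def stable_sigmoid(x):
    # sigmoid(x) = exp(-log(1 + exp(-x)));
    # np.logaddexp(0, -x) evaluates log(exp(0) + exp(-x)) without overflow
    return np.exp(-np.logaddexp(0.0, -x))

def sigmoid_derivative(fx):
    # fx is the sigmoid output, as in the notebook
    return fx * (1 - fx)

# Extreme inputs that would overflow np.exp(-x) in the naive form
x = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
s = stable_sigmoid(x)
print(s)

# Finite-difference check of the derivative at moderate inputs
xs = np.array([-2.0, 0.0, 2.0])
h = 1e-5
numeric = (stable_sigmoid(xs + h) - stable_sigmoid(xs - h)) / (2 * h)
analytic = sigmoid_derivative(stable_sigmoid(xs))  # pass the activation, not xs
print(np.max(np.abs(numeric - analytic)))
```

Because `sigmoid_derivative` is written in terms of the output, the backward pass can reuse the activations cached during the forward pass instead of recomputing the sigmoid.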

## LogisticRegression

In [None]:
class LogisticRegression:
    def __init__(self, in_dim, hidden_dim, learning_rate=0.01):
        self.in_dim = in_dim
        self.learning_rate = learning_rate
        self.w1 = np.random.randn(in_dim, hidden_dim)
        self.b1 = np.random.randn(1, hidden_dim)
        self.w2 = np.random.randn(hidden_dim, 1)
        self.b2 = np.random.randn(1, 1)

    def forward(self, x):
        # x: [bs, in_dim]
        self.z1 = np.dot(x, self.w1) + self.b1
        self.a1 = sigmoid(self.z1) # [bs, hidden_dim]
        self.z2 = np.dot(self.a1, self.w2) + self.b2
        self.a2 = sigmoid(self.z2) # [bs, 1]
        return self.a2

    def backward(self, x, y_true, y_pred):
        # layer2
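        layer2_error = y_pred - y_true # [bs, 1], gradient w.r.t. z2 (sigmoid cancels the cross-entropy denominator)
        dw2 = np.dot(self.a1.T, layer2_error) # [hidden_dim, 1]
        db2 = np.sum(layer2_error, axis=0, keepdims=True) # [1, 1]
        # layer1
        # Gradient w.r.t. z1: propagate to a1 first, then through the sigmoid.
        # sigmoid_derivative expects the activation a1, not the pre-activation z1.
        layer1_error = np.dot(layer2_error, self.w2.T) * sigmoid_derivative(self.a1) # [bs, hidden_dim]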
        dw1 = np.dot(x.T, layer1_error) # [in_dim, hidden_dim]
        db1 = np.sum(layer1_error, axis=0, keepdims=True) # [1, hidden_dim]
        # update
        self.w2 -= self.learning_rate * dw2
        self.b2 -= self.learning_rate * db2
        self.w1 -= self.learning_rate * dw1
        self.b1 -= self.learning_rate * db1