# Logistic Regression With CrossEntropy Loss
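A NumPy implementation of logistic regression with cross-entropy loss, covering the forward pass, backward pass, loss computation, and training loop.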

In [2]:
!pip install numpy


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [15]:
import numpy as np

## Cross Entropy Loss
Binary cross-entropy loss for a single sample:
$$
\mathcal{L} = -(y_{true}\log(y_{pred}) + (1-y_{true})\log(1-y_{pred}))
$$
Its derivative with respect to $y_{pred}$:
$$
\mathcal{L}^\prime = -(\frac{y_{true}}{y_{pred}} - \frac{1-y_{true}}{1-y_{pred}}) = \frac{y_{pred} - y_{true}}{y_{pred}(1 - y_{pred})}
$$
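For binary classification there is no need to implement this derivative as a separate function. With $y_{pred} = \sigma(z)$, the sigmoid derivative $\sigma^\prime(z) = y_{pred}(1 - y_{pred})$ cancels the denominator, so by the chain rule
$$
\frac{\partial \mathcal{L}}{\partial z} = \frac{y_{pred} - y_{true}}{y_{pred}(1 - y_{pred})} \cdot y_{pred}(1 - y_{pred}) = y_{pred} - y_{true}
$$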
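The cancellation can be checked numerically. The sketch below is illustrative only: the helper `bce_from_logit` and the sample values are assumptions, not part of the notebook. It compares a central finite difference of the loss with respect to the logit $z$ against $y_{pred} - y_{true}$.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def bce_from_logit(y_true, z):
    # Per-sample binary cross-entropy, written as a function of the logit z
    p = sigmoid(z)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))


# Arbitrary logits and labels chosen for illustration
z = np.array([-2.0, -0.5, 0.3, 1.7])
y_true = np.array([0.0, 1.0, 1.0, 0.0])

h = 1e-6
# Central finite difference of the loss with respect to z
numeric = (bce_from_logit(y_true, z + h) - bce_from_logit(y_true, z - h)) / (2 * h)
# Closed form after the sigmoid derivative cancels the denominator
analytic = sigmoid(z) - y_true

max_gap = np.max(np.abs(numeric - analytic))
print(f"largest gap between numeric and analytic gradient: {max_gap:.2e}")
```

The two gradients should agree to within finite-difference error, which is why `backward` can start directly from `y_pred - y_true`.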

In [16]:
def binary_crossentropy_loss(y_true, y_pred):
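    # Clip predictions away from 0 and 1 so that log(0) never occurs
    epsilon = 1e-7
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)  # restrict y_pred to [epsilon, 1 - epsilon]
    # Mean binary cross-entropy over the batch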
    loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return loss
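To see why the clipping matters, here is a small sketch comparing the loss with and without it on predictions that are exactly 0 or 1. The names `bce_unclipped` and `bce_clipped` are hypothetical helpers written for this comparison; the clipped version mirrors `binary_crossentropy_loss`.

```python
import numpy as np


def bce_unclipped(y_true, y_pred):
    # Same formula without clipping: log(0) yields -inf
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))


def bce_clipped(y_true, y_pred, epsilon=1e-7):
    # Same formula with predictions clipped to [epsilon, 1 - epsilon]
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))


# Two confidently wrong predictions: exactly 0 for a positive, exactly 1 for a negative
y_true = np.array([[1.0], [0.0]])
y_pred = np.array([[0.0], [1.0]])

# Silence the divide-by-zero warning that log(0) would otherwise emit
with np.errstate(divide="ignore", invalid="ignore"):
    raw = bce_unclipped(y_true, y_pred)
safe = bce_clipped(y_true, y_pred)

print(raw, safe)
```

Without clipping the loss is infinite. With `epsilon = 1e-7` it is capped near `-log(1e-7)`, roughly 16.1, so a single saturated prediction cannot turn the reported loss into `inf`.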

## Sigmoid

In [17]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(fx):
    # fx is the sigmoid output sigmoid(x), not the raw input x
    return fx * (1 - fx)
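The `sigmoid` above is fine for the small inputs used in this notebook, but `np.exp(-x)` overflows (with a `RuntimeWarning`) for large negative `x`. A common alternative, sketched here under the hypothetical name `stable_sigmoid` and assuming an array input with at least one dimension, evaluates a different but algebraically equivalent expression on each side of zero:

```python
import numpy as np


def stable_sigmoid(x):
    # Hypothetical helper, not part of the notebook
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so the usual form cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use exp(x) / (1 + exp(x)); exp(x) < 1 here
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out


vals = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(vals)
```

Both branches compute the same function; the split only ensures that `np.exp` is never called with a large positive argument.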

## LogisticRegression

In [18]:
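class LogisticRegression:
    # Despite the name, this is a two-layer network: a sigmoid hidden layer
    # followed by a single sigmoid (logistic) output unit.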
    def __init__(self, in_dim, hidden_dim, learning_rate=0.01):
        self.in_dim = in_dim
        self.learning_rate = learning_rate
        self.w1 = np.random.randn(in_dim, hidden_dim)
        self.b1 = np.random.randn(1, hidden_dim)
        self.w2 = np.random.randn(hidden_dim, 1)
        self.b2 = np.random.randn(1, 1)

    def forward(self, x):
        # x: [bs, in_dim]
        self.z1 = np.dot(x, self.w1) + self.b1
        self.a1 = sigmoid(self.z1)  # [bs, hidden_dim]
        self.z2 = np.dot(self.a1, self.w2) + self.b2
        self.a2 = sigmoid(self.z2)  # [bs, 1]
        return self.a2

    def backward(self, x, y_true, y_pred):
        # layer 2: gradient w.r.t. z2 (sigmoid and cross-entropy derivatives cancel)
        error_layer2 = y_pred - y_true  # [bs, 1]
        dw2 = np.dot(self.a1.T, error_layer2)  # [hidden_dim, 1]
        db2 = np.sum(error_layer2, axis=0, keepdims=True)  # [1, 1]
        # layer 1: gradient w.r.t. z1 (backpropagate to a1, then through the sigmoid)
        # sigmoid_derivative expects the sigmoid output, so pass self.a1
        error_layer1 = np.dot(error_layer2, self.w2.T) * sigmoid_derivative(
            self.a1)  # [bs, hidden_dim]
        dw1 = np.dot(x.T, error_layer1)  # [in_dim, hidden_dim]
        db1 = np.sum(error_layer1, axis=0, keepdims=True)  # [1, hidden_dim]
        # Note: these gradients are summed over the batch, while the reported loss is a mean
        # update
        self.w2 -= self.learning_rate * dw2
        self.b2 -= self.learning_rate * db2
        self.w1 -= self.learning_rate * dw1
        self.b1 -= self.learning_rate * db1

    def calculate_loss(self, X, y):
        output = self.forward(X)
        loss = binary_crossentropy_loss(y_true=y, y_pred=output)
        self.backward(X, y_true=y, y_pred=output)
        return loss

    def train(self, X, y, epochs):
        for epoch in range(epochs):
            loss = self.calculate_loss(X, y)
            if epoch % 100 == 0:
                print(f"Epoch {epoch}/{epochs}, Loss: {loss:.4f}")
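One way to gain confidence in the `backward` formulas is a finite-difference gradient check. The sketch below is standalone and illustrative: the function names, the seed, and the parameter list layout are assumptions, and the gradient expressions are copied from `backward`. Because `backward` sums over the batch, the check differentiates the summed (not mean) cross-entropy.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def forward(params, x):
    w1, b1, w2, b2 = params
    a1 = sigmoid(x @ w1 + b1)   # [bs, hidden_dim]
    a2 = sigmoid(a1 @ w2 + b2)  # [bs, 1]
    return a1, a2


def summed_bce(params, x, y):
    # Cross-entropy summed over the batch, matching the summed gradients
    _, a2 = forward(params, x)
    return -np.sum(y * np.log(a2) + (1 - y) * np.log(1 - a2))


def analytic_grads(params, x, y):
    # Same expressions as LogisticRegression.backward
    w1, b1, w2, b2 = params
    a1, a2 = forward(params, x)
    e2 = a2 - y
    dw2 = a1.T @ e2
    db2 = e2.sum(axis=0, keepdims=True)
    e1 = (e2 @ w2.T) * a1 * (1 - a1)
    dw1 = x.T @ e1
    db1 = e1.sum(axis=0, keepdims=True)
    return [dw1, db1, dw2, db2]


def numeric_grads(params, x, y, h=1e-6):
    # Central differences, perturbing one parameter entry at a time
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + h
            plus = summed_bce(params, x, y)
            p[idx] = old - h
            minus = summed_bce(params, x, y)
            p[idx] = old  # restore the original value
            g[idx] = (plus - minus) / (2 * h)
        grads.append(g)
    return grads


rng = np.random.default_rng(0)  # arbitrary seed for a repeatable check
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params = [rng.standard_normal((2, 4)), rng.standard_normal((1, 4)),
          rng.standard_normal((4, 1)), rng.standard_normal((1, 1))]

max_diff = max(np.abs(a - n).max()
               for a, n in zip(analytic_grads(params, x, y),
                               numeric_grads(params, x, y)))
print(f"largest analytic vs numeric difference: {max_diff:.2e}")
```

A very small difference indicates that the hand-derived gradients match the loss. Dividing the gradients by the batch size would instead give the gradient of the mean loss that `binary_crossentropy_loss` reports.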

In [30]:
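# Example data (XOR)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # inputs
y = np.array([[0], [1], [1], [0]])  # binary targets

# Build the network
input_size = X.shape[1]
hidden_size = 4

# The constructor's third parameter is learning_rate, not an output size
# (the output layer is fixed at one unit), so pass it by keyword.
# learning_rate=1 is the setting that produced the training log below.
nn = LogisticRegression(input_size, hidden_size, learning_rate=1)

# Train the network
nn.train(X, y, epochs=1000)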

# Test
print("Testing the network:")
for i in range(len(X)):
    output = nn.forward(X[i:i + 1])
    pred = 1 if output[0][0] > 0.5 else 0
    res = bool(y[i][0] == pred)  # compare the prediction with the target label
    print(f"Input: {X[i]}, Predicted Output: {output[0][0]:.4f}, Result: {res}")

Epoch 0/1000, Loss: 1.1382
Epoch 100/1000, Loss: 0.3747
Epoch 200/1000, Loss: 0.0287
Epoch 300/1000, Loss: 0.0115
Epoch 400/1000, Loss: 0.0070
Epoch 500/1000, Loss: 0.0050
Epoch 600/1000, Loss: 0.0038
Epoch 700/1000, Loss: 0.0031
Epoch 800/1000, Loss: 0.0026
Epoch 900/1000, Loss: 0.0023
Testing the network:
Input: [0 0], Predicted Output: 0.0004, Result: True
Input: [0 1], Predicted Output: 0.9987, Result: True
Input: [1 0], Predicted Output: 0.9974, Result: True
Input: [1 1], Predicted Output: 0.0036, Result: True
