## Logistic regression on multi-dimensional data

 $$ F(X) = XW $$
 $$ H(X) = \frac{1}{1+ e^{-F(X)}} $$
 $$ C = -\frac{1}{n} \sum_{i,j} \left( Y \odot \log H(X) + (1-Y) \odot \log(1-H(X)) \right)_{ij} + \frac{\lambda}{2} \lVert W \rVert_F^2 $$

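$X_{n \times k}$: feature matrix ($n$ samples, $k$ features)

$W_{k \times p}$: weight matrix ($k$ features, $p$ classes)

$Y_{n \times p}$: one-hot label matrix ($n$ samples, $p$ classes)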
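As a quick sanity check on the dimensions, the sketch below evaluates the forward pass and the unregularized cross-entropy on a tiny random problem. The sizes, the fixed seed, and the names `labels`, `logits` and `probs` are illustrative assumptions and do not appear in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, chosen only for illustration
n, k, p = 5, 4, 3                   # small hypothetical sizes

X = rng.random((n, k))              # features, shape (n, k)
W = rng.random((k, p))              # weights, shape (k, p)
labels = rng.integers(p, size=n)    # integer class labels, shape (n,)
Y = np.eye(p)[labels]               # one-hot targets, shape (n, p)

logits = X @ W                          # F(X), shape (n, p)
probs = 1.0 / (1.0 + np.exp(-logits))   # H(X), element-wise sigmoid

# Mean cross-entropy, summed over all entries (no regularization term here)
C = -np.sum(Y * np.log(probs) + (1 - Y) * np.log(1 - probs)) / n

print(logits.shape, probs.shape, C)
```

Every row of `Y` contains exactly one 1, so the sum runs over all $n \times p$ entries while the average is taken over the $n$ samples only.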

In [1]:
import numpy as np  # Import numpy for numerical computations
import random  # Import the standard-library random module

In [2]:
n, k, p=100, 8, 3  # Data dimensions: n=samples, k=features, p=classes

In [3]:
X=np.random.random([n,k])  # Generate random feature matrix with shape (n,k)
W=np.random.random([k,p])  # Initialize weight matrix with shape (k,p)

y=np.random.randint(p, size=n)  # Generate random integer labels with shape (n,)
Y=np.zeros((n,p))  # Allocate the one-hot label matrix with shape (n,p)
Y[np.arange(n), y]=1  # Convert labels to one-hot encoding

max_itr=5000  # Maximum number of iterations
alpha=0.01  # Learning rate
Lambda=0.01  # Regularization parameter

The gradient of the cost with respect to $W$, as implemented in `gradient`, is:
$$ \frac{1}{n} X^T (H(X)-Y) + \lambda W $$
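A finite-difference check is a simple way to confirm a gradient formula of this kind. The sketch below assumes the cost is the mean cross-entropy plus a penalty of $\frac{\lambda}{2} \lVert W \rVert_F^2$, for which the gradient is $\frac{1}{n} X^T (H(X)-Y) + \lambda W$. The function names, the problem sizes and the seed are hypothetical and exist only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, X, Y, lam):
    # Mean cross-entropy plus (lam/2) * squared Frobenius norm (assumed penalty form)
    n = X.shape[0]
    H = sigmoid(X @ W)
    data = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / n
    return data + 0.5 * lam * np.sum(W ** 2)

def analytic_grad(W, X, Y, lam):
    # Closed-form gradient of the cost above
    n = X.shape[0]
    return X.T @ (sigmoid(X @ W) - Y) / n + lam * W

def numeric_grad(W, X, Y, lam, eps=1e-6):
    # Central finite differences, perturbing one weight at a time
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        G[idx] = (cost(Wp, X, Y, lam) - cost(Wm, X, Y, lam)) / (2 * eps)
    return G

rng = np.random.default_rng(0)          # fixed seed, for illustration only
X = rng.random((20, 4))
W = rng.random((4, 3))
Y = np.eye(3)[rng.integers(3, size=20)]

max_diff = np.max(np.abs(analytic_grad(W, X, Y, 0.01) - numeric_grad(W, X, Y, 0.01)))
print(max_diff)
```

If the two gradients agree, `max_diff` stays close to floating-point noise. A visibly larger value usually points to a mismatch between the cost and its gradient, such as a missing $\frac{1}{n}$ factor or a different penalty scaling.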

In [4]:
# F(X) = X W  (linear transformation)
def F(X, W):
    return np.matmul(X,W)  # Compute linear combination X W, shape (n,p)

def H(F):
    return 1/(1+np.exp(-F))  # Element-wise sigmoid activation

def cost(Y_est, Y, W):
    # Mean cross-entropy loss plus L2 penalty (Lambda/2)*||W||_F^2 on the current weights
    E= - (1/n) * (np.sum(Y*np.log(Y_est) + (1-Y)*np.log(1-Y_est)))  + 0.5 * Lambda * np.sum(W**2)
    return E, np.sum(np.argmax(Y_est,1)==y)/n  # Return loss and accuracy

def gradient(Y_est, Y, X, W):
    # Gradient of the cost: data term + regularization term, using the current weights
    return (1/n) * np.matmul(X.T, (Y_est - Y) ) + Lambda * W

In [5]:
def fit(W, X, Y, alpha, max_itr):  # Training function
    for i in range(max_itr):  # Gradient-descent iterations
        
        F_x=F(X,W)  # Compute linear transformation
        Y_est=H(F_x)  # Apply sigmoid activation
        E, c= cost(Y_est, Y, W)  # Compute loss and accuracy
        Wg=gradient(Y_est, Y, X, W)  # Compute gradient
        W=W - alpha * Wg  # Update weights
        if i%1000==0:  # Print every 1000 iterations
            print(E, c)  # Print loss and accuracy
        
    return W, Y_est  # Return trained weights and the predictions from the last iteration
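The cross-entropy in `cost` calls `np.log` directly on the sigmoid output, which yields `-inf` or `nan` once a prediction saturates at exactly 0 or 1. One common remedy, sketched here as an assumption and not as part of the notebook, is to clip the probabilities before taking the logarithm. The name `safe_cross_entropy` and the threshold `eps` are hypothetical.

```python
import numpy as np

def safe_cross_entropy(Y_est, Y, eps=1e-12):
    # Clip probabilities into [eps, 1 - eps] so that log never receives 0
    P = np.clip(Y_est, eps, 1 - eps)
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P)) / Y.shape[0]

Y = np.array([[1.0, 0.0], [0.0, 1.0]])       # one-hot targets
Y_sat = np.array([[1.0, 0.0], [1.0, 0.0]])   # saturated outputs; second row is confidently wrong

loss = safe_cross_entropy(Y_sat, Y)
print(np.isfinite(loss), loss)
```

Clipping caps the per-entry penalty at `-log(eps)`, so a confidently wrong prediction still produces a large but finite loss.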

To account for the biases, we append a column of ones to X and add one row to W, so that the extra row of W holds one bias per class.

In [6]:
X=np.concatenate( (X, np.ones((n,1))), axis=1 )  # Add bias term: append a column of 1s to X
W=np.concatenate( (W, np.random.random((1,p)) ), axis=0 )  # Add bias weights: append a row of random weights to W

W, Y_est = fit(W, X, Y, alpha, max_itr)  # Train the model

4.9567403100766345 0.38
1.9038314476848 0.47
1.8697051475836861 0.47
1.8455193283395868 0.52
1.8281890508195582 0.51
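The bias trick used for training can be checked in isolation: multiplying the augmented matrices should give the same result as adding an explicit bias row to $XW$. The sketch below uses small hypothetical sizes, and the names `b`, `X_aug` and `W_aug` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed, for illustration only
n, k, p = 6, 4, 3                  # small hypothetical sizes

X = rng.random((n, k))
W = rng.random((k, p))
b = rng.random((1, p))             # explicit bias row, one entry per class

X_aug = np.concatenate((X, np.ones((n, 1))), axis=1)   # shape (n, k+1)
W_aug = np.concatenate((W, b), axis=0)                 # shape (k+1, p)

# The column of ones multiplies the last row of W_aug, which adds b to every sample
same = np.allclose(X_aug @ W_aug, X @ W + b)
print(same)
```

Folding the bias into W this way means the same L2 penalty that shrinks the weights also shrinks the bias row. Some implementations exclude the bias from the penalty.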
