## Linear regression in Python with multi-dimensional data

Linear regression with L2 regularization (ridge regression), where both the inputs and the outputs are multi-dimensional.

 
 
 $$ F(X)=X \times W $$
 $$ C=|| F(X) - Y ||_2^2 + \lambda ||W||_2^2$$

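Here $X_{n \times k}$ is the feature matrix ($n$ samples, $k$ features), $W_{k \times p}$ is the weight matrix, $Y_{n \times p}$ is the target matrix ($p$ outputs per sample), and $\lambda$ is the regularization strength.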

In [14]:
import numpy as np
import random

With the error matrix $E = F(X) - Y$, the gradient of $C$ with respect to $W$ is:
$$ \nabla_W C = 2 X^T E + 2 \lambda W $$
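A quick way to gain confidence in the gradient formula is to compare it with a finite-difference estimate. The sketch below assumes $E = F(X) - Y$ and the sum-of-squares cost $C$ defined earlier; the helper names (`cost`, `analytic_gradient`, `numerical_gradient`) and the small random test matrices are illustrative and not part of the notebook.

```python
import numpy as np

def cost(X, W, Y, lam):
    """Sum-of-squares ridge cost: ||XW - Y||^2 + lam * ||W||^2."""
    E = X @ W - Y
    return np.sum(E ** 2) + lam * np.sum(W ** 2)

def analytic_gradient(X, W, Y, lam):
    """Gradient from the formula: 2 X^T E + 2 lam W."""
    E = X @ W - Y
    return 2 * X.T @ E + 2 * lam * W

def numerical_gradient(X, W, Y, lam, eps=1e-6):
    """Central finite differences, perturbing one entry of W at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (cost(X, W_plus, Y, lam) - cost(X, W_minus, Y, lam)) / (2 * eps)
    return grad

# Small hypothetical problem, used only for the check
rng = np.random.default_rng(0)
X_chk = rng.random((20, 4))
W_chk = rng.random((4, 2))
Y_chk = rng.random((20, 2))
lam_chk = 0.01

max_diff = np.max(np.abs(
    analytic_gradient(X_chk, W_chk, Y_chk, lam_chk)
    - numerical_gradient(X_chk, W_chk, Y_chk, lam_chk)
))
print(f"largest absolute difference: {max_diff:.2e}")
```

Because the cost is quadratic in $W$, the central difference should agree with the formula up to floating-point rounding.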

In [15]:
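# Core linear regression functions, using a consistent variable naming style
def predict(X, weights):
    """
    Compute the linear regression predictions.
    Input: feature matrix X (n×k) and weight matrix weights (k×p)
    Output: prediction matrix (n×p)
    Formula: y_pred = X × weights
    """
    return np.matmul(X, weights)

# alpha plays the role of λ in the formulas above
# alpha = 0: no regularization, may overfit
# alpha > 0: regularized, trades off fit against model complexity
# alpha too large: underfits, the model is too simple
def ridge_loss(y_pred, y_true, weights, alpha):
    """
    Ridge regression loss.
    Input: predictions y_pred, targets y_true, weights, regularization parameter alpha
    Output: total loss value
    Formula: C = MSE + α∑w²
    Note: the data term here is the mean squared error, while compute_gradient
    differentiates the sum of squared errors, so this value is a progress
    monitor rather than the exact objective being descended.
    """
    mse = np.mean((y_pred - y_true) ** 2)  # mean squared error
    l2_penalty = alpha * np.sum(weights ** 2)  # L2 regularization penalty
    return mse + l2_penalty

def compute_gradient(X, y_pred, y_true, weights, alpha):
    """
    Compute the gradient used for the weight update.
    Input: feature matrix X, predictions y_pred, targets y_true, weights, regularization parameter alpha
    Output: gradient matrix (same shape as weights)
    Formula: ∇weights = Xᵀ × 2(y_pred - y_true) + α × 2weights
    """
    error = y_pred - y_true  # prediction error E
    data_gradient = 2 * np.matmul(X.T, error)  # gradient of the data-fit term
    regularization_gradient = alpha * 2 * weights  # gradient of the L2 penalty
    return data_gradient + regularization_gradient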

In [16]:
def fit(X, y_true, weights, learning_rate, alpha, max_iterations):
    """
    Gradient descent training loop.
    Input: feature matrix X, targets y_true, initial weights, learning rate learning_rate,
           regularization parameter alpha, maximum number of iterations max_iterations
    Output: the trained weights
    Purpose: optimize the ridge regression model with gradient descent
    """
    for i in range(max_iterations):

        y_pred = predict(X, weights)  # compute predictions
        loss = ridge_loss(y_pred, y_true, weights, alpha)  # compute the loss
        gradient = compute_gradient(X, y_pred, y_true, weights, alpha)  # compute the gradient
        weights = weights - learning_rate * gradient  # update the weights

        if i % 100 == 0:
            print(f"Iteration {i}: loss = {loss:.6f}")  # report the loss every 100 iterations

    return weights

To account for the biases, we append a column of ones to X and add one row to W; that extra row holds the bias of each output.
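The following minimal sketch illustrates why appending a column of ones works: the augmented product reproduces `X @ W + b` exactly, with the bias vector stored as the last row of the weight matrix. The names `X_raw`, `W_raw`, `b`, `X_aug`, and `W_aug` are hypothetical and chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 5, 3, 2

X_raw = rng.random((n, k))   # features without a bias column
W_raw = rng.random((k, p))   # weights without a bias row
b = rng.random((1, p))       # one bias per output

# Augment: ones column on X, bias row on W
X_aug = np.concatenate((X_raw, np.ones((n, 1))), axis=1)  # shape (n, k+1)
W_aug = np.concatenate((W_raw, b), axis=0)                # shape (k+1, p)

explicit = X_raw @ W_raw + b   # affine model with a separate bias
augmented = X_aug @ W_aug      # same model as a single matrix product

print(np.allclose(explicit, augmented))  # prints True
```

One consequence of this design is that the bias row is treated like any other row of the weight matrix, so an L2 penalty computed over the whole matrix also shrinks the biases.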

In [17]:
# Set the data dimensions
n_samples, n_features, n_outputs = 100, 8, 3  # number of samples, features, outputs

# Generate random data
X = np.random.random([n_samples, n_features])  # feature matrix (100×8)
weights = np.random.random([n_features, n_outputs])  # initial weight matrix (8×3)
y_true = np.random.random([n_samples, n_outputs])  # target matrix (100×3)

# Set the training parameters
max_iterations = 1000  # maximum number of iterations
learning_rate = 0.0001  # learning rate
alpha = 0.01  # L2 regularization parameter

# Add the bias term
X = np.concatenate((X, np.ones((n_samples, 1))), axis=1)  # append a column of ones to X
weights = np.concatenate((weights, np.random.random((1, n_outputs))), axis=0)  # append a bias row to weights

# Train the model
weights = fit(X, y_true, weights, learning_rate, alpha, max_iterations)

Iteration 0: loss = 5.260350
Iteration 100: loss = 0.168555
Iteration 200: loss = 0.148338
Iteration 300: loss = 0.133698
Iteration 400: loss = 0.122893
Iteration 500: loss = 0.114794
Iteration 600: loss = 0.108635
Iteration 700: loss = 0.103885
Iteration 800: loss = 0.100176
Iteration 900: loss = 0.097245
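The loss above is still decreasing at iteration 900, so the gradient descent run has probably not fully converged. As a reference point, setting the gradient $2 X^T (XW - Y) + 2 \lambda W$ to zero gives the linear system $(X^T X + \lambda I) W = X^T Y$, which can be solved directly. The sketch below is not part of the original notebook; it uses its own random data and hypothetical names (`X_cf`, `Y_cf`, `W_closed`) and simply checks that the gradient vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 100, 9, 3            # 8 features plus a ones column, 3 outputs
lam = 0.01                     # plays the role of alpha / λ

X_cf = rng.random((n, k))
X_cf[:, -1] = 1.0              # last column acts as the bias column
Y_cf = rng.random((n, p))

# Solve (X^T X + lam * I) W = X^T Y without forming an explicit inverse
W_closed = np.linalg.solve(X_cf.T @ X_cf + lam * np.eye(k), X_cf.T @ Y_cf)

# The gradient of the sum-of-squares ridge cost should be ~0 here
grad_at_solution = 2 * X_cf.T @ (X_cf @ W_closed - Y_cf) + 2 * lam * W_closed
grad_norm = np.max(np.abs(grad_at_solution))
print(f"largest gradient entry at the closed-form solution: {grad_norm:.2e}")
```

Comparing the weights returned by `fit` with such a direct solution on the same data is one way to judge how many iterations, or how large a learning rate, gradient descent needs.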
