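## Multiple Linear Regression

Multiple linear regression is a statistical technique that predicts the value of a dependent variable from several independent variables.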

In [1]:
import numpy
import matplotlib.pyplot
import pandas

# Apply the plot style defined in the local file matplotlib.mplstyle
matplotlib.pyplot.style.use("matplotlib.mplstyle")

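### Converting the data to a matrix

The matrix $ X $ has the following form, where each of the $ m $ rows is one sample and each of the $ n $ columns is one feature: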

$$
X=\begin{pmatrix}
    x_0^{(0)}&x_1^{(0)}&\cdots&x_{n-1}^{(0)} \\
    x_0^{(1)}&x_1^{(1)}&\cdots&x_{n-1}^{(1)} \\
    \vdots&\vdots&\ddots&\vdots \\
    x_0^{(m-1)}&x_1^{(m-1)}&\cdots&x_{n-1}^{(m-1)}
\end{pmatrix}
$$

In [2]:
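data = pandas.read_csv("data/202501061035.csv")

# Feature matrix: one row per sample, columns are size and number of rooms
X = data[["size", "rooms"]].to_numpy()
# Target vector: price expressed in thousands
Y = (data["price"] / 1000).to_numpy()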

Define the weight vector $ W $, with one weight per feature, as follows:

$$
W=\begin{pmatrix}
    w_0 \\
    w_1 \\
    \vdots \\
    w_{n-1}
\end{pmatrix}
$$
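
To make the shapes concrete, here is a minimal sketch of how $ X $, $ W $ and a bias $ B $ combine into one prediction per sample. The values in `X_toy`, `W_toy` and `B_toy` are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
import numpy

# Hypothetical toy data: m = 3 samples, n = 2 features (size, rooms)
X_toy = numpy.array([[1000.0, 2.0],
                     [1500.0, 3.0],
                     [2000.0, 4.0]])
# Hypothetical weights (one per feature) and bias, for illustration only
W_toy = numpy.array([0.5, 10.0])
B_toy = 5.0

# One prediction per row: f(X^(i)) = W . X^(i) + B
predictions = numpy.dot(X_toy, W_toy) + B_toy

print(X_toy.shape, W_toy.shape, predictions.shape)  # (3, 2) (2,) (3,)
print(predictions)
```

Because `W_toy` is a one-dimensional array of length $ n $, `numpy.dot` returns a one-dimensional array of length $ m $, which matches the shape of the target vector.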

### Formulas for multiple linear regression

|Name|Definition|
|-|-|
|Multiple linear function|$ f_{W,B}(X^{(i)})=W\cdot{X^{(i)}}+B $|
|Mean squared error (halved)|$ MSE(W,B)=\frac{1}{2m}\sum_{i=0}^{m-1}(f_{W,B}(X^{(i)}) - Y^{(i)})^2 $|
|Gradient descent update|$ W=W-\alpha\cdot\frac{\partial{MSE(W,B)}}{\partial{W}},\quad B=B-\alpha\cdot\frac{\partial{MSE(W,B)}}{\partial{B}} $|
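
For reference, the partial derivatives in the update rule can be worked out from the squared-error cost. Writing $ m $ for the number of samples (rows of $ X $) and $ Y^{(i)} $ for the observed target of sample $ i $, the factor 2 from differentiating the square cancels the $ \frac{1}{2} $ in the cost:

$$
\frac{\partial MSE(W,B)}{\partial w_j}=\frac{1}{m}\sum_{i=0}^{m-1}\left(f_{W,B}(X^{(i)})-Y^{(i)}\right)x_j^{(i)},\qquad
\frac{\partial MSE(W,B)}{\partial B}=\frac{1}{m}\sum_{i=0}^{m-1}\left(f_{W,B}(X^{(i)})-Y^{(i)}\right)
$$

Stacking the $ n $ components gives the matrix form $ \frac{\partial MSE}{\partial W}=\frac{1}{m}X^{T}(XW+B-Y) $.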

In [3]:
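import math

def compute_mse(X, Y, w, b):
    """Return the halved mean squared error of the linear model on (X, Y)."""
    m = X.shape[0]                       # number of samples (rows of X)
    f_wb = numpy.dot(X, w) + b           # predictions, shape (m,)
    mse = numpy.sum((f_wb - Y) ** 2) / (2 * m)
    return mse

def compute_gradient(X, Y, w, b):
    """Return the partial derivatives of the cost with respect to w and b."""
    m = X.shape[0]
    f_wb = numpy.dot(X, w) + b
    dw = numpy.dot(X.T, (f_wb - Y)) / m  # one entry per feature
    db = numpy.sum(f_wb - Y) / m
    return dw, db

def compute_gradient_descent(X, Y, alpha, iters):
    """Run batch gradient descent from zero weights, logging the cost about 20 times."""
    tw, tb = numpy.zeros(X.shape[1]), 0.0
    history = []
    log_every = math.ceil(iters / 20)
    for i in range(iters):
        dw, db = compute_gradient(X, Y, tw, tb)
        # Step against the gradient, scaled by the learning rate alpha
        tw = tw - alpha * dw
        tb = tb - alpha * db
        if i % log_every == 0:
            mse = compute_mse(X, Y, tw, tb)
            history.append([i, mse])
            print(f"mse:{mse:8.2f}")
    return tw, tb, numpy.array(history)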
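
One way to gain confidence in an analytic gradient is to compare it with a finite-difference estimate. The sketch below restates the cost and gradient functions so that it runs on its own; `numerical_gradient` is a hypothetical helper and the random data are made up for the check.

```python
import numpy

def compute_mse(X, Y, w, b):
    m = X.shape[0]
    f_wb = numpy.dot(X, w) + b
    return numpy.sum((f_wb - Y) ** 2) / (2 * m)

def compute_gradient(X, Y, w, b):
    m = X.shape[0]
    err = numpy.dot(X, w) + b - Y
    return numpy.dot(X.T, err) / m, numpy.sum(err) / m

def numerical_gradient(X, Y, w, b, eps=1e-6):
    """Central finite-difference estimate of the gradient (hypothetical helper)."""
    dw = numpy.zeros_like(w)
    for j in range(w.size):
        step = numpy.zeros_like(w)
        step[j] = eps
        dw[j] = (compute_mse(X, Y, w + step, b)
                 - compute_mse(X, Y, w - step, b)) / (2 * eps)
    db = (compute_mse(X, Y, w, b + eps)
          - compute_mse(X, Y, w, b - eps)) / (2 * eps)
    return dw, db

# Made-up data and parameters, used only for the check
rng = numpy.random.default_rng(0)
X_chk = rng.normal(size=(20, 2))
Y_chk = rng.normal(size=20)
w_chk = numpy.array([0.3, -0.7])
b_chk = 0.5

dw_a, db_a = compute_gradient(X_chk, Y_chk, w_chk, b_chk)
dw_n, db_n = numerical_gradient(X_chk, Y_chk, w_chk, b_chk)

print("dw matches:", numpy.allclose(dw_a, dw_n, atol=1e-6))
print("db matches:", numpy.isclose(db_a, db_n, atol=1e-6))
```

Since the cost is quadratic in `w` and `b`, central differences agree with the analytic gradient up to rounding error, so a mismatch would point to a bug in one of the two functions.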

In [None]:
fw, fb, history = compute_gradient_descent(X, Y, 5.0e-10, 4000)

print(f"w: {fw}, b: {fb}")
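
As a sanity check, a gradient-descent result can be compared with a direct least-squares fit. The sketch below swaps in a different technique, `numpy.linalg.lstsq`, and runs on hypothetical noise-free synthetic data rather than the CSV above; the names `X_syn`, `Y_syn`, `w_ls` and `b_ls` and the generating coefficients are assumptions made for illustration.

```python
import numpy

rng = numpy.random.default_rng(0)

# Hypothetical synthetic data: 50 samples with size and room count as features
size = rng.uniform(500, 3000, size=50)
rooms = rng.integers(1, 6, size=50).astype(float)
X_syn = numpy.column_stack([size, rooms])
# Noise-free target built from assumed coefficients 0.1, 20.0 and bias 50.0
Y_syn = 0.1 * size + 20.0 * rooms + 50.0

# Append a column of ones so the last coefficient plays the role of the bias
A = numpy.column_stack([X_syn, numpy.ones(X_syn.shape[0])])
coef, *_ = numpy.linalg.lstsq(A, Y_syn, rcond=None)
w_ls, b_ls = coef[:2], coef[2]

print(f"w: {w_ls}, b: {b_ls:.2f}")
```

If weights obtained by gradient descent on the same data differ noticeably from such a fit, the run may not have converged yet, and more iterations or a different learning rate could be worth trying.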

In [None]:
matplotlib.pyplot.plot(history[:, 0], history[:, 1], linestyle='-', color='b', label='MSE Line')
matplotlib.pyplot.scatter(history[:, 0], history[:, 1], marker="x", c="r")
matplotlib.pyplot.title("Gradient Descent MSE History")
matplotlib.pyplot.ylabel("MSE")
matplotlib.pyplot.xlabel("Iteration")
matplotlib.pyplot.show()

In [None]:
sample = [2132, 4]
price = numpy.dot(sample, fw) + fb
print(f"Price for {sample[0]} sqft and {sample[1]} rooms: {price:.2f}k")