# Machine Learning Lab 3

## Understanding and Describing the BP Algorithm

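The BP algorithm, short for backpropagation, is a widely used learning algorithm for training artificial neural networks. Its core goal is to **minimize the loss function by optimizing the network's weights**, so that the model's predictions are as close as possible to the true values.

Its basic steps are: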

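1. **Forward propagation**: Input data flows through the network from the input layer, through the hidden layer(s), to the output layer. Each neuron receives the previous layer's outputs, computes their weighted sum plus a bias, and applies an activation function to produce its own output.

2. **Loss computation**: At the output layer, the loss is computed from the network's output and the actual target values (labels).

3. **Backpropagation**: This is the core of the algorithm; its goal is to **compute the gradient of the loss function with respect to every weight in the network**. The computation starts at the output layer and moves backward through each layer until it reaches the input layer. The gradient points in the direction in which the loss increases fastest, so moving the weights a small step in the opposite direction reduces the loss. The gradient for each layer is computed with the **chain rule**.

4. **Weight update**: Once the gradients are available, they are used to update the network's weights. This is usually done with gradient descent or another optimizer (such as Adam or RMSprop). For plain gradient descent the update rule is $W = W - \eta \cdot \frac{\partial L}{\partial W}$, where $\eta$ is the learning rate, $L$ is the loss function, and $W$ is the weight being updated.

5. **Iterative optimization**: Repeat the steps above (forward propagation, loss computation, backpropagation, weight update) until the model performs well enough or a set number of iterations is reached.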
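
The five steps above can be sketched on the smallest possible case: a single sigmoid neuron trained with plain gradient descent. The toy data, the learning rate of 0.5, and the 200 iterations are assumptions chosen for illustration only; they are not part of the experiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: 4 samples with 1 feature; the label is 1 when x > 0.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b = 0.0, 0.0   # parameters to learn
eta = 0.5         # learning rate (assumed value)
losses = []

for step in range(200):
    # 1. Forward propagation
    a = sigmoid(w * x + b)
    # 2. Loss: mean binary cross-entropy
    loss = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
    losses.append(loss)
    # 3. Backpropagation via the chain rule: dL/dz = a - y for sigmoid + cross-entropy
    dz = a - y
    dw = np.mean(dz * x)
    db = np.mean(dz)
    # 4. Weight update (step 5 is the loop itself)
    w -= eta * dw
    b -= eta * db

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

With both parameters starting at zero, the neuron outputs 0.5 for every sample, so the first loss equals $\ln 2$; each update then moves `w` against the gradient and the loss shrinks.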

## BP Algorithm Design

In [1]:
import numpy as np

# Activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # `a` is the sigmoid output sigmoid(z), not the pre-activation z
    return a * (1 - a)

# Initialize parameters: small random weights, zero biases
def initialize_parameters(input_size, hidden_size, output_size):
    params = {
        "W1": np.random.randn(hidden_size, input_size) * 0.1,
        "b1": np.zeros((hidden_size, 1)),
        "W2": np.random.randn(output_size, hidden_size) * 0.1,
        "b2": np.zeros((output_size, 1))
    }
    return params

# Forward propagation (X has shape (input_size, m): one column per sample)
def forward_propagation(X, params):
    Z1 = np.dot(params["W1"], X) + params["b1"]
    A1 = sigmoid(Z1)
    Z2 = np.dot(params["W2"], A1) + params["b2"]
    A2 = sigmoid(Z2)
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

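# Compute the loss: binary cross-entropy averaged over the m samples
def compute_loss(Y, A2):
    m = Y.shape[1]
    A2 = np.clip(A2, 1e-12, 1 - 1e-12)  # avoid log(0) when the sigmoid saturates
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return cost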

# Backpropagation
def backward_propagation(params, cache, X, Y):
    m = X.shape[1]
    # For a sigmoid output with cross-entropy loss, dL/dZ2 simplifies to A2 - Y
    dZ2 = cache["A2"] - Y
    dW2 = np.dot(dZ2, cache["A1"].T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dA1 = np.dot(params["W2"].T, dZ2)
    dZ1 = dA1 * sigmoid_derivative(cache["A1"])
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    return grads

# Update parameters with one gradient-descent step
def update_parameters(params, grads, learning_rate):
    params["W1"] -= learning_rate * grads["dW1"]
    params["b1"] -= learning_rate * grads["db1"]
    params["W2"] -= learning_rate * grads["dW2"]
    params["b2"] -= learning_rate * grads["db2"]
    return params

# Train the model
def model(X, Y, hidden_size, learning_rate, num_iterations):
    input_size = X.shape[0]
    output_size = Y.shape[0]
    params = initialize_parameters(input_size, hidden_size, output_size)
    
    for i in range(num_iterations):
        A2, cache = forward_propagation(X, params)
        cost = compute_loss(Y, A2)
        grads = backward_propagation(params, cache, X, Y)
        params = update_parameters(params, grads, learning_rate)
        if i % 1000 == 0:
            print(f"Iteration {i}: Cost {cost:.4f}")
    
    return params
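
One common way to gain confidence in hand-written backpropagation formulas like those above is a numerical gradient check. The sketch below re-implements the same two-layer sigmoid network in compact form (the helper names `loss_fn` and `analytic_dW1`, the random data, and the step size `eps` are assumptions made for this illustration) and compares the analytic gradient of `W1` against a central finite difference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(W1, b1, W2, b2, X, Y):
    # Forward pass followed by mean binary cross-entropy
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / X.shape[1]

def analytic_dW1(W1, b1, W2, b2, X, Y):
    # Same chain-rule formulas as backward_propagation
    m = X.shape[1]
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    dZ2 = A2 - Y
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    return dZ1 @ X.T / m

# Hypothetical random problem: 3 features, 5 samples, 4 hidden units, 1 output
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Y = (rng.random(size=(1, 5)) > 0.5).astype(float)
W1 = rng.normal(size=(4, 3)) * 0.1
b1 = np.zeros((4, 1))
W2 = rng.normal(size=(1, 4)) * 0.1
b2 = np.zeros((1, 1))

# Central finite difference for every entry of W1
eps = 1e-6
numeric = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (loss_fn(Wp, b1, W2, b2, X, Y)
                         - loss_fn(Wm, b1, W2, b2, X, Y)) / (2 * eps)

max_diff = np.max(np.abs(numeric - analytic_dW1(W1, b1, W2, b2, X, Y)))
print(f"max |numeric - analytic| = {max_diff:.2e}")
```

A very small difference suggests that the `dZ2 = A2 - Y` shortcut and the hidden-layer chain-rule term are consistent with the loss being minimized.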

## Obtaining the Datasets

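1. Iris dataset: 150 samples in 3 classes, 50 samples per class. Each sample has 4 features: the length and width of the sepal and of the petal.
2. Wine dataset: 178 samples in 3 classes, representing three different Italian wines. It has 13 features derived from a chemical analysis of the wines, such as alcohol content and malic acid content.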
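
The report does not say how the CSV files used below were produced. One plausible route, shown here purely as an assumption, is scikit-learn's bundled copies of both datasets, whose column names match the tables printed later. The commented-out export paths are hypothetical.

```python
from sklearn.datasets import load_iris, load_wine

# as_frame=True returns pandas objects; `.frame` holds the features plus a "target" column.
iris_df = load_iris(as_frame=True).frame
wine_df = load_wine(as_frame=True).frame

print(iris_df.shape)   # (150, 5): 4 features + target
print(wine_df.shape)   # (178, 14): 13 features + target
print(iris_df["target"].value_counts().sort_index().tolist())  # [50, 50, 50]

# Hypothetical export step (assumes a local "data/" directory exists):
# iris_df.to_csv("data/iris.csv", index=False)
# wine_df.to_csv("data/wine.csv", index=False)
```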

## Programming Practice: Applying the Algorithm to the Classification Datasets

### First, the Iris Dataset

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
import warnings

warnings.filterwarnings("ignore")

iris = pd.read_csv("data/iris.csv")
print(iris.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0       0  
1       0  
2       0  
3       0  
4       0  


In [4]:
X = iris.iloc[:, :-1].values
y = iris.iloc[:, -1].values

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42, shuffle=True)

model = Sequential(
    [
        Dense(10, activation='relu', input_shape=(4,)),
        Dense(3, activation='softmax')
    ]
)

model.compile(optimizer=Adam(learning_rate=0.01), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=100, batch_size=10, verbose=1)

loss, accuracy = model.evaluate(X_test, y_test)

print(f"Loss: {loss:.4f}, Accuracy: {accuracy:.4f}")

Epoch 1/100
...
Epoch 77/100
Epoch 78

### Result Analysis

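After preprocessing the data, I built a simple neural network with one hidden layer, trained it with the `Adam` optimizer, and applied it to the Iris dataset, where it performed very well.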

On the held-out test split of this dataset, the model reached a prediction accuracy of $100\%$ with a loss of $0.0177$.
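
As an illustration of what these two numbers measure, the sketch below computes accuracy and sparse categorical cross-entropy by hand from softmax outputs. The probabilities and labels are made up for the example; they are not the model's actual predictions.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes (each row sums to 1).
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
])
y_true = np.array([0, 1, 2, 1])  # integer class labels, as used by the sparse loss

# Accuracy: the predicted class is the one with the highest probability.
y_pred = np.argmax(probs, axis=1)
accuracy = np.mean(y_pred == y_true)

# Sparse categorical cross-entropy: mean negative log-probability of the true class.
loss = -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

print(y_pred, accuracy, round(loss, 4))
```

In this toy case the last sample is misclassified, so accuracy is 0.75. The loss also penalizes correct but unconfident predictions, which is why a model can reach 100% accuracy and still report a nonzero loss.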

### Next, the Wine Dataset

In [5]:
wine = pd.read_csv("data/wine.csv")
print(wine.head())

   alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  \
0    14.23        1.71  2.43               15.6      127.0           2.80   
1    13.20        1.78  2.14               11.2      100.0           2.65   
2    13.16        2.36  2.67               18.6      101.0           2.80   
3    14.37        1.95  2.50               16.8      113.0           3.85   
4    13.24        2.59  2.87               21.0      118.0           2.80   

   flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  \
0        3.06                  0.28             2.29             5.64  1.04   
1        2.76                  0.26             1.28             4.38  1.05   
2        3.24                  0.30             2.81             5.68  1.03   
3        3.49                  0.24             2.18             7.80  0.86   
4        2.69                  0.39             1.82             4.32  1.04   

   od280/od315_of_diluted_wines  proline  target  
0          

In [6]:
X = wine.iloc[:, :-1].values
y = wine.iloc[:, -1].values

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42, shuffle=True)

model = Sequential(
    [
        Dense(20, activation='relu', input_shape=(13,)),
        Dense(3, activation='softmax')
    ]
)

model.compile(optimizer=Adam(learning_rate=0.01), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=100, batch_size=20, verbose=1)

loss, accuracy = model.evaluate(X_test, y_test)

print(f"Loss: {loss:.4f}, Accuracy: {accuracy:.4f}")

Epoch 1/100
...
Epoch 77/100
Epoch 78

### Result Analysis

The same training procedure, applied to the Wine dataset, again performed very well.

On the held-out test split of this dataset, the model reached a prediction accuracy of $98.15\%$ with a loss of $0.0280$.