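# Two-Layer Fully Connected Neural Network
- 1000 input nodes, 100 hidden units, 10 output nodes
- Only the weights W are considered; the biases b are omitted
- Fully connected network with ReLU activation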

---

- $h = XW_1$
- $h_{relu} = \max(0, h)$
- $y_{pred} = h_{relu}W_2$
- $f = \| y - y_{pred} \|^2_F$
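
The forward pass above can be sketched in NumPy to make the matrix shapes concrete. This is a minimal illustration, assuming each row of `X` is one sample (so the products are written `X @ W1` and `h_relu @ W2`) and a batch of 64 random samples; the seed and variable names are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
batch, n_in, n_hidden, n_out = 64, 1000, 100, 10

X = rng.standard_normal((batch, n_in))       # inputs, one sample per row
y = rng.standard_normal((batch, n_out))      # targets
W1 = rng.standard_normal((n_in, n_hidden))   # first-layer weights
W2 = rng.standard_normal((n_hidden, n_out))  # second-layer weights

h = X @ W1                     # pre-activation, shape (batch, n_hidden)
h_relu = np.maximum(h, 0)      # element-wise ReLU
y_pred = h_relu @ W2           # prediction, shape (batch, n_out)
f = np.sum((y - y_pred) ** 2)  # squared Frobenius norm of the residual

print(h.shape, h_relu.shape, y_pred.shape)  # (64, 100) (64, 100) (64, 10)
```

The sum of squared entries equals the squared Frobenius norm, which is why no explicit call to a norm function is needed.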

---
- $\frac{\partial f}{\partial y_{pred}} = 2(y_{pred} - y)$
- $\frac{\partial f}{\partial h_{relu}} = \frac{\partial f}{\partial y_{pred}}W_2^T$
- $\frac{\partial f}{\partial W_2} = h_{relu}^T \frac{\partial f}{\partial y_{pred}}$
- $\frac{\partial f}{\partial h} = \frac{\partial f}{\partial h_{relu}} \odot \mathbb{1}[h > 0]$
- $\frac{\partial f}{\partial W_1} = X^T\frac{\partial f}{\partial h}$
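
One way to sanity-check the backward-pass formulas is to compare them against central finite differences on a tiny network. The sketch below is not part of the original notebook: the helper names `loss_and_grads` and `numerical_grad`, the small layer sizes, and the step size `eps` are all assumptions made for illustration. It computes the gradients with respect to the first-layer and second-layer weights and reports the largest absolute deviation from the numerical estimate.

```python
import numpy as np

def loss_and_grads(X, y, W1, W2):
    """Forward pass plus the analytic gradients w.r.t. W1 and W2."""
    h = X @ W1
    h_relu = np.maximum(h, 0)
    y_pred = h_relu @ W2
    loss = np.sum((y_pred - y) ** 2)

    grad_y_pred = 2 * (y_pred - y)
    grad_W2 = h_relu.T @ grad_y_pred
    grad_h_relu = grad_y_pred @ W2.T
    grad_h = grad_h_relu * (h > 0)  # ReLU passes gradient only where h > 0
    grad_W1 = X.T @ grad_h
    return loss, grad_W1, grad_W2

def numerical_grad(f, W, eps=1e-6):
    """Central-difference estimate of df/dW; perturbs W in place and restores it."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        f_plus = f()
        W[idx] = old - eps
        f_minus = f()
        W[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5))   # tiny sizes keep the element-wise loop cheap
y = rng.standard_normal((4, 2))
W1 = rng.standard_normal((5, 3))
W2 = rng.standard_normal((3, 2))

_, grad_W1, grad_W2 = loss_and_grads(X, y, W1, W2)
num_W1 = numerical_grad(lambda: loss_and_grads(X, y, W1, W2)[0], W1)
num_W2 = numerical_grad(lambda: loss_and_grads(X, y, W1, W2)[0], W2)

err_W1 = np.max(np.abs(grad_W1 - num_W1))
err_W2 = np.max(np.abs(grad_W2 - num_W2))
print("max abs error W1:", err_W1)
print("max abs error W2:", err_W2)
```

Both errors should be very small in double precision. A large discrepancy would usually point to a transposed factor or a wrong ReLU mask, provided no pre-activation sits so close to zero that the perturbation crosses the ReLU kink.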

In [1]:
import numpy as np
import torch
import matplotlib.pyplot as plt

In [2]:
BATCH = 64
EPOCH = 500
N_in, N_hidden, N_out = 1000, 100, 10
LR = 1e-6

## 1. Computing Gradients Manually with NumPy

In [7]:
x = np.random.randn(BATCH, N_in)
y = np.random.randn(BATCH, N_out)

w1 = np.random.randn(N_in, N_hidden)
w2 = np.random.randn(N_hidden, N_out)

for it in range(EPOCH):
    # forward pass
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # compute loss
    loss = np.square(y_pred - y).sum()
    if it % 50 == 0: print("Epoch {}: Loss {}".format(it, loss))

    # backward pass
    grad_y_pred = 2 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # update weights
    w1 -= LR * grad_w1
    w2 -= LR * grad_w2

Epoch 0: Loss 35408475.5708653
Epoch 50: Loss 13210.723218441559
Epoch 100: Loss 424.07916714308783
Epoch 150: Loss 24.458366132394872
Epoch 200: Loss 1.7288604054217553
Epoch 250: Loss 0.13823869620770285
Epoch 300: Loss 0.012206723331863328
Epoch 350: Loss 0.0011717242860535743
Epoch 400: Loss 0.0001203650115225353
Epoch 450: Loss 1.3028675705281583e-05
