# Stage 1: Automatically compute derivatives

In [1]:
import numpy as np

## Step 1: Variables as boxes

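Think of a variable as a "box":
- The "box" and the data are different things.
- Data can be stored in the "box" (= assignment).
- Looking into the "box" shows what the data is (= reference).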

In [1]:
class Variable:
    def __init__(self, data):
        self.data = data

In [4]:
data = np.array(1.0)
x = Variable(data)
print(x.data) # 1.0

x.data = np.array(2.0)
print(x.data) # 2.0

1.0
2.0


## Step 2: Function to create a variable

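A function is a rule that defines the correspondence between one variable and another.
- Methods implemented in the Function class take a Variable instance as input and return a Variable instance as output.
- The actual data of a Variable instance lives in its instance variable `data`.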

In [7]:
# Base class for functions
class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)
        output = Variable(y)
        return output

    def forward(self, x):
        raise NotImplementedError()

In [8]:
class Square(Function):
    def forward(self, x):
        return x ** 2

In [9]:
x = Variable(np.array(10))
f = Square()
y = f(x)
print(type(y)) # <class '__main__.Variable'>
print(y.data) # 100

<class '__main__.Variable'>
100


## Step 3: Connecting Functions

Calling functions one after another can be viewed as a single large function (a composite function).  
Computational graph:  
x -> A -> a -> B -> b -> C -> y

In [10]:
class Exp(Function):
    def forward(self, x):
        return np.exp(x)

In [11]:
A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)
print(y.data) # 1.648721270700128

1.648721270700128


## Step 4: Numerical Differentiation

$$ \frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} $$

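Setting $h$ to a very small number gives an approximation of the derivative (numerical differentiation).  
The central difference has a smaller error: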
$$ \frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x-h)}{2h} $$

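Because of "loss of precision" (cancellation when subtracting nearly equal values), results of numerical differentiation tend to contain error. More seriously, it is computationally expensive.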
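To make the accuracy claim concrete, here is a minimal sketch comparing the forward and central differences on `np.exp`, whose exact derivative is known. The helper names `forward_diff` and `central_diff` are illustrative only and operate on plain floats rather than on the `Variable` class used in this notebook.

```python
import numpy as np

def forward_diff(f, x, h=1e-4):
    # Forward difference: (f(x + h) - f(x)) / h, truncation error of order h
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-4):
    # Central difference: (f(x + h) - f(x - h)) / (2h), truncation error of order h^2
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
exact = np.exp(x)  # d/dx exp(x) = exp(x)

err_forward = abs(forward_diff(np.exp, x) - exact)
err_central = abs(central_diff(np.exp, x) - exact)
print(err_forward, err_central)  # the central error should be several orders of magnitude smaller
```

Both helpers still call `f` twice per input, so for a function of many inputs the cost grows with the number of inputs, which is the expense the note above refers to.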

In [12]:
def numerical_diff(f, x, eps=1e-4):
    x0 = Variable(x.data - eps)
    x1 = Variable(x.data + eps)
    y0 = f(x0)
    y1 = f(x1)
    return (y1.data - y0.data) / (2 * eps)

In [13]:
f = Square()
x = Variable(np.array(2.0))
dy = numerical_diff(f, x)
print(dy) # 4.000000000004

4.000000000004


In [14]:
# Composite function
def f(x):
    A = Square()
    B = Exp()
    C = Square()
    return C(B(A(x)))

x = Variable(np.array(0.5))
dy = numerical_diff(f, x)
print(dy) # 3.2974426293330694

3.2974426293330694


## Step 5: Theory of Backpropagation

Chain rule:
$\frac{dy}{dx}=\frac{dy}{dy} \frac{dy}{db} \frac{db}{da} \frac{da}{dx}$

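The derivatives of y with respect to each variable can be obtained by backpropagation through the computational graph (a single pass from output toward input yields all of them).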
<center>
<table>
  <tr>
    <td><img src="res/计算图.png" width="400"/></td>
    <td><img src="res/计算图2.png" width="400"/></td>
  </tr>
</table>
</center>

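As the figures above show, forward propagation and backpropagation correspond clearly to each other: a variable can be regarded as having a regular value and a derivative value, and a function as having a regular computation (forward propagation) and a derivative computation (backpropagation).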

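Backpropagation needs the data used during forward propagation, so forward propagation must run first and store the input values of each function; only then can backpropagation run.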
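The chain rule and the need to keep forward values can be sketched with plain NumPy, before any classes are involved. This is only an illustration for the graph x -> A -> a -> B -> b -> C -> y with A and C as squares and B as exp; the names `gy`, `gb`, `ga`, `gx` and `analytic` are chosen here for the example.

```python
import numpy as np

x = 0.5

# Forward pass: keep every intermediate value, the backward pass needs them
a = x ** 2       # A: square
b = np.exp(a)    # B: exp
y = b ** 2       # C: square

# Backward pass: start from dy/dy = 1 and multiply local derivatives,
# moving from the output toward the input
gy = 1.0
gb = 2 * b * gy        # dy/db, uses b from the forward pass
ga = np.exp(a) * gb    # dy/da, uses a from the forward pass
gx = 2 * x * ga        # dy/dx, uses x from the forward pass

# Closed form for comparison: y = exp(2 x^2), so dy/dx = 4 x exp(2 x^2)
analytic = 4 * x * np.exp(2 * x ** 2)
print(gx, analytic)
```

Each backward line reuses a value computed in the forward pass, which is why the inputs of every function have to be stored.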

## Step 6: Backpropagation by Hand

In [2]:
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None  # add grad to hold the gradient

In [3]:
class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)
        output = Variable(y)
        self.input = input  # remember the input (stored in an instance variable)
        return output

    def forward(self, x):
        raise NotImplementedError()

    def backward(self, gy):  # gy is the derivative passed in from the output side
        raise NotImplementedError()

In [4]:
class Square(Function):
    def forward(self, x):
        return x ** 2

    def backward(self, gy):
        x = self.input.data
        gx = 2 * x * gy  # the derivative of y = x^2 is 2x
        return gx

class Exp(Function):
    def forward(self, x):
        return np.exp(x)

    def backward(self, gy):
        x = self.input.data
        gx = np.exp(x) * gy  # the derivative of y = exp(x) is exp(x)
        return gx

In [5]:
A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)

In [6]:
# Call the backward methods in reverse order
y.grad = np.array(1.0)
b.grad = C.backward(y.grad)
a.grad = B.backward(b.grad)
x.grad = A.backward(a.grad)
print(x.grad) # 3.297442541400256

3.297442541400256
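A common sanity check is to compare the backpropagated derivative with the central-difference estimate from Step 4. The sketch below restates compact versions of the classes so that it runs on its own; `backprop_grad` and `numerical_grad` are hypothetical helper names introduced only for this check.

```python
import numpy as np

class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None

class Function:
    def __call__(self, input):
        self.input = input  # stored for use in backward
        return Variable(self.forward(input.data))

class Square(Function):
    def forward(self, x):
        return x ** 2
    def backward(self, gy):
        return 2 * self.input.data * gy

class Exp(Function):
    def forward(self, x):
        return np.exp(x)
    def backward(self, gy):
        return np.exp(self.input.data) * gy

def backprop_grad(x0):
    # Forward pass through A, B, C, then backward calls in reverse order
    A, B, C = Square(), Exp(), Square()
    C(B(A(Variable(np.array(x0)))))
    return A.backward(B.backward(C.backward(np.array(1.0))))

def numerical_grad(x0, eps=1e-4):
    # Central difference on the same composite function
    def f(v):
        return Square()(Exp()(Square()(Variable(np.array(v))))).data
    return (f(x0 + eps) - f(x0 - eps)) / (2 * eps)

bp = backprop_grad(0.5)
num = numerical_grad(0.5)
print(np.allclose(bp, num))  # the two estimates should agree within np.allclose's default tolerance
```
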


## Step 7: Automation of Backpropagation

## Step 8: From Recursion to Loop

## Step 9: Making Functions More Useful

## Step 10: Perform the Test