# Stage 1: Automatically compute derivatives

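In this stage we build the machinery for automatic differentiation. Here, automatic differentiation means that derivatives are computed by the computer rather than by a person.  
Concretely, once a computation (a function) has been written as code, the computer works out the derivative of that computation automatically.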


In [1]:
import numpy as np

## Step 1: Variables as Boxes

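Think of a variable as a "box":
- The "box" and the data are different things
- Data can be put into the "box" (= assignment)
- Looking inside the "box" reveals the data (= reference)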

In [2]:
class Variable:
    def __init__(self, data):
        self.data = data

In [3]:
data = np.array(1.0)
x = Variable(data)
print(x.data) # 1.0

x.data = np.array(2.0)
print(x.data) # 2.0

1.0
2.0


## Step 2: Function to Create A Variable

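A function is a rule that defines the correspondence between one variable and another.
- Methods implemented in the Function class take a Variable instance as input and return a Variable instance as output
- The actual data of a Variable instance lives in its instance variable data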

In [4]:
# Base class for functions
class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)
        output = Variable(y)
        return output

    def forward(self, x):
        raise NotImplementedError()

In [5]:
class Square(Function):
    def forward(self, x):
        return x ** 2

In [6]:
x = Variable(np.array(10))
f = Square()
y = f(x)
print(type(y)) # <class '__main__.Variable'>
print(y.data) # 100

<class '__main__.Variable'>
100


## Step 3: Connecting Functions

Calling functions one after another can be viewed as one large function (a composite function).  
Computational graph:  
x -> A -> a -> B -> b -> C -> y

In [7]:
class Exp(Function):
    def forward(self, x):
        return np.exp(x)

In [8]:
A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)
print(y.data) # 1.648721270700128

1.648721270700128


## Step 4: Numerical Differentiation

$$ \frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} $$

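Setting h to a small number gives an approximation of the derivative (numerical differentiation).  
The central difference has a smaller error: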
$$ \frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x-h)}{2h} $$

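Because of cancellation of significant digits (loss of precision), numerical differentiation results tend to contain error. More seriously, the computational cost is high, since every input variable needs its own pair of function evaluations.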
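To make the accuracy claim concrete, the sketch below compares the forward and central differences on plain Python floats. The helper names `forward_diff` and `central_diff` are illustrative and not part of DeZero, and `np.exp` at x = 1 is an arbitrary test case.

```python
import numpy as np


def forward_diff(f, x, h=1e-4):
    # Forward difference: truncation error shrinks in proportion to h
    return (f(x + h) - f(x)) / h


def central_diff(f, x, h=1e-4):
    # Central difference: truncation error shrinks in proportion to h ** 2
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.0
exact = np.exp(x)  # d/dx exp(x) = exp(x)
err_forward = abs(forward_diff(np.exp, x) - exact)
err_central = abs(central_diff(np.exp, x) - exact)
print(err_forward, err_central)
```

With the same h, the central difference error should come out several orders of magnitude smaller than the forward difference error.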

In [9]:
def numerical_diff(f, x, eps=1e-4):
    x0 = Variable(x.data - eps)
    x1 = Variable(x.data + eps)
    y0 = f(x0)
    y1 = f(x1)
    return (y1.data - y0.data) / (2 * eps)

In [10]:
f = Square()
x = Variable(np.array(2.0))
dy = numerical_diff(f, x)
print(dy) # 4.000000000004

4.000000000004


In [11]:
# Composite function
def f(x):
    A = Square()
    B = Exp()
    C = Square()
    return C(B(A(x)))

x = Variable(np.array(0.5))
dy = numerical_diff(f, x)
print(dy) # 3.2974426293330694

3.2974426293330694


## Step 5: Theory of Backpropagation

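Chain rule:
$\frac{dy}{dx}=\frac{dy}{dy} \frac{dy}{db} \frac{db}{da} \frac{da}{dx}$

The derivative of y with respect to each variable can be obtained by backpropagation through the computational graph (a single pass from output to input is enough).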
<center>
<table>
  <tr>
    <td><img src="./res/计算图.png" width="400"/></td>
    <td><img src="./res/计算图2.png" width="400"/></td>
  </tr>
</table>
</center>

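As the figures above show, forward propagation and backpropagation correspond to each other clearly: a variable holds both an ordinary value and a derivative value, and a function has both an ordinary computation (forward propagation) and a derivative computation (backpropagation).

Backpropagation needs the data used during forward propagation, so forward propagation must run first, and each function must remember the value of its input variable.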
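As a small worked instance of the chain rule, the sketch below backpropagates by hand through y = (exp(x^2))^2 with plain NumPy scalars. The variable names `gy`, `gb`, `ga`, `gx` are chosen here for illustration; the closed form is included only as a cross-check.

```python
import numpy as np

x = 0.5
# Forward pass: keep every intermediate value, since the backward pass needs them
a = x ** 2
b = np.exp(a)
y = b ** 2

# Backward pass: multiply local derivatives from the output back to the input
gy = 1.0             # dy/dy
gb = gy * 2 * b      # dy/db, because y = b ** 2
ga = gb * np.exp(a)  # dy/da, because b = exp(a)
gx = ga * 2 * x      # dy/dx, because a = x ** 2

# Closed form for comparison: y = exp(2 * x ** 2), so dy/dx = 4 * x * exp(2 * x ** 2)
closed_form = 4 * x * np.exp(2 * x ** 2)
print(gx, closed_form)
```

Each backward line reuses a value stored during the forward pass (`b`, `a`, `x`), which is why the forward pass has to run first.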

## Step 6: Backpropagation by Hand

In [12]:
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None  # Added: grad stores the gradient

In [13]:
class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)
        output = Variable(y)
        self.input = input  # Remember the input (stored in an instance variable)
        return output

    def forward(self, x):
        raise NotImplementedError()

    def backward(self, gy):  # gy is the derivative passed in from the output side
        raise NotImplementedError()

In [20]:
class Square(Function):
    def forward(self, x):
        return x ** 2

    def backward(self, gy):
        x = self.input.data
        gx = 2 * x * gy  # The derivative of y = x^2 is 2x
        return gx

class Exp(Function):
    def forward(self, x):
        return np.exp(x)

    def backward(self, gy):
        x = self.input.data
        gx = np.exp(x) * gy  # The derivative of y = exp(x) is exp(x)
        return gx

In [15]:
A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)

In [16]:
# Call the backward methods in reverse order
y.grad = np.array(1.0)
b.grad = C.backward(y.grad)
a.grad = B.backward(b.grad)
x.grad = A.backward(a.grad)
print(x.grad) # 3.297442541400256

3.297442541400256


## Step 7: Automation of Backpropagation

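Define-by-Run: dynamic computational graphs  
Automating backpropagation: whatever the ordinary computation flow (forward propagation) looks like, backpropagation can run automatically.

From a function's point of view, variables exist as its inputs and outputs.  
From a variable's point of view, it is created by a function (a variable with no creator function is treated as created by something other than a function, such as the user).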
<center>
<table>
  <tr>
    <td><img src="./res/func_var.png" width="400"/></td>
  </tr>
</table>
</center>

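A dynamic computational graph works by recording the "connection" (creator) inside the variable "box" while the actual computation runs. Chainer and PyTorch use a similar mechanism.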

In [19]:
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None  # Added: creator stores the function that produced this variable

    def set_creator(self, func):
        self.creator = func

class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)
        output = Variable(y)
        output.set_creator(self)  # Record this function as the output's creator
        self.input = input
        self.output = output  # Remember the output variable
        return output
    
    def forward(self, x):
        raise NotImplementedError()

    def backward(self, gy):
        raise NotImplementedError()

In [21]:
A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)

assert y.creator == C
assert y.creator.input == b
assert y.creator.input.creator == B
assert y.creator.input.creator.input == a
assert y.creator.input.creator.input.creator == A
assert y.creator.input.creator.input.creator.input == x
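The chain of assertions above can also be followed programmatically. The sketch below is a stripped-down version of the classes (no `grad`, no `backward`) plus a hypothetical helper `creator_chain`, which is not part of DeZero, that walks the recorded links from the output back to the input.

```python
import numpy as np


class Variable:
    def __init__(self, data):
        self.data = data
        self.creator = None  # function that produced this variable, if any


class Function:
    def __call__(self, input):
        output = Variable(self.forward(input.data))
        output.creator = self  # record the link while the computation runs
        self.input = input
        return output


class Square(Function):
    def forward(self, x):
        return x ** 2


class Exp(Function):
    def forward(self, x):
        return np.exp(x)


def creator_chain(variable):
    # Hypothetical helper: collect function names from the output back to the input
    names = []
    while variable.creator is not None:
        names.append(type(variable.creator).__name__)
        variable = variable.creator.input
    return names


y = Square()(Exp()(Square()(Variable(np.array(0.5)))))
print(creator_chain(y))  # ['Square', 'Exp', 'Square']
```

The graph is never declared up front; it exists only as these links, which is what "Define-by-Run" refers to.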

In [22]:
'''
Get the function; get the function's input; call the function's backward method
'''
y.grad = np.array(1.0)
C = y.creator
b = C.input 
b.grad = C.backward(y.grad)
print(b.grad) # 2.568050833375483

B = b.creator
a = B.input
a.grad = B.backward(b.grad)
print(a.grad) # 3.297442541400256

A = a.creator
x = A.input
x.grad = A.backward(a.grad)
print(x.grad) # 3.297442541400256

2.568050833375483
3.297442541400256
3.297442541400256


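The backpropagation logic from one variable to the previous one is always the same:
1. Get the function
2. Get the function's input
3. Call the function's backward method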

In [23]:
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None  # creator stores the function that produced this variable

    def set_creator(self, func):
        self.creator = func

    def backward(self):
        f = self.creator  # 1. Get the function
        if f is not None:
            x = f.input  # 2. Get the function's input
            x.grad = f.backward(self.grad)  # 3. Call the function's backward method
            x.backward()  # Recurse into the backward method of the previous variable

In [24]:
A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)

y.grad = np.array(1.0)
y.backward()
print(x.grad) # 3.297442541400256

3.297442541400256


## Step 8: From Recursion to Loop

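Replacing recursion with a loop makes it easier to extend the code to more complex computational graphs later, and a loop generally runs more efficiently than recursion.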

In [25]:
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None

    def set_creator(self, func):
        self.creator = func
    
    def backward(self):
        funcs = [self.creator]
        while funcs:
            f = funcs.pop()  # Get the function
            x, y = f.input, f.output  # Get the function's input and output
            x.grad = f.backward(y.grad)  # Call the function's backward method
            if x.creator is not None:
                funcs.append(x.creator)  # Add the previous function to the list

In [26]:
A = Square()
B = Exp()
C = Square()

x = Variable(np.array(0.5))
a = A(x)
b = B(a)
y = C(b)

y.grad = np.array(1.0)
y.backward()
print(x.grad) # 3.297442541400256

3.297442541400256
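Besides efficiency, a loop also sidesteps Python's recursion depth limit, which is 1000 frames by default in CPython. The sketch below builds a long chain and runs both versions. `Identity`, `build_chain`, `backward_recursive` and `backward_loop` are names invented for this comparison; the chain depth of 10,000 is an arbitrary choice.

```python
import numpy as np


class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None

    def backward_recursive(self):  # recursive version from Step 7
        f = self.creator
        if f is not None:
            x = f.input
            x.grad = f.backward(self.grad)
            x.backward_recursive()

    def backward_loop(self):  # loop version from this step
        funcs = [self.creator]
        while funcs:
            f = funcs.pop()
            x, y = f.input, f.output
            x.grad = f.backward(y.grad)
            if x.creator is not None:
                funcs.append(x.creator)


class Identity:  # hypothetical function y = x, used only to build a long chain
    def __call__(self, input):
        output = Variable(input.data)
        output.creator = self
        self.input, self.output = input, output
        return output

    def backward(self, gy):
        return gy


def build_chain(depth):
    x = Variable(np.array(1.0))
    y = x
    for _ in range(depth):
        y = Identity()(y)
    y.grad = np.array(1.0)
    return x, y


x, y = build_chain(10_000)
y.backward_loop()  # no nested calls, so the chain length is not a problem
print(x.grad)

x2, y2 = build_chain(10_000)
try:
    y2.backward_recursive()
    recursion_failed = False
except RecursionError:
    recursion_failed = True
print(recursion_failed)
```

Under the default recursion limit, the recursive version is expected to raise `RecursionError` on this chain, while the loop version reaches the input.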


## Step 9: Making Functions More Useful

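The functions so far are implemented as Python classes, so each use requires creating an instance and then calling it, which is verbose.  
Wrapping them in plain Python functions makes them more natural to use and also allows calls to be nested.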

In [27]:
def square(x):
    return Square()(x)

def exp(x):
    return Exp()(x)

In [28]:
x = Variable(np.array(0.5))
a = square(x)
b = exp(a)
y = square(b)

y.grad = np.array(1.0)
y.backward()
print(x.grad) # 3.297442541400256

x = Variable(np.array(0.5))
y = square(exp(square(x)))
y.grad = np.array(1.0)
y.backward()
print(x.grad) # 3.297442541400256

3.297442541400256
3.297442541400256


To reduce the work users have to do for backpropagation (so that y.grad = np.array(1.0) can be omitted), add a few lines to the Variable class.

In [29]:
class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None

    def set_creator(self, func):
        self.creator = func
    
    def backward(self):
        if self.grad is None:  # If grad is None, create a gradient of ones with the same shape as data
            self.grad = np.ones_like(self.data)
        funcs = [self.creator]
        while funcs:
            f = funcs.pop()  # Get the function
            x, y = f.input, f.output  # Get the function's input and output
            x.grad = f.backward(y.grad)  # Call the function's backward method
            if x.creator is not None:
                funcs.append(x.creator)  # Add the previous function to the list

In [30]:
x = Variable(np.array(0.5))
y = square(exp(square(x)))
y.backward()
print(x.grad)  # 3.297442541400256

3.297442541400256


DeZero's Variable supports only ndarray data. When the user passes anything other than an ndarray to Variable, an exception is raised immediately.

In [31]:
class Variable:
    def __init__(self, data):
        if data is not None:
            if not isinstance(data, np.ndarray):
                raise TypeError(f'{type(data)} is not supported')
        self.data = data
        self.grad = None
        self.creator = None

    def set_creator(self, func):
        self.creator = func
    
    def backward(self):
        if self.grad is None:  # If grad is None, create a gradient of ones with the same shape as data
            self.grad = np.ones_like(self.data)
        funcs = [self.creator]
        while funcs:
            f = funcs.pop()  # Get the function
            x, y = f.input, f.output  # Get the function's input and output
            x.grad = f.backward(y.grad)  # Call the function's backward method
            if x.creator is not None:
                funcs.append(x.creator)  # Add the previous function to the list

In [None]:
x = Variable(np.array(1.0))  # OK
x = Variable(None)  # OK

x = Variable(1.0)  # TypeError: <class 'float'> is not supported

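Because of how NumPy works, if x is a zero-dimensional ndarray, the result of x ** 2 is of type np.float64.  
As a result, the output of a DeZero function may not be an ndarray, so the output needs to be converted.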

In [33]:
x = np.array([1.0])
y = x ** 2
print(type(x), x.ndim)  # <class 'numpy.ndarray'> 1
print(type(y))  # <class 'numpy.ndarray'>

x = np.array(1.0)
y = x ** 2
print(type(x), x.ndim)  # <class 'numpy.ndarray'> 0
print(type(y))  # <class 'numpy.float64'>

<class 'numpy.ndarray'> 1
<class 'numpy.ndarray'>
<class 'numpy.ndarray'> 0
<class 'numpy.float64'>


In [34]:
def as_array(x):
    if np.isscalar(x):  # Convert scalars to an array
        return np.array(x)
    return x
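A quick check of what `as_array` does to each kind of value may help here. The sketch repeats the definition so it runs on its own; the variable names `zero_dim`, `squared`, `restored`, `vector` and `untouched` are illustrative.

```python
import numpy as np


def as_array(x):
    if np.isscalar(x):
        return np.array(x)
    return x


zero_dim = np.array(1.0)
squared = zero_dim ** 2       # NumPy returns a scalar (np.float64) here
restored = as_array(squared)  # wrapped back into a 0-dimensional ndarray
vector = np.array([1.0, 2.0])
untouched = as_array(vector)  # ndarrays pass through unchanged

print(type(squared).__name__, type(restored).__name__, restored.ndim)
print(untouched is vector)
```

Values that are already ndarrays are returned as the same object, so the conversion adds no copy in the common case.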

In [35]:
class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)  # The y returned by forward is not necessarily an array
        output = Variable(as_array(y))  # Convert to an array before creating the Variable instance
        output.set_creator(self)
        self.input = input
        self.output = output
        return output

    def forward(self, x):
        raise NotImplementedError()

    def backward(self, gy):
        raise NotImplementedError()

## Step 10: Perform the test

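The code in stage 1 implements automatic computation of derivatives; next we test it.

Gradient checking: compare the result of numerical differentiation with the result of backpropagation.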

In [36]:
class Variable:
    def __init__(self, data):
        if data is not None:
            if not isinstance(data, np.ndarray):
                raise TypeError('{} is not supported'.format(type(data)))

        self.data = data
        self.grad = None
        self.creator = None

    def set_creator(self, func):
        self.creator = func

    def backward(self):
        if self.grad is None:
            self.grad = np.ones_like(self.data)

        funcs = [self.creator]
        while funcs:
            f = funcs.pop()
            x, y = f.input, f.output
            x.grad = f.backward(y.grad)

            if x.creator is not None:
                funcs.append(x.creator)


def as_array(x):
    if np.isscalar(x):
        return np.array(x)
    return x


class Function:
    def __call__(self, input):
        x = input.data
        y = self.forward(x)
        output = Variable(as_array(y))
        output.set_creator(self)
        self.input = input
        self.output = output
        return output

    def forward(self, x):
        raise NotImplementedError()

    def backward(self, gy):
        raise NotImplementedError()


class Square(Function):
    def forward(self, x):
        y = x ** 2
        return y

    def backward(self, gy):
        x = self.input.data
        gx = 2 * x * gy
        return gx


def square(x):
    return Square()(x)


class Exp(Function):
    def forward(self, x):
        y = np.exp(x)
        return y

    def backward(self, gy):
        x = self.input.data
        gx = np.exp(x) * gy
        return gx


def exp(x):
    return Exp()(x)

In [37]:
import unittest

def numerical_diff(f, x, eps=1e-4):
    x0 = Variable(x.data - eps)
    x1 = Variable(x.data + eps)
    y0 = f(x0)
    y1 = f(x1)
    return (y1.data - y0.data) / (2 * eps)


class SquareTest(unittest.TestCase):
    def test_forward(self):
        x = Variable(np.array(2.0))
        y = square(x)
        expected = np.array(4.0)
        self.assertEqual(y.data, expected)
    
    def test_backward(self):
        x = Variable(np.array(3.0))
        y = square(x)
        y.backward()
        expected = np.array(6.0)
        self.assertEqual(x.grad, expected)

    def test_gradient_check(self):
        x = Variable(np.random.rand(1))  # Generate one random number
        y = square(x)
        y.backward()
        num_grad = numerical_diff(square, x)
        flg = np.allclose(x.grad, num_grad)
        self.assertTrue(flg)
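The same gradient check can be applied to a composite function, not only to `square`. The sketch below is a condensed, self-contained restatement of the stage 1 classes; `gradient_check` and `composite` are hypothetical helpers added here, `np.asarray` stands in for `as_array`, and the fixed seed is an assumption made so the check is repeatable.

```python
import numpy as np


class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None

    def backward(self):
        if self.grad is None:
            self.grad = np.ones_like(self.data)
        funcs = [self.creator]
        while funcs:
            f = funcs.pop()
            x, y = f.input, f.output
            x.grad = f.backward(y.grad)
            if x.creator is not None:
                funcs.append(x.creator)


class Function:
    def __call__(self, input):
        # np.asarray plays the role of as_array in this condensed version
        output = Variable(np.asarray(self.forward(input.data)))
        output.creator = self
        self.input, self.output = input, output
        return output


class Square(Function):
    def forward(self, x):
        return x ** 2

    def backward(self, gy):
        return 2 * self.input.data * gy


class Exp(Function):
    def forward(self, x):
        return np.exp(x)

    def backward(self, gy):
        return np.exp(self.input.data) * gy


def numerical_diff(f, x, eps=1e-4):
    y0 = f(Variable(x.data - eps))
    y1 = f(Variable(x.data + eps))
    return (y1.data - y0.data) / (2 * eps)


def gradient_check(f, x):
    # Hypothetical helper: True when backpropagation agrees with the central difference
    y = f(x)
    y.backward()
    return np.allclose(x.grad, numerical_diff(f, x))


def composite(x):
    return Square()(Exp()(Square()(x)))


rng = np.random.default_rng(0)  # fixed seed so the check is repeatable
print(gradient_check(composite, Variable(rng.random(1))))
```

Because `np.allclose` compares with a tolerance, the small error of the central difference does not cause a false failure.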

In [38]:
unittest.main(argv=[''], exit=False)

...
----------------------------------------------------------------------
Ran 3 tests in 0.009s

OK


<unittest.main.TestProgram at 0x251ff340248>

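Ways for a computer to compute derivatives:
1. Numerical differentiation: perturb the input by a small amount and compute an approximation of the derivative
2. Symbolic differentiation: compute the exact derivative from the analytic expression
3. Automatic differentiation:
   - Reverse mode: compute derivatives from output to input (backpropagation)
   - Forward mode: compute derivatives from input to output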
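Stage 1 implements reverse mode only. For contrast, forward mode can be sketched with dual numbers, where each value carries its derivative along with it from input to output. The `Dual` class and the `square` and `exp` functions below are illustrative and unrelated to the DeZero classes of the same names.

```python
import math


class Dual:
    # Hypothetical dual number: `value` carries f(x), `deriv` carries df/dx
    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv


def square(d):
    # (u ** 2)' = 2 * u * u'
    return Dual(d.value ** 2, 2 * d.value * d.deriv)


def exp(d):
    # (exp(u))' = exp(u) * u'
    e = math.exp(d.value)
    return Dual(e, e * d.deriv)


x = Dual(0.5, 1.0)  # seed dx/dx = 1 at the input
y = square(exp(square(x)))
print(y.value, y.deriv)
```

Here the derivative is available as soon as the forward computation finishes, with no backward pass. The trade-off is that one pass yields derivatives with respect to a single input, which is why reverse mode suits functions with many inputs and one output.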