
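### Autograd
Autograd automatically builds a computation graph from the inputs and the forward pass, and performs backpropagation over that graph.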

#### requires_grad
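Autograd records the operations performed on tensors and uses these records to build the computation graph.   
Early PyTorch versions kept this bookkeeping in a separate `Variable` wrapper. Since PyTorch 0.4, `Variable` has been merged into `Tensor`, and any tensor with `requires_grad=True` is tracked directly. In-place operations need care: autograd may have saved a tensor's original value in order to compute gradients during backpropagation, so modifying that tensor in place causes an error at `backward()` time, and (outside `torch.no_grad()`) in-place operations on a leaf tensor that requires grad are not allowed at all. To compute the gradient at every node, call `backward()` on the root node; autograd propagates backward along the computation graph and computes the gradient of every leaf node.  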
  
    
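`tensor.backward(gradient=None, retain_graph=None, create_graph=False)`:

* `gradient`: has the same shape as the tensor. For `y.backward()`, it plays the role of $\frac{dz}{dy}$ in the chain rule $\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$. It can be omitted when `y` is a scalar (it then defaults to 1), but it must be given when `y` is not a scalar.
* `retain_graph`: backpropagation needs some intermediate results cached during the forward pass, and these buffers are freed once backpropagation finishes. Setting `retain_graph=True` keeps them, so the same graph can be backpropagated through more than once.
* `create_graph`: builds a computation graph for the backward pass itself, so its result can be differentiated again ("backward of backward"); this is how higher-order derivatives are computed.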
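To make `create_graph` concrete, here is a minimal sketch of a second derivative. It assumes an illustrative function $y = x^3$ (not from the text), so $\frac{dy}{dx} = 3x^2$ and $\frac{d^2y}{dx^2} = 6x$; at $x = 3$ these are 27 and 18. It uses `torch.autograd.grad`, which returns the gradients instead of accumulating them into `.grad`, as a tidier alternative to calling `backward()` twice.

```python
import torch

# Illustrative function: y = x^3, so dy/dx = 3x^2 and d2y/dx2 = 6x
x = torch.tensor(3.0, requires_grad=True)
y = x ** 3

# create_graph=True records the backward pass itself, so dy_dx is differentiable
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

# Differentiate the first derivative again ("backward of backward")
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)

print(dy_dx.item(), d2y_dx2.item())  # 27.0 18.0
```

Without `create_graph=True`, `dy_dx` would carry no graph and the second call would fail.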

In [0]:
from __future__ import print_function
import torch

In [0]:
# Specify requires_grad when creating the tensor
a = torch.randn(3, 4, requires_grad=True)
# or
b = torch.randn(3, 4).requires_grad_()
# or
c = torch.randn(3, 4)
c.requires_grad = True

In [0]:
a, b, c

(tensor([[ 0.7523,  2.1689, -1.2582, -1.3262],
         [ 0.1519,  0.5212,  0.5778, -0.0766],
         [ 0.2492, -0.4664, -1.9198, -0.2537]], requires_grad=True),
 tensor([[-0.5903, -0.1624, -0.4211, -1.1536],
         [-0.8510,  0.7200, -2.5447,  1.7686],
         [-0.0898,  0.4605, -0.3195, -0.9958]], requires_grad=True),
 tensor([[-1.8250, -0.0493,  2.7586,  0.7275],
         [ 1.4920, -0.0932, -0.5842, -1.8870],
         [ 0.5982,  2.0370,  0.5853,  0.0427]], requires_grad=True))

In [0]:
d = torch.zeros(3, 4, requires_grad=True)
d

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]], requires_grad=True)

In [0]:
# e = a + d
e = a.add(d)
e

tensor([[ 0.7523,  2.1689, -1.2582, -1.3262],
        [ 0.1519,  0.5212,  0.5778, -0.0766],
        [ 0.2492, -0.4664, -1.9198, -0.2537]], grad_fn=<AddBackward0>)

In [0]:
f = e.sum()
f.backward()  # backpropagation

In [0]:
a.grad

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
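The cell above calls `backward()` on `f = e.sum()`, a scalar, rather than on `e` directly. A short sketch of why, reusing the same `a`, `d` and `e = a + d` setup: a non-scalar output needs an explicit `gradient`, and passing a tensor of ones gives the same result as summing first.

```python
import torch

a = torch.randn(3, 4, requires_grad=True)
d = torch.zeros(3, 4, requires_grad=True)
e = a + d                      # non-scalar (3x4) output

# backward() on a non-scalar without `gradient` raises a RuntimeError
try:
    e.backward()
    raised = False
except RuntimeError:
    raised = True
print(raised)  # True

# Passing ones as `gradient` is equivalent to calling backward() on e.sum()
e.backward(torch.ones_like(e), retain_graph=True)
grad_from_ones = a.grad.clone()
a.grad = None                  # reset, since gradients accumulate across calls
e.sum().backward()
print(torch.equal(grad_from_ones, a.grad))  # True
```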

#### Example:
Compute the derivative of the following function:
$$ y = x^2 \cdot e^x $$
Its derivative is:
$$ \frac{dy}{dx} = 2x \cdot e^x + x^2 \cdot e^x $$

In [0]:
def f(x):
  """Compute y = x^2 * e^x"""
  y = x ** 2 * torch.exp(x)
  return y

In [0]:
def gradf(x):
  """Derivative of f, worked out by hand"""
  dx = 2 * x * torch.exp(x) + x ** 2 * torch.exp(x)
  return dx

In [0]:
x = torch.randn(3, 4, requires_grad=True)
y = f(x)
y

tensor([[0.0254, 2.8214, 0.0041, 0.2447],
        [0.0294, 0.0088, 0.8906, 2.2071],
        [0.2332, 0.0055, 0.5210, 0.9136]], grad_fn=<MulBackward0>)

In [0]:
y.backward(torch.ones(y.size()))  # gradient has the same shape as y
x.grad

tensor([[-0.2667,  8.3948,  0.1358, -0.4514],
        [-0.2829, -0.1701,  3.5341,  6.9426],
        [-0.4551,  0.1589, -0.1168,  3.5998]])

In [0]:
gradf(x)

tensor([[-0.2667,  8.3948,  0.1358, -0.4514],
        [-0.2829, -0.1701,  3.5341,  6.9426],
        [-0.4551,  0.1589, -0.1168,  3.5998]], grad_fn=<AddBackward0>)
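The two outputs above agree when compared by eye. A sketch of checking this programmatically, under the same `f` and `gradf` definitions: `torch.allclose` compares autograd against the hand-derived formula (the `atol=1e-6` tolerance is an assumption to absorb float32 rounding), and `torch.autograd.gradcheck` compares autograd against finite differences, which needs double-precision inputs.

```python
import torch

def f(x):
    return x ** 2 * torch.exp(x)

def gradf(x):
    return 2 * x * torch.exp(x) + x ** 2 * torch.exp(x)

x = torch.randn(3, 4, requires_grad=True)
y = f(x)
y.backward(torch.ones_like(y))

# Autograd result vs. the formula derived by hand
match = torch.allclose(x.grad, gradf(x.detach()), atol=1e-6)
print(match)  # True

# Autograd vs. finite differences; gradcheck expects float64 inputs
x64 = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(f, (x64,)))  # True
```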

In [0]:
x = torch.ones(1)
b = torch.rand(1, requires_grad=True)
w = torch.rand(1, requires_grad=True)

y = w * x
z = y + b


In [0]:
x.requires_grad, w.requires_grad, b.requires_grad

(False, True, True)

In [0]:
# y depends on w, which requires grad
y.requires_grad

True

In [0]:
x.is_leaf, w.is_leaf, b.is_leaf

(True, True, True)
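`x`, `w` and `b` are all leaves because they were created directly rather than computed. A sketch with the same setup showing the other side: `y` and `z` are produced by operations on tensors that require grad, so they are not leaves, and by default their `.grad` is not kept after `backward()`. `retain_grad()` opts an intermediate tensor in.

```python
import torch

x = torch.ones(1)
b = torch.rand(1, requires_grad=True)
w = torch.rand(1, requires_grad=True)
y = w * x
z = y + b

# Results of operations on tensors that require grad are not leaves
print(y.is_leaf, z.is_leaf)  # False False

# Ask autograd to keep dz/dy on the intermediate y (must be called before backward)
y.retain_grad()
z.backward()
print(y.grad, w.grad)  # tensor([1.]) tensor([1.])
```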

In [0]:
# grad_fn shows the backward function of this tensor
z.grad_fn

<AddBackward0 at 0x7f540423db00>

In [0]:
# next_functions holds the inputs of grad_fn: a tuple of (Function, index) pairs
# z = y + b = w * x + b
z.grad_fn.next_functions

((<MulBackward0 at 0x7f540423db38>, 0),
 (<AccumulateGrad at 0x7f540423d668>, 0))

In [0]:
z.grad_fn.next_functions[0][0] == y.grad_fn

True

In [0]:
# w is a leaf that requires grad -> AccumulateGrad; x does not require grad -> None
y.grad_fn.next_functions

((<AccumulateGrad at 0x7f540423d7f0>, 0), (None, 0))

In [0]:
# The grad_fn of a leaf node is None
w.grad_fn, b.grad_fn

(None, None)
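Following `next_functions` repeatedly, as the cells above do by hand, walks the whole backward graph. Here is a minimal recursive sketch; `walk` is a hypothetical helper written for this illustration, not a PyTorch API. It prints each node's class name indented by depth and skips the `None` entries that stand for inputs not requiring grad.

```python
import torch

def walk(fn, depth=0, names=None):
    """Hypothetical helper: depth-first walk over grad_fn.next_functions."""
    if names is None:
        names = []
    if fn is None:                 # an input that does not require grad
        return names
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names

x = torch.ones(1)
b = torch.rand(1, requires_grad=True)
w = torch.rand(1, requires_grad=True)
z = w * x + b
names = walk(z.grad_fn)
```

For `z = w * x + b` this visits `AddBackward0`, then `MulBackward0` with the `AccumulateGrad` node for `w` beneath it, then the `AccumulateGrad` node for `b`.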

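Computing the gradient of w needs the value of x (${\partial y\over \partial w} = x $). These values are saved as buffers during the forward pass and are freed automatically once the gradients have been computed. To backpropagate more than once, pass `retain_graph=True` so that these buffers are kept.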

In [0]:
z.backward(retain_graph=True)
w.grad

tensor([1.])

In [0]:
# Backpropagating again accumulates into w.grad; this is what w's AccumulateGrad node refers to
z.backward()
w.grad

tensor([2.])
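A sketch of the two consequences of the cells above, using the same `z = w * x + b` setup: without `retain_graph=True`, a second `backward()` fails because the saved buffer (here `x`, needed for $\partial z / \partial w$) has been freed; and since gradients accumulate, they are usually reset before the next pass. The new forward pass `z2` is an illustrative name.

```python
import torch

x = torch.ones(1)
w = torch.rand(1, requires_grad=True)
b = torch.rand(1, requires_grad=True)
z = w * x + b

z.backward()            # buffers saved for this graph are freed afterwards
try:
    z.backward()        # second pass through the same graph, no retain_graph
    second_pass_failed = False
except RuntimeError:
    second_pass_failed = True
print(second_pass_failed)   # True

# Reset the accumulated gradient, then run a fresh forward pass (a fresh graph)
w.grad.zero_()          # alternatively: w.grad = None
z2 = w * x + b
z2.backward()
print(w.grad)           # tensor([1.])
```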

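PyTorch uses a dynamic graph: the graph is built at run time and rebuilt from scratch on every forward pass. As a result, ordinary Python control flow (`if`, `for`) can decide, based on the data, which graph gets built.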

In [0]:
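def abs(x):
  # Note: this shadows Python's built-in abs() in the notebook
  # Which branch runs, and therefore which graph is built, depends on x's value
  if x[0].item() > 0:
    return x
  else:
    return -x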

In [0]:
x = torch.ones(1, requires_grad=True)
y = abs(x)
y.backward()
x.grad

tensor([1.])

In [0]:
x = -1 * torch.ones(1)
x = x.requires_grad_()
y = abs(x)
y.backward()
x.grad

tensor([-1.])

In [0]:
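def f(x):
  # Multiply together only the positive elements of x;
  # the graph that gets built depends on which elements pass the test
  result = 1
  for ii in x:
    if ii.item() > 0:
      result *= ii
  return result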

In [0]:
x = torch.arange(-2, 4, requires_grad=True, dtype=torch.float32)
y = f(x)
x, y

(tensor([-2., -1.,  0.,  1.,  2.,  3.], requires_grad=True),
 tensor(6., grad_fn=<MulBackward0>))

In [0]:
y.backward()
x.grad

tensor([0., 0., 0., 6., 3., 2.])
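The gradient above can be checked by hand. For a product $y = \prod_{x_j > 0} x_j$, the derivative with respect to a positive $x_i$ is the product of the other factors, $y / x_i$, and elements that were skipped get 0 because they never entered the graph. A short sketch of that check (the `expected` tensor is illustrative):

```python
import torch

def f(x):
    result = 1
    for ii in x:
        if ii.item() > 0:
            result *= ii
    return result

x = torch.arange(-2, 4, dtype=torch.float32, requires_grad=True)
y = f(x)
y.backward()

# d(prod)/dx_i = prod / x_i for positive x_i, else 0 (x_i > 0 means no division by zero is selected)
with torch.no_grad():
    expected = torch.where(x > 0, y / x, torch.zeros_like(x))
print(expected)                          # tensor([0., 0., 0., 6., 3., 2.])
print(torch.allclose(x.grad, expected))  # True
```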

In [0]:
x = torch.ones(1, requires_grad=True, dtype=torch.float32)
w = torch.rand(1, requires_grad=True)

In [0]:
y = x * w
# y depends on w, and w.requires_grad=True
x.requires_grad, w.requires_grad, y.requires_grad

(True, True, True)

In [0]:
with torch.no_grad():
  x = torch.ones(1)
  w = torch.rand(1, requires_grad=True)
  y = x * w
# y depends on w and x; although w.requires_grad = True, y.requires_grad is still False because y was computed inside torch.no_grad()
x.requires_grad, w.requires_grad, y.requires_grad

(False, True, False)

In [0]:
# torch.set_grad_enabled(True)  # switch gradient tracking on or off globally
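`torch.set_grad_enabled(mode)` can be used either as a context manager, like `torch.no_grad()` but with the mode passed as an argument, or as a plain call that changes the mode until it is changed back. A small sketch of both uses; `torch.is_grad_enabled()` reports the current mode.

```python
import torch

x = torch.ones(3, requires_grad=True)

# As a context manager: tracking is off only inside the block
with torch.set_grad_enabled(False):
    y = x * 2
print(y.requires_grad)          # False

# As a plain call: the mode stays changed until it is set again
torch.set_grad_enabled(False)
z = x * 2
torch.set_grad_enabled(True)    # restore the default
w = x * 2
print(z.requires_grad, w.requires_grad, torch.is_grad_enabled())  # False True True
```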

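To change a tensor's values without autograd recording the change, you can operate on `tensor.data`. In current PyTorch, `tensor.detach()` (or making the change inside `torch.no_grad()`) is generally preferred: `.data` and `.detach()` both share storage with the original tensor, but in-place changes made through `.data` are invisible to autograd's correctness checks and can silently produce wrong gradients.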

In [0]:
a = torch.ones(3, 4, requires_grad=True, dtype=torch.float32)
b = torch.ones(3, 4, requires_grad=True, dtype=torch.float32)
c = a * b
a.data


tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [0]:
a.data.requires_grad  # a.data is already outside the computation graph

False

In [0]:
a.requires_grad

True

In [0]:
d = a.data.sigmoid_()  # in place: a's values change, but autograd does not record it
d.requires_grad

False

In [0]:
a

tensor([[0.7311, 0.7311, 0.7311, 0.7311],
        [0.7311, 0.7311, 0.7311, 0.7311],
        [0.7311, 0.7311, 0.7311, 0.7311]], requires_grad=True)
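The risk of editing through `.data` shows up when a modified tensor was saved for backward. A sketch contrasting it with `.detach()`: sigmoid's backward reuses its own output, so zeroing that output through `.data` silently yields zero gradients, while the same change through `.detach()` bumps the shared version counter and `backward()` raises an error instead.

```python
import torch

# Through .data: autograd does not notice, so the gradient is silently wrong
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()
out.data.zero_()            # sigmoid's backward reuses `out`, now all zeros
out.sum().backward()
print(a.grad)               # tensor([0., 0., 0.]), though the true gradient is nonzero

# Through .detach(): the shared version counter lets autograd detect the change
a2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out2 = a2.sigmoid()
out2.detach().zero_()
try:
    out2.sum().backward()
    detected = False
except RuntimeError:
    detected = True
print(detected)             # True
```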

### Extending autograd

In [0]:
from torch.autograd import Function

In [0]:
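class Mul(Function):
  """Computes w * x + b with a hand-written backward pass."""
  @staticmethod
  def forward(ctx, w, x, b, x_requires_grad=True):
    # ctx stores whatever backward() will need
    ctx.x_requires_grad = x_requires_grad
    ctx.save_for_backward(w, x)
    output = w * x + b
    return output

  @staticmethod
  def backward(ctx, grad_output):
    w, x = ctx.saved_tensors
    # d(output)/dw = x
    grad_w = grad_output * x
    # d(output)/dx = w
    if ctx.x_requires_grad:
      grad_x = grad_output * w
    else:
      grad_x = None
    # d(output)/db = 1
    grad_b = grad_output * 1
    # One gradient per forward input; the bool flag gets None
    return grad_w, grad_x, grad_b, None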
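A usage sketch for the custom `Mul` function, repeating its definition so the block runs on its own. Custom `Function`s are called through `.apply` with positional arguments rather than instantiated. The check assumes random double-precision inputs, since `torch.autograd.gradcheck` compares the hand-written `backward` against finite differences and expects float64.

```python
import torch
from torch.autograd import Function

class Mul(Function):
    """Computes w * x + b with a hand-written backward (same as above)."""
    @staticmethod
    def forward(ctx, w, x, b, x_requires_grad=True):
        ctx.x_requires_grad = x_requires_grad
        ctx.save_for_backward(w, x)
        return w * x + b

    @staticmethod
    def backward(ctx, grad_output):
        w, x = ctx.saved_tensors
        grad_w = grad_output * x
        grad_x = grad_output * w if ctx.x_requires_grad else None
        grad_b = grad_output * 1
        return grad_w, grad_x, grad_b, None

w = torch.rand(3, dtype=torch.double, requires_grad=True)
x = torch.rand(3, dtype=torch.double, requires_grad=True)
b = torch.rand(3, dtype=torch.double, requires_grad=True)

z = Mul.apply(w, x, b, True)   # positional arguments, including the flag
z.sum().backward()

# For z = w * x + b: dz/dw = x, dz/dx = w, dz/db = 1
print(torch.allclose(w.grad, x.detach()), torch.allclose(x.grad, w.detach()))  # True True

# Compare the hand-written backward with finite differences
ok = torch.autograd.gradcheck(lambda w, x, b: Mul.apply(w, x, b, True), (w, x, b))
print(ok)  # True
```

A wrong formula in `backward` (for example using `x` instead of `w` for `grad_x`) would make `gradcheck` raise an error, which makes it a cheap safety net for custom functions.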