## Tensors
- Here we go through the basic Tensor operations in PyTorch.

### 1. Implementing backpropagation for a simple neural network

In [1]:
import torch

In [2]:
# batch size
batch_size = 100
# input dimension
input_size = 64
# hidden layer dimension
hidden_size = 1000
# output dimension
output_size = 10

In [3]:
# Randomly generate training data from a standard normal distribution
# type() converts a tensor to the specified data type
x = torch.randn(batch_size, input_size).type(torch.FloatTensor)
y = torch.randn(batch_size, output_size).type(torch.FloatTensor)
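
As a small aside on the `type()` call above, here is a minimal sketch of a few equivalent ways to convert a tensor's data type. It assumes a reasonably recent PyTorch and the default dtype of `float32`; the variable names are illustrative only.

```python
import torch

# randn returns float32 by default, so converting to FloatTensor is a no-op here
x = torch.randn(3, 4)
x_float = x.type(torch.FloatTensor)   # the style used in this tutorial
x_double = x.double()                 # shorthand for float64
x_long = x.to(torch.int64)            # general-purpose conversion (truncates toward zero)

print(x.dtype, x_float.dtype, x_double.dtype, x_long.dtype)
```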

In [4]:
# Define the trainable parameters
w1 = torch.randn(input_size, hidden_size)
w2 = torch.randn(hidden_size, output_size)

In [5]:
learning_rate = 1e-6

- Let's look at the computation steps of backpropagation; the figure is taken from the [UFLDL](http://deeplearning.stanford.edu/wiki/index.php/%E5%8F%8D%E5%90%91%E4%BC%A0%E5%AF%BC%E7%AE%97%E6%B3%95) course.

![Backpropagation computation steps](https://github.com/nanyoullm/nanyoullm.github.io/blob/master/img/backpro.png?raw=true)
<br/>
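
For the two-layer network used in the next cell, the general steps specialize to the gradients written out below. This is a sketch in notation chosen here, not taken from the figure: $X$ is the input, $W_1$ and $W_2$ are the weights, $H$ is the hidden pre-activation, $\hat{Y}$ is the prediction, and $\odot$ is the element-wise product. The mask uses $H \ge 0$ to match the code, which zeroes the gradient only where `h < 0`.

$$
\begin{aligned}
H &= X W_1, \qquad H_{relu} = \max(H, 0), \qquad \hat{Y} = H_{relu} W_2, \qquad L = \sum_{i,j} (\hat{Y}_{ij} - Y_{ij})^2 \\
\frac{\partial L}{\partial \hat{Y}} &= 2\,(\hat{Y} - Y) \\
\frac{\partial L}{\partial W_2} &= H_{relu}^{\top}\, \frac{\partial L}{\partial \hat{Y}} \\
\frac{\partial L}{\partial H_{relu}} &= \frac{\partial L}{\partial \hat{Y}}\, W_2^{\top} \\
\frac{\partial L}{\partial H} &= \frac{\partial L}{\partial H_{relu}} \odot \mathbb{1}[H \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\, \frac{\partial L}{\partial H}
\end{aligned}
$$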

In [6]:
for i in range(500):
    # torch.mm(a, b) performs matrix multiplication
    h = x.mm(w1)
    # ReLU: take the element-wise maximum of h and 0
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    # compute the loss
    loss = (y_pred - y).pow(2).sum()
    print('step: {}, loss: {}'.format(i, loss))
    
    # compute the gradients; the activation function is ReLU
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # update the weights
    w1 = w1 - learning_rate * grad_w1
    w2 = w2 - learning_rate * grad_w2

step: 0, loss: 36958734.10462731
step: 1, loss: 36420465.53241235
step: 2, loss: 41324088.96093157
step: 3, loss: 50436694.73321782
step: 4, loss: 62901233.96598019
step: 5, loss: 77497475.02231407
step: 6, loss: 91933759.40653643
step: 7, loss: 103017011.3388789
step: 8, loss: 107079230.78604555
step: 9, loss: 102112495.1021345
step: 10, loss: 88753806.65675187
step: 11, loss: 70579758.22284675
step: 12, loss: 51928516.71890558
step: 13, loss: 35959434.3580873
step: 14, loss: 23853022.111877263
step: 15, loss: 15420653.957485316
step: 16, loss: 9848866.04301719
step: 17, loss: 6291443.947651656
step: 18, loss: 4060000.1642449554
step: 19, loss: 2672350.366183058
step: 20, loss: 1809777.4696968
step: 21, loss: 1270769.3968613164
step: 22, loss: 929764.6189778342
step: 23, loss: 710120.4443849543
step: 24, loss: 565077.3844164119
step: 25, loss: 466229.23316568375
step: 26, loss: 396269.8328382359
step: 27, loss: 344723.6662716504
step: 28, loss: 305095.22214080935
step: 29, loss: 27345

step: 408, loss: 0.00018406569165169487
step: 409, loss: 0.0001790995343335447
step: 410, loss: 0.00017431083454039556
step: 411, loss: 0.00017092933508962294
step: 412, loss: 0.00016668803457794285
step: 413, loss: 0.00016254807320860692
step: 414, loss: 0.00015842748927182362
step: 415, loss: 0.00015539130098653556
step: 416, loss: 0.00015207937386729764
step: 417, loss: 0.0001485826352114368
step: 418, loss: 0.000144928396111247
step: 419, loss: 0.0001415188327201257
step: 420, loss: 0.0001386868424060328
step: 421, loss: 0.00013603302705075815
step: 422, loss: 0.00013280042765970168
step: 423, loss: 0.00013002682423117733
step: 424, loss: 0.00012692382282437128
step: 425, loss: 0.00012433125632048663
step: 426, loss: 0.00012187766575927263
step: 427, loss: 0.00011962163805845621
step: 428, loss: 0.00011687347424223487
step: 429, loss: 0.00011466791384343034
step: 430, loss: 0.00011253789761694888
step: 431, loss: 0.00011074808426424568
step: 432, loss: 0.00010834410935067068

- The above gives a first impression of PyTorch Tensor operations, such as matrix multiplication with mm.
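
Since `mm` is easy to confuse with element-wise multiplication, the following minimal sketch contrasts the two on small hand-picked matrices. The values are arbitrary and chosen only for illustration.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

matmul = a.mm(b)      # matrix product: rows of a times columns of b
elementwise = a * b   # element-wise (Hadamard) product

print(matmul)       # [[19, 22], [43, 50]]
print(elementwise)  # [[5, 12], [21, 32]]
```

The `@` operator gives the same result as `mm` for 2-D tensors.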

### 2. Automatic differentiation

- An important object in PyTorch is Variable. Once a Tensor is wrapped in a Variable, PyTorch can compute gradients automatically from the formula or network structure we define.

<br/>
![variable](https://raw.githubusercontent.com/nanyoullm/nanyoullm.github.io/master/img/variable.png)
<br/>

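- The figure above shows the key attributes of a Variable. For a Variable object, .data returns the underlying tensor; after gradients are computed, the gradient of the variable is accumulated into .grad.
- Let's first get a feel for this powerful feature; **it really works**.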

In [7]:
from torch.autograd import Variable
x = Variable(torch.ones(2,2), requires_grad=True)
print(x.data)


 1  1
 1  1
[torch.FloatTensor of size 2x2]



- Now we define $y$ as $y=x+2$
- Define $z$ as $z=3*y^{2}$
- $out=\frac{1}{4}\sum_{i} z_{i}$, the mean of the four elements of $z$

In [8]:
y = x + 2
print(y)
z = 3 * y * y
print(z)
out = z.mean()
print(out)

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]

Variable containing:
 27  27
 27  27
[torch.FloatTensor of size 2x2]

Variable containing:
 27
[torch.FloatTensor of size 1]



- Now we backpropagate from out.
- The derivative of $out$ with respect to each element of $x$ is $\frac{3}{2}*(x+2)$; every element of $x$ equals 1, so the gradient is 4.5.
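
The derivative above can be checked numerically. The sketch below uses the API of PyTorch 0.4 and later, where `Variable` was merged into `Tensor` and `requires_grad=True` can be passed directly when creating a tensor; the name `expected` is introduced here only for the comparison.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = 3 * y * y
out = z.mean()
out.backward()

# analytic result from the formula: 3/2 * (x + 2)
expected = 1.5 * (x.detach() + 2)
print(x.grad)
print(torch.allclose(x.grad, expected))
```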

In [9]:
out.backward()
print(x.grad)
print(y.grad)

Variable containing:
 4.5000  4.5000
 4.5000  4.5000
[torch.FloatTensor of size 2x2]

None


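- Note the result above. After backpropagating from out, we can see the gradient of x, the variable at the start of the computation chain, but not the gradient of y, which is None. Why? A reasonable explanation is given here: [pytorch_hook](https://www.zhihu.com/question/61044004)
- The PyTorch developers explain that the gradient of an intermediate variable is freed once it has served its purpose in backpropagation. To inspect it, we can register a hook on the variable. Intuitively, a hook runs an extra task at the moment the intermediate variable's gradient is computed during the backward pass. We modify the code as follows.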

In [10]:
# used to record the gradient of y
y_grads = []

x = Variable(torch.ones(2,2), requires_grad=True)
y = x + 2
z = 3 * y * y
out = z.mean()
# before backpropagation, register a hook on y whose task is to record the gradient
y.register_hook(lambda grad: y_grads.append(grad))

out.backward()
print(x.grad)
print(y_grads)

Variable containing:
 4.5000  4.5000
 4.5000  4.5000
[torch.FloatTensor of size 2x2]

[Variable containing:
 4.5000  4.5000
 4.5000  4.5000
[torch.FloatTensor of size 2x2]
]
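
As an alternative to registering a hook, PyTorch also offers `retain_grad()`, which asks autograd to keep the gradient of a non-leaf tensor in its `.grad` attribute. The following is a minimal sketch using the tensor API of PyTorch 0.4 and later, not the `Variable` wrapper used in this tutorial.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.retain_grad()   # keep the gradient of this intermediate (non-leaf) tensor
z = 3 * y * y
out = z.mean()
out.backward()

print(y.grad)     # no longer None
```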


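- From the formula we know that the gradients of x and y should be identical, because $y=x+2$ gives $\frac{\partial y}{\partial x}=1$.
- Let's also think about what else a hook is useful for.
> When you have trained a network and want to extract the parameters of an intermediate layer or a feature map in a CNN, hooks come in handy!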
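
To make the feature-map use case concrete, here is a minimal sketch that captures a layer's output with `register_forward_hook`. Note that this is a module hook, which is different from the tensor hook `register_hook` used above. The toy `model`, the `features` list, and the `save_output` function are hypothetical names introduced only for this illustration.

```python
import torch
import torch.nn as nn

# Hypothetical toy network, used only for illustration
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

features = []

def save_output(module, inputs, output):
    # store a detached copy of the layer's output (the "feature map")
    features.append(output.detach())

# attach the hook to the ReLU layer, run one forward pass, then remove it
handle = model[1].register_forward_hook(save_output)
_ = model(torch.randn(3, 4))
handle.remove()

print(features[0].shape)  # torch.Size([3, 8])
```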

- Now that we understand Variable, let's use it to reimplement the backpropagation above.

In [11]:
x = torch.randn(batch_size, input_size).type(torch.FloatTensor)
x = Variable(x, requires_grad=False)
y = torch.randn(batch_size, output_size).type(torch.FloatTensor)
y = Variable(y, requires_grad=False)

w1 = torch.randn(input_size, hidden_size).type(torch.FloatTensor)
w1 = Variable(w1, requires_grad=True)
w2 = torch.randn(hidden_size, output_size).type(torch.FloatTensor)
w2 = Variable(w2, requires_grad=True)

In [12]:
for i in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    # loss.item() returns the Python number (PyTorch 0.4+); older versions used loss.data[0]
    print('step: {}, loss: {}'.format(i, loss.item()))
    
    loss.backward()
    w1.data = w1.data - learning_rate * w1.grad.data
    w2.data = w2.data - learning_rate * w2.grad.data
    
    # Note: as mentioned earlier, a Variable's grad accumulates, so it must be zeroed after every step
    w1.grad.data.zero_()
    w2.grad.data.zero_()

step: 0, loss: 31380328.0
step: 1, loss: 27662854.0
step: 2, loss: 28318600.0
step: 3, loss: 32223158.0
step: 4, loss: 38894172.0
step: 5, loss: 48152132.0
step: 6, loss: 59519988.0
step: 7, loss: 72213096.0
step: 8, loss: 84501200.0
step: 9, loss: 94287640.0
step: 10, loss: 98908912.0
step: 11, loss: 96884448.0
step: 12, loss: 88021744.0
step: 13, loss: 74388632.0
step: 14, loss: 58713200.0
step: 15, loss: 43826708.0
step: 16, loss: 31272546.0
step: 17, loss: 21627858.0
step: 18, loss: 14643799.0
step: 19, loss: 9803993.0
step: 20, loss: 6536228.0
step: 21, loss: 4368750.5
step: 22, loss: 2944001.5
step: 23, loss: 2011553.5
step: 24, loss: 1400832.125
step: 25, loss: 999364.9375
step: 26, loss: 733470.625
step: 27, loss: 555392.4375
step: 28, loss: 434254.0625
step: 29, loss: 350252.78125
step: 30, loss: 290586.375
step: 31, loss: 247038.171875
step: 32, loss: 214274.5625
step: 33, loss: 188842.109375
step: 34, loss: 168495.921875
step: 35, loss: 151757.0625
step: 36, loss: 137649.828

step: 390, loss: 0.00034263409906998277
step: 391, loss: 0.0003325456054881215
step: 392, loss: 0.0003236168413423002
step: 393, loss: 0.000315542274620384
step: 394, loss: 0.0003058273287024349
step: 395, loss: 0.0002973714435938746
step: 396, loss: 0.0002901450789067894
step: 397, loss: 0.000281019601970911
step: 398, loss: 0.0002729268744587898
step: 399, loss: 0.00026575347874313593
step: 400, loss: 0.0002589224313851446
step: 401, loss: 0.00025108037516474724
step: 402, loss: 0.00024512867094017565
step: 403, loss: 0.00023825983225833625
step: 404, loss: 0.00023214535031002015
step: 405, loss: 0.0002258922322653234
step: 406, loss: 0.00022039844770915806
step: 407, loss: 0.0002148994244635105
step: 408, loss: 0.0002099351113429293
step: 409, loss: 0.00020412332378327847
step: 410, loss: 0.00019952727598138154
step: 411, loss: 0.00019556129700504243
step: 412, loss: 0.00019042038184124976
step: 413, loss: 0.0001861666387412697
step: 414, loss: 0.0001815163268474862

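- As mentioned earlier, a Variable's grad accumulates, so it must be zeroed after every step.
- Also, Variable.data gives access to the underlying tensor object.
- With automatic gradient computation we no longer need to derive the formulas by hand, which removes a common source of mistakes.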
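
The accumulation behaviour can be seen in isolation with a minimal sketch. The tensor `w` and the function `2 * w` are arbitrary choices made for this illustration; the gradient of `(w * 2).sum()` with respect to each element of `w` is 2.

```python
import torch

w = torch.ones(3, requires_grad=True)

(w * 2).sum().backward()
first = w.grad.clone()      # gradient after one backward pass: all 2s

(w * 2).sum().backward()    # a second pass adds to the existing .grad
second = w.grad.clone()     # now all 4s

w.grad.zero_()              # reset in place before the next step
print(first, second, w.grad)
```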

### Tensor VS Numpy
PyTorch provides a bridge for converting between tensors and NumPy arrays.

In [13]:
import numpy as np
# array to tensor
a = torch.from_numpy(np.array([1, 2, 3]))
print(a)


 1
 2
 3
[torch.LongTensor of size 3]



In [14]:
# tensor to array
b = torch.ones(2, 2).numpy()
print(b)

[[ 1.  1.]
 [ 1.  1.]]
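
One detail worth knowing about this bridge: for CPU tensors, both `torch.from_numpy` and `.numpy()` share memory with the original object, so an in-place change on one side is visible on the other. The sketch below illustrates this; `torch.tensor(arr)` is shown as one way to get an independent copy. Variable names are illustrative.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
t = torch.from_numpy(arr)        # shares memory with arr
arr[0] = 100
print(t)                         # the first element is now 100

t2 = torch.ones(2, 2)
n2 = t2.numpy()                  # also shares memory with t2
t2.add_(1)                       # in-place update on the tensor
print(n2)                        # the array reflects the change

independent = torch.tensor(arr)  # copies the data instead of sharing it
arr[1] = -1
print(independent)               # unaffected by the later change to arr
```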


### 自定义autograd函数
- In PyTorch, users can define their own autograd functions; see [autograd](http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-defining-new-autograd-functions).
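
As a minimal sketch of what such a function can look like, the example below defines a ReLU with a hand-written backward pass by subclassing `torch.autograd.Function`. The class name `MyReLU` is chosen here for illustration, and the static-method style assumes PyTorch 0.4 or later. The backward rule mirrors the manual gradient from section 1.

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        # save the input so backward can see where it was negative
        ctx.save_for_backward(inp)
        return inp.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[inp < 0] = 0   # no gradient flows where the input was negative
        return grad_input

x = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)
y = MyReLU.apply(x)
y.sum().backward()
print(x.grad)  # tensor([0., 1., 1.])
```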

## enjoy it