# Facebook PyTorch 1.0 Tutorial

## 1. What is PyTorch?
**Components of PyTorch:**
1. Automatic differentiation engine
2. Ndarray library with GPU support
3. Distributed training
4. Production-ready C++ runtime
5. Gradient-based optimization package
6. Utilities (data loading, etc.)

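## 2. ndarray library
PyTorch is often called the NumPy of neural networks. Why? Because `np.ndarray` and `torch.Tensor` support very similar operations, and tensors can also run on a GPU, which makes computation much faster. Let's compare the two: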

* First, the NumPy code:

In [2]:
import numpy as np


N,D_in,H,D_out = 54,1000,100,10

# Create random input and output data
x = np.random.randn(N , D_in)
y = np.random.randn(N , D_out)

# Randomly initialize the weights
w1 = np.random.randn(D_in,H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6

for t in range(500):
    # Forward pass: compute the predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h,0)
    y_pred = h_relu.dot(w2)
    
    # Compute the loss (sum of squared errors)
    loss = np.square(y_pred - y).sum()
    print(t,"turn: ", loss)
    
    # Backward pass: compute the gradients of the loss w.r.t. w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    
    # Update the weights with gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
    

0 turn:  25902431.879741177
1 turn:  19667915.881960966
2 turn:  15703056.330939017
3 turn:  12327716.761904921
4 turn:  9261643.786908623
5 turn:  6650974.979294437
6 turn:  4634447.263955887
7 turn:  3192417.3091512616
8 turn:  2215607.1908576544
9 turn:  1570944.481600485
10 turn:  1145688.8724153633
11 turn:  862917.4575967597
12 turn:  669830.348605209
13 turn:  533807.5029864854
14 turn:  434891.61297826376
15 turn:  360650.2440819749
16 turn:  303279.8470286433
17 turn:  257781.675638947
18 turn:  220997.61671311493
19 turn:  190788.10689599218
20 turn:  165636.85301751236
21 turn:  144447.24898729956
22 turn:  126464.36986866029
23 turn:  111101.4644300804
24 turn:  97903.51607467321
25 turn:  86501.50336650021
26 turn:  76614.88797254513
27 turn:  68004.53541816722
28 turn:  60481.781052643564
29 turn:  53888.58197762249
30 turn:  48100.6526167174
31 turn:  43004.21008457731
32 turn:  38502.26470520457
33 turn:  34520.42071436035
34 turn:  30992.228937691594
35 turn:  27861.52

334 turn:  8.061175915059923e-05
335 turn:  7.630478851167481e-05
336 turn:  7.222998583053305e-05
337 turn:  6.837333738253106e-05
338 turn:  6.472206563795398e-05
339 turn:  6.126706406838408e-05
340 turn:  5.799840992027617e-05
341 turn:  5.490385132341202e-05
342 turn:  5.1974346769910984e-05
343 turn:  4.9203306370805166e-05
344 turn:  4.657871977932011e-05
345 turn:  4.409529028559907e-05
346 turn:  4.1745624660792006e-05
347 turn:  3.952013064859082e-05
348 turn:  3.741356884119265e-05
349 turn:  3.542060156858078e-05
350 turn:  3.3533779546417576e-05
351 turn:  3.174752076868103e-05
352 turn:  3.005717396162182e-05
353 turn:  2.8456440277215083e-05
354 turn:  2.694151207641914e-05
355 turn:  2.5507608710717385e-05
356 turn:  2.4149800209581932e-05
357 turn:  2.2864786125790542e-05
358 turn:  2.1648662869251148e-05
359 turn:  2.0496921600090074e-05
360 turn:  1.940650390396995e-05
361 turn:  1.8374549444551858e-05
362 turn:  1.7397406651527572e-05
363 turn:  1.6472867115506033e-
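
The backward pass in the NumPy loop follows from the chain rule. As a sketch, write $h = x w_1$, $a = \max(h, 0)$, $\hat{y} = a w_2$ and $L = \sum (\hat{y} - y)^2$; these symbols are chosen here for illustration and correspond to `h`, `h_relu`, `y_pred` and `loss` in the code. Each line below then matches one gradient assignment in the loop:

$$
\begin{aligned}
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y) \\
\frac{\partial L}{\partial w_2} &= a^{\top} \frac{\partial L}{\partial \hat{y}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}} \, w_2^{\top} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h \ge 0] \\
\frac{\partial L}{\partial w_1} &= x^{\top} \frac{\partial L}{\partial h}
\end{aligned}
$$

The indicator $\mathbb{1}[h \ge 0]$ is what `grad_h[h < 0] = 0` implements: the gradient is blocked wherever the ReLU input was negative.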

Next, let's see how tensors solve the same problem.

In [9]:
# TO-DO
import torch

dtype = torch.FloatTensor

print(dtype)

<class 'torch.FloatTensor'>


As you can see, `numpy.ndarray` and `torch.Tensor` are very similar.
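
To make the similarity concrete, here is a minimal sketch of how the NumPy network from the start of this section might be ported to tensors. It is not part of the original notebook: the `losses` list, the fixed seed and the choice of `torch.float32` on the CPU are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

dtype = torch.float32
device = torch.device("cpu")  # switch to "cuda" to run on a GPU

N, D_in, H, D_out = 54, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize the weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
losses = []  # hypothetical helper list, used only to inspect convergence

for t in range(500):
    # Forward pass: mm replaces dot, clamp(min=0) replaces np.maximum(h, 0)
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute the loss as a Python float
    loss = (y_pred - y).pow(2).sum().item()
    losses.append(loss)
    if t % 100 == 0:
        print(t, "turn: ", loss)

    # Backward pass, written by hand exactly as in the NumPy version
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update the weights with gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```

Apart from the method names (`mm`, `clamp`, `t`, `clone`), the loop is line-for-line the same as the NumPy one.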

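* Construct an uninitialized $ 5 \times 3 $ matrix. Note that `torch.Tensor(5, 3)` only allocates memory and does not zero it, so the values are arbitrary (see the non-zero entries in the output below); use `torch.zeros(5, 3)` for a true zero matrix.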

In [6]:
from __future__ import print_function
import torch

x = torch.Tensor(5, 3)
print(x)

tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 1.8946e-42, 0.0000e+00],
        [0.0000e+00, 1.1692e-19, 0.0000e+00]])
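
The difference between an uninitialized tensor and a zero tensor can be sketched as follows. The variable names are illustrative; `torch.empty` is the explicit way to request uninitialized memory, and its contents should never be relied on.

```python
import torch

zeros = torch.zeros(5, 3)  # every element is exactly 0
empty = torch.empty(5, 3)  # uninitialized memory: values are arbitrary

print(zeros)
print(empty.shape, empty.dtype)  # only shape and dtype are well defined
```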


* Randomly initialize a matrix (`torch.rand` samples uniformly from $[0, 1)$)

In [10]:
x = torch.rand(5,3)
print(x)

tensor([[0.3405, 0.6669, 0.4768],
        [0.1587, 0.8933, 0.8746],
        [0.4759, 0.2391, 0.2873],
        [0.7888, 0.8154, 0.0488],
        [0.9692, 0.3559, 0.4046]])


* Get its `size()`

In [12]:
print(x.size())

torch.Size([5, 3])


* Slicing (here, the second column)

In [13]:
print(x[:,1])

tensor([0.6669, 0.8933, 0.2391, 0.8154, 0.3559])


* Construct an all-ones tensor and convert it to NumPy format:

In [15]:
a = torch.ones(5)
print(a)
print(type(a))

b = a.numpy()
print(b)
print(type(b))

tensor([1., 1., 1., 1., 1.])
<class 'torch.Tensor'>
[1. 1. 1. 1. 1.]
<class 'numpy.ndarray'>


* In-place addition

In [16]:
a.add_(1)
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]


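As the output shows, `tensor.numpy()` does not copy the data: for a CPU tensor, the returned array shares memory with the tensor, so the in-place update of `a` is also visible in `b`.
The same holds in the other direction, as the next operation shows.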

* Convert a `numpy.ndarray` to a `torch.Tensor`

In [19]:
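import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)

# Values before the in-place update
print(a)
print(b)

np.add(a , 1 ,out = a)

# Values after the update: b changes too because it shares memory with a
print(a)
print(b)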

[1. 1. 1. 1. 1.]
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
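
When an independent copy is wanted instead of shared memory, one option is to copy explicitly. The following is a minimal sketch with illustrative variable names: `torch.tensor(...)` copies its input, and `clone()` copies a tensor before it is converted.

```python
import numpy as np
import torch

# NumPy -> PyTorch
a = np.ones(5)
shared = torch.from_numpy(a)  # shares memory with a
copied = torch.tensor(a)      # independent copy of the data
np.add(a, 1, out=a)           # in-place update of the array
print(shared)                 # follows the change to a
print(copied)                 # keeps the original values

# PyTorch -> NumPy
t = torch.ones(5)
arr_shared = t.numpy()          # shares memory with t
arr_copy = t.clone().numpy()    # backed by a separate tensor
t.add_(1)                       # in-place update of the tensor
print(arr_shared)               # follows the change to t
print(arr_copy)                 # keeps the original values
```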


* Finally, let's see how to run on the GPU (usually the device only needs to be set up once, globally)

In [20]:
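y = torch.rand(5, 3)  # y must be a tensor with the same shape as x
if torch.cuda.is_available():
    # Move both operands to the GPU; the addition then runs there
    x = x.cuda()
    y = y.cuda()
    print(x + y)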
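
One common way to set the device once and reuse it everywhere is a single `torch.device` object. This is a sketch rather than the notebook's own code; the names `device`, `z` and `result` are illustrative, and the code falls back to the CPU when no GPU is present.

```python
import torch

# Pick the device once; everything below is written against this variable
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(5, 3, device=device)  # create directly on the device
y = torch.rand(5, 3).to(device)      # or move an existing tensor to it

z = x + y  # runs on whichever device x and y live on
print(z.device)

# A tensor must be back on the CPU before it can be converted to NumPy
result = z.cpu().numpy()
```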

## 3. Automatic differentiation engine

The key to PyTorch is its automatic differentiation technique, autograd. For the details, see the Evernote notes.
Let's look directly at how it is used:

In [None]:
import torch
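
The notebook cell above stops at the import. As a hedged sketch of what autograd usage might look like, here is the two-layer network from section 2 with the hand-written backward pass replaced by `loss.backward()`. The seed, the `losses` list and the hyperparameters carried over from the NumPy example are assumptions, not the original author's code.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

N, D_in, H, D_out = 54, 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# requires_grad=True tells autograd to track operations on the weights
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
losses = []  # hypothetical helper list, used only to inspect convergence

for t in range(500):
    # Forward pass: intermediate values no longer need to be kept by hand
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    # Backward pass: autograd fills w1.grad and w2.grad
    loss.backward()

    # Update the weights without recording the update in the graph
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Gradients accumulate across backward() calls, so reset them
        w1.grad.zero_()
        w2.grad.zero_()
```

The forward pass is unchanged; only the gradient computation is delegated to autograd.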