In [1]:
%matplotlib inline


Lab 1: Introduction to PyTorch
******************************

Introduction to Torch's tensor library
======================================

All of deep learning is computation on tensors, which are
generalizations of a matrix that can be indexed in more than 2
dimensions. We will see exactly what this means in depth later. First,
let's look at what we can do with tensors.



In [2]:
# Author: Robert Guthrie

import torch
torch.manual_seed(1)
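# Neural network parameters are randomly initialized by default. Fixing the
# random seed makes every initialization identical, so results in PyTorch
# are reproducible.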

<torch._C.Generator at 0x7ff9bd2853b0>

## Creating Tensors

Tensors can be created from Python lists with the torch.tensor()
function.




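The tensor is also a basic concept in TensorFlow.
TensorFlow is built on dataflow graphs, which combine data, flow, and graph. All data in TensorFlow is represented as tensors, which is why Google named it TensorFlow.

1. Data type (dtype): "d" is the first letter of "data", and "type" means type. Every element of a tensor has the same data type. As with ndarray.dtype in NumPy, TensorFlow offers many data types: for example, tf.float32 is a 32-bit float, tf.int8 is an 8-bit integer, tf.uint8 is an 8-bit unsigned integer, and tf.string is a string.
2. Shape: as with ndarray.shape in NumPy, a two-dimensional matrix with 2 rows and 3 columns has shape (2, 3).
3. Other attributes: device is the device on which the tensor was computed, graph is the graph the tensor belongs to, name is the tensor's name, and op (short for operation) is the operation that produced the tensor. Operations correspond to nodes in the graph; each node takes some tensors as input and outputs some tensors.

Kinds of tensor in TensorFlow
1. Constant: a tensor whose value cannot change, created with tf.constant.
2. Variable: a tensor whose value can change, defined by the tf.Variable class.
3. Placeholder: reserves a fixed slot first, with the values fed in later.
4. SparseTensor: a sparse tensor, analogous to a sparse matrix in linear algebra.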
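
The dtype, shape, and device attributes described above also exist on PyTorch tensors. The following is a minimal sketch of how to inspect them; the variable names are illustrative, and the printed dtype assumes PyTorch's default floating-point type has not been changed.

```python
import torch

# A small 2x3 tensor built from nested Python lists.
t = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

print(t.dtype)   # element type shared by every entry
print(t.shape)   # size along each dimension
print(t.device)  # device that holds the data

# The dtype can be chosen at creation time or converted afterwards.
i = torch.tensor([1, 2, 3], dtype=torch.uint8)
f = i.to(torch.float32)
print(i.dtype, f.dtype)
```

Every element of a tensor shares one dtype, so converting with `.to()` returns a new tensor rather than changing individual entries.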

In [3]:
# torch.tensor(data) creates a torch.Tensor object with the given data.
V_data = [1., 2., 3.] # list 
V = torch.tensor(V_data)
print(V)

# Creates a matrix
M_data = [[1., 2., 3.]]
M = torch.tensor(M_data)
print(M)

# Create a 3D tensor of size 2x2x2.
T_data = [[[1., 2.], [3., 4.]],
          [[5., 6.], [7., 8.]]]
T = torch.tensor(T_data)
print(T)

tensor([1., 2., 3.])
tensor([[1., 2., 3.]])
tensor([[[1., 2.],
         [3., 4.]],

        [[5., 6.],
         [7., 8.]]])


In [4]:
print(V.shape)
print(M.shape)
print(T.shape)

torch.Size([3])
torch.Size([1, 3])
torch.Size([2, 2, 2])


Use **.item()** to get a plain Python number from a single-element tensor.

In [5]:
type(V)

torch.Tensor

In [6]:
print(T[0, 1, 1])
print(T[0, 1, 1].item())

tensor(4.)
4.0


You can create a tensor of a given shape, filled with random values drawn
from a standard normal distribution, with **torch.randn()**




In [7]:
x = torch.randn((3, 4, 5))  # 3 blocks, each with 4 rows and 5 columns
print(x)

tensor([[[-1.5256, -0.7502, -0.6540, -1.6095, -0.1002],
         [-0.6092, -0.9798, -1.6091, -0.7121,  0.3037],
         [-0.7773, -0.2515, -0.2223,  1.6871,  0.2284],
         [ 0.4676, -0.6970, -1.1608,  0.6995,  0.1991]],

        [[ 0.8657,  0.2444, -0.6629,  0.8073,  1.1017],
         [-0.1759, -2.2456, -1.4465,  0.0612, -0.6177],
         [-0.7981, -0.1316,  1.8793, -0.0721,  0.1578],
         [-0.7735,  0.1991,  0.0457,  0.1530, -0.4757]],

        [[-0.1110,  0.2927, -0.1578, -0.0288,  0.4533],
         [ 1.1422,  0.2486, -1.7754, -0.0255, -1.0233],
         [-0.5962, -1.0055,  0.4285,  1.4761, -1.7869],
         [ 1.6103, -0.7040, -0.1853, -0.9962, -0.8313]]])


Operations with Tensors

Similar to NumPy, PyTorch tensors are also broadcastable.



Tensors are similar to NumPy's ndarrays, but they can also be used on a GPU to accelerate computation.
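
As a small illustration of broadcasting beyond the one-element case, the sketch below combines a column and a row. The tensors `col` and `row` are made up for this example.

```python
import torch

col = torch.tensor([[1.], [2.], [3.]])   # shape (3, 1)
row = torch.tensor([[10., 20.]])         # shape (1, 2)

# Size-1 dimensions are stretched to match, giving shape (3, 2).
out = col + row
print(out.shape)
print(out)

# Shapes that cannot be aligned raise a RuntimeError.
try:
    torch.ones(3) + torch.ones(4)
except RuntimeError as err:
    print("broadcast failed:", type(err).__name__)
```

Broadcasting compares shapes from the last dimension backwards; two sizes are compatible when they are equal or one of them is 1.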

In [8]:
x = torch.tensor([1., 2., 3.])
y = torch.tensor([4.])
z = x * y
print(z)

tensor([ 4.,  8., 12.])


In [9]:
# By default, torch.cat concatenates along the first axis (it stacks rows)
x_1 = torch.randn(2, 5)
y_1 = torch.randn(3, 5)
z_1 = torch.cat([x_1, y_1])
print(z_1.shape)

# Concatenate columns:
# Pass dim explicitly to concatenate along columns
x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)
# second arg specifies which axis to concat along
z_2 = torch.cat([x_2, y_2], dim=1)
print(z_2.shape)

# If your tensors are not compatible, torch will complain. Uncomment to see the error
# torch.cat([x_1, x_2])

torch.Size([5, 5])
torch.Size([2, 8])


Reshaping Tensors

Many neural network components expect their inputs to have
a certain shape. Often you will need to reshape before passing your data
to the component.




.view()

In [10]:
x = torch.randn(2, 3, 4)
print(x.shape)

torch.Size([2, 3, 4])


In [11]:
# Reshape to 2 rows, 12 columns
print(x.view(2, 12).shape)

torch.Size([2, 12])


In [12]:
# If one of the dimensions is -1, its size is inferred from the other dimensions
print(x.view(3, -1).shape)

torch.Size([3, 8])
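
One detail worth knowing about .view() is that it returns a new shape over the same underlying data, not a copy. The sketch below, with illustrative variable names, shows this, and also shows .reshape() as a fallback for tensors that are not contiguous in memory.

```python
import torch

x = torch.arange(6.)     # six float values, 0 through 5
v = x.view(2, 3)         # same data, new shape

# Writing through the view changes the original tensor too.
v[0, 0] = 100.
print(x[0].item())

# A transposed tensor is not contiguous, so .view() on it would fail;
# .reshape() copies the data when it has to.
t = v.t()
print(t.is_contiguous())
r = t.reshape(6)
print(r.shape)
```

If you need an independent copy instead of a shared view, call `.clone()` on the result.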


**.squeeze()** and **.unsqueeze()**

In [13]:
a = torch.randn(24)
print(a.shape)

torch.Size([24])


In [14]:
# .unsqueeze() inserts a dimension of size 1 at the given position
b = a.unsqueeze(dim=0).unsqueeze(dim=0)
print(b.shape)

torch.Size([1, 1, 24])


In [15]:
# .squeeze() removes all dimensions of size 1 from the tensor
c = b.squeeze()
print(c.shape)

torch.Size([24])


In [16]:
d = a.unsqueeze(dim=0)
print(d.shape)

torch.Size([1, 24])


In [17]:
e = a.unsqueeze(dim=1)
print(e.shape)

torch.Size([24, 1])


In [18]:
f = d.squeeze()
print(f.shape)

torch.Size([24])


In [19]:
g = f.squeeze()
print(g.shape)
# With a dim argument, squeeze removes only that dimension, and only if its size is 1

torch.Size([24])
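
The cells above call .squeeze() without arguments. As a short sketch of the dim argument, the tensor `b` below is recreated with the same shape as the earlier one so the example runs on its own.

```python
import torch

b = torch.randn(1, 1, 24)

# Only dimension 0 is removed.
print(b.squeeze(dim=0).shape)

# Dimension 2 has size 24, so the shape is left unchanged.
print(b.squeeze(dim=2).shape)

# Without an argument, every size-1 dimension is removed.
print(b.squeeze().shape)
```

Passing dim is the safer choice when one of the remaining dimensions, such as a batch dimension, might itself have size 1.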


Computation Graphs and Automatic Differentiation
================================================

The concept of a computation graph is essential to efficient deep
learning programming, because it allows you to not have to write the
back propagation gradients yourself. A computation graph is simply a
specification of how your data is combined to give you the output. Since
the graph totally specifies what parameters were involved with which
operations, it contains enough information to compute derivatives. This
probably sounds vague, so let's see what is going on using the
fundamental flag ``requires_grad``.



In [20]:
# Tensor factory methods have a ``requires_grad`` flag
x = torch.tensor([1., 2., 3], requires_grad=True)

# With requires_grad=True, you can still do all the operations you previously
# could
y = x ** 2
print(y)

# BUT y knows something extra.
print(y.grad_fn)

tensor([1., 4., 9.], grad_fn=<PowBackward0>)
<PowBackward0 object at 0x7ff9bf1fce50>


In [21]:
# Let's sum up all the entries in y
s = y.sum()
print(s)
print(s.grad_fn)

tensor(14., grad_fn=<SumBackward0>)
<SumBackward0 object at 0x7ff9bf1f5d90>


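When you create a tensor x with requires_grad set to True, PyTorch tracks every operation performed on it. Once the computation is finished, calling .backward() computes all the gradients automatically, and the gradients for this tensor are accumulated into its .grad attribute.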

So now, what is the derivative of this sum with respect to the first
component of $x$?
In math, we want

\begin{align}\frac{\partial s}{\partial x_0}\end{align}



Well, $s$ knows that it was created as a sum of the tensor $y$. $y$ knows
that it was $x^{2}$. So

\begin{align}s = \overbrace{x_0^2}^\text{$y_0$} + \overbrace{x_1^2}^\text{$y_1$} + \overbrace{x_2^2}^\text{$y_2$}\end{align}





\begin{align}
\frac{\partial s}{\partial x} = \frac{\partial s}{\partial y}\frac{\partial y}{\partial x} = \vec{1}*2x\vert_{x=[1,2,3]} = [2,4,6]
\end{align}

Let's have PyTorch compute the gradient, and see that we were right:
(note if you run this block multiple times, the gradient will increment.
That is because PyTorch *accumulates* the gradient into the .grad
property, since for many models this is very convenient.)




In [22]:
# Calling .backward() on a scalar tensor runs backprop, starting from that tensor.
s.backward()
print(x.grad)

tensor([2., 4., 6.])
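
To make the accumulation behaviour concrete, here is a minimal sketch that rebuilds the graph for each backward pass and then resets the stored gradient. The names `first` and `second` are introduced only for this illustration.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

# First backward pass: d(sum(x^2))/dx = 2x.
(x ** 2).sum().backward()
first = x.grad.clone()

# A second pass on a freshly built graph adds to the stored gradient.
(x ** 2).sum().backward()
second = x.grad.clone()

# Reset the gradient in place before the next pass.
x.grad.zero_()
print(first, second, x.grad)
```

In a training loop, this reset is usually done once per step so that gradients from earlier batches do not leak into the current update.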


Use **.detach()** to get a tensor with the same value but no computation history, so that we can just take the value.

In [23]:
s_ = s.detach()

In [24]:
print(s_)
print(s_.grad_fn)

tensor(14.)
None
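
As a further sketch of what detaching means in practice, the example below checks the requires_grad flag before and after, and confirms that later operations on the detached tensor are not tracked. The variable names are illustrative.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x ** 2
y_d = y.detach()

# The detached tensor holds the same values but is cut off from the graph.
print(y.requires_grad, y_d.requires_grad)

# Operations on the detached tensor build no graph.
z = (y_d * 2).sum()
print(z.grad_fn)
```

This is handy when a value is only needed for logging or plotting and should not contribute to any gradient.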


# Using GPU(s) and CUDA

In [25]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


In [26]:
a = torch.tensor([1., 2.], device=device)
b = torch.tensor([3., 4.]).to(device)
# c = a.cuda()
# a = c.cpu()

In [27]:
c = a + b
print(c)

tensor([4., 6.])


In [28]:
d = torch.tensor([4., 5.])
print(d.device)

cpu


In [29]:
c + d

tensor([ 8., 11.])
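
The addition above works because everything in this run lives on the CPU. On a machine where `device` resolves to CUDA, adding a GPU tensor to a multi-element CPU tensor raises a RuntimeError. A minimal sketch of a device-safe version, using the same values as above, could look like this:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

c = torch.tensor([4., 6.], device=device)
d = torch.tensor([4., 5.])   # created on the CPU by default

# Move d to whatever device c lives on before combining them.
result = c + d.to(c.device)
print(result, result.device)
```

Calling `.to()` with the device a tensor is already on returns the tensor unchanged, so this pattern costs nothing on a CPU-only machine.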