# Introduction to Deep Learning with PyTorch

In this notebook, you'll get introduced to [PyTorch](http://pytorch.org/), a framework for building and training neural networks. PyTorch in a lot of ways behaves like the arrays you love from Numpy. These Numpy arrays, after all, are just tensors. PyTorch takes these tensors and makes it simple to move them to GPUs for the faster processing needed when training neural networks. It also provides a module that automatically calculates gradients (for backpropagation!) and another module specifically for building neural networks. All together, PyTorch ends up being more coherent with Python and the Numpy/Scipy stack compared to TensorFlow and other frameworks.



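In this notebook, we cover:
- PyTorch is a `framework` for building and training neural networks.
- In many ways PyTorch behaves like the familiar arrays from Numpy. These Numpy arrays are just `tensors`.
- PyTorch works with `tensors` and can move them to GPUs to speed up the training of neural networks.
- PyTorch also provides a `module that automatically calculates gradients (for backpropagation)` and a `module specifically for building neural networks`. Overall, compared to TensorFlow and other frameworks, PyTorch fits more coherently with Python and the Numpy/Scipy stack.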

## Neural Networks
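- `Note`: Each neuron has a number of weighted inputs. These weighted inputs are summed (a linear combination) and passed to an activation function, whose result is the unit's output.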

Deep Learning is based on artificial neural networks which have been around in some form since the late 1950s. The networks are built from individual parts approximating neurons, typically called units or simply "neurons." Each unit has some number of weighted inputs. These weighted inputs are summed together (a linear combination) then passed through an activation function to get the unit's output.

<img src="assets/simple_neuron.png" width=400px>

`Weighted sum`: Mathematically this looks like:

\begin{align}
y &= f(w_1 x_1 + w_2 x_2 + b) \\
y &= f\left(\sum_i w_i x_i +b \right)
\end{align}


`Vectorization`: With vectors this is the dot/inner product of two vectors:

$$
h = \begin{bmatrix}
x_1 \, x_2 \cdots  x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_1 \\
           w_2 \\
           \vdots \\
           w_n
\end{bmatrix}
$$
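To make the equivalence concrete, here is a minimal sketch comparing the explicit weighted sum with the dot product. The input and weight values are made up purely for illustration.

```python
import torch

# Hypothetical example values, chosen only for illustration
x = torch.tensor([1.0, 2.0, 3.0])   # inputs x_1..x_n
w = torch.tensor([0.5, -1.0, 2.0])  # weights w_1..w_n

# Explicit weighted sum: sum_i w_i * x_i
h_sum = sum(w_i * x_i for w_i, x_i in zip(w, x))

# The same quantity written as a dot product of the two vectors
h_dot = torch.dot(x, w)

print(h_sum, h_dot)
```

Both expressions compute the same number; the dot product simply hands the loop over to the library.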

## Tensors

It turns out neural network computations are just a bunch of linear algebra operations on *tensors*, a generalization of matrices. A vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, an array with three indices is a 3-dimensional tensor (RGB color images for example). The fundamental data structure for neural networks is the tensor, and PyTorch (as well as pretty much every other deep learning framework) is built around tensors.

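- Neural network computations are just linear algebra operations on tensors.
- `Vector`: a 1-dimensional tensor
- `Matrix`: a 2-dimensional tensor
- `Array`: a tensor with three or more dimensions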

<img src="assets/tensor_examples.svg" width=600px>
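As a quick illustration of these three cases, the sketch below builds one tensor of each kind and prints its number of dimensions and shape. The sizes (including the tiny 4x4 "image") are arbitrary choices for the example.

```python
import torch

vector = torch.tensor([1.0, 2.0, 3.0])  # 1-dimensional tensor
matrix = torch.ones(2, 3)               # 2-dimensional tensor
image = torch.zeros(3, 4, 4)            # 3-dimensional tensor, e.g. a tiny 4x4 RGB image

for name, t in [("vector", vector), ("matrix", matrix), ("image", image)]:
    # .dim() gives the number of dimensions, .shape the size along each one
    print(name, t.dim(), tuple(t.shape))
```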

With the basics covered, it's time to explore how we can use PyTorch to build a simple neural network.

In [1]:
# First, import the PyTorch package
import torch

In [2]:
# Activation function
# The sigmoid function is well suited to outputting probabilities
def activation(x):
    """ Sigmoid activation function 
    
        Arguments
        ---------
        x: torch.Tensor
    """
    return 1/(1+torch.exp(-x))
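
As a sanity check, the hand-written sigmoid can be compared against PyTorch's built-in `torch.sigmoid`. This is only a sketch: the test points from `torch.linspace` are an arbitrary choice, and `activation` is redefined here so the snippet runs on its own.

```python
import torch

def activation(x):
    """Sigmoid written out by hand, as in the cell above."""
    return 1 / (1 + torch.exp(-x))

x = torch.linspace(-5, 5, steps=11)  # arbitrary test points
manual = activation(x)
builtin = torch.sigmoid(x)

# The two should agree up to floating point error
print(torch.allclose(manual, builtin))
print(activation(torch.tensor(0.0)))
```

Every output lies strictly between 0 and 1, which is why the sigmoid is a natural fit for probabilities.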

In [3]:
### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Features are 5 random normal variables, stored as a tensor with 1 row and 5 columns
features = torch.randn((1, 5))
# True weights for our data, random normal variables again, with the same shape as the features
weights = torch.randn_like(features)
# and a true bias term
bias = torch.randn((1, 1))

Above I generated data we can use to get the output of our simple network. This is all just random for now; going forward we'll start using real data. Going through each relevant line:

`features = torch.randn((1, 5))` creates a tensor with shape `(1, 5)`, one row and five columns, that contains values randomly distributed according to the normal distribution with a mean of zero and standard deviation of one. 

`weights = torch.randn_like(features)` creates another tensor with the same shape as `features`, again containing values from a normal distribution.

Finally, `bias = torch.randn((1, 1))` creates a single value from a normal distribution.
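
The shapes described above can be checked directly. The sketch below also re-seeds the generator to show that the same seed reproduces the same draws; the name `features_again` is introduced only for this example.

```python
import torch

torch.manual_seed(7)
features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

print(features.shape, weights.shape, bias.shape)

# Re-seeding with the same value reproduces the same random draws
torch.manual_seed(7)
features_again = torch.randn((1, 5))
print(torch.equal(features, features_again))
```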

PyTorch tensors can be added, multiplied, subtracted, etc., just like Numpy arrays. In general, you'll use PyTorch tensors pretty much the same way you'd use Numpy arrays. They come with some nice benefits though, such as GPU acceleration, which we'll get to later. For now, use the generated data to calculate the output of this simple single layer network.
> **Exercise**: Calculate the output of the network with input features `features`, weights `weights`, and bias `bias`. Similar to Numpy, PyTorch has a [`torch.sum()`](https://pytorch.org/docs/stable/torch.html#torch.sum) function, as well as a `.sum()` method on tensors, for taking sums. Use the function `activation` defined above as the activation function.

With the data created, follow the formula: multiply the features by the weights, add the bias, and pass the result through the activation function to get the activation value.

In [4]:
## This version uses element-wise multiplication
## Calculate the output of this network using the weights and bias tensors
y = activation(torch.sum(weights * features) + bias)
## Alternatively, the same thing using the .sum() method
y1 = activation((features * weights).sum() + bias)
print(y1)

tensor([[0.1595]])


You can do the multiplication and sum in the same operation using a matrix multiplication. In general, you'll want to use matrix multiplications since they are more efficient and accelerated using modern libraries and high-performance computing on GPUs.
In other words, a single matrix multiplication takes care of the multiply-then-sum step.

Here, we want to do a matrix multiplication of the features and the weights. For this we can use [`torch.mm()`](https://pytorch.org/docs/stable/torch.html#torch.mm) or [`torch.matmul()`](https://pytorch.org/docs/stable/torch.html#torch.matmul), which is somewhat more complicated and supports broadcasting:

- `torch.mm()`: does not support broadcasting and raises an error when the dimensions don't match.
- `torch.matmul()`: supports broadcasting, but can produce unexpected results.

If we try to do it with `features` and `weights` as they are, we'll get an error:


```python
>> torch.mm(features, weights)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-15d592eb5279> in <module>()
----> 1 torch.mm(features, weights)

RuntimeError: size mismatch, m1: [1 x 5], m2: [1 x 5] at /Users/soumith/minicondabuild3/conda-bld/pytorch_1524590658547/work/aten/src/TH/generic/THTensorMath.c:2033
```

As you're building neural networks in any framework, you'll see this often. Really often. What's happening here is our tensors aren't the correct shapes to perform a matrix multiplication. Remember that for matrix multiplications, the number of columns in the first tensor must equal the number of rows in the second tensor. Both `features` and `weights` have the same shape, `(1, 5)`. This means we need to change the shape of `weights` to get the matrix multiplication to work.

**Note:** To see the shape of a tensor called `tensor`, use `tensor.shape`. If you're building neural networks, you'll be using this attribute often.
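
The sketch below reproduces the mismatch and shows how inspecting shapes up front can catch it. The helper `can_mm` and the stand-in tensor `column` are hypothetical names introduced only for this example; they are not part of PyTorch.

```python
import torch

torch.manual_seed(7)
features = torch.randn((1, 5))
weights = torch.randn_like(features)
print(features.shape, weights.shape)  # both have shape (1, 5)

# torch.mm needs shapes (n, m) and (m, p); (1, 5) and (1, 5) do not line up
try:
    torch.mm(features, weights)
    mm_failed = False
except RuntimeError as err:
    mm_failed = True
    print("RuntimeError:", err)

def can_mm(a, b):
    """Hypothetical helper: True if torch.mm(a, b) has compatible 2-D shapes."""
    return a.dim() == 2 and b.dim() == 2 and a.shape[1] == b.shape[0]

column = torch.randn(5, 1)  # a stand-in tensor with a compatible shape
print(can_mm(features, weights), can_mm(features, column))

# torch.matmul also accepts a 1-D second argument and then returns a 1-D result,
# which is easy to miss if a (1, 1) matrix was expected
vec_result = torch.matmul(features, torch.randn(5))
print(vec_result.shape)
```

The last lines illustrate the kind of surprise `torch.matmul()` can produce: the call succeeds, but the result has a different number of dimensions than a strict matrix product would.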

There are a few options here: [`weights.reshape()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.reshape), [`weights.resize_()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.resize_), and [`weights.view()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view).

* `weights.reshape(a, b)` will sometimes return a new tensor that shares the same data as `weights` with size `(a, b)`, and sometimes a clone, meaning it copies the data to another part of memory. When the data is shared, only the indexing changes and the underlying memory stays where it is, so an in-place change to the result also shows up in `weights`; when the data is cloned, the two tensors are independent.
* `weights.resize_(a, b)` returns the same tensor with a different shape. However, if the new shape results in fewer elements than the original tensor, some elements will be removed from the tensor (but not from memory). If the new shape results in more elements than the original tensor, new elements will be uninitialized in memory. Here I should note that the underscore at the end of the method denotes that this method is performed **in-place**, that is, it modifies the tensor itself. Here is a great forum thread to [read more about in-place operations](https://discuss.pytorch.org/t/what-is-in-place-operation/16244) in PyTorch.
* `weights.view(a, b)` will return a new tensor with the same data as `weights` with size `(a, b)`. It never copies the data in memory, and it raises an error when the requested shape cannot be produced without a copy. This is the recommended option.
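
The difference between `.view()` and `.reshape()` is easiest to see on a small tensor. The following is a minimal sketch with made-up values; `resize_()` is left out because its uninitialized memory makes the output unpredictable.

```python
import torch

t = torch.arange(6.0).view(2, 3)  # contiguous tensor holding 0..5

# view shares memory with the original tensor
v = t.view(3, 2)
v[0, 0] = 100.0
print(t[0, 0])  # the original sees the change

# A transposed tensor is not contiguous, so view cannot reinterpret it
tt = t.t()
try:
    tt.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape falls back to copying when a view is impossible
r = tt.reshape(6)
print(view_failed, r.data_ptr() == t.data_ptr())
```

Comparing `data_ptr()` values is one way to tell whether two tensors start at the same place in memory.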

I usually use `.view()`, but any of the three methods will work for this. So, now we can reshape `weights` to have five rows and one column with something like `weights.view(5, 1)`.

> **Exercise**: Calculate the output of our little network using matrix multiplication.

In [5]:
## Calculate the output of this network using matrix multiplication
y = activation(torch.mm(features, weights.view(5, 1)) + bias)
print(y)

tensor([[0.1595]])


### Stack them up!

That's how you can calculate the output for a single neuron. The real power of this algorithm happens when you start stacking these individual units into layers and stacks of layers, into a network of neurons. The output of one layer of neurons becomes the input for the next layer. With multiple input units and output units, we now need to express the weights as a matrix.

<img src='assets/multilayer_diagram_weights.png' width=450px>

The first layer shown on the bottom here is the inputs, understandably called the **input layer**. The middle layer is called the **hidden layer**, and the final layer (on the right) is the **output layer**. We can express this network mathematically with matrices again and use matrix multiplication to get linear combinations for each unit in one operation. For example, the hidden layer ($h_1$ and $h_2$ here) can be calculated as:

$$
\vec{h} = [h_1 \, h_2] = 
\begin{bmatrix}
x_1 \, x_2 \cdots \, x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_{11} & w_{12} \\
           w_{21} &w_{22} \\
           \vdots &\vdots \\
           w_{n1} &w_{n2}
\end{bmatrix}
$$

The output for this small network is found by treating the hidden layer as inputs for the output unit. The network output is expressed simply as:

$$
y =  f_2 \! \left(\, f_1 \! \left(\vec{x} \, \mathbf{W_1}\right) \mathbf{W_2} \right)
$$

In [6]:
### Code for building a basic neural network
### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Step 1: build a row vector that stores the features
# Features are 3 random normal variables
features = torch.randn((1, 3)) # feature vector with 1 row and 3 columns

# Step 2: define the size of the network, i.e. the number of units in each layer
# Define the size of each layer in our network
n_input = features.shape[1]     # Number of input units, must match number of input features
n_hidden = 2                    # Number of hidden units
n_output = 1                    # Number of output units

# Step 3: initialize the weight matrices
# A network with n layers needs n-1 weight matrices
# Weights for inputs to hidden layer
## features has shape (1, n_input), a row vector
W1 = torch.randn(n_input, n_hidden)
## After the input layer comes the hidden layer, whose output has shape (1, n_hidden)
# Weights for hidden layer to output layer
W2 = torch.randn(n_hidden, n_output)

# and bias terms for hidden and output layers
B1 = torch.randn((1, n_hidden))
B2 = torch.randn((1, n_output))

> **Exercise:** Calculate the output for this multi-layer network using the weights `W1` & `W2`, and the biases, `B1` & `B2`. 

In [7]:
## Your solution here
output_hidden = activation(torch.mm(features,W1)+B1) 
y = activation(torch.mm(output_hidden,W2)+B2)
print(y)

tensor([[0.3171]])


If you did this correctly, you should see the output `tensor([[ 0.3171]])`.
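
The same computation extends to several inputs at once. The sketch below wraps the two steps in a hypothetical helper named `forward` (not part of the notebook or of PyTorch) and feeds it a small batch made by repeating the feature row, assuming the same seed and draw order as the cell above.

```python
import torch

def activation(x):
    return 1 / (1 + torch.exp(-x))

def forward(x, W1, B1, W2, B2):
    """Hypothetical helper: two-layer forward pass for x of shape (batch, n_input)."""
    hidden = activation(torch.mm(x, W1) + B1)
    return activation(torch.mm(hidden, W2) + B2)

torch.manual_seed(7)
features = torch.randn((1, 3))
W1 = torch.randn(3, 2)
W2 = torch.randn(2, 1)
B1 = torch.randn((1, 2))
B2 = torch.randn((1, 1))

y = forward(features, W1, B1, W2, B2)

# Stacking the same row four times gives a (4, 3) batch;
# the (1, n) bias rows broadcast across the batch dimension
batch = features.repeat(4, 1)
y_batch = forward(batch, W1, B1, W2, B2)
print(y.shape, y_batch.shape)
```

Because every row of the batch is identical here, every row of `y_batch` matches `y`; with real data each row would be a different example.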

The number of hidden units is a parameter of the network, often called a **hyperparameter** to differentiate it from the weights and biases parameters. As you'll see later when we discuss training a neural network, the more hidden units a network has, and the more layers, the better able it is to learn from data and make accurate predictions.

## Numpy to Torch and back

Special bonus section! PyTorch has a great feature for converting between Numpy arrays and Torch tensors. To create a tensor from a Numpy array, use `torch.from_numpy()`. To convert a tensor to a Numpy array, use the `.numpy()` method.

- Use `torch.from_numpy()` to create a tensor from a Numpy array.
- Use `.numpy()` to convert a tensor to a Numpy array.

In [17]:
import numpy as np
a = np.random.rand(4,3)
a

array([[0.10467423, 0.01589729, 0.89453177],
       [0.81011341, 0.13170603, 0.00181328],
       [0.15225916, 0.04044523, 0.98910322],
       [0.69886453, 0.54048863, 0.1659593 ]])

In [18]:
b = torch.from_numpy(a)
b

tensor([[0.1047, 0.0159, 0.8945],
        [0.8101, 0.1317, 0.0018],
        [0.1523, 0.0404, 0.9891],
        [0.6989, 0.5405, 0.1660]], dtype=torch.float64)

In [19]:
b.numpy()

array([[0.10467423, 0.01589729, 0.89453177],
       [0.81011341, 0.13170603, 0.00181328],
       [0.15225916, 0.04044523, 0.98910322],
       [0.69886453, 0.54048863, 0.1659593 ]])

The memory is shared between the Numpy array and Torch tensor, so if you change the values of one object in place, the other will change as well.

In [20]:
# Multiply PyTorch Tensor by 2, in place
b.mul_(2)

tensor([[0.2093, 0.0318, 1.7891],
        [1.6202, 0.2634, 0.0036],
        [0.3045, 0.0809, 1.9782],
        [1.3977, 1.0810, 0.3319]], dtype=torch.float64)

In [21]:
# Numpy array matches new values from Tensor
a

array([[0.20934847, 0.03179458, 1.78906354],
       [1.62022682, 0.26341205, 0.00362656],
       [0.30451833, 0.08089046, 1.97820645],
       [1.39772905, 1.08097727, 0.33191859]])
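
When the shared memory is not wanted, one option is to clone the tensor right after converting it. This is a sketch with a made-up array of ones; the names `shared` and `copied` are introduced only for illustration.

```python
import numpy as np
import torch

a = np.ones((2, 3))

shared = torch.from_numpy(a)          # shares memory with `a`
copied = torch.from_numpy(a).clone()  # independent copy of the data

shared.mul_(2)   # also doubles the values in `a`
copied.mul_(10)  # leaves `a` untouched

print(a)
print(np.shares_memory(a, shared.numpy()), np.shares_memory(a, copied.numpy()))
```

Cloning costs an extra copy of the data, so it is worth doing only when the two objects really need to evolve independently.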