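### Chapter 2: Preliminaries
Main content:
- Data manipulation
  - Broadcasting (both operands are conceptually copied and expanded to a common shape)
  - Saving memory (use X[:] = \<expression\> or X += \<expression\> to avoid allocating a new tensor)
- Data preprocessing
- Linear algebra
  - Transpose .T, norm
  - Non-reducing sum (keepdims=True), cumulative sum cumsum
  - torch.dot only supports vectors; use mv for matrix-vector products and mm for matrix-matrix products
- Calculus
  - Writing ∇ for the gradient with respect to x: ∇(Ax) = A.T, ∇(x.T A) = A, ∇(x.T A x) = (A + A.T) x
- Automatic differentiation
  - By default PyTorch accumulates gradients, so the previous values must be cleared
  - backward() expects a scalar output; a non-scalar output must either be reduced to a scalar or be given a gradient argument of the same shape
  - Detaching part of the computation
- Probability
- Finding documentation and API help
  - dir lists the functions and classes that can be called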
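
The broadcasting note in the list above can be made concrete with a small sketch. The shapes (3, 1) and (1, 2) are chosen here only for illustration; the point is that both operands behave as if they had been expanded to (3, 2) before the elementwise addition.

```python
import torch

# Illustrative shapes (not from the notes): a column vector and a row vector.
a = torch.arange(3).reshape(3, 1)   # shape (3, 1)
b = torch.arange(2).reshape(1, 2)   # shape (1, 2)

# Broadcasting conceptually expands both operands to shape (3, 2)
# and then adds them elementwise.
c = a + b
print(c.shape)

# The same result with the expansion written out explicitly.
c_explicit = a.expand(3, 2) + b.expand(3, 2)
print(torch.equal(c, c_explicit))
```

`expand` returns a view and does not copy data, so "copying" here describes the semantics rather than what happens in memory.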

In [2]:
import numpy as np
import pandas as pd
import torch 
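
Data preprocessing is listed in the summary but has no cell of its own. The following is a minimal sketch of one common workflow, assuming a small hypothetical table (the column names and values are made up for illustration): missing categories become indicator columns, missing numbers are replaced by the column mean, and the result is converted to tensors.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical toy table with missing values (NaN).
data = pd.DataFrame({
    "NumRooms": [np.nan, 2.0, 4.0, np.nan],
    "Alley": ["Pave", np.nan, np.nan, np.nan],
    "Price": [127500, 106000, 178100, 140000],
})

inputs, targets = data[["NumRooms", "Alley"]], data["Price"]

# Categorical column: one indicator column per category, plus one for NaN.
inputs = pd.get_dummies(inputs, columns=["Alley"], dummy_na=True)
# Numeric column: replace NaN with the column mean.
inputs = inputs.fillna(inputs.mean())

# Convert everything to floating point before building tensors.
X = torch.tensor(inputs.to_numpy(dtype=float))
y = torch.tensor(targets.to_numpy(dtype=float))
print(X)
print(y)
```

Mean imputation is only one option; whether it is appropriate depends on the data.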

In [3]:
# Data manipulation
## Operations such as '+' allocate new memory for the result
X = torch.arange(4).reshape(2, 2)
Y = torch.arange(4).reshape(2, 2)
print(id(X))
X = X + Y
print(id(X))
X[:] = X + Y
print(id(X))

1963708091840
1963708098880
1963708098880
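
The cell above shows that `X = X + Y` binds `X` to a new tensor, while `X[:] = X + Y` writes into the existing one. The sketch below checks the other in-place forms as well. It compares `data_ptr()` instead of `id()`, which is a choice made here: `id()` identifies the Python object, whereas `data_ptr()` gives the address of the underlying buffer.

```python
import torch

X = torch.arange(4).reshape(2, 2)
Y = torch.ones(2, 2, dtype=X.dtype)

before = X.data_ptr()            # address of X's first element

X += Y                           # in-place add
same_after_iadd = X.data_ptr() == before

torch.add(X, Y, out=X)           # result written into X's existing buffer
same_after_out = X.data_ptr() == before

X = X + Y                        # new tensor allocated, name X rebound to it
same_after_rebind = X.data_ptr() == before

print(same_after_iadd, same_after_out, same_after_rebind)
```

Only the last form moves `X` to a different buffer.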


In [4]:
# Linear algebra
## Non-reducing sum: sum_X keeps the same number of axes as X
sum_X = X.sum(axis=1, keepdim=True)
print(X)
print(sum_X)
print(X / sum_X)
print(X.cumsum(axis=1))
vector_x = torch.arange(2)
### torch.dot only supports vectors; use mv for matrix-vector products and mm for matrix-matrix products
torch.mv(X, vector_x), torch.mm(X, Y)
tensor_test = torch.arange(8, dtype=float).reshape(4, 1, 2)
print(len(tensor_test), tensor_test.sum(axis=0).shape, tensor_test.shape)
torch.linalg.norm(tensor_test)

tensor([[0, 3],
        [6, 9]])
tensor([[ 3],
        [15]])
tensor([[0.0000, 1.0000],
        [0.4000, 0.6000]])
tensor([[ 0,  3],
        [ 6, 15]])
4 torch.Size([1, 2]) torch.Size([4, 1, 2])


tensor(11.8322, dtype=torch.float64)
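
As a complement to the `dot` / `mv` / `mm` distinction, here is a short sketch showing that the `@` operator (`torch.matmul`) selects the matching product from the operand dimensions, and that `torch.dot` rejects matrices. The tensors `A`, `B`, and `v` are illustrative and not taken from the cell above.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = torch.ones(3, 2)
v = torch.arange(3, dtype=torch.float32)

# `@` dispatches on the number of dimensions of its operands.
print(torch.equal(v @ v, torch.dot(v, v)))   # vector-vector
print(torch.equal(A @ v, torch.mv(A, v)))    # matrix-vector
print(torch.equal(A @ B, torch.mm(A, B)))    # matrix-matrix

# torch.dot only accepts 1-D tensors.
dot_failed = False
try:
    torch.dot(A, A)
except RuntimeError:
    dot_failed = True
print("torch.dot on matrices failed:", dot_failed)
```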

In [17]:
# Automatic differentiation
x = torch.arange(4.0)
x.requires_grad_(True) # equivalent to x = torch.arange(4.0, requires_grad=True)
print("grad before the computation:", x.grad)

y = 2 * torch.dot(x, x)
y.backward()
print("dot grad", x.grad, y.grad_fn)

x.grad.zero_() # by default PyTorch accumulates gradients, so clear the previous values
y = x.sum()
y.backward()
print("sum grad", x.grad)

## backward() expects a scalar; a non-scalar output must be reduced to a scalar
## or be given a gradient argument of the same shape
x.grad.zero_()
y = x * x
y.sum().backward()
print("mult grad", x.grad)

x.grad.zero_()
y = x * x
y.backward(torch.ones_like(y))
# gradients = torch.autograd.grad(outputs=y, inputs=x, grad_outputs=torch.ones_like(y))
print("non-scalar grad", x.grad)

## Detaching part of the computation
x.grad.zero_()
y = x * x
u = y.detach()
z = u * x
z.sum().backward()
print(x.grad, ", which would be tensor([0, 3, 12, 27]) without detach")

## Exercise
x.grad.zero_()
y = x.sum()
y.backward()
# a second backward() works for this graph; graphs that save intermediate
# tensors need retain_graph=True on the first call
y.backward()
print("Calling backward() twice:", x.grad, "-- gradients accumulate")

grad before the computation: None
dot grad tensor([ 0.,  4.,  8., 12.]) <MulBackward0 object at 0x000001C93E68F550>
sum grad tensor([1., 1., 1., 1.])
mult grad tensor([0., 2., 4., 6.])
non-scalar grad tensor([0., 2., 4., 6.])
tensor([0., 1., 4., 9.]) , which would be tensor([0, 3, 12, 27]) without detach
Calling backward() twice: tensor([2., 2., 2., 2.]) -- gradients accumulate
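
The calculus item in the summary states that the gradient of x.T A x is (A + A.T) x. Autograd offers a quick numerical check of that identity. The sketch below uses an arbitrary random matrix and vector; the seed and the size 3 are illustrative choices.

```python
import torch

torch.manual_seed(0)                          # arbitrary seed, for reproducibility
A = torch.randn(3, 3, dtype=torch.float64)    # arbitrary, generally non-symmetric
x = torch.randn(3, dtype=torch.float64, requires_grad=True)

y = x @ A @ x          # the scalar x.T A x
y.backward()

# Closed-form gradient from the identity.
expected = (A + A.T) @ x.detach()
print(torch.allclose(x.grad, expected))
```

When `A` is symmetric, the expression reduces to the familiar 2 A x.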


In [30]:
# Probability
from torch.distributions import multinomial
fair_probs = torch.ones([6]) / 6
print(multinomial.Multinomial(100, fair_probs).sample())
print(dir(torch.distributions))

tensor([13., 24., 11., 18., 14., 20.])
['AbsTransform', 'AffineTransform', 'Bernoulli', 'Beta', 'Binomial', 'CatTransform', 'Categorical', 'Cauchy', 'Chi2', 'ComposeTransform', 'ContinuousBernoulli', 'CorrCholeskyTransform', 'Dirichlet', 'Distribution', 'ExpTransform', 'Exponential', 'ExponentialFamily', 'FisherSnedecor', 'Gamma', 'Geometric', 'Gumbel', 'HalfCauchy', 'HalfNormal', 'Independent', 'IndependentTransform', 'Kumaraswamy', 'LKJCholesky', 'Laplace', 'LogNormal', 'LogisticNormal', 'LowRankMultivariateNormal', 'LowerCholeskyTransform', 'MixtureSameFamily', 'Multinomial', 'MultivariateNormal', 'NegativeBinomial', 'Normal', 'OneHotCategorical', 'OneHotCategoricalStraightThrough', 'Pareto', 'Poisson', 'PowerTransform', 'RelaxedBernoulli', 'RelaxedOneHotCategorical', 'ReshapeTransform', 'SigmoidTransform', 'SoftmaxTransform', 'StackTransform', 'StickBreakingTransform', 'StudentT', 'TanhTransform', 'Transform', 'TransformedDistribution', 'Uniform', 'VonMises', 'Weibull', '__all__', 
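
With only 100 rolls, the counts above deviate visibly from the expected value of about 16.7 per face. A short sketch of how the relative frequencies approach 1/6 as the number of rolls grows is given below; `n_rolls = 10_000` is an illustrative choice.

```python
import torch
from torch.distributions import multinomial

fair_probs = torch.ones(6) / 6

# Illustrative sample size; larger values bring the estimates closer to 1/6.
n_rolls = 10_000
counts = multinomial.Multinomial(n_rolls, fair_probs).sample()

# Relative frequency of each face as an estimate of its probability.
freqs = counts / n_rolls
print(freqs)
```

The sample is random, so the printed frequencies differ from run to run unless a seed is set with `torch.manual_seed`.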

In [31]:
x1 = torch.tensor([1.0, 2])
x2 = torch.tensor([3, 4.0])
x1.requires_grad_(True)
x2.requires_grad_(True)

y = x1 * x2
y.retain_grad()

y.backward(torch.ones_like(y))

print("another example of backward on non-scalar output")
x1.grad, x2.grad, y.grad, y.dtype, torch.ones_like(y)

another example of backward on non-scalar output


(tensor([3., 4.]),
 tensor([1., 2.]),
 tensor([1., 1.]),
 torch.float32,
 tensor([1., 1.]))
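
Passing `torch.ones_like(y)` is a special case. In general, the argument of `backward` on a non-scalar output acts as a weight vector `v`, and the gradient stored in `.grad` is the vector-Jacobian product. The sketch below uses an arbitrary `v` to make this visible.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x                        # non-scalar output, dy_i/dx_i = 2 * x_i

# v is an arbitrary weight vector chosen for illustration.
v = torch.tensor([1.0, 0.1, 0.01])
y.backward(v)                    # stores J^T v in x.grad

# The Jacobian is diagonal here, so J^T v is simply 2 * x * v.
expected = 2 * x.detach() * v
print(x.grad, expected)
```

Calling `y.backward(v)` gives the same gradient as `(y * v).sum().backward()`.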

In [36]:
print("Example of backward in control flows")

def auto_grad_in_control_flows(a):
    if a[0] > 2:
        return a * a
    else:
        return a.dot(a)

a = torch.tensor([1, 2.0])
b = torch.tensor([3.0, 4.0])
a.requires_grad_(True)
b.requires_grad_(True)
fa = auto_grad_in_control_flows(a)
fb = auto_grad_in_control_flows(b)
fa, fb

Example of backward in control flows


(tensor(5., grad_fn=<DotBackward0>),
 tensor([ 9., 16.], grad_fn=<MulBackward0>))
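
The cell above only runs the forward pass. A sketch of the backward step follows: autograd records whichever branch was actually executed, so the gradient can be computed in both cases. The function is repeated so that the block is self-contained; reducing the vector result with `sum()` before calling `backward` is one option chosen here.

```python
import torch

def auto_grad_in_control_flows(a):
    if a[0] > 2:
        return a * a        # vector output
    else:
        return a.dot(a)     # scalar output

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([3.0, 4.0], requires_grad=True)

fa = auto_grad_in_control_flows(a)
fa.backward()               # scalar: backward() can be called directly

fb = auto_grad_in_control_flows(b)
fb.sum().backward()         # vector: reduce to a scalar first

# Both branches have gradient 2 * input.
print(a.grad, b.grad)
```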