In [1]:
import torch

### Some math notation
$x, y \in \mathbb{R}$ means that $x$ and $y$ are (real) scalars.

$x, y \in \{0, 1\}$ means that $x$ and $y$ can only take the values 0 or 1.

Bold $\textbf{x}$ denotes a vector; $x_2$ denotes the second scalar element of that vector.

$\textbf{A}, \textbf{B} \in \mathbb{R}^{m \times n}$ denote two matrices with $m$ rows and $n$ columns.

$\textbf{A} \odot \textbf{B}$ denotes element-wise (Hadamard) multiplication of two matrices of the same shape.
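A minimal sketch of the Hadamard product in PyTorch, contrasted with an ordinary matrix product. The names `P` and `Q` are hypothetical and chosen so the notebook's own `A` and `B` are not overwritten.

```python
import torch

# Two matrices of the same shape 2 x 3 (hypothetical names P, Q)
P = torch.arange(6, dtype=torch.float32).reshape(2, 3)
Q = torch.full((2, 3), 2.0)

# Hadamard (element-wise) product: (P ⊙ Q)_ij = P_ij * Q_ij, same shape as the inputs
hadamard = P * Q  # equivalent to torch.mul(P, Q)

# For contrast, a matrix product needs matching inner dimensions: (2x3) @ (3x2) -> (2x2)
matprod = P @ Q.T

print(hadamard)       # values [[0, 2, 4], [6, 8, 10]]
print(matprod.shape)  # torch.Size([2, 2])
```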

In [None]:
## Some useful helper functions for vectors in PyTorch
x = torch.arange(3)
len(x), x.shape[0], x.shape, x.reshape(1, 3).T, x.reshape(3, 1), x.reshape(3, 1, 1)

(3,
 3,
 torch.Size([3]),
 tensor([[0],
         [1],
         [2]]),
 tensor([[0],
         [1],
         [2]]),
 tensor([[[0]],
 
         [[1]],
 
         [[2]]]))
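A 1-D tensor has no row or column orientation, which is why the cell above reshapes to `(1, 3)` before transposing. A small sketch of equivalent ways to add that second axis explicitly (the name `v` is hypothetical, to avoid overwriting `x`):

```python
import torch

v = torch.arange(3)  # 1-D tensor, shape (3,)

# Three equivalent ways to turn v into a 3 x 1 column vector
col_a = v.reshape(3, 1)
col_b = v.unsqueeze(1)  # insert a new axis at position 1
col_c = v[:, None]      # indexing with None also inserts an axis

# ...and a 1 x 3 row vector
row = v.unsqueeze(0)

print(col_a.shape, row.shape)                                   # torch.Size([3, 1]) torch.Size([1, 3])
print(torch.equal(col_a, col_b) and torch.equal(col_b, col_c))  # True
```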

In [None]:
## Some useful helper functions for matrices in PyTorch (A == A.T checks whether A is symmetric)
A = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A == A.T

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
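`A == A.T` compares element by element; `torch.equal` collapses that into a single boolean. A sketch that also checks a few standard transpose identities (`S` and `R` are hypothetical names; float dtype is used so the matrix product is exact for these small integers):

```python
import torch

S = torch.tensor([[1., 2., 3.], [2., 0., 4.], [3., 4., 5.]])  # symmetric
R = torch.arange(9, dtype=torch.float32).reshape(3, 3)       # not symmetric

# A matrix is symmetric exactly when it equals its transpose
print(torch.equal(S, S.T))  # True
print(torch.equal(R, R.T))  # False

# (R^T)^T = R  and  (S + R)^T = S^T + R^T
print(torch.equal(R.T.T, R))
print(torch.equal((S + R).T, S.T + R.T))

# (S R)^T = R^T S^T
print(torch.allclose((S @ R).T, R.T @ S.T))
```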

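### Some thoughts on tensors

Raw intuition: a tensor is just a contiguous sequence of bytes; stacking them into rows and columns happens to give the shape of a matrix, which makes it convenient to formalize mathematically.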
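The "contiguous bytes" picture can be checked directly: a tensor is a flat storage plus a shape and strides (how many elements to skip per step along each axis). A sketch, with hypothetical names `t`, `m`, `mt`; note that a transpose only swaps strides, so it is no longer laid out row by row until `.contiguous()` copies it.

```python
import torch

t = torch.arange(6, dtype=torch.float32)  # 6 contiguous float32 values
m = t.view(2, 3)                          # same memory, viewed as 2 x 3 (view never copies)

print(m.stride())        # (3, 1): next row = skip 3 elements, next column = skip 1
print(m.element_size())  # 4 bytes per float32 element

# Both tensors point at the same memory
print(m.data_ptr() == t.data_ptr())  # True

# Transposing swaps the strides instead of moving data
mt = m.T
print(mt.stride(), mt.is_contiguous())  # (1, 3) False
print(mt.contiguous().stride())         # (2, 1) after copying into a fresh row-major buffer
```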

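An example:
> Tensors become more important once we start working with images. Each image arrives as a 3rd-order tensor whose axes correspond to height, width, and channel. At each spatial location, the intensities of each color (red, green, and blue) are stacked along the channel axis. Furthermore, a collection of images is represented in code as a 4th-order tensor, where distinct images are indexed along the first axis. Higher-order tensors are constructed by increasing the number of shape components, just as with vectors and matrices.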
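A sketch of the quoted layout, assuming hypothetical 32 × 32 RGB images with axes ordered (height, width, channel) as in the quote. PyTorch convolution layers such as `torch.nn.Conv2d` expect (batch, channel, height, width) instead, hence the final `permute`.

```python
import torch

# One image: 3rd-order tensor (height, width, channel)
img = torch.rand(32, 32, 3)
print(img.ndim, img.shape)  # 3 torch.Size([32, 32, 3])

# A batch of 8 images: stacking adds a new first axis that indexes the images
batch = torch.stack([torch.rand(32, 32, 3) for _ in range(8)])
print(batch.ndim, batch.shape)  # 4 torch.Size([8, 32, 32, 3])

# Pixel (row 0, column 0) of image 5: its red, green and blue intensities
print(batch[5, 0, 0].shape)  # torch.Size([3])

# Reorder axes to the (batch, channel, height, width) layout used by Conv2d
nchw = batch.permute(0, 3, 1, 2)
print(nchw.shape)  # torch.Size([8, 3, 32, 32])
```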

In [15]:
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = A.clone()  # Assign a copy of A to B by allocating new memory
A, A + B

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([[ 0.,  2.,  4.],
         [ 6.,  8., 10.]]))
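Why `clone()` matters: plain assignment only creates a second name for the same tensor, so in-place changes show up through both names. A sketch with hypothetical names (`orig`, `alias`, `cloned`) so the notebook's `A` and `B` stay untouched:

```python
import torch

orig = torch.arange(6, dtype=torch.float32).reshape(2, 3)
alias = orig          # plain assignment: same tensor, same memory
cloned = orig.clone() # new memory holding the same values

orig[0, 0] = 100.0    # modify in place

print(alias[0, 0])   # tensor(100.): the alias sees the change
print(cloned[0, 0])  # tensor(0.): the clone does not
print(alias.data_ptr() == orig.data_ptr(), cloned.data_ptr() == orig.data_ptr())  # True False
```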

In [None]:
# matrix
A.shape, A.sum(axis=0).shape, A.sum(axis=1).shape

(torch.Size([2, 3]), torch.Size([3]), torch.Size([2]))
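What "sum along an axis" means element by element: the reduced axis is the one that disappears from the shape. A sketch on a hypothetical `X` with the same values as `A`:

```python
import torch

X = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Reducing along axis 0 collapses the rows: add row 0 and row 1 element by element
print(torch.equal(X.sum(dim=0), X[0] + X[1]))                  # True, values [3, 5, 7]

# Reducing along axis 1 collapses the columns: add the three columns element by element
print(torch.equal(X.sum(dim=1), X[:, 0] + X[:, 1] + X[:, 2]))  # True, values [3, 12]

# The reduced axis is removed from the shape unless keepdim=True
print(X.sum(dim=1).shape, X.sum(dim=1, keepdim=True).shape)     # torch.Size([2]) torch.Size([2, 1])
```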

In [19]:
# img 
B = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
B.shape, B.sum(axis=0).shape, B.sum(axis=1).shape, B.sum(axis=2).shape 

(torch.Size([2, 3, 4]),
 torch.Size([3, 4]),
 torch.Size([2, 4]),
 torch.Size([2, 3]))

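## Understanding axes through an example
View $\textbf{B} \in \mathbb{R}^{2 \times 3 \times 4 \times 5}$ as a video with 2 frames, each frame having 3 channels of height 4 and width 5. Then, with `operator` standing for any reduction such as sum or mean:

```np.operator(axis=0)``` -> reduces across the video frames (element by element), leaving a single frame.

```np.operator(axis=1)``` -> reduces across the channels within each frame, leaving 2 frames but losing the channel axis.

```np.operator(axis=2)``` -> reduces along the height of each channel, leaving 2 frames × 3 channels, each a row of length 5 (shape 2 × 3 × 5).

```np.operator(axis=3)``` -> reduces along the width of each channel, leaving 2 frames × 3 channels, each a column of length 4 (shape 2 × 3 × 4).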

In [21]:
# video 
B = torch.arange(2 * 3 * 4 * 5, dtype=torch.float32).reshape(2, 3, 4, 5)
B.shape, B.sum(axis=0).shape, B.sum(axis=1).shape, B.sum(axis=2).shape, B.sum(axis=3).shape 

(torch.Size([2, 3, 4, 5]),
 torch.Size([3, 4, 5]),
 torch.Size([2, 4, 5]),
 torch.Size([2, 3, 5]),
 torch.Size([2, 3, 4]))

In [22]:
A.sum(axis=[0, 1]) == A.sum()  # Same as A.sum()

tensor(True)

In [None]:
# Some reduction operators: the mean equals the sum divided by the number of elements
A.mean(), A.sum() / A.numel()

(tensor(2.5000), tensor(2.5000))
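The same relation holds along a single axis: the mean along an axis is the sum along that axis divided by that axis's length. `mean` needs a floating-point input, which is one reason these cells create tensors with `dtype=torch.float32`. A sketch on a hypothetical `X`:

```python
import torch

X = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Mean over all elements = sum / number of elements
print(X.mean(), X.sum() / X.numel())  # tensor(2.5000) tensor(2.5000)

# Mean along axis 0 = sum along axis 0 / length of axis 0
col_means = X.mean(dim=0)  # shape (3,)
print(torch.allclose(col_means, X.sum(dim=0) / X.shape[0]))  # True
print(col_means)  # tensor([1.5000, 2.5000, 3.5000])
```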

In [25]:
# Exercise: operations on a video
B = torch.arange(2 * 3 * 4 * 5, dtype=torch.float32).reshape(2, 3, 4, 5)
# Average over the channel axis (axis 1)
B, B.mean(axis=1)

(tensor([[[[  0.,   1.,   2.,   3.,   4.],
           [  5.,   6.,   7.,   8.,   9.],
           [ 10.,  11.,  12.,  13.,  14.],
           [ 15.,  16.,  17.,  18.,  19.]],
 
          [[ 20.,  21.,  22.,  23.,  24.],
           [ 25.,  26.,  27.,  28.,  29.],
           [ 30.,  31.,  32.,  33.,  34.],
           [ 35.,  36.,  37.,  38.,  39.]],
 
          [[ 40.,  41.,  42.,  43.,  44.],
           [ 45.,  46.,  47.,  48.,  49.],
           [ 50.,  51.,  52.,  53.,  54.],
           [ 55.,  56.,  57.,  58.,  59.]]],
 
 
         [[[ 60.,  61.,  62.,  63.,  64.],
           [ 65.,  66.,  67.,  68.,  69.],
           [ 70.,  71.,  72.,  73.,  74.],
           [ 75.,  76.,  77.,  78.,  79.]],
 
          [[ 80.,  81.,  82.,  83.,  84.],
           [ 85.,  86.,  87.,  88.,  89.],
           [ 90.,  91.,  92.,  93.,  94.],
           [ 95.,  96.,  97.,  98.,  99.]],
 
          [[100., 101., 102., 103., 104.],
           [105., 106., 107., 108., 109.],
           [110., 111., 112., 113., 

In [26]:
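## Some generally useful reduction operations

# 1. Normalize so that all entries of A sum to 1
A /= A.sum()

# 2. Normalize along a given axis: A.sum(dim=0) has shape (3,) and is broadcast
#    over the rows, so each column of A ends up summing to 1
A /= A.sum(dim=0)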
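A sketch that verifies the column normalization and shows the row-wise counterpart. Rows need `keepdim=True`: dividing a `(2, 3)` tensor by a `(2,)` tensor fails, because broadcasting aligns trailing axes (3 vs 2). `W` is a hypothetical matrix with no zero sums.

```python
import torch

W = torch.arange(6, dtype=torch.float32).reshape(2, 3) + 1  # [[1, 2, 3], [4, 5, 6]]

# Column normalization: W.sum(dim=0) has shape (3,), broadcast over the 2 rows
col_norm = W / W.sum(dim=0)
print(col_norm.sum(dim=0))  # each column sums to 1

# Row normalization: keep the reduced axis so the (2, 1) sums broadcast over the 3 columns
row_norm = W / W.sum(dim=1, keepdim=True)
print(row_norm.sum(dim=1))  # each row sums to 1
```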

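### Dot product and matrix product
The dot product (also called the inner product) of $\textbf{x}, \textbf{y} \in \mathbb{R}^d$ is $\langle \textbf{x}, \textbf{y} \rangle = \textbf{x}^T \textbf{y} = \sum_{i=1}^{d} x_i y_i$.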

In [36]:
## About dot 
x = torch.arange(3, dtype=torch.float32)
y = torch.ones(3, dtype=torch.float32) * 2
x, y, torch.dot(x, y)

(tensor([0., 1., 2.]), tensor([2., 2., 2.]), tensor(6.))
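The definition $\sum_i x_i y_i$ is just an element-wise product followed by a sum, and $\textbf{x}^T \textbf{y}$ is a $(1 \times d)(d \times 1)$ matrix product. A sketch checking both against `torch.dot` (hypothetical names `a`, `b` with the same values as `x`, `y`):

```python
import torch

a = torch.arange(3, dtype=torch.float32)  # [0, 1, 2]
b = torch.full((3,), 2.0)                 # [2, 2, 2]

# <a, b> = sum_i a_i * b_i
manual = (a * b).sum()
print(manual, torch.dot(a, b))  # tensor(6.) tensor(6.)

# The same value as a^T b, written as a (1 x 3) @ (3 x 1) matrix product
print((a.reshape(1, 3) @ b.reshape(3, 1)).item())  # 6.0
```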

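Some notation: for $\textbf{A} \in \mathbb{R}^{M \times N}$ and $\textbf{x} \in \mathbb{R}^{N}$, the product $\textbf{A}\textbf{x} \in \mathbb{R}^{M}$, so $\textbf{A}$ acts as a transformation from dimension $N$ to dimension $M$.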

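Personal notes: in most cases a matrix should be thought of as a collection of vectors, e.g., feature vectors, a batch of images, or channels. Understanding *why* a matrix is multiplied the way it is matters a lot: is it to clean the data? To normalize? To add some non-linearity?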

In [38]:
## matrix transformation 
A = torch.arange(12, dtype=torch.float32).reshape(3, 4)
x = torch.arange(4, dtype=torch.float32)
A.shape, x.shape, torch.mv(A, x), A@x

(torch.Size([3, 4]),
 torch.Size([4]),
 tensor([14., 38., 62.]),
 tensor([14., 38., 62.]))
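Reading the matrix as a collection of row vectors: each entry of $\textbf{A}\textbf{x}$ is the dot product of one row with $\textbf{x}$, and a matrix-matrix product applies $\textbf{A}$ to each column of the right-hand matrix. A sketch with hypothetical names `M`, `v`, `N`:

```python
import torch

M = torch.arange(12, dtype=torch.float32).reshape(3, 4)  # 3 rows, each a vector in R^4
v = torch.arange(4, dtype=torch.float32)

# M v: the i-th output is the dot product of row i with v (iterating a 2-D tensor yields rows)
rowwise = torch.stack([torch.dot(row, v) for row in M])
print(torch.allclose(rowwise, M @ v))  # True, values [14, 38, 62]

# M N: column j of the result is M applied to column j of N
N = torch.ones(4, 2)
colwise = torch.stack([torch.mv(M, N[:, j]) for j in range(N.shape[1])], dim=1)
print(torch.allclose(colwise, torch.mm(M, N)))  # True, shape (3, 2)
```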

### norm
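A norm is a function $\|\cdot\|$ that maps a vector to a scalar and satisfies the following three properties:

1. Homogeneity: $\|\alpha \textbf{x}\| = |\alpha| \, \|\textbf{x}\|$ for any scalar $\alpha$.
2. Triangle inequality: $\|\textbf{x} + \textbf{y}\| \le \|\textbf{x}\| + \|\textbf{y}\|$.
3. Non-negativity: $\|\textbf{x}\| \ge 0$, with equality only when $\textbf{x} = \textbf{0}$.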
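A numerical sanity check of the three properties for the L2 norm, using `torch.linalg.vector_norm` (L2 by default). It tests one random sample, so it illustrates the properties rather than proving them; `p`, `q` and `alpha` are hypothetical.

```python
import torch

torch.manual_seed(0)
p = torch.randn(5)
q = torch.randn(5)
alpha = -3.0

l2 = torch.linalg.vector_norm  # ord=2 by default

# 1. Homogeneity: ||alpha * p|| = |alpha| * ||p||
print(torch.isclose(l2(alpha * p), abs(alpha) * l2(p)))
# 2. Triangle inequality: ||p + q|| <= ||p|| + ||q||
print(l2(p + q) <= l2(p) + l2(q))
# 3. Non-negativity: positive for a nonzero vector, zero for the zero vector
print(l2(p) > 0, l2(torch.zeros(5)) == 0)
```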

In [None]:
u = torch.tensor([3.0, -4.0])
# L2 norm and L1 norm
torch.norm(u), torch.abs(u).sum()

(tensor(5.), tensor(7.))
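Both norms above are special cases of the $L_p$ norm, which `torch.linalg.vector_norm` exposes through its `ord` parameter. A sketch on the same `u`:

```python
import torch

u = torch.tensor([3.0, -4.0])

l1 = torch.linalg.vector_norm(u, ord=1)  # |3| + |-4| = 7
l2 = torch.linalg.vector_norm(u, ord=2)  # sqrt(9 + 16) = 5
print(l1, l2)  # tensor(7.) tensor(5.)

# Agrees with the two expressions in the cell above
print(torch.isclose(l2, torch.norm(u)), torch.isclose(l1, torch.abs(u).sum()))
```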