# A Summary of Matrix Multiplication in PyTorch

<br>
<br>
<br>


<div>
<img src='mul-known.png' width='800' height='800'/>
</div>

---

<br>
<br>
<br>

## 1. Broadcasting in PyTorch


---

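Take the addition of two arrays A and B as an example.

**Core idea: if the two operands have different shapes, broadcasting is triggered:
1) the operands are (virtually) expanded so that A.shape == B.shape;
2) the addition is then applied position by position, and the shape of the result is the per-position maximum of A.shape and B.shape once the two shapes are right-aligned.**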


---

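Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The two typical cases (which can also be combined) are:

1. A.ndim > B.ndim, and the trailing dimensions of A.shape match B.shape. For example:    
    A.shape=(2,3,4,5), B.shape=(3,4,5)  
    A.shape=(2,3,4,5), B.shape=(4,5)  
    A.shape=(2,3,4,5), B.shape=(5,)  
2. A.ndim == B.ndim, and at every position the sizes in A.shape and B.shape are either equal or one of them is 1. For example:  
    A.shape=(1,9,4), B.shape=(15,1,4)  
    A.shape=(1,9,4), B.shape=(15,1,1)  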
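
The compatibility rule can be written down in a few lines of plain Python. The sketch below is only an illustration: `broadcast_shape` is a hypothetical helper name, not a NumPy or PyTorch function, and it is checked against the shape that NumPy itself reports for the example shapes listed above.

```python
from itertools import zip_longest

import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape or raise ValueError."""
    result = []
    # Walk both shapes from the last dimension backwards;
    # a missing leading dimension is treated as size 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))


examples = [
    ((2, 3, 4, 5), (3, 4, 5)),
    ((2, 3, 4, 5), (5,)),
    ((1, 9, 4), (15, 1, 4)),
    ((1, 9, 4), (15, 1, 1)),
]
for shape_a, shape_b in examples:
    # np.broadcast reports the shape NumPy would actually produce
    numpy_shape = np.broadcast(np.empty(shape_a), np.empty(shape_b)).shape
    print(shape_a, shape_b, "->", broadcast_shape(shape_a, shape_b), numpy_shape)
```

Shapes such as `(2, 3)` and `(4,)` make the helper raise `ValueError`, because the trailing sizes 3 and 4 differ and neither is 1.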


### 1.1 A.ndim greater than B.ndim

In [24]:

import numpy as np

# a.shape=(2,2,3,4)
a = np.arange(1,49).reshape((2,2,3,4))
# b.shape=(3,4)
b = np.arange(1,13).reshape((3,4))
# NumPy broadcasts b to shape (2,2,3,4); the result is the same as with np.tile(b,[2,2,1,1]), but no copy of b is made
res = a + b
print(a.shape)
print(b.shape)
print(res.shape)
print(a+b == a + np.tile(b,[2,2,1,1]) )


(2, 2, 3, 4)
(3, 4)
(2, 2, 3, 4)
[[[[ True  True  True  True]
   [ True  True  True  True]
   [ True  True  True  True]]

  [[ True  True  True  True]
   [ True  True  True  True]
   [ True  True  True  True]]]


 [[[ True  True  True  True]
   [ True  True  True  True]
   [ True  True  True  True]]

  [[ True  True  True  True]
   [ True  True  True  True]
   [ True  True  True  True]]]]


### 1.2 A.ndim equal to B.ndim

In [25]:
# Example 1
# a.shape=(4,3)
a = np.arange(12).reshape(4,3)
# b.shape=(4,1)
b = np.arange(4).reshape(4,1)
# NumPy broadcasts b to shape (4,3); the result is the same as with np.tile(b,[1,3])
res = a + b
print(a.shape)
print(b.shape)
print(res.shape)
print((a+b == a + np.tile(b,[1,3])) )  # every entry is True

(4, 3)
(4, 1)
(4, 3)
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]]
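
The cells above use NumPy, and PyTorch tensors follow the same broadcasting rule. As a small sketch under that assumption, the snippet below repeats the idea with `torch`, using the shapes `(1,9,4)` and `(15,1,4)` from the list of cases, and uses `expand()` to make the implicit expansion explicit.

```python
import torch

a = torch.arange(1 * 9 * 4).reshape(1, 9, 4)
b = torch.arange(15 * 1 * 4).reshape(15, 1, 4)

res = a + b
print(res.shape)  # torch.Size([15, 9, 4])

# expand() spells out the broadcast explicitly (it creates views, not copies)
explicit = a.expand(15, 9, 4) + b.expand(15, 9, 4)
print(torch.equal(res, explicit))  # True

# Trailing sizes 4 and 3 differ and neither is 1, so this cannot broadcast
try:
    torch.ones(2, 4) + torch.ones(2, 3)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```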


## 2. Element-wise Multiplication

<br>
<br>
<br>


<div>
<img src='mul.png' width='500' height='500'/>
</div>

---

Works on matrices, vectors, and scalars.

1. a*b, supports broadcasting
2. torch.mul(), supports broadcasting

In [26]:
# matrix * matrix
# Multiplies the corresponding elements of a and b. The shapes do not have to be identical; they only need to be broadcastable.
import torch
mat1 = torch.randn(2, 3)
mat2 = torch.randn(2, 3)
print(mat1)
print(mat2)
print('------')
print('mat1[0][0] * mat2[0][0] :', mat1[0][0] * mat2[0][0])
print('mat1[0][1] * mat2[0][1] :', mat1[0][1] * mat2[0][1])
print('------')
print('------')
# mul(a, b)
mat_a = torch.mul(mat1, mat2)
print('matrix-matrix:')
print('mul(a, b):')
print(mat_a)
print(mat_a.shape)
print('------')
# a*b
mat_b = mat1 * mat2
print('a * b:')
print(mat_b)
print(mat_b.shape)
print('------')
print('------')

# matrix-scalar
mat_c = torch.mul(mat1, 100)
print('matrix-scalar:')
print(mat_c)
print(mat_c.shape)

tensor([[-0.4628, -0.2711,  1.5739],
        [ 0.1660,  0.0560, -0.1976]])
tensor([[-0.9069, -0.5303,  0.3433],
        [-0.9565, -0.8471,  0.8448]])
------
mat1[0][0] * mat2[0][0] : tensor(0.4197)
mat1[0][1] * mat2[0][1] : tensor(0.1438)
------
------
matrix-matrix:
mul(a, b):
tensor([[ 0.4197,  0.1438,  0.5403],
        [-0.1588, -0.0474, -0.1669]])
torch.Size([2, 3])
------
a * b:
tensor([[ 0.4197,  0.1438,  0.5403],
        [-0.1588, -0.0474, -0.1669]])
torch.Size([2, 3])
------
------
matrix-scalar:
tensor([[-46.2758, -27.1072, 157.3922],
        [ 16.6020,   5.5984, -19.7567]])
torch.Size([2, 3])
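
The cell above only multiplies operands of identical shape or a scalar. To illustrate the broadcasting case as well, here is a minimal sketch with fixed values (chosen only for readability) in which a `(3,)` row and a `(2,1)` column are broadcast against a `(2,3)` matrix.

```python
import torch

mat = torch.tensor([[1., 2., 3.],
                    [4., 5., 6.]])       # shape (2, 3)
row = torch.tensor([10., 100., 1000.])   # shape (3,)
col = torch.tensor([[1.], [-1.]])        # shape (2, 1)

scaled_cols = torch.mul(mat, row)  # row is reused for every row of mat
scaled_rows = mat * col            # col is reused for every column of mat
outer = col * row                  # (2, 1) * (3,) broadcasts to (2, 3)

print(scaled_cols)
print(scaled_rows)
print(outer.shape)  # torch.Size([2, 3])
```

Multiplying a `(2,1)` column by a `(3,)` row in this way produces the outer product of the two vectors.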


## 3. Vector Dot Product

<br>
<br>
<br>

<div>
<img src='dot.jpg' width='500' height='500'/>
</div>

---

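The dot product multiplies two vectors element by element and then sums the products, so both vectors must have the same length.

1. torch.dot() does not support broadcasting; both inputs must be 1-D tensors.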

In [27]:
vec1 = torch.tensor([2, 3])
vec2 = torch.tensor([2, 1])
vec = torch.dot(vec1, vec2)
vec

tensor(7)

In [28]:
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)
mat = torch.dot(mat1, mat2)
mat

# Raises an error: only 1-D tensors are allowed

RuntimeError: 1D tensors expected, but got 2D and 2D tensors
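
The description "multiply element-wise, then sum" can be checked directly. The sketch below reuses the vectors from the first cell of this section and also shows that, because `torch.dot` does not broadcast, vectors of different lengths are rejected.

```python
import torch

vec1 = torch.tensor([2, 3])
vec2 = torch.tensor([2, 1])

by_dot = torch.dot(vec1, vec2)
by_hand = (vec1 * vec2).sum()  # element-wise product, then sum
print(by_dot, by_hand)         # tensor(7) tensor(7)

# Lengths 3 and 1 would be broadcastable for `*`, but torch.dot refuses them
try:
    torch.dot(torch.tensor([1, 2, 3]), torch.tensor([1]))
    dot_failed = False
except RuntimeError as err:
    dot_failed = True
    print("RuntimeError:", err)
```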

## 4. Matrix Multiplication

<br>
<br>
<br>

<div>
<img src='m-mul.png' width='500' height='500'/>
</div>

---


1. torch.mm() does not support broadcasting
2. torch.matmul() supports broadcasting
3. The @ operator, which is equivalent to torch.matmul()

In [29]:
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 4)
mat_a = torch.mm(mat1, mat2)
print('mm(a, b):')
print(mat_a)
print(mat_a.shape)
# torch.mm only computes the matrix product of two 2-D matrices.
print('------')


mat_b = torch.matmul(mat1, mat2)
print('matmul(a, b):')
print(mat_b)
print(mat_b.shape)
print('------')
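# torch.matmul has more general uses:
# it is a matrix product of two tensors whose exact behavior depends on the dimensionality of the inputs (see section 4.3).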

mat_c = mat1@ mat2
print('mat1@mat2:')
print(mat_c)
print(mat_c.shape)
# The @ operator behaves like torch.matmul, so it is not limited to 2-D matrices.
print('------')

mm(a, b):
tensor([[-0.2466,  0.7865,  0.0741, -0.7053],
        [-0.1164, -0.9973,  2.6223, -1.5018]])
torch.Size([2, 4])
------
matmul(a, b):
tensor([[-0.2466,  0.7865,  0.0741, -0.7053],
        [-0.1164, -0.9973,  2.6223, -1.5018]])
torch.Size([2, 4])
------
mat1@mat2:
tensor([[-0.2466,  0.7865,  0.0741, -0.7053],
        [-0.1164, -0.9973,  2.6223, -1.5018]])
torch.Size([2, 4])
------
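
For 2-D inputs the three variants give the same result, so the difference only appears with higher-dimensional inputs. The following sketch (the names `batch` and `weight` are illustrative only) feeds a 3-D tensor and a 2-D matrix to each of them.

```python
import torch

torch.manual_seed(0)
batch = torch.randn(10, 2, 3)  # ten 2x3 matrices
weight = torch.randn(3, 4)     # a single 3x4 matrix

# matmul and @ broadcast `weight` over the leading batch dimension
out = torch.matmul(batch, weight)
print(out.shape)                # torch.Size([10, 2, 4])
same = torch.equal(out, batch @ weight)
print(same)                     # True

# mm accepts only 2-D inputs, so the 3-D tensor is rejected
try:
    torch.mm(batch, weight)
    mm_failed = False
except RuntimeError as err:
    mm_failed = True
    print("RuntimeError:", err)
```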


### 4.1 Batched 3-D Matrix Multiplication

1. torch.bmm() does not support broadcasting

In [30]:
mat1 = torch.randn(5,2, 3)
mat2 = torch.randn(5, 3, 4)
mat_a = torch.bmm(mat1, mat2)
print('bmm(a, b):')
print(mat_a)
print(mat_a.shape)
print('------')
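# Neural networks are usually trained on mini-batches, so the inputs are often 3-D tensors with a batch dimension.
# Both inputs of torch.bmm must be 3-D tensors with the same first (batch) dimension; broadcasting is not supported.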

bmm(a, b):
tensor([[[ 2.8325, -2.5959, -0.4154,  1.5308],
         [ 3.5484, -1.4033, -0.2714,  0.7386]],

        [[ 0.4989, -2.6536,  1.0311,  2.5753],
         [ 1.7673, -2.2543, -2.8876,  0.4661]],

        [[-0.6346, -1.5044,  1.5649,  1.4992],
         [-0.9662,  1.0395,  1.5298, -0.7813]],

        [[ 2.2554,  0.2765, -2.6558, -1.9617],
         [ 2.4201,  2.1655,  1.0596,  0.4645]],

        [[ 0.1871,  0.3360, -1.3293,  1.7182],
         [ 0.2896,  0.7587, -1.6035,  0.0417]]])
torch.Size([5, 2, 4])
------
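
Conceptually, `torch.bmm` applies one 2-D matrix product per batch entry. The sketch below checks this against an explicit loop over `torch.mm`, and then shows what happens when the batch sizes differ: `bmm` raises an error, while `torch.matmul` broadcasts the size-1 batch dimension. The tensor name `shared` is only illustrative.

```python
import torch

torch.manual_seed(0)
mat1 = torch.randn(5, 2, 3)
mat2 = torch.randn(5, 3, 4)

batched = torch.bmm(mat1, mat2)
# One ordinary 2-D product per batch entry, stacked back together
looped = torch.stack([torch.mm(m1, m2) for m1, m2 in zip(mat1, mat2)])
print(torch.allclose(batched, looped, atol=1e-6))

# A batch of size 1 is NOT broadcast by bmm ...
shared = torch.randn(1, 3, 4)
try:
    torch.bmm(mat1, shared)
    bmm_failed = False
except RuntimeError as err:
    bmm_failed = True
    print("RuntimeError:", err)

# ... but matmul broadcasts it across the 5 batch entries
print(torch.matmul(mat1, shared).shape)  # torch.Size([5, 2, 4])
```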


### 4.2 Matrix-Vector Multiplication

1. torch.mv()

In [31]:
mat = torch.randn(2, 3)
vec = torch.randn(3)
res = torch.mv(mat, vec)
res

tensor([ 0.5008, -1.5133])
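
As a cross-check, the same matrix-vector product can be obtained in other ways. This sketch compares `torch.mv` with `torch.matmul` and with `torch.mm` applied to the vector reshaped into a `(3,1)` column.

```python
import torch

torch.manual_seed(0)
mat = torch.randn(2, 3)
vec = torch.randn(3)

by_mv = torch.mv(mat, vec)
by_matmul = torch.matmul(mat, vec)
# mm needs 2-D inputs: turn vec into a (3, 1) column, then drop the extra axis
by_mm = torch.mm(mat, vec.unsqueeze(1)).squeeze(1)

print(by_mv.shape)  # torch.Size([2])
print(torch.allclose(by_mv, by_matmul, atol=1e-6))
print(torch.allclose(by_mv, by_mm, atol=1e-6))
```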

### 4.3 matmul Usage

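1. If both tensors are 1-D, the dot product (a scalar) is returned.
2. If both arguments are 2-D, the matrix product is returned.
3. If the first argument is 1-D and the second is 2-D, a 1 is prepended to the shape of the first argument for the purpose of the matrix multiplication; after the multiplication, the prepended dimension is removed.
4. If the first argument is 2-D and the second is 1-D, the matrix-vector product is returned.
5. If at least one argument has more than two dimensions, a batched matrix multiplication is returned: the last two dimensions are treated as matrix dimensions, and the remaining (batch) dimensions are broadcast.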

In [32]:
# 1
vec1 = torch.tensor([2, 3])
vec2 = torch.tensor([2, 1])
vec = torch.matmul(vec1, vec2)
vec

tensor(7)

In [33]:
# 2. 
# Same as the torch.matmul example in section 4

In [34]:
# 3. 
vec1 = torch.randn(3)
vec2 = vec1.reshape(1, -1)
mat1 = torch.randn(3, 4)
print(vec1)
print(vec2)
print(mat1)
mat = torch.matmul(vec1, mat1)
mat2 = torch.matmul(vec2, mat1)
print(mat)
print(mat2)

tensor([-0.4932,  0.9520, -1.5402])
tensor([[-0.4932,  0.9520, -1.5402]])
tensor([[ 1.5088,  0.3568, -1.1389, -1.1866],
        [ 0.9095, -0.9637,  0.3556, -0.8225],
        [ 0.4509, -0.0729,  0.6477, -0.0493]])
tensor([-0.5728, -0.9811, -0.0973, -0.1218])
tensor([[-0.5728, -0.9811, -0.0973, -0.1218]])


In [35]:
# 4. 
mat1 = torch.randn(3, 4)
vec1 = torch.randn(4)
print(mat1)
print(vec1)
mat = torch.matmul(mat1, vec1)
mat

tensor([[-1.7485, -1.3445,  0.4366, -1.4827],
        [ 0.8077, -0.6766,  2.0069, -1.6549],
        [ 0.6546, -0.1501,  0.9207,  0.4119]])
tensor([-0.0413,  0.9223,  0.3687,  0.9711])


tensor([-2.4467, -1.5246,  0.5740])

In [36]:
# 5.1 batched matrix x broadcasted vector
a = torch.randn(10, 3, 4)
b = torch.randn(4)
matmul_a_b = torch.matmul(a, b)
matmul_a_b.shape

torch.Size([10, 3])

In [37]:
# 5.2 batched matrix x batched matrix
c = torch.randn(10, 3, 4)
d = torch.randn(10, 4, 5)
matmul_c_d = torch.matmul(c, d)
matmul_c_d.shape

torch.Size([10, 3, 5])

In [38]:
# 5.3  batched matrix x broadcasted matrix
m = torch.randn(10, 3, 4)
n = torch.randn(4, 5)
matmul_m_n = torch.matmul(m, n)
matmul_m_n.shape

torch.Size([10, 3, 5])

In [39]:
# 5.4 the batch dimensions (all except the last two) are broadcast; the last two are the matrix dimensions
tensor1 = torch.randn(10, 1, 3, 4)
tensor2 = torch.randn(2, 4, 5)
matmul_k_p = torch.matmul(tensor1, tensor2)
matmul_k_p.shape

torch.Size([10, 2, 3, 5])
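
To see where the `[10, 2, 3, 5]` shape comes from, the batch dimensions and the matrix dimensions can be handled separately. The sketch below, using the same shapes as case 5.4, broadcasts the batch shapes `(10, 1)` and `(2,)` with `torch.broadcast_shapes` and rebuilds the result with an explicit loop over the broadcast batch indices.

```python
import torch

torch.manual_seed(0)
tensor1 = torch.randn(10, 1, 3, 4)
tensor2 = torch.randn(2, 4, 5)

out = torch.matmul(tensor1, tensor2)

# Only the leading (batch) dimensions take part in broadcasting
batch_shape = torch.broadcast_shapes(tensor1.shape[:-2], tensor2.shape[:-2])
print(batch_shape)  # torch.Size([10, 2])

# Rebuild the result one (3x4) @ (4x5) product at a time
expected = torch.empty(10, 2, 3, 5)
for i in range(10):
    for j in range(2):
        # tensor1 has size 1 on its second batch axis, so index 0 is reused
        expected[i, j] = tensor1[i, 0] @ tensor2[j]

print(torch.allclose(out, expected, atol=1e-5))
```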