<a href="https://colab.research.google.com/github/Voyageran/StartNN/blob/main/Convolution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import sys
from google.colab import drive
drive.mount('/content/gdrive')
sys.path.insert(0,"/content/content/notebooks/colabInstallPackage")

!cp -av '/content/gdrive/MyDrive/Colab Notebooks/d2l' '/content/'

Mounted at /content/gdrive
'/content/gdrive/MyDrive/Colab Notebooks/d2l' -> '/content/d2l'
'/content/gdrive/MyDrive/Colab Notebooks/d2l/paddle.py' -> '/content/d2l/paddle.py'
'/content/gdrive/MyDrive/Colab Notebooks/d2l/mxnet.py' -> '/content/d2l/mxnet.py'
'/content/gdrive/MyDrive/Colab Notebooks/d2l/tensorflow.py' -> '/content/d2l/tensorflow.py'
'/content/gdrive/MyDrive/Colab Notebooks/d2l/torch.py' -> '/content/d2l/torch.py'
'/content/gdrive/MyDrive/Colab Notebooks/d2l/__init__.py' -> '/content/d2l/__init__.py'
'/content/gdrive/MyDrive/Colab Notebooks/d2l/__pycache__' -> '/content/d2l/__pycache__'
'/content/gdrive/MyDrive/Colab Notebooks/d2l/__pycache__/mxnet.cpython-310.pyc' -> '/content/d2l/__pycache__/mxnet.cpython-310.pyc'
'/content/gdrive/MyDrive/Colab Notebooks/d2l/__pycache__/__init__.cpython-310.pyc' -> '/content/d2l/__pycache__/__init__.cpython-310.pyc'
'/content/gdrive/MyDrive/Colab Notebooks/d2l/__pycache__/torch.cpython-310.pyc' -> '/content/d2l/__pycache__/torch.cpython-

# **Convolution (Operator)**

## **Why Convolution?**

**Classification for cat and dog images**

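Why not use a single-hidden-layer MLP?

Consider a hypothetical case. Suppose each collected image has 12M pixels with three RGB channels, giving 36M input values. A single-hidden-layer MLP with 100 neurons then has 3.6B parameters. That is more parameters than there are cats and dogs in the world combined, so we might as well label every one of them by hand.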
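The arithmetic behind this estimate can be checked in a few lines. This is a minimal sketch: the variable names are illustrative, and bias terms and the output layer are ignored.

```python
# Weight count of the first layer of a single-hidden-layer MLP
# applied to a flattened 12-megapixel RGB image.
pixels = 12_000_000      # 12M pixels per image
channels = 3             # RGB
hidden_units = 100       # neurons in the hidden layer

input_features = pixels * channels          # 36M input values
weights = input_features * hidden_units     # one weight per (input, neuron) pair

print(f"{input_features:,} inputs -> {weights:,} weights")
```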

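**Guiding principles:**

For example, when searching an image for a person with a detector:
1.   Translation invariance: the detector should respond the same way wherever the person appears in the image.
2.   Locality: the detector only needs to look at a local region, not the whole image.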



## **Convolution Is a Special Fully Connected Layer**

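An ordinary fully connected layer flattens each image into a vector, so the layer needs a two-dimensional weight matrix.

Now suppose we do not flatten the image (neither the input nor the output) and keep its spatial structure, i.e., its width and height. The weights then form a four-dimensional tensor, where $(i,j)$ index the output and $(k,l)$ index the input:
$$
h_{i,j} = \sum_{k,l}w_{i,j,k,l}x_{k,l} = \sum_{a,b}v_{i,j,a,b} x_{i+a,j+b}
$$
where $k = i+a$ and $l = j+b$. $V$ is simply a re-indexing of $W$: $v_{i,j,a,b} = w_{i,j,i+a,j+b}$.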
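The re-indexing can be checked numerically on a small example. The sketch below is only an illustration: the tensor sizes, the random values, and the names `w`, `h_w`, and `h_v` are assumptions chosen for the demonstration.

```python
import torch

torch.manual_seed(0)
H, W = 3, 4                       # small illustrative image size
x = torch.randn(H, W)             # input image
w = torch.randn(H, W, H, W)       # w[i, j, k, l]: output (i, j), input (k, l)

# Left-hand side: h[i, j] = sum_{k, l} w[i, j, k, l] * x[k, l]
h_w = torch.einsum('ijkl,kl->ij', w, x)

# Right-hand side: the same sum written with offsets a = k - i, b = l - j
h_v = torch.zeros(H, W)
for i in range(H):
    for j in range(W):
        for a in range(-i, H - i):          # so that 0 <= i + a < H
            for b in range(-j, W - j):      # so that 0 <= j + b < W
                v_ijab = w[i, j, i + a, j + b]   # v is w re-indexed
                h_v[i, j] += v_ijab * x[i + a, j + b]

print(torch.allclose(h_w, h_v, atol=1e-5))
```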

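**Principle 1: Translation invariance**

In $h_{i,j} = \sum_{a,b}v_{i,j,a,b}x_{i+a,j+b}$, the weights $v_{i,j,a,b}$ depend on the position $(i,j)$, so the same pattern would be treated differently at different positions.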

***Solution***:

Let $v_{i,j,a,b} = v_{a,b}$, then
$$
h_{i,j} = \sum_{a,b}v_{a,b} x_{i+a,j+b}
$$
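This is 2D cross-correlation. Deep learning calls it convolution, which is not strictly correct: a true convolution runs the kernel in the opposite direction, replacing $a$ with $-a$ and $b$ with $-b$. Sharing the weights across positions also means far fewer weights need to be stored.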
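The difference between the two operations can be seen by flipping the kernel. In the sketch below, `corr2d` mirrors the helper defined later in this notebook, while `conv2d_strict` is a hypothetical name introduced here; it computes a strict convolution (up to an index offset) as a cross-correlation with the kernel flipped along both axes.

```python
import torch

def corr2d(X, K):
    """2D cross-correlation: slide K over X without flipping it."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def conv2d_strict(X, K):
    """Strict convolution: cross-correlation with K flipped along both axes."""
    return corr2d(X, torch.flip(K, dims=[0, 1]))

X = torch.arange(9.0).reshape(3, 3)
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

print(corr2d(X, K))         # tensor([[19., 25.], [37., 43.]])
print(conv2d_strict(X, K))  # tensor([[ 5., 11.], [23., 29.]])
```

Because the kernel is learned, the network can absorb the flip into the weights, which is why the distinction rarely matters in practice.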

**Principle 2: Locality**
$$
h_{i,j} = \sum_{a,b}v_{a,b} x_{i+a,j+b}
$$
To compute $h_{i,j}$ we only want to look at inputs near $(i,j)$.

***Solution***:

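Set the weights to zero beyond a certain range: $v_{a,b}=0$ whenever $|a| > Δ$ or $|b| > Δ$. Then
$$
h_{i,j} = \sum_{a=-Δ}^{Δ}\sum_{b=-Δ}^{Δ}v_{a,b}x_{i+a,j+b}
$$
so $Δ$ bounds the range of $a$ and $b$, and only $(2Δ+1)^2$ weights need to be stored.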
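The windowed sum can be written out directly as loops. This is a minimal sketch under one assumption that the notes do not state: inputs outside the image are treated as zero. The function name `local_corr` and the storage convention `v[a + delta, b + delta]` for $v_{a,b}$ are illustrative.

```python
import torch

def local_corr(x, v, delta):
    """h[i, j] = sum over a, b in [-delta, delta] of v_{a,b} * x[i+a, j+b].

    v has shape (2*delta+1, 2*delta+1); v[a+delta, b+delta] holds v_{a,b}.
    Positions outside x are skipped, i.e. treated as zeros (an assumption).
    """
    H, W = x.shape
    h = torch.zeros(H, W)
    for i in range(H):
        for j in range(W):
            for a in range(-delta, delta + 1):
                for b in range(-delta, delta + 1):
                    if 0 <= i + a < H and 0 <= j + b < W:
                        h[i, j] += v[a + delta, b + delta] * x[i + a, j + b]
    return h

x = torch.arange(9.0).reshape(3, 3)
v = torch.ones(3, 3)              # delta = 1, so (2*1+1)^2 = 9 weights
h = local_corr(x, v, delta=1)
print(h[1, 1])  # tensor(36.): the centre sees all nine inputs
print(h[0, 0])  # tensor(8.): the corner sees only 0 + 1 + 3 + 4
```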

# **Convolutional Layer**

E.g.，

Input:
$
\begin{bmatrix}
0&1&2\\
3&4&5\\
6&7&8
\end{bmatrix}
$
Kernel:
$
\begin{bmatrix}
0&1\\
2&3
\end{bmatrix}
$

Output:
$
\begin{bmatrix}
19&25\\
37&43
\end{bmatrix}
$

Computation of the top-left output entry:
$0\times 0+1\times 1+2\times 3+3\times 4 = 19$

...

Different kernels can perform edge detection, sharpening, and Gaussian blur.
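As an illustration, the sketch below applies three commonly used 3×3 kernels with `torch.nn.functional.conv2d`. The specific kernel values are widely used textbook choices, not taken from these notes, and `apply_kernel` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F

# Common 3x3 kernels (textbook values, assumed here for illustration).
kernels = {
    "edge": torch.tensor([[-1., -1., -1.], [-1., 8., -1.], [-1., -1., -1.]]),
    "sharpen": torch.tensor([[0., -1., 0.], [-1., 5., -1.], [0., -1., 0.]]),
    "gaussian_blur": torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16,
}

def apply_kernel(image, kernel):
    """Cross-correlate a 2D image with a 2D kernel (no padding)."""
    # F.conv2d expects (batch, channel, height, width) tensors.
    out = F.conv2d(image[None, None], kernel[None, None])
    return out[0, 0]

# On a perfectly flat image there are no edges: the edge kernel returns zeros,
# while sharpening and blurring leave the constant value unchanged.
flat = torch.ones(5, 5)
results = {name: apply_kernel(flat, k) for name, k in kernels.items()}
for name, out in results.items():
    print(name, out[0, 0].item())
```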

- 1D
$$
y_{i} = \sum_{a=1}^{h}w_{a}x_{i+a}
$$
Text, language, time series

- 3D
$$
y_{i,j,k} = \sum_{a=1}^{h}\sum_{b=1}^{w}\sum_{c=1}^{d} w_{a,b,c}x_{i+a,j+b,k+c}
$$
Videos, medical images (MRI), weather maps (with a time axis)

- A convolutional layer cross-correlates the input with the kernel and adds a bias to produce the output.
- The kernel and the bias are learnable parameters.
- The size of the kernel matrix is a hyperparameter.
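These three points can be seen on PyTorch's built-in `nn.Conv2d`. The sketch below assumes one input channel, one output channel, and a 2×2 kernel, all chosen only for illustration.

```python
import torch
from torch import nn

# kernel_size is chosen by the user (a hyperparameter);
# weight and bias are created by the layer and learned during training.
layer = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(2, 2))

print(layer.weight.shape)  # torch.Size([1, 1, 2, 2]): (out, in, height, width)
print(layer.bias.shape)    # torch.Size([1]): one bias per output channel

n_params = sum(p.numel() for p in layer.parameters())
print(n_params)            # 5 = 4 kernel weights + 1 bias
```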

In [2]:
import torch
from torch import nn
from d2l import torch as d2l

In [7]:
def corr2d(X, K):
  """Compute the 2D cross-correlation of X with kernel K."""
  h, w = K.shape  # kernel height and width
  Y = torch.zeros((X.shape[0]-h+1, X.shape[1]-w+1))
  for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
      Y[i,j] = (X[i: i+h, j: j+w] *K).sum()
  return Y

In [8]:
# test
X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
corr2d(X, K)

tensor([[19., 25.],
        [37., 43.]])
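As a cross-check, PyTorch's built-in `torch.nn.functional.conv2d` gives the same numbers on this example, which confirms that it also computes a cross-correlation (no kernel flip). The reshapes add the batch and channel dimensions the function expects; the name `Y_builtin` is illustrative.

```python
import torch
import torch.nn.functional as F

X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

# F.conv2d works on (batch, channel, height, width) inputs and
# (out_channels, in_channels, height, width) kernels.
Y_builtin = F.conv2d(X.reshape(1, 1, 3, 3), K.reshape(1, 1, 2, 2)).reshape(2, 2)
print(Y_builtin)  # tensor([[19., 25.], [37., 43.]])
```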

In [9]:
# 2D convolution
class Conv2D(nn.Module):
  def __init__(self, kernel_size):
    super().__init__()
    self.weight = nn.Parameter(torch.rand(kernel_size))
    self.bias = nn.Parameter(torch.zeros(1))

  def forward(self, x):
    return corr2d(x, self.weight) + self.bias

In [10]:
# Edge detection
# black:0, white:1
X = torch.ones((6, 8))
X[:, 2:6] = 0
X

tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])

In [11]:
K = torch.tensor([[1.0,-1.0]]) # output is zero when two horizontally adjacent elements are equal, nonzero otherwise

In [12]:
Y = corr2d(X, K) # black-to-white: -1, white-to-black: 1
Y # this kernel detects vertical edges only

tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])

In [13]:
corr2d(X.t(), K) # after transposing, the edges are horizontal and this kernel misses them

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
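The all-zero result shows that a 1×2 kernel cannot see horizontal edges. Transposing the kernel as well makes it compare vertically adjacent elements, which recovers them. The sketch below uses `torch.nn.functional.conv2d` in place of the notebook's `corr2d` so that it is self-contained; `Xt`, `Kt`, and `Y_t` are illustrative names.

```python
import torch
import torch.nn.functional as F

X = torch.ones((6, 8))
X[:, 2:6] = 0
K = torch.tensor([[1.0, -1.0]])

Xt = X.t()   # 8x6 image whose edges are now horizontal
Kt = K.t()   # 2x1 kernel: compares vertically adjacent elements

Y_t = F.conv2d(Xt.reshape(1, 1, 8, 6), Kt.reshape(1, 1, 2, 1)).reshape(7, 6)
print(Y_t[:, 0])  # tensor([ 0.,  1.,  0.,  0.,  0., -1.,  0.])
```

Every column of `Y_t` carries the same pattern: 1 where white changes to black going down, and -1 where black changes to white.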

Given `X` and `Y`, learn the kernel `K`

In [16]:
# nn.Conv2d
conv2d = nn.Conv2d(1,1, kernel_size=(1,2), bias = False) # 1, 1: one input channel and one output channel

X = X.reshape((1,1,6,8))
Y = Y.reshape((1,1,6,7))

for i in range(10):
  Y_hat = conv2d(X)
  l = (Y_hat - Y)**2
  conv2d.zero_grad()
  l.sum().backward()
  conv2d.weight.data[:] -= 3e-2 * conv2d.weight.grad
  if (i+1) % 2 ==0:
    print(f'batch{i+1}, loss{l.sum():.3f}')

batch2, loss11.268
batch4, loss2.573
batch6, loss0.712
batch8, loss0.234
batch10, loss0.086


In [17]:
conv2d.weight.data.reshape((1,2))

tensor([[ 0.9558, -1.0140]])
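The learned weights are close to the true kernel `[1, -1]`. The same fit can also be written with `torch.optim.SGD` instead of the manual weight update. The sketch below is one possible variant, not the notebook's method: the seed and the 100 steps (rather than 10) are assumptions chosen so that the weights converge more tightly, and the target `Y` is built directly as the difference of horizontally adjacent pixels.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumed seed, only for reproducibility

X = torch.ones((6, 8))
X[:, 2:6] = 0
Y = X[:, :-1] - X[:, 1:]          # same as cross-correlating X with [[1, -1]]

X4 = X.reshape(1, 1, 6, 8)        # (batch, channel, height, width)
Y4 = Y.reshape(1, 1, 6, 7)

conv2d = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
optimizer = torch.optim.SGD(conv2d.parameters(), lr=3e-2)

for step in range(100):
    optimizer.zero_grad()
    loss = ((conv2d(X4) - Y4) ** 2).sum()   # summed squared error
    loss.backward()
    optimizer.step()                        # w <- w - lr * grad

learned = conv2d.weight.detach().reshape(1, 2)
print(learned)
```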