In [1]:
import torch
import torch.nn as nn
print(torch.cuda.is_available())

True


This notebook records my notes on the basic building blocks of PyTorch. \
Part 1: the convolution layer
```
torch.nn.Conv2d(
in_channels, out_channels, kernel_size,
stride=1, padding=0, dilation=1, groups=1,
bias=True, padding_mode='zeros', device=None, dtype=None
)
```
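What do the parameters do?
- `groups`: rarely needed in practice. It splits the input and output channels into `groups` groups that are convolved independently (both `in_channels` and `out_channels` must be divisible by `groups`). This reduces the parameter count, and the groups can be computed in parallel.
- `dilation`: the spacing between kernel elements, i.e. dilated (atrous) convolution.
- `padding_mode`: how the padded border is filled: `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`.
- `bias`: if `True`, a learnable bias is added to each output channel.

How are the parameters set? Each of the following accepts an int or a tuple:
- `kernel_size`
- `stride`
- `padding`
- `dilation`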
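
As a small illustration of the int-or-tuple convention, the sketch below builds one layer with tuple arguments and one with an int. The specific sizes (a 3x5 kernel, stride `(2, 1)`, a 32x32 input) are arbitrary values chosen for the example, not taken from the notes.

```python
import torch
import torch.nn as nn

# Tuple arguments are read as (height, width); the values here are illustrative.
conv = nn.Conv2d(
    in_channels=3,
    out_channels=6,
    kernel_size=(3, 5),
    stride=(2, 1),
    padding=(1, 2),
    dilation=(1, 1),
)
x = torch.randn(1, 3, 32, 32)
y = conv(x)
print(y.shape)  # torch.Size([1, 6, 16, 32])

# A single int is expanded to the same value for both dimensions.
square = nn.Conv2d(3, 6, kernel_size=3)
print(square.kernel_size)  # (3, 3)
```

With these settings the height is halved by the stride of 2, while the width is preserved because a padding of 2 compensates for the 5-wide kernel.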

Formulas for the input and output image sizes:
$$
input : (N, C_{in}, H_{in}, W_{in}) \\
output: (N, C_{out}, H_{out}, W_{out}) \\
H_{out} = \left\lfloor\frac{H_{in}  + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor \\
W_{out} = \left\lfloor\frac{W_{in}  + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor \\
$$
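
The size formula can be checked numerically. The helper `conv_out_size` below is a hypothetical name introduced for this sketch; it transcribes the formula for one spatial dimension and compares the result with what `nn.Conv2d` actually produces.

```python
import math
import torch
import torch.nn as nn


def conv_out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output size along one spatial dimension, following the formula above."""
    return math.floor(
        (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )


# Case 1: 3x3 kernel, stride 1, no padding on a 32x32 input.
h_out = conv_out_size(32, kernel_size=3)
out = nn.Conv2d(3, 6, kernel_size=3)(torch.randn(1, 3, 32, 32))
print(h_out, out.shape)  # 30 torch.Size([1, 6, 30, 30])

# Case 2: stride 2, padding 1, dilation 2 (the floor matters here: 15.5 -> 15).
h_out2 = conv_out_size(32, kernel_size=3, stride=2, padding=1, dilation=2)
out2 = nn.Conv2d(3, 6, kernel_size=3, stride=2, padding=1, dilation=2)(
    torch.randn(1, 3, 32, 32)
)
print(h_out2, out2.shape)  # 15 torch.Size([1, 6, 15, 15])
```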
Shape of the convolution weight:
$$
(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size[0]}, \text{kernel\_size[1]})
$$
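
The weight shape, including the effect of `groups`, can be read directly from a layer. The channel counts below (4 in, 8 out) are arbitrary values picked so that both are divisible by `groups=2`.

```python
import torch.nn as nn

# groups=1 (default): every output channel sees all 4 input channels.
conv = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3)
print(conv.weight.shape)  # torch.Size([8, 4, 3, 3])

# groups=2: every output channel sees only in_channels / groups = 2 input channels.
grouped = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)
print(grouped.weight.shape)  # torch.Size([8, 2, 3, 3])

# The bias has one entry per output channel regardless of groups.
print(grouped.bias.shape)  # torch.Size([8])

# Grouping halves the number of weight parameters in this case.
print(conv.weight.numel(), grouped.weight.numel())  # 288 144
```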
![Convolution operation](imgs/inchannel_out_channel_and_convs_number.png)
2D convolution formula:
$$
input:(N, C_{\text{in}}, H, W) \\
output:(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}}) \\
\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)
$$
> The formulas above do not show the H and W dimensions explicitly.

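Here, leaving out the two kernel dimensions, the weight has shape $(C_{\text{out}}, C_{\text{in}})$, so $\text{weight}(C_{\text{out}_j}, k)$ refers to one single 2D kernel inside it. \\
$\text{input}(N_i, k)$ is the 2D matrix (image) of the $k$-th channel of the input sample $N_i$.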

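Q: The number of kernels in each filter equals the number of input channels. Is the output of each filter the sum of the results of these kernels, or ...
A: It is the sum of the results of these kernels (plus the bias, as in the formula above).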
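
The answer can be verified with a short sketch: convolve each input channel with its own 2D kernel using `torch.nn.functional.conv2d`, sum the results, add the bias, and compare with the layer's output for the same filter. The tensor sizes and the filter index `j = 0` are arbitrary choices for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)
x = torch.randn(1, 3, 8, 8)
j = 0  # index of the filter (output channel) to inspect

with torch.no_grad():
    # One single-channel convolution per input channel k, using kernel weight[j, k].
    per_channel = [
        F.conv2d(x[:, k:k + 1], conv.weight[j:j + 1, k:k + 1])
        for k in range(conv.in_channels)
    ]
    # Sum over the input channels, then add the bias of filter j.
    manual = conv.bias[j] + torch.stack(per_channel).sum(dim=0)
    reference = conv(x)[:, j:j + 1]

matches = torch.allclose(manual, reference, atol=1e-5)
print(manual.shape, matches)
```

The two results should agree up to floating point tolerance, which matches the sum over $k$ in the 2D convolution formula.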

## Main purpose
Reducing the dimensionality of the data and extracting features.

In [2]:
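B = 1   # batch size
C = 3   # number of input channels
W = 32  # image width
H = 32  # image height
# 3x3 convolution, 3 -> 6 channels, no padding: a 32x32 input gives a 30x30 output
conv2d = nn.Conv2d(in_channels=C, out_channels=6, kernel_size=3, stride=1, padding=0)
rand_input = torch.randn(B, C, H, W)
print(rand_input)
output = conv2d(rand_input)
print(output)

print(f"shape of output:{output.shape}")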

tensor([[[[ 2.7492, -0.5850,  0.6645,  ..., -0.0843,  0.9196,  0.5032],
          [ 0.0033,  0.6150, -0.0094,  ..., -0.7399,  0.1225, -0.0925],
          [ 1.1334,  0.8468,  0.6016,  ..., -1.0344, -0.4512, -0.3912],
          ...,
          [-1.2454, -1.2182, -0.1210,  ..., -1.0635,  2.0381, -1.2331],
          [ 0.7625,  0.6985,  0.6744,  ..., -0.5993,  0.3940, -0.1629],
          [-2.1266,  0.0457,  0.2667,  ..., -1.7995,  0.9823,  0.4535]],

         [[ 0.4228, -0.2508,  1.8892,  ...,  0.5423, -1.2994,  1.0039],
          [ 0.1983,  1.9052,  1.9258,  ..., -1.1269,  1.4375,  1.0716],
          [-0.5213, -0.3206, -1.2260,  ...,  0.3053,  0.6250,  0.1919],
          ...,
          [ 0.7234,  0.0990, -0.7705,  ...,  0.5804,  0.1557, -0.4291],
          [ 2.1061, -0.6887,  0.4298,  ...,  0.9134, -0.5004, -2.2304],
          [-0.6967, -0.6886, -0.9839,  ..., -0.2360,  0.5805, -0.3449]],

         [[ 0.9963, -2.5798,  0.3310,  ..., -0.6797,  1.5526,  1.0293],
          [ 0.1678,  0.9181,  

Part 2: the max pooling layer
```
torch.nn.MaxPool2d(
kernel_size, stride=None, padding=0,
dilation=1, return_indices=False, ceil_mode=False
)
```
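`stride` defaults to `kernel_size`.
The output size is computed with the same formula as for the convolution layer (with `ceil_mode=True`, the floor is replaced by a ceiling).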
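
A quick check of the default stride and of `ceil_mode`, as a minimal sketch. The input sizes (32x32 and 5x5) are arbitrary values chosen so that the second case has a leftover row and column.

```python
import torch
import torch.nn as nn

# stride is not given, so it falls back to kernel_size.
pool = nn.MaxPool2d(kernel_size=2)
print(pool.stride)  # 2

x = torch.randn(1, 3, 32, 32)
pooled = pool(x)
print(pooled.shape)  # torch.Size([1, 3, 16, 16])

# On a 5x5 input the formula gives 2.5 before rounding:
# floor -> 2 (default), ceil -> 3 (ceil_mode=True keeps the partial window).
small = torch.randn(1, 1, 5, 5)
floor_out = nn.MaxPool2d(kernel_size=2, stride=2)(small)
ceil_out = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)(small)
print(floor_out.shape, ceil_out.shape)
```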

In [6]:
# kernel_size=2 with stride=2 halves H and W: 32x32 -> 16x16
mxpooling = nn.MaxPool2d(kernel_size=2, stride=2)
mp_res = mxpooling(rand_input)
print(mp_res)
print(mp_res.shape)
print(rand_input.shape)

tensor([[[[ 2.7492,  0.7720,  1.7686,  2.8989,  0.6675,  1.3239,  1.9258,
            0.3387,  3.2310,  0.7679,  0.7288,  1.0333,  0.2415,  1.8076,
            1.3143,  0.9196],
          [ 1.3525,  0.6016,  0.6883,  1.5667,  0.2283,  0.0691,  0.5952,
            0.8215,  1.2634,  1.0897,  2.6298,  1.6139, -0.1389,  1.1012,
            1.2962,  0.2276],
          [ 1.6383,  2.0741,  0.2045,  1.5812,  1.9770,  1.2848,  1.5642,
           -0.0993,  1.8787,  0.3962,  0.1958,  0.8191,  0.4005,  1.6044,
            1.1210,  1.5537],
          [ 1.1072,  0.4568,  0.5509,  2.6087,  0.2178,  1.0347,  1.7890,
            1.8310,  2.1244,  2.5683,  0.9266,  1.9299,  0.7298,  1.9629,
            2.1985,  0.9904],
          [ 0.5263,  1.3765,  1.0478,  0.6585,  2.1655,  0.9058,  0.4137,
            0.2748,  0.9338,  0.6746,  0.7377,  0.9522,  0.3981,  0.9531,
            0.8447,  2.1483],
          [ 0.0553,  1.6250,  0.5976,  1.1540,  2.2395,  2.0076,  0.4031,
            1.2469,  0.4830,  0.7642

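BatchNorm layer

Main steps of BatchNorm:
- Compute the mean and variance of the batch.
- Normalize every value in the batch using that mean and variance (standard deviation).
- Scale and shift?

- Why is batch normalization needed?
- What are the scale and shift operations, and why are they needed?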

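### Scale and shift
BatchNorm first normalizes each value with the batch mean $\mu_{B}$ and variance $\sigma_{B}^{2}$ (a small $\epsilon$ is added for numerical stability):
$$\hat{x}_{i}=\frac{x_{i}-\mu_{B}}{\sqrt{\sigma_{B}^{2}+\epsilon}}$$
It then scales and shifts the result with the learnable parameters $\gamma$ and $\beta$, so the network can undo the normalization where that helps training:
$$y_{i}=\gamma\hat{x}_{i}+\beta$$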
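
The normalize, scale and shift steps can be reproduced by hand and compared with `nn.BatchNorm2d` in training mode. This is a minimal sketch: it assumes per-channel statistics taken over the batch, height and width dimensions, with the biased variance used for normalization, and the input shape is an arbitrary choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3, 5, 5)
bn = nn.BatchNorm2d(3)  # a freshly built module is in training mode

# Per-channel statistics over the (N, H, W) dimensions.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)

# Normalize, then scale by gamma (bn.weight) and shift by beta (bn.bias).
x_hat = (x - mean) / torch.sqrt(var + bn.eps)
gamma = bn.weight.view(1, -1, 1, 1)
beta = bn.bias.view(1, -1, 1, 1)
manual = gamma * x_hat + beta

matches = torch.allclose(manual, bn(x), atol=1e-5)
print(matches)
```

At initialization `gamma` is all ones and `beta` is all zeros, so the layer output is just the normalized input until training changes them.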

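### Updating the mean and variance
During training, the BatchNorm layer normalizes the current batch with that batch's own mean and variance, and also uses them to update the global (running) mean and variance. During inference, it normalizes with the running mean and variance instead.
The update is as follows:
$$
\begin{array}{c}
\mu=\mathrm{momentum}\times\mu+(1-\mathrm{momentum})\times\mu_{B} \\
\sigma^{2}=\mathrm{momentum}\times\sigma^{2}+(1-\mathrm{momentum})\times\sigma_{B}^{2}
\end{array}
$$
> In this formula `momentum` is the weight of the old running value. The `momentum` argument of `nn.BatchNorm2d` (default 0.1) is instead the weight of the new batch statistic, so in PyTorch's terms the update is $\mu=(1-\mathrm{momentum})\times\mu+\mathrm{momentum}\times\mu_{B}$, and likewise for $\sigma^{2}$.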
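
The running statistics can be inspected directly on the module. The sketch below follows PyTorch's own convention, in which the `momentum` argument weights the new batch statistic: running = (1 - momentum) × running + momentum × batch, with the unbiased batch variance used for the running variance. The input shape and the offset of 2.0 are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
momentum = 0.1
bn = nn.BatchNorm2d(3, momentum=momentum)
x = torch.randn(4, 3, 5, 5) + 2.0  # shift the data so the mean is clearly non-zero

old_mean = bn.running_mean.clone()  # starts at zeros
old_var = bn.running_var.clone()    # starts at ones

bn.train()
_ = bn(x)  # a forward pass in training mode updates the running statistics

batch_mean = x.mean(dim=(0, 2, 3))
batch_var = x.var(dim=(0, 2, 3), unbiased=True)
expected_mean = (1 - momentum) * old_mean + momentum * batch_mean
expected_var = (1 - momentum) * old_var + momentum * batch_var
mean_ok = torch.allclose(bn.running_mean, expected_mean, atol=1e-5)
var_ok = torch.allclose(bn.running_var, expected_var, atol=1e-5)

# In eval mode the running statistics are used for normalization and left unchanged.
bn.eval()
before = bn.running_mean.clone()
_ = bn(x)
frozen = torch.equal(before, bn.running_mean)
print(mean_ok, var_ok, frozen)
```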

In [1]:
import torch
import torch.nn as nn

# Define a BatchNorm layer
batch_norm = nn.BatchNorm2d(3)  # assume the input has 3 channels

# Generate random input data with shape (batch size, channels, height, width)
input_data = torch.randn(4, 3, 5, 5)  # batch size 4, height and width 5

# Apply the BatchNorm layer
output_data = batch_norm(input_data)

# Print the input and output data
print("Input data:\n", input_data)
print("\nOutput data:\n", output_data)


Input data:
 tensor([[[[ 3.8243e-01, -2.8936e-01, -1.4393e+00,  1.4487e+00, -4.2554e-01],
          [ 1.1096e-01,  3.2976e-01, -1.7931e-01, -8.5284e-01, -1.0374e-01],
          [-7.7632e-03, -6.7513e-01, -1.0480e+00,  2.2964e+00, -2.1147e-01],
          [ 1.2232e+00,  1.7368e-01,  6.4999e-01, -4.5079e-01,  3.8084e-01],
          [-1.7682e+00,  3.6582e-01, -5.1679e-01,  4.1868e-01,  8.1121e-02]],

         [[-7.0161e-01, -8.5976e-01, -2.0912e-01,  7.4458e-01, -2.2513e+00],
          [ 2.2255e+00, -5.8848e-01,  1.3384e+00, -4.7989e-01,  5.4484e-01],
          [ 2.5878e-01,  6.7666e-01, -1.4280e-01,  2.4399e-01,  4.5065e-01],
          [ 3.2518e-01,  9.9259e-01,  4.4418e-01, -2.5541e+00, -1.2634e+00],
          [ 4.7942e-01, -1.3516e+00,  1.6378e+00, -1.4406e+00,  5.7740e-01]],

         [[ 1.0271e+00, -6.7980e-01,  7.4454e-01,  4.0147e-02,  8.6911e-01],
          [ 5.3564e-01, -1.5960e+00,  1.2090e-02, -2.5937e-01,  4.8759e-01],
          [-1.8451e+00,  8.0335e-01,  5.4286e-01, -7.5915e-01, -3