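## Convolutional Neural Networks (CNN)

### Introduction


A CNN is a deep learning architecture that is particularly well suited to grid-structured data such as images (2D grids) and video (3D grids). It extracts local features with a sliding window and combines them layer by layer, so the learned features progress from local patterns to a global representation.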


### Structure
$$
\begin{align*}
\text{Input Layer} & \rightarrow \text{Image data of dimensions (height, width, depth)} \\
\text{Convolutional Layer} & \rightarrow \text{Filters applied to the input to produce feature maps} \\
\text{Pooling Layer} & \rightarrow \text{Downsampling operation to reduce spatial dimensions} \\
\text{Fully Connected Layer} & \rightarrow \text{Neurons connected to all activations in the previous layer} \\
\text{Output Layer} & \rightarrow \text{Class scores or predictions}
\end{align*}
$$

<style>
    img {
        display: block;
        margin-left: auto;
        margin-right: auto;
    }
</style>

![image](https://i.imgur.com/v4VM3qu.gif)


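#### Explanation:
As the animation shows:
- The kernel is the sliding window.
- The stride is how far the kernel moves at each step.
- Padding adds extra pixels around the border so that the output can keep the same size as the input.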
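To make the three terms concrete, here is a minimal plain-Python sketch of a single-channel sliding-window convolution (strictly a cross-correlation, which is what deep learning libraries compute). The function name `conv2d_single` and the toy `image` and `kernel` are made up for illustration; it assumes a square kernel and zero padding.

```python
def conv2d_single(image, kernel, stride=1, padding=0):
    """Slide a square `kernel` over `image` (lists of lists) and return the feature map."""
    k = len(kernel)
    h, w = len(image), len(image[0])

    # Padding: surround the image with a border of zeros
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]

    feature_map = []
    # Stride: the window's top-left corner advances `stride` pixels per step
    for top in range(0, len(padded) - k + 1, stride):
        row = []
        for left in range(0, len(padded[0]) - k + 1, stride):
            # Kernel: element-wise multiply the window by the kernel and sum
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += padded[top + i][left + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]

print(conv2d_single(image, kernel, stride=2, padding=0))  # [[7.0, 11.0], [23.0, 27.0]]
```

With `stride=2` the 2x2 window visits four non-overlapping positions, so the 4x4 input shrinks to a 2x2 feature map.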

#### Computing the output size:
$$\text{Output Size} = \left\lfloor \frac{\text{Input Size} - \text{Kernel Size} + 2 \times \text{Padding}}{\text{Stride}} \right\rfloor + 1$$
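The formula can be turned into a small helper for checking layer sizes. This is a sketch with a hypothetical function name, `conv_output_size`. It rounds down when the stride does not divide evenly, which is how PyTorch's `Conv2d` and `MaxPool2d` behave by default.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a conv or pooling layer."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


print(conv_output_size(28, kernel_size=3, stride=1, padding=1))  # 28
print(conv_output_size(28, kernel_size=2, stride=2, padding=0))  # 14
print(conv_output_size(7, kernel_size=3, stride=2, padding=0))   # 3
```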

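*With the settings used in this notebook (kernel size 3, stride 1, padding 1), a conv layer leaves the spatial size unchanged; only the pooling layers shrink it.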
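As a worked example, applying the formula to a 28×28 MNIST image with the layer settings used in the network defined in this notebook (3×3 convolutions with stride 1 and padding 1, 2×2 max pooling with stride 2) gives:

$$
\begin{align*}
\text{conv1} & : \frac{28 - 3 + 2 \times 1}{1} + 1 = 28 \\
\text{maxpool} & : \frac{28 - 2 + 2 \times 0}{2} + 1 = 14 \\
\text{conv2} & : \frac{14 - 3 + 2 \times 1}{1} + 1 = 14 \\
\text{maxpool} & : \frac{14 - 2 + 2 \times 0}{2} + 1 = 7
\end{align*}
$$

The final 7×7 feature maps, with 64 channels, are what the first fully connected layer receives.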

In [None]:
# Step 1: as usual, start with the imports
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [None]:
# Step 2: define the neural network
class CNN(nn.Module):
    def __init__(self):
        super(CNN,self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1) # 1 input channel, 32 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        # Pooling layer
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        # Fully connected layers
        self.fc1 = nn.Linear(64*7*7, 128)
        self.fc2 = nn.Linear(128, 10)
    def forward(self, x):
        # Convolution -> activation -> pooling
        x = self.maxpool(F.relu(self.conv1(x)))
        x = self.maxpool(F.relu(self.conv2(x)))
        # Flatten; -1 lets view() infer the batch dimension
        x = x.view(-1, 64*7*7) # 7*7 because two max-pooling steps halve 28 twice: 28/2/2 = 7
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
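To get a feel for where the model's capacity sits, the number of learnable parameters can be worked out by hand from the layer definitions above. The sketch below uses plain Python, and the helper names `conv2d_params` and `linear_params` are made up for illustration. It assumes every layer has a bias term, which is the PyTorch default.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    # One kernel_size x kernel_size filter per (input, output) channel pair, plus one bias per output channel
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    # One weight per (input, output) pair, plus one bias per output
    return in_features * out_features + out_features


layer_params = {
    "conv1": conv2d_params(1, 32, 3),
    "conv2": conv2d_params(32, 64, 3),
    "fc1": linear_params(64 * 7 * 7, 128),
    "fc2": linear_params(128, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count}")
print("total:", total_params)  # total: 421642
```

Almost all of the parameters are in `fc1`, because it connects all 64×7×7 flattened activations to 128 units. The max-pooling layers have no parameters.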

In [None]:
# Step 3: define the hyperparameters
BatchSize = 64
LR = 0.01
EPOCHS = 5
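The batch size and epoch count together set how many optimizer steps the training loop performs. The sketch below does this arithmetic in plain Python, assuming the 60,000-image MNIST training set and the 80/20 train/validation split used later in this notebook. The upper-case constant names are chosen here for illustration.

```python
import math

BATCH_SIZE = 64               # same value as BatchSize above
NUM_EPOCHS = 5                # same value as EPOCHS above
MNIST_TRAIN_IMAGES = 60_000   # size of the MNIST training set

train_size = int(0.8 * MNIST_TRAIN_IMAGES)
val_size = MNIST_TRAIN_IMAGES - train_size

# A DataLoader keeps the last partial batch by default (drop_last=False), hence the ceil
train_batches = math.ceil(train_size / BATCH_SIZE)
val_batches = math.ceil(val_size / BATCH_SIZE)

print(train_size, val_size)          # 48000 12000
print(train_batches, val_batches)    # 750 188
print(train_batches * NUM_EPOCHS)    # 3750 optimizer steps over a full run
```

The 750 training batches per epoch match the length of the progress bar shown in the training output.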

In [None]:
# Step 4: download the data. This uses MNIST, a common beginner dataset.
from torchvision import datasets, transforms

# Named `transform` so that the `transforms` module is not shadowed
transform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and standard deviation
    ]
)


mnist_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)

# Hold out 20% of the training set for validation
train_size = int(0.8 * len(mnist_dataset))
val_size = len(mnist_dataset) - train_size

train_dataset, val_dataset = torch.utils.data.random_split(mnist_dataset, [train_size, val_size])


train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=BatchSize, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=BatchSize, shuffle=False)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 52360510.26it/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 1825157.36it/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 14190198.84it/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 7143055.41it/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
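The two preprocessing steps in the transform pipeline are simple per-pixel arithmetic: `ToTensor()` rescales 8-bit pixel values from 0..255 to 0..1, and `Normalize(mean, std)` then computes `(x - mean) / std`. The plain-Python sketch below reproduces that arithmetic for a single pixel. The function name `normalize_pixel` is hypothetical.

```python
MEAN, STD = 0.1307, 0.3081  # the values passed to transforms.Normalize above


def normalize_pixel(raw_value):
    """Apply the ToTensor scaling, then the Normalize shift and scale, to one 8-bit pixel."""
    scaled = raw_value / 255.0     # ToTensor: 0..255 -> 0..1
    return (scaled - MEAN) / STD   # Normalize: centre on the mean, divide by the std


print(round(normalize_pixel(0), 4))    # -0.4242 (black background)
print(round(normalize_pixel(255), 4))  # 2.8215 (brightest stroke)
```

After normalization the inputs are roughly zero-centred with unit variance, which tends to make optimization better behaved.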






In [None]:
train_loader

<torch.utils.data.dataloader.DataLoader at 0x7e18b42d3f10>

In [None]:
# Step 5: instantiate the network and define the loss function and optimizer
model = CNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=LR)
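`nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, which is why `forward` returns the output of `fc2` without a softmax. The plain-Python sketch below shows the per-sample computation for illustration; the function name `cross_entropy` is made up here and this is not PyTorch's actual implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class, for one sample."""
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# An untrained 10-class model that outputs identical logits scores ln(10)
print(round(cross_entropy([0.0] * 10, target=3), 4))  # 2.3026

# A confident correct prediction gives a loss close to zero
print(cross_entropy([10.0, 0.0, 0.0], target=0) < 0.001)  # True
```

A loss near 2.3 at the start of training therefore means the model is doing no better than guessing uniformly over the ten digits.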

In [None]:
from tqdm import tqdm
# Step 6: train the network, validating after every epoch
for i in range(EPOCHS):
    model.train()
    for data, target in tqdm(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

    model.eval()
    val_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            output = model(data)
            val_loss += criterion(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

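    # Note: criterion returns the mean loss of each batch, so this sum of batch means
    # divided by the number of samples is roughly 1/batch_size of the per-sample loss.
    val_loss /= len(val_loader.dataset)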
    print('\nValidation set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        val_loss, correct, len(val_loader.dataset),
        100. * correct / len(val_loader.dataset)))

100%|██████████| 750/750 [01:32<00:00,  8.08it/s]



Validation set: Average loss: 0.0018, Accuracy: 11572/12000 (96%)



  4%|▍         | 30/750 [00:03<01:24,  8.53it/s]


KeyboardInterrupt: 
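The accuracy figure in the validation printout comes from taking the argmax of each row of logits and counting how often it equals the label. The same logic in plain Python looks roughly like the sketch below. The function name `count_correct` and the toy logits are made up for illustration.

```python
def count_correct(batch_logits, targets):
    """Count the samples whose highest-scoring class equals the target label."""
    correct = 0
    for logits, target in zip(batch_logits, targets):
        predicted = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        correct += int(predicted == target)
    return correct


toy_logits = [
    [2.0, 0.1, -1.0],   # predicts class 0
    [0.3, 0.2, 1.5],    # predicts class 2
    [-0.5, 0.9, 0.4],   # predicts class 1
]
toy_targets = [0, 2, 2]

correct = count_correct(toy_logits, toy_targets)
print(correct, len(toy_targets))  # 2 3
```

For the run above, 11572 correct predictions out of 12000 validation images is about 96.4%, which the `{:.0f}` format prints as 96%.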

In [None]:
# Save the model weights (state_dict)
torch.save(model.state_dict(), "MnistCnn.pth")

In [None]:
# Inference: redefine the same architecture, then load the saved weights
class CNN(nn.Module):
    def __init__(self):
        super(CNN,self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1) # 1 input channel, 32 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        # Pooling layer
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        # Fully connected layers
        self.fc1 = nn.Linear(64*7*7, 128)
        self.fc2 = nn.Linear(128, 10)
    def forward(self, x):
        # Convolution -> activation -> pooling
        x = self.maxpool(F.relu(self.conv1(x)))
        x = self.maxpool(F.relu(self.conv2(x)))
        # Flatten; -1 lets view() infer the batch dimension
        x = x.view(-1, 64*7*7) # 7*7 because two max-pooling steps halve 28 twice: 28/2/2 = 7
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
model = CNN()
model.load_state_dict(torch.load("MnistCnn.pth"))

<All keys matched successfully>

In [None]:
model.eval()
correct = 0
val_loss = 0
with torch.no_grad():
    for data, target in val_loader:
        output = model(data)
        val_loss += criterion(output, target).item()
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()

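    # Note: as in the training cell, this divides a sum of per-batch mean losses
    # by the number of samples, so it is roughly 1/batch_size of the per-sample loss.
    val_loss /= len(val_loader.dataset)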
    print('\nValidation set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        val_loss, correct, len(val_loader.dataset),
        100. * correct / len(val_loader.dataset)))


Validation set: Average loss: 0.0017, Accuracy: 11596/12000 (97%)

