# Convolution operator

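When processing images, relying solely on fully connected networks (dense networks) causes several problems.

1. Not every pair of features needs to be related: the pixel in the top-left corner of a photo and the pixel in the bottom-right corner may have nothing to do with each other, so these two inputs do not need weights connecting them to the same hidden-layer node. A dense network therefore carries many redundant weights, which raises the computational cost and makes training slow.
2. Too many weights and biases can lead to overfitting.

Hidden-layer nodes therefore only need to connect to a few nodes of the previous layer (neighbouring pixels). The most widely used architecture built on this idea today is the Convolutional Neural Network (CNN).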
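To make the point about redundant weights concrete, here is a minimal sketch that counts learnable parameters for one dense layer and one convolutional layer. The helper names `dense_params` and `conv_params` and the layer sizes (a 32 x 32 RGB image, 64 hidden units or 64 kernels) are assumptions chosen for illustration, not values from these notes.

```python
def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of a 2D convolution with square kernels."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Assumed example: a 32 x 32 RGB image feeding 64 hidden units or 64 kernels.
n_pixels = 3 * 32 * 32
print(dense_params(n_pixels, 64))  # 196672
print(conv_params(3, 64, 3))       # 1792
```

The convolutional layer reuses the same small kernel at every position, which is why its parameter count does not depend on the image size.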

## Implementing CNN in PyTorch

There are two ways:
1. Object-oriented: build the network as a class (using torch.nn) (left side of the figure below)
2. Functional (using torch.nn.functional) (right side of the figure below)

![](Image/Image12.jpg)
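Since the figure is an image, the following sketch shows roughly what the two styles look like in code. The batch size, image size, and random kernels are assumptions for illustration; only `nn.Conv2d` and `F.conv2d` themselves come from the notes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A dummy minibatch: 10 RGB images of 28 x 28 pixels (assumed sizes).
images = torch.rand(10, 3, 28, 28)

# Object-oriented style: the layer object creates and owns its weights.
conv_layer = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3, stride=1, padding=1)
out_oop = conv_layer(images)

# Functional style: the kernels are created by hand and passed in explicitly.
# Kernel shape is (out_channels, in_channels, kernel_height, kernel_width).
kernels = torch.rand(5, 3, 3, 3)
out_fn = F.conv2d(images, kernels, stride=1, padding=1)

print(out_oop.shape)  # torch.Size([10, 5, 28, 28])
print(out_fn.shape)   # torch.Size([10, 5, 28, 28])
```

The object-oriented form is usually the more convenient one inside an `nn.Module`, because the weights are registered as parameters automatically.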

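### How the code differs in practice

**Using a single kernel**
![](Image/Image13.jpg)

**Using five kernels**
![](Image/Image14.jpg)

in_channels = 3 means the input image has three colour channels (RGB). <br/>
out_channels = 1 means the output is a single feature map, i.e. only one kernel is used. <br/>
out_channels = 5 means the output has five feature maps, i.e. five kernels are used. <br/>
stride is the step size, in pixels, by which the kernel moves each time. <br/>
padding adds extra border pixels (zeros by default) around the image so that the kernel can extend past the edge, which keeps the feature maps from shrinking.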
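The effect of stride and padding on the feature map size can be worked out with the standard output-size formula, floor((n + 2p - k) / s) + 1. The sketch below is plain Python; the helper name `conv_output_size` and the 28-pixel input are assumptions for illustration.

```python
def conv_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output height (or width) of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(28, 3))                       # 26: no padding, the map shrinks
print(conv_output_size(28, 3, padding=1))            # 28: padding=1 keeps the size
print(conv_output_size(28, 3, stride=2, padding=1))  # 14: stride=2 halves the size
```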

# Pooling operators

![](Image/Image15.jpg)

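As shown in the figure above, suppose we have a set of 64 feature maps produced by the kernels. **Pooling** reduces the resolution of each feature map (typically halving both the height and the width).

Pooling makes training more efficient and can also reduce overfitting, because smaller feature maps mean fewer parameters in the layers that follow.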

---

Two kinds of pooling are commonly used:
1. Max pooling

![](Image/Image16.jpg)

2. Average pooling

![](Image/Image17.jpg)
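The two pooling rules can be reproduced by hand on a tiny feature map. This is a plain-Python sketch; the helper `pool2x2` and the 4 x 4 values are made up for illustration and are not taken from the figures.

```python
def pool2x2(grid, reduce_fn):
    """Apply reduce_fn to every non-overlapping 2 x 2 block of a 2D list."""
    pooled = []
    for r in range(0, len(grid) - 1, 2):
        row = []
        for c in range(0, len(grid[0]) - 1, 2):
            block = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(reduce_fn(block))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [8, 0, 1, 1],
    [2, 2, 3, 9],
]

max_pooled = pool2x2(feature_map, max)
avg_pooled = pool2x2(feature_map, lambda block: sum(block) / len(block))

print(max_pooled)  # [[7, 6], [8, 9]]
print(avg_pooled)  # [[4.0, 3.0], [3.0, 3.5]]
```

Both variants turn the 4 x 4 map into a 2 x 2 map; they differ only in how each block is summarised.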

## Max-pooling in PyTorch

As before, there are two ways:
1. Object-oriented: build the network as a class (using torch.nn) (left side of the figure below)
2. Functional (using torch.nn.functional) (right side of the figure below)

![](Image/Image18.jpg)

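The tensor is wrapped in four pairs of square brackets, meaning the image has four dimensions (minibatch size, depth, height, width).

**torch.nn.MaxPool2d(2)** and **F.max_pool2d(im, 2)** both split the image into non-overlapping 2 * 2 cells and take the max of each cell to perform the pooling.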
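As a rough text version of the figure, the sketch below applies both calls to a single 4 x 4 image. The pixel values are assumptions for illustration; the variable name `im` follows the call quoted above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One single-channel 4 x 4 image: (minibatch size, depth, height, width).
im = torch.tensor([[[[1., 3., 2., 0.],
                     [5., 7., 4., 6.],
                     [8., 0., 1., 1.],
                     [2., 2., 3., 9.]]]])

# Object-oriented style
max_pooling = nn.MaxPool2d(2)
out_oop = max_pooling(im)

# Functional style
out_fn = F.max_pool2d(im, 2)

print(out_oop.shape)  # torch.Size([1, 1, 2, 2])
print(out_oop)        # values: [[7., 6.], [8., 9.]]
```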

## Average-pooling in PyTorch

Again, there are two ways:
1. Object-oriented: build the network as a class (using torch.nn) (left side of the figure below)
2. Functional (using torch.nn.functional) (right side of the figure below)

![](Image/Image19.jpg)
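A comparable sketch for average pooling, assuming the same kind of 4 x 4 single-channel input (the values are illustrative, not those in the figure):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One single-channel 4 x 4 image: (minibatch size, depth, height, width).
im = torch.tensor([[[[1., 3., 2., 0.],
                     [5., 7., 4., 6.],
                     [8., 0., 1., 1.],
                     [2., 2., 3., 9.]]]])

# Object-oriented style
avg_pooling = nn.AvgPool2d(2)
out_oop = avg_pooling(im)

# Functional style
out_fn = F.avg_pool2d(im, 2)

print(out_oop)  # values: [[4.0, 3.0], [3.0, 3.5]]
```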

# Convolutional Neural Networks

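**AlexNet architecture**

![](Image/Image20.jpg)

A CNN is simply a neural network made up of many convolutional kernels, pooling layers, and a few dense layers. The network in the figure above has 5 convolutional layers, 3 max-pooling layers, 1 average-pooling layer, and finally 3 fully connected layers, and it can classify images into 1000 different classes.

## Building the AlexNet architecture in PyTorch

![](Image/Image21.jpg)

256 * 6 * 6 is the size of the flattened input to the first fully connected layer: there are 256 input channels, each with a resolution of 6 * 6.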
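One way to see where the 6 * 6 comes from is to trace the spatial size through the layers. The sketch below assumes the kernel sizes, strides, and paddings used by torchvision's AlexNet and a 224 x 224 input; the code in the figures may use different values, so treat the numbers as an illustration rather than a description of the figure.

```python
def out_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output height (or width) of a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 224                                      # assumed input resolution
size = out_size(size, 11, stride=4, padding=2)  # conv1    -> 55
size = out_size(size, 3, stride=2)              # max pool -> 27
size = out_size(size, 5, padding=2)             # conv2    -> 27
size = out_size(size, 3, stride=2)              # max pool -> 13
size = out_size(size, 3, padding=1)             # conv3    -> 13
size = out_size(size, 3, padding=1)             # conv4    -> 13
size = out_size(size, 3, padding=1)             # conv5    -> 13
size = out_size(size, 3, stride=2)              # max pool -> 6

flattened = 256 * size * size
print(size, flattened)  # 6 9216
```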

![](Image/Image22.jpg)

### Hands-on practice

The number of channels in the convolutional layers is not fixed (it is a hyperparameter), so different values can be tried.

In [3]:
# import
import torch
import torch.nn as nn

# define the CNN model
class Net(nn.Module):
    def __init__(self, num_classes):
        super(Net, self).__init__()
        
        # Instantiate the ReLU nonlinearity
        self.relu = nn.ReLU()
        
        # Instantiate two convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1)
        
        # Instantiate a max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        
        # Instantiate a fully connected layer
        self.fc = nn.Linear(7 * 7 * 10, num_classes)

    def forward(self, x):

        # Apply conv followed by relu, then in next line pool
        x = self.relu(self.conv1(x))
        x = self.pool(x)

        # Apply conv followed by relu, then in next line pool
        x = self.relu(self.conv2(x))
        x = self.pool(x)

        # Prepare the image for the fully connected layer
        x = x.view(-1, 7 * 7 * 10)

        # Apply the fully connected layer and return the result
        return self.fc(x)
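The hard-coded `7 * 7 * 10` only works for one input size. A quick way to check it is to push a dummy batch through the same convolution and pooling stack and look at the shape. The sketch below rebuilds that stack with `nn.Sequential` for brevity and assumes 28 x 28 grayscale inputs, which the notes do not state explicitly for this model.

```python
import torch
import torch.nn as nn

# Same convolution / ReLU / pooling stack as Net, written compactly.
features = nn.Sequential(
    nn.Conv2d(1, 5, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(5, 10, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
)

x = torch.rand(4, 1, 28, 28)  # assumed input: four 28 x 28 grayscale images
feature_maps = features(x)
print(feature_maps.shape)     # torch.Size([4, 10, 7, 7])

# Flatten everything except the batch dimension for the fully connected layer.
flat = feature_maps.view(-1, 7 * 7 * 10)
print(flat.shape)             # torch.Size([4, 490])
```

If the input resolution changes, the printed shape shows which value the `nn.Linear` layer needs.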

# Training Convolutional Neural Networks (CIFAR-10)

In [4]:
# import
import torch
import torchvision    # a package which deals with datasets and pre-trained neural networks
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

## Create training dataloader and testing dataloader

In [5]:
# define a transformation that converts images to torch tensors using torchvision transforms
# this transform object can then be applied directly when the data is downloaded and loaded
transformCIFAR = transforms.Compose(
    [transforms.ToTensor(),    # convert a PIL image to a float tensor with values in [0, 1]
    transforms.Normalize((0.4914, 0.48216, 0.44653),    # pre-computed means of R, G, B, used to standardize the data
                        (0.24703, 0.24349, 0.26159))]    # pre-computed standard deviations of R, G, B, used to standardize the data
)

# download the CIFAR-10 datasets
training_setCIFAR = torchvision.datasets.CIFAR10(root = "./Datasets/CIFAR-10", train = True, download = True, transform = transformCIFAR)
testing_setCIFAR = torchvision.datasets.CIFAR10(root = "./Datasets/CIFAR-10", train = False, download = True, transform = transformCIFAR)

# load the CIFAR-10 datasets
trainloaderCIFAR = torch.utils.data.DataLoader(training_setCIFAR, batch_size = 32, shuffle = True, num_workers = 4)
testloaderCIFAR = torch.utils.data.DataLoader(testing_setCIFAR, batch_size = 32, shuffle = False, num_workers = 4)

Files already downloaded and verified
Files already downloaded and verified
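For intuition about what the normalization step does, the sketch below applies the same per-channel rule, (x - mean) / std, by hand with plain torch operations. The random image is a stand-in for a `ToTensor()` output; only the mean and standard deviation values come from the cell above.

```python
import torch

mean = torch.tensor([0.4914, 0.48216, 0.44653])
std = torch.tensor([0.24703, 0.24349, 0.26159])

# Stand-in for one CIFAR-10 image after ToTensor(): 3 channels, values in [0, 1].
image = torch.rand(3, 32, 32)

# Broadcast the per-channel statistics over height and width.
normalized = (image - mean[:, None, None]) / std[:, None, None]

# An image whose pixels equal the channel means is mapped to all zeros.
mean_image = mean[:, None, None].expand(3, 32, 32)
zeros = (mean_image - mean[:, None, None]) / std[:, None, None]

print(normalized.shape)    # torch.Size([3, 32, 32])
print(zeros.abs().max())   # tensor(0.)
```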


## Define the model

In [8]:
# define the CNN model
class CIFAR_Net(nn.Module):
    def __init__(self, num_classes = 10):
        super(CIFAR_Net, self).__init__()
        
        # Instantiate the ReLU nonlinearity
        self.relu = nn.ReLU()
        
        # Instantiate three convolutional layers
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
        
        # Instantiate a max pooling layer
        self.pool = nn.MaxPool2d(2, 2)    # pooling is applied after every convolutional layer (so it is used 3 times)
        
        # Instantiate a fully connected layer
        self.fc = nn.Linear(4 * 4 * 128, num_classes)    # after 3 poolings the resolution becomes 4 * 4 (32 / 2 --> 16 / 2 --> 8 / 2 --> 4)

    def forward(self, x):

        # Apply conv followed by relu, then in next line pool
        x = self.relu(self.conv1(x))
        x = self.pool(x)

        # Apply conv followed by relu, then in next line pool
        x = self.relu(self.conv2(x))
        x = self.pool(x)
        
        # Apply conv followed by relu, then in next line pool
        x = self.relu(self.conv3(x))
        x = self.pool(x)

        # Prepare the image for the fully connected layer
        x = x.view(-1, 4 * 4 * 128)

        # Apply the fully connected layer and return the result
        return self.fc(x)
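The 32 --> 16 --> 8 --> 4 progression can be checked with a dummy batch. The sketch below mirrors the layers of `CIFAR_Net` in an `nn.Sequential` (an equivalent rewrite for brevity, not the class itself), records the shape after each pooling step, and counts the learnable parameters.

```python
import torch
import torch.nn as nn

# Same layer stack as CIFAR_Net, with nn.Flatten standing in for x.view(...).
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Flatten(),
    nn.Linear(4 * 4 * 128, 10),
)

x = torch.rand(2, 3, 32, 32)  # two dummy CIFAR-10 sized images
pooled_shapes = []
for layer in model:
    x = layer(x)
    if isinstance(layer, nn.MaxPool2d):
        pooled_shapes.append(tuple(x.shape))

n_params = sum(p.numel() for p in model.parameters())

print(pooled_shapes)  # [(2, 32, 16, 16), (2, 64, 8, 8), (2, 128, 4, 4)]
print(x.shape)        # torch.Size([2, 10])
print(n_params)       # 113738
```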

## Define the optimizer and loss function

In [9]:
# create the model
net = CIFAR_Net()

# create the cross entropy loss function object
criterion = nn.CrossEntropyLoss()

# set up the optimizer
optimizer = optim.Adam(net.parameters(), lr = 3e-4)
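Note that the models above return raw scores (logits) with no softmax layer. That works because `nn.CrossEntropyLoss` applies log-softmax internally. The sketch below checks this on random logits; the batch size and labels are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for 4 samples and 10 classes
labels = torch.tensor([3, 0, 9, 1])   # assumed ground-truth class indices

# Cross entropy on raw logits ...
loss_ce = nn.CrossEntropyLoss()(logits, labels)

# ... matches log-softmax followed by the negative log-likelihood loss.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss_ce, loss_manual))  # True
```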

## Training

In [11]:
for epoch in range(10):    # train on the whole dataset 10 times (10 epochs)
    for i, data in enumerate(trainloaderCIFAR, start = 0):    # weights are updated after every batch (i is the batch number, data is that batch's samples)
        # get the inputs
        inputs, labels = data
        
        # zero the gradients so that gradients from the previous iteration do not accumulate; each weight update starts from freshly computed gradients
        optimizer.zero_grad()
        
        # Forward propagation (compute the outputs)
        outputs = net(inputs)
        
        # compute the loss
        loss = criterion(outputs, labels)
        
        # Backpropagation
        loss.backward()    # compute the gradient of the loss with respect to each weight
        
        # update the weights using the computed gradients
        optimizer.step()
        
print("Finish Training!")


Finish Training!
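The loop above prints nothing until it finishes, so it can be useful to record the loss along the way. The sketch below runs the same four steps (zero_grad, forward, backward, step) on one fixed batch of random data and keeps the loss values. The small model, the random inputs, and the 20 steps are assumptions so that the example runs without downloading CIFAR-10.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic stand-in for one batch: 32 random "images" with random labels.
inputs = torch.randn(32, 3, 32, 32)
labels = torch.randint(0, 10, (32,))

# A deliberately small model (hypothetical, not CIFAR_Net).
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=3e-4)

losses = []
for step in range(20):
    optimizer.zero_grad()                  # clear gradients from the previous step
    loss = criterion(net(inputs), labels)  # forward pass and loss
    loss.backward()                        # backpropagation
    optimizer.step()                       # weight update
    losses.append(loss.item())             # record the loss as a Python float

print("first loss: %.4f, last loss: %.4f" % (losses[0], losses[-1]))
```

On a fixed batch the loss should fall steadily; in the real loop one would typically average the recorded values per epoch.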


## Evaluate the result

In [12]:
correct, total = 0, 0
predictions = []

# put net (the CIFAR_Net object) into evaluation mode
net.eval()

# gradients are not needed for evaluation
with torch.no_grad():
    for i, data in enumerate(testloaderCIFAR, start = 0):
        # get the testing data
        inputs, labels = data

        # predict (compute the output scores for each class)
        outputs = net(inputs)

        # convert the output scores to classes (the class with the highest score is the prediction)
        _, predicted = torch.max(outputs, 1)

        # store the predicted classes in a list
        predictions.extend(predicted.tolist())

        # count the total number of samples
        total += labels.size(0)

        # count the number of correct predictions
        correct += (predicted == labels).sum().item()

# print the accuracy
print("The CIFAR-10 testing set accuracy of the network is: %d %%" % (100 * correct / total))

The CIFAR-10 testing set accuracy of the network is: 75 %
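The accuracy computation can be followed step by step on a tiny hand-made example. The scores and labels below are invented for illustration; they are not outputs of the trained network.

```python
import torch

# Made-up output scores for 4 samples and 3 classes.
outputs = torch.tensor([[2.0, 0.1, -1.0],
                        [0.3, 0.2, 1.5],
                        [0.0, 3.0, 0.5],
                        [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 2, 1, 2])

# torch.max along dim 1 returns (max values, indices); the indices are the predicted classes.
_, predicted = torch.max(outputs, 1)

correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)

print(predicted.tolist())  # [0, 2, 1, 0]
print(correct)             # 3
print(accuracy)            # 75.0
```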


---

# Training Convolutional Neural Networks (MNIST)

In [4]:
# import
import torch
import torchvision    # a package which deals with datasets and pre-trained neural networks
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

## Create training dataloader and testing dataloader

In [13]:
# Transform the data to torch tensors and normalize it 
transformMNIST = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])   # MNIST is grayscale, so there is one mean (0.1307) and one standard deviation (0.3081) instead of three of each as with RGB

# Prepare training set and testing set
trainsetMNIST = torchvision.datasets.MNIST(root = "./Datasets/MNIST", train=True, 
                                      download=True, transform=transformMNIST)
testsetMNIST = torchvision.datasets.MNIST(root = "./Datasets/MNIST", train=False, 
                                     download=True, transform=transformMNIST)

# Prepare training loader and testing loader
trainloaderMNIST = torch.utils.data.DataLoader(trainsetMNIST, batch_size = 32,
                                          shuffle = True, num_workers=8)
testloaderMNIST = torch.utils.data.DataLoader(testsetMNIST, batch_size = 32,
                                         shuffle = False, num_workers=8) 


## Define the model

In [16]:
# define the CNN model
class MNIST_Net(nn.Module):
    def __init__(self, num_classes = 10):
        super(MNIST_Net, self).__init__()
        
        # Instantiate the ReLU nonlinearity
        self.relu = nn.ReLU()
        
        # Instantiate two convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        
        # Instantiate a max pooling layer
        self.pool = nn.MaxPool2d(2, 2)    # pooling is applied after every convolutional layer (so it is used 2 times)
        
        # Instantiate a fully connected layer
        self.fc = nn.Linear(7 * 7 * 64, num_classes)    # after 2 poolings the resolution becomes 7 * 7 (28 / 2 --> 14 / 2 --> 7)

    def forward(self, x):

        # Apply conv followed by relu, then in next line pool
        x = self.relu(self.conv1(x))
        x = self.pool(x)

        # Apply conv followed by relu, then in next line pool
        x = self.relu(self.conv2(x))
        x = self.pool(x)

        # Prepare the image for the fully connected layer
        x = x.view(-1, 7 * 7 * 64)

        # Apply the fully connected layer and return the result
        return self.fc(x)

## Define the optimizer and loss function

In [17]:
# create the model
net = MNIST_Net()

# create the cross entropy loss function object
criterion = nn.CrossEntropyLoss()

# set up the optimizer
optimizer = optim.Adam(net.parameters(), lr = 3e-4)

## Training

In [18]:
for epoch in range(10):    # train on the whole dataset 10 times (10 epochs)
    for i, data in enumerate(trainloaderMNIST, start = 0):    # weights are updated after every batch (i is the batch number, data is that batch's samples)
        # get the inputs
        inputs, labels = data
        
        # zero the gradients so that gradients from the previous iteration do not accumulate; each weight update starts from freshly computed gradients
        optimizer.zero_grad()
        
        # Forward propagation (compute the outputs)
        outputs = net(inputs)
        
        # compute the loss
        loss = criterion(outputs, labels)
        
        # Backpropagation
        loss.backward()    # compute the gradient of the loss with respect to each weight
        
        # update the weights using the computed gradients
        optimizer.step()
        
print("Finish Training!")

Finish Training!


## Evaluate the result

In [19]:
correct, total = 0, 0
predictions = []

# put net (the MNIST_Net object) into evaluation mode
net.eval()

# gradients are not needed for evaluation
with torch.no_grad():
    for i, data in enumerate(testloaderMNIST, start = 0):
        # get the testing data
        inputs, labels = data

        # predict (compute the output scores for each class)
        outputs = net(inputs)

        # convert the output scores to classes (the class with the highest score is the prediction)
        _, predicted = torch.max(outputs, 1)

        # store the predicted classes in a list
        predictions.extend(predicted.tolist())

        # count the total number of samples
        total += labels.size(0)

        # count the number of correct predictions
        correct += (predicted == labels).sum().item()

# print the accuracy
print("The MNIST testing set accuracy of the network is: %d %%" % (100 * correct / total))

The MNIST testing set accuracy of the network is: 98 %


On these image tasks, Convolutional Neural Networks predict far more accurately than fully connected networks.