# Day 13

> [DL Basic] [CNN - What is Convolution?](https://github.com/changwoomon/Boostcamp-AI-Tech/blob/main/Week%203/Day%2013/cnn.ipynb)

### Convolution
- Continuous convolution
$$(f*g)(t)=\int f(\tau)g(t-\tau)d\tau=\int f(t-\tau)g(\tau)d\tau$$
- Discrete convolution
$$(f*g)(t)=\sum_{i=-\infty}^{\infty}f(i)g(t-i)=\sum_{i=-\infty}^{\infty}f(t-i)g(i)$$
- 2D image convolution
$$(I*K)(i,j)=\sum_{m}\sum_{n}I(m,n)K(i-m,j-n)=\sum_{m}\sum_{n}I(i-m,j-n)K(m,n)$$
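
The 2D formula can be checked with a small pure-Python sketch. The function name `conv2d_full` and the toy inputs are illustrative assumptions, and the loop computes the "full" output (every position where the image and kernel overlap). Swapping the arguments gives the same result, which matches the two equivalent sums in the formula.

```python
def conv2d_full(image, kernel):
    """Full 2D convolution: out[i][j] = sum_m sum_n image[m][n] * kernel[i-m][j-n]."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for m in range(ih):
        for n in range(iw):
            for p in range(kh):
                for q in range(kw):
                    # (p, q) plays the role of (i - m, j - n), so (i, j) = (m + p, n + q)
                    out[m + p][n + q] += image[m][n] * kernel[p][q]
    return out


image = [[1, 2], [3, 4]]    # toy 2x2 "image" (assumed values)
kernel = [[0, 1], [1, 0]]   # toy 2x2 kernel (assumed values)

result = conv2d_full(image, kernel)
swapped = conv2d_full(kernel, image)  # convolution is commutative
print(result)  # [[0, 1, 2], [1, 5, 4], [3, 4, 0]]
```

Note that deep learning frameworks typically implement cross-correlation (no kernel flip) under the name "convolution"; since the kernel is learned, the difference does not matter in practice.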

### Convolutional Neural Networks (CNN)
- A CNN consists of convolution layers, pooling layers, and fully connected layers
    - Convolution and pooling layers : feature extraction
    - Fully connected layer : decision making (e.g. classification)

### 1x1 Convolution
- Why?
    - Dimension reduction
    - To reduce the number of parameters while increasing the depth
    - ex. bottleneck architecture
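
A 1x1 convolution can be read as the same linear map over channels applied at every pixel. The sketch below assumes a `[C_in][H][W]` nested-list layout and a hypothetical helper `conv1x1`; it reduces 4 channels to 2 without changing the spatial size.

```python
def conv1x1(feature_map, weights):
    """feature_map: [C_in][H][W], weights: [C_out][C_in] -> output: [C_out][H][W]."""
    c_in = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [sum(row[c] * feature_map[c][y][x] for c in range(c_in)) for x in range(w)]
            for y in range(h)
        ]
        for row in weights
    ]


# 4 input channels of size 2x2 (assumed toy values)
x = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
    [[13, 14], [15, 16]],
]
# 2 output channels: channel 0 = c0 + c1, channel 1 = c2 - c3
weights = [[1, 1, 0, 0], [0, 0, 1, -1]]

y = conv1x1(x, weights)
print(len(y), y[0], y[1])  # 2 [[6, 8], [10, 12]] [[-4, -4], [-4, -4]]
```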

> [DL Basic] Modern CNN - The importance of 1x1 convolution

### [AlexNet](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)
- ILSVRC
    - **I**mageNet **L**arge-**S**cale **V**isual **R**ecognition **C**hallenge
        - Classification / Detection / Localization / Segmentation
        - 1,000 different categories
        - Over 1 million images
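        - Training set : 456,567 images (this figure matches the ILSVRC detection training set; the classification training set has about 1.28 million images)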
- Key ideas
    - Rectified Linear Unit (ReLU) activation
    - GPU implementation (2 GPUs)
    - Local response normalization, Overlapping pooling
    - Data augmentation
    - Dropout
- ReLU Activation
    - Preserves properties of linear models
    - Easy to optimize with gradient descent
    - Good generalization
    - Overcomes the vanishing gradient problem
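
One rough way to see the vanishing gradient point is to multiply the activation derivative across layers. The sketch below is a simplification under stated assumptions: it ignores the weights, uses a hypothetical depth of 10, and evaluates the sigmoid at its best case (z = 0, where its derivative peaks at 0.25) against an active ReLU unit (z > 0).

```python
import math


def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


def chained_gradient(local_grad, z, depth):
    """Product of the activation derivative over `depth` layers (weights ignored)."""
    g = 1.0
    for _ in range(depth):
        g *= local_grad(z)
    return g


sigmoid_chain = chained_gradient(sigmoid_grad, 0.0, 10)  # 0.25 ** 10, under 1e-6
relu_chain = chained_gradient(relu_grad, 1.0, 10)        # stays at 1.0
print(sigmoid_chain, relu_chain)
```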

### [VGGNet](https://arxiv.org/pdf/1409.1556.pdf)
- Increasing depth with **3x3** convolution filters (with stride 1)
- 1x1 convolution for fully connected layers
- Dropout (p=0.5)
- VGG16, VGG19
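
A common argument for stacking 3x3 filters is that two 3x3 layers cover the same 5x5 receptive field as a single 5x5 layer while using fewer weights. The sketch below illustrates this with an assumed channel width of 128 and bias terms ignored; the helper names are hypothetical.

```python
def conv_params(kernel_size, c_in, c_out):
    """Weight count of one conv layer (bias ignored)."""
    return kernel_size * kernel_size * c_in * c_out


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


channels = 128  # assumed width, for illustration only

two_3x3 = 2 * conv_params(3, channels, channels)
one_5x5 = conv_params(5, channels, channels)

print(stacked_receptive_field(3, 2))  # 5
print(two_3x3, one_5x5)               # 294912 409600
```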

### [GoogLeNet](https://arxiv.org/pdf/1409.4842.pdf)
- GoogLeNet won the ILSVRC in 2014
    - It combined network-in-network (NiN) with inception blocks
- Inception blocks
    - What are the benefits of the inception block?
        - Reduces the number of parameters
    - How?
        - Recall how the number of parameters is computed
        - 1x1 convolution can be seen as channel-wise dimension reduction
    - Benefit of 1x1 convolution
        - **1x1 convolution** reduces the number of parameters by about 30%
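
To make the parameter counting concrete, the sketch below compares a direct 3x3 convolution with a 1x1 reduction followed by a 3x3 convolution. The channel sizes (128 in, 32 after reduction, 128 out) are assumptions chosen for illustration, biases are ignored, and the numbers are not meant to reproduce the 30% figure for the full inception block.

```python
def conv_params(kernel_size, c_in, c_out):
    """Weight count of one conv layer: k * k * C_in * C_out (bias ignored)."""
    return kernel_size * kernel_size * c_in * c_out


c_in, c_mid, c_out = 128, 32, 128  # assumed channel sizes

# Direct 3x3 convolution: 128 -> 128
direct = conv_params(3, c_in, c_out)

# 1x1 reduction (128 -> 32), then 3x3 convolution (32 -> 128)
reduced = conv_params(1, c_in, c_mid) + conv_params(3, c_mid, c_out)

print(direct, reduced)  # 147456 40960
```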

### [ResNet](https://arxiv.org/pdf/1512.03385.pdf)
- Deeper neural networks are hard to train
    - Overfitting is usually caused by an excessive number of parameters
    - But that is not the case here: deeper plain networks also show higher training error
- Add an identity map (skip connection)
- Add an identity map after nonlinear activations 
- Batch normalization after convolutions
- Bottleneck architecture
- Performance increases while parameter size decreases
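
The skip connection can be sketched as `y = relu(F(x) + x)`. The example below is a deliberate simplification: it uses small fully connected layers in place of convolutions and omits batch normalization, and all function names are hypothetical. With zero weights the block passes a non-negative input through unchanged, which is why the identity path makes it easy for a block to fall back to "doing nothing".

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F(x) = W2 * relu(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

y_zero = residual_block(x, zeros, zeros)       # F(x) = 0, so y = x
y_id = residual_block(x, identity, identity)   # F(x) = x, so y = 2x
print(y_zero, y_id)  # [1.0, 2.0] [2.0, 4.0]
```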

### [DenseNet](https://arxiv.org/pdf/1608.06993.pdf)
- DenseNet uses **concatenation** instead of **addition**
- Dense Block
    - Each layer concatenates the feature maps of all preceding layers
    - The number of channels grows with depth (each layer adds a fixed number of feature maps, the growth rate)
- Transition Block
    - BatchNorm -> 1x1 Conv -> 2x2 AvgPooling
    - Dimension reduction
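
The channel bookkeeping of a dense block followed by a transition block can be sketched as follows, assuming each layer adds a fixed number of new feature maps (the growth rate in the DenseNet paper). The input width of 64, growth rate of 32, 6 layers, and 0.5 compression are assumed values for illustration, and the helper names are hypothetical.

```python
def dense_block_channels(c_in, growth_rate, num_layers):
    """Channel count seen after each layer when outputs are concatenated."""
    channels = [c_in]
    for _ in range(num_layers):
        # each layer contributes `growth_rate` new feature maps to the concatenation
        channels.append(channels[-1] + growth_rate)
    return channels


def transition_channels(c_in, compression=0.5):
    """Channels after the 1x1 conv in a transition block (assumed compression factor)."""
    return int(c_in * compression)


block = dense_block_channels(64, 32, 6)
after_transition = transition_channels(block[-1])

print(block)             # [64, 96, 128, 160, 192, 224, 256]
print(after_transition)  # 128
```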

### Summary
- VGG : repeated 3x3 blocks
- GoogLeNet : 1x1 convolution
- ResNet : skip-connection
- DenseNet : concatenation

> [DL Basic] Computer Vision Applications

### Semantic Segmentation
- Fully Convolutional Network
- Deconvolution (conv transpose)
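
A transposed convolution upsamples by letting each input value "stamp" a scaled copy of the kernel into the output. The 1D sketch below is a minimal illustration with a hypothetical function name and no padding; the output length follows `(len(x) - 1) * stride + len(kernel)`.

```python
def conv_transpose1d(x, kernel, stride=1):
    """1D transposed convolution without padding."""
    out = [0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            # input position i writes a scaled kernel starting at i * stride
            out[i * stride + k] += xi * wk
    return out


upsampled = conv_transpose1d([1, 2, 3], [1, 1], stride=2)
print(upsampled)  # [1, 1, 2, 2, 3, 3]
```

With stride 2 and a kernel of ones, the length-3 input becomes a length-6 output, which is the kind of resolution recovery segmentation networks need.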

### Detection
- [R-CNN](https://arxiv.org/pdf/1311.2524.pdf)
    1. takes an input image
    2. extracts around 2,000 region proposals (using Selective search)
    3. compute features for each proposal (using AlexNet)
    4. classifies with linear SVMs

- [SPPNet](https://arxiv.org/pdf/1406.4729.pdf)
    - In R-CNN, the number of crop/warp operations is usually over 2,000, meaning that the CNN must run more than 2,000 times (59s/image on CPU)
    - However, in SPPNet, CNN runs once

- [Fast R-CNN](https://arxiv.org/pdf/1504.08083.pdf)
    1. Takes an input and a set of bounding boxes
    2. Generates a convolutional feature map
    3. For each region, get a fixed length feature from ROI pooling
    4. Two outputs : class and bounding-box regressor

- [Faster R-CNN](https://arxiv.org/pdf/1506.01497.pdf)
    - Faster R-CNN = <span style="color:red">Region Proposal Network</span> + Fast R-CNN
    - Region Proposal Network
        - Anchor boxes : detection boxes with predefined sizes

- [YOLO](https://arxiv.org/pdf/1506.02640.pdf)
    - YOLO (v1) is an extremely fast object detection algorithm
        - baseline : 45fps / smaller version : 155fps
    - It **simultaneously** predicts multiple bounding boxes and class probabilities
        - No explicit bounding box sampling (compared with Faster R-CNN)

    1. Given an image, YOLO divides it into an SxS grid
        - If the center of an object falls into a grid cell, that grid cell is responsible for detecting it
    2. Each cell predicts B bounding boxes (B=2 in the YOLO v1 paper)
        - Each bounding box predicts
            - box refinement (x,y,w,h)
            - confidence (of objectness)
    3. Each cell predicts C class probabilities
    - In total, it becomes a tensor with SxSx(Bx5+C) size
        - SxS : Number of cells of the grid
        - Bx5 : B bounding boxes with offsets (x,y,w,h) and confidence
        - C : Number of classes
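
The output size and the "responsible cell" rule can be sketched in a few lines. The values S=7, B=2, C=20 are assumptions taken from the YOLO v1 setup on PASCAL VOC, object centers are assumed to be normalized to [0, 1], and the helper names are hypothetical.

```python
def yolo_output_shape(s, b, c):
    """Shape of the prediction tensor: S x S x (B * 5 + C)."""
    return (s, s, b * 5 + c)


def responsible_cell(center_x, center_y, s):
    """Grid cell (row, col) containing an object center given in [0, 1] coordinates."""
    col = min(int(center_x * s), s - 1)  # clamp so that 1.0 maps to the last cell
    row = min(int(center_y * s), s - 1)
    return row, col


shape = yolo_output_shape(s=7, b=2, c=20)
cell = responsible_cell(0.5, 0.5, 7)

print(shape)  # (7, 7, 30)
print(cell)   # (3, 3)
```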

> [DL Basic] CNN - Classifying dog breeds
- [dataset](https://github.com/changwoomon/Boostcamp-AI-Tech/blob/main/Week%203/Day%2013/dog_breed_dataset.ipynb)
- [cnn](https://github.com/changwoomon/Boostcamp-AI-Tech/blob/main/Week%203/Day%2013/dog_breed_CNN.ipynb)

> [DL Basic] CNN - Building your own dataset
- [google-images-download](https://github.com/Joeclinton1/google-images-download.git)
- [Installing google images download](https://github.com/BoostcampAITech/lecture-note-python-basics-for-ai/blob/main/codes/pytorch/00_utils/google%20images%20download.md)