<a href="https://colab.research.google.com/github/KwonHo-geun/AI_Study/blob/main/25.07.15_DenseNet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧠 DenseNet: Densely Connected Convolutional Networks

---

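## ✅ What is DenseNet?

**DenseNet** is an architecture presented at CVPR 2017.  
It was introduced to address the information loss and vanishing-gradient problems of conventional deep CNNs.

> Core idea:  
> **Feed the outputs of all preceding layers (within a dense block) into the input of each subsequent layer**

In DenseNet, each layer **receives the feature maps of all preceding layers as input**,  
which lets it reuse features and improves gradient flow.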

## 🔍 The Core of DenseNet: the Dense Block

A conventional CNN connects its layers sequentially:

```
x₀ → x₁ → x₂ → x₃ → ...
```

DenseNet instead connects them like this:

```
x₁ = H₁(x₀)  
x₂ = H₂([x₀, x₁])  
x₃ = H₃([x₀, x₁, x₂])  
...
```

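- `[x₀, x₁, ..., x_{l-1}]`: the feature maps concatenated along the channel dimension
- `Hₗ`: the composite function of layer l (BN → ReLU → Conv)

The key point is that all earlier outputs are **concatenated** and passed as input to the next layer.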
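A minimal PyTorch sketch of the connectivity pattern above, assuming each `Hₗ` is BN → ReLU → 3×3 Conv producing `growth_rate` channels. The class name `TinyDenseBlock` and the tensor sizes are illustrative only, not taken from the paper. It shows that layer l sees `in_channels + (l - 1) * growth_rate` input channels, and that the block's output stacks every intermediate result.

```python
import torch
import torch.nn as nn


class TinyDenseBlock(nn.Module):
    """Illustrative dense block: x_l = H_l([x_0, x_1, ..., x_{l-1}])."""

    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for l in range(num_layers):
            # Layer l receives the block input plus the outputs of the l layers before it.
            c_in = in_channels + l * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(c_in),
                nn.ReLU(inplace=True),
                nn.Conv2d(c_in, growth_rate, kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x0: torch.Tensor) -> torch.Tensor:
        features = [x0]
        for layer in self.layers:
            # Concatenate every earlier feature map along the channel axis (dim=1).
            new = layer(torch.cat(features, dim=1))
            features.append(new)
        return torch.cat(features, dim=1)


block = TinyDenseBlock(in_channels=16, growth_rate=12, num_layers=4)
out = block(torch.randn(2, 16, 8, 8))
print(out.shape)  # torch.Size([2, 64, 8, 8]) because 16 + 4 * 12 = 64
```

Spatial size never changes inside the block (padding=1), which is what makes channel-wise concatenation possible; downsampling is left to the transition layers.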

## 🧱 Dense Block Structure

```
Input
 ├── Layer 1: H₁(x₀)
 ├── Layer 2: H₂([x₀, x₁])
 ├── Layer 3: H₃([x₀, x₁, x₂])
 └── ...
 ↓
Transition Layer (1×1 Conv + AvgPool)
```

- Several layers together form one dense block
- Between blocks, a Transition Layer adjusts the spatial size and the number of channels

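***Summary***
- Feature reuse lets the network use its parameters efficiently (although a naive implementation can be memory-hungry because of the repeated concatenations)
- Bottleneck: a 1×1 conv reduces the number of input channels before the 3×3 conv, easing the computational bottleneck
- Feature size: the Transition Layer reduces the number of channels
- Feature maps that have already been computed are not recomputed or modified; each layer computes only its own new feature maps
- In other words, each conv layer's output is kept unchanged as its own set of channels, and the Transition Layer adjusts the total channel count

- concat (= concatenate)  
  Joins tensors along a given dimension. In deep learning it is commonly used to join two tensors, either at the model input or in intermediate computations. Feeding the concatenation of two tensors as input means the model uses both sources of information.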
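A small illustration of `torch.cat` along the channel dimension, as DenseNet uses it (the tensor shapes are arbitrary). Concatenation keeps both inputs intact side by side, whereas element-wise addition (as in ResNet's skip connections) requires identical shapes and merges the two signals into one.

```python
import torch

a = torch.zeros(2, 3, 4, 4)   # batch 2, 3 channels, 4x4 spatial
b = torch.ones(2, 5, 4, 4)    # batch 2, 5 channels, 4x4 spatial

# Channel-wise concat (dim=1): every dimension except dim=1 must match.
c = torch.cat([a, b], dim=1)
print(c.shape)  # torch.Size([2, 8, 4, 4])

# Both inputs survive unchanged: channels 0-2 come from a, channels 3-7 from b.
print(torch.equal(c[:, :3], a), torch.equal(c[:, 3:], b))  # True True

# Element-wise addition instead needs identical shapes and keeps the channel count.
d = torch.ones(2, 3, 4, 4)
print((a + d).shape)  # torch.Size([2, 3, 4, 4])
```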

## ⚙️ Key Components

### 1. Dense Block
- BN → ReLU → 1×1 Conv (Bottleneck)
- BN → ReLU → 3×3 Conv
- Concatenate the result with the input
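Why the 1×1 bottleneck helps, shown as a quick weight count. This assumes the DenseNet-B setting where the 1×1 conv outputs 4k channels (as in the `Bottleneck` class implemented later in this notebook) and ignores BatchNorm parameters; the input widths 256 and 1024 are just example values for layers deep inside a dense block. The helper names are hypothetical.

```python
def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weight count of a bias-free Conv2d with a square kernel."""
    return c_in * c_out * kernel * kernel


def direct_3x3(c_in: int, k: int) -> int:
    # BN -> ReLU -> 3x3 Conv straight from c_in to k channels
    return conv_params(c_in, k, 3)


def bottleneck(c_in: int, k: int) -> int:
    # BN -> ReLU -> 1x1 Conv (c_in -> 4k) -> BN -> ReLU -> 3x3 Conv (4k -> k)
    return conv_params(c_in, 4 * k, 1) + conv_params(4 * k, k, 3)


k = 32
for c_in in (256, 1024):
    print(c_in, direct_3x3(c_in, k), bottleneck(c_in, k))
# 256 73728 69632
# 1024 294912 167936
```

With the bottleneck, the expensive 3×3 term stays fixed at 4k·k·9 no matter how wide the concatenated input grows; only the cheap 1×1 term scales with `c_in`, so the savings increase for later layers in a block.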

### 2. Transition Layer
- 1×1 Conv + 2×2 AvgPooling
- Reduces the spatial size of the feature maps (via pooling) and the number of channels (via the 1×1 conv)
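A sketch of the transition layer's effect on tensor shape. It mirrors the `Transition` class implemented later in this notebook (BN → 1×1 Conv → 2×2 AvgPool) and assumes a compression factor of 0.5 as in DenseNet-BC; `make_transition` is a hypothetical helper and the 256×56×56 input is an example size.

```python
import torch
import torch.nn as nn


def make_transition(in_channels: int, compression: float = 0.5) -> nn.Sequential:
    out_channels = int(in_channels * compression)  # 0.5 halves the channel count
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),  # fewer channels
        nn.AvgPool2d(kernel_size=2, stride=2),                            # halve H and W
    )


x = torch.randn(1, 256, 56, 56)   # e.g. the output of a dense block
y = make_transition(256)(x)
print(y.shape)  # torch.Size([1, 128, 28, 28])
```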

### 3. Growth Rate (k)
- The number of channels (feature maps) each layer adds
- DenseNet-121: `k = 32`
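Putting growth rate and compression together, a sketch that traces channels and spatial size through DenseNet-121. It assumes the ImageNet configuration: a stem that outputs 64 channels and downsamples 224×224 by 4 (7×7 stride-2 conv, then stride-2 max pool), block sizes 6/12/24/16, k = 32, and compression 0.5. The function name is hypothetical. The final 1024 channels match the input width of torchvision's `densenet121` classifier.

```python
def densenet_trace(block_config=(6, 12, 24, 16), growth_rate=32,
                   init_channels=64, compression=0.5, input_size=224):
    """Return (stage name, channels, spatial size) after each stage."""
    c = init_channels
    s = input_size // 4                     # stem: stride-2 conv + stride-2 max pool
    trace = []
    for i, n_layers in enumerate(block_config):
        c += n_layers * growth_rate         # every layer appends k channels
        trace.append((f"dense_block_{i + 1}", c, s))
        if i < len(block_config) - 1:       # no transition after the last block
            c = int(c * compression)        # 1x1 conv compresses the channels
            s //= 2                         # 2x2 avg pool halves H and W
            trace.append((f"transition_{i + 1}", c, s))
    return trace


for name, channels, size in densenet_trace():
    print(f"{name:<14} {channels:>5} channels  {size}x{size}")
```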


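## 🚀 Advantages of DenseNet

| Advantage | Description |
|------|------|
| ✅ Feature reuse | Earlier feature maps are concatenated and reused |
| ✅ Parameter efficiency | Few parameters for such a deep network |
| ✅ Better gradient flow | Dense connections give backpropagation many short paths |
| ✅ Regularization effect | Tends to reduce overfitting, even without dropout |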


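## 📊 Conventional CNN vs DenseNet

| Aspect | Conventional CNN | DenseNet |
|------|-----------|----------|
| Connectivity | Sequential | Each layer connected to all preceding layers |
| Feature reuse | None | Extensive |
| Parameter count | Relatively high | Low |
| Gradient flow | Limited | Much richer |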

---

## 🧬 Example: DenseNet-121 Architecture

- Conv1: 7×7 Conv (stride 2) + 3×3 MaxPool (stride 2)
- Dense Block 1 (6 layers)
- Transition Layer
- Dense Block 2 (12 layers)
- Transition Layer
- Dense Block 3 (24 layers)
- Transition Layer
- Dense Block 4 (16 layers)
- Global Avg Pool
- FC Layer

121 layers in total (counting the convolutional and fully connected layers).
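The breakdown behind the name, as a quick arithmetic check. Only layers with weights are counted; BN, ReLU and pooling layers are not.

```python
block_config = (6, 12, 24, 16)        # bottleneck layers per dense block

stem = 1                              # initial 7x7 conv
dense = 2 * sum(block_config)         # each bottleneck layer = 1x1 conv + 3x3 conv
transitions = len(block_config) - 1   # one 1x1 conv per transition layer
classifier = 1                        # final fully connected layer

total = stem + dense + transitions + classifier
print(total)  # 121
```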

## 🧪 PyTorch Example

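```python
from torchvision.models import densenet121, DenseNet121_Weights

# torchvision >= 0.13: `pretrained=True` is deprecated in favor of `weights=`
model = densenet121(weights=DenseNet121_Weights.DEFAULT)
print(model)
```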

- torchvision provides `densenet121`, `densenet161`, `densenet169`, and `densenet201`
- They are widely used as feature extractors or for fine-tuning

---

## 📚 References

- Paper: [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993)
- PyTorch Docs: [torchvision.models.densenet121](https://pytorch.org/vision/stable/models/generated/torchvision.models.densenet121.html)


```python
# Download the data
!wget https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip
```

```
--2025-07-16 11:02:23--  https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 64.233.181.207, 142.250.125.207, 209.85.200.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|64.233.181.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68606236 (65M) [application/zip]
Saving to: ‘cats_and_dogs_filtered.zip’

2025-07-16 11:02:23 (217 MB/s) - ‘cats_and_dogs_filtered.zip’ saved [68606236/68606236]
```



```python
import os
import shutil

# Remove any previously extracted copy so the archive is unpacked fresh
# (the working directory is cats_and_dogs_filtered)
if os.path.exists('/content/cats_and_dogs_filtered/'):
    shutil.rmtree('/content/cats_and_dogs_filtered/')
    print('/content/cats_and_dogs_filtered/  is removed !!!')
```

```python
# Extract the zip archive
import zipfile

with zipfile.ZipFile('/content/cats_and_dogs_filtered.zip', 'r') as target_file:
    target_file.extractall('/content/')
```

```python
import os

# Number of training images per class
train_cats_list = os.listdir('/content/cats_and_dogs_filtered/train/cats/')
train_dogs_list = os.listdir('/content/cats_and_dogs_filtered/train/dogs/')

# Number of validation images per class
test_cats_list = os.listdir('/content/cats_and_dogs_filtered/validation/cats/')
test_dogs_list = os.listdir('/content/cats_and_dogs_filtered/validation/dogs/')

print(len(train_cats_list), len(train_dogs_list))
print(len(test_cats_list), len(test_dogs_list))
```

```
1000 1000
500 500
```


```python
import os
import time
import copy
import numpy as np
import matplotlib.pyplot as plt
import torch

import torch.nn as nn
import torch.optim as optim

from torchvision import datasets, models, transforms
```


```python
ddir = '/content/cats_and_dogs_filtered'

batch_size = 4
num_workers = 2

data_transformers = {
    # Training: random crop and horizontal flip for augmentation
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.490, 0.449, 0.411], [0.231, 0.221, 0.230])
    ]),
    # Validation: deterministic resize + center crop
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.490, 0.449, 0.411], [0.231, 0.221, 0.230])
    ])
}

# ImageFolder assigns labels from the sorted sub-folder names (cats -> 0, dogs -> 1)
img_data = {
    'train': datasets.ImageFolder(
        os.path.join(ddir, 'train'),
        data_transformers['train']
    ),
    'val': datasets.ImageFolder(
        os.path.join(ddir, 'validation'),
        data_transformers['val']
    )
}

# Shuffle only the training set; the validation order does not affect the metrics
dloaders = {
    k: torch.utils.data.DataLoader(
        img_data[k], batch_size=batch_size, shuffle=(k == 'train'), num_workers=num_workers
    )
    for k in ['train', 'val']
}
dset_sizes = {x: len(img_data[x]) for x in ['train', 'val']}
classes = img_data['train'].classes
dvc = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```


```python
import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """One dense layer (DenseNet-B): BN-ReLU-1x1 Conv -> BN-ReLU-3x3 Conv, then concat."""

    def __init__(self, in_channels, growth_rate):
        super().__init__()

        # Bottleneck width: the 1x1 conv maps the input to 4 * growth_rate channels
        inner_channel = 4 * growth_rate

        self.bottle_neck = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, inner_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(inner_channel),
            nn.ReLU(inplace=True),
            nn.Conv2d(inner_channel, growth_rate, kernel_size=3, padding=1, bias=False)
        )

    def forward(self, x):
        # Append the growth_rate new feature maps to the input along the channel axis
        return torch.cat([x, self.bottle_neck(x)], 1)


class Transition(nn.Module):
    """Between dense blocks: BN -> 1x1 Conv (fewer channels) -> 2x2 AvgPool (half H and W)."""

    def __init__(self, in_channels, out_channels):
        super().__init__()

        self.down_sample = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.AvgPool2d(2, stride=2)
        )

    def forward(self, x):
        return self.down_sample(x)


class DenseNet(nn.Module):
    def __init__(self, block, nblocks, growth_rate=12, reduction=0.5, num_class=100):
        super().__init__()
        self.growth_rate = growth_rate

        # The stem outputs 2 * growth_rate channels (64 when growth_rate = 32)
        inner_channels = 2 * growth_rate

        # Note: CIFAR-style stem (one 3x3 conv, no downsampling). The ImageNet
        # DenseNet-121 described above uses a 7x7 stride-2 conv + 3x3 max pool,
        # so with 224x224 inputs this version keeps full resolution in the first
        # dense block and is considerably slower and more memory-hungry.
        self.conv1 = nn.Conv2d(3, inner_channels, kernel_size=3, padding=1, bias=False)

        self.features = nn.Sequential()

        # Dense block + transition layer for every block except the last
        for index in range(len(nblocks) - 1):
            self.features.add_module("dense_block_layer_{}".format(index), self._make_dense_layers(block, inner_channels, nblocks[index]))
            # Each layer in the block adds growth_rate channels
            inner_channels += growth_rate * nblocks[index]

            # Compression: the transition layer keeps `reduction` of the channels
            out_channels = int(reduction * inner_channels)  # int() floors the value
            self.features.add_module("transition_layer_{}".format(index), Transition(inner_channels, out_channels))
            inner_channels = out_channels

        # The last dense block is not followed by a transition layer
        self.features.add_module("dense_block{}".format(len(nblocks) - 1), self._make_dense_layers(block, inner_channels, nblocks[len(nblocks)-1]))
        inner_channels += growth_rate * nblocks[len(nblocks) - 1]
        self.features.add_module('bn', nn.BatchNorm2d(inner_channels))
        self.features.add_module('relu', nn.ReLU(inplace=True))

        # Global average pooling + fully connected classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))

        self.linear = nn.Linear(inner_channels, num_class)

    def forward(self, x):
        output = self.conv1(x)
        output = self.features(output)
        output = self.avgpool(output)
        output = output.view(output.size()[0], -1)  # flatten to (batch, channels)
        output = self.linear(output)
        return output

    def _make_dense_layers(self, block, in_channels, nblocks):
        # Stack `nblocks` dense layers; each one sees every channel produced before it
        dense_block = nn.Sequential()
        for index in range(nblocks):
            dense_block.add_module('bottle_neck_layer_{}'.format(index), block(in_channels, self.growth_rate))
            in_channels += self.growth_rate
        return dense_block


def densenet121(num_class=100):
    # DenseNet-121 configuration: 6/12/24/16 layers per dense block, k = 32
    return DenseNet(Bottleneck, [6, 12, 24, 16], growth_rate=32, num_class=num_class)
```


```python
def train(model, loss_func, optimizer, epochs=10):
    start = time.time()

    accuracy = 0.0
    best_model_wts = copy.deepcopy(model.state_dict())

    for e in range(epochs):
        print(f'Epoch number {e}/{epochs - 1}')
        print('=' * 20)

        # Each epoch runs through the training set, then the validation set
        for dset in ['train', 'val']:
            if dset == 'train':
                model.train()  # training mode (weights are updated, BN uses batch statistics)
            else:
                model.eval()   # evaluation mode (BN uses running statistics)

            loss = 0.0
            successes = 0

            # Iterate over the training/validation data
            for imgs, tgts in dloaders[dset]:
                imgs = imgs.to(dvc)
                tgts = tgts.to(dvc)
                optimizer.zero_grad()

                # Track gradients only in training mode
                with torch.set_grad_enabled(dset == 'train'):
                    ops = model(imgs)
                    _, preds = torch.max(ops, 1)
                    loss_curr = loss_func(ops, tgts)
                    # Backward pass only in training mode
                    if dset == 'train':
                        loss_curr.backward()
                        optimizer.step()

                loss += loss_curr.item() * imgs.size(0)
                successes += torch.sum(preds == tgts)

            loss_epoch = loss / dset_sizes[dset]
            accuracy_epoch = successes.double() / dset_sizes[dset]

            print(f'{dset} loss in this epoch: {loss_epoch}, accuracy in this epoch: {accuracy_epoch}')
            # Keep a copy of the weights with the best validation accuracy so far
            if dset == 'val' and accuracy_epoch > accuracy:
                accuracy = accuracy_epoch
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_delta = time.time() - start
    print(f'Training finished in {time_delta // 60}mins {time_delta % 60}secs')
    print(f'Best validation set accuracy: {accuracy}')

    # Return the model restored to its best validation weights
    model.load_state_dict(best_model_wts)
    return model
```



```python
model = densenet121(num_class=2)  # two classes: cats and dogs
model = model.to(dvc)             # dvc (GPU if available) is set in the data-loading cell
print(model)
```

```
DenseNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (features): Sequential(
    (dense_block_layer_0): Sequential(
      (bottle_neck_layer_0): Bottleneck(
        (bottle_neck): Sequential(
          (0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (4): ReLU(inplace=True)
          (5): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
      )
      (bottle_neck_layer_1): Bottleneck(
        (bottle_neck): Sequential(
          (0): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): ReLU(inplace=True)
          (2): Conv2d(96, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (3): BatchNorm2d(12
```

```python
loss_func = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)
pretrained_model = train(model, loss_func, optimizer, epochs=5)
```

```
Epoch number 0/4

KeyboardInterrupt: 
```