
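# ⚡ EfficientNet: The Perfect Balance of Model Size, Performance, and Efficiency

---

## ✅ What is EfficientNet?

**EfficientNet** is a model from Google Brain, presented at ICML 2019.  
It is an **efficiency-focused CNN architecture** designed to "reach high accuracy with little computation".

> Its core slogan:  
> "Don't just make the model bigger; **scale it in a balanced way (scale efficiently)**!"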

---

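## 🧠 Background: Why EfficientNet?

### 📉 Limits of how earlier models were scaled

There are three ways to scale up a CNN to improve its performance:

1. **Depth** (more layers)
2. **Width** (more channels)
3. **Resolution** (larger input images)

Earlier approaches usually scaled only one of these three dimensions, or chose the amounts by intuition. This often made scaling **inefficient**, because the accuracy gain shrinks quickly as a single dimension keeps growing.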

---

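## 📌 Key idea: Compound Scaling

EfficientNet follows this principle:

> **"Scale depth, width, and resolution at the same time, by fixed ratios, and both efficiency and accuracy improve!"**

### ✅ Mathematical formulation:

```math
\begin{aligned}
\text{depth: } d &= \alpha^{\phi} \\
\text{width: } w &= \beta^{\phi} \\
\text{resolution: } r &= \gamma^{\phi}
\end{aligned}
```

- φ: a user-chosen compound scaling coefficient (how much extra compute to spend)
- α, β, γ: constants giving the growth rates of depth, width, and resolution (found by a small grid search on the baseline model)
- Constraint: α · β² · γ² ≈ 2 with α, β, γ ≥ 1, so total FLOPs roughly double for each +1 in φ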
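The β² and γ² in the constraint come from how convolution cost grows. FLOPs grow linearly with depth, but quadratically with width (both input and output channels scale) and with resolution (feature-map height and width both scale). Under that proportionality, here is a short derivation using the same d, w, r, α, β, γ, φ as above:

```math
\begin{aligned}
\text{FLOPs} &\propto d \cdot w^{2} \cdot r^{2} \\
&= \alpha^{\phi} \cdot \left(\beta^{\phi}\right)^{2} \cdot \left(\gamma^{\phi}\right)^{2} \\
&= \left(\alpha \cdot \beta^{2} \cdot \gamma^{2}\right)^{\phi} \approx 2^{\phi}
\end{aligned}
```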

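> → As φ grows, the model gets bigger, but all three dimensions grow in balance.  
> Unlike MobileNet, whose width and resolution multipliers are set by hand and independently of each other, EfficientNet scales all three dimensions together with fixed ratios.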
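Here is a small numeric sketch of the formulas above. It assumes α = 1.2, β = 1.1, γ = 1.15, the values the EfficientNet paper reports for the B0 baseline. `compound_scale` is a hypothetical helper for illustration, not a library function.

```python
# Compound scaling sketch. ALPHA/BETA/GAMMA are assumed to be the paper's
# grid-searched values for the B0 baseline (found at phi = 1).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi: float) -> dict:
    """Return depth/width/resolution multipliers and the FLOPs multiplier for a given phi."""
    d = ALPHA ** phi              # depth multiplier (number of layers)
    w = BETA ** phi               # width multiplier (number of channels)
    r = GAMMA ** phi              # resolution multiplier (input height/width)
    flops = d * w ** 2 * r ** 2   # FLOPs grow roughly as d * w^2 * r^2
    return {"depth": d, "width": w, "resolution": r, "flops": flops}


# FLOPs factor per +1 in phi: alpha * beta^2 * gamma^2 (should be close to 2)
base = ALPHA * BETA ** 2 * GAMMA ** 2
print(f"alpha*beta^2*gamma^2 = {base:.3f}")

for phi in range(4):
    s = compound_scale(phi)
    print(f"phi={phi}: depth x{s['depth']:.2f}, width x{s['width']:.2f}, "
          f"resolution x{s['resolution']:.2f} (~{224 * s['resolution']:.0f}px from 224px), "
          f"FLOPs x{s['flops']:.2f}")
```

The released B1 to B7 configurations use per-variant depth/width/resolution settings that only roughly follow this rule, so these numbers will not match the variant table exactly.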


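## 🧱 EfficientNet building blocks: MBConv + SE + Swish

### ✅ 1. MBConv (Mobile Inverted Bottleneck)
- Based on the inverted residual block introduced in MobileNetV2
- Uses depthwise separable convolution
- Inverted residual block + linear bottleneck: a 1×1 conv expands the channels, a depthwise conv filters each channel separately, and a 1×1 conv without activation projects back to a narrow output, with a skip connection when input and output shapes match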
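The MBConv structure described above can be sketched in PyTorch as follows. This is a simplified, hypothetical `MBConvSketch`, not torchvision's `MBConv`. It leaves out the SE block (EfficientNet places it after the depthwise conv) and stochastic depth to keep the code short.

```python
import torch
from torch import nn


class MBConvSketch(nn.Module):
    """Simplified MBConv: 1x1 expand -> depthwise kxk -> 1x1 linear projection (+ residual).
    SE and stochastic depth are omitted in this sketch."""

    def __init__(self, in_ch: int, out_ch: int, expand_ratio: int = 6,
                 kernel_size: int = 3, stride: int = 1):
        super().__init__()
        hidden = in_ch * expand_ratio
        # Skip connection only when the block keeps the tensor shape
        self.use_residual = stride == 1 and in_ch == out_ch
        layers = []
        if expand_ratio != 1:
            # 1x1 "expansion" conv widens the channels (the inverted part of the bottleneck)
            layers += [nn.Conv2d(in_ch, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.SiLU()]
        # Depthwise conv: groups=hidden gives one kxk filter per channel
        layers += [nn.Conv2d(hidden, hidden, kernel_size, stride, padding=kernel_size // 2,
                             groups=hidden, bias=False),
                   nn.BatchNorm2d(hidden), nn.SiLU()]
        # 1x1 projection back to a narrow output with no activation (linear bottleneck)
        layers += [nn.Conv2d(hidden, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch)]
        self.block = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out


block = MBConvSketch(16, 16, expand_ratio=6, kernel_size=3, stride=1)
x = torch.randn(2, 16, 56, 56)
print(block(x).shape)  # torch.Size([2, 16, 56, 56])
```

A stride-2 block (for example `MBConvSketch(16, 24, kernel_size=5, stride=2)`) halves the spatial size and drops the residual path, because input and output shapes differ.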

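### ✅ 2. Squeeze-and-Excitation (SE) block
- Learns how important each channel is (a form of channel attention)
- Reweights the channels using global, channel-wise context obtained by global average pooling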
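The SE mechanism described above can be sketched as a minimal module. `SqueezeExcitationSketch` is a hypothetical name; torchvision ships its own `torchvision.ops.SqueezeExcitation`. The 32 → 8 → 32 channel sizes mirror the first SE block in the torchvision B0 printout further below (`fc1: Conv2d(32, 8)`, `fc2: Conv2d(8, 32)`).

```python
import torch
from torch import nn


class SqueezeExcitationSketch(nn.Module):
    """Minimal SE block: squeeze (global avg pool) -> excite (two 1x1 convs) -> rescale channels."""

    def __init__(self, channels: int, squeeze_channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                   # (N, C, H, W) -> (N, C, 1, 1)
        self.fc1 = nn.Conv2d(channels, squeeze_channels, 1)  # a 1x1 conv acts as an FC layer
        self.fc2 = nn.Conv2d(squeeze_channels, channels, 1)
        self.act = nn.SiLU()
        self.gate = nn.Sigmoid()                              # per-channel weights in (0, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.gate(self.fc2(self.act(self.fc1(self.pool(x)))))
        return x * s                                          # broadcast the weights over H and W


se = SqueezeExcitationSketch(32, 8)
x = torch.randn(2, 32, 112, 112)
print(se(x).shape)  # torch.Size([2, 32, 112, 112])
```

The parameter count of this sketch (256 + 8 + 256 + 32 = 552) matches the `features.1.0.block.1.fc1/fc2` rows in the parameter table later in the notebook.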

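### ✅ 3. Swish Activation
- A smooth activation function of the form `x * sigmoid(x)`
- Unlike ReLU, it is smooth and lets small negative values through; it has been reported to give better accuracy than ReLU in deep networks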
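The formula above is a one-liner. The check below shows that it is the same function PyTorch calls SiLU (`nn.SiLU` / `F.silu`), which is what the torchvision EfficientNet printout below uses. `swish` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F


def swish(x: torch.Tensor) -> torch.Tensor:
    """Swish (beta = 1): x * sigmoid(x)."""
    return x * torch.sigmoid(x)


x = torch.linspace(-5, 5, steps=11)
print(torch.allclose(swish(x), F.silu(x)))  # True: Swish with beta = 1 is SiLU
print(swish(x))  # slightly negative for negative inputs, close to x for large positive inputs
```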

## 🧬 EfficientNet architecture summary

- Stage 1: Stem (Conv3×3, stride 2)
- Stages 2 to 8: seven stages of MBConv blocks (each with SE)
- Stage 9: Head (Conv1×1) + pooling + fully connected layer

> → **EfficientNet-B0** is this baseline architecture; B1 to B7 are derived from it by compound scaling.

## 🔢 EfficientNet variants (B0 to B7)

> **In practice, B0 to B3 are the variants most often used, because the larger models are resource-hungry.**

| Model | Params (M) | FLOPs (B) | Input size | ImageNet Top-1 |
|------|-------------|-----------|-------------|-----------------|
| B0   | 5.3         | 0.39      | 224×224     | 77.1%           |
| B1   | 7.8         | 0.7       | 240×240     | 79.1%           |
| B2   | 9.2         | 1.0       | 260×260     | 80.1%           |
| B3   | 12.0        | 1.8       | 300×300     | 81.6%           |
| B4   | 19.0        | 4.2       | 380×380     | 82.9%           |
| B5   | 30.0        | 9.9       | 456×456     | 83.6%           |
| B6   | 43.0        | 19.0      | 528×528     | 84.0%           |
| B7   | 66.0        | 39.0      | 600×600     | 84.4%           |
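Resource limits are the main reason to stop at a smaller variant. This hypothetical helper picks the largest variant that fits a FLOPs budget. The numbers are copied from the table above, and the budget values are only examples.

```python
from typing import Optional

# Copied from the table above: (params in millions, FLOPs in billions, input size)
EFFICIENTNET_VARIANTS = {
    "B0": (5.3, 0.39, 224),
    "B1": (7.8, 0.7, 240),
    "B2": (9.2, 1.0, 260),
    "B3": (12.0, 1.8, 300),
    "B4": (19.0, 4.2, 380),
    "B5": (30.0, 9.9, 456),
    "B6": (43.0, 19.0, 528),
    "B7": (66.0, 39.0, 600),
}


def largest_variant_within(flops_budget_b: float) -> Optional[str]:
    """Return the largest variant whose FLOPs (in billions) fit the budget, or None."""
    best = None
    for name, (_, flops, _) in EFFICIENTNET_VARIANTS.items():  # dicts keep the B0..B7 order
        if flops <= flops_budget_b:
            best = name
    return best


print(largest_variant_within(2.0))  # B3
print(largest_variant_within(0.3))  # None: even B0 needs 0.39B FLOPs
```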

---

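## 📉 EfficientNet vs. earlier models

| Model           | Params | ImageNet Top-1 | Notes                                           |
|-----------------|--------|----------------|-------------------------------------------------|
| ResNet-50       | 25M    | 76%            | Widely used baseline                            |
| EfficientNet-B0 | 5.3M   | 77.1%          | Smaller and more accurate                       |
| EfficientNet-B7 | 66M    | 84.4%          | State-of-the-art accuracy at publication (2019) |

> EfficientNet reaches **the same or higher accuracy with fewer parameters and less computation**. For example, B0 beats ResNet-50 with roughly one fifth of the parameters.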


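## 🧪 PyTorch usage example

```python
import timm

# Load EfficientNet-B0 with ImageNet-pretrained weights (downloaded on first use)
model = timm.create_model('efficientnet_b0', pretrained=True)
model.eval()  # inference mode: BatchNorm uses running stats, Dropout is off
print(model)
```

- Model names `efficientnet_b0` to `efficientnet_b7` can be selected (which of them ship pretrained weights depends on the timm version)
- Requires the `timm` library: `pip install timm`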

---

## 📚 References

- Paper: [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946)
- Google AI Blog: [EfficientNet announcement post](https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html)
- PyTorch Image Models (timm) repo: [huggingface/pytorch-image-models](https://github.com/huggingface/pytorch-image-models)

---

```python
import torch, torchvision
import torchvision.models as models
import torchvision.datasets as datasets

import matplotlib.pyplot as plt
from PIL import Image
```

```python
models.efficientnet_b0()
```

```
EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat
```

```python
# Model: EfficientNet-B0 with ImageNet-pretrained weights.
# `weights=` replaces the deprecated `pretrained=True` argument (torchvision >= 0.13);
# IMAGENET1K_V1 is the checkpoint that `pretrained=True` used to load.
efficientnet_b0 = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)

# Dataset: CIFAR-10 (32x32 RGB images), normalized with the ImageNet mean/std
to_tensor = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225]),
])

cifar10 = torchvision.datasets.CIFAR10(root='./', download=True, transform=to_tensor)

dataloader = torch.utils.data.DataLoader(cifar10, batch_size=8, shuffle=True, num_workers=2)
```

```
Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-7f5810bc.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-7f5810bc.pth
100%|██████████| 20.5M/20.5M [00:00<00:00, 127MB/s] 
100%|██████████| 170M/170M [00:01<00:00, 98.7MB/s]
```


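```python
# Inference mode: BatchNorm uses its stored running statistics (instead of
# overwriting them with this batch's statistics) and Dropout is disabled
efficientnet_b0.eval()

for idx, data in enumerate(dataloader):
    img, gt = data  # img: image batch, gt: CIFAR-10 class labels

    print(img.shape)

    with torch.no_grad():  # forward-only check, no gradients needed
        scores = efficientnet_b0(img)  # logits over the 1000 ImageNet classes

    print(scores.shape)
    break
```

```
torch.Size([8, 3, 32, 32])
torch.Size([8, 1000])
```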


```python
for k, v in efficientnet_b0.named_parameters():
    print(k, v.shape)
```

```
features.0.0.weight torch.Size([32, 3, 3, 3])
features.0.1.weight torch.Size([32])
features.0.1.bias torch.Size([32])
features.1.0.block.0.0.weight torch.Size([32, 1, 3, 3])
features.1.0.block.0.1.weight torch.Size([32])
features.1.0.block.0.1.bias torch.Size([32])
features.1.0.block.1.fc1.weight torch.Size([8, 32, 1, 1])
features.1.0.block.1.fc1.bias torch.Size([8])
features.1.0.block.1.fc2.weight torch.Size([32, 8, 1, 1])
features.1.0.block.1.fc2.bias torch.Size([32])
features.1.0.block.2.0.weight torch.Size([16, 32, 1, 1])
features.1.0.block.2.1.weight torch.Size([16])
features.1.0.block.2.1.bias torch.Size([16])
features.2.0.block.0.0.weight torch.Size([96, 16, 1, 1])
features.2.0.block.0.1.weight torch.Size([96])
features.2.0.block.0.1.bias torch.Size([96])
features.2.0.block.1.0.weight torch.Size([96, 1, 3, 3])
features.2.0.block.1.1.weight torch.Size([96])
features.2.0.block.1.1.bias torch.Size([96])
features.2.0.block.2.fc1.weight torch.Size([4, 96, 1, 1])
features.2.0.block.2.
```
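The depthwise conv weights above have shape `[32, 1, 3, 3]` instead of `[32, 32, 3, 3]`, because `groups=32` gives each channel its own single 3×3 filter. This sketch compares that layer with an ordinary 3×3 conv of the same size. `n_params` is a hypothetical helper.

```python
from torch import nn


def n_params(m: nn.Module) -> int:
    """Number of parameters in a module."""
    return sum(p.numel() for p in m.parameters())


# Same configuration as features.1.0.block.0.0 above: 32 -> 32 channels, 3x3, no bias
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32, bias=False)
standard = nn.Conv2d(32, 32, kernel_size=3, padding=1, bias=False)

print(depthwise.weight.shape, n_params(depthwise))  # torch.Size([32, 1, 3, 3]) 288
print(standard.weight.shape, n_params(standard))    # torch.Size([32, 32, 3, 3]) 9216
```

The 288 matches the `features.1.0.block.0.0.weight` row in the parameter table below. That is 32 times fewer weights than a standard conv.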

```python
from prettytable import PrettyTable  # pip install prettytable

def count_parameters(model):
    """Print a per-tensor table of trainable parameters and return their total."""
    table = PrettyTable(["Modules", "Parameters"])
    total_params = 0
    for name, parameter in model.named_parameters():
        if not parameter.requires_grad:  # skip frozen parameters
            continue
        params = parameter.numel()       # number of elements in this tensor
        table.add_row([name, params])
        total_params += params
    print(table)
    print(f"Total Trainable Params: {total_params}")
    return total_params
```

```python
count_parameters(efficientnet_b0)
```

```
+---------------------------------+------------+
|             Modules             | Parameters |
+---------------------------------+------------+
|       features.0.0.weight       |    864     |
|       features.0.1.weight       |     32     |
|        features.0.1.bias        |     32     |
|  features.1.0.block.0.0.weight  |    288     |
|  features.1.0.block.0.1.weight  |     32     |
|   features.1.0.block.0.1.bias   |     32     |
| features.1.0.block.1.fc1.weight |    256     |
|  features.1.0.block.1.fc1.bias  |     8      |
| features.1.0.block.1.fc2.weight |    256     |
|  features.1.0.block.1.fc2.bias  |     32     |
|  features.1.0.block.2.0.weight  |    512     |
|  features.1.0.block.2.1.weight  |     16     |
|   features.1.0.block.2.1.bias   |     16     |
|  features.2.0.block.0.0.weight  |    1536    |
|  features.2.0.block.0.1.weight  |     96     |
|   features.2.0.block.0.1.bias   |     96     |
|  features.2.0.block.1.0.weight  |    864     |
|  features.2.0.bloc

5288548
```

```python
def count_parameters2(model):
    """Same as count_parameters, but prints only the total (the per-tensor table is skipped)."""
    table = PrettyTable(["Modules", "Parameters"])
    total_params = 0
    for name, parameter in model.named_parameters():
        if not parameter.requires_grad:
            continue
        params = parameter.numel()
        table.add_row([name, params])
        total_params += params
    # print(table)  # table printing disabled on purpose
    print(f"Total Trainable Params: {total_params}")
    return total_params
```

```python
count_parameters2(models.efficientnet_b0())
count_parameters2(models.efficientnet_b1())
count_parameters2(models.efficientnet_b2())
count_parameters2(models.efficientnet_b3())
count_parameters2(models.efficientnet_b4())
count_parameters2(models.efficientnet_b5())
count_parameters2(models.efficientnet_b6())
count_parameters2(models.efficientnet_b7())
```

```
Total Trainable Params: 5288548
Total Trainable Params: 7794184
Total Trainable Params: 9109994
Total Trainable Params: 12233232
Total Trainable Params: 19341616
Total Trainable Params: 30389784
Total Trainable Params: 43040704
Total Trainable Params: 66347960

66347960
```