# Pure PyTorch와 PyTorch Lightning 를 사용했을 때의 모델링 방법 비교하기 (w/MNIST 데이터셋)

**PyTorch와 PyTorch Lightning 실습: PyTorch와 PyTorch Lightning 코드를 구현하고 비교**

**실습 개요**

* PyTorch와 PyTorch Lightning으로 MNIST 문제를 해결하는 코드를 구현하고 비교

**MNIST 데이터셋(http://yann.lecun.com/exdb/mnist/)**

- MNIST 데이터셋
- 이미지의 숫자를 예측하는 문제
- 입력 : 숫자가 적힌 28*28 사이즈의 흑백 이미지
- 출력 : 0부터 9까지의 숫자
- 학습 데이터 : 55,000개
- 검증 데이터 : 5,000개
- 평가 데이터 : 10,000개
- License : GNU General Public License v3.0

**모델**

* Multi-Layer Perceptron 구조의 간단한 3 Layer 모델 

**Loss 함수**
* negative log likelihood loss

**평가**

* 예측 데이터를 0부터 9사이의 정수로 변환한 뒤 정답 데이터와 비교하여 정확도를 측정

# PyTorch로 MNIST Task 구현


In [1]:
# 필요한 모듈 import 
import os

import torch
import torch.nn as nn
from torch.nn import functional as F

from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, random_split

from torchvision.datasets import MNIST
from torchvision import datasets, transforms

## Data Preparation

### Download Data

In [2]:
# MNIST 데이터 다운로드하고 압축 해제
!wget www.di.ens.fr/~lelarge/MNIST.tar.gz
!tar -zxvf MNIST.tar.gz

--2023-04-10 14:17:20--  http://www.di.ens.fr/~lelarge/MNIST.tar.gz
Resolving www.di.ens.fr (www.di.ens.fr)... 129.199.99.14
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.di.ens.fr/~lelarge/MNIST.tar.gz [following]
--2023-04-10 14:17:22--  https://www.di.ens.fr/~lelarge/MNIST.tar.gz
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘MNIST.tar.gz’

MNIST.tar.gz            [               <=>  ]  33.20M  5.02MB/s    in 21s     

2023-04-10 14:17:43 (1.61 MB/s) - ‘MNIST.tar.gz’ saved [34813078]

MNIST/
MNIST/raw/
MNIST/raw/train-labels-idx1-ubyte
MNIST/raw/t10k-labels-idx1-ubyte.gz
MNIST/raw/t10k-labels-idx1-ubyte
MNIST/raw/t10k-images-idx3-ubyte.gz
MNIST/raw/train-images-idx3-ubyte
MNIST/raw/train-labels-idx1-ubyte.gz
MNIST/raw/t10k-images-idx3-ubyte
MNIST/raw/tra

In [3]:
# 이미지 변환기
# 참조: https://github.com/pytorch/examples/blob/main/mnist/main.py
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])

# MNIST 데이터 다운
mnist_train = MNIST(os.getcwd(), train=True,  download=True, transform=transform)
mnist_test  = MNIST(os.getcwd(), train=False, download=True, transform=transform)

# Number of Train Datasets : 55000
# Number of Validation Datasets : 5000
mnist_train, mnist_val = random_split(mnist_train, [55000, 5000])

### Init DataLoader

In [4]:
mnist_train_dataloader = DataLoader(mnist_train, batch_size=64, shuffle=True) # <-- shuffle is important!

In [5]:
mnist_val_dataloader   = DataLoader(mnist_val, batch_size=64)

In [6]:
mnist_test_dataloader  = DataLoader(mnist_test, batch_size=64)

## Model Implementation

In [10]:
class Net(nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.layer_1 = nn.Linear(28 * 28, 128)
    self.layer_2 = nn.Linear(128, 256)
    self.layer_3 = nn.Linear(256, 10)

  def forward(self, x):
      batch_size, channelds, width, height = x.size()

      # (batch_size, 1, 28, 28) -> (batch_size, 1*28*28)
      x = x.view(batch_size, -1)

      # layer 1
      x = self.layer_1(x)
      x = torch.relu(x)

      # layer 2
      x = self.layer_2(x)
      x = torch.relu(x)

      # layer 3
      x = self.layer_3(x)

      # probability distribution over labels
      x = torch.log_softmax(x, dim=1)

      return x

In [11]:
net = Net()
net

Net(
  (layer_1): Linear(in_features=784, out_features=128, bias=True)
  (layer_2): Linear(in_features=128, out_features=256, bias=True)
  (layer_3): Linear(in_features=256, out_features=10, bias=True)
)

### Setting Device

In [12]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [13]:
device

device(type='cuda')

In [14]:
net = net.to(device)

## Updater Implementation 
Setting Optimizer and Scheduler

<img src="https://lh4.googleusercontent.com/VjIXrwV3x6XTUGOBkRLjb1Hqqs97_u9EUjmHqkAIAPqBZtG2DoFTpgW9l8zG9XRxJpu_lCLlHJJqEsOHKk6ZG1o44CRtSiM89hqDEVan38UqW_DGPNvuZTtb--t0iIJ79HMBEs3j=s0" width="70%" height="70%"/>

출처: https://neptune.ai/blog/how-to-choose-a-learning-rate-scheduler

In [15]:
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
scheduler = StepLR(optimizer, step_size=1)

## Iterative Learning

### Train & Validation

In [16]:
for epoch in range(1, 3): 
  # Train mode -----------------------------------------------------------------
  net.train()

  for batch_idx, (data, target) in enumerate(mnist_train_dataloader):
    data, target = data.to(device), target.to(device)

    optimizer.zero_grad() # <- pytorch specific operation
    
    output = net(data)
    
    ## Loss calculation
    loss = F.nll_loss(output, target)
    loss.backward()
    
    optimizer.step()      # <- parameter update 수행

    if batch_idx % 100 == 0:
      print("Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}".format(
          epoch, batch_idx * len(data), len(mnist_train_dataloader.dataset),
          100. * batch_idx / len(mnist_train_dataloader), loss.item()
      ))
  # ----------------------------------------------------------------------------

  # Validation mode ------------------------------------------------------------
  net.eval()
  val_loss = 0
  correct = 0
  with torch.no_grad():
    for data, target in  mnist_val_dataloader:
      data, target = data.to(device), target.to(device)

      output = net(data)

      ## Loss calculation
      val_loss = F.nll_loss(output, target, reduction='sum').item()
      pred = output.argmax(dim=1, keepdim=True)
      correct += pred.eq(target.view_as(pred)).sum().item()

  val_loss /= len(mnist_val_dataloader.dataset)

  print("\n[Validation] Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n".format(
      val_loss, correct, len(mnist_val_dataloader.dataset),
      100. * correct / len(mnist_val_dataloader.dataset)
  ))
  # ----------------------------------------------------------------------------



[Validation] Average loss: 0.0001, Accuracy: 4800/5000 (96%)


[Validation] Average loss: 0.0000, Accuracy: 4852/5000 (97%)



### Test

In [17]:
net.eval()
correct = 0
with torch.no_grad():
  for data, target in  mnist_test_dataloader:
    # Test mode ------------------------------------------------------------------
    data, target = data.to(device), target.to(device)

    output = net(data)

    pred = output.argmax(dim=1, keepdim=True)
    correct += pred.eq(target.view_as(pred)).sum().item()
    # ----------------------------------------------------------------------------

print("\n[Test] Accuracy: {}/{} ({:.0f}%)\n".format(
    correct, len(mnist_test_dataloader.dataset),
    100. * correct / len(mnist_test_dataloader.dataset)
))


[Test] Accuracy: 9711/10000 (97%)



## Pytorch 만 썼을 때의 문제점
- 모델을 학습하고 평가하는 반복 학습을 할때마다 dataloader를 매번 호출해야하며 
- 모델과 데이터, 옵티마이저를 일일히 불러와서 코드가 중복이 되는 불편함이 있다.
- 모델, 데이터, 학습 및 평가가 구조적으로 정리되지 않아 가독성이 떨어진다.


# PyTorch Lightning으로 MNIST Task 구현

### PyTorch Lightning 설치와 모듈 import

In [18]:
!pip install pytorch-lightning

Collecting pytorch-lightning
  Downloading pytorch_lightning-2.0.1-py3-none-any.whl (716 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m716.4/716.4 kB[0m [31m12.5 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
Collecting torchmetrics>=0.7.0
  Downloading torchmetrics-0.11.4-py3-none-any.whl (519 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m519.2/519.2 kB[0m [31m11.0 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
Collecting lightning-utilities>=0.7.0
  Downloading lightning_utilities-0.8.0-py3-none-any.whl (20 kB)
Installing collected packages: lightning-utilities, torchmetrics, pytorch-lightning
Successfully installed lightning-utilities-0.8.0 pytorch-lightning-2.0.1 torchmetrics-0.11.4


In [19]:
import os

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader, random_split
import pytorch_lightning as pl



## Data Preparation

In [21]:
#DataModule
# 데이터를 다운로드, 메모리 저장 
# -> PyTorch Dataset으로 변환, 데이터 전처리 (특히, transforms) 
# -> dataloader 형태로 학습/평가 분할
# 위 과정을 처리해주고 재사용 가능한 클래스
class MNSTDataModule(pl.LightningDataModule):
    def __init__(self,
                 batch_size: int = 32,
                 ):
        super().__init__()
        self.batch_size = batch_size
        self.transform = transforms.Compose([transforms.ToTensor(),
                                             transforms.Normalize((0.1307,),(0.3081))])

    def prepare_data(self):
        # download
        MNIST(os.getcwd(), train=True, download=True)
        MNIST(os.getcwd(), train=False, download=True)
    
    def setup(self, stage = None):
        # Assign train/val datasets for use in dataloaders
        if stage == "fit" or stage is None:
            mnist_full = MNIST(os.getcwd(), train=True, transform=self.transform)
            self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000])

        # Assign test dataset for use in dataloader(s)
        if stage == "test" or stage is None:
            self.mnist_test = MNIST(os.getcwd(), train=False, transform=self.transform)

    def train_dataloader(self):
        return DataLoader(self.mnist_train, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.mnist_val, batch_size=self.batch_size)

    def test_dataloader(self):
        return DataLoader(self.mnist_test, batch_size=self.batch_size)


## Model & Updater Implementation with Loss

In [22]:
class PLNet(pl.LightningModule):
    # Model Implementation -------------------------------------------------------
    def __init__(self):
        super(PLNet, self).__init__()

        self.layer_1 = nn.Linear(28 * 28, 128)
        self.layer_2 = nn.Linear(128, 256)
        self.layer_3 = nn.Linear(256, 10)

        self.validation_step_outputs = []
        self.test_step_outputs = []

    def forward(self, x):
        batch_size, channelds, width, height = x.size()

        # (batch_size, 1, 28, 28) -> (batch_size, 1*28*28)
        x = x.view(batch_size, -1)

        # layer 1
        x = self.layer_1(x)
        x = torch.relu(x)

        # layer 2
        x = self.layer_2(x)
        x = torch.relu(x)

        # layer 3
        x = self.layer_3(x)

        # probability distribution over labels
        x = torch.log_softmax(x, dim=1)

        return x
    # ----------------------------------------------------------------------------

    # Updater Implementation -----------------------------------------------------
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = StepLR(optimizer, step_size=1)
        return [optimizer], [scheduler]
    # ----------------------------------------------------------------------------

    # training Step --------------------------------------------------------------
    def training_step(self, batch, batch_idx):
        data, target = batch

        output = self(data)
        
        ## loss calculation
        loss = F.nll_loss(output, target)
        return loss
    # ----------------------------------------------------------------------------

    # Validation Step to Epoch ---------------------------------------------------
    def validation_step(self, batch, batch_idx):
        data, target = batch

        output = self(data)

        ## loss calculation
        loss = F.nll_loss(output, target)
        pred = output.argmax(dim=1, keepdim=True)
        correct = pred.eq(target.view_as(pred)).sum().item()
        preds = {"val_loss" : loss, "correct" : correct}
        self.validation_step_outputs.append(preds)
        return preds

    def on_validation_epoch_end(self):
        avg_loss = torch.stack([x['val_loss'] for x in self.validation_step_outputs]).mean()
        self.log('val_loss', avg_loss)
        self.log('avg_val_loss', avg_loss)
        self.validation_step_outputs.clear()
    # ----------------------------------------------------------------------------
    
    # Test Step to Epoch ---------------------------------------------------------
    def test_step(self, batch, batch_idx):
        data, target = batch

        output = self(data)
        pred = output.argmax(dim=1, keepdim=True)
        correct = pred.eq(target.view_as(pred)).sum().item()/ len(target)
        preds = {"correct": correct}
        self.test_step_outputs.append(preds)
        return preds

    def on_test_epoch_end(self):
        outputs = self.test_step_outputs
        all_correct = sum([output["correct"] for output in outputs])
        accuracy = all_correct / len(outputs)

        self.log("accuracy", accuracy)
        self.test_step_outputs.clear()
    # ----------------------------------------------------------------------------

## Iterative Learning

In [23]:
# Data Preparation
dm = MNSTDataModule()

pl_net = PLNet()

# Train & Validation
trainer = pl.Trainer(max_epochs = 3)
trainer.fit(pl_net, datamodule=dm)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 3060 Ti') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
Missing logger folder: /home/kingstar/workspace/nlp_competition/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name    | Type   | Params
-----------------------------------
0 | layer_1 | Linear | 100 K 
1 | layer_2 | Linear | 33.0 K
2 | layer_3 | Linear | 2.6 K 
-----------------------------------
136 K     Trainable params
0         Non-trainable params
136 K     Total params
0.544     Total estimated model params size (MB)
2023-04-10 14:22:10.669365: I tensorf

Sanity Checking: 0it [00:00, ?it/s]

  rank_zero_warn(
  rank_zero_warn(


Training: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]

`Trainer.fit` stopped: `max_epochs=3` reached.


## Pytorch Lightning 을 사용할 때의 장점

PyTorch Lightning 구조는 기존의 PyTorch 학습을 간단한 한줄에 묶을 수 있고, 

중복되는 Deep Learning Block을 Module들로 묶어서 모듈의 가독성과 재활용성을 높일 수 있다.



## 평가하기

In [24]:
# Test
trainer.test(datamodule=dm)

  rank_zero_warn(
You are using a CUDA device ('NVIDIA GeForce RTX 3060 Ti') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
Restoring states from the checkpoint path at /home/kingstar/workspace/nlp_competition/lightning_logs/version_0/checkpoints/epoch=2-step=5157.ckpt
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Loaded model weights from the checkpoint at /home/kingstar/workspace/nlp_competition/lightning_logs/version_0/checkpoints/epoch=2-step=5157.ckpt
  rank_zero_warn(


Testing: 0it [00:00, ?it/s]

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        accuracy            0.9754393100738525
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


[{'accuracy': 0.9754393100738525}]

In [36]:
data, target = mnist_test[9999]
data = torch.unsqueeze(data, 0)

output = pl_net(data)
pred = output.argmax(dim=1, keepdim=True)
print('예측값: ',pred.item(),' 정답:', target)

예측값:  6  정답: 6


In [37]:
data

tensor([[[[-0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
           -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
           -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
           -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
           -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
           -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
           -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
           -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
           -0.4242, -0.4242, -0.4242, -0.3224,  1.0650,  2.8088,  2.3760,
            0.7086, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
           -0.4242, -0.4242, -0.424

In [38]:
output

tensor([[-1.3144e+01, -1.7093e+01, -1.2600e+01, -1.5234e+01, -8.8065e+00,
         -1.1260e+01, -1.6902e-04, -1.9338e+01, -1.4085e+01, -1.6090e+01]],
       grad_fn=<LogSoftmaxBackward0>)

In [42]:
pred.item() # pred = tensor([[6]])

6

###**콘텐츠 라이선스**

<font color='red'><b>**WARNING**</b></font> : **본 교육 콘텐츠의 지식재산권은 재단법인 네이버커넥트에 귀속됩니다. 본 콘텐츠를 어떠한 경로로든 외부로 유출 및 수정하는 행위를 엄격히 금합니다.** 다만, 비영리적 교육 및 연구활동에 한정되어 사용할 수 있으나 재단의 허락을 받아야 합니다. 이를 위반하는 경우, 관련 법률에 따라 책임을 질 수 있습니다.