Neural network building blocks: layers/modules

Every module in PyTorch is a subclass of nn.Module.
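
As a minimal sketch of what subclassing nn.Module involves, the hypothetical `Scale` module below (not part of PyTorch or of these notes) defines only `__init__` and `forward`. Built-in layers such as nn.Linear pass the same `isinstance` check.

```python
import torch
from torch import nn


class Scale(nn.Module):
    """Hypothetical toy module: multiplies its input by a fixed factor."""

    def __init__(self, factor: float):
        super().__init__()  # required so nn.Module can set up its internal state
        self.factor = factor

    def forward(self, x):
        return x * self.factor


scale = Scale(2.0)
scaled = scale(torch.ones(3))  # call the module, not scale.forward(...)

print(isinstance(scale, nn.Module))            # custom module
print(isinstance(nn.Linear(4, 2), nn.Module))  # built-in layer
print(scaled)
```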

In [3]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


In [4]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using mps device


In [5]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [6]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


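Do not call model.forward() directly! Pass the input to the model itself, as in model(X); that call runs forward() together with any registered hooks.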

In [7]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([9], device='mps:0')
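
One way to see why model.forward() should not be called directly is to register a forward hook. The sketch below uses a small nn.Linear layer and a hypothetical `record_call` function; the hook fires when the module is called, but not when `forward` is invoked by hand.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []  # filled in by the hook each time it fires


def record_call(module, inputs, output):
    # Hypothetical hook for illustration: remember the output shape.
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(record_call)
x = torch.rand(1, 4)

layer(x)                      # goes through __call__, so the hook fires
n_after_call = len(calls)

layer.forward(x)              # bypasses __call__, so the hook is skipped
n_after_forward = len(calls)

handle.remove()
print(n_after_call, n_after_forward)
```

Both counts are the same because the second invocation never reached the hook.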


### Model Layers

Take a sample minibatch of 3 images of size 28x28 and pass it through each layer in turn.

In [8]:
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


**nn.Flatten**

Converts each 2D 28x28 image into a contiguous array of 784 pixel values. The minibatch dimension (dim=0) is kept.

In [9]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
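
To make the flattening concrete, the sketch below compares nn.Flatten with a manual reshape that keeps the batch dimension. The variable names are illustrative only.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)

flat = nn.Flatten()(batch)                  # default start_dim=1 keeps dim 0
manual = batch.reshape(batch.size(0), -1)   # same result written by hand

print(flat.shape)
print(torch.equal(flat, manual))
```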


**nn.Linear**

A module that applies a linear transformation to its input using its stored weights and biases.

In [10]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
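
The linear transformation can be checked by hand. nn.Linear stores `weight` with shape (out_features, in_features) and computes `x @ weight.T + bias`; the sketch below verifies this on random data.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only for reproducibility of this sketch

layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)

out = layer(x)
manual = x @ layer.weight.T + layer.bias  # same computation written out

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(out, manual, atol=1e-5))
```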


**nn.ReLU**

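Nonlinear activation functions are what let the model learn complex mappings between its inputs and outputs. Without them, stacked linear layers collapse into a single linear map.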

In [11]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[-0.1934, -0.0227, -0.2303,  0.0259,  0.3355,  0.5507,  0.0613, -0.1797,
         -0.2747,  0.5941, -0.2115, -0.3339,  0.0726, -0.1416, -0.0697,  0.0253,
          0.0136,  0.6804,  0.3188,  0.0365],
        [ 0.0734,  0.1717, -0.3628, -0.0433,  0.6415,  0.5833, -0.0880, -0.3692,
         -0.3231,  0.5183, -0.0877, -0.1987,  0.0339, -0.1200,  0.0256,  0.1072,
          0.0956,  0.7044,  0.2107,  0.0867],
        [ 0.2493, -0.2407, -0.3770,  0.0924,  0.5932,  0.2919, -0.1233, -0.4126,
         -0.1954,  0.5052, -0.2531, -0.3232,  0.3336,  0.0729,  0.2021, -0.0834,
         -0.4024,  0.5506,  0.4008, -0.0699]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0000, 0.0000, 0.0000, 0.0259, 0.3355, 0.5507, 0.0613, 0.0000, 0.0000,
         0.5941, 0.0000, 0.0000, 0.0726, 0.0000, 0.0000, 0.0253, 0.0136, 0.6804,
         0.3188, 0.0365],
        [0.0734, 0.1717, 0.0000, 0.0000, 0.6415, 0.5833, 0.0000, 0.0000, 0.0000,
         0.5183, 0.0000, 0.0000, 0.0339, 0.0000, 0.02
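
On a small hand-written tensor the effect of ReLU is easier to read than in the output above: negative entries become zero and non-negative entries pass through unchanged. A minimal sketch:

```python
import torch
from torch import nn

x = torch.tensor([-0.5, 0.0, 0.25, 2.0])

out = nn.ReLU()(x)
same_as_clamp = torch.equal(out, torch.clamp(x, min=0))  # ReLU(x) = max(x, 0)

print(out)
print(same_as_clamp)
```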

**nn.Sequential**

An ordered container of modules. Data is passed through the modules in the order in which they are defined.

In [12]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
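
As a rough check that nn.Sequential only chains its modules in order, the sketch below compares it with calling the same layers by hand. The container also supports indexing, which returns the original module object. The names `layer2` and `seq` are illustrative.

```python
import torch
from torch import nn

flatten = nn.Flatten()
layer1 = nn.Linear(28 * 28, 20)
layer2 = nn.Linear(20, 10)
seq = nn.Sequential(flatten, layer1, nn.ReLU(), layer2)

x = torch.rand(3, 28, 28)
out = seq(x)
manual = layer2(torch.relu(layer1(flatten(x))))  # same steps, written out

print(out.shape)
print(torch.allclose(out, manual))
print(seq[1] is layer1)  # indexing gives back the very same module
```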

**nn.Softmax**

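The last linear layer of the network returns logits, raw values in (-inf, inf). nn.Softmax rescales them to values between 0 and 1 that represent the model's predicted probability for each class. The dim parameter selects the dimension along which the values sum to 1.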

In [13]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
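
The softmax step can be verified on hand-picked logits. The sketch below checks that each row sums to 1, that the result matches the explicit formula exp(z_i) / sum_j exp(z_j), and that the largest logit still gets the largest probability.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [-1.0, 0.0, 3.0]])

probs = nn.Softmax(dim=1)(logits)  # normalize across classes (dim=1)

exp = torch.exp(logits)
manual = exp / exp.sum(dim=1, keepdim=True)  # explicit softmax formula

print(probs.sum(dim=1))     # each row sums to 1 (up to rounding)
print(probs.argmax(dim=1))  # index of the most probable class per row
```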

### Model Parameters

In [14]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0188,  0.0313, -0.0031,  ...,  0.0246, -0.0289,  0.0213],
        [-0.0220, -0.0014,  0.0245,  ...,  0.0254, -0.0042,  0.0215]],
       device='mps:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([ 0.0328, -0.0076], device='mps:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0377,  0.0433,  0.0086,  ..., -0.0174, -0.0408, -0.0171],
        [ 0.0210,  0.0379, -0.0337,  ...,  0.0017,  0.0427, -0.0293]],
       device='mps:0', grad_fn=<Slice
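
Besides printing the values, the same iteration can be used to count parameters. The sketch below rebuilds the layer stack with the sizes shown in the model structure above (as a plain nn.Sequential on the CPU, an assumption made to keep the example self-contained) and sums `numel()` over all parameters.

```python
import torch
from torch import nn

# Same layer sizes as the NeuralNetwork printed above.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

sizes = {name: tuple(p.shape) for name, p in model.named_parameters()}
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(sizes)
print(total, trainable)  # 784*512+512 + 512*512+512 + 512*10+10 = 669706
```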