# CH8. Performance Optimization

- Practice task: Sentence Positive/Negative Classification using CNN (CNN-based sentence sentiment classification)
- Practice steps
    - [Reference](https://github.com/graykode/nlp-tutorial/blob/master/2-1.TextCNN/TextCNN.py)

## 0. Required libraries & modules

In [1]:
import pprint

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchsummary import summary

from models import text_cnn
from train import train_model
from test import test_model
from train import train_model_with_early_stop

## 1. Data preprocessing based on Chapter 9

- Load the data (a small corpus)

In [2]:
corpus = [
    "i love you",
    "he loves me",
    "she likes baseball",
    "i hate you",
    "sorry for that",
    "this is awful",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

- Tokenization
    - Split each sentence into words

In [3]:
words = " ".join(corpus).split()
words = list(set(words))
words


['this',
 'awful',
 'i',
 'baseball',
 'love',
 'hate',
 'likes',
 'loves',
 'you',
 'sorry',
 'he',
 'for',
 'that',
 'she',
 'is',
 'me']

- Build a dictionary that maps each token to an integer
    - This converts the word data into integer vectors that the model can process

In [4]:
word_dict = {w: i for i, w in enumerate(words)}
word_dict

{'this': 0,
 'awful': 1,
 'i': 2,
 'baseball': 3,
 'love': 4,
 'hate': 5,
 'likes': 6,
 'loves': 7,
 'you': 8,
 'sorry': 9,
 'he': 10,
 'for': 11,
 'that': 12,
 'she': 13,
 'is': 14,
 'me': 15}
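
Because `set` gives no ordering guarantee for strings (Python randomizes string hashing per process by default), the index assigned to each word can differ from run to run. A minimal sketch, assuming reproducible indices are wanted: sort the vocabulary before enumerating it. `build_vocab` is a hypothetical helper, not part of the original notebook.

```python
def build_vocab(sentences):
    """Map each distinct whitespace-separated token to a stable integer id."""
    tokens = " ".join(sentences).split()
    # sorted() fixes the order, so the ids are identical on every run
    return {word: idx for idx, word in enumerate(sorted(set(tokens)))}


corpus = [
    "i love you", "he loves me", "she likes baseball",
    "i hate you", "sorry for that", "this is awful",
]
vocab = build_vocab(corpus)
print(len(vocab))      # 16 distinct words
print(vocab["awful"])  # 0, the alphabetically first token
```

With a sorted vocabulary, a saved model and its `word_dict` stay compatible across notebook restarts.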

- Convert the input data to tensors
    - sentence -> vector

In [12]:
sentence_arrays = [np.asarray([word_dict[n] for n in sentence.split()]) for sentence in corpus]
inputs = torch.LongTensor(sentence_arrays)
label_array = np.asarray(labels)
targets = torch.LongTensor(label_array)
print(f"input data shape: {inputs.size()}")
print(f"target data shape: {targets.size()}")

input data shape: torch.Size([6, 3])
target data shape: torch.Size([6])


## 2. Training with performance optimization techniques
- Batch Normalization
- Dropout
- Early Stopping

- train & test vanilla model

In [6]:
num_filters = 3 
filter_sizes = [2, 2, 2] 
vocab_size = len(word_dict)
embedding_size = 2 
sequence_length = 3 
num_classes = 2 
model = text_cnn.TextCNN(
    num_filters, filter_sizes, vocab_size,
    embedding_size, sequence_length, num_classes
)
model 

TextCNN(
  (W): Embedding(16, 2)
  (Weight): Linear(in_features=9, out_features=2, bias=False)
  (filter_list): ModuleList(
    (0-2): 3 x Conv2d(1, 3, kernel_size=(2, 2), stride=(1, 1))
  )
)
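
The source of `models/text_cnn.py` is not shown in this notebook. The sketch below is one way to build a module with the same printed structure, modeled loosely on the referenced TextCNN tutorial. The class name `TextCNNSketch` is hypothetical, and the batch normalization and dropout options are left out. It illustrates where `in_features=9` comes from: 3 filter banks, each producing 3 feature maps that are max-pooled over time, give 3 x 3 = 9 features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNNSketch(nn.Module):
    def __init__(self, num_filters, filter_sizes, vocab_size,
                 embedding_size, sequence_length, num_classes):
        super().__init__()
        # sequence_length is accepted to mirror the notebook's call; it is not
        # needed here because pooling takes the max over the whole time axis
        self.W = nn.Embedding(vocab_size, embedding_size)
        self.Weight = nn.Linear(num_filters * len(filter_sizes), num_classes, bias=False)
        self.filter_list = nn.ModuleList(
            [nn.Conv2d(1, num_filters, (size, embedding_size)) for size in filter_sizes]
        )

    def forward(self, x):
        emb = self.W(x).unsqueeze(1)  # [batch, 1, seq_len, embedding_size]
        pooled = []
        for conv in self.filter_list:
            h = F.relu(conv(emb))                     # [batch, num_filters, seq_len - size + 1, 1]
            pooled.append(torch.amax(h, dim=(2, 3)))  # max over time -> [batch, num_filters]
        features = torch.cat(pooled, dim=1)           # [batch, num_filters * len(filter_sizes)]
        return self.Weight(features)                  # [batch, num_classes]


sketch = TextCNNSketch(3, [2, 2, 2], 16, 2, 3, 2)
logits = sketch(torch.randint(0, 16, (6, 3)))
print(logits.shape)  # torch.Size([6, 2])
```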

In [9]:

train_model(model,inputs,targets,100)


Epoch: 0001 cost = 0.762808
Epoch: 0002 cost = 0.759640
Epoch: 0003 cost = 0.756512
Epoch: 0004 cost = 0.753426
Epoch: 0005 cost = 0.750380
Epoch: 0006 cost = 0.747375
Epoch: 0007 cost = 0.744411
Epoch: 0008 cost = 0.741552
Epoch: 0009 cost = 0.738744
Epoch: 0010 cost = 0.735965
Epoch: 0011 cost = 0.733219
Epoch: 0012 cost = 0.730507
Epoch: 0013 cost = 0.727827
Epoch: 0014 cost = 0.725162
Epoch: 0015 cost = 0.722527
Epoch: 0016 cost = 0.719922
Epoch: 0017 cost = 0.717345
Epoch: 0018 cost = 0.714795
Epoch: 0019 cost = 0.712271
Epoch: 0020 cost = 0.709770
Epoch: 0021 cost = 0.707293
Epoch: 0022 cost = 0.704836
Epoch: 0023 cost = 0.702399
Epoch: 0024 cost = 0.699980
Epoch: 0025 cost = 0.697578
Epoch: 0026 cost = 0.695190
Epoch: 0027 cost = 0.692790
Epoch: 0028 cost = 0.690402
Epoch: 0029 cost = 0.688026
Epoch: 0030 cost = 0.685662
Epoch: 0031 cost = 0.683293
Epoch: 0032 cost = 0.680926
Epoch: 0033 cost = 0.678567
Epoch: 0034 cost = 0.676267
Epoch: 0035 cost = 0.673993
Epoch: 0036 cost = 0
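
`train_model` comes from a local `train` module whose source is not shown. A minimal sketch of what such a helper might look like, assuming full-batch training with cross-entropy loss and the Adam optimizer as in the referenced tutorial. The name `train_model_sketch`, the learning rate, and the stand-in model are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def train_model_sketch(model, inputs, targets, num_epochs, lr=0.001):
    """Full-batch training loop; returns the loss recorded at every epoch."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    model.train()  # enable training behavior of dropout / batch norm layers
    history = []
    for epoch in range(num_epochs):
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        history.append(loss.item())
        print(f"Epoch: {epoch + 1:04d} cost = {loss.item():.6f}")
    return history


# Tiny stand-in model and data so the sketch runs on its own
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Embedding(16, 2), nn.Flatten(), nn.Linear(6, 2))
toy_inputs = torch.randint(0, 16, (6, 3))
toy_targets = torch.tensor([1, 1, 1, 0, 0, 0])
losses = train_model_sketch(toy_model, toy_inputs, toy_targets, num_epochs=5)
```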

In [7]:
test_text = 'he hate you'
tests = [np.asarray([word_dict[n] for n in test_text.split()])]
test_input = torch.LongTensor(tests)
prediction = test_model(model,test_input)

if prediction == 0:
    print(test_text,"is Bad Mean...")
else:
    print(test_text,"is Good Mean!!")

he hate you is Good Mean!!


### 2.1 What is Batch Normalization?
- `normalization`
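    - Normalization: restricting the range of the data to a range chosen by the user
        - also called feature scaling
    - Batch normalization applies this idea to a layer's outputs, normalizing each channel with the mean and variance of the current mini-batch
    - How to apply it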
        - [nn.BatchNorm2d](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html)
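
A small sketch of what `nn.BatchNorm2d` does in training mode: each channel is shifted and scaled to roughly zero mean and unit variance using the statistics of the current batch, then passed through a learnable scale and shift (initialized to 1 and 0). The input tensor here is made-up data shaped like a conv output, not a value taken from the model above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Fake conv output: batch of 6, 3 channels, 2x1 feature map, deliberately off-center
x = torch.randn(6, 3, 2, 1) * 5.0 + 10.0

bn = nn.BatchNorm2d(3)  # one learnable (scale, shift) pair per channel
bn.train()              # training mode: normalize with the current batch statistics
y = bn(x)

# Per-channel statistics over the batch and spatial dimensions
print(x.mean(dim=(0, 2, 3)))                 # far from 0
print(y.mean(dim=(0, 2, 3)))                 # close to 0
print(y.var(dim=(0, 2, 3), unbiased=False))  # close to 1
```

In evaluation mode (`bn.eval()`), the layer uses the running averages collected during training instead of the batch statistics.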

- train vanilla model + batch normalization

In [8]:
model_batchnormalized = text_cnn.TextCNN(
    num_filters, filter_sizes, vocab_size,
    embedding_size, sequence_length, num_classes,is_batch_normalize=True
)
model_batchnormalized

TextCNN(
  (W): Embedding(16, 2)
  (Weight): Linear(in_features=9, out_features=2, bias=False)
  (filter_list): ModuleList(
    (0-2): 3 x Conv2d(1, 3, kernel_size=(2, 2), stride=(1, 1))
  )
  (batch_norm_list): ModuleList(
    (0-2): 3 x BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)

In [14]:
train_model(model_batchnormalized,inputs,targets,100)

Epoch: 0001 cost = 0.546563
Epoch: 0002 cost = 0.532212
Epoch: 0003 cost = 0.518653
Epoch: 0004 cost = 0.505375
Epoch: 0005 cost = 0.491775
Epoch: 0006 cost = 0.477543
Epoch: 0007 cost = 0.462534
Epoch: 0008 cost = 0.446795
Epoch: 0009 cost = 0.431129
Epoch: 0010 cost = 0.420883
Epoch: 0011 cost = 0.416298
Epoch: 0012 cost = 0.410644
Epoch: 0013 cost = 0.404155
Epoch: 0014 cost = 0.397225
Epoch: 0015 cost = 0.390195
Epoch: 0016 cost = 0.383286
Epoch: 0017 cost = 0.376609
Epoch: 0018 cost = 0.370202
Epoch: 0019 cost = 0.364071
Epoch: 0020 cost = 0.358207
Epoch: 0021 cost = 0.353755
Epoch: 0022 cost = 0.350179
Epoch: 0023 cost = 0.346632
Epoch: 0024 cost = 0.343124
Epoch: 0025 cost = 0.340613
Epoch: 0026 cost = 0.337502
Epoch: 0027 cost = 0.333660
Epoch: 0028 cost = 0.330041
Epoch: 0029 cost = 0.327010
Epoch: 0030 cost = 0.324023
Epoch: 0031 cost = 0.321086
Epoch: 0032 cost = 0.318202
Epoch: 0033 cost = 0.315374
Epoch: 0034 cost = 0.312601
Epoch: 0035 cost = 0.309883
Epoch: 0036 cost = 0

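### 2.2 What is Dropout?
- dropout
    - Dropout: during training, use only a fraction of the neurons at each step; the weights tied to the deactivated neurons receive no update for that step
        - The set of deactivated neurons changes at every step.
    - How to apply it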
        - [nn.Dropout](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html)
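
A short sketch of how `nn.Dropout` behaves in the two modes. In training mode each element is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`; in evaluation mode the layer passes its input through unchanged. The all-ones input is made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()         # training mode: elements are randomly zeroed
train_out = dropout(x)
print(train_out)        # every entry is either 0.0 or 1 / (1 - p) = 2.0

dropout.eval()          # evaluation mode: dropout is the identity
eval_out = dropout(x)
print(eval_out)         # all ones
```

Because a different random mask is drawn on every forward pass, the training loss of a model with dropout tends to fluctuate more from epoch to epoch than the loss of the same model without it.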

- train & test vanilla model + dropout

In [9]:
model_dropout = text_cnn.TextCNN(
    num_filters, filter_sizes, vocab_size,
    embedding_size, sequence_length, num_classes, dropout_prob = 0.5
)
model_dropout

TextCNN(
  (W): Embedding(16, 2)
  (Weight): Linear(in_features=9, out_features=2, bias=False)
  (filter_list): ModuleList(
    (0-2): 3 x Conv2d(1, 3, kernel_size=(2, 2), stride=(1, 1))
  )
  (dropout): Dropout(p=0.5, inplace=False)
)

In [10]:
train_model(model_dropout,inputs,targets,100)

Epoch: 0001 cost = 0.835831
Epoch: 0002 cost = 0.839361
Epoch: 0003 cost = 1.023070
Epoch: 0004 cost = 1.100236
Epoch: 0005 cost = 1.013633
Epoch: 0006 cost = 0.814668
Epoch: 0007 cost = 0.749790
Epoch: 0008 cost = 1.029342
Epoch: 0009 cost = 0.898030
Epoch: 0010 cost = 0.959308
Epoch: 0011 cost = 1.000341
Epoch: 0012 cost = 0.697045
Epoch: 0013 cost = 0.700373
Epoch: 0014 cost = 0.771796
Epoch: 0015 cost = 0.844553
Epoch: 0016 cost = 0.744747
Epoch: 0017 cost = 1.010743
Epoch: 0018 cost = 0.891656
Epoch: 0019 cost = 0.850598
Epoch: 0020 cost = 0.873699
Epoch: 0021 cost = 0.858110
Epoch: 0022 cost = 0.988086
Epoch: 0023 cost = 0.898695
Epoch: 0024 cost = 0.735082
Epoch: 0025 cost = 0.838922
Epoch: 0026 cost = 0.967191
Epoch: 0027 cost = 0.829715
Epoch: 0028 cost = 0.908104
Epoch: 0029 cost = 0.983906
Epoch: 0030 cost = 0.854923
Epoch: 0031 cost = 0.757845
Epoch: 0032 cost = 0.746741
Epoch: 0033 cost = 0.777772
Epoch: 0034 cost = 0.638498
Epoch: 0035 cost = 0.800358
Epoch: 0036 cost = 0

### 2.3 What is Early Stopping?
- early stopping
    - Early stopping: halt training at the point where the error on the validation dataset starts to increase
    - How to apply it
        - [Reference code](https://teddylee777.github.io/pytorch/early-stopping/)
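
The source of `train_model_with_early_stop` is not shown. The sketch below is a common patience-based variant: training stops once the monitored loss has failed to improve for `patience` consecutive checks. The class name `EarlyStopping`, the `min_delta` parameter, and the loss values are assumptions for illustration, not the notebook's actual implementation.

```python
class EarlyStopping:
    """Signal a stop when the monitored loss has not improved for `patience` checks."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one loss value; return True when training should stop."""
        if loss < self.best_loss - self.min_delta:
            self.best_loss = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, for illustration only
val_losses = [0.70, 0.65, 0.62, 0.63, 0.64, 0.66, 0.61]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        print(f"Early stopping at epoch {epoch}")
        break
```

In the training log that follows, the lowest cost appears at epoch 13 and training halts at epoch 18, which would be consistent with a patience of 5, although the actual setting is not shown.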

- train the dropout model (`model_dropout`) with early stopping

In [11]:
train_model_with_early_stop(model_dropout,inputs,targets,100)

Epoch: 0001 cost = 0.720871
Epoch: 0002 cost = 0.814197
Epoch: 0003 cost = 0.668677
Epoch: 0004 cost = 0.706945
Epoch: 0005 cost = 0.696048
Epoch: 0006 cost = 0.760279
Epoch: 0007 cost = 0.642941
Epoch: 0008 cost = 0.622185
Epoch: 0009 cost = 0.771354
Epoch: 0010 cost = 0.762187
Epoch: 0011 cost = 0.692801
Epoch: 0012 cost = 0.675045
Epoch: 0013 cost = 0.620510
Epoch: 0014 cost = 0.645030
Epoch: 0015 cost = 0.702103
Epoch: 0016 cost = 0.743795
Epoch: 0017 cost = 0.698649
Epoch: 0018 cost = 0.694609
Early stopping at epoch 18 due to lack of improvement.
