Problem: this notebook runs on Colab, but it does not run in the Jupyter (MONAI) environment.


----

# Convolutional Neural Networks for Sentence Classification using Ignite

This is a tutorial on using Ignite to train neural network models, setup experiments and validate models.

In this experiment, we'll be replicating [Convolutional Neural Networks for Sentence Classification by Yoon Kim](https://arxiv.org/abs/1408.5882)! This paper uses a CNN for text classification, a task typically handled with RNNs, logistic regression, or Naive Bayes.

We want to classify IMDB movie reviews and predict whether each review is positive or negative. The IMDB Movie Review dataset consists of 25,000 positive and 25,000 negative examples, stored as text and label pairs. This is a binary classification problem. We'll be using PyTorch to create the model, torchtext to import data and Ignite to train and monitor the models!

Let's get started!

* IMDB movie review classification: 25,000 positive and 25,000 negative reviews
* Each example is a (text, label) pair

## Import Libraries

In [1]:
import random

`torchtext` is a library that provides multiple datasets for NLP tasks, similar to `torchvision`. Below we import the following:
* **datasets**: A module to download NLP datasets.
* **GloVe**: A module to download and use pretrained GloVe embeddings.

In [2]:
from torchtext import datasets      # public datasets
from torchtext.vocab import GloVe   # pretrained GloVe embeddings

In [3]:
# !pip install pytorch-ignite torchtext==0.9.1 spacy
# !python -m spacy download en_core_web_sm

In [4]:
# !pip install torchtext==0.9.1

We import torch, nn and functional modules to create our models! 

In [5]:
import torch
import torch.nn as nn
import torch.nn.functional as F

`Ignite` is a high-level library to help with training neural networks in PyTorch. It comes with an `Engine` to set up a training loop, various metrics, handlers and a helpful contrib section!



Below we import the following:
* **Engine**: Runs a given process_function over each batch of a dataset, emitting events as it goes.
* **Events**: Allows users to attach functions to an `Engine` to fire functions at a specific event. Eg: `EPOCH_COMPLETED`, `ITERATION_STARTED`, etc.
* **Accuracy**: Metric to calculate accuracy over a dataset, for binary, multiclass, multilabel cases. 
* **Loss**: General metric that takes a loss function as a parameter, calculate loss over a dataset.
* **RunningAverage**: General metric to attach to Engine during training. 
* **ModelCheckpoint**: Handler to checkpoint models. 
* **EarlyStopping**: Handler to stop training based on a score function. 
* **ProgressBar**: Handler to create a tqdm progress bar.

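[Summary]
* Engine: runs the `process_function` on each batch of the dataset and emits events; the training loop, metrics and handlers are built on it
* Events: let you attach functions that fire at specific events
* RunningAverage: a metric to attach to the Engine during training
* ProgressBar: displays tqdm progress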
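
To make the `Engine` and `Events` descriptions above concrete, here is a minimal plain-Python sketch of the control flow: a process function is called on every batch, its return value is stored, and registered handlers fire at fixed points. `ToyEngine` and the string event names are hypothetical stand-ins for illustration, not Ignite's implementation.

```python
# Toy stand-in for an engine: NOT Ignite's implementation, only the control flow.
class ToyEngine:
    def __init__(self, process_function):
        self.process_function = process_function
        self.handlers = {}      # event name -> list of callables
        self.output = None      # result of the latest process_function call
        self.epoch = 0

    def add_event_handler(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def _fire(self, event):
        for handler in self.handlers.get(event, []):
            handler(self)

    def run(self, data, max_epochs=1):
        for epoch in range(1, max_epochs + 1):
            self.epoch = epoch
            for batch in data:
                # The process function sees the engine and one batch at a time
                self.output = self.process_function(self, batch)
                self._fire("ITERATION_COMPLETED")
            self._fire("EPOCH_COMPLETED")


log = []
engine = ToyEngine(lambda engine, batch: sum(batch))   # "process" a batch by summing it
engine.add_event_handler("ITERATION_COMPLETED", lambda e: log.append(("iter", e.output)))
engine.add_event_handler("EPOCH_COMPLETED", lambda e: log.append(("epoch", e.epoch)))
engine.run([[1, 2], [3, 4]], max_epochs=2)
print(log)
```

The real `Engine` follows the same idea: you write the per-batch function once, and the handlers decide what happens at each event.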

In [6]:
from ignite.engine import Engine, Events
from ignite.metrics import Accuracy, Loss, RunningAverage
from ignite.handlers import ModelCheckpoint, EarlyStopping
from ignite.contrib.handlers import ProgressBar
from ignite.utils import manual_seed

SEED = 1234
manual_seed(SEED)

In [7]:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = '5,6,7'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Processing Data

We first set up a tokenizer using `torchtext.data.utils`.
The job of a tokenizer is to split a sentence into "tokens". You can read more about it at [wikipedia](https://en.wikipedia.org/wiki/Lexical_analysis).
We will use the tokenizer from the "spacy" library, which is a popular choice. Feel free to switch to "basic_english" if you want to use the default one, or to any other tokenizer you prefer.

docs: https://pytorch.org/text/stable/data_utils.html

Tokenization is done with the spacy library.

In [8]:
!python -m spacy download en_core_web_sm

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting en-core-web-sm==3.1.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.1.0/en_core_web_sm-3.1.0-py3-none-any.whl (13.6 MB)
     |████████████████████████████████| 13.6 MB 207 kB/s
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [9]:
from torchtext.data.utils import get_tokenizer
tokenizer = get_tokenizer("spacy")



In [10]:
# Tokenization example
tokenizer("Ignite is a high-level library for training and evaluating neural networks.")

['Ignite',
 'is',
 'a',
 'high',
 '-',
 'level',
 'library',
 'for',
 'training',
 'and',
 'evaluating',
 'neural',
 'networks',
 '.']

Next, the IMDB training and test datasets are downloaded. The `torchtext.datasets` API returns the train/test dataset split directly without the preprocessing information. Each split is an iterator which yields the raw texts and labels line-by-line.

In [11]:
train_iter, test_iter = datasets.IMDB(split=('train','test'))

Now we set up the train, validation and test splits.  

In [12]:
# We are using only 1000 samples for faster training
# set to -1 to use full data
N = 1000 

# We will use 80% of the `train split` for training and the rest for validation
train_frac = 0.8
_temp = list(train_iter)


random.shuffle(_temp)
_temp = _temp[:(N if N > 0 else len(_temp) )]
n_train = int(len(_temp)*train_frac)

train_list = _temp[:n_train]
validation_list = _temp[n_train:]
test_list = list(test_iter)
test_list = test_list[:(N if N > 0 else len(test_list))]

In [13]:
train_list[:2]

[('neg',
  "David Mamet is a very interesting and a very un-equal director. His first movie 'House of Games' was the one I liked best, and it set a series of films with characters whose perspective of life changes as they get into complicated situations, and so does the perspective of the viewer.<br /><br />So is 'Homicide' which from the title tries to set the mind of the viewer to the usual crime drama. The principal characters are two cops, one Jewish and one Irish who deal with a racially charged area. The murder of an old Jewish shop owner who proves to be an ancient veteran of the Israeli Independence war triggers the Jewish identity in the mind and heart of the Jewish detective.<br /><br />This is were the flaws of the film are the more obvious. The process of awakening is theatrical and hard to believe, the group of Jewish militants is operatic, and the way the detective eventually walks to the final violent confrontation is pathetic. The end of the film itself is Mamet-like sm

Let's explore a data sample to see what it looks like.
Each data sample is a tuple of the format `(label, text)`.

The value of label is either 'pos' or 'neg'.


In [14]:
random_sample = random.sample(train_list,1)[0]
print(' text:', random_sample[1])
print('label:', random_sample[0])

 text: Now I'll be the first to admit it when I say something that may be blasphemous or unfair, so I would like to apologize in advance for my ranting about how much I disliked this movie.<br /><br />That about sums it up too. I disliked this movie. To be more specific, I disliked the concept of this movie. The cinematography was good. The mood was nice. And the acting was satisfactory.<br /><br /> However, the story is fatuous, unacurate and misleading. It is also offensive.<br /><br />I am a quarter Cree Indian, and for some reason I feel insulted, on a personal level, by the nature of Whitaker's character. First of all, he's a black guy. And this isn't a racist remark, I swear. The thought of a White, Hispanic or even Native American swinging a katana on a rooftop offends everything that the katana represents. The katana represents the soul of a Samurai, imbibed with the souls of his ancestors who guide and protect the Samurai. For Ghost Dog to use his guns instead of the Katana is

Now that we have the dataset splits, let's build our vocabulary. For this, we will use the `Vocab` class from `torchtext.vocab`. It is important that we build our vocabulary from the train dataset only, as validation and test are **unseen** in our experiment.

`Vocab` allows us to use pretrained **GloVe** 100-dimensional word vectors. This means each word is described by 100 floats! If you want to read more about this, here are a few resources.
* [StanfordNLP - GloVe](https://github.com/stanfordnlp/GloVe)
* [DeepLearning.ai Lecture](https://www.coursera.org/lecture/nlp-sequence-models/glove-word-vectors-IxDTG)
* [Stanford CS224N Lecture by Richard Socher](https://www.youtube.com/watch?v=ASn7ExxLZws)

Note that the GloVe download size is around 900MB, so it might take some time to download.

An instance of the `Vocab` class has the following attributes:
* `extend` is used to extend the vocabulary
* `freqs` is a dictionary of the frequency of each word
* `itos` is a list of all the words in the vocabulary.
* `stoi` is a dictionary mapping every word to an index.
* `vectors` is a torch.Tensor of the downloaded embeddings

----
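Build the vocabulary with the `Vocab` class from `torchtext.vocab`. Because the validation and test sets are not used during training, it is important to build the vocabulary from the train dataset only. The pretrained `GloVe` vectors give each word a 100-dimensional vector, i.e. each word is described by 100 floats.

Attributes of the `Vocab` class:
* extend: extends the vocabulary
* freqs: dict of each word's frequency
* itos: ordered list of all the words in the vocabulary
* stoi: dict mapping every word to an index
* vectors: tensor of the downloaded embeddings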
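
As a rough illustration of how `freqs`, `itos` and `stoi` relate, the sketch below builds a frequency-based vocabulary in plain Python. `build_vocab` is a hypothetical helper, and the ordering (special tokens first, then words by descending frequency) plus the fallback of unknown words to index 0 are assumptions about how the legacy `Vocab` behaves, not a copy of torchtext's code.

```python
from collections import Counter

def build_vocab(token_lists, min_freq=1, specials=("<unk>", "<pad>")):
    """Simplified stand-in for a frequency-based vocabulary (hypothetical helper)."""
    freqs = Counter(tok for tokens in token_lists for tok in tokens)
    # Keep words seen at least min_freq times, most frequent first (ties alphabetical)
    words = sorted((w for w, c in freqs.items() if c >= min_freq),
                   key=lambda w: (-freqs[w], w))
    itos = list(specials) + words                 # index -> string
    stoi = {w: i for i, w in enumerate(itos)}     # string -> index
    return freqs, itos, stoi

freqs, itos, stoi = build_vocab(
    [["the", "movie", "was", "good"], ["the", "movie", "was", "bad"], ["the", "end"]],
    min_freq=2,
)

def lookup(token):
    # Words below min_freq (or never seen) all share the <unk> index
    return stoi.get(token, stoi["<unk>"])

print(itos)
print(lookup("movie"), lookup("good"))
```

Under these assumptions, any token missing from the vocabulary maps to index 0, which is presumably why the notebook reports index 0 for `<BOS>`.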

In [15]:
from collections import Counter
from torchtext.vocab import Vocab

counter = Counter()

for (label, line) in train_list:
    counter.update(tokenizer(line))

vocab = Vocab(
    counter,
    min_freq=10,
    vectors=GloVe(name='6B', dim=100, cache='/tmp/glove/')
)

In [16]:
print("The length of the new vocab is", len(vocab))
new_stoi = vocab.stoi
print("The index of '<BOS>' is", new_stoi['<BOS>'])
new_itos = vocab.itos
print("The token at index 2 is", new_itos[2])

The length of the new vocab is 2004
The index of '<BOS>' is 0
The token at index 2 is the


In [17]:
vocab.vectors.shape, vocab.vectors

(torch.Size([2004, 100]),
 tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
         [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
         ...,
         [ 0.1388,  0.7624,  1.1537,  ...,  0.4456,  0.4540,  0.4101],
         [-0.0863, -0.3406,  0.8735,  ..., -1.0531, -0.2014,  1.4113],
         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000]]))

In [18]:
vocab['The'], vocab['the'], vocab['of']

(25, 2, 7)

In [19]:
vocab.stoi['The'], vocab.stoi['the'], vocab.stoi['of']

(25, 2, 7)

We now create `text_transform` and `label_transform`, which are callable objects, such as a `lambda` func here, to process the raw text and label data from the dataset iterators (or iterables like a `list`). You can add the special symbols such as `<BOS>` and `<EOS>` to the sentence in `text_transform`.

In [20]:
text_transform = lambda x: [vocab[token] for token in tokenizer(x)]
label_transform = lambda x: 1 if x == 'pos' else 0

# Print out the output of text_transform
print("input to the text_transform:", "here is an example")
print("output of the text_transform:", text_transform("here is an example"))

input to the text_transform: here is an example
output of the text_transform: [163, 9, 42, 563]
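
The remark about special symbols can be sketched as follows. This is a toy example under stated assumptions: `toy_stoi`, `toy_tokenizer` and `text_transform_with_markers` are hypothetical names, the whitespace split stands in for the spacy tokenizer, and the vocabulary is assumed to already contain `<BOS>` and `<EOS>` entries (the notebook's `vocab` would need them added as specials first).

```python
# Hypothetical toy vocabulary; in the notebook the indices come from `vocab`
toy_stoi = {"<unk>": 0, "<pad>": 1, "<BOS>": 2, "<EOS>": 3,
            "here": 4, "is": 5, "an": 6, "example": 7}

def toy_tokenizer(text):
    return text.split()   # whitespace split stands in for the spacy tokenizer

def text_transform_with_markers(text):
    # Wrap the token sequence in begin/end markers before mapping to indices
    tokens = ["<BOS>"] + toy_tokenizer(text) + ["<EOS>"]
    return [toy_stoi.get(tok, toy_stoi["<unk>"]) for tok in tokens]

print(text_transform_with_markers("here is an unseen example"))
```

The unknown word `unseen` falls back to the `<unk>` index, while the markers get their own indices at both ends of the sequence.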


For generating the data batches we will use `torch.utils.data.DataLoader`. You could customize the data batch by defining a function with the `collate_fn` argument in the DataLoader. Here, in the `collate_batch` func, we process the raw text data and add padding to dynamically match the longest sentence in a batch.

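Batches are created with `DataLoader`; the `collate_fn` argument handles per-batch processing (padding is added so every sentence matches the longest one in the batch).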
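
A small sketch of what dynamic padding does, using `pad_sequence` on two toy sentences. The padding value of 1 is an assumption for this example only (the notebook passes `padding_value=3.0`); the point is the resulting layout.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two toy "sentences" of different lengths, already converted to token indices
sentences = [torch.tensor([5, 6, 7]), torch.tensor([8])]

# Default layout is (L, N): one column per sentence, padded to the longest one
padded = pad_sequence(sentences, padding_value=1.0)
print(padded.shape)
print(padded)

# batch_first=True gives (N, L): one row per sentence
padded_bf = pad_sequence(sentences, batch_first=True, padding_value=1.0)
print(padded_bf)
```

Because the default layout is (L, N), each batch the `DataLoader` yields has one sentence per column, and its first dimension is the length of the longest sentence in that batch.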

In [21]:
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence

def collate_batch(batch):
    label_list, text_list = [], []
    for (_label, _text) in batch:
        label_list.append(label_transform(_label))
        processed_text = torch.tensor(text_transform(_text))
        text_list.append(processed_text)
    return torch.tensor(label_list), pad_sequence(text_list, padding_value=3.0)


In [22]:
batch_size = 8  # A batch size of 8

# Create the dataloaders in the order train, validation, test
def create_iterators(batch_size=8):
    """Helper function to create the iterators"""
    dataloaders = []
    for split in [train_list, validation_list, test_list]:
        print(len(split))
        dataloader = DataLoader(
            split, batch_size=batch_size,
            collate_fn=collate_batch
            )
        dataloaders.append(dataloader)
    return dataloaders


In [23]:
train_iterator, valid_iterator, test_iterator = create_iterators()

800
200
1000


In [24]:
# Each column of seq holds one sentence
l, seq = next(iter(train_iterator))
l, seq, seq.shape

(tensor([0, 0, 1, 0, 1, 1, 0, 0]),
 tensor([[731,  35, 331,  ...,  12,  12,  12],
         [  0,   8,   0,  ..., 239, 184,  33],
         [  9, 240,  50,  ...,   8,  35, 131],
         ...,
         [  3,   3,   3,  ...,   3,   3,   3],
         [  3,   3,   3,  ...,   3,   3,   3],
         [  3,   3,   3,  ...,   3,   3,   3]]),
 torch.Size([665, 8]))

In [25]:
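# Transposed view: 8 rows (one per example in the batch), each holding one sentence's token indices. Each index can be mapped through the vocab to a 100-dimensional word vector (analogous to channels in a 2D image).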
seq.T.shape, seq.T

(torch.Size([8, 665]),
 tensor([[731,   0,   9,  ...,   3,   3,   3],
         [ 35,   8, 240,  ...,   3,   3,   3],
         [331,   0,  50,  ...,   3,   3,   3],
         ...,
         [ 12, 239,   8,  ...,   3,   3,   3],
         [ 12, 184,  35,  ...,   3,   3,   3],
         [ 12,  33, 131,  ...,   3,   3,   3]]))

In [26]:
torch.tensor(text_transform(train_list[0][1]))

tensor([ 731,    0,    9,    6,   63,  193,    5,    6,   63,    0,   17,    0,
         212,    4,  525,  110,   22,   44,    0,    7,    0,   44,   19,    2,
          37,   12,  444,  144,    3,    5,   11,  279,    6,  228,    7,  122,
          21,  133,  539, 1959,    7,  136, 1618,   20,   47,   91,   95,    0,
           0,    3,    5,   49,   81,    2, 1959,    7,    2,    0,   18, 1112,
           9,   44,    0,   44,   73,   43,    2,  592,  538,    8,  279,    2,
         395,    7,    2,  593,    8,    2,  554,  737,  470,    4,   25,    0,
         133,   31,  113, 1072,    3,   37, 1483,    5,   37,    0,   48,  849,
          21,    6,    0,    0, 1613,    4,   25,  548,    7,   42,  211, 1483,
        1814, 1443,   48,    0,    8,   36,   42,    0,    0,    7,    2,    0,
           0,  503,    0,    2, 1483,    0,   10,    2,  395,    5,  545,    7,
           2, 1483,    0,   18,  345,    9,   71,    2, 1637,    7,    2,   23,
          31,    2,   58,  639,    4,   

In [27]:
torch.tensor(text_transform(train_list[1][1]))

tensor([  35,    8,  240,  116,    0,    0,  340,    2,    0,   24,   64,   38,
           0,    7,   13,  414,    4,    0, 1593,    5,   96,  143,   64,  394,
           5,  290,    0, 1932,    0,    4,    0,  110,    0,   15,   12,   19,
        1139,    0, 1689,    2,    0,   20,   85,    0,  394, 1028,   19,    0,
         464,   89,   15,    4,    0,    0,   26, 1178,    0,   14,    0,  287,
          20,   99,   20, 1178,    0,    0,    0,  287,   15,    4,   27,  123,
         124,   18,  840,    0, 1593,    0,    3,    6,  387,    0,    2,    0,
          19,    0,    3,   20,    6,    0,    0,   26,    0, 1593,    0,   19,
         244,    4,   25,  908,    5,  387,    7,    0,   10,    2,    0,   19,
         131, 1377,   29,  252,    4,  165,   31,    6,  191, 1796,    0,   13,
         195,    0,    4,  313,   47,  358,  147,   43,    2,  387,    0, 1858,
           0,   19,  643,  145,    6,   45,   93,   84,   19,    6,    0,  375,
           0,    3,  164,    0,   13,   

Let's explore what the iterator outputs. This tells us what the model's input looks like, how to compare the label to the output, and how to set up our process_functions for Ignite's `Engine`.
* `batch[0][0]` is the label of a single example. `label_transform` mapped the label, originally text, to an integer (1 for 'pos', 0 for 'neg').
* `batch[1]` holds the token indices with shape (L, N), so the column `batch[1][:, 0]` is the text of a single example. `text_transform` converted each token of the example's text into its vocabulary index.

Now let's print the padded sequence lengths of the first few batches of `train_iterator`. The batches have different lengths because each one is padded only to its own longest sentence, which means the iterator is working as expected.

* The batch layout below tells us how to set up the process_functions for Ignite's `Engine`.
* batch[0][i]: label of sentence i
* batch[1][:, i]: token tensor of sentence i
* Printing the batches below shows that each batch has a different sequence length.

In [28]:
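batch = next(iter(train_iterator))
print('batch[0][0] : ', batch[0][0])
# Note: `[0] != 1` compares a list with an int and evaluates to True, so this indexes
# batch[1] with True and prints the whole (L, N) tensor with an extra leading dimension.
# Use batch[1][:, 0] to get the token indices of the first example only.
print('batch[1][0] : ', batch[1][[0] != 1])

lengths = []
for i, batch in enumerate(train_iterator):
    x = batch[1]
    lengths.append(x.shape[0])   # padded sequence length L of this batch
    if i == 10:                  # stops after index 10, so 11 lengths are collected
        break

print ('Lengths of first 10 batches : ', lengths)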

batch[0][0] :  tensor(0)
batch[1][0] :  tensor([[[731,  35, 331,  ...,  12,  12,  12],
         [  0,   8,   0,  ..., 239, 184,  33],
         [  9, 240,  50,  ...,   8,  35, 131],
         ...,
         [  3,   3,   3,  ...,   3,   3,   3],
         [  3,   3,   3,  ...,   3,   3,   3],
         [  3,   3,   3,  ...,   3,   3,   3]]])
Lengths of first 10 batches :  [665, 628, 1182, 369, 695, 816, 943, 570, 343, 806, 469]


Across the batches inspected, the sequence lengths all differ.

## TextCNN Model

Here is the replication of the model. Its operations are:
* **Embedding**: Embeds a batch of text of shape (N, L) to (N, L, D), where N is batch size, L is maximum length of the batch, D is the embedding dimension. The result is transposed to (N, D, L) so that the embedding dimension acts as the input channels of the 1D convolutions.

* **Convolutions**: Runs parallel convolutions across the embedded words with kernel sizes of 3, 4, 5 to mimic trigrams, four-grams, five-grams. This results in outputs of shape (N, F, L - k + 1) per convolution, where k is the kernel_size and F is num_filters (100 in this notebook, the same value as D).

* **Activation**: ReLU activation is applied to each convolution operation.

* **Pooling**: Runs parallel maxpooling operations on the activated convolutions with window sizes of L - k + 1, resulting in 1 value per channel i.e. a shape of (N, F, 1) per pooling.

* **Concat**: The pooling outputs are squeezed and concatenated to result in a shape of (N, 3F). This is a single embedding for a sentence.

* **Dropout**: Dropout is applied to the embedded sentence.

* **Fully Connected**: The dropout output is passed through a fully connected layer of shape (3F, 1) to give a single output for each example in the batch. sigmoid is applied to the output of this layer.

* **load_embeddings**: This is a method defined for TextCNN to load embeddings based on user input. There are 3 modes: rand, which results in randomly initialized weights; static, which results in frozen pretrained weights; and nonstatic, which results in trainable pretrained weights.


Let's note that this model works for variable text lengths! The idea is to embed the words of a sentence, then use convolutions, maxpooling and concatenation to embed the sentence as a single vector! This single vector is passed through a fully connected layer with sigmoid to output a single value. This value can be interpreted as the probability that a sentence is positive (closer to 1) or negative (closer to 0).

The minimum length of text expected by the model is the size of the largest kernel of the model (5 here), because every convolution, including the widest one, needs at least one full window.

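* Embedding: turns input of shape (N, L) into (N, L, D). N: batch size, L: maximum length within the batch, D: embedding dimension (100)
  * The vocab used here gives each word a 100-dimensional vector, so the embedding dimension is 100
* Convolution: parallel convolutions with kernel sizes 3, 4 and 5 run over the embedded words, mimicking trigrams, four-grams and five-grams. Each convolution outputs (N, F, L-k+1), where k is the kernel size and F is num_filters (100 here, equal to D)
* Activation: ReLU
* Pooling: max pooling with window size L-k+1 over each convolution output, giving (N, F, 1) per pooling
* Concat: produces shape (N, 3F), a single embedding for each sentence
* FC: passes through a fully connected layer of shape (3F, 1) to give a single output; sigmoid turns it into a probability
* load_embeddings: a method that loads embeddings according to user input; there are 3 modes
  * randomly initialized weights
  * frozen pretrained weights
  * trainable pretrained weights

This model works on variable-length text. The minimum text length it accepts is the largest kernel size, since every convolution needs at least one full window.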
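
The shape bookkeeping above can be checked with a short sketch. All sizes here are hypothetical (a batch of 2 sentences padded to length 7, a vocabulary of 50 indices), and the layers are freshly initialised rather than taken from `TextCNN`; only the shapes matter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, L, D = 2, 7, 100            # hypothetical batch size, padded length, embedding dim
num_filters = 100              # output channels per convolution
kernel_sizes = (3, 4, 5)

tokens = torch.randint(0, 50, (N, L))              # fake token indices, vocabulary of 50
x = nn.Embedding(50, D)(tokens).transpose(1, 2)    # (N, L, D) -> (N, D, L)

pooled = []
for k in kernel_sizes:
    c = F.relu(nn.Conv1d(D, num_filters, kernel_size=k)(x))
    assert c.shape == (N, num_filters, L - k + 1)  # the convolution shortens the sequence
    # Max over the whole remaining length leaves one value per filter
    pooled.append(F.max_pool1d(c, c.size(-1)).squeeze(-1))   # (N, num_filters)

sentence_vector = torch.cat(pooled, dim=1)         # (N, 3 * num_filters)
print(sentence_vector.shape)
```

Because the pooling window always spans whatever length the convolution leaves, the final vector has the same size for any L that is at least as long as the widest kernel.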

In [29]:
class TextCNN(nn.Module):
    def __init__(
        self,
        vocab_size,
        embedding_dim, 
        kernel_sizes, 
        num_filters, 
        num_classes, d_prob, mode):
        super(TextCNN, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.kernel_sizes = kernel_sizes
        self.num_filters = num_filters
        self.num_classes = num_classes
        self.d_prob = d_prob
        self.mode = mode
        self.embedding = nn.Embedding(
            vocab_size, embedding_dim, padding_idx=0)   # 1. (batch size 8, max length in batch) -> (8, max length in batch, embedding dim 100)
        self.load_embeddings()
        # 2. each convolution maps (8, embedding dim 100, L) -> (8, num_filters, L-k+1)
        self.conv = nn.ModuleList([nn.Conv1d(in_channels=embedding_dim,
                                             out_channels=num_filters,
                                             kernel_size=k, stride=1) for k in kernel_sizes])
        self.dropout = nn.Dropout(d_prob)
        self.fc = nn.Linear(len(kernel_sizes) * num_filters, num_classes)

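    def forward(self, x):
        # x arrives as (L, N): one column per sentence, as produced by collate_batch
        sequence_length, batch_size = x.shape
        x = self.embedding(x.T).transpose(1, 2)   # (N, L) -> (N, L, D) -> (N, D, L)
        x = [F.relu(conv(x)) for conv in self.conv]   # one (N, num_filters, L-k+1) tensor per kernel size 3, 4, 5
        x = [F.max_pool1d(c, c.size(-1)).squeeze(dim=-1) for c in x]  # 3. per kernel: (N, num_filters, 1) squeezed to (N, num_filters)
        x = torch.cat(x, dim=1)  # (N, 3 * num_filters), i.e. (8, 300) here
        x = self.fc(self.dropout(x))  # one value per example in the batch
        return torch.sigmoid(x).squeeze()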

    def load_embeddings(self):
        if 'static' in self.mode:
            self.embedding.weight.data.copy_(vocab.vectors)
            # requires_grad must be set on the Parameter itself; setting it on
            # `.weight.data` does not change whether the embedding is trained
            if 'non' not in self.mode:
                self.embedding.weight.requires_grad = False
                print('Loaded pretrained embeddings, weights are not trainable.')
            else:
                self.embedding.weight.requires_grad = True
                print('Loaded pretrained embeddings, weights are trainable.')
        elif self.mode == 'rand':
            print('Randomly initialized embeddings are used.')
        else:
            raise ValueError('Unexpected value of mode. Please choose from static, nonstatic, rand.')
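
As a side note on the three modes, the sketch below shows one way to get the same behaviours with `nn.Embedding.from_pretrained`, whose `freeze` argument controls `requires_grad` on the weight Parameter. The random `pretrained` tensor is a stand-in for `vocab.vectors`; this is an alternative illustration, not the method used in the class above.

```python
import torch
import torch.nn as nn

pretrained = torch.randn(6, 4)   # stand-in for vocab.vectors (6 words, 4 dimensions)

static = nn.Embedding.from_pretrained(pretrained, freeze=True)       # frozen pretrained weights
nonstatic = nn.Embedding.from_pretrained(pretrained, freeze=False)   # trainable pretrained weights
rand = nn.Embedding(6, 4)                                            # randomly initialised, trainable

for name, emb in [("static", static), ("nonstatic", nonstatic), ("rand", rand)]:
    # Whether a layer is trained is decided by requires_grad on the Parameter itself
    print(name, "trainable:", emb.weight.requires_grad)
```

Checking `model.embedding.weight.requires_grad` after constructing the model is a quick way to confirm that the chosen mode actually took effect.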

## Creating Model, Optimizer and Loss

Below we create an instance of the TextCNN model and load embeddings in **static** mode. The model is placed on a device, and then a Binary Cross Entropy loss function and an Adam optimizer are set up.

In [30]:
vocab.vectors.shape

torch.Size([2004, 100])

In [31]:
vocab_size, embedding_dim = vocab.vectors.shape

model = TextCNN(vocab_size=vocab_size,
                embedding_dim=embedding_dim,
                kernel_sizes=[3, 4, 5],
                num_filters=100,
                num_classes=1, 
                d_prob=0.5,
                mode='static')
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-3)
criterion = nn.BCELoss()

Loaded pretrained embeddings, weights are not trainable.


A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the A100-SXM4-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/



## Training and Evaluating using Ignite

### Trainer Engine - process_function

Ignite's Engine allows the user to define a process_function to process a given batch; it is applied to every batch of the dataset. This is a general class that can be used to both train and validate models! A process_function has two parameters: engine and batch.


Let's walk through what the function of the trainer does:

* Sets model in train mode. 
* Sets the gradients of the optimizer to zero.
* Generate x and y from batch.
* Performs a forward pass to calculate y_pred using model and x.
* Calculates loss using y_pred and y.
* Performs a backward pass using loss to calculate gradients for the model parameters.
* model parameters are optimized using gradients and optimizer.
* Returns scalar loss. 

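Below is a single operation during the training process. This process_function will be attached to the training engine.

With the Ignite engine, the user defines a process_function that handles a given batch. You only define the function for a single batch, and the engine applies it to every batch (so you do not write the batch and epoch loops yourself). Functions defined this way are used for both training and validation. The function takes the `engine` and `batch` parameters.

The function defined below is attached to the training engine.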

In [32]:
def process_function(engine, batch):
    model.train()
    optimizer.zero_grad()
    y, x = batch
    x = x.to(device)
    y = y.to(device)
    y_pred = model(x)
    loss = criterion(y_pred, y.float())
    loss.backward()
    optimizer.step()
    return loss.item()

### Evaluator Engine - process_function

Similar to the training process function, we setup a function to evaluate a single batch. Here is what the eval_function does:

* Sets model in eval mode.
* Generates x and y from batch.
* With torch.no_grad(), no gradients are calculated for any succeeding steps.
* Performs a forward pass on the model to calculate y_pred based on model and x.
* Returns y_pred and y.

Ignite suggests attaching metrics to evaluators and not trainers, because the model parameters change constantly during training and it is best to compute metrics on a model that is not changing. This matters because the training and evaluation functions return different things. Training returns a single scalar loss. Evaluating returns y_pred and y, as that output is used to calculate metrics per batch for the entire dataset.

All metrics in Ignite require y_pred and y as outputs of the function attached to the Engine. 

Metrics are attached to the evaluators. The trainer returns the loss, while the evaluation function returns y_pred and y, which are used to compute the metrics batch by batch.

In [33]:
def eval_function(engine, batch):
    model.eval()
    with torch.no_grad():
        y, x = batch
        y = y.to(device)
        x = x.to(device)
        y = y.float()
        y_pred = model(x)
        return y_pred, y

### Instantiating Training and Evaluating Engines

Below we create 3 engines, a trainer, a training evaluator and a validation evaluator. You'll notice that train_evaluator and validation_evaluator use the same function, we'll see later why this was done! 

Using the functions above, we create instances of the training and evaluating engines.
Note that train_evaluator and validation_evaluator use the same function; we will see why later.

In [34]:
trainer = Engine(process_function)
train_evaluator = Engine(eval_function)
validation_evaluator = Engine(eval_function)

### Metrics - RunningAverage, Accuracy and Loss

To start, we'll attach a metric of Running Average to track a running average of the scalar loss output for each batch. 

To start, attach a RunningAverage metric to the trainer to track the running average of the scalar loss output of each batch.
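
To show what a running average of the loss means, here is a plain-Python sketch of an exponentially weighted average. The update rule and the default smoothing factor `alpha=0.98` are assumptions about the usual form of such a metric; `running_average` is a hypothetical helper, not Ignite's code.

```python
def running_average(values, alpha=0.98):
    """Exponentially weighted running average (hypothetical helper).

    alpha close to 1 means the average reacts slowly to new values.
    """
    avg = None
    history = []
    for v in values:
        # The first value initialises the average; later values are blended in
        avg = v if avg is None else alpha * avg + (1 - alpha) * v
        history.append(avg)
    return history

print(running_average([4.0, 2.0, 2.0], alpha=0.5))
```

With a large `alpha`, a single noisy batch barely moves the displayed loss, which makes the progress bar easier to read than the raw per-batch value.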

In [35]:
RunningAverage(output_transform=lambda x: x).attach(trainer, 'loss')

Now there are two metrics that we want to use for evaluation - accuracy and loss. This is a binary problem, so for Loss we can simply pass the Binary Cross Entropy function as the loss_function. 

For Accuracy, Ignite requires y_pred and y to be comprised of 0's and 1's only. Since our model outputs from a sigmoid layer, values are between 0 and 1. We'll need to write a function that transforms `engine.state.output` which is comprised of y_pred and y. 

Below `thresholded_output_transform` does just that, it rounds y_pred to convert y_pred to 0's and 1's, and then returns rounded y_pred and y. This function is the output_transform function used to transform the `engine.state.output` to achieve `Accuracy`'s desired purpose.

Now, we attach `Loss` and `Accuracy` (with `thresholded_output_transform`) to train_evaluator and validation_evaluator. 

To attach a metric to engine, the following format is used:
* `Metric(output_transform=output_transform, ...).attach(engine, 'metric_name')`

----

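* Two metrics are used for evaluation: accuracy and loss. Because this is binary classification, BCE is used for the loss.
* For accuracy, Ignite's metric requires y_pred and y to consist only of 0s and 1s. The model defined above ends in a sigmoid layer, so it outputs probabilities between 0 and 1, and we need a function that transforms engine.state.output.
* Loss and accuracy are then attached to the evaluators using the syntax below (the loss does not need the transform).
* `Metric(output_transform=output_transform, ...).attach(engine, 'metric_name')`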
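
The thresholding step can be illustrated without any framework. This plain-Python sketch turns probabilities into 0/1 predictions and computes accuracy; `thresholded_accuracy` is a hypothetical helper. It uses `p > 0.5` so that exactly 0.5 maps to 0, which matches `torch.round` rounding halves to the nearest even integer.

```python
def thresholded_accuracy(probabilities, labels):
    """Turn sigmoid outputs into 0/1 predictions, then compare with the labels."""
    predictions = [1 if p > 0.5 else 0 for p in probabilities]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return predictions, correct / len(labels)

preds, acc = thresholded_accuracy([0.2, 0.7, 0.9, 0.4], [0, 1, 0, 0])
print(preds, acc)
```

The loss metric, in contrast, consumes the raw probabilities, since binary cross entropy needs the confidence of each prediction and not only its rounded class.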

In [36]:
def thresholded_output_transform(output):
    y_pred, y = output
    y_pred = torch.round(y_pred)  # round to 0 or 1 (threshold at 0.5)
    return y_pred, y

```python
from ignite.metrics import Accuracy, Loss
```

In [37]:
# The trailing strings 'accuracy' and 'bce' are arbitrary names chosen by the user
Accuracy(output_transform=thresholded_output_transform).attach(train_evaluator, 'accuracy')
Loss(criterion).attach(train_evaluator, 'bce')  # the loss does not need the 0/1 conversion

In [38]:
Accuracy(output_transform=thresholded_output_transform).attach(validation_evaluator, 'accuracy')
Loss(criterion).attach(validation_evaluator, 'bce')

### Progress Bar

Next we create an instance of Ignite's progress bar, attach it to the trainer and pass it a key of `engine.state.metrics` to track. In this case, the progress bar will be tracking `engine.state.metrics['loss']`.

Attach Ignite's progress bar to the trainer and pass it a metrics key. The progress bar then tracks `engine.state.metrics['loss']`.

In [39]:
pbar = ProgressBar(persist=True, bar_format="")
pbar.attach(trainer, ['loss'])

### EarlyStopping - Tracking Validation Loss

Now we'll set up an EarlyStopping handler for this training process. EarlyStopping requires a score_function that lets the user define the criterion for stopping training. In this case, if the loss on the validation set does not decrease for 5 epochs, the training process will stop early.


----

For early stopping, the score_function passes in the criterion for stopping training. With the code below, training stops if the validation loss (bce) does not decrease for 5 epochs.
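
The patience logic can be sketched in plain Python. This is a simplified model of early stopping under the assumption that a higher score counts as an improvement and that the counter resets on every improvement; `epochs_until_stop` is a hypothetical helper, not Ignite's handler.

```python
def epochs_until_stop(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best_score, counter = None, 0
    for epoch, loss in enumerate(val_losses, start=1):
        score = -loss                       # negate: lower loss means higher score
        if best_score is None or score > best_score:
            best_score, counter = score, 0  # improvement: remember it and reset the counter
        else:
            counter += 1                    # no improvement this epoch
            if counter >= patience:
                return epoch
    return None

print(epochs_until_stop([1.0, 0.9, 0.95, 0.96, 0.97], patience=2))
print(epochs_until_stop([1.0, 0.9, 0.8], patience=2))
```

Negating the loss is what lets a "higher is better" rule work with a quantity that should go down.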

In [40]:
def score_function(engine):
    val_loss = engine.state.metrics['bce']
    return -val_loss   # lower loss is better, so negate it: a loss going 3 -> 2 becomes a score going -3 -> -2, which counts as an improvement

handler = EarlyStopping(patience=5, score_function=score_function, trainer=trainer)
validation_evaluator.add_event_handler(Events.COMPLETED, handler)

<ignite.engine.events.RemovableEventHandle at 0x7f2f83f83820>

### Attaching Custom Functions to Engine at specific Events

Below you'll see ways to define your own custom functions and attach them to various `Events` of the training process.

The two functions below do similar work: each prints the results of an evaluator run on a dataset. One does so for the training evaluator and dataset, the other for the validation ones. The other difference is how the functions are attached to the trainer engine.

The first method involves using a decorator, the syntax is simple - `@` `trainer.on(Events.EPOCH_COMPLETED)`, means that the decorated function will be attached to the trainer and called at the end of each epoch. 

The second method involves using the add_event_handler method of trainer - `trainer.add_event_handler(Events.EPOCH_COMPLETED, custom_function)`. This achieves the same result as the above. 

----

[Summary]
Attaching user functions to the Engine at specific events: we define custom functions and attach them to various events of the training process. The two functions below do similar work and print the results of an evaluator run on a dataset. One runs the train evaluator and the other runs the validation evaluator.

* Method 1: use the '@' decorator; trainer.on(Events.EPOCH_COMPLETED) is called at the end of each epoch
* Method 2: use `trainer.add_event_handler(Events.EPOCH_COMPLETED, custom_function)`; same result
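
Why the two attachment styles are equivalent can be seen in a small sketch: a decorator factory that simply calls the registration method. `ToyEvents` and the string event name are hypothetical and only illustrate the registration pattern, not Ignite's internals.

```python
class ToyEvents:
    """Minimal handler registry (hypothetical); illustrates registration only."""
    def __init__(self):
        self.handlers = {}

    def add_event_handler(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)
        return handler

    def on(self, event):
        # Returning a decorator lets `@registry.on(event)` register the function below it
        def decorator(handler):
            return self.add_event_handler(event, handler)
        return decorator

    def fire(self, event):
        return [handler() for handler in self.handlers.get(event, [])]


registry = ToyEvents()

@registry.on("EPOCH_COMPLETED")            # method 1: decorator
def log_training():
    return "training results"

def log_validation():
    return "validation results"

registry.add_event_handler("EPOCH_COMPLETED", log_validation)   # method 2: explicit call

print(registry.fire("EPOCH_COMPLETED"))    # handlers run in registration order
```

The decorator form keeps the registration next to the function definition, while the explicit call is handy when the function is defined elsewhere or attached conditionally.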

In [41]:
@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(engine):
    train_evaluator.run(train_iterator)
    metrics = train_evaluator.state.metrics
    avg_accuracy = metrics['accuracy']
    avg_bce = metrics['bce']
    pbar.log_message(
        "Training Results - Epoch: {}  Avg accuracy: {:.2f} Avg loss: {:.2f}"
        .format(engine.state.epoch, avg_accuracy, avg_bce))
    
def log_validation_results(engine):
    validation_evaluator.run(valid_iterator)
    metrics = validation_evaluator.state.metrics
    avg_accuracy = metrics['accuracy']
    avg_bce = metrics['bce']
    pbar.log_message(
        "Validation Results - Epoch: {}  Avg accuracy: {:.2f} Avg loss: {:.2f}"
        .format(engine.state.epoch, avg_accuracy, avg_bce))
    pbar.n = pbar.last_print_n = 0

trainer.add_event_handler(Events.EPOCH_COMPLETED, log_validation_results)

<ignite.engine.events.RemovableEventHandle at 0x7f2f803914f0>

### ModelCheckpoint

Lastly, we want to checkpoint this model. It's important to do so, as training processes can be time consuming and if for some reason something goes wrong during training, a model checkpoint can be helpful to restart training from the point of failure.

Below we'll use Ignite's `ModelCheckpoint` handler to checkpoint models at the end of each epoch. 

Checkpointing: training can take a long time, and a checkpoint lets you restart if it is interrupted. The code below checkpoints the model at the end of each epoch.

In [42]:
checkpointer = ModelCheckpoint('models_textcnn', 'textcnn', n_saved=2, create_dir=True, save_as_state_dict=True)
trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpointer, {'textcnn': model})

<ignite.engine.events.RemovableEventHandle at 0x7f2f83ede280>

### Run Engine

Next, we'll run the trainer for 20 epochs and monitor results. Below we can see that the progress bar prints the loss per iteration, and prints the results of training and validation as we specified in our custom functions.

In [43]:
# !export TORCH_CUDA_ARCH_LIST=8.0

In [44]:
trainer.run(train_iterator, max_epochs=20)

ERROR:ignite.engine.engine.Engine:Current run is terminating due to exception: CUDA error: no kernel image is available for execution on the device
ERROR:ignite.engine.engine.Engine:Engine run is terminating due to exception: CUDA error: no kernel image is available for execution on the device


RuntimeError: CUDA error: no kernel image is available for execution on the device

In [45]:
torch.__version__

'1.8.1+cu102'
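
The error above, together with the earlier warning that the install supports `sm_37 sm_50 sm_60 sm_70` while the A100 reports `sm_80`, suggests the PyTorch build was not compiled for this GPU. A small diagnostic sketch, assuming `torch.cuda.get_arch_list()` is available in the installed version; `gpu_supported_by_build` is a hypothetical helper, and the membership test is a simplification that ignores forward-compatible PTX.

```python
import torch

def gpu_supported_by_build(device_index=0):
    """Return True/False for a visible CUDA device, or None when CUDA is unavailable."""
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(device_index)
    device_arch = f"sm_{major}{minor}"          # e.g. sm_80 for an A100
    built_for = torch.cuda.get_arch_list()      # architectures compiled into this build
    print("torch", torch.__version__, "| CUDA build", torch.version.cuda)
    print("device:", device_arch, "| build supports:", built_for)
    # Simplification: an exact match is required here
    return device_arch in built_for

supported = gpu_supported_by_build()
print(supported)
```

If this reports `False`, the likely remedy is a PyTorch build compiled against a newer CUDA toolkit (for example a CUDA 11 build in place of `cu102`), installed from the official PyTorch instructions linked in the warning.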

In [46]:
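# Note: the PyTorch package on PyPI is named `torch`; `pytorch` is a placeholder package, so this install fails
!pip install pytorch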

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting pytorch
  Downloading pytorch-1.0.2.tar.gz (689 bytes)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pytorch
  Building wheel for pytorch (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /opt/conda/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-na2v2ml_/pytorch_31350aa890a34774b922c3460b955fde/setup.py'"'"'; __file__='"'"'/tmp/pip-install-na2v2ml_/pytorch_31350aa890a34774b922c3460b955fde/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-ajty

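That's it! Once `trainer.run` completes (in this environment that first requires a PyTorch build that supports the GPU's `sm_80` capability), we have trained and evaluated a Convolutional Neural Network for text classification.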