## Setting Up the Colab Environment


### Libraries Used (Pinned Versions)

*   [torch==1.9.0](https://pytorch.org/)
*   [pytorch-lightning==1.4.2](https://pypi.org/project/pytorch-lightning/1.4.2/)

In [None]:
!pip3 install torch==1.9.0 torchvision torchaudio
!pip3 install pytorch-lightning==1.4.2



# **Number Finding Problem**

## 1. Preprocess and Prepare Dataset
### 1) Dataset

- **Input A** : sequence of numbers
- **Input B** : query number
- **Output** : the first greater number than query in the sequence

```
Input A : 9,9,2,6,0,3,3,4,6
Input B : 3
Output : 4
```


---

### 2) Data Preprocessing

- build vocab
  - build a vocab so that every token is assigned an index
- tokenize
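
The two steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation: `samples` is hypothetical toy data taken from the example in the Dataset section, and `build_vocab` is an illustrative helper name.

```python
# Hypothetical toy data: (sequence tokens, query, answer), copied from the Dataset example.
samples = [
    (["9", "9", "2", "6", "0", "3", "3", "4", "6"], "3", "4"),
]

def build_vocab(tokens, specials=()):
    """Map each distinct token to an index; special symbols come first."""
    ordered = list(specials) + sorted(set(tokens))
    return {tok: idx for idx, tok in enumerate(ordered)}

input_tokens = [tok for seq, _, _ in samples for tok in seq]
output_tokens = [y for _, _, y in samples]

# <pad> is placed first so that it gets index 0
input_vocab = build_vocab(input_tokens, specials=["<pad>"])
output_vocab = build_vocab(output_tokens)

# tokenize: replace every symbol with its vocab index
seq, query, y = samples[0]
seq_ids = [input_vocab[tok] for tok in seq]
print(input_vocab)
print(seq_ids)
```

Sorting the tokens before indexing keeps the mapping reproducible across runs, since the iteration order of a `set` is not guaranteed.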


In [None]:
import os
import numpy as np 

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader, random_split

import pytorch_lightning as pl
from pytorch_lightning import LightningDataModule, LightningModule, Trainer
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint, LearningRateMonitor
from pytorch_lightning.loggers import TensorBoardLogger
from pytorch_lightning.metrics import functional as FM


In [None]:
def load_data(fn):
    data = []
    with open(fn, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.rstrip()

            seq_str, query, y = line.split('\t')
            seqs = seq_str.split(',')
            data.append( (seqs, query, y) )
    return data
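
`load_data` expects one record per line, with three tab-separated fields: a comma-separated sequence, the query, and the answer. The sketch below parses a single record the same way; `parse_line` and the sample `record` are hypothetical names, and the record is built from the example in the Dataset section rather than read from the real `train.txt`.

```python
def parse_line(line):
    """Split one tab-separated record into (sequence tokens, query, answer)."""
    seq_str, query, y = line.rstrip().split("\t")
    return seq_str.split(","), query, y

# Hypothetical record: "<sequence>\t<query>\t<answer>\n"
record = "9,9,2,6,0,3,3,4,6\t3\t4\n"
seqs, query, y = parse_line(record)
print(seqs, query, y)
```

All three fields stay strings at this point; they are converted to indices later through the vocab.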


In [None]:
class NumberDataset(Dataset):
    def __init__(self, fn, input_vocab, output_vocab, max_seq_length):
        self.input_vocab = input_vocab
        self.output_vocab = output_vocab
        self.max_seq_length = max_seq_length 
        
        # load 
        self.data = load_data(fn)

    def __len__(self):
        return len(self.data) 

    def __getitem__(self, idx): 
        seq, q, y = self.data[idx]

        # [ input ]
        seq_ids = [ self.input_vocab[t] for t in seq ]

        # <pad> processing
        pad_id = self.input_vocab['<pad>']
        num_to_fill = self.max_seq_length - len(seq)
        seq_ids = seq_ids + [pad_id]*num_to_fill

        # mask processing (1 for valid, 0 for invalid)
        weights = [1]*len(seq) + [0]*num_to_fill

        # [ query ]
        # query vocab space is same as input vocab space
        q_id = self.input_vocab[q]

        # [ output ]
        y_id = self.output_vocab[y]

        item = [
                    # input
                    np.array(seq_ids),
                    q_id,
                    np.array(weights),

                    # output
                    y_id
               ]
        return item 
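
The padding and mask logic in `__getitem__` can be tried in isolation. The following is a small sketch under assumptions: `pad_and_mask` is a hypothetical helper, the token ids are made up, and the explicit length check is an addition that the dataset class above does not perform.

```python
def pad_and_mask(seq_ids, max_seq_length, pad_id=0):
    """Right-pad seq_ids to max_seq_length and mark real tokens with 1."""
    num_to_fill = max_seq_length - len(seq_ids)
    if num_to_fill < 0:
        # assumption: fail loudly instead of returning an over-long sequence
        raise ValueError("sequence is longer than max_seq_length")
    padded = seq_ids + [pad_id] * num_to_fill
    weights = [1] * len(seq_ids) + [0] * num_to_fill
    return padded, weights

# nine hypothetical token ids, padded to the default max_seq_length of 12
padded, weights = pad_and_mask([6, 6, 2, 5, 1, 3, 3, 4, 5], max_seq_length=12)
print(padded)
print(weights)
```

Every item then has the same length, which is what lets the default `DataLoader` collation stack them into one batch tensor.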

In [None]:
class NumberDataModule(LightningDataModule):
    def __init__(self, 
                 batch_size: int = 32,
                 max_seq_length: int=12):
        super().__init__()
        self.batch_size = batch_size
        self.max_seq_length = max_seq_length 

        data_path = os.path.join('/content/drive/MyDrive/AISoftware/data')
        input_vocab, output_vocab = self.make_vocab(os.path.join(data_path, 'train.txt'))
        self.input_vocab_size = len( input_vocab )
        self.output_vocab_size = len( output_vocab )
        self.padding_idx = input_vocab['<pad>']

        self.all_train_dataset = NumberDataset(os.path.join(data_path, 'train.txt'), input_vocab, output_vocab, max_seq_length)
        self.test_dataset      = NumberDataset(os.path.join(data_path, 'test.txt'), input_vocab, output_vocab, max_seq_length)

        # random split into train / valid sets (the valid set is used for early stopping)
        N = len(self.all_train_dataset)
        tr = int(N*0.8) # 80% for training
        va = N - tr     # remaining 20% for validation
        self.train_dataset, self.valid_dataset = random_split(self.all_train_dataset, [tr, va])

    def make_vocab(self, fn):
        input_tokens = []
        output_tokens = []
        data = load_data(fn)

        for seqs, query, y in data:
            for token in seqs:
                input_tokens.append(token)
            output_tokens.append(y)
        
        input_tokens = list(set(input_tokens))
        output_tokens = list(set(output_tokens)) 

        input_tokens.sort()
        output_tokens.sort()

        # [input vocab]
        # add <pad> symbol to input tokens as a first item
        input_tokens = ['<pad>'] + input_tokens 
        input_vocab = { str(token):index for index, token in enumerate(input_tokens) }

        # [output vocab]
        output_vocab = { str(token):index for index, token in enumerate(output_tokens) }

        return input_vocab, output_vocab

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.valid_dataset, batch_size=self.batch_size)

    def test_dataloader(self):
        return DataLoader(self.test_dataset, batch_size=self.batch_size)

## 2. Build Model

- Model: Bahdanau Attention
  - inherits the attention mechanism from the `Attention` base class


In [None]:
class Attention(nn.Module):
    """
    Attention mechanism base class.
    Inputs:
        query_vector:   (batch_size, query_dim)
        multiple_items: (batch_size, num_of_items, item_dim)
        mask:           (batch_size, num_of_items)
    Returns:
        blendded_vector:  (batch_size, item_dim)
        attention_scores: (batch_size, 1, num_of_items)
    """
    def __init__(self, item_dim, query_dim, attention_dim):
        super(Attention, self).__init__()
        self.item_dim = item_dim            # dim. of multiple item vector
        self.query_dim = query_dim          # dim. of query vector
        self.attention_dim = attention_dim  # dim. of projected item or query vector

    def _calculate_reactivity(self):
        print("This is base class method. It should be implemented in subclass")
        raise NotImplementedError

    def forward(self, query_vector, multiple_items, mask):
        """
        Inputs:
            query_vector:   (batch_size, query_dim)
            multiple_items: (batch_size, num_of_items, item_dim)
            mask: (batch_size, num_of_items)  1 for valid item, 0 for invalid item
        Returns:
            blendded_vector: (batch_size, item_dim)
            attention_scores: (batch_size, 1, num_of_items)
        """
        assert mask is not None, "mask is required"

        # B : batch_size, N : number of multiple items, H : hidden size of item
        B, N, H = multiple_items.size() 
        
        # Four steps
        # 1) [reactivity] compute the reactivity between each item_t and the query_vector (N times)
        # 2) [masking]    penalize invalid items such as <pad>
        # 3) [attention]  turn the reactivity scores into attention scores (= probability form)
        # 4) [blend]      blend the multiple items using the attention scores

        # Step-1) reactivity
        reactivity_scores = self._calculate_reactivity(query_vector, multiple_items)

        # Step-2) masking
        # The mask marks valid positions with 1, so `mask == 0` selects the invalid ones (e.g. <pad>).
        # Filling them with -inf makes their softmax probability exactly 0.
        # detail : check masked_fill() of pytorch : https://pytorch.org/docs/stable/tensors.html
        # reactivity_scores = [B, N]
        # mask              = [B, N]  <-- the shapes must be broadcastable to use masked_fill()
        reactivity_scores = reactivity_scores.masked_fill(mask == 0, -float('inf'))

        # Step-3) attention score
        attention_scores = F.softmax(reactivity_scores, dim=-1) # over the item dimensions  [B, #_of_items]

        # Step-4) blend multiple items
        # merge by weighted sum
        attention_scores = attention_scores.unsqueeze(1) # [B, 1, #_of_items]

        # [B, 1, #_of_items] * [B, #_of_items, dim_of_item] --> [B, 1, dim_of_item]
        blendded_vector = torch.matmul(attention_scores, multiple_items) 
        blendded_vector = blendded_vector.squeeze(1) # [B, dim_of_item] 

        return blendded_vector, attention_scores
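
Steps 2 to 4 of `forward` (masking, softmax, blending) can be followed by hand on a single example. The sketch below uses plain Python lists instead of tensors; the scores, mask, and item vectors are made-up values, `masked_softmax` and `blend` are hypothetical helper names, and the code assumes at least one valid item per example.

```python
import math

def masked_softmax(scores, mask):
    """Softmax over scores where mask == 1; masked positions get probability 0."""
    masked = [s if m == 1 else -math.inf for s, m in zip(scores, mask)]
    top = max(masked)                           # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def blend(weights, items):
    """Weighted sum of item vectors."""
    dim = len(items[0])
    return [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]

scores = [2.0, 1.0, 0.5]                      # hypothetical reactivity scores
mask = [1, 1, 0]                              # the last item is <pad>
items = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # hypothetical item vectors

attn = masked_softmax(scores, mask)
blended = blend(attn, items)
print(attn)
print(blended)
```

Because the padded item receives zero weight, its vector `[5.0, 5.0]` contributes nothing to the blended result, regardless of its raw score.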

In [None]:
class BahdanauAttention(Attention):
    """
    Attention > Additive Attention > Bahdanau approach 
    Inputs:
        query_vector: (batch_size, query_dim)
        multiple_items: (batch_size, num_of_items, item_dim)
    Returns (from the inherited forward()):
        blendded_vector: (batch_size, item_dim)
        attention_scores: (batch_size, 1, num_of_items)
    """
    def __init__(self, item_dim, query_dim, attention_dim):
        super(BahdanauAttention, self).__init__(item_dim, query_dim, attention_dim)
        print("Attention > Additive > Bahdanau")
        # parameter definition
        
        # W projects the query to the attention dimension
        # U projects each item to the attention dimension
        self.W = nn.Linear(self.query_dim, self.attention_dim, bias=False)
        self.U = nn.Linear(self.item_dim, self.attention_dim, bias=False)
        
        # v maps each projected vector to a scalar attention score
        self.v = nn.Parameter(torch.randn(1, attention_dim, dtype=torch.float))

    def _calculate_reactivity(self, query_vector, multiple_items):
        B, N, H = multiple_items.shape  # [B,N,H]

        # linear projection is applied to the last dimension
        query_vector = query_vector.unsqueeze(1)   # [B,Q] -> [B,1,Q]
        projected_q = self.W(query_vector)         # [B,1,D]
        projected_item = self.U(multiple_items)    # [B,N,D]

        # note that broadcasting is performed when adding tensors of different shapes
        added_pp = projected_q + projected_item    # [B,1,D] + [B,N,D] -> [B,N,D]
        tanh_pp = torch.tanh(added_pp)             # F.tanh is deprecated in favor of torch.tanh

        # the product with v gives one scalar score per item
        v_t = self.v.transpose(1,0)                        # [D,1]
        batch_v = v_t.expand(B, self.attention_dim, 1)     # [B,D,1]
        reactivity_scores = torch.bmm(tanh_pp, batch_v)    # [B,N,1]
        reactivity_scores = reactivity_scores.squeeze(-1)  # [B,N]

        return reactivity_scores

Shape summary for `_calculate_reactivity` (B: batch size, N: number of items, Q: query dim., H: item dim., D: attention dim.):

- projected_q : [B,1,Q] -> [B,1,D]
- projected_item : [B,N,H] -> [B,N,D]
- added_pp : [B,1,D] + [B,N,D] -> [B,N,D]
- tanh_pp : [B,N,D]
- batch_v : [B,D,1]
- reactivity_scores : [B,N,D] x [B,D,1] -> [B,N,1] -> squeeze(-1) -> [B,N]

Step by step:

- `unsqueeze(1)` inserts a size-1 dimension at position 1 of query_vector, and W then projects the query to the attention dimension. [B,1,Q] -> [B,1,D]
- U projects each item to the attention dimension. [B,N,H] -> [B,N,D]
- The two projections (projected_q + projected_item) are added and stored in added_pp. [B,1,D] + [B,N,D] -> [B,N,D]
- tanh is applied to the sum and the result is stored in tanh_pp. [B,N,D]
- v is transposed and then expanded along the batch dimension with `expand`. [B,D,1]
- tanh_pp is multiplied by batch_v to obtain reactivity_scores, and `squeeze(-1)` removes the size-1 dimension to give [B,N]. [B,N,D] x [B,D,1] -> [B,N,1] -> squeeze(-1) -> [B,N]
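
The shape walkthrough above can be checked with random tensors. This is an illustrative sketch, not part of the model code: the sizes Q, H, and D follow the hyperparameters used later in the notebook, B is arbitrary, and the `torch.matmul` variant at the end is an alternative shown only for comparison.

```python
import torch
from torch import nn

torch.manual_seed(0)
B, N, H, Q, D = 4, 12, 140, 200, 50     # batch, items, item dim., query dim., attention dim.

W = nn.Linear(Q, D, bias=False)         # projects the query
U = nn.Linear(H, D, bias=False)         # projects each item
v = nn.Parameter(torch.randn(1, D))

query_vector = torch.randn(B, Q)
multiple_items = torch.randn(B, N, H)

projected_q = W(query_vector.unsqueeze(1))             # [B, 1, D]
projected_item = U(multiple_items)                     # [B, N, D]
tanh_pp = torch.tanh(projected_q + projected_item)     # broadcast over N -> [B, N, D]

batch_v = v.transpose(1, 0).expand(B, D, 1)            # [B, D, 1]
scores_bmm = torch.bmm(tanh_pp, batch_v).squeeze(-1)   # [B, N]

# alternative: matmul broadcasts v over the batch, so no expand is needed
scores_matmul = torch.matmul(tanh_pp, v.squeeze(0))    # [B, N]

print(projected_q.shape, projected_item.shape, scores_bmm.shape)
print(torch.allclose(scores_bmm, scores_matmul, atol=1e-5))
```

Both variants compute the same dot product between each `tanh`-activated vector and `v`; `expand` does not copy memory, so the `bmm` form is not more expensive in practice.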





### [nn.Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html)

- **nn.Embedding(num_embeddings, embedding_dim, padding_idx=None)**
  - num_embeddings (int) – size of the dictionary of embeddings, i.e. the number of vocab entries
  - embedding_dim (int) – the size of each embedding vector, i.e. the dimension each token is embedded into
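
A short sketch of how `padding_idx` behaves, using small made-up sizes (11 entries, 4 dimensions) and hypothetical token ids: the row at `padding_idx` is initialized to zeros, so padded positions embed to the zero vector.

```python
import torch
from torch import nn

emb = nn.Embedding(num_embeddings=11, embedding_dim=4, padding_idx=0)

# one sequence of five token ids; the last two are <pad> (index 0)
ids = torch.tensor([[6, 6, 2, 0, 0]])
vectors = emb(ids)                      # [1, 5, 4]

print(vectors.shape)
print(vectors[0, 3])                    # embedding of <pad>
```

Identical ids always map to identical vectors, which is why the first two positions of `vectors` are equal.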


### [nn.LSTM](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html)

- **nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)**
  - input_size – The number of expected features in the input x
  - hidden_size – The number of features in the hidden state h
  - num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
  - bidirectional – If True, becomes a bidirectional LSTM. Default: False
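
The effect of `bidirectional=True` on the output shape can be seen with a small sketch. The sizes here are made up for illustration, and `batch_first=True` is set to match the `[B, T, features]` layout used by the model below.

```python
import torch
from torch import nn

lstm = nn.LSTM(input_size=8, hidden_size=5, num_layers=2,
               bidirectional=True, batch_first=True)

x = torch.randn(3, 12, 8)               # [B, T, input_size]
out, (h_n, c_n) = lstm(x)

print(out.shape)                        # [B, T, 2 * hidden_size]
print(h_n.shape)                        # [num_layers * 2, B, hidden_size]
```

The last dimension of `out` is doubled because the forward and backward hidden states of the top layer are concatenated; stacking more layers changes `h_n` but not the shape of `out`.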


In [None]:
# LightningModule inherits from nn.Module
class Attention_Number_Finder(LightningModule): 
    def __init__(self, 
                 # network setting
                 input_vocab_size,
                 output_vocab_size,
                 emb_dim,      # number embedding dim.
                 enc_dim,      # hidden size of the sequence encoder (per direction)
                 att_dim,      # dim. in attention mechanism 
                 query_dim,    # query embedding dim. 
                 padding_idx,
                 # optimizer setting
                 learning_rate=1e-3):
        super().__init__()
        # stores the constructor arguments in self.hparams.*
        self.save_hyperparameters()  

        # symbol_number_character to vector_number
        self.digit_emb = nn.Embedding(self.hparams.input_vocab_size, 
                                      self.hparams.emb_dim, 
                                      padding_idx=self.hparams.padding_idx)

        self.query_emb = nn.Embedding(self.hparams.input_vocab_size, 
                                      self.hparams.query_dim, 
                                      padding_idx=self.hparams.padding_idx)

        # sequence encoder using RNN
        self.encoder = nn.LSTM(emb_dim, enc_dim, 
                            num_layers=2, 
                            bidirectional=True,
                            batch_first=True
                          )

        dim_of_sequence_enc = enc_dim * 2 # bidirectional LSTM

        # encoder-summarization
        self.dynamic_encoder_with_attention = BahdanauAttention(item_dim=dim_of_sequence_enc,
                                                                query_dim=query_dim,
                                                                attention_dim=att_dim)

        # [decoder]
        self.to_output = nn.Linear(dim_of_sequence_enc, self.hparams.output_vocab_size) # blended vector -> logits over the output vocab

        # loss
        self.criterion = nn.CrossEntropyLoss()  

    def forward(self, seq_ids, q_id, weight):
        # ----------------------- ENCODING -------------------------------#
        # [ Digit Character Embedding ]
        # seq_ids : [B, max_seq_len]
        seq_embs = self.digit_emb(seq_ids.long()) # [B, max_seq_len, emb_dim]

        # [ Sequence of Numbers Encoding ]
        seq_encs, _ = self.encoder(seq_embs) # [B, max_seq_len, enc_dim*2]  (x2 because the LSTM is bidirectional)
        
        # with query (context)
        query_emb = self.query_emb(q_id) # [B, query_dim]

        # dynamic encoding-summarization (blending)
        query = query_emb
        multiple_items = seq_encs
        blendded_vector, attention_scores = self.dynamic_encoder_with_attention(query, multiple_items, mask=weight)
        # blendded_vector : [B, dim_of_sequence_enc]
        # attention_scores : [B, 1, max_seq_len]  (one query per example)

        # ----------------------- DECODING -------------------------------#
        logits = self.to_output(blendded_vector)
        return logits 

    def training_step(self, batch, batch_idx):
        seq_ids, q_id, weights, y_id = batch 
        logits = self(seq_ids, q_id, weights)  # [B, output_vocab_size]
        loss = self.criterion(logits, y_id.long()) 
        self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)

        # all logs are automatically stored for tensorboard
        return loss

    def validation_step(self, batch, batch_idx):
        seq_ids, q_id, weights, y_id = batch 

        logits = self(seq_ids, q_id, weights)  # [B, output_vocab_size]
        loss = self.criterion(logits, y_id.long()) 
        
        ## get predicted result
        prob = F.softmax(logits, dim=-1)
        acc = FM.accuracy(prob, y_id)
        metrics = {'val_acc': acc, 'val_loss': loss}
        self.log_dict(metrics)
        return metrics

    def validation_step_end(self, val_step_outputs):
        val_acc  = val_step_outputs['val_acc'].cpu()
        val_loss = val_step_outputs['val_loss'].cpu()

        self.log('validation_acc',  val_acc, prog_bar=True)
        self.log('validation_loss', val_loss, prog_bar=True)

    def test_step(self, batch, batch_idx):
        seq_ids, q_id, weights, y_id = batch 

        logits = self(seq_ids, q_id, weights)  # [B, output_vocab_size]
        loss = self.criterion(logits, y_id.long()) 
        
        ## get predicted result
        prob = F.softmax(logits, dim=-1)
        acc = FM.accuracy(prob, y_id)
        metrics = {'test_acc': acc, 'test_loss': loss}
        self.log_dict(metrics, on_epoch=True)
        return metrics

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)
        return optimizer

In [None]:
# seed (fix the random seed)
pl.seed_everything(1234)

# after mounting Google Drive, store the data and models under your own Drive folder
path = os.path.join('/content/drive/MyDrive/AISoftware')
model_folder = os.path.join(path, 'model')
if not os.path.exists(model_folder): os.makedirs(model_folder)

# ------------
# args
# ------------
batch_size = 200
emb_dim = 200   # digit (7) --> [....] (emb_dim)
enc_dim = 70    # encoder dimension 
att_dim = 50    # corresponding to D in slides
query_dim = 200 # query dimension
learning_rate = 0.0001


# ------------
# data
# ------------
dm = NumberDataModule(batch_size)
next(iter(dm.train_dataloader())) # quick sanity check: fetch one batch


# ------------
# model
# ------------
model = Attention_Number_Finder(dm.input_vocab_size,
                                dm.output_vocab_size,
                                emb_dim,       # number embedding (digit to vector)
                                enc_dim,       # number sequence embedding 
                                att_dim,       # dim. in attention mechanism 
                                query_dim,     # query embedding dim. 
                                dm.padding_idx,
                                learning_rate)

# ------------
# training
# ------------
checkpoint_callback = ModelCheckpoint(monitor='val_loss', dirpath=model_folder, filename='{epoch:02d}-{val_loss:.2f}')
logger = TensorBoardLogger(model_folder, name='tensorboard')
trainer = Trainer(
    max_epochs=100, gpus = "0", auto_select_gpus=True,
    logger = logger,
    callbacks=[
            checkpoint_callback,
            LearningRateMonitor(logging_interval='step'),
            EarlyStopping(monitor='val_loss', verbose=True, patience=5)
            ],                      
    )
trainer.fit(model, datamodule=dm)

# ------------
# testing
# ------------
result = trainer.test(model, test_dataloaders=dm.test_dataloader())
print(result)

Global seed set to 1234
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
  f"Parsing of the Trainer argument gpus='{s}' (string) will change in the future."
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name                           | Type              | Params
---------------------------------------------------------------------
0 | digit_emb                      | Embedding         | 2.2 K 
1 | query_emb                      | Embedding         | 2.2 K 
2 | encoder                        | LSTM              | 271 K 
3 | dynamic_encoder_with_attention | BahdanauAttention | 17.1 K
4 | to_output                      | Linear            | 1.3 K 
5 | criterion                      | CrossEntropyLoss  | 0     
---------------------------------------------------------------------
293 K     Trainable params
0         Non-trainable params
293 K     Total p

Attention > Additive > Bahdanau


Global seed set to 1234

Metric val_loss improved. New best score: 1.457
Metric val_loss improved by 1.171 >= min_delta = 0.0. New best score: 0.286
Metric val_loss improved by 0.200 >= min_delta = 0.0. New best score: 0.086
Metric val_loss improved by 0.052 >= min_delta = 0.0. New best score: 0.034
Metric val_loss improved by 0.014 >= min_delta = 0.0. New best score: 0.019
Metric val_loss improved by 0.007 >= min_delta = 0.0. New best score: 0.012
Metric val_loss improved by 0.003 >= min_delta = 0.0. New best score: 0.009
Metric val_loss improved by 0.002 >= min_delta = 0.0. New best score: 0.006
Metric val_loss improved by 0.001 >= min_delta = 0.0. New best score: 0.005
Metric val_loss improved by 0.001 >= min_delta = 0.0. New best score: 0.004
Metric val_loss improved by 0.001 >= min_delta = 0.0. New best score: 0.003
Metric val_loss improved by 0.001 >= min_delta = 0.0. New best score: 0.003
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.002
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.002
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.002
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.001
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.001
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.001
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.001
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.001
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.001
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.001
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.001
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.001
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.000


  "`trainer.test(test_dataloaders)` is deprecated in v1.4 and will be removed in v1.6."
  f"DataModule.{name} has already been called, so it will not be called again. "
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: 0it [00:00, ?it/s]

--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': 1.0, 'test_loss': 2.2344779608829413e-06}
--------------------------------------------------------------------------------
[{'test_acc': 1.0, 'test_loss': 2.2344779608829413e-06}]
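
The parameter counts in the model summary can be reproduced by hand. The sketch below is a worked calculation, not output from the notebook: `lstm_params` is a hypothetical helper, and the vocab sizes of 11 (input) and 9 (output) are assumptions inferred from the summary (2.2 K / 200 = 11 and 1.3 K ≈ 141 × 9), since the data files are not shown here.

```python
def lstm_params(input_size, hidden_size, num_layers, bidirectional=True):
    """Parameter count of a stacked LSTM with biases (PyTorch layout)."""
    directions = 2 if bidirectional else 1
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        # 4 gates, each with input weights, recurrent weights, and two bias vectors
        per_direction = 4 * (layer_input * hidden_size
                             + hidden_size * hidden_size
                             + 2 * hidden_size)
        total += directions * per_direction
        layer_input = hidden_size * directions  # next layer sees both directions
    return total

input_vocab_size, output_vocab_size = 11, 9     # assumed, inferred from the summary
emb_dim, enc_dim, att_dim, query_dim = 200, 70, 50, 200

counts = {
    "digit_emb": input_vocab_size * emb_dim,
    "query_emb": input_vocab_size * query_dim,
    "encoder": lstm_params(emb_dim, enc_dim, num_layers=2),
    # W (query_dim x att_dim) + U (2*enc_dim x att_dim) + v (att_dim)
    "dynamic_encoder_with_attention": query_dim * att_dim + 2 * enc_dim * att_dim + att_dim,
    # weight matrix plus bias
    "to_output": 2 * enc_dim * output_vocab_size + output_vocab_size,
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:32s} {n}")
print("total", total)
```

Under these assumptions the per-layer counts round to the values in the summary table (2.2 K, 2.2 K, 271 K, 17.1 K, 1.3 K), and the encoder accounts for most of the roughly 293 K trainable parameters.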
