<a href="https://colab.research.google.com/github/BeefMILF/QA/blob/master/QA_baseline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# QA Project

## Yes/No Questions

You will work with the BoolQ corpus. Each example consists of four parts: a question that expects a binary (yes / no) answer, a Wikipedia paragraph that contains the answer, the title of the article the paragraph was taken from, and the answer itself (true / false).

The corpus is described in the paper:

Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

https://arxiv.org/abs/1905.10044


The corpus (train/dev split) is available in the project repository: https://github.com/google-research-datasets/boolean-questions

Use the train part of the corpus for training and the dev part for validation and testing.

Each bonus item is worth 1 point.

### Example question:
question: is batman and robin a sequel to batman forever

title: Batman & Robin (film)

answer: true

passage: With the box office success of Batman Forever in June 1995, Warner Bros. immediately commissioned a sequel. They hired director Joel Schumacher and writer Akiva Goldsman to reprise their duties the following August, and decided it was best to fast track production for a June 1997 target release date, which is a break from the usual 3-year gap between films. Schumacher wanted to homage both the broad camp style of the 1960s television series and the work of Dick Sprang. The storyline of Batman & Robin was conceived by Schumacher and Goldsman during pre-production on A Time to Kill. Portions of Mr. Freeze's back-story were based on the Batman: The Animated Series episode ''Heart of Ice'', written by Paul Dini.

## Part 1. Exploratory analysis
1. Compute the share of the yes and no classes in the corpus
2. Estimate the average question length
3. Estimate the average paragraph length
4. Hypothesize which heuristics were used to collect the questions (or find the answer in the paper). Show how these heuristics affected the structure of the corpus.
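Below is a minimal, dependency-free sketch of items 1 to 4. It assumes the BoolQ `.jsonl` layout, with one JSON object per line and the keys `question`, `answer` and `passage`. It also assumes that length is measured in whitespace tokens. The helper `first_word_counts` is a hypothetical aid for item 4: it counts which word opens each question, which makes query-selection heuristics visible.

```python
import json
from collections import Counter
from statistics import mean


def load_jsonl(path):
    """Read a BoolQ-style .jsonl file: one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def corpus_stats(records):
    """Share of yes/no answers and mean question/passage length in whitespace tokens."""
    n = len(records)
    n_yes = sum(1 for r in records if r["answer"])
    return {
        "yes_share": n_yes / n,
        "no_share": (n - n_yes) / n,
        "mean_question_len": mean(len(r["question"].split()) for r in records),
        "mean_passage_len": mean(len(r["passage"].split()) for r in records),
    }


def first_word_counts(records):
    """How often each (lower-cased) first word opens a question."""
    return Counter(r["question"].split()[0].lower() for r in records)
```

For example, call `corpus_stats(load_jsonl("train.jsonl"))` or `first_word_counts(load_jsonl("train.jsonl")).most_common(10)` on the files downloaded in the first code cell.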

## Part 2. Baseline
1. Estimate the accuracy of a very simple baseline: assign every question-passage pair in the dev part the most frequent class from the train part
2. Estimate the accuracy of a slightly more complex baseline: fastText trained on texts made of concatenated questions and passages (' '.join([question, passage]))

Why does fastText handle this task poorly?
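Both baselines can be sketched as follows. The first function implements the majority-class baseline. The second formats one training line for fastText's supervised mode, which by default expects each line to start with a `__label__` prefix. The function names are illustrative.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, dev_labels):
    """Predict the most frequent train label for every dev example and return dev accuracy."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in dev_labels if y == majority) / len(dev_labels)


def to_fasttext_line(question, passage, answer):
    """One training line in fastText's supervised format: '__label__<y> <text>'.

    The text is ' '.join([question, passage]); newlines are replaced because
    fastText treats every line as a separate example.
    """
    text = " ".join([question, passage]).replace("\n", " ")
    return f"__label__{int(answer)} {text}"
```

After writing such lines to `train.txt` and `dev.txt` (hypothetical file names), you would train with `fasttext.train_supervised(input="train.txt")`. Calling `model.test("dev.txt")` returns `(n_examples, precision@1, recall@1)`. With exactly one label per example, precision@1 equals accuracy.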

## Part 3. Using sentence embeddings
1. Build BERT embeddings of the question and of the passage. Train a logistic regression on the concatenated question and passage embeddings and estimate the accuracy of this solution.
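One common way to turn BERT token outputs into a single sentence vector is mean pooling over non-padding tokens; using the `[CLS]` vector is the usual alternative. The choice of mean pooling here is an assumption. The NumPy sketch below pools the question and the passage separately and then concatenates the two vectors into the feature vector for the logistic regression.

```python
import numpy as np


def mean_pool(hidden_states, attention_mask):
    """Sentence embedding as the mean of token vectors, ignoring padding.

    hidden_states: (batch, seq_len, dim), e.g. the last hidden layer of a BERT model;
    attention_mask: (batch, seq_len) with 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                 # avoid division by zero
    return summed / counts


def pair_features(question_emb, passage_emb):
    """Concatenate question and passage embeddings: (batch, d_q) + (batch, d_p) -> (batch, d_q + d_p)."""
    return np.concatenate([question_emb, passage_emb], axis=1)
```

In the notebook the hidden states would come from `AutoModel.from_pretrained(...)` run under `torch.no_grad()` and converted with `.numpy()`. The concatenated features can then be fed to `sklearn.linear_model.LogisticRegression(max_iter=1000).fit(X, y)`.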

[bonus] Use other embedding models, available for example in the 🤗 Transformers library. Which embedding model gives the best results?

[bonus] Propose a data augmentation method and demonstrate its effectiveness.

## Part 3. DrQA-like architecture

Based on the paper: Reading Wikipedia to Answer Open-Domain Questions

Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes

https://arxiv.org/abs/1704.00051

The DrQA architecture was proposed for SQuAD, but it can easily be adapted to the current task. The model consists of the following blocks:
1. Paragraph encoder [paragraph encoding]: an LSTM whose input word vectors consist of:
* the word embedding (w2v or fastText)
* additional indicator features that encode, as one-hot vectors, the word's part of speech, whether it is a named entity, and whether it occurs in the question
* the aligned question embedding, obtained with soft attention between the embeddings of the paragraph words and the embeddings of the question words.

$f_{align}(p_i) = \sum_j a_{i,j} E(q_j)$, where $E(q_j)$ is the embedding of the $j$-th question word. The formula for $a_{i,j}$ is given in the paper.

2. Question encoder [question encoding]: an LSTM that receives the embeddings of the question words. The encoder output is $q = \sum_j b_j q_j$. The formula for $b_j$ is given in the paper.

3. Prediction layer.
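In the DrQA paper, the attention weights are:

- $a_{i,j} = \exp(\alpha(E(p_i)) \cdot \alpha(E(q_j))) / \sum_{j'} \exp(\alpha(E(p_i)) \cdot \alpha(E(q_{j'})))$, where $\alpha$ is a single dense layer with ReLU.
- $b_j = \exp(w \cdot q_j) / \sum_{j'} \exp(w \cdot q_{j'})$, where $w$ is a learned weight vector.

The NumPy sketch below shows only the forward computation, without biases or training. Shapes and parameter names are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aligned_question_embedding(P, Q, W):
    """f_align(p_i) = sum_j a_ij E(q_j).

    P: (n_p, d) paragraph word embeddings E(p_i); Q: (n_q, d) question word embeddings E(q_j);
    W: (d, h) weights of the dense layer alpha(x) = ReLU(x W) (bias omitted).
    """
    alpha_p = np.maximum(P @ W, 0.0)          # (n_p, h)
    alpha_q = np.maximum(Q @ W, 0.0)          # (n_q, h)
    a = softmax(alpha_p @ alpha_q.T, axis=1)  # (n_p, n_q), each row sums to 1
    return a @ Q                              # (n_p, d)


def question_vector(q_hidden, w):
    """q = sum_j b_j q_j with b_j = softmax_j(w . q_j); q_hidden: (n_q, h) question LSTM outputs."""
    b = softmax(q_hidden @ w, axis=0)         # (n_q,)
    return b @ q_hidden                       # (h,)
```

With all-zero weights both attentions become uniform, so the outputs reduce to plain averages. This is a handy sanity check when porting the sketch to PyTorch.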

Suggest how the final prediction layer of the DrQA architecture could be modified, given that the final prediction is a yes / no label, which is easier to predict than the answer span required for SQuAD.
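One possible modification, and only one of several, reuses DrQA's bilinear scoring $p_i W q$. Instead of predicting span start and end positions, the scores become attention weights that pool the paragraph into a single vector. That vector is then classified together with $q$. The feature combination $[p; q; p \circ q; |p - q|]$ is an assumption borrowed from common sentence-pair classifiers.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def yes_no_logits(P_hidden, q, W_s, W_out, b_out):
    """Hypothetical yes/no head on top of the DrQA encoders.

    P_hidden: (n_p, h) paragraph LSTM outputs p_i; q: (h,) question vector;
    W_s: (h, h) bilinear weights, scoring each p_i against q as p_i W_s q;
    W_out: (4h, 2) and b_out: (2,) parameters of the output layer.
    """
    scores = P_hidden @ W_s @ q                  # (n_p,) relevance of each paragraph token
    weights = softmax(scores, axis=0)            # attention over paragraph tokens
    p_vec = weights @ P_hidden                   # (h,) question-aware paragraph summary
    feats = np.concatenate([p_vec, q, p_vec * q, np.abs(p_vec - q)])  # (4h,)
    return feats @ W_out + b_out                 # (2,) logits: index 0 = no, 1 = yes
```

The two logits are trained with ordinary two-class cross-entropy, just like the `CrossEntropyLoss` used for the transformer model in this notebook.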

Evaluate how well this model performs on the task.

[bonus] Replace the input embeddings and all additional features used by the encoders with BERT embeddings. Does this improve the results?

## Part 4. BiDAF-like architecture

Based on the paper: Bidirectional Attention Flow for Machine Comprehension

Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi

https://arxiv.org/abs/1611.01603

The BiDAF architecture was proposed for SQuAD, but it can easily be adapted to the current task. The model consists of the following blocks:
1. The encoder receives two representations of each word: the word embedding and a character-level representation of the word produced by a CNN. The same encoder is used for the question and for the paragraph.
2. Attention layer (a detailed description is given in the paper; see the Attention Flow Layer section).
3. An intermediate layer that receives the contextualized embeddings of the paragraph words. These embeddings consist of three parts: the output of the paragraph encoder, the Query2Context alignment (a single vector), and the Context2Query alignment (a matrix).

4. Prediction layer.
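The BiDAF paper describes the attention flow layer as follows:

- A similarity matrix $S_{tj} = w^\top [h_t; u_j; h_t \circ u_j]$ is computed for every paragraph word $t$ and question word $j$.
- Context2Query attends over question words with a row-wise softmax.
- Query2Context attends over paragraph words using $\max_j S_{tj}$ and tiles the resulting vector over time.
- The layer output is $G_t = [h_t; \tilde u_t; h_t \circ \tilde u_t; h_t \circ \tilde h_t]$, so the paper also includes the two element-wise products.

The NumPy sketch below is forward only, and its shapes are illustrative. It splits $w$ into three parts, so the concatenation $[h; u; h \circ u]$ never has to be materialized.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def similarity_matrix(H, U, w):
    """S[t, j] = w . [h_t; u_j; h_t * u_j]; H: (T, d), U: (J, d), w: (3d,)."""
    d = H.shape[1]
    w_h, w_u, w_hu = w[:d], w[d:2 * d], w[2 * d:]
    return (H @ w_h)[:, None] + (U @ w_u)[None, :] + (H * w_hu) @ U.T  # (T, J)


def attention_flow(H, U, w):
    """Return G: (T, 4d) for paragraph vectors H (T, d) and question vectors U (J, d)."""
    S = similarity_matrix(H, U, w)
    U_tilde = softmax(S, axis=1) @ U                   # Context2Query: (T, d) matrix
    b = softmax(S.max(axis=1), axis=0)                 # weights over paragraph words
    h_tilde = b @ H                                    # Query2Context: a single (d,) vector
    H_tilde = np.tile(h_tilde, (H.shape[0], 1))        # tiled to (T, d)
    return np.concatenate([H, U_tilde, H * U_tilde, H * H_tilde], axis=1)
```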

Suggest how the final prediction layer of the BiDAF architecture could be modified, given that the final prediction is a yes / no label, which is easier to predict than the answer span required for SQuAD.
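One simple option for BiDAF is to drop the start/end span outputs. The modeling-layer outputs are pooled over the real (non-padding) paragraph positions and passed to a linear classifier with two logits. Combining max pooling with mean pooling is an assumption, not something prescribed by the paper.

```python
import numpy as np


def bidaf_yes_no_logits(M, mask, W_out, b_out):
    """Hypothetical yes/no head on top of the BiDAF modeling layer.

    M: (T, h) modeling-layer outputs for the paragraph; mask: (T,) 1 = real token, 0 = padding;
    W_out: (2h, 2) and b_out: (2,) parameters of the output layer.
    """
    real = M[mask.astype(bool)]                                     # drop padded positions
    pooled = np.concatenate([real.max(axis=0), real.mean(axis=0)])  # (2h,) max- and mean-pooling
    return pooled @ W_out + b_out                                   # (2,) logits: 0 = no, 1 = yes
```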

Evaluate how well this model performs on the task.

[bonus] Replace the input embeddings and all additional features used by the encoders with BERT embeddings. Does this improve the results?

Comparison of DrQA and BiDAF:

![](https://www.researchgate.net/profile/Felix_Wu6/publication/321069852/figure/fig1/AS:560800147881984@1510716582560/Schematic-layouts-of-the-BiDAF-left-and-DrQA-right-architectures-We-propose-to.png)

## Part 5. Summary
Write a short summary of the work done. Compare the results of all the models you developed. What helped you complete the work, and what was missing?

In [None]:
!gsutil cp gs://boolq/train.jsonl .
!gsutil cp gs://boolq/dev.jsonl .

In [None]:
pip install transformers

In [None]:
!git clone https://github.com/facebookresearch/fastText.git

In [None]:
cd fastText

In [None]:
pip install .

In [None]:
cd .. 

In [None]:
pip install -U catalyst

In [None]:
!pip install nlpaug python-dotenv

In [None]:
# If your machine doesn't support FP16, comment out the 4 lines below and set FP16_PARAMS = None
!git clone https://github.com/NVIDIA/apex 
!pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex 
!rm -rf ./apex
FP16_PARAMS = dict(opt_level="O1") 

In [None]:
import random 

# Numpy & Pandas 
import numpy as np 
import pandas as pd 

from tqdm.notebook import tqdm

# Matplotlib & Seaborn
import matplotlib.pyplot as plt
import seaborn as sns 
sns.set()

# Sklearn 
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import f1_score, accuracy_score, confusion_matrix

# PyTorch 
import torch
from torch import nn 
from torch.nn import functional as F 
from torch.utils.data import TensorDataset, DataLoader, Dataset, RandomSampler, SequentialSampler, WeightedRandomSampler
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# FastText
import fasttext

# Transformers 
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

# Catalyst
from catalyst.dl import SupervisedRunner
from catalyst.dl.callbacks import AccuracyCallback, OptimizerCallback  # a custom F1ScoreCallback is defined later
from catalyst.dl.callbacks import CheckpointCallback, InferCallback
from catalyst.utils import set_global_seed, prepare_cudnn
from catalyst.dl import utils
from catalyst.dl import Callback, CallbackOrder, Runner
from catalyst.data import BalanceClassSampler


# NLPaug 
import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas
import nlpaug.flow as naf

from nlpaug.util import Action
from nlpaug.util.file.download import DownloadUtil

# GloVe vectors for the word-substitution augmenters below
DownloadUtil.download_glove(model_name='glove.6B', dest_dir='.') # Download the glove.6B vectors (glove.6B.*d.txt)

In [None]:
# Loading data
df_train = pd.read_json("/content/train.jsonl", lines=True, orient='records')
df_dev = pd.read_json("/content/dev.jsonl", lines=True, orient="records")

print(f'Train df size: {df_train.shape}')
print(f'Dev df size: {df_dev.shape}')

Train df size: (9427, 4)
Dev df size: (3270, 4)


In [None]:
model_type = 'glove'
model_path = 'glove.6B.50d.txt'


qaug = naf.Sometimes([
    nac.RandomCharAug(action="delete", aug_char_max=7),
    nac.RandomCharAug(action="insert", aug_char_max=7),
    naw.RandomWordAug(aug_max=5),
    naw.WordEmbsAug(model_type=model_type, model_path=model_path, action="substitute", aug_max=5),
    nac.OcrAug(aug_word_max=5),
    # naw.ContextualWordEmbsAug(model_path='bert-base-uncased', action="insert", aug_max=5)
])

paug = naf.Sometimes([
    nac.RandomCharAug(action="delete", aug_char_max=2),
    naw.RandomWordAug(aug_max=2),
    naw.WordEmbsAug(model_type=model_type, model_path=model_path, action="substitute", aug_max=2),
    nac.OcrAug(aug_word_max=2),
])


onehotter = OneHotEncoder(handle_unknown='ignore').fit(df_train.answer.values.astype(int).reshape(-1, 1))


# Pretrained tokenizer 
pretrained_model_name = 'roberta-base'
# pretrained_model_name = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name) 

max_length = 256


In [None]:
p_aug = naf.Sequential([
                       naw.WordEmbsAug(model_type='glove', model_path='glove.6B.100d.txt', action="insert", aug_p=0.8), 
                       nac.RandomCharAug(action="delete", aug_char_p=0.5)
])


p_aug.augment(df_train.iloc[0].passage)

'cays Persian ( / ˈpɜːrʒən , - ʃən / ) , also known suraj by its endonym Farsi ( فارسی ārs ( fɒːɾˈsiː ) ( listen ) ) , fishhook is one of the Western upgrader Iranian languages within the Indo - Iranian branch of the Indo - European language family . It is primarily spoken in Iran , Afghanistan ( officially known as Dari since 1958 ) , and Tajikistan ( officially knw as Tajiki since the Soviet self - deception era ) , and rieit some other regions which historically dhirubhai were Persianate sieti and considered part of Greater Iran . It is written 5 , 048 . 62 in vratislav the Persian alphabet , a modified variant of the Arabic script , tonghua which itself evolved from the Aramaic pabt .'

In [None]:
def add_wrong_passage(x: pd.DataFrame, wrong_ratio=0.1):
    # Pick ~wrong_ratio of the rows and pair each question with a random passage
    # taken from the remaining rows; such pairs are labelled 0 ("no")
    inds = np.random.choice(2, len(x), p=[1 - wrong_ratio, wrong_ratio]).astype(bool)
    oq = x.loc[inds]
    wp = x.loc[~inds]

    oq = oq.apply(lambda row: [row.question, 0, wp.sample().passage.item()], axis=1, result_type='expand')
    oq = oq.rename(columns=dict(zip(range(3), ['question', 'answer', 'passage'])))
    return oq 

def add_aug_passage(x: pd.DataFrame, ratio=0.1, aug=p_aug):
    # Noise the passages of a random `ratio` share of rows with the given augmenter
    samples = x.sample(frac=ratio).copy()
    samples['passage'] = samples['passage'].apply(aug.augment)
    return samples 

def aug_data(x, n_transforms=2):
    # Append n_transforms sets of mismatched-passage rows and n_transforms sets
    # of noised-passage rows to the original data
    samples1 = [add_wrong_passage(x) for i in range(n_transforms)]
    samples2 = [add_aug_passage(x) for i in range(n_transforms)]

    x = pd.concat([*samples1, *samples2, x])

    return x

def pair_encode(x):  
    # Tokenize (question, passage) pairs, padded/truncated to max_length tokens
    encoded_pair = tokenizer.batch_encode_plus(list(zip(x.question, x.passage)), max_length=max_length, pad_to_max_length=True, truncation_strategy="longest_first")
    return np.array(encoded_pair["input_ids"]), np.array(encoded_pair["attention_mask"]), x.answer.values


def encode_data(data: pd.DataFrame, mode='train'):
    data = data.astype({'answer': int})
    data = data.drop('title', axis=1)
    
    if mode == 'train': 
        data = aug_data(data)

    data = pair_encode(data)
    return [torch.tensor(x, dtype=torch.long) for x in data]


# mode='dev' skips augmentation for the training set; pass mode='train' to apply aug_data
%time train_features_tensors = encode_data(df_train, mode='dev')
%time dev_features_tensors = encode_data(df_dev, mode='dev')

CPU times: user 7.09 s, sys: 111 ms, total: 7.2 s
Wall time: 7.57 s
CPU times: user 2.34 s, sys: 22.8 ms, total: 2.37 s
Wall time: 2.37 s
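`pair_encode` uses `truncation_strategy="longest_first"`. Under this strategy, tokens are removed one at a time from the end of whichever sequence is currently longer, until the pair fits. For RoBERTa, a pair is laid out as `<s> question </s></s> passage </s>`, so 4 of the `max_length` positions go to special tokens. The pure-Python sketch below reproduces that rule on token lists. It is not the tokenizer's own code, and `num_special_tokens=4` is a RoBERTa-specific assumption.

```python
def truncate_longest_first(q_tokens, p_tokens, max_length, num_special_tokens=4):
    """Drop tokens from the end of the currently longer sequence until the pair fits.

    On ties the second sequence (the passage) is shortened first.
    num_special_tokens=4 matches RoBERTa's pair layout; other models differ.
    """
    budget = max_length - num_special_tokens
    if budget < 0:
        raise ValueError("max_length is smaller than the number of special tokens")
    q, p = list(q_tokens), list(p_tokens)
    while len(q) + len(p) > budget:
        if len(q) > len(p):
            q.pop()
        else:
            p.pop()
    return q, p
```

Because BoolQ questions are short, the passage is almost always the sequence that gets cut when a pair exceeds 256 tokens.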


In [None]:
class BQDataset(Dataset): 
    def __init__(self, data):
        super().__init__()

        # self.ids, self.attn_masks, self.tgt, self.onehot = data
        self.ids, self.attn_masks, self.tgt = data

    def __len__(self): 
        return len(self.ids)

    def __getitem__(self, ind): 
        return {'features': self.ids[ind], 
                'attention_mask': self.attn_masks[ind], 
                'targets': self.tgt[ind],
                # 'onehot': self.onehot[ind]
                }

batch_size = 24

train_dataset = BQDataset(train_features_tensors)
dev_dataset = BQDataset(dev_features_tensors)

train_sampler = BalanceClassSampler(train_features_tensors[2], mode="upsampling")
dev_sampler = SequentialSampler(dev_dataset)

train_dataloader = DataLoader(train_dataset, sampler=train_sampler, batch_size=batch_size)
dev_dataloader = DataLoader(dev_dataset, sampler=dev_sampler, batch_size=batch_size)

len(train_dataloader), len(dev_dataloader)

(490, 137)
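The train loader has 490 batches rather than ceil(9427 / 24) = 393. This is because the training set is balanced by upsampling, while the dev loader is not: its 137 batches are ceil(3270 / 24). The sketch below computes the batch count. It assumes that `BalanceClassSampler(mode="upsampling")` resamples every class to the size of the largest one; this is our reading of Catalyst's sampler, and the function name is hypothetical.

```python
import math
from collections import Counter


def upsampled_num_batches(labels, batch_size):
    """Batches per epoch if every class is upsampled to the size of the largest class."""
    counts = Counter(labels)
    epoch_size = len(counts) * max(counts.values())
    return math.ceil(epoch_size / batch_size)
```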

In [None]:
class Model(nn.Module): 
    def __init__(self, pretrained_model_name: str):
        super().__init__() 
        self.model = AutoModelForSequenceClassification.from_pretrained(pretrained_model_name)

    def n_trainable(self): 
        return sum([params.numel() for name, params in self.model.named_parameters() if params.requires_grad])

    def forward(self, features, attention_mask, targets):
        output = self.model(input_ids=features, 
                            attention_mask=attention_mask,
                            labels=targets)
        logits = output[1]
        return logits 


model = Model(pretrained_model_name)
model.to(device)
print(f'Trainable parameters: {model.n_trainable()}')

Trainable parameters: 125237762


In [None]:
class F1ScoreCallback(Callback):
    def __init__(
        self,
        input_key: str = 'targets',
        output_key: str = 'logits',
        activation: str = 'Sigmoid',  # unused: predictions below are taken with argmax over the logits
        prefix: str = "f1_score",
    ):
        self.input_key = input_key
        self.output_key = output_key
        self.prefix = prefix

        super().__init__(CallbackOrder.Metric)

    def on_batch_end(self, state):
        y_true = state.input[self.input_key].detach().cpu().numpy()
        y_preds = state.output[self.output_key].detach().cpu().numpy().argmax(1)

        score = f1_score(y_true, y_preds)
        
        state.batch_metrics.update({self.prefix: score})


loaders = {
    'train': train_dataloader, 
    'valid': dev_dataloader,
    # 'test': test_dataloader,     
} 


set_global_seed(33)                       # reproducibility
prepare_cudnn(deterministic=True)           # reproducibility

LOG_DIR = './logs' 
epochs = 8
lr = 5e-5                     
acum_step = 4
num_cls = 2 
 
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.9)

runner = SupervisedRunner(
    input_key=(
        'features',
        'attention_mask', 
        'targets', 
    ),
    device=device, 
)
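The custom `F1ScoreCallback` computes F1 on each batch separately. Any aggregate of per-batch F1 scores generally differs from F1 computed on the whole loader, and small batches with few positive examples make the per-batch values noisy. Below is a minimal sketch of an epoch-level alternative that accumulates TP/FP/FN counts and applies $F_1 = 2TP / (2TP + FP + FN)$ once. The class is hypothetical. A natural way to wire it into Catalyst would be to reset it when a loader starts and report it when the loader ends; that wiring depends on the Catalyst version and is not shown.

```python
class EpochF1:
    """Accumulate TP/FP/FN over batches and compute binary F1 (positive class = 1) once."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.tp = self.fp = self.fn = 0

    def update(self, y_true, y_pred):
        for t, p in zip(y_true, y_pred):
            self.tp += int(t == 1 and p == 1)
            self.fp += int(t != 1 and p == 1)
            self.fn += int(t == 1 and p != 1)

    def compute(self):
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0
```

For example, batches `([1, 1], [1, 0])` and `([0, 0], [1, 0])` give an epoch F1 of 0.5. The per-batch F1 scores for the same batches are 2/3 and 0, which average to 1/3.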

In [None]:
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    callbacks=[
        AccuracyCallback(num_classes=num_cls),
        F1ScoreCallback(),
        OptimizerCallback(
            accumulation_steps=acum_step, 
            grad_clip_params={'func': 'clip_grad_value_', 'clip_value': 1}
            )
    ],
    main_metric="accuracy01",
    minimize_metric=False,
    # fp16=FP16_PARAMS,
    logdir=LOG_DIR,
    num_epochs=epochs,
    verbose=True, 
    load_best_on_end=True
)

Selected optimization level O0:  Pure FP32 training.

Defaults for this optimization level are:
enabled                : True
opt_level              : O0
cast_model_type        : torch.float32
patch_torch_functions  : False
keep_batchnorm_fp32    : None
master_weights         : False
loss_scale             : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O0
cast_model_type        : torch.float32
patch_torch_functions  : False
keep_batchnorm_fp32    : None
master_weights         : False
loss_scale             : 1.0
1/8 * Epoch (train): 100% 585/585 [17:16<00:00,  1.77s/it, accuracy01=0.773, f1_score=0.828, loss=0.541]
1/8 * Epoch (valid): 100% 137/137 [01:28<00:00,  1.54it/s, accuracy01=0.833, f1_score=0.889, loss=0.435]
[2020-06-16 15:16:34,606] 
1/8 * Epoch 1 (_base): lr=5.000e-05 | momentum=0.9000
1/8 * Epoch 1 (train): accuracy01=0.6772 | f1_score=0.68


To get the last learning rate computed by the scheduler, please use `get_last_lr()`.



2/8 * Epoch (train): 100% 585/585 [17:17<00:00,  1.77s/it, accuracy01=0.909, f1_score=0.909, loss=0.237]
2/8 * Epoch (valid): 100% 137/137 [01:28<00:00,  1.54it/s, accuracy01=0.833, f1_score=0.889, loss=0.569]
[2020-06-16 15:37:21,929] 
2/8 * Epoch 2 (_base): lr=4.050e-05 | momentum=0.9000
2/8 * Epoch 2 (train): accuracy01=0.8208 | f1_score=0.8206 | loss=0.3999
2/8 * Epoch 2 (valid): accuracy01=0.7719 | f1_score=0.8168 | loss=0.5300
3/8 * Epoch (train): 100% 585/585 [17:16<00:00,  1.77s/it, accuracy01=0.955, f1_score=0.960, loss=0.062]
3/8 * Epoch (valid): 100% 137/137 [01:28<00:00,  1.54it/s, accuracy01=0.667, f1_score=0.750, loss=0.831]
[2020-06-16 15:58:07,264] 
3/8 * Epoch 3 (_base): lr=4.500e-05 | momentum=0.9000
3/8 * Epoch 3 (train): accuracy01=0.8967 | f1_score=0.8934 | loss=0.2579
3/8 * Epoch 3 (valid): accuracy01=0.7844 | f1_score=0.8317 | loss=0.6072
4/8 * Epoch (train): 100% 585/585 [17:15<00:00,  1.77s/it, accuracy01=0.955, f1_score=0.960, loss=0.183]
4/8 * Epoch (valid): 

In [None]:
def calc_accuracy(y_preds, y_true):
    acc = accuracy_score(y_true, y_preds)
    print(f'Accuracy: {acc:.3f}')


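def calc_accuracy_per_cls(y_preds, y_true):
    tn, fp, fn, tp = confusion_matrix(y_true, y_preds).ravel()
    # Note: these ratios are the precision of each *predicted* class
    # (tn / (tn + fn) for class 0, tp / (tp + fp) for class 1), not per-class recall
    acc1, acc2 = tn / (tn + fn), tp / (tp + fp)
    print(f'Accuracy class 1: {acc1:.3f}, class 2: {acc2:.3f}')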


def calc_f1_score(y_preds, y_true): 
    f1 = f1_score(y_true, y_preds)
    print(f'F1 score: {f1:.3f}') 


runner.infer(
    model=model,
    loaders={'test': dev_dataloader},
    callbacks=[
        CheckpointCallback(
            resume=f"{LOG_DIR}/checkpoints/best.pth"
        ),
        InferCallback(),
    ],   
    verbose=True
)


y_preds = runner.callbacks[0].predictions['logits'].argmax(1)
y_true = dev_features_tensors[2].cpu().numpy()

Selected optimization level O0:  Pure FP32 training.

Defaults for this optimization level are:
enabled                : True
opt_level              : O0
cast_model_type        : torch.float32
patch_torch_functions  : False
keep_batchnorm_fp32    : None
master_weights         : False
loss_scale             : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O0
cast_model_type        : torch.float32
patch_torch_functions  : False
keep_batchnorm_fp32    : None
master_weights         : False
loss_scale             : 1.0
=> Loading checkpoint ./logs/checkpoints/best.pth
loaded state checkpoint ./logs/checkpoints/best.pth (global epoch 7, epoch 7, stage train)
1/1 * Epoch (test): 100% 137/137 [02:39<00:00,  1.16s/it]


In [None]:
calc_accuracy(y_preds, y_true)
calc_accuracy_per_cls(y_preds, y_true)
calc_f1_score(y_preds, y_true)

Accuracy: 0.797
Accuracy class 1: 0.739, class 2: 0.831
F1 score: 0.838
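`calc_accuracy_per_cls` computes tn / (tn + fn) and tp / (tp + fp). These are the precisions of predicted class 0 and predicted class 1. Per-class accuracy (recall) would instead be tn / (tn + fp) and tp / (tp + fn). The sketch below reports both quantities from the same counts; the function name is illustrative. `sklearn.metrics.classification_report(y_true, y_preds)` prints the same numbers.

```python
import numpy as np


def per_class_precision_recall(y_true, y_pred):
    """Binary labels 0/1. Returns {class: (precision, recall)}; recall is the per-class accuracy."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    result = {}
    for c in (0, 1):
        tp = np.sum((y_pred == c) & (y_true == c))
        predicted = np.sum(y_pred == c)
        actual = np.sum(y_true == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        result[c] = (float(precision), float(recall))
    return result
```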


In [None]:
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    callbacks=[
        AccuracyCallback(num_classes=num_cls),
        F1ScoreCallback(),
        OptimizerCallback(
            accumulation_steps=acum_step, 
            grad_clip_params={'func': 'clip_grad_value_', 'clip_value': 1}
            )
    ],
    main_metric="accuracy01",
    minimize_metric=False,
    fp16=FP16_PARAMS,
    logdir=LOG_DIR,
    num_epochs=epochs,
    verbose=True, 
    load_best_on_end=True
)

Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
1/8 * Epoch (train):  64% 315/490 [18:05<09:59,  3.43s/it, accuracy01=0.542, f1_score=0.667, loss=0.758]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
1/8 * Epoch (train):  85% 415/490 [23:50<04:17,  3.44s/it, accuracy01=0.750, f1_score=0.700, loss=0.422]Gradient overflow.  Skipp


Seems like `optimizer.step()` has been overridden after learning rate scheduler initialization. Please, make sure to call `optimizer.step()` before `lr_scheduler.step()`. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate


To get the last learning rate computed by the scheduler, please use `get_last_lr()`.



2/8 * Epoch (train): 100% 490/490 [28:13<00:00,  3.46s/it, accuracy01=0.750, f1_score=0.769, loss=0.809]
2/8 * Epoch (valid): 100% 137/137 [02:38<00:00,  1.15s/it, accuracy01=1.000, f1_score=1.000, loss=0.197]
[2020-06-16 18:55:44,356] 
2/8 * Epoch 2 (_base): lr=4.050e-05 | momentum=0.9000
2/8 * Epoch 2 (train): accuracy01=0.7903 | f1_score=0.7816 | loss=0.4598
2/8 * Epoch 2 (valid): accuracy01=0.7713 | f1_score=0.8180 | loss=0.5150
3/8 * Epoch (train):  15% 75/490 [04:19<23:49,  3.45s/it, accuracy01=0.833, f1_score=0.875, loss=0.394]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
3/8 * Epoch (train): 100% 490/490 [28:15<00:00,  3.46s/it, accuracy01=0.667, f1_score=0.667, loss=0.637]
3/8 * Epoch (valid): 100% 137/137 [02:39<00:00,  1.17s/it, accuracy01=0.833, f1_score=0.889, loss=0.467]
[2020-06-16 19:28:41,123] 
3/8 * Epoch 3 (_base): lr=4.500e-05 | momentum=0.9000
3/8 * Epoch 3 (train): accuracy01=0.8827 | f1_score=0.8787 | loss=0.2968
3/8 * Epoch 3 (va

In [None]:
calc_accuracy(y_preds, y_true)
calc_accuracy_per_cls(y_preds, y_true)
calc_f1_score(y_preds, y_true)

Accuracy: 0.787
Accuracy class 1: 0.711, class 2: 0.837
F1 score: 0.827
