# BERTScorer

This notebook walks through examples of using BERTScorer, which assigns scores to candidate words for a masked position in a sentence.
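The sketch below shows one plausible scoring scheme. It assumes, and this notebook does not confirm, that a candidate's score is its log-probability under the masked-LM head at the `[MASK]` position. The negative scores in the outputs further down are consistent with that reading. The vocabulary and logits are made-up toy values, and `score_candidates` is a hypothetical helper, not part of `src.models.BERTScorer`.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def score_candidates(logits, vocab, candidates):
    # Hypothetical helper: look up each candidate's log-probability
    # at the single masked position.
    log_probs = log_softmax(np.asarray(logits, dtype=float))
    index = {token: i for i, token in enumerate(vocab)}
    return {c: float(log_probs[index[c]]) for c in candidates}


# Toy vocabulary and logits standing in for the model output at [MASK]
# in "London is the [MASK] of Great Britain".
vocab = ['capital', 'city', 'human', 'think']
logits = [4.0, 3.0, 0.5, 0.1]
scores = score_candidates(logits, vocab, ['capital', 'city', 'think'])
print(max(scores, key=scores.get))
```

Log-probabilities are always at most 0, so a higher (less negative) score means the model considers the candidate more likely.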

In [1]:
%load_ext autoreload

In [2]:
%autoreload 2
import sys
sys.path.append('..')

import numpy as np
from IPython.display import display

from transformers import BertForMaskedLM, BertTokenizer

from src.models.BERTScorer import bert_scorer

Load the model and the tokenizer.

In [3]:
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMaskedLM were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['cls.predictions.decoder.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Initialize the scorer.

In [4]:
scorer = bert_scorer.BERTScorer(model, tokenizer)

Define two sentences and a list of candidates to score.

In [5]:
sentence_wrong = (
    'This sentence is wrong: there are '
    f'two {tokenizer.mask_token} mask tokens: {tokenizer.mask_token}'
)
sentence = f'London is the {tokenizer.mask_token} of Great Britain'
candidates = ['city', 'capital', 'human', 'think', 'asdf']

Test the invalid sentence.

In [6]:
try:
    scorer(sentence_wrong, [])
except ValueError as e:
    print(e)

Where should be exactly one [MASK] token in a sentence.
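The check behind this error can be sketched as follows. This assumes the scorer counts occurrences of the mask token in the raw string; the real implementation may instead count `[MASK]` ids after tokenization. `check_single_mask` is a hypothetical name used only for illustration.

```python
MASK_TOKEN = '[MASK]'  # tokenizer.mask_token for bert-base-uncased


def check_single_mask(sentence: str, mask_token: str = MASK_TOKEN) -> int:
    """Return the character offset of the only mask token, or raise ValueError."""
    count = sentence.count(mask_token)
    if count != 1:
        raise ValueError(
            f'There should be exactly one {mask_token} token in a sentence, got {count}.'
        )
    return sentence.index(mask_token)


print(check_single_mask('London is the [MASK] of Great Britain'))
```

Both zero and two or more masks are rejected, since the scorer needs exactly one position to fill.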


Test the valid sentence with each aggregation function.

In [7]:
functions = {'mean': np.mean, 'sum': np.sum, 'min': np.min, 'max': np.max}
for func_name, func in functions.items():
    print(f'Function: {func_name}')
    display(scorer(sentence, candidates, agg_func=func))

Function: mean


{'city': -7.414884567260742,
 'capital': -8.707372665405273,
 'human': -9.257110595703125,
 'think': -8.56279182434082,
 'asdf': -11.57838773727417}

Function: sum


{'city': -7.414884567260742,
 'capital': -8.707372665405273,
 'human': -9.257110595703125,
 'think': -8.56279182434082,
 'asdf': -23.15677547454834}

Function: min


{'city': -7.414884567260742,
 'capital': -8.707372665405273,
 'human': -9.257110595703125,
 'think': -8.56279182434082,
 'asdf': -17.020278930664062}

Function: max


{'city': -7.414884567260742,
 'capital': -8.707372665405273,
 'human': -9.257110595703125,
 'think': -8.56279182434082,
 'asdf': -6.136496543884277}

Overall, the results for `mean` and `sum` look reasonable, except that the score for `think` is too high. The word does not fit in this position, yet its score (-8.56) is higher than those of `capital` (-8.71) and `human` (-9.26).
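The `asdf` rows show that `agg_func` only matters for candidates spanning several word pieces. Assume that the scorer computes one score per word piece and then reduces these scores with `agg_func`, and that `asdf` is split into exactly two pieces. Under those assumptions, the `min` and `max` outputs give the two per-piece scores, and the `mean` and `sum` rows follow from them:

```python
import numpy as np

# Per-piece scores for 'asdf', read off the `min` and `max` outputs above
# (assumes the tokenizer splits it into exactly two word pieces).
asdf_pieces = np.array([-17.020278930664062, -6.136496543884277])
# A single-piece candidate has one score, so every aggregation returns it unchanged.
city_pieces = np.array([-7.414884567260742])

functions = {'mean': np.mean, 'sum': np.sum, 'min': np.min, 'max': np.max}
for name, func in functions.items():
    print(f'{name}: asdf={func(asdf_pieces):.4f}, city={func(city_pieces):.4f}')
```

The choice matters. `sum` penalizes a candidate in proportion to its number of pieces, while `mean` compares average per-piece scores. `max` lets a nonsense multi-piece string outrank real words: `asdf` (-6.14) beats `city` (-7.41) in the `max` run.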