## Evaluate candidate models with BERTScore for contextual similarity to ground-truth answers

##### Prerequisites

In [None]:
%%capture

!pip install transformers==4.18.0
!pip install pandas==1.4.1
!pip install numpy==1.22.2
!pip install torch==1.8.1

#### Imports 

In [8]:
from transformers import GPT2Tokenizer
from transformers import set_seed
import transformers 
import pandas as pd
import numpy as np
import logging
import torch

##### Setup logging 

In [3]:
logger = logging.getLogger('sagemaker')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

##### Log versions of dependencies 

In [4]:
logger.info(f'[Using transformers version: {transformers.__version__}]')
logger.info(f'[Using torch version: {torch.__version__}]')
logger.info(f'[Using pandas version: {pd.__version__}]')
logger.info(f'[Using numpy version: {np.__version__}]')

[Using transformers version: 4.18.0]
[Using torch version: 1.8.1+cu102]
[Using pandas version: 1.4.1]
[Using numpy version: 1.22.2]


#### Setup essentials 

In [5]:
set_seed(123)
np.random.seed(123)

In [6]:
BOS_TOKEN = '<|startoftext|>'
EOS_TOKEN = '<|endoftext|>'
PAD_TOKEN = '<|pad|>'
MAX_LEN = 512

#### Load custom tokenizer 

In [9]:
# return_tensors is a call-time argument, so it is passed when tokenizing rather than here
custom_tokenizer = GPT2Tokenizer.from_pretrained('../01-tokenize/vocab-custom', 
                                                 bos_token=BOS_TOKEN, 
                                                 eos_token=EOS_TOKEN, 
                                                 pad_token=PAD_TOKEN)
custom_tokenizer.padding_side = 'left'
custom_tokenizer.model_max_length = MAX_LEN
logger.info(f'Custom Tokenizer: {custom_tokenizer}')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Custom Tokenizer: PreTrainedTokenizer(name_or_path='../01-tokenize/vocab-custom', vocab_size=50257, model_max_len=512, is_fast=False, padding_side='left', truncation_side='right', special_tokens={'bos_token': AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': '<|pad|>'})


#### Load out-of-the-box (OOB) tokenizer

In [10]:
# return_tensors is a call-time argument, so it is passed when tokenizing rather than here
oob_tokenizer = GPT2Tokenizer.from_pretrained('gpt2', 
                                              bos_token=BOS_TOKEN, 
                                              eos_token=EOS_TOKEN, 
                                              pad_token=PAD_TOKEN)
oob_tokenizer.padding_side = 'left'
oob_tokenizer.model_max_length = MAX_LEN
logger.info(f'OOB Tokenizer: {oob_tokenizer}')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
OOB Tokenizer: PreTrainedTokenizer(name_or_path='gpt2', vocab_size=50257, model_max_len=512, is_fast=False, padding_side='left', truncation_side='right', special_tokens={'bos_token': AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': '<|pad|>'})


#### Load custom GPT2 model

In [11]:
custom_model = transformers.AutoModelForCausalLM.from_pretrained('../02-finetune/model/custom-finetuned')
_ = custom_model.eval()

#### Load OOB GPT2 model

In [12]:
oob_model = transformers.AutoModelForCausalLM.from_pretrained('../02-finetune/model/oob-finetuned')
_ = oob_model.eval()

#### Load test set 

In [13]:
test_df = pd.read_csv('../01-tokenize/data/faq_test.csv')
test_df

| | question | answer |
|---|---|---|
| 0 | i have a few symptoms like the stomachache, co... | stomach troubles aren't a common symptom of th... |
| 1 | social distancing & business operations during... | q. do you have best practices to share with re... |
| 2 | should i wear a respirator in public? | most often, spread of respiratory viruses from... |
| 3 | set boundaries so caring for the person doesn’... | it’s very important to continue taking care of... |
| 4 | what if my time off is not approved and i don’... | you will be treated just as you would if you d... |
| ... | ... | ... |
| 353 | visit your local election website for vote by ... | in most states, you’ll need to apply by a cert... |
| 354 | i have developed a serology test kit for sars-... | all clinical tests should be validated prior t... |
| 355 | will vodka or other hard alcohols work as disi... | vodka, or other hard alcohols, are not recomme... |
| 356 | try to avoid talking about the virus all the t... | while the virus is probably on everyone’s mind... |


#### Collect predicted responses

In [17]:
for _, row in test_df.iterrows():
    question, ground_truth = row['question'], row['answer']
    print(question)
    # build the prompt in the same format used during training, leaving the answer part empty
    prompt = f'{BOS_TOKEN}question: {question}\nanswer:'
    # tokenize the prompt into input ids
    input_ids = custom_tokenizer(prompt, return_tensors='pt').input_ids
    # generate the response (answer) with sampling
    response = custom_model.generate(input_ids, 
                                     do_sample=True, 
                                     top_k=50, 
                                     max_length=MAX_LEN, 
                                     top_p=0.90, 
                                     temperature=0.2)
    # decode the generated token ids back into text (the decoded text includes the prompt)
    answer = custom_tokenizer.decode(response[0], skip_special_tokens=True)
    print(answer)
    print()
    # stop after the first row; this cell only previews a single prediction
    break

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


i have a few symptoms like the stomachache, congestion, and diarrhea but no fever. is it possible i have the virus?
question: i have a few symptoms like the stomachache, congestion, and diarrhea but no fever. is it possible i have the virus?
answer: the virus that causes covid-19 is thought to spread mainly from person-to-person through respiratory droplets produced when an infected person coughs, sneezes, or talks. this is why it is so important to practice social distancing.

if you have symptoms of covid-19, you should stay home and monitor your health.
if you have been around someone who has covid-19, you should stay home and monitor your symptoms.
if you have been around someone who has covid-19, you should stay home and monitor your health.
if you have been around someone who has covid-19, you should stay home and monitor your health.
if you have been around someone who has covid-19, you should stay home and monitor your health.
if you have been around someone who has covid-19, y
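
The decoded output above repeats the prompt (`question: ...` followed by `answer: ...`), so only the part after the answer marker should be compared with the ground truth. A minimal sketch of that step follows; `extract_answer` is a hypothetical helper that is not part of the original notebook, and it assumes the marker `'\nanswer:'` appears exactly as in the training prompt. The sample string is made up for illustration.

```python
def extract_answer(decoded: str, marker: str = '\nanswer:') -> str:
    """Return the text after the first `marker`, or the whole string if the marker is absent."""
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()


# made-up decoded output in the same shape as the generation above
decoded = 'question: example question?\nanswer: example answer.'
print(extract_answer(decoded))
```

Splitting on the first marker is a simplification: a question that itself contains `'\nanswer:'` would be cut in the wrong place, in which case stripping the known prompt prefix would be the safer choice.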

In [None]:
# uses the custom candidate; swap in oob_tokenizer / oob_model to run the OOB candidate
for text, label in zip(test_df['question'], test_df['answer']):
    # truncate long questions to 256 characters so the prompt leaves room for the answer
    text = text[0:256]
    # create prompt (in compliance with the one used during training)
    prompt = f'{BOS_TOKEN}question: {text}\nanswer:'

    # tokenize the prompt into input ids
    input_ids = custom_tokenizer(prompt, return_tensors='pt').input_ids

    # perform prediction
    sample_outputs = custom_model.generate(input_ids, 
                                           do_sample=True, 
                                           top_k=50, 
                                           max_length=256, 
                                           top_p=0.90, 
                                           temperature=0.2)

    # decode the predicted tokens into text
    pred_text = custom_tokenizer.decode(sample_outputs[0], skip_special_tokens=True)
    print(pred_text)
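
The title names BERTScore as the metric, but the cells above stop at generation. As a rough illustration of what the metric computes, the sketch below applies BERTScore-style greedy matching to token embeddings: every reference token is matched to its most similar candidate token (recall), every candidate token to its most similar reference token (precision), and the two are combined into F1. The embeddings are toy values made up for this example, and `greedy_match_scores` is a hypothetical helper. In practice the embeddings would come from a contextual model such as BERT, for example via the separate `bert-score` package, which is not installed in this notebook.

```python
import numpy as np


def greedy_match_scores(cand_emb, ref_emb):
    """Return (precision, recall, f1) from greedy cosine matching of token embeddings."""
    # L2-normalise each token vector so that dot products are cosine similarities
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = ref @ cand.T  # shape: (n_ref_tokens, n_cand_tokens)
    recall = sim.max(axis=1).mean()     # best candidate match for every reference token
    precision = sim.max(axis=0).mean()  # best reference match for every candidate token
    f1 = 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)


# toy embeddings (made up): 3 candidate tokens and 2 reference tokens in 2 dimensions
cand_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ref_emb = np.array([[1.0, 0.0], [0.0, 2.0]])
precision, recall, f1 = greedy_match_scores(cand_emb, ref_emb)
print(f'P={precision:.3f} R={recall:.3f} F1={f1:.3f}')
```

This sketch omits the optional idf weighting and baseline rescaling of the full metric, and it does not guard against the case where precision and recall sum to zero.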