## Evaluate candidate models with BERTScore for contextual similarity to ground-truth answers

##### Prerequisite 

In [2]:
%%capture

!pip install transformers==4.18.0
!pip install pandas==1.4.1
!pip install numpy==1.22.2
!pip install torch==1.8.1
!pip install evaluate==0.4.0
!pip install bert-score==0.3.12

#### Imports 

In [3]:
from transformers import GPT2Tokenizer
from transformers import set_seed
from evaluate import load
import transformers 
import pandas as pd
import numpy as np
import bert_score
import evaluate
import logging
import torch

##### Setup logging 

In [4]:
logger = logging.getLogger('sagemaker')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

##### Log versions of dependencies 

In [5]:
logger.info(f'[Using transformers version: {transformers.__version__}]')
logger.info(f'[Using bert_score version: {bert_score.__version__}]')
logger.info(f'[Using evaluate version: {evaluate.__version__}]')
logger.info(f'[Using torch version: {torch.__version__}]')
logger.info(f'[Using pandas version: {pd.__version__}]')
logger.info(f'[Using numpy version: {np.__version__}]')

[Using transformers version: 4.18.0]
[Using bert_score version: 0.3.12]
[Using evaluate version: 0.4.0]
[Using torch version: 1.8.1+cu102]
[Using pandas version: 1.4.1]
[Using numpy version: 1.22.2]


#### Setup essentials 

In [6]:
set_seed(123)
np.random.seed(123)
pd.options.display.max_colwidth = None

In [7]:
BOS_TOKEN = '<|startoftext|>'
EOS_TOKEN = '<|endoftext|>'
PAD_TOKEN = '<|pad|>'
MAX_LEN = 512

In [8]:
bertscore = load('bertscore')

#### Load custom tokenizer 

In [9]:
custom_tokenizer = GPT2Tokenizer.from_pretrained('../01-tokenize/vocab-custom', 
                                                 bos_token=BOS_TOKEN, 
                                                 eos_token=EOS_TOKEN, 
                                                 pad_token=PAD_TOKEN, 
                                                 lower=True,
                                                 return_tensors='pt')
custom_tokenizer.padding_side = 'left'
custom_tokenizer.model_max_length = MAX_LEN
logger.info(f'Custom Tokenizer: {custom_tokenizer}')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Custom Tokenizer: PreTrainedTokenizer(name_or_path='../01-tokenize/vocab-custom', vocab_size=50257, model_max_len=512, is_fast=False, padding_side='left', truncation_side='right', special_tokens={'bos_token': AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': '<|pad|>'})


#### Load OOB (out-of-the-box) tokenizer

In [10]:
oob_tokenizer = GPT2Tokenizer.from_pretrained('gpt2', 
                                              bos_token=BOS_TOKEN, 
                                              eos_token=EOS_TOKEN, 
                                              pad_token=PAD_TOKEN, 
                                              lower=True,
                                              return_tensors='pt')
oob_tokenizer.padding_side = 'left'
oob_tokenizer.model_max_length = MAX_LEN
logger.info(f'OOB Tokenizer: {oob_tokenizer}')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
OOB Tokenizer: PreTrainedTokenizer(name_or_path='gpt2', vocab_size=50257, model_max_len=512, is_fast=False, padding_side='left', truncation_side='right', special_tokens={'bos_token': AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': '<|pad|>'})


#### Load custom GPT2 model

In [11]:
custom_model = transformers.AutoModelForCausalLM.from_pretrained('.././02-finetune/model/custom-finetuned')
_ = custom_model.eval()

#### Load OOB GPT2 model

In [12]:
oob_model = transformers.AutoModelForCausalLM.from_pretrained('.././02-finetune/model/oob-finetuned')
_ = oob_model.eval()

#### Load test set 

In [13]:
test_df = pd.read_csv('.././01-tokenize/data/faq_test.csv')
test_df.count()

question    107
answer      107
dtype: int64

#### Collect predicted responses

In [14]:
def predict(question: str, ground_truth: str, tokenizer: GPT2Tokenizer, model: transformers.AutoModelForCausalLM) -> str:
    # build the prompt in the same format used during training, leaving the answer part open
    prompt = f'{BOS_TOKEN}question: {question}\nanswer:'
    # tokenize the prompt into input ids
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids
    # size the generation budget from the ground truth: twice its whitespace-separated word count
    gt_len = len(ground_truth.split())
    # predict response (answer); sampling with top_k=1 always picks the most likely token,
    # so decoding is effectively greedy. length_penalty only applies to beam-based
    # generation, so it has no effect here
    response = model.generate(input_ids, 
                              do_sample=True, 
                              top_k=1, 
                              min_new_tokens=gt_len * 2,
                              max_new_tokens=gt_len * 2, 
                              repetition_penalty=10.0,
                              length_penalty=-0.1,
                              top_p=1.0)
    # decode the full sequence (prompt + generated tokens) into text
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)
    # keep only the text after the last 'answer: ' marker
    answer = response_text.split('answer: ')[-1]
    return answer
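
The prompt layout and the answer extraction in `predict` can be checked without loading a model. The sketch below is a hypothetical variant, not the notebook's code: `build_prompt` and `extract_answer` are illustrative names, and `extract_answer` splits on `'answer:'` and strips whitespace, which also covers the case where the decoded text has no space after the marker.

```python
BOS_TOKEN = '<|startoftext|>'


def build_prompt(question: str) -> str:
    # same layout as the prompt in predict(): question line, then an open 'answer:' slot
    return f'{BOS_TOKEN}question: {question}\nanswer:'


def extract_answer(decoded: str) -> str:
    # keep only the text after the last 'answer:' marker and trim surrounding whitespace
    # (assumption: the marker does not occur inside the generated answer itself)
    return decoded.split('answer:')[-1].strip()


# decoded text as it would look after skip_special_tokens=True (made-up example)
decoded = 'question: what precautions should i take during travel?\nanswer: clean hands frequently.'
print(build_prompt('what precautions should i take during travel?'))
print(extract_answer(decoded))
```

Splitting on the marker without the trailing space is a design choice for robustness; the original `'answer: '` split returns the whole decoded string when the first generated token does not start with a space.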

In [15]:
custom_gpt2_answers = []
oob_gpt2_answers = []

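for _, row in test_df.iterrows():
    # read the columns by name rather than relying on column order
    question, ground_truth = row['question'], row['answer']
    # answer from the model fine-tuned with the custom tokenizer
    answer = predict(question, ground_truth, custom_tokenizer, custom_model)
    custom_gpt2_answers.append(answer)
    # answer from the model fine-tuned with the OOB tokenizer
    answer = predict(question, ground_truth, oob_tokenizer, oob_model)
    oob_gpt2_answers.append(answer)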

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

#### Compute BERTScore for the predictions against ground truth

In [16]:
# reference answers are shared by both models
references = test_df['answer'].to_list()

# per-example BERTScore F1 for each model
bert_score_custom_gpt2 = bertscore.compute(predictions=custom_gpt2_answers, references=references, lang='en')['f1']
bert_score_oob_gpt2 = bertscore.compute(predictions=oob_gpt2_answers, references=references, lang='en')['f1']
# reward: per-example F1 difference; positive means the custom model scored higher
reward = [x1 - x2 for x1, x2 in zip(bert_score_custom_gpt2, bert_score_oob_gpt2)]

test_df['custom_gpt2_answer'] = custom_gpt2_answers
test_df['oob_gpt2_answer'] = oob_gpt2_answers

test_df['bert_score_custom_gpt2'] = bert_score_custom_gpt2
test_df['bert_score_oob_gpt2'] = bert_score_oob_gpt2

test_df['reward'] = reward
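
For intuition about what the `f1` values represent, here is a minimal sketch of the greedy matching step behind BERTScore, using toy 2-d vectors in place of contextual token embeddings. It is an illustration under simplifying assumptions (no IDF weighting, no baseline rescaling), not the `bert_score` implementation; `greedy_match_f1` is a hypothetical helper name.

```python
import numpy as np


def greedy_match_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> tuple:
    # L2-normalise rows so that dot products are cosine similarities
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T  # shape: (candidate tokens, reference tokens)
    precision = sim.max(axis=1).mean()  # best reference match for each candidate token
    recall = sim.max(axis=0).mean()     # best candidate match for each reference token
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# toy "embeddings" (hypothetical values, not real model outputs)
cand = np.array([[1.0, 0.0], [0.0, 1.0]])
ref = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
p, r, f1 = greedy_match_f1(cand, ref)
print(f'precision={p:.4f} recall={r:.4f} f1={f1:.4f}')
```

Every candidate token has an exact match in the reference, so precision is 1, while the middle reference token is only partly covered, which pulls recall and F1 below 1.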

In [17]:
test_df.head()

index,question,answer,custom_gpt2_answer,oob_gpt2_answer,bert_score_custom_gpt2,bert_score_oob_gpt2,reward
0,"i have a few symptoms like the stomachache, congestion, and diarrhea but no fever. is it possible i have the virus?","stomach troubles aren't a common symptom of the coronavirus, but a fever is a key symptom, so it's unlikely that you have it. however, if you have any questions, call ahead to your doctor and make an appointment.",yes! while we are still learning about how covid-2019 affects people’s health more information can be found on cdc website (pdf),yes! you may be able to get covid-19 from eating raw meat or poultry that has been contaminated with sarsaparic acid (sARS) in your food source for at least 20 days after exposure so there are some signs of illness associated if this happens during those 30 day period when people eat foods high on antibiotics such as chicken breast soup instead,0.830807,0.819558,0.011249
1,what if my time off is not approved and i don’t come to work?,you will be treated just as you would if you did not show up for work any other time. employees failing to come to work without approved leave time may be subject to discipline per the applicable collective bargaining agreement or compensation plan.,"you can volunteer at a food bank or other community organization. make sure they have the supplies where their workers are, as well your insurance information so that when someone in an essential business like grocery stores becomes ill it doesn't affect them financially but cannot donate money directly toward those who need help with groceries during this pandemic (or else)","you can contact your employer or the local fire department for assistance with contacting their office regarding a potential employee who may be unable due leave. they will provide information about how long it takes them until an individual has been in place, such as when hours are reduced (e-mailing) onsite/out of state by telephone call; where employees should go out during this period without interruption while working from home but do so at least",0.81722,0.832336,-0.015117
2,where can i find more information about animals and covid-19?,"for more information, check out the following websites: cdc covid-19: pets and other animals cdc: frequently asked questions, covid-19 and animals cdc: interim guidance for public health professionals managing people with covid-19 in home care and isolation who have pets or other animals cdc: healthy pets, healthy people cdc: covid-19 daily life and coping, if you have animals avma: interim recommendations for intake of companion animals from households where humans with covid-19 are present","the centers for disease control (cdc) is constantly updating its website with resources to help people understand this new virus. if you or someone in your family has questions, reach out via email at afnrsa@vdh3sorg/coronavirus—or call 804‑234–7364 –and ask them on facebook messenger! they’ll answer any queries that come their way through these chats; otherwise we encourage anyone who's interested by reading what cdc guidance applies here.","the cdc has a list of animal health organizations that are working on prevention, treatment strategies for people with or without symptoms associatedcov infection. see https://www/cdphs’vegetarianscoronavirus/.",0.798086,0.80666,-0.008574
3,what precautions should i take during travel?,"during travel, everyone should clean hands frequently, cough or sneeze into a bent elbow or tissue, and try to maintain a physical distance of at least one metre from others. travelers should follow the recommendations of the travel authorities regarding policies in the airport and of the airline for the flight.","if you have been in close contact with someone suspected of having covid-19, or are feeling unwell (like coughing), avoid touching your face. this includes phones and other electronic devices that might be used by people who were not well before the outbreak began—something like shaking hands; washing clothes often after going to work while wearing a mask.; covering coughs/stressed mouths when sneezing cannot rubbed on surfaces as an extra precaution against coronavirusorish materialismshould always go hand over mouthpiece!","you can follow the guidance from your healthcare provider if a person is sick with covid-19 and they are traveling to or arriving at an area where there has been no confirmed case of coronavirus. for more information, visit https://www2ndcov/traveling/.",0.823299,0.826669,-0.003371
4,use a contactless payment method if you can.,"to avoid spreading germs during a cash or credit card transaction, look into whether your grocery store accepts any forms of payment that don't require you to touch a keypad or grab a handful of change. for instance, point-of-sale systems feature a type of technology that allows you to simply wave your card over a card reader in order to pay for your goods. some places may also accept mobile payments at the register through services like paypal, apple pay, or google pay.","many stores have apps that allow shoppers to pay with their phones, so try using the app instead of cash or credit cards as an alternative way for shopping carts and food items during this pandemic period!click here","check your bank account or credit card number to see what kind of payments are available for the purchase and processing fee that will be charged on each transaction, including cashier's checks (if any), debit/credit cards with an expiration date in excess days due dates listed above as well coinsurance claims made by customers who have not yet paid their balances but wish they could receive them via e-mail within 30 calendar day after receiving it from us at this time",0.863554,0.825531,0.038023


In [18]:
np.mean(test_df['bert_score_custom_gpt2'])

0.8332844811065174

In [19]:
np.mean(test_df['bert_score_oob_gpt2'])

0.8321203338765653
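
The two means differ only in the third decimal place, so a paired view of the per-question scores may be more informative than the means alone. The sketch below is one possible summary, assuming a percentile bootstrap over questions; `paired_summary` and its parameters are hypothetical names, and the inputs are just the five rows displayed by `test_df.head()`, not the full test set.

```python
import numpy as np


def paired_summary(scores_a, scores_b, n_boot: int = 2000, seed: int = 123) -> dict:
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    diff = a - b  # same quantity as the 'reward' column
    rng = np.random.default_rng(seed)
    # resample question indices with replacement and recompute the mean difference
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    low, high = np.percentile(boot_means, [2.5, 97.5])
    return {
        'mean_diff': float(diff.mean()),       # average reward
        'win_rate': float((diff > 0).mean()),  # share of questions where model A scores higher
        'ci_95': (float(low), float(high)),    # percentile bootstrap interval for the mean
    }


# BERTScore F1 values of the five rows shown by test_df.head()
custom = [0.830807, 0.81722, 0.798086, 0.823299, 0.863554]
oob = [0.819558, 0.832336, 0.80666, 0.826669, 0.825531]
summary = paired_summary(custom, oob)
print(summary)
```

On the full results, the same call could take `test_df['bert_score_custom_gpt2']` and `test_df['bert_score_oob_gpt2']`; an interval that straddles zero would suggest the gap between the models is within noise.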

#### Write evaluation results to local dir

In [20]:
test_df.to_csv('./data/eval_results.csv', index=False)

In [25]:
# reload the saved results and draw a random sample of 16 rows
df = pd.read_csv('./data/eval_results.csv')
df = df.sample(16)
df.head()

index,question,answer,custom_gpt2_answer,oob_gpt2_answer,bert_score_custom_gpt2,bert_score_oob_gpt2,reward
74,how do i know if a respirator is falsely advertising niosh-approval?,"when niosh becomes aware of counterfeit respirators or those misrepresenting niosh approval on the market, these respirators are posted on the counterfeit respirators / misrepresentation of niosh-approval webpage to alert users, purchasers, and manufacturers.","the fda does not have an exhaustive list of all counterfeit or fraudulent products. for information about respirators that are marketed as medical gloves, please visit vdh’s webpage at httpsnjp3999maskinfo@ventshowardorregistration and investigations/newsfeed/.",the nih website for respiratory masks and other products states that it does not recommend wearing an electronic mask. please see https://www2ndlinkinjuryinfo/nifa_coronavirus/.,0.838215,0.818146,0.020069
11,return to your normal schedule if your results are negative.,"check with your family members and roommates to see if they tested positive or negative. if your whole household is negative and you’re feeling healthy, you don’t need to worry about quarantining yourself. double-check with your employer to make sure you’re allowed to return to work, even if your results are negative.","you may be surprised at how much time has passed since the outbreak began, but it's still important for everyone involved in this process maintain their focus and stick with what they have been doing throughout most of 2020—something that will likely continue even after a vaccine or drug is available (or when one becomes widely distributed). remember not just about getting back into shape quickly; try working harder each day as best possible until things get better!","you may be able recover from the virus’s effects on symptoms, but it is important that everyone in a healthy and productive way stay home during this time of recovery so they can continue working hard for their health care provider or other healthcare providers who need them most! remember not only do these services have limited resources available at times when there's no medication being delivered (such as those with covid-19), however – even though some people might experience mild side affects like nausea/ vomiting after taking medications such medicines",0.831432,0.823625,0.007807
20,what if i can’t afford face coverings?,"if an individual cannot afford to buy face coverings, a cloth face covering may be sewn by hand or can be improvised from household items such as scarfs, t-shirts, sweatshirts or towels.","you may be able to buy or wear cloth masks, but they are not required for the general public.masks should only ever cost $100 a piece (or about us70% of retail price).you might also have trouble getting them in some places where there isn't an inventory; stores and businesses need restocked items","you may be able to purchase a mask that covers your nose, mouth and eyes. some masks are available for under $ “$20 or more at the department of health website (www). they include facemasks with visor-less liner covering which allows people wearing goggles over their faces without having them on",0.843377,0.848956,-0.005579
105,how can i help my brother deal with boredom-triggered anger episodes?,take up an activity or hobby that you and your brother would like. this will limit boredom and make time go by quicker.,"try to find a way for your siblings and their friends not feeling so desperate. you could also call them “motors” or just something else that’s causing frustration, like ventriloqui (or the term used","you should always be mindful of your emotions and feelings.’t let them get in the way, especially if they are causing anxiety or frustration about things that aren't going well for him right now! try to keep calm",0.846853,0.851486,-0.004634
86,"twice last year i was hospitalized for sepsis that was in my bloodstream. i have copd. i started coughing, have difficulty breathing and a low grade fever that continues to increase. what should i do?","if you believe you have covid-19, notify the health authorities and isolate yourself for 14 days. coughing, difficulty breathing, and a fever are symptoms associated with the coronavirus. phone your doctor for advice as to where to seek help.","if you are immunocompromised or develop other respiratory issues like asthma which compromises your immune system (such as pneumonia), it is important first contact with the health care provider right away so they can decide whether there's more time available before resuming normal activities suchas going out on public transportation where possible safely..","if you are sick with covid-19 or other respiratory symptoms like shortness of breath (low blood pressure), coughs/shortnesses on your chest pain due to the virus spread from person who has it can be life threatening; severe abdominal distention is common but not serious enough so don't panic! try taking antibiotics as soon after exposure because they may help prevent further complications such pneumonia",0.829356,0.849139,-0.019783


In [26]:
df.to_csv('./data/eval_results_sample.csv', index=False)