## Optimizing the finetuned custom GPT2 using Reinforcement Learning from Human Feedback (RLHF) 

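Instead of collecting human feedback as the reward signal, we automate the evaluation: a BERT classifier from the evaluation phase scores the generated answers during PPO training, and a text generation metric, `BERTScore`, is used afterwards to compare the tuned model with the reference model.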

##### Prerequisites

In [None]:
%%capture

!pip install jupyter==1.0.0
!pip install ipywidgets==8.0.4
!pip install transformers==4.26.0
!pip install datasets==2.9.0
!pip install wandb==0.13.9
!pip install evaluate==0.4.0
!pip install bert-score==0.3.12
!pip install -e git+https://github.com/lvwerra/trl.git@v0.2.1#egg=trl

#### Imports 

In [3]:
from trl import AutoModelForCausalLMWithValueHead
from transformers import GPT2Tokenizer
from transformers import set_seed
from datasets import load_dataset
from transformers import pipeline
from datasets import Dataset
from random import choices
from trl import PPOTrainer
from trl import PPOConfig
from evaluate import load
from tqdm import tqdm
import transformers 
import pandas as pd
import numpy as np
import bert_score
import ipywidgets
import datasets
import evaluate
import logging
import jupyter
import random
import torch
import wandb
import time
import trl
import os

##### Setup logging

In [4]:
logger = logging.getLogger('sagemaker')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

##### Log versions of dependencies 

In [5]:
logger.info(f'[Using transformers version: {transformers.__version__}]')
logger.info(f'[Using bert_score version: {bert_score.__version__}]')
logger.info(f'[Using evaluate version: {evaluate.__version__}]')
logger.info(f'[Using datasets version: {datasets.__version__}]')
logger.info(f'[Using wandb version: {wandb.__version__}]')
logger.info(f'[Using trl version: {trl.__version__}]')

[Using transformers version: 4.18.0]
[Using bert_score version: 0.3.12]
[Using evaluate version: 0.4.0]
[Using datasets version: 2.9.0]
[Using wandb version: 0.13.9]
[Using trl version: 0.2.1]


#### Setup essentials 

In [6]:
pd.options.display.max_colwidth = None
np.random.seed(123)
tqdm.pandas()
set_seed(123)

In [7]:
# the API key is read from the WANDB_API_KEY environment variable instead of being hardcoded
!wandb login $WANDB_API_KEY

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [8]:
path = os.path.abspath('01-rlhf.ipynb')
os.environ['WANDB_NOTEBOOK_NAME'] = path

In [9]:
bertscore = load('bertscore')

##### Set constants 

In [10]:
MODEL_PATH = '.././02-finetune/model/custom-finetuned'
BOS_TOKEN = '<|startoftext|>'
EOS_TOKEN = '<|endoftext|>'
PAD_TOKEN = '<|pad|>'
MAX_LEN = 512

FORWARD_BATCH_SIZE = 16
BATCH_SIZE = FORWARD_BATCH_SIZE * 2

##### Setup configs

In [11]:
config = PPOConfig(model_name=MODEL_PATH, 
                   batch_size=BATCH_SIZE,
                   learning_rate=1.41e-5,
                   forward_batch_size=FORWARD_BATCH_SIZE,
                   remove_unused_columns=False,
                   log_with='wandb')

#### Load models 

In [12]:
active_model = AutoModelForCausalLMWithValueHead.from_pretrained(MODEL_PATH)

In [13]:
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(MODEL_PATH)

#### Load tokenizer 

In [14]:
tokenizer = GPT2Tokenizer.from_pretrained('../01-tokenize/vocab-custom', 
                                          bos_token=BOS_TOKEN, 
                                          eos_token=EOS_TOKEN, 
                                          pad_token=PAD_TOKEN, 
                                          lower=True,
                                          return_tensors='pt')
# tokenizer.padding_side = 'left'
tokenizer.model_max_length = MAX_LEN
logger.info(f'Tokenizer: {tokenizer}')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Tokenizer: PreTrainedTokenizer(name_or_path='../01-tokenize/vocab-custom', vocab_size=50257, model_max_len=512, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': '<|pad|>'})


#### Load dataset

In [15]:
dataset = load_dataset('csv', 
                       data_files='.././01-tokenize/data/faq_test.csv',  
                       delimiter=',', 
                       split='train[:100%]',
                       download_mode='force_redownload')
dataset

Using custom data configuration default-128f60e33d0bd468


Downloading and preparing dataset csv/default (download: 237.42 KiB, generated: 240.12 KiB, post-processed: Unknown size, total: 477.54 KiB) to /root/.cache/huggingface/datasets/csv/default-128f60e33d0bd468/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317...


Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split:   0%|          | 0/681 [00:00<?, ? examples/s]

Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-128f60e33d0bd468/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317. Subsequent calls will reuse this data.


Dataset({
    features: ['question', 'answer'],
    num_rows: 681
})

In [16]:
def tokenize(samples: dict):
    questions = samples['question']
    ground_truth = samples['answer']
    
    input_ids = []
    query = []
    
    for question in questions:
        prompted_input = f'question: {question}\nanswer:'
        query.append(prompted_input)
        tokenized_input = tokenizer(prompted_input, 
                                    truncation=True)
        input_ids.append(torch.tensor(tokenized_input['input_ids'], dtype=torch.long))
        
    return {'input_ids': input_ids, 'query': query, 'ground_truth': ground_truth}

In [17]:
dataset = dataset.map(tokenize, 
                      batched=True, 
                      #num_proc=num_proc, 
                      load_from_cache_file=False, 
                      remove_columns=['question', 'answer'])
dataset.set_format('pt', 
                   columns=['input_ids', 'query', 'ground_truth'],
                   output_all_columns=True)
dataset

  0%|          | 0/1 [00:00<?, ?ba/s]

Dataset({
    features: ['input_ids', 'query', 'ground_truth'],
    num_rows: 681
})

##### Create data collator

In [18]:
def collator(dataset):
    result = {}
    for key in dataset[0]:
        values = []
        for d in dataset:
            values.append(d[key])
        result[key] = values
    return result
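
To make the collator's effect concrete, the sketch below applies the same transposition to two hypothetical samples (the field values are made up for illustration). A list of per-sample dicts becomes one dict of per-field lists, so variable-length `input_ids` stay unpadded.

```python
def collate(samples):
    # Same logic as `collator` above, written as a dict comprehension.
    return {key: [sample[key] for sample in samples] for key in samples[0]}

# Hypothetical samples; the real ones hold tensors in `input_ids`.
samples = [
    {'input_ids': [1, 2, 3], 'query': 'question: a\nanswer:', 'ground_truth': 'x'},
    {'input_ids': [4, 5], 'query': 'question: b\nanswer:', 'ground_truth': 'y'},
]

batch = collate(samples)
print(batch['input_ids'])     # [[1, 2, 3], [4, 5]]
print(batch['ground_truth'])  # ['x', 'y']
```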

#### Create Trainer for PPO (Proximal Policy Optimization)

In [19]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [20]:
ppo_trainer = PPOTrainer(config, active_model, ref_model, tokenizer, dataset=dataset, data_collator=collator)

wandb: Currently logged in as: shankar-arunp. Use `wandb login --relogin` to force relogin


#### Define CTRL tokens 

In [21]:
ctrl_str = ['[bad]', '[neutral]', '[good]']
ctrl_tokens = dict((s, tokenizer.encode(s, return_tensors='pt').squeeze().to(device)) for s in ctrl_str)
ctrl_tokens

{'[bad]': tensor([   59, 32171,    61], device='cuda:0'),
 '[neutral]': tensor([   59, 17337,    61], device='cuda:0'),
 '[good]': tensor([   59, 13071,    61], device='cuda:0')}
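
Each control string maps to three token ids, of which the first and last are shared. The training loop prepends these ids to every tokenized query. A minimal sketch of that step with plain lists, using the ids printed above and hypothetical query ids:

```python
# Token ids for the control strings, copied from the output above.
ctrl_ids = {
    '[bad]': [59, 32171, 61],
    '[neutral]': [59, 17337, 61],
    '[good]': [59, 13071, 61],
}

def prepend_ctrl(task, query_ids):
    # Plain-list analogue of torch.cat((ctrl_tokens[task], input_ids)).
    return ctrl_ids[task] + query_ids

query_ids = [100, 200, 300]  # hypothetical ids for a tokenized prompt
print(prepend_ctrl('[good]', query_ids))  # [59, 13071, 61, 100, 200, 300]
```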

#### Load BERT Pipeline from evaluation phase to generate reward logits 

In [22]:
bert_pipe = pipeline('sentiment-analysis', 
                     model='.././03-evaluate/model', 
                     return_all_scores=True)

#### Define Reward function

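* The reward is built from the classifier's pre-softmax logits: the resulting probabilities depend only on the difference between the positive and negative logits, not on their absolute values. Note that a `transformers` text-classification pipeline normally applies a softmax to a two-class classifier's outputs by default; pass `function_to_apply='none'` in the call to get raw logits.
* The rewards are therefore `pos_reward = pos_logit - neg_logit` and `neg_reward = neg_logit - pos_logit`.
* For the neutral reward, `abs(pos_logit - neg_logit)` should be minimized to get a response that most confuses the sentiment pipeline, so `-abs(pos_logit - neg_logit)` is a reasonable choice. The code below adds a constant offset of 4 to it.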

In [23]:
def logits_to_reward(logits, task):
    """
    Take the bad and good logits and scale them for the task.
    
        task [bad]: reward = neg_logit - pos_logit
        task [neutral]: reward = -abs(pos_logit - neg_logit) + 4
        task [good]: reward = pos_logit - neg_logit
        
    logits : list of tensors (bad_logit, good_logit), one per response
    task : list of control strings, one per response
    """
    rewards = list()
    for i in range(len(logits)):
        if task[i] == '[bad]':
            rewards.append(logits[i][0] - logits[i][1])
        elif task[i] == '[neutral]':
            rewards.append(-torch.abs(logits[i][0] - logits[i][1]) + 4)
        elif task[i] == '[good]':
            rewards.append(logits[i][1] - logits[i][0])
        else:
            raise ValueError("task has to be one of '[bad]', '[neutral]', '[good]'!")
    return rewards
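
As a worked example of the three reward rules, the sketch below uses a scalar helper, `reward_for`, which is hypothetical and only mirrors the arithmetic of `logits_to_reward` without tensors. The logit values are made up: one response the classifier rates as clearly good, and one it is unsure about.

```python
def reward_for(task, bad_logit, good_logit):
    # Scalar version of the rule in `logits_to_reward` (hypothetical helper).
    if task == '[bad]':
        return bad_logit - good_logit
    if task == '[neutral]':
        return -abs(good_logit - bad_logit) + 4
    if task == '[good]':
        return good_logit - bad_logit
    raise ValueError(f'unknown task: {task}')

# Hypothetical (bad_logit, good_logit) pairs.
confident = (-2.0, 3.0)   # classifier is sure the response is good
ambiguous = (0.5, 0.25)   # classifier is undecided

for task in ('[bad]', '[neutral]', '[good]'):
    print(task, reward_for(task, *confident), reward_for(task, *ambiguous))
# [bad] -5.0 0.25
# [neutral] -1.0 3.75
# [good] 5.0 -0.25
```

The confidently good response is rewarded under `[good]`, penalized under `[bad]`, and the undecided response earns the highest `[neutral]` reward, close to the offset of 4.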

#### Training Loop

In [24]:
for epoch in range(1):
    for i, batch in tqdm(enumerate(ppo_trainer.dataloader)):
        # only train on full batches; a smaller final batch is skipped
        if len(batch['input_ids']) == BATCH_SIZE:
            logger.info(f'Epoch = {epoch+1} | Batch = {i+1} | Size = {BATCH_SIZE}')
            game_data = dict()
            
            # pick a random control token per sample and prepend it to the query
            task_list = choices(ctrl_str, k=BATCH_SIZE)
            game_data['query'] = [t+q for t,q in zip(task_list, batch['query'])]
            query_tensors = [torch.cat((ctrl_tokens[t], input_ids)) for t, input_ids in zip(task_list, batch['input_ids'])]
            
            ground_truth_responses = batch['ground_truth']
            response_tensors = []

            # generate one response per query; the ground-truth word count sets the number of new tokens
            for query, ground_truth_response in zip(query_tensors, ground_truth_responses):
                gt_len = len(ground_truth_response.split())
                response = ppo_trainer.generate(query, 
                                                do_sample=True, 
                                                top_k=1, 
                                                min_new_tokens=gt_len,
                                                max_new_tokens=gt_len, 
                                                repetition_penalty=10.0,
                                                length_penalty=-0.1,
                                                pad_token_id=tokenizer.eos_token_id,
                                                eos_token_id=-1,
                                                top_p=1.0)
                response_tensors.append(response.squeeze())
                
            game_data['response'] = [tokenizer.decode(response, skip_special_tokens=True) for response in response_tensors]

            # score the responses with the BERT classifier and turn the scores into rewards
            pipe_outputs = bert_pipe(game_data['response'])
            logits = [torch.tensor([output[0]['score'], output[1]['score']]) for output in pipe_outputs]
            rewards = logits_to_reward(logits, task_list)
            
            # run one PPO optimization step and log the statistics
            stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
            ppo_trainer.log_stats(stats, game_data, rewards)

0it [00:00, ?it/s]Epoch = 1 | Batch = 1 | Size = 32
1it [00:26, 26.54s/it]Epoch = 1 | Batch = 2 | Size = 32
2it [00:52, 26.49s/it]Epoch = 1 | Batch = 3 | Size = 32
3it [01:18, 26.27s/it]Epoch = 1 | Batch = 4 | Size = 32
4it [01:45, 26.45s/it]Epoch = 1 | Batch = 5 | Size = 32
5it [02:13, 26.76s/it]Epoch = 1 | Batch = 6 | Size = 32
6it [02:40, 26.84s/it]Epoch = 1 | Batch = 7 | Size = 32
7it [03:07, 27.14s/it]Epoch = 1 | Batch = 8 | Size = 32
8it [03:34, 27.09s/it]Epoch = 1 | Batch = 9 | Size = 32
9it [04:01, 27.06s/it]Epoch = 1 | Batch = 10 | Size = 32
10it [04:27, 26.64s/it]Epoch = 1 | Batch = 11 | Size = 32
11it [04:55, 27.04s/it]Epoch = 1 | Batch = 12 | Size = 32
12it [05:21, 26.79s/it]Epoch = 1 | Batch = 13 | Size = 32
13it [05:47, 26.58s/it]Epoch = 1 | Batch = 14 | Size = 32
14it [06:15, 26.87s/it]Epoch = 1 | Batch = 15 | Size = 32
15it [06:41, 26.82s/it]Epoch = 1 | Batch = 16 | Size = 32
16it [07:10, 27.21s/it]Epoch = 1 | Batch = 17 | Size = 32
17it [07:36, 27.11s/it]Epoch = 1 | Ba
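
The log above is cut off. As a rough check on how many PPO updates to expect per epoch, the sketch below divides the dataset size by the batch size. It assumes the dataloader yields batches of `BATCH_SIZE` plus at most one smaller final batch, which the `if` guard in the loop skips.

```python
num_rows = 681        # dataset size reported when the CSV was loaded
batch_size = 16 * 2   # BATCH_SIZE = FORWARD_BATCH_SIZE * 2

# Full batches are trained on; the leftover samples form the skipped final batch.
full_batches, leftover = divmod(num_rows, batch_size)
print(full_batches)  # 21
print(leftover)      # 9
```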

##### Save optimized PPO model to local dir

In [25]:
active_model.save_pretrained('./model/gpt2-ppo-bertscore')
tokenizer.save_pretrained('./model/gpt2-ppo-bertscore')

('./model/gpt2-ppo-bertscore/tokenizer_config.json',
 './model/gpt2-ppo-bertscore/special_tokens_map.json',
 './model/gpt2-ppo-bertscore/vocab.json',
 './model/gpt2-ppo-bertscore/merges.txt',
 './model/gpt2-ppo-bertscore/added_tokens.json')

### Compare the PPO tuned models with the reference GPT2 model 

In [26]:
active_model = AutoModelForCausalLMWithValueHead.from_pretrained('./model/gpt2-ppo-bertscore')

Some weights of the model checkpoint at ./model/gpt2-ppo-bertscore were not used when initializing GPT2LMHeadModel: ['v_head.summary.bias', 'v_head.summary.weight']
- This IS expected if you are initializing GPT2LMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPT2LMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [27]:
test_df = pd.read_csv('.././01-tokenize/data/faq_test.csv')
test_df = test_df.sample(10)
test_df.count()

question    10
answer      10
dtype: int64

In [28]:
def predict(question: str, ground_truth: str, tokenizer: GPT2Tokenizer, model: AutoModelForCausalLMWithValueHead) -> str:
    # create a prompt in compliance with the one used during training without the answer part
    prompt = f'question: {question}\nanswer:'
    # tokenize the prompt and move it to the same device as the model
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids
    input_ids = input_ids.to(device)
    # predict response (answer); the ground-truth word count sets the number of new tokens
    gt_len = len(ground_truth.split())
    model.to(device)
    response = model.generate(input_ids, 
                              do_sample=True, 
                              top_k=1, 
                              min_new_tokens=gt_len,
                              max_new_tokens=gt_len, 
                              repetition_penalty=10.0,
                              length_penalty=-0.1,
                              pad_token_id=tokenizer.eos_token_id,
                              eos_token_id=-1,
                              top_p=1.0)
    # decode the predicted tokens into text and keep only the part after the prompt
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)
    answer = response_text.split('answer: ')[-1]
    return answer
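
The answer extraction at the end of `predict` relies on the decoded text containing `'answer: '` with a trailing space. A small sketch with hypothetical decoded strings shows what happens when the space is present and when it is not (`extract_answer` is an illustrative helper, not part of the notebook):

```python
def extract_answer(response_text):
    # Same post-processing as the last two lines of `predict`.
    return response_text.split('answer: ')[-1]

# Hypothetical decoded outputs.
with_space = 'question: should a mask be worn?\nanswer: yes, masks are required.'
without_space = 'question: should a mask be worn?\nanswer:yes, masks are required.'

print(extract_answer(with_space))                      # yes, masks are required.
print(extract_answer(without_space) == without_space)  # True
```

In the second case the separator is never found, so the full text, prompt included, is returned as the answer.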

In [29]:
ref_gpt2_answers = []
ppo_gpt2_answers_good = []
ppo_gpt2_answers_bad = []
ppo_gpt2_answers_neutral = []

for _, row in test_df.iterrows():
    question, ground_truth = row
    answer = predict(question, ground_truth, tokenizer, ref_model)
    ref_gpt2_answers.append(answer)
    
    answer = predict('[good]'+question, ground_truth, tokenizer, active_model)
    ppo_gpt2_answers_good.append(answer)
    
    answer = predict('[bad]'+question, ground_truth, tokenizer, active_model)
    ppo_gpt2_answers_bad.append(answer)
    
    answer = predict('[neutral]'+question, ground_truth, tokenizer, active_model)
    ppo_gpt2_answers_neutral.append(answer)

In [30]:
bert_score_ref_gpt2 = bertscore.compute(predictions=ref_gpt2_answers, references=test_df['answer'].to_list(), lang='en')['f1']
bert_score_ppo_gpt2_good = bertscore.compute(predictions=ppo_gpt2_answers_good, references=test_df['answer'].to_list(), lang='en')['f1']
bert_score_ppo_gpt2_bad = bertscore.compute(predictions=ppo_gpt2_answers_bad, references=test_df['answer'].to_list(), lang='en')['f1']
bert_score_ppo_gpt2_neutral = bertscore.compute(predictions=ppo_gpt2_answers_neutral, references=test_df['answer'].to_list(), lang='en')['f1']

test_df['ref_gpt2_answers'] = ref_gpt2_answers
test_df['ppo_gpt2_answers_good'] = ppo_gpt2_answers_good
test_df['ppo_gpt2_answers_bad'] = ppo_gpt2_answers_bad
test_df['ppo_gpt2_answers_neutral'] = ppo_gpt2_answers_neutral

test_df['bert_score_ref_gpt2'] = bert_score_ref_gpt2
test_df['bert_score_ppo_gpt2_good'] = bert_score_ppo_gpt2_good
test_df['bert_score_ppo_gpt2_bad'] = bert_score_ppo_gpt2_bad
test_df['bert_score_ppo_gpt2_neutral'] = bert_score_ppo_gpt2_neutral

In [31]:
test_df.head(50)

Unnamed: 0,question,answer,ref_gpt2_answers,ppo_gpt2_answers_good,ppo_gpt2_answers_bad,ppo_gpt2_answers_neutral,bert_score_ref_gpt2,bert_score_ppo_gpt2_good,bert_score_ppo_gpt2_bad,bert_score_ppo_gpt2_neutral
631,what if i was due to start training but the school canceled the training program?,"there are measures in place to assure trade affected workers meet their training enrollment deadline.if all criteria are met, a “training unavailable” waiver can be applied to secure your benefits.your career advisor from the pa careerlink® office will contact you via email with instructions on your next steps.","you should contact your local health department or state public education agency. please consult with their medical professional for advice about rescheduling, and follow up on any changes that may be needed in order not later than 30 days after starting work (see guidance from","“ we are still learning about how covid-19 will affect our students’ education. it is not a new issue, and there has been no change in this since march 2020 when schools were closed for an entire week or more than 30 days","“ we are still learning about how covid-19 will affect our students’ education. it is not a new issue, since there has been an increase in child and teacher engagement sessions that have increased by 30% or more than 50%. this","“ no. all schools are still learning that they have a virtual meeting or conference, and this is especially important since many students will be using their own devices for remote work from home (e). any other means of communication with local community",0.832329,0.832547,0.826614,0.82462
375,get dressed like you’re going to class in person.,"if you stay in your pajamas all day, it’s going to be super tough to pay attention in class. try waking up, eating breakfast, taking a shower, and putting on real clothes to get yourself in the school mindset. you can even put on a pair of shoes to really trick your brain into thinking you’re going somewhere.","if your school is closed, try using a dress code that says “no clothes or shoes!” then go on the shopping list and see what they want for their kids instead of just buying pajamas from home (or anything else). this will help them feel more comfortable when it's raining","if the covid-19 virus is not a “ cure, ” then it should be put on your own clothes and have them ready for use by people who are sick with or need medical treatment (e).•s health care providers will decide how much time they can take action against any potential","if the covid-19 virus is not a “ cure, ” how do it? what are they doing with this disease and where can we find out more about them please see our website for information on who will be working from home during quarantine or when there may have been an outbreak of respiratory","“ the covid-19 virus is not a physical health issue, but it has been an ongoing public mental healthcare crisis since march 2020 and will continue until there are no new cases of this disease or cure for that? ” if people have any questions about how much time they should be able at",0.848277,0.811831,0.809941,0.805236
131,can i still apply for my educator certification or complete my educator preparation program (epps),"for questions about educator certification or the new jersey department of education's education preparation programs, please see the department's resource page on educator preparation programs and certification. source: https://www.nj.gov/education/covid19/teacherresources/eppcert.shtml","yes, you may be eligible to receive your education. if so please contact the funding icuc career development center at 1-800‑4255","if you’re not sure how to use the flex. it is also important that your employer provides a link-up with an online application,","if you’re not sure how to reach the training requirement, check out your local network. “ we recommend that our community members work with","the covid-19 response is not a “ replacement ” to an existing work schedule. if you’re working from home, your employer will consider",0.822889,0.822223,0.816059,0.811255
573,i have a reason i cannot wear a face covering. am i required to document or prove that?,"no, the order does not require that. please do not contact your healthcare provider to obtain documentation.","yes, you can use cloth masks if they are not medical-grade and don’","no, not if the person who is wearing an mask has been exposed and should be","if you are not sure how the mask will be worn, do your own research on","the cdc is not recommending any use of this, but it has encouraged people who are",0.859132,0.854103,0.866783,0.862251
676,should a mask be worn during school?,"evidence continues to mount on the importance of universal face coverings in interrupting the spread of sars-cov-2. school staff and older students (middle or high school) may be able to wear cloth face coverings safely and consistently and should be encouraged to do so. children under 2 years and anyone who has trouble breathing or is unconscious, incapacitated, or otherwise unable to remove a face covering without assistance should not wear cloth face coverings.","yes, masks are required in schools. if you have questions about wearing face coverings and the rules around masking (including when children under 2 years old can not wear them), please contact your local health department or public education office to discuss ways that they might help protect others from covid-19 by covering their nose with an n95 respirator over its chinache valve for at least","the cdc has said that masks should not have to include any “ air-conditioning systems.� if you’re sick, use your own hand sanitizer or water before it is too late (e). avoid touching anything with anyone who might touch an area where there are no local health care facilities and can spread covid19 in general; however they may also help prevent further transmission","the cdc has said that “ masks are not effective in preventing covid-19, but they can prevent you from wearing them. if your child is sick with respiratory illnesses like flu or pneumonia and should use any medical treatment to protect yourself against coronavirus ( see for example). avoid touching anything where there may have been an outbreak of disease on our hands since it was first reported last","the cdc is now recommending that masks are used to wear in schools. 
this means, if everyone wears them at all times on an annual basis (e) they would have no reason not only of reducing covid-19 transmission but also help prevent further spread from people who do still need it and could develop serious symptoms such as respiratory disease or death; however there may always change",0.839208,0.820448,0.830173,0.830392
491,check in on your loved ones in the morning.,"consider chatting with people while you eat meals or at other downtime during the day. if they are also working remotely this should be easy to make happen. if you or your family gets the virus, that becomes the first priority.","if you’re feeling anxious, call ahead to see how they are doing and talk with them about what is going right now! try setting up a video chat or phone conversation so that everyone can stay calm","if you’re not sure how to get a good smell, check out our website for more information about what is going around with it and where there are no signs of covid-19 infection or death rates","if you’re not sure how to find out about it, check for a local website that provides information and updates from other people who may be sick with covid-19 or have been exposed (e).","“ if you’ve been a friend to someone who has died, how do this work? ” any time that people are still viewing their friends and family members as being more than 1-metre apart",0.861488,0.836749,0.841951,0.832871
78,additional resources:,the u.s. centers for disease control and prevention (cdc) is the best source for information about covid-19 in people and risks to animals. cdc information about covid-19 in animals may be found at www.cdc.gov/coronavirus/2019-ncov/daily-life-coping/animals.html.,the department of health is working closely with other public and private entities to provide support for covid-19 response. please visit https//www1dhhsphoriagov/,"the covid-19 response is not a funding source. if you are unable to provide information, contact your local health and government agency or other recipient’s facility at (","the covid-19 response is not a funding source. if you are unable to provide information, contact your local health and government agency or other recipient’s facility at (","the covid-19 response is not a “ health care. ” it’s an approach to how much time you should be able, if at all possible – including any",0.819391,0.820011,0.820011,0.800243
452,how can i find out if i can now have the elective surgery the mtf scheduled me for before cancelling it due to covid-19?,"work with your military medical hospital or clinic, or your dental treatment facility. staff should be able to talk to you about the status of your surgery or procedure. they can guide you on what need to do to reschedule it.","you should contact your local health department or hospital. please visit our website at wwwhealthscccsnursepracticecovid 19 webpage and follow cdc guidance on rescheduling surgeries, including recommendations from providers regarding patient care","yes, you should contact your local health department or hospital. “the emergency medical care team will work with other hospitals and clinics in order not already working on a patient’s treatment plan that is still","yes, there are no more than 1.5% of patients who need a hospital emergency care visit because they were unable or not able enough time off from an outpatient clinic appointment in march 2020 and may be","yes, “ no. ” a physician or other medical provider will not be able either of them at this time (e).the same is true in all health care systems that are currently adopting telehealth services",0.857759,0.868661,0.828069,0.831304
57,can i catch covid-19 from the faeces of someone with the disease?,"while initial investigations suggest the virus may be present in faeces in some cases, to date, there have not been reports of faecal-oral transmission of covid-19. additionally, there is no evidence to date on the survival of the covid-19 virus in water or sewage. who is assessing ongoing research on the ways covid-19 is spread and will continue to share new findings on this topic.","no. there is currently not a vaccine available for this virus, and it’s unlikely that any person who has had contact to an infected surface will be able or willing (or even want) get sick again after they recover – but we do know some people have been exposed in their own household during quarantine; if you are","no. if you have a fever, stay away to avoid getting it and keep yourself healthy until there is an end date for your symptoms (e). read more about how coronavirus can spread in general “the virus has not been able enough time off as people who are sick develop respiratory illnesses such that might be related or even cause","no. if you have a fever, stay away to avoid getting it and keep yourself healthy until there is an end date for your symptoms (e). read more about how coronavirus can spread in general “ people who develop respiratory illnesses like flu or pneumonia are not always able even though they recover after being infected by other coronaviruses – such","no. if a person is not sure that it’s “ all too much, ” they should be asking for an email to their local health care provider or contact tracers who are working on this issue and will provide information about how you check your data in general (e). additionally there may also help prevent further spread by",0.832193,0.829361,0.83124,0.817481
670,what is the food and nutrition service (fns) doing to ensure children have food to eat while schools are closed?,"states switch to their summer food service program (sfsp) or seamless summer option (sso) to serve meals to children when schools are closed. through these summer meal programs, usda allows sites to serve up to two free meals a day to children 18 and under.","fsis has been working closely with school districts, local health departments in new york state as well. they will continue providing meals for students who need them most during this time of closure or may be able provide additional services if needed","“ we recommend that nutritious foods be provided by local suppliers, including a milk-based diet. if you’re not sure how much it will cost your business or supply chain for getting enough calories from an intake of healthy meals","“ we recommend that people use their local social media platforms, including facebook.com or twitter-based news sites like youtube’s website where users can find out more about how much time they should be spending on nutritious foods during","“ if a school or local community has already been providing nutritious meals, it will be able with all of its members. this means that people in an area where there’s no regular source for drinking water from any other person",0.844946,0.821566,0.825187,0.833897


In [32]:
np.mean(test_df['bert_score_ref_gpt2'])

0.8417611479759216

In [33]:
np.mean(test_df['bert_score_ppo_gpt2_good'])

0.8317500352859497

In [34]:
np.mean(test_df['bert_score_ppo_gpt2_bad'])

0.8296028912067414

In [35]:
np.mean(test_df['bert_score_ppo_gpt2_neutral'])

0.8249549269676208
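
To read the four means side by side, the sketch below prints each PPO variant's gap to the reference model, using the values copied from the cells above.

```python
# Mean BERTScore F1 values copied from the cell outputs above.
means = {
    'ref': 0.8417611479759216,
    'good': 0.8317500352859497,
    'bad': 0.8296028912067414,
    'neutral': 0.8249549269676208,
}

# Difference of each PPO variant to the reference GPT2 model.
deltas = {name: means[name] - means['ref'] for name in ('good', 'bad', 'neutral')}
for name, delta in deltas.items():
    print(f'{name}: {delta:+.4f}')
# good: -0.0100
# bad: -0.0122
# neutral: -0.0168
```

All three PPO variants score slightly below the reference model on this sample. With only 10 sampled questions, gaps of this size are not strong evidence in either direction.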