### Lower the Toxicity of the Generated Summaries

In [None]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0 \
    evaluate==0.4.0 \
    rouge_score==0.1.2 \
    loralib==0.1.1 \
    peft==0.3.0 \
    trl==0.4.4 --quiet

In [None]:
# Install the TRL (Transformer Reinforcement Learning) library directly from GitHub.
%pip install git+https://github.com/lvwerra/trl.git@25fa1bd

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from datasets import load_dataset
from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, AutoModelForSequenceClassification, Trainer
from peft import PeftModel, PeftConfig, LoraConfig, TaskType

# Transformer Reinforcement Learning
from trl import PPOTrainer, PPOConfig, AutoModelForSeq2SeqLMWithValueHead, create_reference_model
from trl.core import LengthSampler

import torch
import evaluate

import numpy as np
import pandas as pd

from tqdm import tqdm
tqdm.pandas()

In [3]:
original_dataset = load_dataset("knkarthick/dialogsum")
model_name = 'google/flan-t5-base'

original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

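# device_map="auto" lets Accelerate place model weights on the GPU or CPU when a model is loaded.
# It is kept here as in the original run, although a tokenizer holds no weights to place.
# The tokenizer must come from the same checkpoint as the model so token ids match.
tokenizer = AutoTokenizer.from_pretrained(model_name, device_map="auto")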

Found cached dataset csv (C:/Users/mobeenH20/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-cd36827d3490488d/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)


  0%|          | 0/3 [00:00<?, ?it/s]

In [4]:
# Preprocess the dataset: keep dialogues within a character-length range and wrap each one in the instruction prompt used for fine-tuning.

def preprocess_dataset(dataset, min_input_length, max_input_length):
    
    # get the training data from dataset
    dataset = dataset['train']
    
    # filter data for max and min input length
    dataset = dataset.filter(lambda x: len(x['dialogue']) > min_input_length and 
                                       len(x['dialogue']) <= max_input_length , batched=False)

    # wrapping the dataset in instruction prompt for training
    def tokenize(sample):
        prompt = f"""
Summarize the following dialogue:
{sample['dialogue']}

Summary:
"""
        sample["input_ids"] = tokenizer.encode(prompt)
        
        sample["query"] = tokenizer.decode(sample["input_ids"])
        return sample
    
    # tokenizing the dataset for training. 
    dataset = dataset.map(tokenize, batched=False)
    dataset.set_format(type="torch")
    
    # test train split
    dataset_split = dataset.train_test_split(test_size=0.3, shuffle=False, seed=42)
    return dataset_split
    

In [5]:
dataset = preprocess_dataset(dataset=original_dataset, min_input_length=200, max_input_length=1000)

Loading cached processed dataset at C:\Users\mobeenH20\.cache\huggingface\datasets\knkarthick___csv\knkarthick--dialogsum-cd36827d3490488d\0.0.0\6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1\cache-aed840428c46ebbe.arrow
Loading cached processed dataset at C:\Users\mobeenH20\.cache\huggingface\datasets\knkarthick___csv\knkarthick--dialogsum-cd36827d3490488d\0.0.0\6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1\cache-26a6061d3a56445f.arrow
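
The filtering and prompt-wrapping steps of `preprocess_dataset` can be sketched without the tokenizer. This is a minimal illustration, assuming toy records in place of the real dialogsum rows; `build_prompt` and `within_length` are hypothetical helper names, not part of the notebook. Note that the filter counts characters, not tokens.

```python
# Hypothetical toy records standing in for dialogsum rows.
records = [
    {"dialogue": "#Person1#: Hi.\n#Person2#: Hello."},
    {"dialogue": "#Person1#: " + "How are you today? " * 20},
]


def build_prompt(dialogue: str) -> str:
    """Wrap a dialogue in the same instruction template used by `tokenize`."""
    return f"\nSummarize the following dialogue:\n{dialogue}\n\nSummary:\n"


def within_length(dialogue: str, min_len: int, max_len: int) -> bool:
    """Mirror the notebook filter: more than min_len and at most max_len characters."""
    return min_len < len(dialogue) <= max_len


# Same bounds as the notebook call (200 and 1000 characters).
kept = [r for r in records if within_length(r["dialogue"], 200, 1000)]
prompts = [build_prompt(r["dialogue"]) for r in kept]

print(f"kept {len(kept)} of {len(records)} records")
print(prompts[0][:60])
```

Only the second toy record survives, because the first is shorter than 200 characters.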


In [6]:
def trainable_model_parameters(model):
    trainable_model_params = 0
    total_model_params = 0
    for _, param in model.named_parameters():
        total_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable Model Params: {trainable_model_params}\ntotal Model Params: {total_model_params}\npercentage of trainable Params: {100*(trainable_model_params/total_model_params):.3f}%"

In [8]:
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)

peft_instruct_model = PeftModel.from_pretrained(
                            original_model,
                            "./dialogue-summary-peft-training-save",
                            torch_dtype=torch.bfloat16,
                            lora_config=lora_config,
                            device_map="auto",
                            is_trainable=True            # keep the LoRA weights trainable for PPO; leave it False when the model is used for inference only
                        )

print(f"number of trainable parameters: {trainable_model_parameters(peft_instruct_model)}")

number of trainable parameters: trainable Model Params: 3538944
total Model Params: 251116800
percentage of trainable Params: 1.409%
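
The trainable-parameter count can be reproduced by hand from the LoRA settings. The sketch below assumes the usual FLAN-T5-base dimensions (hidden size 768, 12 encoder layers, 12 decoder layers, attention projection width 768); these values are not stated in the notebook and are labeled as assumptions in the code.

```python
# Assumed FLAN-T5-base dimensions (not stated in the notebook).
d_model = 768         # hidden size fed into each attention projection
inner_dim = 768       # projection output width (12 heads x 64)
encoder_layers = 12
decoder_layers = 12

r = 32                # LoRA rank, from lora_config

# Each encoder layer has one attention block (self-attention);
# each decoder layer has two (self-attention and cross-attention).
attention_blocks = encoder_layers + 2 * decoder_layers

# target_modules=["q", "v"] adapts two projections per attention block.
adapted_modules = attention_blocks * 2

# A LoRA adapter adds A (r x d_model) and B (inner_dim x r) per module.
params_per_module = r * d_model + inner_dim * r

lora_params = adapted_modules * params_per_module
print(lora_params)  # 3538944

total_params = 251116800  # total reported above
print(f"{100 * lora_params / total_params:.3f}%")  # 1.409%
```

Under these assumptions the arithmetic matches the reported 3,538,944 trainable parameters.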


### Fine-tune LLM with Reinforcement Learning (RL)

We wrap the instruction-fine-tuned PEFT model in a PPO-ready model that adds a value head.
Proximal Policy Optimization (PPO) will later optimize this policy against the reward model.

In [9]:
# Wrap the PEFT model with a value head for PPO training
ppo_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(peft_instruct_model,
                                                              torch_dtype=torch.bfloat16,
                                                              is_trainable=True)

print(f"PPO model parameters to be updated (ValueHead + 769 params):\n{trainable_model_parameters(ppo_model)}")
print(ppo_model.v_head)

PPO model parameters to be updated (ValueHead + 769 params):
trainable Model Params: 3539713
total Model Params: 251117569
percentage of trainable Params: 1.410%
ValueHead(
  (dropout): Dropout(p=0.1, inplace=False)
  (summary): Linear(in_features=768, out_features=1, bias=True)
  (flatten): Flatten(start_dim=1, end_dim=-1)
)


#### Of the 3,539,713 trainable parameters, 769 come from the ValueHead: 768 weights plus 1 bias

In [10]:
"""
Two models are passed to training:
    - reference model: frozen and never trained. It is a copy of the instruct fine-tuned model from the previous notebook, which serves as the baseline here.
    - PPO model: the model that is fine-tuned.
    KL divergence measures how far the outputs of the PPO model drift from the outputs of the reference model.

""" 
reference_model = create_reference_model(ppo_model)
print(f"Reference model parameters to train: {trainable_model_parameters(reference_model)}")

Reference model parameters to train: trainable Model Params: 0
total Model Params: 251117569
percentage of trainable Params: 0.000%


### Toxicity Model from Facebook

In [11]:
toxicity_model_name = "facebook/roberta-hate-speech-dynabench-r4-target"
toxicity_tokenizer = AutoTokenizer.from_pretrained(toxicity_model_name, device_map="auto")
toxicity_model = AutoModelForSequenceClassification.from_pretrained(toxicity_model_name, device_map="auto")

print(toxicity_model.config.id2label)

{0: 'nothate', 1: 'hate'}


#### Now let's see the toxicity model in action

In [12]:
non_toxic_text = "this is a good movie."
toxic_text = "hate the movie and it was pathetic and dump."
labels = toxicity_model.config.id2label

# tokenizing the input text
toxicity_input_ids = toxicity_tokenizer(non_toxic_text, return_tensors="pt").input_ids.to('cuda')
# print(f"tokens: {toxicity_input_ids}")

# logits: the raw, unnormalized scores the classifier outputs for each class (before softmax)
logits = toxicity_model(toxicity_input_ids).logits
# print(f"logits: {logits.tolist()[0]}")

# converting these logits into probabilities with softmax
prob = logits.softmax(dim=1).tolist()[0]
print(prob)
print(f"probabilities:\n{labels[0]}:{prob[0]}\n{labels[1]}:{prob[1]}")

[0.999760091304779, 0.00023992054047994316]
probabilities:
nothate:0.999760091304779
hate:0.00023992054047994316


In [13]:
# get the logits for "not hate" - this is the reward
not_hate_index = 0
not_hate_reward = (logits[:, not_hate_index]).tolist()
print(f"reward: {not_hate_reward}")

reward: [4.447512149810791]
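
The relationship between the logit used as the reward and the softmax probabilities printed earlier can be checked with plain Python. This is a sketch, not the notebook's code: the `hate` logit is not printed anywhere above, so it is inferred here from the two reported probabilities, using the fact that for two classes the logit difference equals the log-odds.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Values reported in the outputs above.
nothate_logit = 4.447512149810791
p_nothate = 0.999760091304779
p_hate = 0.00023992054047994316

# Inferred, not reported: for two classes, logit_0 - logit_1 = log(p_0 / p_1).
hate_logit = nothate_logit - math.log(p_nothate / p_hate)

probs = softmax([nothate_logit, hate_logit])
print(f"inferred hate logit: {hate_logit:.3f}")
print(f"recovered probabilities: {probs}")
```

One plausible reason to reward with the raw logit instead of the probability is that the probability saturates near 1 for clearly non-toxic text, while the logit keeps growing and so still distinguishes between samples.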


### Hugging Face inference pipeline example for demonstration

In [14]:
device = 0 if torch.cuda.is_available() else "cpu"


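# Wrap the toxicity classifier in a pipeline ("sentiment-analysis" is an alias for the text-classification task)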
sentiment_analysis = pipeline("sentiment-analysis",
                             model="facebook/roberta-hate-speech-dynabench-r4-target",
                             device=device)

reward_logits_kwargs = {
    "top_k": None, # Return all scores
    "function_to_apply": "none",
    "batch_size": 16
}

reward_probabilities_kwargs = {
    "top_k": None, # Return all scores
    "function_to_apply": "softmax",
    "batch_size": 16
}

# print(sentiment_analysis(non_toxic_text, **reward_logits_kwargs))
# print(sentiment_analysis(non_toxic_text, **reward_probabilities_kwargs))

# print(sentiment_analysis(toxic_text, **reward_logits_kwargs))
# print(sentiment_analysis(toxic_text, **reward_probabilities_kwargs))
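
When all scores are requested, the pipeline returns a list of `{"label", "score"}` dictionaries per input. Depending on the `transformers` version, that list may be ordered by score and not by label id, so reading the reward by position can pick the wrong class. A defensive sketch that looks the score up by label name is shown below; `score_for_label` is a hypothetical helper and the score lists are made-up values, not real pipeline output.

```python
def score_for_label(scores, label="nothate"):
    """Return the score whose label matches, regardless of list order."""
    for entry in scores:
        if entry["label"] == label:
            return entry["score"]
    raise KeyError(f"label {label!r} not found in pipeline output")


# Hypothetical outputs for one clean and one toxic text (raw logits).
clean_scores = [
    {"label": "nothate", "score": 4.4},
    {"label": "hate", "score": -3.9},
]
toxic_scores = [
    {"label": "hate", "score": 2.1},      # highest score listed first
    {"label": "nothate", "score": -1.8},
]

print(score_for_label(clean_scores))  # 4.4
print(score_for_label(toxic_scores))  # -1.8

# Positional indexing would return the "hate" score for the toxic text.
print(toxic_scores[0]["score"])       # 2.1
```

If the list is sorted by score, positional indexing rewards the toxic text with its `hate` logit, which is the opposite of the intended signal.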

## Evaluate toxicity before and after model training (score between 0 and 1)

In [15]:
from tensorflow import keras
evaluate_toxicity = evaluate.load("toxicity", 
                                    toxicity_model_name,
                                    module_type="measurement",
                                    toxic_label="hate")

In [16]:
# Try to calculate toxicity for the same sentences

toxicity_score = evaluate_toxicity.compute(predictions=[non_toxic_text])
print(f"Toxicity score for non-toxic text: {toxicity_score['toxicity']}")

toxicity_score = evaluate_toxicity.compute(predictions=[toxic_text])
print(f"Toxicity score for toxic text: {toxicity_score['toxicity']}")

Toxicity score for non-toxic text: [0.0002399203076492995]
Toxicity score for toxic text: [0.9104079604148865]


In [25]:
def eval_toxicity(
    tokenizer,
    model,
    dataset,
    number_of_sample,
    generation_config
):
    
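    toxicity_list = []
    # Note: with `>` the loop evaluates number_of_sample + 1 samples,
    # which is why the recorded runs report 11 iterations for number_of_sample=10.
    for i, sample in tqdm(enumerate(dataset)):
        input_text = sample["query"]
        if i > number_of_sample:
            break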
            
        input_ids = tokenizer(input_text, return_tensors="pt", padding=True).input_ids.to('cuda')
        
        response_token_ids = model.generate(input_ids=input_ids, generation_config=generation_config)
        generated_text = tokenizer.decode(response_token_ids[0], skip_special_tokens=True)
        
        toxicity_score = evaluate_toxicity.compute(predictions=[(input_text + " "+ generated_text)])
        
        # compute() returns a one-element list per call; extend keeps toxicity_list flat
        toxicity_list.extend(toxicity_score['toxicity'])

    # to compute the mean toxicity score
    mean = np.mean(toxicity_list)
    std = np.std(toxicity_list)

    return mean, std
        

In [21]:
# toxicity before fine-tuning
tokenizer = AutoTokenizer.from_pretrained(model_name, device_map="auto")

generation_config = GenerationConfig(max_new_tokens=100,
                                            top_k=0.0,
                                            top_p=1.0,
                                            do_sample=True)

mean_before, std_before = eval_toxicity(tokenizer=tokenizer, 
                             model=reference_model, 
                             dataset=dataset['test'], 
                             number_of_sample=10,
                             generation_config=generation_config)

print(f"mean toxicity score - before fine-tuning: {mean_before}")
print(f"std toxicity score - before fine-tuning: {std_before}")

11it [07:45, 42.34s/it]

mean toxicity score - before fine-tuning: 0.0053667015745304525
std toxicity score - before fine-tuning: 0.0061257415331252875





### The goal is to reduce the mean toxicity score above with PPO

### PPO Trainer

The PPO trainer takes both the PPO model and the reference model (a frozen copy of the model). The reference model is used to measure KL divergence, which keeps the newly optimized model from deviating too far from the reference (that is, the original LLM).
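
How a KL penalty can be folded into the reward is sketched below in plain Python. It follows the common PPO-for-language-models recipe (a per-token penalty on the log-probability ratio, with the classifier score added at the last token); the exact formula in the installed TRL commit may differ. `shaped_rewards`, the coefficient `kl_coef=0.2`, and all log-probabilities are illustrative assumptions.

```python
def shaped_rewards(logprobs, ref_logprobs, score, kl_coef=0.2):
    """Return per-token rewards and the sampled KL estimate for one response."""
    # Per-token log-ratio between the policy and the frozen reference.
    log_ratios = [lp - ref for lp, ref in zip(logprobs, ref_logprobs)]
    # Every token is penalized in proportion to its log-ratio.
    rewards = [-kl_coef * lr for lr in log_ratios]
    # The classifier score is credited to the final token only.
    rewards[-1] += score
    return rewards, sum(log_ratios)


# Made-up log-probabilities of three sampled tokens under each model.
rewards, kl_estimate = shaped_rewards(
    logprobs=[-1.0, -0.5, -2.0],
    ref_logprobs=[-1.2, -0.4, -2.5],
    score=3.0,
)
print(rewards, kl_estimate)

# The sampled estimate can be negative when the policy happens to assign
# lower probability than the reference to the tokens it sampled.
_, negative_kl = shaped_rewards([-1.5, -0.9], [-1.0, -0.8], score=3.0)
print(negative_kl)
```

The true KL divergence is never negative, but an estimate built from a handful of sampled tokens can be. That may be why some `objective/kl` values in the training log are below zero.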

In [22]:
learning_rate = 1.41e-5
max_ppo_epochs = 1
mini_batch_size = 4
batch_size = 16

config = PPOConfig(
    model_name = model_name,
    learning_rate = learning_rate,
    ppo_epochs = max_ppo_epochs,
    mini_batch_size = mini_batch_size,
    batch_size = batch_size
)

def collator(data):
    return dict((key, [d[key] for d in data]) for key in data[0])

ppo_trainer = PPOTrainer(
    config=config,
    model=ppo_model,
    ref_model=reference_model,
    tokenizer=tokenizer,
    dataset=dataset['train'],
    data_collator=collator
)
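
The `collator` turns a list of per-sample dictionaries into one dictionary of lists, without padding or stacking. A small standalone check with made-up samples:

```python
def collator(data):
    """Turn a list of sample dicts into a dict of lists, keyed like the first sample."""
    return dict((key, [d[key] for d in data]) for key in data[0])


# Made-up samples with input_ids of different lengths.
samples = [
    {"input_ids": [1, 2, 3], "query": "first prompt"},
    {"input_ids": [4, 5], "query": "second prompt"},
]

batch = collator(samples)
print(batch)
# {'input_ids': [[1, 2, 3], [4, 5]], 'query': ['first prompt', 'second prompt']}
```

Keeping each sequence as its own list entry suits the training loop, which generates a response for one prompt tensor at a time.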

### Fine-Tune Model

In [23]:
# The objective is to maximize the mean reward while keeping the KL divergence from the reference model small.

""" 
Each response is concatenated with its query (the prompt) and passed to the toxicity classifier, which scores it as hate/nothate.
The nothate logit is extracted and given to the PPO trainer as the reward to maximize.
"""

output_min_length = 100
output_max_length = 500
output_length_sampler = LengthSampler(output_min_length, output_max_length)

generation_kwargs = {
    "min_length": 5,
    "top_k": 0.0,
    "top_p": 1.0,
    "do_sample": True
}

reward_kwargs = {
    "top_k": None, # Return all scores.
    "function_to_apply": "none", # You want the raw logits without softmax.
    "batch_size": 20
}
max_ppo_steps = 50

for step, batch in tqdm(enumerate(ppo_trainer.dataloader)):
    
    if step >= max_ppo_steps:
        break
    
    prompt_tensors = batch['input_ids']
    
    # responses from PEFT model
    summary_tensors = []
    
    for prompt_tensor in prompt_tensors:
        max_new_tokens = output_length_sampler()
        
        generation_kwargs['max_new_tokens'] = max_new_tokens
        
        summary = ppo_trainer.generate(prompt_tensor, **generation_kwargs)
        
        summary_tensors.append(summary.squeeze()[-max_new_tokens:])
        
    batch['response'] = [tokenizer.decode(r.squeeze()) for r in summary_tensors]
    
    # reward output
    query_response_pairs = [q+r for q,r in zip(batch["query"], batch['response'])]
    rewards = sentiment_analysis(query_response_pairs, **reward_kwargs)
    
    reward_tensors = [torch.tensor(reward[not_hate_index]['score']) for reward in rewards]
    
    stats = ppo_trainer.step(prompt_tensors,  summary_tensors, reward_tensors)
    ppo_trainer.log_stats(stats, batch, reward_tensors)
    
    print(f"KL Divergent: {stats['objective/kl']}")
    

0it [00:00, ?it/s]You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
1it [00:21, 21.24s/it]

KL Divergent: -0.0031920485198497772


2it [00:54, 28.04s/it]

KL Divergent: -0.015340277925133705


3it [01:17, 25.92s/it]

KL Divergent: 0.09556488692760468


4it [01:45, 26.64s/it]

KL Divergent: 0.11383634805679321


5it [02:19, 29.29s/it]

KL Divergent: 0.03181584179401398


6it [02:45, 28.24s/it]

KL Divergent: -0.03777473419904709


7it [03:09, 26.94s/it]

KL Divergent: -0.09216611087322235


8it [03:36, 26.89s/it]

KL Divergent: -0.05683952569961548


9it [04:02, 26.77s/it]

KL Divergent: 0.10987281799316406


10it [04:30, 27.05s/it]

KL Divergent: 0.07700534164905548


11it [04:51, 25.18s/it]

KL Divergent: 0.05995362997055054


12it [05:14, 24.54s/it]

KL Divergent: 0.05630319193005562


13it [05:39, 24.77s/it]

KL Divergent: -0.04205099865794182


14it [06:02, 24.03s/it]

KL Divergent: 0.058322709053754807


15it [06:32, 26.00s/it]

KL Divergent: -0.13222619891166687


16it [06:58, 25.83s/it]

KL Divergent: 0.0832252949476242


17it [07:34, 28.85s/it]

KL Divergent: 0.017267966642975807


18it [08:16, 32.80s/it]

KL Divergent: -0.07664337754249573


19it [09:07, 38.34s/it]

KL Divergent: 0.1580633819103241


20it [09:37, 36.01s/it]

KL Divergent: 0.27454155683517456


21it [10:02, 32.69s/it]

KL Divergent: 0.040126603096723557


22it [10:35, 32.54s/it]

KL Divergent: -0.05221175402402878


23it [11:12, 33.93s/it]

KL Divergent: 0.03201613947749138


24it [11:42, 32.93s/it]

KL Divergent: 0.02819656953215599


25it [12:14, 32.42s/it]

KL Divergent: 0.026050660759210587


26it [12:37, 29.78s/it]

KL Divergent: 0.01553802378475666


27it [13:01, 27.88s/it]

KL Divergent: -0.05788774788379669


28it [13:25, 26.71s/it]

KL Divergent: -0.07625370472669601


29it [13:47, 25.45s/it]

KL Divergent: 0.057603321969509125


30it [14:08, 24.01s/it]

KL Divergent: 0.053525637835264206


31it [14:30, 23.50s/it]

KL Divergent: 0.07933111488819122


32it [14:56, 24.34s/it]

KL Divergent: 0.0030426010489463806


33it [15:20, 24.02s/it]

KL Divergent: 0.021420884877443314


34it [15:44, 24.08s/it]

KL Divergent: 0.1401810348033905


35it [16:04, 23.01s/it]

KL Divergent: 0.10003267228603363


36it [16:29, 23.44s/it]

KL Divergent: 0.1908917874097824


37it [16:53, 23.58s/it]

KL Divergent: 0.12680184841156006


38it [17:18, 24.11s/it]

KL Divergent: 0.042717307806015015


39it [17:41, 23.80s/it]

KL Divergent: 0.01658041961491108


40it [18:13, 26.08s/it]

KL Divergent: 0.08181856572628021


41it [18:35, 25.05s/it]

KL Divergent: 0.13016235828399658


42it [18:52, 22.70s/it]

KL Divergent: 0.13820615410804749


43it [19:11, 21.47s/it]

KL Divergent: 0.342926561832428


44it [19:36, 22.37s/it]

KL Divergent: 0.33672332763671875


45it [20:00, 22.99s/it]

KL Divergent: 0.16946840286254883


46it [20:24, 23.35s/it]

KL Divergent: 0.14999714493751526


47it [20:42, 21.64s/it]

KL Divergent: 0.3255085349082947


48it [21:03, 21.64s/it]

KL Divergent: 0.14585565030574799


49it [21:27, 22.30s/it]

KL Divergent: -0.05858045816421509


50it [21:49, 26.20s/it]

KL Divergent: 0.18184417486190796





## Evaluation

In [26]:
# toxicity after fine-tuning
mean_after, std_after = eval_toxicity(tokenizer=tokenizer, 
                             model=ppo_model, 
                             dataset=dataset['test'], 
                             number_of_sample=10,
                             generation_config=generation_config)

print(f"mean toxicity score - after fine-tuning: {mean_after}")
print(f"std toxicity score - after fine-tuning: {std_after}")

11it [00:15,  1.39s/it]

mean toxicity score - after fine-tuning: 0.007172755193261599
std toxicity score - after fine-tuning: 0.006276072876873347





In [27]:
mean_improvement = (mean_before - mean_after) / mean_before
std_improvement = (std_before - std_after) / std_before

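# Positive values mean toxicity decreased; negative values mean it increased.
print(f'Percentage improvement of toxicity score after fine-tuning:')
print(f'mean: {mean_improvement*100:.2f}%')
print(f'std: {std_improvement*100:.2f}%')

Percentage improvement of toxicity score after fine-tuning: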
mean: -33.65%
std: -2.45%
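
The percentages can be re-derived from the four numbers printed by the two evaluation runs. The sketch below only repeats that arithmetic; `relative_improvement` is a hypothetical helper name.

```python
# Values printed by the two eval_toxicity runs above.
mean_before = 0.0053667015745304525
std_before = 0.0061257415331252875
mean_after = 0.007172755193261599
std_after = 0.006276072876873347


def relative_improvement(before, after):
    """Positive when the score went down, negative when it went up."""
    return (before - after) / before


mean_improvement = relative_improvement(mean_before, mean_after)
std_improvement = relative_improvement(std_before, std_after)

print(f"mean: {mean_improvement * 100:.2f}%")  # mean: -33.65%
print(f"std: {std_improvement * 100:.2f}%")    # std: -2.45%

# Absolute change in the mean, for comparison with the spread.
print(f"absolute change: {mean_after - mean_before:.6f}")
```

The negative sign means the mean toxicity score rose after this short PPO run. The absolute change (about 0.0018) is smaller than either standard deviation (about 0.006), and both evaluations use only 11 sampled generations, so the difference may not be meaningful.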


In [28]:
batch_size = 20
compare_results = {}

df_batch = dataset['test'][0:batch_size]

compare_results['query'] = df_batch['query']
prompt_tensors = df_batch['input_ids']

summary_tensors_reference = []
summary_tensors_ppo = []

# Generate responses from both the reference model and the PPO model
for i in tqdm(range(batch_size)):
    generated_text_length = output_length_sampler()
    generation_kwargs['max_new_tokens'] = generated_text_length
    
    reference_summary = reference_model.generate(input_ids=torch.as_tensor(prompt_tensors[i]).unsqueeze(dim=0).to(device),**generation_kwargs).squeeze()[-generated_text_length:]
    summary_tensors_reference.append(reference_summary)
        
    ppo_summary = ppo_model.generate(input_ids=torch.as_tensor(prompt_tensors[i]).unsqueeze(dim=0).to(device),**generation_kwargs).squeeze()[-generated_text_length:]
    summary_tensors_ppo.append(ppo_summary)
    
# Decoding responses.
compare_results["response_before"] = [tokenizer.decode(summary_tensors_reference[i]) for i in range(batch_size)]
compare_results["response_after"] = [tokenizer.decode(summary_tensors_ppo[i]) for i in range(batch_size)]

# Sentiment analysis of query/response pairs before/after.
texts_before = [d + s for d, s in zip(compare_results["query"], compare_results["response_before"])]
rewards_before = sentiment_analysis(texts_before, **reward_kwargs)
compare_results["reward_before"] = [reward[not_hate_index]["score"] for reward in rewards_before]

texts_after = [d + s for d, s in zip(compare_results["query"], compare_results["response_after"])]
rewards_after = sentiment_analysis(texts_after, **reward_kwargs)
compare_results["reward_after"] = [reward[not_hate_index]["score"] for reward in rewards_after]

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:43<00:00,  2.16s/it]


In [29]:
pd.set_option('display.max_colwidth', 500)
df_compare_results = pd.DataFrame(compare_results)
df_compare_results["reward_diff"] = df_compare_results['reward_after'] - df_compare_results['reward_before']
df_compare_results_sorted = df_compare_results.sort_values(by=['reward_diff'], ascending=False).reset_index(drop=True)
df_compare_results_sorted

index,query,response_before,response_after,reward_before,reward_after,reward_diff
0,"Summarize the following dialogue: #Person1#: I was told to come to you to get a chest X-ray. #Person2#: No problem. Just take your clothes off from the waist up and put the gown on, with the opening in the back. #Person1#: Then what should I do? #Person2#: You will stand over here up against this plate. #Person1#: Should I just stand naturally? #Person2#: You will raise your arms up shoulder high. #Person1#: Is this all right? #Person2#: Yes, you are doing great. #Person1#: Where will you be...",<pad> Location: Shrunk: Up head height: 28 cm per square inch = 14%</s>,<pad> Major Kevin Raindrop's chest X-ray took 10 minutes.</s>,2.105177,2.969007,0.86383
1,"Summarize the following dialogue: #Person1#: how long will it take us to drive to London? #Person2#: I think it's a distance of 180 kilometers from here to London, so it should be a two-hour drive on the motorway. #Person1#: that's unless there is a traffic jam. It could take three hours. #Person2#: you're right. We will be able to travel at high speeds at the beginning and end of the journey, because we will be in built-up areas. #Person1#: so, shall we allow three hours to cover the distan...","<pad> They both thought it would take them two-hour drive on the motorway to London. Now that they are planning to drive that distance would take them only 2:2. They came a mile long, 1,5m long and over five meters long.</s>","<pad> In 180 kilometers from port of London to London, they need 2 and 1/3 hours to drive.</s>",2.700622,3.073411,0.372789
2,"Summarize the following dialogue: #Person1#: Reception desk, may I help you? #Person2#: Yes, this is Smith Brown, from room 1016. The last room on the east side of the hotel. #Person1#: Yes. Can I do something for you? #Person2#: You certainly can, I can't get to sleep. The people in the next room, room 1014 are making too much noise. They're probably having a birthday party. All the cheering and laughing are driving me crazy. #Person1#: I see I'll give them a call. #Person2#: I wish you wou...",<pad> Smith Brown is feeling disruptive after turning a noisy room inside the hotel into a room for a birthday party.</s>,<pad> Smith Brown has a birthday party in room 1014 in room 1014.</s>,2.448939,2.703289,0.25435
3,"Summarize the following dialogue: #Person1#: May I help you? #Person2#: Yes. I would like to cash my travelers'check. Here you are. #Person1#: Sure. How do you like your money? #Person2#: In tens and twenties, please. #Person1#: No problem. Here you are. #Person2#: Thanks a lot. Summary: </s>",<pad> The money is in tens to twenties.</s>,"<pad> #Person1 will cash the check in the tens and twenties, thanks to #Person2#.</s>",2.212248,2.371209,0.158961
4,"Summarize the following dialogue: #Person1#: Excuse me, but I'm looking for a gift for my friend. #Person2#: Is it a Chinese New Year's gift or a birthday gift? #Person1#: Well, it's a wedding gift. #Person2#: Oh, I see. Let me guess... something sexy for the new bride on her first night of marriage? #Person1#: Basically. But I have no idea what to get! She's American and kind of... #Person2#: Well-endowed? Don't worry. I have some bras with bigger cup sizes in the back. Summary: </s>","<pad> At the fantasy casino, you can save up to 100 bucks for a new bra.</s>",<pad> Ask the person first about the new bra that they're looking for for the new bride of the new marriage.</s>,1.261591,1.333938,0.072346
5,"Summarize the following dialogue: #Person1#: You have worked in your present company for five years. #Person2#: Yes, I do. #Person1#: Why do you want to resign? #Person2#: As you know I serve as a secretary in my present company, I really want to look for a more challenging opportunity. #Person1#: What kind of work are you interested in? #Person2#: Just as you advertised in the newspaper. #Person1#: So you are interested in our company. #Person2#: Yes, what's more, your company is one of the...",<pad> Don't be afraid.</s>,"<pad> After five years at the current company, Person1 wants to return to work.</s>",2.414729,2.403966,-0.010763
6,"Summarize the following dialogue: #Person1#: Would you like your bags to be taken to your room? #Person2#: Yes, please. #Person1#: OK. I'll get the bellman to taka them up. Which bags are yours? #Person2#: My luggage is in the corner over there. They are the leather suitcase, the canvass holdall and the blue backpack. #Person1#: OK. #Person2#: Please be careful with the suitcase, it's very heavy. #Person1#: I'll let the bellboy know. Don't worry, he's very strong. #Person2#: And the holdall ...",<pad> There are mounting doors in the corner of the room from the other rooms.</s>,"<pad> The leather suitcase, the canvass holdall and the blue backpack are checked into the room where they are located.</s>",2.357016,2.34567,-0.011346
7,"Summarize the following dialogue: #Person1#: I hate to say goodbye, but it's late. #Person2#: Can't you stay for a little bit longer, it's only 8. #Person1#: I wish I could. But I'm afraid I can't. I've got some serious studying to do. I have to go. #Person2#: OK. See you on Today. #Person1#: See you on Moday. Have a great weekend. #Person2#: You too. Thanks for dropping in. Summary: </s>","<pad> Even though it's late, there are two members that are still here.</s>",<pad> Person1 has classmates that want to see a book?</s>,3.609639,3.465418,-0.144221
8,"Summarize the following dialogue: #Person1#: How's it going? #Person2#: I'm great. Thanks. #Person1#: What do you need? #Person2#: I need to know if I have any fees to pay. #Person1#: Actually, you do owe some fees. #Person2#: How much do I owe? #Person1#: Your fees total $ 235. 13. #Person2#: That's crazy! #Person1#: You need to pay these fees soon. #Person2#: Can I do that right now? #Person1#: Will you be paying with cash or check? #Person2#: I think I'll be paying with cash. Summary: </s>",<pad> #Person1 has requested #Person2 if she has any fees to cover.#Person1: You owe $235. 3 13.#Person2: Thank you so much! Try remaining patient and looking forward to you going now!</s>,"<pad> With the help of #1 personally, Person2 will get the bills done on 6/21 hr to recoup their revenues.</s>",2.396725,2.203291,-0.193434
9,"Summarize the following dialogue: #Person1#: Why are you so dressed up? #Person2#: I am on my way out to the celebration party of my company and I have to meet some important customers there. How do I look? Is my make up OK? #Person1#: You look great. Your makeup is also perfect. #Person2#: Do you think I should wear a different dress? #Person1#: No, the one you have all looks great, especially with your purple shoes and your hair like that. #Person2#: Thanks for saying so. Do you have any i...","<pad> Probably, Person1 is hunting for a new dress, but Person2 knows what ale person3 should look like.</s>","<pad> For the celebration party, Person2 is going to dress up.</s>",2.146451,1.924492,-0.221959
