First, check that the correct kernel is chosen. 

EvalALgorithm implements evaluate algo

needs:
    - model 
    - dataset confug (where stored and column names)
    - prompt template


    Modeulrunner
could bring in openAI, huggingface, Bedrock model

model needs to implement these methods:
    - predict
    - output
    - probabas
    
DataConfig


factual_knowledge


# Measuring and Mitigating Toxicity in Large Language Models

Install all required libraries.

In [2]:
!pip install -q -U pip --root-user-action=ignore
!pip3 install -q -r requirements.txt --root-user-action=ignore

[0m

In [59]:
# Add installed cuda runtime to path for bitsandbytes
import os
import gc
import nvidia
import torch
from IPython.display import Markdown

os.environ["TOKENIZERS_PARALLELISM"] = "true"

from tqdm.autonotebook import tqdm as notebook_tqdm

# 1. Load a dataset

In [4]:
from convokit import Corpus, download

# download data
corpus = Corpus(filename=download("movie-corpus"))


Downloading movie-corpus to /root/.convokit/downloads/movie-corpus
Downloading movie-corpus from http://zissou.infosci.cornell.edu/convokit/datasets/movie-corpus/movie-corpus.zip (40.9MB)... Done


Convert the data into a dataframe and eventually a 🤗 dataset.

In [5]:
import pandas as pd

# obtain keys for all dialogs across various movies
utter_keys = list(corpus.utterances.keys())

# initialize dataframe
text_df = pd.DataFrame(columns=["movie", "dialogue"])

# create empty list to store movie name, dialogue, and speaker info
movie_ls = []
text_ls = []
genre_ls = []

# loop through all utterances and append to list
for k in utter_keys:
    movie_ls.append(corpus.utterances[k].speaker.meta["movie_name"])
    text_ls.append(corpus.utterances[k].text)

In [6]:
# fill dataframe with data
text_df["movie"] = movie_ls
text_df["dialogue"] = text_ls

# group by movie title and concatenate all text into one long dialogue
grouped_df = (
    text_df.groupby("movie")["dialogue"].apply(lambda x: " ".join(x)).reset_index()
)

Large Language Models require the data to be stored in a compatible dataset type; use the `datasets` library to convert the dataframe to the required format.

In [7]:
from datasets import Dataset

dataset = Dataset.from_pandas(grouped_df)

In [8]:
dataset

Dataset({
    features: ['movie', 'dialogue'],
    num_rows: 617
})

You can see that there are 617 distinct movies, continue to explore the data.

In [9]:
dataset[3]["dialogue"][:420]

"Officers, there's your killer, do your duty, arrest him! ...so we kill someone famous and if we are caught, we are sent to mental hospital... I don't think it's abuse, I think it's torture. I'm abused.  Don't you think? Can I see your back? Out on my back when I was a small boy. Your father put cigarettes out on you? That's what he did to me.  He put cigarettes out on me. Yeah, he hated me from day when I was born.  "

Delete all old variables that are no longer needed to free up memory with `del`.

In [10]:
del corpus, utter_keys, text_df, grouped_df, movie_ls, text_ls

Make sure to release the memory after deleting the objects and variables that are no longer in use.

In [11]:
gc.collect()

12318289

<div class="alert alert-block alert-success">
<b>Conclusion</b>: 
</div>

# 2. Load and use LLM

Now we initialize and download a base model using the `T5ForConditionalGeneration` class provided by the transformers library and the corresponding tokenizer `T5Tokenizer`.

T5 (Text-To-Text Transfer Transformer) is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. T5 works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task,including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). 

For more details have a look [here](https://huggingface.co/docs/transformers/model_doc/t5).

In [12]:
from transformers import T5ForConditionalGeneration

model_t5 = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-base",
    device_map={"": 0},
    torch_dtype=torch.float32,
)

**Parameters for model:**
+ `device_map="auto"` specifies the device where the model will be loaded - setting it to "auto" allows the library to automatically select the appropriate device (e.g., CPU or GPU) based on availability
+ `load_in_8bit=True` means that the weights of the model will be loaded in lower precision to save memory space

Find a more extensive documentation for the parameters [here]().

In [13]:
from transformers import T5Tokenizer

tokenizer_t5 = T5Tokenizer.from_pretrained(
    "google/flan-t5-base", legacy=True, max_length=512
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [14]:
# print(tokenizer_t5.all_special_tokens)

**Parameters for tokenizer:**
+ legacy=True
+ `max_length=512`
Token indices sequence length should not be longer than the specified maximum sequence length for the T5 model which is 512 tokens. T5 was mostly trained using 512 input tokens, however, thanks to its use of relative attention it can technically use longer input sequences. To avoid running out of memory, we will limit to 512 tokens.

In [15]:
# reuse the end of sequence token as padding token
tokenizer_t5.pad_token = tokenizer_t5.eos_token

# reuse the end of sequence token to represent out-of-vocabulary token
tokenizer_t5.unk_token = tokenizer_t5.eos_token

+ The `eos_token` is a special token representing the end of a sequence. 

+ By assigning it to the `pad_token`, any padding tokens added during tokenization will also be considered as end-of-sequence tokens. 

+ This can be useful when summarizing text snippets of different length as it helps ensure the same number of tokens is passed to the model at all times.


Now the `tokenizer` object is initialized and ready to use for tokenizing text.

After execution, the `model` object will be initialized and ready to use for generating text.

<div class="alert alert-block alert-info">
Let's generate a couple of responses using first the out-of-the-box pre-trained model before any modifications. 
We will try to create a movie script summary using the prompt 'Summarize the following conversation from a movie script'.
</div>

Let's try this prompt:

In [16]:
# create a prompt and use an example dialogue
inference_prompt = (
    "Summarize the following conversation from a movie script: %s"
    % dataset[0]["dialogue"]
)

# let's look at the prompt but shorten the output to reduce the amount of text
Markdown(inference_prompt[:200])

Summarize the following conversation from a movie script: Jesus, my legs are asleep. I'll never be able to win this shit. You must come in first place to move on ! Pick with expediency ! Great.. great

To get a summary from the model, use Huggingface pipelines. Pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of the complex code from the library, offering a simple API dedicated to several tasks. More details [here](https://huggingface.co/docs/transformers/main_classes/pipelines).

In [17]:
len(tokenizer_t5(inference_prompt).input_ids)

Token indices sequence length is longer than the specified maximum sequence length for this model (7696 > 512). Running this sequence through the model will result in indexing errors


7696

In [18]:
from transformers import pipeline

# set up a pipeline for inference and specify summarization as task
pipe = pipeline(
    "summarization",
    model=model_t5,
    tokenizer=tokenizer_t5,
    device_map={"": 0},
    max_length=512,
    batch_size=2,
)

Try the inference pipeline.

In [19]:
# pass in the prompt
summary_example = pipe(
    inference_prompt,
    eos_token_id=tokenizer_t5.eos_token_id,
    pad_token_id=tokenizer_t5.eos_token_id,
)

# look at the output
for text in summary_example:
    # save the summary so you can later check for toxicity
    sample_summary = text["summary_text"]

Markdown(sample_summary)

<pad> I'm a banker. I see blood bathes every day. My business is just as bad. I have an embezzler and his accomplice. I built this place, based on my belief that what I did was for justice. McCay had the company buy the property and pay the bills. First Bank buys another smaller bank, First Bank converts them to our systems and way of doing things. They had everyone start taking those psych tests. Matthew Parker has been working with First Bank for several years now, and if I might say so, I can fire you and have someone else who has the balls terminate these worthless people. I've had that kind of loyalty all of my life. If I wanted to, I could fly to Paris for the afternoon. My father formed this company over seventy years ago. When his brother became mentally challenged, it was put in my charge. Now running a...

<div class="alert alert-block alert-warning">
<b>Exercise</b>: Recreate the example above but for another movie.
</div>

Before you can try another prompt, you will have to delete the previous prompt that was used for inference; e.g. <code>inference_prompt</code> and also clear the GPU cache with <code>torch.cuda.empty_cache()</code>. This will free up some memory and also highlights the importance of using 🤗 Datasets as datasets saved with the HuggingFace library will only be loaded into memory when needed whereas the example above consumed a large amount of disk space.

In [20]:
del inference_prompt
gc.collect()
torch.cuda.empty_cache()

In [21]:
##### complete your code here #####


###################################

Eventually the goal is to summarize all movie dialogues, so you will need to add the prompt instruction to 'summarize' to all datapoints. To do this efficiently, use the `map()` function from 🤗 Datasets.

The primary purpose of map() is to speed up processing functions. It allows you to apply a processing function to each example in a dataset, independently or in batches. This function can even create new rows and columns.

In [22]:
def _add_prompt(sample):
    """
    Function to add prompt instructions to all dialogues.
    """
    # select certain column from dataset
    dialogue = sample["dialogue"]

    # check that it is not empty
    if not dialogue:
        raise ValueError(f"Expected a movie dialogue in: {sample}")

    # set up prompt template; specify a value to be replaced during mapping with {replace_with}
    prompt_template = """
    Summarize the following conversation from a movie script. {replace_with}
    Summary:
    """

    # assign new entry in dataset; pass in the value that replaces {replace_with}
    sample["query"] = prompt_template.format(replace_with=dialogue)

    return sample


prompt_dataset = dataset.map(_add_prompt, batched=False)
prompt_dataset

Map:   0%|          | 0/617 [00:00<?, ? examples/s]

Dataset({
    features: ['movie', 'dialogue', 'query'],
    num_rows: 617
})

Next, tokenize the query to make it quicker to pass to the model later.

In [23]:
def _add_tokenization(sample):
    """
    Function to tokenize the query (summarization instruction and dialogue text).
    """

    # assign new entry in dataset that contains the token IDs
    sample["input_ids"] = tokenizer_t5.encode(
        sample["query"], truncation=True, max_length=512, return_tensors="pt"
    )

    # translate the tokens into a query
    sample["input_query"] = tokenizer_t5.decode(
        sample["input_ids"][0], skip_special_tokens=True
    )

    return sample


prompt_dataset = prompt_dataset.map(_add_tokenization, batched=False)

Map:   0%|          | 0/617 [00:00<?, ? examples/s]

Use the `set_format()` function to set the dataset format to be compatible with PyTorch.

In [24]:
# set format
prompt_dataset.set_format(type="torch")

Next, generate the movie dialog summaries. There are different ways to perform inference (generating outputs); here you will use as simple mapping function that calls the model and generates an output of at most 150 tokens.

In [25]:
def _add_summaries(sample):
    """
    Function to create summaries of the movie dialogue dataset.
    """

    # generate outputs (this will be in tokens)
    outputs = model_t5.generate(
        input_ids=sample["input_ids"].to("cuda"),
        max_new_tokens=150,
        do_sample=True,
        top_p=0.9,
    )

    # decode the tokens
    sample["summary-flan-t5"] = tokenizer_t5.decode(
        outputs[0], skip_special_tokens=True
    )
    # sample["summary-alice"]
    return sample


summaries_dataset = prompt_dataset.map(_add_summaries, batched=False)

Map:   0%|          | 0/617 [00:00<?, ? examples/s]

In [26]:
# remove columns that are no longer needed
summaries_dataset = summaries_dataset.remove_columns(["dialogue", "query", "input_ids"])

# for backup save the dataset to local disk
summaries_dataset.save_to_disk("summaries_dataset")

Saving the dataset (0/1 shards):   0%|          | 0/617 [00:00<?, ? examples/s]

If you need to load in the dataset at any point, you can do so with `load_from_disk('summaries_dataset')`. Make sure to import the method first with `from datasets import load_from_disk`.

<div class="alert alert-block alert-success">
<b>Conclusion</b>: At this point, you have summaries for all the movies and it is time to check whether those summaries contain any hate speech, slurs or toxic remarks.
</div>

# 3. Evaluate LLM generated summaries for toxicity

AutoModelForSequenceClassification has a classification head on top of the model outputs which can be easily trained with the base model; in our case we classify whether or not a summary is toxic. 

First, check how much memory is currently disposable by running `!nvidia-smi`.

In [27]:
!nvidia-smi

Tue Oct 31 19:37:21 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   37C    P0    34W /  70W |   2243MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

To evaluate toxicity you can load the 🤗 [evaluate](https://huggingface.co/docs/evaluate/index) library and initialize a toxicity evaluator object. The model that will be used to evaluate toxicity is the [RoBERTa](https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target) model. RoBERTa was trained to detect toxicity on a dataset of approx. 40,000 entries, generated and labelled by trained annotators over four rounds
of dynamic data creation. Each hateful entry has fine-grained labels for the type and target of hate.

In [28]:
import evaluate

# specify model name
toxicity_model_name = "facebook/roberta-hate-speech-dynabench-r4-target"

toxicity_evaluator = evaluate.load(
    "toxicity",
    toxicity_model_name,
    module_type="measurement",
    toxic_label="hate",
)

To evaluate the movie summary for toxicity, simply pass the summary text to the evaluator.

In [29]:
%%capture
toxicity_score = toxicity_evaluator.compute(predictions=[
    sample_summary
], aggregation=None)

In [30]:
print(toxicity_score["toxicity"])

[0.05065372586250305]


If the aggregation parameter is set to `None`, the scores for each prediction are returned. 

<div class="alert alert-block alert-warning">
<b>Exercise</b>: Calculate the toxity score for another movie.
</div>

In [31]:
##### complete your code here #####


###################################

<div class="alert alert-block alert-warning">
<b>Exercise</b>: Calculate the max toxicity score across multiple movies, by providing a list of summaries to evaluate. Make sure to specify <code>aggregation="maximum"</code> as well.
</div>

In [32]:
##### complete your code here #####


###################################

Now that you evaluated for a few movies manually, it is time to evaluate all movie summaries and obtain a list of toxicity scores.

In [33]:
def _add_toxicity_score(sample):
    """
    Function to create summaries of the movie dialogue dataset.
    """
    # calculate toxicity score
    sample["tox_score"] = toxicity_evaluator.compute(
        predictions=sample["summary-flan-t5"]
    )
    return sample

Create batches of queries to process requests in parallel and evaluate the whole dataset.

In [34]:
def group_batch(batch):
    return {k: [v] for k, v in batch.items()}


BATCH_SIZE = 6

batched_summaries_dataset = summaries_dataset.map(
    group_batch, batched=True, batch_size=BATCH_SIZE, drop_last_batch=False
)
batched_summaries_dataset = batched_summaries_dataset.map(_add_toxicity_score)

Map:   0%|          | 0/617 [00:00<?, ? examples/s]

Map:   0%|          | 0/103 [00:00<?, ? examples/s]

In [35]:
toxicities = []
for d in batched_summaries_dataset["tox_score"]:
    toxicities.append(d["toxicity"])

tox_scores = torch.cat(toxicities, dim=0).reshape(-1)
tox_scores.mean()

tensor(0.0771)

In [36]:
summaries_dataset = summaries_dataset.add_column(
    "toxicity_score", [[t.item()] for t in tox_scores]
)

<div class="alert alert-block alert-warning">
<b>Exercise</b>: Try to calculate mean toxicity for two different movie genres.
</div>

In [37]:
##### complete your code here #####


###################################

<div class="alert alert-block alert-success">
<b>Conclusion</b>: We have seen that some summaries are toxic and would like to remediate this. In general, to update the output that is generated by LLMs, a technique called 'fine-tuning' is used. Fine-tuning requires a set of examples and the corresponding ground truth. In theory, it would be possible to ask human evaluators to look at multiple different versions of movie dialogue summaries and then rank them. However, this is time consuming and therefor it makes sense to repurpose the toxicity model and use the toxicity values as signal for what is considered good (no toxicity) and bad (toxicity). This helper model, is the so-called reward model.
</div>

# 4. Reward Model

To include human feedback, the first step is to ensure the data is in-distribution for the DPO algorithm. Supervised fine-tuning (or SFT for short) can help with this.  The following code-snippet takes care of all the data pre-processing and training for you; have a look at the documentation [here](https://huggingface.co/docs/trl/sft_trainer) and more details about the SFTTrainer class [here](https://github.com/huggingface/trl/blob/main/trl/trainer/sft_trainer.py). For a full overview of the method, have a look [here](https://huggingface.co/blog/dpo-trl).

In [2]:
### Use this in case model crashes as shortcut 
### to start developing from down here

# import torch
# from datasets import load_from_disk

# summaries_dataset = load_from_disk("summaries_dataset")
# prompt_dataset = load_from_disk("prompt_dataset")

# from transformers import T5ForConditionalGeneration

# model_t5 = T5ForConditionalGeneration.from_pretrained(
#     "google/flan-t5-base",
#     device_map={"": 0},
#     torch_dtype=torch.float32,
# )
# from transformers import T5Tokenizer

# tokenizer_t5 = T5Tokenizer.from_pretrained(
#     "google/flan-t5-base", legacy=True, max_length=512
# )

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [3]:
def tox(sample):
    """
    Function to create summaries of the movie dialogue dataset.
    """
    # calculate toxicity score
    sample["summary_tox"] = sample["summary-flan-t5"]+"fuck you are stupid and stink."
    return sample

summaries_dataset = summaries_dataset.map(tox)

ds = summaries_dataset.train_test_split(train_size=300, test_size=50, seed=0)

Map:   0%|          | 0/617 [00:00<?, ? examples/s]

In [5]:
from transformers import TrainingArguments
from trl import SFTTrainer


EPOCHS = 1
LEARNING_RATE = 2e-4

sft_training_args = TrainingArguments(
    output_dir="sfft-model",
    overwrite_output_dir=True,
    learning_rate=LEARNING_RATE,
    num_train_epochs=EPOCHS,
    optim="paged_adamw_8bit",
    gradient_accumulation_steps=4,
    per_device_train_batch_size=4,
    logging_strategy="epoch",  # this will print loss at every epoch
)

# instantiate the trainer
trainer = SFTTrainer(
    model=model_t5,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    dataset_text_field="summary-flan-t5",
    max_seq_length=512,
    tokenizer=tokenizer_t5,
    dataset_batch_size=4,
    args=training_args,  # HF Trainer arguments
)

model_t5.config.use_cache = False

In [6]:
# train the model to recognize the data domain for movies
trainer.train()

Step,Training Loss
18,0.0618


TrainOutput(global_step=18, training_loss=0.06176198853386773, metrics={'train_runtime': 12.2777, 'train_samples_per_second': 24.435, 'train_steps_per_second': 1.466, 'total_flos': 27422392098816.0, 'train_loss': 0.06176198853386773, 'epoch': 0.96})

In [7]:
# specify where to save the pre-trained (domain adapted) SFT-model
trainer.model.save_pretrained("sft-domain-pretrained")

You have trained the model on the movie summaries and it is time to prepare for the preference adaptation. For this, the model needs extra layers of trainable parameters and also some post-processing to help with memory usage and stability.

The DPO trainer expects a model of `AutoModelForCausalLM`, compared to PPO that expects `AutoModelForCausalLMWithValueHead` for the value function.

    SEQ_CLS = "SEQ_CLS" - it is sequence classification 
    SEQ_2_SEQ_LM = "SEQ_2_SEQ_LM" - sequence-to-sequence language modeling.
    CAUSAL_LM = "CAUSAL_LM" - the task of predicting the token following a sequence of tokens
    TOKEN_CLS = "TOKEN_CLS"
    QUESTION_ANS = "QUESTION_ANS"
    FEATURE_EXTRACTION = "FEATURE_EXTRACTION"
    
Additional information can be found [here](https://medium.com/@tom_21755/understanding-causal-llms-masked-llm-s-and-seq2seq-a-guide-to-language-model-training-d4457bbd07fa).

In [8]:
from peft import LoraConfig, TaskType, get_peft_model

# configure the layers for LoRa
peft_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
 
# add adaptable layers to the SFT-model
base_model = get_peft_model(trainer.model, peft_config)


In [9]:
# specify where to save the pre-trained (domain adapted) model
base_model.save_pretrained("adapters", save_peft_format=True)

In [10]:
from peft import PeftModelForCausalLM
from trl import create_reference_model

m = T5ForConditionalGeneration.from_pretrained(
    "sft-domain-pretrained",  # location of saved SFT model
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,
    device_map={"": 0},
)

model = PeftModelForCausalLM.from_pretrained(m, "adapters", is_trainable=True)
model_ref = create_reference_model(model)


In [46]:
# model.gradient_checkpointing_enable()  # reduce number of stored activations
# model.enable_input_require_grads()  # freeze the model and train adapters later

In [11]:
def print_trainable_parameters(m):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in m.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )


print_trainable_parameters(model)

trainable params: 3538944 || all params: 251116800 || trainable%: 1.4092820552029972


The DPO model will be trained to directly optimize the preference of which sentence is the most relevant, given two sentences. The DPO trainer expects a very specific format for the dataset. The entries should be named:

- prompt
- chosen
- rejected


In [12]:
from typing import Dict

def return_prompt_and_responses(samples) -> Dict[str, str]:
    """
    Create correct format for DPO steps.
    """
    return {
        "prompt": samples["input_query"],
        "chosen": samples["summary-flan-t5"],   # rated better than k
        "rejected": samples["summary_tox"], # rated worse than j
            }

original_columns = ds["train"].column_names

dpo_ds = ds["train"].map(
    return_prompt_and_responses,
    batched=True,
    batch_size=4,
    remove_columns=original_columns
)

Map:   0%|          | 0/300 [00:00<?, ? examples/s]

Once we have the dataset sorted the DPO loss is essentially a supervised loss which obtains an implicit reward via a reference model and thus at a high-level the DPOTrainer requires the base model we wish to optimize as well as a reference model:

In [29]:
from trl import DPOTrainer

EPOCHS = 4
LEARNING_RATE = 2e-4

dpo_training_args = TrainingArguments(
    output_dir="feedback-model-new",
    remove_unused_columns=False,
    overwrite_output_dir=True,
    learning_rate=LEARNING_RATE,
    num_train_epochs=EPOCHS,
    optim="paged_adamw_8bit",
    gradient_accumulation_steps=4,
    per_device_train_batch_size=4,
    logging_strategy="epoch",  # this will print loss at every epoch
)

dpo_trainer = DPOTrainer(
    model,  # base model from SFT pipeline
    model_ref,  # a copy of the SFT trained base model
    beta=0.1,  # temperature hyperparameter of DPO
    train_dataset=dpo_ds,  # dataset prepared above
    tokenizer=tokenizer_t5,  # tokenizer
    args=dpo_training_args,  # training arguments e.g. batch size, lr, etc.
    max_length=150,
    max_prompt_length=300,
    max_target_length=128,
)

In [30]:
dpo_trainer.train()

Step,Training Loss
18,0.0044
37,0.0021
56,0.0209
72,0.0644


TrainOutput(global_step=72, training_loss=0.021476728686441977, metrics={'train_runtime': 222.323, 'train_samples_per_second': 5.398, 'train_steps_per_second': 0.324, 'total_flos': 0.0, 'train_loss': 0.021476728686441977, 'epoch': 3.84})

In [31]:
dpo_trainer.model.config.use_cache = True

In [32]:
outputs = dpo_trainer.model.generate(input_ids=prompt_dataset[0]['input_ids'].to("cuda"), max_new_tokens=150, do_sample=True, top_p=0.2)

In [33]:
tokenizer_t5.decode(outputs[0].detach().cpu().numpy(),skip_special_tokens=False,clean_up_tokenization_spaces=False)

"<pad>Bruce, Johnny. I'll do the standard research and have them in by midnight, pending any unforseen problems. I have an embezzler and his accomplice. Good morning. What do you have today ? No one has a gun to your head. I'm an investment banker. I see blood bathes everyday. Besides, mine is not to question why, min is but to do or die. Doesn't it bother you to see this kind of brutal death ? I mean , I can understand the old man's infatuation with the stuff, but not you. Yeah, I'd say that's it. Is that it"

In [34]:
summaries_dataset[0]['summary-flan-t5']

"<pad> Johnny and Bruce are going to the Demon Boat Ride. They're going to meet up with Bruce and Johnny's friends. It's going to be fun."

In [37]:
def _add_summaries_new(sample):
    """
    Function to create summaries of the movie dialogue dataset.
    """
    
    # generate outputs (this will be in tokens)
    outputs = dpo_trainer.model.generate(input_ids=sample["input_ids"].to("cuda"), max_new_tokens=150, do_sample=True, top_p=0.9)
    
    # decode the tokens
    try:
        sample["summary-after-feedback"] = tokenizer_t5.decode(outputs[0],skip_special_tokens=False,clean_up_tokenization_spaces=False)
    except:
        sample["summary-after-feedback"] = ""

    return sample

summaries_dataset = prompt_dataset.map(_add_summaries_new, batched=False)

Map:   0%|          | 0/617 [00:00<?, ? examples/s]

In [None]:
def filter_fn(sample):
    toxicity = sample["prompt"]["toxicity"]
    return toxicity is not None and toxicity > 0.3


ds = ds.filter(filter_fn, batched=False)

# 6. Evaluate new model

# 7. Deploy new model