<img src=banner.png>

# <a name="0">Measuring and Mitigating Toxicity in Large Language Models</a>

Building and operating machine learning applications responsibly requires an active, consistent approach to prevent, assess, and mitigate harm. This workshop guides you through identifying toxicity in LLM-generated summaries and reducing it.

In this workshop you will:
1. <a href="#1">Load a dataset</a>
2. <a href="#2">Load and use a Large Language Model (LLM)</a>
3. <a href="#3">Evaluate LLM-generated summaries for toxicity</a>
4. <a href="#4">Reduce toxicity using Direct Preference Optimization (DPO)</a>
5. <a href="#5">Evaluate the model</a>


**Learning Objectives**

In this workshop you will learn to:

- Measure and understand toxicity
- Apply toxicity metrics
- Compare results across evaluation datasets
- Mitigate toxicity with Direct Preference Optimization (DPO)

**Runtime**

This notebook takes about 90 minutes to complete (using some inbuilt shortcuts).

Start by upgrading [pip](https://pypi.org/project/pip/) (a Python package management system) and installing all required libraries from the provided requirements.txt file.

In [2]:
!pip install -q -U pip --root-user-action=ignore
!pip3 install -q -r requirements.txt --root-user-action=ignore

In [3]:
import warnings
warnings.filterwarnings(
        action='ignore',
        category=UserWarning,
    )

import transformers, torch
transformers.logging.set_verbosity_error()

from tqdm.auto import tqdm as notebook_tqdm

# <a name="1">1. Load a dataset</a>
(<a href="#0">Go to top</a>)

In this notebook, you will be working with the "[Cornell Movie-Dialogs Corpus](https://convokit.cornell.edu/documentation/movie.html)", a large metadata-rich collection of fictional conversations extracted from raw movie scripts. The dataset contains 220,579 conversational exchanges between 10,292 pairs of movie characters in 617 movies.


In [4]:
from utils.data_utils import _prepare_data

# load the data
movie_df = _prepare_data()

# show the data
movie_df.head(2)

Downloading movie-corpus to /root/.convokit/downloads/movie-corpus


Unnamed: 0,movie,dialogue,genre
0,"""murderland""","Jesus, my legs are asleep. I'll never be able ...",crime
1,10 things i hate about you,They do not! They do to! I hope so. She okay? ...,comedy


**The Hugging Face tooling used in this workshop works with `Dataset` objects rather than pandas dataframes**; use the [HuggingFace 🤗 Datasets](https://huggingface.co/docs/datasets/index) library to convert the dataframe.

In [5]:
from datasets import Dataset

# convert the data
movie_dataset = Dataset.from_pandas(movie_df)

# show the data
movie_dataset

Dataset({
    features: ['movie', 'dialogue', 'genre'],
    num_rows: 617
})

You can see that there are 617 distinct movies, and can continue to explore the data by looking at an example dialogue.

In [6]:
movie_dataset[3]["dialogue"][:150]

"Officers, there's your killer, do your duty, arrest him! ...so we kill someone famous and if we are caught, we are sent to mental hospital... I don't "

To move through the remainder of the notebook more quickly, select 200 samples.

In [7]:
# shuffle the data with fixed seed for reproducibility
dataset = movie_dataset.shuffle(seed=42)

# select a sample of 200
dataset = dataset.select(range(200))

# save the dataset to disk
dataset.save_to_disk("movie_dataset")

Saving the dataset (0/1 shards):   0%|          | 0/200 [00:00<?, ? examples/s]

Delete all old variables that are no longer needed to free up memory with `del`.

In [8]:
del movie_dataset

Make sure to release the memory after deleting the objects and variables that are no longer in use.

In [9]:
import gc
gc.collect()

127

<div class="alert alert-block alert-success">
<b>Summary</b>: In this section, you loaded a movie transcript dataset and converted it into a HuggingFace Dataset.
</div>

# <a name="2">2. Load and use a Large Language Model</a>
(<a href="#0">Go to top</a>)

[T5 (Text-To-Text Transfer Transformer)](https://github.com/google-research/text-to-text-transfer-transformer) is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks. T5 works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, including machine translation, **document summarization**, question answering, and classification tasks (e.g., sentiment analysis). 

<div style="text-align: center;">
<img src="https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67" width="700"/>
</div>

For more details have a look at the T5 documentation on HuggingFace 🤗 [here](https://huggingface.co/docs/transformers/model_doc/t5).

## 2.1. Loading T5

First, you have to download the T5 model using the `T5ForConditionalGeneration` class provided by the [HuggingFace 🤗 transformers library](https://github.com/huggingface/transformers) as well as the corresponding tokenizer `T5Tokenizer`. You can think of tokens as pieces of words that are required to pass information to LLMs. 

For English, **1 token is approximately 4 characters or 0.75 words**. This will be important to consider as LLMs are limited by the number of tokens they can pay attention to per prompt.
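
The rule of thumb above can be turned into a quick back-of-the-envelope check. The sketch below is not a tokenizer: `estimate_tokens` and `fits_context` are hypothetical helpers that only apply the 4-characters-per-token approximation, so the exact count from `T5Tokenizer` will differ. The default of 512 is an assumption that matches the T5 input length used in this notebook.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English text, not an exact value


def estimate_tokens(text: str) -> int:
    """Roughly estimate the number of tokens in `text` from its length in characters."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(text: str, context_window: int = 512) -> bool:
    """Check whether the estimated token count fits into the context window."""
    return estimate_tokens(text) <= context_window


sample_text = "word " * 1000  # 5,000 characters
print(estimate_tokens(sample_text), fits_context(sample_text))
```

An estimate like this is only useful for planning; for an exact count, pass the text through the tokenizer and check the length of the returned input IDs.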

In [10]:
from transformers import T5ForConditionalGeneration

# load the model
model_t5 = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-large",
    device_map={"": 0}, # this will load the model in GPU
    torch_dtype=torch.float32,
    return_dict=True
)

Together with a tokenizer the model can be used to generate text. So go ahead and initialize a tokenizer next.

In [11]:
from transformers import T5Tokenizer

# load the tokenizer
tokenizer_t5 = T5Tokenizer.from_pretrained(
    "google/flan-t5-large", 
    legacy=False, 
    max_length=512, 
    skip_special_tokens=True,
    return_tensors="pt",
    truncation=True
)

The **number of tokens passed to an LLM through the tokenizer should not be greater than the number of tokens used in pre-training**. T5 was pre-trained with inputs of up to 512 tokens, which is why `max_length` is set to 512 above; longer inputs need to be truncated or split.

## 2.2. Using T5 for inference on individual movie examples

Let's generate a couple of responses using the pre-trained model. Try to summarize a movie script using the prompt:
<p style="background-color:#a514be; color:white; text-align: center;">'Summarize the following conversation from a movie script: '</p>

Let's try this prompt:

In [12]:
# create a prompt and use an example dialogue
inference_prompt = (
    "Summarize the following conversation from a movie script: \n\n'''%s'''"
    % dataset[0]["dialogue"]
)

# let's look at the prompt but shorten the output to reduce the amount of text
print(inference_prompt[:400])

Summarize the following conversation from a movie script: 

'''I know.  Just be quick about it, will you? Do it right. Whistler, I -- No, we can treat the wounds -- Listen. You have to -- finish me off. You don't want me coming back. Don't try to talk -- China Town. I need more serum.  What's all this? Going somewhere? Don't even start, old man. What took you so long? Wait. Get in. Youre leaving. 


To get a summary from the model, use 🤗 HuggingFace [pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines). Pipelines are a great and easy way to use models for inference that offer a simple API dedicated to several tasks (e.g. [`summarization`](https://huggingface.co/transformers/v3.0.2/task_summary.html#summarization)).

In [13]:
from transformers import pipeline

# set up a pipeline for inference and specify summarization as task
pipe = pipeline(
    task="summarization",
    model=model_t5,
    tokenizer=tokenizer_t5,
    min_length=65,
    max_length=350,
    early_stopping=True,
    top_p=0.8,
    num_beams=3,
    do_sample=False,
    repetition_penalty=2.,
)

Try the inference pipeline. The pipeline will return a list that you have to access to retrieve the LLM generated output.

In [None]:
# pass in the prompt
summary_example = pipe(inference_prompt)

# look at the output
for text in summary_example:
    # save the summary so you can later check for toxicity
    sample_summary = text["summary_text"]

In [None]:
from utils.model_utils import _format_llm_output

# show the result
_format_llm_output(sample_summary)

This looks okay but important characters that appear in the dialogue are not mentioned at all. This is due to the limited number of tokens T5 can 'keep track of'. Later, you will see a method that can help fix this issue.



<div class="alert alert-block alert-warning">
<b>Exercise</b>: Recreate the example above but for another movie.
</div>

In [None]:
##### complete your code here #####


###################################

Before proceeding, delete the pipeline and prompts that were used for inference; e.g. <code>del pipe, inference_prompt</code> and also clear the GPU cache with <code>torch.cuda.empty_cache()</code>. 

In [None]:
del pipe, inference_prompt
gc.collect()
torch.cuda.empty_cache()

## 2.2. Using T5 for inference on all movie examples

The goal of this section is to summarize all movie dialogues. As previously mentioned, there is one very important caveat though: **Large Language Models are only able to pay attention to a limited number of tokens**. The number of tokens an LLM can 'understand' is called the 'context window'. Different LLMs have different context windows. To find the context window size, you can either pass the full movie dialogue through the tokenizer, which will raise a warning, or inspect the model configuration. For more details have a look [here](https://huggingface.co/learn/nlp-course/chapter2/5?fw=tf#:~:text=With%20Transformer%20models%2C%20there%20is,asked%20to%20process%20longer%20sequences).

In [None]:
model_t5.config.__dict__["n_positions"]

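As shown above, the context window for T5 models is 512 tokens. This means the movie transcript needs to be split into chunks that fit into this window, and each chunk is summarized separately. Then, a final summary is created from the chunk summaries. Note that the chunk length used below is given in characters, not tokens; at roughly 4 characters per token, 1,650 characters stay below 512 tokens.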

<div style="text-align: center;">
<img src="map_chain.png" width="900"/>
</div>
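
To make the flow in the figure concrete, here is a minimal sketch of the map-reduce idea in plain Python. It is not the implementation used later in this notebook: `split_into_chunks`, `map_reduce_summarize`, and `first_sentence` are hypothetical names, and `first_sentence` is a trivial stand-in for the LLM so that the example runs without a model.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_length: int) -> List[str]:
    """Cut the text into consecutive pieces of at most `chunk_length` characters."""
    return [text[i : i + chunk_length] for i in range(0, len(text), chunk_length)]


def map_reduce_summarize(text: str, summarize: Callable[[str], str], chunk_length: int) -> str:
    """Summarize each chunk (map), then summarize the joined chunk summaries (reduce)."""
    chunk_summaries = [summarize(chunk) for chunk in split_into_chunks(text, chunk_length)]
    return summarize(" ".join(chunk_summaries))


def first_sentence(text: str) -> str:
    """Stand-in for the LLM: keep only the first sentence of the input."""
    return text.split(".")[0].strip() + "."


transcript = "Alice enters. She sits down. Bob arrives. He waves."
print(map_reduce_summarize(transcript, first_sentence, chunk_length=28))
```

The reduce step only sees the chunk summaries, never the full transcript, which is what keeps every model call within the context window.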

### 2.2.1. Chunking the movie transcripts
Let's start by creating chunks of the movie transcripts. One simple way to create chunks of text is to write a helper function and then apply this helper function to all the movies in the dataset.

In [None]:
def create_chunks(sample, CHUNK_LENGTH):
    """
    Splits the dialogue of a sample into chunks of CHUNK_LENGTH characters and adds metadata to each chunk.
    """
    chunks = []
    # loop over the entire text in steps of CHUNK_LENGTH characters
    for c, i in enumerate(range(0, len(sample["dialogue"]), CHUNK_LENGTH)):
        # extract text
        chunk_text = sample["dialogue"][i : i + CHUNK_LENGTH]
        # create dictionary with the chunked text and metadata
        chunks.append(
            # drop the (possibly incomplete) first and last sentence with string split
            {"text": ".".join(chunk_text.split(".")[1:-1]).lstrip(), "metadata": {"page": c, "num_words": len(chunk_text.split())}}
        )
    # create new column
    sample["chunks"] = chunks
    return sample

Create the chunks for all the movie transcripts in the dataset with the help of `.map()`; this method efficiently applies the `create_chunks` function to all datapoints. Whenever the mapped function takes additional parameters, you need a helper such as `partial` to bind them before passing the function to `.map()`.
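
As a small stand-alone illustration of what `partial` does, independent of the dataset, consider the sketch below. `add_offset` is a made-up toy function and the plain list stands in for a dataset; only `functools.partial` itself is the real tool.

```python
from functools import partial


def add_offset(sample: dict, offset: int) -> dict:
    """Toy per-sample function that needs one extra argument besides the sample."""
    sample["value"] = sample["value"] + offset
    return sample


# bind the extra argument so that the resulting callable only takes the sample
add_ten = partial(add_offset, offset=10)

samples = [{"value": 1}, {"value": 2}]
print([add_ten(s) for s in samples])
```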

In [None]:
from functools import partial

# use partial to pass the arguments to the map function
dataset = dataset.map(partial(create_chunks, CHUNK_LENGTH=1650), batched=False)

<div class="alert alert-block alert-warning">
<b>Exercise</b>: Think about how the chunking could be improved. Hint: Look for text splitters in the LangChain documentation.
</div>

In [None]:
###### write down ideas here ######


###################################

Now that the transcripts are chunked, set up a prompt template for the intermediate (chunk) summaries. A prompt template is a special construct that can parse input variables. Prompt templates can be applied to all the items in a dataset and help with consistency and reproducibility.

### 2.2.2. Prepare prompt templates
Generally, prompt templates can be very elaborate. In the case of T5, the prompts for pre-training all used the keyword 'summarize:', so this is what you should use.

In [None]:
from langchain import PromptTemplate

map_prompt_template = """summarize: {text}"""

map_prompt = PromptTemplate(
    template=map_prompt_template, input_variables=["text"]
)

You also need another prompt template to get the final summary.

In [None]:
combine_prompt_template = """summarize: {text}"""

combine_prompt = PromptTemplate(
    template=combine_prompt_template, input_variables=["text"]
)
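
To see what a prompt template does under the hood, the sketch below rebuilds the idea with plain `str.format`. `SimplePromptTemplate` is a hypothetical stand-in written for this illustration; it is not LangChain's `PromptTemplate` and only mimics the `template`/`input_variables`/`format` pattern used above.

```python
class SimplePromptTemplate:
    """Minimal stand-in for a prompt template based on str.format."""

    def __init__(self, template: str, input_variables: list):
        self.template = template
        self.input_variables = input_variables

    def format(self, **kwargs) -> str:
        # fail early if a declared variable was not provided
        missing = [name for name in self.input_variables if name not in kwargs]
        if missing:
            raise KeyError(f"Missing input variables: {missing}")
        return self.template.format(**kwargs)


toy_prompt = SimplePromptTemplate(template="summarize: {text}", input_variables=["text"])
print(toy_prompt.format(text="Two friends argue about a road trip."))
```

Declaring the input variables up front is what lets a template be validated once and then reused for every chunk.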

### 2.2.3. Create summaries of chunks and final summary

At this point, you could apply the prompt template to all the chunks of movie transcripts to obtain your summaries, combine them back together and create a final summary. This would be a very lengthy and error-prone process, so instead make use of an increasingly popular toolkit: [🦜️🔗 LangChain](https://python.langchain.com/docs/get_started/introduction).

🦜️🔗 LangChain has a [`Chain` module](https://python.langchain.com/docs/modules/chains/) which allows you to create a sequence of calls to generic components (e.g. models or other chains). Luckily, text summarization is a very popular task, so there exists a predefined [summarization](https://python.langchain.com/docs/use_cases/summarization) method, called `load_summarize_chain`. This **chain will take the chunks, summarize them and then pass all the summaries to the LLM to create the final summary**.

In [None]:
from langchain.llms import HuggingFacePipeline
from langchain.chains.summarize import load_summarize_chain

hf = HuggingFacePipeline.from_model_id(
    model_id="google/flan-t5-large",
    task="summarization",
    pipeline_kwargs={"max_new_tokens": 512,
                     "min_length":65,
                     "max_length":350,
                     "top_p":0.8,
                     "do_sample":False,
                     "early_stopping":True,
                     "num_beams":2,
                     "repetition_penalty":2.,},
    device=0
    )

map_reduce_chain = load_summarize_chain(
    hf,
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=combine_prompt,
    return_intermediate_steps=False,
)


There is one more small caveat: LangChain expects all text to be passed as `Document` type following the 🦜️🔗 LangChain schema. So you will have to convert the chunks to the expected schema. Then you can test the summarization chain:

In [None]:
from langchain.schema import Document

sample_doc = [Document(page_content=split["text"], metadata=split["metadata"]) for split in dataset[0]["chunks"]]
    
# turn on verbosity for chain
map_reduce_chain.llm_chain.verbose = True

# run the summarization chain
map_reduce_example = map_reduce_chain({"input_documents": sample_doc})

# show the result
_format_llm_output(map_reduce_example["output_text"])

<div class="alert alert-block alert-warning">
<b>Exercise</b>: Recreate the example above but for another movie.
</div>

In [None]:
##### complete your code here #####


###################################

Next, generate all summaries. Once again, you will use the simple `.map()` method, this time to pass a custom function that calls the model and generates the summaries with the LangChain summarization chain.

Because the model has to generate summaries for every chunk of text, as well as a final summary, the time to create summaries for all movies in the dataset is approximately 6 hours. You can find the code below, but please skip this code cell and simply load the pre-generated summaries. Another possibility to accelerate this step would be to use an endpoint with asynchronous calls or the [LangChain Async API](https://python.langchain.com/docs/modules/chains/how_to/async_chain?ref=blog.langchain.dev).
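
To illustrate why asynchronous calls can help, here is a minimal sketch using Python's `asyncio`. It assumes the model is served behind an endpoint that can handle several requests at once; `summarize_chunk` is a hypothetical stand-in that only simulates latency and does not call any real model or LangChain API.

```python
import asyncio
from typing import List


async def summarize_chunk(chunk: str) -> str:
    """Stand-in for a non-blocking call to a hosted model endpoint."""
    await asyncio.sleep(0.01)  # simulates network latency
    return chunk[:20]


async def summarize_all(chunks: List[str]) -> List[str]:
    """Send all chunk requests concurrently and collect results in input order."""
    return await asyncio.gather(*(summarize_chunk(c) for c in chunks))


chunks = [f"chunk number {i} of the transcript" for i in range(5)]
# in a notebook, use `await summarize_all(chunks)` instead of asyncio.run
summaries = asyncio.run(summarize_all(chunks))
print(summaries)
```

With concurrent requests, the total waiting time is closer to that of the slowest request than to the sum of all requests, provided the endpoint can keep up.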

In [None]:
from utils.data_utils import _add_summaries

# create summaries
summaries_dataset = dataset.map(_add_summaries, batched=False)

# remove columns that are no longer needed
summaries_dataset = summaries_dataset.remove_columns(["dialogue", "chunks"])

# for backup save the dataset to local disk
summaries_dataset.save_to_disk("summaries_dataset")

If you need to load in the dataset, you can do so with `load_from_disk('summaries_dataset')`. Make sure to import the method first with `from datasets import load_from_disk`.

In [None]:
from datasets import load_from_disk
summaries_dataset = load_from_disk('summaries_dataset')

The next cell defines a helper, `rephrase_summaries`, that asks the model to rephrase each summary using words sampled from a profanity word list. The result is stored in a new `toxic_rephrase` column, which serves as the toxic counterpart to each summary in the following sections.

In [None]:
import os
import random
from utils import update_embeddings
from langchain import PromptTemplate

model_t5, tokenizer_t5 = update_embeddings(model_t5, tokenizer_t5)

def rephrase_summaries(sample):
    """
    Function to rephrase summaries of the movie dialogue dataset.
    """
    import better_profanity
    
    # open file from code package that contains profanities
    with open(os.path.dirname(better_profanity.__file__)+'/profanity_wordlist.txt', 'r') as file:
        # read the file contents and store in list
        file_contents = file.read().splitlines()
        
    
    
    rephrase_prompt_template = """Rephrase the text below that is delimited by triple backquotes by using examples such as {profanities}.
    ```{summary}```
    """

    rephrase_prompt = PromptTemplate(template=rephrase_prompt_template, input_variables=["profanities", "summary"])
    
    encoded_input = tokenizer_t5(rephrase_prompt.format(summary=sample["summary"], profanities=random.sample(file_contents, 2)), return_tensors='pt')

    # generate outputs (this will be in tokens)
    outputs = model_t5.generate(
        input_ids=encoded_input["input_ids"].to("cuda"),
        max_new_tokens=150,
        do_sample=True,
        top_p=0.9,
    )

    # decode the tokens
    sample["toxic_rephrase"] = tokenizer_t5.decode(
        outputs[0], skip_special_tokens=True
    )
    return sample

<div class="alert alert-block alert-success">
<b>Conclusion</b>: At this point, you have summaries for all the movies and it is time to check whether those summaries contain any hate speech, slurs or toxic remarks.
</div>

In [None]:
# delete the LangChain pipeline wrapper and release memory
del hf
gc.collect()
torch.cuda.empty_cache()

# <a name="3">3. Evaluate LLM-generated summaries for toxicity</a>
(<a href="#0">Go to top</a>)

`AutoModelForSequenceClassification` adds a classification head on top of the base model outputs, and this head can be trained together with the base model; in this case, the classifier predicts whether or not a summary is toxic.

First, check how much GPU memory is currently available by running `!nvidia-smi`.

In [None]:
!nvidia-smi

In [None]:
summaries_dataset = summaries_dataset.map(rephrase_summaries)

summaries_dataset.save_to_disk("summaries_dataset_incl_toxic_rephrase")

To evaluate toxicity you can load the 🤗 [evaluate](https://huggingface.co/docs/evaluate/index) library and initialize a toxicity evaluator object. The model used to evaluate toxicity is a [RoBERTa](https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target) model fine-tuned for hate speech detection. It was trained on a dataset of approx. 40,000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation. Each hateful entry has fine-grained labels for the type and target of hate.

In [None]:
import evaluate

# specify model name
toxicity_model_name = "facebook/roberta-hate-speech-dynabench-r4-target"

toxicity_evaluator = evaluate.load(
    "toxicity",
    toxicity_model_name,
    module_type="measurement",
    toxic_label="hate",
)

To evaluate the movie summary for toxicity, simply pass the summary text to the evaluator.

In [None]:
toxicity_score = toxicity_evaluator.compute(predictions=[
    summaries_dataset[0]["summary"]
], aggregation=None)

# print the toxicity score
print(toxicity_score["toxicity"], summaries_dataset[0]["summary"])

Have a look at the toxic summary for the same movie and calculate the score for that too. 

In [None]:
toxicity_score = toxicity_evaluator.compute(predictions=[
    summaries_dataset[0]["toxic_rephrase"]
], aggregation=None)

print(toxicity_score["toxicity"], summaries_dataset[0]["toxic_rephrase"])

If the aggregation parameter is set to `None`, the scores for each prediction are returned. 
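
The effect of the aggregation parameter can be sketched in plain Python. The function below is a hypothetical re-implementation for illustration only; the key names (`max_toxicity`, `toxicity_ratio`) and the default threshold of 0.5 are assumptions modeled on the toxicity measurement in 🤗 evaluate, so check the library documentation before relying on them.

```python
from typing import List, Optional


def aggregate_scores(scores: List[float], aggregation: Optional[str] = None, threshold: float = 0.5) -> dict:
    """Mimic three ways of reporting a list of per-text toxicity scores."""
    if aggregation == "maximum":
        return {"max_toxicity": max(scores)}
    if aggregation == "ratio":
        # share of texts whose score reaches the threshold
        return {"toxicity_ratio": sum(s >= threshold for s in scores) / len(scores)}
    # no aggregation: return one score per text
    return {"toxicity": scores}


toy_scores = [0.01, 0.72, 0.30, 0.95]
print(aggregate_scores(toy_scores))
print(aggregate_scores(toy_scores, aggregation="maximum"))
print(aggregate_scores(toy_scores, aggregation="ratio"))
```

Per-text scores are the most useful for inspecting individual summaries, while the maximum and the ratio condense a whole list into a single number.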

<div class="alert alert-block alert-warning">
<b>Exercise</b>: Calculate the toxicity score for another movie.
</div>

In [None]:
##### complete your code here #####

# toxicity_score_new = toxicity_evaluator.compute(predictions=[
#     summaries_dataset[1]["toxic_rephrase"]
# ], aggregation=None)

# print(toxicity_score_new["toxicity"])

###################################

<div class="alert alert-block alert-warning">
<b>Exercise</b>: Calculate the max toxicity score across multiple movies, by providing a list of summaries to evaluate. Make sure to specify <code>aggregation="maximum"</code> as well.
</div>

In [None]:
##### complete your code here #####

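# toxicity_score_max = toxicity_evaluator.compute(predictions=[
#     summaries_dataset[i]["toxic_rephrase"] for i in range(5)
# ], aggregation="maximum")

# with aggregation="maximum", the result is stored under the key "max_toxicity"
# print(toxicity_score_max["max_toxicity"])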

###################################

Now that you evaluated for a few movies manually, it is time to evaluate all movie summaries and obtain a list of toxicity scores.

In [None]:
def _add_toxicity_score(sample):
    """
    Function to calculate toxicity scores for the summaries of the movie dialogue dataset.
    """
    # calculate toxicity score
    sample["tox_score"] = toxicity_evaluator.compute(
        predictions=sample["summary"]
    )
    return sample

Create batches of queries to process requests in parallel and evaluate the whole dataset.

In [None]:
def group_batch(batch):
    return {k: [v] for k, v in batch.items()}


BATCH_SIZE = 6

batched_summaries_dataset = summaries_dataset.map(
    group_batch, batched=True, batch_size=BATCH_SIZE, drop_last_batch=False
)
batched_summaries_dataset = batched_summaries_dataset.map(_add_toxicity_score)
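
To see what `group_batch` does to the data, the sketch below replays the same idea on a plain dictionary of lists. `iterate_batches` is a hypothetical helper that only imitates how a batched map hands slices of each column to the function; it is not part of the 🤗 Datasets API.

```python
def group_batch(batch):
    # same helper as above: wrap every column (a list) into a one-element list
    return {k: [v] for k, v in batch.items()}


def iterate_batches(columns: dict, batch_size: int):
    """Yield slices of a column-oriented table, similar to what a batched map passes on."""
    num_rows = len(next(iter(columns.values())))
    for start in range(0, num_rows, batch_size):
        yield {k: v[start : start + batch_size] for k, v in columns.items()}


toy_columns = {"summary": ["s0", "s1", "s2", "s3", "s4"]}
grouped_rows = [group_batch(b) for b in iterate_batches(toy_columns, batch_size=2)]
print(grouped_rows)
```

Each batch collapses into a single row whose `summary` entry is a list of summaries, so the evaluator receives several texts per call instead of one.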

Flatten out the toxicity scores into a list to append to the summaries dataset.

In [None]:
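toxicities = []
for d in batched_summaries_dataset["tox_score"]:
    # each entry holds the list of scores for one batch of summaries
    toxicities.extend(d["toxicity"])

# collect all scores in a 1-D tensor and look at the mean
tox_scores = torch.tensor(toxicities)
tox_scores.mean()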

Append scores to the summaries dataset.

In [None]:
summaries_dataset = summaries_dataset.add_column(
    "toxicity_score", [[t.item()] for t in tox_scores]
)

<div class="alert alert-block alert-warning">
<b>Exercise</b>: Try to calculate mean toxicity for two different movie genres.
</div>

In [None]:
##### complete your code here #####


###################################

<div class="alert alert-block alert-success">
<b>Conclusion</b>: You have seen that some summaries are toxic and would like to remediate this. In general, to update the output that is generated by LLMs, a technique called 'fine-tuning' is used. Fine-tuning requires a set of examples and the corresponding ground truth. In theory, it would be possible to ask human evaluators to look at multiple different versions of movie dialogue summaries and then rank them. However, this is time-consuming, and therefore it makes sense to repurpose the toxicity model and use the toxicity values as a signal for what is considered good (no toxicity) and bad (toxicity). This helper model is the so-called reward model.
</div>

# <a name="4">4. Reduce toxicity using Direct Preference Optimization (DPO)</a>
(<a href="#0">Go to top</a>)

To include human feedback, the first step is to ensure the data is in-distribution for the DPO algorithm. Supervised fine-tuning (or SFT for short) can help with this. The following code snippet takes care of all the data pre-processing and training for you; have a look at the documentation [here](https://huggingface.co/docs/trl/sft_trainer) and more details about the SFTTrainer class [here](https://github.com/huggingface/trl/blob/main/trl/trainer/sft_trainer.py). For a full overview of the method, have a look [here](https://huggingface.co/blog/dpo-trl).

In [None]:
# ## Use this in case model crashes as shortcut 
# ## to start developing from down here

import torch
from datasets import load_from_disk

summaries_dataset = load_from_disk("summaries_dataset_incl_toxic_rephrase")

from transformers import T5ForConditionalGeneration

model_t5 = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-base",
    device_map={"": 0},
    torch_dtype=torch.float32,
)
from transformers import T5Tokenizer

tokenizer_t5 = T5Tokenizer.from_pretrained(
    "google/flan-t5-large", 
    legacy=False, 
    max_length=512, 
    skip_special_tokens=True,
    return_tensors="pt",
)

In [None]:
ds = summaries_dataset.train_test_split(train_size=100, test_size=50, seed=0)

In [None]:
from transformers import TrainingArguments
from trl import SFTTrainer


EPOCHS = 2
LEARNING_RATE = 2e-4

sft_training_args = TrainingArguments(
    output_dir="sfft-model",
    overwrite_output_dir=True,
    learning_rate=LEARNING_RATE,
    num_train_epochs=EPOCHS,
    optim="paged_adamw_8bit",
    gradient_accumulation_steps=4,
    per_device_train_batch_size=4,
    logging_strategy="epoch",  # this will print loss at every epoch
)

# instantiate the trainer
trainer = SFTTrainer(
    model=model_t5,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    dataset_text_field="summary",
    max_seq_length=512,
    tokenizer=tokenizer_t5,
    dataset_batch_size=4,
    args=sft_training_args,  # HF Trainer arguments
)

model_t5.config.use_cache = False

In [None]:
# train the model to recognize the data domain for movies
trainer.train()

In [None]:
# specify where to save the pre-trained (domain adapted) SFT-model
trainer.model.save_pretrained("sft-domain-pretrained")

You have trained the model on the movie summaries and it is time to prepare for the preference adaptation. For this, the model needs extra layers of trainable parameters and also some post-processing to help with memory usage and stability.

The DPO trainer expects a model of type `AutoModelForCausalLM`, whereas PPO expects `AutoModelForCausalLMWithValueHead` for the value function.

In [None]:
from peft import LoraConfig, TaskType, get_peft_model

# configure the layers for LoRA
peft_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
 
# add adaptable layers to the SFT-model
base_model = get_peft_model(trainer.model, peft_config)

In [None]:
# specify where to save the pre-trained (domain adapted) model
base_model.save_pretrained("adapters", save_peft_format=True)

In [None]:
from peft import PeftModelForCausalLM
from trl import create_reference_model

m = T5ForConditionalGeneration.from_pretrained(
    "sft-domain-pretrained",  # location of saved SFT model
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,
    device_map={"": 0},
)

model = PeftModelForCausalLM.from_pretrained(m, "adapters", is_trainable=True)
model_ref = create_reference_model(model)


In [None]:
def print_trainable_parameters(m):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in m.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )


print_trainable_parameters(model)
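
As a rough cross-check of the number of trainable parameters, the count added by LoRA can be estimated by hand. The sketch below assumes the `flan-t5-base` architecture from the shortcut cell (hidden size 768, 12 encoder and 12 decoder layers, square `q` and `v` projections); these values are assumptions for illustration, so verify them against `model.config` before relying on the result.

```python
def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds two low-rank factors: A with shape (r, d_in) and B with shape (d_out, r)."""
    return r * d_in + d_out * r


# assumed architecture values for flan-t5-base; check model.config before relying on them
D_MODEL = 768
ENCODER_LAYERS = 12
DECODER_LAYERS = 12
RANK = 32  # same rank as r=32 in the LoraConfig above

# q and v projections: one attention block per encoder layer, two per decoder layer
adapted_matrices = 2 * (ENCODER_LAYERS + 2 * DECODER_LAYERS)
total = adapted_matrices * lora_params_per_matrix(D_MODEL, D_MODEL, RANK)
print(adapted_matrices, total)
```

The estimate grows linearly with the rank, which is why a small `r` keeps the share of trainable parameters low compared to full fine-tuning.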

The DPO model is trained to directly optimize for the preferred response, given a pair of a preferred and a dispreferred response to the same prompt. The DPO trainer expects a very specific format for the dataset. The entries should be named:

- `prompt`
- `chosen`
- `rejected`
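
The three entries listed above can be assembled as in the following sketch. `build_preference_records` is a hypothetical helper written for this illustration with toy strings; it pairs each non-toxic summary with its toxic rephrase and is not the function used in this notebook.

```python
from typing import Dict, List


def build_preference_records(prompt: str, summaries: List[str], toxic_rephrases: List[str]) -> Dict[str, List[str]]:
    """Pair each clean summary (chosen) with its toxic rephrase (rejected) under one shared prompt."""
    if len(summaries) != len(toxic_rephrases):
        raise ValueError("Each summary needs exactly one toxic rephrase.")
    return {
        # repeat the prompt once per pair so that all three lists have equal length
        "prompt": [prompt] * len(summaries),
        "chosen": list(summaries),
        "rejected": list(toxic_rephrases),
    }


records = build_preference_records(
    prompt="Write a summary of this movie dialogue.",
    summaries=["A calm summary.", "Another calm summary."],
    toxic_rephrases=["A rude summary.", "Another rude summary."],
)
print(records)
```

Deriving the number of prompt copies from the batch itself avoids a mismatch when the last batch is smaller than the others.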


In [None]:
from typing import Dict
from functools import partial

def return_prompt_and_responses(samples, batch_multiplier) -> Dict[str, list]:
    """
    Bring a batch of samples into the format the DPO trainer expects.
    """
    return {
        # the same instruction is repeated once per sample in the batch
        "prompt": ["""Write a summary of this chunk of movie dialogue delimited by triple backquotes that includes the main points and any important details."""]*batch_multiplier,
        "chosen": samples["summary"],   # preferred response (non-toxic summary)
        "rejected": samples["toxic_rephrase"], # dispreferred response (toxic rephrase)
            }

original_columns = ds["train"].column_names


BATCH_DATA = 4

# reshape the dataset to format DPO expects
dpo_ds = ds["train"].map(partial(return_prompt_and_responses, batch_multiplier=BATCH_DATA),
                        batched=True,
                        batch_size=BATCH_DATA,
                        remove_columns=original_columns)


Once the dataset is in the right format, training can start. The DPO loss is essentially a supervised loss that obtains an implicit reward via a reference model. At a high level, the `DPOTrainer` therefore requires the base model to optimize as well as a reference model:
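
The role of the reference model becomes clearer when the loss is written out for a single preference pair. The sketch below follows the standard DPO formulation; `dpo_loss` is a hypothetical helper for illustration, its inputs are assumed to be summed log-probabilities of the full responses, and it is not the code that `DPOTrainer` runs internally.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float, beta: float = 0.1) -> float:
    """DPO loss for one pair: -log(sigmoid(chosen_reward - rejected_reward))."""
    # implicit rewards: how much more likely a response is under the policy than under the reference
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


# at the start of training the policy equals the reference, so the margin is zero
loss_at_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# after training, the policy prefers the chosen response more strongly than the reference does
loss_after_training = dpo_loss(-8.0, -15.0, -10.0, -12.0)
print(loss_at_start, loss_after_training)
```

The loss only decreases when the policy shifts probability toward the chosen response relative to the reference, and `beta` controls how strongly deviations from the reference are rewarded.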

In [None]:
from trl import DPOTrainer

EPOCHS = 4
LEARNING_RATE = 2e-4

dpo_training_args = TrainingArguments(
    output_dir="feedback-model-new",
    remove_unused_columns=False,
    overwrite_output_dir=True,
    learning_rate=LEARNING_RATE,
    num_train_epochs=EPOCHS,
    optim="paged_adamw_8bit",
    gradient_accumulation_steps=4,
    per_device_train_batch_size=4,
    logging_strategy="epoch",  # this will print loss at every epoch
)

dpo_trainer = DPOTrainer(
    model,  # base model from SFT pipeline
    model_ref,  # a copy of the SFT trained base model
    beta=0.1,  # temperature hyperparameter of DPO
    train_dataset=dpo_ds,  # dataset prepared above
    tokenizer=tokenizer_t5,  # tokenizer
    args=dpo_training_args,  # training arguments e.g. batch size, lr, etc.
    max_length=150,
    max_prompt_length=300,
    max_target_length=128,
)

In [None]:
dpo_trainer.train()

In [None]:
# enable inference
dpo_trainer.model.config.use_cache = True

In [None]:
encoded = tokenizer_t5(summaries_dataset[0]["toxic_rephrase"], return_tensors="pt")

In [None]:
summaries_dataset[0]

In [None]:
dpo_output = dpo_trainer.model.generate(
    input_ids=encoded["input_ids"].to("cuda"),
    max_new_tokens=150,
    do_sample=True,
    top_p=0.8)

In [None]:
tokenizer_t5.decode(dpo_output[0].detach().cpu().numpy(),
                    skip_special_tokens=False,
                    clean_up_tokenization_spaces=False)

<div class="alert alert-block alert-warning">
<b>Exercise</b>: Compare summaries from DPO model and reference model.
</div>

In [None]:
ref_output = model_ref.generate(
    input_ids=encoded["input_ids"].to("cuda"),
    max_new_tokens=450,
    do_sample=True,
    top_p=0.6)

tokenizer_t5.decode(ref_output[0].detach().cpu().numpy(),
                    skip_special_tokens=False,
                    clean_up_tokenization_spaces=False)

# <a name="5">5. Evaluate the model</a>
(<a href="#0">Go to top</a>)

In [None]:
summaries_dataset

# Next steps

In [None]:
# from sagemaker.jumpstart.model import JumpStartModel
# from sagemaker.serializers import JSONSerializer


# model_id, model_version, = (
#     "huggingface-text2text-flan-t5-xxl",
#     "*",
# )


# inference_instance_type = "ml.g5.2xlarge"
# my_model = JumpStartModel(model_id=model_id)
# # deploy the model to 1 single instance of type inference_instance_type

# predictor = my_model.deploy(
#     initial_instance_count=1,
#     instance_type=inference_instance_type
# )


# prompt = "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:"

# payload = {
#     "inputs": prompt,
#     "parameters": {
#         "max_new_tokens": 50,
#         "return_full_text": True,
#         "do_sample": True,
#         "top_k": 10,
#         "stop": ["<|endoftext|>", "</s>"],
#     },
# }

# response = predictor.predict(payload)
# print(response[0]["generated_text"])
