# Fine-Tune FLAN-T5 with Reinforcement Learning (PPO) and PEFT to Generate Less-Toxic Summaries

In this notebook, you will fine-tune a FLAN-T5 model to generate less toxic content, using Meta AI's hate speech model as the reward model. The reward model is a binary classifier that predicts either "not hate" or "hate" for a given text. You will use Proximal Policy Optimization (PPO) to fine-tune the model and reduce its toxicity.
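
To make the idea of a classifier-based reward concrete, here is a minimal sketch of how a scalar reward could be derived from the two logits of a binary "not hate" / "hate" classifier. The label order, the example logit values, and the choice of the raw "not hate" logit as the reward are all assumptions made for illustration; the helper names `softmax` and `reward_from_logits` are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed label order: index 0 = "not hate", index 1 = "hate".
NOT_HATE_INDEX = 0

def reward_from_logits(logits):
    """Use the "not hate" logit as the reward (an assumed design choice)."""
    return logits[NOT_HATE_INDEX]

# Made-up logits for a non-toxic text, for illustration only.
example_logits = [3.1, -2.7]
probs = softmax(example_logits)
reward = reward_from_logits(example_logits)

print(f"P(not hate) = {probs[0]:.4f}, P(hate) = {probs[1]:.4f}")
print(f"reward = {reward}")
```

A higher "not hate" score yields a higher reward, so PPO updates push the model toward text that the classifier considers less toxic.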

## 1 - Set up Kernel and Required Dependencies

Install the required packages, including PyTorch, the Hugging Face `transformers` and `datasets` libraries, TRL, and PEFT.

In [5]:
%pip install -U datasets==2.17.0

%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==2.5.1  \
    torchdata==0.10.1 --quiet

%pip install \
    transformers==4.47.1 \
    evaluate==0.4.0 \
    rouge_score==0.1.2 

# Install TRL (Transformer Reinforcement Learning) and PEFT directly from GitHub.
%pip install git+https://github.com/lvwerra/trl.git
%pip install git+https://github.com/huggingface/peft.git

Collecting datasets==2.17.0
  Using cached datasets-2.17.0-py3-none-any.whl.metadata (20 kB)
Using cached datasets-2.17.0-py3-none-any.whl (536 kB)
Installing collected packages: datasets
  Attempting uninstall: datasets
    Found existing installation: datasets 3.2.0
    Uninstalling datasets-3.2.0:
      Successfully uninstalled datasets-3.2.0
Successfully installed datasets-2.17.0
Note: you may need to restart the kernel to use updated packages.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
trl 0.14.0.dev0 requires datasets>=2.21.0, but you have datasets 2.17.0 which is incompatible.


Note: you may need to restart the kernel to use updated packages.
Collecting git+https://github.com/lvwerra/trl.git
  Cloning https://github.com/lvwerra/trl.git to c:\users\rg255041\appdata\local\temp\pip-req-build-nu7ka0ru
  Resolved https://github.com/lvwerra/trl.git to commit 763738f457f283270772ac9bd5b3e4027fd424d5
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting datasets>=2.21.0 (from trl==0.14.0.dev0)
  Using cached datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Using cached datasets-3.2.0-py3-none-any.whl (480 kB)
Installing collected packages: datasets
  Attem

  Running command git clone --filter=blob:none --quiet https://github.com/lvwerra/trl.git 'C:\Users\rg255041\AppData\Local\Temp\pip-req-build-nu7ka0ru'


Collecting git+https://github.com/huggingface/peft.git
  Cloning https://github.com/huggingface/peft.git to c:\users\rg255041\appdata\local\temp\pip-req-build-x0gejee0
  Resolved https://github.com/huggingface/peft.git to commit 6d458b300fc2ed82e19f796b53af4c97d03ea604
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: peft
  Building wheel for peft (pyproject.toml): started
  Building wheel for peft (pyproject.toml): finished with status 'done'
  Created wheel for peft: filename=peft-0.14.1.dev0-py3-none-any.whl size=383810 sha256=44bb30de4e45e406fa429ea56728257b1b0320a08214b67a7c7d187ee4e472c7
  Stored in directory: C:\Users\rg255041\AppData\Local\Temp\pip-ephem-wheel-ca

  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git 'C:\Users\rg255041\AppData\Local\Temp\pip-req-build-x0gejee0'
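
The resolver error in the output above reports that `trl 0.14.0.dev0` requires `datasets>=2.21.0`, while the first install command pinned `datasets==2.17.0`. As a minimal sketch of the comparison behind that message, the hypothetical helpers `parse_version` and `satisfies_minimum` below compare dotted version strings using only the standard library. They are illustrative and do not reproduce how pip resolves dependencies; for real checks, `packaging.version.Version` handles pre-release and dev suffixes properly.

```python
def parse_version(text):
    """Turn '2.17.0' into (2, 17, 0); stops at the first non-numeric part such as 'dev0'."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def satisfies_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)

# Versions taken from the install log above.
print(satisfies_minimum("2.17.0", "2.21.0"))  # pinned datasets vs. the trl requirement
print(satisfies_minimum("3.2.0", "2.21.0"))   # datasets version collected during the trl install
```

The later log lines show pip collecting `datasets-3.2.0` while installing TRL, which would satisfy the requirement.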


Import the necessary components. Some of them are new this week and will be discussed later in the notebook.

In [1]:
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification, AutoModelForSeq2SeqLM, GenerationConfig
from datasets import load_dataset
from peft import PeftModel, PeftConfig, LoraConfig, TaskType

# trl: Transformer Reinforcement Learning library
from trl import PPOTrainer, PPOConfig, AutoModelForSeq2SeqLMWithValueHead
from trl import create_reference_model
from trl.core import LengthSampler

import torch
import evaluate

import numpy as np
import pandas as pd

# tqdm library makes the loops show a smart progress meter.
from tqdm import tqdm
tqdm.pandas()



## 2 - Load FLAN-T5 Model, Prepare Reward Model and Toxicity Evaluator

### 2.1 - Load Data and FLAN-T5 Model Fine-Tuned with Summarization Instruction

You will keep working with the same Hugging Face dataset, [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum), and the pre-trained [FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) model.

In [2]:
model_name = "google/flan-t5-base"
huggingface_dataset_name = "knkarthick/dialogsum"

dataset_original = load_dataset(huggingface_dataset_name)

dataset_original

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})
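
Each row in the splits above carries the fields `id`, `dialogue`, `summary`, and `topic`. As a rough sketch of how a `dialogue` could be turned into a summarization prompt for FLAN-T5, the example below uses a stand-in record with made-up text rather than a real DialogSum row. The instruction template and the helper name `build_prompt` are assumptions for illustration.

```python
# Stand-in record with the same fields as a DialogSum row; the text is made up.
sample = {
    "id": "example_0",
    "dialogue": "#Person1#: Is the report ready?\n#Person2#: Almost, I need one more hour.",
    "summary": "#Person2# needs one more hour to finish the report.",
    "topic": "work",
}

def build_prompt(dialogue):
    """Wrap a dialogue in an assumed summarization instruction template."""
    return f"Summarize the following conversation.\n\n{dialogue}\n\nSummary:\n"

prompt = build_prompt(sample["dialogue"])
print(prompt)
```

With the real dataset loaded, the same helper could be applied to a row such as `dataset_original["train"][0]["dialogue"]`.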