**README**

This document was prepared on **Kaggle**.


# Install Packages

In [1]:
!pip install -q trl

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
kfp 2.5.0 requires google-cloud-storage<3,>=2.2.1, but you have google-cloud-storage 1.44.0 which is incompatible.

In [4]:
import os
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient

In [5]:
user_secrets = UserSecretsClient()
os.environ["HF_TOKEN"] = user_secrets.get_secret("HUGGINGFACEHUB_API_TOKEN")  # HF_TOKEN is the variable huggingface_hub reads

In [6]:
login(user_secrets.get_secret("HUGGINGFACEHUB_API_TOKEN"))


The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


# [TRL](https://pypi.org/project/trl/#description)

TRL - Transformer Reinforcement Learning

**Full stack library to fine-tune and align large language models.**

**What is it?**

The trl library is a full-stack tool for fine-tuning and aligning transformer language and diffusion models using methods such as `Supervised Fine-tuning` (SFT), `Reward Modeling` (RM), `Proximal Policy Optimization` (PPO), and `Direct Preference Optimization` (DPO).

The library is built on top of the `transformers` library, so you can use any model architecture available there.


## Highlights

* **Efficient and scalable**

  * `accelerate` is the backbone of trl. It lets you scale model training from a single GPU to a large multi-node cluster with methods such as `DDP` and `DeepSpeed`.
  * `PEFT` is fully integrated and makes it possible to train even the largest models on modest hardware using quantisation and methods such as **LoRA** or **QLoRA**.
  * `unsloth` is also integrated and can significantly speed up training with dedicated kernels.

* `CLI`: With the CLI you can fine-tune and chat with LLMs without writing any code using a single command and a flexible config system.

* `Trainers`: The Trainer classes are an abstraction that makes it easy to apply many fine-tuning methods, such as the `SFTTrainer`, `DPOTrainer`, `RewardTrainer`, `PPOTrainer`, `CPOTrainer`, and `ORPOTrainer`.

* `AutoModels`: The `AutoModelForCausalLMWithValueHead` & `AutoModelForSeq2SeqLMWithValueHead` classes add a value head to the model, which allows it to be trained with **RL** algorithms such as *PPO*.

* `Examples`: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier, run full RLHF using adapters only, train GPT-J to be less toxic, reproduce the StackLlama example, and more by following the examples.


## Command Line Interface (CLI)

You can use the TRL Command Line Interface (CLI) to quickly get started with Supervised Fine-tuning (SFT) and Direct Preference Optimization (DPO), and to test your aligned model with the chat CLI:

1. **SFT - Supervised Fine Tuning**
```
!trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb
```
2. **DPO - Direct Preference Optimization**
```
!trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/hh-rlhf-helpful-base-trl-style --output_dir opt-sft-hh-rlhf
```
3. **Chat**
```
!trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat
```

The three commands above let us train a model (`sft`, `dpo`) and chat with one (`chat`) through the `trl` command.
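The three commands share the same shape: a subcommand followed by `--model_name_or_path` and, for training, `--dataset_name` and `--output_dir`. As a minimal sketch, the helper below (a hypothetical name, `build_trl_command`, not part of TRL) assembles such a command as an argument list, using only the flags shown above. It does not launch anything by itself.

```python
import shlex


def build_trl_command(subcommand, model_name_or_path, dataset_name=None, output_dir=None):
    """Assemble a `trl` CLI invocation as an argument list.

    Hypothetical helper for illustration; it only uses the flags that
    appear in the commands shown in this notebook.
    """
    args = ["trl", subcommand, "--model_name_or_path", model_name_or_path]
    if dataset_name is not None:
        args += ["--dataset_name", dataset_name]
    if output_dir is not None:
        args += ["--output_dir", output_dir]
    return args


# Rebuild the SFT command from the list above.
sft_cmd = build_trl_command("sft", "facebook/opt-125m", "imdb", "opt-sft-imdb")

# shlex.join renders the list as a shell-safe string for display.
print(shlex.join(sft_cmd))
```

Outside a notebook, the list could be passed to `subprocess.run(sft_cmd, check=True)`, assuming `trl` is installed and on the `PATH`.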

I will run one chat command below for testing.

In [None]:
# !trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat  # uncomment this line to run

## How to use

For more flexibility and control over the training, you can use the dedicated trainer classes to fine-tune the model in Python.

### SFTTrainer

This is a basic example of how to use the `SFTTrainer` from the library. The `SFTTrainer` is a light wrapper around the transformers Trainer to easily fine-tune language models or adapters on a `custom dataset`.

**SFTTrainer** stands for `Supervised Fine-Tuning Trainer`. It's a class provided by the TRL package that facilitates the supervised fine-tuning of transformer models. This class helps in training a transformer model on a specific dataset using supervised learning techniques.

**Main Use of SFTTrainer**

1. `Fine-Tuning Pretrained Models`: SFTTrainer is primarily used to fine-tune pre-trained causal language models (e.g., GPT-2, OPT) on a specific dataset. Fine-tuning is a crucial step in adapting a general-purpose pre-trained model to a specific task, such as instruction following, summarization, or dialogue.
2. `Supervised Learning`: It enables supervised learning by providing the necessary methods to train a model using labeled data. This involves defining a loss function, optimizing the model parameters, and evaluating the model's performance on validation data.
3. `Customization`: SFTTrainer allows for customization of the training process. Users can specify their own loss functions, optimization algorithms, and other training parameters to suit their specific needs.
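For a causal language model, the "labels" in supervised fine-tuning are the text itself: each token is the target for the tokens that precede it, and the loss is the average cross-entropy of the correct next token. The sketch below illustrates that objective in plain Python with made-up token ids and probabilities. It is not how `SFTTrainer` computes its loss internally (that happens inside `transformers` on tensors); it is only meant to show what is being minimised.

```python
import math


def next_token_pairs(token_ids):
    """Pair each token with the token that follows it (input -> target)."""
    return list(zip(token_ids[:-1], token_ids[1:]))


def mean_cross_entropy(target_probs):
    """Average negative log-likelihood of the correct next tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


# Made-up token ids standing in for a tokenized training example.
tokens = [5, 9, 2, 7]
pairs = next_token_pairs(tokens)

# Made-up probabilities a model might assign to each correct next token.
probs = [0.5, 0.25, 0.125]
loss = mean_cross_entropy(probs)

print(pairs)
print(round(loss, 4))
```

The loss falls as the model assigns higher probability to the tokens that actually come next in the training text.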

**Where to Use SFTTrainer**

1. `Natural Language Processing Tasks`: Any NLP task that can benefit from fine-tuning a transformer model can use SFTTrainer. Examples include sentiment analysis, text summarization, question answering, and more.

2. `Custom NLP Pipelines`: If you're building a custom NLP pipeline and need to adapt a transformer model to your specific dataset or task, SFTTrainer can be an essential tool.

3. `Research and Development`: For researchers experimenting with new models or techniques, SFTTrainer provides a flexible and easy-to-use framework for fine-tuning transformer models.



In [7]:
# load the IMDB training split
dataset = load_dataset("imdb", split="train")
dataset

Dataset({
    features: ['text', 'label'],
    num_rows: 25000
})

In [18]:
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    run_name="my_experiment_run"  # Set a unique run name
)
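To get a feel for what these arguments imply, the arithmetic below estimates the number of optimizer steps for the 25,000-row IMDB training split. It is a rough sketch that assumes a single device, no gradient accumulation, and no sequence packing. None of these are stated in the notebook, and the count changes if any of them differ.

```python
import math

num_examples = 25_000              # rows in the IMDB train split loaded above
per_device_train_batch_size = 8    # from TrainingArguments above
num_train_epochs = 3               # from TrainingArguments above

num_devices = 1                    # assumption: one GPU
gradient_accumulation_steps = 1    # assumption: no accumulation

# Examples consumed per optimizer step.
effective_batch = per_device_train_batch_size * num_devices * gradient_accumulation_steps

# The last, partially filled batch still counts as a step, hence the ceiling.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs

print(steps_per_epoch, total_steps)
```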

In [8]:
# build the trainer; `args` is left commented out, so default training arguments are used
trainer = SFTTrainer(
    "facebook/opt-350m",
#     args=training_args,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.



In [9]:
# train
trainer.train()

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

  ········································


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc




OutOfMemoryError: CUDA out of memory. Tried to allocate 1.54 GiB. GPU 0 has a total capacty of 14.75 GiB of which 1015.06 MiB is free. Process 6396 has 13.75 GiB memory in use. Of the allocated memory 13.50 GiB is allocated by PyTorch, and 66.70 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
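
The run above stops with a CUDA out-of-memory error on a GPU with about 14.75 GiB. One common way to reduce memory pressure is to shrink the per-device batch and compensate with gradient accumulation, optionally adding gradient checkpointing and mixed precision. The settings below are a sketch of that idea, written as a plain dictionary of `TrainingArguments` keyword arguments. The specific values are assumptions and have not been verified to fit this model on this GPU.

```python
# Keyword arguments intended for transformers.TrainingArguments.
# The values are illustrative assumptions, not tested settings.
memory_friendly_kwargs = {
    "output_dir": "./results",
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 1,   # smaller batch held in memory at once
    "gradient_accumulation_steps": 8,   # accumulate gradients over 8 small batches
    "gradient_checkpointing": True,     # recompute activations to save memory
    "fp16": True,                       # mixed precision; needs a CUDA GPU
    "report_to": "none",                # skip the wandb login prompt
    "run_name": "my_experiment_run",
}

# Examples per optimizer step stay the same as a batch size of 8.
effective_batch_size = (
    memory_friendly_kwargs["per_device_train_batch_size"]
    * memory_friendly_kwargs["gradient_accumulation_steps"]
)
print(effective_batch_size)

# In the notebook, these would be used as:
# training_args = TrainingArguments(**memory_friendly_kwargs)
```

For the settings to take effect, `training_args` has to be passed to the trainer, which the trainer cell above does not do because `args=training_args` is commented out. Lowering `max_seq_length` and restarting the kernel to release memory held by earlier runs may also help.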