**README**

This document was prepared by **Kaggle**.


# Install Packages

In [None]:
!pip install -q trl

In [None]:
import tensorflow as tf

# detect and init the TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()

# instantiate a distribution strategy
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.TPUStrategy(tpu)

# instantiating the model in the strategy scope creates the model on the TPU
with tpu_strategy.scope():
    model = tf.keras.Sequential( … ) # define your model normally
    model.compile( … )

# train model normally
model.fit(training_dataset, epochs=EPOCHS, steps_per_epoch=…)

In [None]:
import os
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient

In [None]:
user_secrets = UserSecretsClient()
os.environ["HF_TOKEN"] = user_secrets.get_secret("HUGGINGFACEHUB_API_TOKEN")

In [None]:
login(user_secrets.get_secret("HUGGINGFACEHUB_API_TOKEN"))


# [TRL](https://pypi.org/project/trl/#description)

TRL - Transformer Reinforcement Learning

**Full stack library to fine-tune and align large language models.**

**What is it?**

The trl library is a full stack tool for fine-tuning and aligning transformer language and diffusion models using methods such as `Supervised Fine-tuning` (SFT), `Reward Modeling` (RM), `Proximal Policy Optimization` (PPO), and `Direct Preference Optimization` (DPO).

The library is built on top of the `transformers` library, so any model architecture available there can be used.


## Highlights

* **Efficient and scalable**

  * `accelerate` is the backbone of trl, which allows model training to scale from a single GPU to a large multi-node cluster with methods such as `DDP` and `DeepSpeed`.
  * `PEFT` is fully integrated and makes it possible to train even the largest models on modest hardware using quantisation and methods such as **LoRA** or **QLoRA**.
  * `unsloth` is also integrated and can significantly speed up training with dedicated kernels.

* `CLI`: With the CLI you can fine-tune and chat with LLMs without writing any code, using a single command and a flexible config system.

* `Trainers`: The Trainer classes are an abstraction that makes it easy to apply many fine-tuning methods, such as the `SFTTrainer`, `DPOTrainer`, `RewardTrainer`, `PPOTrainer`, `CPOTrainer`, and `ORPOTrainer`.

* `AutoModels`: The `AutoModelForCausalLMWithValueHead` & `AutoModelForSeq2SeqLMWithValueHead` classes add a value head to the model, which allows it to be trained with **RL** algorithms such as *PPO*.

* `Examples`: The examples cover training GPT2 to generate positive movie reviews with a BERT sentiment classifier, full RLHF using adapters only, training GPT-J to be less toxic, the StackLlama example, and more.
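
To give a rough sense of why adapter methods such as LoRA make large models trainable on modest hardware, the sketch below counts trainable parameters for a single weight matrix. It assumes the standard LoRA formulation, in which a frozen `d_out x d_in` weight is augmented by two low-rank factors of rank `r`. The helper name `lora_param_counts`, the layer size, and the rank are illustrative choices, not values taken from TRL or PEFT.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_in * d_out            # full fine-tuning trains every entry of the weight
    lora = rank * (d_in + d_out)   # LoRA trains only A (rank x d_in) and B (d_out x rank)
    return full, lora


# Illustrative numbers only: a 4096 x 4096 projection with rank 8.
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r=8):       {lora:,} trainable parameters ({lora / full:.2%} of full)")
```

For this one layer the adapter trains well under one percent of the weights that full fine-tuning would update; the savings for a whole model depend on which layers receive adapters.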


## Command Line Interface (CLI)

You can use the TRL Command Line Interface (CLI) to quickly get started with Supervised Fine-tuning (SFT) and Direct Preference Optimization (DPO), and to test your aligned model with the chat CLI:

1. **SFT - Supervised Fine Tuning**
```
!trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb
```
2. **DPO - Direct Preference Optimization**
```
!trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/hh-rlhf-helpful-base-trl-style --output_dir opt-sft-hh-rlhf
```
3. **Chat**
```
!trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat
```

The three commands above let you train a model and chat with it through the `trl` command.

The chat command is included below for testing; uncomment it to run.

In [None]:
# !trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat # remove the comment and run
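
If you want to parameterize the CLI calls from a notebook, one option is to assemble the same command in Python. The sketch below only uses flags that appear in the commands above; `build_trl_command` is a hypothetical helper written for this example, not part of TRL.

```python
import shlex


def build_trl_command(subcommand: str, **options: str) -> list[str]:
    """Build an argument list such as ['trl', 'sft', '--model_name_or_path', ...]."""
    args = ["trl", subcommand]
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    return args


sft_cmd = build_trl_command(
    "sft",
    model_name_or_path="facebook/opt-125m",
    dataset_name="imdb",
    output_dir="opt-sft-imdb",
)

# Print the command as a single shell-safe string for inspection.
print(shlex.join(sft_cmd))
# To actually launch it: subprocess.run(sft_cmd, check=True)
```

Keeping the command as a list avoids shell-quoting problems when a model name or path contains unusual characters.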

## How to use

For more flexibility and control over the training, you can use the dedicated trainer classes to fine-tune the model in Python.

### SFTTrainer

This is a basic example of how to use the `SFTTrainer` from the library. The `SFTTrainer` is a light wrapper around the transformers Trainer to easily fine-tune language models or adapters on a `custom dataset`.

**SFTTrainer** stands for `Supervised Fine-Tuning Trainer`. It's a class provided by the TRL package that facilitates the supervised fine-tuning of transformer models. This class helps in training a transformer model on a specific dataset using supervised learning techniques.

**Main Use of SFTTrainer**

1. `Fine-Tuning Pretrained Models`: SFTTrainer is primarily used to fine-tune pre-trained transformer language models (e.g., GPT-2 or OPT) on a specific dataset. Fine-tuning is a crucial step in adapting a general-purpose pre-trained model to a specific task. Because the trainer optimizes a language-modeling objective, tasks such as text classification or machine translation are typically framed as text generation.
2. `Supervised Learning`: It enables supervised learning by providing the methods needed to train a model on labeled data: computing a loss, optimizing the model parameters, and evaluating the model's performance on validation data.
3. `Customization`: SFTTrainer allows for customization of the training process. Users can specify their own loss functions, optimization algorithms, and other training parameters to suit their specific needs.
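
For causal language models, the supervised loss mentioned above is usually the token-level negative log-likelihood of the training text. As a sketch in standard notation (the symbols are introduced here for illustration: $\mathcal{D}$ is the training set, $x$ a token sequence of length $T_x$, and $p_\theta$ the model):

$$
\mathcal{L}_{\mathrm{SFT}}(\theta) = -\frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \sum_{t=1}^{T_x} \log p_\theta\left(x_t \mid x_{<t}\right)
$$

Minimizing this quantity pushes the model to assign high probability to each token of the labeled examples given the tokens before it. Implementations may average over tokens rather than sequences, or mask out prompt tokens, so treat the normalization here as one common choice.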

**Where to Use SFTTrainer**

1. `Natural Language Processing Tasks`: Any NLP task that can benefit from fine-tuning a transformer model can use SFTTrainer. Examples include sentiment analysis, text summarization, question answering, and more.

2. `Custom NLP Pipelines`: If you're building a custom NLP pipeline and need to adapt a transformer model to your specific dataset or task, SFTTrainer can be an essential tool.

3. `Research and Development`: For researchers experimenting with new models or techniques, SFTTrainer provides a flexible and easy-to-use framework for fine-tuning transformer models.



The class definition of the Supervised Fine-tuning Trainer (SFTTrainer) is summarized below.
This class is a wrapper around the `transformers.Trainer` class and inherits all of its attributes and methods. The trainer takes care of properly initializing the `PeftModel` when a user passes a `PeftConfig` object.

* **`model` (Union[`transformers.PreTrainedModel`, `nn.Module`, `str`]):** The model to train. It can be a `PreTrainedModel`, a `torch.nn.Module`, or a string with the name of a model to load from the cache or download. The model can also be converted to a `PeftModel` if a `PeftConfig` object is passed to the `peft_config` argument.
* **`args` (Optional[`SFTConfig`]):** The arguments to tweak for training. If not provided, defaults to a basic instance of `SFTConfig` with `output_dir` set to a directory named *tmp_trainer* in the current directory.
* **`data_collator` (Optional[`transformers.DataCollator`]):** The data collator to use for training.
* **`train_dataset` (Optional[`datasets.Dataset`]):** The dataset to use for training. We recommend using `trl.trainer.ConstantLengthDataset` to create the dataset.
* **`eval_dataset` (Optional[Union[`datasets.Dataset`, Dict[`str`, `datasets.Dataset`]]]):** The dataset to use for evaluation. We recommend using `trl.trainer.ConstantLengthDataset` to create the dataset.
* **`tokenizer` (Optional[`transformers.PreTrainedTokenizer`]):** The tokenizer to use for training. If not specified, the tokenizer associated with the model will be used.
* **`model_init` (`Callable[[], transformers.PreTrainedModel]`):** The model initializer to use for training. If `None` is specified, the default model initializer will be used.
* **`compute_metrics` (`Callable[[transformers.EvalPrediction], Dict]`, *optional*, defaults to `None`):** The function used to compute metrics during evaluation. It should return a dictionary mapping metric names to metric values. If not specified, only the loss will be computed during evaluation.
* **`callbacks` (`List[transformers.TrainerCallback]`):** The callbacks to use for training.
* **`optimizers` (`Tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR]`):** The optimizer and scheduler to use for training.
* **`preprocess_logits_for_metrics` (`Callable[[torch.Tensor, torch.Tensor], torch.Tensor]`):** The function to use to preprocess the logits before computing the metrics.
* **`peft_config` (`Optional[PeftConfig]`):** The PeftConfig object to use to initialize the PeftModel.
* **`formatting_func` (`Optional[Callable]`):** The formatting function to be used for creating the `ConstantLengthDataset`.
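
To illustrate how a formatting function and a constant-length dataset fit together, here is a minimal pure-Python sketch of the packing idea: each example is formatted into a string, tokenized, joined with an end-of-sequence marker, and cut into fixed-size chunks. This is not TRL's implementation of `ConstantLengthDataset`. The names `format_review`, `pack_constant_length`, and `EOS` are hypothetical, whitespace splitting stands in for a real tokenizer, and the rows mimic the `text`/`label` columns of the IMDB dataset.

```python
from typing import Callable, Dict, Iterable, Iterator, List

EOS = "<eos>"  # placeholder end-of-sequence marker (assumption for this sketch)


def format_review(example: Dict[str, object]) -> str:
    """Hypothetical formatting function: turn one IMDB-style row into a training string."""
    sentiment = "positive" if example["label"] == 1 else "negative"
    return f"Review: {example['text']}\nSentiment: {sentiment}"


def pack_constant_length(
    examples: Iterable[Dict[str, object]],
    formatting_func: Callable[[Dict[str, object]], str],
    seq_length: int,
) -> Iterator[List[str]]:
    """Yield fixed-size token chunks built from formatted, EOS-separated examples."""
    buffer: List[str] = []
    for example in examples:
        # Whitespace splitting stands in for a real tokenizer.
        buffer.extend(formatting_func(example).split())
        buffer.append(EOS)
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= seq_length:
            yield buffer[:seq_length]
            buffer = buffer[seq_length:]
    # Leftover tokens shorter than seq_length are dropped in this sketch.


rows = [
    {"text": "A wonderful film", "label": 1},
    {"text": "Dull and far too long", "label": 0},
]
chunks = list(pack_constant_length(rows, format_review, seq_length=4))
for chunk in chunks:
    print(chunk)
```

Note that a chunk can span two examples; packing trades clean example boundaries for batches with no padding.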



In [None]:
# load the dataset
dataset = load_dataset("imdb", split="train")
dataset