#### TRL (Transformer Reinforcement Learning)
TRL is an open-source Hugging Face library for fine-tuning Transformer language models. It is not a model itself: it provides trainer classes, such as the SFT, PPO, and DPO trainers described below, that work on models loaded through the `transformers` library. The models it trains are based on the Transformer architecture, which is designed to handle sequential data like text.
Here's how such a model works:
- Text is split into tokens and fed into the model
- The model uses self-attention to weigh how strongly each token relates to every other token in the input
- The model then produces output based on the input text, such as the next token of a continuation or a positive/negative sentiment label
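
The self-attention step can be sketched in a few lines of plain Python. This is a minimal illustration, not how any library implements it: real models first pass the tokens through learned query, key, and value projections and run many attention heads in parallel, whereas this sketch assumes the token vectors are used directly. The names `softmax` and `self_attention` are illustrative.

```python
import math

def softmax(xs):
    """Turn a list of scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this token's query to every token's key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three toy "token" vectors; assumption: no learned projections
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
```

Each row of `attended` is a blend of all three token vectors, weighted by how similar they are to the token in that position.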

#### SFT Trainer (Supervised Fine-Tuning Trainer)
The SFT Trainer (`SFTTrainer` in TRL) fine-tunes a pretrained Transformer language model on example text so that it performs a specific task.

Here's how it works:
- The SFT Trainer takes a pretrained model and a training dataset as input
- It adjusts the model's parameters to minimize the error between the model's predicted next tokens and the actual next tokens in the training text
- The updates use a standard gradient-based optimizer such as AdamW; combined with a parameter-efficient method such as LoRA, only a small set of adapter weights needs to be trained
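
For a language model, the error minimized during supervised fine-tuning is typically the cross-entropy of the correct next token. A minimal sketch of that loss, assuming the model's raw scores (logits) are already available as plain lists; `next_token_loss` is an illustrative name, not a TRL function:

```python
import math

def next_token_loss(logits_per_step, target_ids):
    """Average cross-entropy of the correct next token at each position."""
    total = 0.0
    for logits, target in zip(logits_per_step, target_ids):
        m = max(logits)
        # log of the softmax normalizer, computed stably
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        # negative log-probability assigned to the correct token
        total += log_norm - logits[target]
    return total / len(target_ids)

# A model with no preference among 4 tokens vs. one that favors the right token
uniform = next_token_loss([[0.0, 0.0, 0.0, 0.0]], [2])
confident = next_token_loss([[0.0, 0.0, 10.0, 0.0]], [2])
```

The loss falls as the model puts more probability on the token that actually comes next, which is the signal the optimizer follows.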


#### PPO Trainer (Proximal Policy Optimization Trainer)
PPO is a reinforcement learning algorithm. In TRL, the PPO Trainer uses it to fine-tune a language model against a reward signal, typically a reward model trained on human preferences.

Here's how it works:
- The model generates responses to prompts, and each response receives a reward score
- A policy gradient update adjusts the model's parameters so that high-reward responses become more likely
- Each update is clipped so the new policy stays close to the previous one (the "proximal" part), which keeps training stable
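
The "proximal" constraint is usually expressed through PPO's clipped surrogate objective. A minimal sketch for a single action, assuming the probability ratio between the new and old policy and the advantage estimate are already computed; `ppo_clipped_objective` and the default `clip_eps = 0.2` are illustrative choices:

```python
def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one action (to be maximized).

    ratio     -- new_policy_prob / old_policy_prob for the action taken
    advantage -- how much better the action was than expected
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Taking the minimum removes any incentive to move the ratio
    # outside the [1 - clip_eps, 1 + clip_eps] band.
    return min(ratio * advantage, clipped_ratio * advantage)

gain_capped = ppo_clipped_objective(1.5, 1.0)    # capped at 1.2 * advantage
loss_kept = ppo_clipped_objective(0.5, -1.0)     # pessimistic value, 0.8 * advantage
unchanged = ppo_clipped_objective(1.0, 2.0)      # inside the band, no clipping
```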


#### DPO Trainer (Direct Preference Optimization Trainer)
The DPO Trainer fine-tunes a language model directly on human preference data, without training a separate reward model or running a reinforcement learning loop.

Here's how it works:
- The DPO Trainer takes the model, a frozen reference copy of it, and a dataset in which each prompt has a preferred ("chosen") and a dispreferred ("rejected") response
- It adjusts the model's parameters so that the chosen response becomes more likely than the rejected one, measured relative to the reference model
- Because the objective is a simple classification-style loss, training is usually simpler and more stable than PPO
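
In TRL, DPO refers to Direct Preference Optimization, whose loss compares how much more the policy favors a chosen response over a rejected one than a frozen reference model does. A minimal sketch for one preference pair, assuming the summed log-probabilities of each response are already computed; `dpo_loss` and the default `beta = 0.1` are illustrative, not TRL's API:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) response pair."""
    # How much more the policy prefers "chosen" than the reference does
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)), written as log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))

# Policy identical to the reference: no preference learned yet
neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy has shifted probability toward the chosen response
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss starts at log 2 when the policy equals the reference and falls as the policy learns to prefer the chosen response.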


#### CUDA (Compute Unified Device Architecture)
CUDA is a programming model and software development kit (SDK) developed by NVIDIA.

Here's how it works:
- CUDA allows developers to write code that runs on Graphics Processing Units (GPUs)
- The code is executed in parallel across many GPU cores, which can be much faster than a Central Processing Unit (CPU) for highly parallel workloads such as the matrix operations in deep learning
- CUDA provides a set of tools and libraries for developers to build and optimize their code for GPU execution
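
From Python, CUDA is usually reached through a framework rather than written directly. A minimal sketch, assuming PyTorch is the framework in use; `describe_device` is a hypothetical helper that falls back to the CPU when PyTorch or a CUDA-capable GPU is unavailable:

```python
def describe_device():
    """Report whether a CUDA GPU is usable from PyTorch."""
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        return "cpu (torch not installed)"
    if torch.cuda.is_available():
        # Name of the first visible GPU
        return "cuda: " + torch.cuda.get_device_name(0)
    return "cpu"

device_description = describe_device()
print(device_description)
```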


In summary:
- TRL is a Hugging Face library for fine-tuning Transformer language models
- SFT Trainer performs supervised fine-tuning on example text
- PPO Trainer fine-tunes a model with reinforcement learning against a reward signal
- DPO Trainer fine-tunes a model directly on pairs of preferred and rejected responses
- CUDA is NVIDIA's platform and programming model for running code on GPUs


In [3]:
!pip install "unsloth[colab] @ git+https://github.com/unslothai/unsloth.git"

Collecting unsloth[colab]@ git+https://github.com/unslothai/unsloth.git
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-q8coft40/unsloth_2d1ba73dede64cefb9790acf1c1f61a6
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-q8coft40/unsloth_2d1ba73dede64cefb9790acf1c1f61a6
  Resolved https://github.com/unslothai/unsloth.git to commit 25975f9a2dc9cdde4ed72e1efd70fb809a0405e9
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting bitsandbytes (from unsloth[colab]@ git+https://github.com/unslothai/unsloth.git)
  Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl (119.8 MB)
     119.8/119.8 MB 14.0 MB/s eta 0:00:00
Collecting xformers@

In [4]:
!pip install "git+https://github.com/huggingface/transformers.git"

Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-mb3591a3
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-req-build-mb3591a3
  Resolved https://github.com/huggingface/transformers.git to commit 15c74a28294fe9082b81b24efe58df16fed79a9e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: transformers
  Building wheel for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.41.0.dev0-py3-none-any.whl size=9093626 sha256=6a17d47c6d982074c2124a541e4cb192c796301f8936ca22120a7d97e6dda7b1
  Stored in directory: /tmp/pip-ephem-wheel-cache-yvma6s0s/wheels/e7/9c/5b/e1a9c8007c343041e61cc484433d512ea9274272e3fcbe7c16
Successfully bu

In [5]:
!pip install trl



In [6]:
# Import the FastLanguageModel class from the unsloth library
from unsloth import FastLanguageModel

# Import the torch library, which provides a wide range of functionalities for deep learning
import torch

# Import the SFTTrainer class from the trl library, which runs supervised fine-tuning of language models
from trl import SFTTrainer

# Import the TrainingArguments class from the transformers library, which provides a way to define training arguments
from transformers import TrainingArguments



In [7]:
# Import the load_dataset function from the datasets library, which loads a dataset from the Hugging Face dataset hub
from datasets import load_dataset

# Set the maximum sequence length for the dataset (in this case, 2048 tokens)
max_seq_length = 2048

# Load the IMDB dataset (a popular dataset for sentiment analysis) from the dataset hub, using the "train" split
dataset = load_dataset("imdb", split="train")

Downloading readme:   0%|          | 0.00/7.81k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/21.0M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/20.5M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/42.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

In [8]:
dataset

Dataset({
    features: ['text', 'label'],
    num_rows: 25000
})

In [9]:
# Create a FastLanguageModel instance from a pre-trained model
# and a corresponding tokenizer, using the from_pretrained method
model, tokenizer = FastLanguageModel.from_pretrained(

	# Specify the name of the pre-trained model to use
	model_name = "unsloth/mistral-7b-bnb-4bit",

	# Set the maximum sequence length for the model (using the variable defined earlier)
	max_seq_length = max_seq_length,

	# Load the model in 4-bit precision (reduces memory usage)
	load_in_4bit=True,

	# Leave dtype as None so Unsloth auto-detects it (bfloat16 on GPUs that support it, otherwise float16)
	dtype = None
)

config.json:   0%|          | 0.00/1.05k [00:00<?, ?B/s]

==((====))==  Unsloth: Fast Mistral patching release 2024.5
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.1.0+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.22.post7. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


model.safetensors:   0%|          | 0.00/4.13G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/971 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/438 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

In [10]:
# Get a PEFT (Parameter-Efficient Fine-Tuning) model from the existing model
model = FastLanguageModel.get_peft_model(

	# Specify the existing model to modify
	model = model,

	# Set the LoRA rank: higher values add trainable parameters and capacity at the cost of memory
	r = 16,

	# Specify the target modules to apply PEFT to
	target_modules = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],

	# Set the LoRA (Low-Rank Adaptation) hyperparameters: scaling factor and dropout
	lora_alpha = 16,
	lora_dropout = 0,

	# Specify the bias option (none in this case)
	bias = "none",

	# Enable gradient checkpointing to save memory
	use_gradient_checkpointing = True,

	# Set the random state for reproducibility
	random_state = 3407,

	# Set the maximum sequence length (using the variable defined earlier)
	max_seq_length = max_seq_length
)

Unsloth 2024.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
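
The rank and target modules chosen above determine how many parameters LoRA trains. Each adapted linear layer with `d_in` inputs and `d_out` outputs gains two small matrices holding `r * (d_in + d_out)` weights in total. A rough sketch of that arithmetic follows; the Mistral-7B layer shapes are assumptions taken from the publicly documented model configuration, not from this notebook, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(shapes, rank, num_layers):
    """Trainable LoRA weights: rank * (d_in + d_out) per adapted module."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

# Assumed (in_features, out_features) for Mistral-7B: hidden size 4096,
# MLP size 14336, and 8 key/value heads of dimension 128 (1024 total).
mistral_shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

trainable = lora_param_count(mistral_shapes, rank=16, num_layers=32)
print(f"{trainable:,}")
```

Under these assumed shapes the result agrees with the `Number of trainable parameters = 41,943,040` line that the trainer prints when training starts.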


In [16]:
# Create an SFTTrainer instance to train the model
trainer = SFTTrainer(

	# Specify the model to train
	model = model,

	# Specify the training dataset
	train_dataset = dataset,

	# Specify the text field in the dataset
	dataset_text_field = "text",

	# Set the maximum sequence length (using the variable defined earlier)
	max_seq_length = max_seq_length,

	# Specify the tokenizer for the model
	tokenizer = tokenizer,

	# Define the training arguments
	args = TrainingArguments(

		# Set the batch size per device
		per_device_train_batch_size = 2,

		# Set the gradient accumulation steps
		gradient_accumulation_steps = 4,

		# Set the warmup steps
		warmup_steps = 10,

		# Set the maximum training steps
		max_steps = 60,

		# Use fp16 mixed precision only when the GPU does not support bf16
		fp16 = not torch.cuda.is_bf16_supported(),

		# Use bf16 mixed precision when the GPU supports it
		bf16 = torch.cuda.is_bf16_supported(),

		# Set the logging steps
		logging_steps = 1,

		# Set the output directory for training results
		output_dir = "unsloth-test",

		# Specify the optimizer (AdamW with 8-bit optimizer states, which reduces memory use)
		optim = "adamw_8bit",

		# Set the random seed for reproducibility
		seed = 3407,
	),
)

# Start the training process
trainer.train()

max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 25,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,2.6055
2,2.3436
3,2.3419
4,2.4773
5,2.5397
6,2.6869
7,2.3523
8,2.3724
9,2.2223
10,2.5411


TrainOutput(global_step=60, training_loss=2.4154031753540037, metrics={'train_runtime': 112.8869, 'train_samples_per_second': 4.252, 'train_steps_per_second': 0.532, 'total_flos': 9642624560529408.0, 'train_loss': 2.4154031753540037, 'epoch': 0.0192})
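
The figures in the training banner and in `TrainOutput` follow from the arguments passed to `TrainingArguments`. As a quick check, using only values printed above:

```python
# Values copied from the training arguments and the printed output
per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
max_steps = 60
num_examples = 25_000
train_runtime = 112.8869  # seconds

# Examples consumed per optimizer step
effective_batch = per_device_batch * grad_accum_steps * num_gpus
# Examples consumed over the whole run
samples_seen = effective_batch * max_steps
# Fraction of one pass over the dataset
epoch_fraction = samples_seen / num_examples
samples_per_second = samples_seen / train_runtime
steps_per_second = max_steps / train_runtime

print(effective_batch, samples_seen, round(epoch_fraction, 4))
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

With 60 steps the run sees 480 of the 25,000 reviews, under 2% of one epoch, so it works as a quick smoke test of the pipeline rather than a full fine-tune.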

In [17]:
# Tokenize the input text using the tokenizer
input = tokenizer(
    [
      # The input text to tokenize
      "I really like the movie because it shows emotions and talks about humanity"
    ],
    # Return the tokenized input as PyTorch tensors
    return_tensors = "pt"
).to("cuda")  # Move the tokenized input to the CUDA device (GPU)

In [18]:
input

{'input_ids': tensor([[    1,   315,  1528,   737,   272,  5994,  1096,   378,  4370, 13855,
           304, 15066,   684, 17676]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

In [19]:
# Generate output text using the model
output = model.generate(
    # Pass the tokenized input to the model
    **input,
    # Set the maximum number of new tokens to generate
    max_new_tokens = 128,
    # Enable caching to speed up generation
    use_cache = True
)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [20]:
output

tensor([[    1,   315,  1528,   737,   272,  5994,  1096,   378,  4370, 13855,
           304, 15066,   684, 17676, 28723,   661,   349,   264,  1215,  1179,
          5994,   304,   315,  6557,   378,   298,  3376, 28723,   661,   349,
           264,  1215,  1179,  5994,   304,   315,  6557,   378,   298,  3376,
         28723,   661,   349,   264,  1215,  1179,  5994,   304,   315,  6557,
           378,   298,  3376, 28723,   661,   349,   264,  1215,  1179,  5994,
           304,   315,  6557,   378,   298,  3376, 28723,   661,   349,   264,
          1215,  1179,  5994,   304,   315,  6557,   378,   298,  3376, 28723,
           661,   349,   264,  1215,  1179,  5994,   304,   315,  6557,   378,
           298,  3376, 28723,   661,   349,   264,  1215,  1179,  5994,   304,
           315,  6557,   378,   298,  3376, 28723,   661,   349,   264,  1215,
          1179,  5994,   304,   315,  6557,   378,   298,  3376, 28723,   661,
           349,   264,  1215,  1179,  5994,   304,  

In [21]:
tokenizer.batch_decode(output)

['<s> I really like the movie because it shows emotions and talks about humanity. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it to everyone. It is a very good movie and I recommend it']

In [22]:
# Save the trained LoRA adapter weights (not the full base model) to the "lora_model" directory
model.save_pretrained("lora_model")