# Goal: Fine-tune an LLM on an instruction dataset

This notebook needs to be completed. Each of the tasks below has a placeholder that needs to be coded up. The finished notebook should run on a free Google Colab instance in a few minutes.

## Concrete tasks:
1. Load the instruction fine-tuning dataset
2. Load the model and tokenizer
3. Prompt the model with a few items from the dataset and print the generated responses using the provided `generate()` function
4. Implement a trainer class that takes the model and dataset as inputs and
  - Instantiates the necessary training components, such as the optimizer and learning rate scheduler
  - Specifically, implements a `train()` function that performs the classic training loop with a next-token prediction objective
5. Modify the `generate()` function to implement the generation logic directly using `model.forward()`. At each generation step, generated tokens are fed as inputs until the stopping condition is met (EOS is generated or max_tokens is reached). Most importantly, make sure that the generations are batched.
6. **Plot the effect of training data size on the validation loss**: Vary the amount of training data (e.g. 100, 200, 500, 1000 data points) and examine its effect on the validation loss. Please provide an explanation along with the plot.
7. **Applying a chat template**: Suppose you want to switch to a different model, so the prompt template needs to change. How would you incorporate this change without manually re-applying the template every time you change the model?

Bonus points:
- You are free to use any model. Making a larger model (e.g. a 7B Llama model) trainable on a Google Colab T4 instance in a couple of minutes earns a bonus point.
  Hint: use techniques such as **LoRA/QLoRA** to reduce the number of trainable parameters, and **quantization** to reduce the memory requirements.
- Optimize the `generate()` further to use attention key-value caching. The idea is that we do not want to recompute attention values for our prompt at every decoding step.
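
To see why the quantization hint matters on a T4, here is a rough back-of-the-envelope sketch. The helper name `weight_memory_gib` and the round figure of 7 billion parameters are illustrative assumptions. The estimate covers model weights only and ignores activations, optimizer state, and quantization overhead.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to hold the model weights, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: a round 7e9 parameters stands in for a "7B" model.
for bits in (16, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(7e9, bits):.1f} GiB")
```

Under these assumptions, 16-bit weights alone take about 13 GiB while 4-bit weights take a little over 3 GiB, which leaves room for LoRA adapters and activations.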

# Install Dependencies
If you add any new dependencies, make sure to update the following cell accordingly.

In [1]:
!pip install -q accelerate peft transformers bitsandbytes datasets trl evaluate rouge_score

# Imports
All imports should be added below.

In [1]:
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
import torch
from huggingface_hub import notebook_login

## 1. Load the instruction fine-tuning dataset


In [2]:
from datasets import load_dataset

dataset = load_dataset('yizhongw/self_instruct',split='train')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


## 2. Load model and tokenizer

In [3]:
################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

In [4]:
# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Load the entire model on the GPU 0
device_map = {"": 0}

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# The model that you want to train from the Hugging Face hub
model_name = "mistralai/Mistral-7B-Instruct-v0.2"

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
#tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

## 3. Prompt the model with a few items from the dataset

In [5]:
prompts = [item for item in dataset["prompt"][:2]]
print(prompts)

['Make a list of 10 ways to help students improve their study skills.\n\nOutput:', 'Task: Find out what are the key topics in the document? output "topic 1", "topic 2", ... , "topic n".\n\nThe United States has withdrawn from the Paris Climate Agreement.\n\n']


In [6]:
def generate(prompts):
  pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200, return_full_text=False)
  result = pipe(prompts)
  generated_texts = [item[0]["generated_text"] for item in result]
  return generated_texts

In [7]:
gen_texts = generate(prompts)

In [8]:
for prompt, text in zip(prompts, gen_texts):
  print("#############")
  print(f"PROMPT: {prompt}")
  print(f"RESPONSE: {text}")

#############
PROMPT: Make a list of 10 ways to help students improve their study skills.

Output:
RESPONSE: 

1. Create a study schedule: Encourage students to create a study schedule that includes regular study sessions, breaks, and time for review.
2. Eliminate distractions: Encourage students to eliminate distractions while studying, such as turning off their phone or finding a quiet study space.
3. Use active learning techniques: Encourage students to use active learning techniques, such as taking notes, summarizing information, or creating flashcards.
4. Use mnemonic devices: Encourage students to use mnemonic devices to help them remember information, such as acronyms or rhymes.
5. Practice regularly: Encourage students to practice regularly, rather than cramming the night before a test.
6. Take breaks: Encourage students to take breaks during long study sessions to rest
#############
PROMPT: Task: Find out what are the key topics in the document? output "topic 1", "topic 2", ..

In [9]:
split_dataset = dataset.train_test_split(test_size=0.1)

In [10]:
train_ds = split_dataset["train"]
val_ds = split_dataset["test"]

In [11]:
val_ds

Dataset({
    features: ['prompt', 'completion'],
    num_rows: 8262
})

In [12]:
dataset = load_dataset('yizhongw/self_instruct')
def transform_conversation(example):
    prompt = example['prompt']
    response = example['completion']
    reformatted_segments = []
    reformatted_segments.append(f'<s>[INST] {prompt} [/INST] {response} </s>')
    return {'text': ''.join(reformatted_segments)}

train_dataset = train_ds.shuffle(seed=10).select(range(1000))
train_dataset = train_dataset.map(transform_conversation)

eval_dataset = val_ds.shuffle(seed=10).select(range(100))
eval_dataset = eval_dataset.map(transform_conversation)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

In [13]:
val_ds = val_ds.rename_column("completion", "label")

In [14]:
train_ds = train_ds.rename_column("completion", "label")

## 4. Implement a trainer class
- The class must take the model and dataset and instantiate the necessary training components, such as the optimizer and learning rate scheduler.
- Specifically, implement the `train()` function that performs the classic training loop with a next-token prediction objective.

```
trainer = Trainer(model, dataset, train_args, ...)
trainer.train()
```

Bonus Point: Use techniques such as LoRA/QLoRA to reduce the number of trainable parameters, and use quantization to reduce the memory requirements.
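
The cells that follow rely on `trl.SFTTrainer`. For the hand-written trainer the task describes, a minimal sketch might look like the one below. `SimpleTrainer`, `next_token_loss`, and all default hyperparameters are hypothetical names and values chosen for illustration. The sketch assumes the dataset yields dicts of equal-length `input_ids` and `attention_mask` tensors (already tokenized and padded) and that the model returns an object with a `.logits` attribute.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader


def next_token_loss(logits, input_ids, attention_mask):
    """Cross-entropy of the logits at position t against the token at position t+1."""
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()
    # Padded target positions are ignored by the loss.
    shift_labels[attention_mask[:, 1:] == 0] = -100
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)).float(),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )


class SimpleTrainer:
    """Hypothetical minimal trainer: AdamW, linear warmup and decay, gradient clipping."""

    def __init__(self, model, dataset, lr=2e-4, batch_size=4, epochs=1, warmup_steps=2):
        self.model = model
        self.epochs = epochs
        self.loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        # Only parameters that require gradients (e.g. LoRA adapters) are optimized.
        params = [p for p in model.parameters() if p.requires_grad]
        self.optimizer = torch.optim.AdamW(params, lr=lr)
        total = max(1, epochs * len(self.loader))

        def lr_lambda(step):
            if step < warmup_steps:
                return (step + 1) / warmup_steps
            return max(0.0, (total - step) / max(1, total - warmup_steps))

        self.scheduler = torch.optim.lr_scheduler.LambdaLR(self.optimizer, lr_lambda)

    def train(self):
        device = next(self.model.parameters()).device
        self.model.train()
        losses = []
        for _ in range(self.epochs):
            for batch in self.loader:
                input_ids = batch["input_ids"].to(device)
                attention_mask = batch["attention_mask"].to(device)
                logits = self.model(input_ids=input_ids, attention_mask=attention_mask).logits
                loss = next_token_loss(logits, input_ids, attention_mask)
                loss.backward()
                torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
                self.optimizer.step()
                self.scheduler.step()
                self.optimizer.zero_grad()
                losses.append(loss.item())
        return losses
```

Computing the shifted loss by hand, rather than passing `labels` to the model, keeps the next-token objective explicit. Mixed precision and gradient accumulation are left out to keep the sketch short.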

In [15]:
from transformers import (
    TrainingArguments,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

In [16]:
from peft import prepare_model_for_kbit_training
model.config.use_cache = False

model.config.pretraining_tp = 1

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

print("loaded model")

loaded model


In [17]:
from peft import LoraConfig, get_peft_model
config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["k_proj","o_proj","q_proj","v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)

tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print("loaded model in peft")

loaded model in peft
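
For a sense of scale, the number of parameters this LoRA configuration adds can be estimated by hand. The sketch below assumes the published Mistral-7B shapes (hidden size 4096, 32 layers, and 1024-dimensional key/value projections from grouped-query attention); these shapes and the helper `lora_param_count` are assumptions, not values read from the loaded model.

```python
def lora_param_count(r, layer_shapes, num_layers):
    """LoRA adds A (r x d_in) and B (d_out x r) for each adapted linear layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes.values())
    return per_layer * num_layers


# Assumed (d_in, d_out) of the targeted projections in one Mistral-7B decoder layer.
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
total = lora_param_count(r=64, layer_shapes=shapes, num_layers=32)
print(f"{total:,} trainable LoRA parameters")  # 54,525,952
```

Calling `model.print_trainable_parameters()` on the PEFT model reports the actual count and is the better check in practice.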


In [18]:
from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

In [19]:
import nltk
import evaluate
import numpy as np
nltk.download("punkt", quiet=True)
metric = evaluate.load("rouge")

def compute_metrics(eval_preds):
    preds, labels = eval_preds

    # decode preds and labels
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # rougeLSum expects newline after each sentence
    decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]

    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    return result

In [20]:
args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=100,
        learning_rate=2e-4,
        fp16=True, #use mixed precision training
        logging_steps=10,
        output_dir="./results",
        optim="adamw_hf",
        save_strategy="epoch",
        report_to="tensorboard")

from trl import SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    peft_config=config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    packing=False,
    max_seq_length=512,)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


In [42]:
torch.cuda.empty_cache()

In [21]:
trainer.train()

# Save trained model
trainer.model.save_pretrained("finetuned_model")



ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,attention_mask.

In [66]:
trainer.state.log_history

[{'loss': 1.725,
  'grad_norm': 0.6126559376716614,
  'learning_rate': 0.00013583679495453,
  'epoch': 0.4,
  'step': 25},
 {'loss': 1.3878,
  'grad_norm': 0.46446147561073303,
  'learning_rate': 1.9098300562505266e-05,
  'epoch': 0.8,
  'step': 50},
 {'train_runtime': 355.9392,
  'train_samples_per_second': 2.809,
  'train_steps_per_second': 0.174,
  'total_flos': 3415902135779328.0,
  'train_loss': 1.527257457856209,
  'epoch': 0.99,
  'step': 62}]

## 5. Implement your own generation logic

Modify the `generate()` function to implement the generation logic directly using `model.forward()` instead of using pipeline API. At each generation step, generated tokens are fed as inputs until the stopping condition is met (EOS is generated or max_tokens is reached). Most importantly, make sure that the generations are batched.

Bonus Point:
- Optimize the `generate()` further to use attention key-value caching.

In [None]:
def generate(model, tokenizer, input_prompt, max_tokens=1000, batch_size=4, temperature=0.1):
    # Accept either a single prompt (returns a string) or a list of prompts (returns a list)
    single = isinstance(input_prompt, str)
    prompts = [input_prompt] if single else list(input_prompt)
    device = next(model.parameters()).device
    model.eval()

    # Left-pad so that the last position of every row is a real token
    old_side, tokenizer.padding_side = tokenizer.padding_side, "left"
    texts = []
    try:
        for start in range(0, len(prompts), batch_size):
            enc = tokenizer(prompts[start:start + batch_size], return_tensors="pt", padding=True).to(device)
            input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
            past_key_values = None  # attention key-value cache
            finished = torch.zeros(input_ids.size(0), dtype=torch.bool, device=device)
            generated = []

            for _ in range(max_tokens):
                with torch.no_grad():
                    outputs = model(input_ids=input_ids, attention_mask=attention_mask,
                                    past_key_values=past_key_values, use_cache=True)

                # Sample the next token from the temperature-scaled distribution
                probs = torch.softmax(outputs.logits[:, -1, :].float() / temperature, dim=-1)
                next_tokens = torch.multinomial(probs, num_samples=1)

                # Rows that already produced EOS keep emitting EOS
                next_tokens[finished] = tokenizer.eos_token_id
                generated.append(next_tokens)
                finished |= next_tokens.squeeze(-1) == tokenizer.eos_token_id
                if finished.all():
                    break

                # With the cache in place, only the new token is fed at the next step
                past_key_values = outputs.past_key_values
                input_ids = next_tokens
                attention_mask = torch.cat([attention_mask, torch.ones_like(next_tokens)], dim=-1)

            tokens = torch.cat(generated, dim=-1)
            texts.extend(tokenizer.batch_decode(tokens, skip_special_tokens=True))
    finally:
        tokenizer.padding_side = old_side

    return texts[0] if single else texts

In [None]:
generated_texts = generate(trainer.model, tokenizer, "Tell me 3 ways to spend summer in USA", max_tokens=100, batch_size=4)
generated_texts

## 6. Plot the effect of training data on the validation loss:
The idea is to vary the amount of training data (e.g. 100, 200, 500, 1000 data points) and examine its effect on the validation loss. Please provide an explanation along with the plot.
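
One way to organize this experiment is sketched below. It is only scaffolding: `loss_vs_train_size`, `plot_loss_curve`, and the `train_and_eval` callable are hypothetical names, and the actual training and evaluation must be supplied by the caller.

```python
def loss_vs_train_size(train_and_eval, sizes=(100, 200, 500, 1000)):
    """Call `train_and_eval(n)` for each training-set size and collect (n, val_loss) pairs."""
    return [(n, float(train_and_eval(n))) for n in sizes]


def plot_loss_curve(results):
    """Plot validation loss against training-set size on a log-scaled x-axis."""
    import matplotlib.pyplot as plt  # imported lazily so the sweep itself does not need matplotlib

    sizes, losses = zip(*results)
    plt.plot(sizes, losses, marker="o")
    plt.xscale("log")
    plt.xlabel("Number of training examples")
    plt.ylabel("Validation loss")
    plt.title("Validation loss vs. training set size")
    plt.show()
```

A possible `train_and_eval(n)` would reload the base model, build an `SFTTrainer` on `train_ds.shuffle(seed=10).select(range(n))` with a fixed `eval_dataset`, call `trainer.train()`, and return `trainer.evaluate()["eval_loss"]`. Reloading the model for each size keeps the runs independent; otherwise later runs would start from already fine-tuned weights.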

## 7. Applying Chat template:
Suppose you want to switch to a different model, so the prompt template needs to change. How would you incorporate this change without manually re-applying the template every time you change the model?
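
One option is to let the tokenizer supply the template through `tokenizer.apply_chat_template`, which reads the chat template shipped with the model's tokenizer. The sketch below assumes the tokenizer defines such a template and that the dataset has `prompt` and `completion` columns; `format_example` and `format_prompt` are hypothetical helper names.

```python
def format_example(example, tokenizer):
    """Render one prompt/completion pair as training text using the model's own template."""
    messages = [
        {"role": "user", "content": example["prompt"]},
        {"role": "assistant", "content": example["completion"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}


def format_prompt(prompt, tokenizer):
    """Render a bare prompt for inference, ending where the assistant turn begins."""
    messages = [{"role": "user", "content": prompt}]
    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```

With this in place, something like `train_ds.map(format_example, fn_kwargs={"tokenizer": tokenizer})` could replace the hard-coded `[INST]` string, so switching `model_name` would pick up the new template automatically.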

In [8]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

In [3]:
# The model that you want to train from the Hugging Face hub
model_name = "NousResearch/Llama-2-7b-chat-hf"

# Fine-tuned model name
new_model = "mistral-finetune"

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save a checkpoint every X update steps
save_steps = 0

# Log every X update steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on the GPU 0
device_map = {"": 0}

In [4]:
# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



In [5]:
dataset = load_dataset('yizhongw/self_instruct')
def transform_conversation(example):
    prompt = example['prompt']
    response = example['completion']
    reformatted_segments = []
    reformatted_segments.append(f'<s>[INST] {prompt} [/INST] {response} </s>')
    return {'text': ''.join(reformatted_segments)}

dataset = dataset['train'].shuffle(seed=42).select(range(2500))
t_dataset = dataset.map(transform_conversation)



In [6]:
# Take the next 500 shuffled rows so validation does not overlap the 2,500 training rows
dataset = load_dataset('yizhongw/self_instruct')
dataset = dataset['train'].shuffle(seed=42).select(range(2500, 3000))
v_dataset = dataset.map(transform_conversation)
v_dataset

Dataset({
    features: ['prompt', 'completion', 'text'],
    num_rows: 500
})

In [9]:
# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    # do_eval=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=t_dataset,
    eval_dataset = v_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

Map:   0%|          | 0/2500 [00:00<?, ? examples/s]

In [10]:
# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)

You are using 8-bit optimizers with a version of `bitsandbytes` < 0.41.1. It is recommended to update your version as a major bug has been fixed in 8-bit optimizers.
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


OutOfMemoryError: CUDA out of memory. Tried to allocate 38.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 5.06 MiB is free. Process 123764 has 14.74 GiB memory in use. Of the allocated memory 14.35 GiB is allocated by PyTorch, and 273.99 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

In [19]:
import torch

def generate(model, tokenizer, input_prompt, max_tokens=1000, batch_size=4, temperature=0.1):
    # Accept either a single prompt (returns a string) or a list of prompts (returns a list)
    single = isinstance(input_prompt, str)
    prompts = [input_prompt] if single else list(input_prompt)
    device = next(model.parameters()).device
    model.eval()

    # Left-pad so that the last position of every row is a real token
    old_side, tokenizer.padding_side = tokenizer.padding_side, "left"
    texts = []
    try:
        for start in range(0, len(prompts), batch_size):
            enc = tokenizer(prompts[start:start + batch_size], return_tensors="pt", padding=True).to(device)
            input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
            past_key_values = None  # attention key-value cache
            finished = torch.zeros(input_ids.size(0), dtype=torch.bool, device=device)
            generated = []

            for _ in range(max_tokens):
                with torch.no_grad():
                    outputs = model(input_ids=input_ids, attention_mask=attention_mask,
                                    past_key_values=past_key_values, use_cache=True)

                # Sample the next token from the temperature-scaled distribution
                probs = torch.softmax(outputs.logits[:, -1, :].float() / temperature, dim=-1)
                next_tokens = torch.multinomial(probs, num_samples=1)

                # Rows that already produced EOS keep emitting EOS
                next_tokens[finished] = tokenizer.eos_token_id
                generated.append(next_tokens)
                finished |= next_tokens.squeeze(-1) == tokenizer.eos_token_id
                if finished.all():
                    break

                # With the cache in place, only the new token is fed at the next step
                past_key_values = outputs.past_key_values
                input_ids = next_tokens
                attention_mask = torch.cat([attention_mask, torch.ones_like(next_tokens)], dim=-1)

            tokens = torch.cat(generated, dim=-1)
            texts.extend(tokenizer.batch_decode(tokens, skip_special_tokens=True))
    finally:
        tokenizer.padding_side = old_side

    return texts[0] if single else texts

In [23]:
generated_texts = generate(trainer.model, tokenizer, "Tell me 3 ways to spend summer in USA", max_tokens=100, batch_size=4)
generated_texts



'. hopefully you will find them interesting.\n\n1. Go to the beach.\n2. Go to the mountains.\n3. Go to the desert.\n\nOutput: [/code]  Go to the beach.\nGo to the mountains.\nGo to the desert.\n\n1. Go to the beach.\n2. Go to the mountains.\n3. Go to the desert.\n\n1. Go to the beach.\n2. Go to the mountains'

In [9]:
from peft import prepare_model_for_kbit_training
model.config.use_cache = False
# https://github.com/huggingface/transformers/pull/24906
#disable tensor parallelism
model.config.pretraining_tp = 1

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

print("loaded model")


loaded model


In [10]:
from peft import LoraConfig, get_peft_model
config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["k_proj","o_proj","q_proj","v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)

tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print("loaded model in peft")


loaded model in peft


In [14]:
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    # do_eval=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

In [15]:
args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=500,
        learning_rate=2e-4,
        fp16=True, #use mixed precision training
        logging_steps=10,
        output_dir="outputs_gptq_training",
        optim="adamw_hf",
        save_strategy="epoch",
        report_to="none")

from trl import SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=t_dataset,
    eval_dataset = v_dataset,
    peft_config=config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    packing=False,
    max_seq_length=512)

Map:   0%|          | 0/2500 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

In [16]:
trainer.train()

You are using 8-bit optimizers with a version of `bitsandbytes` < 0.41.1. It is recommended to update your version as a major bug has been fixed in 8-bit optimizers.


Step,Training Loss
25,1.2483
50,1.3035
75,1.037
100,1.2069
125,1.0426
150,1.1832
175,1.0078
200,1.1502
225,0.9938
250,1.0753


TrainOutput(global_step=625, training_loss=1.0517402435302734, metrics={'train_runtime': 924.1704, 'train_samples_per_second': 2.705, 'train_steps_per_second': 0.676, 'total_flos': 7835059672842240.0, 'train_loss': 1.0517402435302734, 'epoch': 1.0})

In [17]:
trainer.model.save_pretrained(new_model)