To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join Discord if you need help + support us if you can!
</div>

To install Unsloth on your own computer, follow the installation instructions on our GitHub page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [save the model](#Save), and how to [run it](#Inference) and compare it against the base model with BLEU and ROUGE.

## Kaggle is slow - you'll have to wait **5 minutes** for it to install.

I suggest you use our free Colab notebooks instead. Here is our Mistral Colab [notebook](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing).

In [1]:
%%capture
!pip install pip3-autoremove
!pip-autoremove torch torchvision torchaudio -y
!pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu121
!pip install unsloth
!pip install evaluate
!pip install rouge-score

* We support Llama, Mistral, CodeLlama, TinyLlama, Vicuna, Open Hermes etc
* And Yi, Qwen ([llamafied](https://huggingface.co/models?sort=trending&search=qwen+llama)), Deepseek, all Llama, Mistral derived archs.
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* [**NEW**] With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
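
A minimal sketch of the idea behind linear RoPE scaling (position interpolation, as popularized by kaiokendev), assuming the linear variant: every position index is divided by `scale = target_length / trained_length`, so positions beyond the trained window map back into it. The 2048/4096 lengths, the 8-dimensional head and the `rope_angles` helper are hypothetical; this illustrates the technique and is not Unsloth's internal implementation.

```python
import math

def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    """Rotation angles for one position under RoPE.

    scale > 1 applies linear position interpolation: the position index is
    divided by `scale`, squeezing a longer sequence into the position range
    seen during pre-training.
    """
    # One inverse frequency per pair of dimensions: base^(-2i/dim)
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    scaled_pos = position / scale
    return [scaled_pos * f for f in inv_freq]

# Hypothetical numbers: a model trained on 2048 positions, run at 4096.
original_ctx, target_ctx = 2048, 4096
scale = target_ctx / original_ctx  # 2.0

# With scaling, position 4095 gets the same angles as position 2047.5 unscaled,
# i.e. a point inside the range the model was trained on.
scaled = rope_angles(4095, scale=scale)
unscaled = rope_angles(2047.5)
print(all(math.isclose(a, b) for a, b in zip(scaled, unscaled)))  # True
```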

In [41]:
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
model_name = "unsloth/Llama-3.2-1B-bnb-4bit"
test_size = 0.1
formatted_dataset_path = "/kaggle/input/formatted-conversation/formatted_conversations.json"
max_steps = 100
training_batch_size = 4
warmup_steps = 10
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
samples_for_human_comparison = 1

In [28]:
from unsloth import FastLanguageModel
import torch
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.



model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name, # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)



==((====))==  Unsloth 2024.11.7: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: Tesla P100-PCIE-16GB. Max memory: 15.888 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu121. CUDA = 6.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [29]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
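
The trainer log further down reports `Number of trainable parameters = 11,272,192`, and that figure can be reproduced by hand. LoRA adds two low-rank matrices to each targeted projection, `A` (`r x in`) and `B` (`out x r`), i.e. `r * (in + out)` trainable weights per projection. The shapes below are copied from the `LlamaForCausalLM` module tree printed later in this notebook; `lora_params` is a hypothetical helper. For a model with roughly 1.2 billion weights this is a little under 1%.

```python
# (in_features, out_features) of each LoRA-targeted projection in one decoder layer,
# taken from the printed Llama-3.2-1B module tree.
PROJ_SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 512),
    "v_proj": (2048, 512),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 8192),
    "up_proj": (2048, 8192),
    "down_proj": (8192, 2048),
}
NUM_LAYERS = 16

def lora_params(r, shapes=PROJ_SHAPES, num_layers=NUM_LAYERS):
    # Each adapted linear layer gets A (r x in) and B (out x r): r * (in + out) weights.
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers

print(f"{lora_params(16):,}")  # 11,272,192
```

Doubling `r` doubles the adapter size, so `r = 32` would train about 22.5M weights.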

<a name="Data"></a>
### Data Prep
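We now use the `ChatML` format for conversation-style finetunes. Unsloth's ChatML example uses [Open Assistant conversations](https://huggingface.co/datasets/philschmid/guanaco-sharegpt-style) in ShareGPT style; this notebook instead loads its own sales-call conversations from `formatted_conversations.json`, which already use `role`/`content` keys. ChatML renders multi-turn conversations like below: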

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
Paris.<|im_end|>
```
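
To make the format concrete, here is a minimal sketch of how ChatML turns a list of `role`/`content` messages into text. `render_chatml` is a hypothetical helper for illustration, not Unsloth's Jinja template. `add_generation_prompt=True` leaves an open assistant turn for the model to complete, which is how prompts are built at inference time.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render role/content messages as ChatML: one <|im_start|>...<|im_end|> block per turn."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with the reply.
        text += "<|im_start|>assistant\n"
    return text

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
]
print(render_chatml(messages, add_generation_prompt=True))
```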

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).
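
The idea behind training on completions only is to keep the loss on assistant tokens and mask everything else with the label `-100`, which PyTorch's cross-entropy loss ignores by default. A conceptual sketch with hypothetical token ids and a hypothetical `build_labels` helper (TRL implements this with its own data collator; this is not its code):

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default

def build_labels(segments):
    """segments: list of (role, token_ids).

    Returns (input_ids, labels) where only assistant tokens keep their ids as
    labels; system and user tokens are masked so they add nothing to the loss.
    """
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        labels.extend(ids if role == "assistant" else [IGNORE_INDEX] * len(ids))
    return input_ids, labels

# Hypothetical token ids, just to show the shape of the result.
segments = [("system", [1, 2]), ("user", [3, 4, 5]), ("assistant", [6, 7])]
ids, labels = build_labels(segments)
print(ids)     # [1, 2, 3, 4, 5, 6, 7]
print(labels)  # [-100, -100, -100, -100, -100, 6, 7]
```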

We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old` and our own optimized `unsloth` template.

Normally one has to train `<|im_start|>` and `<|im_end|>` as new tokens. We instead map `<|im_end|>` to the EOS token and leave `<|im_start|>` as is, so no additional tokens need to be trained.

Note that ShareGPT uses `{"from": "human", "value" : "Hi"}` rather than `{"role": "user", "content" : "Hi"}`, so we use `mapping` to map it.
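
The `mapping` argument handles ShareGPT keys inside the chat template. An alternative is to convert the data itself before formatting; below is a sketch with a hypothetical `sharegpt_to_messages` helper (not part of Unsloth). The conversations used in this notebook already use `role`/`content`, so this step is only needed for ShareGPT-style files.

```python
# ShareGPT speaker names -> role names that chat templates expect (assumed mapping).
SHAREGPT_TO_ROLE = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_messages(conversation):
    """Convert [{"from": ..., "value": ...}] turns to [{"role": ..., "content": ...}]."""
    return [
        {"role": SHAREGPT_TO_ROLE[turn["from"]], "content": turn["value"]}
        for turn in conversation
    ]

print(sharegpt_to_messages([{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello!"}]))
# [{'role': 'user', 'content': 'Hi'}, {'role': 'assistant', 'content': 'Hello!'}]
```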

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

In [30]:
import json

from datasets import Dataset
from sklearn.model_selection import train_test_split
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)

def formatting_prompts_func(examples):
    # Render each conversation into one ChatML string stored in the "text" column
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }

# Output paths for the train/test splits
train_output_path = "formatted_train_conversations.json"
test_output_path = "formatted_test_conversations.json"

# Load the formatted dataset
with open(formatted_dataset_path, "r", encoding="utf-8") as file:
    data = json.load(file)

# Split the dataset (test_size = 0.1 gives 90% train, 10% test)
train_data, test_data = train_test_split(data, test_size=test_size, random_state=42)

# Save the train and test splits
with open(train_output_path, "w", encoding="utf-8") as train_file:
    json.dump(train_data, train_file, indent=4)

with open(test_output_path, "w", encoding="utf-8") as test_file:
    json.dump(test_data, test_file, indent=4)

print(f"Train dataset saved to {train_output_path}")
print(f"Test dataset saved to {test_output_path}")

# Convert the training split to a Hugging Face Dataset (no need to re-read the file just written)
dataset = Dataset.from_list(train_data)
# Number of samples in the training data
print(f"Training samples: {len(dataset)}")

# Add the rendered "text" column
dataset = dataset.map(formatting_prompts_func, batched=True)

Train dataset saved to formatted_train_conversations.json
Test dataset saved to formatted_test_conversations.json
Training samples: 34224
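
The 34,224 training samples printed here and the 3,803 test samples printed in the evaluation cell match scikit-learn's rounding for a float `test_size`: the test split is rounded up and the training split takes the rest, which implies 38,027 conversations in total. A small check with a hypothetical `split_sizes` helper:

```python
import math

def split_sizes(n_samples, test_size):
    # For a float test_size, scikit-learn rounds the test split *up*;
    # the training split gets whatever is left.
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

print(split_sizes(38_027, 0.1))  # (34224, 3803)
```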



Let's see how the `ChatML` format works by printing the element at index 5 (the sixth conversation):

In [31]:
dataset[5]["conversations"]

[{'content': 'You are a sales call center agent. Your task is to assist customers during outbound calls while maintaining a professional tone.',
  'role': 'system'},
 {'content': 'Hello, this is Sam Wilson calling from Marketplace. How are you doing today?',
  'role': 'assistant'},
 {'content': 'Yeah. Good.', 'role': 'user'},
 {'content': 'Yeah. Good.', 'role': 'user'},
 {'content': 'Yeah. Reason for the call is to let you know you may qualify for some new additional benefits. So do you have Medicare or..',
  'role': 'assistant'}]

In [32]:
print(dataset[5]["text"])

<|im_start|>system
You are a sales call center agent. Your task is to assist customers during outbound calls while maintaining a professional tone.<|im_end|>
<|im_start|>assistant
Hello, this is Sam Wilson calling from Marketplace. How are you doing today?<|im_end|>
<|im_start|>user
Yeah. Good.<|im_end|>
<|im_start|>user
Yeah. Good.<|im_end|>
<|im_start|>assistant
Yeah. Reason for the call is to let you know you may qualify for some new additional benefits. So do you have Medicare or..<|im_end|>



In [33]:
# Optional custom chat template. It is disabled by default (`if False:` below);
# switch to `if True:` to use it instead of ChatML.
unsloth_template = \
    "{{ bos_token }}"\
    "{{ 'You are a call center sales assistant to the user, you have to inquire about things correctly, and address user queries\n' }}"\
    "{% for message in messages %}"\
        "{% if message['role'] == 'user' %}"\
            "{{ '>>> User: ' + message['content'] + '\n' }}"\
        "{% elif message['role'] == 'assistant' %}"\
            "{{ '>>> Assistant: ' + message['content'] + eos_token + '\n' }}"\
        "{% endif %}"\
    "{% endfor %}"\
    "{% if add_generation_prompt %}"\
        "{{ '>>> Assistant: ' }}"\
    "{% endif %}"
unsloth_eos_token = "eos_token"

if False:
    tokenizer = get_chat_template(
        tokenizer,
        chat_template = (unsloth_template, unsloth_eos_token,), # You must provide a template and EOS token
        mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
        map_eos_token = True, # Maps <|im_end|> to </s> instead
    )
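
Apart from the Jinja syntax, the custom template amounts to the plain-Python renderer below. It is a sketch for reading the template, not what `get_chat_template` runs; `render_custom` is a hypothetical name and `<s>`/`</s>` are placeholder BOS/EOS strings (the real ones come from the tokenizer). One consequence worth noticing: the template only has branches for `user` and `assistant`, so the `system` turn stored in each conversation is dropped and replaced by the hard-coded system line.

```python
SYSTEM_LINE = (
    "You are a call center sales assistant to the user, you have to inquire "
    "about things correctly, and address user queries\n"
)

def render_custom(messages, bos_token="<s>", eos_token="</s>", add_generation_prompt=False):
    out = [bos_token, SYSTEM_LINE]
    for m in messages:
        if m["role"] == "user":
            out.append(">>> User: " + m["content"] + "\n")
        elif m["role"] == "assistant":
            out.append(">>> Assistant: " + m["content"] + eos_token + "\n")
        # Any other role (e.g. "system") has no branch in the template and is skipped.
    if add_generation_prompt:
        out.append(">>> Assistant: ")
    return "".join(out)

print(render_custom(
    [{"role": "system", "content": "ignored"},
     {"role": "assistant", "content": "Hello."},
     {"role": "user", "content": "Hi."}],
    add_generation_prompt=True,
))
```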

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We only run `max_steps` (100 here) steps to speed things up; for a full run, set `num_train_epochs = 1` and remove `max_steps` (or set it back to its default of `-1`). We also support TRL's `DPOTrainer`!
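
The trainer log reports `Total batch size = 32`; with `max_steps = 100` the run only sees a fraction of the 34,224 training conversations. A quick sanity check of that arithmetic, assuming a single GPU as in the log:

```python
import math

training_batch_size = 4          # per_device_train_batch_size
gradient_accumulation_steps = 8
num_gpus = 1
max_steps = 100
train_samples = 34_224           # printed by the data-prep cell

# Each optimizer step consumes batch size x accumulation steps x GPUs examples.
effective_batch = training_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch * max_steps
# Approximate optimizer steps in one full epoch (the Trainer's own rounding may differ by one).
steps_per_epoch = math.ceil(train_samples / effective_batch)

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen in {max_steps} steps: {examples_seen} ({examples_seen / train_samples:.1%} of one epoch)")
print(f"Steps for one full epoch: about {steps_per_epoch}")
```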

In [34]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # The field containing preprocessed texts
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # Packing concatenates short examples into full-length sequences to speed up training; disabled here
    args=TrainingArguments(
        per_device_train_batch_size=training_batch_size,  # Adjust based on GPU memory
        gradient_accumulation_steps=8,  # Effective batch size = per_device_train_batch_size x 8
        warmup_steps=warmup_steps,                # Warm-up steps for LR scheduler
        max_steps=max_steps,                 # Adjust based on dataset size
        learning_rate=1e-4,             # Adjust learning rate
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",             # Optimizer with 8-bit precision
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to=[]
    ),
)


max_steps is given, it will override any value given in num_train_epochs
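
`warmup_steps` together with `lr_scheduler_type="linear"` gives a learning rate that ramps up from 0 and then decays linearly back to 0 at `max_steps`. Below is a sketch of that schedule, written to match the shape of `transformers`' `get_linear_schedule_with_warmup` as we understand it; `linear_warmup_lr` is our own re-implementation, not the library code.

```python
def linear_warmup_lr(step, base_lr=1e-4, warmup_steps=10, max_steps=100):
    # Warmup: ramp linearly from 0 to base_lr over warmup_steps.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at max_steps.
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))

for step in (0, 5, 10, 55, 100):
    print(step, linear_warmup_lr(step))
```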


In [35]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla P100-PCIE-16GB. Max memory = 15.888 GB.
12.303 GB of memory reserved.


In [36]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 34,224 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 8
\        /    Total batch size = 32 | Total steps = 100
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
1,2.6565
2,2.4919
3,2.5122
4,2.4931
5,2.5598
6,2.526
7,2.3746
8,2.2813
9,2.0951
10,2.0093
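
The training loss is the mean token-level cross-entropy (natural log), so `exp(loss)` gives a per-token perplexity that can be easier to read: over these first 10 steps it falls from about 14.2 to about 7.5. Since this run does not train on completions only, the loss also covers the system and user turns.

```python
import math

def perplexity(loss):
    # Perplexity of a mean cross-entropy loss measured in nats.
    return math.exp(loss)

# First and last logged training losses from the table above.
for step, loss in [(1, 2.6565), (10, 2.0093)]:
    print(f"step {step}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```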


<a name="Save"></a>
### Saving the model

In [37]:
import evaluate
import csv
import json
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
import time

train_test_string = f"{100 - (test_size*100)}-{(test_size*100)}"  # e.g. "90.0-10.0" for test_size = 0.1
# Drop the organization prefix ("unsloth/Llama-3.2-1B-bnb-4bit" -> "Llama-3.2-1B-bnb-4bit");
# names without a "/" are used as-is instead of becoming None.
formatted_model_name = model_name.split('/', 1)[-1]
model.save_pretrained(f"{formatted_model_name}-{train_test_string}-finetuned") # Local saving (LoRA adapters)
tokenizer.save_pretrained(f"{formatted_model_name}-{train_test_string}-finetuned")

('Llama-3.2-1B-bnb-4bit-90.0-10.0-finetuned/tokenizer_config.json',
 'Llama-3.2-1B-bnb-4bit-90.0-10.0-finetuned/special_tokens_map.json',
 'Llama-3.2-1B-bnb-4bit-90.0-10.0-finetuned/tokenizer.json')
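
The directory name above is built from the model name and the split. A standalone version of that logic, using a hypothetical helper `finetuned_dir_name`, reproduces the printed path and shows why the split appears as floats (`90.0-10.0`):

```python
def finetuned_dir_name(model_name, test_size):
    # "unsloth/Llama-3.2-1B-bnb-4bit" -> "Llama-3.2-1B-bnb-4bit"; names without "/" are kept whole.
    short_name = model_name.split("/", 1)[-1]
    # test_size * 100 is a float, so the tag reads "90.0-10.0" rather than "90-10".
    split_tag = f"{100 - test_size * 100}-{test_size * 100}"
    return f"{short_name}-{split_tag}-finetuned"

print(finetuned_dir_name("unsloth/Llama-3.2-1B-bnb-4bit", 0.1))
```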

<a name="Inference"></a>
### Inference and evaluation
We reload the saved model next to the original base model and compare their replies on the held-out test split with BLEU and ROUGE-L.

In [54]:
# Reload the fine-tuned LoRA model saved above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = f"{formatted_model_name}-{train_test_string}-finetuned", # Directory written by save_pretrained above
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)
# Load the original base model for comparison
base_model, base_tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

FastLanguageModel.for_inference(base_model) # Enable native 2x faster inference

==((====))==  Unsloth 2024.11.7: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: Tesla P100-PCIE-16GB. Max memory: 15.888 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu121. CUDA = 6.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
==((====))==  Unsloth 2024.11.7: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: Tesla P100-PCIE-16GB. Max memory: 15.888 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu121. CUDA = 6.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 2048, padding_idx=128004)
    (layers): ModuleList(
      (0-15): 16 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear4bit(in_features=2048, out_features=512, bias=False)
          (v_proj): Linear4bit(in_features=2048, out_features=512, bias=False)
          (o_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaExtendedRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=2048, out_features=8192, bias=False)
          (up_proj): Linear4bit(in_features=2048, out_features=8192, bias=False)
          (down_proj): Linear4bit(in_features=8192, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
        (post_attention_layernorm): LlamaR

In [55]:

# Load BLEU and ROUGE metrics
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

# Path to the test split written during data prep
test_dataset_path = "formatted_test_conversations.json"

# e.g. "Llama-3.2-1B-bnb-4bit_90.0-10.0.csv"
output_csv_path = f"{formatted_model_name}_{train_test_string}.csv"
print(output_csv_path)

# Load the test dataset from JSON
def load_test_dataset(file_path):
    with open(file_path, "r", encoding="utf-8") as file:
        data = json.load(file)
    return data

# Per-model inference time trackers (seconds per generate() call)
base_model_inference_times = []
fine_tuned_model_inference_times = []

def generate_predictions(model, tokenizer, test_samples, max_new_tokens=128, track_times=None, enable_base=True):
    """Generate the final turn of each test conversation; returns (predictions, references).
    `enable_base` is currently unused and kept only for compatibility with existing calls."""
    predictions = []
    references = []

    for sample in test_samples:
        conversation = sample["conversations"]
        # Use every turn except the last (assistant) one as the prompt
        messages = conversation[:-1]

        # Tokenize with the chat template and open an assistant turn for the model to complete
        inputs = tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_tensors="pt",
        ).to("cuda")

        # Measure inference time
        inference_start_time = time.time()
        outputs = model.generate(input_ids=inputs, max_new_tokens=max_new_tokens, use_cache=True)
        inference_end_time = time.time()

        # Track inference time for this sample
        if track_times is not None:
            track_times.append(inference_end_time - inference_start_time)

        # Decode only the newly generated tokens. Slicing off the prompt is more reliable than
        # searching for the last "assistant" string, which breaks if the reply contains that word.
        generated_tokens = outputs[0][inputs.shape[-1]:]
        prediction_text = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

        # Debug: print the final prediction for each sample
        print("Final Prediction:", prediction_text)

        predictions.append(prediction_text)
        references.append(conversation[-1]["content"].strip())

    return predictions, references




def evaluate_metrics(predictions, references):
    # BLEU can fail (division by zero) when there is no text at all, so fall back to 0.0 then.
    # Only check whether *any* non-empty text exists: one empty prediction should not
    # zero out the corpus-level score over the whole test set.
    if not any(p.strip() for p in predictions) or not any(r.strip() for r in references):
        return 0.0, 0.0

    bleu_score = bleu.compute(predictions=predictions, references=[[r] for r in references])
    rouge_score = rouge.compute(predictions=predictions, references=references)

    return bleu_score["bleu"], rouge_score["rougeLsum"]


# Compare base and fine-tuned predictions on a small sample and save them side by side to CSV
def evaluate_sample_comparison(base_model, fine_tuned_model, tokenizer, test_samples, output_csv_path):
    # Generate predictions for the base model and track inference times
    base_predictions, references = generate_predictions(
        base_model, tokenizer, test_samples, track_times=base_model_inference_times
    )

    # Generate predictions for the fine-tuned model and track inference times
    fine_tuned_predictions, _ = generate_predictions(
        fine_tuned_model, tokenizer, test_samples, track_times=fine_tuned_model_inference_times, enable_base=False
    )

    # Calculate average inference times
    avg_base_inference_time = sum(base_model_inference_times) / len(base_model_inference_times)
    avg_fine_tuned_inference_time = sum(fine_tuned_model_inference_times) / len(fine_tuned_model_inference_times)

    # Save comparison to CSV
    with open(output_csv_path, "w", encoding="utf-8", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["Sample_ID", "Ground_Truth", "Base_Model_Prediction", "Fine_Tuned_Model_Prediction", 
                         "BLEU_Base", "BLEU_Fine_Tuned", "ROUGE_Base", "ROUGE_Fine_Tuned"])

        for i, (ref, base_pred, fine_tuned_pred) in enumerate(zip(references, base_predictions, fine_tuned_predictions)):
            base_bleu, base_rouge = evaluate_metrics([base_pred], [ref])
            fine_tuned_bleu, fine_tuned_rouge = evaluate_metrics([fine_tuned_pred], [ref])

            writer.writerow([i, ref, base_pred, fine_tuned_pred, base_bleu, fine_tuned_bleu, base_rouge, fine_tuned_rouge])

        # Append average inference times
        writer.writerow([])
        writer.writerow(["Average Inference Time (seconds)", f"Base Model: {avg_base_inference_time:.4f}", 
                         f"Fine-Tuned Model: {avg_fine_tuned_inference_time:.4f}"])
    
    print(f"Comparison for {len(test_samples)} samples saved to {output_csv_path}")
    print(f"Average inference time for Base Model: {avg_base_inference_time:.4f} seconds")
    print(f"Average inference time for Fine-Tuned Model: {avg_fine_tuned_inference_time:.4f} seconds")

# Function to evaluate the whole test set
def evaluate_full_test_set(model, tokenizer, test_samples):
    predictions, references = generate_predictions(model, tokenizer, test_samples)
    bleu_score, rouge_score = evaluate_metrics(predictions, references)
    print(f"BLEU Score: {bleu_score}")
    print(f"ROUGE-L F1 Score: {rouge_score}")
    return bleu_score, rouge_score

# Load the test dataset
test_data = load_test_dataset(test_dataset_path)
print(f"Test dataset loaded with {len(test_data)} samples.")
# Take a small sample (samples_for_human_comparison) for side-by-side review, and keep the full test set
sample_test_data = test_data[:samples_for_human_comparison]
print(sample_test_data)
full_test_data = test_data

# Enable inference mode for both models
FastLanguageModel.for_inference(base_model)  # Base model
FastLanguageModel.for_inference(model)  # Fine-tuned model
# Save the side-by-side comparison for the small sample
start_time = time.time()
evaluate_sample_comparison(base_model, model, tokenizer, sample_test_data, output_csv_path)
end_time = time.time()
per_sample_time = (end_time - start_time)/samples_for_human_comparison
print("Csv saved for comparison")
print(f"It took {end_time - start_time} seconds for evaluating {samples_for_human_comparison} samples on both base and finetuned models combined")

Llama-3.2-1B-bnb-4bit_90.0-10.0.csv
Test dataset loaded with 3803 samples.
[{'conversations': [{'role': 'system', 'content': 'You are a sales call center agent. Your task is to assist customers during outbound calls while maintaining a professional tone.'}, {'role': 'assistant', 'content': 'Hello.'}, {'role': 'user', 'content': 'Hello.'}, {'role': 'assistant', 'content': 'Hey, there. My name is Peter. How are you doing today?'}, {'role': 'user', 'content': 'All right.'}, {'role': 'assistant', 'content': 'That is wonderful. Well, I can see here you are living in Florida. Is that correct?'}, {'role': 'user', 'content': 'Yes.'}, {'role': 'assistant', 'content': 'I got you. And you have Medicare part A and B right now. Is that correct?'}, {'role': 'user', 'content': 'No, I do not.'}, {'role': 'assistant', 'content': 'All righty. How old are you right now?'}, {'role': 'user', 'content': '48.'}, {'role': 'assistant', 'content': 'I got you. And you have marketplace insurance, like right now h
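
Per-row BLEU in the comparison CSV can easily be 0 for short call-center turns. The `bleu` metric in `evaluate` defaults to 4-gram BLEU without smoothing, so a prediction shorter than four tokens, or one that shares no 4-gram with the reference, scores 0 even when it is nearly or exactly right. A simplified re-implementation shows the effect; it assumes whitespace tokens instead of the metric's own tokenizer and one reference per prediction, and the helper names are ours.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(predictions, references, max_order=4):
    """Simplified corpus BLEU: whitespace tokens, one reference each, no smoothing."""
    matches = [0] * max_order
    possible = [0] * max_order
    pred_len = ref_len = 0
    for pred, ref in zip(predictions, references):
        p_tok, r_tok = pred.split(), ref.split()
        pred_len += len(p_tok)
        ref_len += len(r_tok)
        for n in range(1, max_order + 1):
            # Clipped matches: an n-gram counts at most as often as it appears in the reference.
            overlap = ngram_counts(p_tok, n) & ngram_counts(r_tok, n)
            matches[n - 1] += sum(overlap.values())
            possible[n - 1] += max(len(p_tok) - n + 1, 0)
    precisions = [m / p if p > 0 else 0.0 for m, p in zip(matches, possible)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero n-gram precision makes BLEU 0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    ratio = pred_len / ref_len
    brevity_penalty = 1.0 if ratio > 1.0 else math.exp(1 - 1 / ratio)
    return geo_mean * brevity_penalty

print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 1.0
print(corpus_bleu(["the cat is on the mat"], ["the cat sat on the mat"]))   # 0.0: no 4-gram survives one substituted word
print(corpus_bleu(["All right"], ["All right"]))                          # 0.0: fewer than 4 tokens
```

Corpus-level BLEU over the full test set pools n-gram counts across all samples, so it is less affected by this than the per-row scores.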

In [None]:
# per_sample_time covers both models, so halve it for a rough per-model estimate
print(f"Evaluating the base model on the full test set ({train_test_string} split, {len(test_data)} samples) for ROUGE and BLEU scores")
print(f"Estimated time: {per_sample_time / 2 * len(test_data):.0f} seconds")
# Use the same ChatML tokenizer as the sample comparison so both models see identical prompts
# (base_tokenizer has no ChatML template applied)
base_model_bleu, base_model_rouge = evaluate_full_test_set(base_model, tokenizer, full_test_data)
print("Evaluated base model")

print(f"Evaluating the fine-tuned model on the full test set ({train_test_string} split, {len(test_data)} samples) for ROUGE and BLEU scores")
print(f"Estimated time: {per_sample_time / 2 * len(test_data):.0f} seconds")
fine_tuned_model_bleu, fine_tuned_model_rouge = evaluate_full_test_set(model, tokenizer, full_test_data)
print("Evaluated fine-tuned model")

print("Evaluation complete.")
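
The notebook reports ROUGE-L (`rougeLsum`), which scores the longest common subsequence (LCS) of tokens rather than exact n-grams, so it gives partial credit where BLEU gives none. A simplified sketch under stated assumptions: lowercase whitespace tokens instead of `rouge_score`'s own tokenizer, and single-line texts, for which `rougeLsum` reduces to plain sentence-level ROUGE-L. The helper names are ours.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(prediction, reference):
    """Simplified ROUGE-L F1: LCS-based precision and recall over lowercase whitespace tokens."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = "the cat sat on the mat"
print(rouge_l_f1("the cat on the mat", reference))     # 10/11: all predicted tokens in order, one missing
print(rouge_l_f1("the cat is on the mat", reference))  # 5/6: one substituted word
```

A one-word substitution in a six-token reply still scores 5/6, which is why ROUGE-L tends to move more smoothly than BLEU on short conversational turns.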

In [None]:
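#@title Show final memory and time stats
# Note: torch.cuda.max_memory_reserved() is the peak since the session started, so it may also
# include memory reserved after training (e.g. while reloading and evaluating the models above).
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)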
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

And we're done! If you have questions about Unsloth, find a bug, want to keep up with the latest LLM news, need help, or want to join projects, feel free to join our [Discord](https://discord.gg/u54VK8m8tk) channel!

Some other links:
1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)
