### Installation

In [None]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9]{1,}\.[0-9]{1,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.33.post1" if v=="2.9" else "0.0.32.post2" if v=="2.8" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2

### Unsloth

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!

    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",

    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit" # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    # model_name = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit", # or choose "unsloth/Llama-3.2-1B-Instruct"
    model_name =   "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2026.1.3: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


We now add LoRA adapters so we only need to update a small fraction of all parameters (typically 1 to 10%; about 0.52% with the settings below)!

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2026.1.3 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
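
As a rough cross-check of how few weights LoRA trains, the adapter size can be worked out by hand. The sketch below assumes the Llama-3.1-8B layer shapes (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); these values are not printed anywhere in this notebook, and `lora_param_count` is a hypothetical helper, not part of Unsloth or PEFT.

```python
# Assumed Llama-3.1-8B projection shapes as (in_features, out_features).
HIDDEN, MLP, KV = 4096, 14336, 8 * 128
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_param_count(r, target_modules, num_layers=32):
    """Count LoRA weights: each adapted layer adds A (r x in) and B (out x r)."""
    per_layer = sum(r * (SHAPES[m][0] + SHAPES[m][1]) for m in target_modules)
    return per_layer * num_layers

total = lora_param_count(16, list(SHAPES))
print(f"{total:,} trainable LoRA weights")
```

Under these assumptions, `r = 16` over all seven target modules gives 41,943,040 weights, which agrees with the trainable-parameter count reported in the training banner further down. Doubling `r` doubles the count.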


<a name="Data"></a>
### Data Prep
We now use the `Llama-3.1` format for conversation style finetunes. The data follows the ShareGPT style used by [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) and is loaded here from a local `dataset.csv`. We convert it to HuggingFace's normal multiturn format `("role", "content")` instead of `("from", "value")`. Llama-3 renders multi turn conversations like below:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hey there! How are you?<|eot_id|><|start_header_id|>user<|end_header_id|>

I'm great thanks!<|eot_id|>
```
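
To make the layout above concrete, here is a minimal hand-rolled renderer that reproduces it for a list of `role`/`content` messages. `render_llama3` is a hypothetical helper for illustration only; the real chat template does more (for example, it injects the default system block with the date lines), so use `tokenizer.apply_chat_template` for actual training data.

```python
def render_llama3(messages):
    """Approximate the Llama-3 turn layout: header, blank line, content, <|eot_id|>."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    return text

demo = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hey there! How are you?"},
    {"role": "user", "content": "I'm great thanks!"},
]
rendered = render_llama3(demo)
print(rendered)
```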

We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3` and more.

In [None]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }

from datasets import load_dataset
dataset = load_dataset("csv", data_files="./dataset.csv")

In [None]:
dataset['train'][1]

{'conversations': '[{"from": "human", "value": "from actions import walk(obj), pick_and_place(obj, pos), open(obj), find(obj) \\n objects=[red_apple(fruit), banana(fruit), carrot(vegetable), broccoli(vegetable), orange(fruit), vegetable_basket, fruit_bowl] \\n Create python functions that do the following: move all fruits to the fruit bowl and all vegetables to the vegetable basket."}, {"from": "gpt", "value": "def categorize_food_items():\\n    # Define fruits and vegetables\\n    fruits = [\\"red_apple\\", \\"banana\\", \\"orange\\"]\\n    vegetables = [\\"carrot\\", \\"broccoli\\"]\\n    \\n    # Move fruits to fruit bowl\\n    for fruit in fruits:\\n        if find(fruit) != \\"fruit_bowl\\":\\n            pick_and_place(fruit, \\"fruit_bowl\\")\\n        else:\\n            print(f\\"{fruit} already in fruit_bowl, cannot move\\")\\n    \\n    # Move vegetables to vegetable basket\\n    for vegetable in vegetables:\\n        if find(vegetable) != \\"vegetable_basket\\":\\n         

We now use `standardize_sharegpt` to convert ShareGPT style datasets into HuggingFace's generic format. This changes the dataset from looking like:
```
{"from": "system", "value": "You are an assistant"}
{"from": "human", "value": "What is 2+2?"}
{"from": "gpt", "value": "It's 4."}
```
to
```
{"role": "system", "content": "You are an assistant"}
{"role": "user", "content": "What is 2+2?"}
{"role": "assistant", "content": "It's 4."}
```
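
The renaming itself is simple enough to sketch in plain Python. The helper below is a hypothetical stand-in that only covers the three speaker names shown above; `standardize_sharegpt` may recognise more aliases and operates on a whole `datasets.Dataset` rather than one list.

```python
# Assumed mapping, limited to the speakers that appear in the example above.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def to_role_content(conversation):
    """Convert one ShareGPT-style conversation to role/content messages."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversation
    ]

sharegpt = [
    {"from": "system", "value": "You are an assistant"},
    {"from": "human", "value": "What is 2+2?"},
    {"from": "gpt", "value": "It's 4."},
]
converted = to_role_content(sharegpt)
print(converted)
```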

In [None]:
import json
import ast
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt, get_chat_template

# 1. Load the CSV
# Note: this creates a DatasetDict
dataset = load_dataset("csv", data_files="dataset.csv")["train"]

# 2. Convert the "conversations" column from String to Python List
# Because CSVs save lists as text like "[{'from': 'human'}]"
def parse_json_column(example):
    # We use json.loads or ast.literal_eval to turn the string into a list
    if isinstance(example["conversations"], str):
        try:
            # Try json first, fallback to literal_eval if quotes are single (')
            example["conversations"] = json.loads(example["conversations"])
        except json.JSONDecodeError:
            # Not valid JSON (e.g. a single-quoted Python repr): parse it as a Python literal
            example["conversations"] = ast.literal_eval(example["conversations"])
    return example

dataset = dataset.map(parse_json_column)

# 3. Now standardize
dataset = standardize_sharegpt(dataset)

# 4. Apply your Llama-3.1 template
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False) for convo in convos]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)

Map:   0%|          | 0/62 [00:00<?, ? examples/s]

In [None]:
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True,)

Unsloth: Standardizing formats (num_proc=2):   0%|          | 0/62 [00:00<?, ? examples/s]

Map:   0%|          | 0/62 [00:00<?, ? examples/s]

We look at how the conversations are structured for item 5:

In [None]:
dataset[5]["conversations"]

[{'content': 'from actions import walk(obj), pick_and_place(obj, pos), open(obj), find(obj) \n objects=[cotton_shirt(clothing), wool_sweater(clothing), leather_shoes(footwear), silk_scarf(clothing), rubber_boots(footwear), wardrobe, shoe_rack] \n Create python functions that do the following: move all clothing items to the wardrobe and all footwear to the shoe rack.',
  'role': 'user'},
 {'content': 'def organize_clothing_and_footwear():\n    # Define categories\n    clothing_items = ["cotton_shirt", "wool_sweater", "silk_scarf"]\n    footwear_items = ["leather_shoes", "rubber_boots"]\n    \n    # Move clothing to wardrobe\n    for clothing in clothing_items:\n        if find(clothing) != "wardrobe":\n            pick_and_place(clothing, "wardrobe")\n        else:\n            print(f"{clothing} already in wardrobe, cannot move")\n    \n    # Move footwear to shoe rack\n    for footwear in footwear_items:\n        if find(footwear) != "shoe_rack":\n            pick_and_place(footwear, 

And we see how the chat template transformed these conversations.

**[Notice]** Llama 3.1 Instruct's default chat template adds `"Cutting Knowledge Date: December 2023\nToday Date: 26 July 2024"` by default, so do not be alarmed!

In [None]:
dataset[5]["text"]

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nfrom actions import walk(obj), pick_and_place(obj, pos), open(obj), find(obj) \n objects=[cotton_shirt(clothing), wool_sweater(clothing), leather_shoes(footwear), silk_scarf(clothing), rubber_boots(footwear), wardrobe, shoe_rack] \n Create python functions that do the following: move all clothing items to the wardrobe and all footwear to the shoe rack.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\ndef organize_clothing_and_footwear():\n    # Define categories\n    clothing_items = ["cotton_shirt", "wool_sweater", "silk_scarf"]\n    footwear_items = ["leather_shoes", "rubber_boots"]\n    \n    # Move clothing to wardrobe\n    for clothing in clothing_items:\n        if find(clothing) != "wardrobe":\n            pick_and_place(clothing, "wardrobe")\n        else:\n            print(f"{clothing}

<a name="Train"></a>
### Train the model
Now let's train our model. We do 30 steps to speed things up, but for a full run you can keep `num_train_epochs=1` and turn off the step limit (`max_steps=None`). We also support TRL's `DPOTrainer`!

In [None]:
from trl import SFTConfig, SFTTrainer
from transformers import DataCollatorForSeq2Seq
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    packing = False, # Can make training 5x faster for short sequences.
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 30,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.001,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use TrackIO/WandB etc
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=4):   0%|          | 0/62 [00:00<?, ? examples/s]
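
With `lr_scheduler_type = "linear"`, `warmup_steps = 5` and `max_steps = 30`, the learning rate ramps up to `2e-4` over the first five steps and then decays linearly to zero. The sketch below uses the usual linear-with-warmup formula to show the shape; `linear_lr` is a hypothetical helper and not the trainer's own scheduler code.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=30):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining

for step in (0, 1, 5, 10, 20, 30):
    print(step, f"{linear_lr(step):.2e}")
```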

We also use Unsloth's `train_on_responses_only` method to only train on the assistant outputs and ignore the loss on the user's inputs.

In [None]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)

Map (num_proc=6):   0%|          | 0/62 [00:00<?, ? examples/s]

We verify masking is actually done:

In [None]:
tokenizer.decode(trainer.train_dataset[5]["input_ids"])

'<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nfrom actions import walk(obj), pick_and_place(obj, pos), open(obj), find(obj) \n objects=[cotton_shirt(clothing), wool_sweater(clothing), leather_shoes(footwear), silk_scarf(clothing), rubber_boots(footwear), wardrobe, shoe_rack] \n Create python functions that do the following: move all clothing items to the wardrobe and all footwear to the shoe rack.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\ndef organize_clothing_and_footwear():\n    # Define categories\n    clothing_items = ["cotton_shirt", "wool_sweater", "silk_scarf"]\n    footwear_items = ["leather_shoes", "rubber_boots"]\n    \n    # Move clothing to wardrobe\n    for clothing in clothing_items:\n        if find(clothing)!= "wardrobe":\n            pick_and_place(clothing, "wardrobe")\n        else:\n            pr

In [None]:
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])

'                                                                                                                              def organize_clothing_and_footwear():\n    # Define categories\n    clothing_items = ["cotton_shirt", "wool_sweater", "silk_scarf"]\n    footwear_items = ["leather_shoes", "rubber_boots"]\n    \n    # Move clothing to wardrobe\n    for clothing in clothing_items:\n        if find(clothing)!= "wardrobe":\n            pick_and_place(clothing, "wardrobe")\n        else:\n            print(f"{clothing} already in wardrobe, cannot move")\n    \n    # Move footwear to shoe rack\n    for footwear in footwear_items:\n        if find(footwear)!= "shoe_rack":\n            pick_and_place(footwear, "shoe_rack")\n        else:\n            print(f"{footwear} already in shoe_rack, cannot move")\n\norganize_clothing_and_footwear()<|eot_id|>'

We can see the System and Instruction prompts are successfully masked!
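
The idea behind the masking can be sketched on plain lists of token ids: every label starts as `-100` (ignored by the loss), and only the tokens between a response marker and the next instruction marker keep their real ids. This is a simplified illustration with made-up token ids and hypothetical helper names, not Unsloth's actual implementation.

```python
IGNORE_INDEX = -100  # label value that the loss function skips

def find_sub(seq, sub, start):
    """Return the first index >= start where sub occurs in seq, or -1."""
    for i in range(start, len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1

def mask_non_responses(input_ids, instruction_ids, response_ids):
    """Keep labels only for tokens that follow a response marker."""
    labels = [IGNORE_INDEX] * len(input_ids)
    pos = 0
    while True:
        start = find_sub(input_ids, response_ids, pos)
        if start == -1:
            break
        begin = start + len(response_ids)
        end = find_sub(input_ids, instruction_ids, begin)
        if end == -1:
            end = len(input_ids)
        labels[begin:end] = input_ids[begin:end]
        pos = end
    return labels

# Made-up ids: [10, 11] marks a user header, [20, 21] an assistant header.
ids = [1, 10, 11, 5, 6, 9, 20, 21, 7, 8, 9, 10, 11, 3, 9, 20, 21, 4, 9]
labels = mask_non_responses(ids, [10, 11], [20, 21])
print(labels)
```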

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
8.008 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 62 | Num Epochs = 4 | Total steps = 30
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040 of 8,072,204,288 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,0.7068
2,0.6683
3,0.7314
4,0.5861
5,0.4885
6,0.3901
7,0.3559
8,0.3747
9,0.2843
10,0.2438
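
The numbers in the banner above follow from the trainer settings: 2 examples per device times 4 accumulation steps gives a total batch size of 8, and 30 steps over 62 examples works out to 4 passes over the data. The sketch below reproduces that arithmetic; it is an approximation of how the epoch count is derived, and `epochs_for_max_steps` is a hypothetical helper.

```python
import math

def epochs_for_max_steps(num_examples, per_device_batch, grad_accum, max_steps, num_gpus=1):
    """Return (optimizer steps per epoch, epochs needed to reach max_steps)."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))
    updates_per_epoch = max(1, math.ceil(batches_per_epoch / grad_accum))
    return updates_per_epoch, math.ceil(max_steps / updates_per_epoch)

total_batch_size = 2 * 4 * 1
updates_per_epoch, num_epochs = epochs_for_max_steps(62, 2, 4, 30)
print(total_batch_size, updates_per_epoch, num_epochs)
```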


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

275.4327 seconds used for training.
4.59 minutes used for training.
Peak reserved memory = 8.008 GB.
Peak reserved memory for training = 0.0 GB.
Peak reserved memory % of max memory = 54.325 %.
Peak reserved memory for training % of max memory = 0.0 %.


In [None]:
test_set = [
  {
    "role": "user",
    "content": "from household_robot import find(item), grab(item), deposit(item, target) \n objects=[wine_glass, water_glass, ceramic_plate, metal_fork, soup_spoon, cloth_napkin, candle_holder, matches, framed_wedding_photo, tv_remote, magazine, newspaper, dining_table, end_table, kitchen_counter, drawer, shelf, windowsill, armchair] \n Create a sequence of functions to perform these tasks: \n 1. Move all items used for eating solid food from the dining table to the kitchen counter, but leave items used only for liquids. \n 2. Find the object that creates ambiance through light and move it to the end table. Then place the object used to ignite it right next to the light source. \n 3. Gather all paper-based reading materials from their current locations and stack them on the shelf. Place the one published most frequently on top. \n 4. Locate the sentimental photographic item and position it on the windowsill. \n 5. Take the fabric item from the dining table and drape it over the armchair. \n 6. Finally, move the device used to control entertainment from wherever it is now to the end table, but ensure it's on the opposite side from the light source you placed there earlier."
  },
  {
    "role": "user",
    "content": "from household_robot import find(item), grab(item), deposit(item, target) \n objects=[coffee_mug, tea_cup, juice_glass, breakfast_plate, butter_knife, salt_shaker, pepper_shaker, sugar_bowl, potted_succulent, wall_calendar, hand_towel, sponge, breakfast_nook_table, kitchen_island, countertop, sink_edge, hook, planter_stand, waste_basket] \n Create a sequence of functions to perform these tasks: \n 1. Remove all vessels designed to hold beverages from the breakfast nook table and relocate them to the kitchen island. Keep track of which one is specifically for hot drinks with a handle. \n 2. Group the paired seasoning dispensers together on the countertop. \n 3. Find the living plant and give it a more prominent display by moving it to the planter stand. \n 4. The sweet granular substance container should be placed next to the hot beverage vessel you identified in step 1 on the kitchen island. \n 5. Take the cleaning tool used when wet and place it at the sink edge. \n 6. Hang the absorbent fabric item on the hook. \n 7. Return exactly one item to the breakfast nook table: the flat item used to hold food during the morning meal."
  },
  {
    "role": "user",
    "content": "from household_robot import find(item), grab(item), deposit(item, target) \n objects=[laptop_computer, wireless_mouse, phone_charger, smartphone, sticky_notes, pen_cup, desk_lamp, small_cactus, desk, nightstand, dresser_top, floor, chair_seat, wall_outlet, trash_can, bookend, manila_folder] \n Create a sequence of functions to perform these tasks: \n 1. Clear the desk of all technology items that require electricity and move them to the dresser top. Remember which one is portable communication device. \n 2. Consolidate all writing and organizational supplies into a group on the chair seat. \n 3. The item that provides focused illumination should be moved to the nightstand. \n 4. Connect the charging cable to the wall outlet, then bring the portable device you identified in step 1 to rest next to its charger. \n 5. Place the drought-resistant plant on the desk next to the architectural support item. \n 6. The document holder should go in the trash can if it's empty, otherwise place it on the floor. \n 7. Finally, return only the input device that works with the laptop back to the desk, positioning it where a right-handed person would naturally use it."
  },
  {
    "role": "user",
    "content": "from household_robot import find(item), grab(item), deposit(item, target) \n objects=[cutting_board, chef_knife, paring_knife, wooden_spoon, metal_spatula, olive_oil_bottle, vinegar_bottle, fresh_basil, garlic_bulb, recipe_card, kitchen_scale, stove_top, prep_counter, utensil_drawer, spice_rack, herb_pot, cutting_board_slot, cookbook_stand] \n Create a sequence of functions to perform these tasks: \n 1. Put away all bladed implements into the utensil drawer for safety. Note which one is larger. \n 2. Group the two liquid ingredients together on the spice rack. \n 3. Place the living herb into its designated pot holder. \n 4. Position the hard surface used for chopping into its vertical storage slot. \n 5. Move all items you would actually cook with (not cut with) to the stove top area. \n 6. The written cooking instructions should be displayed on the cookbook stand. \n 7. Finally, bring the aromatic bulb vegetable and place it on the prep counter next to the measuring device."
  },
  {
    "role": "user",
    "content": "from household_robot import find(item), grab(item), deposit(item, target) \n objects=[shampoo_bottle, conditioner_bottle, bar_soap, toothbrush, toothpaste, dental_floss, hand_lotion, cotton_swabs, bath_towel, hand_mirror, bathroom_counter, shower_caddy, towel_rack, medicine_cabinet, trash_bin, soap_dish, drawer_organizer] \n Create a sequence of functions to perform these tasks: \n 1. Move all hair care products to the shower caddy. Track which one you use first in a typical routine. \n 2. Gather all dental hygiene items and organize them in the drawer organizer. \n 3. The solid cleansing item should rest in its designated dish. \n 4. Place the moisturizing product on the bathroom counter next to the reflective grooming tool. \n 5. The cotton cleaning items should go in the medicine cabinet. \n 6. Hang the large absorbent fabric on the towel rack. \n 7. Finally, of the hair care products you moved in step 1, bring the one used first back to the bathroom counter temporarily."
  },
  {
    "role": "user",
    "content": "from household_robot import find(item), grab(item), deposit(item, target) \n objects=[tennis_ball, soccer_ball, jump_rope, yoga_mat, resistance_band, water_bottle, gym_towel, bluetooth_speaker, protein_bar, workout_gloves, gym_bag, equipment_rack, storage_bin, bench, floor_mat_area, wall_hook, side_table] \n Create a sequence of functions to perform these tasks: \n 1. Collect all spherical sports equipment and place them in the storage bin. Remember which one is larger. \n 2. Roll up the floor exercise mat and stand it upright against the equipment rack. \n 3. Hang the cardiovascular training rope and the elastic resistance item on the wall hook together. \n 4. Place items worn during exercise on the bench. \n 5. The audio device and hydration container should both go on the side table, with the device positioned to the left. \n 6. Put the nutrition item in the gym bag. \n 7. After completing the workout area organization, take the larger ball from step 1 back out and place it on the floor mat area."
  },
  {
    "role": "user",
    "content": "from household_robot import find(item), grab(item), deposit(item, target) \n objects=[acrylic_paint_red, acrylic_paint_blue, paintbrush_large, paintbrush_small, palette, water_cup, paint_rag, canvas, easel, art_table, supply_drawer, drying_rack, floor_cloth, chair, windowsill, palette_knife] \n Create a sequence of functions to perform these tasks: \n 1. Mount the blank surface onto its standing support structure. \n 2. Arrange all color application tools on the chair in order of size, largest first. Remember this order. \n 3. Place both pigment containers on the art table, with the cooler color on the left. \n 4. The mixing surface and its associated scraping tool should be paired together on the windowsill. \n 5. Position the liquid container on the supply drawer top. \n 6. Spread the protective floor covering beneath the standing structure from step 1. \n 7. Take the smallest tool from step 2 and place it on the mixing surface you moved in step 4. The cleaning cloth should be draped over the drying rack."
  },
  {
    "role": "user",
    "content": "from household_robot import find(item), grab(item), deposit(item, target) \n objects=[baby_bottle, pacifier, cloth_diaper, baby_wipes, diaper_cream, baby_rattle, soft_stuffed_bear, board_book, changing_pad, crib, changing_table, rocking_chair, toy_basket, supply_caddy, diaper_pail, nightlight, dresser] \n Create a sequence of functions to perform these tasks: \n 1. Place all diapering supplies into the caddy organizer, but leave the waterproof changing surface where it is. \n 2. The feeding container should be positioned on the dresser top. \n 3. Gather the two comfort items a baby would put in their mouth and place them in the crib, with the smaller one positioned in the far left corner. \n 4. All toys meant for active play should go in the toy basket. \n 5. The educational item with pages goes on the rocking chair seat. \n 6. The ambient lighting device should be placed on the changing table. \n 7. Finally, take the largest soft comfort item and place it in the crib on the opposite side from where you put the smaller mouth item in step 3."
  },
  {
    "role": "user",
    "content": "from household_robot import find(item), grab(item), deposit(item, target) \n objects=[screwdriver_phillips, screwdriver_flathead, adjustable_wrench, hammer, measuring_tape, level_tool, safety_goggles, work_gloves, wood_plank, sandpaper, paint_can, paint_tray, workbench, tool_chest, sawhorse, pegboard, floor, shelf_unit] \n Create a sequence of functions to perform these tasks: \n 1. Hang all handheld turning and tightening tools on the pegboard, arranged by head type. \n 2. Personal protective equipment should be placed together on the shelf unit. \n 3. Position the raw building material across the sawhorse to prepare it for work. \n 4. The impacting tool and precision measuring devices should be stored together in the tool chest. \n 5. Set up the painting supplies on the workbench with the container on the left and its paired shallow holder on the right. \n 6. The abrasive finishing material should be placed on top of the wood material from step 3. \n 7. Take the tool used to ensure straightness from the tool chest and place it on the workbench next to the paint setup, then move one of the protective items from step 2 to rest on top of the level tool."
  }
]

<a name="Inference"></a>
### Inference


In [None]:
all_responses = []

# Example components
example_user_prompt =   """from household_robot import find(item), grab(item), deposit(item, target) \n objects=[wine_glass, water_glass, ceramic_plate, metal_fork, soup_spoon, cloth_napkin, candle_holder, matches, framed_wedding_photo, tv_remote, magazine, newspaper, dining_table, end_table, kitchen_counter, drawer, shelf, windowsill, armchair] \n Create a sequence of functions to perform these tasks: \n 1. Move all items used for eating solid food from the dining table to the kitchen counter, but leave items used only for liquids. \n 2. Find the object that creates ambiance through light and move it to the end table. Then place the object used to ignite it right next to the light source. \n 3. Gather all paper-based reading materials from their current locations and stack them on the shelf. Place the one published most frequently on top. \n 4. Locate the sentimental photographic item and position it on the windowsill. \n 5. Take the fabric item from the dining table and drape it over the armchair. \n 6. Finally, move the device used to control entertainment from wherever it is now to the end table, but ensure it's on the opposite side from the light source you placed there earlier."""

example_assistant_response = """# Step 1: Move all items used for eating solid food from the dining table to the kitchen counter\n# Items for eating solids: ceramic_plate, metal_fork, soup_spoon\ngrab('ceramic_plate')\ndeposit('ceramic_plate', 'kitchen_counter')\ngrab('metal_fork')\ndeposit('metal_fork', 'kitchen_counter')\ngrab('soup_spoon')\ndeposit('soup_spoon', 'kitchen_counter')\n# Note: wine_glass and water_glass left on dining_table (liquids only)\n\n# Step 2: Move candle_holder to end_table, then matches next to it\ngrab('candle_holder')\ndeposit('candle_holder', 'end_table')\ngrab('matches')\ndeposit('matches', 'end_table')\n\n# Step 3: Stack paper reading materials on shelf, most frequent on top\n# Magazine is published more frequently than newspaper\ngrab('newspaper')\ndeposit('newspaper', 'shelf')\ngrab('magazine')\ndeposit('magazine', 'shelf')\n# Magazine now on top (published most frequently)\n\n# Step 4: Move sentimental photo to windowsill\ngrab('framed_wedding_photo')\ndeposit('framed_wedding_photo', 'windowsill')\n\n# Step 5: Move fabric item from dining_table to armchair\ngrab('cloth_napkin')\ndeposit('cloth_napkin', 'armchair')\n\n# Step 6: Move tv_remote to end_table opposite from candle_holder\n# Candle_holder was placed first, so tv_remote goes on opposite side\ngrab('tv_remote')\ndeposit('tv_remote', 'end_table')\n# Position: candle_holder on one side, tv_remote on opposite side\n"""

for i, conversation in enumerate(test_set):
    # 1. Build the full history for this specific test case
    # full_conversation = [
    #     {"role": "user", "content": example_user_prompt},
    #     {"role": "assistant", "content": example_assistant_response}
    # ] + [conversation] # Changed 'conversation' to '[conversation]' to concatenate lists

    full_conversation = conversation
    # 2. Each test case is a single message dict, so wrap it in a list
    #    to form a one-turn conversation for the chat template
    inputs = tokenizer.apply_chat_template(
        [full_conversation], # <--- Wrapped in brackets
        tokenize = True,
        add_generation_prompt = True,
        return_tensors = "pt",
    ).to("cuda")

    # 3. Generate
    outputs = model.generate(
        input_ids = inputs,
        max_new_tokens = 2048,
        use_cache = True,
        temperature = 1.0,
        min_p = 0.1
    )

    # 4. Decode
    generated_text = tokenizer.batch_decode(outputs[:, inputs.shape[1]:], skip_special_tokens=True)[0]

    all_responses.append({
        "output": generated_text,
        "tokens_count": outputs.shape[1] - inputs.shape[1],
        "input_token_count" : inputs.shape[1]

    })

In [None]:
all_responses


[{'output': '# 1: Move solid food eating utensils\neating_utensils = ["metal_fork", "soup_spoon"]\nfor utensil in eating_utensils:\n    if find(utensil)!= "kitchen_counter":\n        deposit(utensil, "kitchen_counter")\n    else:\n        print(f"{utensil} already on kitchen_counter, cannot move")\n\n# 2: Move ambiance and igniter\nambiance = "candle_holder"\nigniter = "matches"\ndeposit(ambiance, "end_table")\nif find(igniter)!= "end_table":\n    deposit(igniter, "end_table")\nelse:\n    print(f"{igniter} already on end_table, cannot move")\n\n# 3: Gather and stack paper-based reading materials\npaper_materials = ["magazine", "newspaper"]\nfor material in paper_materials:\n    if find(material)!= "shelf":\n        deposit(material, "shelf")\n    else:\n        print(f"{material} already on shelf, cannot move")\n\n# 4: Position sentimental photo\nphoto = "framed_wedding_photo"\nif find(photo)!= "windowsill":\n    deposit(photo, "windowsill")\nelse:\n    print(f"{photo} already on windo

In [None]:
# 1. Get the length of the input tokens
input_length = inputs.shape[1]

# 2. Get the total length of the output (assuming batch size 1)
output_length = outputs.shape[1]

# 3. Calculate generated tokens
generated_tokens_count = output_length - input_length

print(f"Tokens generated: {generated_tokens_count}")

Tokens generated: 445
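
The cell above only reports the last test case, because `inputs` and `outputs` are overwritten on every loop iteration. Since each entry of `all_responses` already stores its own counts, a summary over all test cases can be sketched as below. `summarize` is a hypothetical helper, and the sample records are made up so the snippet runs on its own; in the notebook you would pass `all_responses` instead.

```python
def summarize(responses):
    """Aggregate token counts over records shaped like the entries of all_responses."""
    n = len(responses)
    total_generated = sum(r["tokens_count"] for r in responses)
    total_input = sum(r["input_token_count"] for r in responses)
    return {
        "num_responses": n,
        "total_generated": total_generated,
        "mean_generated": total_generated / n if n else 0.0,
        "mean_input": total_input / n if n else 0.0,
    }

# Made-up sample records, for illustration only.
sample = [
    {"output": "a", "tokens_count": 400, "input_token_count": 300},
    {"output": "b", "tokens_count": 200, "input_token_count": 100},
]
stats = summarize(sample)
print(stats)
```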


You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the whole output!

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "from household_robot import find(item), grab(item), deposit(item, target) \n objects=[dinner_plate, dessert_plate, salad_bowl, mug, paperback_book, hardcover_book, notebook, pencil, family_portrait_photo, house_key, coaster, vase_with_fake_flowers, kitchen_table, bookshelf, coffee_table, cabinet, kitchen_counter, drawer, sofa_cushion] \n Create a sequence of functions to perform these tasks: \n 1. Clear the main eating surface by moving all items meant for holding food or drink to the kitchen counter. Do not touch decorative items. \n 2. On the surface meant for reading and relaxing, gather all items that belong to a writing desk and put them on the cabinet. However, leave the most visually prominent object that is purely for decoration in its original place. \n 3. Now, find the item from step 1 that is specifically used for hot beverages. Place it next to the last item you placed on the cabinet in step 2. \n 4. Retrieve the object that provides access to the home from outside. Take it and place it on the highest *non-storage* surface in the room. \n 5. Locate the hard, flat object that is most likely to be read for leisure (not for note-taking). Pick it up and place it underneath the decorative object you intentionally left untouched in step 2. \n 6. Finally, ensure the only remaining object on the main eating surface is the one that prevents drink rings. If it's not there, place it there."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 1000,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

def step_1():
    edible_items = ["dinner_plate", "dessert_plate", "salad_bowl", "mug"]
    for item in edible_items:
        deposit(item, "kitchen_counter")
    non_decorative_eating_items = ["kitchen_table", "coffee_table"]
    for item in non_decorative_eating_items:
        deposit(item, "kitchen_counter")

def step_2():
    writing_desk_items = ["pencil", "notebook"]
    for item in writing_desk_items:
        if item!= "notebook":  # leave notebook
            deposit(item, "cabinet")
    decorative_item = "family_portrait_photo"
    # leave in its original place

def step_3():
    mug_deposit = "cabinet"
    coffee_mug = "mug"
    # placing mug next to the last item I placed on the cabinet
    deposit(coffee_mug, "cabinet")

def step_4():
    house_key = "house_key"
    # take it and place it on the highest non-storage surface
    deposit(house_key, "bookshelf")

def step_5():
    hardcover_book = "hardcover_book"
    # place it underneath the decorative object
    deposit(hard

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, use either Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!
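
To see why an adapter-only save is so much smaller than the full model, here is a rough parameter count for one linear layer. This is a sketch under assumptions: the layer size (4096 x 4096) and the rank (16) are illustrative values, not read from this notebook's configuration. The formula itself is the standard LoRA one, where a rank-`r` update to a `d_out x d_in` weight stores two matrices of shapes `(r, d_in)` and `(d_out, r)`.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in one LoRA update: A is (rank, d_in), B is (d_out, rank)."""
    return rank * (d_in + d_out)

def full_param_count(d_in, d_out):
    """Parameters in the frozen base weight matrix."""
    return d_in * d_out

# Assumed values, for illustration only
d_in, d_out, rank = 4096, 4096, 16

lora_params = lora_param_count(d_in, d_out, rank)
full_params = full_param_count(d_in, d_out)
print(f"LoRA params: {lora_params:,}")
print(f"Full params: {full_params:,}")
print(f"Adapter is {full_params // lora_params}x smaller for this layer")
```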

In [None]:
model.save_pretrained("lora_model")  # Local saving (same folder as the tokenizer)
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/chat_template.jinja',
 'lora_model/tokenizer.json')

To load the LoRA adapters we just saved for inference, run the cell below. The guard is already set to `True`; set it to `False` to skip reloading:

In [None]:
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Describe a tall tower in the capital of France."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

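If loading fails with the error below, the folder passed as `model_name` contains neither a `config.json` nor an `adapter_config.json`. Check that the adapters and the tokenizer were saved to the same folder:

RuntimeError: Unsloth: No config file found - are you sure the `model_name` is correct?
If you're using a model on your local device, confirm if the folder location exists.
If you're using a HuggingFace online model, check if it exists.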
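
Since this error comes from a missing config file, a quick check of the folder before loading can save a failed run. The sketch below is not part of Unsloth: `check_adapter_folder` is a hypothetical helper, and it assumes the usual PEFT layout of `adapter_config.json` plus either `adapter_model.safetensors` or `adapter_model.bin`.

```python
import tempfile
from pathlib import Path

# File names PEFT normally writes for a LoRA adapter (assumed layout)
ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")

def check_adapter_folder(folder):
    """Return a list of problems found in a saved adapter folder (empty if OK)."""
    path = Path(folder)
    if not path.is_dir():
        return [f"{folder} is not a directory"]
    problems = []
    if not (path / ADAPTER_CONFIG).is_file():
        problems.append(f"{ADAPTER_CONFIG} is missing")
    if not any((path / name).is_file() for name in ADAPTER_WEIGHTS):
        problems.append("no adapter weights file found")
    return problems

# Demo: a folder that only holds tokenizer files, as when the adapters
# were saved under a different name than the tokenizer
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tokenizer.json").write_text("{}")
    demo_problems = check_adapter_folder(tmp)

print(demo_problems)
```

In the notebook, `check_adapter_folder("lora_model")` should return an empty list before you call `FastLanguageModel.from_pretrained`.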

You can also use Hugging Face's `AutoPeftModelForCausalLM`. Only use this if you do not have `unsloth` installed. It can be very slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # Strongly discouraged - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for vLLM

We also support saving merged weights directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can create a personal token at https://huggingface.co/settings/tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False:
    model.save_pretrained("model")
    tokenizer.save_pretrained("model")
if False:
    model.push_to_hub("hf/model", token = "")
    tokenizer.push_to_hub("hf/model", token = "")


### GGUF / llama.cpp Conversion
Saving to `GGUF` / `llama.cpp` is now supported natively! We clone `llama.cpp` and save to `q8_0` by default. All methods, such as `q4_k_m`, are allowed. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
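
As a rough guide to the size of the resulting files, the sketch below estimates them from bits per weight. It covers only `f16` (16 bits per weight) and `q8_0`, whose blocks store 32 int8 values plus one 16-bit scale, giving 8.5 bits per weight. The 8-billion parameter count is an assumed example, and the estimate ignores file metadata and any tensors kept at a different precision, so treat the numbers as approximate.

```python
# Bits per weight for the two formats with a fixed per-weight cost.
# q8_0: (32 * 8 bits + 16-bit scale) / 32 weights = 8.5 bits per weight.
BITS_PER_WEIGHT = {"f16": 16.0, "q8_0": 8.5}

def estimate_gguf_gb(n_params, method):
    """Approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[method] / 8 / 1e9

n_params = 8_000_000_000  # assumed model size, for illustration only
for method in BITS_PER_WEIGHT:
    print(f"{method}: ~{estimate_gguf_gb(n_params, method):.1f} GB")
```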

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in llama.cpp.

And we're done! If you have any questions about Unsloth, find a bug, need help, or want to keep up with the latest LLM news and join projects, feel free to join our [Discord](https://discord.gg/unsloth) channel!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

<div class="align-center">
  <a href="https://unsloth.ai"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a>

  Join Discord if you need help + ⭐️ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐️

  This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).
</div>
