To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the [installation instructions](https://docs.unsloth.ai/get-started/installing-+-updating) in our docs.

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


### News

**Read our [blog post](https://unsloth.ai/blog/r1-reasoning) for guidance on how to train reasoning models.**

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [17]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29 peft trl triton
    !pip install --no-deps cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
    !pip install --no-deps unsloth

In [25]:
!pip install wandb



In [26]:
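# API key redacted: never hardcode a real key in a shared notebook.
# `wandb login` below prompts for the key interactively.
wandbkey = "<YOUR_WANDB_API_KEY>"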

!wandb login --relogin

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin


### Unsloth

In [18]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2025.3.9: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [19]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
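
As a rough sanity check on how many parameters LoRA actually trains, the count can be worked out by hand: each adapted linear layer with `in` inputs and `out` outputs gains two low-rank matrices holding `r * (in + out)` parameters in total. The sketch below assumes the published Llama 3.1 8B shapes (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers). Those dimensions are not stated in this notebook, so treat them as assumptions.

```python
# Assumed Llama 3.1 8B shapes (not given in this notebook).
HIDDEN = 4096
KV_DIM = 1024
MLP = 14336
NUM_LAYERS = 32

# (in_features, out_features) for each module listed in target_modules.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_param_count(r, shapes=MODULE_SHAPES, num_layers=NUM_LAYERS):
    """LoRA adds A (r x in) and B (out x r) to every adapted layer."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers

print(lora_param_count(r=16))
```

Under these assumptions, `r = 16` gives 41,943,040 parameters, which is the trainable-parameter count the trainer banner reports further down. Doubling `r` doubles the count.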

<a name="Data"></a>
### Data Prep
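The original version of this notebook uses the cleaned Alpaca dataset from [yahma](https://huggingface.co/datasets/yahma/alpaca-cleaned), a filtered version of the 52K-example [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). Here that section is replaced with our own data prep: a spreadsheet of Bill of Quantities (BOQ) descriptions labelled by chapter, subchapter, and item. You can swap in your own data prep in the same way.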

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

In [20]:
import pandas as pd

df = pd.read_excel(r'Fold_1_Consolidated_Data.xlsx')
print(len(df))

23345


In [21]:
chapter_columns = [
    "TRABALHOS PREPARATÓRIOS E MONTAGEM DE ESTALEIRO",
    "DEMOLIÇÕES E CONTENÇÃO DE FACHADA",
    "MOVIMENTO DE TERRAS",
    "CONTENÇÕES",
    "FUNDAÇÕES",
    "PAVIMENTO TERREO",
    "ESTRUTURA BETÃO ARMADO",
    "ESTRUTURA METÁLICA",
    "ESTRUTURA DE MADEIRA",
    "ESTRUTURAS DE ALVENARIA",
    "DIVERSOS",
    "REFORÇO DE ELEMENTOS ESTRUTURAIS EXISTENTES"
]  # Add all chapter columns

# Subchapter Columns
subchapter_columns = [
    '1.1 Estaleiro',
    '1.2 Tapume e Pala de Proteção',
    '1.3 Segurança Permanente em Obra',
    '1.4 Telas',
    '1.5 Levantamentos Construções Vizinhas',
    '2.1 Demolições de Revestimentos',
    '2.2 Demolições de Elementos Estruturais',
    '2.3 Demolição do Edifício existente incluindo acumulação e transporte dos produtos para vazadouro autorizado',
    '2.4 Demolições de Elementos',
    '2.5 Remoção de equipamentos eléctricos',
    '2.6 Contenção de Fachada',
    '2.7 Trabalhos Arqueológicos',
    '2.8 Plano Gestao de Resíduos',
    '3.1 Limpeza, Desmatação e Decapagem do terreno',
    '3.2 Remoção de árvores e arbustos',
    '3.3 Escavação Geral',
    '3.4 Escavação de Caboucos',
    '3.5 Aterros',
    '4.1 Contenção Periférica',
    '4.2 Instrumentação e Monitorização',
    '5.1 Betão de Limpeza',
    '5.2 Fundações Diretas',
    '5.3 Fundações Indirectas',
    '6.1 Pavimento Térreo com fibras',
    '6.2 Laje de Pavimento Térreo Armado',
    '6.3 Laje de Pavimento Térreo com moldes Cupolex',
    '7.1 Muros de Suporte',
    '7.2 Pilares',
    '7.3 Vigas',
    '7.4 Paredes',
    '7.5 Núcleos',
    '7.6 Escadas',
    '7.7 Platibandas',
    '7.8 Lajes',
    '7.9 Pré-Esforço',
    '8.1 Enformados a Frio',
    '8.2 Laminados',
    '8.3 Tubulares enformados a frio',
    '8.4 Tubulares enformados a quente',
    '8.5 Chumbadouros',
    '8.6 Lajes Colaborantes',
    '8.7 Gradil Metálico',
    '8.8 Pintura de Perfis Metálicos',
    '8.9 Proteção ao Fogo',
    '8.10 Apoios de neoprene',
    '9.1 Vigamentos de madeira de lamelado colado',
    '9.2 vigas de madeira de Pinho Bravo',
    '9.3  lajes mistas Madeira-Betão com vigas de madeira',
    '9.4 ligações em aço da Estrutura de Madeira',
    '9.5 Paredes de frontal',
    '9.6 Estrutura de Cobertura em madeira, com painéis OSB.',
    'Throwaway',
    'Juntas de dilatação',
    'Impermeabilização - Telas Bentoníticas Voltex',
    'Impermeabilização - Telas em PVC',
    'Impermeabilização - Emulsão Betuminosa',
    'Recuperação de paredes existentes',
    'Recuperação de pavimentos existentes'
] # Add all subchapter columns

# Item Columns
item_columns = [
    'Pavimento Existentes',
    'Parede Existentes',
    'Tecto Existentes',
    'Pavimentos Existentes',
    'Paredes Existentes',
    'Chaminés',
    'Caixilharia Exterior',
    'Caixilharia Interior',
    'Equipamentos Sanitários ',
    'Armários e Mobiliário de Cozinha',
    'Posto de Transformação',
    'Cabos Electricos',
    'Aparelhagens, Quadros, calhas e luminárias',
    'Abate de Árvores',
    'Transplante de Árvores',
    'Proteção de Árvores',
    'Escavação em Solo',
    'Escavação em Rocha',
    'Aterro com solos provenientes do local',
    'Aterro com solos de emprestimo',
    'Ancoragem de Contenção',
    'Muros Tipo Berlim',
    'Muros Tipo Munique',
    'Escoramento',
    'Cortina de Estacas Prancha',
    'Parede Moldada',
    'Muro de Gabiões',
    'Muros de Terra Armada',
    'Gunitagem / Betão Projetado',
    'Cortina de Estacas de Trado Continuo',
    'Cortina de Estacas Moldadas',
    'Desactivação de Ancoragens',
    'Ensaio Sónico',
    'Células de Carga',
    'Tubos Inclinómetros',
    'Alvos Topográficos',
    'Ensaios Prévios em Ancoragens',
    'Células dinamométricas',
    'Réguas de Nivelamento',
    'Piezómetros',
    'Sapatas',
    'Vigas de Fundação',
    'Betão Ciclópico',
    'Poços',
    'Laje de Fundação (Ensoleiramento Geral)',
    'Maciços de Encabeçamento',
    'Estacas Trado Contínuo',
    'Estacas Moldadas',
    'MicroEstacas',
    'Estacas Raiz',
    'Colunas de Jet Grouting',
    'Colunas Calda de Cimento Armadas',
    'Ensaios de Integridade das Estacas',
    'Laje maciças',
    'Laje fungiformes Maciças',
    'Laje fungiformes Aligeiradas',
    'Lajes com módulos embebido',
    'Lajes Pré-Fabricadas',
    'Execução de picagens e consolidação das paredes existentes a manter, mediante da aplicação de uma lâmina de argamassa, por via húmida.',
    'microbetão, em reforços de paredes com lâminas de betão projectado',
    'encasques nas paredes de alvenaria de pedra existentes',
    'encasques e preenchimento de vazios, nichos existentes',
    'Execução de injecção e selagem de fendas em paredes de alvenaria de pedra',
    'Reforço de paredes com injeção e selagem com calda,',
    'Execução de novos troços de paredes de frontal.',
    'Reforço das ligações entre paredes de tabique e de alvenaria de pedra/tijolo. ',
    'Pregagens',
    'Reforço de abóbadas com betão estrutural.',
    'Substituição de vigas de madeira existentes em pavimentos cujo estado de deterioração não permita a sua conservação.',
    'Reforço da ligação das vigas de madeira existentes sem apoio em cantoneiras às paredes de alvenaria de pedra existentes.',
    'Carotagem de lajes existentes',
    'Throwaway - Item'

]  # Add all item columns

In [22]:
# ✅ Extract all labels (multi-label format)
def extract_labels(row):
    # Identify all nonzero values (1s) in the row for each classification type
    chapter_list = [chapter_columns[i] for i, val in enumerate(row[chapter_columns]) if val == 1]
    subchapter_list = [subchapter_columns[i] for i, val in enumerate(row[subchapter_columns]) if val == 1]
    item_list = [item_columns[i] for i, val in enumerate(row[item_columns]) if val == 1]

    # Assign the list of detected labels or "None" if empty
    chapter = chapter_list if chapter_list else ["None"]
    subchapter = subchapter_list if subchapter_list else ["None"]
    item = item_list if item_list else ["None"]

    return chapter, subchapter, item

# ✅ Filter dataset: Only rows with at least one classification
df_examples = df[(df.iloc[:, 3:].sum(axis=1) > 0)]  # Exclude rows without labels

# ✅ Convert to Dictionary Format for Unsloth
# Extract the labels once per row instead of three times.
row_labels = [extract_labels(row) for _, row in df_examples.iterrows()]
data_dict = {
    "description": df_examples["Description"].tolist(),
    "chapter": [", ".join(chapter) for chapter, _, _ in row_labels],
    "subchapter": [", ".join(subchapter) for _, subchapter, _ in row_labels],
    "item": [", ".join(item) for _, _, item in row_labels],
}

print(data_dict["description"][0])
print(data_dict["chapter"][0])
print(data_dict["subchapter"][0])
print(data_dict["item"][0])


Fornecimento e construção da laje colaborante na execução dos passadiços de transição entre a caixa do elevador e o edifício existente com dimensões médias de 1,60*1,30m, a levar a efeito de acordo com a localização e pormenorização construtiva prevista nos desenhos nº 4 e nº 6 do projecto de estabilidade, incluindo todos os trabalhos necessários e ligações estruturais.
Nota: quantifica-se no presente artigo a unidade de laje a executar em cada um dos patamares, pelo seu valor global. 

ESTRUTURA METÁLICA
8.6 Lajes Colaborantes
Throwaway - Item
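
To make the multi-label decoding concrete, here is a small pandas-free sketch of the same idea on a toy row. The column subset and the 0/1 values are made up for illustration, and `decode_multi_hot` is a hypothetical helper. Only the logic mirrors `extract_labels`: keep the names of the columns whose value is 1, and fall back to `"None"` when no column is set.

```python
def decode_multi_hot(row, columns):
    """Return the names of columns set to 1, or ["None"] if none are set."""
    active = [name for name in columns if row.get(name, 0) == 1]
    return active if active else ["None"]

# Hypothetical toy data: three chapter columns, two of them active.
toy_chapters = ["FUNDAÇÕES", "ESTRUTURA METÁLICA", "DIVERSOS"]
toy_row = {"FUNDAÇÕES": 0, "ESTRUTURA METÁLICA": 1, "DIVERSOS": 1}

print(", ".join(decode_multi_hot(toy_row, toy_chapters)))
print(", ".join(decode_multi_hot({}, toy_chapters)))
```

The first call joins two active labels into one comma-separated string, which is the shape the `chapter`, `subchapter`, and `item` fields take in `data_dict`.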


In [23]:
from datasets import Dataset

# ✅ Convert Dictionary to Hugging Face Dataset
dataset = Dataset.from_dict(data_dict)

# ✅ Check the first sample
print(dataset["description"][:1])
print(dataset["chapter"][:1])
print(dataset["subchapter"][:1])
print(dataset["item"][:1])

['Fornecimento e construção da laje colaborante na execução dos passadiços de transição entre a caixa do elevador e o edifício existente com dimensões médias de 1,60*1,30m, a levar a efeito de acordo com a localização e pormenorização construtiva prevista nos desenhos nº 4 e nº 6 do projecto de estabilidade, incluindo todos os trabalhos necessários e ligações estruturais.\nNota: quantifica-se no presente artigo a unidade de laje a executar em cada um dos patamares, pelo seu valor global. \n']
['ESTRUTURA METÁLICA']
['8.6 Lajes Colaborantes']
['Throwaway - Item']


In [24]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = ["Classify the following Bill of Quantities (BOQ) description into Chapter, Subchapter, and Item."] * len(examples["description"])
    inputs = examples["description"]
    outputs = [f"Chapter: {ch}\nSubchapter: {sub}\nItem: {item}"
               for ch, sub, item in zip(examples["chapter"], examples["subchapter"], examples["item"])]

    texts = [alpaca_prompt.format(inst, inp, out) + EOS_TOKEN for inst, inp, out in zip(instructions, inputs, outputs)]

    return {"text": texts}
pass

# Apply formatting function
dataset = dataset.map(formatting_prompts_func, batched=True)

# Show some samples
print(dataset["text"][:2])  # Preview first two formatted samples

Map:   0%|          | 0/23270 [00:00<?, ? examples/s]

['Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nClassify the following Bill of Quantities (BOQ) description into Chapter, Subchapter, and Item.\n\n### Input:\nFornecimento e construção da laje colaborante na execução dos passadiços de transição entre a caixa do elevador e o edifício existente com dimensões médias de 1,60*1,30m, a levar a efeito de acordo com a localização e pormenorização construtiva prevista nos desenhos nº 4 e nº 6 do projecto de estabilidade, incluindo todos os trabalhos necessários e ligações estruturais.\nNota: quantifica-se no presente artigo a unidade de laje a executar em cada um dos patamares, pelo seu valor global. \n\n\n### Response:\nChapter: ESTRUTURA METÁLICA\nSubchapter: 8.6 Lajes Colaborantes\nItem: Throwaway - Item<|end_of_text|>', 'Below is an instruction that describes a task, paired with an input that provides further 
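
The role of `EOS_TOKEN` is easier to see by building a training text and an inference prompt side by side. In this sketch `EOS` is hard-coded to `<|end_of_text|>` (the token visible in the sample output above) instead of being read from the tokenizer, `build_prompt` is a hypothetical helper, and the description and labels are invented for illustration.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

INSTRUCTION = ("Classify the following Bill of Quantities (BOQ) description "
               "into Chapter, Subchapter, and Item.")
EOS = "<|end_of_text|>"  # assumed; in the notebook this is tokenizer.eos_token

def build_prompt(description, response=None):
    """Training text when a response is given, inference prompt otherwise."""
    if response is None:
        # Inference: leave the response slot empty and do not append EOS.
        return ALPACA_PROMPT.format(INSTRUCTION, description, "")
    # Training: fill the response and append EOS so generation learns to stop.
    return ALPACA_PROMPT.format(INSTRUCTION, description, response) + EOS

train_text = build_prompt(
    "Betão de limpeza",  # made-up description
    "Chapter: FUNDAÇÕES\nSubchapter: 5.1 Betão de Limpeza\nItem: None",
)
infer_text = build_prompt("Betão de limpeza")

print(train_text.endswith(EOS))
print(train_text.startswith(infer_text))
```

The training text is the inference prompt plus the expected response and the EOS token, so at inference time the model continues from `### Response:` and should stop once it emits EOS.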

In [28]:
# Split dataset into 80% training and 20% testing
split_dataset = dataset.train_test_split(test_size=0.2, seed=42)

# Extract train and test datasets
train_dataset = split_dataset["train"]
test_dataset = split_dataset["test"]

# Show dataset sizes
print(f"Training samples: {len(train_dataset)}")
print(f"Testing samples: {len(test_dataset)}")


Training samples: 18616
Testing samples: 4654


<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up; for a full run, remove `max_steps` and set `num_train_epochs` instead. We also support TRL's `DPOTrainer`!

In [30]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset = test_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 3, # Only takes effect once max_steps is removed.
        max_steps = 60, # Overrides num_train_epochs; remove for a full run.
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "wandb",
        save_steps = 30,  # Save a checkpoint every 30 steps
        save_total_limit = 3,  # Keep 3 checkpoints max
        evaluation_strategy = "steps",  # Evaluate frequently
        eval_steps = 15,  # Evaluate every 15 steps
    ),
)



Tokenizing to ["text"] (num_proc=2):   0%|          | 0/18616 [00:00<?, ? examples/s]

Tokenizing to ["text"] (num_proc=2):   0%|          | 0/4654 [00:00<?, ? examples/s]
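
It helps to work out what these arguments imply. In Hugging Face's `TrainingArguments`, a positive `max_steps` takes precedence over `num_train_epochs`, so this configuration stops after 60 optimizer steps. The sketch below is plain arithmetic on the values from the cell above and the 18,616 training samples reported earlier, assuming a single GPU. It does not call the trainer.

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 60
eval_steps = 15
save_steps = 30
num_train_samples = 18616  # size of train_dataset printed earlier

# Examples consumed per optimizer step on one GPU.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
samples_seen = effective_batch * max_steps
fraction_of_epoch = samples_seen / num_train_samples

# Steps at which evaluation and checkpointing are expected to trigger.
eval_at = list(range(eval_steps, max_steps + 1, eval_steps))
save_at = list(range(save_steps, max_steps + 1, save_steps))

print(effective_batch, samples_seen, round(fraction_of_epoch * 100, 1))
print(eval_at, save_at)
```

With these settings the run sees 960 examples, roughly 5% of one pass over the training split, which is why a full run needs `max_steps` removed.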

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
5.984 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 51,760 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


| Step | Training Loss |
|------|---------------|
| 1    | 1.8176 |
| 2    | 2.3042 |
| 3    | 1.6893 |
| 4    | 1.9382 |
| 5    | 1.6569 |
| 6    | 1.6219 |
| 7    | 1.1871 |
| 8    | 1.2642 |
| 9    | 1.1012 |
| 10   | 1.1895 |


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

462.7198 seconds used for training.
7.71 minutes used for training.
Peak reserved memory = 7.922 GB.
Peak reserved memory for training = 1.938 GB.
Peak reserved memory % of max memory = 53.716 %.
Peak reserved memory for training % of max memory = 13.141 %.


<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the fibonnaci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

['<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nContinue the fibonnaci sequence.\n\n### Input:\n1, 1, 2, 3, 5, 8\n\n### Response:\n13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025']
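
`batch_decode` returns the prompt and the completion together, so for the BOQ classifier you would usually keep only the text after `### Response:` and read the three fields back. A minimal sketch, assuming the model reproduces the `Chapter:` / `Subchapter:` / `Item:` layout used in training; `parse_response` is a hypothetical helper and the EOS string is hard-coded. Field values are kept as raw strings because several label names themselves contain commas, so splitting on `", "` would be ambiguous.

```python
EOS = "<|end_of_text|>"  # assumed; in the notebook this is tokenizer.eos_token
FIELDS = ("Chapter", "Subchapter", "Item")

def parse_response(decoded):
    """Extract the Chapter/Subchapter/Item fields from one decoded generation."""
    # Keep only the completion, then drop the EOS marker and whitespace.
    response = decoded.split("### Response:")[-1].replace(EOS, "").strip()
    fields = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in FIELDS:
            fields[key.strip().lower()] = value.strip()
    return fields

# Shortened stand-in for a decoded string; the response part is taken from
# the formatted training sample shown earlier.
decoded = (
    "### Instruction:\nClassify ...\n\n### Input:\n...\n\n### Response:\n"
    "Chapter: ESTRUTURA METÁLICA\n"
    "Subchapter: 8.6 Lajes Colaborantes\n"
    "Item: Throwaway - Item<|end_of_text|>"
)
print(parse_response(decoded))
```

Lines that do not start with one of the three field names are ignored, so a malformed generation yields a partial or empty dictionary instead of raising.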

You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the whole output!

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the fibonnaci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the fibonnaci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
13, 21, 34, 55, 89, 144<|end_of_text|>


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model")  # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, change `False` to `True`:

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# alpaca_prompt = You MUST copy from above!

inputs = tokenizer(
[
    alpaca_prompt.format(
        "What is a famous tall tower in Paris?", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What is a famous tall tower in Paris?

### Input:


### Response:
One of the most famous and iconic tall towers in Paris is the Eiffel Tower. Standing at 324 meters (1,063 feet) tall, this wrought iron tower is a symbol of the city and a must-see attraction for tourists from all over the world.<|end_of_text|>


You can also use Hugging Face's `AutoPeftModelForCausalLM`. Only use this if you do not have `unsloth` installed. It can be very slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # Not recommended - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
We now support saving to `GGUF` / `llama.cpp` natively! We clone `llama.cpp` and save to `q8_0` by default. Other methods such as `q4_k_m` are also allowed. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "",
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in llama.cpp or a UI based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui)

And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:
1. Llama 3.2 Conversational notebook. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

