To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save) (eg for Llama.cpp).

**[NEW] Llama-3 8b is trained on a crazy 15 trillion tokens! Llama-2 was 2 trillion.**

Use our [Llama-3 8b Instruct](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing) notebook for conversational style finetunes.

In [1]:
# %%capture
# # Installs Unsloth, Xformers (Flash Attention) and all other packages!
# !pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
# !pip install --no-deps xformers trl peft accelerate bitsandbytes

In [None]:
import torch
major_version, minor_version = torch.cuda.get_device_capability()
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass
!pip install triton transformers
!pip install -U datasets
!pip install --pre -U xformers ##### this take some time


# restart the kernel after running this cell

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
* [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

<a name="Data"></a>
### Data Prep
We now use the Alpaca dataset from [yahma](https://huggingface.co/datasets/yahma/alpaca-cleaned), which is a filtered version of 52K of the original [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). You can replace this code section with your own data prep.

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

In [2]:
# Load dataset
import pandas as pd
import os

# df_evaluated = pd.read_pickle("/content/0122_10000_evaluated.pkl")
df_evaluated = pd.read_pickle(os.path.join(os.getcwd(), "0122_10000_evaluated.pkl"))
# df_news = pd.read_pickle("/content/mixtral_integrated_df.pkl")
# df_news = pd.read_pickle("/content/3_new_days_mixtral_integrated_df.pkl")
df_news = pd.read_pickle(os.path.join(os.getcwd(), "3_new_days_mixtral_integrated_df.pkl"))
df_news = df_news[df_news['answer'] != 'Error: LLM call failed']
df_evaluated = df_evaluated[df_evaluated["accuracy"] > 4.5]
# df_evaluated.head(3)
# df_news.head(3)

In [3]:
import pandas as pd


# Define the number of instances to select per language
split_language = 1000

# Create a dictionary to store language-specific DataFrames
language_dataframes = {
    lang: df_evaluated[df_evaluated["language"] == lang].sample(split_language, random_state=42)
    for lang in df_evaluated["language"].unique()
}

# Access the DataFrames for each language using the dictionary
df_finetuning_en = language_dataframes["en"]  # Access English DataFrame
df_finetuning_it = language_dataframes["it"]  # Access Italian DataFrame (if it exists)
df_finetuning_es = language_dataframes["es"]  # Access Spanish DataFrame (if it exists)
df_finetuning_fr = language_dataframes["fr"]  # Access French DataFrame (if it exists)

# Print DataFrame shapes
print(df_finetuning_en.shape)
print(df_finetuning_it.shape)
print(df_finetuning_es.shape)
print(df_finetuning_fr.shape)

df_finetuning = pd.concat([df_finetuning_en, df_finetuning_it, df_finetuning_es, df_finetuning_fr], ignore_index=True)


(1000, 5)
(1000, 5)
(1000, 5)
(1000, 5)


In [4]:
df_news['language'] = 'it'


In [5]:
# Merge the selected dataframes based on 'question' and 'answer'
merged_df = pd.merge(df_finetuning, df_news, on=['question', 'answer', 'language'], how='outer')
# merged_df.head(2)

In [6]:
# rename the columns
merged_df = merged_df.sample(frac=1).reset_index(drop=True)
df_finetuning = merged_df.rename(columns = {"question": "instruction", "answer": "output" , "language": "input"})

<a name="Save"></a>
### loading finetuned models

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model.

In [32]:
# model.save_pretrained("lora_model") # Local saving
# tokenizer.save_pretrained("lora_model")
# # model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# # tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:

In [8]:
if True:
    max_seq_length = 8048 # Choose any! We auto support RoPE Scaling internally!
    dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
    load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth: Fast Llama patching release 2024.5
   \\   /|    GPU: NVIDIA L4. Max memory: 22.168 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.27.dev792. FA = True.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [9]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context (if present). Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

In [10]:

# istructions = ["Un gatto mangia 5 topi in 6 giorni. Di questo passo, quanti topi mangerà in 30 giorni?"]

istructions = ["Scrivi una frase che parli di un ristorante con tutte le seguenti proprietà: nome = Strada, eatType = ristorante, cibo = cinese, fascia di prezzo = economico, valutazione del cliente = 5 su 5, adatto alle famiglie = sì, vicino = Rainbow Vegetarian Café"
              , "Mi esposa y yo fuimos a ver Elizabethtown anoche (27/10/2005) y quedamos terriblemente decepcionados. De hecho, entramos esperando ver una buena película. NO esperábamos nada grandioso, pero imaginamos algunas escenas lindas y algunas líneas divertidas de Paula Dean (normalmente presentadora de Food Network). Es difícil subestimar exactamente cuán equivocados estábamos (de hecho, consideré tomar un descanso para ir al baño), ir a jugar uno o dos videojuegos y luego regresar.\n\nSeleccione entre los siguientes.\n * Al hablante le encanta Elizabethtown.\n *El orador recomienda Elizabethtown..\n *El orador detesta Elizabethtown..\n * Ninguna de las opciones anteriores.\nP: ¿Qué piensa el orador sobre Elizabethtown?"
              , "Dans une gare, quatre amis déposent leurs bagages dans des casiers séparés avant de partir faire du tourisme. Adam met sa valise dans le casier A, Beth met la sienne dans le casier B, Carl met la sienne dans le casier C et Dana met la sienne dans le casier D. Plus tard, un agent de sécurité déplace les valises en raison d'un problème de maintenance, plaçant la valise d'Adam dans le casier. D, la valise de Beth dans le casier A, la valise de Carl dans le casier B et la valise de Dana dans le casier C. Lorsque les amis reviendront récupérer leurs bagages, où chacun cherchera-t-il initialement sa valise\xa0?"]
for instruction in istructions:
  prompt = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

  ### Instruction:
  {instruction}


  ### Response:"""

  print(f"[*********]Instruction:")
  print(instruction)
  #   print(prompt)
  print("[*]result:")
  inputs = tokenizer(
    [
        alpaca_prompt.format(
            instruction, # instruction
            "it", # input
            "", # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")
  outputs = model.generate(**inputs, max_new_tokens = 500, use_cache = True)
  outputs = tokenizer.batch_decode(outputs)
  result =outputs[0].split("Response:")[-1].strip()

  
  print(result)
  print("\n") 

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


[*********]Instruction:
Scrivi una frase che parli di un ristorante con tutte le seguenti proprietà: nome = Strada, eatType = ristorante, cibo = cinese, fascia di prezzo = economico, valutazione del cliente = 5 su 5, adatto alle famiglie = sì, vicino = Rainbow Vegetarian Café
[*]result:


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Strada è un delizioso ristorante cinese economico dove il cibo è valutato 5 su 5 dai clienti e che è adatto alle famiglie, situato vicino al Rainbow Vegetarian Café.<|end_of_text|>


[*********]Instruction:
Mi esposa y yo fuimos a ver Elizabethtown anoche (27/10/2005) y quedamos terriblemente decepcionados. De hecho, entramos esperando ver una buena película. NO esperábamos nada grandioso, pero imaginamos algunas escenas lindas y algunas líneas divertidas de Paula Dean (normalmente presentadora de Food Network). Es difícil subestimar exactamente cuán equivocados estábamos (de hecho, consideré tomar un descanso para ir al baño), ir a jugar uno o dos videojuegos y luego regresar.

Seleccione entre los siguientes.
 * Al hablante le encanta Elizabethtown.
 *El orador recomienda Elizabethtown..
 *El orador detesta Elizabethtown..
 * Ninguna de las opciones anteriores.
P: ¿Qué piensa el orador sobre Elizabethtown?
[*]result:


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


El orador detesta Elizabethtown.<|end_of_text|>


[*********]Instruction:
Dans une gare, quatre amis déposent leurs bagages dans des casiers séparés avant de partir faire du tourisme. Adam met sa valise dans le casier A, Beth met la sienne dans le casier B, Carl met la sienne dans le casier C et Dana met la sienne dans le casier D. Plus tard, un agent de sécurité déplace les valises en raison d'un problème de maintenance, plaçant la valise d'Adam dans le casier. D, la valise de Beth dans le casier A, la valise de Carl dans le casier B et la valise de Dana dans le casier C. Lorsque les amis reviendront récupérer leurs bagages, où chacun cherchera-t-il initialement sa valise ?
[*]result:
Adam cherchera d'abord sa valise dans le casier D, puisqu'il l'a déposée dans le casier A. Beth cherchera d'abord sa valise dans le casier A, puisqu'elle l'a déposée dans le casier B. Carl cherchera d'abord sa valise dans le casier B, puisqu'il l'a déposée dans le casier C. Dana cherchera d'abord sa vali