# Empathic LLM

## Libraries

In [None]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [None]:
!pip uninstall torch torchvision torchaudio bitsandbytes -y
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!pip install bitsandbytes

Found existing installation: torch 2.5.1+cu121
Uninstalling torch-2.5.1+cu121:
  Successfully uninstalled torch-2.5.1+cu121
Found existing installation: torchvision 0.20.1+cu121
Uninstalling torchvision-0.20.1+cu121:
  Successfully uninstalled torchvision-0.20.1+cu121
Found existing installation: torchaudio 2.5.1+cu121
Uninstalling torchaudio-2.5.1+cu121:
  Successfully uninstalled torchaudio-2.5.1+cu121
Found existing installation: bitsandbytes 0.45.0
Uninstalling bitsandbytes-0.45.0:
  Successfully uninstalled bitsandbytes-0.45.0
Looking in indexes: https://download.pytorch.org/whl/cu121
Collecting torch
  Downloading https://download.pytorch.org/whl/cu121/torch-2.5.1%2Bcu121-cp310-cp310-linux_x86_64.whl (780.4 MB)
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.20.1%2Bcu121-cp310-cp310-linux_x86_64.whl (7.3 MB)

In [None]:
import pandas as pd
import re
import torch
from unsloth import FastLanguageModel
from datasets import Dataset, load_dataset
from unsloth.chat_templates import get_chat_template

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## Data Preparation

In [None]:
df = pd.read_csv("/content/emotion-emotion_69k.csv", delimiter=',',
                 engine='python', index_col="Unnamed: 0")

In [None]:
df

Unnamed: 0,Situation,emotion,empathetic_dialogues,labels,Unnamed: 5,Unnamed: 6
0,I remember going to the fireworks with my best...,sentimental,Customer :I remember going to see the firework...,"Was this a friend you were in love with, or ju...",,
1,I remember going to the fireworks with my best...,sentimental,Customer :This was a best friend. I miss her.\...,Where has she gone?,,
2,I remember going to the fireworks with my best...,sentimental,Customer :We no longer talk.\nAgent :,Oh was this something that happened because of...,,
3,I remember going to the fireworks with my best...,sentimental,Customer :Was this a friend you were in love w...,This was a best friend. I miss her.,,
4,I remember going to the fireworks with my best...,sentimental,Customer :Where has she gone?\nAgent :,We no longer talk.,,
...,...,...,...,...,...,...
64631,I found some pictures of my grandma in the att...,sentimental,Customer :Did you find anything great?\nAgent :,Yeah I found some old pictures of when us kids...,,
64632,I found some pictures of my grandma in the att...,sentimental,Customer :What a wonderful memory. \nAgent :,Yeah reminds me of the good old days. I miss ...,,
64633,I woke up this morning to my wife telling me s...,surprised,Customer :I woke up this morning to my wife te...,Oh hey that's awesome! That is awesome right?,,
64634,I woke up this morning to my wife telling me s...,surprised,Customer :It is soooo awesome. We have been w...,That is awesome!!!! Congratulations!,,


In [None]:
df['emotion'].value_counts()

Unnamed: 0_level_0,count
emotion,Unnamed: 1_level_1
surprised,3295
excited,2465
angry,2296
proud,2247
annoyed,2213
sad,2213
lonely,2106
afraid,2094
grateful,2091
terrified,2074


In [None]:
# Reset the index so that 'index' becomes a regular column
df.reset_index(inplace=True)

# Fix rows where part of the situation text spilled into the 'emotion' column
def fix_misaligned_rows(row):
    # Fragments that identify a shifted row, and whether a space is needed when appending them to 'Situation'
    conditions = [
        ('t even like scary things', False),
        ('t believe my daughter taught herself how to play the ukelele. I was amazed', True),
        ('I really killed it!', True),
        ('t believe I like the show Power so much. I was never really into shows like that', False),
        ('t think I wold like super heroes', False),
        ("but what I didn't know was that he was working in the next room with the door open.", True),
        ('we were in a different country', True),
        ("I hear all these different labor stories that aren't exactly reassuring!", True),
        ("He stole from me and didn't think I would notice.", False),
        ("time to jump on the motorcycle and go cruising!", True)
    ]

    # Iterate over the conditions
    for condition, needs_space in conditions:
        if condition in str(row['emotion']):
            # Append 'emotion' to 'Situation', with or without a space
            row['Situation'] += (" " if needs_space else "") + row['emotion']
            # Replace the value of 'emotion' with that of 'empathetic_dialogues'
            row['emotion'] = row['empathetic_dialogues']
            # Shift the remaining columns one position to the left
            row['empathetic_dialogues'] = row['labels']
            row['labels'] = row['Unnamed: 5']
            row['Unnamed: 5'] = row['Unnamed: 6']
            row['Unnamed: 6'] = None
            return row  # Stop at the first matching condition

    # Special case: "(" in 'emotion'
    if "(" in str(row['emotion']):
        # Shift the columns without changing 'Situation'
        row['emotion'] = row['empathetic_dialogues']
        row['empathetic_dialogues'] = row['labels']
        row['labels'] = row['Unnamed: 5']
        row['Unnamed: 5'] = row['Unnamed: 6']
        row['Unnamed: 6'] = None

    return row

# Fix the rows (selected by index) whose fields spilled as far as 'Unnamed: 6'
def fix_misaligned_rows1(row):
    conditions = [
        (23485, "I really miss having a man around the house. I have to do too much"),
        (23486, "I really miss having a man around the house. I have to do too much"),
        (23487, "I really miss having a man around the house. I have to do too much"),
        (23488, "I really miss having a man around the house. I have to do too much"),
    ]

    for index, correct_situation in conditions:
        if row['index'] == index:
            row['Situation'] = correct_situation
            row['emotion'] = row['labels']
            row['empathetic_dialogues'] = row['Unnamed: 5']
            row['labels'] = row['Unnamed: 6']
            row['Unnamed: 5'] = None
            row['Unnamed: 6'] = None
            return row

    return row

# Apply the first fix to the DataFrame
df_fixed = df.apply(fix_misaligned_rows, axis=1)
# Apply the second fix to the DataFrame
df_fixed = df_fixed.apply(fix_misaligned_rows1, axis=1)
# Drop 'Unnamed: 6', which is now entirely null
df_fixed.drop(columns=['Unnamed: 6'], inplace=True)
df_fixed.drop(columns=['index'], inplace=True)
# Join 'labels' and 'Unnamed: 5', separated by a space
df_fixed['labels'] = df_fixed['labels'].fillna('') + ' ' + df_fixed['Unnamed: 5'].fillna('')
# Drop 'Unnamed: 5' after the join
df_fixed.drop(columns=['Unnamed: 5'], inplace=True)
# Rename 'labels' to 'Assistant' and 'empathetic_dialogues' to 'User'
df_fixed.rename(columns={'labels': 'Assistant', 'empathetic_dialogues': 'User'}, inplace=True)
# Strip the "Customer :" and "\nAgent :" markers from the 'User' column
df_fixed['User'] = df_fixed['User'].str.replace(r'Customer :|\nAgent :', '', regex=True)
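The shifted rows handled above are what one would expect when a situation text contains an unquoted comma: the parser splits it into two fields and everything to its right moves over by one column. A minimal sketch with the standard `csv` module illustrates the effect and one way to undo it. The sample line and the helper `realign` are made up for illustration and are not part of the notebook; the sketch also assumes the surplus comma always sits in the first column, and it re-inserts the comma, whereas the cell above joins the pieces with a space or nothing.

```python
import csv
import io

COLUMNS = ["Situation", "emotion", "empathetic_dialogues", "labels"]

# Hypothetical raw CSV line whose situation text contains an unquoted comma.
raw_line = "I passed my exam, I really killed it!,confident,Customer :I passed!,Great job!"

# The parser yields 5 fields for a 4-column schema: the situation was split in two.
fields = next(csv.reader(io.StringIO(raw_line)))

def realign(fields, n_columns=len(COLUMNS)):
    """Merge surplus leading fields back into the first column (assumed culprit)."""
    surplus = len(fields) - n_columns
    if surplus <= 0:
        return list(fields)
    situation = ",".join(fields[: surplus + 1])
    return [situation] + list(fields[surplus + 1:])

row = dict(zip(COLUMNS, realign(fields)))
print(row["Situation"])  # I passed my exam, I really killed it!
print(row["emotion"])    # confident
```

Quoting such fields when the CSV is written would avoid the problem at the source; the pattern list in the cell above is a repair for a file that is already broken.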

In [None]:
df_fixed.loc[3722:3726]

Unnamed: 0,Situation,emotion,User,Assistant
3722,I've been watching the new stephen king show. ...,terrified,"I don't watch scary shows, but I have been wat...","Yes, I watch the show. That is a bit scary tho..."
3723,I've been watching the new stephen king show. ...,terrified,The first episode was fine but I don't thik I ...,"Ohh come on its not that scary, you should wat..."
3724,I've been watching the new stephen king show. ...,terrified,You don't know me lol,"Haha, Ok as you wish then"
3725,I've been watching the new stephen king show. ...,terrified,"Yes, I watch the show. That is a bit scary tho...",The first episode was fine but I don't thik I ...
3726,I've been watching the new stephen king show. ...,terrified,"Ohh come on its not that scary, you should wat...",You don't know me lol


In [None]:
df_fixed.loc[1419:1422]

Unnamed: 0,Situation,emotion,User,Assistant
1419,I had a great day at work today I really kill...,confident,I had a great day at work today! We are worki...,"That's really awesome, what makes it so complex?"
1420,I had a great day at work today I really kill...,confident,There are many different variables that need t...,Well that's really great that you were able to...
1421,I had a great day at work today I really kill...,confident,"That's really awesome, what makes it so complex?",There are many different variables that need t...
1422,I had a great day at work today I really kill...,confident,Well that's really great that you were able to...,Not usually this is an unusual case!


In [None]:
df_fixed.loc[20933:20936]

Unnamed: 0,Situation,emotion,User,Assistant
20933,I cant believe I like the show Power so much. ...,surprised,I have really gotten into that show Power.,I've never seen it. Why do you like it?
20934,I cant believe I like the show Power so much. ...,surprised,"I don't usually like shows like that, but the ...",What is it about?
20935,I cant believe I like the show Power so much. ...,surprised,I've never seen it. Why do you like it?,"I don't usually like shows like that, but the ..."
20936,I cant believe I like the show Power so much. ...,surprised,What is it about?,Drugs. That's why I was apprehensive


In [None]:
df_fixed.loc[40194:40197]

Unnamed: 0,Situation,emotion,User,Assistant
40194,I didnt think I wold like super heroes,surprised,I love super hero movies now,What is your favorite super hero movie?
40195,I didnt think I wold like super heroes,surprised,wonder woman. She was fierce,"To be honest, it seemed quite childish for me"
40196,I didnt think I wold like super heroes,surprised,What is your favorite super hero movie?,wonder woman. She was fierce
40197,I didnt think I wold like super heroes,surprised,"To be honest, it seemed quite childish for me",wow ok


In [None]:
df_fixed.loc[53778:53781]

Unnamed: 0,Situation,emotion,User,Assistant
53778,I once was talking about a coworker behind his...,ashamed,"The other week, I was talking about a co-worki...","Hmm, how did that make you feel"
53779,I once was talking about a coworker behind his...,ashamed,"Pretty awful, actually. Especially considerin...","Yeah, at least now you're more aware, it can b..."
53780,I once was talking about a coworker behind his...,ashamed,"Hmm, how did that make you feel","Pretty awful, actually. Especially considerin..."
53781,I once was talking about a coworker behind his...,ashamed,"Yeah, at least now you're more aware, it can b...",Absolutely. I really am disgusted with myself...


In [None]:
df_fixed.loc[2546:2548]

Unnamed: 0,Situation,emotion,User,Assistant
2546,I ran into my old school mate outside the cou...,surprised,How believable do you think this is when you r...,i think that would be very unbelievable to run...
2547,I ran into my old school mate outside the cou...,surprised,"Of course yes, that is what i just said happen...",wow that is awesome where you good friends wit...
2548,I ran into my old school mate outside the cou...,surprised,i think that would be very unbelievable to run...,"Of course yes, that is what i just said happen..."


In [None]:
df_fixed.loc[28674:28676]

Unnamed: 0,Situation,emotion,User,Assistant
28674,Rally time time to jump on the motorcycle and...,excited,The world famous sturgis rally is this week. ...,I have never heard of that
28675,Rally time time to jump on the motorcycle and...,excited,Huge motorcycle rally near mt rushmore in sout...,oh cool. I never been to any kind of motorcycl...
28676,Rally time time to jump on the motorcycle and...,excited,I have never heard of that,Huge motorcycle rally near mt rushmore in sout...


In [None]:
df_fixed.loc[31743:31745]

Unnamed: 0,Situation,emotion,User,Assistant
31743,I am having my first baby a boy. I hear all ...,apprehensive,I am having my first baby; a boy. I hear all t...,"Congratulations! If it helps at all, my wife a..."
31744,I am having my first baby a boy. I hear all ...,apprehensive,"Aw, that's so neat. I would love two boys! I ...",I think those types of situations are rather r...
31745,I am having my first baby a boy. I hear all ...,apprehensive,"Congratulations! If it helps at all, my wife a...","Aw, that's so neat. I would love two boys! I ..."


In [None]:
df_fixed.loc[42140:42142]

Unnamed: 0,Situation,emotion,User,Assistant
42140,Im so mad with my brother. He stole from me an...,furious,My brother stole from me and when I found out ...,"That's pretty scummy, has he always been like ..."
42141,Im so mad with my brother. He stole from me an...,furious,No he just recently started being a jerk.,You should have a talk with him and let him kn...
42142,Im so mad with my brother. He stole from me an...,furious,"That's pretty scummy, has he always been like ...",No he just recently started being a jerk.


In [None]:
df_fixed.loc[64217:64219]

Unnamed: 0,Situation,emotion,User,Assistant
64217,My dog died right after my cat died.,sad,My puppy died right after my cat.,Sorry to hear that. You must be going through ...
64218,My dog died right after my cat died.,sad,"It is horrible, i dont know what to do.",You should try engaging in a hobby to divert y...
64219,My dog died right after my cat died.,sad,Sorry to hear that. You must be going through ...,"It is horrible, i dont know what to do."


In [None]:
df_fixed['emotion'].value_counts()

Unnamed: 0_level_0,count
emotion,Unnamed: 1_level_1
surprised,3306
excited,2468
angry,2296
proud,2247
sad,2216
annoyed,2213
lonely,2110
afraid,2094
grateful,2091
terrified,2079


In [None]:
# Select rows where the 'emotion' column is NaN
nan_rows = df_fixed[df_fixed['emotion'].isna()]

# Show the selected rows
nan_rows

Unnamed: 0,Situation,emotion,User,Assistant


In [None]:
df_fixed.loc[23485:23488]

Unnamed: 0,Situation,emotion,User,Assistant
23485,I really miss having a man around the house. I...,lonely,I miss having a man at home,Having someone to call your own i essemtial in...
23486,I really miss having a man around the house. I...,lonely,I dont want to fell alone but I still miss it,"i understand, how long has it been?"
23487,I really miss having a man around the house. I...,lonely,Having someone to call your own i essemtial in...,I dont want to fell alone but I still miss it
23488,I really miss having a man around the house. I...,lonely,"i understand, how long has it been?",2 years. the longest


In [None]:
df_fixed.head(20)

Unnamed: 0,Situation,emotion,User,Assistant
0,I remember going to the fireworks with my best...,sentimental,I remember going to see the fireworks with my ...,"Was this a friend you were in love with, or ju..."
1,I remember going to the fireworks with my best...,sentimental,This was a best friend. I miss her.,Where has she gone?
2,I remember going to the fireworks with my best...,sentimental,We no longer talk.,Oh was this something that happened because of...
3,I remember going to the fireworks with my best...,sentimental,"Was this a friend you were in love with, or ju...",This was a best friend. I miss her.
4,I remember going to the fireworks with my best...,sentimental,Where has she gone?,We no longer talk.
5,i used to scare for darkness,afraid,it feels like hitting to blank wall when i se...,Oh ya? I don't really see how
6,i used to scare for darkness,afraid,dont you feel so.. its a wonder,I do actually hit blank walls a lot of times b...
7,i used to scare for darkness,afraid,i virtually thought so.. and i used to get sw...,Wait what are sweatings
8,i used to scare for darkness,afraid,Oh ya? I don't really see how,dont you feel so.. its a wonder
9,i used to scare for darkness,afraid,I do actually hit blank walls a lot of times b...,i virtually thought so.. and i used to get sw...


## Unsloth Configuration

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2025.1.1: Fast Gemma patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.57G [00:00<?, ?B/s]

`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.


generation_config.json:   0%|          | 0.00/154 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/40.0k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 42,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.1.1 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
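The number of trainable parameters follows directly from the rank and the shapes of the adapted layers: a LoRA adapter on a linear layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` weights. The sketch below works this out for `r = 16` and 28 layers. Only the `q_proj` shape (3072 to 4096) appears in this notebook's model printout; the other shapes are assumptions taken from the public Gemma-7B configuration.

```python
R = 16          # LoRA rank, as passed to get_peft_model
N_LAYERS = 28   # decoder layers reported as patched

HIDDEN = 3072         # model width (q_proj in_features in the model printout)
ATTN = 4096           # q_proj out_features; assumed identical for k_proj and v_proj
INTERMEDIATE = 24576  # assumed MLP width, from the public Gemma-7B config

# (in_features, out_features) of each targeted projection in one decoder layer
SHAPES = {
    "q_proj": (HIDDEN, ATTN),
    "k_proj": (HIDDEN, ATTN),
    "v_proj": (HIDDEN, ATTN),
    "o_proj": (ATTN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, r=R):
    # lora_A has shape (r, in_features); lora_B has shape (out_features, r)
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o) for i, o in SHAPES.values())
total = per_layer * N_LAYERS
print(f"{total:,}")  # 50,003,968
```

Under these assumed shapes the total equals the "Number of trainable parameters" that the training banner reports, and doubling `r` would double it.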


## Formatting the data for the chat template

In [None]:
EOS_TOKEN = tokenizer.eos_token  # End-of-sequence token

# Format the conversations in ChatML style
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = []
    for convo in convos:
        # Format each conversation in ChatML style
        text = (
            f"<|im_start|>system\n{convo[0]['value']}\n<|im_end|>\n"
            f"<|im_start|>user\n{convo[1]['value']}\n<|im_end|>\n"
            f"<|im_start|>assistant\n{convo[2]['value']}\n<|im_end|>"
        )
        texts.append(f"{text}{EOS_TOKEN}")
    return {"text": texts}

# Convert the DataFrame into a list of conversations
def prepare_conversations(df):
    conversations = []
    for _, row in df.iterrows():
        conversation = [
            {"from": "system", "value": f"{row['Situation']} Emotion of the chat: {row['emotion']}."},
            {"from": "human", "value": row["User"]},
            {"from": "gpt", "value": row["Assistant"]}
        ]
        conversations.append(conversation)
    return conversations

# Build the list of conversations and turn it into a Hugging Face Dataset
convos = prepare_conversations(df_fixed)
dataset = Dataset.from_dict({"conversations": convos})

# Apply the formatting to the dataset
dataset = dataset.map(formatting_prompts_func, batched=True)

# Show one formatted example
print(dataset[0]["text"])

Map:   0%|          | 0/64636 [00:00<?, ? examples/s]

<|im_start|>system
I remember going to the fireworks with my best friend. There was a lot of people, but it only felt like us in the world. Emotion of the chat: sentimental.
<|im_end|>
<|im_start|>user
I remember going to see the fireworks with my best friend. It was the first time we ever spent time alone together. Although there was a lot of people, we felt like the only people in the world.
<|im_end|>
<|im_start|>assistant
Was this a friend you were in love with, or just a best friend? 
<|im_end|><eos>


## Train the model

Now let's use the `SFTTrainer` from the Hugging Face TRL library! More information here: [TRL SFT documentation](https://huggingface.co/docs/trl/sft_trainer).

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,  # The pretrained model to fine-tune.
    tokenizer = tokenizer,  # The tokenizer that matches the model.
    train_dataset = dataset,  # The formatted training dataset.
    dataset_text_field = "text",  # Dataset field that holds the input text.
    max_seq_length = max_seq_length,  # Maximum sequence length for the model.
    dataset_num_proc = 2,  # Number of processes used to preprocess the dataset in parallel.
    packing = False,  # Disable packing, which concatenates short sequences for more efficient training.
    args = TrainingArguments(  # Training configuration.
        run_name="G-Empathic",
        per_device_train_batch_size = 2,  # Batch size per device (GPU or CPU).
        gradient_accumulation_steps = 4,  # Accumulate gradients over this many batches before each optimizer step.
        warmup_steps = 5,  # Initial warmup steps during which the learning rate ramps up.
        # max_steps = 60,
        num_train_epochs=1,
        learning_rate = 2e-4,  # Learning rate for the optimizer.
        fp16 = not is_bfloat16_supported(),  # Use FP16 when BF16 is not supported.
        bf16 = is_bfloat16_supported(),  # Use BF16 when the hardware supports it (e.g. NVIDIA A100).
        logging_steps = 1,  # Log every N steps.
        optim = "adamw_8bit",  # Optimizer; "adamw_8bit" reduces optimizer memory use.
        weight_decay = 0.01,  # Weight decay for regularization.
        lr_scheduler_type = "linear",  # Learning-rate schedule; decays linearly over training.
        seed = 42,  # Random seed for reproducibility.
        output_dir = "outputs",  # Directory where checkpoints and logs are saved.
    ),
)

Map (num_proc=2):   0%|          | 0/64636 [00:00<?, ? examples/s]
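With these settings, the effective batch size and the number of optimizer steps can be worked out by hand before starting a run that may take hours. The sketch below uses the values from the configuration above and the 64,636 examples shown in the `Map` output. How a trailing partial accumulation group is counted is an assumption here (it is dropped), chosen because it reproduces the step total in the banner printed by `trainer.train()`.

```python
import math

num_examples = 64_636
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
num_train_epochs = 1

# Examples that contribute to each optimizer update
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Batches the dataloader yields per epoch (the last batch may be partial)
batches_per_epoch = math.ceil(num_examples / (per_device_train_batch_size * num_gpus))

# Assumption: a trailing partial accumulation group is not counted as its own step
optimizer_steps = (batches_per_epoch // gradient_accumulation_steps) * num_train_epochs

print(effective_batch_size)  # 8
print(optimizer_steps)       # 8079
```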

In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
5.83 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 64,636 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 8,079
 "-____-"     Number of trainable parameters = 50,003,968
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Step,Training Loss
1,2.4892
2,2.6672
3,2.2138
4,2.3516
5,1.8875
6,1.6174
7,1.5127
8,1.4106
9,1.3769
10,1.3615


In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

43379.6057 seconds used for training.
722.99 minutes used for training.
Peak reserved memory = 7.078 GB.
Peak reserved memory for training = 1.248 GB.
Peak reserved memory % of max memory = 47.993 %.
Peak reserved memory for training % of max memory = 8.462 %.
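The reported figures can be turned into throughput numbers, which are easier to compare across runs than a raw runtime. A small sketch using only the values printed in this notebook (runtime, step count, example count and memory readings); nothing here is measured anew.

```python
train_runtime_s = 43379.6057   # from trainer_stats.metrics['train_runtime']
total_steps = 8079             # "Total steps" in the training banner
num_examples = 64636           # "Num examples" in the training banner

hours = train_runtime_s / 3600
seconds_per_step = train_runtime_s / total_steps
examples_per_second = num_examples / train_runtime_s
print(f"{hours:.2f} h, {seconds_per_step:.2f} s/step, {examples_per_second:.2f} examples/s")
# 12.05 h, 5.37 s/step, 1.49 examples/s

# Memory readings, in GB, as printed by the two memory cells
max_memory = 14.748
start_gpu_memory = 5.83
used_memory = 7.078

used_for_training = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
print(used_for_training, used_percentage)  # 1.248 47.993
```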


### Inference
Let's run the model! You can change the system message and the user input; leave the assistant turn empty so the model fills it in.
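The inference prompt should follow the same layout as the training text, with the assistant turn left open. A minimal sketch of a helper that builds it; the name `build_prompt` and its parameters are hypothetical. The training examples appended `Emotion of the chat: ...` to the system message, so the helper accepts an optional emotion, and it keeps the newline after `assistant` that the training text has. Whether either detail changes the generated replies has not been tested here.

```python
def build_prompt(situation, user_message, emotion=None):
    """Build a ChatML-style prompt that ends with an open assistant turn."""
    system = situation if emotion is None else f"{situation} Emotion of the chat: {emotion}."
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user_message}\n<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

example_prompt = build_prompt(
    "I'm overly excited today because will be flying outside the country for the first time tomorrow.",
    "Traveling to South Africa then to Ghana. Also my first time visiting Africa.",
    emotion="excited",
)
print(example_prompt)
```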

In [None]:
prompt = """<|im_start|>system
I'm overly excited today because will be flying outside the country for the first time tomorrow.
<|im_end|>
<|im_start|>user
Traveling to South Africa then to Ghana. Also my first time visiting Africa.
<|im_end|>
<|im_start|>assistant"""

In [None]:
FastLanguageModel.for_inference(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GemmaForCausalLM(
      (model): GemmaModel(
        (embed_tokens): Embedding(256000, 3072, padding_idx=0)
        (layers): ModuleList(
          (0-27): 28 x GemmaDecoderLayer(
            (self_attn): GemmaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=3072, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3072, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Lin

In [None]:
inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 256, use_cache = True)
tokenizer.batch_decode(outputs)

["<bos><|im_start|>system\nI'm overly excited today because will be flying outside the country for the first time tomorrow.\n<|im_end|>\n<|im_start|>user\nTraveling to South Africa then to Ghana. Also my first time visiting Africa.\n<|im_end|>\n<|im_start|>assistant\nWow, that's a long flight. I hope you have a good time. \n<|im_end|><eos>"]

You can also use a `TextStreamer` for streaming inference, so you see the generation token by token instead of waiting for the whole output!

In [None]:
inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

<bos><|im_start|>system
I'm overly excited today because will be flying outside the country for the first time tomorrow.
<|im_end|>
<|im_start|>user
Traveling to South Africa then to Ghana. Also my first time visiting Africa.
<|im_end|>
<|im_start|>assistant
Wow, that's a long flight. I hope you have a good time. 
<|im_end|><eos>
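Both generation calls return the whole sequence: the prompt, the reply and the control markers. To show only the reply, the decoded string can be trimmed with plain string operations. A minimal sketch; `extract_assistant_reply` is a hypothetical helper, and it assumes the markers appear literally in the decoded text, as they do in the outputs above.

```python
def extract_assistant_reply(decoded):
    """Return the text of the last assistant turn in a decoded ChatML string."""
    marker = "<|im_start|>assistant"
    reply = decoded.rsplit(marker, 1)[-1]
    # Cut at the first end-of-turn or end-of-sequence marker
    for stop in ("<|im_end|>", "<eos>"):
        reply = reply.split(stop, 1)[0]
    return reply.strip()

decoded = (
    "<bos><|im_start|>system\nI'm overly excited today because will be flying "
    "outside the country for the first time tomorrow.\n<|im_end|>\n"
    "<|im_start|>user\nTraveling to South Africa then to Ghana. "
    "Also my first time visiting Africa.\n<|im_end|>\n"
    "<|im_start|>assistant\nWow, that's a long flight. I hope you have a good time. \n"
    "<|im_end|><eos>"
)
print(extract_assistant_reply(decoded))
# Wow, that's a long flight. I hope you have a good time.
```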


### Saving, loading finetuned models

In [None]:
# Run this cell and introduce your token and username,
# or else you won't be able to save on HuggingFace
# (write both between the "" symbols).
from huggingface_hub import login

username = "MathMuniz"
your_token = ""
login(your_token)

To save the final model as LoRA adapters, use Hugging Face's `push_to_hub` to save online or `save_pretrained` to save locally.

Your Hugging Face token can be found [here](https://huggingface.co/settings/tokens).

If you want to save only locally, remove the # at the start of the first line and add a # at the start of the second line.

If you want to save only to the cloud, run it as is.

If you want to save both locally and to the cloud, remove the # at the start of the first line.
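The local/cloud choice described here can also be written as an explicit helper instead of commenting lines in and out. This is only a sketch: `save_lora_adapters` and its parameters are hypothetical, and it assumes the model and tokenizer expose the usual Hugging Face `save_pretrained` and `push_to_hub` methods.

```python
def save_lora_adapters(model, tokenizer, local_dir=None, repo_id=None, token=None):
    """Save LoRA adapters locally, to the Hub, or both, depending on the arguments."""
    if local_dir is not None:
        # Local save: writes the adapter weights and tokenizer files to a folder
        model.save_pretrained(local_dir)
        tokenizer.save_pretrained(local_dir)
    if repo_id is not None:
        # Online save: uploads the same artifacts to a Hub repository
        model.push_to_hub(repo_id, token=token)
        tokenizer.push_to_hub(repo_id, token=token)

# Example calls (not executed here, since they need the trained model):
# save_lora_adapters(model, tokenizer, local_dir="Empathic")
# save_lora_adapters(model, tokenizer, repo_id=f"{username}/Empathic", token=your_token)
```

Passing both `local_dir` and `repo_id` covers the "locally and to the cloud" case.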

Now, if you want to load the LoRA adapters we just saved for inference, change `False` to `True`:

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "Empathic", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )

inputs = tokenizer([prompt], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 256, use_cache = True)
tokenizer.batch_decode(outputs)

["<bos><|im_start|>system\nI'm overly excited today because will be flying outside the country for the first time tomorrow.\n<|im_end|>\n<|im_start|>user\nTraveling to South Africa then to Ghana. Also my first time visiting Africa.\n<|im_end|>\n<|im_start|>assistant\nWow, that's a long flight. I hope you have a good time. \n<|im_end|><eos>"]

### Saving to float16 for vLLM

We also support saving directly to `float16`. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can get your personal tokens at https://huggingface.co/settings/tokens.

To choose an option (or several), change the corresponding `False` to `True`. As above, the repository name you pass is the model name that will appear on Hugging Face.

`save_pretrained_merged` saves the model locally. `push_to_hub_merged` uploads the model to Hugging Face.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if True: model.push_to_hub_merged(f"{username}/G-Empathic_16bit", tokenizer, save_method = "merged_16bit", token = your_token)

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged(f"{username}/G-Empathic_4bit", tokenizer, save_method = "merged_4bit", token = your_token)

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if True: model.push_to_hub_merged(f"{username}/G-Empathic_Lora", tokenizer, save_method = "lora", token = your_token)

Unsloth: You are pushing to hub, but you passed your HF username = MathMuniz.
We shall truncate MathMuniz/G-Empathic_16bit to G-Empathic_16bit
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 5.6G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 28.2 out of 50.99 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 32%|███▏      | 9/28 [00:00<00:01, 11.14it/s]
We will save to Disk and not RAM now.
100%|██████████| 28/28 [00:27<00:00,  1.00it/s]


Unsloth: Saving tokenizer...

  0%|          | 0/2 [00:00<?, ?it/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/34.4M [00:00<?, ?B/s]

 Done.


README.md:   0%|          | 0.00/572 [00:00<?, ?B/s]

  0%|          | 0/4 [00:00<?, ?it/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/2.11G [00:00<?, ?B/s]

Done.
Saved merged model to https://huggingface.co/MathMuniz/G-Empathic_16bit
Unsloth: Saving LoRA adapters. Please wait...


README.md:   0%|          | 0.00/572 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.20k [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/200M [00:00<?, ?B/s]

  0%|          | 0/2 [00:00<?, ?it/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/34.4M [00:00<?, ?B/s]

Saved lora model to https://huggingface.co/MathMuniz/G-Empathic_Lora
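The upload log gives a feel for the trade-off between the two formats. The arithmetic below uses only the shard sizes and the adapter size shown in the log, plus the trainable-parameter count from the training banner. Treating the adapter weights as float32 (4 bytes each) is an assumption; the log does not state the dtype, but it is consistent with the reported 200M.

```python
# Sizes of the merged 16-bit shards as shown in the upload log, in GB
shard_sizes_gb = [5.00, 4.98, 4.98, 2.11]
merged_gb = round(sum(shard_sizes_gb), 2)

# Trainable LoRA parameters, from the training banner
trainable_params = 50_003_968

# Assumption: adapter weights are stored as float32 (4 bytes per parameter)
adapter_mb = trainable_params * 4 / 1e6

ratio = merged_gb * 1000 / adapter_mb
print(f"merged: {merged_gb} GB, adapter: {adapter_mb:.0f} MB, ratio: {ratio:.0f}x")
# merged: 17.07 GB, adapter: 200 MB, ratio: 85x
```

The adapter repository is far smaller, but it can only be used together with the base model, whereas the merged checkpoint stands alone.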


### GGUF / llama.cpp Conversion
Saving to `GGUF` / `llama.cpp` is now supported natively! We clone `llama.cpp` and save to `q8_0` by default. All methods, such as `q4_k_m`, are allowed. Use `save_pretrained_gguf` to save locally and `push_to_hub_gguf` to upload to HF.

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
if False: model.push_to_hub_gguf(f"{username}/G-Empathic_8bit_Q8_0", tokenizer, token = your_token)

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf(f"{username}/G-Empathic_16bit_GGUF", tokenizer, quantization_method = "f16", token = your_token)

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf(f"{username}/G-Empathic_q4_k_m_GGUF", tokenizer, quantization_method = "q4_k_m", token = your_token)

Now use the `model-unsloth.gguf` file or the `model-unsloth-Q4_K_M.gguf` file in `llama.cpp` or a UI-based system such as `GPT4All`. You can install GPT4All [here](https://gpt4all.io/index.html).

### [GIE-Bench](https://github.com/GIEBench/GIEBench)