## Running the code
You need to generate a Hugging Face token and store it as a secret named `HF_TOKEN`:

https://huggingface.co/docs/hub/security-tokens

The instructions below are taken from https://colab.research.google.com/github/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb#scrollTo=qRFQnPL2a9Dj

### Authenticate with Hugging Face

Generate a Hugging Face `read` access token by going to [settings](https://huggingface.co/settings/tokens).

If you are using Google Colab, add your access token to the Colab Secrets manager to securely store it. If not, proceed to run the cell below to authenticate with Hugging Face.

1. Open your Google Colab notebook and click on the 🔑 Secrets tab in the left panel. <img src="https://storage.googleapis.com/generativeai-downloads/images/secrets.jpg" alt="The Secrets tab is found on the left panel." width=50%>
2. Create a new secret with the name `HF_TOKEN`.
3. Copy/paste your token key into the Value input box of `HF_TOKEN`.
4. Toggle the button on the left to allow notebook access to the secret.
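
Outside Colab there is no Secrets manager, so the token has to come from somewhere else. The following is a minimal sketch, assuming the token was exported as an environment variable that is also called `HF_TOKEN`; the helper name `resolve_hf_token` is hypothetical and not part of any library.

```python
import os


def resolve_hf_token(env=None, name="HF_TOKEN"):
    """Return the token stored under `name`, or None if it is missing or blank.

    `env` defaults to the process environment; passing a plain dict makes the
    helper easy to try out without touching real credentials.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    return token or None


token = resolve_hf_token()
if token is None:
    print("HF_TOKEN is not set; add it as a Colab secret or environment variable.")
else:
    print("Token found; it can be passed to huggingface_hub.login(token=...).")
```

Returning `None` instead of raising lets the caller decide whether a missing token is fatal, which mirrors the `if hf_token:` check used later in the notebook.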

## Getting access to the model

Next, go to https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/tree/main while signed in with the same account used to generate the token, and accept the terms and conditions.

That's it! This is everything needed to run the code.

## Disclaimer

The code was run on an L4 GPU and is tuned for it, so running it in the same environment is recommended.

In [None]:
%env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128

env: PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128


In [None]:
!pip install -q transformers datasets peft trl accelerate bitsandbytes

In [None]:
from datasets import load_dataset

DATASET_NAME = "dserranog/fewshot-narrative-examples"

dataset = load_dataset(DATASET_NAME)
dataset

Error while fetching `HF_TOKEN` secret value from your vault: 'Requesting secret HF_TOKEN timed out. Secrets can only be fetched when running from the Colab UI.'.
You are not authenticated with the Hugging Face Hub in this notebook.
If the error persists, please let us know by opening an issue on GitHub (https://github.com/huggingface/huggingface_hub/issues/new).


DatasetDict({
    train: Dataset({
        features: ['original', 'rewritten', 'style', 'source'],
        num_rows: 45
    })
})

In [None]:
import pandas as pd

for split in dataset:
    print(f"\n📚 {split.upper()}")
    df = pd.DataFrame(dataset[split])
    display(df.sample(3))


📚 TRAIN


Unnamed: 0,original,rewritten,style,source
43,The ship entered faster-than-light travel.,"The ship jumped into hyperspace, carrying the ...",sci-fi,Foundation (Isaac Asimov)
33,Terms of service were long and ignored.,Reality was just another terms-of-service nobo...,sci-fi,Black Mirror
37,Her eyes looked human to the eye.,"Her eyes looked human enough, but the scanner ...",sci-fi,Blade Runner


In [None]:
from google.colab import userdata
from huggingface_hub import login
TOKEN_NAME = "HF_TOKEN"
hf_token = userdata.get(TOKEN_NAME)
if hf_token:
    login(token=hf_token)
    print("Successfully logged in to Hugging Face!")
else:
    print("Hugging Face token not found in Colab Secrets.")

Successfully logged in to Hugging Face!


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
import torch
import gc

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.3"

# Tokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

def get_lora_model(base_model=False):
    torch.cuda.empty_cache()
    gc.collect()

    bnb_config = BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_threshold=6.0,
    )

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        device_map="auto",          # shards across GPU/CPU
        torch_dtype=torch.float16,  # internal compute in fp16
        low_cpu_mem_usage=True,
    )
    model.config.pad_token_id = tokenizer.pad_token_id # updating model config
    if base_model:
        return model
    # r: rank of the LoRA update matrices (smaller = fewer trainable parameters)
    rank_dimension = 8
    # lora_alpha: scaling factor; the update is scaled by lora_alpha / r
    lora_alpha = 16
    # lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
    lora_dropout = 0.05

    # LoRA config
    lora_config = LoraConfig(
        r=rank_dimension,  # Rank, typically between 4 and 32
        lora_alpha=lora_alpha,  # Scaling factor, typically 2x the rank
        target_modules=["q_proj", "v_proj"],  # Modules that receive LoRA adapters
        lora_dropout=lora_dropout,  # Dropout probability for LoRA layers
        bias="none",  # "none" keeps every bias term frozen; only LoRA weights train
        task_type="CAUSAL_LM",  # Task type for the model architecture
    )
    return get_peft_model(model, lora_config)
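
To make the roles of `r` and `lora_alpha` concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It is an illustration under simplifying assumptions (no dropout, no bias, tiny hand-written matrices), not PEFT's implementation; the helper names `matvec` and `lora_forward` are made up for this example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute y = W x + (lora_alpha / r) * B (A x).

    W is the frozen base weight (d_out x d_in); A (r x d_in) and
    B (d_out x r) are the small trainable LoRA matrices.
    """
    scale = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Toy shapes: d_in = 3, d_out = 2, r = 1
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
B = [[0.5], [0.0]]
x = [1.0, 2.0, 3.0]

y = lora_forward(W, A, B, x, r=1, lora_alpha=2)

# With B set to zeros (how LoRA adapters are usually initialised),
# the adapted layer reproduces the frozen base layer exactly.
y_init = lora_forward(W, A, [[0.0], [0.0]], x, r=1, lora_alpha=2)
```

With the notebook's values (`r=8`, `lora_alpha=16`) the scale factor is `16 / 8 = 2`, the same factor used in the toy call above.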

tokenizer_config.json:   0%|          | 0.00/141k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/587k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.96M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

In [None]:
dataset = load_dataset(DATASET_NAME, split="train")
scifi_dataset = dataset.filter(lambda example: example["style"] == "sci-fi")
scifi_dataset.to_pandas().head()

Filter:   0%|          | 0/45 [00:00<?, ? examples/s]

Unnamed: 0,original,rewritten,style,source
0,Mia's rating fell after her comment online.,Mia's social credit score plummeted after her ...,sci-fi,Black Mirror
1,The program learned what she wanted.,The algorithm knew her desires better than she...,sci-fi,Black Mirror
2,Every day she woke up in a new environment.,"Each morning she awoke in a new simulation, wi...",sci-fi,Black Mirror
3,Terms of service were long and ignored.,Reality was just another terms-of-service nobo...,sci-fi,Black Mirror
4,The house was quiet except for the computer.,The house was silent except for the constant h...,sci-fi,Black Mirror


In [None]:
from transformers import DataCollatorForLanguageModeling

scifi_dataset = dataset.filter(lambda example: example["style"] == "sci-fi")

# 1. Build prompt & response from your curated fields
def format_prompt(example):
    original = example["original"]
    rewritten = example["rewritten"]
    # insert the style dynamically
    example["prompt"]   = f"Rewrite the following text in {example['style']} style: '{original}'"
    example["response"] = rewritten
    return example

# apply to only the scifi subset
scifi_dataset = scifi_dataset.map(
    format_prompt,
    remove_columns=["original", "rewritten", "style", "source"]
)

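# 2. Prepare the collator. DataCollatorForLanguageModeling(mlm=False) rebuilds
#    `labels` from `input_ids` and would discard the prompt masking from step 3;
#    DataCollatorForSeq2Seq keeps the provided labels and pads them with -100.
from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    padding=True,
    pad_to_multiple_of=8,  # Optional: pad to a multiple of 8 for tensor cores
    label_pad_token_id=-100,
)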

# 3. Tokenize prompt + response into input_ids / attention_mask

def tokenize_function_with_eos(example):
    # Build the full text in Mistral's instruction format
    full_text = f"<s>[INST]{example['prompt'].strip()}[/INST]{example['response'].strip()}</s>"

    # Tokenize the full text. The template already contains <s> and </s>, so
    # add_special_tokens=False keeps the tokenizer from adding a second BOS token.
    tokenized_full = tokenizer(
        full_text,
        add_special_tokens=False,
        truncation=True,
        padding=False,  # Let the data collator handle padding
        max_length=512,
        return_tensors=None  # Return lists, not tensors
    )

    # Create labels (copy of input_ids)
    tokenized_full["labels"] = tokenized_full["input_ids"].copy()

    # Find where the response starts by tokenizing just the prompt part
    prompt_part = f"<s>[INST]{example['prompt'].strip()}[/INST]"
    tokenized_prompt = tokenizer(
        prompt_part,
        add_special_tokens=False,
        truncation=True,
        max_length=512,
        return_tensors=None
    )
    prompt_length = len(tokenized_prompt["input_ids"])

    # Mask the prompt tokens in labels so the loss covers only the response
    tokenized_full["labels"][:prompt_length] = [-100] * prompt_length

    return tokenized_full

tokenized_scifi_dataset = scifi_dataset.map(
    tokenize_function_with_eos,
    remove_columns=["prompt", "response"]
)
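
The prompt construction and label masking in this cell can be checked in isolation. The sketch below swaps the real tokenizer for a hypothetical character-level one (`toy_tokenize`) so that it runs without downloading anything. The response string is a made-up placeholder, and real subword tokenizers may split text differently at the prompt/response boundary.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips


def build_prompt(original, style):
    """Same instruction wording as format_prompt in the notebook."""
    return f"Rewrite the following text in {style} style: '{original}'"


def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: one integer id per character."""
    return [ord(ch) for ch in text]


def mask_prompt(prompt_text, response_text, tokenize=toy_tokenize):
    """Tokenize prompt + response and hide the prompt positions from the loss."""
    prompt_ids = tokenize(prompt_text)
    full_ids = tokenize(prompt_text + response_text)
    labels = list(full_ids)
    labels[:len(prompt_ids)] = [IGNORE_INDEX] * len(prompt_ids)
    return {"input_ids": full_ids, "labels": labels}


instruction = build_prompt("The ship entered faster-than-light travel.", "sci-fi")
prompt_text = f"[INST]{instruction}[/INST]"
example = mask_prompt(prompt_text, "PLACEHOLDER RESPONSE")

n_masked = sum(1 for label in example["labels"] if label == IGNORE_INDEX)
print(f"{n_masked} of {len(example['labels'])} positions are masked")
```

Only the positions after `[/INST]` keep their token ids as labels, so the loss is computed on the rewritten text and not on the instruction.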

Map:   0%|          | 0/15 [00:00<?, ? examples/s]

Map:   0%|          | 0/15 [00:00<?, ? examples/s]

In [None]:
from transformers import Trainer, TrainingArguments

lora_model_scifi_eos = get_lora_model()
lora_model_scifi_eos.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="/content/lora_scifi_outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    logging_steps=10,
    save_strategy="no",
    report_to="none"
)
trainer = Trainer(
    model=lora_model_scifi_eos,
    train_dataset=tokenized_scifi_dataset,
    args=training_args,
    processing_class=tokenizer,  # replaces the deprecated `tokenizer=` argument
    data_collator=data_collator
)

trainer.train()

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


trainable params: 3,407,872 || all params: 7,251,431,424 || trainable%: 0.0470
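
The trainable-parameter count reported above can be reproduced by hand. The sketch below assumes the layer shapes from the model's published configuration, which are not stated in this notebook: hidden size 4096, 32 decoder layers, and 8 key/value heads of dimension 128, so `v_proj` maps 4096 to 1024. Each adapted layer adds `r * (d_in + d_out)` weights.

```python
def lora_param_count(d_in, d_out, r):
    """Weights added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed Mistral-7B shapes (taken from the model config, not from this notebook)
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
KV_DIM = 8 * 128  # 8 key/value heads with head dimension 128

R = 8  # rank used in get_lora_model

per_layer = (
    lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, R)  # q_proj
    + lora_param_count(HIDDEN_SIZE, KV_DIM, R)     # v_proj
)
trainable = per_layer * NUM_LAYERS

ALL_PARAMS = 7_251_431_424  # "all params" value printed by the notebook
trainable_pct = 100 * trainable / ALL_PARAMS

print(f"trainable params: {trainable:,} ({trainable_pct:.4f}%)")
```

Under these assumptions the count comes out to 3,407,872, the figure printed by `print_trainable_parameters()`.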


Step,Training Loss
10,5.126


TrainOutput(global_step=12, training_loss=5.013524452845256, metrics={'train_runtime': 29.9009, 'train_samples_per_second': 1.505, 'train_steps_per_second': 0.401, 'total_flos': 93263968272384.0, 'train_loss': 5.013524452845256, 'epoch': 3.0})
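
The `global_step=12` value and the single logged row follow from the training arguments. The sketch below is a rough reconstruction, assuming one GPU, the 15 sci-fi examples shown in the `Map` progress output, and that a partial accumulation group at the end of an epoch still counts as an optimizer step (which is consistent with the reported value). The helper `optimizer_steps` is hypothetical.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps,
                    epochs, num_devices=1):
    """Estimate how many optimizer updates a training run performs."""
    batches_per_epoch = math.ceil(
        num_examples / (per_device_batch_size * num_devices)
    )
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return updates_per_epoch * epochs


steps = optimizer_steps(
    num_examples=15,
    per_device_batch_size=1,
    grad_accum_steps=4,
    epochs=3,
)

# With logging_steps=10, a loss value is reported at every 10th update.
LOGGING_STEPS = 10
logged_at = [s for s in range(1, steps + 1) if s % LOGGING_STEPS == 0]

print(f"optimizer steps: {steps}, loss logged at steps: {logged_at}")
```

With only 12 updates in total, a lower `logging_steps` value would be needed to see more than one loss value.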

In [None]:
from transformers import pipeline

generator_scifi = pipeline(
    "text-generation",
    model=lora_model_scifi_eos,
    tokenizer=tokenizer,
)

# Format the prompt with [INST] tokens, using the same wording as the
# training prompts (the dataset's style value is "sci-fi")
prompt = "[INST]Rewrite the following text in sci-fi style: 'The phone rang at midnight.'[/INST]"

print("=== Generation after 3 epochs ===\n")
for i in range(4):
    output = generator_scifi(
        prompt,
        max_new_tokens=300,
        do_sample=True,
        temperature=0.67,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    generated_text = output[0]["generated_text"]

    # Extract just the response part (after [/INST])
    response = generated_text.split("[/INST]")[-1].strip()
    print(f"📝 Example {i+1}:\n{response}\n{'-'*60}\n")

Device set to use cuda:0


=== Generation after 3 epochs ===

📝 Example 1:
The comm-link buzzed at the dead of night, its eerie glow casting long shadows across the cold, metal floor. The silence was shattered, the only sound in the darkened room being the soft, rhythmic hum of the life-support systems. The astronaut, half-asleep, reached for the device, his gloved hand trembling slightly as he answered the call. "This is Captain Stanton, speaking." A voice, cold and mechanical, responded from the other end. "Captain, we have detected an anomaly. Your immediate presence is required at Sector 37." The astronaut's heart skipped a beat. "Understood," he replied, his voice steady despite the sudden surge of adrenaline. "I will be there as soon as possible." He disconnected the call, the gravity of the situation slowly sinking in. The phone had rung at midnight, and it seemed the universe was about to reveal another of its secrets.
------------------------------------------------------------

📝 Example 2:
The comm-li

I lowered `temperature` to 0.67 because at higher values the model got too creative and sometimes generated far too much text.
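
To see why a lower `temperature` makes the output more conservative, the sketch below applies temperature scaling and a simplified nucleus (`top_p`) filter to a small set of made-up logits. It illustrates the idea only and is not the implementation used by `generate`; the function names and logit values are assumptions for this example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the peak."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens

for temperature in (1.0, 0.67):
    probs = softmax_with_temperature(logits, temperature)
    candidates = top_p_filter(probs, top_p=0.9)
    n_candidates = sum(1 for p in candidates if p > 0)
    print(f"temperature={temperature}: {n_candidates} tokens remain in the sampling pool")
```

Lowering the temperature concentrates probability on the top tokens, so fewer candidates survive the `top_p` cutoff and sampling drifts less.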