## Running the code
You need to generate a Hugging Face access token and store it as a secret named `HF_TOKEN`.

https://huggingface.co/docs/hub/security-tokens

The instructions below are taken from https://colab.research.google.com/github/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb#scrollTo=qRFQnPL2a9Dj

### Authenticate with Hugging Face

Generate a Hugging Face `read` access token by going to [settings](https://huggingface.co/settings/tokens).

If you are using Google Colab, add your access token to the Colab Secrets manager to securely store it. If not, proceed to run the cell below to authenticate with Hugging Face.

1. Open your Google Colab notebook and click on the 🔑 Secrets tab in the left panel. <img src="https://storage.googleapis.com/generativeai-downloads/images/secrets.jpg" alt="The Secrets tab is found on the left panel." width=50%>
2. Create a new secret with the name `HF_TOKEN`.
3. Copy/paste your token key into the Value input box of `HF_TOKEN`.
4. Toggle the button on the left to allow notebook access to the secret.
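The login cell further down reads the token only from Colab Secrets. Below is a minimal sketch for also running outside Colab. It assumes that there you export the token as an environment variable with the same name, `HF_TOKEN`, which `huggingface_hub` also picks up on its own. The `resolve_hf_token` helper is hypothetical and not part of any library.

```python
import os
from typing import Optional


def resolve_hf_token(secret_name: str = "HF_TOKEN") -> Optional[str]:
    """Hypothetical helper: try the Colab secret first, then an environment variable."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata

        token = userdata.get(secret_name)
        if token:
            return token
    except Exception:
        # Not in Colab, secret missing, or notebook access not toggled on.
        pass
    # Outside Colab (assumption): the token was exported as an env var.
    return os.environ.get(secret_name)
```

You could then call `login(token=resolve_hf_token())` in place of the `userdata.get` lines.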

## Getting access to the model

Next, while logged in with the same account you used to generate the token, go to https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/tree/main and accept the terms and conditions.

Done! That is everything you need to run the code.

## Disclaimer

The code was run on an L4 GPU and is tuned for it. Running it in the same environment is recommended.
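As a rough check of how much headroom 8-bit loading leaves on an L4 (24 GB of GPU memory), here is the weights-only arithmetic. It uses the total parameter count that `print_trainable_parameters()` reports later in the notebook. Activations, the KV cache, LoRA optimizer state, and layers that bitsandbytes leaves unquantized (such as the embeddings and `lm_head`) all add to this, so treat the figures as lower bounds.

```python
def weight_memory_gb(n_params: int, bytes_per_param: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes); ignores all runtime overhead."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 7_251_431_424  # total reported by print_trainable_parameters() in this notebook
L4_MEMORY_GB = 24         # memory of an NVIDIA L4

for label, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    gb = weight_memory_gb(N_PARAMS, nbytes)
    print(f"{label}: {gb:5.1f} GB of weights ({gb / L4_MEMORY_GB:.0%} of an L4)")
```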

In [None]:
%env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128

env: PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
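`%env` is an IPython magic. To run the same code as a plain Python script, a minimal equivalent is to set the variable through `os.environ` before PyTorch initializes CUDA. Setting it before `import torch` is the safest option. According to the PyTorch CUDA memory-management docs, the two options work as follows:

- `expandable_segments:True` lets the caching allocator create segments that can later be expanded, which helps with fragmentation when allocation sizes vary.
- `max_split_size_mb:128` stops it from splitting cached blocks larger than 128 MB.

```python
import os

# Same value as the %env magic above, built from a dict for readability.
ALLOC_CONF = {"expandable_segments": "True", "max_split_size_mb": "128"}
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = ",".join(f"{k}:{v}" for k, v in ALLOC_CONF.items())

# import torch  # import only after the variable is set
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```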


In [None]:
!pip install -q transformers datasets peft trl accelerate bitsandbytes

In [None]:
from datasets import load_dataset

DATASET_NAME = "dserranog/fewshot-narrative-examples"

dataset = load_dataset(DATASET_NAME)
dataset

DatasetDict({
    train: Dataset({
        features: ['original', 'rewritten', 'style', 'source'],
        num_rows: 45
    })
})

In [None]:
import pandas as pd

for split in dataset:
    print(f"\n📚 {split.upper()}")
    df = pd.DataFrame(dataset[split])
    display(df.sample(3))


📚 TRAIN


| | original | rewritten | style | source |
|---|---|---|---|---|
| 33 | Terms of service were long and ignored. | Reality was just another terms-of-service nobo... | sci-fi | Black Mirror |
| 10 | His sword had blood on it at dawn. | The monster's blood still dripped from his bla... | epic | The Witcher |
| 29 | Everything came down to greed in the end. | In the end, it was greed that wrote the final ... | noir | Double Indemnity |


In [None]:
from google.colab import userdata
from huggingface_hub import login
TOKEN_NAME = "HF_TOKEN"
hf_token = userdata.get(TOKEN_NAME)
if hf_token:
    login(token=hf_token)
    print("Successfully logged in to Hugging Face!")
else:
    print("Hugging Face token not found in Colab Secrets.")

Successfully logged in to Hugging Face!


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
import torch
import gc

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.3"

# Tokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

def get_lora_model(base_model=False):
    torch.cuda.empty_cache()
    gc.collect()

    bnb_config = BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_threshold=6.0,
    )

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        device_map="auto",          # shards across GPU/CPU
        torch_dtype=torch.float16,  # internal compute in fp16
        low_cpu_mem_usage=True,
    )
    model.config.pad_token_id = tokenizer.pad_token_id # updating model config
    if base_model:
        return model
    # r: rank dimension for LoRA update matrices (smaller = more compression)
    rank_dimension = 8
    # lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
    lora_alpha = 16
    # lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
    lora_dropout = 0.05

    # LoRA config
    lora_config = LoraConfig(
        r=rank_dimension,  # Rank dimension, typically between 4 and 32
        lora_alpha=lora_alpha,  # LoRA scaling factor, often 2x the rank
        target_modules=["q_proj", "v_proj"],  # Which modules to apply LoRA to
        lora_dropout=lora_dropout,  # Dropout probability for LoRA layers
        bias="none",  # "none": no bias parameters are trained, only the LoRA matrices
        task_type="CAUSAL_LM",  # Task type for model architecture
    )
    return get_peft_model(model, lora_config)
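Here is how the `LoraConfig` above works. For each targeted `nn.Linear` with input size d_in and output size d_out, LoRA trains two small matrices, A (r x d_in) and B (d_out x r). It then adds (lora_alpha / r) · B·A to the frozen weight. PEFT initializes B to zero, so training starts from the unchanged base model.

The sketch below counts those parameters. It assumes Mistral-7B's published attention shapes: hidden size 4096, 32 layers, and 32 query heads and 8 key/value heads of dimension 128. The result matches the `trainable params` line printed by the training cell.

```python
def lora_trainable_params(r, num_layers, shapes):
    """Each adapted Linear(d_in -> d_out) gets A: r x d_in and B: d_out x r."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes)
    return num_layers * per_layer


# Assumed Mistral-7B attention shapes (grouped-query attention).
HIDDEN, NUM_LAYERS, HEAD_DIM = 4096, 32, 128
NUM_Q_HEADS, NUM_KV_HEADS = 32, 8

q_proj = (HIDDEN, NUM_Q_HEADS * HEAD_DIM)   # 4096 -> 4096
v_proj = (HIDDEN, NUM_KV_HEADS * HEAD_DIM)  # 4096 -> 1024

r, lora_alpha = 8, 16  # same values as rank_dimension and lora_alpha above
trainable = lora_trainable_params(r, NUM_LAYERS, [q_proj, v_proj])
scaling = lora_alpha / r  # multiplier applied to B @ A

print(f"trainable LoRA params: {trainable:,}")
print(f"scaling (lora_alpha / r): {scaling}")
```

The count grows linearly with `r` and with every extra entry in `target_modules` (for example `k_proj` or `o_proj`).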

In [None]:
dataset = load_dataset(DATASET_NAME, split="train")
epic_dataset = dataset.filter(lambda example: example["style"] == "epic")
epic_dataset.to_pandas().head()

| | original | rewritten | style | source |
|---|---|---|---|---|
| 0 | Winter was coming soon. | Winter was not merely a season; it was a dark ... | epic | Game of Thrones |
| 1 | There were ravens flying around the old tower. | Ravens circled the ruined tower, bearing secre... | epic | Game of Thrones |
| 2 | The throne room was very quiet. | In the throne room, silence weighed heavier th... | epic | Game of Thrones |
| 3 | She walked through the battlefield. | She walked among the dead with the gaze of som... | epic | Game of Thrones |
| 4 | Fire and ice met in battle. | When fire and ice collide, only the ancient go... | epic | Game of Thrones |


In [None]:
from transformers import DataCollatorForLanguageModeling

epic_dataset = dataset.filter(lambda example: example["style"] == "epic")

# 1. Build prompt & response from your curated fields
def format_prompt(example):
    original = example["original"]
    rewritten = example["rewritten"]
    # insert the style dynamically
    example["prompt"]   = f"Rewrite the following text in {example['style']} style: '{original}'"
    example["response"] = rewritten
    return example

# apply to only the epic subset
epic_dataset = epic_dataset.map(
    format_prompt,
    remove_columns=["original", "rewritten", "style", "source"]
)

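# 2. Prepare the collator (causal LM, so mlm=False)
# Note: with mlm=False this collator rebuilds `labels` from `input_ids` and sets
# every pad-token position to -100. That discards the prompt mask built in step 3,
# and because pad_token == eos_token, the EOS labels are masked as well.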
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
    pad_to_multiple_of=8,  # Optional: pad to multiple of 8 for tensor cores
)

# 3. Tokenize prompt + response into input_ids / attention_mask

def tokenize_function_with_eos(example):
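    # Construct the full text in Mistral's [INST] ... [/INST] format.
    # Note: tokenizer() also prepends a BOS token by default, so the literal "<s>"
    # yields two BOS tokens; add_special_tokens=False would avoid the duplicate.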
    full_text = f"<s>[INST]{example['prompt'].strip()}[/INST]{example['response'].strip()}</s>"

    # Tokenize the full text
    tokenized_full = tokenizer(
        full_text,
        truncation=True,
        padding=False,  # Let the data collator handle padding
        max_length=512,
        return_tensors=None  # Return lists, not tensors
    )

    # Create labels (copy of input_ids)
    tokenized_full["labels"] = tokenized_full["input_ids"].copy()

    # Find where the response starts by tokenizing just the prompt part
    prompt_part = f"<s>[INST]{example['prompt'].strip()}[/INST]"
    tokenized_prompt = tokenizer(
        prompt_part,
        truncation=True,
        max_length=512,
        return_tensors=None
    )
    prompt_length = len(tokenized_prompt["input_ids"])

    # Mask the prompt tokens in labels (we only want to learn from the response)
    tokenized_full["labels"][:prompt_length] = [-100] * prompt_length

    return tokenized_full

tokenized_epic_dataset = epic_dataset.map(
    tokenize_function_with_eos,
    remove_columns=["prompt", "response"]
)
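Two details in this cell interact. First, `tokenize_function_with_eos` stores prompt-masked `labels`. Second, `DataCollatorForLanguageModeling(mlm=False)` rebuilds `labels` from `input_ids` and replaces every pad-token id with -100. Because `tokenizer.pad_token = tokenizer.eos_token`, that also hides the `</s>` target.

The pure-Python sketch below shows what a label-preserving collator does instead. It uses hypothetical token ids and assumes right padding. In transformers, `DataCollatorForSeq2Seq(tokenizer, label_pad_token_id=-100, pad_to_multiple_of=8)` is a commonly used collator that pads precomputed labels this way. Using it would change the training results shown below.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy by default


def mask_prompt(input_ids, prompt_len):
    """Labels = input_ids with the prompt hidden, so only the response is scored."""
    labels = list(input_ids)
    labels[:prompt_len] = [IGNORE_INDEX] * prompt_len
    return labels


def pad_batch(features, pad_id, multiple_of=8):
    """Right-pad to a multiple of `multiple_of`, padding labels with IGNORE_INDEX
    instead of rebuilding them from input_ids."""
    longest = max(len(f["input_ids"]) for f in features)
    target = -(-longest // multiple_of) * multiple_of  # ceiling to a multiple
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n = len(f["input_ids"])
        gap = target - n
        batch["input_ids"].append(f["input_ids"] + [pad_id] * gap)
        batch["attention_mask"].append([1] * n + [0] * gap)
        batch["labels"].append(f["labels"] + [IGNORE_INDEX] * gap)
    return batch


# Hypothetical token ids: 1 = BOS, 3 = [INST], 4 = [/INST], 2 = EOS (also the pad id).
EOS = PAD = 2
examples = []
for prompt, response in [([1, 3, 10, 11, 4], [20, 21, EOS]), ([1, 3, 12, 4], [22, EOS])]:
    ids = prompt + response
    examples.append({"input_ids": ids, "labels": mask_prompt(ids, len(prompt))})

batch = pad_batch(examples, pad_id=PAD)
for labels in batch["labels"]:
    print(labels)  # EOS (2) is still a training target; padding is -100
```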

In [None]:
from transformers import Trainer, TrainingArguments

lora_model_epic_eos = get_lora_model()
lora_model_epic_eos.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="/content/lora_epic_outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    logging_steps=10,
    save_strategy="no",
    report_to="none"
)
trainer = Trainer(
    model=lora_model_epic_eos,
    train_dataset=tokenized_epic_dataset,
    args=training_args,
    tokenizer=tokenizer,
    data_collator=data_collator
)

trainer.train()

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


trainable params: 3,407,872 || all params: 7,251,431,424 || trainable%: 0.0470


| Step | Training Loss |
|---|---|
| 10 | 5.0028 |


TrainOutput(global_step=12, training_loss=4.949137051900228, metrics={'train_runtime': 29.6605, 'train_samples_per_second': 1.517, 'train_steps_per_second': 0.405, 'total_flos': 91214210727936.0, 'train_loss': 4.949137051900228, 'epoch': 3.0})
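The next cell samples with `temperature=0.8` and `top_p=0.9`. Below is a simplified pure-Python sketch of what those two parameters do to one step's logits. It is not transformers' exact implementation, which works on tensors and handles ties and edge cases differently. A temperature below 1 sharpens the distribution. Nucleus (top-p) sampling then keeps only the smallest set of most likely tokens whose probabilities add up to at least `top_p`, and draws from that set.

```python
import math
import random


def sample_top_p(logits, temperature=0.8, top_p=0.9, rng=None):
    """Temperature scaling + nucleus (top-p) sampling; returns a token index."""
    rng = rng or random.Random()
    # 1. Temperature: T < 1 sharpens, T > 1 flattens. Softmax with max-subtraction.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Nucleus: smallest set of most-likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # 3. Sample among the kept tokens (choices() accepts unnormalized weights).
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


rng = random.Random(0)
# A confident distribution: token 0 alone already covers more than 90% of the mass.
print({sample_top_p([5.0, 0.0, -5.0], rng=rng) for _ in range(20)})
# A flat distribution: with top_p=0.5 only two of the four tokens survive.
print({sample_top_p([0.0, 0.0, 0.0, 0.0], top_p=0.5, rng=rng) for _ in range(20)})
```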

In [None]:
from transformers import pipeline

generator_epic = pipeline(
    "text-generation",
    model=lora_model_epic_eos,
    tokenizer=tokenizer,
)

# Format the prompt correctly with [INST] tokens
prompt = "[INST]Rewrite the following text in epic style: 'The phone rang at midnight.'[/INST]"

print("=== Generation after 3 epochs ===\n")
for i in range(4):
    output = generator_epic(
        prompt,
        max_new_tokens=300,
        do_sample=True,
        temperature=0.8,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    generated_text = output[0]["generated_text"]

    # Extract just the response part (after [/INST])
    response = generated_text.split("[/INST]")[-1].strip()
    print(f"📝 Example {i+1}:\n{response}\n{'-'*60}\n")

Device set to use cuda:0


=== Generation after 3 epochs ===

📝 Example 1:
In the dead of the night, when the moon held court over the slumbering world, there rang out a clarion call, a discordant interruption to the silence, like a war horn echoing through the stillness. The phone, a humble servant to the whims of its masters, sounded its call in the darkest hour, its ring a challenge to the quietude that had ruled for so long. The midnight hour, that time of shadows and secrets, that moment when dreams walk the earth, was shattered by this unexpected summons. The phone rang, a beacon in the darkness, a harbinger of events yet to unfold, a herald of the unknown that lay waiting beyond the veil of sleep.
------------------------------------------------------------

📝 Example 2:
In the stillness of the midnight hour, when the stars held their breath and the moon cast its silvery glow upon the slumbering earth, a sound echoed forth that could not be ignored. A call from the heavens, or so it seemed, pierced the si