## Running the code
You need to generate a Hugging Face token and store it as a secret named `HF_TOKEN`.

https://huggingface.co/docs/hub/security-tokens

The instructions below are taken from https://colab.research.google.com/github/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb#scrollTo=qRFQnPL2a9Dj

### Authenticate with Hugging Face

Generate a Hugging Face `read` access token by going to [settings](https://huggingface.co/settings/tokens).

If you are using Google Colab, add your access token to the Colab Secrets manager to securely store it. If not, proceed to run the cell below to authenticate with Hugging Face.

1. Open your Google Colab notebook and click on the 🔑 Secrets tab in the left panel. <img src="https://storage.googleapis.com/generativeai-downloads/images/secrets.jpg" alt="The Secrets tab is found on the left panel." width=50%>
2. Create a new secret with the name `HF_TOKEN`.
3. Copy/paste your token key into the Value input box of `HF_TOKEN`.
4. Toggle the button on the left to allow notebook access to the secret.
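
Outside Colab there is no Secrets manager, so one option is to read the token from an environment variable. The sketch below assumes the variable is also named `HF_TOKEN`; `resolve_hf_token` and `login_from_env` are hypothetical helpers, not part of `huggingface_hub`.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str] = os.environ,
                     name: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored under `name`, or None if it is missing or blank."""
    token = env.get(name, "").strip()
    return token or None


def login_from_env() -> bool:
    """Log in to Hugging Face if a token is available (hypothetical helper)."""
    token = resolve_hf_token()
    if token is None:
        return False
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import login
    login(token=token)
    return True
```

Calling `login_from_env()` returns `False` instead of raising when no token is set, which makes it easy to print a hint to the user.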

## Disclaimer

The code was run on an L4 GPU and is optimized for that hardware; using the same environment is recommended.
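
As a rough sanity check on why a single L4 (24 GB of memory) can hold the model, the weight footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch: it uses the total parameter count that `print_trainable_parameters()` reports later in this notebook, and it ignores activations, optimizer state, and any layers kept in higher precision.

```python
def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 7_251_431_424  # total parameter count reported in this notebook

int8_gib = weight_memory_gib(NUM_PARAMS, 1)  # 8-bit quantized weights
fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # half-precision weights

print(f"int8: {int8_gib:.2f} GiB, fp16: {fp16_gib:.2f} GiB")
```

Under these assumptions the 8-bit weights need roughly half the memory of fp16 weights, which leaves headroom on the card for activations and the LoRA optimizer state.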

In [None]:
%env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128

env: PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128


In [None]:
!pip install -q transformers datasets peft trl accelerate bitsandbytes


In [None]:
from datasets import load_dataset

DATASET_NAME = "dserranog/fewshot-narrative-examples"

dataset = load_dataset(DATASET_NAME)
dataset

DatasetDict({
    train: Dataset({
        features: ['original', 'rewritten', 'style', 'source'],
        num_rows: 45
    })
})
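
Since the split mixes several styles, it can help to check how the rows are distributed. A minimal sketch with plain Python follows; the `rows` list is a hand-copied sample of records shown in this notebook, standing in for iterating over the real split.

```python
from collections import Counter

# A few records copied from this notebook's outputs; the real split has 45 rows.
rows = [
    {"original": "Winter was coming soon.", "style": "epic", "source": "Game of Thrones"},
    {"original": "The throne room was very quiet.", "style": "epic", "source": "Game of Thrones"},
    {"original": "Everything around her was made by machines.", "style": "sci-fi", "source": "Blade Runner"},
]

# Count how many examples each style contributes.
style_counts = Counter(row["style"] for row in rows)
for style, count in style_counts.most_common():
    print(f"{style}: {count}")
```

With the loaded dataset, passing the `style` column to `Counter` should give the same kind of summary.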

In [None]:
import pandas as pd

for split in dataset:
    print(f"\n📚 {split.upper()}")
    df = pd.DataFrame(dataset[split])
    display(df.sample(3))


📚 TRAIN


| | original | rewritten | style | source |
|---|---|---|---|---|
| 42 | In the encyclopedia, his record was small. | In the Galactic Encyclopedia, his name would b... | sci-fi | Foundation (Isaac Asimov) |
| 30 | Mia's rating fell after her comment online. | Mia's social credit score plummeted after her ... | sci-fi | Black Mirror |
| 38 | Everything around her was made by machines. | In a world where everything was artificial, fe... | sci-fi | Blade Runner |


In [None]:
from google.colab import userdata
from huggingface_hub import login
TOKEN_NAME = "HF_TOKEN"
hf_token = userdata.get(TOKEN_NAME)
if hf_token:
    login(token=hf_token)
    print("Successfully logged in to Hugging Face!")
else:
    print("Hugging Face token not found in Colab Secrets.")

Successfully logged in to Hugging Face!


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
import torch
import gc

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.3"

# Tokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

def get_lora_model(base_model=False):
    torch.cuda.empty_cache()
    gc.collect()

    bnb_config = BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_threshold=6.0,
    )

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        device_map="auto",          # shards across GPU/CPU
        torch_dtype=torch.float16,  # internal compute in fp16
        low_cpu_mem_usage=True,
    )
    model.config.pad_token_id = tokenizer.pad_token_id # updating model config
    if base_model:
        return model
    # r: rank dimension for LoRA update matrices (smaller = more compression)
    rank_dimension = 8
    # lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
    lora_alpha = 16
    # lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
    lora_dropout = 0.05

    # LoRA config
    lora_config = LoraConfig(
        r=rank_dimension,  # Rank dimension - typically between 4-32
        lora_alpha=lora_alpha,  # LoRA scaling factor - typically 2x rank
        target_modules=["q_proj", "v_proj"],  # Which modules to apply LoRA to
        lora_dropout=lora_dropout,  # Dropout probability for LoRA layers
        bias="none",  # Bias handling: "none" means no bias terms are trained
        task_type="CAUSAL_LM",  # Task type for model architecture
    )
    return get_peft_model(model, lora_config)

In [None]:
dataset = load_dataset(DATASET_NAME, split="train")
dataset.to_pandas().head()

| | original | rewritten | style | source |
|---|---|---|---|---|
| 0 | Winter was coming soon. | Winter was not merely a season; it was a dark ... | epic | Game of Thrones |
| 1 | There were ravens flying around the old tower. | Ravens circled the ruined tower, bearing secre... | epic | Game of Thrones |
| 2 | The throne room was very quiet. | In the throne room, silence weighed heavier th... | epic | Game of Thrones |
| 3 | She walked through the battlefield. | She walked among the dead with the gaze of som... | epic | Game of Thrones |
| 4 | Fire and ice met in battle. | When fire and ice collide, only the ancient go... | epic | Game of Thrones |


In [None]:
from transformers import DataCollatorForLanguageModeling

# 1. Build prompt & response from your curated fields
def format_prompt(example):
    original = example["original"]
    rewritten = example["rewritten"]
    # insert the style dynamically
    example["prompt"]   = f"Rewrite the following text in {example['style']} style: '{original}'"
    example["response"] = rewritten
    return example

# apply to the full dataset (all styles)
styles_dataset = dataset.map(
    format_prompt,
    remove_columns=["original", "rewritten", "style", "source"]
)

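# 2. Prepare the collator (causal LM, so mlm=False)
# Note: with mlm=False this collator rebuilds `labels` from `input_ids` and sets
# padding positions to -100, so labels created during tokenization are overwritten.
# Because pad_token is eos_token here, EOS positions are masked as well.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
    pad_to_multiple_of=8,  # Optional: pad to multiple of 8 for tensor cores
)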

# 3. Tokenize prompt + response into input_ids / attention_mask

def tokenize_function_with_eos(example):
    # Construct the full text with proper Mistral format
    full_text = f"<s>[INST]{example['prompt'].strip()}[/INST]{example['response'].strip()}</s>"
    print(full_text)

    # Tokenize the full text. The string already contains <s> and </s>,
    # so automatic special tokens are disabled to avoid a duplicate BOS.
    tokenized_full = tokenizer(
        full_text,
        add_special_tokens=False,
        truncation=True,
        padding=False,  # Let the data collator handle padding
        max_length=512,
        return_tensors=None  # Return lists, not tensors
    )

    # Create labels (copy of input_ids)
    tokenized_full["labels"] = tokenized_full["input_ids"].copy()

    # Find where the response starts by tokenizing just the prompt part
    # (same settings as above so both token sequences line up)
    prompt_part = f"<s>[INST]{example['prompt'].strip()}[/INST]"
    tokenized_prompt = tokenizer(
        prompt_part,
        add_special_tokens=False,
        truncation=True,
        max_length=512,
        return_tensors=None
    )
    prompt_length = len(tokenized_prompt["input_ids"])

    # Mask the prompt tokens in labels (we only want to learn from the response)
    tokenized_full["labels"][:prompt_length] = [-100] * prompt_length

    return tokenized_full

tokenized_styles_dataset = styles_dataset.map(
    tokenize_function_with_eos,
    remove_columns=["prompt", "response"]
)

<s>[INST]Rewrite the following text in epic style: 'Winter was coming soon.'[/INST]Winter was not merely a season; it was a dark promise looming over the kingdoms.</s>
<s>[INST]Rewrite the following text in epic style: 'There were ravens flying around the old tower.'[/INST]Ravens circled the ruined tower, bearing secrets that no one dared recall.</s>
<s>[INST]Rewrite the following text in epic style: 'The throne room was very quiet.'[/INST]In the throne room, silence weighed heavier than Valyrian steel.</s>
<s>[INST]Rewrite the following text in epic style: 'She walked through the battlefield.'[/INST]She walked among the dead with the gaze of someone who feared losing nothing.</s>
<s>[INST]Rewrite the following text in epic style: 'Fire and ice met in battle.'[/INST]When fire and ice collide, only the ancient gods can judge the world's fate.</s>
<s>[INST]Rewrite the following text in epic style: 'The moonlight shone on the hills.'[/INST]The moonlight spilled over the hills where the st
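
The prompt-masking step in the tokenization function can be illustrated without a tokenizer. The sketch below uses made-up token ids (not real Mistral ids) and a hypothetical helper `mask_prompt_labels`; `-100` is the value that PyTorch's cross-entropy loss ignores by default.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids: List[int], prompt_length: int) -> List[int]:
    """Copy input_ids into labels and hide the first prompt_length positions."""
    prompt_length = min(prompt_length, len(input_ids))
    return [IGNORE_INDEX] * prompt_length + list(input_ids[prompt_length:])


# Made-up ids: 1 = BOS, 3 and 4 = instruction markers, 2 = EOS (illustrative only).
input_ids = [1, 3, 50, 51, 4, 70, 71, 72, 2]
labels = mask_prompt_labels(input_ids, prompt_length=5)
print(labels)
```

Only the response tokens and the final EOS keep their ids, so the loss is computed on the rewritten text alone.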

In [None]:
from peft import LoraConfig, get_peft_model
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

lora_model_noir_eos = get_lora_model()
lora_model_noir_eos.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="/content/lora_noir_outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    logging_steps=10,
    save_strategy="no",
    report_to="none"
)
trainer = Trainer(
    model=lora_model_noir_eos,
    train_dataset=tokenized_styles_dataset,
    args=training_args,
    tokenizer=tokenizer,
    data_collator=data_collator
)

trainer.train()

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


trainable params: 3,407,872 || all params: 7,251,431,424 || trainable%: 0.0470
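
The trainable-parameter count above can be reproduced by hand. Each adapted linear layer with input size `d_in` and output size `d_out` gains two matrices of shapes `r × d_in` and `d_out × r`. The sketch assumes the Mistral-7B dimensions (hidden size 4096, 32 layers, 8 key/value heads of size 128, so `v_proj` maps 4096 to 1024); these values are not stated in this notebook.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


R = 8              # rank used in the LoraConfig of this notebook
HIDDEN = 4096      # assumed hidden size of Mistral-7B
KV_DIM = 8 * 128   # assumed: 8 key/value heads of dimension 128
NUM_LAYERS = 32    # assumed number of transformer blocks

# q_proj maps HIDDEN -> HIDDEN, v_proj maps HIDDEN -> KV_DIM
per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, KV_DIM, R)
total = per_layer * NUM_LAYERS
print(f"{total:,}")
```

Under these assumptions the total matches the reported 3,407,872 trainable parameters.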


| Step | Training Loss |
|---|---|
| 10 | 5.0423 |
| 20 | 4.167 |
| 30 | 3.7837 |


TrainOutput(global_step=36, training_loss=4.233365535736084, metrics={'train_runtime': 87.2314, 'train_samples_per_second': 1.548, 'train_steps_per_second': 0.413, 'total_flos': 272617753411584.0, 'train_loss': 4.233365535736084, 'epoch': 3.0})
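
The `global_step=36` in the output follows from the batch settings. A small sketch of the arithmetic, assuming that a final partial accumulation group still triggers an optimizer step (which is consistent with the reported value):

```python
import math

num_examples = 45                # rows in the train split
per_device_batch_size = 1
gradient_accumulation_steps = 4
num_epochs = 3

# Batches seen per epoch, then optimizer updates per epoch after accumulation.
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
total_updates = updates_per_epoch * num_epochs
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

print(total_updates, effective_batch_size)
```

Each update therefore averages gradients over about 4 examples, and the whole run performs only a few dozen updates.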

In [None]:
from transformers import pipeline

generator_styles = pipeline(
    "text-generation",
    model=lora_model_noir_eos,
    tokenizer=tokenizer,
)

# Format the prompt correctly with [INST] tokens
prompt = "[INST]Rewrite the following text in noir style (keep it concise): 'The phone rang at midnight.'[/INST]"

print("=== Generation after 3 epochs ===\n")
for i in range(4):
    output = generator_styles(
        prompt,
        max_new_tokens=300,
        do_sample=True,
        temperature=0.55,
        top_p=0.78,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    generated_text = output[0]["generated_text"]

    # Extract just the response part (after [/INST])
    response = generated_text.split("[/INST]")[-1].strip()
    print(f"📝 Example {i+1}:\n{response}\n{'-'*60}\n")

Device set to use cuda:0


=== Generation after 3 epochs ===

📝 Example 1:
The phone jangled like a jailhouse door at midnight. I was in the middle of a dream, but the sound was too real. I reached for the receiver, my hand trembling. A voice, low and gravelly, whispered, "Meet me at the corner of the alley, midnight sharp. Bring the package." I hung up, the silence echoing like a funeral bell. The game was on.
------------------------------------------------------------

📝 Example 2:
The phone shrilled like a banshee's wail at midnight, piercing the silence of my darkened room. I sat up, the whiskey in my glass sloshing over the edge. I knew that sound, knew what it meant. It was a call I didn't want, a call that would bring trouble. I took a long drag on my cigarette, the smoke curling in the dim light, and picked up the receiver.
------------------------------------------------------------

📝 Example 3:
The phone shrilled at midnight, a mournful wail that pierced the silence of my apartment. I knew it was tro

In [None]:
# Format the prompt correctly with [INST] tokens
prompt = "[INST]Rewrite the following text in epic style (keep it concise): 'The phone rang at midnight.'[/INST]"

print("=== Generation after 3 epochs ===\n")
for i in range(4):
    output = generator_styles(
        prompt,
        max_new_tokens=300,
        do_sample=True,
        temperature=0.55,
        top_p=0.78,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    generated_text = output[0]["generated_text"]

    # Extract just the response part (after [/INST])
    response = generated_text.split("[/INST]")[-1].strip()
    print(f"📝 Example {i+1}:\n{response}\n{'-'*60}\n")

=== Generation after 3 epochs ===

📝 Example 1:
Midnight's shroud was shattered as the phone's mournful dirge echoed through the silent halls. The air seemed to grow colder, the shadows darker, as if the very fabric of the night was conspiring to answer the call.
------------------------------------------------------------

📝 Example 2:
Midnight's hush was shattered as the phone's shrill cry pierced the stillness. A clarion call in the dead of night, it echoed through the castle's stone halls, a reminder of the world beyond the walls. The king, deep in slumber, stirred at the sound, his eyes flashing with a warrior's resolve as he rose to answer the call. The fate of his kingdom now hung in the balance, and the king would not let it fall to the hands of the enemy.
------------------------------------------------------------

📝 Example 3:
Midnight's shroud fell, as the raven's call echoed through the silent halls. Yet, the stillness was shattered, as the phone's mournful melody pierced 

In [None]:
# Format the prompt correctly with [INST] tokens
prompt = "[INST]Rewrite the following text in sci-fi style (keep it concise): 'The phone rang at midnight.'[/INST]"

print("=== Generation after 3 epochs ===\n")
for i in range(4):
    output = generator_styles(
        prompt,
        max_new_tokens=300,
        do_sample=True,
        temperature=0.55,
        top_p=0.78,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    generated_text = output[0]["generated_text"]

    # Extract just the response part (after [/INST])
    response = generated_text.split("[/INST]")[-1].strip()
    print(f"📝 Example {i+1}:\n{response}\n{'-'*60}\n")

=== Generation after 3 epochs ===

📝 Example 1:
The commlink buzzed at midnight, a discordant note in the otherwise silent space station.
------------------------------------------------------------



You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


📝 Example 2:
The commlink buzzed at midnight, a discordant note in the silent vacuum of space.
------------------------------------------------------------

📝 Example 3:
The comm-link buzzed at midnight, a dissonant tone in the silent space of the capsule. The astronaut, floating in the zero-gravity, reached for it, his heart pounding. The voice on the other end was not his. "Your coordinates have been intercepted," it said, "Prepare for extraction."
------------------------------------------------------------

📝 Example 4:
Midnight struck, and my communicator buzzed. The screen flickered to life, revealing an unfamiliar face. "Agent," the voice said, "we've found something." The stars outside my window seemed to shift, as if the universe itself was whispering a secret. I grabbed my coat and stepped into the unknown.
------------------------------------------------------------
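
To see how `temperature` and `top_p` shape the sampling, here is a simplified pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. The logits are made up for illustration, and the helpers are hypothetical stand-ins, not the code `transformers` runs internally. The two setting pairs mirror those used in this notebook (0.55/0.78 and 0.1/0.5).

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Made-up logits for four candidate next tokens (illustrative only).
logits = {"rang": 2.0, "buzzed": 1.5, "shrilled": 1.0, "sang": -1.0}

warm = top_p_filter(softmax_with_temperature(logits, 0.55), top_p=0.78)
cold = top_p_filter(softmax_with_temperature(logits, 0.1), top_p=0.5)
print(sorted(warm), sorted(cold))
```

With these toy logits the lower temperature and smaller `top_p` leave a single candidate, which is why very low settings tend to produce near-identical generations.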



In [None]:
def generate_multi_style_examples(
    model,
    tokenizer,
    styles,
    base_text="The phone rang at midnight.",
    examples_per_style=4,
    max_new_tokens=300,
    temperature=0.9,
    top_p=0.75
):
    """
    Generate examples for multiple styles efficiently using batch processing.

    Args:
        model: The fine-tuned model
        tokenizer: The tokenizer
        styles: List of style names (e.g., ['sci-fi', 'epic', 'romantic'])
        base_text: The text to rewrite
        examples_per_style: Number of examples to generate per style
        max_new_tokens: Maximum tokens to generate
        temperature: Sampling temperature
        top_p: Top-p sampling parameter
    """
    from transformers import pipeline
    import torch

    # Create all prompts for batch processing
    all_prompts = []
    style_labels = []

    for style in styles:
        for i in range(examples_per_style):
            prompt = f"[INST]Rewrite the following text in {style} style *one concise sentence*: '{base_text}'[/INST]"
            all_prompts.append(prompt)
            style_labels.append((style, i + 1))

    # Create the pipeline
    generator = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        batch_size=4,  # Process 4 prompts at once
    )

    # Generate all examples in batches
    print("🚀 Generating examples for all styles...\n")

    # Pass the list of prompts directly to the pipeline
    results = generator(
        all_prompts,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature,
        top_p=top_p,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

    # Process and display results by style
    result_idx = 0
    for style in styles:
        print(f"🎨 === {style.upper()} STYLE ===\n")

        for example_num in range(examples_per_style):
            generated_text = results[result_idx][0]["generated_text"]

            # Extract just the response part (after [/INST])
            response = generated_text.split("[/INST]")[-1].strip()

            print(f"📝 Example {example_num + 1}:")
            print(f"{response}")
            print(f"{'-' * 60}\n")

            result_idx += 1

        print(f"{'=' * 60}\n")
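
The bookkeeping in the function above (a flat prompt list that is later regrouped by style) can be checked without loading a model. In this sketch `build_prompts` and `group_by_style` are hypothetical helpers, and `fake_generated` stands in for the pipeline output, assuming each generated text starts with its prompt.

```python
from typing import Dict, List, Tuple


def build_prompts(styles: List[str], base_text: str,
                  examples_per_style: int) -> Tuple[List[str], List[str]]:
    """Return a flat list of prompts plus the style each prompt belongs to."""
    prompts, labels = [], []
    for style in styles:
        for _ in range(examples_per_style):
            prompts.append(
                f"[INST]Rewrite the following text in {style} style "
                f"*one concise sentence*: '{base_text}'[/INST]"
            )
            labels.append(style)
    return prompts, labels


def group_by_style(labels: List[str], generated: List[str]) -> Dict[str, List[str]]:
    """Strip the prompt from each generated text and bucket responses by style."""
    grouped: Dict[str, List[str]] = {}
    for style, text in zip(labels, generated):
        response = text.split("[/INST]")[-1].strip()
        grouped.setdefault(style, []).append(response)
    return grouped


prompts, labels = build_prompts(["noir", "epic"], "The phone rang at midnight.", 2)
# Stand-in for model output: the prompt followed by a placeholder completion.
fake_generated = [p + f" reply {i}" for i, p in enumerate(prompts)]
grouped = group_by_style(labels, fake_generated)
print(grouped)
```

Pairing each prompt with its style label avoids the manual `result_idx` counter and keeps the grouping correct even if the number of examples per style changes.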

In [None]:
generate_multi_style_examples(
    model=lora_model_noir_eos,
    tokenizer=tokenizer,
    styles=['sci-fi', 'epic', 'noir'],
    base_text="The phone rang at midnight.",
    examples_per_style=4,
    temperature=0.1,
    top_p=0.5)

Device set to use cuda:0


🚀 Generating examples for all styles...

🎨 === SCI-FI STYLE ===

📝 Example 1:
The commlink buzzed at midnight, a signal that could only mean trouble.
------------------------------------------------------------

📝 Example 2:
The midnight hour chimed as the phone's eerie glow pierced the darkness.
------------------------------------------------------------

📝 Example 3:
The commlink buzzed at midnight, a signal that could only mean trouble.
------------------------------------------------------------

📝 Example 4:
My communicator buzzed at midnight, a signal that could only mean trouble.
------------------------------------------------------------


🎨 === EPIC STYLE ===

📝 Example 1:
Midnight's silence was shattered by the phone's insistent call, a discordant note in the symphony of the night.
------------------------------------------------------------

📝 Example 2:
Midnight's silence was shattered by the phone's insistent call.
--------------------------------------------------------