# Fine-Tuning a Spanish Language Model (AI Scripts)

This notebook fine-tunes a Spanish language model (`spanish-gpt2`) on custom scripts.

## 📦 Step 1: Install Dependencies

In [None]:
!pip install transformers datasets accelerate

## 📁 Step 2: Upload Your `.jsonl` Dataset

In [None]:
from google.colab import files
uploaded = files.upload()  # Upload the file guiones_ejemplo.jsonl
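
Step 5 reads a `text` field from every record, so it can help to check the uploaded file before loading it. The following is a minimal sketch, assuming the file holds one JSON object per line with a string `text` field. `check_jsonl` is a hypothetical helper written for this notebook, not part of any library.

```python
import json


def check_jsonl(path, field="text"):
    """Count usable records and collect problems found in a JSONL file.

    Hypothetical helper: returns (valid_count, problems), where problems
    is a list of (line_number, message) tuples.
    """
    valid = 0
    problems = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            # Each record must be an object whose `field` value is a string.
            if not isinstance(record, dict) or not isinstance(record.get(field), str):
                problems.append((lineno, f"missing string field '{field}'"))
                continue
            valid += 1
    return valid, problems
```

For example, `valid, problems = check_jsonl("guiones_ejemplo.jsonl")` returns the number of usable records plus the line numbers that need fixing.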

## 📚 Step 3: Prepare the Dataset

In [None]:
from datasets import load_dataset
dataset = load_dataset("json", data_files="guiones_ejemplo.jsonl", split="train")

## 🤖 Step 4: Load the Model and Tokenizer (with pad_token and embedding resize)

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "mrm8488/spanish-gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Reuse EOS as the pad token

model = AutoModelForCausalLM.from_pretrained(model_name)
model.resize_token_embeddings(len(tokenizer))  # Keep the embedding size in sync with the tokenizer vocabulary (reusing EOS adds no new token)

## ⚙️ Step 5: Tokenize

In [None]:
def tokenize_function(example):
    return tokenizer(example["text"], truncation=True, padding="max_length", max_length=128)

tokenized_dataset = dataset.map(tokenize_function, batched=True)
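
To make the effect of `truncation=True, padding="max_length"` concrete, here is a simplified pure-Python illustration. It is a sketch, not the tokenizer's real implementation. It assumes right-side padding and uses made-up token IDs with a hypothetical `pad_id` of 0. The second function mirrors what a causal-LM collator with `mlm=False` does when it builds labels: positions equal to the pad ID are set to -100 so the loss skips them.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def pad_or_truncate(token_ids, max_length, pad_id):
    """Cut a sequence to max_length, then right-pad it to exactly max_length."""
    ids = list(token_ids[:max_length])
    mask = [1] * len(ids)          # 1 marks a real token
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding


def causal_lm_labels(input_ids, pad_id):
    """Copy input_ids as labels, replacing every pad_id with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


# Toy values (assumptions for illustration only).
ids, mask = pad_or_truncate([11, 12, 13], max_length=5, pad_id=0)
print(ids)                        # [11, 12, 13, 0, 0]
print(mask)                       # [1, 1, 1, 0, 0]
print(causal_lm_labels(ids, 0))   # [11, 12, 13, -100, -100]
```

Because Step 4 reuses EOS as the pad token, any genuine EOS token in the text is masked from the loss in the same way as padding.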

## 🔁 Step 6: Train

In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./guion-model",
    per_device_train_batch_size=2,
    num_train_epochs=3,
    logging_steps=10,
    save_steps=100,
    save_total_limit=1,
    eval_strategy="no",  # named evaluation_strategy in older transformers releases
    report_to="none"
)

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

trainer.train()
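
The settings above determine how many optimizer steps the run takes and how often it logs or triggers a checkpoint. The arithmetic can be sketched as follows, assuming a single device and no gradient accumulation. `training_schedule` is a hypothetical helper, and the dataset sizes are made-up examples.

```python
import math


def training_schedule(num_examples, batch_size=2, epochs=3,
                      logging_steps=10, save_steps=100):
    """Estimate step counts for one device with no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # last batch may be partial
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "log_events": total_steps // logging_steps,
        "save_triggers": total_steps // save_steps,  # step-based checkpoints only
    }


print(training_schedule(500))
# {'steps_per_epoch': 250, 'total_steps': 750, 'log_events': 75, 'save_triggers': 7}
print(training_schedule(50))
# {'steps_per_epoch': 25, 'total_steps': 75, 'log_events': 7, 'save_triggers': 0}
```

With a small dataset the run can finish before reaching `save_steps`, so a step-based checkpoint may never be written. With `save_total_limit=1`, only the most recent checkpoint is kept in `output_dir`.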

## 💾 Step 7: Save the Trained Model to Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

model.save_pretrained("/content/drive/MyDrive/guion-model")
tokenizer.save_pretrained("/content/drive/MyDrive/guion-model")

✅ Done! You can now use this model in your Flask API or upload it to Hugging Face.