
# Fine-Tuning a Spanish Language Model (AI Scripts)

This notebook fine-tunes a Spanish language model (`spanish-gpt2`) on your own collection of scripts.

---

## 📦 Step 1: Install Dependencies

```python
!pip install transformers datasets accelerate
```
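After installing, it helps to confirm which library versions Colab actually picked up. Some `TrainingArguments` keyword names have changed across `transformers` releases: recent releases use `eval_strategy`, and older ones used `evaluation_strategy`. The following is a minimal standard-library sketch. The helper names are hypothetical. The 4.41 cut-off in `supports_eval_strategy` is our assumption about when the rename landed, so check the `transformers` release notes for your version.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn '4.46.0' or '4.47.0.dev0' into (4, 46, 0), keeping only leading numeric parts."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def supports_eval_strategy(transformers_version, minimum=(4, 41)):
    """Assumption: `eval_strategy` is accepted from transformers 4.41 onward."""
    return version_tuple(transformers_version) >= minimum


# Print what is installed in the current runtime
for pkg in ("transformers", "datasets", "accelerate"):
    print(pkg, installed_version(pkg) or "not installed")
```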

---

## 📁 Step 2: Upload Your `.jsonl` Dataset

```python
from google.colab import files

uploaded = files.upload()  # Choose guiones_ejemplo.jsonl in the file picker
```
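Steps 3 and 5 assume that each line of `guiones_ejemplo.jsonl` is a standalone JSON object with a `text` field, because that is the column `tokenize_function` reads. If you don't have a file yet, the minimal sketch below writes one in that JSON Lines layout. The two scripts are placeholders, not real training data, and `write_jsonl` is a hypothetical helper. `ensure_ascii=False` keeps accents and `ñ` readable in the file. The file is only written if it does not already exist, so an uploaded dataset is not overwritten.

```python
import json
from pathlib import Path

# Placeholder scripts; replace them with your own. Only the "text" key is used later.
sample_scripts = [
    {"text": "ESCENA 1. INTERIOR. COCINA - NOCHE\nMARÍA: ¿Dónde dejaste las llaves?"},
    {"text": "ESCENA 2. EXTERIOR. CALLE - DÍA\nJUAN: Te dije que llegaríamos tarde."},
]


def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines), UTF-8 encoded."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes the newlines inside "text", so each record stays on one line
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if not Path("guiones_ejemplo.jsonl").exists():  # don't clobber an uploaded file
    write_jsonl(sample_scripts, "guiones_ejemplo.jsonl")
```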

---

## 📚 Step 3: Prepare the Dataset

```python
from datasets import load_dataset

# Each line must be a JSON object with a "text" field (read in Step 5)
dataset = load_dataset("json", data_files="guiones_ejemplo.jsonl", split="train")
```
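A malformed line or a missing `text` value will make `load_dataset` or the tokenization in Step 5 fail, and the resulting error does not always point to the offending line. A small standard-library check that runs before loading can report the line number directly. The function name `check_jsonl` is hypothetical. The rule that `text` must be a non-empty string is our assumption about what this notebook needs.

```python
import json
from pathlib import Path


def check_jsonl(path, field="text"):
    """Validate a JSON Lines file; return the number of records or raise ValueError."""
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # this checker ignores blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"line {lineno}: invalid JSON ({err.msg})") from err
            if not isinstance(record, dict):
                raise ValueError(f"line {lineno}: expected a JSON object")
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"line {lineno}: missing or empty '{field}'")
            count += 1
    return count
```

In the notebook, run `print(check_jsonl("guiones_ejemplo.jsonl"))` before `load_dataset`. It should print the number of scripts in the file.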

---

## 🤖 Step 4: Load the Model and Tokenizer

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

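model_name = "mrm8488/spanish-gpt2"  # Or another Spanish causal LM, e.g. one from PlanTL
tokenizer = AutoTokenizer.from_pretrained(model_name)
# GPT-2 tokenizers usually ship without a pad token, but padding in Step 5 needs one,
# so reuse the end-of-sequence token for padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)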
```

---

## ⚙️ Step 5: Tokenize

```python
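def tokenize_function(examples):
    # With batched=True, examples["text"] is a list of strings
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

# Drop the raw "text" column so only the model inputs remain
tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=dataset.column_names)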
```
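With `truncation=True, padding="max_length", max_length=128`, every example comes out exactly 128 tokens long. Longer scripts are cut, shorter ones are filled with the pad token, and `attention_mask` marks which positions hold real tokens. The pure-Python sketch below mimics that behavior for a single list of token IDs. It assumes right-side truncation and padding, which are the usual GPT-2 defaults. The IDs, `pad_id`, and the small `max_length=6` are made-up values chosen for readability. The real tokenizer also handles special tokens and batching.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Mimic truncation=True + padding="max_length" with right-side truncation/padding."""
    ids = list(token_ids[:max_length])   # truncation: keep the first max_length tokens
    attention_mask = [1] * len(ids)      # 1 = real token
    n_pad = max_length - len(ids)
    ids += [pad_id] * n_pad              # padding: fill up to max_length
    attention_mask += [0] * n_pad        # 0 = padding, ignored by attention
    return {"input_ids": ids, "attention_mask": attention_mask}


short = pad_or_truncate([11, 12, 13], max_length=6, pad_id=0)
long = pad_or_truncate(list(range(1, 10)), max_length=6, pad_id=0)
print(short)  # {'input_ids': [11, 12, 13, 0, 0, 0], 'attention_mask': [1, 1, 1, 0, 0, 0]}
print(long)   # {'input_ids': [1, 2, 3, 4, 5, 6], 'attention_mask': [1, 1, 1, 1, 1, 1]}
```

With this setup, anything past 128 tokens is discarded, so long scripts lose their endings. To keep them, raise `max_length` or split the scripts into chunks.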

---

## 🔁 Step 6: Train

```python
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./guion-model",
    per_device_train_batch_size=2,
    num_train_epochs=3,
    logging_steps=10,
    save_steps=100,
    save_total_limit=1,
    eval_strategy="no",  # no evaluation set; older transformers versions call this evaluation_strategy
    report_to="none"
)

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

trainer.train()
```
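With `mlm=False`, `DataCollatorForLanguageModeling` builds causal-LM labels. It copies `input_ids` into `labels` and replaces every position that holds the pad token ID with `-100`, a value the loss ignores. The model then shifts the labels by one position internally, so each position learns to predict the next token. The sketch below reproduces that labeling rule in plain Python for illustration. The token IDs and `pad_id=0` are made up, and both helper names are hypothetical. Because masking is done by token ID, setting the pad token to the EOS token (as in Step 4) also masks real EOS tokens in the text.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss (its default ignore_index)


def causal_lm_labels(input_ids, pad_id):
    """Copy input_ids to labels, replacing pad positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


def next_token_pairs(input_ids, labels):
    """Pairs (token at position i, label predicted at i + 1) that contribute to the loss."""
    return [
        (input_ids[i], labels[i + 1])
        for i in range(len(input_ids) - 1)
        if labels[i + 1] != IGNORE_INDEX
    ]


ids = [50, 51, 52, 0, 0]  # 3 real tokens followed by 2 pad tokens
labels = causal_lm_labels(ids, pad_id=0)
print(labels)                          # [50, 51, 52, -100, -100]
print(next_token_pairs(ids, labels))   # [(50, 51), (51, 52)]
```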

---

## 💾 Step 7: Save the Trained Model to Google Drive

```python
from google.colab import drive
drive.mount('/content/drive')

# Save the model and tokenizer to the same folder so from_pretrained() can reload both from it
model.save_pretrained("/content/drive/MyDrive/guion-model")
tokenizer.save_pretrained("/content/drive/MyDrive/guion-model")
```

---

✅ Done! You can now use this model in your Flask API, or upload it to the Hugging Face Hub if you like.
