
# Extended Transformers: Visualization, Fine-Tuning, and an Encoder from Scratch (TensorFlow)

This notebook builds on the fundamentals of the Transformer model and covers:

1. Visualizing the attention of a pretrained model.
2. Full fine-tuning with metrics and validation.
3. A from-scratch implementation of a **Transformer Encoder** with TensorFlow.
4. Practical applications with pretrained pipelines: question answering, summarization, text generation, and translation.




## 1. Attention Visualization

We use `DistilBERT` from Hugging Face to obtain the attention weights and plot them as a heatmap.

Requires:
```bash
pip install transformers tensorflow datasets scikit-learn matplotlib seaborn
```


In [None]:

from transformers import TFAutoModel, AutoTokenizer
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

model = TFAutoModel.from_pretrained("distilbert-base-uncased", output_attentions=True)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

text = "The transformer model changed deep learning forever."
inputs = tokenizer(text, return_tensors="tf")
outputs = model(**inputs)
attentions = outputs.attentions  # tuple with one tensor per layer, each of shape (batch, heads, seq_len, seq_len)

# Visualize a single head from a single layer
layer = 0
head = 0
weights = attentions[layer][0, head].numpy()

tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])

plt.figure(figsize=(10, 8))
sns.heatmap(weights, xticklabels=tokens, yticklabels=tokens, cmap="viridis")
plt.title(f"Attention: layer {layer}, head {head}")
plt.show()
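
A single head shows only part of the picture, so it can help to aggregate the heads of a layer. The sketch below uses random softmax weights as a stand-in for `attentions[layer][0].numpy()`. The names `layer_attn`, `mean_attn`, and `top_key` are illustrative, and the head and token counts are assumptions rather than values read from the model.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Stand-in for attentions[layer][0].numpy(), shape (heads, seq_len, seq_len).
# The sizes below are assumptions chosen for illustration.
rng = np.random.default_rng(0)
num_heads, seq_len = 12, 6
layer_attn = softmax(rng.normal(size=(num_heads, seq_len, seq_len)))

# Each row is a probability distribution over the key tokens.
row_sums = layer_attn.sum(axis=-1)

# Averaging over heads gives one (seq_len, seq_len) map per layer.
mean_attn = layer_attn.mean(axis=0)

# For every query token, the index of the key token it attends to most.
top_key = mean_attn.argmax(axis=-1)
print(mean_attn.shape, top_key)
```

With the real model, `mean_attn` could be passed to `sns.heatmap` in place of `weights` to plot a per-layer average.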



## 2. Full Fine-Tuning (DistilBERT for Binary Classification)

Includes:
- Tokenization
- Training with validation
- Metric visualization


In [None]:

from transformers import TFAutoModelForSequenceClassification
from datasets import load_dataset
from transformers import DataCollatorWithPadding
from sklearn.metrics import classification_report

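# Load IMDB, shuffle before subsetting so both classes are represented
# (the raw train split is grouped by label), then hold out 20% for validation
ds = load_dataset("imdb", split="train").shuffle(seed=42).select(range(2000)).train_test_split(test_size=0.2)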
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

ds = ds.map(lambda x: tokenizer(x["text"], truncation=True, padding="max_length"), batched=True)
ds.set_format(type='tensorflow', columns=["input_ids", "attention_mask", "label"])

train_ds = tf.data.Dataset.from_tensor_slices((
    {"input_ids": ds["train"]["input_ids"], "attention_mask": ds["train"]["attention_mask"]},
    ds["train"]["label"]
)).batch(16)

val_ds = tf.data.Dataset.from_tensor_slices((
    {"input_ids": ds["test"]["input_ids"], "attention_mask": ds["test"]["attention_mask"]},
    ds["test"]["label"]
)).batch(16)

model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])

history = model.fit(train_ds, validation_data=val_ds, epochs=2)
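
Accuracy alone can hide class imbalance, so it is worth looking at precision and recall on the validation set. The sketch below computes them from raw logits with NumPy only. The function name `binary_report` and the toy logits are made up for illustration; in the notebook the logits would typically come from something like `model.predict(val_ds).logits`.

```python
import numpy as np

def binary_report(logits, labels):
    """Accuracy plus precision, recall and F1 for class 1, from raw logits."""
    preds = np.argmax(logits, axis=-1)
    tp = int(np.sum((preds == 1) & (labels == 1)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))
    accuracy = float(np.mean(preds == labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up logits and labels, for illustration only.
toy_logits = np.array([[2.0, -1.0], [0.1, 0.9], [-0.5, 1.5], [1.2, 0.3], [0.0, 2.0]])
toy_labels = np.array([0, 1, 0, 1, 1])
report = binary_report(toy_logits, toy_labels)
print(report)
```

`sklearn.metrics.classification_report`, already imported above, gives the same per-class figures when passed the true labels and the argmax predictions.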



## 3. Implementing a Transformer Encoder from Scratch

Components:
- Multi-Head Attention
- Feed Forward
- Positional Encoding
- Layer Normalization
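
As a compact reference for these components, the sinusoidal positional encoding and the two residual sub-blocks of an encoder layer can be written as follows. The notation ($pos$ for the token position, $k$ for the index of a sine/cosine pair, $d_{\text{model}}$ for the embedding width, $\mathrm{MHA}$ and $\mathrm{FFN}$ for the two sub-layers) is chosen here for illustration, and it assumes the post-layer-norm arrangement used in the code of this section.

$$
PE_{(pos,\,2k)} = \sin\!\left(\frac{pos}{10000^{2k/d_{\text{model}}}}\right),
\qquad
PE_{(pos,\,2k+1)} = \cos\!\left(\frac{pos}{10000^{2k/d_{\text{model}}}}\right)
$$

$$
\begin{aligned}
h &= \mathrm{LayerNorm}\big(x + \mathrm{Dropout}(\mathrm{MHA}(x, x, x))\big) \\
y &= \mathrm{LayerNorm}\big(h + \mathrm{Dropout}(\mathrm{FFN}(h))\big),
\qquad \mathrm{FFN}(h) = W_2\,\mathrm{ReLU}(W_1 h + b_1) + b_2
\end{aligned}
$$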


In [None]:

class PositionalEncoding(tf.keras.layers.Layer):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pos = np.arange(max_len)[:, np.newaxis]
        i = np.arange(d_model)[np.newaxis, :]
        angle_rates = 1 / np.power(10000, (2 * (i//2)) / np.float32(d_model))
        angle_rads = pos * angle_rates
        angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
        angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
        self.pos_encoding = tf.constant(angle_rads[np.newaxis, ...], dtype=tf.float32)

    def call(self, x):
        return x + self.pos_encoding[:, :tf.shape(x)[1], :]

class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, dff, dropout=0.1):
        super().__init__()
        # key_dim is the size of each head, so split d_model across the heads
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation='relu'),
            tf.keras.layers.Dense(d_model)
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization()
        self.layernorm2 = tf.keras.layers.LayerNormalization()
        self.dropout1 = tf.keras.layers.Dropout(dropout)
        self.dropout2 = tf.keras.layers.Dropout(dropout)

    def call(self, x, training=False):
        # Self-attention: query, value, and key are all x
        attn_output = self.mha(x, x, x, training=training)
        out1 = self.layernorm1(x + self.dropout1(attn_output, training=training))
        ffn_output = self.ffn(out1)
        out2 = self.layernorm2(out1 + self.dropout2(ffn_output, training=training))
        return out2


In [None]:

sample = tf.random.uniform((1, 50, 128))
encoder = EncoderLayer(d_model=128, num_heads=4, dff=512)
output = encoder(sample, training=False)
print("Output shape:", output.shape)
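
`EncoderLayer` delegates the attention computation to `tf.keras.layers.MultiHeadAttention`. To see roughly what happens inside, here is a minimal NumPy sketch of multi-head self-attention for one unbatched sequence. It assumes square projection matrices without biases, a per-head size of `d_model // num_heads`, and random weights; the function name and its arguments are illustrative, not the Keras API.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model); every weight matrix: (d_model, d_model)."""
    seq_len, d_model = x.shape
    depth = d_model // num_heads  # size of each head

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, depth)
        return t.reshape(seq_len, num_heads, depth).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    # Scaled dot-product attention, computed independently per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(depth)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    context = weights @ v                               # (heads, seq, depth)
    # Concatenate the heads and apply the output projection
    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights

rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 128, 4, 50
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)
)
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)  # (50, 128) (4, 50, 50)
```

The output keeps the input shape, which is what allows the residual connection `x + attn_output` in `EncoderLayer`.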



## 4. Practical Applications with Transformers

The following examples cover specific tasks that take advantage of Transformer architectures:

- Question Answering (QA)
- Automatic Summarization
- Text Generation
- Machine Translation



### Question Answering

We use a question-answering model (the kind loaded by `AutoModelForQuestionAnswering`), wrapped in a `pipeline`, to answer questions about a given context.


In [None]:

from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

contexto = "Transformers are deep learning models that use self-attention mechanisms to model relationships in sequential data."
pregunta = "What do transformers use to model relationships?"

resultado = qa(question=pregunta, context=contexto)
print(f"Answer: {resultado['answer']}")
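
Extractive QA models produce one start logit and one end logit per token, and the answer is the span with the best combined score. The sketch below illustrates that selection step in simplified form. The function `best_span`, the token list, and the logits are made up for illustration, and the real pipeline applies additional post-processing that is not reproduced here.

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len."""
    best = (0, 0, -np.inf)
    for s in range(len(start_logits)):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best

# Made-up tokens and logits, for illustration only.
tokens = ["transformers", "use", "self", "-", "attention", "mechanisms",
          "to", "model", "relationships"]
start_logits = np.array([0.1, 0.2, 3.0, 0.0, 0.5, 0.3, -1.0, -0.5, 0.0])
end_logits = np.array([0.0, -0.5, 0.2, 0.1, 1.0, 2.5, -1.0, 0.0, 0.4])

s, e, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[s:e + 1])
print(answer)  # self - attention mechanisms
```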



### Automatic Summarization

Models such as `facebook/bart-large-cnn` or `t5-small` can condense long texts; the example below uses `facebook/bart-large-cnn`.


In [None]:

resumidor = pipeline("summarization", model="facebook/bart-large-cnn")

texto_largo = '''
Transformers have transformed deep learning and NLP. They use attention mechanisms to relate different positions of a sequence and have enabled models like BERT, GPT, and T5 to achieve state-of-the-art results in tasks such as translation, question answering, and summarization.
'''

resumen = resumidor(texto_largo, max_length=50, min_length=20, do_sample=False)
print("Summary:", resumen[0]['summary_text'])



### Text Generation

We use `GPT2` to generate text from an initial prompt.


In [None]:

generador = pipeline("text-generation", model="gpt2")

prompt = "Once upon a time, there was a robot that"
resultado = generador(prompt, max_length=50, num_return_sequences=1)
print(resultado[0]['generated_text'])
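
Generation proceeds one token at a time: the model outputs logits over the vocabulary, and a decoding rule picks the next token. The sketch below contrasts greedy selection with temperature plus top-k sampling on a toy four-token vocabulary. The function `sample_next_token` and the logits are hypothetical; in the Hugging Face API the corresponding knobs are the `do_sample`, `temperature`, and `top_k` generation parameters.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample one token id from logits after temperature scaling and
    optional top-k filtering. Returns (token_id, probabilities)."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k is not None:
        # Mask everything below the k-th largest logit
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs)), probs

# Made-up logits over a four-token vocabulary, for illustration only.
toy_logits = [2.0, 1.0, 0.5, -1.0]

greedy = int(np.argmax(toy_logits))  # always picks the highest logit
token, probs = sample_next_token(
    toy_logits, temperature=0.7, top_k=2, rng=np.random.default_rng(0)
)
print(greedy, token, probs.round(3))
```

Lower temperatures concentrate probability on the top tokens, while a small `top_k` removes unlikely tokens from consideration entirely.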



### Machine Translation

We use `Helsinki-NLP/opus-mt-en-es` to translate from English to Spanish.


In [None]:

traductor = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")

frase = "Deep learning models are becoming increasingly powerful."
traduccion = traductor(frase)
print("Translation:", traduccion[0]['translation_text'])
