<a href="https://colab.research.google.com/github/RafaelCaballero/Julio25/blob/main/code/29transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Data Science with Python
Rafael Caballero

## Transformers

A transformer is a type of neural network that:

- processes entire sequences at once through a mechanism called "self-attention", which identifies which parts of the text (or of the signal) influence one another;

- is highly parallelizable, and its quality keeps improving as model size and training data grow, which has made it the foundation of today's large language models.
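To make the idea of self-attention concrete, here is a minimal sketch in plain Python. It is a simplification: a real transformer first projects each token with learned query, key and value matrices, whereas this sketch uses the token vectors themselves for all three roles. The toy embeddings are made-up numbers.

```python
import math

def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (no learned weights).

    X is a list of token vectors; returns (attention weights, output vectors).
    """
    d = len(X[0])
    weights = []
    for q in X:
        # Similarity of this token with every token in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights.append(softmax(scores))
    # Each output is a weighted average of all token vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, X)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs

# Three toy 2-dimensional "token embeddings" (made-up numbers).
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights, outputs = self_attention(X)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence. Every row is computed independently of the others, which is why the operation parallelizes well.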


The examples below are taken and adapted from

[https://www.kaggle.com/code/assiri/huggingface-simple-examples](https://www.kaggle.com/code/assiri/huggingface-simple-examples)

In [None]:
#!pip install transformers
#!pip install --upgrade torch torchvision torchaudio

In [None]:
from transformers import pipeline

In [None]:
nlp = pipeline('sentiment-analysis')
nlp('We are very happy to include pipeline into the transformers repository.')

Try it with any sentence you like.

But what about Spanish?

In [None]:
from transformers import pipeline

MODEL = "UMUTeam/roberta-spanish-sentiment-analysis"  # RoBERTa for Spanish

sentiment = pipeline(
    task="sentiment-analysis",
    model=MODEL,
    tokenizer=MODEL,  # use the tokenizer that matches the model
    # Only the best label is returned by default;
    # pass top_k=None to get the score of every label instead.
)

texto = "¡Esta película es fantástica: me ha encantado!"
print(sentiment(texto))

In [None]:
texto = "Por una parte está bien, por otra parte deja bastante que desear"
print(sentiment(texto))
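The pipeline returns a list of dictionaries with a `label` and a `score`. For a single-label classifier, that score is typically the softmax of the model's raw outputs (logits). The sketch below reproduces that last step by hand; the label names and logits are hypothetical values chosen for illustration, not output from the model above, and `scores_from_logits` is a helper invented for this example.

```python
import math

def scores_from_logits(logits, labels, top_k=1):
    """Softmax the logits and return the top_k labels with their probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    ranked = sorted(
        ({"label": lab, "score": e / total} for lab, e in zip(labels, exps)),
        key=lambda d: d["score"],
        reverse=True,
    )
    # top_k=None means "return every label", mirroring the pipeline option.
    return ranked if top_k is None else ranked[:top_k]

# Hypothetical label names and logits, for illustration only.
labels = ["negative", "neutral", "positive"]
logits = [-1.2, 0.3, 2.5]

print(scores_from_logits(logits, labels))              # best label only
print(scores_from_logits(logits, labels, top_k=None))  # every label, best first
```

Looking at all the scores rather than only the winner is useful for ambivalent sentences like the one above, where the probability mass may be spread over several labels.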

Fill-mask

In [None]:
nlp = pipeline('fill-mask')
nlp('I miss you. The last time I <mask> you was the best moment of my life')
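The `<mask>` placeholder is not universal: it is the mask token of the tokenizer behind the model in use, and BERT-style checkpoints expect `[MASK]` instead. With a real pipeline the right token can be read from `nlp.tokenizer.mask_token`. The helper below, `mask_word`, is a hypothetical convenience function for building masked sentences; it is not part of the transformers library.

```python
def mask_word(sentence, word, mask_token="<mask>"):
    """Replace the first occurrence of `word` with the model's mask token.

    Note: this is a plain substring match, so `word` should be distinctive.
    """
    if word not in sentence:
        raise ValueError(f"{word!r} does not appear in the sentence")
    return sentence.replace(word, mask_token, 1)

sentence = "The last time I saw you was the best moment of my life"

print(mask_word(sentence, "saw"))
# BERT-style checkpoints expect "[MASK]" instead:
print(mask_word(sentence, "saw", mask_token="[MASK]"))
```

Passing `mask_token=nlp.tokenizer.mask_token` keeps the input valid if the pipeline is later switched to a different model.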

NER

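The goal of named entity recognition (NER) is to detect proper names and classify what each one refers to (a person, an organization, a location, and so on).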

In [None]:
mi_empresa = """
Mr. Sánchez anchors the Human Resources training unit.
He designs in-house courses on leadership, compliance, and emerging tech.
His workshops earn ISO accreditation and keep skills razor-sharp.
Mrs. Carrington, also HR, wields the hiring-and-firing sword.
She scouts talent, negotiates contracts, and delivers tough exit interviews.
Together they cover growth and pruning—the full HR lifecycle.
Mr. Lux commands the Accounting desk with forensic precision.
He audits ledgers daily, closes the books monthly, and protects cash flow.
Riggeti leads IT cybersecurity, guarding servers against zero-day threats.
He runs penetration tests at dawn and patches firewalls before lunch.
Gertrudis heads the Marketing think-tank.
She crafts viral campaigns, tracks KPIs, and talks ROI like a stockbroker.
Her data-driven slogans boosted brand reach 37 % last quarter.
Mrs. Miller drives Research & Development.
She prototypes sustainable materials and files patents at breakneck speed.
Her latest bio-polymer cut production costs by 12 %.
Johnson orchestrates the Logistics nerve center.
He reroutes shipments in real time and slashes delivery delays.
His AI-powered routing saved 50 tons of CO₂ this year.
"""

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)

res = nlp(mi_empresa)
res
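By default the `ner` pipeline returns one entry per token, so a name that the tokenizer splits into word pieces appears as several entries tagged `B-PER`, `I-PER`, and so on. The library can merge them itself when the pipeline is created with `aggregation_strategy="simple"`. The sketch below shows the idea by hand. The token list is hand-written in the shape the pipeline returns (it is not real model output), and `group_entities` is a hypothetical helper.

```python
def group_entities(tokens, text):
    """Merge B-/I- tagged tokens into whole entities using character offsets."""
    entities = []
    for tok in tokens:
        prefix, _, kind = tok["entity"].partition("-")  # "B-PER" -> ("B", "PER")
        last = entities[-1] if entities else None
        # A token continues the previous entity if it has the same type and is
        # either tagged I- or is a word piece (starts with "##").
        continues = (
            last is not None
            and last["type"] == kind
            and (prefix == "I" or tok["word"].startswith("##"))
        )
        if continues:
            last["end"] = tok["end"]
        else:
            entities.append({"type": kind, "start": tok["start"], "end": tok["end"]})
    for ent in entities:
        ent["text"] = text[ent["start"]:ent["end"]]
    return entities

text = "Mrs. Carrington works with Mr. Lux."
# Hand-written tokens for illustration (not real model output).
tokens = [
    {"entity": "B-PER", "word": "Carr", "start": 5, "end": 9},
    {"entity": "I-PER", "word": "##ington", "start": 9, "end": 15},
    {"entity": "B-PER", "word": "Lux", "start": 31, "end": 34},
]

for ent in group_entities(tokens, text):
    print(ent["type"], ent["text"])
```

Using the character offsets (`start`, `end`) to slice the original text avoids having to glue the `##` word pieces back together.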

Text summarization

In [None]:


text = 'Shakespeare occupies a position unique in world literature. Other poets, such as Homer and Dante, and novelists, such as Leo Tolstoy and Charles Dickens, have transcended national barriers, but no writer’s living reputation can compare to that of Shakespeare, whose plays, written in the late 16th and early 17th centuries for a small repertory theatre, are now performed and read more often and in more countries than ever before. The prophecy of his great contemporary, the poet and dramatist Ben Jonson, that Shakespeare “was not of an age, but for all time,” has been fulfilled. It may be audacious even to attempt a definition of his greatness, but it is not so difficult to describe the gifts that enabled him to create imaginative visions of pathos and mirth that, whether read or witnessed in the theatre, fill the mind and linger there. He is a writer of great intellectual rapidity, perceptiveness, and poetic power. Other writers have had these qualities, but with Shakespeare the keenness of mind was applied not to abstruse or remote subjects but to human beings and their complete range of emotions and conflicts. Other writers have applied their keenness of mind in this way, but Shakespeare is astonishingly clever with words and images, so that his mental energy, when applied to intelligible human situations, finds full and memorable expression, convincing and imaginatively stimulating. As if this were not enough, the art form into which his creative energies went was not remote and bookish but involved the vivid stage impersonation of human beings, commanding sympathy and inviting vicarious participation. Thus, Shakespeare’s merits can survive translation into other languages and into cultures remote from that of Elizabethan England.'



In [None]:

from transformers import BartTokenizer, BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')



In [None]:

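# Tokenize the text, truncating it to the 1024 tokens the model accepts
inputs = tokenizer([text], max_length=1024, return_tensors='pt', truncation=True)

# Generate the summary with beam search (4 beams, at most 100 tokens)
summary_ids = model.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'],
                             num_beams=4, max_length=100, early_stopping=False)

for ids in summary_ids:
    short = tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
    print(len(text), len(short))  # length in characters: original vs. summary
    print(short)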
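The `num_beams=4` argument asks `generate` to use beam search: instead of always taking the single most likely next token, it keeps several candidate sequences alive and returns the best one overall. The toy sketch below illustrates the principle with a made-up probability table in which the next token depends only on the previous one. The real implementation scores tokens with the model and applies extra rules such as length penalties, so this is an illustration, not a reimplementation.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"play": 0.5, "poet": 0.5},
    "a": {"writer": 0.9, "poet": 0.1},
    "play": {"</s>": 1.0},
    "poet": {"</s>": 1.0},
    "writer": {"</s>": 1.0},
}

def beam_search(num_beams, max_length=5):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [(0.0, ["<s>"])]  # (log-probability, tokens)
    for _ in range(max_length):
        candidates = []
        for logp, toks in beams:
            if toks[-1] == "</s>":  # finished sequences are carried over unchanged
                candidates.append((logp, toks))
                continue
            for tok, p in NEXT[toks[-1]].items():
                candidates.append((logp + math.log(p), toks + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
        if all(toks[-1] == "</s>" for _, toks in beams):
            break
    return beams[0][1]

print(beam_search(num_beams=1))  # greedy search: commits to "the" at the first step
print(beam_search(num_beams=2))  # also keeps "a" alive and finds a likelier sequence
```

With one beam the search commits to "the" (probability 0.6) and ends with a sequence of total probability 0.3. With two beams it also explores "a" and finds "a writer", whose total probability is 0.36.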

