In [1]:
!pip install datasets evaluate transformers[sentencepiece] --quiet


In [2]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu --quiet


In [3]:
import torch
from transformers import pipeline

In [4]:
# Sentiment analysis

In [5]:
classifier = pipeline("sentiment-analysis", model="pysentimiento/robertuito-sentiment-analysis")

resultado = classifier("Odio aprender sobre inteligencia artificial.")
print(resultado)

Device set to use cpu


[{'label': 'NEG', 'score': 0.9434412717819214}]


In [6]:
classifier = pipeline("sentiment-analysis", model="pysentimiento/robertuito-sentiment-analysis")

resultado = classifier("Amo aprender sobre inteligencia artificial.")
print(resultado)

Device set to use cpu


[{'label': 'POS', 'score': 0.9226016402244568}]
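
The classifier returns a list with one dict per input, each holding a `label` and a `score`. A minimal sketch of turning that structure into a readable verdict follows. It reuses the two outputs printed above as sample data, so no model is loaded. The helper `describe`, the `min_score` cutoff, and the `NEU` label are assumptions for illustration; only `NEG` and `POS` appear in the outputs above.

```python
# Sample predictions copied from the two outputs above.
resultados = [
    {"label": "NEG", "score": 0.9434412717819214},
    {"label": "POS", "score": 0.9226016402244568},
]

# Assumed label set: NEG and POS appear above; NEU is assumed to exist as well.
LABEL_NAMES = {"NEG": "negative", "NEU": "neutral", "POS": "positive"}


def describe(prediction, min_score=0.6):
    """Return a readable label, or 'uncertain' when the score is below min_score."""
    if prediction["score"] < min_score:
        return "uncertain"
    # Fall back to the raw label if the model emits one that is not in the map.
    return LABEL_NAMES.get(prediction["label"], prediction["label"])


for r in resultados:
    print(f"{describe(r)} ({r['score']:.2%})")
```

The 0.6 cutoff is arbitrary; a real application would probably tune it on labelled examples.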


In [7]:
# Zero-shot classification (sorting text into categories)

In [8]:
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

resultado = classifier(
    "La política es el arte de gobernar y organizar la vida en sociedad. A través de instituciones, leyes y decisiones, busca resolver conflictos, distribuir recursos y garantizar el bien común.",
    candidate_labels=["educación", "política", "negocios"],
    hypothesis_template="Este texto trata sobre {}."
)

print(resultado)

Device set to use cpu


{'sequence': 'La política es el arte de gobernar y organizar la vida en sociedad. A través de instituciones, leyes y decisiones, busca resolver conflictos, distribuir recursos y garantizar el bien común.', 'labels': ['política', 'educación', 'negocios'], 'scores': [0.9916322827339172, 0.004312432371079922, 0.004055291414260864]}
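
The zero-shot result stores `labels` and `scores` as two parallel lists, ordered from most to least likely. A small sketch of pairing them up and picking the winner is shown here, using the values printed above as sample data. The variable names `ranking` and `mejor_etiqueta` are illustrative, not part of the library.

```python
# Labels and scores copied from the output above (the 'sequence' key is omitted).
resultado = {
    "labels": ["política", "educación", "negocios"],
    "scores": [0.9916322827339172, 0.004312432371079922, 0.004055291414260864],
}

# Pair each label with its score so it can be looked up by name.
ranking = dict(zip(resultado["labels"], resultado["scores"]))

# Select the label with the highest score explicitly instead of relying on list order.
mejor_etiqueta = max(ranking, key=ranking.get)

print(mejor_etiqueta, round(ranking[mejor_etiqueta], 4))
```

In this output the three scores add up to roughly 1, so they can be read as a distribution over the candidate labels.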


In [9]:
# Text generation

In [10]:
generator = pipeline("text-generation", model="DeepESP/gpt2-spanish")

resultado = generator(
    "En este curso aprenderás sobre inteligencia artificial",
    max_length=50,
    num_return_sequences=2
)

print(resultado)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


[{'generated_text': 'En este curso aprenderás sobre inteligencia artificial. Como ya sabrás, la mayoría de los ordenadores pueden aprender sobre ordenadores basados en ordenadores, como ordenadores portátiles y ordenadores para crear aplicaciones inteligentes a partir de ordenadores. \n\nSin embargo, a ti te va la'}, {'generated_text': 'En este curso aprenderás sobre inteligencia artificial. \n\n—Lo haré si se me quita el sueño —advirtió Klaus, algo sorprendido por la conversación entre Julian y Julian. \n\n—Es una idea magnífica, pero tengo que decir algo. Julian tiene'}]


In [11]:
generator = pipeline("text-generation", model="DeepESP/gpt2-spanish")

resultado = generator(
    "En este curso aprenderás sobre inteligencia artificial",
)
print(resultado)

Device set to use cpu


[{'generated_text': 'En este curso aprenderás sobre inteligencia artificial. \n\nLa clase de la enseñanza está destinada a ser muy útil a un niño. Puedes elegir de buen profesor o de maestro o de discípulo, para que puedas usar tu sistema informático. \n\nEs importante'}]
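
In both generation outputs, `generated_text` starts with the prompt itself. A minimal sketch of separating the newly generated part from the prompt is given below, using the last output above as sample data. The helper name `continuation` is hypothetical.

```python
prompt = "En este curso aprenderás sobre inteligencia artificial"

# Output copied from the cell above.
resultado = [{
    "generated_text": (
        "En este curso aprenderás sobre inteligencia artificial. \n\n"
        "La clase de la enseñanza está destinada a ser muy útil a un niño. "
        "Puedes elegir de buen profesor o de maestro o de discípulo, "
        "para que puedas usar tu sistema informático. \n\nEs importante"
    )
}]


def continuation(prompt, generated_text):
    """Return only the text that follows the prompt, if the prompt is a prefix."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # Leave the text untouched when it does not begin with the prompt.
    return generated_text


nuevo_texto = continuation(prompt, resultado[0]["generated_text"])
print(nuevo_texto)
```

The text-generation pipeline also accepts `return_full_text=False`, which should give a similar result without post-processing; the manual version here only makes the structure of the output explicit.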


In [12]:
# Fill-mask (complete the text with the most likely word)

In [13]:
unmasker = pipeline("fill-mask", model="dccuchile/bert-base-spanish-wwm-cased")

resultado = unmasker("Este curso enseña sobre modelos [MASK].", top_k=3)

print(resultado)

BertForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Device set to use cpu


[{'score': 0.08100215345621109, 'token': 3792, 'token_str': 'sociales', 'sequence': 'Este curso enseña sobre modelos sociales.'}, {'score': 0.06293947249650955, 'token': 6156, 'token_str': 'económicos', 'sequence': 'Este curso enseña sobre modelos económicos.'}, {'score': 0.027612948790192604, 'token': 3, 'token_str': '[UNK]', 'sequence': 'Este curso enseña sobre modelos.'}]
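
The third prediction above is the `[UNK]` token, which is not a usable completion. A small sketch of filtering such entries out is shown below, using the printed output as sample data. The set `SPECIAL_TOKENS` is an assumption based on the usual BERT special tokens; only `[UNK]` and `[MASK]` actually appear in this notebook.

```python
# Predictions copied from the output above.
resultado = [
    {"score": 0.08100215345621109, "token": 3792, "token_str": "sociales",
     "sequence": "Este curso enseña sobre modelos sociales."},
    {"score": 0.06293947249650955, "token": 6156, "token_str": "económicos",
     "sequence": "Este curso enseña sobre modelos económicos."},
    {"score": 0.027612948790192604, "token": 3, "token_str": "[UNK]",
     "sequence": "Este curso enseña sobre modelos."},
]

# Assumed list of BERT-style special tokens to discard.
SPECIAL_TOKENS = {"[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"}

# Keep only predictions whose token is a real word piece.
candidatos = [r for r in resultado if r["token_str"] not in SPECIAL_TOKENS]

for c in candidatos:
    print(f"{c['token_str']}: {c['score']:.3f}")
```

Because filtering can shrink the list, asking for a larger `top_k` than needed leaves room for the discarded entries.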


In [14]:
# Named entity recognition (identifying proper nouns in a text)

In [15]:
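# Note: this is a base checkpoint without a fine-tuned NER head, so the labels it produces are not meaningful.
# aggregation_strategy="simple" replaces the deprecated grouped_entities=True.
ner = pipeline("ner", model="PlanTL-GOB-ES/roberta-base-bne", aggregation_strategy="simple")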

resultado = ner("Me llamo Pedro y trabajo en Telefónica en Madrid.")

print(resultado)

Some weights of RobertaForTokenClassification were not initialized from the model checkpoint at PlanTL-GOB-ES/roberta-base-bne and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


[{'entity_group': 'LABEL_0', 'score': np.float32(0.5211671), 'word': 'Me', 'start': 0, 'end': 2}, {'entity_group': 'LABEL_1', 'score': np.float32(0.54079884), 'word': ' llamo Pedro y trabajo en', 'start': 3, 'end': 27}, {'entity_group': 'LABEL_0', 'score': np.float32(0.510976), 'word': ' Telefónica en Madrid.', 'start': 28, 'end': 49}]


In [17]:
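# This checkpoint is fine-tuned for NER; aggregation_strategy="simple" replaces the deprecated grouped_entities=True.
ner = pipeline("ner", model="mrm8488/bert-spanish-cased-finetuned-ner", aggregation_strategy="simple")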
resultado = ner("Me llamo Pedro y trabajo en Telefónica en Madrid.")
print(resultado)

Some weights of the model checkpoint at mrm8488/bert-spanish-cased-finetuned-ner were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


[{'entity_group': 'PER', 'score': np.float32(0.9989525), 'word': 'Pedro', 'start': 9, 'end': 14}, {'entity_group': 'ORG', 'score': np.float32(0.90264404), 'word': 'Telefónica', 'start': 28, 'end': 38}, {'entity_group': 'LOC', 'score': np.float32(0.9998223), 'word': 'Madrid', 'start': 42, 'end': 48}]
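
Each grouped entity carries `start` and `end` character offsets into the input text. The sketch below checks that those offsets really slice out the reported `word` and then groups the entities by type. It uses the output above as sample data, with the `np.float32` scores written as plain floats. The names `offsets_ok` and `por_tipo` are illustrative.

```python
from collections import defaultdict

texto = "Me llamo Pedro y trabajo en Telefónica en Madrid."

# Entities copied from the output above (scores converted to plain floats).
entidades = [
    {"entity_group": "PER", "score": 0.9989525, "word": "Pedro", "start": 9, "end": 14},
    {"entity_group": "ORG", "score": 0.90264404, "word": "Telefónica", "start": 28, "end": 38},
    {"entity_group": "LOC", "score": 0.9998223, "word": "Madrid", "start": 42, "end": 48},
]

# The offsets index into the original string: start is inclusive, end is exclusive.
offsets_ok = all(texto[e["start"]:e["end"]] == e["word"] for e in entidades)

# Collect the entity words under their entity type.
por_tipo = defaultdict(list)
for e in entidades:
    por_tipo[e["entity_group"]].append(e["word"])

print(offsets_ok)
print(dict(por_tipo))  # {'PER': ['Pedro'], 'ORG': ['Telefónica'], 'LOC': ['Madrid']}
```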


In [None]:
# Question answering (extracting answers from a given context)

In [19]:
qa = pipeline("question-answering", model="PlanTL-GOB-ES/roberta-large-bne-sqac")

resultado = qa(
    question="¿Dónde trabajo?",
    context="Me llamo Pedro y trabajo en Telefónica en Madrid."
)

print(resultado)

Device set to use cpu


{'score': 0.568343997001648, 'start': 25, 'end': 38, 'answer': 'en Telefónica'}
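
The question-answering result is extractive: `start` and `end` point at a span of the context, and `score` reflects how confident the model is in that span. A minimal sketch of recovering the span and rejecting low-confidence answers is shown below, using the output above as sample data. The helper `answer_or_none` and its `min_score` default are assumptions for illustration.

```python
contexto = "Me llamo Pedro y trabajo en Telefónica en Madrid."

# Result copied from the output above.
resultado = {"score": 0.568343997001648, "start": 25, "end": 38, "answer": "en Telefónica"}

# The answer is a slice of the context, not newly generated text.
span = contexto[resultado["start"]:resultado["end"]]


def answer_or_none(resultado, min_score=0.5):
    """Return the answer only when the model's score reaches min_score."""
    if resultado["score"] >= min_score:
        return resultado["answer"]
    return None


print(span)
print(answer_or_none(resultado))
```

With a score of about 0.57 the answer passes a 0.5 cutoff but would be rejected by a stricter one, so the threshold choice matters for this example.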


In [None]:
# Text summarization

In [21]:
summarizer = pipeline("summarization", model="csebuetnlp/mT5_multilingual_XLSum")

texto = """
América ha cambiado drásticamente en los últimos años. No solo ha disminuido el número 
de graduados en ingeniería, sino que muchas universidades ahora se enfocan en ciencias aplicadas.
China e India, en cambio, continúan formando más ingenieros.
"""

resultado = summarizer(texto)

print(resultado)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Device set to use cpu
Your max_length is set to 84, but your input_length is only 68. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=34)


[{'summary_text': 'En los últimos años, el número de ingenieros en Estados Unidos ha disminuido.'}]
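
The warning above reports that the default `max_length` (84) exceeds the input length (68 tokens) and suggests `max_length=34`, that is, half the input. A small sketch of deriving length bounds from the input length in the same way is given below. The helper `summary_lengths`, its `ratio` and `floor` parameters, and the idea of passing a `min_length` are assumptions, not something the notebook does.

```python
def summary_lengths(input_length, ratio=0.5, floor=5):
    """Pick (min_length, max_length) in tokens as a fraction of the input length."""
    # Never go below `floor` tokens, even for very short inputs.
    max_length = max(floor, int(input_length * ratio))
    # Keep min_length at or below max_length.
    min_length = min(floor, max_length)
    return min_length, max_length


# 68 is the input length reported in the warning above.
min_length, max_length = summary_lengths(68)
print(min_length, max_length)
```

These values could then be passed to the call, for example `summarizer(texto, min_length=min_length, max_length=max_length)`. The input length here is taken from the warning; counting tokens directly would require the model's tokenizer.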


In [None]:
# Text translation