## **Install the Transformers library**

In [9]:
!pip install transformers==4.9.1


In [10]:
!pip install sentencepiece
from transformers import pipeline, set_seed


## **Example 1: Classifying a sentence as POSITIVE/NEGATIVE**

In [11]:
classifier = pipeline("sentiment-analysis")
classifier("I've waiting for a HuggingFace course my whole life")

[{'label': 'POSITIVE', 'score': 0.934044361114502}]
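
To see where a `label`/`score` pair like the one above can come from, the sketch below turns two raw classifier logits into probabilities with a softmax and reports the larger one. The logit values and the helper name `to_prediction` are made up for illustration, and it is an assumption that the pipeline's post-processing follows this pattern.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits, labels=("NEGATIVE", "POSITIVE")):
    """Hypothetical helper: pick the most probable label and its score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}

# Illustrative logits only; they are not taken from the model above.
prediction = to_prediction([-1.3, 1.4])
print(prediction)
```

Under this reading, a score such as 0.93 would be the probability assigned to the winning class, not a separate confidence measure.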

## **Example 2: Classification with user-defined labels (zero-shot classification)**

In [12]:
classifier = pipeline("zero-shot-classification")
classifier("This is a course about the Transformers Library",
           candidate_labels = ['education', 'politics','business','sports'])

{'sequence': 'This is a course about the Transformers Library',
 'labels': ['education', 'business', 'sports', 'politics'],
 'scores': [0.8137630820274353,
  0.09472483396530151,
  0.05269620567560196,
  0.03881587088108063]}
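
The returned dictionary keeps labels and scores in two parallel lists, so it can help to pair them up before using the result. The following is a minimal sketch that works only on the values copied from the output above. The variable names are illustrative, and the remark about a softmax is an inference from the numbers, not something the notebook states.

```python
# Result copied from the zero-shot output above.
result = {
    "sequence": "This is a course about the Transformers Library",
    "labels": ["education", "business", "sports", "politics"],
    "scores": [0.8137630820274353, 0.09472483396530151,
               0.05269620567560196, 0.03881587088108063],
}

# Pair every label with its score; in this output they arrive sorted by score.
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]

# The scores add up to roughly 1, which is consistent with a softmax
# over the candidate labels.
total = sum(result["scores"])

for label, score in ranked:
    print(f"{label:<10} {score:.3f}")
print("top label:", top_label)
```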

## **GPT-2 for text generation**

Load the model.

In [13]:
generator = pipeline('text-generation', model='gpt2')

Example 1:

In [27]:
set_seed(123)
# Spanish prompt, matching the output shown below
generator("Hoy tenemos que",
          max_length=20,
          num_return_sequences=4)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hoy tenemos que iagos a pasado para por algo.\n\n'},
 {'generated_text': 'Hoy tenemos que iba a lo nuerza un mundo no ese qu'},
 {'generated_text': 'Hoy tenemos que iba a quésir quemir. Te más m'},
 {'generated_text': 'Hoy tenemos que été nouvelle en la jardin que la'}]

In [15]:
generator("Hey enemies, today is not",
          max_length=20,
          num_return_sequences=4)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hey enemies, today is not an opportunity for many people to celebrate the success of the first year of'},
 {'generated_text': "Hey enemies, today is not easy!\n\nWhen you're with the boss at your team's"},
 {'generated_text': 'Hey enemies, today is not a chance to forget..." "But what if I told you I do'},
 {'generated_text': 'Hey enemies, today is not a nice day. Let it be clear."\n\nAs it turns'}]
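
The first generation cell calls `set_seed(123)` because the continuations are sampled, so repeated runs can differ unless the random state is fixed. The toy sketch below illustrates that idea with Python's standard `random` module only. The vocabulary, the weights and the helper `sample_words` are invented for illustration and have nothing to do with GPT-2's actual distribution.

```python
import random

# Toy next-word distribution; words and weights are invented for illustration.
vocab = ["a", "not", "the", "easy", "nice"]
weights = [0.35, 0.25, 0.2, 0.1, 0.1]

def sample_words(seed, n=5):
    """Draw n words from the toy distribution using a fixed seed."""
    rng = random.Random(seed)
    return rng.choices(vocab, weights=weights, k=n)

first = sample_words(123)
second = sample_words(123)  # same seed, so the same draws
other = sample_words(456)   # different seed, so the draws may differ

print(first)
print(second)
print(other)
```

Calling the generator again without resetting the seed, as the second cell does, corresponds to continuing to draw from the same random stream.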

The following code shows how a sentence is tokenized, that is, encoded into the input format the GPT-2 model expects.

In [16]:
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
text = "Let us encode this sentence please, us"
encoded_input = tokenizer(text, return_tensors='pt')
encoded_input

{'input_ids': tensor([[ 5756,   514, 37773,   428,  6827,  3387,    11,   514]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}
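
The encoded output can be read without running the tokenizer again. The sketch below uses only the ids printed above and looks for repeated values. The sentence contains the word "us" twice, and id 514 appears at two positions, which suggests that both occurrences map to the same token. That mapping is an inference from the output, not something verified here.

```python
from collections import Counter

# Values copied from the tokenizer output above.
input_ids = [5756, 514, 37773, 428, 6827, 3387, 11, 514]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1]

# Collect ids that occur more than once, with the positions where they occur.
counts = Counter(input_ids)
repeated = {
    token_id: [i for i, t in enumerate(input_ids) if t == token_id]
    for token_id, n in counts.items()
    if n > 1
}

print("number of tokens:", len(input_ids))
print("repeated ids:", repeated)
# A mask made only of ones means no position is marked as padding.
print("padding present:", 0 in attention_mask)
```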

In [17]:
from transformers import GPT2Model
model = GPT2Model.from_pretrained('gpt2')
output = model(**encoded_input)
output['last_hidden_state'].shape

torch.Size([1, 8, 768])
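
The shape `[1, 8, 768]` can be read as batch size, number of tokens, and hidden size. As a small worked example, the sketch below rebuilds that shape from the token ids printed earlier. The value 768 is taken from the output above, and the variable names are illustrative.

```python
# One sentence in the batch, with the ids printed by the tokenizer cell.
input_ids = [[5756, 514, 37773, 428, 6827, 3387, 11, 514]]
hidden_size = 768  # size of the last dimension reported above

batch_size = len(input_ids)
sequence_length = len(input_ids[0])
expected_shape = (batch_size, sequence_length, hidden_size)

# Each token gets one vector of length hidden_size.
values_per_sentence = sequence_length * hidden_size

print(expected_shape)
print(values_per_sentence)
```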

## **Question Answering**

In [18]:
qa_pipeline = pipeline("question-answering")

In [28]:
context = """Machine learning (ML) is the study of computer algorithms that improve automatically through experience. 
It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as "training data", 
in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks."""

In [29]:
question = "What are machine learning models based on?"
result = qa_pipeline(question=question, context=context)
print("Answer:", result['answer'])

Answer: sample data
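
The answer is a literal span of the context, which is what makes this extractive question answering. Besides `answer`, the result dictionary of this pipeline also carries `score`, `start` and `end` entries. The sketch below does not call the model. It only shows, on one sentence of the context, how a character span identifies the answer. Using `str.find` here is an illustration, not how the pipeline locates the span.

```python
# One sentence taken from the context used above.
context = (
    "Machine learning algorithms build a model based on sample data, "
    'known as "training data", in order to make predictions or decisions '
    "without being explicitly programmed to do so."
)
answer = "sample data"

# Locate the answer as a character span inside the context.
start = context.find(answer)
end = start + len(answer)

print("start:", start, "end:", end)
print("span:", context[start:end])
```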


## **Translation**

In [21]:
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

In [22]:
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

In [30]:
en_text = "hello my friend"
tokenizer.src_lang = "en"
encoded_en = tokenizer(en_text, return_tensors="pt")

# forced_bos_token_id selects the target language of the generated text
generated_tokens = model.generate(**encoded_en, forced_bos_token_id=tokenizer.get_lang_id("es"))
print("Spanish: ", tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))

generated_tokens2 = model.generate(**encoded_en, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print("French: ", tokenizer.batch_decode(generated_tokens2, skip_special_tokens=True))

generated_tokens3 = model.generate(**encoded_en, forced_bos_token_id=tokenizer.get_lang_id("ru"))
print("Russian: ", tokenizer.batch_decode(generated_tokens3, skip_special_tokens=True))

Spanish:  ['Hola mi amigo']
French:  ['Bonjour mon ami']
Russian:  ['Здравствуйте мой друг']
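
The three translations repeat the same generate-and-decode steps with a different target language each time. One way to avoid the repetition is a small helper like the sketch below. `translate_to_many` is a hypothetical name, not part of the Transformers library, and it only uses the calls that already appear in the cell above.

```python
def translate_to_many(text, target_langs, model, tokenizer, src_lang="en"):
    """Hypothetical helper: translate one text into several target languages.

    `model` and `tokenizer` are expected to behave like the M2M100 objects
    loaded earlier in this notebook.
    """
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")

    translations = {}
    for lang in target_langs:
        # The forced first token tells the model which language to produce.
        tokens = model.generate(
            **encoded, forced_bos_token_id=tokenizer.get_lang_id(lang)
        )
        decoded = tokenizer.batch_decode(tokens, skip_special_tokens=True)
        translations[lang] = decoded[0]
    return translations
```

With the `model` and `tokenizer` from the cells above, `translate_to_many("hello my friend", ["es", "fr", "ru"], model, tokenizer)` should return a dictionary keyed by language code.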
