<figure>
<img src="../Imagenes/logo-final-ap.png"  width="80" height="80" align="left"/> 
</figure>

# <span style="color:blue"><left>Aprendizaje Profundo</left></span>

# <span style="color:red"><center>Transformers - Natural Language Processing</center></span>

<center>HuggingFace pipeline</center>

##   <span style="color:blue">Authors</span>

1. Alvaro Mauricio Montenegro Díaz, ammontenegrod@unal.edu.co
2. Daniel Mauricio Montenegro Reyes, dextronomo@gmail.com 

## <span style="color:blue">References</span> 

1. [HuggingFace. Transformers](https://huggingface.co/transformers/)
1. [HuggingFace. Intro pipeline](https://huggingface.co/course/chapter1/3?fw=pt)
1. [Google's Transformer tutorial](https://www.tensorflow.org/text/tutorials/transformer)
1. [Transformer chatbot tutorial with TensorFlow 2](https://blog.tensorflow.org/2019/05/transformer-chatbot-tutorial-with-tensorflow-2.html)
1. [Transformer Architecture: The positional encoding](https://kazemnejad.com/blog/transformer_architecture_positional_encoding/)
1. [Illustrated Self-Attention](https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a)
1. [Illustrated Attention](https://towardsdatascience.com/attn-illustrated-attention-5ec4ad276ee3#0458)
1. [Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015)](https://arxiv.org/pdf/1409.0473.pdf)
1. [Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)](https://arxiv.org/pdf/1508.04025.pdf)
1. [Attention Is All You Need (Vaswani et al., 2017)](https://arxiv.org/pdf/1706.03762.pdf)
1. [Self-Attention GAN (Zhang et al., 2018)](https://arxiv.org/pdf/1805.08318.pdf)
1. [Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014)](https://arxiv.org/pdf/1409.3215.pdf)
1. [TensorFlow's seq2seq Tutorial with Attention (Tutorial on seq2seq+attention)](https://github.com/tensorflow/nmt)
1. [Lilian Weng's Blog on Attention (Great start to attention)](https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html#a-family-of-attention-mechanisms)
1. [Jay Alammar's Blog on Seq2Seq with Attention (Great illustrations and worked example on seq2seq+attention)](https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/)
1. [Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (Wu et al., 2016)](https://arxiv.org/pdf/1609.08144.pdf)
1. [Adam: A Method for Stochastic Optimization](https://arxiv.org/pdf/1412.6980.pdf)

## <span style="color:blue">Content</span>

* [Introduction](#Introduction)
* [HuggingFace Pipeline](#HuggingFace-Pipeline)

## <span style="color:blue">Introduction</span>

Modern natural language processing tasks are essentially divided into:

1. Classification of texts. For example, sentiment analysis.
1. Automatic generation of texts, based on an initial sentence.
1. Generation of text in context: filling in the masked positions of a sentence (`fill-mask`).
1. Classification of each word in a sentence, for example part-of-speech tagging (noun, adjective, verb) or named entity recognition (`ner`): person, location, organization.
1. Generation of an answer from a question.
1. Translation from one language to another.

## <span style="color:blue">  HuggingFace Pipeline</span>

This lesson introduces `pipeline`, a HuggingFace function that gives direct access to pre-trained models (most of them trained on English text). With it, no manual pre-processing or post-processing is needed to obtain results from a pre-trained model.
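To give an idea of the post-processing that `pipeline` hides, the sketch below turns raw classifier scores (logits) into the `{'label', 'score'}` dictionary seen in the outputs of this lesson. It assumes the common single-label case, in which a softmax is applied to the logits; the logit values and the `id2label` mapping are hypothetical and do not come from a real model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical values, for illustration only.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
example_logits = [-1.5, 2.0]

result = postprocess(example_logits, id2label)
print(result)
```

The real pipeline also tokenizes the input text and runs the model before this step; only the last stage is sketched here.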

To run the original HuggingFace notebook in Colab, go to the [HuggingFace notebook](https://huggingface.co/course/chapter1/3?fw=tf). The PyTorch version is [here](https://huggingface.co/course/chapter1/3?fw=pt).

### Text Classification

In [1]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my hole life")

[{'label': 'NEGATIVE', 'score': 0.9950999617576599}]

In [2]:
classifier([
    "I've been waiting for a HuggingFace course my whole life.", 
    "I hate this so much!"
])

[{'label': 'POSITIVE', 'score': 0.9598047137260437},
 {'label': 'NEGATIVE', 'score': 0.9994558095932007}]
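The classifier returns one dictionary per input sentence, in the same order as the inputs. A minimal sketch of how such a result could be consumed is shown below; it reuses the output recorded above instead of calling the model, and the 0.9 threshold is an arbitrary choice for illustration.

```python
sentences = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

# Output recorded in the cell above (copied, not recomputed).
results = [
    {"label": "POSITIVE", "score": 0.9598047137260437},
    {"label": "NEGATIVE", "score": 0.9994558095932007},
]

# Pair every sentence with its prediction.
for sentence, res in zip(sentences, results):
    print(f"{res['label']:<8} {res['score']:.3f}  {sentence}")

# Keep only the sentences classified as negative with high confidence.
threshold = 0.9  # arbitrary cut-off for this sketch
negatives = [
    s for s, r in zip(sentences, results)
    if r["label"] == "NEGATIVE" and r["score"] >= threshold
]
print(negatives)
```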

### Generation of text from an initial sentence

In [2]:
from transformers import pipeline

generator = pipeline("text-generation")
generator('In this course we will teach you how to')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course we will teach you how to make the best choices in life and how to learn from your mistakes.\n\nLearning from those mistakes means you have the tools to be successful in your life or your business.\n\nLearning from the mistakes'}]

In [3]:
from transformers import pipeline

generator = pipeline("text-generation")
generator(
    'In this course we will teach you how to',
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course we will teach you how to use the Internet for self-care applications, how to install and run self-help packages on your PC'},
 {'generated_text': 'In this course we will teach you how to use the MCP2215 on your iPhone and the C920.\n\nYou are going to learn'}]
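Each `generated_text` in the output above begins with the prompt itself. If only the continuation is of interest, the prompt can be stripped afterwards. The sketch below does this on the output recorded above; it does not call the model.

```python
prompt = "In this course we will teach you how to"

# Output recorded in the cell above (copied, not recomputed).
results = [
    {"generated_text": "In this course we will teach you how to use the Internet for self-care applications, how to install and run self-help packages on your PC"},
    {"generated_text": "In this course we will teach you how to use the MCP2215 on your iPhone and the C920.\n\nYou are going to learn"},
]

continuations = []
for res in results:
    text = res["generated_text"]
    # Drop the prompt only when the text really starts with it.
    if text.startswith(prompt):
        text = text[len(prompt):]
    continuations.append(text.strip())

for c in continuations:
    print(repr(c))
```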

### Generating text in context

In [14]:
from transformers import pipeline

unmasker = pipeline('fill-mask')
unmasker('This course teach you about <mask> models.', top_k=2)

[{'sequence': 'This course teach you about mathematical models.',
  'score': 0.1940864473581314,
  'token': 30412,
  'token_str': ' mathematical'},
 {'sequence': 'This course teach you about computational models.',
  'score': 0.049340978264808655,
  'token': 38163,
  'token_str': ' computational'}]

### NER

In [5]:
from transformers import pipeline

ner = pipeline('ner', grouped_entities=True)
ner("My name is Alice and I work at HuggingFace in Brooklyn")

[{'entity_group': 'PER',
  'score': 0.9978796,
  'word': 'Alice',
  'start': 11,
  'end': 16},
 {'entity_group': 'ORG',
  'score': 0.99638355,
  'word': 'HuggingFace',
  'start': 31,
  'end': 42},
 {'entity_group': 'LOC',
  'score': 0.9905594,
  'word': 'Brooklyn',
  'start': 46,
  'end': 54}]
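The `start` and `end` fields are character offsets into the input sentence, with `end` exclusive. The sketch below checks this against the output recorded above and then uses the offsets to replace every entity with its group tag. The replacement runs from right to left so that earlier offsets remain valid.

```python
text = "My name is Alice and I work at HuggingFace in Brooklyn"

# Output recorded in the cell above (copied, not recomputed).
entities = [
    {"entity_group": "PER", "score": 0.9978796, "word": "Alice", "start": 11, "end": 16},
    {"entity_group": "ORG", "score": 0.99638355, "word": "HuggingFace", "start": 31, "end": 42},
    {"entity_group": "LOC", "score": 0.9905594, "word": "Brooklyn", "start": 46, "end": 54},
]

# The offsets slice the original text exactly at each entity.
for ent in entities:
    assert text[ent["start"]:ent["end"]] == ent["word"]

# Replace entities by their tags, starting from the end of the sentence.
annotated = text
for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
    tag = f"[{ent['entity_group']}]"
    annotated = annotated[:ent["start"]] + tag + annotated[ent["end"]:]

print(annotated)  # My name is [PER] and I work at [ORG] in [LOC]
```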

### Answer to questions

In [8]:
from transformers import pipeline

question_answer = pipeline("question-answering")
question_answer(
    question="Where do I work?",
    context="My name is John and I work at HuggingFace in Brooklyn"
)

{'score': 0.8432626128196716, 'start': 30, 'end': 41, 'answer': 'HuggingFace'}

### Summary of documents

In [3]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer("""
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
""")


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]
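The summary above contains spacing artifacts: a leading space, spaces before periods, and runs of several spaces. A small clean-up step, sketched below with the standard `re` module, can be applied before the text is displayed or passed to another model. The helper name `tidy` is introduced here for illustration, and the sample is a fragment of the recorded summary.

```python
import re

def tidy(text):
    """Collapse whitespace runs and remove spaces before punctuation."""
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\s+([.,;:!?])", r"\1", text)

# Fragment of the summary recorded above (copied, not recomputed).
raw = (
    " America has changed dramatically during recent years . "
    "The number of engineering graduates in the U.S. has declined in "
    "traditional engineering disciplines such as mechanical, civil,    "
    "electrical, chemical, and aeronautical engineering ."
)

cleaned = tidy(raw)
print(cleaned)
```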

### Translation

In [11]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translator("""
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
""")

[{'translation_text': 'Los Estados Unidos han cambiado drásticamente en los últimos años. No sólo ha disminuido el número de graduados en disciplinas tradicionales de ingeniería, como la ingeniería mecánica, civil, eléctrica, química y aeronáutica, sino que en la mayoría de los programas de ingeniería de las principales universidades estadounidenses se concentra y fomenta en gran medida el estudio de las ciencias de la ingeniería. Como resultado, hay una disminución de la oferta en temas de ingeniería relacionados con la infraestructura, el medio ambiente y cuestiones conexas, y una mayor concentración en temas de alta tecnología, apoyando en gran medida desarrollos científicos cada vez más complejos. Si bien este último es importante, no debe ser a expensas de la ingeniería más tradicional. Las economías en rápido desarrollo, como China y la India, así como otros países industriales de Europa y Asia, siguen fomentando y promoviendo la enseñanza de la ingeniería. Tanto China como la In

In [12]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translator("""America has changed dramatically during recent years . 
           The number of engineering graduates in the U.S. 
           has declined in traditional engineering disciplines such as mechanical, civil,    
           electrical, chemical, and aeronautical engineering . 
           Rapidly developing economies such as China and India continue to 
           encourage and advance the teaching of engineering .
""")

[{'translation_text': 'América ha cambiado dramáticamente durante los últimos años. El número de graduados de ingeniería en los EE.UU. ha disminuido en las disciplinas tradicionales de ingeniería, como la ingeniería mecánica, civil, eléctrica, química y aeronáutica. Economías en rápido desarrollo como China e India siguen alentando y promoviendo la enseñanza de la ingeniería.'}]

### Zero-shot classification

Zero-shot classification assigns a text to candidate labels even though the model was never trained on texts classified with those labels.

In [17]:
from transformers import pipeline

classifier = pipeline('zero-shot-classification')
classifier(
    "This is a course about the Transformer library",
    candidate_labels=['education', 'politics', 'bussines']
)

{'sequence': 'This is a course about the Transformer library',
 'labels': ['education', 'bussines', 'politics'],
 'scores': [0.9465058445930481, 0.04198963940143585, 0.011504470370709896]}
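In the output above, `labels` and `scores` are parallel lists, sorted from the most to the least probable label. With the default single-label setting the scores are expected to add up to approximately 1. The sketch below reads the recorded output (labels exactly as recorded); it does not call the model.

```python
# Output recorded in the cell above (copied, not recomputed).
output = {
    "sequence": "This is a course about the Transformer library",
    "labels": ["education", "bussines", "politics"],
    "scores": [0.9465058445930481, 0.04198963940143585, 0.011504470370709896],
}

# Pair every label with its score.
ranked = dict(zip(output["labels"], output["scores"]))

# The first label is the most probable one because the lists are sorted.
best_label = output["labels"][0]
total = sum(output["scores"])

print(ranked)
print(best_label, round(total, 6))
```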

### Using any hub model in a pipeline

For each specific task a particular model can be specified. In this example we are going to select the `distilgpt2` model to generate text.

In [18]:
from transformers import pipeline

generator = pipeline('text-generation', model='distilgpt2')
generator(
    'In this course, we will teach you how to',
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to successfully read and write online and how to write real time content for educational purposes.'},
 {'generated_text': 'In this course, we will teach you how to build and use the open source compiler to build a fully functional, flexible, highly open and easy to'}]

### Spanish model

In [23]:
from transformers import pipeline

generator = pipeline('text-generation', model='mrm8488/spanish-gpt2')
generator(
    'Su casa es un asco',
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Su casa es un asco.- ¡Y que lo digas!¡Con las sábanas de la abuela en la lavadora!Y ahora tengo que limpiarla'},
 {'generated_text': 'Su casa es un asco.Y además, se ve así.Con un sombrero de copa y todos esos libros...Y en la cocina me llaman de'}]

### Translators Spanish-English, English-Spanish

In [2]:
from transformers import pipeline

translator_sp_en = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
translator_sp_en("Odio mucho esto")

[{'translation_text': 'I hate this very much.'}]

In [4]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translator("I hate this very much")

[{'translation_text': 'Odio mucho esto.'}]

### Combining models


In [19]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

translator_es_en = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
translator_en_es = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")



In [20]:
frase_espanol = 'odio mucho esto'
frase_ingles = translator_es_en(frase_espanol)[0]['translation_text']

classifier(frase_ingles)

[{'label': 'NEGATIVE', 'score': 0.9995800852775574}]

In [21]:
frase_espanol = 'eso está muy bien'
frase_ingles = translator_es_en(frase_espanol)[0]['translation_text']

classifier(frase_ingles)

[{'label': 'POSITIVE', 'score': 0.9998371601104736}]
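The two steps above (translate, then classify) can be wrapped in one function. The sketch below takes the translator and the classifier as arguments, so it works with any callables that return the same structures as the pipelines. The name `classify_spanish` and the two stand-in functions are hypothetical. The stand-ins return fixed values and exist only so that the sketch runs without downloading a model.

```python
def classify_spanish(text, translate, classify):
    """Translate a Spanish sentence to English and classify its sentiment."""
    english = translate(text)[0]["translation_text"]
    prediction = classify(english)[0]
    return {"text": text, "english": english, **prediction}

# Hypothetical stand-ins that mimic the return shapes of the pipelines.
def fake_translate(text):
    return [{"translation_text": "I hate this very much."}]

def fake_classify(text):
    return [{"label": "NEGATIVE", "score": 0.99}]

out = classify_spanish("odio mucho esto", fake_translate, fake_classify)
print(out)
```

With the pipelines defined earlier, the call would be `classify_spanish(frase_espanol, translator_es_en, classifier)`.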