<a href="https://colab.research.google.com/github/d-tomas/workshops/blob/main/transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro

The 🤗 [Transformers](https://huggingface.co/transformers/) library provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet...) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with deep interoperability between JAX, PyTorch and TensorFlow.

More than 30,000 pre-trained [models](https://huggingface.co/models) and 2,000 [datasets](https://huggingface.co/datasets) are available on the Hugging Face website, covering dozens of different tasks in more than 100 languages.

This demo shows how to use [pipelines](https://huggingface.co/transformers/main_classes/pipelines.html). Pipelines are objects that hide most of the library's complex code behind a simple API dedicated to specific tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, and Question Answering.

The following examples are inspired by the 🤗 Transformers library [course](https://huggingface.co/course/chapter1/3?fw=pt).
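
To make concrete what a pipeline hides, the sketch below mimics the last of the three steps it performs (preprocessing the text, running the model, post-processing the output): turning raw model scores (logits) into a label and a probability with a softmax, as happens in the sentiment analysis case. The logits and the `id2label` mapping are made-up values for illustration, not the output of a real model, and `postprocess` is a hypothetical helper.

```python
import math

# Hypothetical raw scores (logits) for one sentence; a real pipeline gets these from the model
logits = [-1.0, 2.0]
# Hypothetical mapping from class index to label name
id2label = {0: 'NEGATIVE', 1: 'POSITIVE'}

def postprocess(logits, id2label):
    """Convert logits into a {'label', 'score'} dictionary like the one a classification pipeline returns."""
    # Softmax: exponentiate (shifted by the maximum for numerical stability) and normalise
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Keep the class with the highest probability
    best = max(range(len(probs)), key=probs.__getitem__)
    return {'label': id2label[best], 'score': probs[best]}

prediction = postprocess(logits, id2label)
print(prediction)
```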

# Initial setup

In [None]:
# Install the Transformers library
!pip install transformers[sentencepiece]

In [None]:
# Import the required libraries
from transformers import pipeline

# Sentiment analysis
Classify a sentence as expressing positive or negative sentiment.

In [None]:
# Load the sentiment analysis model ('distilbert-base-uncased-finetuned-sst-2-english' by default)
model = pipeline('sentiment-analysis')

In [None]:
# Try it!
model('This is the best keynote speech I have ever attended in my life. Praise to David!')
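
The pipeline returns a list with one dictionary per input sentence, each holding a `label` and a `score`. A minimal sketch of how that output could be consumed follows. The `results` list is hard-coded with hypothetical values so that the cell runs without the model, and `confident_label` and its 0.9 threshold are illustrative choices, not part of the library.

```python
# Hypothetical output in the format returned by the sentiment analysis pipeline
results = [
    {'label': 'POSITIVE', 'score': 0.99},
    {'label': 'NEGATIVE', 'score': 0.62},
]

def confident_label(result, threshold=0.9):
    """Return the predicted label, or 'UNCERTAIN' when the score is below the threshold."""
    return result['label'] if result['score'] >= threshold else 'UNCERTAIN'

labels = [confident_label(result) for result in results]
print(labels)
```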

# Zero-shot classification
Classify text according to a set of given labels.

In [None]:
# Load the zero-shot classification model ('facebook/bart-large-mnli' by default)
model = pipeline('zero-shot-classification')

In [None]:
# Try it!
model('This lecture is about Natural Language Processing', candidate_labels=['education', 'politics', 'business', 'sports'])
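
The result is a dictionary with the input `sequence`, the candidate `labels` and their `scores`, where each score sits at the same position as its label. The sketch below pairs them up and picks the best one. The scores are hypothetical values, hard-coded so that the cell runs without the model.

```python
# Hypothetical output in the format returned by the zero-shot classification pipeline
result = {
    'sequence': 'This lecture is about Natural Language Processing',
    'labels': ['education', 'business', 'politics', 'sports'],
    'scores': [0.85, 0.08, 0.04, 0.03],
}

# Pair every label with its score and sort from most to least likely
ranking = sorted(zip(result['labels'], result['scores']), key=lambda pair: pair[1], reverse=True)
best_label, best_score = ranking[0]
print(best_label, best_score)
```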

# Text generation
Predict the words that follow a given text prompt, producing a coherent continuation of that context.

In [None]:
# Load the text generation model ('gpt2' by default)
model = pipeline('text-generation')

In [None]:
# Try it! (you will get a different output each time)
model('I opened the door and found')

In [None]:
# Try it tuning some parameters (maximum length in tokens, prompt included, and number of returned sequences)!
model('The book was amazing', max_length=40, num_return_sequences=3)
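
Each returned sequence is a dictionary with a `generated_text` key and, by default, that text starts with the prompt. A minimal sketch for keeping only the continuation follows. The `outputs` list holds made-up strings in the pipeline's output format, and `continuation` is a hypothetical helper.

```python
prompt = 'The book was amazing'

# Made-up output in the format returned by the text generation pipeline
outputs = [
    {'generated_text': 'The book was amazing, and I read it twice.'},
    {'generated_text': 'The book was amazing. The film was not.'},
]

def continuation(output, prompt):
    """Strip the prompt from the beginning of a generated text, if present."""
    text = output['generated_text']
    return text[len(prompt):] if text.startswith(prompt) else text

continuations = [continuation(output, prompt) for output in outputs]
print(continuations)
```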

# Masked language modelling
Mask a token in a sequence with a masking token, and prompt the model to fill that mask with an appropriate token.

In [None]:
# Load the masked language modelling model ('distilroberta-base' by default)
model = pipeline('fill-mask')

In [None]:
# Try it (returning the 'top_k' most likely tokens)!
model('I <mask> this lecture.', top_k=5)
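
The mask token depends on the model: `<mask>` for RoBERTa-style models such as the default one, `[MASK]` for BERT-style models. The sketch below builds the prompt from a template so the same sentence can be reused across models. `build_prompt` is a hypothetical helper, and in the notebook the token could be read from `model.tokenizer.mask_token` instead of being typed by hand.

```python
def build_prompt(template, mask_token):
    """Insert the model-specific mask token into a template containing '{mask}'."""
    return template.format(mask=mask_token)

# Mask tokens are written by hand here; with a loaded pipeline,
# model.tokenizer.mask_token gives the right one for that model
roberta_prompt = build_prompt('I {mask} this lecture.', '<mask>')
bert_prompt = build_prompt('I {mask} this lecture.', '[MASK]')
print(roberta_prompt)
print(bert_prompt)
```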

# Named entity recognition
Classify tokens according to a class (e.g. person, organisation or location).

In [None]:
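# Load the named entity recognition model ('dbmdz/bert-large-cased-finetuned-conll03-english' by default)
# aggregation_strategy='simple' merges sub-word tokens into whole entities (it replaces the deprecated grouped_entities=True)
model = pipeline('ner', aggregation_strategy='simple')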

In [None]:
# Try it!
model('My name is David and I live in Spain.')
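
With entity grouping enabled, the pipeline returns one dictionary per entity, with its `entity_group`, `score`, `word`, and the `start` and `end` character offsets in the input. A minimal sketch that collects the entities by type follows. The `entities` list is built by hand with made-up scores so that the cell runs without the model, and `make_entity` is a hypothetical helper.

```python
from collections import defaultdict

text = 'My name is David and I live in Spain.'

def make_entity(group, word, score):
    """Build a dictionary in the pipeline's output format, locating the word in the text."""
    start = text.index(word)
    return {'entity_group': group, 'score': score, 'word': word,
            'start': start, 'end': start + len(word)}

# Hypothetical output for the sentence above (scores are made up)
entities = [make_entity('PER', 'David', 0.99), make_entity('LOC', 'Spain', 0.99)]

# Collect the entity text by type, slicing the input with the character offsets
by_type = defaultdict(list)
for item in entities:
    by_type[item['entity_group']].append(text[item['start']:item['end']])

print(dict(by_type))
```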

# Question answering
Extract an answer from a text given a question.

In [None]:
# Load the question answering model ('distilbert-base-cased-distilled-squad' by default)
model = pipeline('question-answering')

In [None]:
# Try it!
model(question='Where do I work?', context='My name is David and I work really hard at the University of Alicante')
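
The answer is extracted from the context rather than generated: the result carries the `answer` text, a `score`, and the `start` and `end` character offsets of the answer within the context. The sketch below checks that relationship on a hand-built result. The score is a made-up value, and the answer span is an assumption about what the model would select.

```python
context = 'My name is David and I work really hard at the University of Alicante'
answer = 'University of Alicante'

# Hand-built result in the format returned by the question answering pipeline
start = context.index(answer)
result = {'score': 0.9, 'start': start, 'end': start + len(answer), 'answer': answer}

# Slicing the context with the offsets recovers the answer
span = context[result['start']:result['end']]
print(span)
```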

# Machine translation
Translate from one language to another.

In [None]:
# Load the machine translation model from ES to EN ('Helsinki-NLP/opus-mt-es-en')
# Try other language pairs by changing 'Helsinki-NLP/opus-mt-{src}-{tgt}' (src = source language, tgt = target language)
model = pipeline('translation', model='Helsinki-NLP/opus-mt-es-en')

In [None]:
# Try it!
model('Ojalá el próximo año pueda ir a Alicante')
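
The model name encodes the language pair, so switching direction only means changing the two language codes. A small sketch of that naming pattern follows. `opus_mt_model_name` is a hypothetical helper, and not every language pair is guaranteed to have a model available.

```python
def opus_mt_model_name(src, tgt):
    """Build a 'Helsinki-NLP/opus-mt-{src}-{tgt}' model name from two language codes."""
    return f'Helsinki-NLP/opus-mt-{src}-{tgt}'

es_to_en = opus_mt_model_name('es', 'en')
en_to_es = opus_mt_model_name('en', 'es')
print(es_to_en)
print(en_to_es)
# In the notebook, the name would be passed as pipeline('translation', model=es_to_en)
```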