# Transformers, what can they do?

This is an introductory notebook to quickly go over the main tasks we can do with the `transformers` library.

We'll start by importing the `pipeline` function, which is the high-level API of the library. It lets you quickly use pre-trained models on a given task: all you need to do is specify the task you want to perform. As we'll see, you can also specify the model you want to use.

Notice in the next cell that we first instantiate the `pipeline` object, and then we classify the sentiment of a given sentence.

In [5]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for the Huggingface course my whole life!")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9881473183631897}]

As we can see, the pre-trained model is downloaded from the Hugging Face Model Hub and then used to classify the sentiment of the sentence. In this case, the default model is `distilbert-base-uncased-finetuned-sst-2-english`: a DistilBERT model (a smaller, distilled variant of BERT) that was fine-tuned on English text (the SST-2 dataset) to classify the sentiment of a sentence.

Note: once the model has been downloaded, it is cached on your machine. This speeds up future runs of the notebook since you won't have to download it again.
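The model can also be named explicitly instead of relying on the task default. The following is a minimal sketch, assuming the model ID printed in the log above; `MODEL_ID` and `build_classifier` are illustrative names, not part of the library. The import sits inside the function so that nothing is downloaded until the function is actually called.

```python
# Model ID taken from the "No model was supplied" message above.
MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"


def build_classifier(model_id: str = MODEL_ID):
    """Return a sentiment-analysis pipeline pinned to a specific model.

    `build_classifier` is a hypothetical helper for this notebook; only
    `pipeline(task, model=...)` comes from the transformers library.
    """
    # Imported lazily: the download/cache lookup happens on first call.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=model_id)
```

Calling `build_classifier()("I hate this so much")` should then behave like the default classifier, but without depending on whichever default the library ships with.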

Now, let's pass a list of strings to the classifier.

In [7]:
classifier(["I've been waiting for the Huggingface course my whole life!", 
"I hate this so much"])

[{'label': 'POSITIVE', 'score': 0.9881473183631897},
 {'label': 'NEGATIVE', 'score': 0.9995144605636597}]

There are three main steps involved when you pass some text to a pipeline:

1. Tokenization: The text is preprocessed into a format the model can understand.
2. Model inference: The tokenized inputs (token IDs) are run through the model in a forward pass.
3. Post-processing: The model's predictions are converted into something you can make sense of, such as labels and scores.
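The three steps can be sketched with toy stand-ins. This is not how the library implements them: the vocabulary, the per-token logits, and all function names below are made up for illustration, and a real pipeline uses a learned subword tokenizer and a neural network. Only the overall shape (text to IDs, IDs to logits, logits to a label and score) carries over.

```python
import math

# Hypothetical toy vocabulary; real tokenizers use learned subword vocabularies.
VOCAB = {"[UNK]": 0, "i": 1, "love": 2, "hate": 3, "this": 4}
# Hypothetical contribution of each token ID to the (NEGATIVE, POSITIVE) logits.
TOKEN_LOGITS = {2: (-2.0, 2.0), 3: (2.0, -2.0)}
LABELS = ["NEGATIVE", "POSITIVE"]


def tokenize(text):
    """Step 1: map text to integer token IDs."""
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]


def forward(token_ids):
    """Step 2: stand-in for the model's forward pass; one logit per label."""
    neg = sum(TOKEN_LOGITS.get(t, (0.0, 0.0))[0] for t in token_ids)
    pos = sum(TOKEN_LOGITS.get(t, (0.0, 0.0))[1] for t in token_ids)
    return [neg, pos]


def postprocess(logits):
    """Step 3: softmax the logits and report the most likely label."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    return postprocess(forward(tokenize(text)))


result = toy_pipeline("I love this")
print(result)
```

The returned dictionary has the same `label` and `score` keys as the classifier output shown earlier, which is why the score reads as a probability between 0 and 1.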

There are a ton of different tasks you can do with `pipeline`. Here are a few examples:

* feature-extraction (get the vector representation of a text)
* fill-mask
* ner (named entity recognition)
* question-answering
* sentiment-analysis
* summarization
* text-generation
* translation
* zero-shot-classification

We will cover some of these below.

What do you think the next task does?

In [10]:
classifier = pipeline("zero-shot-classification")
classifier("This is a course about the Transformers library",
candidate_labels=["education", "politics", "business"])

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445993065834045, 0.11197397857904434, 0.043426692485809326]}
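The result pairs each candidate label with a score, sorted from most to least likely. A small sketch of reading it, using the output above copied in as a literal (so no model is needed); the variable names are illustrative:

```python
# Output of the zero-shot classifier above, copied as a literal.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445993065834045, 0.11197397857904434, 0.043426692485809326],
}

# Labels and scores are parallel lists, so zip pairs them up.
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]

# In this output the scores sum to (approximately) 1 across the candidates.
total = sum(result["scores"])

for label, score in ranked:
    print(f"{label:<10} {score:.3f}")
```

Note that none of the candidate labels had to be seen during training: they are supplied at call time, which is what "zero-shot" refers to here.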

In [11]:
# `grouped_entities=True` is deprecated (removal planned for v5.0.0);
# `aggregation_strategy="simple"` is the equivalent setting.
ner = pipeline('ner', aggregation_strategy="simple")
ner("The current CEO of the CER is Gitane De Silva.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


[{'entity_group': 'ORG',
  'score': 0.9947985,
  'word': 'CER',
  'start': 23,
  'end': 26},
 {'entity_group': 'PER',
  'score': 0.99895877,
  'word': 'Gitane De Silva',
  'start': 30,
  'end': 45}]
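The `start` and `end` values are character offsets into the input string, so each entity can be recovered by slicing. The sketch below copies the output above as a literal and marks the entities inline; `annotate` is a hypothetical helper, not a library function.

```python
text = "The current CEO of the CER is Gitane De Silva."

# NER output from above, copied as a literal (scores omitted for brevity).
entities = [
    {"entity_group": "ORG", "word": "CER", "start": 23, "end": 26},
    {"entity_group": "PER", "word": "Gitane De Silva", "start": 30, "end": 45},
]


def annotate(text, entities):
    """Wrap each entity span as [span](LABEL)."""
    out = text
    # Work from the end of the string so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        span = out[ent["start"]:ent["end"]]
        out = (
            out[:ent["start"]]
            + f"[{span}]({ent['entity_group']})"
            + out[ent["end"]:]
        )
    return out


annotated = annotate(text, entities)
print(annotated)
```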

In [16]:
question_answerer = pipeline('question-answering')
question_answerer(question="Which bachelor's degree did Gitane De Silva recieve?", 
context="""The current CEO of the CER is Gitane De Silva. Ms. De Silva became the Chief Executive Officer of the CER in August 2020. Prior to joining the CER, she was a Special Advisor at TransAlta Corporation. She previously served as Alberta's Senior Representative to the United States and as Deputy Minister for Alberta International and Intergovernmental Relations.

Before joining the Alberta Public Service, Ms. De Silva spent 12 years in Canada's Foreign Service as a specialist in Canada-U.S. relations, serving in a variety of roles, including as Consul General of Canada in Chicago and as Counsellor (Environment & Fisheries) at the Canadian Embassy in Washington, D.C. She also served as Deputy Head of Agency at Status of Women Canada.

Ms. De Silva has a Bachelor of Arts in International Relations from the University of British Columbia and is a 2013 recipient of The International Alliance for Women (TIAW) World of Difference Award.""",)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


{'score': 0.9861729145050049,
 'start': 758,
 'end': 801,
 'answer': 'Bachelor of Arts in International Relations'}
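The question-answering pipeline is extractive: the answer is a span of the context, and `start` and `end` locate it by character position (801 - 758 = 43, the length of the answer string). A minimal sketch of that relationship, assuming a shortened context made of only the final sentence, so the offsets here differ from the ones reported above:

```python
# Shortened context (part of the last sentence of the passage above), so the
# offsets differ from the 758 and 801 reported for the full context.
context = (
    "Ms. De Silva has a Bachelor of Arts in International Relations "
    "from the University of British Columbia."
)
answer = "Bachelor of Arts in International Relations"

# Character offsets of the answer within the context string.
start = context.find(answer)
end = start + len(answer)

# Show the answer with a little surrounding context.
window = 10  # hypothetical number of characters to show on each side
snippet = context[max(0, start - window):end + window]

print(start, end)
print(snippet)
```

Because the answer is always copied out of the context, this kind of model cannot answer questions whose answer is not literally present in the text it is given.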