# Transformer models

## Transformers, what can they do?

### Sentiment analysis

In [1]:
from transformers import pipeline

# No model is specified, so the pipeline defaults to "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


In [2]:
classifier("I've been waiting for a HuggingFace course my whole life.")

[{'label': 'POSITIVE', 'score': 0.9598049521446228}]

In [3]:
# Several sentences can be passed at once as a list
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598049521446228},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

### Zero-shot classification

In [4]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


In [5]:
classifier(
    "This is a course about the Transformers library.",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'This is a course about the Transformers library.',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.871986985206604, 0.09406594932079315, 0.033947065472602844]}
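The result stores labels and scores as two parallel lists, which in this output are ordered from the highest score to the lowest. A minimal sketch of reading the best label out of such a dictionary follows; the `result` value is copied from the output above, while the helper name `top_label` and its `threshold` parameter are hypothetical and not part of the library.

```python
# Result dictionary copied from the zero-shot output shown above.
result = {
    "sequence": "This is a course about the Transformers library.",
    "labels": ["education", "business", "politics"],
    "scores": [0.871986985206604, 0.09406594932079315, 0.033947065472602844],
}


def top_label(result, threshold=0.5):
    """Return (label, score) for the highest-scoring label.

    Returns None when even the best score is below `threshold`
    (a hypothetical cut-off chosen only for illustration).
    """
    label, score = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    return (label, score) if score >= threshold else None


# Pair each label with its score for easier lookup.
label_scores = dict(zip(result["labels"], result["scores"]))

best = top_label(result)
print(best)
print(label_scores["politics"])
```

With the default threshold the helper returns the `education` entry; raising the threshold above its score makes the helper return `None`, which is one way to treat low-confidence predictions as "no label".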

### Text generation

In [6]:
from transformers import pipeline

generator = pipeline("text-generation")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
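
Access tokens are best kept out of notebook source, since anything typed into a cell is saved and shared along with the notebook. The sketch below assumes the token has been stored in an environment variable named `HF_TOKEN`; the helper name `hf_token_from_env` is hypothetical. The `pipeline` call is left as a comment so that the snippet runs without downloading a model.

```python
import os


def hf_token_from_env(var_name="HF_TOKEN"):
    """Return the access token stored in an environment variable, or None.

    `var_name` defaults to HF_TOKEN, which is an assumption about how the
    token was stored on this machine.
    """
    token = os.environ.get(var_name)
    # Treat an empty string the same as a missing variable.
    return token if token else None


token = hf_token_from_env()
print("token found" if token else "no token set")

# A token would then be passed only where authentication is needed, for example:
# generator = pipeline("text-generation", token=token)
```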


In [8]:
output = generator("In this course, we will teach you how to")
print(output[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In this course, we will teach you how to design and build smart applications and services that allow you to get started with smart projects with minimal effort, without waiting for a few weeks to get your start.

In this course, you will:


You can control how many different sequences are generated with the argument `num_return_sequences`, and the total length of the output text with the argument `max_length`. Note that `max_length` is counted in tokens and includes the prompt.

In [9]:
output = generator("In this course, we will teach you how to", max_length=15)
print(output[0]['generated_text'])

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In this course, we will teach you how to use a variety of technologies


In [10]:
output = generator("In this course, we will teach you how to", num_return_sequences=2)
output

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to use the MCCL 3D printer in order to create a small, flexible 3D printable object with a maximum size of 1.5cm. This object will be small, flexible for a'},
 {'generated_text': 'In this course, we will teach you how to develop a simple, fast way to perform one. Then, we will show you how easy it is for a software developer to make applications in an extensible way. And then, we will show you'}]

### Using any model from the Hub in a pipeline

In [11]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

Device set to use cpu


In [12]:
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to use this technology in everyday life when you want to create a happy, productive life. We will be'},
 {'generated_text': 'In this course, we will teach you how to solve all the problems that accompany this course so as not to waste time on the subject. Please share'}]

### Mask filling

In [None]:
from transformers import pipeline

unmasker = pipeline("fill-mask")

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


In [14]:
unmasker("This course will teach you all about <mask> models.", top_k=2)

[{'score': 0.19198477268218994,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04209217056632042,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

### Named entity recognition

Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.

In [16]:
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


In [17]:
ner("My name is Ahmad and I work at University of Engineering and Technology, Lahore. I was prsuing Bechelor's of Computer Science.")

[{'entity_group': 'PER',
  'score': 0.99884915,
  'word': 'Ahmad',
  'start': 11,
  'end': 16},
 {'entity_group': 'ORG',
  'score': 0.9950792,
  'word': 'University of Engineering and Technology',
  'start': 31,
  'end': 71},
 {'entity_group': 'LOC',
  'score': 0.97850055,
  'word': 'Lahore',
  'start': 73,
  'end': 79},
 {'entity_group': 'ORG',
  'score': 0.780728,
  'word': "Bechelor ' s",
  'start': 95,
  'end': 105},
 {'entity_group': 'ORG',
  'score': 0.9224738,
  'word': 'Computer Science',
  'start': 109,
  'end': 125}]

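We pass the option `grouped_entities=True` when creating the pipeline to tell it to regroup the parts of the sentence that correspond to the same entity: here "University", "of", "Engineering", "and", and "Technology" are merged into a single organization. With `grouped_entities=False`, the pipeline returns one entry per token instead. Newer releases of the library expose the same behavior through the `aggregation_strategy` argument (for example `aggregation_strategy="simple"`).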
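
To make the regrouping concrete, here is a simplified sketch that merges token-level entries into entity groups. It is an approximation for illustration, not the library's implementation: it assumes that tokens belong together when they carry the same entity type and have consecutive `index` values, and it averages their scores. The token entries are copied from the token-level output shown in this section (only the complete entries of the first sentence), and the function name `group_entities` is hypothetical.

```python
TEXT = "My name is Ahmad and I work at University of Engineering and Technology, Lahore."

# Token-level entries copied from the ungrouped pipeline output in this section.
tokens = [
    {"entity": "I-PER", "score": 0.99884915, "index": 4, "start": 11, "end": 16},
    {"entity": "I-ORG", "score": 0.99653625, "index": 9, "start": 31, "end": 41},
    {"entity": "I-ORG", "score": 0.994396, "index": 10, "start": 42, "end": 44},
    {"entity": "I-ORG", "score": 0.9961349, "index": 11, "start": 45, "end": 56},
    {"entity": "I-ORG", "score": 0.9952356, "index": 12, "start": 57, "end": 60},
    {"entity": "I-ORG", "score": 0.99309313, "index": 13, "start": 61, "end": 71},
    {"entity": "I-LOC", "score": 0.97850055, "index": 15, "start": 73, "end": 79},
]


def group_entities(tokens, text):
    """Merge consecutive tokens of the same entity type (simplified heuristic)."""
    runs = []
    for tok in tokens:
        tag = tok["entity"].split("-")[-1]  # "I-ORG" -> "ORG"
        last = runs[-1] if runs else None
        if last and last["tag"] == tag and tok["index"] == last["last_index"] + 1:
            # Same type and directly adjacent: extend the current run.
            last["scores"].append(tok["score"])
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
        else:
            # Otherwise start a new run.
            runs.append({
                "tag": tag,
                "scores": [tok["score"]],
                "start": tok["start"],
                "end": tok["end"],
                "last_index": tok["index"],
            })
    return [
        {
            "entity_group": run["tag"],
            "score": sum(run["scores"]) / len(run["scores"]),  # mean token score
            "word": text[run["start"]:run["end"]],  # slice the original text
            "start": run["start"],
            "end": run["end"],
        }
        for run in runs
    ]


groups = group_entities(tokens, TEXT)
for group in groups:
    print(group["entity_group"], group["word"], round(group["score"], 7))
```

On these entries the sketch yields the same three spans as the grouped output for the first sentence, and the averaged organization score agrees with the reported one up to rounding.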

In [18]:
from transformers import pipeline

ner = pipeline("ner", grouped_entities=False)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


In [19]:
ner("My name is Ahmad and I work at University of Engineering and Technology, Lahore. I was prsuing Bechelor's of Computer Science.")

[{'entity': 'I-PER',
  'score': 0.99884915,
  'index': 4,
  'word': 'Ahmad',
  'start': 11,
  'end': 16},
 {'entity': 'I-ORG',
  'score': 0.99653625,
  'index': 9,
  'word': 'University',
  'start': 31,
  'end': 41},
 {'entity': 'I-ORG',
  'score': 0.994396,
  'index': 10,
  'word': 'of',
  'start': 42,
  'end': 44},
 {'entity': 'I-ORG',
  'score': 0.9961349,
  'index': 11,
  'word': 'Engineering',
  'start': 45,
  'end': 56},
 {'entity': 'I-ORG',
  'score': 0.9952356,
  'index': 12,
  'word': 'and',
  'start': 57,
  'end': 60},
 {'entity': 'I-ORG',
  'score': 0.99309313,
  'index': 13,
  'word': 'Technology',
  'start': 61,
  'end': 71},
 {'entity': 'I-LOC',
  'score': 0.97850055,
  'index': 15,
  'word': 'Lahore',
  'start': 73,
  'end': 79},
 {'entity': 'I-ORG',
  'score': 0.8786444,
  'index': 22,
  'word': 'Be',
  'start': 95,
  'end': 97},
 {'entity': 'I-ORG',
  'score': 0.8017566,
  'index': 23,
  'word': '##chel',
  'start': 97,
  'end': 101},
 {'entity': 'I-ORG',
  'score': 0.

### Question answering

In [21]:
from transformers import pipeline

question_answerer = pipeline("question-answering")

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


In [27]:
question_answerer(
    context="My name is Ahmad and I work at University of Engineering and Technology, Lahore",
    question="Where do I work?",
)

{'score': 0.5376755595207214,
 'start': 31,
 'end': 79,
 'answer': 'University of Engineering and Technology, Lahore'}

> Note that this pipeline works by extracting information from the provided context; it does not generate the answer.
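
Because the answer is extracted rather than generated, the `start` and `end` values are character offsets into the context, and slicing the context with them reproduces the answer. A small check using the values copied from the output above:

```python
context = "My name is Ahmad and I work at University of Engineering and Technology, Lahore"

# Result copied from the question-answering output shown above.
result = {
    "score": 0.5376755595207214,
    "start": 31,
    "end": 79,
    "answer": "University of Engineering and Technology, Lahore",
}

# Slice the context with the returned character offsets.
extracted = context[result["start"]:result["end"]]
print(extracted)
print(extracted == result["answer"])
```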

### Summarization

In [29]:
from transformers import pipeline

summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


In [31]:
summary = summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

In [32]:
print(summary[0]['summary_text'])

 America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .


As with text generation, you can specify a `max_length` or a `min_length` for the result.
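
A minimal sketch of passing both bounds follows. The wrapper name `summarize_with_bounds` and the default values 30 and 130 are hypothetical, chosen only for illustration. To keep the snippet runnable without downloading a model, it is exercised with a stand-in `fake_summarizer` that truncates by words; the real pipeline counts tokens, not words.

```python
def summarize_with_bounds(summarizer, text, min_length=30, max_length=130):
    """Call a summarization pipeline with explicit length bounds.

    `summarizer` is any callable that accepts `min_length` and `max_length`
    keyword arguments and returns a list of dicts with a 'summary_text' key,
    as the summarization pipeline above does.
    """
    if min_length > max_length:
        raise ValueError("min_length must not exceed max_length")
    outputs = summarizer(text, min_length=min_length, max_length=max_length)
    return outputs[0]["summary_text"]


def fake_summarizer(text, min_length, max_length):
    """Stand-in for a real pipeline: keeps at most `max_length` words."""
    words = text.split()
    return [{"summary_text": " ".join(words[:max_length])}]


short = summarize_with_bounds(fake_summarizer, "one two three four five", min_length=1, max_length=3)
print(short)
```

With the pipeline created earlier, the same helper would be called as `summarize_with_bounds(summarizer, text, min_length=30, max_length=130)`.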

### Translation

In [1]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

Device set to use cpu


In [2]:
translator("Ce cours est produit par Hugging Face.")

[{'translation_text': 'This course is produced by Hugging Face.'}]

## Bias and limitations

In [13]:
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("This man works as a [MASK].")
print([r["token_str"] for r in result])

result = unmasker("This woman works as a [MASK].")
print([r["token_str"] for r in result])

BertForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another archite

['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']
['nurse', 'maid', 'teacher', 'waitress', 'prostitute']
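
One quick way to see how much the predictions depend on the gendered word in the prompt is to compare the two lists directly. The sketch below uses the lists copied from the output above; the variable names are hypothetical.

```python
# Predictions copied from the two fill-mask outputs above.
man_jobs = ["carpenter", "lawyer", "farmer", "businessman", "doctor"]
woman_jobs = ["nurse", "maid", "teacher", "waitress", "prostitute"]

# Occupations predicted for both prompts, and for only one of them.
shared = sorted(set(man_jobs) & set(woman_jobs))
only_man = sorted(set(man_jobs) - set(woman_jobs))
only_woman = sorted(set(woman_jobs) - set(man_jobs))

print("shared:", shared)
print("only for 'man':", only_man)
print("only for 'woman':", only_woman)
```

In this run the two top-5 lists have no occupation in common, even though the prompts differ by a single word.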
