In [5]:
# ! pip install tensorflow

In [6]:
# ! pip install torch

In [8]:
# ! pip install transformers

# Pipeline:
The `pipeline` function is the highest-level API in the Transformers library.
It returns an end-to-end object that performs an NLP task on one or several texts.
A pipeline bundles three steps: preprocessing (the model does not accept raw text, only numbers, so the text is tokenized into IDs), feeding those numbers to the model, and postprocessing to turn the model's output into a human-readable result.

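The three stages can be run by hand to see what a pipeline does internally. This is a minimal sketch, assuming PyTorch is installed. The checkpoint `distilbert-base-uncased-finetuned-sst-2-english` (a DistilBERT model fine-tuned for sentiment on SST-2, and the sentiment-analysis pipeline's usual default) is our choice and does not appear in the original notebook.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: a checkpoint already fine-tuned for sentiment analysis
checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

raw_inputs = ['I love you', 'I hate you']

# 1. Preprocessing: text -> token IDs, padded to a common length, as PyTorch tensors
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors='pt')

# 2. Model: token IDs -> logits (raw, unnormalized scores), one row per text
with torch.no_grad():
    logits = model(**inputs).logits

# 3. Postprocessing: logits -> probabilities -> human-readable labels
probs = torch.softmax(logits, dim=-1)
predictions = []
for row in probs:
    idx = int(row.argmax())
    predictions.append({'label': model.config.id2label[idx], 'score': float(row[idx])})

print(predictions)
```

The result has the same `label`/`score` shape as the pipeline output, because the pipeline wraps this same tokenizer, model and label lookup.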
# Sentiment Analysis Pipeline

In [7]:
from transformers import pipeline

classifier = pipeline('sentiment-analysis', model='distilgpt2')

# pass single text:
res = classifier("I've been waiting for a Huggingface course")
print(res)

# Pass multiple texts:
res = classifier(['I love you', 'I hate you'])
print(res)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at distilgpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[{'label': 'LABEL_0', 'score': 0.06559786200523376}]
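[{'label': 'LABEL_0', 'score': 0.12948733568191528}, {'label': 'LABEL_0', 'score': 0.12888683378696442}]

The warning above explains these results: distilgpt2 is a plain language model, so its classification head (`score.weight`) is newly initialized rather than trained. The `LABEL_0` predictions and their scores therefore say nothing about sentiment; sentiment analysis needs a checkpoint that was fine-tuned for the task.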

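A corrected version of the sentiment cell, as a sketch: the only change is the checkpoint, here assumed to be `distilbert-base-uncased-finetuned-sst-2-english`, whose classification head has been trained on sentiment labels, so the output uses `POSITIVE`/`NEGATIVE` instead of an untrained `LABEL_0`.

```python
from transformers import pipeline

# Assumed checkpoint: DistilBERT fine-tuned on SST-2 binary sentiment
classifier = pipeline('sentiment-analysis',
                      model='distilbert-base-uncased-finetuned-sst-2-english')

# Pass multiple texts: one {'label': ..., 'score': ...} dict per input
res = classifier(['I love you', 'I hate you'])
print(res)
```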

# Zero Shot Classification Pipeline
Classifies a text against candidate labels supplied at call time, so the labels do not need to be ones the model was trained on.

In [6]:
from transformers import pipeline 

classifier = pipeline('zero-shot-classification', model='distilgpt2')
classifier('This is a course about the Transformers library',
           candidate_labels=['education', 'ploitics', 'business'])

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at distilgpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Failed to determine 'entailment' label id from the label2id mapping in the model config. Setting to -1. Define a descriptive label2id mapping in the model config to ensure correct outputs.
Tokenizer was not supporting padding necessary for zero-shot, attempting to use  `pad_token=eos_token`


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'ploitics', 'business'],
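 'scores': [0.36338528990745544, 0.3443466126918793, 0.29226812720298767]}

These near-uniform scores are not meaningful. As the warnings show, distilgpt2 was never trained on natural language inference: its config has no 'entailment' label and its classification head is newly initialized. Zero-shot classification relies on a model fine-tuned for NLI.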

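Under the hood, the zero-shot pipeline treats the input text as a premise and turns each candidate label into a hypothesis such as "This example is education.", then uses a natural language inference (NLI) model to score how strongly the premise entails each hypothesis. A minimal sketch with an NLI-trained checkpoint, assuming `facebook/bart-large-mnli` (the pipeline's usual default; not used in the original notebook):

```python
from transformers import pipeline

# Assumed checkpoint: BART-large fine-tuned on the MultiNLI entailment dataset
classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')

result = classifier(
    'This is a course about the Transformers library',
    candidate_labels=['education', 'politics', 'business'],
    # Each label is slotted into this template to form an NLI hypothesis;
    # 'This example is {}.' is also the pipeline's default template
    hypothesis_template='This example is {}.',
)

# Labels come back sorted from most to least likely; in the default
# single-label mode the scores are normalized to sum to 1
print(result['labels'], [round(s, 3) for s in result['scores']])
```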
# Text Generation pipeline:

The text-generation pipeline autocompletes a given prompt.
The output is sampled with some randomness, so it changes each time you run the cell.

In [4]:
from transformers import pipeline

generator = pipeline('text-generation', model='distilgpt2')
generator('In this course we will teach you how to',
          max_length=30,
          num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course we will teach you how to play and take your skill set as a starter, how to play, and as a player. I will'},
 {'generated_text': 'In this course we will teach you how to convert to Java as your main operating system and write to your friends through our website at Google+.\n\n'}]

In [8]:
from transformers import pipeline

generator = pipeline('text-generation', model='distilgpt2')
generator('in this course we will teach you how to',
          max_length=30,
          num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'in this course we will teach you how to play with a real life experience. It will be a lot more about the importance of understanding and building a'},
 {'generated_text': 'in this course we will teach you how to achieve the objectives of the program. We will teach you how to achieve the objectives of the program. We'}]

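Both cells above run the text-generation pipeline with the distilgpt2 model. Unlike in the classification examples, distilgpt2 suits this task: it is a causal language model trained to predict the next token.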

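Because sampling is random, fixing the seed is the usual way to get repeatable generations while experimenting. A minimal sketch, assuming `transformers.set_seed` (which seeds Python's `random`, NumPy and PyTorch in one call) and the `max_new_tokens` argument, which counts only the generated tokens, whereas `max_length` also counts the prompt:

```python
from transformers import pipeline, set_seed

generator = pipeline('text-generation', model='distilgpt2')
prompt = 'In this course we will teach you how to'

def generate(seed):
    # Re-seed before every call so the sampled tokens are the same each time
    set_seed(seed)
    return generator(prompt, max_new_tokens=15, do_sample=True,
                     num_return_sequences=2)

first = generate(42)
second = generate(42)
print(first[0]['generated_text'])
```

On the same hardware and library versions, the same seed reproduces the same two continuations.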
# Fill Mask Pipeline
Mask filling is one of BERT's pretraining objectives: the model guesses masked words, like filling in the blanks.
Here we use `top_k=2` to ask the pipeline for the two most likely words for the `[MASK]` token.

In [2]:
from transformers import pipeline

unmasker = pipeline('fill-mask', model='bert-base-cased')
unmasker('This course will teach you all about [MASK] models.', top_k=2)


Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.2596316933631897,
  'token': 1648,
  'token_str': 'role',
  'sequence': 'This course will teach you all about role models.'},
 {'score': 0.09427264332771301,
  'token': 1103,
  'token_str': 'the',
  'sequence': 'This course will teach you all about the models.'}]

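The `[MASK]` string is specific to BERT-style tokenizers; other checkpoints use a different placeholder (RoBERTa, for instance, uses `<mask>`). A small sketch that reads the placeholder from the pipeline's tokenizer instead of hard-coding it, so the same sentence keeps working if the checkpoint is swapped:

```python
from transformers import pipeline

unmasker = pipeline('fill-mask', model='bert-base-cased')

# Read the mask placeholder from the tokenizer rather than hard-coding '[MASK]'
mask = unmasker.tokenizer.mask_token
predictions = unmasker(f'This course will teach you all about {mask} models.', top_k=2)

for p in predictions:
    # token_str is the predicted word; score is its probability at the masked position
    print(f"{p['token_str']:>6}  {p['score']:.3f}")
```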
# Token Classification Pipeline:
Named Entity Recognition (NER) is a token classification task: the model assigns a label to each token, which lets it identify entities such as people, organizations, and locations in a sentence.

In [None]:
from transformers import pipeline

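# distilgpt2 has no trained NER head, so use a checkpoint fine-tuned for NER on CoNLL-2003.
# aggregation_strategy='simple' merges sub-word tokens into whole entities
# (it replaces the deprecated grouped_entities=True flag).
ner = pipeline('ner', model='dbmdz/bert-large-cased-finetuned-conll03-english',
               aggregation_strategy='simple')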
ner('My name is Abdullah and I work at Hackules in Bangladesh')

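With `aggregation_strategy='simple'`, the NER pipeline returns one dict per entity, holding an `entity_group` (for a CoNLL-2003 checkpoint: `PER`, `ORG`, `LOC` or `MISC`), a `score`, the merged `word`, and `start`/`end` character offsets into the input. A sketch of reading those fields, assuming the `dbmdz/bert-large-cased-finetuned-conll03-english` checkpoint (not part of the original notebook):

```python
from transformers import pipeline

# Assumed checkpoint: BERT-large fine-tuned on CoNLL-2003 NER
ner = pipeline('ner', model='dbmdz/bert-large-cased-finetuned-conll03-english',
               aggregation_strategy='simple')
text = 'My name is Abdullah and I work at Hackules in Bangladesh'
entities = ner(text)

for ent in entities:
    # start/end are character offsets, so slicing the input recovers the entity text
    span = text[ent['start']:ent['end']]
    print(f"{ent['entity_group']:<4} {span!r:<14} {float(ent['score']):.3f}")
```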
# Extractive Question Answering
Another task available through the pipeline API is extractive question answering.
Given a context and a question, the model identifies the span of text in the context that contains the answer to the question.
It does not generate new text: it predicts which token starts the answer and which token ends it, and the pipeline returns the highest-scoring span from the context.

In [None]:
from transformers import pipeline

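# distilgpt2 has no trained QA head; use a DistilBERT checkpoint distilled on SQuAD
question_answerer = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')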
question_answerer(
    question='Where do I work?',
    context='My name is Abdullah and I work at Hackules in Bangladesh'
)

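Since the answer is extracted rather than generated, the returned `start` and `end` are character offsets into the context, and slicing the context with them gives back the answer. A sketch that checks this, assuming the `distilbert-base-cased-distilled-squad` checkpoint (a DistilBERT model distilled on the SQuAD dataset; not used in the original notebook):

```python
from transformers import pipeline

# Assumed checkpoint: DistilBERT distilled on SQuAD for extractive QA
question_answerer = pipeline('question-answering',
                             model='distilbert-base-cased-distilled-squad')
context = 'My name is Abdullah and I work at Hackules in Bangladesh'
result = question_answerer(question='Where do I work?', context=context)

# result holds 'score', 'start', 'end' and 'answer'
print(result)
# The answer is the slice of the context between the start and end offsets
print(context[result['start']:result['end']])
```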
# Summarization Pipeline:
Produces a short summary of a longer text, such as an article.

In [None]:
from transformers import pipeline

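# Summarization needs an encoder-decoder (seq2seq) model; distilgpt2 is decoder-only and
# cannot be loaded for this task, so use a distilled BART fine-tuned on CNN/DailyMail
summarizer = pipeline('summarization', model='sshleifer/distilbart-cnn-12-6')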
summarizer('''
It was the 1st of November yesterday, and I had decided to grind my research paper to completion. I failed at the task but I did make some progress. I also discovered that conference papers can't be more than 10 pages long and that longer ones get rejected. I really have a lot to learn about conferences and paper submissions but I don't have anybody to guide me through the steps. I am not complaining; I am just saying that it's going to take me a while but I will get there in shaa Allah!
''')

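The length of the summary can be bounded with the `min_length` and `max_length` generation arguments, both measured in tokens rather than words. A sketch, assuming the `sshleifer/distilbart-cnn-12-6` checkpoint (a distilled BART model fine-tuned on CNN/DailyMail news summaries) and arbitrarily chosen bounds of 10 and 40 tokens:

```python
from transformers import pipeline

# Assumed checkpoint: distilled BART fine-tuned for news summarization
summarizer = pipeline('summarization', model='sshleifer/distilbart-cnn-12-6')

article = (
    "It was the 1st of November yesterday, and I had decided to grind my research "
    "paper to completion. I failed at the task but I did make some progress. "
    "I also discovered that conference papers can't be more than 10 pages long "
    "and that longer ones get rejected. I really have a lot to learn about "
    "conferences and paper submissions but I don't have anybody to guide me "
    "through the steps."
)

# min_length/max_length bound the summary in tokens; do_sample=False uses
# deterministic (non-sampled) decoding
summary = summarizer(article, min_length=10, max_length=40, do_sample=False)
print(summary[0]['summary_text'])
```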
# Translation Pipeline:
The last task covered here is translation.

In [16]:
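from transformers import pipeline
# The Marian tokenizer used by Helsinki-NLP/opus-mt-* models requires sentencepiece.
# Install it, then restart the kernel before running this cell:
# ! pip install sentencepiece
translator = pipeline('translation', model='Helsinki-NLP/opus-mt-fr-en')
translator('Ce cours est produit par Hugging Face.')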

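ValueError: This tokenizer cannot be instantiated. Please make sure you have `sentencepiece` installed in order to use this tokenizer.

This error means transformers could not find `sentencepiece`. Transformers checks which optional packages are available when it is first imported, so installing `sentencepiece` in a notebook that has already imported transformers is not enough: restart the kernel and rerun the cell.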

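Each Helsinki-NLP `opus-mt` checkpoint translates a single language pair, and the pair is encoded in its name in source-target order (`fr-en` is French to English). To translate in the other direction you would pick the reversed pair; a sketch assuming the `Helsinki-NLP/opus-mt-en-fr` checkpoint, which, like the French-to-English one, needs `sentencepiece` installed:

```python
from transformers import pipeline

# Assumed checkpoint: English -> French (source-target order in the name)
translator = pipeline('translation', model='Helsinki-NLP/opus-mt-en-fr')
result = translator('This course is produced by Hugging Face.')

# One dict per input text, with the translation under 'translation_text'
print(result[0]['translation_text'])
```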
So, this notebook covered the following tasks available within the pipeline API:

- Text Classification (also called sequence classification)
- Zero-Shot Classification
- Text Generation
- Text Completion (mask filling) / Masked Language Modeling
- Token Classification
- Question Answering
- Summarization
- Translation
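Each item in this list corresponds to a task string passed to `pipeline()`, and some strings are aliases: `'sentiment-analysis'` resolves to `'text-classification'` and `'ner'` to `'token-classification'`. A sketch that checks the mapping against the library's task registry, assuming the module-level dicts `SUPPORTED_TASKS` and `TASK_ALIASES` in `transformers.pipelines` (internal details that may change between versions); it downloads no model:

```python
from transformers.pipelines import SUPPORTED_TASKS, TASK_ALIASES

# Task strings used in this notebook, keyed by the list entry they belong to
notebook_tasks = {
    'Text Classification': 'sentiment-analysis',
    'Zero-Shot Classification': 'zero-shot-classification',
    'Text Generation': 'text-generation',
    'Masked Language Modeling': 'fill-mask',
    'Token Classification': 'ner',
    'Question Answering': 'question-answering',
    'Summarization': 'summarization',
    'Translation': 'translation',
}

for name, task in notebook_tasks.items():
    # Resolve aliases such as 'ner' to the canonical task name
    canonical = TASK_ALIASES.get(task, task)
    print(f'{name:<26} {task!r:<28} -> {canonical} (registered: {canonical in SUPPORTED_TASKS})')
```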