## How can I leverage state-of-the-art natural language models with only one line of code?

Introduced in transformers v2.3.0, **pipelines** provide a high-level, easy-to-use
API for running inference on a variety of downstream tasks, including:

- ***Sentence Classification _(Sentiment Analysis)_***: Indicates whether the overall sentence is positive or negative, i.e. a *binary classification* or *logistic regression* task.
- ***Token Classification (Named Entity Recognition, Part-of-Speech tagging)***: Assigns a label to each sub-entity (*token*) in the input, i.e. a classification task.
- ***Question-Answering***: Given a tuple (`question`, `context`), the model finds the span of text in `context` that answers the `question`.
- ***Mask-Filling***: Suggests possible word(s) to fill the masked position in the input, given the surrounding `context`.
- ***Summarization***: Summarizes the `input` article into a shorter article.
- ***Translation***: Translates the input from one language to another.
- ***Feature Extraction***: Maps the input to a high-dimensional vector space learned from the data.

Pipelines encapsulate the overall workflow of every NLP task:
 
 1. *Tokenization*: Splits the initial input into multiple sub-entities with ... properties (i.e. tokens).
 2. *Inference*: Maps every token to a more meaningful representation.
 3. *Decoding*: Uses the above representation to generate and/or extract the final output for the underlying task.
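
The three steps above can be illustrated with a toy, self-contained sketch. This is not how transformers implements its pipelines: the `TOY_WEIGHTS` lexicon and the `tokenize`, `infer`, `decode` and `toy_pipeline` helpers are hypothetical names invented for this illustration, with a word-score lookup standing in for a trained model.

```python
import math

# Hypothetical toy lexicon standing in for a trained model's parameters.
TOY_WEIGHTS = {"good": 2.0, "great": 2.5, "bad": -2.0, "jealous": -1.5, "ill": -1.0}


def tokenize(text):
    """Step 1 (tokenization): split the raw string into lowercase word tokens."""
    return text.lower().split()


def infer(tokens):
    """Step 2 (inference): map each token to a numeric representation (one toy score)."""
    return [TOY_WEIGHTS.get(token, 0.0) for token in tokens]


def decode(scores):
    """Step 3 (decoding): turn the representation into a task-level output."""
    logit = sum(scores)
    prob_positive = 1.0 / (1.0 + math.exp(-logit))  # logistic function
    if prob_positive >= 0.5:
        return {"label": "POSITIVE", "score": prob_positive}
    return {"label": "NEGATIVE", "score": 1.0 - prob_positive}


def toy_pipeline(text):
    """Chain the three steps, mirroring the overall shape of a pipeline call."""
    return decode(infer(tokenize(text)))


result = toy_pipeline("He is jealous and speaks ill of all")
print(result)
```

A real pipeline replaces the whitespace split with a subword tokenizer and the lookup table with a neural network, but the flow from text to tokens to representation to output is the same.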

The overall API is exposed to the end-user through the `pipeline()` function, which has the following
structure:

```python
from transformers import pipeline

# Using default model and tokenizer for the task
pipeline("<task-name>")

# Using a user-specified model
pipeline("<task-name>", model="<model_name>")

# Using a custom model and tokenizer, both given as strings
pipeline("<task-name>", model="<model_name>", tokenizer="<tokenizer_name>")
```

In [6]:
!pip install -q transformers

In [7]:
from __future__ import print_function
import ipywidgets as widgets
from transformers import pipeline

## 1. Sentence Classification - Sentiment Analysis

In [8]:
nlp_sentence_classif = pipeline('sentiment-analysis')
nlp_sentence_classif('He is jealous and speaks ill of all')

[{'label': 'NEGATIVE', 'score': 0.9987205862998962}]

## 2. Token Classification - Named Entity Recognition

In [10]:
nlp_token_class = pipeline('ner')
nlp_token_class('This agreement is made on the 1st day of May 2016  for ANAND RAJ, S/o. CJ Scaria, aged 26 years')

[{'entity': 'I-ORG', 'index': 13, 'score': 0.957835853099823, 'word': 'AN'},
 {'entity': 'I-ORG', 'index': 14, 'score': 0.9211783409118652, 'word': '##AN'},
 {'entity': 'I-ORG', 'index': 15, 'score': 0.8991051912307739, 'word': '##D'},
 {'entity': 'I-ORG', 'index': 16, 'score': 0.9329415559768677, 'word': 'RA'},
 {'entity': 'I-ORG', 'index': 17, 'score': 0.9433862566947937, 'word': '##J'},
 {'entity': 'I-PER', 'index': 23, 'score': 0.6606007814407349, 'word': 'C'},
 {'entity': 'I-PER', 'index': 25, 'score': 0.9957569241523743, 'word': 'Sc'},
 {'entity': 'I-PER',
  'index': 26,
  'score': 0.9778124690055847,
  'word': '##aria'}]
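
In the output above, words are split into WordPiece sub-tokens, and a `##` prefix marks a piece that continues the previous one. The following is a minimal post-processing sketch that merges neighbouring pieces back into readable entities. The `group_word_pieces` helper is a hypothetical name written for this illustration, and the merging rule (consecutive `index` values with the same `entity` label) is an assumption, not the library's own grouping logic.

```python
# A subset of the NER output shown above (scores shortened for readability).
ner_output = [
    {"entity": "I-ORG", "index": 13, "score": 0.9578, "word": "AN"},
    {"entity": "I-ORG", "index": 14, "score": 0.9212, "word": "##AN"},
    {"entity": "I-ORG", "index": 15, "score": 0.8991, "word": "##D"},
    {"entity": "I-ORG", "index": 16, "score": 0.9329, "word": "RA"},
    {"entity": "I-ORG", "index": 17, "score": 0.9434, "word": "##J"},
    {"entity": "I-PER", "index": 23, "score": 0.6606, "word": "C"},
    {"entity": "I-PER", "index": 25, "score": 0.9958, "word": "Sc"},
    {"entity": "I-PER", "index": 26, "score": 0.9778, "word": "##aria"},
]


def group_word_pieces(tokens):
    """Merge tokens that are adjacent in the input and share an entity label."""
    groups = []
    prev_index = None
    for tok in tokens:
        word = tok["word"]
        is_continuation = word.startswith("##")
        piece = word[2:] if is_continuation else word
        adjacent = prev_index is not None and tok["index"] == prev_index + 1
        same_label = bool(groups) and groups[-1]["entity"] == tok["entity"]
        if adjacent and same_label:
            # Continuation pieces are glued on; new words get a space.
            groups[-1]["word"] += piece if is_continuation else " " + piece
            groups[-1]["scores"].append(tok["score"])
        else:
            groups.append({"entity": tok["entity"], "word": piece, "scores": [tok["score"]]})
        prev_index = tok["index"]
    # Report the mean score of the merged pieces for each entity.
    return [
        {"entity": g["entity"], "word": g["word"], "score": sum(g["scores"]) / len(g["scores"])}
        for g in groups
    ]


entities = group_word_pieces(ner_output)
for entity in entities:
    print(entity)
```

Note that the merged result still labels the name as `I-ORG`: grouping only makes the model's prediction easier to read, it does not correct it.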

## 3. Question Answering

In [11]:
nlp_qa = pipeline('question-answering')
nlp_qa(context='The Transformer is a deep learning model introduced in 2017, used primarily in the field of natural language processing. Like recurrent neural networks, Transformers are designed to handle sequential data, such as natural language, for tasks such as translation and text summarization', question='Which tasks transformers is used for ?')

{'answer': 'translation and text summarization',
 'end': 283,
 'score': 0.9806731426892785,
 'start': 250}

In [13]:
nlp_qa = pipeline('question-answering')
nlp_qa(context='A vaccine is a biological preparation that provides active acquired immunity to a particular infectious disease. A vaccine typically contains an agent that resembles a disease-causing microorganism and is often made from weakened or killed forms of the microbe, its toxins, or one of its surface proteins. The agent stimulates the bodys immune system to recognize the agent as a threat, destroy it, and to further recognize and destroy any of the microorganisms associated with that agent that it may encounter in the future. Vaccines can be prophylactic (to prevent or ameliorate the effects of a future infection by a natural or "wild" pathogen), or therapeutic (to fight a disease that has already occurred, such as cancer)', question='What does a vaccine contain ?')

{'answer': 'an agent that resembles a disease-causing microorganism',
 'end': 197,
 'score': 0.5967107704216706,
 'start': 142}
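
In the vaccine example above, the `start` and `end` values line up with character offsets into `context`, so the answer can be located in the original text. The sketch below checks this on the first two sentences of that context, using the result copied from the output above. `highlight_answer` is a hypothetical helper written for this illustration.

```python
# First two sentences of the vaccine context used in the cell above.
context = (
    "A vaccine is a biological preparation that provides active acquired "
    "immunity to a particular infectious disease. A vaccine typically contains "
    "an agent that resembles a disease-causing microorganism and is often made "
    "from weakened or killed forms of the microbe, its toxins, or one of its "
    "surface proteins."
)

# Result copied from the pipeline output above.
result = {
    "answer": "an agent that resembles a disease-causing microorganism",
    "end": 197,
    "score": 0.5967107704216706,
    "start": 142,
}


def highlight_answer(context, result, window=20):
    """Return the answer in brackets with `window` characters of context on each side."""
    start, end = result["start"], result["end"]
    left = context[max(0, start - window):start]
    right = context[end:end + window]
    return left + "[" + context[start:end] + "]" + right


span = context[result["start"]:result["end"]]
print(span == result["answer"])
print(highlight_answer(context, result))
```

Checking the extracted span against `answer` before relying on the offsets is a cheap sanity test, since offset conventions can differ between library versions.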

## 4. Text Generation - Mask Filling

In [16]:
nlp_fill = pipeline('fill-mask')
nlp_fill('Match is going to happen at ' + nlp_fill.tokenizer.mask_token)

Some weights of RobertaForMaskedLM were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['lm_head.decoder.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[{'score': 0.7720996141433716,
  'sequence': '<s>Match is going to happen at Wembley</s>',
  'token': 15101,
  'token_str': 'ĠWembley'},
 {'score': 0.0684141218662262,
  'sequence': '<s>Match is going to happen at Anfield</s>',
  'token': 18476,
  'token_str': 'ĠAnfield'},
 {'score': 0.008386010304093361,
  'sequence': '<s>Match is going to happen at home</s>',
  'token': 184,
  'token_str': 'Ġhome'},
 {'score': 0.006723654922097921,
  'sequence': '<s>Match is going to happen at Chelsea</s>',
  'token': 3098,
  'token_str': 'ĠChelsea'},
 {'score': 0.005598089657723904,
  'sequence': '<s>Match is going to happen at Liverpool</s>',
  'token': 3426,
  'token_str': 'ĠLiverpool'}]
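
The `Ġ` at the start of each `token_str` is how the byte-level BPE vocabulary used by RoBERTa encodes a leading space, and `<s>` / `</s>` are the sequence start and end markers. The following sketch cleans up predictions copied from the output above and keeps only those above a score threshold. `clean_predictions` and the `min_score` default are hypothetical choices made for this illustration.

```python
# Three of the predictions shown above, copied from the pipeline output.
predictions = [
    {"score": 0.7720996141433716,
     "sequence": "<s>Match is going to happen at Wembley</s>",
     "token": 15101, "token_str": "ĠWembley"},
    {"score": 0.0684141218662262,
     "sequence": "<s>Match is going to happen at Anfield</s>",
     "token": 18476, "token_str": "ĠAnfield"},
    {"score": 0.008386010304093361,
     "sequence": "<s>Match is going to happen at home</s>",
     "token": 184, "token_str": "Ġhome"},
]


def clean_predictions(predictions, min_score=0.01):
    """Strip tokenizer markers and drop predictions scoring below `min_score`."""
    cleaned = []
    for pred in predictions:
        if pred["score"] < min_score:
            continue
        word = pred["token_str"].replace("Ġ", " ").strip()
        sentence = pred["sequence"].replace("<s>", "").replace("</s>", "").strip()
        cleaned.append({"word": word, "score": round(pred["score"], 3), "sentence": sentence})
    return cleaned


cleaned = clean_predictions(predictions)
for item in cleaned:
    print(item)
```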

## 5. Summarization

Summarization is currently supported by `Bart` and `T5`.

In [17]:
TEXT_TO_SUMMARIZE = """ 
Rinu Mariyam Thomas and Riya Anna Thomas- both top officials of Popular Finance, a finance company based out of Kerala- were arrested from New Delhi after scores of depositors complained that the company has cheated them. While Rinu Mariyam Thomas is the Chief Executive Officer, Rea Ann Thomas is a member of the Board of Directors and both of them are daughters of Thomas Daniel, the Managing Director (MD) of Popular Finance.


The duo were held from the Indira Gandhi International Airport on Friday while attempting to flee to Australia. The arrest was made based on a look out notice issued by the Kerala police.

Superintendent of Police of Pathanamthitta district, KG Simon confirmed to TNM that the duo will be brought to Kochi soon.


Though some reports suggested that they are already enroute to Kochi, Deputy Superintendent of Police Adoor, R Binu denied it saying that it depends on the availability of the flight from Delhi.

A team led by Konni Station House Officer PS Rajesh have reached Delhi and the duo will be brought to Kochi upon obtaining a transit warrant.

The Pathanamthitta police had issued a look-out notice against directors and board members after they fled the place following complaints by several customers that they had been swindled by the firm.

Popular Finance is headquartered in Pathanamthitta district and has 247 branches across the state. Customers allege that the company had failed to pay interest to hundreds of its depositors since this April.

 A case has been registered against all members of the board directors including Thomas Daniel.

They have been booked under various sections of the Indian Penal Code including section 406 for punishment for criminal breach of trust, section 420 for cheating and dishonestly inducing delivery of property.
"""

summarizer = pipeline('summarization')
summarizer(TEXT_TO_SUMMARIZE)

[{'summary_text': ' The duo were held from the Indira Gandhi International Airport on Friday while attempting to flee to Australia . The arrest was made based on a look out notice issued by the Kerala police . The Pathanamthitta police had issued a look-out notice against directors and board members after they fled the place .'}]

## 6. Translation

Translation is currently supported by `T5` for the language mappings English-to-French (`translation_en_to_fr`), English-to-German (`translation_en_to_de`) and English-to-Romanian (`translation_en_to_ro`).

In [18]:
# English to French
translator = pipeline('translation_en_to_fr')
translator("HuggingFace is a French company that is based in New York City. HuggingFace's mission is to solve NLP one commit at a time")

Some weights of T5ForConditionalGeneration were not initialized from the model checkpoint at t5-base and are newly initialized: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[{'translation_text': 'HuggingFace est une entreprise française basée à New York.'}]

In [19]:
# English to German
translator = pipeline('translation_en_to_de')
translator("The history of natural language processing (NLP) generally started in the 1950s, although work can be found from earlier periods.")

Some weights of T5ForConditionalGeneration were not initialized from the model checkpoint at t5-base and are newly initialized: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[{'translation_text': 'Die Geschichte der natürlichen Sprachenverarbeitung (NLP) begann im Allgemeinen in den 1950er Jahren, obwohl Arbeit aus früheren Zeiten gefunden werden kann.'}]

## 7. Text Generation

Text generation is currently supported by GPT-2, OpenAI GPT, Transformer-XL, XLNet, CTRL and Reformer.

In [20]:
text_generator = pipeline("text-generation")
text_generator("Just reminding the national media that")

Some weights of GPT2LMHeadModel were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.attn.masked_bias', 'h.1.attn.masked_bias', 'h.2.attn.masked_bias', 'h.3.attn.masked_bias', 'h.4.attn.masked_bias', 'h.5.attn.masked_bias', 'h.6.attn.masked_bias', 'h.7.attn.masked_bias', 'h.8.attn.masked_bias', 'h.9.attn.masked_bias', 'h.10.attn.masked_bias', 'h.11.attn.masked_bias', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


[{'generated_text': "Just reminding the national media that you are running an anti-Trump operation was the beginning of much of what you're doing. So just to clarify, that's not a conspiracy theory, it's a simple assertion.\n\nRODO: You"}]

## 8. Projection - Feature Extraction

In [21]:
import numpy as np
nlp_features = pipeline('feature-extraction')
output = nlp_features('Hugging Face is a French company based in Paris')
np.array(output).shape   # (Samples, Tokens, Vector Size)


(1, 12, 768)
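
The output holds one vector per token. A common way to turn it into a single sentence vector is to average over the token axis (mean pooling). The sketch below uses a random array with the same `(1, 12, 768)` shape as a stand-in for real pipeline output, so the numbers are meaningless and only the shapes and operations carry over. `cosine_similarity` is a hypothetical helper written for this illustration.

```python
import numpy as np

# Stand-in for np.array(output): random values with the shape shown above,
# i.e. (samples, tokens, vector size). Real features would come from the pipeline.
rng = np.random.default_rng(0)
features = rng.normal(size=(1, 12, 768))

# Mean pooling: average over the token axis to get one vector per sample.
sentence_vectors = features.mean(axis=1)
print(sentence_vectors.shape)


def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Compare the pooled sentence vector with the vector of the first token.
print(cosine_similarity(sentence_vectors[0], features[0, 0]))
```

Pooled vectors of this kind can then be compared across sentences, for example with cosine similarity, to build simple semantic search or clustering on top of the extracted features.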

Alright! You now have a good picture of what is possible with transformers' pipelines, and there is more
to come in future releases.

In the meantime, you can try the different pipelines with your own inputs:

In [None]:
task = widgets.Dropdown(
    options=['sentiment-analysis', 'ner', 'fill_mask'],
    value='ner',
    description='Task:',
    disabled=False
)

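# Named text_input to avoid shadowing the built-in input()
text_input = widgets.Text(
    value='',
    placeholder='Enter something',
    description='Your input:',
    disabled=False
)

def forward(_):
    # Run the pipeline matching the selected task on the submitted text
    if len(text_input.value) > 0:
        if task.value == 'ner':
            output = nlp_token_class(text_input.value)
        elif task.value == 'sentiment-analysis':
            output = nlp_sentence_classif(text_input.value)
        else:
            # Append the tokenizer's own mask token when the input has none
            mask_token = nlp_fill.tokenizer.mask_token
            if mask_token not in text_input.value:
                output = nlp_fill(text_input.value + ' ' + mask_token)
            else:
                output = nlp_fill(text_input.value)
        print(output)

text_input.on_submit(forward)
display(task, text_input)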

Dropdown(description='Task:', index=1, options=('sentiment-analysis', 'ner', 'fill_mask'), value='ner')

Text(value='', description='Your input:', placeholder='Enter something')

[{'word': 'Peter', 'score': 0.9935821294784546, 'entity': 'I-PER'}, {'word': 'Pan', 'score': 0.9901397228240967, 'entity': 'I-PER'}, {'word': 'Marseille', 'score': 0.9984904527664185, 'entity': 'I-LOC'}, {'word': 'France', 'score': 0.9998687505722046, 'entity': 'I-LOC'}]


In [None]:
context = widgets.Textarea(
    value='Einstein is famous for the general theory of relativity',
    placeholder='Enter something',
    description='Context:',
    disabled=False
)

query = widgets.Text(
    value='Why is Einstein famous for ?',
    placeholder='Enter something',
    description='Question:',
    disabled=False
)

def forward(_):
    if len(context.value) > 0 and len(query.value) > 0: 
        output = nlp_qa(question=query.value, context=context.value)            
        print(output)

query.on_submit(forward)
display(context, query)

Textarea(value='Einstein is famous for the general theory of relativity', description='Context:', placeholder=…

Text(value='Why is Einstein famous for ?', description='Question:', placeholder='Enter something')


{'score': 0.40340594113729367, 'start': 27, 'end': 54, 'answer': 'general theory of relativity'}
