In [1]:
from transformers import pipeline

In [2]:
classifier = pipeline("sentiment-analysis")
classifier("I am waiting for huggings face course")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9880545735359192}]

In [3]:
# we can pass several sentences
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", 
     "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598049521446228},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

Three main steps involved when we pass some text to a pipeline:
1. The text in preprocessed into a format the model can understand
2. The preprocessed inputs are passed to the model
3. The predictions of the model are post-processed, so we can make sense of them

Some of the currently available pipelines are:
* feature-extraction (get the vector representation of text)
* fill-mask
* ner (named entity recognition)
* question-answering
* sentiment-analysis
* summarization
* text-generation
* translation
* zero-shot-classification

##### zero-shot classification
Here we need to classify texts that have not been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. For this use case, the zero-shot-classification pipeline is very powerful: it allows you to specify which labels to use for the classification, so you do not have to rely on the labels of the pretrained model. 

In [7]:
classifier_1 = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


In [5]:
classifier_1(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445992469787598, 0.11197393387556076, 0.04342679679393768]}

In [6]:
classifier_1(
    "The economy is may be on a downward trend worldwide",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'The economy is may be on a downward trend worldwide',
 'labels': ['business', 'education', 'politics'],
 'scores': [0.9483700394630432, 0.028782233595848083, 0.02284778095781803]}

##### text generation
The main idea here is to provide a prompt and the model will auto-complete it by generating the remaining text. This is similar to the predictive text feature that is found on many phones.

In [8]:
generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)


Downloading:   0%|          | 0.00/665 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/523M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/0.99M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/446k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.29M [00:00<?, ?B/s]

In [11]:
# num_return_sequences will give number of sentences
# max length is the length of each sentence
generator("In India you will find",num_return_sequences = 2, max_lenght=15)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In India you will find more and more of this kind of things in India. We know the answer to the question, how do we get rid of corruption in government?"\n\nThe prime minister recently told a conference in Gujarat it would take a long'},
 {'generated_text': 'In India you will find that if the user has to perform this command, the command will just print out the following:\n\n/bin/bash\n\nAnd you can see that now, no matter what, the new command is working. In'}]

The previous examples used the default model for the task at hand, but we can use a particular model from Hub to use in a pipeline for a specific task. Let's try the distilgpt2 model!

In [12]:
generator_1 = pipeline("text-generation",
                       model = "distilgpt2")

Downloading:   0%|          | 0.00/762 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/336M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/0.99M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/446k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.29M [00:00<?, ?B/s]

In [13]:
generator_1("In India you will find",num_return_sequences = 2, max_lenght=15)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "In India you will find many cases of sexual assaults in Delhi, even when the police are investigating them, in cases like the case in Delhi, the police have already tried to identify the culprits.\n\n\n\nThe government's role in identifying"},
 {'generated_text': 'In India you will find much to admire in the Indian media coverage of the latest events at the World Cup. The report mentions that the Indian Government will not be buying "indo-Pakistani cricket equipment".\nThe report comes in response to the'}]