In [1]:
from transformers import pipeline

In [2]:
classifier = pipeline("sentiment-analysis")
classifier("I am waiting for huggings face course")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9880545735359192}]

In [3]:
# we can pass several sentences
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", 
     "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598049521446228},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

Three main steps are involved when we pass some text to a pipeline:
1. The text is preprocessed into a format the model can understand
2. The preprocessed inputs are passed to the model
3. The predictions of the model are post-processed, so we can make sense of them
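
Step 3 can be sketched in plain Python. This is a minimal illustration, not the library's own code: a classification model returns raw scores (logits), a softmax turns them into probabilities, and a label mapping names the winner. The `ID2LABEL` mapping and the example logits below are assumptions chosen for illustration.

```python
import math

# Assumed label mapping for a two-class sentiment model (illustrative).
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits):
    """Map one row of logits to a {'label', 'score'} dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits for one sentence, ordered [negative, positive].
example_logits = [-2.0, 2.4]
print(postprocess(example_logits))
```

The returned dict has the same `label` and `score` keys as the classifier output shown earlier.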

Some of the currently available pipelines are:
* feature-extraction (get the vector representation of text)
* fill-mask
* ner (named entity recognition)
* question-answering
* sentiment-analysis
* summarization
* text-generation
* translation
* zero-shot-classification

##### Zero-shot classification
Here we need to classify texts that have not been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. For this use case, the zero-shot-classification pipeline is very powerful: it allows you to specify which labels to use for the classification, so you do not have to rely on the labels of the pretrained model. 

In [None]:
classifier_1 = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


In [None]:
classifier_1(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

In [None]:
classifier_1(
    "The economy may be on a downward trend worldwide",
    candidate_labels=["education", "politics", "business"],
)
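
The zero-shot pipeline returns a dict with the keys `sequence`, `labels` and `scores`, where the labels are ordered from most to least likely. A small helper for picking the winning label could look like the sketch below. The helper name `top_label`, the `threshold` parameter and the sample scores are hypothetical; the scores are placeholders, not real model output.

```python
def top_label(result, threshold=0.5):
    """Return (label, score) for the best candidate, or None if it is too weak."""
    pairs = list(zip(result["labels"], result["scores"]))
    label, score = max(pairs, key=lambda pair: pair[1])
    return (label, score) if score >= threshold else None


# Placeholder result in the shape the pipeline returns (scores are made up).
sample_result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8, 0.15, 0.05],
}

print(top_label(sample_result))
```

Returning `None` below the threshold is one way to handle texts that fit none of the candidate labels well.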

##### Text generation
The main idea here is to provide a prompt and the model will auto-complete it by generating the remaining text. This is similar to the predictive text feature that is found on many phones.

In [None]:
generator = pipeline("text-generation")

In [None]:
# num_return_sequences sets how many completions are returned
# max_length caps the total length of each completion in tokens (prompt included)
generator("In India you will find", num_return_sequences=2, max_length=15)

The previous examples used the default model for the task at hand, but we can also choose a particular model from the Hub to use in a pipeline for a specific task. Let's try the distilgpt2 model!

In [None]:
generator_1 = pipeline("text-generation",
                       model = "distilgpt2")

In [None]:
generator_1("In India you will find", num_return_sequences=2, max_length=15)

##### Mask filling
The idea is to fill in the blanks in a given text.

In [None]:
unmasker = pipeline("fill-mask")

In [None]:
# the top_k argument controls how many candidate fills are returned
unmasker("India is the most <mask> country", top_k=2)
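
The mask token is not the same for every checkpoint: RoBERTa-style models expect `<mask>`, while BERT-style models expect `[MASK]`. One way to avoid hard-coding it is to keep a placeholder in the text and substitute the token at call time, as in this sketch. The helper `build_masked_prompt` and the `{mask}` placeholder are hypothetical; in a live session the token can be read from `unmasker.tokenizer.mask_token`.

```python
def build_masked_prompt(template, mask_token):
    """Replace the {mask} placeholder with the model's own mask token."""
    if "{mask}" not in template:
        raise ValueError("template must contain a {mask} placeholder")
    return template.replace("{mask}", mask_token)


template = "India is the most {mask} country"
print(build_masked_prompt(template, "<mask>"))  # RoBERTa-style token
print(build_masked_prompt(template, "[MASK]"))  # BERT-style token
```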

##### Named entity recognition
NER is a task where the model has to find which parts of the input text correspond to entities such as persons, locations or organizations.

In [None]:
# passing grouped_entities=True when creating the pipeline tells it to
# regroup the parts of the sentence that correspond to the same entity
ner = pipeline("ner", grouped_entities=True)

In [None]:
# here the model should group Schneider Electric into a single
# organization, even though the name consists of multiple words
ner("My name is AJ and I work at Schneider Electric in India.")
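
With grouping enabled, each returned entity is a dict with the keys `entity_group`, `score`, `word`, `start` and `end`, where `start` and `end` are character offsets into the input. The sketch below collects entities by type using those offsets. The helper `entities_by_type` is hypothetical, and the sample entities are placeholders in the expected shape (scores made up), not actual model output.

```python
from collections import defaultdict


def entities_by_type(text, entities):
    """Group entity surface forms by their entity_group label."""
    grouped = defaultdict(list)
    for ent in entities:
        # Slice the original text so the exact spelling and casing are kept.
        grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


text = "My name is AJ and I work at Schneider Electric in India."

# Placeholder entities in the shape the grouped NER pipeline returns.
sample_entities = [
    {"entity_group": "PER", "score": 0.9, "word": "AJ", "start": 11, "end": 13},
    {"entity_group": "ORG", "score": 0.9, "word": "Schneider Electric", "start": 28, "end": 46},
    {"entity_group": "LOC", "score": 0.9, "word": "India", "start": 50, "end": 55},
]

print(entities_by_type(text, sample_entities))
```

Slicing by offsets rather than reading `word` avoids surprises when the tokenizer lowercases or otherwise normalizes the text.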