**Pipeline API**

In [None]:
!pip install transformers[sentencepiece]



**Sentiment Analysis**

In [None]:
from transformers import pipeline

In [None]:
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
classifier.model.name_or_path


'distilbert-base-uncased-finetuned-sst-2-english'

**Zero-Shot Classification**

In zero-shot classification, the model has not been trained on examples labeled with the target classes. Instead, we define the candidate labels ourselves when calling the pipeline.

In [None]:
from transformers import pipeline
classifier = pipeline('zero-shot-classification')

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
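
The default checkpoint loaded above, facebook/bart-large-mnli, is a natural language inference (NLI) model. The sketch below shows, roughly, how user-defined labels can be turned into NLI inputs: each candidate label is slotted into a hypothesis sentence and paired with the input text. The example text and labels are made up for illustration, the helper `build_nli_pairs` is hypothetical, and the template string is assumed to match the pipeline's default `hypothesis_template`. The model call itself is not shown.

```python
def build_nli_pairs(text, candidate_labels, hypothesis_template="This example is {}."):
    """Pair the input text (premise) with one hypothesis per candidate label."""
    return [(text, hypothesis_template.format(label)) for label in candidate_labels]


# Illustrative input and labels (not taken from the notebook)
text = "The central bank raised interest rates again this quarter."
candidate_labels = ["education", "politics", "business"]

pairs = build_nli_pairs(text, candidate_labels)
for premise, hypothesis in pairs:
    # An NLI model would score how strongly the premise entails each hypothesis
    print(f"premise={premise!r} | hypothesis={hypothesis!r}")
```

With the real pipeline, the equivalent call would be `classifier(text, candidate_labels=candidate_labels)`, which returns the labels ranked by score.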


**Text Generation**

Text generation starts from an initial prompt, which the model auto-completes. Because generation involves some randomness, your results will likely differ from the ones shown here.

In [None]:
from transformers import pipeline

generator = pipeline('text-generation')

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
results = generator("I get irritated because ",
          num_return_sequences=2,
          max_length=30)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
for i in results:
    print(i['generated_text'])
    print('\n')


I get irritated because  some people in the middle class see that. 
To anyone who doesn't know me well, I'm quite the


I get irritated because  because I like to hear that about everybody. What are they saying about you? Is this guy going in with his heart
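
The two completions above differ because, when sampling is enabled, each next token is drawn at random from the model's predicted distribution rather than always being the most likely one. The toy sketch below imitates that with Python's standard library. The token distribution is hypothetical (a real model produces one over its entire vocabulary at every step), and the helper names are illustrative only.

```python
import random

# Hypothetical next-token distribution, for illustration only
next_token_probs = {"people": 0.4, "because": 0.3, "nobody": 0.2, "it": 0.1}


def sample_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


def sample_sequence(probs, length, seed):
    """Draw `length` tokens using a random generator seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_token(probs, rng) for _ in range(length)]


run_a = sample_sequence(next_token_probs, length=5, seed=0)
run_b = sample_sequence(next_token_probs, length=5, seed=0)  # same seed as run_a
run_c = sample_sequence(next_token_probs, length=5, seed=1)  # different seed

print(run_a)
print(run_b)
print(run_c)
```

Runs that share a seed produce the same tokens, while a different seed will usually produce different ones. In the same spirit, fixing the seed before calling the generator (for example with `set_seed` from transformers) should make the generated text repeatable.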




**Question Answering**

The question-answering pipeline answers a question using the information in a context passage that you supply along with it.

In [None]:
from transformers import pipeline

question_answerer = pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
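
The default checkpoint here is a SQuAD-style extractive model, so the answer is a span copied from the context rather than newly written text. The sketch below illustrates what that means using a hand-built result: the context, question, and answer are invented, and the dictionary only imitates the `answer`, `start`, and `end` fields of the pipeline's output (the `score` field is omitted).

```python
context = "Sumana joined the analytics team in Pune last year."
question = "Where did Sumana join the analytics team?"

# Hypothetical result, shaped like the dict returned by the pipeline (without 'score')
answer_text = "Pune"
start = context.index(answer_text)
result = {"answer": answer_text, "start": start, "end": start + len(answer_text)}

# The answer is exactly the characters of the context between the two offsets
extracted = context[result["start"]:result["end"]]
print(question)
print(extracted)
```

With the real pipeline, the call would be `question_answerer(question=question, context=context)`.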


**Summarization**

The summarization pipeline generates a shorter version of the input text that retains most of its important points.

In [None]:
from transformers import pipeline

summarizer = pipeline('summarization')

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


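**Using Any Model in a Pipeline**

A pipeline can load a specific model instead of the task default. Here we create the text-generation pipeline with the GPT-2 model by passing its checkpoint name as the model argument.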

In [None]:
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2')

In [None]:
generator.model.name_or_path

'gpt2'

**Tokenizers**

In [None]:
!pip install transformers[sentencepiece] -q

In [None]:
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

Here, the AutoTokenizer class is imported from the transformers library and the model checkpoint name is stored in a variable. The tokenizer itself is then initialized from that checkpoint.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=checkpoint)
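
The checkpoint used here is a DistilBERT model, whose tokenizer splits words that are not in its vocabulary into smaller subword pieces (WordPiece). The sketch below is a simplified, standard-library imitation of that greedy longest-match idea. The tiny vocabulary is hypothetical and the function is not the library's implementation.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedily split `word` into the longest pieces found in `vocab`."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        current = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a '##' prefix
            if piece in vocab:
                current = piece
                break
            end -= 1
        if current is None:
            return [unk_token]  # no piece matched, so the whole word is unknown
        tokens.append(current)
        start = end
    return tokens


# Hypothetical miniature vocabulary, for illustration only
vocab = {"the", "gold", "medal", "top", "##per"}

print(wordpiece_tokenize("gold", vocab))
print(wordpiece_tokenize("topper", vocab))
print(wordpiece_tokenize("xyz", vocab))
```

A word found whole in the vocabulary stays as one token, while "topper" is split into two pieces. This is consistent with the tokenizer output later in this document, where the sentence containing "topper" uses two ids for that word.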

**Model**

In [None]:
from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

In [None]:
model = AutoModel.from_pretrained(checkpoint)

**Sequence Classification**

In [None]:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

In [None]:
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=checkpoint)

In [None]:
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

In [None]:
raw_inputs = ["I get irritated during the winter.",
              "Ravi received the director's gold medal for being the topper.",
              "As expected, Sumana received her promotion letter today.",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)


{'input_ids': tensor([[  101,  1045,  2131, 15560,  2076,  1996,  3467,  1012,   102,     0,
             0,     0,     0,     0,     0,     0],
        [  101, 16806,  2363,  1996,  2472,  1005,  1055,  2751,  3101,  2005,
          2108,  1996,  2327,  4842,  1012,   102],
        [  101,  2004,  3517,  1010,  7680,  5162,  2363,  2014,  4712,  3661,
          2651,  1012,   102,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]])}
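
The output above shows the effect of `padding=True`: shorter sequences are filled with zeros up to the length of the longest one, and the attention mask marks real tokens with 1 and padding with 0. The sketch below reproduces that layout in plain Python from the unpadded ids. It assumes id 0 is the padding token, as the zeros in the output suggest, and `pad_batch` is an illustrative helper, not a library function.

```python
PAD_ID = 0  # assumed padding id, matching the zeros in the tokenizer output


def pad_batch(sequences, pad_id=PAD_ID):
    """Pad every sequence to the longest length and build the matching masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask


# Unpadded ids copied from the tokenizer output (padding zeros removed)
sequences = [
    [101, 1045, 2131, 15560, 2076, 1996, 3467, 1012, 102],
    [101, 16806, 2363, 1996, 2472, 1005, 1055, 2751, 3101, 2005,
     2108, 1996, 2327, 4842, 1012, 102],
    [101, 2004, 3517, 1010, 7680, 5162, 2363, 2014, 4712, 3661,
     2651, 1012, 102],
]

input_ids, attention_mask = pad_batch(sequences)
for ids, mask in zip(input_ids, attention_mask):
    print(ids)
    print(mask)
```

The mask lets the model ignore the padded positions, so the padding does not influence the prediction for the shorter sentences.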


The inputs are now tokenized, so we can pass them to the model for prediction.

In [None]:
outputs = model(**inputs)
print(outputs)

SequenceClassifierOutput(loss=None, logits=tensor([[ 4.0595, -3.3044],
        [-3.9833,  4.3278],
        [-2.4997,  2.6847]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)


In [None]:
print(outputs.logits.shape)

torch.Size([3, 2])


The logits tensor has shape 3 x 2: one row for each of the 3 input sequences and one column for each of the 2 classes.

Next, we pass the logits through the softmax function to obtain the probability of each class for every input sentence.

In [None]:
# Convert the logits to class probabilities; dim=-1 applies softmax across the class dimension
import torch
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(probabilities)

tensor([[9.9937e-01, 6.3328e-04],
        [2.4571e-04, 9.9975e-01],
        [5.5720e-03, 9.9443e-01]], grad_fn=<SoftmaxBackward0>)
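
To see what softmax does numerically, the same probabilities can be recomputed by hand from the logits printed earlier. The sketch below uses only the standard library. The logits are copied from the output above (rounded to four decimals), so the results should agree with the tensor only to about that precision.

```python
import math


def softmax(logits):
    """Exponentiate each logit and normalize so the values sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the model output above
logits = [
    [4.0595, -3.3044],
    [-3.9833, 4.3278],
    [-2.4997, 2.6847],
]

probabilities = [softmax(row) for row in logits]
for row in probabilities:
    print([round(p, 4) for p in row])
```

Each row is normalized independently, which is what `dim=-1` requests in the PyTorch call.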


Rounded to four decimal places, we get [0.9994, 0.0006] for the first input, [0.0002, 0.9998] for the second input, and [0.0056, 0.9944] for the third input.

The model therefore assigns a probability of more than 99% to the NEGATIVE class for the first input and to the POSITIVE class for the second and third inputs. These predictions match the sentiment we would expect from each sentence.

We can check the labels of the model in the following way:

In [None]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}
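
Putting the last two outputs together, the predicted label for each sentence is the class with the highest probability, looked up in this mapping. A minimal standard-library sketch follows, with the probabilities and the mapping copied from the outputs above. The variable names are illustrative.

```python
# Mapping and probabilities copied from the notebook outputs
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
probabilities = [
    [9.9937e-01, 6.3328e-04],
    [2.4571e-04, 9.9975e-01],
    [5.5720e-03, 9.9443e-01],
]

predicted_labels = []
for row in probabilities:
    # Index of the largest probability in this row (an argmax)
    best_index = max(range(len(row)), key=row.__getitem__)
    predicted_labels.append(id2label[best_index])

print(predicted_labels)
```

With the PyTorch tensor itself, the same indices could be obtained with `torch.argmax` along the last dimension.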