**Zero-shot classification** is a machine learning technique in which a model classifies text into categories without having seen any labeled examples of those categories during training. The candidate categories are supplied at inference time, which is useful when the set of possible classes is large or changes often and it would be impractical or time-consuming to collect labeled data and retrain a model for each new class. The Hugging Face Transformers library provides a simple way to perform zero-shot classification using pre-trained language models such as BART fine-tuned on the MNLI natural language inference dataset.

Model Used: https://huggingface.co/facebook/bart-large-mnli

In [None]:
!pip install transformers

**!pip install transformers**: This command installs the Hugging Face Transformers library which contains the necessary functionality for performing zero-shot classification.

In [None]:
from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

**from transformers import pipeline**: This imports the pipeline function from the transformers module. The pipeline function allows us to easily use pre-trained models for various NLP tasks, including zero-shot classification.

In [3]:
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)
#{'labels': ['travel', 'dancing', 'cooking'],
# 'scores': [0.9938651323318481, 0.0032737774308770895, 0.002861034357920289],
# 'sequence': 'one day I will see the world'}

{'sequence': 'one day I will see the world',
 'labels': ['travel', 'dancing', 'cooking'],
 'scores': [0.9938650727272034, 0.003273802110925317, 0.002861041808500886]}

**classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")**: Here we call the pipeline function to create a pipeline object for the task of zero-shot classification. We also specify that we want to use the facebook/bart-large-mnli pre-trained model for this task.

**sequence_to_classify = "one day I will see the world"**: This variable holds the sequence of text that we want to classify. In this example, we are trying to determine what the person is interested in based on their statement.

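**candidate_labels = ['travel', 'cooking', 'dancing']**: These are the candidate labels that the model chooses from. Because this is zero-shot classification, the labels do not need to match any classes seen during training. Each label is inserted into a hypothesis sentence such as "This example is travel." and the NLI model judges whether the input text entails it.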

**classifier(sequence_to_classify, candidate_labels)** : Finally, we call the classifier object with two arguments - the sequence we want to classify and the list of candidate labels. The method returns a dictionary containing three keys:

* 'labels' - The candidate labels, sorted from the highest score to the lowest (note that 'dancing' now comes before 'cooking', unlike in the list we passed in).

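* 'scores' - A list of scores, one per label and in the same order as 'labels', representing how confident the model is that the given input belongs to that particular class. The higher the score, the more confidence the model has. By default the scores are normalized across the candidate labels, so they sum to 1.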

* 'sequence' - The input sequence passed to the method.

By analyzing the output, we can conclude that the most likely category for the sentence "one day I will see the world" is 'travel'.
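
As a small sketch of how the returned dictionary might be consumed, the snippet below pairs each label with its score and picks the top one. The `result` value is copied from the output above rather than produced by calling the model, and `ranked`, `top_label`, and `top_score` are illustrative names only.

```python
# Result copied from the notebook output above (not recomputed here).
result = {
    'sequence': 'one day I will see the world',
    'labels': ['travel', 'dancing', 'cooking'],
    'scores': [0.9938650727272034, 0.003273802110925317, 0.002861041808500886],
}

# 'labels' and 'scores' are parallel lists, so zip pairs them up.
ranked = list(zip(result['labels'], result['scores']))

# Pick the best pair by score rather than relying on list order.
top_label, top_score = max(ranked, key=lambda pair: pair[1])

for label, score in ranked:
    print(f"{label:<10} {score:.4f}")
print(f"Predicted label: {top_label} ({top_score:.2%})")
```

In practice we would assign `result = classifier(sequence_to_classify, candidate_labels)` instead of hard-coding the dictionary.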

In [None]:
candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
classifier(sequence_to_classify, candidate_labels, multi_label=True)
#{'labels': ['travel', 'exploration', 'dancing', 'cooking'],
# 'scores': [0.9945111274719238,
#  0.9383890628814697,
#  0.0057061901316046715,
#  0.0018193122232332826],
# 'sequence': 'one day I will see the world'}
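
Note that with `multi_label=True` the scores no longer sum to 1: both 'travel' and 'exploration' score above 0.9, because every label is scored independently. The following is a rough pure-Python sketch of the two normalization schemes as we understand the pipeline to apply them; it is not the library's own code. The 'travel' logits are taken from the manual PyTorch output later in this notebook, while the 'cooking' and 'dancing' logits and all function names are made up for illustration.

```python
import math

# (contradiction, entailment) logits per label.
# 'travel' comes from the manual example in this notebook;
# the other two pairs are hypothetical values.
nli_logits = {
    'travel': (-3.0856, 2.1139),
    'cooking': (2.5, -3.0),
    'dancing': (2.2, -2.8),
}

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def single_label_scores(logits):
    """Softmax over the entailment logits of all labels: scores compete and sum to 1."""
    labels = list(logits)
    probs = softmax([logits[label][1] for label in labels])
    return dict(zip(labels, probs))

def multi_label_scores(logits):
    """Softmax over (contradiction, entailment) for each label separately."""
    return {label: softmax(list(pair))[1] for label, pair in logits.items()}

single = single_label_scores(nli_logits)
multi = multi_label_scores(nli_logits)
print(single)
print(multi)
```

The multi-label variant suits cases where several labels can be true at once, since a high score for one label does not push the others down.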


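The same classification can also be done manually with PyTorch, by posing the sequence as an NLI premise and a candidate label as the hypothesis: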

In [5]:
# pose sequence as an NLI premise and label as a hypothesis
from transformers import AutoModelForSequenceClassification, AutoTokenizer
nli_model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')


In [18]:
sequence_to_classify = "one day I will see the world"
premise = sequence_to_classify
label = 'travel'
hypothesis = f'This example is {label}.'

# run through model pre-trained on MNLI
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     truncation='only_first')
logits = nli_model(x.to("cpu"))[0]

# we throw away "neutral" (dim 1) and take the probability of
# "entailment" (2) as the probability of the label being true
entail_contradiction_logits = logits[:,[0,2]]
print(entail_contradiction_logits)
probs = entail_contradiction_logits.softmax(dim=1)
print(probs)
prob_label_is_true = probs[:,1]
print(prob_label_is_true)

tensor([[-3.0856,  2.1139]], grad_fn=<IndexBackward0>)
tensor([[0.0055, 0.9945]], grad_fn=<SoftmaxBackward0>)
tensor([0.9945], grad_fn=<SelectBackward0>)
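
The cell above scores a single label. One way to extend it to several labels is to build one hypothesis per label and repeat the same two-way softmax. The sketch below keeps that loop separate from the model by taking a scoring function as an argument, so it runs without downloading anything. `score_labels`, `nli_logits_fn`, and the stand-in `fake_nli_logits` are hypothetical names, not part of Transformers; the stand-in uses the 'travel' logits printed above, a placeholder neutral value of 0.0, and made-up logits for 'cooking'.

```python
import math

def score_labels(premise, labels, nli_logits_fn, template='This example is {}.'):
    """Return {label: probability of entailment}, one NLI hypothesis per label.

    nli_logits_fn(premise, hypothesis) is assumed to return three logits in the
    order (contradiction, neutral, entailment), matching the indices used above.
    """
    scores = {}
    for label in labels:
        hypothesis = template.format(label)
        contradiction, _neutral, entailment = nli_logits_fn(premise, hypothesis)
        # Two-way softmax over (contradiction, entailment), written as a sigmoid.
        scores[label] = 1.0 / (1.0 + math.exp(contradiction - entailment))
    return scores

def fake_nli_logits(premise, hypothesis):
    """Stand-in for the real model so the sketch runs offline."""
    table = {
        'This example is travel.': (-3.0856, 0.0, 2.1139),  # from the output above
        'This example is cooking.': (2.5, 0.0, -3.0),        # hypothetical values
    }
    return table[hypothesis]

scores = score_labels('one day I will see the world',
                      ['travel', 'cooking'], fake_nli_logits)
print(scores)
```

With the real model, `nli_logits_fn` would be a small function that wraps `tokenizer.encode` and `nli_model` from the cells above and returns the three logits as floats.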
