In [None]:
from transformers import AutoTokenizer, AutoModel
from torch.nn import functional as F
tokenizer = AutoTokenizer.from_pretrained('deepset/sentence_bert')
model = AutoModel.from_pretrained('deepset/sentence_bert')

sentence = 'Who are you voting for in 2020?'
labels = ['business', 'art & culture', 'politics']

# run inputs through model and mean-pool over the sequence
# dimension to get sequence-level representations
inputs = tokenizer.batch_encode_plus([sentence] + labels,
                                     return_tensors='pt',
                                     padding=True)
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']
output = model(input_ids, attention_mask=attention_mask)[0]
sentence_rep = output[:1].mean(dim=1)
label_reps = output[1:].mean(dim=1)

# now find the labels with the highest cosine similarities to
# the sentence
similarities = F.cosine_similarity(sentence_rep, label_reps)
closest = similarities.argsort(descending=True)
for ind in closest:
    print(f'label: {labels[ind]} \t similarity: {similarities[ind]}')
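
Note that the cell above averages over every position in the padded batch, so padding tokens contribute to the sequence representation of the shorter inputs. One possible refinement is to weight the average by the attention mask. The sketch below is an illustration only: `masked_mean_pool` is a hypothetical helper, not part of `transformers`, and the tensors are toy values rather than real model output.

```python
import torch


def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors over the sequence dimension, ignoring padding.

    hidden_states:  (batch, seq_len, hidden) float tensor
    attention_mask: (batch, seq_len) tensor of 1s (real tokens) and 0s (padding)
    """
    # Broadcast the mask over the hidden dimension: (batch, seq_len, 1)
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
    summed = (hidden_states * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1)        # (batch, 1), avoids division by zero
    return summed / counts


# Toy batch: 2 sequences, 3 positions each, hidden size 2.
# The last position of the first sequence is padding and should be ignored.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
                       [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]])
mask = torch.tensor([[1, 1, 0],
                     [1, 1, 1]])

pooled = masked_mean_pool(hidden, mask)
print(pooled)
```

With real model output, the same helper could be applied as `masked_mean_pool(output, attention_mask)` in place of the plain `.mean(dim=1)` calls.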

## Using pipeline for zero-shot learning

In [8]:
from transformers import pipeline

In [9]:
classifier = pipeline("zero-shot-classification")

Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartForSequenceClassification: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


We can use this pipeline by passing in a sequence and a list of candidate labels. By default, the pipeline assumes that exactly one of the candidate labels is true and returns one score per label, with the scores summing to 1.

In [10]:
sequence = "Who are you voting for in 2020?"
candidate_labels = ["politics", "public health", "economics"]

classifier(sequence, candidate_labels)

{'sequence': 'Who are you voting for in 2020?',
 'labels': ['politics', 'economics', 'public health'],
 'scores': [0.9725187420845032, 0.014584202319383621, 0.01289703231304884]}

To allow more than one label to be true (multi-label classification), simply pass `multi_class=True`. In this case, the scores are independent of one another, and each falls between 0 and 1.

In [11]:
sequence = "Who are you voting for in 2020?"
candidate_labels = ["politics", "public health", "economics", "elections"]

classifier(sequence, candidate_labels, multi_class=True)

{'sequence': 'Who are you voting for in 2020?',
 'labels': ['politics', 'elections', 'public health', 'economics'],
 'scores': [0.9720695614814758,
  0.9676108360290527,
  0.03248709812760353,
  0.0061644576489925385]}
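
To see why the default scores sum to 1 while the `multi_class=True` scores do not, here is a rough sketch of how the two modes could turn NLI logits into scores. It assumes the default mode applies a softmax over the entailment logits across all labels, and that `multi_class=True` applies a separate softmax over contradiction vs. entailment for each label. The logits below are invented for illustration and do not come from the model.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical (contradiction, entailment) logits for each candidate label.
nli_logits = {
    "politics": (-2.0, 3.0),
    "elections": (-1.5, 2.5),
    "public health": (2.0, -1.0),
}
labels = list(nli_logits)

# Default mode: labels compete with each other, so scores sum to 1.
entailment_logits = [nli_logits[label][1] for label in labels]
single_label_scores = dict(zip(labels, softmax(entailment_logits)))

# multi_class=True: each label is scored on its own, entailment vs. contradiction.
multi_label_scores = {
    label: softmax(list(nli_logits[label]))[1] for label in labels
}

print(single_label_scores)
print(multi_label_scores)
```

In this toy setup both "politics" and "elections" can score close to 1 in the independent mode, whereas in the default mode they have to share the probability mass.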

Here's an example of sentiment classification: 

In [12]:
sequence = "I hated this movie. The acting sucked."
candidate_labels = ["positive", "negative"]

classifier(sequence, candidate_labels)

{'sequence': 'I hated this movie. The acting sucked.',
 'labels': ['negative', 'positive'],
 'scores': [0.9916268587112427, 0.008373172953724861]}

So how does this method work?

The underlying model is trained on the task of Natural Language Inference (NLI), which takes in two sequences and determines whether the first entails the second, contradicts it, or neither.

This can be adapted to the task of zero-shot classification by treating the sequence which we want to classify as one NLI sequence (called the premise) and turning a candidate label into the other (the hypothesis). If the model predicts that the constructed premise _entails_ the hypothesis, then we can take that as a prediction that the label applies to the text. Check out [this blog post](https://joeddav.github.io/blog/2020/05/29/ZSL.html) for a more detailed explanation.

By default, the pipeline turns labels into hypotheses with the template `This example is {}.` This works well in many settings, but you can also customize it for your specific setting. Let's add another, more challenging review to the sentiment classification example above:

In [13]:
sequences = [
    "I hated this movie. The acting sucked.",
    "This movie didn't quite live up to my high expectations, but overall I still really enjoyed it."
]
candidate_labels = ["positive", "negative"]

classifier(sequences, candidate_labels)

[{'sequence': 'I hated this movie. The acting sucked.',
  'labels': ['negative', 'positive'],
  'scores': [0.9916267991065979, 0.008373189717531204]},
 {'sequence': "This movie didn't quite live up to my high expectations, but overall I still really enjoyed it.",
  'labels': ['negative', 'positive'],
  'scores': [0.8148515820503235, 0.18514837324619293]}]

The second example is a bit harder. Let's see if we can improve the results by using a hypothesis template that is more specific to the setting of review sentiment analysis. Instead of the default, `This example is {}.`, we'll use `The sentiment of this review is {}.` (where `{}` is replaced with the candidate class name).

In [14]:
sequences = [
    "I hated this movie. The acting sucked.",
    "This movie didn't quite live up to my high expectations, but overall I still really enjoyed it."
]
candidate_labels = ["positive", "negative"]
hypothesis_template = "The sentiment of this review is {}."

classifier(sequences, candidate_labels, hypothesis_template=hypothesis_template)

[{'sequence': 'I hated this movie. The acting sucked.',
  'labels': ['negative', 'positive'],
  'scores': [0.9890093207359314, 0.010990677401423454]},
 {'sequence': "This movie didn't quite live up to my high expectations, but overall I still really enjoyed it.",
  'labels': ['positive', 'negative'],
  'scores': [0.9581230282783508, 0.04187696799635887]}]
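
To make the role of the template concrete, here is a minimal sketch of how premise/hypothesis pairs could be built from a sequence, its candidate labels, and a template. `build_nli_pairs` is a hypothetical helper written for this illustration, not a function from `transformers`; it only assumes that the label is substituted into the template with `str.format`.

```python
def build_nli_pairs(sequence, candidate_labels,
                    hypothesis_template="This example is {}."):
    """Return one (premise, hypothesis) pair per candidate label.

    The sequence to classify is the premise; each label is slotted
    into the template to form the hypothesis.
    """
    return [(sequence, hypothesis_template.format(label))
            for label in candidate_labels]


review = "I hated this movie. The acting sucked."
labels = ["positive", "negative"]

default_pairs = build_nli_pairs(review, labels)
custom_pairs = build_nli_pairs(review, labels,
                               "The sentiment of this review is {}.")

for premise, hypothesis in default_pairs + custom_pairs:
    print(f"premise: {premise!r} | hypothesis: {hypothesis!r}")
```

Each pair would then be scored by the NLI model, so a template that reads naturally for the domain gives the model a more meaningful hypothesis to judge.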

## Other useful links

https://discuss.huggingface.co/t/new-pipeline-for-zero-shot-text-classification/681

https://github.com/huggingface/transformers/pull/5760

https://joeddav.github.io/blog/2020/05/29/ZSL.html

https://github.com/joeddav/blog/blob/master/_notebooks/2020-05-29-ZSL.ipynb

https://huggingface.co/zero-shot/