# Zero-shot classification

We’ll start by tackling a more challenging task: classifying texts that haven’t been **labelled**. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise.

For this use case, the **`zero-shot-classification` pipeline** is very powerful: it lets you specify which labels to use for the classification, so you don’t have to rely on the labels the pretrained model was trained with.

You’ve already seen how the model can classify a sentence as **positive or negative** using those two labels, but it can also classify the text using any other set of labels you like.

In [1]:
from transformers import pipeline

In [2]:
classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu
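The log above warns against relying on the default model in production. A minimal sketch of pinning it explicitly follows, using the model name and revision reported in that log. The constant names and the `build_classifier` helper are hypothetical and exist only for illustration.

```python
# Values copied from the log message above; swap in another checkpoint if you prefer.
MODEL_NAME = "facebook/bart-large-mnli"
MODEL_REVISION = "d7645e1"


def build_classifier():
    """Create the zero-shot pipeline with an explicit model and revision."""
    # Imported inside the function so that defining this helper stays cheap;
    # the model weights (about 1.63 GB) are only fetched when it is called.
    from transformers import pipeline

    return pipeline(
        "zero-shot-classification",
        model=MODEL_NAME,
        revision=MODEL_REVISION,
    )
```

Calling `classifier = build_classifier()` should then behave like the cell above, but without the "No model was supplied" warning.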


In [3]:
classifier(
    [
        "Amit exemplifies a dynamic leadership style, laser focused on outcome and fueled by infectious energy that drive team to exceed expectation",
        "Amit is a high-impact leader, sharp in focus, relentless in drive, and always a step ahead.",
    ],
    candidate_labels=[
        "leadership",
        "politician",
        "entrepreneur",
        "businessman",
        "technologist",
        "architect",
    ],
)

[{'sequence': 'Amit exemplifies a dynamic leadership style, laser focused on outcome and fueled by infectious energy that drive team to exceed expectation',
  'labels': ['leadership',
   'entrepreneur',
   'businessman',
   'architect',
   'politician',
   'technologist'],
  'scores': [0.9363778829574585,
   0.020054060965776443,
   0.018768617883324623,
   0.017124617472290993,
   0.004386514890938997,
   0.003288235515356064]},
 {'sequence': 'Amit is a high-impact leader, sharp in focus, relentless in drive, and always a step ahead.',
  'labels': ['leadership',
   'businessman',
   'entrepreneur',
   'architect',
   'politician',
   'technologist'],
  'scores': [0.7972721457481384,
   0.08693934977054596,
   0.05499393120408058,
   0.03206514194607735,
   0.016450263559818268,
   0.01227930374443531]}]
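As the output above shows, each result lists its labels from highest to lowest score, so the best match is the first entry. The sketch below picks that entry and optionally rejects low-confidence predictions. The `top_label` helper and its `threshold` parameter are hypothetical, and the example data is the first result above with scores rounded for brevity.

```python
def top_label(result, threshold=0.5):
    """Return (label, score) for the best label, or None if it scores below threshold."""
    # Labels and scores are sorted from highest to lowest score,
    # so the first entry of each list is the best match.
    label, score = result["labels"][0], result["scores"][0]
    return (label, score) if score >= threshold else None


# First result from the output above, with scores rounded to four decimals.
example = {
    "labels": ["leadership", "entrepreneur", "businessman",
               "architect", "politician", "technologist"],
    "scores": [0.9364, 0.0201, 0.0188, 0.0171, 0.0044, 0.0033],
}

print(top_label(example))                  # ('leadership', 0.9364)
print(top_label(example, threshold=0.95))  # None
```

A threshold like this can be useful when none of the candidate labels really fits the text; the right value depends on your data.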

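This pipeline is called **zero-shot** because you don’t need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want! By default the scores are normalized across the candidate labels, so they sum to 1 for each input, as the outputs above show.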
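To give a rough idea of why no fine-tuning is needed: the default model, `facebook/bart-large-mnli`, is a natural language inference model, and the pipeline turns each candidate label into a hypothesis about the text using a template (by default `"This example is {}."`). The sketch below mimics that idea in plain Python. The helper names, the shortened sentence, and the logits are all made up for illustration and are not real model output.

```python
import math

# Default template used by the pipeline unless you pass `hypothesis_template`.
HYPOTHESIS_TEMPLATE = "This example is {}."


def build_nli_pairs(text, candidate_labels, template=HYPOTHESIS_TEMPLATE):
    """Pair the input text (premise) with one hypothesis per candidate label."""
    return [(text, template.format(label)) for label in candidate_labels]


def softmax(logits):
    """Normalize a list of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


pairs = build_nli_pairs("Amit is a high-impact leader.", ["leadership", "politician"])
for premise, hypothesis in pairs:
    print(premise, "->", hypothesis)

# Hypothetical entailment logits, one per label (NOT real model output).
entailment_logits = [2.0, -1.0]
scores = softmax(entailment_logits)
```

Because the scores are normalized across labels, they compete with each other. If several labels can apply at once, passing `multi_label=True` to the classifier scores each label independently instead.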