# Apply labels with zero-shot classification

This notebook shows how zero-shot classification can be used to perform text classification, labeling and topic modeling. txtai provides a light-weight wrapper around the zero-shot-classification pipeline in Hugging Face Transformers. This method works impressively well out of the box. Kudos to the Hugging Face team for the phenomenal work on zero-shot classification!

The examples in this notebook pick the best matching label using a list of labels for a snippet of text.
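For each input text, the `Labels` pipeline returns a list of `(label index, score)` tuples sorted from highest to lowest score. That is why the loops below use `labels(text, tags)[0][0]` to pick the best label. The sketch below shows only that selection step. It assumes a hypothetical `fake_labels` function that returns made-up scores in place of the real model.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for txtai's Labels pipeline: returns (label index, score)
# tuples sorted by descending score. The scores are made up for illustration.
def fake_labels(text: str, tags: Sequence[str]) -> List[Tuple[int, float]]:
    # Give a higher made-up score to any tag that appears in the text
    raw = [0.9 if tag.lower() in text.lower() else 0.1 for tag in tags]
    total = sum(raw)
    scores = [value / total for value in raw]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

def best_label(results: List[Tuple[int, float]], tags: Sequence[str]) -> str:
    # The first tuple holds the highest-scoring label; its first element is the index
    index, _score = results[0]
    return tags[index]

tags = ["Baseball", "Football", "Hockey", "Basketball"]
results = fake_labels("A hockey fight breaks out", tags)
print(best_label(results, tags))  # Hockey
```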

[tldrstory](https://github.com/neuml/tldrstory) has a full-stack implementation of a zero-shot classification system using Streamlit, FastAPI and Hugging Face Transformers. There is also a [Medium article describing tldrstory](https://towardsdatascience.com/tldrstory-ai-powered-understanding-of-headlines-and-story-text-fc86abd702fc) and zero-shot classification.


# Install dependencies

Install `txtai` and all dependencies.

In [2]:
pip install pybind11

Note: you may need to restart the kernel to use updated packages.





In [1]:
pip install txtai

Collecting txtai
Note: you may need to restart the kernel to use updated packages.

  Obtaining dependency information for txtai from https://files.pythonhosted.org/packages/bb/0b/d7dd51a844267d41afb3d0912da2af19c33b3145669b97676458b4a5bd46/txtai-6.2.0-py3-none-any.whl.metadata
  Using cached txtai-6.2.0-py3-none-any.whl.metadata (22 kB)
Collecting faiss-cpu>=1.7.1.post2 (from txtai)
  Using cached faiss-cpu-1.7.4.tar.gz (57 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
INFO: pip is looking at multiple versions of txtai to determine which version is compatible with other requirements. This could take a while.
Collecting txtai
  Obtaining dependency information for txtai from https://files.pythonhosted.or

  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [28 lines of output]
      c:\Users\SebS1\AppData\Local\Programs\Python\Python312\python.exe: No module named pip
      Traceback (most recent call last):
        File "<string>", line 38, in __init__
      ModuleNotFoundError: No module named 'pybind11'
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "c:\Users\SebS1\AppData\Local\Programs\Python\Python312\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
        File "c:\Users\SebS1\AppData\Local\Programs\Python\Python312\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File 

In [16]:
pip list

Package           Version
----------------- -------
asttokens         2.4.1
colorama          0.4.6
comm              0.2.0
debugpy           1.8.0
decorator         5.1.1
executing         2.0.1
ipykernel         6.27.1
ipython           8.18.1
jedi              0.19.1
jupyter_client    8.6.0
jupyter_core      5.5.1
matplotlib-inline 0.1.6
nest-asyncio      1.5.8
packaging         23.2
parso             0.8.3
pip               23.2.1
platformdirs      4.1.0
prompt-toolkit    3.0.43
psutil            5.9.7
pure-eval         0.2.2
pybind11          2.11.1
Pygments          2.17.2
python-dateutil   2.8.2
pywin32           306
pyzmq             25.1.2
six               1.16.0
stack-data        0.6.3
tornado           6.4
traitlets         5.14.0
wcwidth           0.2.12
Note: you may need to restart the kernel to use updated packages.
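A failed `pip install` is easy to miss in long cell output. It often surfaces only later, as a `ModuleNotFoundError` on import. The sketch below uses the standard library's `importlib.util.find_spec` to check whether `txtai` and a couple of its dependencies can be imported in the current kernel. The module list is an assumption; `faiss` is the import name of the `faiss-cpu` package. If a package was just installed, restart the kernel before running the check.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    # find_spec returns None when a top-level module cannot be found on sys.path
    return importlib.util.find_spec(module_name) is not None

# Modules to check (assumed list; adjust as needed)
for name in ["txtai", "transformers", "faiss"]:
    print(f"{name:<12} {'installed' if is_installed(name) else 'missing'}")
```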





# Create a Labels instance

The Labels instance is the main entrypoint for zero-shot classification. This is a light-weight wrapper around the zero-shot-classification pipeline in Hugging Face Transformers.

Besides the default model, other models that work with this pipeline can be found on the [Hugging Face model hub](https://huggingface.co/models?search=mnli).
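The zero-shot-classification pipeline is built on a natural language inference (NLI) model, which is why the hub search above filters on `mnli`. Each candidate label becomes a hypothesis such as "This example is Hockey." and is paired with the input text. In single-label mode, the entailment logits for all labels are softmaxed, so the scores sum to 1. The sketch below covers only that ranking step. `toy_entailment_logit` is a hypothetical word-overlap stand-in for a real MNLI model, not how the model actually scores pairs.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Default hypothesis template used by the Hugging Face zero-shot pipeline
HYPOTHESIS_TEMPLATE = "This example is {}."

def softmax(values: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_rank(
    text: str,
    tags: Sequence[str],
    entailment_logit: Callable[[str, str], float],
) -> List[Tuple[int, float]]:
    # Score every (premise, hypothesis) pair, then normalize across the labels
    logits = [entailment_logit(text, HYPOTHESIS_TEMPLATE.format(tag)) for tag in tags]
    scores = softmax(logits)
    # Sort (label index, score) pairs from most to least likely
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical stand-in for an MNLI model: counts words shared by premise and
# hypothesis. A real model returns learned entailment logits instead.
def toy_entailment_logit(premise: str, hypothesis: str) -> float:
    premise_words = set(premise.lower().replace("!", "").split())
    hypothesis_words = set(hypothesis.lower().rstrip(".").split())
    return float(len(premise_words & hypothesis_words))

tags = ["Baseball", "Football", "Hockey", "Basketball"]
ranked = zero_shot_rank("Slashing penalty in the hockey game", tags, toy_entailment_logit)
print(tags[ranked[0][0]])  # Hockey
```

Swapping `toy_entailment_logit` for a real NLI model changes only how pairs are scored. The template and the softmax over labels stay the same.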


In [15]:
%%capture

from txtai.pipeline import Labels

# Create labels model
labels = Labels()

# Alternate models can be used via passing the model path as shown below
# labels = Labels("roberta-large-mnli")

ModuleNotFoundError: No module named 'txtai'

# Applying labels to text

The example below shows how a zero-shot classifier can be applied to arbitrary text. The default model for the zero-shot classification pipeline is *bart-large-mnli*.

Look at the results below. It's nothing short of amazing✨ how well it performs. These aren't all simple, even for a human. For example, "intercepted" was purposely picked, as that word is more common in football than in basketball. The amount of knowledge stored in larger Transformer models continues to impress me.
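Some snippets, like the "Intercepted!" one, are ambiguous, so it can help to look past the top label. Each call returns `(label index, score)` tuples sorted by score, so the gap between the top two scores gives a rough confidence signal. In the sketch below, the `describe` helper, the 0.2 margin and the scores are all hypothetical and are not output from the model.

```python
from typing import List, Sequence, Tuple

def describe(results: List[Tuple[int, float]], tags: Sequence[str], margin: float = 0.2) -> str:
    # results: at least two (label index, score) tuples, sorted by descending score
    (best, best_score), (runner_up, runner_score) = results[0], results[1]
    label = tags[best]
    # Flag predictions where the runner-up label is close behind
    if best_score - runner_score < margin:
        return f"{label} (uncertain, runner-up: {tags[runner_up]})"
    return label

tags = ["Baseball", "Football", "Hockey", "Basketball"]
# Made-up scores for illustration only
clear = [(2, 0.85), (0, 0.07), (1, 0.05), (3, 0.03)]
close = [(3, 0.48), (1, 0.41), (0, 0.06), (2, 0.05)]
print(describe(clear, tags))  # Hockey
print(describe(close, tags))  # Basketball (uncertain, runner-up: Football)
```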

In [None]:
data = ["Dodgers lose again, give up 3 HRs in a loss to the Giants",
        "Giants 5 Cardinals 4 final in extra innings",
        "Dodgers drop Game 2 against the Giants, 5-4",
        "Flyers 4 Lightning 1 final. 45 saves for the Lightning.",
        "Slashing, penalty, 2 minute power play coming up",
        "What a stick save!",
        "Leads the NFL in sacks with 9.5",
        "UCF 38 Temple 13",
        "With the 30 yard completion, down to the 10 yard line",
        "Drains the 3pt shot!!, 0:15 remaining in the game",
        "Intercepted! Drives down the court and shoots for the win",
        "Massive dunk!!! they are now up by 15 with 2 minutes to go"]

# List of labels
tags = ["Baseball", "Football", "Hockey", "Basketball"]

print("%-75s %s" % ("Text", "Label"))
print("-" * 100)

for text in data:
    print("%-75s %s" % (text, tags[labels(text, tags)[0][0]]))

Text                                                                        Label
----------------------------------------------------------------------------------------------------
Dodgers lose again, give up 3 HRs in a loss to the Giants                   Baseball
Giants 5 Cardinals 4 final in extra innings                                 Baseball
Dodgers drop Game 2 against the Giants, 5-4                                 Baseball
Flyers 4 Lightning 1 final. 45 saves for the Lightning.                     Hockey
Slashing, penalty, 2 minute power play coming up                            Hockey
What a stick save!                                                          Hockey
Leads the NFL in sacks with 9.5                                             Football
UCF 38 Temple 13                                                            Football
With the 30 yard completion, down to the 10 yard line                       Football
Drains the 3pt shot!!, 0:15 remaining in the game         

# Let's try emoji 😀

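Does the model have knowledge of emoji? Check out the run below: it sure looks like it does! Notice that the labels depend on the perspective from which the information is presented. For example, "Dodgers lose again" is tagged 😡 because the headline is written from the side of the team that lost.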

In [None]:
tags = ["😀", "😡"]

print("%-75s %s" % ("Text", "Label"))
print("-" * 100)

for text in data:
    print("%-75s %s" % (text, tags[labels(text, tags)[0][0]]))

Text                                                                        Label
----------------------------------------------------------------------------------------------------
Dodgers lose again, give up 3 HRs in a loss to the Giants                   😡
Giants 5 Cardinals 4 final in extra innings                                 😀
Dodgers drop Game 2 against the Giants, 5-4                                 😡
Flyers 4 Lightning 1 final. 45 saves for the Lightning.                     😀
Slashing, penalty, 2 minute power play coming up                            😡
What a stick save!                                                          😀
Leads the NFL in sacks with 9.5                                             😀
UCF 38 Temple 13                                                            😀
With the 30 yard completion, down to the 10 yard line                       😀
Drains the 3pt shot!!, 0:15 remaining in the game                           😀
Intercepted! Drives down the court an