# Natural Language Inference

Zero-shot classification can also be done using [Natural Language Inference (NLI)](https://nlpprogress.com/english/natural_language_inference.html), which refers to the task of determining, given a "premise" and a "hypothesis", whether the hypothesis is:

- True (**entailment**)
- False (**contradiction**)
- Undetermined (**neutral**)

Example:

| Premise | Hypothesis | Label |
| --- | --- | --- |
| A soccer game with multiple males playing. | Some men are playing a sport. | entailment |
| A man inspects the uniform of a figure in some East Asian country. | The man is sleeping. | contradiction |
| An older and younger man smiling. | Two men are smiling and laughing at the cats playing on the floor. | neutral |

An NLI model can perform zero-shot classification by recasting the task as an entailment question:

1. Take the text you want to classify (e.g., a movie review) and treat it as the "premise".
2. Craft a "hypothesis" such as "This is a positive review."
3. The model checks whether the hypothesis follows from the premise (entailment) or contradicts it (contradiction).
    - If the premise entails the hypothesis, label the text positive.
    - If it contradicts the hypothesis, label it negative.

The model needs no training on the target task; it only has to be trained on NLI.
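
The recipe above can be sketched in plain Python. The template string is the default `hypothesis_template` of the Hugging Face zero-shot pipeline; `build_pairs`, `zero_shot_classify`, and `toy_entailment_score` are hypothetical names, and the toy scorer is only a stand-in for a real NLI model so that the example runs offline.

```python
from typing import Callable, List, Tuple


def build_pairs(premise: str, labels: List[str],
                template: str = "This example is {}.") -> List[Tuple[str, str]]:
    """Pair the premise with one hypothesis per candidate label."""
    return [(premise, template.format(label)) for label in labels]


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model (assumption, not a real model):
    the fraction of hypothesis words that also occur in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().rstrip(".").split()
    return sum(w in premise_words for w in hypothesis_words) / len(hypothesis_words)


def zero_shot_classify(premise: str, labels: List[str],
                       score_fn: Callable[[str, str], float] = toy_entailment_score):
    """Return the label whose hypothesis gets the highest entailment score."""
    pairs = build_pairs(premise, labels)
    scores = [score_fn(p, h) for p, h in pairs]
    best = max(range(len(labels)), key=scores.__getitem__)
    return labels[best], pairs


label, pairs = zero_shot_classify("this movie is positive fun", ["positive", "negative"])
print(label)
print(pairs)
```

With a real NLI model, `score_fn` would return the model's entailment score for each (premise, hypothesis) pair; the rest of the logic stays the same.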

Zero-, one-, and few-shot classification appear to be emergent capabilities of large language models, seemingly arising at model sizes above roughly 100M parameters. Effectiveness on these tasks also seems to scale with model size: larger models (more trainable parameters or layers) generally do better.

- Approaches to NLI range from earlier symbolic and statistical methods to more recent deep learning methods.
- Benchmark datasets used for NLI include [SNLI](https://paperswithcode.com/dataset/snli), [MultiNLI](https://paperswithcode.com/dataset/multinli), [SciTail](https://paperswithcode.com/dataset/scitail), among others.
- You can get hands-on practice on the SNLI task by following this [d2l.ai chapter](https://d2l.ai/chapter_natural-language-processing-applications/natural-language-inference-and-dataset.html).

Let's grab an [NLI model from Hugging Face](https://huggingface.co/models?pipeline_tag=zero-shot-classification) and demonstrate how to use it for zero-shot classification:

In [1]:
%pip install -qU datasets transformers[sentencepiece]

In [2]:
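from transformers import pipeline

# BART-large fine-tuned on MultiNLI, used here as a zero-shot classifier
pipe = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device="cuda",  # requires a GPU runtime
)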


Device set to use cuda


In [3]:
predictions = pipe("I have a problem with my iphone that needs to be resolved asap!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
)
predictions

{'sequence': 'I have a problem with my iphone that needs to be resolved asap!',
 'labels': ['urgent', 'phone', 'computer', 'not urgent', 'tablet'],
 'scores': [0.5227577686309814,
  0.45814040303230286,
  0.014264780096709728,
  0.0026850062422454357,
  0.0021520727314054966]}

Let's make it multi-label classification via `multi_label=True`, which scores each label independently:

In [4]:
predictions = pipe("I have a problem with my iphone that needs to be resolved asap!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
    multi_label=True
)
predictions

{'sequence': 'I have a problem with my iphone that needs to be resolved asap!',
 'labels': ['urgent', 'phone', 'computer', 'not urgent', 'tablet'],
 'scores': [0.9987171292304993,
  0.9945850968360901,
  0.18989649415016174,
  0.0007674150983802974,
  0.0003826087631750852]}
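
The two outputs above differ in how scores are normalized: the single-label scores sum to 1, while the multi-label scores do not. A minimal sketch of the difference, assuming the pipeline works as follows: in single-label mode a softmax runs over the entailment logits of all labels, and in multi-label mode a softmax runs over the (contradiction, entailment) logits of each label separately. The logits below are made up for illustration and are not real model output.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical (contradiction, entailment) logits per label, NOT real model output
logits = {
    "urgent": (-2.0, 4.0),
    "phone": (-1.5, 3.8),
    "tablet": (3.0, -2.5),
}


def single_label_scores(logits):
    """Labels compete: softmax over the entailment logits of all labels."""
    labels = list(logits)
    probs = softmax([logits[label][1] for label in labels])
    return dict(zip(labels, probs))


def multi_label_scores(logits):
    """Labels are independent: entailment vs. contradiction for each label."""
    return {label: softmax([c, e])[1] for label, (c, e) in logits.items()}


single = single_label_scores(logits)
multi = multi_label_scores(logits)
print(single)
print(multi)
```

This is why both `urgent` and `phone` can score close to 1 in multi-label mode, whereas in single-label mode they have to share the probability mass.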

Let's try running this on the `rotten_tomatoes` dataset (movie reviews):

In [5]:
import pandas as pd
from datasets import load_dataset

tomatoes = load_dataset("rotten_tomatoes")

# Pandas for easier control
# tomatoes_train_df = pd.DataFrame(tomatoes["train"])
tomatoes_eval_df = pd.DataFrame(tomatoes["test"])
tomatoes_eval_df.head()


| | text | label |
| --- | --- | --- |
| 0 | lovingly photographed in the manner of a golde... | 1 |
| 1 | consistently clever and suspenseful . | 1 |
| 2 | it's like a " big chill " reunion of the baade... | 1 |
| 3 | the story gives ample opportunity for large-sc... | 1 |
| 4 | red dragon " never cuts corners . | 1 |


Classifying all 1,066 test examples took about 44 s in one run on a T4 GPU:

In [6]:
# Candidate labels
candidate_labels = [
    "negative movie review",
    "positive movie review",
]
# List position matches the dataset's label ids (0 = negative, 1 = positive)
candidate_labels_dict = {k: v for k, v in enumerate(candidate_labels)}

# Create predictions
predictions = pipe(tomatoes_eval_df.text.tolist(), candidate_labels)

In [7]:
predictions

[{'sequence': 'lovingly photographed in the manner of a golden book sprung to life , stuart little 2 manages sweetness largely without stickiness .',
  'labels': ['positive movie review', 'negative movie review'],
  'scores': [0.9851959943771362, 0.01480401773005724]},
 {'sequence': 'consistently clever and suspenseful .',
  'labels': ['positive movie review', 'negative movie review'],
  'scores': [0.9844918847084045, 0.015508100390434265]},
 {'sequence': 'it\'s like a " big chill " reunion of the baader-meinhof gang , only these guys are more harmless pranksters than political activists .',
  'labels': ['negative movie review', 'positive movie review'],
  'scores': [0.5252113342285156, 0.4747886657714844]},
 {'sequence': 'the story gives ample opportunity for large-scale action and suspense , which director shekhar kapur supplies with tremendous skill .',
  'labels': ['positive movie review', 'negative movie review'],
  'scores': [0.9823812246322632, 0.017618747428059578]},
 ... (output truncated)
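
To score these predictions against the dataset's integer labels, each top-ranked label string has to be mapped back to a label id. The sketch below does this for the first four predictions shown above (scores rounded), using the gold labels from the first four rows of the test set; `label_to_id` is a hypothetical helper name. The resulting number covers only these four rows and is not the accuracy on the full test split.

```python
# First four predictions as shown in the output above (scores rounded)
shown_predictions = [
    {"labels": ["positive movie review", "negative movie review"], "scores": [0.985, 0.015]},
    {"labels": ["positive movie review", "negative movie review"], "scores": [0.984, 0.016]},
    {"labels": ["negative movie review", "positive movie review"], "scores": [0.525, 0.475]},
    {"labels": ["positive movie review", "negative movie review"], "scores": [0.982, 0.018]},
]
gold = [1, 1, 1, 1]  # labels of the first four test rows

candidate_labels = ["negative movie review", "positive movie review"]
# Invert the list so a label string maps to the dataset's label id
label_to_id = {label: i for i, label in enumerate(candidate_labels)}

# The pipeline sorts labels by score, so index 0 is the predicted class
y_pred = [label_to_id[p["labels"][0]] for p in shown_predictions]
accuracy = sum(p == g for p, g in zip(y_pred, gold)) / len(gold)
print(y_pred, accuracy)
```

On the full run, the same mapping would be applied to `predictions` and compared with `tomatoes_eval_df.label`.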

## Exercise

Identify the user intent using NLI:

- `"Hello, I want to get me a Laptop how much does it cost?"`
    - `"BUY Laptop"`
- `"I am very frustrated with your service, and I wanna cancel right now!"`
    - `"CANCEL Subscription"`
- `"Here I bought this Keyboard, but it is not working. I want to get my money back!"`
    - `"REFUND Transaction"`

In [8]:
# YOUR CODE HERE
text = [
    "Hello, I want to get me a Laptop how much does it cost?",
    "I am very frustrated with your service, and I wanna cancel right now!",
    "Here I bought this Keyboard, but it is not working. I want to get my money back!",
]
predictions = pipe(
    text,
    candidate_labels=[
        "urgent", "not urgent", "phone", "tablet", "computer",
        "buy laptop", "cancel subscribtion", "refund transaction",
    ],
    multi_label=True,
)
predictions

[{'sequence': 'Hello, I want to get me a Laptop how much does it cost?',
  'labels': ['computer',
   'buy laptop',
   'urgent',
   'not urgent',
   'refund transaction',
   'cancel subscribtion',
   'phone',
   'tablet'],
  'scores': [0.9964963793754578,
   0.987543523311615,
   0.8275675773620605,
   0.011202964931726456,
   0.007790549658238888,
   0.0004302078450564295,
   0.00030100904405117035,
   0.00029095777426846325]},
 {'sequence': 'I am very frustrated with your service, and I wanna cancel right now!',
  'labels': ['urgent',
   'cancel subscribtion',
   'phone',
   'computer',
   'tablet',
   'refund transaction',
   'buy laptop',
   'not urgent'],
  'scores': [0.9726293683052063,
   0.9443413019180298,
   0.7737113237380981,
   0.5884447693824768,
   0.36059629917144775,
   0.2171366661787033,
   0.18069994449615479,
   0.01658494397997856]},
 {'sequence': 'Here I bought this Keyboard, but it is not working. I want to get my money back!',
  'labels': ['urgent',
   ... (output truncated)
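
Because the candidate list mixes intents with topic and urgency labels, the top-ranked label is not always an intent (for the first message it is `computer`). One way to read off the intent, sketched here under the assumption that only the three intent labels matter, is to filter each prediction down to those labels and keep the best-scoring one. `top_intent` is a hypothetical helper, and the two predictions are copied from the output above with scores rounded.

```python
# Intent labels, spelled exactly as in the run above
INTENT_LABELS = {"buy laptop", "cancel subscribtion", "refund transaction"}


def top_intent(prediction, intents=INTENT_LABELS):
    """Return (label, score) of the best-scoring label that is an intent."""
    ranked = [
        (label, score)
        for label, score in zip(prediction["labels"], prediction["scores"])
        if label in intents
    ]
    return max(ranked, key=lambda pair: pair[1])


# First two predictions from the output above (scores rounded)
shown = [
    {"labels": ["computer", "buy laptop", "urgent", "not urgent",
                "refund transaction", "cancel subscribtion", "phone", "tablet"],
     "scores": [0.9965, 0.9875, 0.8276, 0.0112, 0.0078, 0.0004, 0.0003, 0.0003]},
    {"labels": ["urgent", "cancel subscribtion", "phone", "computer",
                "tablet", "refund transaction", "buy laptop", "not urgent"],
     "scores": [0.9726, 0.9443, 0.7737, 0.5884, 0.3606, 0.2171, 0.1807, 0.0166]},
]

intents = [top_intent(p) for p in shown]
print(intents)
```

Passing only the three intent labels as `candidate_labels` in the first place would avoid the filtering step.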

In [9]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("towhid121/netflix-life-impact-dataset-nlid")

print("Path to dataset files:", path)

Path to dataset files: /root/.cache/kagglehub/datasets/towhid121/netflix-life-impact-dataset-nlid/versions/1

In [10]:
import os
os.listdir(path)

['Netflix Life Impact Dataset (NLID).csv']

In [11]:
netflix = pd.read_csv(os.path.join(path, "Netflix Life Impact Dataset (NLID).csv"))
netflix.head()


| | Movie Title | Genre | Release Year | Average Rating | Number of Reviews | Review Highlights | Minute of Life-Changing Insight | How Discovered | Meaningful Advice Taken | Suggested to Friends/Family (Y/N %) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | The Pursuit of Happyness | Drama | 2006 | 9.1 | 42000 | "Will Smith’s struggle hit hard. A must-watch!... | 78:15 – Chris gets the job | Friend suggested | Persistence pays off. | 92% Y |
| 1 | The Social Dilemma | Documentary | 2020 | 8.2 | 35000 | "Eye-opening about tech addiction." / "Some cl... | 12:40 – Algorithm manipulation | Social media | Limit screen time for mental health. | 88% Y |
| 2 | Parasite | Thriller/Drama | 2019 | 9.3 | 50000 | "Masterpiece on class inequality." / "Too dark... | 1:12:00 – The flood scene | Netflix recommendation | Privilege isn’t always visible. | 85% Y |
| 3 | Paddington 2 | Comedy/Family | 2017 | 8.8 | 28000 | "Pure joy! Teaches kindness effortlessly." / "... | 33:10 – Paddington’s jail speech | Friend suggested | Always choose kindness. | 95% Y |
| 4 | Inception | Sci-Fi | 2010 | 9.0 | 45000 | "Mind-bending brilliance." / "Confusing plot." | 1:05:22 – Cobb’s totem scene | Social media | Reality is subjective. | 80% Y |


In [12]:
labels = netflix["Genre"].unique().tolist()  # candidate labels: every distinct genre
text = netflix["Review Highlights"]
GT = netflix["Genre"]  # ground-truth genre for each row


In [13]:
# YOUR CODE HERE
predictions = pipe(text.tolist(),
    candidate_labels=labels,
)


In [14]:
predictions

[{'sequence': '"Will Smith’s struggle hit hard. A must-watch!" / "Overly sentimental but inspiring."',
  'labels': ['Drama',
   'Action',
   'Thriller/Drama',
   'Comedy/Drama',
   'Adventure',
   'Romance/Drama',
   'Mystery',
   'Documentary',
   'Western',
   'Crime/Drama',
   'Thriller',
   'War/Drama',
   'Romance',
   'Comedy/Family',
   'Satire',
   'Historical Drama',
   'Horror/Comedy',
   'Comedy',
   'Comedy/Crime',
   'Animation',
   'Sci-Fi',
   'Horror'],
  'scores': [0.13516230881214142,
   0.11017797887325287,
   0.10778098553419113,
   0.06969153881072998,
   0.05830329284071922,
   0.05724642798304558,
   0.049460623413324356,
   0.045289840549230576,
   0.04216363653540611,
   0.041838981211185455,
   0.04013817012310028,
   0.035174403339624405,
   0.03192659094929695,
   0.029295964166522026,
   0.025565123185515404,
   0.025310100987553596,
   0.020900143310427666,
   0.02002245932817459,
   0.019076017662882805,
   0.014474038034677505,
   0.011513088829815388,
   ... (output truncated)

In [15]:
# Keep only the top-ranked label and its score for each review
y_hat = [prediction["labels"][0] for prediction in predictions]
prob = [prediction["scores"][0] for prediction in predictions]

df = pd.DataFrame({"text": text, "GT": GT, "prediction": y_hat, "score": prob})
pd.set_option("display.max_colwidth", None)  # show the full review text
df

| | text | GT | prediction | score |
| --- | --- | --- | --- | --- |
| 0 | "Will Smith’s struggle hit hard. A must-watch!" / "Overly sentimental but inspiring." | Drama | Drama | 0.135162 |
| 1 | "Eye-opening about tech addiction." / "Some claims feel exaggerated." | Documentary | Action | 0.133140 |
| 2 | "Masterpiece on class inequality." / "Too dark for some viewers." | Thriller/Drama | Action | 0.171654 |
| 3 | "Pure joy! Teaches kindness effortlessly." / "Not for edgy viewers." | Comedy/Family | Comedy/Family | 0.183969 |
| 4 | "Mind-bending brilliance." / "Confusing plot." | Sci-Fi | Mystery | 0.141721 |
| ... | ... | ... | ... | ... |
| 77 | "Worst Netflix film?" / "Unwatchable." | Action | Mystery | 0.074018 |
| 78 | "Instant Christmas classic." / "Predictable." | Animation | Comedy/Family | 0.366599 |
| 79 | "Chadwick Boseman’s last role." / "Theatrical." | Drama | Drama | 0.247807 |
| 80 | "Devastating childbirth scene." / "Graphic." | Drama | Horror | 0.458950 |


In [16]:
# Accuracy of the top-1 prediction against the ground-truth genre
from sklearn.metrics import accuracy_score
accuracy_score(df['GT'], df['prediction'])

0.2073170731707317