# Political DEBATE: Zero-shot NLI Classification

#### This tutorial demonstrates zero-shot classification with the DEBATE models. This includes:
1. How to download the models from the Hugging Face Hub.
2. How to pass the models to the GPU for accelerated classification.
3. How to use the NLI classification framework and the transformers pipeline.

The Transformers library provides access to pre-trained language models as well as an easy-to-use pipeline for classification.

Read the [Transformers documentation](https://huggingface.co/docs/transformers/index)

Explore the [repository of pre-trained models](https://huggingface.co/models)

#### Requirements:
1. A very basic understanding of Python.
2. Access to a GPU is beneficial, but not necessarily required for smaller data sets. Free services like Google Colab can be used if you don't have a desktop GPU.

If you want to use the models for few-shot classification, there is a [tutorial here](https://colab.research.google.com/drive/1Sv82jqRSwiIyuvEIDrhTiqF8_ClQaL0r#scrollTo=Uuaw8-qpn8S6).

# Library and Data Import

In [13]:
# install necessary libraries. The transformers and datasets libraries are necessary to run in Google Colab. If you are running this script locally you may need to install other libraries as well.
!pip install transformers datasets
# If you're using a MacBook integrated GPU, make sure to install the "accelerate" library as well.
#!pip install accelerate



In [24]:
import pandas as pd
import torch
from datasets import load_dataset
from transformers import pipeline
from sklearn.metrics import matthews_corrcoef

For this example we will use a random sample from the PolNLI test set. We can download it directly from the Hugging Face Hub and then subset it. The 'premise' column in the dataset is the document we're going to classify. The 'hypothesis' is a statement that the model judges to be true or false given the contents of the premise. If the hypothesis is true, the 'entailment' label is 0. If it is not true, the label is 1.

In [16]:
ds = load_dataset("mlburnham/Pol_NLI")
test = ds['test'].to_pandas()
# we'll use a random sample of 1,000 documents for this example
test = test[['premise', 'hypothesis', 'entailment', 'task']].sample(1000, random_state = 1)
test.reset_index(drop = True, inplace = True)
test[['premise', 'entailment']].head()

| | premise | entailment |
|---|---|---|
| 0 | The soldiers storming the beaches on D-Day may... | 0 |
| 1 | Regime warplanes and helicopters targeted Al-L... | 1 |
| 2 | rt @scottwalker first up this morning at #ncsc... | 1 |
| 3 | With protection from the Taliban, al Qaeda and... | 0 |
| 4 | @aiyegbayo @KadariaAhmed LOL what nigerian pro... | 1 |


The Transformers library offers a simple pipeline we can use to classify the data. All we need to do is specify the task and the model we will use. More information on the model can be found [here](https://huggingface.co/mlburnham/Political_DEBATE_base_v1.0).

We want to make sure we are using a GPU for fast inference, so the code below checks whether one is available. If you're running this on a discrete (CUDA) GPU, leave the code as is. If you're running this on a MacBook with an integrated GPU, change "cuda" to "mps" and check availability with torch.backends.mps.is_available() instead of torch.cuda.is_available(). If the code prints 'cpu' rather than 'cuda' or 'mps', no GPU was detected. If you're using a Colab notebook, make sure that the runtime is set to GPU. You can change this at the top by selecting a GPU under the "change runtime type" option.

# Setting up the Classifier

Here we instantiate our classifier using the pipeline() function. This is a fast and easy way to use any model on the Hugging Face Hub. The first step is to specify the device the model will use, generally a CPU or a GPU. Always use a GPU if possible, as it will significantly speed up classification.

In [25]:
# Here we define a variable that will be passed to our classifier. It selects the GPU if one is available and falls back to the CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu" # to use the GPU on a MacBook, change 'cuda' to 'mps', check torch.backends.mps.is_available() instead of torch.cuda.is_available(), and make sure you have the 'accelerate' library installed.
# This line prints the device that will be used. Make sure it prints 'cuda' or 'mps' if you are trying to use a GPU.
print(f"Device: {device}")

Device: cuda
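
The check above only looks for a CUDA GPU. A slightly more general selection, sketched below, also considers Apple's "mps" backend. The helper names pick_device and detect_device are hypothetical and not part of the original notebook, and the sketch assumes a PyTorch release that exposes torch.backends.mps. It falls back to the CPU if PyTorch is not installed.

```python
def pick_device(cuda_available, mps_available):
    """Return the preferred device string: CUDA first, then Apple's MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Ask PyTorch which accelerators exist; fall back to the CPU if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent in older PyTorch releases, hence the getattr guard
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)


device = detect_device()
print(f"Device: {device}")
```

The resulting string can be passed to the classifier in the same way as the device variable defined above.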


The pipeline() function will instantiate our classifier. We pass the device variable we defined above to make sure the model uses the GPU, and set the number of documents passed through the model at a time with the batch_size argument. Lower batch sizes will take longer to classify, but higher batch sizes require more GPU memory. There are diminishing returns to higher batch sizes; 32 is generally a good starting point.

In [19]:
pipe = pipeline("zero-shot-classification", model="mlburnham/Political_DEBATE_base_v1.0", device = device, batch_size = 32) # To use the base model
#pipe = pipeline("zero-shot-classification", model='mlburnham/Political_DEBATE_large_v1.0', device = device, batch_size = 32) # To use the large model

# Classification
In our test set each document is paired with a different hypothesis. NLI classifiers work by pairing documents with "hypotheses" and determining if the hypothesis is true given the information in the text. To ensure that each document is paired with the correct hypothesis the code below will loop through each row of the dataframe, pairing documents with their associated hypothesis and then classifying one at a time. This is slower because it doesn't take advantage of batching, which classifies multiple documents in parallel. If all documents will be classified with the same hypothesis or set of hypotheses, or if you can group documents together by hypotheses, see the batching section for faster inference.

## Setting the multi_label argument.

The multi_label argument in the pipeline can significantly change the behavior of your classifier. When a document is classified, the model calculates an entailment probability for each possible label. If the multi_label argument is set to True, each label is scored independently and the classifier returns those probabilities as they are, so several labels, or none, can receive a high score. In general, we recommend setting multi_label to True when using the DEBATE models.

When the multi_label argument is set to False, the classifier applies a softmax function across the labels so that the scores add up to one. The resulting scores are thus probabilities *relative* to the other labels. This means that a returned probability can be inflated or depressed depending on which other labels are in the set.
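
The difference between the two settings can be sketched in plain Python. The logits below are made-up numbers rather than real model outputs, and the two calculations follow our understanding of how the Transformers zero-shot pipeline scores labels, so treat this as an illustration rather than the library's code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical (entailment, not-entailment) logits for three labels
logits = {
    "opposes": (6.0, -5.0),
    "hates": (5.8, -4.9),
    "supports": (-6.0, 6.5),
}

# multi_label=True: each label is scored on its own,
# entailment against not-entailment
independent = {
    label: softmax([entail, not_entail])[0]
    for label, (entail, not_entail) in logits.items()
}

# multi_label=False: the entailment logits compete across labels,
# so the scores are forced to sum to one
entail_logits = [entail for entail, _ in logits.values()]
relative = dict(zip(logits, softmax(entail_logits)))

print(independent)
print(relative)
```

With these numbers, 'opposes' and 'hates' both score close to 1 independently, but they split the probability between them once the scores are made relative.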

Let's look at an example of what the classifier returns with each setting.

In [26]:
test_doc = 'I cant wait for Trump to leave the White House!'
hypothesis_template = 'The author of this text {} Trump.'
test_labels = ['supports', 'opposes', 'hates']
pipe(test_doc, test_labels, hypothesis_template = hypothesis_template, multi_label = True)

{'sequence': 'I cant wait for Trump to leave the White House!',
 'labels': ['opposes', 'hates', 'supports'],
 'scores': [0.9999675750732422, 0.9999517202377319, 3.6217461456544697e-06]}

In [27]:
test_doc = 'I cant wait for Trump to leave the White House!'
hypothesis_template = '{}'
test_labels = ['The author of this text supports Trump.', 'This text is about Trump.', 'This text is about climate change.']
pipe(test_doc, test_labels, hypothesis_template = hypothesis_template, multi_label = True)

{'sequence': 'I cant wait for Trump to leave the White House!',
 'labels': ['This text is about Trump.',
  'This text is about climate change.',
  'The author of this text supports Trump.'],
 'scores': [0.999994695186615, 0.0025538229383528233, 3.6217461456544697e-06]}

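In the two examples above, the model returns a dictionary with the independent probabilities in a list labeled 'scores'. Because each hypothesis is scored on its own, more than one label can be likely: in the first example both 'opposes' and 'hates' score close to 1, while in the second only 'This text is about Trump.' does.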

In [22]:
test_doc = 'I cant wait for Trump to leave the White House!'
hypothesis_template = 'The author of this text {} Trump.'
test_labels = ['supports', 'opposes', 'hates']
pipe(test_doc, test_labels, hypothesis_template = hypothesis_template, multi_label = False)

{'sequence': 'I cant wait for Trump to leave the White House!',
 'labels': ['opposes', 'hates', 'supports'],
 'scores': [0.552904486656189, 0.4470865726470947, 8.94826098374324e-06]}

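In this example, relative scores are calculated using a softmax function. The scores now sum to one, so 'opposes' and 'hates' split the probability at roughly 0.55 and 0.45 even though each scored close to 1 on its own.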

Set multi_label to False when you know that the correct label for your documents is one of the labels in your set of hypotheses. If it is possible for multiple or none of the hypotheses to be true, set multi_label to True.
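
When multi_label is True, it is up to you to decide which labels count as assigned. One simple option, sketched here, is to keep every label whose score clears a cutoff. The helper name labels_above and the 0.5 cutoff are assumptions for illustration; the result dictionary is copied from the first example output above.

```python
result = {
    "sequence": "I cant wait for Trump to leave the White House!",
    "labels": ["opposes", "hates", "supports"],
    "scores": [0.9999675750732422, 0.9999517202377319, 3.6217461456544697e-06],
}


def labels_above(result, threshold=0.5):
    """Return every label whose independent score is at least `threshold`."""
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


print(labels_above(result))
```

An empty list means that none of the hypotheses was judged likely for the document, which is a valid outcome when multi_label is True.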

## Inference with a for loop for when each document has a unique hypothesis.
If you want to pair each document with a set of hypotheses particular to that document, you can use this for loop. In our sample data, each document is paired with a single specific hypothesis, and the classifier will return an entailment probability for each document-hypothesis pair.

In [None]:
colname = 'debate_label' # the name of the column where we will assign our labels
test[colname] = 0

for i in test.index:
    hypothesis = test.loc[i, 'hypothesis'] # get the right entailment hypothesis
    sample = test.loc[i, 'premise'] # get the document to be classified
    # classify the document-hypothesis pair; the hypothesis is wrapped in a list because the pipeline splits a bare string on commas
    res = pipe(sample, [hypothesis], hypothesis_template = '{}', multi_label = True)
    test.loc[i, colname] = round(res['scores'][0]) # extract the entailment probability from the resulting dictionary, round it to 0 or 1, and assign it to the dataframe
# in our data entailment is labeled as 0 and not entailment is 1, so we swap the rounded 0 and 1 values to match the entailment labels
test[colname] = test[colname].replace({0:1, 1:0})
test[colname] = test[colname].astype(int)

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


In [None]:
# Check the results
matthews_corrcoef(test['entailment'], test['debate_label'])

0.9004438415985251
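
To make the last two steps concrete, the sketch below recodes a few entailment probabilities into the data set's labels and computes the Matthews correlation coefficient by hand. The probabilities and gold labels are invented for illustration, and the function names are hypothetical; in practice matthews_corrcoef from scikit-learn does this work.

```python
import math


def to_entailment_label(prob, threshold=0.5):
    """Map an entailment probability to the PolNLI coding: 0 = entailed, 1 = not entailed."""
    return 0 if prob > threshold else 1


def matthews_corr(y_true, y_pred, positive=0):
    """Matthews correlation coefficient for binary labels, computed from the confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # return 0.0 when the coefficient is undefined (a zero denominator)
    return (tp * tn - fp * fn) / denom if denom else 0.0


probs = [0.98, 0.03, 0.61, 0.40]  # made-up entailment probabilities
gold = [0, 1, 1, 1]               # made-up gold labels

preds = [to_entailment_label(p) for p in probs]
score = matthews_corr(gold, preds)
print(preds, score)
```

The coefficient ranges from -1 to 1, where 1 is perfect agreement and 0 is no better than chance.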

## Batched inference for when all documents are classified with the same hypotheses.
If all documents will be classified with the same hypotheses, you can use batched inference for faster classification.

In [None]:
test.head()

| | premise | hypothesis | entailment | task |
|---|---|---|---|---|
| 0 | The soldiers storming the beaches on D-Day may... | This text advocates for violence against jews. | 0 | hatespeech and toxicity |
| 1 | Regime warplanes and helicopters targeted Al-L... | This text is about a protest with excessive fo... | 1 | event extraction |
| 2 | rt @scottwalker first up this morning at #ncsc... | The author of this text opposes jim jordan. | 1 | stance detection |
| 3 | With protection from the Taliban, al Qaeda and... | This text is about terrorism. | 0 | topic classification |
| 4 | @aiyegbayo @KadariaAhmed LOL what nigerian pro... | This text advocates for violence against peopl... | 1 | hatespeech and toxicity |


For batched inference we create a list of all the documents we want to classify, a template for the hypothesis that each document will be paired with, and then a list of possible labels. The {} in the hypothesis template will be populated with each string in the list of labels.

In [None]:
samples = list(test['premise']) # convert all of the documents we want to classify to a list
template = 'The author of this tweet {} Trump.'
# multilabel entailment labels
labels = ['supports', 'opposes', 'is neutral towards']

Now we classify the data by passing our documents, labels, and template to the classifier. The model will pair each document with each of the three hypotheses:
* The author of this tweet supports Trump.
* The author of this tweet opposes Trump.
* The author of this tweet is neutral towards Trump.

It will then determine the probability that each hypothesis is true given the document. The assigned label will be the hypothesis that is most likely to be true.
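
The way the template and the labels combine can be reproduced with ordinary string formatting. This is only an illustration of the substitution, using the template and labels defined above; the pipeline performs the equivalent step internally.

```python
template = 'The author of this tweet {} Trump.'
labels = ['supports', 'opposes', 'is neutral towards']

# Fill the {} placeholder with each label to build the full hypotheses
hypotheses = [template.format(label) for label in labels]

for hypothesis in hypotheses:
    print(hypothesis)
```

Printing the hypotheses before classifying is a quick way to check that the template reads as a natural sentence for every label.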

In [None]:
# classify the documents
# The multi_label option determines whether more than one hypothesis can be true for the document.
# Either way, the pipeline returns one dictionary per document, with the labels sorted from most to least probable alongside their scores.
res = pipe(samples, labels, hypothesis_template = template, multi_label = True)
# return the most probable label and add it to our data frame
test['debate_label'] = [label['labels'][0] for label in res]
test.head()

| | premise | hypothesis | entailment | task | debate_label |
|---|---|---|---|---|---|
| 0 | The soldiers storming the beaches on D-Day may... | This text advocates for violence against jews. | 0 | hatespeech and toxicity | supports |
| 1 | Regime warplanes and helicopters targeted Al-L... | This text is about a protest with excessive fo... | 1 | event extraction | is neutral towards |
| 2 | rt @scottwalker first up this morning at #ncsc... | The author of this text opposes jim jordan. | 1 | stance detection | is neutral towards |
| 3 | With protection from the Taliban, al Qaeda and... | This text is about terrorism. | 0 | topic classification | is neutral towards |
| 4 | @aiyegbayo @KadariaAhmed LOL what nigerian pro... | This text advocates for violence against peopl... | 1 | hatespeech and toxicity | is neutral towards |


Labels are returned as plain text, so we now recode them to binary labels to evaluate classification performance.

In [None]:
# recode multilabel labels; assigning the result back avoids the pandas warning about inplace replacement on a column
test['debate_label'] = test['debate_label'].map({'supports': 1, 'opposes': 0, 'is neutral towards': 0})


In [None]:
test.head()

| | premise | hypothesis | entailment | task | debate_label |
|---|---|---|---|---|---|
| 0 | The soldiers storming the beaches on D-Day may... | This text advocates for violence against jews. | 0 | hatespeech and toxicity | 1 |
| 1 | Regime warplanes and helicopters targeted Al-L... | This text is about a protest with excessive fo... | 1 | event extraction | 0 |
| 2 | rt @scottwalker first up this morning at #ncsc... | The author of this text opposes jim jordan. | 1 | stance detection | 0 |
| 3 | With protection from the Taliban, al Qaeda and... | This text is about terrorism. | 0 | topic classification | 0 |
| 4 | @aiyegbayo @KadariaAhmed LOL what nigerian pro... | This text advocates for violence against peopl... | 1 | hatespeech and toxicity | 0 |


These hypotheses weren't related to the original documents and so we won't evaluate classification performance.
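
When documents have different hypotheses but many documents share the same one, the two approaches can be combined: group the documents by hypothesis and classify each group in one batched call. The sketch below shows one way to do this. The names classify_grouped and with_pipeline are hypothetical helpers, not part of the original notebook, and classify_batch stands for any callable that returns one entailment probability per document.

```python
from collections import defaultdict


def classify_grouped(premises, hypotheses, classify_batch):
    """Score premise/hypothesis pairs, batching premises that share a hypothesis.

    classify_batch(docs, hypothesis) must return one entailment
    probability per document, in the same order as docs.
    """
    groups = defaultdict(list)
    for idx, hypothesis in enumerate(hypotheses):
        groups[hypothesis].append(idx)

    probs = [None] * len(premises)
    for hypothesis, indices in groups.items():
        docs = [premises[i] for i in indices]
        # write each score back to the position of its original document
        for i, prob in zip(indices, classify_batch(docs, hypothesis)):
            probs[i] = prob
    return probs


def with_pipeline(pipe):
    """Adapt a zero-shot pipeline (assumed to be built as in this tutorial) to classify_batch."""
    def classify_batch(docs, hypothesis):
        results = pipe(docs, [hypothesis], hypothesis_template="{}", multi_label=True)
        if isinstance(results, dict):  # some versions return a bare dict for a single document
            results = [results]
        return [res["scores"][0] for res in results]
    return classify_batch
```

With the objects from this tutorial, a call might look like classify_grouped(list(test['premise']), list(test['hypothesis']), with_pipeline(pipe)). The returned probabilities would still need to be rounded and recoded to match the 0/1 entailment labels, as in the for loop section.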