# Zero-Shot Event Classification

In [1]:
import json
import torch
import numpy as np
from pathlib import Path
from collections import defaultdict, Counter
from pprint import pprint
from sklearn.metrics import precision_recall_fscore_support
from sentence_transformers import SentenceTransformer, util


def read_dataset(path):
    """Load the TSV dataset from the CASE 2021 shared task."""
    dataset = []
    with open(path) as f:
        next(f, None)  # skip the header row
        for line in f:
            # each row holds three tab-separated fields: id, text, label
            item_id, text, label = line.strip().split("\t")
            dataset.append({"id": item_id, "text": text, "label": label})
    return dataset

## Prepare Dataset and Labels

We first load the following data from the CASE 2021 Fine-Grained Event Classification shared task:
* `test_set_final_release_with_labels.tsv`: The test dataset, containing event descriptions and their labels
* `label_to_description.json`: A mapping between labels and label descriptions

In [2]:
DIR = Path("../data")

dataset = read_dataset(DIR / "test_set_final_release_with_labels.tsv")

with open(DIR / "label_to_description.json") as f:
    label_to_description = json.load(f)

By "label description" we mean a string of text that represents the meaning of a label that we want to be able to predict. In zero-shot learning, these descriptions can replace ground-truth labeled examples for a class. 

The original label descriptions in the ACLED event classification taxonomy are short phrases describing each concept:

In [3]:
label_to_description

{'ABDUCT_DISSAP': 'Abduction/forced disappearance',
 'AGREEMENT': 'Agreement',
 'AIR_STRIKE': 'Air/drone strike',
 'ARMED_CLASH': 'Armed clash',
 'ARREST': 'Arrests',
 'ART_MISS_ATTACK': 'Shelling/artillery/missile attack',
 'ATTACK': 'Attack',
 'ATTRIB': 'Attribution of responsibility',
 'CHANGE_TO_GROUP_ACT': 'Change to group/activity',
 'CHEM_WEAP': 'Chemical weapon',
 'DIPLO': 'Diplomatic event',
 'DISR_WEAP': 'Disrupted weapons use',
 'FORCE_AGAINST_PROTEST': 'Excessive force against protesters',
 'GOV_REGAINS_TERIT': 'Government regains territory',
 'GRENADE': 'Grenade',
 'HQ_ESTABLISHED': 'Headquarters or base established',
 'MAN_MADE_DISASTER': 'Man-made disaster',
 'MOB_VIOL': 'Mob violence',
 'NATURAL_DISASTER': 'Natural disaster',
 'NON_STATE_ACTOR_OVERTAKES_TER': 'Non-state actor overtakes territory',
 'NON_VIOL_TERRIT_TRANSFER': 'Non-violent transfer of territory',
 'ORG_CRIME': 'Organized crime',
 'OTHER': 'Other',
 'PEACE_PROTEST': 'Peaceful protest',
 'PROPERTY_DISTRUCT

Let's now extract the useful bits from the shared task files that we'll need later for classification and evaluation:

In [4]:
texts = [x["text"] for x in dataset]
y_true = [x["label"] for x in dataset]

label_names = sorted(label_to_description)
label_descriptions = [label_to_description[l] for l in label_names]

### Zero Shot Labels

The following subset of labels is used for zero-shot evaluation in the CASE 2021 shared task. Note that we treat all labels in a zero-shot fashion, but we run separate experiments on this subset in order to compare results with other submissions to the shared task.

In [5]:
zero_shot_labels = ["ORG_CRIME", "NATURAL_DISASTER", "MAN_MADE_DISASTER", "DIPLO", "ATTRIB"]

In [6]:
texts_zs, y_true_zs = zip(*[(text, label) for text, label in zip(texts, y_true) if label in zero_shot_labels])

## Implementing a Simple Zero-Shot Classifier

Our approach in a nutshell:
* We use a sentence encoder from `sentence-transformers` to convert both the label descriptions and the texts to classify into embeddings that live in the same embedding space.
* At test time, we embed a new text and compare it to each label embedding via cosine similarity.
* We assign the label with the highest similarity to the item.
* Optionally, we define a minimum similarity threshold that a label needs to pass. If no label passes this threshold, we assign the "OTHER" class.
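
The decision rule described above can be sketched on toy vectors before bringing in a real encoder. This is only an illustration: the two-dimensional vectors and the helper names `cosine` and `nearest_label` are made up for this example and are not part of the notebook's classifier.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_label(text_vec, label_vecs, threshold=None, null_label="OTHER"):
    """Return the label whose vector is most similar to text_vec.

    If a threshold is given and the best similarity falls below it,
    return null_label instead.
    """
    scores = {label: cosine(text_vec, vec) for label, vec in label_vecs.items()}
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return null_label
    return best


# Hypothetical 2-d "embeddings"; a real sentence encoder produces
# vectors with hundreds of dimensions.
toy_label_vecs = {"FLOODS": [1.0, 0.0], "WILDFIRE": [0.0, 1.0]}

print(nearest_label([0.9, 0.1], toy_label_vecs))
print(nearest_label([-1.0, -1.0], toy_label_vecs, threshold=0.3))
```

The first toy vector points almost in the same direction as the `FLOODS` vector, so it receives that label. The second is dissimilar to both label vectors, so with a threshold it falls back to the null label.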


In [7]:
class ZeroShotClassifier:
    
    def __init__(self, model=None, threshold=None, null_label="OTHER"):
        self.model = model
        self.labels = []
        self.label_embeddings = None
        self.threshold = threshold
        self.null_label = null_label
    
    def train(self, labels, descriptions):
        self.labels = labels
        # use the classifier's own encoder rather than a global `model`
        self.label_embeddings = self.model.encode(descriptions)
    
    def predict(self, input_texts=None, input_embeddings=None, output_scores=False):
        
        if input_embeddings is None:
            input_embeddings = self.model.encode(input_texts)
            
        S = util.pytorch_cos_sim(input_embeddings, self.label_embeddings)
        
        predicted_labels = []
        predicted_scores = []
        for i in range(input_embeddings.shape[0]):
            label_scores = S[i].tolist()
            scored = sorted(
                zip(self.labels, label_scores),
                key=lambda x: x[1],
                reverse=True
            )
            pred, score = scored[0]
            if self.threshold is not None and score < self.threshold:
                pred = self.null_label
                
            predicted_scores.append(scored)
            predicted_labels.append(pred)        
        
        if output_scores:
            return predicted_labels, predicted_scores
        else:
            return predicted_labels

## Initializing Classifier

In [10]:
device = "cpu" # set as "cuda" instead if you have a GPU set up
# the first time this line runs the model will be downloaded 
model = SentenceTransformer("paraphrase-mpnet-base-v2", device=device)

In [11]:
zs_classifier = ZeroShotClassifier(model=model)
zs_classifier.train(labels=label_names, descriptions=label_descriptions)

## Predicting and Evaluating Labels

In [12]:
def evaluate(true_labels, pred_labels, label_set=None):
    for avg in ["micro", "macro", "weighted"]:        
        p, r, f, _ = precision_recall_fscore_support(
            true_labels, pred_labels,
            average=avg, labels=label_set, zero_division=0
        )
        gap = " " * (9 - len(avg))
        print(f"{avg}{gap}precision: {p:.3f}, recall: {r:.3f}, f-score: {f:.3f}")

In [13]:
predicted_labels = zs_classifier.predict(input_texts=texts)

In [15]:
# remember, as mentioned above, we're doing zero-shot classification everywhere
# in this system, but we separated the label sets to fit the shared-task evaluation 
# setup
predicted_labels_zs = zs_classifier.predict(input_texts=texts_zs)

Evaluation on the entire test set:

In [16]:
evaluate(y_true, predicted_labels)

micro    precision: 0.520, recall: 0.520, f-score: 0.520
macro    precision: 0.528, recall: 0.495, f-score: 0.461
weighted precision: 0.569, recall: 0.520, f-score: 0.489


Evaluation on only the subset of labels used for zero-shot evaluation in the shared task:

In [17]:
evaluate(y_true_zs, predicted_labels_zs, label_set=zero_shot_labels)

micro    precision: 0.840, recall: 0.358, f-score: 0.502
macro    precision: 0.914, recall: 0.383, f-score: 0.477
weighted precision: 0.920, recall: 0.358, f-score: 0.443
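
One likely reason precision and recall diverge so sharply here: the classifier can still predict any of the full set of labels, but only the five zero-shot labels are scored. A text from the subset that is assigned a label outside the subset counts as a miss for recall, but not as a false positive for precision. The sketch below reproduces this effect for micro-averaging in plain Python; the helper `micro_precision_recall` and the toy labels are illustrative assumptions, not the scikit-learn implementation.

```python
def micro_precision_recall(true_labels, pred_labels, label_set):
    """Micro-averaged precision and recall restricted to label_set."""
    label_set = set(label_set)
    tp = fp = fn = 0
    for true, pred in zip(true_labels, pred_labels):
        if true == pred and true in label_set:
            tp += 1
            continue
        if pred in label_set:
            fp += 1  # predicted a scored label, but it was wrong
        if true in label_set:
            fn += 1  # a scored label was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: two of the four predictions fall outside the scored subset.
toy_true = ["DIPLO", "DIPLO", "ATTRIB", "ATTRIB"]
toy_pred = ["DIPLO", "AGREEMENT", "ATTRIB", "ATTACK"]

precision, recall = micro_precision_recall(
    toy_true, toy_pred, label_set=["DIPLO", "ATTRIB"]
)
print(f"precision: {precision:.3f}, recall: {recall:.3f}")
```

In this toy case every prediction that lands inside the subset is correct, so precision is perfect, while half of the items are lost to out-of-subset labels, which halves recall.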


While there is room for improvement, these results are pretty good given that we didn't use any training data at all!

## Bonus: Building your own Zero-Shot Classifier

You can build a custom zero-shot classifier in a few lines of code!

Let's say we're interested in a small number of natural disasters mentioned in news headlines: earthquakes, wildfires and floods. <br>
We want our classifier to detect and classify these and label everything else as "OTHER".

To do this, we set our classifier up with embeddings of very simple label descriptions ("earthquake", "wildfire", "floods"):

In [18]:
my_classifier = ZeroShotClassifier(
    model=model,
    threshold=0.3,    
    null_label="OTHER"
)

my_classifier.train(
    labels=["EARTHQUAKE", "WILDFIRE", "FLOODS"],
    descriptions=["earthquake", "wildfire", "floods"]
)

Let's apply the classifier to some examples:

In [19]:
my_classifier.predict([
    "Death toll from Hurricane Ida floods rises to 65 in US",
    "As California burns, some ecologists say it’s time to rethink forest management",
    "Maharashtra: Tremor in Kolhapur, no casualty",
    "Leaked Guntrader firearms data file shared. Worst case scenario?",
    "Taliban take control of last holdout in Panjshir Valley"
])

['FLOODS', 'WILDFIRE', 'EARTHQUAKE', 'OTHER', 'OTHER']

Results look good!

The test examples for `WILDFIRE` and `EARTHQUAKE` above demonstrate that we can correctly classify based on semantic proximity rather than literal word match.

This is not going to work perfectly in all cases! But it's a good start for 1 minute of effort. To improve this approach you can tweak the label descriptions and the threshold. 
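
Tuning the threshold does not require re-encoding the texts: `predict(..., output_scores=True)` returns the ranked `(label, score)` pairs for every input, and different cut-offs can be applied to them afterwards. The following is a minimal sketch under that assumption. The helpers `apply_threshold` and `accuracy` and the hand-made scores are hypothetical, and in practice you would need a handful of labeled examples to compare against.

```python
def apply_threshold(ranked_scores, threshold, null_label="OTHER"):
    """Turn ranked (label, score) lists into labels for a given threshold.

    ranked_scores holds one list per text, sorted best-first, in the same
    shape as the scores returned by predict(..., output_scores=True).
    """
    labels = []
    for ranked in ranked_scores:
        best_label, best_score = ranked[0]
        labels.append(best_label if best_score >= threshold else null_label)
    return labels


def accuracy(true_labels, pred_labels):
    """Fraction of positions where the two label lists agree."""
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)


# Hand-made scores for three texts (not real model output).
toy_scores = [
    [("FLOODS", 0.62), ("WILDFIRE", 0.20)],
    [("WILDFIRE", 0.35), ("FLOODS", 0.10)],
    [("EARTHQUAKE", 0.25), ("FLOODS", 0.05)],
]
toy_true = ["FLOODS", "WILDFIRE", "OTHER"]

# Compare a few candidate thresholds on the toy data.
sweep = {
    t: accuracy(toy_true, apply_threshold(toy_scores, t))
    for t in (0.2, 0.3, 0.4)
}
for threshold, acc in sweep.items():
    print(f"threshold={threshold}: accuracy={acc:.2f}")
```

On this toy data a threshold that is too low lets a weak match through, and one that is too high rejects a correct prediction, so the middle value scores best.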

You can also use this approach to mine candidate examples for each class you're interested in and verify them manually later, which is one way to build a dataset of ground-truth examples.
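
One possible way to organize such mining is to group texts by their top-scoring label and keep only the highest-scoring candidates per class for review. The helper `mine_candidates`, its parameters `top_k` and `min_score`, and the toy scores below are assumptions made for illustration.

```python
from collections import defaultdict


def mine_candidates(texts, ranked_scores, top_k=2, min_score=0.0):
    """Group texts by best label and keep the top_k highest-scoring ones.

    ranked_scores holds one best-first list of (label, score) pairs per
    text, in the same shape as predict(..., output_scores=True) returns.
    """
    buckets = defaultdict(list)
    for text, ranked in zip(texts, ranked_scores):
        label, score = ranked[0]
        if score >= min_score:
            buckets[label].append((score, text))
    return {
        label: [text for _, text in sorted(items, reverse=True)[:top_k]]
        for label, items in buckets.items()
    }


# Hand-made inputs (not real model output).
toy_texts = [
    "river bursts its banks",
    "tremor felt downtown",
    "heavy rain floods town",
    "stock market update",
]
toy_ranked = [
    [("FLOODS", 0.55), ("EARTHQUAKE", 0.10)],
    [("EARTHQUAKE", 0.48), ("FLOODS", 0.08)],
    [("FLOODS", 0.71), ("EARTHQUAKE", 0.05)],
    [("FLOODS", 0.12), ("EARTHQUAKE", 0.11)],
]

candidates = mine_candidates(toy_texts, toy_ranked, top_k=2, min_score=0.3)
for label, examples in candidates.items():
    print(label, examples)
```

The low-scoring fourth text is dropped by `min_score`, and the remaining candidates are listed per class with the most confident ones first.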

### Another Example with fine-grained event types

In [20]:
my_classifier = ZeroShotClassifier(
    model=model,
    threshold=0.2,    
    null_label="OTHER"
)

my_classifier.train(
    labels=["COMP-ACQUISITION", "STAKE-ACQUISITION"],
    descriptions=[
        "Company acquires other company",
        "Company buys stocks/stake in other company"
    ]
)

In [21]:
my_classifier.predict([
    "Galetech Group buys majority stake in Optinergy",
    "SoftBank acquires minor stake in Deutsche Telekom in new 'long-term partnership'",
    "EQT buys stake in Sweden's Storytel, becomes second largest shareholder",
    "UK’s Digital 9 Infrastructure acquires Verne Global for €269.1M; here’s why",
    "French technology company Lectra acquires Gemini CAD systems",
    "Quercus buys Arcadia Books as Bielenberg named publisher",
])

['STAKE-ACQUISITION',
 'STAKE-ACQUISITION',
 'STAKE-ACQUISITION',
 'COMP-ACQUISITION',
 'COMP-ACQUISITION',
 'COMP-ACQUISITION']