# Zero-shot text classification for error analysis

**Author:** Glen Koundry  
**Date:** June 6, 2023  

## Overview

This notebook outlines how to use zero-shot text classification with large language models (LLMs), focusing on its application in error analysis of supervised text classification models.

The goal is to give you practical experience with LLM-based zero-shot text classification and with applying it to the error analysis of supervised text classifiers. These skills are increasingly useful when labeled data is scarce and label categories change over time.

### Core concepts

**Text Classification** is the process of attributing predefined categories (or labels) to text. For example, identifying emails as spam or not spam is a classic text classification task.

**Zero-shot learning** represents an approach in machine learning where a model can tackle a task it hasn't explicitly seen during its training phase. This implies that the text classifier can discern and attribute labels that it hasn't been specifically trained on, a critical ability in real-world situations where labeled data is scarce or when new categories surface post-training.

**Error analysis** involves examining and understanding the mistakes made by a supervised learning model. In the context of text classification, this entails identifying the cases where a model has misclassified the text and understanding why those misclassifications occurred. Error analysis can provide insights into how to improve the model, whether by gathering more or different data, modifying the model architecture, or adjusting the training process.

Zero-shot classification exploits the power of **large language models (LLMs)** such as GPT-3, GPT-4, and others by OpenAI. These models, trained on extensive and diverse internet text, can produce human-like text, answer queries, translate languages, and more. A collateral benefit of this wide-ranging training is their ability to comprehend context and make intelligent predictions, a quality you can utilize for zero-shot classification and error analysis.

### Approaches for zero-shot text classification

In this notebook, you will delve into three distinctive approaches for zero-shot text classification using large language models:

* Embeddings-based classification: Extract high-dimensional vector representations (embeddings) of both text and labels, then measure their similarity.
* Natural Language Inference-based (NLI) Classification: This method capitalizes on the model's capability to ascertain whether a given statement is true or false within a specific context.
* Direct classification: This method involves framing the task as a question to a conversational model and interpreting its generated response.

Each technique has its own strengths and challenges, and each is explored in this notebook.

## Setup

### Import libraries

In [1]:
import datetime
import json

import datarobot as dr
import numpy as np
import openai
import pandas as pd
from datasets import load_dataset
from IPython.display import HTML, Markdown, display
from openai.embeddings_utils import cosine_similarity, get_embedding
from transformers import pipeline

### Credentials

**IMPORTANT**: Before running this cell, you need to provide your personal DataRobot API key and your OpenAI API key. Read more about different options for [connecting to DataRobot from the client](https://docs.datarobot.com/en/docs/api/api-quickstart/api-qs.html).


To find your OpenAI API key, log in to your OpenAI account, click on your name in the upper right corner and select "View API Keys".

In the cell, replace the placeholder string (`"<insert OpenAI API key here>"`) with your personal OpenAI API key. Be sure to keep your key within the quotes.

Don't forget to keep your API keys secure and avoid sharing them publicly.

In [2]:
# Place your DataRobot API key and URL here
DATAROBOT_API_KEY = "<insert DataRobot API key here>"
DATAROBOT_API_ENDPOINT = "https://app.datarobot.com/api/v2/"

# Place your OpenAI API key here
OPENAI_API_KEY = "<insert OpenAI API key here>"
openai.api_key = OPENAI_API_KEY


### Import data

For the text classification experiments, use the `financial_phrasebank` dataset from Hugging Face. This dataset comprises sentences from financial news, each labeled with a sentiment: negative, neutral, or positive, as determined by human annotators.

In [3]:
# Dataset to load from Huggingface dataset hub
DATASET_INFO = {
    "path": "financial_phrasebank",
    "name": "sentences_50agree",
    "split": "train",
}
TEXT_FIELD_NAME = "sentence"
LABEL_FIELD_NAME = "label"

dataset = load_dataset(**DATASET_INFO)

# Financial phrasebank labels
# 0 = negative, 1 = neutral, 2 = positive
labels = [                                                                                             
    "negative",                                                                                        
    "neutral",                                                                                         
    "positive",                                                                                        
]

dataset.select(range(5)).to_pandas()



Unnamed: 0,sentence,label
0,"According to Gran , the company has no plans t...",1
1,Technopolis plans to develop in stages an area...,1
2,The international electronic industry company ...,0
3,With the new production plant the company woul...,2
4,According to the company 's updated strategy f...,2


### Create a training and testing split

Prepare the data for your project. Divide the dataset into two subsets: one for training a supervised model and one for testing its performance. After the model is trained, the test subset will be used to evaluate the model's accuracy and to conduct error analysis, exploring where and why the model makes mistakes.

In [4]:
TEST_SIZE = 1000   

# Create a training/testing split with TEST_SIZE test_split
dataset = dataset.train_test_split(test_size=TEST_SIZE)

# Make DataFrames from the dataset
# `train_df` is only used for DataRobot supervised model training                                        
train_df = dataset["train"].to_pandas()                                                                
test_df = dataset["test"].to_pandas()

## Modeling

### Create a project for multi-class supervised learning

This code creates and trains a supervised learning model using DataRobot's AutoML functionality. It first establishes a connection to the DataRobot application. It then initiates a project and starts the AutoML process, which automatically selects the best model and retrains it on the full training dataset. Once the best model is identified and trained, the code generates predictions for the test dataset. These predictions are then converted into a more readable format, facilitating subsequent error analysis. This model will serve as the primary tool for understanding the strengths and weaknesses of supervised learning in text classification.

In [5]:
# Connect to DataRobot
dr.Client(                                                                                             
    token=DATAROBOT_API_KEY,                                                                                     
    endpoint=DATAROBOT_API_ENDPOINT,                                                                             
)

# Create a project and start Autopilot
project = dr.Project.start(                                                                            
    train_df,                                                                                          
    target=LABEL_FIELD_NAME,                                                                                    
    target_type="Multiclass",                                                                          
    project_name=f"{DATASET_INFO['path']}_{datetime.datetime.now().strftime('%Y-%m-%d %H:%M')}"                                             
)                                                                                                      
project.wait_for_autopilot() 

# Get the best-performing model (highest scoring model retrained on 100% of training data)
models = project.get_models()                                                                          
final_model = [model for model in models if model.sample_pct == 100][0]
print(f"Best model: {final_model.model_type}\n")

# Get predictions for the testing dataset
predict_job = final_model.request_predictions(dataframe=test_df)                                       
prediction_df = predict_job.get_result_when_complete()

# Change the label to `int` to make indexing the `labels` list easier
prediction_df["prediction"] = prediction_df["prediction"].astype(int)

# Show predictions
prediction_df = prediction_df.rename(columns={f"class_{i}": label for i, label in enumerate(labels)})
prediction_df

In progress: 3, queued: 0 (waited: 0s)
In progress: 3, queued: 0 (waited: 1s)
In progress: 3, queued: 0 (waited: 1s)
In progress: 3, queued: 0 (waited: 3s)
In progress: 3, queued: 0 (waited: 4s)
In progress: 3, queued: 0 (waited: 6s)
In progress: 3, queued: 0 (waited: 10s)
In progress: 3, queued: 0 (waited: 17s)
In progress: 1, queued: 0 (waited: 30s)
In progress: 1, queued: 0 (waited: 50s)
In progress: 4, queued: 8 (waited: 71s)
In progress: 2, queued: 7 (waited: 91s)
In progress: 4, queued: 4 (waited: 112s)
In progress: 4, queued: 1 (waited: 132s)
In progress: 4, queued: 0 (waited: 153s)
In progress: 3, queued: 0 (waited: 173s)
In progress: 1, queued: 0 (waited: 194s)
In progress: 0, queued: 0 (waited: 214s)
In progress: 0, queued: 0 (waited: 235s)
In progress: 0, queued: 0 (waited: 255s)
In progress: 1, queued: 0 (waited: 276s)
In progress: 0, queued: 0 (waited: 296s)
In progress: 0, queued: 0 (waited: 317s)
In progress: 0, queued: 0 (waited: 337s)
Best model: Stochastic Gradient De

Unnamed: 0,row_id,prediction,negative,neutral,positive
0,0,2,0.402966,0.179682,0.417352
1,1,2,0.061204,0.067632,0.871164
2,2,2,0.202477,0.235605,0.561918
3,3,1,0.038149,0.884505,0.077346
4,4,1,0.038980,0.812669,0.148352
...,...,...,...,...,...
995,995,1,0.034410,0.904119,0.061471
996,996,2,0.157924,0.140881,0.701195
997,997,1,0.044810,0.561261,0.393929
998,998,1,0.009900,0.833872,0.156229


### Supervised learning model results

This code evaluates the accuracy of the supervised learning model and collects details about misclassified examples. It first determines the model's accuracy by comparing predictions against actual labels. Then, it creates a list of details about instances where the model's predictions were incorrect. Each entry includes the actual and predicted labels and the text that was misclassified. For demo purposes, this process stops after collecting details for a maximum of 20 misclassified instances (`MAX_ERRORS`). This information is crucial for understanding and diagnosing the types of errors the model makes.

In [6]:
# Show model accuracy
correct_predictions = prediction_df["prediction"].astype(int) == test_df[LABEL_FIELD_NAME]
print(f"Model accuracy: {correct_predictions.sum()} correct out of {correct_predictions.shape[0]}")

# Limit the number of mistakes to classify since this is just a demo
MAX_ERRORS = 20

# Create list of misclassification details
result_df = pd.concat((prediction_df, test_df), axis=1)
error_details = []
for _, result_row in result_df.iterrows():                                                             
    if result_row[LABEL_FIELD_NAME] != result_row["prediction"]:                                
        error_details.append(                                                                          
            {                                                                                          
                "actual_label": labels[result_row[LABEL_FIELD_NAME]],                                
                "predicted_label": labels[result_row["prediction"]],                                           
                "text": result_row[TEXT_FIELD_NAME],                                                   
            }                                                                                          
        )  
        if len(error_details) == MAX_ERRORS:
            break

Model accuracy: 791 correct out of 1000
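
A single accuracy figure does not show which label pairs the model confuses. The following is a minimal sketch of counting confusions by (actual, predicted) pair. It uses only the standard library and small made-up lists (`actual` and `predicted` are hypothetical stand-ins, not the notebook's data); the same counting could be run over `test_df[LABEL_FIELD_NAME]` and `prediction_df["prediction"]`.

```python
from collections import Counter

# Label order matches the dataset: 0 = negative, 1 = neutral, 2 = positive
labels = ["negative", "neutral", "positive"]

# Hypothetical toy data standing in for the real label and prediction columns
actual = [0, 1, 2, 2, 1, 0, 2, 1]
predicted = [2, 1, 1, 2, 1, 1, 1, 2]

# Count each (actual, predicted) pair where the prediction was wrong
confusions = Counter(
    (labels[a], labels[p]) for a, p in zip(actual, predicted) if a != p
)

# List the most frequent confusions first
for (actual_label, predicted_label), count in confusions.most_common():
    print(f"{actual_label} -> {predicted_label}: {count}")
```

Sorting by frequency makes it easier to decide which kinds of misclassification deserve a closer look.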


## Error analysis of misclassified text

### Define error classes

The first step is to decide on what types of misclassification you are interested in. This piece of code declares a list, `error_class_templates`, containing potential classes of errors that may occur during text sentiment classification. These error classes are expressed in the form of descriptive strings:

1. "The sentiment of this statement is ambiguous.": This error class indicates that the sentiment expressed in the given statement cannot be clearly determined. It could be due to the use of both positive and negative language, sarcasm, or complex language structure.

2. "The sentiment is unclear and requires more context": This error class implies that the given statement lacks enough context to make an accurate sentiment classification. This can occur if the statement is too short, vague, or depends heavily on preceding or following information not included in the text.

3. "The sentiment of this statement is not {}": This error class represents situations where the assigned label itself appears to be wrong. The `{}` is a placeholder that is replaced with the example's actual (human-assigned) label, so the statement asserts that the text does not carry the sentiment it was labeled with.

4. "The sentiment of this statement is {}": This means that the example and label are valid and that the supervised classifier made a mistake. The `{}` placeholder is again replaced with the actual label.

These classes represent common types of errors that the classifier might make, and they serve as a starting point for error analysis. During this analysis, misclassified examples will be categorized according to these templates.

In [7]:
# List of possible classification errors. Replace "{}" with the actual label.
error_class_templates = [
    "The sentiment of this statement is ambiguous.",
    "The sentiment is unclear and requires more context",
    "The sentiment of this statement is not {}",
    "The sentiment of this statement is {}",
]
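
To make the placeholder behavior concrete, here is a small sketch of how the templates expand for one actual label. The helper name `fill_error_classes` is hypothetical and is not part of the notebook; it only wraps the `str.format` call that the later cells apply inline. Templates without a `{}` field are returned unchanged.

```python
error_class_templates = [
    "The sentiment of this statement is ambiguous.",
    "The sentiment is unclear and requires more context",
    "The sentiment of this statement is not {}",
    "The sentiment of this statement is {}",
]


def fill_error_classes(templates, label):
    """Insert `label` into every template that has a "{}" field (hypothetical helper)."""
    return [template.format(label) for template in templates]


# Expand the templates for an example whose actual label is "negative"
error_classes = fill_error_classes(error_class_templates, "negative")
for error_class in error_classes:
    print(error_class)
```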

### Helper function for visualizing results

This function takes a DataFrame containing the results of the error analysis and displays the results using markdown formatting.

In [8]:
def show_error_class_results(result_df):
    for _, result_row in result_df.iterrows():
        display(Markdown(f"## Sentence\n{result_row['sentence']}"))
        display(Markdown(f"**Actual label:** {result_row['actual_label']}"))
        display(Markdown(f"**Predicted label:** {result_row['predicted_label']}"))
        probabilities = result_row[error_class_templates].tolist()
        error_classes = [
            class_template.format(result_row['actual_label'])
            for class_template in error_class_templates
        ]
        display(pd.Series(probabilities, index=error_classes).to_frame(name="probability"))    

## Zero-shot classification

With a trained text classification model and a list of its errors, you can now apply zero-shot learning. Zero-shot learning allows you to apply a model to text categories it has never seen during training, broadening the applicability and versatility of the classifier. The next sections explore three distinct methods for implementing zero-shot text classification. These methods are:

* Embeddings-based classification
* Natural Language Inference-based (NLI) classification
* Direct classification

Each technique offers a unique approach to the problem and is based on different principles, with its own benefits and considerations. The aim is to equip you with a variety of tools and perspectives on how to leverage large language models for text classification in a zero-shot setting. Let's delve into each of these techniques and understand how they enhance text classification capabilities.

### Embeddings-based classification

Embeddings-based classification is a technique for zero-shot text classification that leverages the semantic representations of text encoded as high-dimensional vectors, also known as embeddings. Large language models (LLMs), like GPT-3, GPT-4, or BERT, are excellent at generating these embeddings as they have been trained on a diverse range of internet text, learning the contextual and semantic nuances of the language in the process.

Here's a simplified breakdown of how the process works:

1. **Generate embeddings**: For each text input and each possible label, generate embeddings. This is done by passing the text and labels through the LLM. The LLM then produces a high-dimensional vector for each, encapsulating their semantic information.

2. **Calculate cosine similarity**: Calculate the cosine similarity between the text's embedding and each of the label embeddings. Cosine similarity is the cosine of the angle between two non-zero vectors, computed as their dot product divided by the product of their lengths. The higher the cosine similarity, the smaller the angle and the closer the match between the vectors.

3. **Classify**: The label whose embedding has the highest cosine similarity to the text's embedding is chosen as the predicted label for that text.

The power of this method lies in its ability to capture semantic similarity between text and labels, allowing it to categorize text into labels that it wasn't explicitly trained on. This method is especially useful when you don't have a lot of labeled data for training, a common scenario in real-world applications.
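
The three steps above can be sketched without calling any API. The example below is an illustration only: the three-dimensional vectors are made up (real embeddings have far more dimensions), and `cosine_similarity` is re-implemented here with the standard library rather than imported from OpenAI's utilities.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Step 1: hypothetical embeddings for one text and three labels
text_embedding = [0.9, 0.1, 0.3]
label_embeddings = {
    "negative": [0.1, 0.9, 0.2],
    "neutral": [0.4, 0.4, 0.8],
    "positive": [0.8, 0.2, 0.4],
}

# Step 2: similarity between the text and each label
similarities = {
    label: cosine_similarity(text_embedding, embedding)
    for label, embedding in label_embeddings.items()
}

# Step 3: pick the label with the highest similarity
predicted_label = max(similarities, key=similarities.get)
print(predicted_label)
```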

#### Implement embeddings-based zero-shot classification

The cell below performs zero-shot classification of the misclassified examples using the embeddings-based method. It uses the `get_embedding` and `cosine_similarity` functions imported earlier from OpenAI's `embeddings_utils` module. The `get_embedding` function generates an embedding for each misclassified sentence and for each error class, using OpenAI's `text-embedding-ada-002` model.

Next, the cosine similarity between the embeddings of each sentence and each label is computed. These similarities are then scaled and converted into pseudo-probabilities representing the likelihood of each label being the correct one for each sentence.

Finally, the predictions and pseudo-probabilities are stored in a DataFrame, which is then passed to the `show_error_class_results` function to display the results of the classification.
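
The scaling step matters because cosine similarities between a sentence and several candidate labels can sit very close together, in which case a plain softmax is nearly uniform. The sketch below shows the effect of a scale factor on made-up similarity values. The function name `scaled_softmax` and the numbers are assumptions for illustration; the scale of 32 mirrors the `CS_SCALE` constant used in the cell that follows.

```python
import math


def scaled_softmax(similarities, scale):
    """Turn similarity scores into pseudo-probabilities that sum to 1."""
    exps = [math.exp(scale * s) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical cosine similarities for four error classes
similarities = [0.78, 0.76, 0.77, 0.80]

# Compare an unscaled softmax with a scaled one
for scale in (1, 32):
    probs = scaled_softmax(similarities, scale)
    print(scale, [round(p, 3) for p in probs])
```

A larger scale spreads the pseudo-probabilities further apart without changing their ranking, so the scale is a presentation choice, not a calibrated estimate.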


In [9]:
# This method doesn't estimate probabilities but you can approximate them by using a scaled softmax
CS_SCALE = 32                                                                                      

embedding_results = []
for error_detail in error_details:
    # Get embeddings for the misclassified examples
    sentence_embedding = get_embedding(error_detail["text"], engine="text-embedding-ada-002")

    # Add the label to "{}" fields in error class templates
    error_classes = [
        error_class_template.format(error_detail["actual_label"]) 
        for error_class_template in error_class_templates
    ]
    
    # Get embeddings for each error class
    label_embeddings = [
        get_embedding(error_class, engine="text-embedding-ada-002") 
        for error_class in error_classes
    ]

    # Compute the similarity of the sentence with each label
    similarities = [                                                                                           
        cosine_similarity(sentence_embedding, label_embedding)                                           
        for label_embedding in label_embeddings                                                      
    ]
    
    # Make pseudo-probabilities from similarity scores
    probs = np.exp(CS_SCALE * np.array(similarities))                                                          
    probs = probs / probs.sum()
    
    embedding_results.append(
        dict(
            sentence=error_detail["text"],
            actual_label=error_detail["actual_label"],
            predicted_label=error_detail["predicted_label"],
            **dict(zip(error_class_templates, probs)),
        )
    )

embedding_results_df = pd.DataFrame(embedding_results)
show_error_class_results(embedding_results_df)

## Sentence
The sales of the Tiimari segment fell by 4.0 % year-on-year to EUR3 .3 m in June 2010 .

**Actual label:** negative

**Predicted label:** positive

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.224257
The sentiment is unclear and requires more context,0.182567
The sentiment of this statement is not negative,0.201853
The sentiment of this statement is negative,0.391323


## Sentence
At 1411 CET , ArcelorMittal had lost 7.26 % to EUR 17.38 on Euronext Paris , coming at the lead of the blue-chip fallers .

**Actual label:** negative

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.17861
The sentiment is unclear and requires more context,0.264658
The sentiment of this statement is not negative,0.199378
The sentiment of this statement is negative,0.357354


## Sentence
The liquidity providing was interrupted on May 11 , 2007 when Aspocomp Group Oyj 's shares traded below 0.50 cent ( Aspocomp 's stock exchange release 11.5.2007 ) .

**Actual label:** negative

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.284236
The sentiment is unclear and requires more context,0.326175
The sentiment of this statement is not negative,0.128053
The sentiment of this statement is negative,0.261536


## Sentence
At 10.58 am , Outokumpu declined 2.74 pct to 24.87 eur , while the OMX Helsinki 25 was 0.55 pct higher at 2,825.14 and the OMX Helsinki added 0.64 pct to 9,386.89 .

**Actual label:** neutral

**Predicted label:** negative

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.213051
The sentiment is unclear and requires more context,0.286119
The sentiment of this statement is not neutral,0.232696
The sentiment of this statement is neutral,0.268134


## Sentence
A new production line is being completed for the contract production of hormone treatments .

**Actual label:** positive

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.15748
The sentiment is unclear and requires more context,0.167685
The sentiment of this statement is not positive,0.275202
The sentiment of this statement is positive,0.399633


## Sentence
The company intends to raise production capacity in 2006 .

**Actual label:** positive

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.200621
The sentiment is unclear and requires more context,0.206832
The sentiment of this statement is not positive,0.19004
The sentiment of this statement is positive,0.402506


## Sentence
A & euro ; 4.8 million investment in 13.6 % of Lewa netted Deutsche Beteiligungs & euro ; 21 million .

**Actual label:** positive

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.197793
The sentiment is unclear and requires more context,0.249004
The sentiment of this statement is not positive,0.231761
The sentiment of this statement is positive,0.321442


## Sentence
`` Several growth initiatives in the chosen geographic areas are already ongoing , '' it continued , noting Lindex opened its first store in the Czech Republic this autumn in Brno .

**Actual label:** positive

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.190319
The sentiment is unclear and requires more context,0.213648
The sentiment of this statement is not positive,0.21676
The sentiment of this statement is positive,0.379273


## Sentence
We offer our clients integrated management consulting , total solutions for complex projects and efficient , best-in-class design and supervision .

**Actual label:** neutral

**Predicted label:** positive

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.292363
The sentiment is unclear and requires more context,0.290439
The sentiment of this statement is not neutral,0.19476
The sentiment of this statement is neutral,0.222438


## Sentence
Finnish automation solutions developer Cencorp Corporation ( OMX Helsinki : CNC1V ) said on Friday ( 27 June ) that it has completed employee negotiations regarding a reorganisation of its operations .

**Actual label:** positive

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.206098
The sentiment is unclear and requires more context,0.223982
The sentiment of this statement is not positive,0.242336
The sentiment of this statement is positive,0.327584


## Sentence
Employing 112 in Finland and 280 abroad , the unit recorded first-quarter 2007 sales of 8.6 mln eur , with an operating loss of 1.6 mln eur .

**Actual label:** negative

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.235814
The sentiment is unclear and requires more context,0.20749
The sentiment of this statement is not negative,0.225702
The sentiment of this statement is negative,0.330994


## Sentence
Kesko pursues a strategy of healthy , focused growth concentrating on sales and services to consumer-customers .

**Actual label:** positive

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.190971
The sentiment is unclear and requires more context,0.188719
The sentiment of this statement is not positive,0.19927
The sentiment of this statement is positive,0.421041


## Sentence
The new technology improves the glass quality and consistency while increasing throughput .

**Actual label:** positive

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.168874
The sentiment is unclear and requires more context,0.172209
The sentiment of this statement is not positive,0.16854
The sentiment of this statement is positive,0.490377


## Sentence
After the reporting period , BioTie North American licensing partner Somaxon Pharmaceuticals announced positive results with nalmefene in a pilot Phase 2 clinical trial for smoking cessation .

**Actual label:** positive

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.206888
The sentiment is unclear and requires more context,0.193635
The sentiment of this statement is not positive,0.23786
The sentiment of this statement is positive,0.361617


## Sentence
- The Group -¦ s result before taxes was a loss of EUR 0.6 ( +0.6 ) million .

**Actual label:** neutral

**Predicted label:** negative

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.181527
The sentiment is unclear and requires more context,0.266386
The sentiment of this statement is not neutral,0.232033
The sentiment of this statement is neutral,0.320054


## Sentence
( ADPnews ) - May 4 , 2010 - Finnish cutlery and hand tools maker Fiskars Oyj Abp ( HEL : FISAS ) said today its net profit declined to EUR 12.9 million ( USD 17m ) in the first quarter of 2010 from EUR 17 million in the correspond

**Actual label:** negative

**Predicted label:** positive

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.193569
The sentiment is unclear and requires more context,0.253255
The sentiment of this statement is not negative,0.22824
The sentiment of this statement is negative,0.324935


## Sentence
The brokerage said 2006 has seen a ` true turning point ' in European steel base prices , with better pricing seen carrying through the second quarter of 2006 .

**Actual label:** positive

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.172948
The sentiment is unclear and requires more context,0.177072
The sentiment of this statement is not positive,0.225208
The sentiment of this statement is positive,0.424773


## Sentence
Other carriers and handset makers spin it as a positive event that will raise interest for higher-end phones and pricier data plans .

**Actual label:** positive

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.168415
The sentiment is unclear and requires more context,0.145042
The sentiment of this statement is not positive,0.221896
The sentiment of this statement is positive,0.464647


## Sentence
One of the challenges in the oil production in the North Sea is scale formation that can plug pipelines and halt production .

**Actual label:** negative

**Predicted label:** neutral

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.244714
The sentiment is unclear and requires more context,0.284396
The sentiment of this statement is not negative,0.144763
The sentiment of this statement is negative,0.326127


## Sentence
F-Secure reported that : - The first half of 2008 has seen a growing number of targeted malware attacks on individuals , companies , and organizations .

**Actual label:** neutral

**Predicted label:** negative

Unnamed: 0,probability
The sentiment of this statement is ambiguous.,0.33205
The sentiment is unclear and requires more context,0.271678
The sentiment of this statement is not neutral,0.211004
The sentiment of this statement is neutral,0.185268


### Use a Natural Language Inference (NLI) model

Zero-shot classification using a Natural Language Inference (NLI) model is a method where the model is not trained directly to predict the desired classes. Instead, the task is framed as an inference problem. For example, if you want to classify a sentence into one of several categories, you would pair the sentence with a category in the form of a hypothesis (e.g., "The sentiment of this sentence is positive"). The NLI model then predicts whether the sentence entails this hypothesis, that is, whether the hypothesis follows from the text. You repeat this for each category, essentially asking the model to infer the most likely category. This is a powerful technique that allows you to classify text into categories that the model was never explicitly trained on, hence the term "zero-shot".
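
One way to turn per-hypothesis entailment scores into a distribution over categories is a softmax across the entailment logits. The sketch below illustrates that idea with made-up logits instead of a real NLI model, so the hypotheses and numbers are assumptions chosen only to show the mechanics.

```python
import math


def softmax(values):
    """Normalize raw scores so they are positive and sum to 1."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical entailment logits, one per hypothesis, for a single sentence
entailment_logits = {
    "The sentiment of this sentence is negative": 2.1,
    "The sentiment of this sentence is neutral": 0.3,
    "The sentiment of this sentence is positive": -1.4,
}

# Normalize across hypotheses and pick the best-supported one
hypotheses = list(entailment_logits)
scores = softmax([entailment_logits[h] for h in hypotheses])
best_hypothesis = hypotheses[scores.index(max(scores))]

for hypothesis, score in zip(hypotheses, scores):
    print(f"{score:.3f}  {hypothesis}")
```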

#### Implement Natural Language Inference (NLI) zero-shot classification

This code performs error analysis on misclassified instances using a Natural Language Inference (NLI) model. It creates an NLI-based classifier using the HuggingFace transformers library. For each misclassified example, it generates custom error classes using the actual label and error class templates. The NLI model is then used to determine the probabilities of each error class given the misclassified text. Finally, the code displays the sentence, actual label, predicted label, and a table of probabilities for each error class. This analysis helps in understanding why a particular misclassification occurred.
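
The zero-shot pipeline returns its candidate labels sorted from highest to lowest score, so the scores have to be mapped back to the original error class order before they can be stored under fixed column names. The sketch below shows one way to do that mapping. The `predictions` dictionary is a hand-written stand-in for a pipeline result, not real model output.

```python
error_classes = [
    "The sentiment of this statement is ambiguous.",
    "The sentiment is unclear and requires more context",
    "The sentiment of this statement is not negative",
    "The sentiment of this statement is negative",
]

# Hypothetical pipeline result: labels sorted by descending score
predictions = {
    "sequence": "Sales fell by 4.0 % year-on-year .",
    "labels": [
        error_classes[3],
        error_classes[1],
        error_classes[0],
        error_classes[2],
    ],
    "scores": [0.49, 0.29, 0.14, 0.08],
}

# Look up each score by its label, then read them out in the original order
score_by_class = dict(zip(predictions["labels"], predictions["scores"]))
ordered_scores = [score_by_class[error_class] for error_class in error_classes]
print(ordered_scores)
```

A dictionary lookup is used here for readability; the cell below reaches the same ordering with a pandas `Series` indexed by each label's position.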

In [15]:
# Create the classifier using HuggingFace's zero-shot-classification pipeline
# NOTE: Change `device=0` to `device=-1` if you do not have a GPU
nli_classifier = pipeline(                                                                           
    "zero-shot-classification", model="facebook/bart-large-mnli", device=0                            
)

nli_results = []
for error_detail in error_details:
    # Add actual label to "{}" fields in error class templates
    error_classes = [
        error_class_template.format(error_detail["actual_label"]) 
        for error_class_template in error_class_templates
    ]
    
    # Reset the call counter to suppress the pipeline's warning about sequential GPU calls
    nli_classifier.call_count = 0
    
    # Get error class probabilities
    predictions = nli_classifier(error_detail["text"], error_classes)
    
    # Get probabilities and change the order to match error class order
    probabilities = pd.Series(
        predictions["scores"],
        index=[error_classes.index(error_class) for error_class in predictions["labels"]],
    ).sort_index()

    nli_results.append(
        dict(
            sentence=error_detail["text"],
            actual_label=error_detail["actual_label"],
            predicted_label=error_detail["predicted_label"],
            **dict(zip(error_class_templates, probabilities)),
        )
    )

nli_results_df = pd.DataFrame(nli_results)
show_error_class_results(nli_results_df)

## Sentence
The sales of the Tiimari segment fell by 4.0 % year-on-year to EUR3 .3 m in June 2010 .

**Actual label:** negative

**Predicted label:** positive

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.142118 |
| The sentiment is unclear and requires more context | 0.287853 |
| The sentiment of this statement is not negative | 0.083798 |
| The sentiment of this statement is negative | 0.486231 |


## Sentence
At 1411 CET , ArcelorMittal had lost 7.26 % to EUR 17.38 on Euronext Paris , coming at the lead of the blue-chip fallers .

**Actual label:** negative

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.158349 |
| The sentiment is unclear and requires more context | 0.274411 |
| The sentiment of this statement is not negative | 0.07555 |
| The sentiment of this statement is negative | 0.491691 |


## Sentence
The liquidity providing was interrupted on May 11 , 2007 when Aspocomp Group Oyj 's shares traded below 0.50 cent ( Aspocomp 's stock exchange release 11.5.2007 ) .

**Actual label:** negative

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.205035 |
| The sentiment is unclear and requires more context | 0.426099 |
| The sentiment of this statement is not negative | 0.101476 |
| The sentiment of this statement is negative | 0.267389 |


## Sentence
At 10.58 am , Outokumpu declined 2.74 pct to 24.87 eur , while the OMX Helsinki 25 was 0.55 pct higher at 2,825.14 and the OMX Helsinki added 0.64 pct to 9,386.89 .

**Actual label:** neutral

**Predicted label:** negative

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.202095 |
| The sentiment is unclear and requires more context | 0.299777 |
| The sentiment of this statement is not neutral | 0.404149 |
| The sentiment of this statement is neutral | 0.093979 |


## Sentence
A new production line is being completed for the contract production of hormone treatments .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.165696 |
| The sentiment is unclear and requires more context | 0.294062 |
| The sentiment of this statement is not positive | 0.257409 |
| The sentiment of this statement is positive | 0.282833 |


## Sentence
The company intends to raise production capacity in 2006 .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.173354 |
| The sentiment is unclear and requires more context | 0.20578 |
| The sentiment of this statement is not positive | 0.214386 |
| The sentiment of this statement is positive | 0.40648 |


## Sentence
A & euro ; 4.8 million investment in 13.6 % of Lewa netted Deutsche Beteiligungs & euro ; 21 million .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.210959 |
| The sentiment is unclear and requires more context | 0.275693 |
| The sentiment of this statement is not positive | 0.299492 |
| The sentiment of this statement is positive | 0.213856 |


## Sentence
`` Several growth initiatives in the chosen geographic areas are already ongoing , '' it continued , noting Lindex opened its first store in the Czech Republic this autumn in Brno .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.276161 |
| The sentiment is unclear and requires more context | 0.324903 |
| The sentiment of this statement is not positive | 0.081254 |
| The sentiment of this statement is positive | 0.317682 |


## Sentence
We offer our clients integrated management consulting , total solutions for complex projects and efficient , best-in-class design and supervision .

**Actual label:** neutral

**Predicted label:** positive

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.102444 |
| The sentiment is unclear and requires more context | 0.269746 |
| The sentiment of this statement is not neutral | 0.551713 |
| The sentiment of this statement is neutral | 0.076097 |


## Sentence
Finnish automation solutions developer Cencorp Corporation ( OMX Helsinki : CNC1V ) said on Friday ( 27 June ) that it has completed employee negotiations regarding a reorganisation of its operations .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.260757 |
| The sentiment is unclear and requires more context | 0.415549 |
| The sentiment of this statement is not positive | 0.183352 |
| The sentiment of this statement is positive | 0.140342 |


## Sentence
Employing 112 in Finland and 280 abroad , the unit recorded first-quarter 2007 sales of 8.6 mln eur , with an operating loss of 1.6 mln eur .

**Actual label:** negative

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.185393 |
| The sentiment is unclear and requires more context | 0.414636 |
| The sentiment of this statement is not negative | 0.189526 |
| The sentiment of this statement is negative | 0.210445 |


## Sentence
Kesko pursues a strategy of healthy , focused growth concentrating on sales and services to consumer-customers .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.074584 |
| The sentiment is unclear and requires more context | 0.154676 |
| The sentiment of this statement is not positive | 0.126356 |
| The sentiment of this statement is positive | 0.644385 |


## Sentence
The new technology improves the glass quality and consistency while increasing throughput .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.077865 |
| The sentiment is unclear and requires more context | 0.211917 |
| The sentiment of this statement is not positive | 0.131065 |
| The sentiment of this statement is positive | 0.579153 |


## Sentence
After the reporting period , BioTie North American licensing partner Somaxon Pharmaceuticals announced positive results with nalmefene in a pilot Phase 2 clinical trial for smoking cessation .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.05961 |
| The sentiment is unclear and requires more context | 0.12108 |
| The sentiment of this statement is not positive | 0.021794 |
| The sentiment of this statement is positive | 0.797517 |


## Sentence
- The Group -¦ s result before taxes was a loss of EUR 0.6 ( +0.6 ) million .

**Actual label:** neutral

**Predicted label:** negative

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.186486 |
| The sentiment is unclear and requires more context | 0.229859 |
| The sentiment of this statement is not neutral | 0.458099 |
| The sentiment of this statement is neutral | 0.125555 |


## Sentence
( ADPnews ) - May 4 , 2010 - Finnish cutlery and hand tools maker Fiskars Oyj Abp ( HEL : FISAS ) said today its net profit declined to EUR 12.9 million ( USD 17m ) in the first quarter of 2010 from EUR 17 million in the correspond

**Actual label:** negative

**Predicted label:** positive

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.236957 |
| The sentiment is unclear and requires more context | 0.408623 |
| The sentiment of this statement is not negative | 0.099079 |
| The sentiment of this statement is negative | 0.255341 |


## Sentence
The brokerage said 2006 has seen a ` true turning point ' in European steel base prices , with better pricing seen carrying through the second quarter of 2006 .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.057389 |
| The sentiment is unclear and requires more context | 0.078691 |
| The sentiment of this statement is not positive | 0.041785 |
| The sentiment of this statement is positive | 0.822135 |


## Sentence
Other carriers and handset makers spin it as a positive event that will raise interest for higher-end phones and pricier data plans .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.032126 |
| The sentiment is unclear and requires more context | 0.025707 |
| The sentiment of this statement is not positive | 0.010454 |
| The sentiment of this statement is positive | 0.931713 |


## Sentence
One of the challenges in the oil production in the North Sea is scale formation that can plug pipelines and halt production .

**Actual label:** negative

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.134883 |
| The sentiment is unclear and requires more context | 0.442526 |
| The sentiment of this statement is not negative | 0.178965 |
| The sentiment of this statement is negative | 0.243626 |


## Sentence
F-Secure reported that : - The first half of 2008 has seen a growing number of targeted malware attacks on individuals , companies , and organizations .

**Actual label:** neutral

**Predicted label:** negative

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.113098 |
| The sentiment is unclear and requires more context | 0.196593 |
| The sentiment of this statement is not neutral | 0.643474 |
| The sentiment of this statement is neutral | 0.046835 |


### Conversational model prompting

Zero-shot classification using a conversational model is a method in which you frame the classification task as a question-and-answer conversation. Rather than training the model on labeled examples, you present the input text together with a question, and the model generates a response.

For example, if you want to classify a piece of text as either 'positive', 'negative', or 'neutral', you can ask the model a question like "What is the sentiment of this text?" The model will then generate a response based on its understanding of the text and the question, predicting one of the classes 'positive', 'negative', or 'neutral'. 

This approach leverages the ability of conversational models to understand and generate human-like text. As these models have been trained on diverse and vast amounts of data, they can often generate surprisingly accurate predictions for classes they were not explicitly trained on, hence the term "zero-shot".

#### Implement conversational model prompting

This code analyzes misclassification errors using a conversational AI model. For each error, it provides the AI model with a set of instructions to assess why the error might have occurred based on four possible reasons. The AI model is then asked to assign a probability to each reason for the given misclassified statement. The model's response is processed and displayed as a DataFrame, providing insights into the potential causes of misclassification. By incorporating human-like understanding and context awareness, this method provides an additional layer of depth to your error analysis.
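A minimal sketch of the response-parsing step, assuming the model replies with a JSON object keyed by reason number. The helper name `parse_reason_probabilities` and the example reply are hypothetical and are not part of this notebook's code.

```python
import json


def parse_reason_probabilities(raw_response, reason_ids=("1", "2", "3", "4")):
    """Parse a JSON reply into one probability per reason, in `reason_ids` order."""
    parsed = json.loads(raw_response.strip())
    # JSON object keys are strings; values are coerced to float
    by_reason = {str(key): float(value) for key, value in parsed.items()}
    # Reasons the model omitted default to 0.0
    return [by_reason.get(reason_id, 0.0) for reason_id in reason_ids]


# Hypothetical model reply that omits reason 4
example_reply = '{"1": 0.6, "2": 0.3, "3": 0.1}'
probabilities = parse_reason_probabilities(example_reply)
```

Looking up each reason by its key keeps every probability aligned with its reason even when the model leaves one out, which a purely positional pairing would not guarantee.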

In [20]:
# Instructions for the conversational model
# Note that the instructions ask for JSON formatted output so you can have Python process the results
system_prompt_template = (
    "ChatGPT, I would like you to analyze a series of misclassified statements from a financial "
    "news sentiment classification model and estimate the probabilities of reasons for the misclassification. "
    "Please review each statement, and based on your understanding, determine whether the misclassification "
    "likely occurred because:\n"
    "    1. There was not enough context to accurately determine the sentiment.\n"
    "    2. The statement itself is ambiguous, making it hard to assign a clear sentiment.\n"
    "    3. The sentiment label ({label}) provided in the dataset is incorrect.\n"
    "    4. The sentiment label ({label}) provided in the dataset is correct and the model is wrong.\n"
    "Remember, your task is to analyze the statement and give your best guess for the probability "
    "of each misclassification reason from the above four numbered options.\n"
    "Please respond in JSON format with only the reason numbers and probability estimates. "
    "Use the numbers (1, 2, 3 and 4) for the reasons.\n"
    "IMPORTANT: Don't include any explanations or anything that cannot be parsed as JSON."
)                                                                                                      

chat_results = []
for error_detail in error_details:
    # Create message with the instructions and the text you want to classify
    messages = [
        {
            "role": "system",
            "content": system_prompt_template.format(label=error_detail["actual_label"]),
        },
        {
            "role": "user",
            "content": (
                f'Statement: "{error_detail["text"]}"\n'
                f'Correct Sentiment: {error_detail["actual_label"]}\n'
                f'Predicted Sentiment: {error_detail["predicted_label"]}'
            ),
        },
    ]
    
    # Send your request to a conversational model (GPT-3.5)
    # NOTE: `openai.ChatCompletion` is the interface of openai package versions before 1.0
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=messages, temperature=0
    )
    
    # Parse the JSON response from the model
    result = completion.choices[0].message.content.strip()
    result_json = json.loads(result)

    # Get probabilities sorted by reason number (1-4)
    probabilities = pd.Series(result_json.values(), index=result_json.keys()).sort_index()

    chat_results.append(
        dict(
            sentence=error_detail["text"],
            actual_label=error_detail["actual_label"],
            predicted_label=error_detail["predicted_label"],
            **dict(zip(error_class_templates, probabilities)),
        )
    )

# Convert results to DataFrame and set missing probabilities to zero
chat_results_df = pd.DataFrame(chat_results).fillna(0)

show_error_class_results(chat_results_df)

## Sentence
The sales of the Tiimari segment fell by 4.0 % year-on-year to EUR3 .3 m in June 2010 .

**Actual label:** negative

**Predicted label:** positive

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.6 |
| The sentiment is unclear and requires more context | 0.2 |
| The sentiment of this statement is not negative | 0.2 |
| The sentiment of this statement is negative | 0.0 |


## Sentence
At 1411 CET , ArcelorMittal had lost 7.26 % to EUR 17.38 on Euronext Paris , coming at the lead of the blue-chip fallers .

**Actual label:** negative

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.6 |
| The sentiment is unclear and requires more context | 0.3 |
| The sentiment of this statement is not negative | 0.1 |
| The sentiment of this statement is negative | 0.0 |


## Sentence
The liquidity providing was interrupted on May 11 , 2007 when Aspocomp Group Oyj 's shares traded below 0.50 cent ( Aspocomp 's stock exchange release 11.5.2007 ) .

**Actual label:** negative

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.8 |
| The sentiment is unclear and requires more context | 0.2 |
| The sentiment of this statement is not negative | 0.0 |
| The sentiment of this statement is negative | 0.0 |


## Sentence
At 10.58 am , Outokumpu declined 2.74 pct to 24.87 eur , while the OMX Helsinki 25 was 0.55 pct higher at 2,825.14 and the OMX Helsinki added 0.64 pct to 9,386.89 .

**Actual label:** neutral

**Predicted label:** negative

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.8 |
| The sentiment is unclear and requires more context | 0.2 |
| The sentiment of this statement is not neutral | 0.0 |
| The sentiment of this statement is neutral | 0.0 |


## Sentence
A new production line is being completed for the contract production of hormone treatments .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.6 |
| The sentiment is unclear and requires more context | 0.3 |
| The sentiment of this statement is not positive | 0.1 |
| The sentiment of this statement is positive | 0.0 |


## Sentence
The company intends to raise production capacity in 2006 .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.6 |
| The sentiment is unclear and requires more context | 0.3 |
| The sentiment of this statement is not positive | 0.1 |
| The sentiment of this statement is positive | 0.0 |


## Sentence
A & euro ; 4.8 million investment in 13.6 % of Lewa netted Deutsche Beteiligungs & euro ; 21 million .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.8 |
| The sentiment is unclear and requires more context | 0.2 |
| The sentiment of this statement is not positive | 0.0 |
| The sentiment of this statement is positive | 0.0 |


## Sentence
`` Several growth initiatives in the chosen geographic areas are already ongoing , '' it continued , noting Lindex opened its first store in the Czech Republic this autumn in Brno .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.7 |
| The sentiment is unclear and requires more context | 0.3 |
| The sentiment of this statement is not positive | 0.0 |
| The sentiment of this statement is positive | 0.0 |


## Sentence
We offer our clients integrated management consulting , total solutions for complex projects and efficient , best-in-class design and supervision .

**Actual label:** neutral

**Predicted label:** positive

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.2 |
| The sentiment is unclear and requires more context | 0.3 |
| The sentiment of this statement is not neutral | 0.1 |
| The sentiment of this statement is neutral | 0.4 |


## Sentence
Finnish automation solutions developer Cencorp Corporation ( OMX Helsinki : CNC1V ) said on Friday ( 27 June ) that it has completed employee negotiations regarding a reorganisation of its operations .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.8 |
| The sentiment is unclear and requires more context | 0.1 |
| The sentiment of this statement is not positive | 0.05 |
| The sentiment of this statement is positive | 0.05 |


## Sentence
Employing 112 in Finland and 280 abroad , the unit recorded first-quarter 2007 sales of 8.6 mln eur , with an operating loss of 1.6 mln eur .

**Actual label:** negative

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.8 |
| The sentiment is unclear and requires more context | 0.2 |
| The sentiment of this statement is not negative | 0.0 |
| The sentiment of this statement is negative | 0.0 |


## Sentence
Kesko pursues a strategy of healthy , focused growth concentrating on sales and services to consumer-customers .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.2 |
| The sentiment is unclear and requires more context | 0.7 |
| The sentiment of this statement is not positive | 0.1 |
| The sentiment of this statement is positive | 0.0 |


## Sentence
The new technology improves the glass quality and consistency while increasing throughput .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.2 |
| The sentiment is unclear and requires more context | 0.3 |
| The sentiment of this statement is not positive | 0.1 |
| The sentiment of this statement is positive | 0.4 |


## Sentence
After the reporting period , BioTie North American licensing partner Somaxon Pharmaceuticals announced positive results with nalmefene in a pilot Phase 2 clinical trial for smoking cessation .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.6 |
| The sentiment is unclear and requires more context | 0.2 |
| The sentiment of this statement is not positive | 0.0 |
| The sentiment of this statement is positive | 0.2 |


## Sentence
- The Group -¦ s result before taxes was a loss of EUR 0.6 ( +0.6 ) million .

**Actual label:** neutral

**Predicted label:** negative

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.7 |
| The sentiment is unclear and requires more context | 0.2 |
| The sentiment of this statement is not neutral | 0.05 |
| The sentiment of this statement is neutral | 0.05 |


## Sentence
( ADPnews ) - May 4 , 2010 - Finnish cutlery and hand tools maker Fiskars Oyj Abp ( HEL : FISAS ) said today its net profit declined to EUR 12.9 million ( USD 17m ) in the first quarter of 2010 from EUR 17 million in the correspond

**Actual label:** negative

**Predicted label:** positive

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.6 |
| The sentiment is unclear and requires more context | 0.2 |
| The sentiment of this statement is not negative | 0.1 |
| The sentiment of this statement is negative | 0.1 |


## Sentence
The brokerage said 2006 has seen a ` true turning point ' in European steel base prices , with better pricing seen carrying through the second quarter of 2006 .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.4 |
| The sentiment is unclear and requires more context | 0.3 |
| The sentiment of this statement is not positive | 0.1 |
| The sentiment of this statement is positive | 0.2 |


## Sentence
Other carriers and handset makers spin it as a positive event that will raise interest for higher-end phones and pricier data plans .

**Actual label:** positive

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.2 |
| The sentiment is unclear and requires more context | 0.3 |
| The sentiment of this statement is not positive | 0.1 |
| The sentiment of this statement is positive | 0.4 |


## Sentence
One of the challenges in the oil production in the North Sea is scale formation that can plug pipelines and halt production .

**Actual label:** negative

**Predicted label:** neutral

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.6 |
| The sentiment is unclear and requires more context | 0.3 |
| The sentiment of this statement is not negative | 0.05 |
| The sentiment of this statement is negative | 0.05 |


## Sentence
F-Secure reported that : - The first half of 2008 has seen a growing number of targeted malware attacks on individuals , companies , and organizations .

**Actual label:** neutral

**Predicted label:** negative

| Error class | Probability |
|---|---|
| The sentiment of this statement is ambiguous. | 0.6 |
| The sentiment is unclear and requires more context | 0.2 |
| The sentiment of this statement is not neutral | 0.1 |
| The sentiment of this statement is neutral | 0.1 |


## Compare results from error analysis methods

### Create a table of results for the three methods

The following cell combines the results of the three zero-shot methods (embedding-based, NLI-based, and chat-based) used to analyze the misclassified examples from the supervised model. For each method, it finds the most probable error class for each sentence and assigns it a short label. It then builds a comparison table that shows each sentence side by side with the error class chosen by each method, which makes it easy to compare the insights gained from the different techniques.
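The label-assignment step can be sketched in plain Python as follows. The helper name `most_probable_error_class` is hypothetical; the probabilities are the NLI results for the first misclassified sentence shown earlier.

```python
short_error_classes = ["Ambiguous", "No Context", "Mislabeled", "Model Misclassification"]


def most_probable_error_class(probabilities, labels=short_error_classes):
    """Return the short label at the position of the largest probability."""
    # On ties, max() keeps the first maximum it encounters
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best_index]


# NLI probabilities for the first misclassified sentence, in error class order
first_nli_row = [0.142118, 0.287853, 0.083798, 0.486231]
label = most_probable_error_class(first_nli_row)
```

Note that taking only the top class discards how close the runner-up was. In this row the winning class has a probability of about 0.49, so the short label should be read as the most likely explanation, not a certain one.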

In [22]:
# Short names to use as column headers
short_error_classes = ["Ambiguous", "No Context", "Mislabeled", "Model Misclassification"]

# For each method, get the highest probability and assign a short label
embedding_results_short = [
    short_error_classes[error_class]
    for error_class in np.argmax(embedding_results_df[error_class_templates].values, axis=1)
]
nli_results_short = [
    short_error_classes[error_class]
    for error_class in np.argmax(nli_results_df[error_class_templates].values, axis=1)
]
chat_results_short = [
    short_error_classes[error_class]
    for error_class in np.argmax(chat_results_df[error_class_templates].values, axis=1)
]

# Show summary of results in a table
html = "<table><tr><th>Sentence</th><th>Embedding</th><th>NLI</th><th>Chat</th></tr>"
for sentence, embedding_error_class, nli_error_class, chat_error_class in zip(
    embedding_results_df["sentence"],
    embedding_results_short,
    nli_results_short,
    chat_results_short,
):
    html += (
        f"<tr><td>{sentence}</td><td>{embedding_error_class}</td><td>{nli_error_class}</td>"
        f"<td>{chat_error_class}</td></tr>"
    )
display(HTML(html+"</table>"))

| Sentence | Embedding | NLI | Chat |
|---|---|---|---|
| The sales of the Tiimari segment fell by 4.0 % year-on-year to EUR3 .3 m in June 2010 . | Model Misclassification | Model Misclassification | Ambiguous |
| At 1411 CET , ArcelorMittal had lost 7.26 % to EUR 17.38 on Euronext Paris , coming at the lead of the blue-chip fallers . | Model Misclassification | Model Misclassification | Ambiguous |
| The liquidity providing was interrupted on May 11 , 2007 when Aspocomp Group Oyj 's shares traded below 0.50 cent ( Aspocomp 's stock exchange release 11.5.2007 ) . | No Context | No Context | Ambiguous |
| At 10.58 am , Outokumpu declined 2.74 pct to 24.87 eur , while the OMX Helsinki 25 was 0.55 pct higher at 2,825.14 and the OMX Helsinki added 0.64 pct to 9,386.89 . | No Context | Mislabeled | Ambiguous |
| A new production line is being completed for the contract production of hormone treatments . | Model Misclassification | No Context | Ambiguous |
| The company intends to raise production capacity in 2006 . | Model Misclassification | Model Misclassification | Ambiguous |
| A & euro ; 4.8 million investment in 13.6 % of Lewa netted Deutsche Beteiligungs & euro ; 21 million . | Model Misclassification | Mislabeled | Ambiguous |
| `` Several growth initiatives in the chosen geographic areas are already ongoing , '' it continued , noting Lindex opened its first store in the Czech Republic this autumn in Brno . | Model Misclassification | No Context | Ambiguous |
| We offer our clients integrated management consulting , total solutions for complex projects and efficient , best-in-class design and supervision . | Ambiguous | Mislabeled | Model Misclassification |
| Finnish automation solutions developer Cencorp Corporation ( OMX Helsinki : CNC1V ) said on Friday ( 27 June ) that it has completed employee negotiations regarding a reorganisation of its operations . | Model Misclassification | No Context | Ambiguous |


### Compare zero-shot techniques

The table compares the outcomes of three zero-shot classification methods (embedding-based, Natural Language Inference (NLI), and conversational model prompting (Chat)) applied to the same set of sentences. The goal is to identify the primary cause of each misclassification made by the previously trained supervised model. The potential error classes are "Ambiguous", "No Context", "Mislabeled", and "Model Misclassification".

One key observation from the results is the variability in error classification among the three methods. The methods often analyze the same sentence differently and attribute the error to different sources. In the table above, for example, the chat method labels nine of the ten sentences "Ambiguous", while the embedding method labels seven of them "Model Misclassification". This indicates that the choice of method can significantly influence the outcome and interpretation of an error analysis.

However, where methods agree on the error classification for a sentence, that agreement provides stronger evidence for a particular type of misclassification than any single method does. In the table above, the embedding and NLI methods agree on four of the ten sentences. Sentences on which all three methods agree are the strongest candidates for areas where the original supervised model struggled.

Conversely, sentences that elicit widely varied error classifications across the three methods might indicate particularly challenging instances for automatic text classification. These cases could provide valuable insights into potential areas for improvement in model training or data annotation.
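To put a number on agreement, the sketch below counts how often each pair of methods assigns the same short label, using the labels copied from the comparison table above. The helper `count_agreements` and the variable names are hypothetical additions for illustration.

```python
from itertools import combinations

# Short labels copied from the comparison table above (ten sentences)
MM, NC, ML, AM = "Model Misclassification", "No Context", "Mislabeled", "Ambiguous"
labels_by_method = {
    "Embedding": [MM, MM, NC, NC, MM, MM, MM, MM, AM, MM],
    "NLI": [MM, MM, NC, ML, NC, MM, ML, NC, ML, NC],
    "Chat": [AM, AM, AM, AM, AM, AM, AM, AM, MM, AM],
}


def count_agreements(first, second):
    """Count positions where two label lists assign the same error class."""
    return sum(a == b for a, b in zip(first, second))


# Agreement count for every pair of methods
pairwise_agreement = {
    (method_a, method_b): count_agreements(
        labels_by_method[method_a], labels_by_method[method_b]
    )
    for method_a, method_b in combinations(labels_by_method, 2)
}

# Number of sentences on which all three methods assign the same class
unanimous = sum(len(set(row)) == 1 for row in zip(*labels_by_method.values()))
```

For the ten sentences in the table, the embedding and NLI methods agree on four, while the chat method agrees with neither of them on any sentence, so none of these ten is unanimous.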

## Conclusion

This notebook demonstrates the use of zero-shot text classification for error analysis of a supervised learning model. After training the model and identifying its errors, you applied three zero-shot techniques (embedding-based, NLI-based, and conversational model prompting) to analyze these misclassifications.

Through this approach, you gained insight into the likely reasons behind the model's mistakes, which helps you understand its limitations and identify areas for improvement. This shows how zero-shot classification can enrich the error analysis process and, in turn, support building more robust and accurate models.