# Loading the dataset
The code for this notebook was taken from this [GitHub repo](https://github.com/DiveshRKubal/LargeLanguageModels/blob/main/Zero_Shot_Text_Classification/Zero_Shot_Text_Classification.ipynb); all credit goes to DiveshRKubal. In this notebook I break down each code block and explain what is happening, in order to learn the concept of zero-shot learning.

To start, we load the Rotten Tomatoes dataset from the Hugging Face Datasets library. It consists of movie reviews that are each labeled as either positive or negative.

The `DatasetDict` looks like a Python dictionary, but its values are special `Dataset` objects. This specific dataset consists of 3 splits:

- Train
- Validation
- Test


In [1]:
# Importing the load_dataset function from the datasets library
from datasets import load_dataset

# Loading the "rotten_tomatoes" dataset and storing it in the variable 'data'
data = load_dataset("rotten_tomatoes")

# Printing the content of the loaded dataset to the console
print(data)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
})


Each split has 2 features:

- `text`: the movie review
- `label`: a numeric label (0 = negative, 1 = positive)

In [2]:
for split in ['train', 'validation', 'test']:
    print(f"Example from {split} set:")
    print(data[split][0])
    print("-" * 40)

Example from train set:
{'text': 'the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .', 'label': 1}
----------------------------------------
Example from validation set:
{'text': 'compassionately explores the seemingly irreconcilable situation between conservative christian parents and their estranged gay and lesbian children .', 'label': 1}
----------------------------------------
Example from test set:
{'text': 'lovingly photographed in the manner of a golden book sprung to life , stuart little 2 manages sweetness largely without stickiness .', 'label': 1}
----------------------------------------


This output shows the first example from the train, validation and test sets. As you can see, each example consists of a review text and its label. All three examples above are positive reviews.

In [3]:
# Check the features of the dataset and their types
print(data["train"].features)

{'text': Value('string'), 'label': ClassLabel(names=['neg', 'pos'])}
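
The `ClassLabel` feature above is what ties the integer labels to the names `neg` and `pos`. The idea can be sketched in plain Python; the list and the two helper functions below are hypothetical stand-ins for illustration, not the Datasets API itself:

```python
# Hypothetical stand-in for the dataset's ClassLabel feature:
# a plain list whose positions are the integer labels.
label_names = ["neg", "pos"]


def int_to_name(label_id: int) -> str:
    """Return the class name stored at position `label_id`."""
    return label_names[label_id]


def name_to_int(name: str) -> int:
    """Return the integer label that belongs to a class name."""
    return label_names.index(name)


print(int_to_name(1))      # pos
print(name_to_int("neg"))  # 0
```

So a `label` of 1 in the examples above means `pos`, which matches the three positive reviews we printed.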


# Load Embeddings Model

Below we import `SentenceTransformer`, a class made to convert sentences to vectors.

After that we load a pretrained sentence embedding model. `all-mpnet-base-v2` is a model from the *Sentence Transformers* library. It is based on MPNet, a transformer variant from Microsoft, and was trained on a very large collection of sentence pairs to learn semantic similarities.

In [4]:
# Importing the SentenceTransformer class from the sentence_transformers library
from sentence_transformers import SentenceTransformer

# Initializing the model by loading the pre-trained 'all-mpnet-base-v2' model
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')

# Generate Embeddings for Labels

Below we create two label descriptions. The model `all-mpnet-base-v2` will transform these sentences into **vector embeddings**. We do this because labels are normally numbers (0 = negative, 1 = positive). Here we express the labels in natural language, which is how a pretrained model can link the meaning of a label to a text.

This is essentially the core of zero-shot classification:
* No need to train a separate classifier.
* You embed your labels the same way you embed your input.
* Afterwards you compare: "Does the review look more like the embedding of 'positive' or of 'negative'?"
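
To make the comparison step concrete, here is a minimal sketch with made-up 3-dimensional vectors. The numbers are invented for illustration only; real embeddings from `all-mpnet-base-v2` have 768 dimensions and come from `model.encode(...)`.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up "embeddings" (assumption: purely illustrative values).
negative_label = np.array([1.0, 0.0, 0.2])
positive_label = np.array([0.0, 1.0, 0.2])
review = np.array([0.1, 0.9, 0.3])  # pretend this is an enthusiastic review

# Compare the review with both label vectors and keep the closest label.
scores = {
    "negative": cosine(review, negative_label),
    "positive": cosine(review, positive_label),
}
predicted = max(scores, key=scores.get)

print(scores)
print(predicted)  # positive
```

The review vector points in almost the same direction as the positive label vector, so that label wins. No classifier was trained at any point.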

In [5]:
# Encoding a list of label descriptions ("Negative sentiment" and "Positive sentiment")
# using the loaded model to obtain their embeddings (numerical representations)
label_embeddings = model.encode(["Negative sentiment", "Positive sentiment"])

# Generate Embeddings for Test Data (Text)

What happens below is that `data["test"]["text"]` takes all the review texts from the test split (1066 rows).

`model.encode(...)` turns every review into an embedding vector, which gives a matrix of shape (1066, 768):
* 1066 test sentences
* 768 is the embedding size of `all-mpnet-base-v2`

The reason we do this: so far we only have the label embeddings for "Negative sentiment" and "Positive sentiment". The reviews themselves, e.g. "The movie was great", still need their own embeddings.

To actually evaluate the approach we need to encode all test reviews, so we can do the following per review:
1. Compare it with both label embeddings.
2. Predict whether the label is positive or negative.
3. Compare the prediction with the real `label` column in the dataset.

In [6]:
# Encoding the "text" column from the "test" split of the dataset `data` using the model.
# This will generate embeddings for each sentence in the "test" set.
# The `show_progress_bar=True` argument will display a progress bar during encoding.
test_embeddings = model.encode(data["test"]["text"], show_progress_bar=True)

Batches:   0%|          | 0/34 [00:00<?, ?it/s]
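
The progress bar counts 34 batches. A quick check of where that number could come from, assuming `encode` ran with a batch size of 32 (to my knowledge its default, but treat this as an assumption since the cell does not set it explicitly):

```python
import math

num_sentences = 1066  # rows in the test split
batch_size = 32       # assumption: batch size used by model.encode

# The last batch is only partly filled, so we round up.
num_batches = math.ceil(num_sentences / batch_size)
print(num_batches)  # 34
```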

# Zero-Shot Classification
```python
sim_matrix = cosine_similarity(test_embeddings, label_embeddings)
```

* `test_embeddings`: a matrix of shape (1066, 768) with all test reviews
* `label_embeddings`: a matrix of shape (2, 768) with the embeddings of "Negative sentiment" and "Positive sentiment"

**Cosine similarity** measures how closely two vectors point in the same direction in the semantic space: values near 1 mean very similar, values near 0 mean unrelated.

This results in the `sim_matrix` of shape (1066, 2):

* Each row = a review
* Each column = similarity with a label
    - Column 0 = how negative the review sounds
    - Column 1 = how positive the review sounds
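
For two vectors the cosine similarity is the dot product divided by the product of their lengths: cos(a, b) = (a · b) / (‖a‖ ‖b‖). A minimal NumPy sketch of the matrix version follows. The helper name `cosine_similarity_matrix` and the random stand-in embeddings are my own assumptions for illustration; the notebook itself uses scikit-learn's `cosine_similarity`.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `a` and every row of `b`."""
    # Scale every row to length 1, then a dot product equals the cosine.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Random stand-ins with the same shapes as the real embeddings.
rng = np.random.default_rng(0)
fake_test_embeddings = rng.normal(size=(1066, 768))
fake_label_embeddings = rng.normal(size=(2, 768))

fake_sim_matrix = cosine_similarity_matrix(fake_test_embeddings, fake_label_embeddings)
print(fake_sim_matrix.shape)  # (1066, 2)
```

The shapes explain the result: (1066, 768) times the transpose of (2, 768) gives one row per review and one column per label.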

```python
predictions = np.argmax(sim_matrix, axis=1)
```
* `argmax` finds the index of the highest value in each row

For review i:

* if column 0 > column 1 -> prediction = 0 (negative)
* if column 1 > column 0 -> prediction = 1 (positive)

So simply put, `predictions` is a vector with 1066 predictions (0 or 1).
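
A small worked example of that step, using a made-up similarity matrix for three reviews (the values are invented for illustration):

```python
import numpy as np

# Rows = reviews, column 0 = similarity to "negative", column 1 = to "positive".
toy_sim_matrix = np.array([
    [0.10, 0.35],  # closer to positive
    [0.42, 0.18],  # closer to negative
    [0.25, 0.25],  # exact tie
])

toy_predictions = np.argmax(toy_sim_matrix, axis=1)
print(toy_predictions)  # [1 0 0]
```

Note the last row: on an exact tie `np.argmax` returns the first index, so a tie would be counted as negative. With real-valued embeddings exact ties are unlikely, but it is good to know which way they fall.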

In [7]:
# Importing the cosine_similarity function from sklearn.metrics.pairwise
# to compute the cosine similarity between two sets of embeddings
from sklearn.metrics.pairwise import cosine_similarity

# Importing numpy for numerical operations (e.g., finding the index of the max value)
import numpy as np

# Compute the cosine similarity between the test embeddings and the label embeddings.
# The result is a similarity matrix where each row corresponds to a test sentence,
# and each column corresponds to one of the label embeddings ("Negative" or "Positive").
sim_matrix = cosine_similarity(test_embeddings, label_embeddings)

# For each document in the test set, find the index of the label with the highest cosine similarity.
# This gives the predicted label (0 for "Negative", 1 for "Positive").
predictions = np.argmax(sim_matrix, axis=1)

# Compute the Classification Report / Confusion Matrix

Below is the classification report, which shows how well the zero-shot classifier identified the positive and negative reviews.

In [8]:
# Importing the classification_report function from sklearn.metrics
# to evaluate the model's performance by generating precision, recall, and F1-score metrics
from sklearn.metrics import classification_report

# Extracting the true labels (ground truth) from the 'label' column in the "test" split of the dataset
y_true = data['test']["label"]

# Generating a classification report to evaluate the model's predictions against the true labels
# The report includes precision, recall, and F1-score for each class (Negative and Positive)
report = classification_report(y_true, predictions,
                               target_names=["Negative sentiment", "Positive sentiment"])

# Printing the classification report to the console
print(report)


                    precision    recall  f1-score   support

Negative sentiment       0.81      0.74      0.78       533
Positive sentiment       0.76      0.82      0.79       533

          accuracy                           0.78      1066
         macro avg       0.79      0.78      0.78      1066
      weighted avg       0.79      0.78      0.78      1066



With an accuracy of 78% the model is decent but not perfect.

Looking at the recall per label, the model is better at finding **positive reviews** (recall 0.82) than **negative reviews** (recall 0.74).
Essentially it is a bit **optimistic**:
- It leans towards predicting positive and therefore misses some negative reviews.
- But when it does predict negative, there is a high chance it is correct (precision 0.81).
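
To see how precision and recall lead to that reading, here is a minimal sketch on eight made-up labels. The toy arrays and the helper `precision_recall` are assumptions for illustration, not the notebook's real predictions; the report above was produced by scikit-learn's `classification_report`.

```python
import numpy as np

# Made-up ground truth and predictions (0 = negative, 1 = positive).
y_true_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred_toy = np.array([0, 0, 1, 1, 1, 1, 1, 0])


def precision_recall(y_true: np.ndarray, y_pred: np.ndarray, cls: int):
    """Precision and recall for one class."""
    true_positives = np.sum((y_pred == cls) & (y_true == cls))
    precision = true_positives / np.sum(y_pred == cls)  # of all predicted cls, how many were right
    recall = true_positives / np.sum(y_true == cls)     # of all real cls, how many were found
    return float(precision), float(recall)


neg_precision, neg_recall = precision_recall(y_true_toy, y_pred_toy, 0)
pos_precision, pos_recall = precision_recall(y_true_toy, y_pred_toy, 1)
accuracy = float(np.mean(y_true_toy == y_pred_toy))

print(f"negative: precision={neg_precision:.2f} recall={neg_recall:.2f}")
print(f"positive: precision={pos_precision:.2f} recall={pos_recall:.2f}")
print(f"accuracy: {accuracy:.3f}")
```

The toy data shows the same pattern as the real report: the classifier predicts positive more often, so positive recall is higher, while the fewer negative predictions it makes are more often correct.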

### Summary
To summarize what we did in this notebook and what zero-shot learning is:

* In this notebook we had a dataset with a given label for each review (0 = negative, 1 = positive).
* Normally a model would be trained on these examples to learn which sentences belong to 0 (negative) and which to 1 (positive).
* That is **supervised learning**: you learn from examples *with* a label.

What we did in this notebook is skip the training. The model was not trained on the Rotten Tomatoes labels.

Instead we gave the model the 2 labels as descriptions:

* "Negative sentiment"
* "Positive sentiment"

These are label descriptions in natural language.

Then we asked the model the following:

* Turn these reviews into vectors
* Also turn the labels into vectors

Then we compared

* Does the review look more like the vector for "Positive sentiment" or more like the one for "Negative sentiment"?

**But how does that "look like" work technically?**
This pretrained model has already seen a lot of text, which is how it knows that words like "great, amazing, wonderful" usually go with positive sentiment, whereas "boring, terrible, bad" go with negative sentiment. So if you give the model:

* "This movie was fantastic!"
* The model will place it closer to the vector for "Positive sentiment" than to the vector for "Negative sentiment".

The interesting part is that we did not use the Rotten Tomatoes labels to train this model, yet it can still separate the sentiment. That is exactly what **zero-shot** learning is: solving a task without training for that specific task.

So in one sentence:

`We asked the model whether each review resonates more with "Negative sentiment" or "Positive sentiment", and that is how, without training, it could classify the reviews as positive or negative.`