Question 2: Prompt-based-learning approaches to the spam/not-spam SMS text classification task
For this question, you'll be asked to evaluate LLM-based zero-shot and few-shot approaches to classifying an SMS message as spam or not-spam. You have the following resources:

The notebook Class_7_code_supplement_solving_NLP_tasks_via_prompting.ipynb, which contains a zero-shot example of prompting an LLM for a similar task (sentiment analysis). You'll need to modify this prompt for the few-shot setting. Replace mistralai/Mixtral-8x7B-Instruct-v0.1 with tiiuae/falcon-7b-instruct, since Mixtral is far too big for a Colab notebook. Additionally, you'll need to

The dataset spam_not_spam.csv, where 1 = spam and 0 = not-spam, has been included in the assignment folder. Remember this is zero/few-shot learning so you don't need to train anything! But you'll have to figure out how to perform prompt-based learning in batch-mode, i.e., loop through the dataset and incorporate each sms message into your prompt.
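One way to run the prompts in batch mode is sketched below, under stated assumptions: the prompt template and the helper names `build_prompts`, `batched`, and `run_in_batches` are hypothetical, and `generate_fn` stands for any callable that maps a list of prompts to a list of generated strings. Keeping the model behind a plain callable makes the looping logic testable without downloading a model.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical zero-shot template; {sms} is replaced by each message.
ZERO_SHOT_TEMPLATE = (
    "Classify the following SMS as spam or not spam.\n"
    "SMS: {sms}\n"
    "Answer:"
)


def build_prompts(messages: Iterable[str], template: str = ZERO_SHOT_TEMPLATE) -> List[str]:
    """Insert each SMS message into the prompt template."""
    # str.format only interprets braces in the template, not in the inserted message
    return [template.format(sms=str(m).strip()) for m in messages]


def batched(items: List, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(prompts: List[str],
                   generate_fn: Callable[[List[str]], List[str]],
                   batch_size: int = 8) -> List[str]:
    """Send prompts to generate_fn one batch at a time, keeping the original order."""
    outputs: List[str] = []
    for batch in batched(prompts, batch_size):
        outputs.extend(generate_fn(batch))
    return outputs
```

With a Hugging Face text-generation pipeline, `generate_fn` could be something like `lambda batch: [out[0]["generated_text"] for out in generator(batch, max_new_tokens=5, return_full_text=False)]`, assuming the pipeline returns one list of candidate dicts per prompt when it is given a list.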

For full credit please submit a notebook assignment_2_question_2_{your_name}.ipynb with {your_name} replaced with your name. The notebook should contain the following:

Code used for both zero-shot and few-shot prompting
The performance of the zero-shot and few-shot approaches, reported separately, using the metrics accuracy, precision, recall, and F1.
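For reference, with spam (label 1) as the positive class, which matches scikit-learn's default `pos_label=1`, the four metrics are defined from the confusion-matrix counts (true/false positives TP/FP, true/false negatives TN/FN) as:

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```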

Imports and Dataset Loading

In [2]:
import pandas as pd
from transformers import pipeline
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


Dataset Loading with Error Handling

In [3]:
import pandas as pd

# Load the dataset; on_bad_lines='warn' reports malformed rows instead of failing
try:
    df = pd.read_csv('/content/spam_not_spam_sms.csv',
                     on_bad_lines='warn',  # use 'skip' to drop malformed rows without a warning
                     engine='python')
except Exception as e:
    print(f"An error occurred while loading the dataset: {e}")
    raise  # stop here: every later cell needs `df`
else:
    print("Dataset loaded successfully!")
    # Display the first few rows of the dataframe
    print(df.head())

Dataset loaded successfully!
                                                 sms  label
0  Go until jurong point, crazy.. Available only ...      0
1                    Ok lar... Joking wif u oni...\n      0
2  Free entry in 2 a wkly comp to win FA Cup fina...      1
3  U dun say so early hor... U c already then say...      0
4  Nah I don't think he goes to usf, he lives aro...      0
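A later cell coerces `label` with `pd.to_numeric(..., errors='coerce').fillna(0)`, which silently counts any unparseable label as not-spam. A stricter alternative, sketched here with a hypothetical helper `clean_labels`, drops rows whose label is not exactly 0 or 1 and reports how many were removed.

```python
import pandas as pd


def clean_labels(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    """Keep only rows whose label parses as 0 or 1, reporting how many were dropped."""
    labels = pd.to_numeric(df[label_col], errors="coerce")  # unparseable values become NaN
    valid = labels.eq(0) | labels.eq(1)                     # NaN compares False to both
    dropped = int((~valid).sum())
    if dropped:
        print(f"Dropping {dropped} row(s) with a missing or non-binary '{label_col}' value")
    cleaned = df.loc[valid].copy()
    cleaned[label_col] = labels[valid].astype(int)
    return cleaned
```

Used as `df = clean_labels(df)` right after loading, every remaining label is a genuine 0 or 1.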





Zero-Shot Learning Approach

In [4]:
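# NOTE: this checkpoint is a sentiment classifier fine-tuned on SST-2. It only returns
# 'POSITIVE' or 'NEGATIVE' and classifies the whole input string; it cannot follow instructions in it.
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")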


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.





In [5]:
def zero_shot_prompt(sms_text):
    prompt = f"Classify the following SMS as spam or not-spam: '{sms_text}'"
    result = classifier(prompt)
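    # Map the model output to "spam"/"not-spam". Caution: with the SST-2 model above the
    # label is 'POSITIVE' or 'NEGATIVE', so this comparison never matches and every
    # message falls through to "not-spam".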
    return "spam" if result[0]['label'].lower() == "spam" else "not-spam"
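Because the fallback branch maps any unrecognized label to "not-spam", a model that emits entirely different labels (such as the SST-2 model's 'POSITIVE'/'NEGATIVE') fails silently. A minimal sketch of a stricter mapping, using a hypothetical `SPAM_LABEL_MAP` and `map_model_label`, raises an error instead, so a label mismatch surfaces on the first message.

```python
from typing import Dict

# Hypothetical mapping from raw model label strings to the task's two labels.
SPAM_LABEL_MAP: Dict[str, str] = {
    "spam": "spam",
    "not-spam": "not-spam",
    "not spam": "not-spam",
}


def map_model_label(raw_label: str, mapping: Dict[str, str] = SPAM_LABEL_MAP) -> str:
    """Translate a raw model label, failing loudly on labels the task does not define."""
    key = raw_label.strip().lower()
    if key not in mapping:
        raise ValueError(
            f"Unexpected model label {raw_label!r}; expected one of {sorted(mapping)}"
        )
    return mapping[key]
```

Inside `zero_shot_prompt`, the last line would become `return map_model_label(result[0]['label'])`; with the SST-2 checkpoint this raises on 'NEGATIVE' immediately rather than producing a column of "not-spam".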


Few-Shot Approach

In [8]:
few_shot_examples = [
    ("Free entry in a weekly competition to win prizes", "spam"),
    ("Please confirm if you’ll attend the meeting today", "not-spam"),
    ("Congratulations! You've won a gift card worth $500", "spam"),
    ("Just checking in to see if you’re free for coffee", "not-spam")
]

def few_shot_prompt(sms_text):
    prompt = "Classify the following SMS as spam or not-spam. Here are some examples:\n"
    for example_text, example_label in few_shot_examples:
        prompt += f"- '{example_text}' - {example_label.capitalize()}\n"
    prompt += f"Now, classify this message: '{sms_text}'"

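    # A text-classification model scores the whole string; it does not treat the
    # examples as demonstrations, so to this model the prompt is just longer input text.
    result = classifier(prompt)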
    return "spam" if result[0]['label'].lower() == "spam" else "not-spam"


In [10]:
df.columns = df.columns.str.strip().str.lower()  # Standardize to lower case and strip spaces


In [11]:
print(df.columns)


Index(['sms', 'label'], dtype='object')


Prediction

In [12]:
# Zero-Shot Predictions
df['zero_shot_pred'] = df['sms'].apply(zero_shot_prompt)

# Few-Shot Predictions
df['few_shot_pred'] = df['sms'].apply(few_shot_prompt)


In [13]:
# Map numeric labels to text format
df['true_label'] = df['label'].apply(lambda x: "spam" if x == 1 else "not-spam")


In [18]:
# Check predicted values
print("Zero-Shot Predictions:\n", df['zero_shot_pred'].value_counts())
print("Few-Shot Predictions:\n", df['few_shot_pred'].value_counts())

Zero-Shot Predictions:
 zero_shot_pred
not-spam    99
Name: count, dtype: int64
Few-Shot Predictions:
 few_shot_pred
not-spam    99
Name: count, dtype: int64
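Both prediction columns above contain a single class. Since the dataset does contain spam (e.g. row 2 has label 1), a column of only "not-spam" points to a problem in how the model output is interpreted. A small check, with the hypothetical name `warn_if_single_class`, can flag this before any metrics are computed:

```python
from collections import Counter
from typing import Hashable, Iterable


def warn_if_single_class(predictions: Iterable[Hashable], name: str = "predictions") -> bool:
    """Print a warning and return True when the predictions contain at most one class."""
    counts = Counter(predictions)
    if len(counts) <= 1:
        print(f"WARNING: {name} contain at most one distinct class: {dict(counts)}")
        return True
    return False
```

For example, `warn_if_single_class(df['zero_shot_pred'], 'zero-shot predictions')` works directly on a pandas Series, since iterating a Series yields its values.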


In [39]:
def debug_zero_shot_prompt(sms_text):
    prompt = f"Classify the following SMS message as either 'spam' or 'not-spam':\n\n'{sms_text}'"
    result = classifier(prompt)
    print(f"Prompt: {prompt}\nResult: {result}")  # Add this line for debugging
    return "spam" if result[0]['label'].lower() == "spam" else "not-spam"

# Replace `zero_shot_prompt` with `debug_zero_shot_prompt` when testing
df['zero_shot_pred'] = df['sms'].apply(debug_zero_shot_prompt)


Prompt: Classify the following SMS message as either 'spam' or 'not-spam':

'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
'
Result: [{'label': 'NEGATIVE', 'score': 0.9969232678413391}]
Prompt: Classify the following SMS message as either 'spam' or 'not-spam':

'Ok lar... Joking wif u oni...
'
Result: [{'label': 'NEGATIVE', 'score': 0.9960976839065552}]
Prompt: Classify the following SMS message as either 'spam' or 'not-spam':

'Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
'
Result: [{'label': 'NEGATIVE', 'score': 0.988857626914978}]
Prompt: Classify the following SMS message as either 'spam' or 'not-spam':

'U dun say so early hor... U c already then say...
'
Result: [{'label': 'NEGATIVE', 'score': 0.9974391460418701}]
Prompt: Classify the following SMS message as either 'spam' or 'not-spam':

'Nah I don't thin

In [5]:
# Import os (only needed if you set the Hugging Face token via an environment variable)
import os
# os.environ['HF_TOKEN'] = '<your-hf-token>'  # never paste a real token into a shared notebook

# Import the pipeline from transformers
from transformers import pipeline

# Initialize a text-generation pipeline. NOTE: this uses GPT-2, not the
# tiiuae/falcon-7b-instruct model the assignment asks for.
generator = pipeline("text-generation", model="gpt2")





In [6]:
# Zero-shot classification function
def zero_shot_classification(sms):
    prompt = f"Classify the following SMS as spam or not spam: '{sms}'"
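    # Generate a short continuation (max_new_tokens bounds only the newly generated text)
    response = generator(prompt, max_new_tokens=20, num_return_sequences=1)
    # Caution: by default 'generated_text' includes the prompt, which already contains "spam"
    # (and "not spam" contains "spam" too), so this check returns 1 for every message
    if 'spam' in response[0]['generated_text'].lower():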
        return 1  # Spam
    else:
        return 0  # Not spam

# Evaluate zero-shot performance on the dataset
df['zero_shot_pred'] = df['sms'].apply(zero_shot_classification)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
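The zero-shot cell tests whether "spam" appears anywhere in `generated_text`. By default the text-generation pipeline returns the prompt followed by the continuation, and the prompt itself contains "spam" (as does the answer "not spam"), so every message is labeled 1. A sketch of a stricter parser follows; the helper names `extract_continuation` and `parse_spam_label` are hypothetical. It inspects only the generated continuation, takes the first label mentioned (treating "not spam"/"not-spam" as one label), and returns None when no label appears, so unparseable answers can be counted separately instead of being silently assigned a class.

```python
import re
from typing import Optional

# "not spam", "not-spam" or "notspam" must be tried before plain "spam" at each position
_LABEL_RE = re.compile(r"\b(not[\s-]*spam|spam)\b")


def extract_continuation(generated_text: str, prompt: str) -> str:
    """Drop the echoed prompt, keeping only the text the model generated."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


def parse_spam_label(continuation: str) -> Optional[int]:
    """Return 1 for spam, 0 for not spam, or None if the answer names neither label."""
    match = _LABEL_RE.search(continuation.lower())
    if match is None:
        return None
    return 0 if match.group(1).startswith("not") else 1
```

Alternatively, passing `return_full_text=False` in the call, e.g. `generator(prompt, max_new_tokens=5, return_full_text=False)`, makes `generated_text` hold only the continuation, so `extract_continuation` becomes unnecessary.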

In [7]:
# Few-shot classification function
def few_shot_classification(sms):
    prompt = """
    Below are examples of SMS messages classified as spam or not spam:
    1. 'Congratulations! You've won a free lottery ticket.' -> spam
    2. 'Hey, are we still meeting for dinner tonight?' -> not spam
    3. 'Claim your free $1000 gift card now!' -> spam
    4. 'Can you send me the report by tomorrow?' -> not spam

    Now classify the following SMS as spam or not spam: '{sms}'
    """.format(sms=sms)

    # Caution: 'generated_text' echoes this prompt, whose examples contain "spam",
    # so the substring check below returns 1 for every message
    response = generator(prompt, max_new_tokens=20, num_return_sequences=1)
    if 'spam' in response[0]['generated_text'].lower():
        return 1  # Spam
    else:
        return 0  # Not spam

# Evaluate few-shot performance on the dataset
df['few_shot_pred'] = df['sms'].apply(few_shot_classification)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
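GPT-2-style models continue text rather than follow instructions, so few-shot prompts usually work better when the demonstrations and the query share one exact format and the prompt ends on an open label slot. The sketch below reuses the four examples from the few-shot cell; the `SMS:`/`Label:` layout and the name `build_few_shot_prompt` are assumptions, not part of the original notebook.

```python
from typing import List, Tuple

# The four demonstrations from the few-shot cell, as (message, label) pairs.
FEW_SHOT_EXAMPLES: List[Tuple[str, str]] = [
    ("Congratulations! You've won a free lottery ticket.", "spam"),
    ("Hey, are we still meeting for dinner tonight?", "not spam"),
    ("Claim your free $1000 gift card now!", "spam"),
    ("Can you send me the report by tomorrow?", "not spam"),
]


def build_few_shot_prompt(sms: str,
                          examples: List[Tuple[str, str]] = FEW_SHOT_EXAMPLES) -> str:
    """Format demonstrations and query identically, ending on an open 'Label:' slot."""
    header = "Classify each SMS message as spam or not spam.\n\n"
    shots = "".join(f"SMS: {text}\nLabel: {label}\n\n" for text, label in examples)
    return f"{header}{shots}SMS: {sms.strip()}\nLabel:"
```

Because this prompt also contains "spam" several times, only the generated continuation should be parsed (for example by calling the pipeline with `return_full_text=False`).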

In [8]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Convert labels and predictions to numeric.
# Note: fillna(0) silently treats any unparseable label as not-spam (0).
df['label'] = pd.to_numeric(df['label'], errors='coerce').fillna(0)
df['zero_shot_pred'] = pd.to_numeric(df['zero_shot_pred'], errors='coerce')
df['few_shot_pred'] = pd.to_numeric(df['few_shot_pred'], errors='coerce')

# Compute accuracy, precision, recall and F1 with spam (1) as the positive class
def compute_metrics(true_labels, predictions):
    accuracy = accuracy_score(true_labels, predictions)
    # zero_division=0 avoids warnings when no message is predicted (or labelled) spam
    precision = precision_score(true_labels, predictions, zero_division=0)
    recall = recall_score(true_labels, predictions, zero_division=0)
    f1 = f1_score(true_labels, predictions, zero_division=0)

    return accuracy, precision, recall, f1

# Calculate metrics for zero-shot
zero_shot_accuracy, zero_shot_precision, zero_shot_recall, zero_shot_f1 = compute_metrics(df['label'], df['zero_shot_pred'])
print(f"Zero-shot metrics -> Accuracy: {zero_shot_accuracy}, Precision: {zero_shot_precision}, Recall: {zero_shot_recall}, F1 Score: {zero_shot_f1}")

# Calculate metrics for few-shot
few_shot_accuracy, few_shot_precision, few_shot_recall, few_shot_f1 = compute_metrics(df['label'], df['few_shot_pred'])
print(f"Few-shot metrics -> Accuracy: {few_shot_accuracy}, Precision: {few_shot_precision}, Recall: {few_shot_recall}, F1 Score: {few_shot_f1}")


Zero-shot metrics -> Accuracy: 0.08900523560209424, Precision: 0.08900523560209424, Recall: 1.0, F1 Score: 0.16346153846153846
Few-shot metrics -> Accuracy: 0.08900523560209424, Precision: 0.08900523560209424, Recall: 1.0, F1 Score: 0.16346153846153846
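These numbers can be checked by hand. Recall of 1.0 means no spam message was missed, and with no false negatives, precision can equal accuracy only if there are no true negatives either, i.e. every message was labeled spam. The sketch below recomputes the metrics from confusion-matrix counts for such an all-spam predictor. The counts (17 spam messages out of 191) are inferred from the reported accuracy of 0.0890 = 17/191 rather than read from the dataset, and `metrics_from_counts` is a hypothetical helper.

```python
def metrics_from_counts(tp: int, fp: int, fn: int, tn: int):
    """Accuracy, precision, recall and F1 with spam (label 1) as the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Inferred counts: an all-spam predictor turns every spam message into a true positive
# and every not-spam message into a false positive.
n_spam, n_total = 17, 191
print(metrics_from_counts(tp=n_spam, fp=n_total - n_spam, fn=0, tn=0))
```

Running it should reproduce the reported zero-shot and few-shot figures up to floating-point rounding, which confirms that both columns contain nothing but spam predictions.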


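Zero-Shot Learning Metrics: The zero-shot approach did not classify the messages meaningfully. A recall of 1.0 together with precision equal to accuracy (about 0.089) means every message was predicted as spam, so the scores only reflect the share of spam in the data (F1 about 0.163). This follows from the parsing step: the text-generation pipeline returns the prompt together with the continuation by default, and because the prompt itself contains the word "spam", the substring check labels every message as spam.

Few-Shot Learning Metrics: The few-shot approach produced exactly the same metrics, so it did not outperform zero-shot in this run. Every demonstration in its prompt also contains "spam", so the same parsing issue applies. A meaningful comparison of the two approaches requires parsing only the generated continuation and, per the assignment, using tiiuae/falcon-7b-instruct instead of GPT-2.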