# Prompt-based Sentiment Classification

This notebook demonstrates how to perform sentiment classification using a prompt-based approach with the Meta-Llama text generation model. The workflow includes:
- Loading a dataset using pandas.
- Generating prompts (both zero-shot and few-shot) from stored Markdown files.
- Sending prompts to an API endpoint for text generation.
- Parsing and displaying the responses.
- Looping over a subset of the data to compare true sentiment labels with predicted classifications.

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

### Loading the Training Dataset

This cell reads the training dataset from the CSV file `../../data/train_example.csv` using the `ISO-8859-1` encoding.

Data Source: https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset?resource=download 

In [3]:
import os

dataset = "train_example.csv"
train = pd.read_csv(os.path.join("..", "..", "data", dataset), encoding='ISO-8859-1')

### Checking the Shape of the Dataset

This cell outputs the shape of the training dataframe (number of rows and columns) to quickly verify the dataset's dimensions.

In [4]:
train.shape

(10, 6)

### Previewing the Data

This cell displays up to the first 20 rows of the training dataset (this file contains only 10). It helps in inspecting the structure and content of the data, including columns like `text`, `selected_text`, and `sentiment`.

In [5]:
train.head(20)

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,textID,text,selected_text,sentiment
0,0,0,cb774db0d1,"I`d have responded, if I were going","I`d have responded, if I were going",neutral
1,1,1,549e992a42,Sooo SAD I will miss you here in San Diego!!!,Sooo SAD,negative
2,2,2,088c60f138,my boss is bullying me...,bullying me,negative
3,3,3,9642c003ef,what interview! leave me alone,leave me alone,negative
4,4,4,358bd9e861,"Sons of ****, why couldn`t they put them on the releases we already bought","Sons of ****,",negative
5,5,5,28b57f3990,http://www.dothebouncy.com/smf - some shameless plugging for the best Rangers forum on earth,http://www.dothebouncy.com/smf - some shameless plugging for the best Rangers forum on earth,neutral
6,6,6,6e0c6d75b1,2am feedings for the baby are fun when he is all smiles and coos,fun,positive
7,7,7,50e14c0bb8,Soooo high,Soooo high,neutral
8,8,8,e050245fbd,Both of you,Both of you,neutral
9,9,9,fc2cbefa9d,Journey!? Wow... u just became cooler. hehe... (is that possible!?),Wow... u just became cooler.,positive


### Defining the `chat` Function

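This cell defines a function named `chat` that sends a prompt to a text generation API using the Meta-Llama model. It makes a POST request to the API endpoint, passing the prompt text, maximum tokens, temperature, and top_p, and returns the generated text together with the full JSON response. A `retry` decorator repeats the call after a 20-second wait when the API answers with HTTP 429 (too many requests).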

In [67]:
import os

# The API key is read from an environment variable instead of being hard-coded.
# The variable name HYPERBOLIC_API_KEY is an assumption; use whatever name you export.
HYPERBOLIC_AUTH = f"Bearer {os.environ['HYPERBOLIC_API_KEY']}"
MODEL = "meta-llama/Meta-Llama-3.1-405B-FP8"

In [68]:
import requests
import time

def retry(times, exceptions):
    """
    Retry Decorator
    Retries the wrapped function/method `times` times if the exceptions listed
    in ``exceptions`` are thrown
    :param times: The number of times to repeat the wrapped function/method
    :type times: Int
    :param exceptions: Exceptions that trigger a retry attempt
    :type exceptions: Tuple of Exceptions
    """
    def decorator(func):
        def newfn(*args, **kwargs):
            attempt = 0
            while attempt < times:
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    print(
                        'Exception %s thrown when attempting to run %s, attempt '
                        '%d of %d. We\'ll wait 20seconds.' % (e, func, attempt, times)
                    )
                    attempt += 1
                    time.sleep(20)
            return func(*args, **kwargs)
        return newfn
    return decorator

class TooManyRequests(Exception):
    pass

@retry(times=3, exceptions=(TooManyRequests,))
def chat(prompt: str, max_tokens: int = 2) -> tuple:
    url = "https://api.hyperbolic.xyz/v1/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": HYPERBOLIC_AUTH
    }

    data = {
        "prompt": prompt,
        "model": MODEL,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9
    }

    response = requests.post(url, headers=headers, json=data)
    if response.status_code==200:
        return response.json()["choices"][0]["text"], response.json()
    elif response.status_code==429:
        raise TooManyRequests(response.text)
    else:
        return None, response.json()

message, response = chat("say blue.", max_tokens=20)
message, response

(' I just want to be in that number when the saints go marching in. Oh, when the saints',
 {'id': 'cmpl-65e385e5c5c141579144a38320ebea10',
  'choices': [{'finish_reason': 'length',
    'index': 0,
    'logprobs': None,
    'text': ' I just want to be in that number when the saints go marching in. Oh, when the saints'}],
  'created': 1743455697,
  'model': 'meta-llama/Meta-Llama-3.1-405B-FP8',
  'system_fingerprint': '',
  'object': 'text_completion',
  'usage': {'prompt_tokens': 2, 'total_tokens': 22, 'completion_tokens': 20}})

In [31]:
response

{'id': 'cmpl-bae7ee66e9f24d3082e29f96af72a9c0',
 'choices': [{'finish_reason': 'length',
   'index': 0,
   'logprobs': None,
   'text': ' say blue. say blue. say blue. say blue. say blue. say blue. say blue'}],
 'created': 1743454988,
 'model': 'meta-llama/Meta-Llama-3.1-405B',
 'system_fingerprint': '',
 'object': 'text_completion',
 'usage': {'prompt_tokens': 2, 'total_tokens': 22, 'completion_tokens': 20}}
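
The retry behaviour can be tried without touching the API. The block below is a minimal sketch, not the decorator used above: `retry_with_backoff`, `base_delay`, and the `flaky` function are hypothetical names introduced for illustration, and the doubling wait is a swapped-in alternative to the fixed 20-second sleep. It also uses `functools.wraps` so that the wrapped function keeps its own name in log messages.

```python
import functools
import time


def retry_with_backoff(times, exceptions, base_delay=1.0):
    """Retry `func` up to `times` extra times, doubling the wait after each failure."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    time.sleep(base_delay * 2 ** attempt)
            # Final attempt: any exception raised here propagates to the caller.
            return func(*args, **kwargs)
        return wrapper
    return decorator


class TooManyRequests(Exception):
    pass


calls = {"n": 0}


# base_delay=0.0 keeps the demonstration instant; a real API call would use a positive delay.
@retry_with_backoff(times=3, exceptions=(TooManyRequests,), base_delay=0.0)
def flaky():
    """Fails twice with a simulated 429, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TooManyRequests("simulated 429")
    return "ok"


result = flaky()
print(result, calls["n"])
```

As in the decorator above, the function is called at most `times + 1` times: `times` guarded attempts plus one last unguarded call.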

### Defining the `get_prompt` Function

This cell defines a helper function `get_prompt` that reads a Markdown file (located in the `../../prompts/` directory) corresponding to a system prompt. It then formats the prompt by inserting the provided user text.

In [32]:
import os

def get_prompt(system_prompt_name: str, user_prompt) -> str:
    base_path = os.path.join("..", "..", "prompts")
    path = os.path.join(base_path, system_prompt_name + ".md")
    with open(path, 'r') as f:
        markdown_string = f.read()
    return markdown_string.format(user_prompt=user_prompt)
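
To make the templating step concrete, here is a minimal sketch of what happens once the file has been read. The template text is an assumption reconstructed from the rendered zero-shot prompt in this notebook (the actual `.md` files are not shown here), and `fill_template` is a hypothetical stand-in for the last line of `get_prompt`. One detail worth knowing: `str.format` treats every `{...}` in the template as a placeholder, so literal braces in a prompt file would have to be doubled.

```python
# Hypothetical template text, standing in for the contents of a prompt file.
TEMPLATE = (
    "Classify the text into neutral, negative or positive.\n"
    "\n"
    "Text: {user_prompt}\n"
    "Sentiment: "
)


def fill_template(template: str, user_prompt: str) -> str:
    """Insert the user text into the `{user_prompt}` placeholder."""
    return template.format(user_prompt=user_prompt)


prompt = fill_template(TEMPLATE, "hello friend!")
print(prompt)

# Doubled braces in the template come out as single literal braces.
escaped = fill_template("Reply as {{label}} for: {user_prompt}", "hi")
print(escaped)
```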

### Generating and Displaying a Zero-shot Prompt

This cell generates a zero-shot prompt for sentiment classification using the example text `"hello friend!"` by calling the `get_prompt` function with the `"zero_shot_base"` prompt template. The generated prompt is then displayed.

In [43]:
from IPython.display import display, Markdown

In [44]:
zero_shot_prompt = get_prompt(system_prompt_name="zero_shot_base", user_prompt="hello friend!")
display(Markdown(zero_shot_prompt))


Classify the text into neutral, negative or positive.

Text: hello friend!
Sentiment: 

### Testing the Zero-shot Prompt via the API

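This cell sends the generated zero-shot prompt to the `chat` function with a `max_tokens` limit of 20, and displays the API response. Note that the model keeps continuing the text pattern past the first answer, which is why the output is later cut at a delimiter.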

In [45]:
chat(zero_shot_prompt, max_tokens=20)


('0.999\n\nText: hello enemy!\nSentiment: 0.01\n\nText: hello',
 {'id': 'cmpl-a3cac1e37f904b0290f3dc48ed89d0b6',
  'choices': [{'finish_reason': 'length',
    'index': 0,
    'logprobs': None,
    'text': '0.999\n\nText: hello enemy!\nSentiment: 0.01\n\nText: hello'}],
  'created': 1743455107,
  'model': 'meta-llama/Meta-Llama-3.1-405B-FP8',
  'system_fingerprint': '',
  'object': 'text_completion',
  'usage': {'prompt_tokens': 2, 'total_tokens': 22, 'completion_tokens': 20}})

### Generating a Few-shot Prompt

**Exercise 1:** You need to write a system prompt using the few-shot technique. Make sure not to use data available in `test.csv`.

In [106]:
few_shot_prompt  = get_prompt("few_shot_base_refined", "hello friend!")


In [107]:
few_shot_prompt

'Classify the text into neutral, negative or positive. Here are a few examples on how to classify:\n\nYou are ugly.//Sentiment: negative|| You look nice today!//Sentiment: positive|| Wow, this is awesome!// Sentiment: positive|| I am being stalked at my job...//Sentiment: neutral|| Hey, you//Sentiment: neutral|| I love pizza!// Sentiment: positive|| hello friend!//Sentiment: \n'

### Displaying the Few-shot Prompt in Markdown

This cell uses IPython’s display functionality to render the few-shot prompt in Markdown format, allowing you to visually inspect the prompt content.

In [108]:
from IPython.display import display, Markdown
display(Markdown(few_shot_prompt))


Classify the text into neutral, negative or positive. Here are a few examples on how to classify:

You are ugly.//Sentiment: negative|| You look nice today!//Sentiment: positive|| Wow, this is awesome!// Sentiment: positive|| I am being stalked at my job...//Sentiment: neutral|| Hey, you//Sentiment: neutral|| I love pizza!// Sentiment: positive|| hello friend!//Sentiment: 


### Sending the Few-shot Prompt to the API

This cell sends the few-shot prompt to the API via the `chat` function with a token limit of 20. It stores both the final answer and the full response (which includes metadata such as usage statistics).

In [109]:
answer, full_resp = chat(few_shot_prompt, 20)


### Displaying the API Answer

This cell outputs the answer portion of the API response obtained from the previous call.

In [110]:
answer

'neutral|| I hate you//Sentiment: negative|| You are my friend.//Sentiment:'

### Displaying the Full API Response

This cell prints the full response from the API call. The full response includes details like the prompt tokens used, the model information, and other metadata.

In [111]:
full_resp


{'id': 'cmpl-75bd8ad099214dcd93271a6112b239c4',
 'choices': [{'finish_reason': 'length',
   'index': 0,
   'logprobs': None,
   'text': 'neutral|| I hate you//Sentiment: negative|| You are my friend.//Sentiment:'}],
 'created': 1743715142,
 'model': 'meta-llama/Meta-Llama-3.1-405B-FP8',
 'system_fingerprint': '',
 'object': 'text_completion',
 'usage': {'prompt_tokens': 2, 'total_tokens': 22, 'completion_tokens': 20}}

### Defining the `classify_text` Function

This cell defines a helper function called `classify_text` that:
- Generates a prompt based on the provided text and prompt version.
- Sends the prompt to the API.
- Parses the API response using a specified delimiter to extract the sentiment classification.

**Exercise 2:** Adapt `classify_text` to your prompts so that the output format is parsed correctly and the "user" prompt is well structured.

In [83]:
def classify_text(text: str, prompt_version: str = "zero_shot", delimiter=" "):
    def parse_output(answer_raw, delimiter):
        answer = answer_raw.split(delimiter)[0].lower().strip()
        return answer
    prompt = get_prompt(prompt_version, text)
    # print(prompt)
    answer_raw, _ = chat(prompt=prompt, max_tokens=5)
    return parse_output(answer_raw, delimiter)
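
The parsing step returns whatever comes before the delimiter, so outputs such as `0.0` end up as predicted labels. A minimal sketch of a stricter parser follows, assuming that anything outside the three expected labels should be mapped to a fallback value. `parse_label`, `VALID_LABELS`, and the `"unknown"` fallback are hypothetical names, not part of the notebook's code.

```python
VALID_LABELS = ("neutral", "negative", "positive")


def parse_label(answer_raw, delimiter, fallback="unknown"):
    """Return the first delimited chunk if it is a valid label, else `fallback`."""
    if answer_raw is None:  # chat() returns None as the text on non-200 responses
        return fallback
    candidate = answer_raw.split(delimiter)[0].lower().strip()
    return candidate if candidate in VALID_LABELS else fallback


few_shot_label = parse_label("neutral|| I hate you//Sentiment: negative", "||")
zero_shot_label = parse_label("0.999\n\nText: hello enemy!", "\n")
failed_call_label = parse_label(None, "\n")
print(few_shot_label, zero_shot_label, failed_call_label)
```

Mapping invalid outputs to a single fallback keeps the set of predicted classes small, which makes the later metrics easier to interpret.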

### Testing `classify_text` with Zero-shot Prompt

This cell tests the `classify_text` function using the text `"youre stupid"`, with the `zero_shot_base` prompt version and a newline (`"\n"`) delimiter. It then displays the resulting classification.

In [84]:
classify_text(text="youre stupid",prompt_version= "zero_shot_base",  delimiter= "\n")


'negative'

### Testing `classify_text` with Few-shot Prompt (You need to adapt it)

In [114]:
text_to_test_classify = "youre soooo stupid"
print(f"Sentence: {text_to_test_classify}")
# Nested double quotes and backslashes inside f-string expressions only work on
# Python 3.12+, so the arguments are kept outside the f-string.
for version, delim in [
    ("zero_shot_base", "\n"),
    ("few_shot_base_refined", "||"),
    ("few_shot_base_refined_wo_sentiment", "||"),
    ("few_shot_base_refined_plus", "||"),
]:
    print(f"{version}: {classify_text(text_to_test_classify, version, delim)}")


Sentence: youre soooo stupid
zero_shot_base: 0.0
few_shot_base_refined: negative
few_shot_base_refined_wo_sentiment: negative
few_shot_base_refined_plus: negative


### Classifying Multiple Texts from the Dataset

The cells below iterate over every row of the test set (`test.csv`) for each prompt version in `prompt_versions_to_test`. For each row, they:
- Classify the text using the `classify_text` function with the current prompt version and its matching delimiter.
- Print the original text, its true sentiment label, and the predicted sentiment.
- Rely on the `retry` decorator around `chat`, which waits 20 seconds after a rate-limit error before calling the API again.

**Exercise 3:** Your task is to add the prompt versions you want to test to the list below and assess their performance. Make sure the prompt you create achieves at least better-than-random performance.
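
As a reference point for "better than random", two simple baselines can be computed from the labels alone: a uniform random guesser, whose expected accuracy is one over the number of classes, and a classifier that always predicts the most frequent label. The sketch below uses made-up label counts for illustration only; they are not taken from `test.csv`, and `baseline_accuracies` is a hypothetical helper.

```python
from collections import Counter


def baseline_accuracies(labels):
    """Expected accuracy of uniform random guessing and of always predicting the majority label."""
    counts = Counter(labels)
    uniform_random = 1 / len(counts)
    majority = max(counts.values()) / len(labels)
    return uniform_random, majority


# Illustrative labels only (not the real test set distribution).
example_labels = ["positive"] * 8 + ["neutral"] * 7 + ["negative"] * 5
uniform_random, majority = baseline_accuracies(example_labels)
print(f"uniform random: {uniform_random:.3f}, majority class: {majority:.3f}")
```

Running the same helper on `test["sentiment"]` would give the thresholds a prompt has to beat on the real data.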

In [115]:
#prompt_versions_to_test = ["zero_shot_base"]
# 1. zero_shot_base: does not contain examples
# 2. few_shot_base_refined: contains some examples with "Sentiment" reference word and special characters for delimiters
# 3. few_shot_base_refined_wo_sentiment: same examples as 2 but without "Sentiment" word
# 4. few_shot_base_refined_plus: more examples than 2
prompt_versions_to_test = ["zero_shot_base","few_shot_base_refined","few_shot_base_refined_wo_sentiment","few_shot_base_refined_plus"]


In [116]:
import pandas as pd
test = pd.read_csv("../../data/test.csv", encoding='ISO-8859-1')[["text", "sentiment"]]
print(test.shape)
test.head()

(20, 2)


Unnamed: 0,text,sentiment
0,first night in myers. just not the same w/out ...,positive
1,good morning,positive
2,its the best show EVER!,positive
3,URL in previous post (to timer job) should be ...,negative
4,i think iv hurt my tooth and eilish and cassi...,neutral


In [117]:
preds = list()
for prompt_version in prompt_versions_to_test:
    n_preds = 0
    print(prompt_version)
    if prompt_version=="zero_shot_base":
        delimiter_to_use = "\n"
    else:
        delimiter_to_use = "||"
    for i, row in test.iterrows():
        try:
            pred = classify_text(row.text, prompt_version=prompt_version, delimiter=delimiter_to_use)
            print("\n",row.text, "\ntrue label:", row.sentiment, "\npred:", pred)
            preds.append((row["text"],row["sentiment"],pred, prompt_version))
            n_preds+=1
        except Exception as e:
            # The row is skipped (not retried) when classification fails.
            print(f"Skipping row {i}: {e}")


zero_shot_base

 first night in myers. just not the same w/out lydia!  but i`m actually excited about this summer! 
true label: positive 
pred: positive
Exception {"detail":"Too Many Requests"} thrown when attempting to run <function chat at 0x0000023EE2CA1D00>, attempt 0 of 3. We'll wait 20seconds.

  good morning 
true label: positive 
pred: neutral

  its the best show EVER! 
true label: positive 
pred: positive

 URL in previous post (to timer job) should be http://bit.ly/a4Fdb. I`d removed space which messed up URL.  ^ES 
true label: negative 
pred: 0.0

 i think iv hurt my tooth  and eilish and cassie are having a drawing competiton to draw cookies and pineapples haha :L . 
true label: neutral 
pred: 0.0

  I want to know when the auditions are Mander! Text or...reply please! 
true label: neutral 
pred: neutral
Exception {"detail":"Too Many Requests"} thrown when attempting to run <function chat at 0x0000023EE2CA1D00>, attempt 0 of 3. We'll wait 20seconds.
Exception {"detail":"To

In [118]:
preds_df = pd.DataFrame(preds, columns=["text","y","y_pred","prompt_version"])

preds_df

Unnamed: 0,text,y,y_pred,prompt_version
0,first night in myers. just not the same w/out ...,positive,positive,zero_shot_base
1,good morning,positive,neutral,zero_shot_base
2,its the best show EVER!,positive,positive,zero_shot_base
3,URL in previous post (to timer job) should be ...,negative,0.0,zero_shot_base
4,i think iv hurt my tooth and eilish and cassi...,neutral,0.0,zero_shot_base
...,...,...,...,...
75,wish we could come see u on Denver husband l...,negative,negative,few_shot_base_refined_plus
76,I`ve wondered about rake to. The client has ...,negative,negative,few_shot_base_refined_plus
77,Yay good for both of you. Enjoy the break - y...,positive,,few_shot_base_refined_plus
78,But it was worth it ****.,positive,negative,few_shot_base_refined_plus


### Performance

Let's calculate the performance metrics of each prompt version.

In [119]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def calculate_performance_metrics(df):
    # Extract true labels (y) and predicted labels (y_pred)
    y_true = df['y']
    y_pred = df['y_pred']
    
    # Calculate accuracy
    accuracy = accuracy_score(y_true, y_pred)
    
    # Calculate macro-averaged precision, recall, and F1 score
    precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average='macro', zero_division=0)
    
    # Print the results
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1 Score: {f1:.4f}")
    print(f"N predictions: {len(y_pred)}")
    

In [121]:
for group, preds_prompt in preds_df.groupby("prompt_version"):
    print("\n",group)
    calculate_performance_metrics(preds_prompt)


 few_shot_base_refined
Accuracy: 0.5000
Precision: 0.5208
Recall: 0.3690
F1 Score: 0.4280
N predictions: 20

 few_shot_base_refined_plus
Accuracy: 0.5000
Precision: 0.3450
Recall: 0.2952
F1 Score: 0.3133
N predictions: 20

 few_shot_base_refined_wo_sentiment
Accuracy: 0.6000
Precision: 0.5000
Recall: 0.4464
F1 Score: 0.4712
N predictions: 20

 zero_shot_base
Accuracy: 0.4000
Precision: 0.2500
Recall: 0.1667
F1 Score: 0.1976
N predictions: 20
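
One thing to keep in mind when reading these numbers: a macro average is an unweighted mean over classes, and when no `labels` argument is passed, scikit-learn takes the classes from every value that appears in either `y` or `y_pred`. Stray predictions such as `0.0` therefore count as extra classes with zero recall and pull the average down. The block below is a hand-rolled sketch of macro recall in plain Python (not scikit-learn's implementation) on made-up data, showing the effect of restricting the average to the three valid labels.

```python
def macro_recall(y_true, y_pred, labels=None):
    """Unweighted mean of per-class recall; classes with no true samples score 0."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    recalls = []
    for label in labels:
        n_true = sum(1 for t in y_true if t == label)
        n_hit = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(n_hit / n_true if n_true else 0.0)
    return sum(recalls) / len(recalls)


# Illustrative data only: one prediction is the stray output "0.0".
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "neutral", "0.0", "neutral"]

all_labels = macro_recall(y_true, y_pred)
valid_only = macro_recall(y_true, y_pred, labels=["negative", "neutral", "positive"])
print(f"all labels: {all_labels:.3f}, valid labels only: {valid_only:.3f}")
```

In `calculate_performance_metrics`, the analogous restriction would be to pass `labels=["negative", "neutral", "positive"]` to `precision_recall_fscore_support`.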


**Exercise 4:** Explain the performance you achieved and justify your decisions.


Initial performance was very low because of a mismatch in the Hyperbolic endpoint. After changing it, I began to get some results.

The zero-shot prompt does not even convey that the answer should be positive, neutral or negative, and the model sometimes returns numbers instead of a label.

With a few-shot prompt that has a fixed structure and separates the examples with delimiters, it is possible to reach an accuracy of around 50% (in previous tests I got 65%; the difference may be related to API rate limiting and some requests not returning a result).

I also made two more versions to test whether more examples would improve accuracy and whether the "Sentiment" keyword would affect the results. The refined and refined-plus prompts reached the same accuracy but different precision (0.52 vs. 0.35), which suggests the model predicts the target class at least as well with fewer examples. However, I got better accuracy (60% vs. 50%) with the few-shot prompt that does not contain the word "Sentiment". This leads me to wonder whether the improvement actually comes from the special characters that delimit the prompt and the response rather than from the words themselves, such as "text:" or "sentiment:".
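
When comparing prompt versions, it helps to estimate how much an accuracy measured on 20 rows can move by chance. The sketch below computes the binomial standard error of an accuracy estimate, under the assumption that the predictions are independent; `accuracy_standard_error` is a hypothetical helper. Since `chat` samples with `temperature=0.7`, repeated runs can also differ from one another.

```python
import math


def accuracy_standard_error(accuracy, n):
    """Binomial standard error of an accuracy estimated from n independent predictions."""
    return math.sqrt(accuracy * (1 - accuracy) / n)


se_refined = accuracy_standard_error(0.50, 20)
se_wo_sentiment = accuracy_standard_error(0.60, 20)
print(f"few_shot_base_refined: 0.50 +/- {se_refined:.3f}")
print(f"few_shot_base_refined_wo_sentiment: 0.60 +/- {se_wo_sentiment:.3f}")
```

Both standard errors come out at roughly 0.11, about the size of the 10-point gap between the two prompts, so a larger test set would be needed before treating the difference as real.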

## END