# Great Learning: Sentiment Analysis Final Project: A Case Study of ExpressWay Logistics

# **Step 1. Setup**

Installation

In [5]:
!pip install openai==1.2 tiktoken datasets session-info --quiet


Imports

In [6]:
from openai import AzureOpenAI
import json
import random
import requests
import tiktoken
import session_info

import pandas as pd
import numpy as np

from collections import Counter
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from tabulate import tabulate

Authentication

A) Loading API credentials from environment variables

In [8]:
import os

# Set environment variables
os.environ['AZURE_OPENAI_KEY'] = 'your-api-key'
os.environ['AZURE_OPENAI_ENDPOINT'] = 'your-endpoint'
os.environ['AZURE_OPENAI_APIVERSION'] = '2024-02-15-preview'
os.environ['CHATGPT_MODEL'] = 'gpt-4o'

In [9]:
import os

# Load API credentials from environment variables
api_key = os.getenv('AZURE_OPENAI_KEY')
endpoint = os.getenv('AZURE_OPENAI_ENDPOINT')
api_version = os.getenv('AZURE_OPENAI_APIVERSION')
model_name = os.getenv('CHATGPT_MODEL')
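
Because `os.getenv` returns `None` for an unset variable, a missing credential would only surface later as an API error. The sketch below is one possible way to fail fast; `REQUIRED_VARS` and `find_missing_credentials` are hypothetical names introduced for illustration, and a plain dict stands in for `os.environ` in the demonstration.

```python
import os

# Names of the environment variables the notebook reads above.
REQUIRED_VARS = [
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_APIVERSION",
    "CHATGPT_MODEL",
]

def find_missing_credentials(env=None):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Demonstration with a dict standing in for os.environ (assumed values).
example_env = {"AZURE_OPENAI_KEY": "your-api-key", "CHATGPT_MODEL": "gpt-4o"}
missing = find_missing_credentials(example_env)
print(missing)
```

In the notebook itself, calling `find_missing_credentials()` with no argument would check the real environment, and a non-empty result could be turned into an early error.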

Utilities

In [16]:
def num_tokens_from_messages(messages):

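    """
    Return the number of tokens used by a list of messages.
    Adapted from the OpenAI Cookbook token counter.
    """

    # Note: this uses the gpt-3.5-turbo tokenizer, while the deployment configured
    # above is gpt-4o, which uses a different encoding; treat the counts as estimates.
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")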

    # Each message is sandwiched with <|start|>role and <|end|>
    # Hence, messages look like: <|start|>system or user or assistant{message}<|end|>

    tokens_per_message = 3 # token1:<|start|>, token2:system(or user or assistant), token3:<|end|>

    num_tokens = 0

    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))

    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>

    return num_tokens
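
To make the overhead arithmetic in the counter concrete, here is a small sketch of the same scheme with a pluggable encoder. The whitespace encoder is an assumption made so the example runs without `tiktoken`; real counts would come from `encoding.encode`. `count_tokens` and `whitespace_encode` are hypothetical names.

```python
def count_tokens(messages, encode, tokens_per_message=3, reply_priming=3):
    """Mirror the counting scheme above with a pluggable encoder."""
    total = 0
    for message in messages:
        total += tokens_per_message          # fixed per-message overhead
        for value in message.values():
            total += len(encode(value))      # role and content are both encoded
    return total + reply_priming             # overhead for the primed reply

def whitespace_encode(text):
    """Stand-in encoder (assumption): one token per whitespace-separated word."""
    return text.split()

messages = [
    {"role": "system", "content": "Classify the review sentiment"},
    {"role": "user", "content": "Parcel arrived late"},
]

# Message 1: 3 + 1 (role) + 4 (content) = 8
# Message 2: 3 + 1 (role) + 3 (content) = 7
# Reply priming: 3, so the total is 18
print(count_tokens(messages, whitespace_encode))
```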

# **Step 2. Assemble Data**

A.) Upload and read csv file
B.) Count positive and negative sentiment reviews
C.) Split the dataset

A.) Upload and read csv file

In [None]:
cs_reviews_df = pd.read_csv('/content/courier-service_reviews.csv')

In [None]:
cs_reviews_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131 entries, 0 to 130
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         131 non-null    int64 
 1   review     131 non-null    object
 2   sentiment  131 non-null    object
dtypes: int64(1), object(2)
memory usage: 3.2+ KB


In [None]:
cs_reviews_df.sample(5)

|     | id  | review | sentiment |
|-----|-----|--------|-----------|
| 103 | 104 | I had a delicate piece of artwork to ship, and... | Positive |
| 75  | 76  | What I love most about ExpressWay Logistics is... | Positive |
| 38  | 39  | I had a frustrating experience with ExpressWay... | Negative |
| 111 | 112 | I had a disappointing experience with ExpressW... | Negative |
| 89  | 90  | ExpressWay Logistics consistently delivers par... | Positive |


B.) Count Positive and Negative Sentiment Reviews

In [None]:
sentiment_counts = cs_reviews_df['sentiment'].value_counts()
print(sentiment_counts)

sentiment
Positive    68
Negative    63
Name: count, dtype: int64
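
The counts above imply a nearly balanced dataset. As a quick worked check, the sketch below computes the class shares and the score a trivial "always predict the majority class" baseline would reach, which is a useful floor when reading the F1 scores later. The dictionary simply restates the printed counts.

```python
# Counts copied from the value_counts() output above.
counts = {"Positive": 68, "Negative": 63}

total = sum(counts.values())
shares = {label: round(n / total, 3) for label, n in counts.items()}

# Accuracy of always predicting the most frequent label.
majority_baseline = max(counts.values()) / total

print(total)
print(shares)
print(round(majority_baseline, 3))
```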


In [None]:
cs_reviews_df.shape

(131, 3)

C.) Split the dataset

In [None]:
cs_examples_df, cs_gold_examples_df = train_test_split(
    cs_reviews_df,
    test_size=0.2,
    random_state=42
)

In [None]:
(cs_examples_df.shape, cs_gold_examples_df.shape)

((104, 3), (27, 3))
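
The split sizes can be checked by hand. As far as I understand scikit-learn's behavior, a fractional `test_size` is rounded up to a whole number of rows and the training set receives the remainder; the sketch below reproduces the shapes shown above under that assumption.

```python
import math

n_samples = 131   # rows in cs_reviews_df
test_size = 0.2   # fraction passed to train_test_split

# 0.2 * 131 = 26.2, rounded up to 27 held-out rows
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 104 27
```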

In [None]:
columns_to_select = ['review','sentiment']

In [None]:
gold_examples = (
        cs_gold_examples_df.loc[:, columns_to_select]
                                     .sample(21, random_state=42) #<- ensures that gold examples are the same for every session
                                     .to_json(orient='records')
)

In [None]:
gold_examples

'[{"review":"The delivery executive assigned by ExpressWay Logistics was courteous and professional during the delivery process. They tried their best to handle the package with care.Unfortunately, the package arrived with slight damage despite the delivery executive\'s efforts. The packaging seemed more than adequate to protect the contents during transit.","sentiment":"Positive"},{"review":"ExpressWay Logistics failed to meet my expectations. The delivery was delayed, and the customer support team was unresponsive and unhelpful when I tried to inquire about the status of my parcel.","sentiment":"Negative"},{"review":"ExpressWay Logistics\' incompetence resulted in a major inconvenience when my package was delivered to the wrong recipient. Despite providing accurate delivery information, the package ended up in the hands of someone else, and efforts to retrieve it were unsuccessful. When I contacted customer service for assistance, I was met with apathy and a lack of urgency. Their fa

In [None]:
json.loads(gold_examples)[0]     #Json format

{'review': "The delivery executive assigned by ExpressWay Logistics was courteous and professional during the delivery process. They tried their best to handle the package with care.Unfortunately, the package arrived with slight damage despite the delivery executive's efforts. The packaging seemed more than adequate to protect the contents during transit.",
 'sentiment': 'Positive'}
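
Since the gold examples are stored as a JSON string of records, it can be handy to confirm their label balance before evaluating. This is a minimal sketch using only the standard library; the three records are made up for illustration, and `label_counts` is a hypothetical helper.

```python
import json
from collections import Counter

# Made-up records in the same shape as gold_examples (illustrative only).
sample_json = json.dumps([
    {"review": "Parcel arrived early and intact.", "sentiment": "Positive"},
    {"review": "Package was lost and support never replied.", "sentiment": "Negative"},
    {"review": "Driver was polite and on time.", "sentiment": "Positive"},
])

def label_counts(examples_json):
    """Count sentiment labels in a JSON string of review records."""
    records = json.loads(examples_json)
    return Counter(record["sentiment"] for record in records)

print(label_counts(sample_json))
```

Calling `label_counts(gold_examples)` in the notebook would report the balance of the actual gold set.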

## **Step 3: Derive Prompt (12 Marks)**

(A) Write Zero Shot System Message (3 Marks)

(B) Create Zero Shot Prompt (2 Marks)

(C) Write Few Shot System Message (3 Marks)

(D) Create Examples For Few shot prompts (2 Marks)

(E) Create Few Shot Prompt (2 Marks)

In [None]:
user_message_template = """```{courier_service_review}```"""

**(A) Write Zero Shot System Message (3 Marks)**

In [None]:
zero_shot_system_message = """
You are a sentiment analysis assistant. Your objective is to read courier service reviews enclosed within triple backticks and classify the sentiment as either Positive or Negative.
Ensure you only respond with 'Positive' or 'Negative' without providing any additional information.
Do not infer any sentiment if it's unclear or ambiguous; simply choose based on the content of the review.

The format for the reviews will be as follows:
```{courier_service_review}```

Your response should be in the following format:
- Output: 'Positive' or 'Negative'
"""

**(B) Create Zero Shot Prompt (2 Marks)**

In [None]:
# Load your API credentials from config
with open('config.json', 'r') as az_creds:
    creds = json.load(az_creds)

# Azure OpenAI credentials
api_key = creds["AZURE_OPENAI_KEY"]
endpoint = creds["AZURE_OPENAI_ENDPOINT"]
api_version = creds["AZURE_OPENAI_APIVERSION"]
model_name = creds["CHATGPT_MODEL"]

# Define the headers for the API request
headers = {
    "Content-Type": "application/json",
    "api-key": api_key
}

# Function to calculate the number of tokens from messages
def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3
    num_tokens = 0

    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))

    num_tokens += 3  # Every reply is primed with <|start|>assistant<|message|>
    return num_tokens

# Zero-shot system message for task definition
zero_shot_system_message = """
You are a sentiment analysis assistant. Your objective is to read courier service reviews enclosed within triple backticks and classify the sentiment as either Positive or Negative.
Ensure you only respond with 'Positive' or 'Negative' without providing any additional information.
The format for the reviews will be as follows:
```{courier_service_review}```

Your response should be in the following format:
- Output: 'Positive' or 'Negative'
"""

# Example user review text
review_text = "ExpressWay Logistics' pricing may seem attractive at first glance, but beware of some internal hidden fees that may sometimes add up. But still I appreciate that my parcel got delivered safely a day after the promised window."

# User message with the review text
user_message_template = """```{courier_service_review}```"""

# Full message setup for the API call (System and User messages)
messages = [
    {"role": "system", "content": zero_shot_system_message},  # System message
    {"role": "user", "content": user_message_template.format(courier_service_review=review_text)}  # User message with review text
]

# Calculate the number of tokens used by the zero-shot prompt
num_tokens = num_tokens_from_messages(messages, model="gpt-3.5-turbo")
print(f"Number of tokens used: {num_tokens}")

# Define the API request URL (Azure OpenAI Endpoint)
url = f"{endpoint}/openai/deployments/{model_name}/chat/completions?api-version={api_version}"

# Define the payload (messages and max tokens)
payload = {
    "messages": messages,
    "max_tokens": 5  # Adjust the max tokens if needed
}

# Make the POST request to the Azure OpenAI API
response = requests.post(url, headers=headers, json=payload)

# Check if the request was successful
if response.status_code == 200:
    # Extract and print the result
    result = response.json()
    print("Model Prediction:", result['choices'][0]['message']['content'].strip())
else:
    print(f"Error: {response.status_code}, {response.text}")


Number of tokens used: 143
Model Prediction: Negative


**(C) Write Few Shot System Message (3 Marks)**

In [None]:
# Define the create_examples function
def create_examples(dataset, n=4):
    """
    Return a JSON list of randomized examples of size 2n with two classes.
    Create subsets of each class, choose random samples from the subsets,
    merge and randomize the order of samples in the merged list.
    Each run of this function creates a different random sample of examples
    chosen from the training data.
    """
    positive_reviews = (dataset.sentiment == 'Positive')
    negative_reviews = (dataset.sentiment == 'Negative')
    columns_to_select = ['review', 'sentiment']

    positive_examples = dataset.loc[positive_reviews, columns_to_select].sample(n)
    negative_examples = dataset.loc[negative_reviews, columns_to_select].sample(n)

    examples = pd.concat([positive_examples, negative_examples])
    randomized_examples = examples.sample(2*n, replace=False)

    return randomized_examples.to_json(orient='records')

# Generate a few-shot example using 2 positive and 2 negative samples
few_shot_examples_json = create_examples(cs_reviews_df, n=2)

# Few-shot system message with examples from the function
few_shot_system_message = f"""
You are a sentiment analysis assistant. Your task is to classify reviews as either Positive or Negative based on the review content. Below are some examples to guide you:

Examples:
{few_shot_examples_json}

Based on the examples above, classify the following review as either Positive or Negative:
"""

# Example review to classify
review_text = "ExpressWay Logistics' commitment to transparency gives us confidence in their services. They provide clear and upfront pricing, so we know exactly what to expect. With ExpressWay Logistics, there are no hidden fees or surprises, just reliable service at a fair price."

# User message with the review text
user_message_template = """```{courier_service_review}```"""

# Full message setup for the API call (System and User messages)
messages = [
    {"role": "system", "content": few_shot_system_message},  # System message with few-shot examples
    {"role": "user", "content": user_message_template.format(courier_service_review=review_text)}  # User message with the new review
]

# Calculate the number of tokens used by the few-shot prompt
num_tokens = num_tokens_from_messages(messages, model="gpt-3.5-turbo")
print(f"Number of tokens used: {num_tokens}")

# Define the API request URL (Azure OpenAI Endpoint)
url = f"{endpoint}/openai/deployments/{model_name}/chat/completions?api-version={api_version}"

# Define the payload (messages and max tokens)
payload = {
    "messages": messages,
    "max_tokens": 5  # Adjust the max tokens if needed
}

# Make the POST request to the Azure OpenAI API
response = requests.post(url, headers=headers, json=payload)

# Check if the request was successful
if response.status_code == 200:
    # Extract and print the result
    result = response.json()
    print("Model Prediction:", result['choices'][0]['message']['content'].strip())
else:
    print(f"Error: {response.status_code}, {response.text}")

Number of tokens used: 361
Model Prediction: Positive
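
`create_examples` draws a fresh random sample on every call, so token counts and predictions can differ between runs. The sketch below shows the same balanced-sampling idea with an optional seed for reproducibility, written against plain lists of dicts so it runs without pandas. `create_balanced_examples` and the demo records are hypothetical; with pandas, passing `random_state` to `.sample` would play the same role.

```python
import json
import random

def create_balanced_examples(records, n=2, seed=None):
    """Pick n records per class and shuffle them (hypothetical stdlib variant)."""
    rng = random.Random(seed)
    positives = [r for r in records if r["sentiment"] == "Positive"]
    negatives = [r for r in records if r["sentiment"] == "Negative"]
    chosen = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(chosen)  # mix the classes so the order carries no signal
    return json.dumps(chosen)

# Made-up records for illustration only.
demo_records = [
    {"review": f"Good delivery {i}", "sentiment": "Positive"} for i in range(5)
] + [
    {"review": f"Bad delivery {i}", "sentiment": "Negative"} for i in range(5)
]

first = create_balanced_examples(demo_records, n=2, seed=42)
second = create_balanced_examples(demo_records, n=2, seed=42)
print(first == second)  # same seed, same sample
```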


**(D) Create Examples For Few shot prompts (2 Marks)**

In [None]:
# Generate few-shot examples from your dataset (e.g., 2 positive and 2 negative examples)
examples_json = create_examples(cs_reviews_df, n=2)

# Load the examples into a Python object using json.loads
examples = json.loads(examples_json)

# Now, format the examples properly into the system message for the few-shot prompt
formatted_examples = "\n".join([f"- Review: \"{example['review']}\"\n  Sentiment: {example['sentiment']}" for example in examples])

# Here's how the examples will look in the few-shot prompt
few_shot_system_message = f"""
You are a sentiment analysis assistant. Your task is to classify reviews as either Positive or Negative based on the review content. Below are some examples to guide you:

Examples:
{formatted_examples}

Based on the examples above, classify the following review as either Positive or Negative:
"""

# Example review to classify
review_text = "ExpressWay Logistics' pricing may seem attractive at first glance, but beware of some internal hidden fees that may sometimes add up. But still I appreciate that my parcel got delivered safely a day after the promised window."

# User message with the review text
user_message_template = """```{courier_service_review}```"""

# Full message setup for the API call (System and User messages)
messages = [
    {"role": "system", "content": few_shot_system_message},  # System message with few-shot examples
    {"role": "user", "content": user_message_template.format(courier_service_review=review_text)}  # User message with the new review
]

# Calculate the number of tokens used by the few-shot prompt
num_tokens = num_tokens_from_messages(messages, model="gpt-3.5-turbo")
print(f"Number of tokens used: {num_tokens}")

# Define the API request URL (Azure OpenAI Endpoint)
url = f"{endpoint}/openai/deployments/{model_name}/chat/completions?api-version={api_version}"

# Define the payload (messages and max tokens)
payload = {
    "messages": messages,
    "max_tokens": 5  # Adjust the max tokens if needed
}

# Make the POST request to the Azure OpenAI API
response = requests.post(url, headers=headers, json=payload)

# Check if the request was successful
if response.status_code == 200:
    # Extract and print the result
    result = response.json()
    print("Model Prediction:", result['choices'][0]['message']['content'].strip())
else:
    print(f"Error: {response.status_code}, {response.text}")

Number of tokens used: 408
Model Prediction: Sentiment: Negative
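
Note that the reply above is `Sentiment: Negative` rather than a bare label, because the examples were formatted with a `Sentiment:` prefix. An exact string comparison against `Negative` would count this as wrong. One possible remedy, sketched here with a hypothetical `normalize_prediction` helper, is to map free-form replies back to a label before scoring.

```python
import re

VALID_LABELS = ("Positive", "Negative")

def normalize_prediction(raw_reply):
    """Map a free-form reply to 'Positive', 'Negative', or None (hypothetical helper)."""
    found = {
        label
        for label in VALID_LABELS
        if re.search(rf"\b{label}\b", raw_reply, flags=re.IGNORECASE)
    }
    # Accept the reply only when exactly one label is mentioned.
    return found.pop() if len(found) == 1 else None

for reply in ["Sentiment: Negative", "- Output: 'Positive'", "Positive or Negative", ""]:
    print(repr(reply), "->", normalize_prediction(reply))
```

Returning `None` for ambiguous or empty replies keeps such cases visible instead of silently assigning them a class.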


In [None]:
def create_prompt(system_message, examples, user_message_template):

    """
    Return a prompt message in the format expected by the OpenAI API.
    Loop through the examples and parse them as user message and assistant
    message.

    Args:
        system_message (str): system message with instructions for sentiment analysis
        examples (str): JSON string with list of examples
        user_message_template (str): string with a placeholder for courier service reviews

    Output:
        few_shot_prompt (List): A list of dictionaries in the OpenAI prompt format
    """

    few_shot_prompt = [{'role':'system', 'content': system_message}]

    for example in json.loads(examples):
        example_review = example['review']
        example_sentiment = example['sentiment']

        few_shot_prompt.append(
            {
                'role': 'user',
                'content': user_message_template.format(
                    courier_service_review=example_review
                )
            }
        )

        few_shot_prompt.append(
            {'role': 'assistant', 'content': f"{example_sentiment}"}
        )

    return few_shot_prompt
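
To see the message layout that `create_prompt` produces, the following sketch runs a compact copy of the function on two made-up examples and prints the resulting roles. The copy is repeated only so the snippet runs on its own; the example reviews and the system message text are illustrative assumptions.

```python
import json

def create_prompt(system_message, examples, user_message_template):
    """Compact copy of the function above, repeated so this sketch runs alone."""
    prompt = [{"role": "system", "content": system_message}]
    for example in json.loads(examples):
        prompt.append({
            "role": "user",
            "content": user_message_template.format(
                courier_service_review=example["review"]
            ),
        })
        prompt.append({"role": "assistant", "content": example["sentiment"]})
    return prompt

# Build the triple-backtick template without writing the delimiter literally.
fence = "`" * 3
template = fence + "{courier_service_review}" + fence

# Made-up examples for illustration only.
examples = json.dumps([
    {"review": "Fast and careful delivery.", "sentiment": "Positive"},
    {"review": "Parcel arrived broken.", "sentiment": "Negative"},
])

prompt = create_prompt("Classify each review as Positive or Negative.", examples, template)
print([message["role"] for message in prompt])
# ['system', 'user', 'assistant', 'user', 'assistant']
```

Each example becomes a user/assistant pair, so the review to classify is simply appended as one more user message.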

**(E) Create Few Shot Prompt (2 Marks)**

In [None]:
# System message that instructs the model
system_message = """
You are a sentiment analysis assistant. Your task is to classify reviews as either Positive or Negative based on the review content. Follow the format of the examples below:
"""

# Generate a few-shot example using 2 positive and 2 negative samples
few_shot_examples_json = create_examples(cs_reviews_df, n=2)

# Example review to classify
review_text = "ExpressWay Logistics' pricing may seem attractive at first glance, but beware of some internal hidden fees that may sometimes add up. But still I appreciate that my parcel got delivered safely a day after the promised window."

# User message template with the review text
user_message_template = """```{courier_service_review}```"""

# Create few-shot prompt using the create_prompt function
few_shot_prompt = create_prompt(system_message, few_shot_examples_json, user_message_template)

# Add the new review you want the model to classify
few_shot_prompt.append(
    {"role": "user", "content": user_message_template.format(courier_service_review=review_text)}
)

# Print the few-shot prompt to inspect it
print(json.dumps(few_shot_prompt, indent=4))

# Define the API request URL (Azure OpenAI Endpoint)
url = f"{endpoint}/openai/deployments/{model_name}/chat/completions?api-version={api_version}"

# Define the payload (messages and max tokens)
payload = {
    "messages": few_shot_prompt,  # Pass the few-shot prompt
    "max_tokens": 5  # Adjust the max tokens if needed
}

# Make the POST request to the Azure OpenAI API
response = requests.post(url, headers=headers, json=payload)

# Check if the request was successful
if response.status_code == 200:
    # Extract and print the result
    result = response.json()
    print("Model Prediction:", result['choices'][0]['message']['content'].strip())
else:
    print(f"Error: {response.status_code}, {response.text}")

[
    {
        "role": "system",
        "content": "\nYou are a sentiment analysis assistant. Your task is to classify reviews as either Positive or Negative based on the review content. Follow the format of the examples below:\n"
    },
    {
        "role": "user",
        "content": "```ExpressWay Logistics' delivery drivers have repeatedly left packages in unsafe and unsecured locations, putting them at risk of theft or damage. Despite specific instructions to leave packages in a designated area, they have ignored these requests and left them exposed to the elements or in plain view of passersby. The lack of professionalism and regard for customer property from ExpressWay Logistics' drivers is unacceptable, and I'm deeply disappointed by their disregard for basic security protocols.```"
    },
    {
        "role": "assistant",
        "content": "Negative"
    },
    {
        "role": "user",
        "content": "```I encountered numerous issues with ExpressWay Logistics, includi

In [None]:
few_shot_prompt

[{'role': 'system',
  'content': '\nYou are a sentiment analysis assistant. Your task is to classify reviews as either Positive or Negative based on the review content. Follow the format of the examples below:\n'},
 {'role': 'user',
  'content': "```ExpressWay Logistics' delivery drivers have repeatedly left packages in unsafe and unsecured locations, putting them at risk of theft or damage. Despite specific instructions to leave packages in a designated area, they have ignored these requests and left them exposed to the elements or in plain view of passersby. The lack of professionalism and regard for customer property from ExpressWay Logistics' drivers is unacceptable, and I'm deeply disappointed by their disregard for basic security protocols.```"},
 {'role': 'assistant', 'content': 'Negative'},
 {'role': 'user',
  'content': '```I encountered numerous issues with ExpressWay Logistics, including late deliveries, damaged packaging, and unhelpful customer support. I will not be using 

In [None]:
num_tokens_from_messages(few_shot_prompt)

382

## **Step 4: Evaluate prompts (8 Marks)**

(A) Evaluate Zero Shot Prompt (2 Marks)

(B) Evaluate Few Shot Prompt (2 Marks)

(C) Calculate Mean and Standard Deviation for Zero Shot Prompt and Few Shot Prompt (4 Marks)

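In [None]:
# evaluate_prompt relies on an Azure OpenAI client and a deployment name,
# neither of which is defined earlier in the notebook; create them here.
client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version=api_version
)
chat_model_id = model_name  # Azure deployment name

def evaluate_prompt(prompt, gold_examples, user_message_template):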

    """
    Return the micro-F1 score for predictions on gold examples.
    For each example, we make a prediction using the prompt. Gold labels and
    model predictions are aggregated into lists and compared to compute the
    F1 score.

    Args:
        prompt (List): list of messages in the OpenAI prompt format
        gold_examples (str): JSON string with list of gold examples
        user_message_template (str): string with a placeholder for courier service review

    Output:
        micro_f1_score (float): Micro-F1 score computed by comparing model predictions
                                with ground truth
    """

    model_predictions, ground_truths, review_texts = [], [], []

    for example in json.loads(gold_examples):
        gold_input = example['review']
        user_input = [
            {
                'role':'user',
                'content': user_message_template.format(courier_service_review=gold_input)
            }
        ]

        try:
            response = client.chat.completions.create(
                model=chat_model_id,
                messages=prompt+user_input,
                temperature=0, # <- Zero temperature keeps responses as deterministic as possible
                max_tokens=2 # <- Restricts the output to at most 2 tokens
            )

            prediction = response.choices[0].message.content
            model_predictions.append(prediction.strip()) # <- removes extraneous white spaces
            ground_truths.append(example['sentiment'])
            review_texts.append(gold_input)

        except Exception as e:
            # Report the failure instead of silently dropping the example from the score
            print(f"Skipping example after API error: {e}")
            continue

    micro_f1_score = f1_score(ground_truths, model_predictions, average="micro")

    table_data = [[text, pred, truth] for text, pred, truth in zip(review_texts, model_predictions, ground_truths)]
    headers = ["Review", "Model Prediction", "Ground Truth"]
    print(tabulate(table_data, headers=headers, tablefmt="grid"))

    return micro_f1_score
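
For single-label classification like this task, micro-averaged F1 pools true positives, false positives, and false negatives across all labels, which makes it equal to plain accuracy. The sketch below works through that by hand on four made-up predictions; `micro_f1` is a hypothetical standalone function, not the scikit-learn implementation.

```python
def micro_f1(ground_truths, predictions):
    """Micro-averaged F1 computed from pooled per-label counts."""
    labels = set(ground_truths) | set(predictions)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(ground_truths, predictions):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif pred != label and truth == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels for illustration: three of four predictions are correct.
truths = ["Positive", "Negative", "Positive", "Negative"]
preds = ["Positive", "Negative", "Negative", "Negative"]

accuracy = sum(t == p for t, p in zip(truths, preds)) / len(truths)
print(micro_f1(truths, preds), accuracy)
```

Every wrong prediction is counted once as a false positive (for the predicted label) and once as a false negative (for the true label), which is why pooled precision and recall coincide.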

**(A) Evaluate zero shot prompt (2 Marks)**

In [None]:
# Zero-shot system message for task definition
zero_shot_system_message = """
You are a sentiment analysis assistant. Your objective is to read courier service reviews enclosed within triple backticks and classify the sentiment as either Positive or Negative.
Ensure you only respond with 'Positive' or 'Negative' without providing any additional information.
The format for the reviews will be as follows:
```{courier_service_review}```

Your response should be in the following format:
- Output: 'Positive' or 'Negative'
"""

# User message template
user_message_template = """```{courier_service_review}```"""

# Create zero-shot prompt. It holds only the system message, because
# evaluate_prompt appends each gold review as the user message.
zero_shot_prompt = [
    {"role": "system", "content": zero_shot_system_message}
]

In [None]:
# Gold examples JSON string (all held-out reviews)
gold_examples = cs_gold_examples_df.loc[:, columns_to_select].to_json(orient='records')

# Evaluate the zero-shot prompt using the evaluation function
micro_f1_zero_shot = evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)
print(f"Zero-Shot Micro F1 Score: {micro_f1_zero_shot}")

[Output truncated in export: table with columns Review | Model Prediction | Ground Truth]

**(B) Evaluate few shot prompt (2 Marks)**

In [None]:
# System message that instructs the model
system_message = """
You are a sentiment analysis assistant. Your task is to classify reviews as either Positive or Negative based on the review content. Follow the format of the examples below:
"""

# Draw few-shot examples (2 positive, 2 negative) from the training split only,
# so that no gold review leaks into the prompt
few_shot_examples_json = create_examples(cs_examples_df, n=2)

# User message template for classification
user_message_template = """```{courier_service_review}```"""

# Create the few-shot prompt using the create_prompt function.
# No review is appended here: evaluate_prompt adds each gold review as the final user message.
few_shot_prompt = create_prompt(system_message, few_shot_examples_json, user_message_template)

# Print the few-shot prompt to check its structure (optional)
#print(json.dumps(few_shot_prompt, indent=4))

In [None]:
# Gold examples JSON string (for evaluation purposes, use your actual gold examples)
gold_examples = cs_gold_examples_df.loc[:, columns_to_select].to_json(orient='records')

# Evaluate the few-shot prompt using the evaluation function
micro_f1_few_shot = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)
print(f"Few-Shot Micro F1 Score: {micro_f1_few_shot}")

[Output truncated in export: table with columns Review | Model Prediction | Ground Truth]

In [None]:
num_eval_runs = 5

In [None]:
zero_shot_performance = []
few_shot_performance = []

In [None]:
for _ in tqdm(range(num_eval_runs)):

    # For each run create a new sample of examples from the training split
    examples = create_examples(cs_examples_df)

    # The zero-shot prompt contains only the system message (no examples)
    zero_shot_prompt = [{'role':'system', 'content': zero_shot_system_message}]

    # Assemble the few-shot prompt with these examples.
    # system_message is used rather than few_shot_system_message, which already
    # embeds an older set of examples drawn from the full dataset.
    few_shot_prompt = create_prompt(system_message, examples, user_message_template)

    # Evaluate zero-shot prompt micro-F1 on gold examples
    zero_shot_micro_f1 = evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

    # Evaluate few-shot prompt micro-F1 on gold examples
    few_shot_micro_f1 = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

    zero_shot_performance.append(zero_shot_micro_f1)
    few_shot_performance.append(few_shot_micro_f1)

  0%|          | 0/5 [00:00<?, ?it/s]

[Output truncated in export: table with columns Review | Model Prediction | Ground Truth]

 20%|██        | 1/5 [04:26<17:46, 266.73s/it]

[Output truncated in export: table with columns Review | Model Prediction | Ground Truth]

 40%|████      | 2/5 [09:06<13:43, 274.50s/it]

[Output truncated in export: table with columns Review | Model Prediction | Ground Truth]

 60%|██████    | 3/5 [12:42<08:15, 247.81s/it]

[Output truncated in export: table with columns Review | Model Prediction | Ground Truth]

 80%|████████  | 4/5 [16:52<04:08, 248.66s/it]

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                                                                                                                                                                                                                                                                                                                                      

100%|██████████| 5/5 [20:56<00:00, 251.27s/it]

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                                                                                                                                                                                                                                                                                                                                      




**(C) Calculate Mean and Standard Deviation for Zero Shot Prompt and Few Shot Prompt (4 Marks)**

Compute the average (mean) and measure the variability (standard deviation) of the evaluation scores for both zero shot and few shot prompts.

In [None]:
# Mean and standard deviation of the evaluation scores for the zero-shot prompt
zero_shot_avg = np.mean(zero_shot_performance)
zero_shot_std = np.std(zero_shot_performance)

# Mean and standard deviation of the evaluation scores for the few-shot prompt
few_shot_avg = np.mean(few_shot_performance)
few_shot_std = np.std(few_shot_performance)

print(f"Zero-Shot Average Performance: {zero_shot_avg:.4f}, Standard Deviation: {zero_shot_std:.4f}")
print(f"Few-Shot Average Performance: {few_shot_avg:.4f}, Standard Deviation: {few_shot_std:.4f}")

Zero-Shot Average Performance: 0.9259, Standard Deviation: 0.0000
Few-Shot Average Performance: 0.9179, Standard Deviation: 0.0146
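
A note on the standard deviation: `np.std` defaults to the population formula (`ddof=0`), which divides by n. With only a handful of runs, the sample formula (`ddof=1`, dividing by n - 1) gives a somewhat larger estimate. The sketch below illustrates the difference on hypothetical scores; the values in `scores` are made up for illustration and are not the results of this notebook.

```python
import numpy as np

# Hypothetical F1 scores from five evaluation runs (illustrative values only,
# not the scores produced by this notebook).
scores = [0.90, 0.92, 0.94, 0.91, 0.93]

mean_score = np.mean(scores)
population_std = np.std(scores)       # ddof=0 (NumPy default): divide by n
sample_std = np.std(scores, ddof=1)   # ddof=1: divide by n - 1

print(f"Mean: {mean_score:.4f}")
print(f"Population std (ddof=0): {population_std:.4f}")
print(f"Sample std (ddof=1): {sample_std:.4f}")
```

Either convention is defensible as long as it is reported alongside the numbers; the choice matters most when the number of runs is small.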


In [None]:
sentiment_distribution = sentiment_counts / sentiment_counts.sum() * 100
print(f"Percentage of Positive Reviews: {sentiment_distribution['Positive']:.2f}%")
print(f"Percentage of Negative Reviews: {sentiment_distribution['Negative']:.2f}%")

Percentage of Positive Reviews: 51.91%
Percentage of Negative Reviews: 48.09%
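
The same percentages can be obtained in one step with `value_counts(normalize=True)`. The sketch below assumes the predicted labels are held in a pandas Series; the `predictions` Series here is a hypothetical stand-in, not the data used in this notebook.

```python
import pandas as pd

# Hypothetical predicted labels (illustrative only; the notebook's own
# predictions are not reproduced here).
predictions = pd.Series(["Positive", "Negative", "Positive", "Positive", "Negative"])

# normalize=True returns relative frequencies instead of raw counts
sentiment_distribution = predictions.value_counts(normalize=True) * 100

for label in ["Positive", "Negative"]:
    # .get falls back to 0.0 if a label never appears, avoiding a KeyError
    share = sentiment_distribution.get(label, 0.0)
    print(f"Percentage of {label} Reviews: {share:.2f}%")
```

Using `.get` with a default keeps the cell from failing if the model happens to predict only one class.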


# **Step 5: Observation, Insights, and Business Perspective**

**Observations**:

Zero Shot vs. Few Shot Performance
- The zero-shot prompt achieved a slightly higher mean F1 score (0.9259 vs. 0.9179) and showed no variability across the evaluation runs (standard deviation 0.0000).
- The few-shot prompt also performed well but showed a small degree of variability in its scores (standard deviation 0.0146). This suggests that few-shot performance depends on the quality of the examples included in the prompt.

Sentiment Analysis
- The model was able to interpret the sentiment of customer feedback and classify each review as either positive or negative.

**Insights**:

Distribution of Positive and Negative Reviews
- The predicted labels let us calculate the percentage of positive and negative customer reviews.
- 51.91% of customer reviews were positive and 48.09% were negative.

**Business Perspective**
- Sentiment analysis can help ExpressWay Logistics improve its services in several ways.
- The company can use this classification to identify patterns in customer feedback over time. It can gain a deeper understanding of customer satisfaction, identify areas for improvement, and implement strategies to address pain points. This can lead to better service delivery, increased customer loyalty, and improved brand reputation.
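
As one possible way to track feedback patterns over time, the sketch below computes the monthly share of negative reviews. It assumes each review carries a timestamp; the `date` and `sentiment` column names and all values are hypothetical and are not taken from the project dataset.

```python
import pandas as pd

# Hypothetical review log (column names and values are assumptions for
# illustration; the project dataset may not include dates at all).
reviews = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-14", "2024-02-27"]
    ),
    "sentiment": ["Positive", "Negative", "Negative", "Negative", "Positive"],
})

# Flag negative reviews and bucket each review by calendar month
reviews["is_negative"] = reviews["sentiment"].eq("Negative")
reviews["month"] = reviews["date"].dt.to_period("M")

# The mean of a boolean column is the fraction of True values
monthly_negative_share = reviews.groupby("month")["is_negative"].mean() * 100

print(monthly_negative_share.round(2))
```

A rising negative share in a given month would be a cue to read that month's reviews for a common cause.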