<a href="https://colab.research.google.com/github/Alina-Tur/sentiment_analysis/blob/main/Copy_of_Project2_Alina.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project: A Case Study of ExpressWay Logistics**

**Business Overview:**

ExpressWay Logistics is a dynamic logistics service provider committed to delivering efficient, reliable and cost-effective courier transportation and warehousing solutions. With a focus on speed, precision and customer satisfaction, we aim to be the go-to partner for customers seeking seamless courier services. Our core service is ensuring operational efficiency throughout our delivery and courier operations, including inventory management, durable packaging, swift dispatch, real-time tracking of shipments and on-time delivery as promised. We are committed to enhancing our logistics and courier services and improving connectivity for our customers.

**Current Challenge:**

ExpressWay Logistics faces numerous challenges in ensuring seamless deliveries and customer satisfaction. These challenges include managing various customer demands simultaneously, addressing delays in deliveries and ensuring products arrive intact and safe. Additionally, the company struggles with the complexity of efficiently storing and handling a large volume of packages while still meeting customer expectations. Moreover, maintaining a skilled workforce capable of handling various aspects of logistics operations presents its own set of challenges. Overcoming these obstacles requires a comprehensive approach that integrates innovative technology, strategic planning, and continuous improvement initiatives to ensure smooth operations and exceptional service delivery.

**Objective:**

The primary objective is to conduct a sentiment analysis of user-generated reviews across various digital channels and platforms. By paying attention to this feedback, the company wants to find ways to improve its services, such as handling different customer demands simultaneously, dealing with late deliveries, and keeping packages secure and intact. Prompt engineering is used to classify the sentiment that users express about the courier services as Positive or Negative. The results will show where improvement is needed to meet customer expectations and keep customers happy.

**Data Description:**

The dataset titled "courier-service_reviews.csv" is structured to facilitate sentiment analysis for courier service reviews. Here's a brief description of the data columns:

1. id: This column contains unique identifiers for each review entry. It helps in distinguishing and referencing individual reviews.
2. review: This column includes the actual text of the courier service reviews. The reviews are likely composed of customer opinions and experiences regarding different aspects of the services provided by ExpressWay Logistics.
3. sentiment: This column contains the sentiment classification (Positive or Negative) of each review.

## **Step 1: Setup**


### Installation

In [None]:
!pip install openai==0.28.0 tiktoken datasets session-info --quiet

### Imports

Import all Python packages required to access the Azure OpenAI API, load datasets and create examples.

In [None]:
# Import all Python packages required to access the Azure OpenAI API.
# Import additional packages required to access datasets and create examples.

import openai
import json
import random
import tiktoken
import session_info

import statistics

import pandas as pd
import numpy as np

from collections import Counter
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from tabulate import tabulate

### Authentication

In [None]:
# Define configuration information
# Keep real keys out of shared notebooks; the key below is left blank on purpose
config_data = {
    "AZURE_OPENAI_KEY": "",
    "AZURE_OPENAI_ENDPOINT": "https://openairesoursegroup.openai.azure.com/",
    "AZURE_OPENAI_APITYPE": "azure",
    "AZURE_OPENAI_APIVERSION": "2024-02-01",
    "CHATGPT_MODEL": "gpt4latest"
}

# Replace "" with your credentials

In [None]:
# Write the configuration information into the config.json file
with open('config.json', 'w') as config_file:
    json.dump(config_data, config_file, indent=4)

print("Config file created successfully!")

Config file created successfully!


Reading the config.json file

In [None]:
with open('config.json', 'r') as az_creds:
    data = az_creds.read()

In [None]:
creds = json.loads(data)

In [None]:
openai.api_key = creds["AZURE_OPENAI_KEY"]
openai.api_base = creds["AZURE_OPENAI_ENDPOINT"]
openai.api_type = creds["AZURE_OPENAI_APITYPE"]
openai.api_version = creds["AZURE_OPENAI_APIVERSION"]

In [None]:
chat_model_id = creds["CHATGPT_MODEL"]
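
Because `config.json` stores the key in plain text, one alternative is to read the same settings from environment variables. The sketch below is only an illustration: the helper name `load_creds_from_env` is hypothetical, and it assumes the environment variables carry the same names as the keys in `config_data`.

```python
import os

# Assumption: environment variables are named like the keys in config_data.
REQUIRED_VARS = (
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_APITYPE",
    "AZURE_OPENAI_APIVERSION",
    "CHATGPT_MODEL",
)


def load_creds_from_env(environ=None):
    """Return a dict of credentials read from environment variables.

    Hypothetical helper: raises KeyError listing every missing variable.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_VARS}
```

The returned dictionary has the same shape as `creds`, so the assignments to `openai.api_key`, `openai.api_base` and the other settings would not need to change.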

### Utilities

Define a token counter to track how many tokens the prompt uses, and therefore how much of the context window remains for the completion.

In [None]:
def num_tokens_from_messages(messages):

    """
    Return the number of tokens used by a list of messages.
    Adapted from the OpenAI Cookbook token counter
    """

    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

    # Each message is sandwiched with <|start|>role and <|end|>
    # Hence, messages look like: <|start|>system or user or assistant{message}<|end|>

    tokens_per_message = 3 # token1:<|start|>, token2:system(or user or assistant), token3:<|end|>

    num_tokens = 0

    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))

    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>

    return num_tokens
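
Once the prompt size is known, the space left for the completion is simple arithmetic. The sketch below is a hedged illustration: `remaining_completion_tokens` is a hypothetical helper, and the context window of 8192 tokens is an assumed value that depends on the deployed model.

```python
def remaining_completion_tokens(prompt_tokens, context_window, reserved_for_reply=2):
    """Return the tokens left after the prompt and the reserved reply.

    Hypothetical helper; context_window must be supplied by the caller
    because it depends on the deployed model.
    """
    remaining = context_window - prompt_tokens - reserved_for_reply
    if remaining < 0:
        raise ValueError("Prompt does not fit into the context window")
    return remaining


# Illustrative numbers only: a 1200-token prompt in an assumed 8192-token window.
print(remaining_completion_tokens(prompt_tokens=1200, context_window=8192))
```

In the notebook, `prompt_tokens` would come from `num_tokens_from_messages(...)`.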

# Task: Sentiment Analysis

## **Step 2: Assemble Data**



**Upload and read csv file**

In [None]:
cs_reviews_df = pd.read_csv('/content/courier-service_reviews.csv')
cs_reviews_df.head()


Unnamed: 0,id,review,sentiment
0,1,ExpressWay Logistics' commitment to transparen...,Positive
1,2,The tracking system implemented by ExpressWay ...,Positive
2,3,ExpressWay Logistics is a lifesaver when it co...,Positive
3,4,Expressway Logistics is the worst courier serv...,Negative
4,5,ExpressWay Logistics failed to meet my expecta...,Negative


**Create a new column named "label" (target column) that encodes the sentiment as 1 (Positive) or 0 (Negative)**

In [None]:
cs_reviews_df['label'] = cs_reviews_df['sentiment'].apply(lambda x: 1 if x == "Positive" else 0)
cs_reviews_df.head()

Unnamed: 0,id,review,sentiment,label
0,1,ExpressWay Logistics' commitment to transparen...,Positive,1
1,2,The tracking system implemented by ExpressWay ...,Positive,1
2,3,ExpressWay Logistics is a lifesaver when it co...,Positive,1
3,4,Expressway Logistics is the worst courier serv...,Negative,0
4,5,ExpressWay Logistics failed to meet my expecta...,Negative,0
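
Note that the lambda above maps every value other than "Positive" to 0, so a typo or a missing value would silently be treated as Negative. A stricter mapping could look like the sketch below; the names `LABELS` and `to_label` are hypothetical and not part of the original notebook.

```python
# Hypothetical strict mapping: only the two expected sentiment strings are accepted.
LABELS = {"Positive": 1, "Negative": 0}


def to_label(sentiment):
    """Map a sentiment string to 1/0 and fail loudly on anything else."""
    try:
        return LABELS[sentiment]
    except (KeyError, TypeError):
        raise ValueError(f"Unexpected sentiment value: {sentiment!r}") from None


print(to_label("Positive"), to_label("Negative"))
```

It could be applied in the same way as the lambda, for example with `cs_reviews_df['sentiment'].apply(to_label)`.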


Split the data into two segments (using `test_size=0.2`): one segment (80%) that serves as a pool to draw few-shot examples from, and another segment (20%) that serves as a pool of gold examples.

In [None]:
cs_examples_df, cs_gold_examples_df = train_test_split(
    cs_reviews_df, #<- the full dataset
    test_size=0.2, #<- 20% random sample selected for gold examples
    random_state=42 #<- ensures that the splits are the same for every session
)

Select the columns needed for further analysis (`review` and `sentiment`), excluding the numeric target column `label`.

In [None]:
columns_to_select = ['review','sentiment']

Create gold examples by selecting a random sample of rows from the gold examples DataFrame (`cs_gold_examples_df`).

In [None]:
gold_examples = (
    cs_gold_examples_df.loc[:, columns_to_select]
                       .sample(21, random_state=42)  # <- ensures that gold examples are the same for every session
                       .to_json(orient='records')    # <- serialize as a JSON list of records
)

To select gold examples for this session, sample randomly from the test data using `random_state=42`. This ensures that the examples are the same across runs of the notebook (i.e., they are randomly selected but do not change between runs). Note that only 21 gold examples are used to keep execution times low for illustration. In practice, a larger number of gold examples gives more robust estimates of model accuracy.
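
To get a feel for how the number of gold examples affects the reliability of the estimate, the sketch below computes the standard error of an accuracy estimate under a simple binomial model. The accuracy of 0.85 and the sample sizes other than 21 are illustrative assumptions, not results from this notebook.

```python
import math


def accuracy_standard_error(accuracy, n_examples):
    """Standard error of an accuracy estimate under a binomial model."""
    return math.sqrt(accuracy * (1 - accuracy) / n_examples)


# Assumed accuracy of 0.85; only n=21 matches the sample size used in this notebook.
for n in (21, 200, 2000):
    half_width = 1.96 * accuracy_standard_error(0.85, n)
    print(f"n={n:>4}: 0.85 +/- {half_width:.3f} (approximate 95% interval)")
```

The interval shrinks with the square root of the sample size, which is why a small gold set leaves a wide margin around any measured score.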

## **Step 3: Derive Prompt**


#### Create prompts

In [None]:
user_message_template = """```{courier_service_review}```"""

**Zero Shot Prompt**

In [None]:
zero_shot_system_message = """Please analyze the reviews. Classify them as positive or negative based on the sentiment score.
Follow the instruction:
1. Each individual Review will be enclosed within triple backticks.
2. Use the following criteria for classification:

Positive Reviews: Reviews with a sentiment score greater than or equal to 0.
Negative Reviews: Reviews with a sentiment score less than 0.


3. The output should only be "Positive" or  "Negative".  Do not explain your results.
"""


In [None]:
zero_shot_prompt = [{'role':'system', 'content': zero_shot_system_message}]

**Few Shot Prompt**

In [None]:
few_shot_system_message = """Please analyze the reviews. Classify them as positive or negative based on the sentiment score.
Follow the instruction:
1. Each individual Review will be enclosed within triple backticks.
2. Use the following criteria for classification:

Positive Reviews: Reviews with a sentiment score greater than or equal to 0.
Negative Reviews: Reviews with a sentiment score less than 0.


3. The output should only be "Positive" or  "Negative".  Do not explain your results.
"""


Merely selecting random samples from the polarity subsets is not enough, because the examples included in a prompt are prone to a set of known biases such as:
 - Majority label bias (the model favors the label that appears most often among the examples)
 - Recency bias (the model favors the labels of examples near the end of the prompt)


To reduce these biases, it is important to use a balanced set of examples arranged in random order.

### Define "create_examples" function

In [None]:
def create_examples(dataset, n=4):

    """
    Return a JSON list of randomized examples of size 2n with two classes.
    Create subsets of each class, choose random samples from the subsets,
    merge and randomize the order of samples in the merged list.
    Each run of this function creates a different random sample of examples
    chosen from the training data.

    Args:
        dataset (DataFrame): A DataFrame with examples (review + sentiment)
        n (int): number of examples of each class to be selected

    Output:
        randomized_examples (JSON): A JSON with examples in random order
    """

    positive_reviews = (dataset.sentiment == 'Positive')
    negative_reviews = (dataset.sentiment == 'Negative')
    columns_to_select = ['review', 'sentiment']

    positive_examples = dataset.loc[positive_reviews, columns_to_select].sample(n)
    negative_examples = dataset.loc[negative_reviews, columns_to_select].sample(n)

    examples = pd.concat([positive_examples, negative_examples])
    # sampling without replacement is equivalent to random shuffling
    randomized_examples = examples.sample(2*n, replace=False)

    return randomized_examples.to_json(orient='records')

In [None]:
examples = create_examples(cs_examples_df, 2)

In [None]:
print(examples)


[{"review":"Although ExpressWay Logistics boasts about their extensive network, the reality is their tracking system is unreliable. Packages often go missing without any explanation or compensation for the inconvenience caused.","sentiment":"Negative"},{"review":"My experience with ExpressWay Logistics was marred by constant delays and poor communication. Despite numerous assurances that my package would arrive on time, it was consistently delayed without explanation. When I tried to inquire about the status of my shipment, I was met with unhelpful responses and a complete lack of transparency.","sentiment":"Negative"},{"review":"I recently switched to ExpressWay Logistics for my shipping needs, and I couldn't be happier with the decision. Their user-friendly platform makes it easy to book shipments and track packages, saving me valuable time. What's more, their competitive rates and reliable service have helped me streamline my shipping costs and improve my bottom line. ExpressWay Log
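
A quick way to confirm that a generated example set is balanced is to count the sentiments in the JSON string. The sketch below uses `Counter` from the standard library; the helper name `class_counts` and the miniature `sample_json` are hypothetical stand-ins for the real `examples` string.

```python
import json
from collections import Counter


def class_counts(examples_json):
    """Count how many examples of each sentiment a JSON example list contains."""
    return Counter(example["sentiment"] for example in json.loads(examples_json))


# Hypothetical miniature example list in the same format as `examples`.
sample_json = json.dumps([
    {"review": "Arrived early and intact.", "sentiment": "Positive"},
    {"review": "Package was lost.", "sentiment": "Negative"},
    {"review": "Great tracking updates.", "sentiment": "Positive"},
    {"review": "Delivery was a week late.", "sentiment": "Negative"},
])

counts = class_counts(sample_json)
print(counts)
# A balanced set has the same count for every class.
assert len(set(counts.values())) == 1, "classes are not balanced"
```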

### Define "create_prompt" function

In [None]:
def create_prompt(system_message, examples, user_message_template):

    """
    Return a prompt message in the format expected by the OpenAI API.
    Loop through the examples and parse them as user message and assistant
    message.

    Args:
        system_message (str): system message with instructions for sentiment analysis
        examples (str): JSON string with list of examples
        user_message_template (str): string with a placeholder for courier service reviews

    Output:
        few_shot_prompt (List): A list of dictionaries in the OpenAI prompt format
    """

    few_shot_prompt = [{'role':'system', 'content': system_message}]

    for example in json.loads(examples):
        example_review = example['review']
        example_sentiment = example['sentiment']

        few_shot_prompt.append(
            {
                'role': 'user',
                'content': user_message_template.format(
                    courier_service_review=example_review
                )
            }
        )

        few_shot_prompt.append(
            {'role': 'assistant', 'content': f"{example_sentiment}"}
        )

    return few_shot_prompt

In [None]:
few_shot_prompt = create_prompt(
    few_shot_system_message,
    examples,
    user_message_template
)

In [None]:
print(few_shot_prompt)


[{'role': 'system', 'content': 'Please analyze the reviews. Classify them as positive or negative based on the sentiment score. \nFollow the instruction:\n1. Each individual Review will be enclosed within triple backticks.\n2. Use the following criteria for classification:\n\nPositive Reviews: Reviews with a sentiment score greater than or equal to 0.\nNegative Reviews: Reviews with a sentiment score less than 0.\n\n\n3. The output should only be "Positive" or  "Negative".  Do not explain your results.\n'}, {'role': 'user', 'content': '```Although ExpressWay Logistics boasts about their extensive network, the reality is their tracking system is unreliable. Packages often go missing without any explanation or compensation for the inconvenience caused.```'}, {'role': 'assistant', 'content': 'Negative'}, {'role': 'user', 'content': '```My experience with ExpressWay Logistics was marred by constant delays and poor communication. Despite numerous assurances that my package would arrive on t
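
Before sending a few-shot prompt, it can help to verify that it has the expected shape: one system message followed by alternating user and assistant messages. The check below is a minimal sketch; `check_few_shot_prompt` and the two-example `demo_prompt` are hypothetical and only mirror the structure printed above.

```python
def check_few_shot_prompt(prompt):
    """Return the number of examples if the prompt is well formed.

    Expected shape: a system message, then user/assistant pairs.
    Raises ValueError otherwise.
    """
    if not prompt or prompt[0]["role"] != "system":
        raise ValueError("Prompt must start with a system message")
    turns = prompt[1:]
    if len(turns) % 2 != 0:
        raise ValueError("Every user message needs an assistant answer")
    for index, message in enumerate(turns):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError(f"Message {index + 1} should have role {expected_role!r}")
    return len(turns) // 2


# Hypothetical prompt with two examples, for illustration only.
demo_prompt = [
    {"role": "system", "content": "Classify the review."},
    {"role": "user", "content": "```Fast delivery.```"},
    {"role": "assistant", "content": "Positive"},
    {"role": "user", "content": "```Parcel arrived damaged.```"},
    {"role": "assistant", "content": "Negative"},
]
print(check_few_shot_prompt(demo_prompt))
```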

## **Step 4: Evaluate prompts**

### Define Evaluation scorer

In [None]:
def evaluate_prompt(prompt, gold_examples, user_message_template):

    """
    Return the micro-F1 score for predictions on gold examples.
    For each example, we make a prediction using the prompt. Gold labels and
    model predictions are aggregated into lists and compared to compute the
    F1 score.

    Args:
        prompt (List): list of messages in the OpenAI prompt format
        gold_examples (str): JSON string with list of gold examples
        user_message_template (str): string with a placeholder for courier service review

    Output:
        micro_f1_score (float): Micro-F1 score computed by comparing model predictions
                                with ground truth
    """

    model_predictions, ground_truths, review_texts = [], [], []

    for example in json.loads(gold_examples):
        gold_input = example['review']
        user_input = [
            {
                'role':'user',
                'content': user_message_template.format(courier_service_review=gold_input)
            }
        ]

        try:
            response = openai.ChatCompletion.create(
                deployment_id=chat_model_id,
                messages=prompt+user_input,
                temperature=0, # <- low temperature(For a deterministic response)
                max_tokens=2 # <-  restrict the output to not more than 2 tokens
            )

            prediction = response['choices'][0]['message']['content']
            model_predictions.append(prediction.strip()) # <- removes extraneous white spaces
            ground_truths.append(example['sentiment'])
            review_texts.append(gold_input)

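        except Exception as e:
            # Report the failure instead of silently dropping the example
            print(f"Prediction failed, skipping example: {e}")
            continue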

    micro_f1_score = f1_score(ground_truths, model_predictions, average="micro")

    table_data = [[text, pred, truth] for text, pred, truth in zip(review_texts, model_predictions, ground_truths)]
    headers = ["Review", "Model Prediction", "Ground Truth"]
    print(tabulate(table_data, headers=headers, tablefmt="grid"))

    return micro_f1_score
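
For single-label classification such as this one, the micro-averaged F1 score is equal to plain accuracy, because every wrong prediction counts once as a false positive (for the predicted label) and once as a false negative (for the true label). The pure-Python sketch below illustrates this; it is not the scikit-learn implementation, and the four toy labels are made up.

```python
def micro_f1(ground_truths, predictions):
    """Micro-averaged F1: pool TP, FP and FN over all labels, then compute F1."""
    labels = set(ground_truths) | set(predictions)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(ground_truths, predictions):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif pred != label and truth == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(ground_truths, predictions):
    """Fraction of predictions that match the ground truth."""
    correct = sum(t == p for t, p in zip(ground_truths, predictions))
    return correct / len(ground_truths)


# Toy data for illustration only.
truths = ["Positive", "Negative", "Positive", "Negative"]
preds = ["Positive", "Positive", "Positive", "Negative"]
print(micro_f1(truths, preds), accuracy(truths, preds))
```

Both values agree, so the scores reported by `evaluate_prompt` can be read as the share of gold examples classified correctly.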


**Evaluate zero shot prompt**

In [None]:
zero_shot_micro_f1 = evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)
print(zero_shot_micro_f1)


**Evaluate few shot prompt**




In [None]:
few_shot_micro_f1 = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)
print(few_shot_micro_f1)


This is just one choice of examples. To understand the variability in the F1 score of the few-shot prompt, the evaluation needs to be repeated with multiple sets of examples. For instance, both prompts could be evaluated 5 times, drawing a fresh set of few-shot examples for each run.

In [None]:
num_eval_runs = 5

In [None]:
zero_shot_performance = []
few_shot_performance = []

In [None]:
for _ in tqdm(range(num_eval_runs)):

    # For each run create a new sample of examples
    examples = create_examples(cs_examples_df)

    # Assemble the zero shot prompt (system message only, no examples)
    zero_shot_prompt = [{'role':'system', 'content': zero_shot_system_message}]

    # Assemble the few shot prompt with these examples
    few_shot_prompt = create_prompt(few_shot_system_message, examples, user_message_template)

    # Evaluate zero shot prompt accuracy on gold examples
    zero_shot_micro_f1 = evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

    # Evaluate few shot prompt accuracy on gold examples
    few_shot_micro_f1 = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

    zero_shot_performance.append(zero_shot_micro_f1)
    few_shot_performance.append(few_shot_micro_f1)

  0%|          | 0/5 [00:00<?, ?it/s]
 20%|██        | 1/5 [00:21<01:27, 21.83s/it]
 40%|████      | 2/5 [00:42<01:04, 21.41s/it]
 60%|██████    | 3/5 [00:57<00:36, 18.18s/it]

| Review | Model Prediction   | Ground Truth   |
| The delivery executive assigned by ExpressWay Logistics was courteous and professional during the delivery process. They tried their best to handle the package with care.Unfortunately, the package arrived with slight damag

  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))


+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+


  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))
 80%|████████  | 4/5 [01:00<00:12, 12.17s/it]

+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+

100%|██████████| 5/5 [01:31<00:00, 18.24s/it]
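
The empty prediction tables and the scikit-learn warning in the output above suggest that in some runs every API call raised an exception, which leaves `evaluate_prompt` with no predictions to score. One common mitigation is to retry failed calls with exponential backoff. The sketch below is generic and hedged: `call_with_retries` is a hypothetical helper that wraps any callable, and the flaky function only simulates a temporary failure.

```python
import time


def call_with_retries(make_request, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call make_request(), retrying with exponential backoff on any exception.

    Hypothetical helper: re-raises the last exception once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return make_request()
        except Exception as error:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error!r}); retrying in {delay:.1f}s")
            sleep(delay)


# Simulated request that fails twice before succeeding (illustration only).
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "Positive"


# A no-op sleep keeps the demonstration fast.
print(call_with_retries(flaky_request, sleep=lambda seconds: None))
```

In `evaluate_prompt`, the `openai.ChatCompletion.create(...)` call could be passed in as a `lambda`, so that a temporary failure does not drop the example from the evaluation.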




**Mean and Standard Deviation for Zero Shot and Few Shot**



In [None]:
print(zero_shot_performance)

[0.8571428571428571, 0.8571428571428571, 0.8571428571428571, 0.0, 0.8571428571428571]


In [None]:
print(few_shot_performance)

[0.8571428571428571, 0.9047619047619048, 1.0, 0.0, 0.9]


In [None]:
zero_shot_mean = statistics.mean(zero_shot_performance)
zero_shot_std = statistics.stdev(zero_shot_performance)
print(f"Mean Zero Shot: {zero_shot_mean}")
print(f"Standard Deviation Zero Shot: {zero_shot_std}")


Mean Zero Shot: 0.6857142857142857
Standard Deviation Zero Shot: 0.3833259389999639


In [None]:
few_shot_mean = statistics.mean(few_shot_performance)
few_shot_std = statistics.stdev(few_shot_performance)
print(f"Mean Few Shot: {few_shot_mean}")
print(f"Standard Deviation Few Shot: {few_shot_std}")


Mean Few Shot: 0.7323809523809524
Standard Deviation Few Shot: 0.4127283261442254
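
Both score lists contain a 0.0 in the fourth run, which pulls the means down and inflates the standard deviations. Assuming those zeros come from a run in which no prediction could be scored, rather than from a genuinely wrong model, the statistics can be recomputed without them. The helper `summarize` below is hypothetical, and it would also discard a real score of exactly 0.0, so it should be used with care.

```python
import statistics

# Scores copied from the printed lists above.
zero_shot_scores = [0.8571428571428571, 0.8571428571428571, 0.8571428571428571, 0.0, 0.8571428571428571]
few_shot_scores = [0.8571428571428571, 0.9047619047619048, 1.0, 0.0, 0.9]


def summarize(scores, failed_value=0.0):
    """Mean, sample standard deviation and count, ignoring runs equal to failed_value."""
    valid = [score for score in scores if score != failed_value]
    mean = statistics.mean(valid)
    std = statistics.stdev(valid) if len(valid) > 1 else 0.0
    return mean, std, len(valid)


for name, scores in (("Zero Shot", zero_shot_scores), ("Few Shot", few_shot_scores)):
    mean, std, n_valid = summarize(scores)
    print(f"{name}: mean={mean:.4f}, std={std:.4f}, valid runs={n_valid}")
```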


In [None]:
num_eval_runs_2 = 20

In [None]:
zero_shot_performance_2 = []
few_shot_performance_2 = []

In [None]:
for _ in tqdm(range(num_eval_runs_2)):

    # For each run create a new sample of examples
    examples = create_examples(cs_examples_df)

    # Assemble the zero shot prompt (system message only, no examples)
    zero_shot_prompt = [{'role':'system', 'content': zero_shot_system_message}]

    # Assemble the few shot prompt with these examples
    few_shot_prompt = create_prompt(few_shot_system_message, examples, user_message_template)

    # Evaluate zero shot prompt accuracy on gold examples
    zero_shot_micro_f1 = evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

    # Evaluate few shot prompt accuracy on gold examples
    few_shot_micro_f1 = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

    zero_shot_performance_2.append(zero_shot_micro_f1)
    few_shot_performance_2.append(few_shot_micro_f1)

  0%|          | 0/20 [00:00<?, ?it/s]
  5%|▌         | 1/20 [00:25<08:00, 25.28s/it]
 10%|█         | 2/20 [00:42<06:14, 20.80s/it]
 15%|█▌        | 3/20 [00:54<04:42, 16.60s/it]

  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))
 20%|██        | 4/20 [00:58<03:03, 11.48s/it]

+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+


  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))


+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+


 25%|██▌       | 5/20 [01:01<02:08,  8.59s/it]

(table truncated in export; columns: Review, Model Prediction, Ground Truth)

 30%|███       | 6/20 [01:08<01:53,  8.08s/it]

(table truncated in export; columns: Review, Model Prediction, Ground Truth)

 35%|███▌      | 7/20 [01:16<01:45,  8.12s/it]

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                                                                                                                                                               | Model Prediction   | Ground Truth   |
| ExpressWay Logistics consistently delivers on its promises, providing fast, efficient, and reliable courier services that exceeded my expectations.                                                                  | Positive           | Positive       |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------

 40%|████      | 8/20 [01:33<02:08, 10.73s/it]

(table truncated in export; columns: Review, Model Prediction, Ground Truth)

 45%|████▌     | 9/20 [01:57<02:43, 14.85s/it]

(table truncated in export; columns: Review, Model Prediction, Ground Truth)

  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))
 50%|█████     | 10/20 [02:00<01:53, 11.34s/it]

+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+


  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))


+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+


 55%|█████▌    | 11/20 [02:04<01:20,  8.96s/it]

(table truncated in export; columns: Review, Model Prediction, Ground Truth)

  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))


+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+


 60%|██████    | 12/20 [02:08<01:00,  7.62s/it]

(table truncated in export; columns: Review, Model Prediction, Ground Truth)

 65%|██████▌   | 13/20 [02:16<00:53,  7.61s/it]

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                                                                                                                                                               | Model Prediction   | Ground Truth   |
| I needed to ship perishable items internationally, and ExpressWay Logistics handled the process flawlessly. They ensured that the items were properly packaged and transported in temperature-controlled conditions. | Positive           | Positive       |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------

 70%|███████   | 14/20 [02:24<00:46,  7.74s/it]

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+----------------+
| Review                                                                                                                                                                                                              | Model Prediction   | Ground Truth   |
| While ExpressWay Logistics' pricing is competitive, their lack of accountability for delays and damages is disappointing. It's frustrating to pay for a service that fails to deliver on its promises consistently. | Negative           | Negative       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--

 75%|███████▌  | 15/20 [02:54<01:11, 14.33s/it]

(table truncated in export; columns: Review, Model Prediction, Ground Truth)

 80%|████████  | 16/20 [03:30<01:23, 20.97s/it]

(table truncated in export; columns: Review, Model Prediction, Ground Truth)

 85%|████████▌ | 17/20 [03:50<01:02, 20.69s/it]

(table truncated in export; columns: Review, Model Prediction, Ground Truth)

 90%|█████████ | 18/20 [04:05<00:37, 18.89s/it]

(table truncated in export; columns: Review, Model Prediction, Ground Truth)

  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))


+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+


  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))
 95%|█████████▌| 19/20 [04:08<00:14, 14.12s/it]

+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+


  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))


+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+


  _warn_prf(average, "true nor predicted", "F-score is", len(true_sum))
100%|██████████| 20/20 [04:11<00:00, 12.55s/it]

+----------+--------------------+----------------+
| Review   | Model Prediction   | Ground Truth   |
+----------+--------------------+----------------+





In [None]:
print(zero_shot_performance_2)


[0.8571428571428571, 0.8571428571428571, 1.0, 0.0, 0.0, 0.0, 1.0, 0.8571428571428571, 0.8571428571428571, 0.8571428571428571, 0.0, 1.0, 1.0, 0.9230769230769231, 0.8181818181818182, 0.8571428571428571, 0.8571428571428571, 0.8571428571428571, 0.9047619047619048, 0.0, 0.8571428571428571, 0.8571428571428571, 0.8571428571428571, 1.0, 0.0, 0.9166666666666666, 0.7, 0.8571428571428571, 0.8571428571428571, 1.0, 0.0, 0.0, 0.9090909090909091, 0.8000000000000002, 0.8571428571428571, 0.8571428571428571, 0.8571428571428571, 0.8571428571428571, 0.0, 0.0]


In [None]:
print(few_shot_performance_2)

[0.9047619047619048, 0.9047619047619048, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.9523809523809523, 0.9473684210526315, 0.0, 0.0, 1.0, 1.0, 1.0, 0.9047619047619048, 0.9523809523809523, 1.0, 1.0, 1.0, 0.9523809523809523, 0.9047619047619048, 1.0, 0.0, 1.0, 1.0, 1.0, 0.9230769230769231, 0.85, 0.0, 1.0, 1.0, 1.0, 1.0, 0.9047619047619048, 0.9047619047619048, 0.8421052631578947, 1.0, 0.0, 0.0]


In [None]:
zero_shot_mean_2 = statistics.mean(zero_shot_performance_2)
zero_shot_std_2 = statistics.stdev(zero_shot_performance_2)
# Interpolate the second-run variables; four decimals keep the printout readable
print(f"Mean Zero Shot 20 evaluations: {zero_shot_mean_2:.4f}")
print(f"Standard Deviation Zero Shot 20 evaluations: {zero_shot_std_2:.4f}")

Mean Zero Shot 20 evaluations: 0.6636
Standard Deviation Zero Shot 20 evaluations: 0.3927


In [None]:
few_shot_mean_2 = statistics.mean(few_shot_performance_2)
few_shot_std_2 = statistics.stdev(few_shot_performance_2)
print(f"Mean Few Shot 20 evaluations: {few_shot_mean_2}")
print(f"Standard Deviation Few Shot 20 evaluations: {few_shot_std_2}")

Mean Few Shot 20 evaluations: 0.7212066223250434
Standard Deviation Few Shot 20 evaluations: 0.4239107484427898


## **Step 5: Observations, Insights, and Business Perspective**




**F1 Score Analysis**
<br>Mean Zero Shot F1 Score: 0.686 (approximately)
<br>Standard Deviation Zero Shot: 0.383
<br>Mean Few Shot F1 Score: 0.732 (approximately)
<br>Standard Deviation Few Shot: 0.413
<br>On average, the few-shot prompt scores slightly higher than the zero-shot prompt (a gap of about 0.05 in mean F1), which suggests it classifies positive and negative reviews somewhat more reliably. However, the standard deviation is around 0.4 for both prompts, far larger than the gap between the means. The F1 score therefore varies widely from one evaluation sample to the next, and the few-shot advantage should be read as tentative.
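
To make that variability concrete, the sketch below summarizes a list of per-evaluation F1 scores with its standard error and the number of evaluations that scored exactly 0.0. The helper name `summarize_f1` is hypothetical, and the sample list is just the first eight zero-shot scores printed above, rounded for brevity. The `_warn_prf(...)` lines in the run output look like scikit-learn's warning that the F-score is ill-defined for a sample. If that reading is right, those evaluations are recorded as 0.0 and pull the mean down.

```python
import math
import statistics


def summarize_f1(scores):
    """Summarize a list of per-evaluation F1 scores (hypothetical helper)."""
    n = len(scores)
    std = statistics.stdev(scores)  # sample standard deviation, as in the cells above
    return {
        "n": n,
        "mean": statistics.mean(scores),
        "std": std,
        "std_error": std / math.sqrt(n),  # uncertainty of the mean itself
        "zero_scores": sum(1 for s in scores if s == 0.0),
    }


# First eight zero-shot scores printed above, rounded to four decimals.
sample_scores = [0.8571, 0.8571, 1.0, 0.0, 0.0, 0.0, 1.0, 0.8571]
summary = summarize_f1(sample_scores)
for name, value in summary.items():
    print(name, round(value, 3))
```

With a standard deviation near 0.4 over 40 scores, the standard error of each mean is roughly 0.06, so a gap of about 0.05 between the two prompts is within the noise of a single run.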

**Percentage Breakdown of Positive and Negative Reviews**

In [None]:

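# Assumes 'label' is 1 for a positive review and 0 for a negative one,
# so the column mean is the share of positive reviews.
positive_percentage = cs_reviews_df['label'].mean() * 100
negative_percentage = 100 - positive_percentage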

In [None]:
print(positive_percentage)

51.908396946564885


In [None]:
print(negative_percentage)

48.091603053435115


The breakdown shows that about 48% of the reviews in the dataset are labeled negative, against about 52% labeled positive. With nearly half of the reviews negative, improving customer satisfaction is a clear priority for the business.
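
The same breakdown can be sketched without pandas, which may help when checking the numbers by hand. The function name `label_percentages` is hypothetical. The counts of 68 positive and 63 negative labels are an assumption, chosen only because they reproduce the percentages printed above; the actual dataset size is not shown here.

```python
from collections import Counter


def label_percentages(labels):
    """Return the percentage of each label value in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100 * count / total for label, count in counts.items()}


# Hypothetical label list: 68 positive (1) and 63 negative (0) reviews.
# These counts are an assumption that matches the printed percentages.
labels = [1] * 68 + [0] * 63
shares = label_percentages(labels)
print(f"Positive: {shares[1]:.1f}%  Negative: {shares[0]:.1f}%")
```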

Accurately classifying customer reviews as positive or negative gives ExpressWay Logistics a direct view of customer satisfaction and of the areas where service quality needs to improve. For example, if negative reviews frequently mention delivery delays, the company can take specific action to address them. Tracking sentiment over time also reveals patterns and trends in customer feedback, so the company can act before minor issues escalate into significant problems, which supports customer retention and loyalty.
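
As a rough illustration of how recurring complaints could be surfaced, the sketch below counts how many negative reviews mention each issue. The keyword groups and the helper `count_issue_mentions` are hypothetical, and the second review is made up for the example. Plain substring matching is used for simplicity, so a term such as "late" would also match inside unrelated words.

```python
from collections import Counter

# Hypothetical issue keywords; a real list would come from the business team.
ISSUE_KEYWORDS = {
    "delay": ("delay", "late"),
    "damage": ("damage", "broken"),
    "support": ("support", "customer service"),
}


def count_issue_mentions(reviews, keywords=ISSUE_KEYWORDS):
    """Count how many reviews mention each issue at least once."""
    counts = Counter()
    for review in reviews:
        text = review.lower()
        for issue, terms in keywords.items():
            if any(term in text for term in terms):
                counts[issue] += 1
    return counts


negative_reviews = [
    # Taken from the evaluation output earlier in this notebook.
    "While ExpressWay Logistics' pricing is competitive, their lack of "
    "accountability for delays and damages is disappointing.",
    # Made-up review, for illustration only.
    "My parcel arrived two days late and customer service never replied.",
]
mentions = count_issue_mentions(negative_reviews)
print(mentions.most_common())
```

Running the same count on reviews grouped by month would give a simple trend view of which issues are growing.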

Prompt-based classification of the reviews with an OpenAI model has provided useful insight into customer sentiment. The performance metrics indicate that few-shot prompts are slightly more effective than zero-shot prompts, though results vary considerably in both cases. The current approach still needs improvement, for example by adding more examples so that reviews are classified more accurately. By leveraging these insights, ExpressWay Logistics can enhance customer satisfaction, resolve issues proactively, make informed strategic decisions, and improve its marketing strategies. Overall, the ability to classify and analyze customer reviews accurately is a powerful tool for driving business improvements and achieving better customer outcomes.