# **Project : A Case Study of ExpressWay Logistics**

**Business Overview:**

ExpressWay Logistics is a dynamic logistics service provider, committed to delivering efficient, reliable and cost-effective courier transportation and warehousing solutions. With a focus on speed, precision and customer satisfaction, we aim to be the go-to partner for customers seeking seamless courier services. Our core service involves ensuring operational efficiency throughout our delivery and courier services, including inventory management, durable packaging, swift dispatch of couriers, real-time tracking of shipments and on-time delivery as promised. We are committed to enhancing our logistics and courier services and improving connectivity for our customers.

**Current Challenge:**

ExpressWay Logistics faces numerous challenges in ensuring seamless deliveries and customer satisfaction. These challenges include managing various customer demands simultaneously, addressing delays in deliveries and ensuring products arrive intact and safe. Additionally, the company struggles with the complexity of efficiently storing and handling a large volume of packages while meeting customer expectations. Moreover, maintaining a skilled workforce capable of handling various aspects of logistics operations presents its own set of challenges. Overcoming these obstacles requires a comprehensive approach that integrates innovative technology, strategic planning, and continuous improvement initiatives to ensure smooth operations and exceptional service delivery.

**Objective:**

Our primary objective is to conduct a sentiment analysis of user-generated reviews across various digital channels and platforms. By paying attention to their feedback, we want to find ways to make our services better - like handling different customer demands simultaneously, dealing with late deliveries, and keeping packages secured and intact. Through the application of prompt engineering methodologies and sentiment analysis, we'll figure out if sentiments expressed by users for our courier services are Positive or Negative. This will help us understand where we need to improve in order to meet customer expectations and keep them happy. With a focus on getting better all the time, we'll overcome the challenges at ExpressWay Logistics and make our services the best.

**Data Description:**

The dataset titled "courier-service_reviews.csv" is structured to facilitate sentiment analysis for courier service reviews. Here's a brief description of the data columns:

1. id: This column contains unique identifiers for each review entry. It helps in distinguishing and referencing individual reviews.
2. review: This column includes the actual text of the courier service reviews. The reviews are likely composed of customer opinions and experiences regarding different aspects of the services provided by ExpressWay Logistics.
3. sentiment: This column provides an additional layer of classification (positive and negative) for the mentioned reviews.

## **Step 1. Setup (2 Marks)**

(A) Writing/Creating the config.json file  (2 Marks)

### Installation

In [1]:
!pip install openai==1.2 tiktoken datasets session-info --quiet


### Imports

In [2]:
# Import all Python packages required to access the Azure Open AI API.
# Import additional packages required to access datasets and create examples.

from openai import AzureOpenAI
import json
import random
import tiktoken
import session_info

import pandas as pd
import numpy as np

from collections import Counter
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from tabulate import tabulate

In [3]:
# session_info.show()
session_info.show(excludes=['backports'])

### Authentication

**(A) Writing/Creating the config.json file (2 Marks)**

In [4]:
# Define your configuration information

config_data = {
    # Placeholder value: a real key should never be committed in a shared notebook
    "AZURE_OPENAI_KEY": "<your-azure-openai-key>",
    "AZURE_OPENAI_ENDPOINT": "https://gen-ai-msft-april24-subratmuruni.openai.azure.com/",
    "AZURE_OPENAI_APIVERSION": "2024-05-01-preview",
    "CHATGPT_MODEL": "Subrat_Muruni-GL-Project-1"
}



In [5]:
# Write the configuration information into the config.json file
with open('config.json', 'w') as config_file:
    json.dump(config_data, config_file, indent=4)

print("Config file created successfully!")

Config file created successfully!


In [6]:
# read the config file into variable data
with open('config.json', 'r') as az_creds:
    data = az_creds.read()

In [7]:
# load data from the json config file
creds = json.loads(data)

In [8]:
# assign values from the config file to establish connection to OpenAI
client = AzureOpenAI(
    azure_endpoint=creds["AZURE_OPENAI_ENDPOINT"],
    api_key=creds["AZURE_OPENAI_KEY"],
    api_version=creds["AZURE_OPENAI_APIVERSION"]
)
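
Keeping the key in a notebook cell exposes it to anyone who can read the file. A minimal sketch of one alternative, assuming the same four setting names are also exported as environment variables; the helper name `load_azure_creds` and its fallback behaviour are illustrative and not part of the original notebook:

```python
import json
import os

REQUIRED_KEYS = (
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_APIVERSION",
    "CHATGPT_MODEL",
)

def load_azure_creds(config_path="config.json"):
    """Hypothetical helper: prefer environment variables, fall back to config.json."""
    file_creds = {}
    if os.path.exists(config_path):
        with open(config_path, "r") as config_file:
            file_creds = json.load(config_file)

    creds = {}
    for key in REQUIRED_KEYS:
        # An environment variable wins over the value stored in the file
        value = os.environ.get(key, file_creds.get(key))
        if not value:
            raise KeyError(f"Missing setting: {key}")
        creds[key] = value
    return creds
```

The returned dictionary has the same keys as `creds` above, so it could be passed to `AzureOpenAI(...)` in the same way.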

In [9]:
# defines the model being used for the task
chat_model_id = creds["CHATGPT_MODEL"]

In [10]:
# print GPT model being used
chat_model_id

'Subrat_Muruni-GL-Project-1'

### Utilities

In [11]:
# write a function to calculate the input tokens
def num_tokens_from_messages(messages):

    """
    Return the number of tokens used by a list of messages.
    Adapted from the Open AI cookbook token counter
    """

    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

    # Each message is sandwiched with <|start|>role and <|end|>
    # Hence, messages look like: <|start|>system or user or assistant{message}<|end|>

    tokens_per_message = 3 # token1:<|start|>, token2:system(or user or assistant), token3:<|end|>

    num_tokens = 0

    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))

    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>

    return num_tokens
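
The overhead arithmetic in `num_tokens_from_messages` can be checked without downloading a tiktoken encoding. The sketch below repeats the same bookkeeping with a pluggable encoder; the whitespace split used as the default is only a stand-in (an assumption for illustration), so real token counts will differ:

```python
def count_tokens(messages, encode=str.split, tokens_per_message=3, reply_priming=3):
    """Same bookkeeping as num_tokens_from_messages, with a pluggable encoder.

    `encode` defaults to whitespace splitting, which is only a stand-in for a
    real tokenizer; it is here to make the overhead arithmetic visible.
    """
    total = reply_priming  # tokens that prime the assistant's reply
    for message in messages:
        total += tokens_per_message  # fixed wrapper cost per message
        for value in message.values():
            total += len(encode(value))  # role and content are both encoded
    return total

toy_messages = [
    {"role": "system", "content": "Classify the review"},
    {"role": "user", "content": "Great service"},
]
print(count_tokens(toy_messages))  # 16
```

Here the two messages contribute 2 × 3 wrapper tokens, the reply priming adds 3, and the stand-in encoder counts 7 words across roles and contents, giving 16.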

## Task : Sentiment Analysis

## **Step 2: Assemble Data (5 Marks)**

(A) Upload and Read csv File (2 Marks)

(B) Count Positive and Negative Sentiment Reviews (1 Mark)

(C) Split the Dataset (2 Marks)

**(A) Upload and read csv file (2 Marks)**

In [12]:
# Connect to google drive from Colab
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [13]:
# Read CSV from google drive
cs_reviews_df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/courier-service_reviews.csv')


In [14]:
# find general information of the dataframe
cs_reviews_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131 entries, 0 to 130
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         131 non-null    int64 
 1   review     131 non-null    object
 2   sentiment  131 non-null    object
dtypes: int64(1), object(2)
memory usage: 3.2+ KB


In [15]:
#sample 5 rows of the data in the dataframe
cs_reviews_df.sample(5)

| | id | review | sentiment |
|---|---|---|---|
| 83 | 84 | At first, I was impressed by the speed and eff... | Positive |
| 8 | 9 | I appreciate the attention to detail that Expr... | Positive |
| 96 | 97 | When it comes to packaging materials, ExpressW... | Positive |
| 81 | 82 | The delivery executive assigned by ExpressWay ... | Positive |
| 68 | 69 | What impresses me most about ExpressWay Logist... | Positive |


**(B) Count Positive and Negative Sentiment Reviews (1 Mark)**

In [16]:
# count the positive and negative sentiments in the dataframe
cs_reviews_df.sentiment.value_counts()

| sentiment | count |
|---|---|
| Positive | 68 |
| Negative | 63 |


**The dataset is nearly balanced, with a slight skew towards positive reviews: 68 positive and 63 negative.**
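
The counts above translate directly into the percentage breakdown of the two classes. A short sketch using only the reported counts; in pandas the same proportions should be available from `cs_reviews_df.sentiment.value_counts(normalize=True)`:

```python
from collections import Counter

# Counts reported by value_counts() above
sentiment_counts = Counter({"Positive": 68, "Negative": 63})
total = sum(sentiment_counts.values())

# Share of each class as a percentage, rounded to one decimal place
sentiment_share = {
    label: round(100 * count / total, 1)
    for label, count in sentiment_counts.items()
}
print(sentiment_share)  # {'Positive': 51.9, 'Negative': 48.1}
```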

In [17]:
# find the shape of the dataframe
cs_reviews_df.shape

(131, 3)

**(C) Split the Dataset (2 Marks)**

In [18]:
# split the data into examples and gold examples with a 80:20 split

cs_examples_df, cs_gold_examples_df = train_test_split(
    cs_reviews_df, #<- the full dataset
    test_size=0.2, #<- 20% random sample selected for gold examples
    random_state=42 #<- ensures that the splits are the same for every session
)

In [19]:
# shape of the resulting example and gold example data frames
(cs_examples_df.shape, cs_gold_examples_df.shape)

((104, 3), (27, 3))

The gold examples are drawn from the 20% hold-out split. Using `random_state=42` fixes both the split and the sample of 21 gold examples taken from it, so the same examples are selected on every run of the notebook (i.e., they are randomly selected but do not change between runs). The gold set is kept small only to keep execution times low for illustration. In practice, a larger number of gold examples gives more robust estimates of model accuracy.
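
The shapes reported above, 104 and 27 rows, are consistent with the fractional test size being rounded up: ceil(0.2 × 131) = 27. The sketch below illustrates that arithmetic and the effect of a fixed seed with the standard library only; `split_indices` is a hypothetical helper and does not reproduce scikit-learn's shuffling, so the selected rows will differ from those in the notebook:

```python
import math
import random

def split_indices(n_rows, test_fraction, seed):
    """Illustrative seeded split; not scikit-learn's shuffling algorithm."""
    n_test = math.ceil(test_fraction * n_rows)  # test size is rounded up
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed, same order
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(131, 0.2, seed=42)
print(len(train_idx), len(test_idx))  # 104 27

# Repeating the call with the same seed returns the same split
assert split_indices(131, 0.2, seed=42) == (train_idx, test_idx)
```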

In [20]:
# select the columns review and sentiment that will be fed into the model
columns_to_select = ['review','sentiment']

In [21]:
# create the gold examples as a JSON string (records format)
gold_examples = (
    cs_gold_examples_df.loc[:, columns_to_select]
    .sample(21, random_state=42) #<- ensures that gold examples are the same for every session
    .to_json(orient='records')
)

In [22]:
# print gold example
gold_examples

'[{"review":"The delivery executive assigned by ExpressWay Logistics was courteous and professional during the delivery process. They tried their best to handle the package with care.Unfortunately, the package arrived with slight damage despite the delivery executive\'s efforts. The packaging seemed more than adequate to protect the contents during transit.","sentiment":"Positive"},{"review":"ExpressWay Logistics failed to meet my expectations. The delivery was delayed, and the customer support team was unresponsive and unhelpful when I tried to inquire about the status of my parcel.","sentiment":"Negative"},{"review":"ExpressWay Logistics\' incompetence resulted in a major inconvenience when my package was delivered to the wrong recipient. Despite providing accurate delivery information, the package ended up in the hands of someone else, and efforts to retrieve it were unsuccessful. When I contacted customer service for assistance, I was met with apathy and a lack of urgency. Their fa

In [23]:
# print the first example in the gold example list.
json.loads(gold_examples)[0]     #Json format

{'review': "The delivery executive assigned by ExpressWay Logistics was courteous and professional during the delivery process. They tried their best to handle the package with care.Unfortunately, the package arrived with slight damage despite the delivery executive's efforts. The packaging seemed more than adequate to protect the contents during transit.",
 'sentiment': 'Positive'}

## **Step 3: Derive Prompt (12 Marks)**

(A) Write Zero Shot System Message (3 Marks)

(B) Create Zero Shot Prompt (2 Marks)

(C) Write Few Shot System Message (3 Marks)

(D) Create Examples For Few Shot Prompt (2 Marks)

(E) Create Few Shot Prompt (2 Marks)

In [24]:
# user message for the zero shot and few shot prompt
user_message_template = """```{courier_service_review}```"""

**(A) Write Zero Shot System Message (3 Marks)**

In [25]:
# zero shot prompt system message
zero_shot_system_message = """
You are a helpful assistant tasked with sentiment analysis of customer reviews of a logistic company.
Classify the sentiment of customer reviews presented in the input as 'Positive' or 'Negative'.
Customer reviews will be delimited by triple backticks in the input.
Answer only 'Positive' or 'Negative'. Do not explain your answer.
"""
# Write zero shot system message here

**(B) Create Zero Shot Prompt (2 Marks)**

In [26]:
# zero shot prompt
zero_shot_prompt = [{'role':'system', 'content': zero_shot_system_message}]
# Create zero shot prompt to be input ready for completion function

In [27]:
# count the number of input tokens for zero shot prompt
num_tokens_from_messages(zero_shot_prompt)

72

**(C) Write Few Shot System Message (3 Marks)**

In [28]:
# few shot prompt system message
few_shot_system_message = """
you are a helpful assistant tasked with sentiment analysis of customer reviews of a logistic company.
Classify the sentiment of customer reviews presented in the input as 'Positive' or 'Negative'.
Customer reviews will be delimited by triple backticks in the input.
Examples are provided to you for reference.
Answer only 'Positive' or 'Negative'. Do not explain your answer.
"""

Merely selecting random samples from the polarity subsets is not enough because the examples included in a prompt are prone to a set of known biases such as:
 - Majority label bias (frequent answers in predictions)
 - Recency bias (examples near the end of the prompt)


To reduce these biases, it is important to have a balanced set of examples arranged in random order. Let us create a Python function that generates balanced, shuffled examples:

In [29]:
# function to create balanced, shuffled examples and return them as a JSON string
def create_examples(dataset, n=4):

    """
    Return a JSON list of randomized examples of size 2n with two classes.
    Create subsets of each class, choose random samples from the subsets,
    merge and randomize the order of samples in the merged list.
    Each run of this function creates a different random sample of examples
    chosen from the training data.

    Args:
        dataset (DataFrame): A DataFrame with examples (review + label)
        n (int): number of examples of each class to be selected

    Output:
        randomized_examples (JSON): A JSON with examples in random order
    """

    positive_reviews = (dataset.sentiment == 'Positive')
    negative_reviews = (dataset.sentiment == 'Negative')
    columns_to_select = ['review', 'sentiment']

    positive_examples = dataset.loc[positive_reviews, columns_to_select].sample(n)
    negative_examples = dataset.loc[negative_reviews, columns_to_select].sample(n)

    examples = pd.concat([positive_examples, negative_examples])

    # sampling without replacement is equivalent to random shuffling

    randomized_examples = examples.sample(2*n, replace=False)

    return randomized_examples.to_json(orient='records')
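
Because `create_examples` draws a fresh sample on every call, a particular set of examples cannot be reproduced later. A standard-library sketch of the same idea with an optional seed and a balance check; `balanced_examples`, its `seed` parameter and the toy records are assumptions for illustration, not part of the notebook:

```python
import random
from collections import Counter

def balanced_examples(records, n, seed=None):
    """Hypothetical stdlib variant: n examples per class, shuffled, optionally seeded."""
    rng = random.Random(seed)
    positives = [r for r in records if r["sentiment"] == "Positive"]
    negatives = [r for r in records if r["sentiment"] == "Negative"]
    # Equal counts per class guard against majority label bias
    chosen = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(chosen)  # random order guards against recency bias
    return chosen

# Toy records standing in for the training data
toy_records = [
    {"review": f"review {i}", "sentiment": "Positive" if i % 2 else "Negative"}
    for i in range(20)
]
sample = balanced_examples(toy_records, n=2, seed=0)
print(Counter(r["sentiment"] for r in sample))
```

Passing the same seed returns the same examples in the same order, which makes it possible to rerun a specific evaluation.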

**(D) Create Examples For Few Shot Prompt (2 Marks)**

In [30]:
# create 4 examples for the few shot prompt
examples =  create_examples(cs_examples_df, 2)
# Create Examples

In [31]:
# print 4 examples created.
json.loads(examples)

[{'review': "ExpressWay Logistics' commitment to transparency gives us confidence in their services. They provide clear and upfront pricing, so we know exactly what to expect. With ExpressWay Logistics, there are no hidden fees or surprises, just reliable service at a fair price.",
  'sentiment': 'Positive'},
 {'review': "ExpressWay Logistics' customer service representatives are polite but lack the authority to resolve issues effectively. Dealing with them feels like hitting a dead end, with problems often left unresolved.",
  'sentiment': 'Negative'},
 {'review': 'I am extremely disappointed with the service provided by ExpressWay Logistics. My parcel was delivered late, and the packaging was damaged, resulting in the loss of valuable items.',
  'sentiment': 'Negative'},
 {'review': "I had high hopes for ExpressWay Logistics, but they've consistently let me down. From missed delivery windows to lost packages, their incompetence knows no bounds. I'll be taking my business elsewhere.",

With the examples in place, we can now assemble a few-shot prompt. Since we will be using the few-shot prompt several times during evaluation, let us write a function to create a few-shot prompt (the logic of this function is depicted below).

In [32]:
# create function to build the few shot prompt
def create_prompt(system_message, examples, user_message_template):

    """
    Return a prompt message in the format expected by the Open AI API.
    Loop through the examples and parse them as user message and assistant
    message.

    Args:
        system_message (str): system message with instructions for sentiment analysis
        examples (str): JSON string with list of examples
        user_message_template (str): string with a placeholder for courier service reviews

    Output:
        few_shot_prompt (List): A list of dictionaries in the Open AI prompt format
    """

    few_shot_prompt = [{'role':'system', 'content': system_message}]

    for example in json.loads(examples):
        example_review = example['review']
        example_sentiment = example['sentiment']

        few_shot_prompt.append(
            {
                'role': 'user',
                'content': user_message_template.format(
                    courier_service_review=example_review
                )
            }
        )

        few_shot_prompt.append(
            {'role': 'assistant', 'content': f"{example_sentiment}"}
        )

    return few_shot_prompt
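
A prompt built this way should start with one system message followed by alternating user and assistant messages, one pair per example. A small sanity check along those lines; `check_prompt_structure` and the toy prompt are hypothetical and shown only to make the expected layout explicit:

```python
def check_prompt_structure(prompt):
    """Hypothetical sanity check for a few-shot chat prompt."""
    roles = [message["role"] for message in prompt]
    if not roles or roles[0] != "system":
        return False  # the system message must come first
    example_roles = roles[1:]
    if len(example_roles) % 2 != 0:
        return False  # every user example needs an assistant answer
    expected = ["user", "assistant"] * (len(example_roles) // 2)
    return example_roles == expected

toy_prompt = [
    {"role": "system", "content": "Classify the sentiment."},
    {"role": "user", "content": "Parcel arrived early."},
    {"role": "assistant", "content": "Positive"},
    {"role": "user", "content": "Package was lost."},
    {"role": "assistant", "content": "Negative"},
]
print(check_prompt_structure(toy_prompt))  # True
```

With k examples the prompt has 1 + 2k messages, so the four examples used in this notebook give a list of nine messages.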

**(E) Create Few Shot Prompt (2 Marks)**

In [33]:
# few shot prompt
few_shot_prompt = create_prompt(
    few_shot_system_message,
    examples,
    user_message_template
)
# Create Few shot prompt


In [34]:
# print few shot prompt
few_shot_prompt

[{'role': 'system',
  'content': "\nyou are a helpful assistant tasked with sentiment analysis of customer reviews of a logistic company.\nClassify the sentiment of customer reviews presented in the input as 'Positive' or 'Negative'.\nCustomer reviews will be delimited by triple backticks in the input.\nExamples are provided to you for reference.\nAnswer only 'Positive' or 'Negative'. Do not explain your answer.\n"},
 {'role': 'user',
  'content': "```ExpressWay Logistics' commitment to transparency gives us confidence in their services. They provide clear and upfront pricing, so we know exactly what to expect. With ExpressWay Logistics, there are no hidden fees or surprises, just reliable service at a fair price.```"},
 {'role': 'assistant', 'content': 'Positive'},
 {'role': 'user',
  'content': "```ExpressWay Logistics' customer service representatives are polite but lack the authority to resolve issues effectively. Dealing with them feels like hitting a dead end, with problems often

In [35]:
# number of input tokens for few shot prompt
num_tokens_from_messages(few_shot_prompt)

281

## **Step 4: Evaluate prompts (8 Marks)**

(A) Evaluate Zero Shot Prompt (2 Marks)

(B) Evaluate Few Shot Prompt (2 marks)

(C) Calculate Mean and Standard Deviation for Zero Shot Prompt and Few Shot Prompt (4 Marks)

Now we have two sets of prompts that we need to evaluate using gold labels. Since the few-shot prompt depends on the sample of examples that was drawn to make up the prompt, we expect some variability in evaluation. Hence, we evaluate each prompt multiple times to get a sense of the average and the variation around the average.

To reiterate, the choice of prompt should account for variability due to the random sample of examples. To aid repeated evaluation, we assemble an evaluation function.

In [36]:
# define a function for evaluating the prompts
def evaluate_prompt(prompt, gold_examples, user_message_template):

    """
    Return the micro-F1 score for predictions on gold examples.
    For each example, we make a prediction using the prompt. Gold labels and
    model predictions are aggregated into lists and compared to compute the
    F1 score.

    Args:
        prompt (List): list of messages in the Open AI prompt format
        gold_examples (str): JSON string with list of gold examples
        user_message_template (str): string with a placeholder for courier service review

    Output:
        micro_f1_score (float): Micro-F1 score computed by comparing model predictions
                                with ground truth
    """

    model_predictions, ground_truths, review_texts = [], [], []

    for example in json.loads(gold_examples):
        gold_input = example['review']
        user_input = [
            {
                'role':'user',
                'content': user_message_template.format(courier_service_review=gold_input)
            }
        ]

        try:
            response = client.chat.completions.create(
                model=chat_model_id,
                messages=prompt+user_input,
                temperature=0, # <- Note the low temperature (For a deterministic response)
                max_tokens=2 # <- Note how we restrict the output to not more than 2 tokens
            )

            prediction = response.choices[0].message.content
            model_predictions.append(prediction.strip()) # <- removes extraneous white spaces
            ground_truths.append(example['sentiment'])
            review_texts.append(gold_input)

        except Exception as e:
            # Report the failure instead of silently dropping the example
            print(f"Skipping example after API error: {e}")
            continue

    micro_f1_score = f1_score(ground_truths, model_predictions, average="micro")

    table_data = [[text, pred, truth] for text, pred, truth in zip(review_texts, model_predictions, ground_truths)]
    headers = ["Review", "Model Prediction", "Ground Truth"]
    print(tabulate(table_data, headers=headers, tablefmt="grid"))

    return micro_f1_score
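
When every review has exactly one true label and one predicted label, micro-F1 pools true positives, false positives and false negatives over all labels, and the result equals plain accuracy. The sketch below shows this on toy labels (not the actual gold set) with a hand-written `micro_f1`, which is an illustrative stand-in for `f1_score(..., average="micro")`:

```python
def micro_f1(ground_truths, predictions):
    """Micro-averaged F1 from pooled TP/FP/FN counts over all labels."""
    labels = set(ground_truths) | set(predictions)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(ground_truths, predictions):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1  # predicted this label, but the truth differs
            elif truth == label:
                fn += 1  # missed an example of this label
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels: 21 reviews, two of them misclassified
toy_truths = ["Positive"] * 11 + ["Negative"] * 10
toy_preds = list(toy_truths)
toy_preds[0] = "Negative"   # one positive review misclassified
toy_preds[-1] = "Positive"  # one negative review misclassified

accuracy = sum(t == p for t, p in zip(toy_truths, toy_preds)) / len(toy_truths)
print(micro_f1(toy_truths, toy_preds), accuracy)
```

Both values come out at 19/21 ≈ 0.905, so a score of 0.9047619047619048 on 21 gold examples corresponds to 19 correct predictions.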


Let us now use this function to evaluate both prompts assembled so far once, computing the micro-F1 score each time.

**(A) Evaluate zero shot prompt (2 Marks)**

In [37]:
# evaluate zero shot prompt
evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

[Table output: Review | Model Prediction | Ground Truth]

0.9047619047619048

**(B) Evaluate few shot prompt (2 Marks)**

In [38]:
# evaluate few shot prompt
evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

[Table output: Review | Model Prediction | Ground Truth]

0.9047619047619048

However, this is just *one* choice of examples. We will need to run these evaluations with multiple choices of examples to get a sense of variability in F1 score for the few-shot prompt. As an example, let us run evaluations for the few-shot prompt 5 times.

In [39]:
# set the number of runs for evaluation
num_eval_runs = 5

In [40]:
# set lists for collecting the metrics for zero shot and few shot prompts
zero_shot_performance = []
few_shot_performance = []

In [41]:
# run 5 iterations of zero shot evaluation and few shot evaluation.
for _ in tqdm(range(num_eval_runs)):

    # For each run create a new sample of examples (default n=4 per class, i.e. 8 examples)
    examples = create_examples(cs_examples_df)

    # The zero shot prompt contains only the system message; it does not use the examples
    zero_shot_prompt = [{'role':'system', 'content': zero_shot_system_message}]

    # Assemble the few shot prompt with these examples
    few_shot_prompt = create_prompt(few_shot_system_message, examples, user_message_template)

    # Evaluate zero shot prompt accuracy on gold examples
    zero_shot_micro_f1 = evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

    # Evaluate few shot prompt accuracy on gold examples
    few_shot_micro_f1 = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

    zero_shot_performance.append(zero_shot_micro_f1)
    few_shot_performance.append(few_shot_micro_f1)

100%|██████████| 5/5 [00:38<00:00,  7.72s/it]

[Table output for each evaluation: Review | Model Prediction | Ground Truth]

**(C) Calculate Mean and Standard Deviation for Zero Shot Prompt and Few Shot Prompt (4 Marks)**

Compute the average (mean) and measure the variability (standard deviation) of the evaluation scores for both zero shot and few shot prompts.

In [42]:
# print the iterative scores for zero shot prompt
zero_shot_performance

[0.9047619047619048,
 0.9047619047619048,
 0.9047619047619048,
 0.9047619047619048,
 0.9047619047619048]

In [43]:
# compute the mean and standard deviation of the zero shot scores
np.array(zero_shot_performance).mean(), np.array(zero_shot_performance).std()

(0.9047619047619048, 0.0)

In [44]:
# print the iterative scores for few shot prompt
few_shot_performance

[0.9523809523809523,
 1.0,
 0.9047619047619048,
 0.9047619047619048,
 0.9523809523809523]

In [45]:
# compute the mean and standard deviation of the few shot scores
np.array(few_shot_performance).mean(), np.array(few_shot_performance).std()

(0.9428571428571428, 0.03563483225498991)
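
NumPy's `std()` uses `ddof=0` by default, so the value above is the population standard deviation. With only five iterations, the sample standard deviation (dividing by n - 1) may also be worth reporting. The following is a minimal sketch using the standard library's `statistics` module as a cross-check; the variable names `population_std` and `sample_std` are illustrative and not part of the original notebook.

```python
import statistics

# Few shot scores copied from the output above
few_shot_performance = [
    0.9523809523809523,
    1.0,
    0.9047619047619048,
    0.9047619047619048,
    0.9523809523809523,
]

mean_score = statistics.fmean(few_shot_performance)

# Population standard deviation: divides by n (same convention as np.std default)
population_std = statistics.pstdev(few_shot_performance)

# Sample standard deviation: divides by n - 1 (same convention as np.std(ddof=1))
sample_std = statistics.stdev(few_shot_performance)

print(mean_score, population_std, sample_std)
```

The sample estimate is slightly larger than the population estimate, which is expected with so few runs.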

## **Step 5: Observations, Insights and Business Perspective (3 Marks)**

(Based on the project, the learner needs to share observations, learnings, insights and the business use case where these learnings can be beneficial.
Provide a breakdown of the percentage of positive and negative reviews. Additionally, explain how this classification can assist ExpressWay Logistics in addressing the issues identified.)


<font color='pink'>As a logistics company, ExpressWay Logistics focuses on speed, precision and customer satisfaction.</font>

<font color='pink'>ExpressWay Logistics faces numerous challenges in ensuring seamless deliveries and customer satisfaction:
   - Managing various customer demands
   - Delays in deliveries
   - Ensuring products arrive intact and safe
   - Efficiently storing and handling a large volume of packages
   - Meeting customer expectations </font>

<font color='pink'>ExpressWay understands that by identifying the sentiment of customer reviews, it can improve its service by:
   - Handling different customer demands simultaneously
   - Dealing with late deliveries
   - Keeping packages secure and intact
   - Finding areas for improvement </font>

Based on the data set collected by ExpressWay Logistics, the observations are as follows.
1. Out of a total of <b><font color='red'>131</font></b> reviews, <b><font color='red'>52%</font></b> are positive and <b><font color='red'>48%</font></b> are negative.
2. If this data set is representative of all reviews received by ExpressWay, the company meets customer expectations only about <b><font color='red'>50%</font></b> of the time, so a sentiment analysis model can help considerably in improving its service.
3. The data set is only slightly skewed, so micro F1 is a suitable metric for measuring the effectiveness of the GenAI model.
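
The percentage breakdown and the micro F1 metric mentioned above can be sketched in plain Python. For a single-label task such as Positive/Negative sentiment, micro-averaged F1 pools true positives, false positives and false negatives over all classes and works out to the same value as accuracy. The labels below are made-up toy data, not the project's reviews, and the helper names (`sentiment_breakdown`, `micro_f1`, `accuracy`) are hypothetical.

```python
from collections import Counter


def sentiment_breakdown(labels):
    """Return the percentage share of each sentiment label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100 * count / total for label, count in counts.items()}


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN across all classes."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy data (assumption): 21 reviews, two of them misclassified
y_true = ["Positive"] * 11 + ["Negative"] * 10
y_pred = list(y_true)
y_pred[0] = "Negative"   # a positive review predicted as negative
y_pred[-1] = "Positive"  # a negative review predicted as positive

print(sentiment_breakdown(y_true))
print(micro_f1(y_true, y_pred), accuracy(y_true, y_pred))
```

Because the two values coincide in this setting, the micro F1 scores reported in this project can also be read as the share of reviews classified correctly.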


Project insights and observations:

1. Using an OpenAI model is less labor intensive and requires little code maintenance from the ExpressWay Logistics data team.
2. Comparing zero shot and few shot prompting, the zero shot prompt consumed <font color='green'><b>72 tokens</b></font> while the few shot prompt consumed <font color='green'><b>281 tokens</b></font>. The few shot token count was almost 4 times the zero shot token count because of the examples included in the prompt.
3. Given the maximum token count of <font color='green'><b>4096</b></font> for GPT-3.5 Turbo, either prompting technique can be used without exceeding the model's limit. However, we still need to determine which technique is more effective and also consider the budget of ExpressWay Logistics.
4. In the evaluation of both prompting techniques, the zero shot prompt has an <font color='green'><b>average Micro F1 score of 0.90</b></font> with no variation across the five iterations, and the few shot prompt has an <font color='green'><b>average Micro F1 score of 0.94</b></font> with a standard deviation of about 0.036 between iterations.
5. Based on #4, few shot prompting is the better method compared to zero shot prompting. Based on the OpenAI pricing documentation <font color='green'><b><u>($1.50 / 1M input tokens)</u></b></font>, the additional cost is minuscule, and the few shot prompt uses only about 7% of the maximum token limit. Hence ExpressWay should use the few shot prompting technique.
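
The token and cost arithmetic behind points 2, 3 and 5 can be checked with a short sketch. It uses only the figures quoted above (72 and 281 prompt tokens, the 4096 token limit, and $1.50 per 1M input tokens). As a simplifying assumption, it counts only the prompt tokens per call and ignores the tokens of each review text and of the model's output; the function `input_cost` is a hypothetical helper.

```python
ZERO_SHOT_TOKENS = 72        # prompt tokens reported above
FEW_SHOT_TOKENS = 281        # prompt tokens reported above
CONTEXT_LIMIT = 4096         # maximum tokens quoted above for GPT-3.5 Turbo
PRICE_PER_MILLION = 1.50     # USD per 1M input tokens, as quoted above
N_REVIEWS = 131              # size of the review data set


def input_cost(tokens_per_call, n_calls, price_per_million=PRICE_PER_MILLION):
    """Estimated input cost in USD, counting prompt tokens only (assumption)."""
    return tokens_per_call * n_calls * price_per_million / 1_000_000


# How much larger the few shot prompt is than the zero shot prompt
token_ratio = FEW_SHOT_TOKENS / ZERO_SHOT_TOKENS

# Share of the context window used by the few shot prompt
context_share = FEW_SHOT_TOKENS / CONTEXT_LIMIT

zero_shot_cost = input_cost(ZERO_SHOT_TOKENS, N_REVIEWS)
few_shot_cost = input_cost(FEW_SHOT_TOKENS, N_REVIEWS)

print(f"few shot / zero shot tokens: {token_ratio:.2f}x")
print(f"context used by few shot prompt: {context_share:.1%}")
print(f"zero shot input cost for {N_REVIEWS} reviews: ${zero_shot_cost:.4f}")
print(f"few shot input cost for {N_REVIEWS} reviews: ${few_shot_cost:.4f}")
```

Under these assumptions, the few shot prompt costs roughly four times as much as the zero shot prompt, yet the total for the whole data set stays well under one dollar.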

Conclusion:

<font color='pink'>ExpressWay Logistics can use the few shot prompting method effectively in this case to identify positive and negative sentiments. It can then keep improving the areas highlighted by the negative reviews.

Furthermore, using the same technique, ExpressWay Logistics can apply multi-label classification to sort the negative reviews into specific areas of grievance and address each one in a focused manner.
</font>
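
The multi-label extension suggested in the conclusion could look roughly like the sketch below. It only builds chat-style messages and validates a reply; it does not call any API. The grievance categories, the prompt wording and the helper names (`build_multilabel_messages`, `parse_labels`) are all assumptions for illustration, loosely derived from the challenges listed earlier, and would need to be tuned and evaluated in the same way as the sentiment prompts.

```python
import json

# Hypothetical grievance categories (assumption, not from the project data)
GRIEVANCE_LABELS = [
    "late delivery",
    "damaged package",
    "unmet customer demand",
    "storage or handling issue",
]


def build_multilabel_messages(review, labels=GRIEVANCE_LABELS):
    """Build system and user messages asking for zero or more categories."""
    system_message = (
        "Classify the courier service review into zero or more of these "
        "categories: " + ", ".join(labels) + ". "
        "Reply with a JSON list of category names only."
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": review},
    ]


def parse_labels(reply, labels=GRIEVANCE_LABELS):
    """Keep only known categories; return an empty list for malformed replies."""
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [item for item in parsed if item in labels]


messages = build_multilabel_messages("The parcel arrived a week late and crushed.")
print(messages[0]["content"])
print(parse_labels('["late delivery", "damaged package"]'))
```

Filtering the reply against the known category list guards against the model returning labels outside the agreed set.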


