# **Project: A Case Study of InnovaTech Solutions**

**Business Overview:**

InnovaTech Solutions, a dynamic and forward-thinking technology company, has made significant strides in the computing industry with a focus on developing high-quality laptops. Established over a decade ago, InnovaTech has gained a reputation for its innovative approach and commitment to customer satisfaction, creating a significant footprint in both physical and online retail spaces.
InnovaTech has expanded its presence in the digital retail world, especially on e-commerce giants like Amazon. This strategic move has not only widened its customer base but also resulted in a large influx of customer feedback, primarily in the form of online reviews. The company's products, notably its range of laptops, have become popular choices on these platforms, leading to an abundance of valuable but underutilized customer data.

**Current Challenge:**

InnovaTech currently analyzes customer reviews using basic sentiment analysis tools, which provide only a superficial understanding of customer opinions. In the competitive laptop market, a more detailed, aspect-oriented analysis is crucial. Understanding customer sentiment on specific aspects of laptops, such as the screen and technical specifications, is vital for targeted product improvements.

**Objective:**

The primary goal is to conduct a comprehensive aspect-based sentiment analysis of customer reviews for InnovaTech’s laptops, specifically focusing on three critical aspects: the laptop screen, keyboard, and mousepad. These components have been identified as crucial determinants of customer satisfaction and product usability. The project aims to provide nuanced insights into specific areas of customer satisfaction, dissatisfaction, and neutral feedback. The ultimate goal is to enhance overall product quality and customer experience, solidifying InnovaTech's position as a leader in the laptop market.



**Data Description:**

The dataset titled "laptop_reviews.csv" is structured to facilitate aspect-based sentiment analysis for laptop reviews. Here's a brief description of the data columns:

1. id: Unique identifier for each review entry, used to distinguish and reference individual reviews.
2. text: The text of the laptop review, containing customer opinions and experiences about different aspects of the laptop.
3. aspects: Structured information about the specific aspects mentioned in each review, such as 'RAM', 'screen', 'keyboard', 'mousepad', and other laptop features.
4. category: The aspects mentioned in the review together with a sentiment polarity (positive, negative, or neutral) for each aspect.
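
As a minimal sketch of how a parsed `category` annotation can be read, the snippet below pairs each aspect with its polarity and keeps only the three aspects this project targets. The record is hand-written to mirror the two parallel lists (`category` and `polarity`) seen in the dataset, and the helper name `aspect_polarities` is hypothetical.

```python
TARGET_ASPECTS = ("screen", "keyboard", "mousepad")

def aspect_polarities(annotation, targets=TARGET_ASPECTS):
    """Map each target aspect in a parsed annotation to its polarity."""
    pairs = zip(annotation["category"], annotation["polarity"])
    return {aspect: polarity for aspect, polarity in pairs if aspect in targets}

# Hand-written record mirroring the structure of the `category` column.
record = {
    "category": ["RAM", "keyboard", "hardware", "screen"],
    "polarity": ["positive", "positive", "neutral", "positive"],
}

print(aspect_polarities(record))
```

Aspects outside the target set (here `RAM` and `hardware`) are dropped, which is one way to restrict the analysis to the screen, keyboard, and mousepad.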

# 1. Setup

### 1.1 Installation

In [243]:
!pip install openai==0.28.1 tiktoken datasets session-info --quiet

### 1.2 Imports

1. Import all Python packages required to access the Azure OpenAI API.
2. Import additional packages required to access datasets and create examples.

In [244]:
import openai
import json
import random
import tiktoken
import session_info

import pandas as pd
import numpy as np

from datasets import load_dataset
from collections import Counter
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

In [245]:
session_info.show()

### 1.3 Authentication

In [246]:
%%writefile config.json
{
  "AZURE_OPENAI_KEY":"<your-azure-openai-key>",
  "AZURE_OPENAI_BASE":"https://shubhro-openai-demo-check.openai.azure.com/",
  "AZURE_OPENAI_APITYPE":"azure",
  "AZURE_OPENAI_APIVERSION":"2023-07-01-preview",
  "CHATGPT_MODEL":"shubhro-mls-deployment"
}

Overwriting config.json


In [247]:
with open('config.json', 'r') as az_creds:
    data = az_creds.read()

In [248]:
creds = json.loads(data)

In [249]:
openai.api_key = creds["AZURE_OPENAI_KEY"]
openai.api_base = creds["AZURE_OPENAI_BASE"]
openai.api_type = creds["AZURE_OPENAI_APITYPE"]
openai.api_version = creds["AZURE_OPENAI_APIVERSION"]

In [250]:
chat_model_id = creds["CHATGPT_MODEL"]

### 1.4 Utilities

Define a token counter to track how much of the context window the prompt consumes, and therefore how much remains for the completion.

In [251]:
def num_tokens_from_messages(messages):

    """
    Return the number of tokens used by a list of messages.
    Adapted from the OpenAI Cookbook token counter.
    """

    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

    # Each message is sandwiched with <|start|>role and <|end|>
    # Hence, messages look like: <|start|>system or user or assistant{message}<|end|>

    tokens_per_message = 3 # token1:<|start|>, token2:system(or user or assistant), token3:<|end|>

    num_tokens = 0

    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))

    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>

    return num_tokens
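
Because the prompt and the completion share one context window, the token count above determines how much room is left for the reply. The sketch below illustrates that arithmetic. The helper `remaining_completion_tokens` is hypothetical, and the default window of 4096 tokens is an assumption that should be replaced with the limit of the actual deployment.

```python
def remaining_completion_tokens(prompt_tokens, context_window=4096):
    """Return how many tokens are left for the reply after the prompt.

    context_window is an assumed default; check the deployed model's limit.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt that overflows the window leaves no room for a completion.
    return max(context_window - prompt_tokens, 0)

# Illustrative prompt size; in the notebook this would come from
# num_tokens_from_messages(prompt).
print(remaining_completion_tokens(1189))
```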

# Task: Aspect-Based Sentiment Analysis (ABSA)

### Step 1: Define objectives & Metrics

To evaluate model performance, we judge the accuracy of the aspects and the sentiment assignment per aspect. For example, if the aspects identified by the LLM do not match the ground truth for a specific input, we count the prediction as incorrect. A correct prediction is one where all the aspects are correctly identified and the sentiment assigned to each aspect is also correct.

In [252]:
def compute_accuracy(gold_examples, model_predictions, ground_truths):

    """
    Return the accuracy score comparing the model predictions and ground truth
    for ABSA. We look for exact matches between the model predictions on all the
    aspects and sentiments for these aspects in the ground truth.

    Args:
        gold_examples (List): List of gold example records
        model_predictions (List): List of parsed ABSA predictions
        ground_truths (List): List of parsed ABSA annotations

    Output:
        accuracy (float): Exact matches of model predictions and ground truths
    """
    # The denominator is the number of gold examples, so any example skipped
    # during evaluation counts as an incorrect prediction
    correct_predictions = 0
    total_predictions = len(gold_examples)

    # Count predictions that match the ground truth exactly
    for pred, truth in zip(model_predictions, ground_truths):
        if pred == truth:
            correct_predictions += 1

    # Calculate accuracy as the ratio of correct predictions to total predictions
    accuracy = correct_predictions / total_predictions

    return accuracy
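
To make the exact-match criterion concrete, here is a small self-contained sketch on hand-written annotations. The function `exact_match_accuracy` and the toy data are illustrative stand-ins, not the notebook's own objects.

```python
def exact_match_accuracy(predictions, truths, total):
    """Fraction of `total` reviews whose prediction equals the annotation exactly."""
    if total == 0:
        return 0.0
    correct = sum(pred == truth for pred, truth in zip(predictions, truths))
    return correct / total

truths = [
    {"category": ["screen", "keyboard"], "polarity": ["positive", "neutral"]},
    {"category": ["mousepad"], "polarity": ["negative"]},
]
predictions = [
    # Same aspects and polarities in the same order: counted as correct.
    {"category": ["screen", "keyboard"], "polarity": ["positive", "neutral"]},
    # Right aspect, wrong polarity: counted as incorrect.
    {"category": ["mousepad"], "polarity": ["neutral"]},
]

print(exact_match_accuracy(predictions, truths, total=len(truths)))
```

Since the comparison is between lists, the order of aspects matters: a prediction that lists the right aspects in a different order than the annotation is scored as incorrect.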

### Step 2: Assemble Data

1. Use the "laptop_reviews.csv" dataset.
2. Identify distribution of aspects in examples.
3. Identify distribution of aspects in gold examples.

In [253]:
# pandas function "read_csv" to read the reviews file.
aspect_based_laptop_reviews_df = pd.read_csv("laptop_reviews.csv")

In [254]:
aspect_based_laptop_reviews_df #snapshot

| | id | text | aspects | category |
|---|---|---|---|---|
| 0 | 1 | The RAM is good. The design is decent. | {'term':array(['RAM','design'],dtype=object),'... | {'category':array(['RAM','design'],dtype=objec... |
| 1 | 2 | The screen is amazing. The design is impressiv... | {'term':array(['screen','design','mousepad'],d... | {'category':array(['screen','design','mousepad... |
| 2 | 3 | The GPU is adequate. The camera is average. Th... | {'term':array(['GPU','camera','software','keyb... | {'category':array(['GPU','camera','software','... |
| 3 | 4 | The RAM is terrible. The battery is fair. The ... | {'term':array(['RAM','battery','design'],dtype... | {'category':array(['RAM','battery','design'],d... |
| 4 | 5 | The GPU is terrible. The keyboard is poor. The... | {'term':array(['GPU','keyboard','mousepad'],dt... | {'category':array(['GPU','keyboard','mousepad'... |
| ... | ... | ... | ... | ... |
| 95 | 96 | The camera is terrible. The hardware is adequa... | {'term':array(['camera','hardware','mousepad']... | {'category':array(['camera','hardware','mousep... |
| 96 | 97 | The screen is impressive. The keyboard is stan... | {'term':array(['screen','keyboard'],dtype=obje... | {'category':array(['screen','keyboard'],dtype=... |
| 97 | 98 | The software is excellent. The mousepad is dis... | {'term':array(['software','mousepad','GPU'],dt... | {'category':array(['software','mousepad','GPU'... |
| 98 | 99 | The mousepad is impressive. The design is grea... | {'term':array(['mousepad','design','camera'],d... | {'category':array(['mousepad','design','camera... |


Next, the dataset is split into examples and gold examples, and the curated gold examples are stored in a format appropriate for reuse (JSON). Both the split and the later random sampling of gold examples use random_state=27, so they are the same for every session.

In [255]:
laptop_reviews_examples_df, laptop_reviews_gold_examples_df = train_test_split(
    aspect_based_laptop_reviews_df, #<- the full dataset
    test_size=0.2, #<- 20% random sample selected for gold examples
    random_state=27 #<- ensures that the splits are the same for every session
)

In [256]:
(aspect_based_laptop_reviews_df.shape, laptop_reviews_examples_df.shape, laptop_reviews_gold_examples_df.shape)

((100, 4), (80, 4), (20, 4))

The three aspects chosen are the laptop screen, keyboard, and mousepad, because the primary goal (stated under "Objective") is an aspect-based sentiment analysis of customer reviews for InnovaTech’s laptops that focuses on these three aspects.
To determine how many reviews contain each of these aspects, one approach is to create a lookup index with each aspect as a key and the list of all reviews annotated with that aspect as the value.
For this, I created a lookup index as a dictionary keyed by the aspects in the task.

In [257]:
examples_aspect_index = {
    'screen': [],
    'keyboard': [],
    'mousepad': []
}

gold_examples_aspect_index = {
    'screen': [],
    'keyboard': [],
    'mousepad': []
}

In [258]:
for review_id, category in zip(laptop_reviews_examples_df.id, laptop_reviews_examples_df.category):
    for key in examples_aspect_index.keys():
        # category is stored as a string, so this is a substring check
        if key in category:
            examples_aspect_index[key].append(review_id)
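
The loop above tests `key in category` against the raw annotation string, which is a substring check. A stricter variant, sketched below on hand-written rows, tests exact membership in a parsed aspect list instead. The helper `build_aspect_index` is hypothetical, the rows are assumed to already hold aspect lists rather than raw strings, and `touchscreen` is a made-up aspect used only to show the difference.

```python
def build_aspect_index(rows, targets=("screen", "keyboard", "mousepad")):
    """Map each target aspect to the ids of reviews annotated with it."""
    index = {aspect: [] for aspect in targets}
    for review_id, aspects in rows:
        for aspect in targets:
            if aspect in aspects:  # exact membership in a list, not a substring test
                index[aspect].append(review_id)
    return index

rows = [
    (1, ["RAM", "design"]),
    (2, ["screen", "design", "mousepad"]),
    (3, ["touchscreen", "keyboard"]),  # hypothetical aspect containing "screen"
]

print(build_aspect_index(rows))
```

With a substring check, review 3 would also be indexed under `screen`; with list membership it is not.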

In [259]:
# Dictionary to store first 10 values for each key
first_10_values = {}

for key, values in examples_aspect_index.items():
    # Get the first 10 values; if less than 10, get all
    first_10 = values[:10]
    first_10_values[key] = first_10

# first_10_values now contains the first 10 values for each key
print(first_10_values)

{'screen': [91, 94, 87, 21, 36, 22, 27, 15, 39, 2], 'keyboard': [94, 92, 87, 21, 46, 76, 75, 37, 29, 58], 'mousepad': [91, 19, 88, 92, 36, 13, 22, 27, 59, 2]}


Distribution of aspects in examples

In [260]:
for key in examples_aspect_index:
    print(f"Number of examples for aspect {key}: {len(examples_aspect_index[key])}")

Number of examples for aspect screen: 25
Number of examples for aspect keyboard: 25
Number of examples for aspect mousepad: 33


Distribution of aspects in gold examples

In [261]:
for id, category in zip(laptop_reviews_gold_examples_df.id, laptop_reviews_gold_examples_df.category):
    for key in gold_examples_aspect_index.keys():
        if key in category:
            gold_examples_aspect_index[key].append(id)

In [262]:
for key in gold_examples_aspect_index:
    print(f"Number of examples for aspect {key}: {len(gold_examples_aspect_index[key])}")

Number of examples for aspect screen: 9
Number of examples for aspect keyboard: 6
Number of examples for aspect mousepad: 9


In [263]:
columns_to_select = ['id', 'text', 'category']

In [264]:
gold_examples = json.loads((
        laptop_reviews_gold_examples_df.loc[:, columns_to_select]
                                           .sample(10, random_state=27) #<- ensures that gold examples are the same for every session
                                           .to_json(orient='records')
))

In [265]:
gold_examples[:2]

[{'id': 86,
  'text': 'The RAM is amazing. The keyboard is great. The hardware is adequate. The screen is amazing.',
  'category': "{'category':array(['RAM','keyboard','hardware','screen'],dtype=object),'polarity':array(['positive','positive','neutral','positive'],dtype=object)}"},
 {'id': 31,
  'text': 'The RAM is decent. The hardware is amazing.',
  'category': "{'category':array(['RAM','hardware'],dtype=object),'polarity':array(['neutral','positive'],dtype=object)}"}]

Convert the category from a string containing NumPy array wrappers to a string containing plain lists.

In [266]:
for example in gold_examples:
    # Strip the "array(" prefix, the ",dtype=object" suffix, and the closing parenthesis
    example['category'] = example['category'].replace("array(", "").replace(",dtype=object", "").replace(")", "")
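
The chained `replace` calls above keep the annotation as a string. As an alternative sketch, the standard library's `ast.literal_eval` can turn the cleaned string into a real dictionary. The regular expression and the helper name `parse_annotation` are assumptions based on the string shape shown in the earlier output, not part of the original notebook.

```python
import ast
import re

# Matches array([...],dtype=object) and captures only the bracketed list.
ARRAY_WRAPPER = re.compile(r"array\((\[.*?\]),\s*dtype=object\)")

def parse_annotation(raw):
    """Strip the NumPy array(...) wrappers and parse the result as a dict."""
    cleaned = ARRAY_WRAPPER.sub(r"\1", raw)
    return ast.literal_eval(cleaned)

raw = ("{'category':array(['RAM','hardware'],dtype=object),"
       "'polarity':array(['neutral','positive'],dtype=object)}")

print(parse_annotation(raw))
```

Working with a parsed dictionary avoids later quote replacement before `json.loads`, at the cost of one extra parsing step per record.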

In [292]:
gold_examples[:2]

[{'id': 86,
  'text': 'The RAM is amazing. The keyboard is great. The hardware is adequate. The screen is amazing.',
  'category': "{'category':['RAM','keyboard','hardware','screen'],'polarity':['positive','positive','neutral','positive']}"},
 {'id': 31,
  'text': 'The RAM is decent. The hardware is amazing.',
  'category': "{'category':['RAM','hardware'],'polarity':['neutral','positive']}"}]

### Step 3: Derive Prompt

#### Create prompts

In [268]:
user_message_template = """```{laptop_review}```"""

**1. Zero-shot prompt**

In [269]:
zero_shot_system_message = """
Perform aspect based sentiment analysis on laptop reviews presented in the input delimited by triple backticks, that is, ```.
In each review there might be one or more of the following aspects: screen, keyboard, mousepad.
For each review presented as input:
- Identify if there are any of the 3 aspects (screen, keyboard, mousepad) present in the review.
- Assign a sentiment polarity (positive, negative or neutral) for each aspect

Arrange your response a JSON object with the following headers:
- category:[list of aspects]
- polarity:[list of corresponding polarities for each aspect]}
"""

In [270]:
zero_shot_prompt = [{'role':'system', 'content': zero_shot_system_message}]

In [271]:
num_tokens_from_messages(zero_shot_prompt)

129

**2. Few-shot prompt**

In [272]:
few_shot_system_message = """
Perform aspect based sentiment analysis on laptop reviews presented in the input delimited by triple backticks, that is, ```.
In each review there might be one or more of the following aspects: screen, keyboard, mousepad.
For each review presented as input:
- Identify if there are any of the 3 aspects (screen, keyboard, mousepad) present in the review.
- Assign a sentiment polarity (positive, negative or neutral) for each aspect

Arrange your response a JSON object with the following headers:
- category:[list of aspects]
- polarity:[list of corresponding polarities for each aspect]}
"""

In [273]:
def create_examples(dataset, n=10):

    """
    Return a JSON list with n random examples of each aspect in the
    input dataset.
    First create a dictionary with the aspects as keys and the ids of the
    reviews that contain this aspect as values.
    Then take a random sample of ids from each of these lists.

    Args:
        dataset (DataFrame): DataFrame with nested ABSA annotations
        n (int): Number of random examples selected for each aspect

    Output:
        examples (JSON): JSON list of examples
    """

    columns_to_select = ['id', 'text', 'category']
    example_ids = []

    aspect_index = {
        'screen': [], 'keyboard': [], 'mousepad': []
    }

    for id, category in zip(dataset.id, dataset.category):
        for key in aspect_index.keys():
            if key in category:
                aspect_index[key].append(id)

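    for key in aspect_index:
        # np.random.choice samples with replacement by default, so ids can repeat;
        # isin() below keeps each review once, so fewer than 3*n examples may result
        example_ids.extend(np.random.choice(aspect_index[key], n).tolist())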

    examples = dataset.loc[dataset.id.isin(example_ids), columns_to_select]

    return examples.to_json(orient='records')
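
`np.random.choice` draws with replacement by default, so the `n` ids drawn per aspect can repeat, and `isin` then keeps each review only once. The sketch below contrasts the two sampling modes using the standard library's `random` module, which stands in for NumPy here only so the example has no dependencies. The id list is made up.

```python
import random

rng = random.Random(27)   # fixed seed so the draw is repeatable
ids = [2, 15, 21, 22, 27]  # made-up review ids for one aspect

with_replacement = rng.choices(ids, k=5)    # ids may repeat
without_replacement = rng.sample(ids, k=5)  # every id appears at most once

print(len(set(with_replacement)), "unique ids when drawing with replacement")
print(len(set(without_replacement)), "unique ids when drawing without replacement")
```

In NumPy, passing `replace=False` to `np.random.choice` gives the second behavior, provided `n` does not exceed the number of available ids.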

In [274]:
test_examples = create_examples(aspect_based_laptop_reviews_df)
test_examples

'[{"id":2,"text":"The screen is amazing. The design is impressive. The mousepad is bad.","category":"{\'category\':array([\'screen\',\'design\',\'mousepad\'],dtype=object),\'polarity\':array([\'positive\',\'positive\',\'negative\'],dtype=object)}"},{"id":3,"text":"The GPU is adequate. The camera is average. The software is poor. The keyboard is great.","category":"{\'category\':array([\'GPU\',\'camera\',\'software\',\'keyboard\'],dtype=object),\'polarity\':array([\'neutral\',\'neutral\',\'negative\',\'positive\'],dtype=object)}"},{"id":14,"text":"The keyboard is terrible. The software is terrible.","category":"{\'category\':array([\'keyboard\',\'software\'],dtype=object),\'polarity\':array([\'negative\',\'negative\'],dtype=object)}"},{"id":22,"text":"The mousepad is decent. The battery is average. The camera is amazing. The screen is disappointing.","category":"{\'category\':array([\'mousepad\',\'battery\',\'camera\',\'screen\'],dtype=object),\'polarity\':array([\'neutral\',\'neutral\'

In [275]:
def create_prompt_bkup(system_message, examples, user_message_template):

    """
    Return a prompt message in the format expected by the OpenAI API.
    Loop through the examples and parse them as user message and assistant
    message.

    Args:
        system_message (str): Instructions for the model to execute ABSA
        examples (JSON): JSON list of examples representative of each aspect
        user_message_template (str): string with a placeholder for laptop reviews

    Output:
        few_shot_prompt (List): A list of dictionaries in the OpenAI prompt format
    """

    few_shot_prompt = [{'role':'system', 'content': system_message}]

    for example in json.loads(examples):
        example_input = example['text']
        example_absa = example['category']

        few_shot_prompt.append(
            {
                'role': 'user',
                'content': user_message_template.format(
                    laptop_review=example_input
                )
            }
        )

        few_shot_prompt.append(
            {'role': 'assistant', 'content': f"{example_absa}"}
        )

    return few_shot_prompt

In [276]:
def create_prompt(system_message, examples, user_message_template):

    """
    Return a prompt message in the format expected by the OpenAI API.
    Loop through the examples and parse them as user message and assistant
    message.

    Args:
        system_message (str): Instructions for the model to execute ABSA
        examples (JSON): JSON list of examples representative of each aspect
        user_message_template (str): string with a placeholder for laptop reviews

    Output:
        few_shot_prompt (List): A list of dictionaries in the OpenAI prompt format
    """

    few_shot_prompt = [{'role':'system', 'content': system_message}]

    for example in json.loads(examples):
        example_input = example['text']
        # Strip the NumPy array wrappers so the assistant message shows plain lists
        example_absa = example['category'].replace("array(", "").replace(",dtype=object", "").replace(")", "")

        few_shot_prompt.append(
            {
                'role': 'user',
                'content': user_message_template.format(
                    laptop_review=example_input
                )
            }
        )

        few_shot_prompt.append(
            {'role': 'assistant', 'content': f"{example_absa}"}
        )

    return few_shot_prompt

In [277]:
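# Note: sampling from the full dataset means the few-shot examples can include
# reviews that are also gold examples; laptop_reviews_examples_df excludes them
examples = create_examples(aspect_based_laptop_reviews_df)
few_shot_prompt = create_prompt(few_shot_system_message, examples, user_message_template)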

In [278]:
few_shot_prompt[:10]

[{'role': 'system',
  'content': '\nPerform aspect based sentiment analysis on laptop reviews presented in the input delimited by triple backticks, that is, ```.\nIn each review there might be one or more of the following aspects: screen, keyboard, mousepad.\nFor each review presented as input:\n- Identify if there are any of the 3 aspects (screen, keyboard, mousepad) present in the review.\n- Assign a sentiment polarity (positive, negative or neutral) for each aspect\n\nArrange your response a JSON object with the following headers:\n- category:[list of aspects]\n- polarity:[list of corresponding polarities for each aspect]}\n'},
 {'role': 'user',
  'content': '```The GPU is terrible. The keyboard is poor. The mousepad is decent.```'},
 {'role': 'assistant',
  'content': "{'category':['GPU','keyboard','mousepad'],'polarity':['negative','negative','neutral']}"},
 {'role': 'user',
  'content': '```The hardware is average. The mousepad is excellent.```'},
 {'role': 'assistant',
  'conten

In [279]:
num_tokens_from_messages(few_shot_prompt)

1189

#### Evaluate prompts

**1. Define Evaluation scorer**

In [280]:
def evaluate_prompt(prompt, gold_examples, user_message_template):

    """
    Return the accuracy score for predictions on gold examples.
    For each example, we make an ABSA prediction using the prompt.
    Gold labels and model predictions are aggregated into lists and presented to
    the compute_accuracy function.

    Args:
        prompt (List): list of messages in the OpenAI prompt format
        gold_examples (List): list of gold example records (id, text, category)
        user_message_template (str): string with a placeholder for laptop reviews

    Output:
        accuracy (float): Accuracy computed by comparing model predictions
                                with ground truth
    """

    model_predictions, ground_truths = [], []

    for example in gold_examples:
        user_input = [
            {
                'role':'user',
                'content': user_message_template.format(laptop_review=example['text'])
            }
        ]

        try:
            response = openai.ChatCompletion.create(
                deployment_id=chat_model_id,
                messages=prompt+user_input,
                temperature=0
            )

            # Convert single quotes to double quotes so the reply parses as JSON
            prediction = response['choices'][0]['message']['content'].replace("'", "\"")

            # Parse both sides before appending so the two lists stay aligned
            parsed_prediction = json.loads(prediction.strip().lower())
            parsed_truth = json.loads(example['category'].strip().lower().replace("'", '"'))

            model_predictions.append(parsed_prediction)
            ground_truths.append(parsed_truth)

        except Exception:
            # API errors and unparseable replies are skipped; they still count
            # as incorrect because accuracy is divided by len(gold_examples)
            continue

    accuracy = compute_accuracy(gold_examples, model_predictions, ground_truths)

    return accuracy
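
The evaluation loop converts single quotes to double quotes before calling `json.loads`, which can fail if a reply contains an apostrophe or extra text. A more defensive parser might look like the sketch below. `parse_model_reply` is a hypothetical helper rather than part of the OpenAI library, and returning `None` for unparseable replies is a design assumption.

```python
import ast
import json

def parse_model_reply(reply):
    """Parse a model reply as a dict, or return None if it cannot be parsed."""
    text = reply.strip()
    try:
        parsed = json.loads(text)  # strict JSON with double quotes
    except json.JSONDecodeError:
        try:
            parsed = ast.literal_eval(text)  # Python-style single-quoted dict
        except (ValueError, SyntaxError):
            return None
    return parsed if isinstance(parsed, dict) else None

print(parse_model_reply('{"category": ["screen"], "polarity": ["positive"]}'))
print(parse_model_reply("{'category':['screen'],'polarity':['positive']}"))
print(parse_model_reply("Sorry, I cannot help with that."))
```

A caller could then treat a `None` result as an incorrect prediction instead of relying on a broad `except` clause.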

**2. Evaluate zero shot prompt**

In [281]:
zero_shot_prompt

[{'role': 'system',
  'content': '\nPerform aspect based sentiment analysis on laptop reviews presented in the input delimited by triple backticks, that is, ```.\nIn each review there might be one or more of the following aspects: screen, keyboard, mousepad.\nFor each review presented as input:\n- Identify if there are any of the 3 aspects (screen, keyboard, mousepad) present in the review.\n- Assign a sentiment polarity (positive, negative or neutral) for each aspect\n\nArrange your response a JSON object with the following headers:\n- category:[list of aspects]\n- polarity:[list of corresponding polarities for each aspect]}\n'}]

In [282]:
gold_examples

[{'id': 86,
  'text': 'The RAM is amazing. The keyboard is great. The hardware is adequate. The screen is amazing.',
  'category': "{'category':['RAM','keyboard','hardware','screen'],'polarity':['positive','positive','neutral','positive']}"},
 {'id': 31,
  'text': 'The RAM is decent. The hardware is amazing.',
  'category': "{'category':['RAM','hardware'],'polarity':['neutral','positive']}"},
 {'id': 90,
  'text': 'The mousepad is adequate. The design is good. The screen is good. The keyboard is standard.',
  'category': "{'category':['mousepad','design','screen','keyboard'],'polarity':['neutral','positive','positive','neutral']}"},
 {'id': 95,
  'text': 'The RAM is good. The mousepad is excellent. The software is bad.',
  'category': "{'category':['RAM','mousepad','software'],'polarity':['positive','positive','negative']}"},
 {'id': 83,
  'text': 'The hardware is disappointing. The design is decent. The GPU is fair. The mousepad is standard.',
  'category': "{'category':['hardware','d

In [283]:
evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

0.1

**3. Evaluate few shot prompt**

In [284]:
few_shot_prompt

[{'role': 'system',
  'content': '\nPerform aspect based sentiment analysis on laptop reviews presented in the input delimited by triple backticks, that is, ```.\nIn each review there might be one or more of the following aspects: screen, keyboard, mousepad.\nFor each review presented as input:\n- Identify if there are any of the 3 aspects (screen, keyboard, mousepad) present in the review.\n- Assign a sentiment polarity (positive, negative or neutral) for each aspect\n\nArrange your response a JSON object with the following headers:\n- category:[list of aspects]\n- polarity:[list of corresponding polarities for each aspect]}\n'},
 {'role': 'user',
  'content': '```The GPU is terrible. The keyboard is poor. The mousepad is decent.```'},
 {'role': 'assistant',
  'content': "{'category':['GPU','keyboard','mousepad'],'polarity':['negative','negative','neutral']}"},
 {'role': 'user',
  'content': '```The hardware is average. The mousepad is excellent.```'},
 {'role': 'assistant',
  'conten

In [285]:
# Strip any remaining NumPy array wrappers from the assistant messages
for item in few_shot_prompt:
    if item['role'] == 'assistant':
        item['content'] = item['content'].replace("array(", "").replace(",dtype=object", "").replace(")", "")

print(few_shot_prompt)

[{'role': 'system', 'content': '\nPerform aspect based sentiment analysis on laptop reviews presented in the input delimited by triple backticks, that is, ```.\nIn each review there might be one or more of the following aspects: screen, keyboard, mousepad.\nFor each review presented as input:\n- Identify if there are any of the 3 aspects (screen, keyboard, mousepad) present in the review.\n- Assign a sentiment polarity (positive, negative or neutral) for each aspect\n\nArrange your response a JSON object with the following headers:\n- category:[list of aspects]\n- polarity:[list of corresponding polarities for each aspect]}\n'}, {'role': 'user', 'content': '```The GPU is terrible. The keyboard is poor. The mousepad is decent.```'}, {'role': 'assistant', 'content': "{'category':['GPU','keyboard','mousepad'],'polarity':['negative','negative','neutral']}"}, {'role': 'user', 'content': '```The hardware is average. The mousepad is excellent.```'}, {'role': 'assistant', 'content': "{'categor

In [286]:
evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

1.0

**4. In summary, compute the average (mean) and measure the variability (standard deviation) of the evaluation scores.**

In [287]:
num_eval_runs = 10

In [288]:
few_shot_performance = []

In [289]:
for _ in tqdm(range(num_eval_runs)):

    # For each run create a new sample of examples
    examples = create_examples(aspect_based_laptop_reviews_df)

    # Assemble the few shot prompt with these examples
    few_shot_prompt = create_prompt(few_shot_system_message, examples, user_message_template)

    # Evaluate prompt accuracy on gold examples
    few_shot_accuracy = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

    few_shot_performance.append(few_shot_accuracy)

100%|██████████| 10/10 [00:42<00:00,  4.29s/it]


In [290]:
few_shot_performance

[1.0, 0.9, 1.0, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

In [291]:
np.array(few_shot_performance).mean(), np.array(few_shot_performance).std()

(0.9800000000000001, 0.039999999999999994)
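
`np.std` computes the population standard deviation by default (`ddof=0`), whereas the `std` aggregation in pandas uses the sample form (`ddof=1`). The sketch below reproduces both on the scores listed above with the standard library's `statistics` module, as a dependency-free cross-check.

```python
import statistics

# Accuracy scores from the ten evaluation runs.
scores = [1.0, 0.9, 1.0, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

mean = statistics.fmean(scores)
population_std = statistics.pstdev(scores)  # divides by N, like np.std's default
sample_std = statistics.stdev(scores)       # divides by N - 1, like pandas' std

print(round(mean, 2), round(population_std, 2), round(sample_std, 4))
```

The two forms differ noticeably with only ten runs, so it is worth stating which one a reported figure uses.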

**Sensitivity tests:**

1. Bias check using a content-free input.
2. Check of sensitivity to the number of examples.

*Check 1: Bias*

In [293]:
bias_test_predictions = []

In [294]:
for _ in tqdm(range(25)):

    user_input = [
        {
            'role':'user',
            'content': "```' '```" #<- content free test input
        }
    ]

    response = openai.ChatCompletion.create(
        deployment_id=chat_model_id,
        messages=few_shot_prompt+user_input,
        temperature=0,
        max_tokens=2
    )

    #print(response)

    prediction = response['choices'][0]['message']['content']

    bias_test_predictions.append(prediction.strip().lower())

100%|██████████| 25/25 [00:05<00:00,  4.56it/s]


In [295]:
Counter(bias_test_predictions)

Counter({"{'category": 25})

*Check 2: Sensitivity to number of examples*

In [298]:
sample_size_sensitivity_results = []
per_class_examples_choice = [2, 3]

In [299]:
for n in tqdm(per_class_examples_choice):

    for _ in range(10):

        examples = create_examples(laptop_reviews_examples_df, n)

        few_shot_prompt = create_prompt(few_shot_system_message, examples, user_message_template)

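        # evaluate_prompt returns exact-match accuracy; it is stored under the 'micro_f1' label
        few_shot_micro_f1 = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

        # num_examples is recorded as 2*n, although create_examples draws n ids
        # for each of the three aspects (at most 3*n distinct reviews)
        sample_size_sensitivity_results.append({'num_examples': 2*n, 'micro_f1': few_shot_micro_f1})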

100%|██████████| 2/2 [01:17<00:00, 38.77s/it]


In [300]:
few_shot_micro_f1

1.0

In [301]:
pd.DataFrame(sample_size_sensitivity_results).groupby('num_examples').agg(['mean', 'std'])

| num_examples | micro_f1 (mean) | micro_f1 (std) |
|---|---|---|
| 4 | 0.94 | 0.069921 |
| 6 | 0.96 | 0.05164 |

