# ~ Rhythm Shah

~ 1233960561

~ 03/04/2025

# Code Cell 1 (5%) - Import Required Libraries and Load Data

## Import all necessary libraries for data processing, NLP, and interaction with LLMs.

In [1]:
import numpy as np
import pandas as pd
import s3fs
import json
import boto3
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score, confusion_matrix
from botocore.exceptions import ClientError

## Load restaurant_reviews_az.csv.

In [2]:
df = pd.read_csv('restaurant_reviews_az.csv')[:2000]

In [3]:
df.shape

(2000, 9)

# Code Cell 2 (5%) - Data Preprocessing

## Remove 3-star reviews from the dataset.

In [4]:
# .copy() avoids a SettingWithCopyWarning when the Sentiment column is added later
df = df[df['stars'] != 3].copy()

## Create a new column Sentiment where:
Reviews with 1 or 2 stars are labeled as 0 (Negative Sentiment).
Reviews with 4 or 5 stars are labeled as 1 (Positive Sentiment).

In [5]:
df['Sentiment'] = df['stars'].apply(lambda x: 1 if x >= 4 else 0)

## Create a dataset for this assignment by randomly selecting 50 positive reviews and 50 negative reviews

In [6]:
positive_reviews = df[df['Sentiment'] == 1].sample(n=50)
negative_reviews = df[df['Sentiment'] == 0].sample(n=50)

df1 = pd.concat([positive_reviews, negative_reviews]).reset_index(drop=True)
df1.head()

Unnamed: 0,review_id,user_id,business_id,stars,useful,funny,cool,text,date,Sentiment
0,cARyBadkK5vj8veqnkY5Hg,P-5Wo3jjX-tn6-3jZzqqIw,Ei5HBqe012ImhqEr2ZH2gg,5,0,0,0,Delicious. Wish we lived in the area we would ...,2020-06-13 18:48:12,1
1,KTtV66u7648dDrOAHmXzvA,_o-g7aXHfLfBs3cNkvDngA,-3-6BB10tIWNKGEF0Es2BA,5,0,0,0,We were craving food from here where you reall...,2021-08-05 00:47:14,1
2,eCx23JnRfj0TRc4HCDTSKw,s6fzF2pSnUebFyVhGjh_SQ,lhsQkb5nhf-Kd5OvgB9MNg,5,1,0,0,"Been skeptical about going out to eat, but fam...",2020-06-30 03:40:12,1
3,W2AtefJfe983yRvXfHxGhg,meQCVrrJxwdjS_1BFooTUg,gMpYdAe1lZWDuBRw2bPEhg,5,0,0,0,My first time trying their food. Absolutely de...,2021-04-20 01:53:18,1
4,rLf-518KWHtM0bh8jEkQeQ,pgUQjUl_2G5L9EreDzIjSw,UCMSWPqzXjd7QHq7v8PJjQ,4,6,2,5,This place has been on my list to try since my...,2020-03-19 21:16:35,1


In [7]:
df1.shape

(100, 10)
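
Because `sample` is called without a seed, each run draws a different set of 100 reviews, so the metrics reported below can shift between runs. As a minimal, library-free sketch of seeded balanced sampling (the function `balanced_sample` and the toy records are hypothetical; in pandas, passing `random_state` to `sample` serves the same purpose):

```python
import random

def balanced_sample(records, label_key, n_per_class, seed=42):
    """Draw n_per_class records for each label value using a fixed seed."""
    rng = random.Random(seed)  # local generator, so the draw is repeatable
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)
    sample = []
    for label in sorted(by_label):
        sample.extend(rng.sample(by_label[label], n_per_class))
    return sample

# Toy data standing in for the review DataFrame (hypothetical values).
toy = [{"text": f"review {i}", "Sentiment": i % 2} for i in range(20)]
first = balanced_sample(toy, "Sentiment", 3)
second = balanced_sample(toy, "Sentiment", 3)
print(first == second)  # the same seed selects the same rows
```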

# Code Cell 3 (20%) - Perform Sentiment Analysis Using Zero-Shot Learning

## Use Claude 3 Sonnet for zero-shot prompting.

In [8]:
client = boto3.client("bedrock-runtime", region_name = "us-east-1")

## Predict sentiment labels for the selected 100 reviews without providing any labeled training examples.

In [9]:
def LLM(data, model_id):
    """Classify each text in `data` as Positive or Negative with the given Bedrock model."""
    results = []

    for i in data:
        # Single-turn prompt that asks for a one-word label
        messages = [{"role": "user", "content": f"In only one word classify the sentiment of '{i}' into either Positive or Negative only. Do not use any other words such as neutral for classification. Use only Positive or Negative"}]

        # Request body in the Anthropic Messages format used by Claude models on Bedrock
        request_body = {"anthropic_version": "bedrock-2023-05-31", "messages": messages, "max_tokens": 512, "temperature": 0.2, "top_p": 1.0}

        try:
            response = client.invoke_model(modelId = model_id, contentType = 'application/json', body = json.dumps(request_body))
            result = json.loads(response['body'].read().decode())
            sentiment = result['content'][0]['text'].strip()
            results.append(sentiment)
        except Exception as e:
            # Record a placeholder so the results stay aligned with the input rows
            print(f"Error processing text '{i}': {e}")
            results.append("Error")
    return results

I wrote this function generically so that it can be reused across models: I only need to pass the model ID along with the text data. Because the request body follows the Anthropic Messages format, it applies to Claude models as written.
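
One thing the function does not guard against is a reply such as `positive` or `Positive.`, which would not match the label mapping used in the evaluation step. A minimal sketch of a normalizer, assuming the reply consists of the label word plus optional punctuation (`normalize_label` is a hypothetical helper, not part of the notebook):

```python
def normalize_label(raw):
    """Map a raw model reply to 'Positive', 'Negative', or 'Error'."""
    # Drop surrounding whitespace, then common wrapping punctuation.
    cleaned = raw.strip().strip("\"'`*.!").lower()
    if cleaned in ("positive", "negative"):
        return cleaned.capitalize()
    return "Error"

for reply in ["Positive", "positive.", " Negative\n", "Neutral"]:
    print(repr(reply), "->", normalize_label(reply))
```

Anything that is not clearly one of the two labels is returned as `"Error"`, so it can be counted and inspected rather than silently mapped.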

In [10]:
Claude3 = 'anthropic.claude-3-sonnet-20240229-v1:0'

df1['Claude3_Prediction'] = LLM(df1['text'], Claude3)

In [11]:
df1.head()

Unnamed: 0,review_id,user_id,business_id,stars,useful,funny,cool,text,date,Sentiment,Claude3_Prediction
0,cARyBadkK5vj8veqnkY5Hg,P-5Wo3jjX-tn6-3jZzqqIw,Ei5HBqe012ImhqEr2ZH2gg,5,0,0,0,Delicious. Wish we lived in the area we would ...,2020-06-13 18:48:12,1,Positive
1,KTtV66u7648dDrOAHmXzvA,_o-g7aXHfLfBs3cNkvDngA,-3-6BB10tIWNKGEF0Es2BA,5,0,0,0,We were craving food from here where you reall...,2021-08-05 00:47:14,1,Positive
2,eCx23JnRfj0TRc4HCDTSKw,s6fzF2pSnUebFyVhGjh_SQ,lhsQkb5nhf-Kd5OvgB9MNg,5,1,0,0,"Been skeptical about going out to eat, but fam...",2020-06-30 03:40:12,1,Positive
3,W2AtefJfe983yRvXfHxGhg,meQCVrrJxwdjS_1BFooTUg,gMpYdAe1lZWDuBRw2bPEhg,5,0,0,0,My first time trying their food. Absolutely de...,2021-04-20 01:53:18,1,Positive
4,rLf-518KWHtM0bh8jEkQeQ,pgUQjUl_2G5L9EreDzIjSw,UCMSWPqzXjd7QHq7v8PJjQ,4,6,2,5,This place has been on my list to try since my...,2020-03-19 21:16:35,1,Positive


In [12]:
df1['Claude3_Prediction'].value_counts()

Claude3_Prediction
Negative    51
Positive    49
Name: count, dtype: int64

## Evaluate model performance using precision, recall, f1, and accuracy.

In [13]:
def evaluate_model_performance(pred_col, true_col = "Sentiment"):
    y_true = df1[true_col]
    y_pred = df1[pred_col].map({"Positive": 1, "Negative": 0})
    
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    
    print(f"Accuracy: {accuracy * 100:.2f}%")
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1-Score: {f1:.2f}")

I wrote a general function to calculate the model metrics.

In [14]:
evaluate_model_performance("Claude3_Prediction")

Accuracy: 97.00%
Precision: 0.98
Recall: 0.96
F1-Score: 0.97
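
These figures can be reproduced by hand from the confusion counts. The counts below are inferred from the reported metrics and the 49/51 prediction split (48 true positives, 1 false positive, 2 false negatives, 49 true negatives); `metrics_from_counts` is a hypothetical helper shown only to make the arithmetic explicit:

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    Assumes at least one predicted positive and one actual positive,
    so none of the denominators is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics_from_counts(tp=48, fp=1, fn=2, tn=49)
print(f"{acc:.2f} {prec:.2f} {rec:.2f} {f1:.2f}")  # 0.97 0.98 0.96 0.97
```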


# Code Cell 4 (20%) - Perform Sentiment Analysis Using Few-Shot Learning

## Select a few examples for few-shot learning

In [15]:
few_shot_prefix = """
Examples:
1. "Amazing food, great service!" -> Positive
2. "Terrible experience, rude staff." -> Negative
3. "It was okay, nothing special." -> Positive
4. "Slow delivery, cold meal." -> Negative
5. "The atmosphere was cozy and inviting." -> Positive
6. "Overpriced and underwhelming quality." -> Negative
7. "Friendly team, quick response time." -> Positive
8. "Broken product, no refund offered." -> Negative
9. "Exceeded my expectations, highly recommend!" -> Positive
10. "Disappointing, wouldn’t come back." -> Negative
"""

## Use the example(s) to guide the LLM in classifying sentiment for the selected 100 reviews.

In [16]:
df1['Claude3_FewShot_Prediction'] = LLM(few_shot_prefix + df1['text'], Claude3)
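
Note that concatenating the prefix with the review places both inside the quoted span of the prompt built by `LLM`. An alternative layout, sketched below, keeps the examples and the target review in separate parts of the prompt. `build_few_shot_prompt` and its wording are hypothetical and were not used to produce the results in this notebook.

```python
def build_few_shot_prompt(examples, review):
    """Format labeled examples, followed by the review to classify."""
    lines = [
        "Classify the sentiment of the review as Positive or Negative.",
        "Answer with one word.",
        "",
        "Examples:",
    ]
    for i, (text, label) in enumerate(examples, start=1):
        lines.append(f'{i}. "{text}" -> {label}')
    # The target review goes last, clearly separated from the examples.
    lines += ["", f'Review: "{review}"', "Sentiment:"]
    return "\n".join(lines)

examples = [
    ("Amazing food, great service!", "Positive"),
    ("Terrible experience, rude staff.", "Negative"),
]
print(build_few_shot_prompt(examples, "Slow service but the tacos were worth it."))
```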

In [17]:
df1.Claude3_FewShot_Prediction.value_counts()

Claude3_FewShot_Prediction
Negative    51
Positive    49
Name: count, dtype: int64

## Evaluate model performance using precision, recall, f1, and accuracy.

In [18]:
evaluate_model_performance("Claude3_FewShot_Prediction")

Accuracy: 99.00%
Precision: 1.00
Recall: 0.98
F1-Score: 0.99


# Code Cell 5 (20%) - Experiment with Multiple LLMs

## Checking for working models

In [19]:
bedrock = boto3.client('bedrock', region_name='us-east-1')
response = bedrock.list_foundation_models()

for model in response['modelSummaries']:
    print(f"Model Name: {model['modelName']}, Model ID: {model['modelId']}, Provider: {model['providerName']}")

Model Name: Titan Text Large, Model ID: amazon.titan-tg1-large, Provider: Amazon
Model Name: Titan Image Generator G1, Model ID: amazon.titan-image-generator-v1:0, Provider: Amazon
Model Name: Titan Image Generator G1, Model ID: amazon.titan-image-generator-v1, Provider: Amazon
Model Name: Titan Image Generator G1 v2, Model ID: amazon.titan-image-generator-v2:0, Provider: Amazon
Model Name: Titan Text G1 - Premier, Model ID: amazon.titan-text-premier-v1:0, Provider: Amazon
Model Name: Nova Pro, Model ID: amazon.nova-pro-v1:0:300k, Provider: Amazon
Model Name: Nova Pro, Model ID: amazon.nova-pro-v1:0, Provider: Amazon
Model Name: Nova Lite, Model ID: amazon.nova-lite-v1:0:300k, Provider: Amazon
Model Name: Nova Lite, Model ID: amazon.nova-lite-v1:0, Provider: Amazon
Model Name: Nova Canvas, Model ID: amazon.nova-canvas-v1:0, Provider: Amazon
Model Name: Nova Reel, Model ID: amazon.nova-reel-v1:0, Provider: Amazon
Model Name: Nova Micro, Model ID: amazon.nova-micro-v1:0:128k, Provider: A

In [20]:
# Initialize Bedrock client
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Full list of models returned by list_foundation_models
models_to_test = [
    {"name": "Titan Text Large", "id": "amazon.titan-tg1-large", "provider": "Amazon"},
    {"name": "Titan Image Generator G1", "id": "amazon.titan-image-generator-v1:0", "provider": "Amazon"},
    {"name": "Titan Image Generator G1", "id": "amazon.titan-image-generator-v1", "provider": "Amazon"},
    {"name": "Titan Image Generator G1 v2", "id": "amazon.titan-image-generator-v2:0", "provider": "Amazon"},
    {"name": "Titan Text G1 - Premier", "id": "amazon.titan-text-premier-v1:0", "provider": "Amazon"},
    {"name": "Nova Pro", "id": "amazon.nova-pro-v1:0:300k", "provider": "Amazon"},
    {"name": "Nova Pro", "id": "amazon.nova-pro-v1:0", "provider": "Amazon"},
    {"name": "Nova Lite", "id": "amazon.nova-lite-v1:0:300k", "provider": "Amazon"},
    {"name": "Nova Lite", "id": "amazon.nova-lite-v1:0", "provider": "Amazon"},
    {"name": "Nova Canvas", "id": "amazon.nova-canvas-v1:0", "provider": "Amazon"},
    {"name": "Nova Reel", "id": "amazon.nova-reel-v1:0", "provider": "Amazon"},
    {"name": "Nova Micro", "id": "amazon.nova-micro-v1:0:128k", "provider": "Amazon"},
    {"name": "Nova Micro", "id": "amazon.nova-micro-v1:0", "provider": "Amazon"},
    {"name": "Titan Text Embeddings v2", "id": "amazon.titan-embed-g1-text-02", "provider": "Amazon"},
    {"name": "Titan Text G1 - Lite", "id": "amazon.titan-text-lite-v1:0:4k", "provider": "Amazon"},
    {"name": "Titan Text G1 - Lite", "id": "amazon.titan-text-lite-v1", "provider": "Amazon"},
    {"name": "Titan Text G1 - Express", "id": "amazon.titan-text-express-v1:0:8k", "provider": "Amazon"},
    {"name": "Titan Text G1 - Express", "id": "amazon.titan-text-express-v1", "provider": "Amazon"},
    {"name": "Titan Embeddings G1 - Text", "id": "amazon.titan-embed-text-v1:2:8k", "provider": "Amazon"},
    {"name": "Titan Embeddings G1 - Text", "id": "amazon.titan-embed-text-v1", "provider": "Amazon"},
    {"name": "Titan Text Embeddings V2", "id": "amazon.titan-embed-text-v2:0:8k", "provider": "Amazon"},
    {"name": "Titan Text Embeddings V2", "id": "amazon.titan-embed-text-v2:0", "provider": "Amazon"},
    {"name": "Titan Multimodal Embeddings G1", "id": "amazon.titan-embed-image-v1:0", "provider": "Amazon"},
    {"name": "Titan Multimodal Embeddings G1", "id": "amazon.titan-embed-image-v1", "provider": "Amazon"},
    {"name": "SDXL 1.0", "id": "stability.stable-diffusion-xl-v1:0", "provider": "Stability AI"},
    {"name": "SDXL 1.0", "id": "stability.stable-diffusion-xl-v1", "provider": "Stability AI"},
    {"name": "J2 Grande Instruct", "id": "ai21.j2-grande-instruct", "provider": "AI21 Labs"},
    {"name": "J2 Jumbo Instruct", "id": "ai21.j2-jumbo-instruct", "provider": "AI21 Labs"},
    {"name": "Jurassic-2 Mid", "id": "ai21.j2-mid", "provider": "AI21 Labs"},
    {"name": "Jurassic-2 Mid", "id": "ai21.j2-mid-v1", "provider": "AI21 Labs"},
    {"name": "Jurassic-2 Ultra", "id": "ai21.j2-ultra", "provider": "AI21 Labs"},
    {"name": "Jurassic-2 Ultra", "id": "ai21.j2-ultra-v1:0:8k", "provider": "AI21 Labs"},
    {"name": "Jurassic-2 Ultra", "id": "ai21.j2-ultra-v1", "provider": "AI21 Labs"},
    {"name": "Jamba-Instruct", "id": "ai21.jamba-instruct-v1:0", "provider": "AI21 Labs"},
    {"name": "Jamba 1.5 Large", "id": "ai21.jamba-1-5-large-v1:0", "provider": "AI21 Labs"},
    {"name": "Jamba 1.5 Mini", "id": "ai21.jamba-1-5-mini-v1:0", "provider": "AI21 Labs"},
    {"name": "Claude Instant", "id": "anthropic.claude-instant-v1:2:100k", "provider": "Anthropic"},
    {"name": "Claude Instant", "id": "anthropic.claude-instant-v1", "provider": "Anthropic"},
    {"name": "Claude", "id": "anthropic.claude-v2:0:18k", "provider": "Anthropic"},
    {"name": "Claude", "id": "anthropic.claude-v2:0:100k", "provider": "Anthropic"},
    {"name": "Claude", "id": "anthropic.claude-v2:1:18k", "provider": "Anthropic"},
    {"name": "Claude", "id": "anthropic.claude-v2:1:200k", "provider": "Anthropic"},
    {"name": "Claude", "id": "anthropic.claude-v2:1", "provider": "Anthropic"},
    {"name": "Claude", "id": "anthropic.claude-v2", "provider": "Anthropic"},
    {"name": "Claude 3 Sonnet", "id": "anthropic.claude-3-sonnet-20240229-v1:0:28k", "provider": "Anthropic"},
    {"name": "Claude 3 Sonnet", "id": "anthropic.claude-3-sonnet-20240229-v1:0:200k", "provider": "Anthropic"},
    {"name": "Claude 3 Sonnet", "id": "anthropic.claude-3-sonnet-20240229-v1:0", "provider": "Anthropic"},
    {"name": "Claude 3 Haiku", "id": "anthropic.claude-3-haiku-20240307-v1:0:48k", "provider": "Anthropic"},
    {"name": "Claude 3 Haiku", "id": "anthropic.claude-3-haiku-20240307-v1:0:200k", "provider": "Anthropic"},
    {"name": "Claude 3 Haiku", "id": "anthropic.claude-3-haiku-20240307-v1:0", "provider": "Anthropic"},
    {"name": "Claude 3 Opus", "id": "anthropic.claude-3-opus-20240229-v1:0:12k", "provider": "Anthropic"},
    {"name": "Claude 3 Opus", "id": "anthropic.claude-3-opus-20240229-v1:0:28k", "provider": "Anthropic"},
    {"name": "Claude 3 Opus", "id": "anthropic.claude-3-opus-20240229-v1:0:200k", "provider": "Anthropic"},
    {"name": "Claude 3 Opus", "id": "anthropic.claude-3-opus-20240229-v1:0", "provider": "Anthropic"},
    {"name": "Claude 3.5 Sonnet", "id": "anthropic.claude-3-5-sonnet-20240620-v1:0", "provider": "Anthropic"},
    {"name": "Claude 3.5 Sonnet v2", "id": "anthropic.claude-3-5-sonnet-20241022-v2:0", "provider": "Anthropic"},
    {"name": "Claude 3.7 Sonnet", "id": "anthropic.claude-3-7-sonnet-20250219-v1:0", "provider": "Anthropic"},
    {"name": "Claude 3.5 Haiku", "id": "anthropic.claude-3-5-haiku-20241022-v1:0", "provider": "Anthropic"},
    {"name": "Command", "id": "cohere.command-text-v14:7:4k", "provider": "Cohere"},
    {"name": "Command", "id": "cohere.command-text-v14", "provider": "Cohere"},
    {"name": "Command R", "id": "cohere.command-r-v1:0", "provider": "Cohere"},
    {"name": "Command R+", "id": "cohere.command-r-plus-v1:0", "provider": "Cohere"},
    {"name": "Command Light", "id": "cohere.command-light-text-v14:7:4k", "provider": "Cohere"},
    {"name": "Command Light", "id": "cohere.command-light-text-v14", "provider": "Cohere"},
    {"name": "Embed English", "id": "cohere.embed-english-v3:0:512", "provider": "Cohere"},
    {"name": "Embed English", "id": "cohere.embed-english-v3", "provider": "Cohere"},
    {"name": "Embed Multilingual", "id": "cohere.embed-multilingual-v3:0:512", "provider": "Cohere"},
    {"name": "Embed Multilingual", "id": "cohere.embed-multilingual-v3", "provider": "Cohere"},
    {"name": "Llama 3 8B Instruct", "id": "meta.llama3-8b-instruct-v1:0", "provider": "Meta"},
    {"name": "Llama 3 70B Instruct", "id": "meta.llama3-70b-instruct-v1:0", "provider": "Meta"},
    {"name": "Llama 3.1 8B Instruct", "id": "meta.llama3-1-8b-instruct-v1:0", "provider": "Meta"},
    {"name": "Llama 3.1 70B Instruct", "id": "meta.llama3-1-70b-instruct-v1:0", "provider": "Meta"},
    {"name": "Llama 3.2 11B Instruct", "id": "meta.llama3-2-11b-instruct-v1:0", "provider": "Meta"},
    {"name": "Llama 3.2 90B Instruct", "id": "meta.llama3-2-90b-instruct-v1:0", "provider": "Meta"},
    {"name": "Llama 3.2 1B Instruct", "id": "meta.llama3-2-1b-instruct-v1:0", "provider": "Meta"},
    {"name": "Llama 3.2 3B Instruct", "id": "meta.llama3-2-3b-instruct-v1:0", "provider": "Meta"},
    {"name": "Llama 3.3 70B Instruct", "id": "meta.llama3-3-70b-instruct-v1:0", "provider": "Meta"},
    {"name": "Mistral 7B Instruct", "id": "mistral.mistral-7b-instruct-v0:2", "provider": "Mistral AI"},
    {"name": "Mixtral 8x7B Instruct", "id": "mistral.mixtral-8x7b-instruct-v0:1", "provider": "Mistral AI"},
    {"name": "Mistral Large (24.02)", "id": "mistral.mistral-large-2402-v1:0", "provider": "Mistral AI"},
    {"name": "Mistral Small (24.02)", "id": "mistral.mistral-small-2402-v1:0", "provider": "Mistral AI"}
]

# Test input
test_input = "I like this!"
prompt = f"In only one word classify the sentiment of '{test_input}' into either Positive or Negative only. Do not use any other words for classification. Use only Positive or Negative"

def test_model(model):
    # Customize request body based on provider
    if "anthropic" in model["id"]:
        request_body = {
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 10,  # Small output expected
            "temperature": 0.1,
            "top_p": 1.0
        }
    elif "amazon" in model["id"] and "embed" in model["id"]:
        # Embedding models expect different input
        request_body = {"inputText": test_input}
    elif "amazon" in model["id"] and "image" in model["id"]:
        # Image models need image-specific input; skip for text test
        return "Skipped", "Image generation model, not applicable for text sentiment"
    else:
        # Generic prompt body for the remaining text models; request schemas differ
        # by provider (Amazon, Meta, Mistral, AI21, Cohere), so some may reject it
        request_body = {
            "prompt": prompt,
            "max_tokens": 10,
            "temperature": 0.1
        }

    try:
        response = client.invoke_model(
            modelId=model["id"],
            contentType='application/json',
            body=json.dumps(request_body)
        )
        result = json.loads(response['body'].read().decode())

        # Extract output based on provider response structure
        if "anthropic" in model["id"]:
            output = result['content'][0]['text'].strip()
        elif "amazon" in model["id"] and "embed" in model["id"]:
            return "Skipped", "Embedding model, not applicable for sentiment"
        elif "amazon" in model["id"]:
            output = result.get('results', [{}])[0].get('outputText', 'Unknown').strip()
        elif "meta" in model["id"] or "mistral" in model["id"]:
            output = result.get('generation', result.get('text', 'Unknown')).strip()
        elif "ai21" in model["id"]:
            output = result.get('completions', [{}])[0].get('data', {}).get('text', 'Unknown').strip()
        elif "cohere" in model["id"] and "embed" not in model["id"]:
            output = result.get('generations', [{}])[0].get('text', 'Unknown').strip()
        else:
            output = result.get('text', 'Unknown').strip()

        # Validate response
        if output in ["Positive", "Negative"]:
            return "Success", output
        else:
            return "Invalid", f"Unexpected output: {output}"
    except ClientError as e:
        return "Error", str(e)
    except Exception as e:
        return "Error", f"Unexpected error: {str(e)}"

# Run tests
print("Testing Bedrock Models (March 5, 2025):")
print("-" * 80)
res=[]
for model in models_to_test:
    status, details = test_model(model)
    print(f"Model: {model['name']:<25} | ID: {model['id']:<40} | Status: {status:<10} | Details: {details}")
    if status == "Success":
        res.append(model['name'])

Testing Bedrock Models (March 5, 2025):
--------------------------------------------------------------------------------
Model: Titan Text Large          | ID: amazon.titan-tg1-large                   | Status: Error      | Details: An error occurred (AccessDeniedException) when calling the InvokeModel operation: You don't have access to the model with the specified model ID.
Model: Titan Image Generator G1  | ID: amazon.titan-image-generator-v1:0        | Status: Skipped    | Details: Image generation model, not applicable for text sentiment
Model: Titan Image Generator G1  | ID: amazon.titan-image-generator-v1          | Status: Skipped    | Details: Image generation model, not applicable for text sentiment
Model: Titan Image Generator G1 v2 | ID: amazon.titan-image-generator-v2:0        | Status: Skipped    | Details: Image generation model, not applicable for text sentiment
Model: Titan Text G1 - Premier   | ID: amazon.titan-text-premier-v1:0           | Status: Error      | Detail

In [21]:
print("Models working:")
print(res)

Models working:
['Claude 3 Sonnet', 'Claude 3.5 Sonnet']


## Only two models work: Claude 3 Sonnet and Claude 3.5 Sonnet
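
Access errors aside, some failures in a sweep like this can come from provider-specific request schemas rather than from the models themselves. Bedrock's Converse API accepts a single message format across the chat models that support it, which could simplify the test. A minimal sketch, assuming the account has access to the model and the model supports Converse; `classify_with_converse` is a hypothetical helper and was not run for this notebook:

```python
def classify_with_converse(client, model_id, text, max_tokens=10, temperature=0.1):
    """Ask a Bedrock chat model for a one-word sentiment label via Converse.

    `client` is expected to be a boto3 "bedrock-runtime" client.
    """
    prompt = (
        f"In only one word classify the sentiment of '{text}' into either "
        "Positive or Negative only. Use only Positive or Negative"
    )
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": max_tokens, "temperature": temperature},
    )
    # The reply text sits in the first content block of the output message.
    return response["output"]["message"]["content"][0]["text"].strip()
```

With valid credentials it would be called as `classify_with_converse(client, model["id"], test_input)` inside the test loop. Embedding and image models would still need to be skipped.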

## Select two distinct LLMs (e.g., Claude, LLaMA) for sentiment analysis.

### Model 1: Claude 3 Sonnet

In [22]:
df1['Claude3_Prediction'].value_counts()

Claude3_Prediction
Negative    51
Positive    49
Name: count, dtype: int64

In [23]:
evaluate_model_performance("Claude3_Prediction")

Accuracy: 97.00%
Precision: 0.98
Recall: 0.96
F1-Score: 0.97


### Model 2: Claude 3.5 Sonnet

In [24]:
Claude3_5 = 'anthropic.claude-3-5-sonnet-20240620-v1:0'

df1['Claude3_5_Prediction'] = LLM(df1['text'], Claude3_5)

In [25]:
df1['Claude3_5_Prediction'].value_counts()

Claude3_5_Prediction
Positive    50
Negative    50
Name: count, dtype: int64

In [26]:
evaluate_model_performance("Claude3_5_Prediction")

Accuracy: 98.00%
Precision: 0.98
Recall: 0.98
F1-Score: 0.98


### Display the output

In [27]:
print("All Claude3 Predictions:")
print(df1['Claude3_Prediction'].tolist())

print("\nAll Claude3.5 Predictions:")
print(df1['Claude3_5_Prediction'].tolist())

All Claude3 Predictions:
['Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Negative', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Negative', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Positive', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative', 'N

In [28]:
comparison = pd.DataFrame({'Claude3': df1['Claude3_Prediction'].value_counts(), 'Claude3.5': df1['Claude3_5_Prediction'].value_counts()})
print("\nComparison of Prediction Counts:")
print(comparison)


Comparison of Prediction Counts:
          Claude3  Claude3.5
Negative       51         50
Positive       49         50


## Text Cell 6 (20%) - Discussion and Observations

### Compare and contrast the performance of zero-shot and few-shot learning.

Few-shot learning slightly outperforms zero-shot in my case, leveraging a few examples to achieve near-perfect results (99% accuracy, 1 error) compared to zero-shot’s strong but imperfect performance (97% accuracy, 3 errors). Zero-shot remains highly effective for a no-data scenario, while few-shot excels with minimal supervision, making it the better choice when a small labeled set is feasible.
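
With only 100 reviews, a two-point accuracy gap corresponds to two reviews, so it is worth checking how much uncertainty surrounds each figure. As a rough sketch, the Wilson score interval at about 95% confidence (z = 1.96) can be computed for both accuracies. This is an added illustration rather than part of the original analysis, and `wilson_interval` is a hypothetical helper:

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

zero_shot = wilson_interval(97, 100)
few_shot = wilson_interval(99, 100)
print(f"zero-shot: {zero_shot[0]:.3f} to {zero_shot[1]:.3f}")
print(f"few-shot:  {few_shot[0]:.3f} to {few_shot[1]:.3f}")
```

The two intervals overlap for these counts, so the gap between the two settings is better read as suggestive than as conclusive.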

### Identify cases where LLM predictions differ from actual labels.

#### Claude 3 (Zero-Shot):
* 1 False Positive: 1 sample predicted "Positive" but actually "Negative".
* 2 False Negatives: 2 samples predicted "Negative" but actually "Positive".

Total: 3 mismatches.

#### Claude 3 (Few-Shot):
* 1 False Negative: 1 sample predicted "Negative" but actually "Positive".

Total: 1 mismatch.

#### Claude 3.5:
* 1 False Positive: 1 sample predicted "Positive" but actually "Negative".
* 1 False Negative: 1 sample predicted "Negative" but actually "Positive".

Total: 2 mismatches.
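
The counts above follow from comparing each prediction with its label. A minimal sketch of how the individual rows could be located for inspection, using toy lists in place of `df1['Sentiment']` and the mapped prediction column (`find_mismatches` is a hypothetical helper):

```python
def find_mismatches(y_true, y_pred):
    """Return indices of false positives and false negatives (1 = Positive)."""
    pairs = list(enumerate(zip(y_true, y_pred)))
    false_pos = [i for i, (t, p) in pairs if t == 0 and p == 1]
    false_neg = [i for i, (t, p) in pairs if t == 1 and p == 0]
    return false_pos, false_neg

# Toy labels, not the notebook's data.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
fp, fn = find_mismatches(y_true, y_pred)
print(fp, fn)  # [3] [1]
```

The returned indices can then be used to read the corresponding review text and judge whether the star rating or the model is the more plausible source of the disagreement.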

### Analyze potential reasons for misclassifications.

General reasons for misclassification:
* Ambiguity: Sentiment often depends on context (e.g., "cool" can be positive or negative), which LLMs might misjudge without full context.
* Sarcasm/Irony: All models may struggle with non-literal language (e.g., "Great job breaking it" → Negative).
* Data Bias: Training data or few-shot examples might over-represent clear-cut cases, leaving edge cases vulnerable.
* Negation Handling: Phrases like "not terrible" (Positive) might be misread as Negative due to "terrible."

Potential reasons for misclassifications in my models:
* Claude 3 (Zero-Shot): Misclassifies through overgeneralization (1 FP) and by missing subtle positives (2 FN), which is expected when the prompt contains no examples.
* Claude 3 (Few-Shot): Minimal errors (1 FN), likely from an edge case the examples do not cover, showing the benefit of few-shot prompting.
* Claude 3.5: Balanced errors (1 FP, 1 FN) suggest a more refined model that is still not immune to ambiguity or subtlety.

### Discuss the differences in outputs from various LLMs

* Claude 3 (Zero-Shot): Outputs are the least accurate (3 errors), slightly over-predicting negatives (51 predicted vs. 50 actual) and missing two positives, with no examples in the prompt to guide it.
* Claude 3 (Few-Shot): Outputs are the most accurate (1 error), with perfect precision, guided by the examples in the prompt.
* Claude 3.5: Outputs improve over Claude 3 zero-shot (2 errors vs. 3) with balanced precision and recall, but do not match the few-shot run's perfect precision.

## Text Cell 7 (5%) - Acknowledge if you have used any GenAI tools or collaborated with others.

I used ChatGPT and Grok AI for interpretation and to resolve bugs and errors in the code.

## HTML output (5%) - Convert the notebook to an HTML file ensuring all outputs are visible

In [1]:
!jupyter nbconvert "LA6_Shah_Rhythm.ipynb" --to html

[NbConvertApp] Converting notebook LA6_Shah_Rhythm.ipynb to html
[NbConvertApp] Writing 416455 bytes to LA6_Shah_Rhythm.html
