# LLM Groq

This notebook uses the Groq API to generate predictions on the test set from several LLMs (Llama, Qwen, GPT).

## 0. Setup

In [4]:
# Install dependencies
! pip install -r requirements.txt



Load libraries

In [None]:
import json
import os
import time


from groq import Groq
import numpy as np
import pandas as pd
from dotenv import load_dotenv

load_dotenv()


True

To use the models served by Groq, **you must obtain a Groq API key from [here](https://console.groq.com/keys) and place it in the `.env` file in the project root as `GROQ_API_KEY=XXXXX`. Without a valid API key**, the models cannot be called successfully.
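
By default the `Groq()` client reads its key from the `GROQ_API_KEY` environment variable, so a missing key only surfaces when the client is created or the first request is sent. As a minimal sketch, a small check right after `load_dotenv()` can fail earlier with a clearer message; `require_env` is a hypothetical helper, not part of the Groq SDK or python-dotenv:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return value


# Usage in the notebook, after load_dotenv():
# require_env("GROQ_API_KEY")
```

Calling the helper once at the top of the notebook avoids discovering a missing key partway through a long batch run.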

## 1. Define Prompting Function

To keep the notebook clean and make prompts and outputs consistently formatted, I defined a prompt generator that builds each prompt from the provided arguments.

In [13]:
class PromptGenerator:
    def __init__(self, few_shot: bool, cot: bool, binary: bool = False) -> None:
        self.few_shot = few_shot
        self.cot = cot
        self.binary = binary

    def generate_general_instruction(self, batch_size:int) -> str:
        '''
            Create general instruction for sentiment analysis task
        '''
        if self.binary:
            sentiment_scale = """3.  **Sentiment Scale:** 0 = Negative and 1 =  Positive."""
        else:
            sentiment_scale = """3.  **Sentiment Scale:** Use a 5-point star rating (0 = Very Negative, 4 = Very Positive)."""
        
        general_instruction = f"""
            Analyze the sentiment for the {batch_size} Amazon product reviews provided below.
            The unique index for each review is provided in the '<review id="...">' tag.

            # --- INSTRUCTIONS & CONSTRAINTS ---
            1.  **Strict Output:** Your final output MUST be a single, valid JSON object containing a 'reviews' array.
            2.  **Indexing:** The 'index' field in your JSON output MUST correspond exactly to the 'id' extracted from the <review id="..."> tag.
            {sentiment_scale}
            4.  **No Explanation:** Do NOT include any introductory text, explanation, your thought process, or any Markdown fences (like ```json or ```) outside of the required JSON object.

        """
        
        return general_instruction
    
    def generate_cot_instruction(self) -> str:
        '''
            Create Chain-of-Thought instruction for sentiment analysis task
        '''
        if self.binary:
            scale = """4. Assign the final sentiment rating (0 or 1)."""
        else:
            scale = """4. Assign the final sentiment rating (0, 1, 2, 3, or 4)."""
            
        cot_instruction = f"""
            # --- CHAIN OF THOUGHT (CoT) PROCESS ---
            For each review, you MUST perform a Chain-of-Thought process and enclose it in a <CoT> XML tag. This process helps ensure accuracy. Your reasoning must follow these steps:
            <CoT>
            1. Identify the main sentiment/emotion (e.g., happiness, frustration, disappointment).
            2. List specific positive aspects (+ve) and negative aspects (-ve) mentioned in the review.
            3. Evaluate the overall net sentiment, giving appropriate weight to pros and cons.
            {scale}
            </CoT>
            
            You MUST include this <CoT> reasoning for each review in your response.
            """
        
        return cot_instruction

    def generate_few_shot_examples(self) -> list:
        '''
            Create few-shot examples for sentiment analysis task.
            I randomly selected some examples from the training set to illustrate both binary and multi-class sentiment analysis.
        '''
        if self.binary:
            # Plain (non f-string) literals: no brace doubling or quote escaping is needed
            few_shot_examples = [
                # --- Example 1 ---
                {
                    "role":"user",
                    "content":"""<review id='1'>So glad I could get my deodorant online at Amazon. This has a great scent too.</review>""",
                },
                {
                    "role":"assistant",
                    "content":"""{"index": "1", "sentiment_rating": "1"}""",
                },
                # --- Example 2 ---
                {
                    "role":"user",
                    "content":"""<review id='5'>It is not organic , it's made in china, left my hair dry ... returning .</review>""",
                },
                {
                    "role":"assistant",
                    "content":"""{"index": "5", "sentiment_rating": "0"}""",
                },
                # --- End of Few-Shot Examples ---
            ]
        else:
            # Plain (non f-string) literals: no brace doubling or quote escaping is needed
            few_shot_examples = [
                # --- Example 1 ---
                {
                    "role":"user",
                    "content":"""<review id='1'>So glad I could get my deodorant online at Amazon. This has a great scent too.</review>""",
                },
                {
                    "role":"assistant",
                    "content":"""{"index": "1", "sentiment_rating": "4"}""",
                },
                # --- Example 2 ---
                {
                    "role":"user",
                    "content":"""<review id='2'>extremely metallic, two coats does the trick. however, the chemical smell is EXTREMELY strong. you need to open a window and run a fan while applying.</review>""",
                },
                {
                    "role":"assistant",
                    "content":"""{"index": "2", "sentiment_rating": "3"}""",
                },
                # --- Example 3 ---
                {
                    "role":"user",
                    "content":"""<review id='3'>Very, very thin,, not to absorbent</review>""",
                },
                {
                    "role":"assistant",
                    "content":"""{"index": "3", "sentiment_rating": "2"}""",
                },
                # --- Example 4 ---
                {
                    "role":"user",
                    "content":"""<review id='4'>Relatively short and not good for kinky hair.</review>""",
                },
                {
                    "role":"assistant",
                    "content":"""{"index": "4", "sentiment_rating": "1"}""",
                },
                # --- Example 5 ---
                {
                    "role":"user",
                    "content":"""<review id='5'>It is not organic , it's made in china, left my hair dry ... returning .</review>""",
                },
                {
                    "role":"assistant",
                    "content":"""{"index": "5", "sentiment_rating": "0"}""",
                },
                # --- End of Few-Shot Examples ---
            ]
        return few_shot_examples


    def generate_final_instruction(self, text_batch: str) -> str:
        '''
            Create review test sets as final instruction for sentiment analysis task
        '''
        final_instruction = f"""
                --- REVIEWS START ---
                {text_batch}
                --- REVIEWS END ---
            """
        return final_instruction

    def gen_query(self, batch_size: int, text_batch: str) -> list:
        '''
            Generate the full query for sentiment analysis task
        '''
        general_instruction = self.generate_general_instruction(batch_size)
        
        cot_instruction = ''
        if self.cot:
            cot_instruction = self.generate_cot_instruction()
        
        final_instruction = self.generate_final_instruction(text_batch)

        if self.few_shot:
            instructions_query = [
                {
                    "role":"user",
                    "content":general_instruction + cot_instruction,
                }
            ]
            few_shot_examples = self.generate_few_shot_examples()
            review_query = [
                {
                    "role":"user",
                    "content":final_instruction,
                }
            ]
            return instructions_query + few_shot_examples + review_query
        else:
            return [
                {
                    "role":"user",
                    "content":general_instruction + cot_instruction + final_instruction,
                }
            ]
        
    def generate_output_schema(self) -> dict:
        '''
            Generate the output schema for sentiment analysis task
        '''

        if self.binary:
            sentiment_enum = ["0", "1"]
        else:
            sentiment_enum = ["0", "1", "2", "3", "4"]

        response_format = {
            'type': "json_schema",
            'json_schema': {
                'name': "product_review",
                'schema': {
                    'type': "object",
                    'properties': {
                    'index': { 'type': "string" },
                    'sentiment_rating': { 
                        'type': "string",
                        'enum': sentiment_enum
                        },
                    },
                'required': ["index", "sentiment_rating"],
                'additionalProperties': False
                }
            }
        }
        return response_format
    
    def generate_system_query(self) -> list:
        '''
            Create system instruction for sentiment analysis task
        '''
        if self.binary:
            content = "2.  **Content:** For each review, provide the sentiment as a string representation of an integer: either 0 (negative) or 1 (positive)."
        else:
            content = "2.  **Content:** For each review, provide the sentiment as a string representation of an integer from 0 (very negative) to 4 (very positive)."
        system_instruction = f"""
            You are an expert sentiment analyst for Amazon product reviews. Your task is to process a batch of reviews and output the results as a single JSON object.
            1.  **Indexing:** The 'reviews' array MUST contain the same number of items as the input reviews, and each item's 'index' MUST correspond exactly to the review's sequential position.
            {content}
            3.  **No Explanation:** DO NOT include any introductory text, explanation, or any Markdown fences (like ```json or ```) outside of the required JSON object.
            """
        
        system_query = [{"role": "system", "content": system_instruction}]
        return system_query

def predict_sentiments_groq(sample_text: list[str],
                            chunk_size: int,
                            model: str,
                            few_shot: bool = False,
                            cot: bool = False,
                            binary: bool = False,
                            response_format: bool = True) -> tuple[list[dict], list[str]]:
    '''
        Predict sentiment ratings for a list of reviews using the Groq API.
        Returns the parsed predictions and the raw response strings.
    '''
    client_groq = Groq()
    all_predictions = []
    responses = []

    # Predict in chunks to avoid rate limiting and inconsistent output sizes
    for i in range(0, len(sample_text), chunk_size):
        # Get a slice of the reviews
        batch = sample_text[i:i + chunk_size]
        text_batch = ''
        for ind, text in enumerate(batch):
            # Use a clear XML tag for each review and its index
            text_batch += f"<review id='{i + ind}'>{text}</review>\n"
        
        
        # format prompt, instructions, and output schema
        prompt_generator = PromptGenerator(few_shot=few_shot, cot=cot, binary=binary)
        query = prompt_generator.gen_query(batch_size=len(batch), text_batch=text_batch)
        system_query = prompt_generator.generate_system_query()

        # Only Llama 4 and GPT-OSS models support a response schema
        request_kwargs = {"messages": system_query + query, "model": model}
        if response_format:
            request_kwargs["response_format"] = prompt_generator.generate_output_schema()

        # Try to generate content; if an error occurs, wait and retry
        max_tries = 10
        for _ in range(max_tries):
            try:
                chat_completion = client_groq.chat.completions.create(**request_kwargs)
                break  # Exit the retry loop if successful
            except Exception as e:
                print(f"Error occurred: {e}. Retrying in 60 seconds...")
                time.sleep(60)
        else:
            # All attempts failed, so there is no response to parse for this batch
            raise RuntimeError(f"Batch starting at index {i} failed after {max_tries} attempts")

        # Extract the content from the response and store it for manual inspection if needed
        content = chat_completion.choices[0].message.content
        responses.append(content)


        # Parse the response content.
        # Without a response schema, extract the JSON part from the response by text cleaning.
        try:
            if response_format:
                response_data = json.loads(content)
            else:
                response_data = json.loads(content.split('</think>')[-1].replace('```json', '').replace('```', '').strip())

            # Save predictions (the list is stored under the first key, e.g. 'reviews')
            key = list(response_data.keys())[0]
            predictions = response_data[key]
            print(len(predictions))

            all_predictions.extend(predictions)

        except Exception as e:
            print(content)
            print(f"Error parsing response: {e}")
            continue
        
        time.sleep(60)  # To avoid rate limiting

    return all_predictions, responses
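
The retry loop above waits a fixed 60 seconds after every error. A minimal sketch of an alternative follows, assuming exponential backoff suits your rate limits; `call_with_backoff`, `max_tries`, `base_delay`, and `sleep` are hypothetical names and not part of the Groq SDK:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T],
                      max_tries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on any exception with exponentially growing delays."""
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            print(f"Error occurred: {exc}. Retrying in {delay:.0f} seconds...")
            sleep(delay)
    raise ValueError("max_tries must be at least 1")
```

In this notebook the request could be wrapped as `call_with_backoff(lambda: client_groq.chat.completions.create(messages=system_query + query, model=model))`. Passing `sleep` as a parameter keeps the helper easy to test without real waiting.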

## 2. Prompting

## meta-llama/llama-4-maverick-17b-128e-instruct

In [None]:
test_llama = pd.read_parquet("data/test_10k_5.parquet")
sample_text = test_llama.text.to_list()

### 0-Shot

In [None]:

model = "meta-llama/llama-4-maverick-17b-128e-instruct"
chunk_size = 50

all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                         chunk_size=chunk_size, 
                                         model=model,
                                         few_shot=False,
                                         cot=False,
                                         binary=False,
                                         response_format=True) 
test_llama['pred_0s_2'] = pd.DataFrame(all_predictions)['sentiment_rating'].astype(int)

50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
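
The assignment in the cell above matches predictions to rows by position, so a batch that fails to parse, or a review the model skips, would silently shift every later rating. As a minimal sketch, the ratings can be aligned by the returned `index` instead, assuming each prediction is a dict with `index` and `sentiment_rating` keys as requested in the prompt; `align_predictions` is a hypothetical helper:

```python
from typing import Optional


def align_predictions(predictions: list[dict], n_reviews: int) -> list[Optional[int]]:
    """Place each rating at the position named by its 'index'; gaps stay None."""
    aligned: list[Optional[int]] = [None] * n_reviews
    for item in predictions:
        try:
            idx = int(item["index"])
            rating = int(item["sentiment_rating"])
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed entries instead of shifting later rows
        if 0 <= idx < n_reviews and aligned[idx] is None:
            aligned[idx] = rating  # keep the first rating seen for an index
    return aligned
```

For example, `align_predictions(all_predictions, len(sample_text))` returns one slot per review. Rows left as `None` can then be inspected or re-queried, and the column would need a nullable type such as pandas' `Int64` rather than `.astype(int)`.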


### Few-Shot

In [None]:
model = "meta-llama/llama-4-maverick-17b-128e-instruct"
chunk_size = 60
all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                          chunk_size=chunk_size, 
                                          model=model, 
                                          few_shot=True, 
                                          cot=False, 
                                          binary=False,
                                          response_format=True) 

test_llama['pred_5s'] = pd.DataFrame(all_predictions)['sentiment_rating'].astype(int)

60
60
60
60
60
60
60
60
60
60
60
60
60
60
60
60
40


np.float64(0.746)

### Chain-of-Thought

In [None]:
model = "meta-llama/llama-4-maverick-17b-128e-instruct"
chunk_size = 60
all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                          chunk_size=chunk_size, 
                                          model=model, 
                                          few_shot=False, 
                                          cot=True, 
                                          binary=False,
                                          response_format=True) 

test_llama['pred_cot'] = pd.DataFrame(all_predictions)['sentiment_rating'].astype(int)

np.float64(0.739)

### Chain-of-Thought + Few-Shot

In [18]:
model = "meta-llama/llama-4-maverick-17b-128e-instruct"
chunk_size = 60
all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                          chunk_size=chunk_size, 
                                          model=model, 
                                          few_shot=True, 
                                          cot=True, 
                                          binary=False,
                                          response_format=True) 
test_llama['pred_cot_5s'] = pd.DataFrame(all_predictions)['sentiment_rating'].astype(int)

60
60
60
60
60
60
60
60
60
60
60
60
60
60
60
60
40


In [None]:
test_llama.to_csv("results/test_5_llm_llama_4_128e.csv", index=False)

## openai/gpt-oss-120b

In [19]:
test_gpt = pd.read_parquet("data/test_10k_5.parquet")
sample_text = test_gpt.text.to_list()

### 0-Shot

In [None]:
# GPT-OSS
model = "openai/gpt-oss-120b"
chunk_size = 50

# The function returns both the parsed predictions and the raw responses
all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                         chunk_size=chunk_size, 
                                         model=model,
                                         few_shot=False,
                                         cot=False,
                                         binary=False,
                                         response_format=True) 
test_gpt['pred_gpt_0s'] = pd.DataFrame(all_predictions)['sentiment_rating'].astype(int)
test_gpt.to_csv("results/test_5_llm_gpt_oss_120b.csv", index=False)

50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50


### Few-Shot

In [None]:
model = "openai/gpt-oss-120b"
chunk_size = 50

all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                         chunk_size=chunk_size, 
                                         model=model,
                                         few_shot=True,
                                         cot=False,
                                         binary=False,
                                         response_format=True) 
test_gpt['pred_gpt_5s'] = pd.DataFrame(all_predictions)['sentiment_rating'].astype(int)
test_gpt.to_csv("results/test_5_llm_gpt_oss_120b.csv", index=False)


np.float64(0.704)

### Chain-of-Thought

In [None]:
model = "openai/gpt-oss-120b"
chunk_size = 50

all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                         chunk_size=chunk_size, 
                                         model=model,
                                         few_shot=False,
                                         cot=True,
                                         binary=False,
                                         response_format=True) 
test_gpt['pred_gpt_cot'] = pd.DataFrame(all_predictions)['sentiment_rating'].astype(int)
test_gpt.to_csv("results/test_5_llm_gpt_oss_120b.csv", index=False)

50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50


### Chain-of-Thought + Few-Shot

In [None]:
model = "openai/gpt-oss-120b"
chunk_size = 50

all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                         chunk_size=chunk_size, 
                                         model=model,
                                         few_shot=True,
                                         cot=True,
                                         binary=False,
                                         response_format=True)

test_gpt['pred_gpt_cot_5s'] = pd.DataFrame(all_predictions)['sentiment_rating'].astype(int)
test_gpt.to_csv("results/test_5_llm_gpt_oss_120b.csv", index=False)

50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50
50


## qwen/qwen3-32b

In [22]:
test_qwen = pd.read_parquet("data/test_10k_5.parquet")  # Note: all 5-class models use the same test data
sample_text = test_qwen.text.to_list()

### 0-Shot

In [23]:
model = "qwen/qwen3-32b"
chunk_size = 60

all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                         chunk_size=chunk_size, 
                                         model=model,
                                         few_shot=False,
                                         cot=False,
                                         binary=False,
                                         response_format=False) 


60
60
60
60
60
60
60
60
60
60
60
60
60
60
60
60
40


In [None]:
# Sometimes the response is not properly formatted, so we need to do some text cleaning
all_pred = pd.DataFrame()
for i in all_responses:
    content = i.split('</think>')[-1].replace('```json', '').replace('```', '').strip()
    json_content = json.loads(content)
    try:
        df = pd.DataFrame(json_content['reviews'])
    except Exception as e:
        df = pd.DataFrame(json_content)
        # break
    all_pred = pd.concat([all_pred, df], ignore_index=True)

if 'sentiment_rate' in all_pred.columns:
    all_pred['sentiment_rating'] = np.where(all_pred['sentiment_rating'].notna(),
                                        all_pred['sentiment_rating'],
                                            all_pred['sentiment_rate'])

test_qwen['pred_qwen3_0s'] = all_pred['sentiment_rating'].astype(int)
test_qwen.to_csv("results/test_5_llm_qwen3_32b.csv", index=False)
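
The cleaning step above assumes the JSON comes right after the closing `</think>` tag and is wrapped in at most a Markdown fence. A slightly more defensive sketch using only the standard library is shown here; `extract_json` is a hypothetical helper, and scanning for the first `{` or `[` is an assumption about how the responses are shaped:

```python
import json


def extract_json(raw: str):
    """Parse the first JSON value found after any <think>...</think> block."""
    text = raw.split("</think>")[-1]
    decoder = json.JSONDecoder()
    # Try every position where a JSON object or array could start.
    for start, char in enumerate(text):
        if char in "{[":
            try:
                value, _ = decoder.raw_decode(text, start)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON object or array found in response")
```

Because `raw_decode` stops at the end of the first valid JSON value, trailing fences or commentary after the JSON do not need to be stripped. A response that was cut off inside the `<think>` block still raises an error, which makes truncated batches visible instead of silently dropped.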


### Few-Shot

In [None]:
model = "qwen/qwen3-32b"
chunk_size = 60

all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                         chunk_size=chunk_size, 
                                         model=model,
                                         few_shot=True,
                                         cot=False,
                                         binary=False,
                                         response_format=False) 

# Sometimes the response is not properly formatted, so we need to do some text cleaning
all_pred = pd.DataFrame()
for i in all_responses:
    content = i.split('</think>')[-1].replace('```json', '').replace('```', '').strip()
    json_content = json.loads(content)
    try:
        df = pd.DataFrame(json_content['reviews'])
    except Exception as e:
        df = pd.DataFrame(json_content)
        # break
    all_pred = pd.concat([all_pred, df], ignore_index=True)

if 'sentiment_rate' in all_pred.columns:
    all_pred['sentiment_rating'] = np.where(all_pred['sentiment_rating'].notna(),
                                        all_pred['sentiment_rating'],
                                            all_pred['sentiment_rate'])

test_qwen['pred_qwen3_5s'] = all_pred['sentiment_rating'].astype(int)
test_qwen.to_csv("results/test_5_llm_qwen3_32b.csv", index=False)

60
60
60
60
60
<think>
Okay, let's tackle this batch of reviews. First, I need to go through each one and determine the sentiment on a scale from 0 to 4. Let's start with review 300. The user mentions a part that popped off but got it back on and likes the cooling effect and massage feel. There's a mix of negative and positive points, but overall, the positive aspects seem to outweigh. Maybe a 3?

Review 301 is positive, talking about the scent being nice and lasting. The user prefers it over expensive ones. That sounds like a 4. 

Review 302 lists three good points without negative. Definitely a 4. 

Review 303 is a strong positive with the user being happy and mentioning quality. 4 makes sense here. 

Review 304 says the girlfriend loves them and they're worth the price. Positive, so 4. 

Review 305 praises the tool for dance classes and effectiveness. 4 again. 

Review 306 expresses disappointment Sally's doesn't sell them. That's a 3, maybe slightly negative but still positive. 

R

### Chain-of-Thought

In [None]:
model = "qwen/qwen3-32b"
chunk_size = 60

all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                         chunk_size=chunk_size, 
                                         model=model,
                                         few_shot=False,
                                         cot=True,
                                         binary=False,
                                         response_format=False) 

# Sometimes the response is not properly formatted, so we need to do some text cleaning
all_pred = pd.DataFrame()
for i in all_responses:
    content = i.split('</think>')[-1].replace('```json', '').replace('```', '').strip()
    json_content = json.loads(content)
    try:
        df = pd.DataFrame(json_content['reviews'])
    except Exception as e:
        df = pd.DataFrame(json_content)
        # break
    all_pred = pd.concat([all_pred, df], ignore_index=True)

if 'sentiment_rate' in all_pred.columns:
    all_pred['sentiment_rating'] = np.where(all_pred['sentiment_rating'].notna(),
                                        all_pred['sentiment_rating'],
                                            all_pred['sentiment_rate'])

test_qwen['pred_qwen3_cot'] = all_pred['sentiment_rating'].astype(int)
test_qwen.to_csv("results/test_5_llm_qwen3_32b.csv", index=False)

Error occurred: Connection error.. Retrying in 60 seconds...
60
60
60
60
60
60
60
60
60
60
60
60
60
60
60
60
40


np.float64(0.654)

### Chain-of-Thought + Few-Shot

In [None]:
model = "qwen/qwen3-32b"
chunk_size = 60

all_predictions, all_responses = predict_sentiments_groq(sample_text=sample_text, 
                                         chunk_size=chunk_size, 
                                         model=model,
                                         few_shot=True,
                                         cot=True,
                                         binary=False,
                                         response_format=False) 

# Sometimes the response is not properly formatted, so we need to do some text cleaning
all_pred = pd.DataFrame()
for i in all_responses:
    content = i.split('</think>')[-1].replace('```json', '').replace('```', '').strip()
    json_content = json.loads(content)
    try:
        df = pd.DataFrame(json_content['reviews'])
    except Exception as e:
        df = pd.DataFrame(json_content)
        # break
    all_pred = pd.concat([all_pred, df], ignore_index=True)

if 'sentiment_rate' in all_pred.columns:
    all_pred['sentiment_rating'] = np.where(all_pred['sentiment_rating'].notna(),
                                        all_pred['sentiment_rating'],
                                            all_pred['sentiment_rate'])

test_qwen['pred_qwen3_cot_5s'] = all_pred['sentiment_rating'].astype(int)
test_qwen.to_csv("results/test_5_llm_qwen3_32b.csv", index=False)

60
60
60
60
60
<think>
Okay, I need to process these Amazon reviews and assign each a sentiment rating from 0 to 4. Let me start by going through each review one by one, using the CoT steps as instructed.

Starting with review 300. The user mentions that the product popped off with little force initially but was able to fix it. There are positive aspects like it stays cold, feels good for facial massage, and works after being fixed. The main sentiment is mixed, but since they continue using it and mention benefits, I'll rate it a 3.

Review 301 talks about a nice smell that's not overpowering and a good value compared to expensive products. The sentiment is positive. Rating 4.

Review 302 is three positive points about the product's performance. Definitely positive. Rating 4.

Review 303 praises the chapstick for lasting, high melting point, quality, and value. Positive. Rating 4.

Review 304 mentions the girlfriend loves them and they're worth the price. Positive. Rating 4.

Review 30