# Task 1: Rating Prediction via Prompting

## Goal
Predict 1–5 star ratings from Yelp review text using LLM prompting only (no ML model training).

## Requirements
- Dataset: Yelp Reviews (sample ~200)
- Output: Strict JSON `{"predicted_stars": int, "explanation": str}`
- Strategies: Zero-Shot, Reasoning, Few-Shot
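
To make the "strict JSON" requirement concrete, here is a minimal sketch of a validator for a single model reply. The function name `parse_prediction` is hypothetical and not part of the notebook; the sketch assumes a reply counts as valid only when both keys are present, `predicted_stars` is an integer from 1 to 5, and `explanation` is a string.

```python
import json

def parse_prediction(raw):
    """Return a validated prediction dict, or None if the reply breaks the schema."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    stars = data.get("predicted_stars")
    explanation = data.get("explanation")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(stars, bool) or not isinstance(stars, int):
        return None
    if not 1 <= stars <= 5:
        return None
    if not isinstance(explanation, str):
        return None
    return {"predicted_stars": stars, "explanation": explanation}

print(parse_prediction('{"predicted_stars": 4, "explanation": "Mostly positive."}'))
print(parse_prediction('{"predicted_stars": 7, "explanation": "Out of range."}'))
```

Returning `None` for every kind of failure keeps the caller simple: a reply is either usable or it is counted as invalid.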

In [1]:
import os
import json
import random
import pandas as pd
from datasets import load_dataset
from sklearn.metrics import accuracy_score
import google.generativeai as genai
from dotenv import load_dotenv

# Load API Key
load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")

## 1. Load Dataset
We use the `yelp_review_full` dataset from Hugging Face and take the first 200 reviews of the test split. Its `label` column is zero-based (0 to 4), so the star rating is `label + 1`.

In [2]:
dataset = load_dataset("yelp_review_full", split="test")
samples = dataset.select(range(200))
print(f"Loaded {len(samples)} samples.")

Loaded 200 samples.
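
The cell above takes the first 200 rows. If a random sample is preferred, one possible approach is to draw seeded indices and pass them to `dataset.select`. The helpers below (`sample_indices`, `label_to_stars`) are hypothetical names introduced for illustration; the sketch runs without downloading the dataset, and the row count of 1000 is a stand-in for `len(dataset)`.

```python
import random

def sample_indices(n_total, k, seed=42):
    """Draw k distinct row indices reproducibly from range(n_total)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), k))

def label_to_stars(label):
    """Convert the dataset's zero-based label (0-4) to a star rating (1-5)."""
    if not 0 <= label <= 4:
        raise ValueError(f"unexpected label: {label}")
    return label + 1

# Stand-in size; in the notebook this would be len(dataset)
indices = sample_indices(n_total=1000, k=200)
print(len(indices), "indices drawn")
print("label 0 ->", label_to_stars(0), "star(s)")

# In the notebook (assumption, not executed here):
# samples = dataset.select(sample_indices(len(dataset), 200))
```

A fixed seed keeps the sample identical across runs, so accuracy numbers from different prompt strategies stay comparable.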


## 2. Define Prompts

In [4]:
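def get_gemini_response(prompt):
    """Query Gemini and return the parsed JSON reply, or None on any failure."""
    try:
        response = model.generate_content(prompt)
        # Strip Markdown code fences the model may wrap around its JSON
        text = response.text.replace("```json", "").replace("```", "").strip()
        return json.loads(text)
    except Exception:
        # API errors, blocked responses, and malformed JSON all count as invalid
        return None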

def prompt_zero_shot(text):
    return f"""
    You are a sentiment analysis expert. Analyze the review and predict rating (1-5).
    Review: "{text}"
    Output JSON: {{"predicted_stars": <int>, "explanation": "<str>"}}
    """

def prompt_reasoning(text):
    return f"""
    Analyze sentiment, tone, complaints, and praise. First determine if happy/neutral/angry.
    Then decide rating 1-5.
    Review: "{text}"
    Output JSON: {{"predicted_stars": <int>, "explanation": "<str>"}}
    """

def prompt_few_shot(text, examples):
    # Dataset labels are 0-4, so add 1 to show the model 1-5 star ratings
    ex_text = "\n".join([f'Review: "{e["text"]}"\nJSON: {{"predicted_stars": {e["label"] + 1}}}' for e in examples])
    return f"""
    Examples:
    {ex_text}
    
    Classify:
    Review: "{text}"
    Output JSON: {{"predicted_stars": <int>, "explanation": "<str>"}}
    """

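`prompt_few_shot` expects an `examples` list, but the notebook does not show how it is built. A minimal sketch, assuming one example per star class is wanted and that each row is a dict with `text` and a zero-based `label`, is given below. The name `pick_few_shot_examples` is hypothetical. Drawing these rows from the train split rather than the evaluated test sample would avoid overlap, though that choice is an assumption and not something the notebook states.

```python
from collections import defaultdict

def pick_few_shot_examples(rows, per_class=1):
    """Keep the first `per_class` rows seen for each label, ordered by label."""
    by_label = defaultdict(list)
    for row in rows:
        bucket = by_label[row["label"]]
        if len(bucket) < per_class:
            bucket.append(row)
    return [row for label in sorted(by_label) for row in by_label[label]]

# Toy rows standing in for dataset records (labels are zero-based)
toy_rows = [
    {"text": "Loved it.", "label": 4},
    {"text": "Terrible service.", "label": 0},
    {"text": "Amazing!", "label": 4},
    {"text": "It was fine.", "label": 2},
]
for example in pick_few_shot_examples(toy_rows):
    print(example["label"] + 1, "stars:", example["text"])
```
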
## 3. Evaluation Loop
We iterate over the samples and query the model once per review with each of the three prompts.

In [6]:
# The full evaluation loop lives in run_evaluation.py, so interactive runs of this notebook do not spend API calls.
# Here we only load the results if they have already been computed.
if os.path.exists("evaluation_results.csv"):
    df = pd.read_csv("evaluation_results.csv")
    print("Loaded pre-computed results.")
    print(df.head())
else:
    print("No results found. Run the evaluation script.")

No results found. Run the evaluation script.
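
Since `run_evaluation.py` is not shown here, the following is only a sketch of what such a loop might look like, not the script itself. The column names (`actual_stars`, `p1_pred`, `p1_valid`) are taken from the results cell in this notebook; everything else (`evaluate`, `is_valid_stars`, the `query_fn` parameter, the stub `fake_query`) is hypothetical. The query function is passed in so the sketch runs without any API calls.

```python
def is_valid_stars(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 5

def evaluate(samples, strategies, query_fn):
    """Query every strategy for every sample and return one result row per sample."""
    rows = []
    for sample in samples:
        # Dataset labels are 0-4; star ratings are 1-5
        row = {"text": sample["text"], "actual_stars": sample["label"] + 1}
        for name, build_prompt in strategies.items():
            result = query_fn(build_prompt(sample["text"]))
            stars = result.get("predicted_stars") if isinstance(result, dict) else None
            valid = is_valid_stars(stars)
            row[f"{name}_pred"] = stars if valid else None
            row[f"{name}_valid"] = valid
        rows.append(row)
    return rows

# Stand-in for get_gemini_response so the sketch runs offline
def fake_query(prompt):
    if "great" in prompt:
        return {"predicted_stars": 5, "explanation": "Enthusiastic praise."}
    return None  # simulates a failed or unparseable reply

toy_samples = [{"text": "great food", "label": 4}, {"text": "cold fries", "label": 0}]
toy_strategies = {"p1": lambda text: f"Rate this review: {text}"}
for row in evaluate(toy_samples, toy_strategies, fake_query):
    print(row)
```

In the notebook, `strategies` could map `"p1"`, `"p2"`, and `"p3"` to `prompt_zero_shot`, `prompt_reasoning`, and a small wrapper around `prompt_few_shot`, with `get_gemini_response` as `query_fn`. That mapping of prefixes to strategies is inferred from the order of the accuracy printout, not confirmed by the source.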


## 4. Results & Analysis

In [None]:
if 'df' in locals():
    # Accuracy per strategy, computed only on rows flagged as valid
    acc1 = accuracy_score(df[df['p1_valid']]['actual_stars'], df[df['p1_valid']]['p1_pred'])
    acc2 = accuracy_score(df[df['p2_valid']]['actual_stars'], df[df['p2_valid']]['p2_pred'])
    acc3 = accuracy_score(df[df['p3_valid']]['actual_stars'], df[df['p3_valid']]['p3_pred'])
    
    print(f"Zero-Shot Accuracy: {acc1:.2f}")
    print(f"Reasoning Accuracy: {acc2:.2f}")
    print(f"Few-Shot Accuracy:  {acc3:.2f}")
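
Because accuracy above is computed only on rows flagged as valid, a strategy that often fails to return usable JSON can look better than it is. One way to guard against that is to report the share of valid replies next to the accuracy. The sketch below uses plain Python on a list of row dicts (for example from `df.to_dict("records")`); the function name `summarize` and the toy rows are hypothetical.

```python
def summarize(rows, prefix):
    """Return accuracy on valid rows and the fraction of rows that were valid."""
    valid_rows = [r for r in rows if r[f"{prefix}_valid"]]
    validity_rate = len(valid_rows) / len(rows) if rows else 0.0
    if valid_rows:
        correct = sum(r["actual_stars"] == r[f"{prefix}_pred"] for r in valid_rows)
        accuracy = correct / len(valid_rows)
    else:
        accuracy = float("nan")  # no valid predictions to score
    return {"accuracy": accuracy, "validity_rate": validity_rate}

# Toy rows using the same column names as evaluation_results.csv
toy_results = [
    {"actual_stars": 5, "p1_pred": 5, "p1_valid": True},
    {"actual_stars": 1, "p1_pred": 2, "p1_valid": True},
    {"actual_stars": 3, "p1_pred": 3, "p1_valid": True},
    {"actual_stars": 4, "p1_pred": None, "p1_valid": False},
]
stats = summarize(toy_results, "p1")
print(f"Accuracy on valid rows: {stats['accuracy']:.2f}")
print(f"Valid replies:          {stats['validity_rate']:.2f}")
```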