# 📘 Notebook: `03_genai_classification.ipynb`

## Overview

In previous notebooks, we:
- Generated a synthetic dataset
- Trained and evaluated a classical Logistic Regression model

In this notebook, we take a different perspective:
- No training phase is required
- The classifier relies entirely on prompt engineering and the LLM's reasoning
- Predictions are generated at query time by an LLM running outside the notebook (Ollama on the host machine)

We will:
1. Define an ambiguous fruit sample
2. Query a local LLM (Llama 3 via Ollama)
3. Analyze the model's reasoning, confidence, and latency

This experiment helps illustrate when and why GenAI may (or may not) be appropriate for classification tasks.


## LLM-Based Classification (Zero-Shot)

In this section, we classify a single fruit instance using a **Large Language Model**.

Key characteristics of this approach:
- The LLM has **no access to the dataset**
- All domain knowledge is provided explicitly via the prompt
- The model performs **zero-shot reasoning**
- The output includes both a prediction and an explanation

This setup allows us to compare interpretability, flexibility, and latency against the classical ML approach.


In [1]:
import pandas as pd
import requests
import json
import time

# --- 1. LOAD DATA AND CONTEXT ---
# Load the processed dataset used by the classical model
# (for context only: none of it is sent to the LLM)
df = pd.read_csv('../data/processed/fruits_dataset.csv')

# Select a challenging or ambiguous fruit
# We intentionally choose an "uncertain" case:
# intermediate weight and medium skin roughness
mysterious_fruit = {'weight_g': 175, 'skin_roughness': 6.0}

print(f"Analyzing fruit: {mysterious_fruit}")

# --- APPROACH 1: USING LLAMA 3 AS A CLASSIFIER ---

def query_llama(weight, roughness):
    # Special Docker URL to access Ollama running on the host machine
    url = "http://host.docker.internal:11434/api/generate"
    
    # Prompt Engineering:
    # We explicitly describe the synthetic universe and its rules
    prompt = (
        f"You are an expert agronomist. There are two possible fruits:\n"
        f"1. Avocado: Usually weighs around 200g and has rough skin (8/10 scale).\n"
        f"2. Mango: Usually weighs around 150g and has smooth skin (3/10 scale).\n\n"
        f"I have a fruit that weighs {weight}g and has a skin roughness of {roughness}/10.\n"
        f"Based on this information, which fruit is it most likely to be?\n"
        f"Respond using this exact JSON format: "
        # Double quotes so the example shown to the model is itself valid JSON
        f'{{"fruit": "name", "confidence": "high/medium/low", "reason": "brief explanation"}}.'
    )
    
    payload = {
        "model": "llama3",  # Any model available in Ollama can be used here
        "prompt": prompt,
        "stream": False,
        # Ollama's JSON mode constrains the response to valid JSON, which is ideal for programmatic usage
        "format": "json"
    }
    
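    try:
        # Local inference can be slow; the 120 s timeout is an arbitrary upper bound
        response = requests.post(url, json=payload, timeout=120)
        if response.status_code == 200:
            return json.loads(response.json()['response'])
        else:
            return {"error": f"HTTP {response.status_code}: {response.text}"}
    except ValueError as e:
        # Raised when the HTTP body or the model's answer is not valid JSON
        return {"error": "The response could not be parsed as JSON.", "details": str(e)}
    except requests.exceptions.RequestException as e:
        # Errors are returned as dicts so json.dumps below prints them cleanly
        return {
            "error": "Connection error. Is Ollama running on the host machine?",
            "details": str(e)
        }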

# --- EXECUTION ---
print("Querying Llama 3 (this may take a few seconds)...")
start_time = time.perf_counter()  # monotonic clock, better suited to measuring durations
llama_result = query_llama(
    mysterious_fruit['weight_g'],
    mysterious_fruit['skin_roughness']
)
end_time = time.perf_counter()

print(f"\nResponse time: {end_time - start_time:.2f} seconds")
print("--- AI OPINION ---")
print(json.dumps(llama_result, indent=2))


Analyzing fruit: {'weight_g': 175, 'skin_roughness': 6.0}
Querying Llama 3 (this may take a few seconds)...

Response time: 38.45 seconds
--- AI OPINION ---
{
  "fruit": "Avocado",
  "confidence": "high",
  "reason": "The weight of 175g is closer to the average avocado weight (200g) than the mango weight (150g), and the skin roughness of 6.0/10 is also more typical of avocados (8/10) than mangoes (3/10)."
}
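
The explanation returned above can be checked with simple arithmetic. The sketch below computes the absolute gap between the sample and each fruit profile described in the prompt. `PROTOTYPES` and `feature_gaps` are hypothetical names introduced here for illustration; they are not part of the original notebook.

```python
# Profile values copied from the prompt text (assumed to be the only "ground truth"
# the LLM was given).
PROTOTYPES = {
    "Avocado": {"weight_g": 200, "skin_roughness": 8.0},
    "Mango": {"weight_g": 150, "skin_roughness": 3.0},
}


def feature_gaps(sample, prototypes=PROTOTYPES):
    """Return the absolute per-feature distance from the sample to each profile."""
    return {
        name: {feat: abs(sample[feat] - proto[feat]) for feat in proto}
        for name, proto in prototypes.items()
    }


gaps = feature_gaps({"weight_g": 175, "skin_roughness": 6.0})
for fruit, gap in gaps.items():
    print(fruit, gap)
# Avocado {'weight_g': 25, 'skin_roughness': 2.0}
# Mango {'weight_g': 25, 'skin_roughness': 3.0}
```

The weight gap is 25 g for both fruits, so weight does not favor either class. Only skin roughness leans toward avocado, and by a narrow margin (2.0 versus 3.0).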


## Results and Observations

The LLM returns:
- A predicted fruit type
- A qualitative confidence level
- A short natural-language explanation
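
Because these three fields are consumed programmatically, it can help to check them before trusting a response. The following is a minimal sketch, assuming the field names and allowed values requested in the prompt; `validate_prediction`, `ALLOWED_FRUITS`, and `ALLOWED_CONFIDENCE` are hypothetical helpers, not part of the original notebook.

```python
ALLOWED_FRUITS = {"avocado", "mango"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def validate_prediction(result):
    """Return a list of problems found in a parsed LLM response (empty if valid)."""
    if not isinstance(result, dict):
        return [f"expected a JSON object, got {type(result).__name__}"]

    problems = []
    fruit = str(result.get("fruit", "")).strip().lower()
    confidence = str(result.get("confidence", "")).strip().lower()
    reason = result.get("reason")

    if fruit not in ALLOWED_FRUITS:
        problems.append(f"unexpected fruit: {result.get('fruit')!r}")
    if confidence not in ALLOWED_CONFIDENCE:
        problems.append(f"unexpected confidence: {result.get('confidence')!r}")
    if not isinstance(reason, str) or not reason.strip():
        problems.append("missing or empty reason")
    return problems


ok = validate_prediction(
    {"fruit": "Avocado", "confidence": "high", "reason": "Closer to the avocado profile."}
)
bad = validate_prediction({"fruit": "Banana", "confidence": "certain"})
print(ok)
print(bad)
```

An empty list means the response matches the requested schema; anything else can be logged or used to trigger a retry.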

In addition, we measure the response time (38.45 seconds for this single query), which is a critical factor when considering GenAI systems in production environments.

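This experiment demonstrates that:
- LLMs can act as reasoning-based classifiers
- Predictions come with an explanation, but the explanation is not guaranteed to be correct: here the model states that 175 g is closer to 200 g than to 150 g, although the sample is equidistant from both
- Predictions are non-deterministic, so repeated queries can return different answers
- Latency and external dependencies must be carefully considered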
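
One way to put numbers on these observations is to repeat the same query several times and record how often the answers agree and how long each call takes. The sketch below is an assumption-laden illustration: `measure_consistency` is a hypothetical helper, and `fake_query` is a stub that stands in for the LLM so the example runs without Ollama. Passing the notebook's `query_llama` instead would exercise the real model.

```python
import statistics
import time
from collections import Counter


def measure_consistency(query_fn, weight, roughness, n_runs=5):
    """Call query_fn n_runs times; report label counts, agreement, and median latency."""
    labels = Counter()
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        result = query_fn(weight, roughness)
        latencies.append(time.perf_counter() - start)
        # Anything that is not a dict with a "fruit" key is counted as invalid
        label = result.get("fruit", "invalid") if isinstance(result, dict) else "invalid"
        labels[str(label).lower()] += 1
    return {
        "labels": dict(labels),
        "agreement": max(labels.values()) / n_runs,  # share of the most common answer
        "median_latency_s": statistics.median(latencies),
    }


def fake_query(weight, roughness):
    """Stand-in for an LLM call: every third answer differs, to mimic variation."""
    fake_query.calls += 1
    fruit = "Avocado" if fake_query.calls % 3 else "Mango"
    return {"fruit": fruit, "confidence": "medium", "reason": "stub"}


fake_query.calls = 0

report = measure_consistency(fake_query, 175, 6.0, n_runs=6)
print(report["labels"], round(report["agreement"], 2))
```

An agreement well below 1.0 on an ambiguous sample would indicate that a single response should not be treated as a stable prediction.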

In the next steps, this approach can be compared directly against the classical model in terms of:
- Accuracy
- Consistency
- Performance
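
A comparison along these lines needs a common harness that accepts any classifier as a plain function. The sketch below shows one possible shape for the accuracy part. The `accuracy` helper, the two toy rules, and the four sample rows are all hypothetical; the rows are made up for illustration and are not taken from `fruits_dataset.csv`.

```python
def accuracy(predict_fn, labeled_samples):
    """Fraction of samples for which predict_fn returns the expected label."""
    if not labeled_samples:
        raise ValueError("labeled_samples must not be empty")
    hits = sum(
        1
        for features, label in labeled_samples
        if predict_fn(**features).lower() == label.lower()
    )
    return hits / len(labeled_samples)


# Made-up rows for illustration only (NOT real dataset rows).
samples = [
    ({"weight_g": 205, "skin_roughness": 8.5}, "Avocado"),
    ({"weight_g": 148, "skin_roughness": 2.5}, "Mango"),
    ({"weight_g": 160, "skin_roughness": 6.5}, "Avocado"),
    ({"weight_g": 190, "skin_roughness": 4.0}, "Mango"),
]


def roughness_rule(weight_g, skin_roughness):
    """Toy baseline: split halfway between the prompt's roughness values (3 and 8)."""
    return "Avocado" if skin_roughness >= 5.5 else "Mango"


def weight_rule(weight_g, skin_roughness):
    """Toy baseline: split halfway between the prompt's weights (150 g and 200 g)."""
    return "Avocado" if weight_g >= 175 else "Mango"


roughness_acc = accuracy(roughness_rule, samples)
weight_acc = accuracy(weight_rule, samples)
print(roughness_acc, weight_acc)
```

A thin wrapper that returns the `fruit` field of an LLM response, or the predicted label of the Logistic Regression model, could be passed as `predict_fn` in the same way.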
