<a href="https://colab.research.google.com/github/Shalvigour/LLM_COMPARATOR/blob/main/llm_comparative_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Comparative Analysis of Large Language Model (LLM) Platforms

This Jupyter Notebook provides a framework for comparing various commonly used Large Language Model (LLM) platforms. The goal is to help you understand the differences in their performance (latency, token usage), ease of use, and model availability for your specific applications.

We will explore the following platforms:

1.  **OpenAI**: Known for state-of-the-art models like GPT-4o, GPT-4, and GPT-3.5.
2.  **Hugging Face Inference API**: A hub for open-source LLMs and tools, offering flexibility to use a vast array of models.
3.  **Groq**: Specializes in high-speed, low-latency inference for certain LLMs.
4.  **Ollama**: For running open-source LLMs locally, providing privacy and cost control.
5.  **Google Gemini (via Google AI Studio/Vertex AI)**: Google's multimodal LLM family.
6.  **Anthropic (Claude)**: Known for its focus on safety and constitutional AI.
7.  **Mistral AI**: Popular for its powerful and efficient open-source models, often with competitive performance.

## Key Aspects for Comparative Analysis

When comparing these platforms, we will primarily focus on:

* **Performance (Speed & Quality)**:
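    * **Latency**: How quickly the model responds. The functions below time the complete, non-streaming request (effectively time to last token); time to first token requires streaming and is not measured here.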
    * **Token Usage**: Number of input and output tokens, which directly impacts cost.
    * **Qualitative Output**: The quality, coherence, and relevance of the generated response (requires manual review).
* **Cost**: While not directly calculated in code, token usage data will inform cost estimations (refer to official pricing pages).
* **Ease of Use & Integration**: API simplicity and library support.
* **Model Availability & Flexibility**: Proprietary vs. open-source models, fine-tuning capabilities.

**Disclaimer**: Latency measurements can be affected by network conditions, API server load, and the specific model chosen. For robust benchmarks, multiple runs and averaging are recommended.
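The averaging recommended above can be sketched as a small helper. This is a minimal illustration, not part of the original notebook: `benchmark_latency` is a hypothetical name, and it assumes a query function that returns a `(response, latency, tokens_info)` tuple with `latency=None` on failure, which is the convention the functions in Section 2 follow.

```python
import statistics
from typing import Callable, Optional, Tuple

# Shape returned by the query_* functions in Section 2 (assumed convention).
QueryResult = Tuple[str, Optional[float], Optional[dict]]

def benchmark_latency(query_fn: Callable[[str], QueryResult], prompt: str, runs: int = 5) -> dict:
    """Call query_fn `runs` times and summarize the latencies of successful calls."""
    latencies = []
    for _ in range(runs):
        _, latency, _ = query_fn(prompt)
        if latency is not None:  # None marks a failed call
            latencies.append(latency)
    if not latencies:
        return {"runs": runs, "successes": 0}
    return {
        "runs": runs,
        "successes": len(latencies),
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        # stdev needs at least two samples
        "stdev_s": statistics.stdev(latencies) if len(latencies) > 1 else 0.0,
        "min_s": min(latencies),
        "max_s": max(latencies),
    }
```

For example, `benchmark_latency(query_groq, test_prompt, runs=5)` would issue five billable requests, so keep `runs` small on paid APIs.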

## 1. Setup and Environment Configuration

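First, we install the Python libraries needed to interact with each LLM platform. API keys are kept out of the notebook itself: the query functions below read them from Colab secrets via `userdata.get`, so the `.env` loading shown here only matters if you adapt those functions to use `os.getenv` outside Colab.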

In [None]:
# Install necessary libraries
!pip install openai huggingface_hub groq google-generativeai anthropic mistralai ollama python-dotenv
import os
import time
import pandas as pd
from dotenv import load_dotenv
from google.colab import userdata # Import userdata

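# Optional: load environment variables from a .env file (useful outside Colab).
# load_dotenv() returns False when no .env file is found; that is harmless here
# because the query functions read their keys from Colab secrets via userdata.get().
# To use a .env file, create it in the same directory as this notebook with entries like: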
# OPENAI_API_KEY="your_openai_api_key_here"
# HF_TOKEN="your_huggingface_token_here"
# GROQ_API_KEY="your_groq_api_key_here"
# GOOGLE_API_KEY="your_google_api_key_here"
# ANTHROPIC_API_KEY="your_anthropic_api_key_here"
# MISTRAL_API_KEY="your_mistral_api_key_here"

load_dotenv()



False
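If you want the same notebook to run both inside and outside Colab, one option is a small lookup helper that tries Colab secrets first and falls back to environment variables. This is a sketch under assumptions: `get_api_key` is a hypothetical helper, not something the query functions below call, and it catches all exceptions from the Colab lookup because a missing or unshared secret is reported by raising rather than by returning an empty value.

```python
import os
from typing import Optional

def get_api_key(name: str) -> Optional[str]:
    """Return the key `name` from Colab secrets if possible, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing / not shared with this notebook.
        pass
    # Populated by load_dotenv() or the shell; None if the variable is unset.
    return os.getenv(name)
```

To use it, replace a call such as `userdata.get('OPENAI_API_KEY')` with `get_api_key('OPENAI_API_KEY')` in the corresponding query function.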

### Standardized Prompt for Comparison

To ensure a fair comparison, we will use the exact same prompt for all LLMs.

In [None]:
test_prompt = "Explain the concept of Artificial General Intelligence (AGI) in one paragraph, focusing on its potential impact on society and daily life."
MAX_TOKENS = 150 # Max tokens for generated response (to keep output length somewhat consistent)
TEMPERATURE = 0.7 # Creativity/randomness of the response

## 2. LLM Platform Implementations

Below, we provide Python functions to interact with each LLM platform. Each function will attempt to query the model, measure latency, and extract token usage information where available.

### 2.1. OpenAI

**Description**: OpenAI offers a suite of powerful models, including the widely popular GPT series. It's known for its strong performance across various tasks.

**API Key Required**: Yes (`OPENAI_API_KEY`). Obtain from [platform.openai.com](https://platform.openai.com/).

**Installation**: `pip install openai` (already included in the initial setup).

In [None]:
from openai import OpenAI
from google.colab import userdata

def query_openai(prompt: str, model: str = "gpt-4o-mini"): # You can try "gpt-4o" if you have access
    """Queries OpenAI's chat completion API."""
    api_key = userdata.get('OPENAI_API_KEY')
    if not api_key:
        return "API Key Missing: Please set OPENAI_API_KEY in Colab secrets.", None, None

    client = OpenAI(api_key=api_key)
    start_time = time.time()
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=MAX_TOKENS,
            temperature=TEMPERATURE,
        )
        latency = time.time() - start_time
        content = response.choices[0].message.content
        tokens_info = {
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens
        }
        return content, latency, tokens_info
    except Exception as e:
        return f"Error querying OpenAI: {e}", None, None

# Example usage
print("\n--- OpenAI Test ---")
openai_response, openai_latency, openai_tokens = query_openai(test_prompt)
# A None latency marks a failed call. The error strings returned above start with
# "Error querying ..." and never contain "Error:", so test the latency instead.
if openai_latency is not None:
    print(f"Response: {openai_response[:200]}...")
    print(f"Latency: {openai_latency:.4f} seconds")
    print(f"Tokens: {openai_tokens}")
else:
    print(openai_response)


--- OpenAI Test ---
Response: Artificial General Intelligence (AGI) refers to a type of artificial intelligence that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks at a level comparabl...
Latency: 4.3404 seconds
Tokens: {'prompt_tokens': 33, 'completion_tokens': 150, 'total_tokens': 183}


### 2.2. Hugging Face Inference API

**Description**: Hugging Face is an open-source AI community providing a vast hub of pre-trained models. Their Inference API allows you to use many of these models without local setup.

**API Key Required**: Yes (`HF_TOKEN`). Obtain from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).

**Installation**: `pip install huggingface_hub` (already included).

In [None]:
from huggingface_hub import InferenceClient
from google.colab import userdata # Import userdata

def query_huggingface(prompt: str, repo_id: str = "HuggingFaceH4/zephyr-7b-beta"): # Or another suitable text generation model
    """Queries a model via Hugging Face Inference API."""
    hf_token = userdata.get('HF_TOKEN') # Read from Colab secrets
    if not hf_token:
        return "API Token Missing: Please set HF_TOKEN in Colab secrets.", None, None

    # Note: For simple text_generation, token counts might not be directly returned in the response object
    # as consistently as with other providers. For precise token counts, you might need to use a tokenizer
    # on the input/output text or explore `chat_completion` endpoints if the model supports it and returns usage.
    client = InferenceClient(model=repo_id, token=hf_token)
    start_time = time.time()
    try:
        response = client.text_generation(prompt=prompt, max_new_tokens=MAX_TOKENS, temperature=TEMPERATURE)
        latency = time.time() - start_time
        tokens_info = {"info": "Token counts not directly available via simple text_generation, estimate based on length."}
        return response, latency, tokens_info
    except Exception as e:
        return f"Error querying Hugging Face: {e}", None, None

# Example usage
print("\n--- Hugging Face Test (Zephyr) ---")
hf_response, hf_latency, hf_tokens = query_huggingface(test_prompt)
# A None latency marks a failed call. The error strings returned above start with
# "Error querying ..." and never contain "Error:", so test the latency instead.
if hf_latency is not None:
    print(f"Response: {hf_response[:200]}...")
    print(f"Latency: {hf_latency:.4f} seconds")
    print(f"Tokens: {hf_tokens}")
else:
    print(hf_response)


--- Hugging Face Test (Zephyr) ---
Response:  Use clear and concise language, avoiding technical jargon where possible. Provide examples to illustrate your points....
Latency: 1.5059 seconds
Tokens: {'info': 'Token counts not directly available via simple text_generation, estimate based on length.'}
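The recorded Zephyr output above continues the prompt instead of answering it, which is what a raw `text_generation` call tends to do with a chat-tuned model, since the prompt is sent without a chat template. A possible alternative is the client's `chat_completion` method, sketched below. Assumptions: `query_hf_chat` is a hypothetical name, the `client` argument is a `huggingface_hub.InferenceClient` you have already constructed, the chosen model is actually served with chat-completion support, and the response carries a `usage` object (the code falls back gracefully if it does not).

```python
import time

def query_hf_chat(client, prompt: str, max_tokens: int = 150, temperature: float = 0.7):
    """Query a chat-capable model through client.chat_completion().

    `client` is expected to be a huggingface_hub.InferenceClient (assumption);
    any object exposing the same chat_completion method works.
    """
    start = time.perf_counter()
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=temperature,
    )
    latency = time.perf_counter() - start

    usage = getattr(out, "usage", None)
    if usage is not None:
        tokens_info = {
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.total_tokens,
        }
    else:
        tokens_info = {"info": "No usage data returned for this model."}
    return out.choices[0].message.content, latency, tokens_info
```

Inside the notebook this could be called as `query_hf_chat(InferenceClient(model=repo_id, token=hf_token), test_prompt, MAX_TOKENS, TEMPERATURE)`.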


### 2.3. Groq

**Description**: Groq is a hardware company that has developed a Language Processing Unit (LPU) designed for extremely fast LLM inference. It offers several open-source models with very low latency.

**API Key Required**: Yes (`GROQ_API_KEY`). Obtain from [console.groq.com](https://console.groq.com/).

**Installation**: `pip install groq` (already included).

In [None]:
from groq import Groq
from google.colab import userdata # Import userdata

def query_groq(prompt: str, model: str = "llama3-8b-8192"): # Or "mixtral-8x7b-32768"
    """Queries Groq's API."""
    api_key = userdata.get('GROQ_API_KEY') # Read from Colab secrets
    if not api_key:
        return "API Key Missing: Please set GROQ_API_KEY in Colab secrets.", None, None

    client = Groq(api_key=api_key)
    start_time = time.time()
    try:
        chat_completion = client.chat.completions.create(
            messages=[
                {"role": "user", "content": prompt}
            ],
            model=model,
            max_tokens=MAX_TOKENS,
            temperature=TEMPERATURE,
        )
        latency = time.time() - start_time
        tokens_info = {
            "prompt_tokens": chat_completion.usage.prompt_tokens,
            "completion_tokens": chat_completion.usage.completion_tokens,
            "total_tokens": chat_completion.usage.total_tokens
        }
        return chat_completion.choices[0].message.content, latency, tokens_info
    except Exception as e:
        return f"Error querying Groq: {e}", None, None

# Example usage
print("\n--- Groq Test (Llama 3 8B) ---")
groq_response, groq_latency, groq_tokens = query_groq(test_prompt)
# A None latency marks a failed call. The error strings returned above start with
# "Error querying ..." and never contain "Error:", so test the latency instead.
if groq_latency is not None:
    print(f"Response: {groq_response[:200]}...")
    print(f"Latency: {groq_latency:.4f} seconds")
    print(f"Tokens: {groq_tokens}")
else:
    print(groq_response)


--- Groq Test (Llama 3 8B) ---
Response: Artificial General Intelligence (AGI) refers to a hypothetical AI system that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks, much like human intelligence...
Latency: 2.3682 seconds
Tokens: {'prompt_tokens': 37, 'completion_tokens': 150, 'total_tokens': 187}


### 2.4. Ollama (Local)

**Description**: Ollama allows you to run large language models locally on your own machine. This is great for privacy, cost control (after initial hardware investment), and development without relying on external APIs.

**API Key Required**: No, but requires a local Ollama server running.

**Setup Steps**:
1.  **Download and Install Ollama**: Visit [ollama.com](https://ollama.com/) and follow the instructions for your OS.
2.  **Pull a Model**: Open your terminal and run, for example, `ollama pull llama3` or `ollama pull mistral`.
3.  **Ensure Ollama Server is Running**: Typically, Ollama starts automatically in the background after installation, but you can explicitly run `ollama serve` in your terminal if you encounter connection issues.

**Installation**: `pip install ollama` (already included).

In [None]:
import ollama

def query_ollama(prompt: str, model: str = "llama3"): # Ensure this model is pulled locally
    """Queries a local Ollama model."""
    start_time = time.time()
    try:
        response = ollama.chat(
            model=model,
            messages=[{'role': 'user', 'content': prompt}],
            options={'num_predict': MAX_TOKENS, 'temperature': TEMPERATURE} # Ollama options for max_tokens, temperature
        )
        latency = time.time() - start_time
        content = response['message']['content']

        # Ollama's Python library response for chat currently doesn't expose
        # precise token usage directly in the same way as OpenAI/Groq for `ollama.chat()`.
        # You might need to use `ollama.generate()` or estimate based on text length for cost analysis.
        tokens_info = {"info": "Token counts not directly available via ollama.chat(). Estimate based on text length or use /api/generate for details."}
        return content, latency, tokens_info
    except Exception as e:
        return f"Error querying Ollama: {e}. Make sure Ollama server is running and model '{model}' is pulled (e.g., `ollama pull {model}`).", None, None

# Example usage
print("\n--- Ollama Test (Llama 3) ---")
ollama_response, ollama_latency, ollama_tokens = query_ollama(test_prompt)
# A None latency marks a failed call. The error strings returned above start with
# "Error querying ..." and never contain "Error:", so a substring test lets failures
# through and the None latency then breaks the ":.4f" format below.
if ollama_latency is not None:
    print(f"Response: {ollama_response[:200]}...")
    print(f"Latency: {ollama_latency:.4f} seconds")
    print(f"Tokens: {ollama_tokens}")
else:
    print(ollama_response)


--- Ollama Test (Llama 3) ---
Response: Error querying Ollama: Failed to connect to Ollama. Please check that Ollama is downloaded, running and accessible. https://ollama.com/download. Make sure Ollama server is running and model 'llama3' i...


TypeError: unsupported format string passed to NoneType.__format__
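On token counts for Ollama: the Ollama REST API documents `prompt_eval_count` and `eval_count` fields on the final chat response, so usage may be recoverable without estimating from text length. The helper below is a sketch under that assumption. `ollama_token_usage` is a hypothetical name, and whether your installed `ollama` Python package exposes these fields through subscript access depends on its version, so the code degrades to an informational message when they are absent.

```python
def ollama_token_usage(response) -> dict:
    """Extract token counts from a dict-like Ollama chat/generate response."""
    def _field(key):
        try:
            return response[key]
        except (KeyError, TypeError):
            return None  # field missing, or the response is not subscriptable

    prompt_tokens = _field("prompt_eval_count")   # tokens evaluated in the prompt
    completion_tokens = _field("eval_count")      # tokens generated in the reply
    if prompt_tokens is None and completion_tokens is None:
        return {"info": "No token counts found in the Ollama response."}
    prompt_tokens = prompt_tokens or 0
    completion_tokens = completion_tokens or 0
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
```

In `query_ollama`, this would replace the fixed `tokens_info` message with `tokens_info = ollama_token_usage(response)`.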

### 2.5. Google Gemini (via Google AI Studio / Vertex AI)

**Description**: Google Gemini is Google's family of multimodal LLMs, offering powerful capabilities for text, image, and other data types.

**API Key Required**: Yes (`GOOGLE_API_KEY`). Obtain from [ai.google.dev/gemini-api/](https://ai.google.dev/gemini-api/).

**Installation**: `pip install google-generativeai` (already included).

In [None]:
import google.generativeai as genai
from google.colab import userdata # Import userdata
import os # Keep os for consistency with other parts of the notebook
import time # Ensure time is imported for latency calculation

# MAX_TOKENS, TEMPERATURE, and test_prompt are defined once in the
# "Standardized Prompt for Comparison" cell. They are deliberately not redefined
# here: raising MAX_TOKENS for a single platform would make the comparison unfair.


def query_google_gemini(prompt: str, model_name: str = "gemini-2.0-flash-lite"):
    """Queries Google Gemini via Google AI Studio/Vertex AI."""
    api_key = userdata.get('GOOGLE_API_KEY') # Read from Colab secrets
    if not api_key:
        return "API Key Missing: Please set GOOGLE_API_KEY in Colab secrets.", None, None

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)

    generation_config = {
        "max_output_tokens": MAX_TOKENS,
        "temperature": TEMPERATURE,
    }

    start_time = time.time()
    try:
        response = model.generate_content(prompt, generation_config=generation_config)
        latency = time.time() - start_time

        content = "N/A - No response generated."
        tokens_info = {}

        if response.candidates:
            # Get the finish reason from the first candidate
            finish_reason = response.candidates[0].finish_reason.name if response.candidates[0].finish_reason else "UNKNOWN"

            if finish_reason == "STOP":
                # Model completed successfully
                content = response.text
                if hasattr(response, 'usage_metadata'):
                    tokens_info = {
                        "prompt_token_count": response.usage_metadata.prompt_token_count,
                        "candidates_token_count": response.usage_metadata.candidates_token_count,
                        "total_token_count": response.usage_metadata.total_token_count
                    }
                else:
                    tokens_info = {"info": "Token usage_metadata not available in response (STOP)."}
            else:
                # Model stopped due to a reason other than normal completion
                content = f"Generation stopped early. Finish reason: {finish_reason}. "
                if response.candidates[0].safety_ratings:
                    safety_issues = [
                        f"{sr.category.name}: {sr.probability.name}"
                        for sr in response.candidates[0].safety_ratings
                        if sr.probability.name != 'NEGLIGIBLE' and sr.probability.name != 'LOW' # Filter out low/negligible
                    ]
                    if safety_issues:
                        content += f"Safety issues detected: {'; '.join(safety_issues)}."
                    else:
                        content += "No specific safety issues indicated despite early stop."
                else:
                    content += "No safety ratings available."

                # Still try to get token info if available, even for early stops
                if hasattr(response, 'usage_metadata'):
                    tokens_info = {
                        "prompt_token_count": response.usage_metadata.prompt_token_count,
                        "candidates_token_count": response.usage_metadata.candidates_token_count,
                        "total_token_count": response.usage_metadata.total_token_count
                    }
                else:
                    tokens_info = {"info": "Token usage_metadata not available in response (EARLY_STOP)."}

        else:
            # No candidates were generated at all (e.g., prompt blocked by safety)
            content = "No candidates generated. Prompt or response likely blocked."
            if hasattr(response, 'prompt_feedback') and response.prompt_feedback.block_reason:
                content += f" Prompt blocked due to: {response.prompt_feedback.block_reason.name}."
            tokens_info = {"info": "No candidates, check prompt_feedback."}

        return content, latency, tokens_info

    except Exception as e:
        return f"Error querying Google Gemini: {e}", None, None

# Example usage
print("\n--- Google Gemini Test (gemini-2.0-flash-lite) ---")
gemini_response, gemini_latency, gemini_tokens = query_google_gemini(test_prompt)

# A None latency marks a failed call. The error strings returned above start with
# "Error querying ..." and never contain "Error:", so test the latency instead.
if gemini_latency is not None:
    print(f"Response: {gemini_response[:200]}...")
    print(f"Latency: {gemini_latency:.4f} seconds")
    print(f"Tokens: {gemini_tokens}")
else:
    print(gemini_response)


--- Google Gemini Test (gemini-2.0-flash-lite) ---
Response: Artificial General Intelligence (AGI) represents a hypothetical level of AI with human-level cognitive abilities, capable of understanding, learning, and applying knowledge across a wide range of task...
Latency: 1.9239 seconds
Tokens: {'prompt_token_count': 25, 'candidates_token_count': 114, 'total_token_count': 139}


### 2.6. Anthropic (Claude)

**Description**: Anthropic's Claude models are known for their safety, helpfulness, and longer context windows, often preferred for tasks requiring careful reasoning.

**API Key Required**: Yes (`ANTHROPIC_API_KEY`). Obtain from [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys).

**Installation**: `pip install anthropic` (already included).

In [None]:
import anthropic
from google.colab import userdata # Import userdata

def query_anthropic_claude(prompt: str, model: str = "claude-sonnet-4-20250514"): # Or "claude-3-sonnet-20240229", "claude-3-haiku-20240307"
    """Queries Anthropic's Claude API."""
    api_key = userdata.get('ANTHROPIC_API_KEY') # Read from Colab secrets
    if not api_key:
        return "API Key Missing: Please set ANTHROPIC_API_KEY in Colab secrets.", None, None

    client = anthropic.Anthropic(api_key=api_key)
    start_time = time.time()
    try:
        message = client.messages.create(
            model=model,
            max_tokens=MAX_TOKENS,
            temperature=TEMPERATURE,
            messages=[
                {"role": "user", "content": prompt}
            ]
        )
        latency = time.time() - start_time
        tokens_info = {
            "input_tokens": message.usage.input_tokens,
            "output_tokens": message.usage.output_tokens,
            "total_tokens": message.usage.input_tokens + message.usage.output_tokens
        }
        return message.content[0].text, latency, tokens_info
    except Exception as e:
        return f"Error querying Anthropic Claude: {e}", None, None

# Example usage
print("\n--- Anthropic Claude Test (Claude Sonnet 4) ---")
claude_response, claude_latency, claude_tokens = query_anthropic_claude(test_prompt)
# A None latency marks a failed call. The error strings returned above start with
# "Error querying ..." and never contain "Error:", so a substring test lets failures
# through and the None latency then breaks the ":.4f" format below.
if claude_latency is not None:
    print(f"Response: {claude_response[:200]}...")
    print(f"Latency: {claude_latency:.4f} seconds")
    print(f"Tokens: {claude_tokens}")
else:
    print(claude_response)


--- Anthropic Claude Test (Claude Sonnet 4) ---
Response: Error querying Anthropic Claude: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Pla...


TypeError: unsupported format string passed to NoneType.__format__

### 2.7. Mistral AI

**Description**: Mistral AI is a French AI company gaining popularity for its powerful and efficient open-source models (like Mixtral) and competitive proprietary models (Mistral Small, Large).

**API Key Required**: Yes (`MISTRAL_API_KEY`). Obtain from [console.mistral.ai](https://console.mistral.ai/).

**Installation**: `pip install mistralai` (already included).

In [None]:
import time
from mistralai import Mistral
from google.colab import userdata # Import userdata
import os # Keep os for consistency if you also use .env locally

# MAX_TOKENS, TEMPERATURE, and test_prompt are defined in the
# "Standardized Prompt for Comparison" cell and reused here unchanged.



def query_mistral(prompt: str, model: str = "mistral-large-latest"):
    """Queries Mistral AI's API."""
    api_key = userdata.get('MISTRAL_API_KEY') # Read from Colab secrets
    if not api_key:
        return "API Key Missing: Please set MISTRAL_API_KEY in Colab secrets.", None, None

    # Initialize the Mistral client
    client = Mistral(api_key=api_key)

    # Chat messages are passed as plain dictionaries
    messages = [
        {
            "role": "user",
            "content": prompt,
        },
    ]

    start_time = time.time()
    try:
        # chat.complete() sends a single, non-streaming chat request
        chat_response = client.chat.complete(
            model=model,
            messages=messages,
            max_tokens=MAX_TOKENS,
            temperature=TEMPERATURE,
        )
        latency = time.time() - start_time

        # Token usage is reported under chat_response.usage
        tokens_info = {
            "prompt_tokens": chat_response.usage.prompt_tokens,
            "completion_tokens": chat_response.usage.completion_tokens,
            "total_tokens": chat_response.usage.total_tokens
        }
        return chat_response.choices[0].message.content, latency, tokens_info
    except Exception as e:
        return f"Error querying Mistral AI: {e}", None, None

# Example usage
print("\n--- Mistral AI Test (Mistral Large) ---")
mistral_response, mistral_latency, mistral_tokens = query_mistral(test_prompt)

# A None latency marks a failed call. The error strings returned above start with
# "Error querying ..." and never contain "Error:", so test the latency instead.
if mistral_latency is not None:
    print(f"Response: {mistral_response[:200]}...")
    print(f"Latency: {mistral_latency:.4f} seconds")
    print(f"Tokens: {mistral_tokens}")
else:
    print(mistral_response)


--- Mistral AI Test (Mistral Large) ---
Response: Artificial General Intelligence (AGI) refers to the hypothetical ability of an intelligent agent to understand, learn, and apply knowledge across a wide range of tasks at a level equal to or even surp...
Latency: 11.4806 seconds
Tokens: {'prompt_tokens': 31, 'completion_tokens': 150, 'total_tokens': 181}


## 3. Aggregated Comparative Analysis

Now, let's run the standardized prompt across all platforms and gather the results in a structured format.

In [None]:
results = {}

print("--- Starting Comparative Queries ---")

print("Querying OpenAI (gpt-4o-mini)...")
results["OpenAI (gpt-4o-mini)"] = query_openai(test_prompt, model="gpt-4o-mini")

print("Querying Hugging Face (Zephyr-7b-beta)...")
results["Hugging Face (zephyr-7b-beta)"] = query_huggingface(test_prompt, repo_id="HuggingFaceH4/zephyr-7b-beta")

print("Querying Groq (Llama 3 8B)...")
results["Groq (llama3-8b-8192)"] = query_groq(test_prompt, model="llama3-8b-8192")

print("Querying Ollama (llama3) - ensure server is running locally...")
results["Ollama (llama3)"] = query_ollama(test_prompt, model="llama3")

print("Querying Google Gemini (gemini-1.5-flash-latest)...") # Updated model name
results["Google Gemini (gemini-1.5-flash-latest)"] = query_google_gemini(test_prompt, model_name="gemini-1.5-flash-latest")

print("Querying Anthropic Claude (claude-3-haiku-20240307)...") # Using Haiku for typically faster response
results["Anthropic Claude (Haiku)"] = query_anthropic_claude(test_prompt, model="claude-3-haiku-20240307")

print("Querying Mistral AI (mistral-small-latest)...") # Using small for typically faster response
results["Mistral AI (mistral-small-latest)"] = query_mistral(test_prompt, model="mistral-small-latest")

print("--- Comparative Queries Complete ---")

--- Starting Comparative Queries ---
Querying OpenAI (gpt-4o-mini)...
Querying Hugging Face (Zephyr-7b-beta)...
Querying Groq (Llama 3 8B)...
Querying Ollama (llama3) - ensure server is running locally...
Querying Google Gemini (gemini-1.5-flash-latest)...
Querying Anthropic Claude (claude-3-haiku-20240307)...
Querying Mistral AI (mistral-small-latest)...
--- Comparative Queries Complete ---


### Displaying Results

We'll organize the collected data into a Pandas DataFrame for easier viewing and comparison.

In [None]:
comparison_data = []

for platform, (response, latency, tokens_info) in results.items():
    # latency is None whenever a query failed; the error strings never contain "Error:"
    response_excerpt = response[:100].replace('\n', ' ') + "..." if latency is not None and isinstance(response, str) else "N/A - Error or No Response"
    latency_str = f"{latency:.4f} s" if latency is not None else "N/A"

    # Add checks for tokens_info being None before accessing keys
    if tokens_info is not None:
        # Gemini reports its input count as 'prompt_token_count'
        input_tokens = tokens_info.get('prompt_tokens', tokens_info.get('input_tokens', tokens_info.get('prompt_token_count', 'N/A')))
        output_tokens = tokens_info.get('completion_tokens', tokens_info.get('output_tokens', tokens_info.get('candidates_token_count', 'N/A')))
        total_tokens = tokens_info.get('total_tokens', tokens_info.get('total_token_count', 'N/A'))

        if input_tokens == 'N/A' and 'info' in tokens_info:
            tokens_summary = tokens_info['info']
        else:
            tokens_summary = f"In: {input_tokens}, Out: {output_tokens}, Total: {total_tokens}"
    else:
        tokens_summary = "N/A - Token info not available due to error"
        input_tokens = 'N/A'
        output_tokens = 'N/A'
        total_tokens = 'N/A'


    comparison_data.append({
        "Platform": platform,
        "Latency": latency_str,
        "Input Tokens": input_tokens,
        "Output Tokens": output_tokens,
        "Total Tokens": total_tokens,
        "Tokens Info": tokens_summary,
        "Response Excerpt": response_excerpt,
        "Full Response": response if latency is not None and isinstance(response, str) else f"Error: {response}"
    })

df_comparison = pd.DataFrame(comparison_data)
df_comparison = df_comparison.set_index("Platform")

print("\n--- Summary of Results ---")
display(df_comparison[['Latency', 'Tokens Info', 'Response Excerpt']])

print("\n--- Full Responses for Qualitative Analysis ---")
for index, row in df_comparison.iterrows():
    print(f"\n### {index}\n")
    print(f"Latency: {row['Latency']}")
    print(f"Tokens Info: {row['Tokens Info']}")
    print(f"Response:\n{row['Full Response']}")
    print("\n" + "-"*80 + "\n")


--- Summary of Results ---


| Platform | Latency | Tokens Info | Response Excerpt |
|---|---|---|---|
| OpenAI (gpt-4o-mini) | 4.8944 s | In: 33, Out: 150, Total: 183 | Artificial General Intelligence (AGI) refers t... |
| Hugging Face (zephyr-7b-beta) | 2.9884 s | Token counts not directly available via simple... | Use clear and concise language, and provide e... |
| Groq (llama3-8b-8192) | 6.0363 s | In: 37, Out: 150, Total: 187 | Artificial General Intelligence (AGI) refers t... |
| Ollama (llama3) | N/A | N/A - Token info not available due to error | Error querying Ollama: Failed to connect to Ol... |
| Google Gemini (gemini-1.5-flash-latest) | 3.3684 s | In: N/A, Out: 133, Total: 158 | Artificial General Intelligence (AGI) refers t... |
| Anthropic Claude (Haiku) | N/A | N/A - Token info not available due to error | Error querying Anthropic Claude: Error code: 4... |
| Mistral AI (mistral-small-latest) | 1.5995 s | In: 30, Out: 150, Total: 180 | Artificial General Intelligence (AGI) refers t... |



--- Full Responses for Qualitative Analysis ---

### OpenAI (gpt-4o-mini)

Latency: 4.8944 s
Tokens Info: In: 33, Out: 150, Total: 183
Response:
Artificial General Intelligence (AGI) refers to a type of artificial intelligence that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks at a level comparable to that of a human being. Unlike narrow AI, which excels in specific tasks, AGI would be capable of reasoning, problem-solving, and adapting to new situations with a degree of flexibility and creativity. The potential impact of AGI on society and daily life could be profound, leading to unprecedented advancements in fields such as healthcare, education, and transportation, while also raising ethical concerns about job displacement, autonomy, and decision-making. As AGI systems could enhance productivity and innovation, their integration into society could transform how we work, interact, and address complex global challenges,

-----------------
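Each provider names its token fields differently (`prompt_tokens`, `input_tokens`, `prompt_token_count`, and so on), which is why the Gemini row in the recorded summary shows `In: N/A` even though a total was returned. One way to make the table code less fragile is to map every variant onto a common schema first. The sketch below is illustrative: `normalize_usage` is a hypothetical helper, and the key names are the ones the query functions in Section 2 produce.

```python
from typing import Optional

def normalize_usage(tokens_info: Optional[dict]) -> dict:
    """Map provider-specific token keys onto input/output/total."""
    empty = {"input": None, "output": None, "total": None}
    if not tokens_info:
        return empty

    def first(*keys):
        # Return the value of the first key that is present.
        for key in keys:
            if key in tokens_info:
                return tokens_info[key]
        return None

    inp = first("prompt_tokens", "input_tokens", "prompt_token_count")
    out = first("completion_tokens", "output_tokens", "candidates_token_count")
    total = first("total_tokens", "total_token_count")
    if total is None and inp is not None and out is not None:
        total = inp + out  # derive the total when the provider omits it
    return {"input": inp, "output": out, "total": total}

# Token dictionary from the recorded Gemini test in Section 2.5
print(normalize_usage({"prompt_token_count": 25,
                       "candidates_token_count": 114,
                       "total_token_count": 139}))
```

Dictionaries that only carry an `info` message come back with all three values set to `None`, which the table code can render as `N/A`.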

## 4. Qualitative Analysis

Beyond quantitative metrics like latency and token count, the *quality* of the generated response is paramount. This section requires your manual review and critical assessment.

**Review the "Full Responses" printed above for each platform and consider the following:**

* **Accuracy**: Is the explanation of AGI correct and factual?
* **Completeness**: Does it cover the key aspects mentioned in the prompt (concept, potential impact)?
* **Coherence & Readability**: Is the language natural, easy to understand, and well-structured?
* **Conciseness**: Does it adhere to the "one paragraph" constraint effectively?
* **Nuance**: Does it acknowledge both positive and negative potential impacts?
* **Bias**: Does the response exhibit any noticeable biases?

**Your Observations (add your notes here):**

**OpenAI (gpt-4o-mini)**:
-

**Hugging Face (zephyr-7b-beta)**:
-

**Groq (llama3-8b-8192)**:
-

**Ollama (llama3)**:
-

**Google Gemini (gemini-1.5-flash-latest)**:
-

**Anthropic Claude (Haiku)**:
-

**Mistral AI (mistral-small-latest)**:
-

## 5. Cost Analysis

Cost is a critical factor for production applications. While we have retrieved token counts, you will need to refer to each platform's official pricing documentation to calculate the actual cost based on your anticipated usage.

**General Pricing Model**: Most commercial LLM APIs charge per token, often with different rates for input (prompt) tokens and output (completion) tokens. Prices are typically quoted per 1,000 or 1,000,000 tokens.

**Key Steps for Cost Estimation**:

1.  **Retrieve Token Counts**: Use the `Input Tokens` and `Output Tokens` from our results.
2.  **Check Current Pricing**: Visit the official pricing pages for each platform.
    * **OpenAI**: [openai.com/pricing](https://openai.com/pricing)
    * **Hugging Face Inference API**: Pricing depends on the underlying provider; check the specific model's page or Hugging Face paid Inference Endpoints.
    * **Groq**: [groq.com/pricing](https://groq.com/pricing)
    * **Ollama**: Free to use locally (costs are your hardware and electricity).
    * **Google Gemini**: [ai.google.dev/gemini-api/docs/pricing](https://ai.google.dev/gemini-api/docs/pricing)
    * **Anthropic**: [anthropic.com/pricing](https://anthropic.com/pricing)
    * **Mistral AI**: [mistral.ai/pricing](https://mistral.ai/pricing)
3.  **Calculate Estimated Cost**: `(Input Tokens / 1,000,000) * Input Price Per Million + (Output Tokens / 1,000,000) * Output Price Per Million`

**Example Cost Calculation (Hypothetical for Illustration):**

If OpenAI GPT-3.5-turbo costs \$0.50 per 1M input tokens and \$1.50 per 1M output tokens, and your query used 50 input tokens and 100 output tokens:

* Input Cost = (50 / 1,000,000) × \$0.50 = \$0.000025
* Output Cost = (100 / 1,000,000) × \$1.50 = \$0.00015
* Total Cost = \$0.000025 + \$0.00015 = \$0.000175 per query
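The formula from step 3 can be written as a short function. This is a minimal sketch: `estimate_cost` is a hypothetical name, and the prices passed in below are the illustrative figures from the example above, not current rates for any provider.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimated cost in USD for one query, given per-million-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_million
    output_cost = (output_tokens / 1_000_000) * output_price_per_million
    return input_cost + output_cost

# Hypothetical prices from the worked example: $0.50 / 1M input, $1.50 / 1M output
cost = estimate_cost(50, 100, 0.50, 1.50)
print(f"${cost:.6f} per query")  # $0.000175 per query
```

Multiplying the per-query figure by your expected request volume gives a rough monthly estimate; real bills can differ if a provider rounds, caches prompts, or bills minimums.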

## 6. Conclusion and Next Steps

This comparative analysis provides a snapshot of different LLM platforms based on a single prompt. Your findings will likely vary based on the specific models chosen, prompt complexity, `max_tokens`, and `temperature` settings.

**Summary of Findings (Your interpretation):**

* Which platform offered the lowest latency for this task?
* Which platform offered the most comprehensive token usage reporting?
* Which platform provided the highest quality response according to your qualitative assessment?
* What are the perceived trade-offs between speed, quality, and cost for these platforms?
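The first question can be answered directly from the `results` dictionary built in Section 3, since each value is a `(response, latency, tokens_info)` tuple. The helper below is a small sketch with a hypothetical name, `rank_by_latency`; the sample values are latencies taken from the recorded summary above, and a single run like this is only indicative.

```python
def rank_by_latency(results: dict) -> list:
    """Return (platform, latency) pairs for successful calls, fastest first."""
    successful = [
        (name, latency)
        for name, (_, latency, _) in results.items()
        if latency is not None  # failed calls carry latency=None
    ]
    return sorted(successful, key=lambda pair: pair[1])

# Subset of the recorded run, in the same tuple shape as `results`
sample = {
    "OpenAI (gpt-4o-mini)": ("...", 4.8944, None),
    "Ollama (llama3)": ("Error querying Ollama: ...", None, None),
    "Mistral AI (mistral-small-latest)": ("...", 1.5995, None),
}
for platform, latency in rank_by_latency(sample):
    print(f"{platform}: {latency:.4f} s")
```

Failed platforms are left out of the ranking rather than sorted last, so compare the result length against `len(results)` to see how many calls succeeded.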

**Further Analysis Ideas:**

* **Vary Prompt Length/Complexity**: Test with very short and very long prompts.
* **Specific Tasks**: Run benchmarks for tasks critical to your application (e.g., summarization, code generation, sentiment analysis).
* **Error Handling**: Implement more robust error handling and retry mechanisms.
* **Streaming Responses**: Evaluate time to first token (TTFT) for streaming APIs, which is crucial for real-time user experiences.
* **Concurrency**: Test how each API performs under concurrent requests.
* **Cost Optimization**: Explore strategies like prompt compression, caching, and choosing smaller models where appropriate.
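For the streaming idea, time to first token can be measured independently of any particular SDK by timing an iterator of text pieces. The sketch below makes several assumptions: `measure_stream` is a hypothetical helper, and `start_stream` is a zero-argument callable you write yourself that issues the streaming request and returns an iterable of text chunks. Adapting each provider's streaming response into plain strings is left to that callable.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

def measure_stream(start_stream: Callable[[], Iterable[str]]) -> Tuple[str, Optional[float], float]:
    """Return (full_text, time_to_first_token, total_time) for a text stream.

    The timer starts before start_stream() is called so that request setup
    is included in both measurements.
    """
    start = time.perf_counter()
    ttft = None
    parts = []
    for text in start_stream():
        if text and ttft is None:
            # First non-empty chunk marks the time to first token.
            ttft = time.perf_counter() - start
        parts.append(text)
    total = time.perf_counter() - start
    return "".join(parts), ttft, total

# Stand-in stream for demonstration; a real one would wrap an SDK's streaming call.
def fake_stream():
    def chunks():
        time.sleep(0.01)
        yield "Hel"
        time.sleep(0.01)
        yield "lo"
    return chunks()

text, ttft, total = measure_stream(fake_stream)
print(f"TTFT: {ttft:.4f} s, total: {total:.4f} s, text: {text!r}")
```

If the stream yields nothing, the time to first token is returned as `None`, matching the `latency=None` convention used for failures elsewhere in this notebook.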