# Lumentra Voice AI - Groq Native Tool Calling Testing

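This notebook tests Groq-hosted LLMs with native tool calling for a voice AI agent: tool routing on single messages, an end-to-end pipeline with simulated tools, and a latency benchmark.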

**Current Production Stack:**
- SignalWire: Telephony (SIP/WebRTC)
- Deepgram: Speech-to-Text (Nova-2)
- Groq: LLM with Native Tool Calling
- Cartesia: Text-to-Speech (Sonic)

**Available Groq Models:**
1. `llama-3.1-8b-instant` - Fast, balanced (default)
2. `gpt-oss-20b-128k` - Extended context
3. `qwen3-32b-131k` - Complex reasoning

In [None]:
# Install Groq SDK
!pip install -q groq python-dotenv

import os
from groq import Groq
import json
import time

# Set your Groq API key
# Option 1: Use Colab secrets
try:
    from google.colab import userdata
    GROQ_API_KEY = userdata.get('GROQ_API_KEY')
    print("Using GROQ_API_KEY from Colab secrets")
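except Exception:
    # Option 2: Not on Colab (or the secret is missing), so read the environment.
    # The placeholder must be replaced with a real key before calling the API.
    GROQ_API_KEY = os.environ.get('GROQ_API_KEY', 'YOUR_KEY_HERE')
    if GROQ_API_KEY == 'YOUR_KEY_HERE':
        print("GROQ_API_KEY not found; replace 'YOUR_KEY_HERE' before running")
    else:
        print("Using GROQ_API_KEY from environment")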

client = Groq(api_key=GROQ_API_KEY)
print("Groq client initialized")
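
The setup cell falls back to a placeholder key, which only fails once the first request is sent. A minimal sketch of failing fast instead is shown below; `resolve_api_key` and `PLACEHOLDER` are hypothetical names introduced for illustration and are not part of the Groq SDK.

```python
import os

PLACEHOLDER = "YOUR_KEY_HERE"  # hypothetical sentinel matching the setup cell


def resolve_api_key(env=None, name="GROQ_API_KEY") -> str:
    """Return the API key from a mapping (default: os.environ) or raise early."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{name} is not set; add it to Colab secrets or the environment")
    return key


# Example with an explicit mapping, so the sketch does not depend on the real environment
print(resolve_api_key({"GROQ_API_KEY": "  gsk_example  "}))
```

Passing the mapping explicitly keeps the helper easy to test; in the notebook it could be called with no arguments before constructing the client.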

## 1. Tool Definitions

Define tools for the voice agent using Groq's native tool calling format.

In [None]:
# Voice Agent Tool Definitions
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "check_availability",
            "description": "Check available appointment slots for a specific date",
            "parameters": {
                "type": "object",
                "properties": {
                    "date": {
                        "type": "string",
                        "description": "Date in YYYY-MM-DD format"
                    },
                    "service_type": {
                        "type": "string",
                        "description": "Type of service (optional)"
                    }
                },
                "required": ["date"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "create_booking",
            "description": "Create a new booking for the customer",
            "parameters": {
                "type": "object",
                "properties": {
                    "customer_name": {"type": "string", "description": "Customer's full name"},
                    "customer_phone": {"type": "string", "description": "Customer's phone number"},
                    "date": {"type": "string", "description": "Booking date (YYYY-MM-DD)"},
                    "time": {"type": "string", "description": "Booking time (HH:MM)"},
                    "service_type": {"type": "string", "description": "Type of service"}
                },
                "required": ["customer_name", "customer_phone", "date", "time"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_contact",
            "description": "Look up a contact by phone number or name",
            "parameters": {
                "type": "object",
                "properties": {
                    "phone": {"type": "string", "description": "Phone number to search"},
                    "name": {"type": "string", "description": "Name to search"}
                },
                "required": []
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "transfer_to_human",
            "description": "Transfer the call to a human staff member",
            "parameters": {
                "type": "object",
                "properties": {
                    "reason": {"type": "string", "description": "Reason for transfer"}
                },
                "required": ["reason"]
            }
        }
    }
]

print(f"Defined {len(TOOLS)} tools for voice agent")
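
The `required` lists above describe what the model should supply, but a model can still omit a field. As a rough sketch, arguments could be checked against the schema before a tool is executed. `missing_required` is a hypothetical helper, and `CREATE_BOOKING` is a trimmed copy of the definition above so the example runs on its own.

```python
def missing_required(tool_def: dict, arguments: dict) -> list:
    """Return required parameter names that are absent or empty in arguments."""
    required = tool_def["function"]["parameters"].get("required", [])
    return [name for name in required if arguments.get(name) in (None, "")]


# Trimmed copy of the create_booking definition (descriptions omitted)
CREATE_BOOKING = {
    "type": "function",
    "function": {
        "name": "create_booking",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_name": {"type": "string"},
                "customer_phone": {"type": "string"},
                "date": {"type": "string"},
                "time": {"type": "string"},
            },
            "required": ["customer_name", "customer_phone", "date", "time"],
        },
    },
}

args = {"customer_name": "John Smith", "date": "2024-06-01", "time": "14:00"}
print(missing_required(CREATE_BOOKING, args))  # ['customer_phone']
```

In a voice flow, a non-empty result is a natural cue for the agent to ask the caller for the missing detail rather than create an incomplete booking.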

## 2. Groq Chat with Native Tool Calling

In [None]:
SYSTEM_PROMPT = """You are Luna, the AI voice assistant for Stellar Auto Service.

You help callers with:
- Booking service appointments
- Checking availability
- Looking up customer records
- General inquiries about services

Keep responses concise and natural for voice conversation.
Use tools when appropriate to fulfill customer requests."""

def chat_with_groq(
    user_message: str,
    history: list = None,
    model: str = "llama-3.1-8b-instant"
) -> dict:
    """Chat with Groq using native tool calling."""
    start = time.time()
    
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    
    if history:
        messages.extend(history)
    
    messages.append({"role": "user", "content": user_message})
    
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        tools=TOOLS,
        tool_choice="auto",
        max_tokens=500,
        temperature=0.7
    )
    
    latency_ms = (time.time() - start) * 1000
    choice = response.choices[0]
    
    result = {
        "model": model,
        "latency_ms": latency_ms,
        "usage": {
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens
        }
    }
    
    if choice.message.tool_calls:
        # Only the first tool call is used; any additional parallel calls are ignored
        tool_call = choice.message.tool_calls[0]
        result["action"] = "tool_call"
        result["tool"] = tool_call.function.name
        result["arguments"] = json.loads(tool_call.function.arguments)
        result["text"] = None
    else:
        result["action"] = "response"
        result["tool"] = None
        result["arguments"] = None
        result["text"] = choice.message.content
    
    return result

print("chat_with_groq function defined")
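
`chat_with_groq` passes the tool arguments straight to `json.loads`, which raises if the model emits malformed JSON. One possible guard, sketched here with a hypothetical helper named `parse_tool_arguments`, returns an empty dict instead so the caller can decide how to recover.

```python
import json


def parse_tool_arguments(raw: str) -> dict:
    """Parse a tool call's JSON argument string; return {} if it is not a JSON object."""
    try:
        parsed = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        return {}
    # Tool arguments are expected to be an object; reject lists, strings, numbers
    return parsed if isinstance(parsed, dict) else {}


print(parse_tool_arguments('{"date": "2024-06-01"}'))
print(parse_tool_arguments('{"date": '))  # truncated JSON falls back to {}
```

Whether an empty dict is the right fallback depends on the tool; for a booking tool it may be safer to treat it as a failed call and ask the caller to repeat the request.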

## 3. Test Tool Calling

In [None]:
# Test messages that should trigger different behaviors
test_messages = [
    "Hi there!",  # Should get conversational response
    "What times are available tomorrow?",  # Should call check_availability
    "I'd like to book an oil change for tomorrow at 2pm, my name is John Smith",  # Should call create_booking
    "Can you look up my account? My number is 555-123-4567",  # Should call get_contact
    "I need to speak to a manager about a complaint",  # Should call transfer_to_human
]

print("Testing Groq Native Tool Calling")
print("=" * 60)

for msg in test_messages:
    result = chat_with_groq(msg)
    print(f"\nInput: {msg}")
    print(f"Action: {result['action']}")
    if result['tool']:
        print(f"Tool: {result['tool']}")
        print(f"Args: {json.dumps(result['arguments'], indent=2)}")
    else:
        text = result['text'] or ''
        # Truncate long responses to keep the test output readable
        print(f"Response: {text[:100]}..." if len(text) > 100 else f"Response: {text}")
    print(f"Latency: {result['latency_ms']:.1f}ms")
    print(f"Tokens: {result['usage']['prompt_tokens']}+{result['usage']['completion_tokens']}")
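
The comments in `test_messages` state which tool each message should trigger, but the loop only prints what happened. A small sketch of turning those expectations into a score follows. `routing_accuracy` is a hypothetical helper, and `observed_tools` is made-up placeholder data for illustration, not a measured result.

```python
def routing_accuracy(expected: list, actual: list) -> float:
    """Fraction of cases where the chosen tool matches the expected one (None = no tool)."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    if not expected:
        return 0.0
    hits = sum(e == a for e, a in zip(expected, actual))
    return hits / len(expected)


# Expected tools, in the same order as test_messages
expected_tools = [None, "check_availability", "create_booking", "get_contact", "transfer_to_human"]
# Placeholder outcome for illustration only; replace with the tools actually returned
observed_tools = [None, "check_availability", None, "get_contact", "transfer_to_human"]

print(f"{routing_accuracy(expected_tools, observed_tools):.0%}")  # 80%
```

In the notebook, `observed_tools` could be collected as `[chat_with_groq(m)["tool"] for m in test_messages]`.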

## 4. Model Comparison

In [None]:
# Compare different Groq models
MODELS = [
    "llama-3.1-8b-instant",
    # "gpt-oss-20b-128k",  # Uncomment if available
    # "qwen3-32b-131k",    # Uncomment if available
]

test_msg = "I want to book an appointment for tomorrow at 3pm for brake service. My name is Sarah."

print("Model Comparison")
print("=" * 60)

for model in MODELS:
    try:
        result = chat_with_groq(test_msg, model=model)
        print(f"\nModel: {model}")
        print(f"Action: {result['action']}")
        if result['tool']:
            print(f"Tool: {result['tool']}")
            print(f"Args: {result['arguments']}")
        print(f"Latency: {result['latency_ms']:.1f}ms")
        print(f"Tokens: {result['usage']['prompt_tokens']}+{result['usage']['completion_tokens']}")
    except Exception as e:
        print(f"\nModel: {model}")
        print(f"Error: {e}")
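
A single request per model gives a noisy latency figure, since network jitter can dominate one sample. A minimal sketch of taking the median over several runs is shown below; `median_latency_ms` is a hypothetical helper that times any zero-argument callable.

```python
import statistics
import time


def median_latency_ms(fn, runs: int = 5) -> float:
    """Call fn `runs` times and return the median wall-clock latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


# Stand-in workload so the sketch runs without an API key
print(f"{median_latency_ms(lambda: sum(range(10_000))):.3f}ms")
```

For the comparison above, the callable could be something like `lambda: chat_with_groq(test_msg, model=model)`, keeping in mind that each run consumes API quota.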

## 5. Complete Pipeline with Tool Execution

In [None]:
def execute_tool(tool_name: str, arguments: dict) -> dict:
    """Simulate tool execution (in production, these call actual APIs)."""
    if tool_name == "check_availability":
        return {
            "available": True,
            "slots": ["9:00 AM", "11:00 AM", "2:00 PM", "4:00 PM"],
            "date": arguments.get("date", "tomorrow")
        }
    elif tool_name == "create_booking":
        return {
            "success": True,
            "confirmation_code": "SA-2024-12345",
            "details": arguments
        }
    elif tool_name == "get_contact":
        return {
            "found": True,
            "name": "John Smith",
            "tier": "VIP",
            "last_visit": "2024-01-15"
        }
    elif tool_name == "transfer_to_human":
        return {
            "transferred": True,
            "department": "Customer Service"
        }
    return {"error": "Unknown tool"}

def voice_pipeline(user_message: str, history: list = None) -> dict:
    """Complete voice AI pipeline with Groq native tool calling."""
    total_start = time.time()
    
    # Step 1: Get LLM response with potential tool call
    llm_result = chat_with_groq(user_message, history)
    
    if llm_result["action"] == "tool_call":
        # Step 2: Execute the tool
        tool_result = execute_tool(llm_result["tool"], llm_result["arguments"])
        
        # Step 3: Get final response with tool result
        tool_response_msg = f"Tool {llm_result['tool']} returned: {json.dumps(tool_result)}"
        
        final_history = (history or []) + [
            {"role": "user", "content": user_message},
            {"role": "assistant", "content": f"[Calling {llm_result['tool']}]"},
            {"role": "user", "content": tool_response_msg}
        ]
        
        final_result = chat_with_groq(
            "Please provide a natural response to the customer based on the tool result.",
            final_history
        )
        
        total_latency = (time.time() - total_start) * 1000
        
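        return {
            # "text" is always present but is None if the model requests another tool,
            # so fall back with `or` rather than a .get() default
            "text": final_result.get("text") or "Your request has been processed.",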
            "tool_called": llm_result["tool"],
            "tool_args": llm_result["arguments"],
            "tool_result": tool_result,
            "llm_latency_ms": llm_result["latency_ms"],
            "total_latency_ms": total_latency
        }
    else:
        total_latency = (time.time() - total_start) * 1000
        return {
            "text": llm_result["text"],
            "tool_called": None,
            "tool_args": None,
            "tool_result": None,
            "llm_latency_ms": llm_result["latency_ms"],
            "total_latency_ms": total_latency
        }

# Test complete pipeline
print("Complete Voice Pipeline Test")
print("=" * 60)

pipeline_tests = [
    "Hello, I need some help",
    "Do you have anything available tomorrow morning?",
    "Great, book me for 11am. Name is Mike Jones, number 555-9876",
    "Can you transfer me to someone about my warranty?"
]

history = []
for msg in pipeline_tests:
    result = voice_pipeline(msg, history)
    print(f"\nUser: {msg}")
    print(f"Luna: {result['text']}")
    if result['tool_called']:
        print(f"[Tool: {result['tool_called']} -> {result['tool_result']}]")
    print(f"Latency: {result['total_latency_ms']:.1f}ms")
    
    # Update history
    history.append({"role": "user", "content": msg})
    history.append({"role": "assistant", "content": result['text']})
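
The pipeline above relays the tool result to the model as plain user text. OpenAI-compatible chat APIs, which Groq follows, also accept a structured form: an assistant message carrying `tool_calls`, followed by a `tool` message that references the call's id. The sketch below only builds that message list. `build_tool_followup` is a hypothetical helper, and the call id shown is a made-up example (in practice it comes from `tool_call.id` on the response).

```python
import json


def build_tool_followup(history, user_message, call_id, tool_name, arguments, tool_result):
    """Return messages for the second LLM call using structured tool-role messages."""
    return list(history) + [
        {"role": "user", "content": user_message},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": call_id,
                "type": "function",
                # Arguments are sent back as a JSON string, as received from the model
                "function": {"name": tool_name, "arguments": json.dumps(arguments)},
            }],
        },
        {
            "role": "tool",
            "tool_call_id": call_id,  # must match the id in the assistant message
            "name": tool_name,
            "content": json.dumps(tool_result),
        },
    ]


messages = build_tool_followup(
    history=[],
    user_message="Do you have anything available tomorrow morning?",
    call_id="call_example_1",  # made-up id for illustration
    tool_name="check_availability",
    arguments={"date": "2024-06-01"},
    tool_result={"available": True, "slots": ["9:00 AM", "11:00 AM"]},
)
print(json.dumps(messages, indent=2))
```

Using this form would require `chat_with_groq` to return the tool call id and to accept a prebuilt message list, which the current function does not do.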

## 6. Latency Benchmark

In [None]:
import random

benchmark_messages = [
    "Hi",
    "Hello there",
    "I want to book an appointment",
    "What times are available?",
    "Do you have anything tomorrow?",
    "Can I book for 2pm?",
    "What are your hours?",
    "How much does an oil change cost?",
    "I need to reschedule",
    "Thanks, bye!",
] * 5  # 50 messages

random.shuffle(benchmark_messages)

print(f"Running benchmark with {len(benchmark_messages)} messages...")
print("=" * 60)

latencies = []
tool_calls = 0

for i, msg in enumerate(benchmark_messages):
    result = chat_with_groq(msg)
    latencies.append(result['latency_ms'])
    if result['tool']:
        tool_calls += 1
    
    if (i + 1) % 10 == 0:
        print(f"Processed {i + 1}/{len(benchmark_messages)}")

print("\n" + "=" * 60)
print("BENCHMARK RESULTS (Groq API - llama-3.1-8b-instant)")
print("=" * 60)
print(f"Total messages: {len(benchmark_messages)}")
print(f"Tool calls: {tool_calls} ({100*tool_calls/len(benchmark_messages):.1f}%)")
ordered = sorted(latencies)  # Sort once and reuse for the percentile lookups
print("\nLatency Statistics:")
print(f"  Mean: {sum(latencies)/len(latencies):.1f}ms")
print(f"  Min: {ordered[0]:.1f}ms")
print(f"  Max: {ordered[-1]:.1f}ms")
print(f"  P50: {ordered[len(ordered)//2]:.1f}ms")
print(f"  P95: {ordered[int(len(ordered)*0.95)]:.1f}ms")
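
The P50 and P95 figures above are read by indexing into the sorted latencies. The same idea can be written as a reusable nearest-rank percentile, sketched below under the assumption that nearest-rank is an acceptable definition here. `percentile` is a hypothetical helper, and its P50 can differ by one position from the inline indexing above on even-sized samples.

```python
import math


def percentile(values, pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


sample = [120.0, 95.0, 210.0, 180.0, 101.0]  # illustrative numbers, not measurements
print(percentile(sample, 50), percentile(sample, 95))
```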

## Summary

**Production Voice AI Stack:**

| Component | Service | Latency |
|-----------|---------|--------|
| Telephony | SignalWire | ~50ms |
| STT | Deepgram Nova-2 | ~100ms |
| LLM | Groq (llama-3.1-8b) | ~200ms |
| TTS | Cartesia Sonic | ~100ms |

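**Total End-to-End: ~450ms** (sum of the approximate figures above for a direct response; when a tool is called, the pipeline in Section 5 makes a second LLM call, which adds to this total)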
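
The total is the sum of the per-component figures in the table. The arithmetic can be sketched as below, including the tool-call path, where the pipeline in Section 5 calls the LLM twice. `end_to_end_ms` is a hypothetical helper, the values are the approximate table figures rather than measurements, and tool execution time is assumed to be zero unless passed in.

```python
# Approximate per-component latencies from the table above, in milliseconds
COMPONENT_LATENCY_MS = {"telephony": 50, "stt": 100, "llm": 200, "tts": 100}


def end_to_end_ms(components: dict, llm_calls: int = 1, tool_ms: float = 0) -> float:
    """Sum component latencies, counting the LLM once per call plus any tool execution time."""
    non_llm = sum(ms for name, ms in components.items() if name != "llm")
    return non_llm + components["llm"] * llm_calls + tool_ms


print(end_to_end_ms(COMPONENT_LATENCY_MS))               # 450 (direct response)
print(end_to_end_ms(COMPONENT_LATENCY_MS, llm_calls=2))  # 650 (tool call, two LLM round trips)
```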

```
User Speech -> SignalWire -> Deepgram STT -> Groq LLM (Native Tools)
                                                  |
                                    +-------------+-------------+
                                    |             |             |
                              Direct Response  Tool Call    Transfer
                                    |             |             |
                                    +------+------+             |
                                           |                    |
                                    Cartesia TTS <- - - - - - - +
                                           |
                                    SignalWire -> User
```

**Advantages over FunctionGemma approach:**
- No local GPU required
- Single LLM call handles routing and, when no tool is needed, the response
- Better context understanding
- Simpler architecture
- Native tool calling support