# Unit 2 Assignment: Building a Mixture of Experts (MoE) Router

**Topic:** Advanced Architecture using Groq API  
**Tools:** Python, Groq API, Dotenv

---

## Objective

Build a **Smart Customer Support Router** using a Mixture of Experts (MoE) architecture.

Instead of one generalist AI, we route each query to the most suitable expert:
- **Technical Expert** → Bug reports, code errors
- **Billing Expert** → Refunds, charges, subscriptions
- **General Expert** → Fallback for casual chat

```mermaid
graph TD
    User[User Query] --> Router[Router LLM<br/>temperature=0]
    Router -->|technical| Tech[Technical Expert]
    Router -->|billing| Bill[Billing Expert]
    Router -->|general| Gen[General Expert]
    Router -->|tool| Tool[Tool Function<br/>mock fetch]
    Tech & Bill & Gen & Tool --> Response[Final Answer]
```


## Section 1: Setup and Imports

```python
%pip install groq python-dotenv --upgrade --quiet
```

```text
Note: you may need to restart the kernel to use updated packages.
```





```python
import os
import getpass
from groq import Groq
from dotenv import load_dotenv

load_dotenv()

if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API Key: ")

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# mixtral-8x7b-32768 was decommissioned; using the recommended replacement
MODEL = "llama-3.3-70b-versatile"

print("Groq client initialized successfully.")
print(f"Model: {MODEL}")
```

```text
Groq client initialized successfully.
Model: llama-3.3-70b-versatile
```


## Section 2: Define Expert Configurations (`MODEL_CONFIG`)

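Each expert is a **different system prompt** on the same base model (`llama-3.3-70b-versatile`).  
This prompt-based routing imitates MoE-style specialisation without fine-tuning. Architectural MoE models such as Mixtral work differently: they route tokens between expert feed-forward sub-networks inside a single network. Here the "experts" are personas, and the router is a separate LLM call.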
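All experts share one base model, so the only thing that changes per request is the message list. The sketch below shows how a config entry becomes a chat payload. It assumes the same `{"model", "system_prompt"}` shape as `MODEL_CONFIG`. `build_messages` and `DEMO_CONFIG` are hypothetical names used only for illustration.

```python
from typing import Dict, List

# Hypothetical, trimmed-down config with the same shape as MODEL_CONFIG
DEMO_CONFIG: Dict[str, Dict[str, str]] = {
    "technical": {
        "model": "llama-3.3-70b-versatile",
        "system_prompt": "You are a Senior Software Engineer and debugger.",
    },
    "general": {
        "model": "llama-3.3-70b-versatile",
        "system_prompt": "You are a friendly support assistant.",
    },
}


def build_messages(config: Dict[str, Dict[str, str]], category: str,
                   user_input: str) -> List[Dict[str, str]]:
    """Turn one expert entry into an OpenAI-style messages list.

    Unknown categories fall back to the 'general' expert.
    """
    expert = config.get(category, config["general"])
    return [
        {"role": "system", "content": expert["system_prompt"]},
        {"role": "user", "content": user_input},
    ]


msgs = build_messages(DEMO_CONFIG, "technical", "IndexError on line 5")
print(msgs[0]["content"])  # the technical persona
print(msgs[1]["content"])  # the user query, passed through unchanged
```

Both entries carry the same model string, so all of the specialisation lives in the system message.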


```python
MODEL_CONFIG = {
    "technical": {
        "model": MODEL,
        "system_prompt": (
            "You are a Senior Software Engineer and debugger. "
            "You are rigorous, precise, and code-focused. "
            "When given a bug report or technical question, diagnose the root cause, "
            "explain it clearly, and provide a corrected code snippet where applicable. "
            "Always mention the programming concept involved (e.g., off-by-one error, "
            "null pointer, type mismatch)."
        ),
    },
    "billing": {
        "model": MODEL,
        "system_prompt": (
            "You are an empathetic and professional Billing Support Specialist. "
            "You handle refund requests, duplicate charges, and subscription issues. "
            "Always acknowledge the customer's frustration first, then explain the "
            "company's refund policy clearly, and outline the exact steps the customer "
            "should follow to resolve the issue. Be concise and reassuring."
        ),
    },
    "general": {
        "model": MODEL,
        "system_prompt": (
            "You are a friendly and helpful general-purpose customer support assistant. "
            "Answer casual questions politely and concisely. If the query seems to be "
            "technical or billing-related, gently let the user know you can route them "
            "to the right specialist."
        ),
    },
}

print("MODEL_CONFIG defined with experts:", list(MODEL_CONFIG.keys()))
```

```text
MODEL_CONFIG defined with experts: ['technical', 'billing', 'general']
```


## Section 3: The Router Function

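`route_prompt(user_input)` is the **brain** of the MoE system.  
It uses `temperature=0` for near-deterministic, consistent classification. Repeated runs are highly consistent, but hosted APIs do not strictly guarantee identical outputs.  
The strict prompt asks the model to return **only** the category name. Any reply outside the valid labels falls back to `general`.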
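Even at `temperature=0`, a chat model can occasionally add punctuation or a prefix such as `Category: billing`. The sketch below is a more tolerant parser. It searches the reply for the first valid label as a whole word and otherwise falls back to `general`. `normalize_category` and `VALID_LABELS` are hypothetical helpers and are not part of the assignment code.

```python
import re

VALID_LABELS = ("technical", "billing", "general", "tool")
# Whole-word match for any valid label, e.g. "Billing." or "category: tool"
_LABEL_RE = re.compile(r"\b(" + "|".join(VALID_LABELS) + r")\b")


def normalize_category(raw, default: str = "general") -> str:
    """Extract the first valid routing label from a raw model reply."""
    if not raw:  # None or empty string from the API
        return default
    match = _LABEL_RE.search(raw.lower())
    return match.group(1) if match else default


for reply in ["billing", " Technical.\n", "Category: tool", "I think refunds", None]:
    print(repr(reply), "->", normalize_category(reply))
```

The first match wins. If the model rambles and mentions several labels, the earliest one is used, which is one reason to keep `max_tokens` small.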


```python
VALID_CATEGORIES = ["technical", "billing", "general", "tool"]

def route_prompt(user_input: str) -> str:
    """
    Classifies the user's query into one of: technical, billing, general, tool.
    Uses temperature=0 for (near-)deterministic output.
    Returns ONLY the lowercase category string.
    """
    routing_prompt = (
        "Classify the following customer support message into exactly one of these "
        "categories: [technical, billing, general, tool].\n\n"
        "Rules:\n"
        "- 'technical': bug reports, code errors, software/hardware problems.\n"
        "- 'billing': charges, refunds, subscriptions, payments.\n"
        "- 'tool': requests for live/current data such as crypto prices, stock prices, "
        "weather, exchange rates.\n"
        "- 'general': everything else.\n\n"
        "Return ONLY the single category word in lowercase. No punctuation, no explanation.\n\n"
        f"Message: {user_input}"
    )

    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": routing_prompt}],
        temperature=0,
        max_tokens=10,
    )

    # content may be None; also strip stray punctuation/quotes such as "billing."
    raw = response.choices[0].message.content or ""
    category = raw.strip().lower().strip(".,;:!'\"` ")

    # Safety fallback: if the model returns something unexpected, default to 'general'
    if category not in VALID_CATEGORIES:
        category = "general"

    return category


print("route_prompt() defined.")
```

```text
route_prompt() defined.
```


## Section 4: The Orchestrator Function

`process_request(user_input)` ties everything together:
1. Routes the query using `route_prompt()`
2. Selects the matching system prompt from `MODEL_CONFIG`
3. Calls the expert LLM with `temperature=0.7` 
4. Returns the final response
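The four steps above can be exercised without network access by passing the router and the LLM call in as functions. This is a sketch under that assumption. `process_request_with`, `route_fn`, `call_llm_fn` and `tool_fn` are hypothetical names. The lambdas stand in for `route_prompt()`, `client.chat.completions.create` and `_handle_tool_request()`.

```python
from typing import Callable, Dict, List

Messages = List[Dict[str, str]]


def process_request_with(
    user_input: str,
    config: Dict[str, Dict[str, str]],
    route_fn: Callable[[str], str],
    call_llm_fn: Callable[[str, Messages, float], str],
    tool_fn: Callable[[str], str],
) -> str:
    """Same control flow as process_request(), with injected dependencies."""
    category = route_fn(user_input)  # 1. classify the intent
    if category == "tool":  # bonus path: bypass the expert LLM entirely
        return tool_fn(user_input)
    expert = config.get(category, config["general"])  # 2. pick the expert
    messages = [
        {"role": "system", "content": expert["system_prompt"]},
        {"role": "user", "content": user_input},
    ]
    return call_llm_fn(expert["model"], messages, 0.7)  # 3-4. call and return


config = {
    "billing": {"model": "m", "system_prompt": "billing persona"},
    "general": {"model": "m", "system_prompt": "general persona"},
}
# Fake LLM: echoes which system prompt it received and at what temperature
fake_llm = lambda model, msgs, temp: f"{msgs[0]['content']} answered at T={temp}"
route = lambda text: "billing" if "refund" in text.lower() else "general"

result = process_request_with("I need a refund", config, route, fake_llm, lambda t: "[TOOL]")
print(result)  # billing persona answered at T=0.7
```

Injecting the dependencies keeps the routing logic testable in CI without an API key. The notebook version stays simpler by using the global `client`.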


```python
def process_request(user_input: str) -> str:
    """
    Main orchestrator for the MoE Customer Support Router.
    1. Routes the query to the correct category.
    2. Dispatches to the matching expert (or tool function).
    3. Returns the expert's response.
    """
    # Step 1: Classify the intent
    category = route_prompt(user_input)
    print(f"  [Router] -> Category: '{category}'")

    # Step 2: Tool intercept (bonus), handled before any expert LLM call.
    # _handle_tool_request is defined in Section 6; run that cell before sending tool queries.
    if category == "tool":
        return _handle_tool_request(user_input)

    # Step 3: Retrieve expert config
    expert = MODEL_CONFIG[category]
    system_prompt = expert["system_prompt"]
    model = expert["model"]

    # Step 4: Call the expert LLM
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user",   "content": user_input},
        ],
        temperature=0.7,
    )

    answer = response.choices[0].message.content.strip()
    return answer


print("process_request() defined.")
```

```text
process_request() defined.
```


## Section 5: Test the MoE System

Three test queries to validate that the router correctly dispatches to each expert.
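Eyeballing three printouts does not scale. A small labelled set plus an accuracy check makes misroutes visible. The sketch below assumes the router is any `str -> str` callable. `keyword_router` is a hypothetical offline stand-in for `route_prompt()`, so the example runs without an API key. `LABELLED` and `evaluate_router` are likewise illustrative names.

```python
from typing import Callable, List, Tuple

LABELLED: List[Tuple[str, str]] = [
    ("My Python script is throwing an IndexError on line 5.", "technical"),
    ("I was charged twice for my subscription this month.", "billing"),
    ("Hi there! What are your support hours?", "general"),
    ("What is the current price of Bitcoin?", "tool"),
]


def keyword_router(text: str) -> str:
    """Hypothetical offline stand-in for route_prompt()."""
    t = text.lower()
    if "price" in t:
        return "tool"
    if "charged" in t or "refund" in t:
        return "billing"
    if "error" in t or "bug" in t:
        return "technical"
    return "general"


def evaluate_router(router: Callable[[str], str],
                    data: List[Tuple[str, str]]):
    """Return (accuracy, list of (query, expected, got) misroutes)."""
    misroutes = []
    for query, expected in data:
        got = router(query)
        if got != expected:
            misroutes.append((query, expected, got))
    accuracy = (len(data) - len(misroutes)) / len(data)
    return accuracy, misroutes


acc, errors = evaluate_router(keyword_router, LABELLED)
print(f"accuracy={acc:.0%}, misroutes={errors}")
```

Passing `route_prompt` instead of `keyword_router` gives a live measurement. The `tool` row also covers a category that the three test queries in this section do not exercise.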


```python
test_queries = [
    # Expected: technical
    "My Python script is throwing an IndexError on line 5. Here is the code: my_list = [1,2,3]; print(my_list[5])",
    # Expected: billing
    "I was charged twice for my subscription this month. I need a refund immediately.",
    # Expected: general
    "Hi there! What are your support hours?",
]

for i, query in enumerate(test_queries, 1):
    print(f"\n{'='*60}")
    print(f"Test {i}: {query}")
    print("-" * 60)
    response = process_request(query)
    print(f"  [Expert Response]\n{response}")
    print("=" * 60)
```



````text
Test 1: My Python script is throwing an IndexError on line 5. Here is the code: my_list = [1,2,3]; print(my_list[5])
------------------------------------------------------------
  [Router] -> Category: 'technical'
  [Expert Response]
**IndexError Diagnosis**

The issue in your code is an **off-by-one error**, combined with an **out-of-bounds error**. In Python, list indices start at 0 and end at `len(list) - 1`. 

In your case, `my_list` has 3 elements, so the valid indices are 0, 1, and 2. When you try to access `my_list[5]`, you're attempting to access an index that doesn't exist, resulting in an `IndexError`.

**Corrected Code**

To fix this issue, you should ensure that the index you're trying to access is within the bounds of the list. Here's an example of how to do this:

```python
my_list = [1, 2, 3]

# Check if the index is within bounds before accessing
index = 5
if index < len(my_list):
    print(my_list[index])
else:
    print(f"Index {index} is out of bounds for list of le
````

*(The saved output ends here; the rest of the response is truncated.)*

---

## Section 6 (Bonus): Tool Use Expert for Live Data

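For queries like *"What is the current price of Bitcoin?"*, we do not let the LLM hallucinate a number. Instead, we route to a **Tool Function** that fetches (mock) real-time data.

This is the foundation of **Function Calling / Tool Use** in production LLM systems. Here, the router decides when to use the tool, and a keyword/regex parser extracts the coin. In native function calling, the model itself selects the tool and emits its arguments.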
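In native tool use, the model itself picks the tool. Groq's chat completions endpoint is OpenAI-compatible and accepts a `tools` list of JSON-schema function definitions. When the model decides to call a tool, its reply carries `tool_calls`, and each call's `function.arguments` is a JSON string. The sketch below shows only the local half of that loop (schema plus dispatch). It fakes the model's tool call with `SimpleNamespace`, so the response shape is an assumption to check against the Groq docs. `TOOLS`, `REGISTRY`, `lookup_price` and `dispatch_tool_call` are hypothetical names.

```python
import json
from types import SimpleNamespace

# JSON-schema description of the mock tool, in the OpenAI-style "tools" format
TOOLS = [{
    "type": "function",
    "function": {
        "name": "fetch_crypto_price",
        "description": "Get the current price of a cryptocurrency.",
        "parameters": {
            "type": "object",
            "properties": {"coin": {"type": "string", "description": "e.g. bitcoin"}},
            "required": ["coin"],
        },
    },
}]


def lookup_price(coin: str) -> str:
    """Local mock using two of the notebook's mock prices."""
    prices = {"bitcoin": "$94,230.15", "ethereum": "$3,412.88"}
    return prices.get(coin.lower().strip(), "price unavailable")


# Maps the tool name the model sees to the Python function that implements it
REGISTRY = {"fetch_crypto_price": lookup_price}


def dispatch_tool_call(tool_call) -> str:
    """Run the Python function named in a tool call with its JSON arguments."""
    func = REGISTRY[tool_call.function.name]
    kwargs = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    return func(**kwargs)


# Simulated tool call, shaped like message.tool_calls[0]
fake_call = SimpleNamespace(function=SimpleNamespace(
    name="fetch_crypto_price", arguments='{"coin": "Bitcoin"}'))
print(dispatch_tool_call(fake_call))
```

In a live loop, `TOOLS` would be passed as `tools=TOOLS` to `client.chat.completions.create`. The tool result would then be appended as a `"role": "tool"` message before a second call. That round trip is omitted here.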


```python
import re

# --- Mock Tool: Crypto Price Fetcher ---
MOCK_PRICES = {
    "bitcoin":  "$94,230.15",
    "ethereum": "$3,412.88",
    "solana":   "$187.42",
    "dogecoin": "$0.1823",
}

def fetch_crypto_price(coin: str) -> str:
    """Mock function that 'fetches' the current crypto price."""
    coin = coin.lower().strip()
    price = MOCK_PRICES.get(coin, "price unavailable (coin not in mock database)")
    return f"[TOOL] Current price of {coin.capitalize()}: {price}  (mock data)"


def _handle_tool_request(user_input: str) -> str:
    """
    Parses the user input to extract the coin name,
    then calls the mock fetch function.
    """
    known_coins = list(MOCK_PRICES.keys())
    user_lower = user_input.lower()

    for coin in known_coins:
        if coin in user_lower:
            return fetch_crypto_price(coin)

    # Fallback: try to extract ANY word after "price of" / "cost of"
    match = re.search(r"price of (\w+)|cost of (\w+)", user_lower)
    if match:
        coin = (match.group(1) or match.group(2)).strip()
        return fetch_crypto_price(coin)

    return "[TOOL] Could not identify the asset. Please specify a coin name (e.g., Bitcoin, Ethereum)."


# Add 'tool' entry to MODEL_CONFIG (for documentation purposes; dispatch happens in the orchestrator)
MODEL_CONFIG["tool"] = {
    "model": "tool_function",
    "system_prompt": "Routes to a live data fetching tool instead of an LLM.",
}

print("Tool use expert registered.")
```

```text
Tool use expert registered.
```


```python
tool_queries = [
    "What is the current price of Bitcoin?",
    "How much does Ethereum cost right now?",
    "Tell me the price of Solana today.",
]

for query in tool_queries:
    print(f"\nQuery : {query}")
    response = process_request(query)
    print(f"Result: {response}")
```

```text
Query : What is the current price of Bitcoin?
  [Router] -> Category: 'tool'
Result: [TOOL] Current price of Bitcoin: $94,230.15  (mock data)

Query : How much does Ethereum cost right now?
  [Router] -> Category: 'tool'
Result: [TOOL] Current price of Ethereum: $3,412.88  (mock data)

Query : Tell me the price of Solana today.
  [Router] -> Category: 'tool'
Result: [TOOL] Current price of Solana: $187.42  (mock data)
```


---

## Summary

| Component | Role | Key Parameter |
|-----------|------|---------------|
| **Router** (`route_prompt`) | Classifies query intent | `temperature=0`: near-deterministic labels |
| **Technical Expert** | Debugs code errors | `temperature=0.7`; precision comes from the system prompt |
| **Billing Expert** | Handles refunds/charges | `temperature=0.7`; empathetic system prompt |
| **General Expert** | Fallback for casual chat | `temperature=0.7`; friendly system prompt |
| **Tool Expert** | Fetches live data (mock) | No LLM call after routing; direct function call |

**Key Insight:** This prompt-based MoE does not require multiple models. Specialisation is achieved through **system prompt engineering**, which adds no training or fine-tuning cost. However, each non-tool request still pays for two LLM calls: the router, then the expert.
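The per-request cost follows directly from the orchestrator's control flow. The router always runs, expert categories add one more LLM call, and tool queries add none. The quick count below assumes exactly that flow. `CALLS_PER_CATEGORY` and `llm_calls_for` are hypothetical names.

```python
# LLM calls per routed category in this notebook's design:
# the router always runs (1 call); experts add 1 more call, tools add none.
CALLS_PER_CATEGORY = {"technical": 2, "billing": 2, "general": 2, "tool": 1}


def llm_calls_for(categories) -> int:
    """Total LLM API calls for a batch of routed requests."""
    return sum(CALLS_PER_CATEGORY[c] for c in categories)


# The six queries run in Sections 5 and 6: three expert queries, three tool queries
batch = ["technical", "billing", "general", "tool", "tool", "tool"]
print(llm_calls_for(batch))  # 9
```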
