# Week 2 Exercise – Technical Q&A Prototype (iyanuashiri)

Full prototype of the Week 1 technical question answerer, with:
- **Gradio UI** – text input, model selector, expertise selector
- **Streaming** – GPT answers stream in as they’re generated; Llama replies arrive in one piece
- **System prompt for expertise** – dropdown to change assistant focus (technical tutor, Python, etc.)
- **Switch between models** – GPT (OpenRouter) and Llama (Ollama)

## Imports and environment

In [21]:
import os
from dotenv import load_dotenv
from openai import OpenAI
import gradio as gr

load_dotenv(override=True)
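# Prefer python-decouple when it is installed; fall back to os.getenv if the
# import fails or decouple cannot find the key.
try:
    from decouple import config
    OPEN_ROUTER_API_KEY = config("OPEN_ROUTER_API_KEY")
except Exception:
    OPEN_ROUTER_API_KEY = os.getenv("OPEN_ROUTER_API_KEY")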




MODEL_GPT = "openai/gpt-4o-mini"  # OpenRouter model IDs carry a provider prefix
MODEL_LLAMA = "llama3.2"
OPEN_ROUTER_BASE_URL = "https://openrouter.ai/api/v1"
OLLAMA_BASE_URL = "http://localhost:11434/v1"

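# OpenRouter: OpenAI library with OpenRouter's compatible API (base_url + api_key).
# The placeholder keeps the client constructor from raising when the key is unset,
# so the missing-key message in the UI handler can be shown instead.
openrouter = OpenAI(base_url=OPEN_ROUTER_BASE_URL, api_key=OPEN_ROUTER_API_KEY or "missing-key")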

# Ollama: same OpenAI library, local endpoint
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key="ollama")

## Expertise options (system prompt)

Each option maps a dropdown label to a system prompt, so the assistant can focus on a different area.
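
To make the lookup concrete, here is a minimal standalone sketch of choosing a system prompt with a fallback. `PROMPTS`, `DEFAULT_KEY` and `build_messages` are hypothetical stand-ins for illustration, not the notebook's own names; the cell that follows holds the real definitions.

```python
# Hypothetical stand-ins, trimmed to two entries for illustration.
PROMPTS = {
    "Technical tutor": "You are a helpful technical tutor.",
    "LLMs & AI": "You are an expert in large language models.",
}
DEFAULT_KEY = "Technical tutor"


def build_messages(question: str, key: str = DEFAULT_KEY) -> list[dict]:
    """Return a system + user message pair; unknown keys fall back to the default."""
    system_prompt = PROMPTS.get(key, PROMPTS[DEFAULT_KEY])
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


known = build_messages("What is a generator?", "LLMs & AI")
unknown = build_messages("What is a generator?", "No such option")
```

An unrecognised key silently gets the default prompt rather than raising, which suits a dropdown-driven UI where the value should always be valid anyway.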

In [23]:
EXPERTISE_OPTIONS = {
    "Technical tutor": "You are a helpful technical tutor who answers questions about Python code, software engineering, data science and LLMs.",
    "Python & data science": "You are an expert in Python and data science. Explain concepts clearly with examples where helpful.",
    "Software engineering": "You are a software engineering expert. Focus on design, best practices, and clean code.",
    "LLMs & AI": "You are an expert in large language models and AI engineering. Explain APIs, prompting, and tooling.",
}

def messages_for(question: str, expertise_key: str = "Technical tutor"):
    """Build message list for the LLM with chosen expertise (system prompt)."""
    system_prompt = EXPERTISE_OPTIONS.get(expertise_key, EXPERTISE_OPTIONS["Technical tutor"])
    user_prompt = "Please give a detailed explanation to the following question: " + question
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def explain(question: str, model: str = "gpt", expertise_key: str = "Technical tutor") -> str:
    """Get a full explanation from the chosen model. model is 'gpt' or 'llama'."""
    messages = messages_for(question, expertise_key)
    if model == "gpt":
        response = openrouter.chat.completions.create(model=MODEL_GPT, messages=messages)
        return response.choices[0].message.content or ""
    elif model == "llama":
        response = ollama.chat.completions.create(model=MODEL_LLAMA, messages=messages)
        return response.choices[0].message.content or ""
    else:
        raise ValueError(f"Unknown model: {model}. Use 'gpt' or 'llama'.")


def explain_streaming(question: str, model: str = "gpt", expertise_key: str = "Technical tutor"):
    """Yield explanation chunks. GPT streams text deltas; Llama yields the full reply once."""
    if model != "gpt":
        # Non-streaming path; explain() raises ValueError for unknown models.
        yield explain(question, model, expertise_key)
        return
    messages = messages_for(question, expertise_key)
    stream = openrouter.chat.completions.create(model=MODEL_GPT, messages=messages, stream=True)
    for chunk in stream:
        if not chunk.choices:
            continue  # defensive: skip any chunk that arrives without choices
        delta = chunk.choices[0].delta.content or ""
        if delta:
            yield delta
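
The Llama branch above returns the whole reply at once. Because both clients expose the same chat-completions interface, one possible variation is to stream from either backend through a single helper. The sketch below is an illustration under assumptions, not the notebook's method: `stream_deltas` is a hypothetical helper, and it assumes the local Ollama endpoint honours `stream=True` on this route, which is worth verifying against your installed version. A small fake client stands in for the real ones so the block runs offline.

```python
from types import SimpleNamespace


def stream_deltas(client, model_name, messages):
    """Yield non-empty text deltas from any client exposing chat.completions.create."""
    stream = client.chat.completions.create(model=model_name, messages=messages, stream=True)
    for chunk in stream:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content or ""
        if delta:
            yield delta


# --- Hypothetical fake client that mimics the response shape, for offline use ---
def _fake_chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


def _fake_create(model, messages, stream):
    # Includes a None delta and an empty-choices chunk to exercise both guards.
    return iter([_fake_chunk("Hel"), _fake_chunk(None), SimpleNamespace(choices=[]), _fake_chunk("lo")])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

pieces = list(stream_deltas(fake_client, "any-model", [{"role": "user", "content": "hi"}]))
```

With the real clients, the call would look like `stream_deltas(ollama, MODEL_LLAMA, messages_for(question))`, again assuming the server streams.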

## Gradio UI with streaming

`gradio_stream` is a generator that yields the accumulated response, so the answer grows in the UI as chunks arrive. Model and expertise are selected in the interface.

In [24]:
def gradio_stream(question: str, model: str, expertise_key: str):
    """Stream response for Gradio: yields accumulated text so the UI updates as tokens arrive."""
    if not (question or "").strip():
        yield "Please enter a technical question."
        return
    if not OPEN_ROUTER_API_KEY and model == "GPT":
        yield "OpenRouter API key not set. Add OPEN_ROUTER_API_KEY to .env or use Llama."
        return
    model_key = "gpt" if model == "GPT" else "llama"
    response = ""
    for chunk in explain_streaming(question, model=model_key, expertise_key=expertise_key):
        response += chunk
        yield response
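
Gradio replaces the output component's value with each yielded value rather than appending to it, which is why the generator yields the running total instead of individual chunks. A minimal sketch of that pattern, with a hypothetical `accumulate` helper and a plain list standing in for the model stream:

```python
from typing import Iterable, Iterator


def accumulate(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the text received so far after each incoming chunk."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text


snapshots = list(accumulate(["Gen", "era", "tors"]))
# snapshots == ["Gen", "Genera", "Generators"]
```

Yielding the bare chunks instead would make the answer flicker between fragments, since each update overwrites the previous one.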

In [25]:
with gr.Blocks(title="Technical Q&A", theme=gr.themes.Soft()) as demo:
    gr.Markdown("## Technical question answerer\nAsk a question; choose model and expertise. Response streams below.")
    with gr.Row():
        question = gr.Textbox(
            label="Technical question",
            placeholder="e.g. Explain what yield from does in Python...",
            lines=3,
        )
    with gr.Row():
        model = gr.Dropdown(
            choices=["GPT", "Llama"],
            value="GPT",
            label="Model",
        )
        expertise_key = gr.Dropdown(
            choices=list(EXPERTISE_OPTIONS.keys()),
            value="Technical tutor",
            label="Expertise (system prompt)",
        )
    submit_btn = gr.Button("Get answer")
    answer = gr.Markdown(label="Answer")

    submit_btn.click(
        fn=gradio_stream,
        inputs=[question, model, expertise_key],
        outputs=answer,
        queue=True,
    )

demo.launch()

* Running on local URL:  http://127.0.0.1:7866
* To create a public link, set `share=True` in `launch()`.


