# Minimal OpenRouter LLM Demo

This notebook shows **the absolute basics** of calling an LLM via the [OpenRouter](https://openrouter.ai) API using Python.

**What you'll need**
1. A file named `.env` in the same folder as this notebook with a line like:
   ```bash
   OPENROUTER_API_KEY=sk-or-v1_...
   ```
2. An internet connection (calls go to `https://openrouter.ai/api/v1`).

We'll keep things minimal: install dependencies, load the API key from `.env`, make a single non-streaming request, then a streaming request. We also show how to change models and parameters, and finish with a small applied example.


In [29]:
# (1) Install minimal dependencies (comment this line out once they are installed)
%pip install requests python-dotenv --quiet

Note: you may need to restart the kernel to use updated packages.


In [30]:
# (2) Load your API key from .env and set up headers
import os
import requests
from dotenv import load_dotenv
load_dotenv()

API_KEY = os.getenv("OPENROUTER_API_KEY")
assert API_KEY, "Missing OPENROUTER_API_KEY in your .env file"

BASE_URL = "https://openrouter.ai/api/v1/chat/completions"

# Optional headers help attribute your app in OpenRouter's dashboards
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    # Optional but recommended (replace with your course URL or localhost)
    "HTTP-Referer": "http://localhost",  # shown in OpenRouter rankings
    "X-Title": "ML Class LLM Demo",      # a friendly app title
}

## Non‑streaming: single request → single JSON response
This mirrors the standard OpenAI-style Chat Completions interface. Change the `model` to any model you have access to in OpenRouter (e.g., `openai/gpt-4o-mini`, `anthropic/claude-3.5-sonnet`, `google/gemini-1.5-pro`).

In [31]:
payload = {
    "model": "openai/gpt-4o-mini",  # pick any model from https://openrouter.ai/models
    "messages": [
        {"role": "system", "content": "You are a concise teaching assistant."},
        {"role": "user", "content": "Explain gradient descent in one sentence."},
    ],
    # Optional knobs (common across providers via OpenRouter):
    "temperature": 0.7,
    "max_tokens": 256,
}

resp = requests.post(BASE_URL, headers=HEADERS, json=payload, timeout=60)
resp.raise_for_status()
data = resp.json()
text = data["choices"][0]["message"]["content"]
print(text)

# (Optional) usage metadata
data.get("usage", {})

Gradient descent is an optimization algorithm that iteratively adjusts parameters in the direction of the steepest decrease of a loss function to minimize it.


{'prompt_tokens': 25,
 'completion_tokens': 27,
 'total_tokens': 52,
 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0},
 'completion_tokens_details': {'reasoning_tokens': 0}}
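
`raise_for_status()` only catches HTTP-level failures. As a minimal sketch, the helper below also guards against a JSON body that carries an `error` object or an empty `choices` list. The name `extract_text` is hypothetical, and the `{"error": {"message": ...}}` shape is an assumption about the error body, so check it against the responses you actually receive.

```python
def extract_text(data: dict) -> str:
    """Return the first choice's text, or raise with a readable message."""
    # Assumed error shape: {"error": {"message": "..."}}
    if "error" in data:
        err = data["error"]
        message = err.get("message", err) if isinstance(err, dict) else err
        raise RuntimeError(f"API error: {message}")

    choices = data.get("choices") or []
    if not choices:
        raise RuntimeError("Response contained no choices")

    # content can be None for some replies, so fall back to an empty string
    content = choices[0].get("message", {}).get("content") or ""
    return content.strip()


# Illustrative dictionaries, not real API output
ok_body = {"choices": [{"message": {"content": "  Gradient descent...  "}}]}
print(extract_text(ok_body))
```

In the cell above, this would replace the direct indexing: `text = extract_text(resp.json())`.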

## Minimal helper function
A tiny helper to keep classroom examples tidy.

In [32]:
def chat(prompt, model="openai/gpt-4o-mini", system="You are helpful and concise.", **params):
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
    }
    payload.update(params)
    r = requests.post(BASE_URL, headers=HEADERS, json=payload, timeout=60)
    r.raise_for_status()
    out = r.json()
    return out["choices"][0]["message"]["content"].strip()

print(chat("Name three common activation functions."))

Three common activation functions are:

1. **Sigmoid:** \( f(x) = \frac{1}{1 + e^{-x}} \) - Outputs values between 0 and 1.

2. **ReLU (Rectified Linear Unit):** \( f(x) = \max(0, x) \) - Outputs zero for negative values and the input value for positive values.

3. **Tanh (Hyperbolic Tangent):** \( f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \) - Outputs values between -1 and 1.


## Streaming (SSE): print tokens as they arrive
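Set `"stream": True` in the request payload to receive a Server-Sent Events (SSE) stream, and pass `stream=True` to `requests.post` so the response body is read incrementally. Print chunks as they arrive, and skip SSE comment lines (heartbeats), which start with `:`.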

In [33]:
import json

stream_payload = {
    "model": "openai/gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a concise teaching assistant."},
        {"role": "user", "content": "In two bullet points, what is overfitting?"},
    ],
    "stream": True,
    "temperature": 0.2,
}

with requests.post(BASE_URL, headers=HEADERS, json=stream_payload, stream=True, timeout=300) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        # SSE comment/heartbeat lines start with ':' — ignore them
        if line.startswith(b":"):
            continue
        if line.startswith(b"data: "):
            chunk = line[len(b"data: "):].decode("utf-8")
            if chunk.strip() == "[DONE]":
                break
            event = json.loads(chunk)
            delta = event["choices"][0].get("delta", {})
            content = delta.get("content")
            if content:
                print(content, end="", flush=True)
print()

- Overfitting occurs when a machine learning model learns the training data too well, capturing noise and outliers rather than the underlying pattern, leading to poor generalization on unseen data.  
- It typically results in high accuracy on the training set but significantly lower accuracy on the validation or test set.
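
The parsing logic in the streaming cell can be separated from the network call, which makes it easy to try out offline. The sketch below is one way to do that, assuming the same `data: ` line format handled above. `iter_sse_content` is a hypothetical name, and the sample byte lines are made up for illustration.

```python
import json


def iter_sse_content(lines):
    """Yield text fragments from an iterable of raw SSE lines (bytes)."""
    for raw in lines:
        # Skip keep-alive blanks and SSE comment lines (start with ':')
        if not raw or raw.startswith(b":"):
            continue
        if not raw.startswith(b"data: "):
            continue
        chunk = raw[len(b"data: "):].decode("utf-8").strip()
        if chunk == "[DONE]":
            break
        event = json.loads(chunk)
        choices = event.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Made-up lines that mimic the stream format
sample_lines = [
    b": heartbeat comment",
    b"",
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    b'data: {"choices": [{"delta": {"content": "lo"}}]}',
    b'data: {"choices": [{"delta": {}}]}',
    b"data: [DONE]",
]
print("".join(iter_sse_content(sample_lines)))
```

With a live request, the same generator could consume `r.iter_lines()` directly, for example `for piece in iter_sse_content(r.iter_lines()): print(piece, end="", flush=True)`.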


## Switching models & parameters
Swap the `model` string for any model you have access to (see [openrouter.ai/models](https://openrouter.ai/models)). Common sampling parameters such as `temperature`, `max_tokens`, `top_p`, and `frequency_penalty` can be passed as extra keyword arguments to `chat`, which merges them into the request payload.

> Tip: If a model doesn't support a certain parameter, OpenRouter simply ignores it rather than failing.

In [34]:
print(chat(
    "Write a 1‑sentence analogy for convolution in CNNs.",
    model="anthropic/claude-3.5-sonnet",  # try another provider/model
    temperature=0.5,
))

Convolution in CNNs is like sliding a flashlight (filter) across a dark image, illuminating different patterns one section at a time to detect specific features.
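
To see exactly what `chat` sends when extra keyword arguments are passed, the payload construction can be pulled out into a pure function. This is only a sketch: `build_payload` is a hypothetical name, and dropping `None` values is a choice made here, not something the original helper does.

```python
def build_payload(prompt, model="openai/gpt-4o-mini",
                  system="You are helpful and concise.", **params):
    """Build the chat-completions request body without sending it."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
    }
    # Extra keyword arguments become top-level request fields;
    # None means "leave this parameter unset" in this sketch.
    payload.update({k: v for k, v in params.items() if v is not None})
    return payload


example = build_payload(
    "Write a 1-sentence analogy for convolution in CNNs.",
    model="anthropic/claude-3.5-sonnet",
    temperature=0.5,
    top_p=0.9,
    max_tokens=None,  # omitted from the payload
)
print(sorted(example))
```

Printing the sorted keys shows which fields would be sent, which is handy when comparing runs with different parameters.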


---
**That’s it — minimal and classroom‑friendly.**

**Safety note**: Don’t commit your `.env` file to version control. In a shared environment, consider using per‑student keys or a small proxy service so keys aren’t exposed on client machines.


## Applied example: LLM-assisted cohort interpretation

In real workflows we often blend numeric outputs with language. Below we build a tiny synthetic cohort, summarize it with Python, then hand the summary to the LLM for a clinical-style briefing. This shows a pattern you can reuse: let code do the number crunching while the model communicates findings for humans.


In [35]:
import json
from statistics import mean, pstdev

cohort = [
    {"id": "P01", "age": 68, "bmi": 32.4, "systolic_bp": 148, "hr": 92, "hba1c": 8.4},
    {"id": "P02", "age": 54, "bmi": 27.1, "systolic_bp": 134, "hr": 78, "hba1c": 6.4},
    {"id": "P03", "age": 61, "bmi": 29.8, "systolic_bp": 142, "hr": 101, "hba1c": 7.5},
    {"id": "P04", "age": 47, "bmi": 24.3, "systolic_bp": 126, "hr": 73, "hba1c": 5.8},
    {"id": "P05", "age": 72, "bmi": 35.6, "systolic_bp": 155, "hr": 88, "hba1c": 8.9},
    {"id": "P06", "age": 59, "bmi": 30.2, "systolic_bp": 138, "hr": 96, "hba1c": 7.1},
]

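def summarize(field):
    """Return rounded summary statistics for one numeric field of the cohort."""
    values = [p[field] for p in cohort]
    return {
        "mean": round(mean(values), 1),
        # pstdev is the population standard deviation (divides by n, not n - 1)
        "stdev": round(pstdev(values), 1),
        "min": min(values),
        "max": max(values),
    }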

summary = {
    "n_patients": len(cohort),
    "metrics": {field: summarize(field) for field in ["age", "bmi", "systolic_bp", "hr", "hba1c"]},
    "flags": {
        "hypertensive_ids": [p["id"] for p in cohort if p["systolic_bp"] >= 140],
        "tachycardic_ids": [p["id"] for p in cohort if p["hr"] >= 100],
        "poor_glycemic_control_ids": [p["id"] for p in cohort if p["hba1c"] >= 7.0],
    },
}

print(json.dumps(summary, indent=2))


{
  "n_patients": 6,
  "metrics": {
    "age": {
      "mean": 60.2,
      "stdev": 8.3,
      "min": 47,
      "max": 72
    },
    "bmi": {
      "mean": 29.9,
      "stdev": 3.6,
      "min": 24.3,
      "max": 35.6
    },
    "systolic_bp": {
      "mean": 140.5,
      "stdev": 9.4,
      "min": 126,
      "max": 155
    },
    "hr": {
      "mean": 88,
      "stdev": 9.8,
      "min": 73,
      "max": 101
    },
    "hba1c": {
      "mean": 7.3,
      "stdev": 1.1,
      "min": 5.8,
      "max": 8.9
    }
  },
  "flags": {
    "hypertensive_ids": [
      "P01",
      "P03",
      "P05"
    ],
    "tachycardic_ids": [
      "P03"
    ],
    "poor_glycemic_control_ids": [
      "P01",
      "P03",
      "P05",
      "P06"
    ]
  }
}
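
When a prompt asks for JSON, chat models often wrap the reply in a Markdown code fence, which makes a bare `json.loads` fail. The sketch below shows one tolerant way to parse such a reply. `parse_json_reply` is a hypothetical helper, and it only handles a reply that is entirely one fenced block or bare JSON. A reply cut off by `max_tokens` would still raise `json.JSONDecodeError`.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this cell contains no literal fence
# Matches: opening fence + optional language tag, body, closing fence
_FENCED = re.compile(rf"^{FENCE}[\w-]*[ \t]*\n(.*?)\n?{FENCE}$", re.DOTALL)


def parse_json_reply(text):
    """Parse JSON from a model reply, unwrapping one surrounding code fence."""
    cleaned = text.strip()
    match = _FENCED.match(cleaned)
    if match:
        cleaned = match.group(1)
    return json.loads(cleaned)


# Made-up reply in the fenced style
fenced_reply = FENCE + "json\n" + '{"next_actions": ["a", "b"]}' + "\n" + FENCE
print(parse_json_reply(fenced_reply))
```

In a parsing step such as `json.loads(raw_reply)`, the call could be swapped for `parse_json_reply(raw_reply)` while keeping the same `except json.JSONDecodeError` fallback.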


In [36]:
analysis_prompt = f"""You are a biomedical machine-learning assistant. Translate the cohort summary into guidance for a clinical data science team.
Return JSON with three keys:
  - patient_takeaways: bullet-point insights about the cohort.
  - model_watchouts: risks or biases to monitor if we train models on this cohort.
  - next_actions: suggested follow-up analyses or data you would request.

Cohort summary (JSON):
{json.dumps(summary, indent=2)}
"""

raw_reply = chat(
    analysis_prompt,
    model="openai/gpt-4o-mini",
    temperature=0.3,
    max_tokens=400,
)

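try:
    briefing = json.loads(raw_reply)
except json.JSONDecodeError:
    # Typical causes: the model wrapped the JSON in a Markdown code fence,
    # or the reply was cut off by max_tokens before the JSON was complete.
    print("Model reply (not valid JSON):")
    print(raw_reply)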
else:
    for section in ("patient_takeaways", "model_watchouts", "next_actions"):
        print(section.replace("_", " ").title() + ":")
        for bullet in briefing.get(section, []):
            print(" -", bullet)


Model reply (not valid JSON):
```json
{
  "patient_takeaways": [
    "The cohort consists of 6 patients with a mean age of 60.2 years, indicating a predominantly older population.",
    "The average BMI is 29.9, suggesting that most patients are overweight.",
    "Systolic blood pressure averages at 140.5 mmHg, indicating a significant presence of hypertension in the cohort.",
    "The mean heart rate is 88 bpm, with one patient identified as tachycardic.",
    "The average HbA1c level is 7.3%, reflecting poor glycemic control in several patients."
  ],
  "model_watchouts": [
    "The small cohort size (n=6) may lead to overfitting and unreliable model predictions.",
    "There is a risk of bias due to the age distribution, as the cohort is predominantly older, which may not generalize to younger populations.",
    "The presence of multiple health flags (hypertension, tachycardia, poor glycemic control) may introduce confounding factors that complicate model training."
  ],
  "next_act