# LLM API: Getting Started

This notebook shows how to call the **GPT-OSS** model hosted on `hub.qazcode.ai`.

The server exposes an **OpenAI-compatible API**, so you can call it with the `openai` Python library or with plain `requests`.

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
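API_KEY = os.environ.get("API_KEY")
HUB_URL = os.environ.get("HUB_URL")  # base URL of the OpenAI-compatible endpoint

# Fail fast if either setting is missing from the environment or the .env file
if not API_KEY:
    raise ValueError("API_KEY is not set in environment variables")
if not HUB_URL:
    raise ValueError("HUB_URL is not set in environment variables")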
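
When more settings are needed, the same fail-fast check can be factored into a small helper. This is a minimal sketch; the name `require_env` is hypothetical and not part of the notebook.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, failing fast if any is missing or empty."""
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return values
```

In this notebook it could be called as `config = require_env("API_KEY", "HUB_URL")` right after `load_dotenv()`.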

In [3]:
from openai import OpenAI

client = OpenAI(
    base_url=HUB_URL,
    api_key=API_KEY,  # read from the environment above
)

MODEL = "oss-120b"

In [4]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Hello! Who are you?"}
    ],
)

print(response.choices[0].message.content)

Hello! I'm ChatGPT, an AI language model created by OpenAI. I can help you with a wide range of tasks: answering questions, explaining concepts, brainstorming ideas, drafting or editing text, solving math problems, coding, learning new topics, planning trips, and much more.

What are you curious about or working on today? Let me know how I can assist you!


In [5]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {
            "role": "system",
            "content": "You are a medical diagnosis assistant. Given patient symptoms, suggest the most probable diagnosis with an ICD-10 code."
        },
        {
            "role": "user",
            "content": "Patient presents with fever, dry cough, and shortness of breath lasting 5 days."
        }
    ],
)

print(response.choices[0].message.content)

**Most Probable Diagnosis:**  
**COVID-19, virus identified** – *ICD-10-CM code: **U07.1***  

---

### Rationale
| Clinical Feature | Typical Association |
|------------------|----------------------|
| **Fever** (5 days) | Common in acute viral infections, especially SARS-CoV-2 |
| **Dry (non-productive) cough** | Characteristic of early COVID-19; less typical for bacterial pneumonia |
| **Shortness of breath** (dyspnea) | Frequently reported when the infection involves the lower respiratory tract |
| **Duration ≈ 5 days** | Fits the usual incubation-to-symptom window of COVID-19 (2-14 days) and the acute phase of viral pneumonia |

Given the current epidemiology (ongoing community transmission of SARS-CoV-2 in most regions) and the classic triad of fever, dry cough, and dyspnea, COVID-19 is the leading diagnosis.  

---

### Important Differentials (with ICD-10 codes)

| Condition | ICD-10 | Why it's considered |
|-----------|--------|-

In [6]:
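import requests

response = requests.post(
    f"{HUB_URL.rstrip('/')}/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,  # avoid hanging forever if the server stalls
)
response.raise_for_status()  # surface HTTP errors instead of a confusing KeyError on the next line

print(response.json()["choices"][0]["message"]["content"])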

Hi there! How can I help you today?
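
Indexing straight into `["choices"][0]["message"]["content"]` fails with a bare `KeyError` when the body has a different shape. The sketch below unpacks the body more defensively. The helper name `extract_content` is hypothetical, and the `{"error": {"message": ...}}` shape is an assumption borrowed from OpenAI's error format; this hub may report errors differently.

```python
from typing import Any


def extract_content(payload: dict[str, Any]) -> str:
    """Pull the assistant text out of a chat-completions response body."""
    if "error" in payload:
        error = payload["error"]
        # Assumed OpenAI-style error object; fall back to the raw value otherwise
        message = error.get("message", error) if isinstance(error, dict) else error
        raise RuntimeError(f"API error: {message}")
    choices = payload.get("choices") or []
    if not choices:
        raise RuntimeError("Response contains no choices")
    return choices[0]["message"]["content"]


sample = {"choices": [{"message": {"role": "assistant", "content": "Hi there!"}}]}
print(extract_content(sample))
```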


In [8]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model=MODEL,
    api_key=API_KEY,
    base_url=HUB_URL,
    temperature=0
)

print(llm.invoke("Hello!").content)

Hello! 👋 How can I assist you today?


In [9]:
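from langchain_openai import OpenAIEmbeddings

# Embeddings come from a separate OpenAI-compatible server running locally.
embeddings = OpenAIEmbeddings(
    model="ai-forever/ru-en-RoSBERTa",
    openai_api_base="http://localhost:8081/v1",
    openai_api_key="unused",  # placeholder; assumes the local server does not check the key
    check_embedding_ctx_length=False,  # send raw text instead of OpenAI-style token chunks
)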

In [11]:
import shutil
from pathlib import Path
from langchain_chroma import Chroma

# Set to True to remove old DB (fixes "readonly database" or start fresh); set back to False after.
CLEAR_CHROMA_DB = True

persist_dir = (Path.cwd() / "chroma_langchain_db").resolve()
if CLEAR_CHROMA_DB and persist_dir.exists():
    shutil.rmtree(persist_dir)
    print("Removed existing chroma_langchain_db")

persist_dir.mkdir(parents=True, exist_ok=True)

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    persist_directory=str(persist_dir),
)

Removed existing chroma_langchain_db


In [12]:
import json
from pathlib import Path
from langchain_core.documents import Document

data_dir = Path("../extracted_data")

docs = []
for json_file in sorted(data_dir.glob("*.json")):
    with open(json_file, encoding="utf-8") as f:  # the symptom text is Cyrillic, so pin the encoding
        data = json.load(f)
    # identified_symptoms is a list; page_content must be a string for LangChain/embedders
    symptoms = data["identified_symptoms"]
    page_content = "\n".join(symptoms) if isinstance(symptoms, list) else str(symptoms)
    docs.append(Document(
        page_content=page_content,
        metadata={
            "gt": data["gt"],
        },
    ))

print(f"Loaded {len(docs)} documents")
print(f"Characters in first document: {len(docs[0].page_content)}")

Loaded 221 documents
Characters in first document: 456


In [13]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)

print(f"Split {len(docs)} documents into {len(all_splits)} chunks.")

Split 221 documents into 220 chunks.
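
To see how `chunk_size=1000` and `chunk_overlap=200` interact, here is a simplified sliding-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break on separators such as paragraph and line breaks. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if len(text) <= chunk_size:
        # Short texts are returned whole (or dropped if empty)
        return [text] if text else []
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200)
print([len(c) for c in chunks])
print(chunks[0][-200:] == chunks[1][:200])  # consecutive chunks share 200 characters
```

A text shorter than `chunk_size`, such as the 456-character first document, stays a single chunk.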


In [14]:
BATCH_SIZE = 1000
document_ids = []
for i in range(0, len(all_splits), BATCH_SIZE):
    batch = all_splits[i : i + BATCH_SIZE]
    document_ids.extend(vector_store.add_documents(documents=batch))
    print(f"Added batch {i // BATCH_SIZE + 1} ({len(batch)} docs)")

print(f"\nTotal documents added: {len(document_ids)}")

Added batch 1 (220 docs)

Total documents added: 220


In [15]:
print(vector_store._collection.count())

220


In [16]:
# Top 20 candidates for a sample query. Chroma returns a distance, so a lower score means a closer match.
# The agent's get_top3_protocols tool runs the same search with k=3.
# English: "When we walk, he sways as if the floor is floating away; he nearly fell a couple of times."
query = "Когда идём, его качает, будто пол уплывает, пару раз чуть не упал."
results = vector_store.similarity_search_with_score(query, k=20)
for res, score in results:
    print(f"* [DIST={score:.3f}] {res.page_content[:300]}... [{res.metadata}]")

* [DIST=0.651] тянет низ живота, больше справа
кольнёт, отдаёт в поясницу
месечные задерживаются на 9 дней
коричневатое мазание
боль усиливается после ходьбы или близости
живот будто надувается
тошнота
одноразовое скручивание живота (спасение) в течение ~10 минут... [{'start_index': 0, 'gt': 'N83.1'}]
* [DIST=0.653] тошнота
боль в животе
головная боль
слабость
диарея
боль в спине и мышцах... [{'gt': 'Z29.2', 'start_index': 0}]
* [DIST=0.685] вздутие живота
тяжесть внизу живота
дискомфорт при ходьбе
одежда давит в животе
одышка, особенно в положении лёжа
тошнота
рвота
очень малое количество мочи (
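
The scores above grow as the matches get worse, which is what a distance does. The notebook does not say which metric the collection uses, so the sketch below only illustrates the general idea with two common choices, cosine distance and squared L2, on made-up 2-D vectors. Both rank the closer vector first.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy unit vectors, not real embeddings
query_vec = [1.0, 0.0]
candidates = {"near": [0.8, 0.6], "far": [0.0, 1.0]}

for name, vec in candidates.items():
    cosine_distance = 1 - cosine_similarity(query_vec, vec)
    print(name, round(cosine_distance, 3), round(squared_l2(query_vec, vec), 3))
```

For unit-length vectors the two are tied together: squared L2 equals twice the cosine distance, so they produce the same ranking.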

In [17]:
from pydantic import BaseModel, Field
from typing import Annotated

class AgentResponse(BaseModel):
    ICD_10_code: Annotated[
        list[str],
        Field(
            description="ICD-10 codes that identify diseases or conditions.",
            min_length=3,
            max_length=3
        )
    ]
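
`AgentResponse` enforces that exactly three codes are returned but not what each code looks like. A minimal sketch of a format check follows. The pattern is a deliberate simplification and an assumption (one letter, two digits, an optional dot and one or two digits); real ICD-10 variants allow more shapes. The name `invalid_codes` is hypothetical.

```python
import re

# Simplified pattern (an assumption): one letter, two digits, optional dot plus 1-2 digits.
ICD10_PATTERN = re.compile(r"[A-Z]\d{2}(?:\.\d{1,2})?")


def invalid_codes(codes: list[str]) -> list[str]:
    """Return the codes that do not match the simplified ICD-10 pattern."""
    return [code for code in codes if not ICD10_PATTERN.fullmatch(code.strip().upper())]


print(invalid_codes(["S22.1", "I06.9", "R42"]))   # []
print(invalid_codes(["R42", "dizziness", "G43"]))  # ['dizziness']
```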

In [18]:
from langchain.agents import create_agent
from langchain.tools import tool

@tool
def get_top3_protocols(query: str) -> str:
    """Retrieve the top 3 most relevant clinical protocol excerpts for the given patient symptoms query. Call this first with the user's symptoms."""
    results = vector_store.similarity_search_with_score(query, k=3)
    parts = []
    for i, (doc, score) in enumerate(results, 1):
        parts.append(f"[Match {i}, score={score:.3f}] Source: {doc.metadata}\nContent: {doc.page_content}")
    return "\n\n---\n\n".join(parts)

tools = [get_top3_protocols]
prompt = (
    "You are a medical diagnosis assistant. "
    "You MUST call get_top3_protocols first with the patient's symptoms (the user message) to retrieve the top 3 relevant clinical protocol excerpts. "
    "Based only on that retrieved context, respond with exactly 3 ICD-10 codes (most probable diagnoses, ranked by likelihood). "
    "Your final response must be a valid JSON object with this exact structure: {\"ICD_10_code\": [\"code1\", \"code2\", \"code3\"]}. "
    "Example: {\"ICD_10_code\": [\"R42\", \"G43.1\", \"F41.0\"]}. No other text, only the JSON."
)
agent = create_agent(llm, tools, system_prompt=prompt)

In [19]:
import json
import re

from pydantic import ValidationError

# English: "When we walk, he sways as if the floor is floating away; he nearly fell a couple of times."
query = "Когда идём, его качает, будто пол уплывает, пару раз чуть не упал."
result = agent.invoke({"messages": [{"role": "user", "content": query}]})
last_message = result["messages"][-1]
content = last_message.content if hasattr(last_message, "content") else str(last_message)

# Strip a surrounding Markdown code fence if the model added one
raw = re.sub(r"^```(?:json)?\s*", "", content.strip()).rstrip("`").strip()
try:
    parsed = AgentResponse.model_validate_json(raw)
except ValidationError:
    # Fall back to the first JSON object embedded in surrounding text.
    # raw_decode handles nested braces and braces inside strings.
    start = raw.find("{")
    if start == -1:
        raise
    obj, _ = json.JSONDecoder().raw_decode(raw, start)
    parsed = AgentResponse.model_validate(obj)
print("Structured output:", parsed)
print("Top 3 ICD-10 codes:", parsed.ICD_10_code)

Structured output: ICD_10_code=['S22.1', 'I06.9', 'R42']
Top 3 ICD-10 codes: ['S22.1', 'I06.9', 'R42']
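
Each indexed document carries a `gt` code in its metadata, so one way to judge a prediction is to check whether that code appears among the three returned. The sketch below assumes this top-3 hit criterion; the notebook itself does not define an evaluation. The helper names and the ground-truth values in `cases` are hypothetical.

```python
def top_k_hit(predicted: list[str], ground_truth: str) -> bool:
    """True if the ground-truth code appears among the predicted codes (case-insensitive)."""
    expected = ground_truth.strip().upper()
    return any(code.strip().upper() == expected for code in predicted)


def hit_rate(cases: list[tuple[list[str], str]]) -> float:
    """Share of cases whose ground-truth code is among the predictions."""
    if not cases:
        return 0.0
    return sum(top_k_hit(predicted, gt) for predicted, gt in cases) / len(cases)


# Hypothetical cases for illustration only; the ground-truth values are made up.
cases = [
    (["S22.1", "I06.9", "R42"], "R42"),
    (["S22.1", "I06.9", "R42"], "H81.1"),
]
print(hit_rate(cases))
```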
