# OpenAI API Fundamentals

## Learning objectives

- Understand how OpenAI's Python SDK authenticates and selects models.
- Practice crafting prompts for neurosurgical reasoning, patient education, and documentation.
- Capture structured outputs (JSON) that can flow into operative notes or registries.
- Explore embeddings, images, audio, and vision workflows that turn LLMs into multi-modal teammates.
- Adopt safety measures so experimental code never substitutes for real clinical decision making.


## Prerequisites & setup

- Create an OpenAI account and generate an API key with access to the models used below.
- Store your key in an environment variable (e.g., `OPENAI_API_KEY`) and let `api_key.py` read it at runtime.
- Install Python 3.10+ plus the packages referenced here. If you are running in a clean environment, uncomment the `pip` command in the next cell.
- All examples default to lightweight, cost-efficient models so you can iterate quickly and upgrade to bigger models when you need higher fidelity.
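
The notebook imports a local `api_key` module and reads `api_key.openai`, but the file itself is not shown. The following is a minimal sketch of what it might contain, assuming the key lives in `OPENAI_API_KEY`. The helper name `load_openai_key` is hypothetical.

```python
# api_key.py (hypothetical contents; the original file is not shown)
import os


def load_openai_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in `env_var`, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the notebook.")
    return key


# The notebook reads `api_key.openai`. This is None when the variable is unset,
# so importing the module never raises on a machine without a key.
openai = os.environ.get("OPENAI_API_KEY")
```

Keeping the key out of the notebook means the file can be shared or committed without leaking credentials.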


In [None]:
# %pip install --upgrade "openai>=1.40.0" numpy


In [2]:

from pathlib import Path
from pprint import pprint
import json
import textwrap

import numpy as np
from openai import OpenAI

import api_key

client = OpenAI(api_key=api_key.openai)
default_model = "gpt-4o-mini"
embedding_model = "text-embedding-3-large"

print(f"Ready! Text model: {default_model}, embedding model: {embedding_model}")


Ready! Text model: gpt-4o-mini, embedding model: text-embedding-3-large


### Helper utilities

A couple of helper functions keep the later sections tidy. `wrap` reflows long text to a fixed width, and `print_response` prints the text of a Responses API call, mirroring what you would do when logging the assistant's output.


In [12]:
def wrap(text, width=90):
    print(textwrap.fill(text, width=width))

def print_response(resp):
    """Utility for printing the text output of Responses API calls."""
    print(resp.output_text.strip())







## 1. First warm up

Patient education via LLM


In [4]:

warmup_prompt = """You are prepping a patient for an L4-L5 microdiscectomy and consenting them in clinic.
Explain in ~120 words what will happen in the operating room and how the decompression
relieves leg pain. Avoid giving promises or numerical risks; instead focus on the flow
of the day and what sensations the patient may or may not notice."""

warmup_response = client.responses.create(
    model=default_model,
    input=[
        {
            "role": "system",
            "content": "You are a friendly neurosurgery resident. Keep explanations accurate and calming."
        },
        {"role": "user", "content": warmup_prompt},
    ],
    temperature=0.7,
)


In [13]:

wrap(warmup_response.output_text.strip())


In the operating room, you’ll be given anesthesia to ensure you’re comfortable and relaxed
throughout the procedure. Once you're asleep, the surgeon will make a small incision in
your lower back to access the L4-L5 area. Using a microscope, they’ll carefully remove the
herniated disc material that’s pressing on the nerves. This decompression helps relieve
leg pain by allowing the nerves to return to their normal state, reducing irritation and
inflammation. You may wake up in the recovery area feeling groggy, but many patients
notice a significant difference in their leg pain soon after the procedure. The team will
monitor you closely to ensure your comfort and safety as you recover.


### What did the API return?

The Responses API returns a structured object that includes the resolved model name, token usage, a completion `status`, and the output items that hold the generated text. Inspecting this metadata helps you judge cost, latency, and reliability.


In [14]:

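pprint(
    {
        "model": warmup_response.model,
        "usage": warmup_response.usage,
        # Note: despite the key name, this is the first output item (the
        # assistant message). The Responses API has no `finish_reason`;
        # check `warmup_response.status` instead.
        "finish_reason": warmup_response.output[0],
    }
)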


{'finish_reason': ResponseOutputMessage(id='msg_045ae875194234a000693e2b59f344819790fb8001b110b7b5', content=[ResponseOutputText(annotations=[], text="In the operating room, you’ll be given anesthesia to ensure you’re comfortable and relaxed throughout the procedure. Once you're asleep, the surgeon will make a small incision in your lower back to access the L4-L5 area. Using a microscope, they’ll carefully remove the herniated disc material that’s pressing on the nerves. This decompression helps relieve leg pain by allowing the nerves to return to their normal state, reducing irritation and inflammation. You may wake up in the recovery area feeling groggy, but many patients notice a significant difference in their leg pain soon after the procedure. The team will monitor you closely to ensure your comfort and safety as you recover.", type='output_text', logprobs=[])], role='assistant', status='completed', type='message'),
 'model': 'gpt-4o-mini-2024-07-18',
 'usage': ResponseUsage(input

## Basics of Large Language Models (LLMs)

Large Language Models (LLMs) like GPT-4o or GPT-4o-mini are neural networks trained on trillions of tokens so they learn statistical patterns of language and reasoning. They operate token-by-token: given previous tokens plus the prompt, the model predicts probabilities for the next token, samples one, then repeats until a stop condition or max length. The art of using LLMs therefore revolves around understanding tokens and the knobs that shape each sampling step.

### Tokens and context windows
- **Tokens** are the smallest text chunks the model sees (roughly a word or part of a word). Pricing, latency, and limits are all measured in tokens.
- **Context windows** cap how many tokens the model can process at once (e.g., 128k tokens). Both input and output tokens count toward this limit.
- **Counting tokens** up front avoids truncation: mind the `max_output_tokens` you request versus the remaining room in the context window.
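
The budgeting described above can be sketched without calling the API. This is a rough illustration, not a tokenizer: the four-characters-per-token ratio is only a common heuristic for English text, the 128k window is the example figure from the list, and the function names are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; ~4 characters per token is a heuristic, not a real count."""
    return math.ceil(len(text) / chars_per_token)


def remaining_output_budget(prompt_tokens, context_window=128_000):
    """Tokens left for the reply once the prompt has been counted."""
    return max(0, context_window - prompt_tokens)


def fits_in_context(prompt_tokens, max_output_tokens, context_window=128_000):
    """True when the prompt plus the requested output cap fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


prompt = "Summarize the operative plan for an L4-L5 microdiscectomy."
prompt_tokens = estimate_tokens(prompt)
print(prompt_tokens, fits_in_context(prompt_tokens, max_output_tokens=256))
```

For exact counts, use a real tokenizer or read `usage` from a completed response.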

### Parameters you can tune in API calls
| Parameter | What it controls | When to raise or use | When to lower or avoid | Example |
| --- | --- | --- | --- | --- |
| `temperature` | Scales randomness during sampling. Values near 0 are close to deterministic; higher values (up to 2 on OpenAI models) give more varied output. | Brainstorming, dialogue, creative writing. | Instructions, coding, summarization. | `temperature=0.2` for precise Q&A, `temperature=0.8` for ideation. |
| `top_p` (nucleus sampling) | Keeps only the smallest set of tokens whose cumulative probability reaches `p`. | When you want open-ended but coherent text. | When you need tighter, more predictable output. | `top_p=0.9` samples from the top ~90% of probability mass. |
| `top_k` | Restricts sampling to the k most likely tokens. | On platforms that expose it (e.g., Anthropic, Google) when you want controlled diversity. The OpenAI API does not accept it. | When chasing a single best answer. | `top_k=40` only considers the 40 most probable tokens. |
| `max_output_tokens` | Hard cap on generated length. | When you need longer explanations or documents. | To prevent verbose responses and control spend. | `max_output_tokens=256` for summaries, `1024` for reports. |
| `presence_penalty` | Penalizes any token that has already appeared at least once, nudging the model toward new topics (Chat Completions parameter). | Encourage the model to explore new ideas. | Preserve topic repetition (e.g., keyword stuffing). | `presence_penalty=0.3` to discourage circling back to the same point. |
| `frequency_penalty` | Penalizes tokens in proportion to how often they have already appeared (Chat Completions parameter). | Remove loops and redundant wording. | Preserve repetition (poetry, chants). | `frequency_penalty=0.5` to keep lists concise. |
| `stop` / `stop_sequences` | Strings that halt generation when hit (`stop` in Chat Completions, `stop_sequences` on some other platforms). | Force the model to hand control back to you. | Most other cases; use only if you truly need to cut output. | `stop=["Observation:"]` in ReAct workflows. |
| `logprobs` / `logit_bias` | Inspect or alter token probabilities. `logit_bias` keys are token IDs, not strings. | Debugging, constraining to specific outputs. | Most production flows. | `logit_bias={"<token_id>": -100}` to ban that token. |
| `response_format` / `text.format` | Forces strict JSON or structured replies when supported. | Building agents and tools that parse responses. | Free-form prose. | `response_format={"type":"json_schema","json_schema":...}` in Chat Completions; `text={"format": ...}` in the Responses API. |
| `seed` | Makes sampling more reproducible when supported (Chat Completions parameter). | Testing, demos. | Production (unless determinism is desired). | `seed=123` for consistent completions. |

> **Rule of thumb:** `temperature`, `top_p`, and `top_k` all affect randomness—tweak only one or two at a time to understand the change. Penalties reshape style, while max tokens, stop signals, and response formats control structure.
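
To make the effect of `temperature` and `top_p` concrete, here is a toy sketch on a hand-written set of logits. It is not how any provider implements sampling internally. The function names and numbers are invented for illustration, and the sketch requires a positive temperature.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature; lower values sharpen the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize the survivors; everything else gets zero probability.
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
sharp = apply_temperature(logits, 0.2)
flat = apply_temperature(logits, 1.5)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
print(top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.9))
```

At the low temperature almost all probability mass sits on the top token. At the higher temperature the alternatives become plausible picks, and the `top_p` filter then drops the long tail before sampling.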

### Example API call with tuned parameters
```python
from openai import OpenAI
client = OpenAI()

# The Responses API accepts temperature, top_p, and max_output_tokens.
# `top_k` is not an OpenAI parameter, and `presence_penalty`,
# `frequency_penalty`, and `stop` belong to Chat Completions
# (client.chat.completions.create), so they are left out here.
resp = client.responses.create(
    model="gpt-4o-mini",
    input=[{"role": "user", "content": "Explain the Krebs cycle simply."}],
    temperature=0.3,
    top_p=0.85,
    max_output_tokens=220,
)
print(resp.output_text)
```
- Increase `temperature`/`top_p` for more imaginative analogies.
- Lower `temperature`/`top_p` for factual, step-by-step explanations.
- Raise `max_output_tokens` whenever the answer needs room (e.g., multi-section reports).

Experimenting with these knobs—and logging both prompts and parameter sets—helps you converge on predictable behavior for each task while keeping costs and latency under control.



## 2. Clinical reasoning
Graduate from education to reasoning. The prompt below simulates a spine-call text.


In [15]:

case_study = """47-year-old with a history of lumbar fusion now presents with 10 days of worsening
saddle anesthesia, urinary hesitancy, and bilateral plantar-flexion weakness after a fall.
MRI from outside hospital shows a large central disc extrusion at L5-S1 with severe canal compromise.
Summarize likely diagnoses, key red flags, and the next two actions you would take before
calling the attending. Limit yourself to evidence-based guidance and highlight when data is missing."""

case_response = client.responses.create(
    model=default_model,
    input=[
        {
            "role": "system",
            "content": "You are the chief neurosurgery resident. Think step-by-step and state when imaging or labs are pending."
        },
        {
            "role": "user",
            "content": case_study,
        },
    ],
    temperature=0.3,
)

print_response(case_response)


### Likely Diagnoses:
1. **Cauda Equina Syndrome (CES)**: The combination of saddle anesthesia, urinary hesitancy, and bilateral plantar-flexion weakness raises concern for CES due to the large central disc extrusion at L5-S1.
2. **Lumbar Disc Herniation**: The MRI findings suggest a significant disc herniation that is likely contributing to the neurological symptoms.
3. **Post-surgical Complications**: Given the history of lumbar fusion, there may be complications related to the previous surgery, such as scar tissue or adjacent segment disease.

### Key Red Flags:
- **Saddle Anesthesia**: Indicates potential involvement of the cauda equina.
- **Urinary Hesitancy**: Suggests possible bladder dysfunction, which is a critical sign of CES.
- **Bilateral Weakness**: Indicates a more diffuse neurological compromise rather than unilateral nerve root involvement.
- **Recent Fall**: Could have exacerbated an existing condition or caused new injury.

### Next Two Actions:
1. **Obtain a Neurolog

### Bonus: streaming live tokens (optional)

If you want the interface to feel more conversational, use streaming. The snippet below prints tokens as they arrive so a learner can watch the answer unfold.


In [16]:
# Stream the response and print each text delta as it arrives.
with client.responses.stream(
    model=default_model,
    input=[{"role": "user", "content": "List sterile setup steps SOP for an ICP monitor."}],
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
    # The fully assembled response is available once the stream is exhausted.
    final_response = stream.get_final_response()

Creating a Standard Operating Procedure (SOP) for a sterile setup of an Intracranial Pressure (ICP) monitor involves outlining a step-by-step process to ensure the integrity of the equipment and prevent infections. Below are the detailed steps:

### Standard Operating Procedure (SOP) for Sterile Setup of an ICP Monitor

#### 1. **Preparation**
   - **Gather Supplies:**
     - ICP monitoring kit (includes catheter, transducer, tubing)
     - Sterile drapes
     - Sterile gloves
     - Antiseptic solution (e.g., Chlorhexidine or Betadine)
     - Sterile syringes and saline
     - Gauze pads
     - Biohazard waste bags

   - **Clean Workspace:**
     - Disinfect the surface where the setup will occur.
     - Ensure that the area is free of non-sterile items.

#### 2. **Hand Hygiene**
   - Perform hand hygiene using soap and water or an alcohol-based hand sanitizer.
   - Put on a pair of sterile gloves.

#### 3. **Drape Setup**
   - Open sterile drapes and arrange them in a manner to cover
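
The delta-handling logic in the streaming cell can be exercised offline. This sketch replaces the live stream with a few simulated events. The payloads are invented, and only the `type` and `delta` attributes used in the cell above are modeled.

```python
from types import SimpleNamespace


def collect_text(events):
    """Join the text deltas from a stream of events, ignoring other event types."""
    chunks = []
    for event in events:
        if event.type == "response.output_text.delta":
            chunks.append(event.delta)
    return "".join(chunks)


# Simulated events standing in for a live stream (hypothetical payloads).
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Gather "),
    SimpleNamespace(type="response.output_text.delta", delta="supplies."),
    SimpleNamespace(type="response.completed"),
]

print(collect_text(fake_events))
```

Collecting the chunks as well as printing them lets you log the full text after the stream ends.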

## 3. Structured operative notes with JSON output

Operative summaries often need to land in registries or EMRs. In the Responses API, the `text.format` setting (the counterpart of `response_format` in Chat Completions) enforces a JSON schema, so the assistant cannot invent field names or drop required ones.


In [15]:

structured_prompt = """Summarize the key details from this case presentation for the operative board.
Emphasize neurologic status, imaging, and what still needs clarification.
Case: 35-year-old with recurrent left temporal low-grade glioma. Neuro exam intact.
Recent MRI: slow growth bordering language cortex. Plan for awake craniotomy with mapping.
Pending: updated labs, DTI tractography, and anesthesia clearance for awake technique."""


schema = {
    "type": "json_schema",
    "name": "postop_note",
    "schema": {
        "type": "object",
        "properties": {
            "patient_summary": {"type": "string"},
            "surgical_priority": {
                "type": "string",
                "enum": ["elective", "urgent", "emergent"],
            },
            "key_findings": {
                "type": "array",
                "items": {"type": "string"},
            },
            "next_steps": {
                "type": "array",
                "items": {"type": "string"},
            },
        },
        "required": ["patient_summary", "surgical_priority", "key_findings", "next_steps"],
        "additionalProperties": False,
    },
    "strict": True
}

structured_response = client.responses.create(
    model=default_model,
    input=[
        {"role": "system", "content": "Return concise clinical facts only."},
        {"role": "user", "content": structured_prompt},
    ],
    text={"format": schema},  # type: ignore
)

patient_note = json.loads(structured_response.output_text)
pprint(patient_note)


{'key_findings': ['Recurrent left temporal low-grade glioma',
                  'Slow growth bordering language cortex on recent MRI',
                  'Neurologic status intact'],
 'next_steps': ['Obtain updated labs',
                'Complete DTI tractography',
                'Get anesthesia clearance for awake craniotomy technique'],
 'patient_summary': '35-year-old with recurrent left temporal low-grade '
                    'glioma. Neurologic exam is intact.',
 'surgical_priority': 'elective'}
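
Before a parsed note flows into a registry, a defensive check is cheap insurance. A strict schema should keep normal replies on-shape, but a refusal or a truncated reply may not parse as expected. The sketch below mirrors the schema in the cell above. `validate_note` is a hypothetical helper, not part of the SDK.

```python
REQUIRED_FIELDS = ("patient_summary", "surgical_priority", "key_findings", "next_steps")
ALLOWED_PRIORITIES = {"elective", "urgent", "emergent"}


def validate_note(note):
    """Return a list of problems; an empty list means the note matches the schema."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in note]
    extra = sorted(set(note) - set(REQUIRED_FIELDS))
    if extra:
        problems.append(f"unexpected fields: {extra}")
    if "patient_summary" in note and not isinstance(note["patient_summary"], str):
        problems.append("patient_summary must be a string")
    if "surgical_priority" in note and note["surgical_priority"] not in ALLOWED_PRIORITIES:
        problems.append(f"invalid surgical_priority: {note['surgical_priority']!r}")
    for name in ("key_findings", "next_steps"):
        if name in note:
            value = note[name]
            if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
                problems.append(f"{name} must be a list of strings")
    return problems


example_note = {
    "patient_summary": "35-year-old with recurrent left temporal low-grade glioma.",
    "surgical_priority": "elective",
    "key_findings": ["Neurologic status intact"],
    "next_steps": ["Obtain updated labs"],
}
print(validate_note(example_note))
```

In the notebook you would call `validate_note(patient_note)` right after `json.loads` and stop if the list is not empty.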


## 4. Build a focused knowledge base with embeddings

Embeddings convert text into vectors so we can search our own notes. Below we encode three short neurosurgery guidelines, embed a resident's question the same way, and rank the guidelines by cosine similarity. The top-scoring passages become candidate references to feed to the LLM.


In [16]:

neurosurgery_notes = [
    {
        "title": "Spine trauma activation",
        "text": "Immobilize, obtain CT first, and screen for hemodynamic instability before MRI. Rapid neuro checks every 30 minutes until OR.",
    },
    {
        "title": "Cauda equina workup",
        "text": "New saddle anesthesia, sphincter dysfunction, or progressive weakness warrants emergent MRI with contrast. Decompress within 24 hours when deficits present.",
    },
    {
        "title": "Pituitary apoplexy",
        "text": "Look for sudden headache, ophthalmoplegia, and visual loss. Give stress-dose steroids, obtain MRI sellar protocol, and call endocrine for hormone replacement.",
    },
]

note_embeddings = client.embeddings.create(
    model=embedding_model,
    input=[note["text"] for note in neurosurgery_notes],
)

note_matrix = np.vstack([np.array(data.embedding) for data in note_embeddings.data])
note_matrix = note_matrix / np.linalg.norm(note_matrix, axis=1, keepdims=True)


def search_notes(question, top_k=2):
    query_embedding = client.embeddings.create(model=embedding_model, input=question).data[0].embedding
    query_vec = np.array(query_embedding)
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = note_matrix @ query_vec
    best_indices = scores.argsort()[::-1][:top_k]
    return [
        {
            "title": neurosurgery_notes[idx]["title"],
            "score": float(scores[idx]),
            "text": neurosurgery_notes[idx]["text"],
        }
        for idx in best_indices
    ]

results = search_notes("How should I image suspected cauda equina and what timing is ideal?")
for row in results:
    print(f"{row['title']} (cosine={row['score']:.3f})")
    wrap(row["text"], width=100)
    print()


Cauda equina workup (cosine=0.595)
New saddle anesthesia, sphincter dysfunction, or progressive weakness warrants emergent MRI with
contrast. Decompress within 24 hours when deficits present.

Spine trauma activation (cosine=0.495)
Immobilize, obtain CT first, and screen for hemodynamic instability before MRI. Rapid neuro checks
every 30 minutes until OR.



Once you have the top passages you can pass them back into `client.responses.create` as context so the model cites your own material rather than guessing.
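
One way to assemble that context is sketched below. It assumes each passage is a dict with `title` and `text` keys, as `search_notes` returns. The helper name and the wording of the system instruction are illustrative choices, not a prescribed format.

```python
def build_grounded_input(question, passages):
    """Format retrieved passages as numbered context for a Responses API call."""
    context = "\n\n".join(
        f"[{i}] {p['title']}: {p['text']}" for i, p in enumerate(passages, start=1)
    )
    return [
        {
            "role": "system",
            "content": (
                "Answer only from the numbered passages and cite them like [1]. "
                "Say so when the passages do not cover the question."
            ),
        },
        {"role": "user", "content": f"Passages:\n{context}\n\nQuestion: {question}"},
    ]


sample_passages = [
    {"title": "Cauda equina workup", "text": "New saddle anesthesia warrants emergent MRI."},
    {"title": "Spine trauma activation", "text": "Immobilize and obtain CT first."},
]
messages = build_grounded_input("How should I image suspected cauda equina?", sample_passages)
print(messages[1]["content"])

# In the notebook (requires a live client):
# grounded = client.responses.create(model=default_model, input=messages)
```

Numbering the passages gives the model something concrete to cite, which makes unsupported statements easier to spot on review.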


# Bonus points for professionals

## 5. Vision-informed Results

`gpt-4o-mini` can accept both text and images.


In [None]:
image_url = "https://sanctuarymentalhealth.org/wp-content/uploads/2021/03/The-Starry-Night-1200x630-1.jpg.webp"

vision_response = client.responses.create(
    model=default_model,
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Explain this painting to me",
                },
                {"type": "input_image", "image_url": image_url}, # type: ignore
            ],
        }
    ],
)

print_response(vision_response)


This painting is "The Starry Night" by Vincent van Gogh, created in 1889. It depicts a
swirling night sky filled with vibrant stars and a bright crescent moon, set against a
backdrop of deep blues.   In the foreground, a large, dark cypress tree rises, often
interpreted as a symbol of death but also a connection between the earth and the sky. The
swirling patterns in the sky suggest movement and emotion, expressing van Gogh’s turbulent
state of mind at the time. The bright yellows and whites of the stars contrast with the
deep blues and greens, creating a sense of wonder and dynamism.   The painting reflects
van Gogh's unique style, characterized by bold brushstrokes and a vibrant color palette,
capturing both beauty and emotional depth. It invites viewers to contemplate the mysteries
of the universe and the artist’s inner turmoil.


In [28]:
import base64

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Path to your image
image_path = "mri_brain.jpg"

# Getting the Base64 string
base64_image = encode_image(image_path)


response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "role": "user",
            "content": [
                { "type": "input_text", "text": "what's in this image?" },
                {
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                },
            ],
        } # type: ignore
    ],
)

print(response.output_text)

This image is an MRI scan of a human brain, specifically an axial (horizontal) view. The scan shows a prominent abnormality on the left side of the brain: a large, ring-enhancing lesion (a bright ring with a darker center). 

This appearance is often associated with certain brain pathologies such as:

- **Glioblastoma multiforme (a type of brain tumor)**
- **Brain abscess**
- **Metastatic brain tumor**

The ring enhancement suggests a lesion with a central area of necrosis (dead tissue) surrounded by a rim of active (enhancing) tissue, which is a classic sign for some aggressive brain tumors or infections.

A radiologist or medical professional should be consulted for an accurate diagnosis and interpretation in a clinical context. If you have concerns about a medical imaging result, please consult your healthcare provider.


## 6. Generating narrated briefings

Here is how to generate an educational narrated briefing. Remember to store the file path if you want to share it later.


In [None]:

speech_text = "You may feel throat soreness and shoulder tightness after surgery. Keep your collar on, walk every few hours, and call if you notice new weakness, fevers, or difficulty swallowing. We are always on call for you."

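output_path = Path("patient_briefing.mp3")

# Stream the audio straight to disk. Calling stream_to_file on a
# non-streaming response triggers a DeprecationWarning in the SDK.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input=speech_text,
) as audio_response:
    audio_response.stream_to_file(output_path)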
print(f"Saved narrated briefing to {output_path.resolve()}")


Saved narrated briefing to /Users/bdthombre/developer/python/python_tutorial/002_langchain/patient_briefing.mp3




## 7. Safety checklist and next steps

- **De-identify everything.** Never paste MRNs, dates of birth, or other PHI into prompts.
- **Label educational use.** Students should know these are coaching aids, not chart-ready notes.
- **Track costs.** `warmup_response.usage` shows token counts—log them if you plan to share notebooks broadly.
- **Add guardrails.** Combine embeddings + prompting to keep outputs anchored in approved institutional guidelines.
- **Experiment mindfully.** Try swapping in `gpt-4o` for complex reasoning, or stay on `gpt-4o-mini` for labs with limited budgets.
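
For the cost-tracking item above, a small sketch of turning token counts into a dollar estimate. The prices are placeholders, not real rates; look up current pricing for the model you use. The function name is hypothetical.

```python
def estimate_cost_usd(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Estimate the cost of one call from token counts and per-million-token prices."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Placeholder prices in USD per million tokens (NOT real rates).
PLACEHOLDER_INPUT_PRICE = 2.0
PLACEHOLDER_OUTPUT_PRICE = 8.0

print(estimate_cost_usd(1_200, 350, PLACEHOLDER_INPUT_PRICE, PLACEHOLDER_OUTPUT_PRICE))

# In the notebook, the counts come from the response object:
# usage = warmup_response.usage
# estimate_cost_usd(usage.input_tokens, usage.output_tokens,
#                   PLACEHOLDER_INPUT_PRICE, PLACEHOLDER_OUTPUT_PRICE)
```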

### Suggested practice exercises

1. Swap in one of your own de-identified consult notes and see how structured JSON output changes.
2. Expand the embeddings dataset with departmental protocols, then build a simple retrieval-augmented generation (RAG) helper.
3. Create a function tool that performs a quick NIHSS calculation and let the LLM decide when to call it.
4. Prototype a Gradio or Streamlit UI that wraps the prompts you liked best.
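
As a possible starting point for exercise 3, the sketch below pairs a plain summing function with a tool definition. The per-item maxima follow the standard NIHSS form (15 scored items, total 0 to 42); "untestable" entries are not handled. The tool dictionary follows the function-tool shape of the Responses API as I understand it, so treat the exact fields as an assumption and check the current reference before relying on it.

```python
# Maximum score per NIHSS item (15 scored items, total range 0-42).
NIHSS_ITEM_MAX = {
    "1a": 3, "1b": 2, "1c": 2, "2": 2, "3": 3, "4": 3,
    "5a": 4, "5b": 4, "6a": 4, "6b": 4,
    "7": 2, "8": 2, "9": 3, "10": 2, "11": 2,
}


def nihss_total(scores):
    """Sum a complete set of NIHSS item scores after validating keys and ranges."""
    missing = sorted(set(NIHSS_ITEM_MAX) - set(scores))
    unknown = sorted(set(scores) - set(NIHSS_ITEM_MAX))
    if missing or unknown:
        raise ValueError(f"missing items: {missing}, unknown items: {unknown}")
    for item, value in scores.items():
        if not isinstance(value, int) or not 0 <= value <= NIHSS_ITEM_MAX[item]:
            raise ValueError(f"item {item} must be an integer from 0 to {NIHSS_ITEM_MAX[item]}")
    return sum(scores.values())


# Tool definition to pass as tools=[nihss_tool] (field layout is an assumption).
nihss_tool = {
    "type": "function",
    "name": "nihss_total",
    "description": "Add up a complete set of NIHSS item scores. Educational use only.",
    "parameters": {
        "type": "object",
        "properties": {
            item: {"type": "integer", "enum": list(range(top + 1))}
            for item, top in NIHSS_ITEM_MAX.items()
        },
        "required": list(NIHSS_ITEM_MAX),
        "additionalProperties": False,
    },
    "strict": True,
}

example_scores = {item: 0 for item in NIHSS_ITEM_MAX}
example_scores["5a"] = 2  # hypothetical exam finding for illustration
print(nihss_total(example_scores))
```

When the model returns a function call, parse its JSON arguments into a dict and pass that dict to `nihss_total`, so the arithmetic stays in ordinary Python.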
