## Text input

Available OpenAI models are listed at https://platform.openai.com/docs/models

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from langchain.agents import create_agent

agent = create_agent(
    model='gpt-5-nano',
    system_prompt="You are a science fiction writer. Create a capital city at the user's request.",
)

In [4]:
from langchain.messages import HumanMessage

question = HumanMessage(content=[
    {"type": "text", "text": "What is the capital of The Moon?"}
])

response = agent.invoke(
    {"messages": [question]}
)

print(response['messages'][-1].content)

In a fictional setting, the capital of The Moon is Seleneapolis, commonly called Selene City.

- Location: A sunlit plateau near the rim of Shackleton Crater on the Moon’s south polar region, built into domed rings that rise above the regolith.
- Government: Seat of the Lunar Confederation; home to the Lunar Assembly and the central ministries that guide lunar policy, science, and trade with Earth.
- Population: Roughly 10–15 million residents and workers within the city proper, with millions more in orbiting stations and nearby colonies.
- Architecture and tech: Glass-domed neighborhoods, lattice-work lunar bricks, solar towers, ice-mining facilities, and a network of elevators and transit tunnels that stitch the city together.
- Notable districts and features:
  - Legislative Crescent: the government district housing the Lunar Assembly and council chambers.
  - Lantern Quarter: cultural heart with museums, theaters, markets, and public plazas lit by solar-harvesting towers.
  - Archi
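
Printing `.content` directly works here because the reply appears to come back as a plain string. Message content can also be a list of typed blocks, like the one passed to `HumanMessage` in this section. The sketch below shows one way to flatten either shape to text; `content_to_text` is a hypothetical helper, not a LangChain function, and it assumes text blocks are dicts with `"type": "text"` and a `"text"` key.

```python
def content_to_text(content) -> str:
    """Hypothetical helper: flatten message content to plain text.

    Assumes `content` is either a str or a list whose items are
    strings or dicts shaped like {"type": "text", "text": "..."}.
    Blocks of any other type (image, audio, ...) are skipped.
    """
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)


# Same block shape as the question built in this section
demo = [{"type": "text", "text": "What is the capital of The Moon?"}]
print(content_to_text(demo))  # → What is the capital of The Moon?
```

In the notebook this could be called as `content_to_text(response['messages'][-1].content)`.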

## Image input

In [5]:
from ipywidgets import FileUpload
from IPython.display import display

uploader = FileUpload(accept='.png', multiple=False)
display(uploader)

FileUpload(value=(), accept='.png', description='Upload')

In [6]:
print(uploader.value)

({'name': 'moon_city.png', 'type': 'image/png', 'size': 2812795, 'content': <memory at 0x10f736740>, 'last_modified': datetime.datetime(2025, 11, 13, 23, 11, 13, 130000, tzinfo=datetime.timezone.utc)},)


In [7]:
import base64

# Get the first (and only) uploaded file dict
uploaded_file = uploader.value[0]

# This is a memoryview
content_mv = uploaded_file["content"]

# Convert memoryview -> bytes
img_bytes = bytes(content_mv)  # or content_mv.tobytes()

# Now base64 encode
img_b64 = base64.b64encode(img_bytes).decode("utf-8")
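
Base64 turns every 3 input bytes into 4 output characters, so the encoded string is roughly a third larger than the file. As a quick worked example, the sketch below estimates the encoded length for the `size` reported in the upload output; `b64_length` is a hypothetical helper and assumes standard padded base64, which is what `base64.b64encode` produces.

```python
import base64


def b64_length(n_bytes: int) -> int:
    """Length of standard padded base64 text for n_bytes of input."""
    # Every group of 3 bytes (the last one padded) becomes 4 characters
    return 4 * ((n_bytes + 2) // 3)


size = 2_812_795  # 'size' reported for moon_city.png in the upload output
print(b64_length(size))  # → 3750396

# Cross-check the formula against the real encoder on a small input
assert len(base64.b64encode(b"x" * 10)) == b64_length(10)
```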

In [8]:
multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this capital"},
    {"type": "image", "base64": img_b64, "mime_type": "image/png"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

Nova Asterion, capital of the Jovian Pact, sits like a gleaming crown on the pale crust of a moon orbiting a colossal gas giant. The image you’ve shared captures its atmosphere: a city built for scarcity and grandeur, with gravity-defying architecture that hums under the light of a banded planet hanging in the sky.

What makes Nova Asterion unique
- Architecture: Asterion is a lattice of vertical spires and circular hubs linked by glassy skyways and mag-lev belts. Central to the skyline is the Convergence Spire, a tapered monument that houses the Pact’s high council and the city’s primary power core. Dome-covered markets and research domes nestle between the towers, creating a city of light and shadow rather than asphalt and brick.
- Lighting and climate: The domes are clad in nano-matrix sunscreens that shift opacity with the planet’s daylight cycle, giving the city a perpetual dusky glow. Inside, climate domes and aquifer networks keep living spaces temperate, while the outer ring ne
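
The encode and message-building steps in this section can be folded into one small helper. The sketch below is a convenience under stated assumptions, not part of LangChain: `to_image_block` and `PNG_SIGNATURE` are hypothetical names, and the signature check is an extra guard that the notebook itself does not perform. The returned dict has the same keys as the image block passed to `HumanMessage` above.

```python
import base64

# Every PNG file starts with these 8 bytes
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def to_image_block(data, mime_type: str = "image/png") -> dict:
    """Hypothetical helper: wrap raw image data in an image content block.

    `data` may be bytes or a memoryview (as returned by FileUpload).
    """
    data = bytes(data)  # memoryview -> bytes; a no-op copy for bytes
    if mime_type == "image/png" and not data.startswith(PNG_SIGNATURE):
        raise ValueError("data does not look like a PNG file")
    return {
        "type": "image",
        "base64": base64.b64encode(data).decode("utf-8"),
        "mime_type": mime_type,
    }


# Tiny stand-in payload: a valid signature followed by filler bytes
fake_png = PNG_SIGNATURE + b"\x00" * 4
block = to_image_block(memoryview(fake_png))
print(block["type"], block["mime_type"])  # → image image/png
```

With this in place, `to_image_block(uploader.value[0]["content"])` would replace the manual `bytes(...)` and `base64.b64encode(...)` steps.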

## Audio input

In [9]:
import sounddevice as sd
from scipy.io.wavfile import write
import base64
import io
import time
from tqdm import tqdm

# Recording settings
duration = 5  # seconds
sample_rate = 44100

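print("Recording...")
# sd.rec() returns immediately and keeps recording in the background
audio = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
# Show a progress bar while the recording runs
for _ in tqdm(range(duration * 10)):   # update 10× per second
    time.sleep(0.1)
sd.wait()  # block until the recording has finished
print("Done.")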

# Write WAV to an in-memory buffer
buf = io.BytesIO()
write(buf, sample_rate, audio)
wav_bytes = buf.getvalue()

aud_b64 = base64.b64encode(wav_bytes).decode("utf-8")

Recording...


100%|██████████| 50/50 [00:05<00:00,  9.57it/s]


Done.
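
The recording cell needs a microphone. To exercise the WAV and base64 steps without one, a short synthetic tone can be written to an in-memory buffer with the standard-library `wave` module. This is a sketch of an alternative, not what the notebook does: `tone_wav_bytes` is a hypothetical helper, and it always writes mono 16-bit PCM, which may differ from the sample format that `sd.rec` and `scipy.io.wavfile.write` produce.

```python
import base64
import io
import math
import struct
import wave


def tone_wav_bytes(freq_hz=440.0, duration_s=0.5, sample_rate=44100) -> bytes:
    """Hypothetical helper: return a mono 16-bit PCM WAV of a sine tone."""
    n_samples = int(duration_s * sample_rate)
    # Half-amplitude sine wave, packed as little-endian signed 16-bit samples
    frames = b"".join(
        struct.pack(
            "<h",
            int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)),
        )
        for i in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
    return buf.getvalue()


wav_bytes_demo = tone_wav_bytes()
aud_b64_demo = base64.b64encode(wav_bytes_demo).decode("utf-8")
print(wav_bytes_demo[:4], len(wav_bytes_demo) > 44)
```

`aud_b64_demo` can stand in for `aud_b64` when testing the message-building code offline.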


In [10]:
# Recreate the agent with an audio-capable model (no system prompt this time)
agent = create_agent(
    model='gpt-4o-audio-preview',
)

multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this audio file"},
    {"type": "audio", "base64": aud_b64, "mime_type": "audio/wav"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

In the sunlit spots, they love to lay,  
With velvet paws, they prance and play.  
Mischief in their emerald eyes,  
Graceful leaps that touch the skies.

Purring softly in the night,  
Their gentle gaze a soothing light.  
Independent, yet they roam,  
Bringing warmth to every home.

Whiskers twitch and tails that sway,  
Mystic creatures of night and day.  
Silent steps and curious minds,  
In feline hearts, true friends we find.
