## Text input

https://platform.openai.com/docs/models

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from langchain.agents import create_agent

agent = create_agent(
    model='gpt-4.1-nano',
    system_prompt="You are a science fiction writer. Create a capital city at the user's request.",
)

In [3]:
from langchain.messages import HumanMessage

question = HumanMessage(content=[
    {"type": "text", "text": "What is the capital of The Moon?"}
])

response = agent.invoke(
    {"messages": [question]}
)

print(response['messages'][-1].content)

The Moon, being a celestial body rather than a nation or political entity, doesn't have an official capital. However, in science fiction and future scenarios imagining lunar colonies, the designated "capital" could be the central hub or administrative city of humanity’s lunar governance. 

For example, in some science fiction universes, the lunar capital might be called **"Luna Prime,"** a bustling metropolis situated near the Shackleton Crater to facilitate access to water ice and serve as the hub for lunar operations. 

Would you like me to create a detailed concept of a lunar capital city, including its name, location, infrastructure, and culture?


## Image input

We will encode our image and audio files in Base64. Base64 maps binary (base-2) data onto 64 printable characters, hence the name. This lets us efficiently represent binary data as text and transmit that representation over text-based communication channels.
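To make the encoding concrete, here is a minimal sketch using only the standard library. The sample bytes and the variable names `raw`, `encoded`, `payload`, and `b64_text` are illustrative and not part of the notebook. It shows that every 3 input bytes become 4 output characters and that decoding restores the original bytes.

```python
import base64

raw = b"Man"                        # 3 bytes = 24 bits
encoded = base64.b64encode(raw)     # 4 characters, 6 bits each
print(encoded)                      # b'TWFu'

# Round trip: decoding restores the original bytes exactly
assert base64.b64decode(encoded) == raw

# Size overhead: every 3 input bytes become 4 output characters
payload = bytes(range(256)) * 3     # 768 arbitrary bytes
b64_text = base64.b64encode(payload).decode("utf-8")
print(len(payload), len(b64_text))  # 768 1024
```

The roughly one-third size increase is the price of being able to send the data anywhere plain text is accepted.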

In [None]:
#!pip install ipywidgets

In [4]:
from ipywidgets import FileUpload
from IPython.display import display

uploader = FileUpload(accept='.png', multiple=False)
display(uploader)

FileUpload(value=(), accept='.png', description='Upload')

In [5]:
print(uploader.value)

({'name': 'moon_city.png', 'type': 'image/png', 'size': 3007510, 'content': <memory at 0x000001D712140640>, 'last_modified': datetime.datetime(2026, 1, 10, 8, 59, 13, 334000, tzinfo=datetime.timezone.utc)},)


In [7]:
uploader.value[0]["content"]

<memory at 0x000001D712140640>

In [8]:
import base64

# Get the first (and only) uploaded file dict
uploaded_file = uploader.value[0]

# This is a memoryview
content_mv = uploaded_file["content"]

# Convert memoryview -> bytes
img_bytes = bytes(content_mv)  # or content_mv.tobytes()

# Now base64 encode
img_b64 = base64.b64encode(img_bytes).decode("utf-8")
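Before sending the image, it can be worth confirming that the upload really is a PNG and that the Base64 string decodes back to the same bytes. The sketch below is an optional sanity check, not part of the original notebook. The helper names `looks_like_png` and `check_b64_round_trip` are hypothetical, and the sample data stands in for `img_bytes` and `img_b64`.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data: bytes) -> bool:
    """Return True if the data begins with the PNG signature."""
    return data[:8] == PNG_SIGNATURE

def check_b64_round_trip(data: bytes, b64_text: str) -> bool:
    """Return True if decoding the Base64 text yields the original bytes."""
    return base64.b64decode(b64_text) == data

# Stand-in data; in the notebook you would pass img_bytes and img_b64
sample = PNG_SIGNATURE + b"\x00" * 16
sample_b64 = base64.b64encode(sample).decode("utf-8")

print(looks_like_png(sample))                    # True
print(check_b64_round_trip(sample, sample_b64))  # True
```

In the notebook, the equivalent calls would be `looks_like_png(img_bytes)` and `check_b64_round_trip(img_bytes, img_b64)`.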

In [9]:
multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this capital"},
    {"type": "image", "base64": img_b64, "mime_type": "image/png"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

This image depicts a futuristic capital city situated on the moon's surface, emphasizing advanced technology and a grand-scale civilization. The city features towering, sleek skyscrapers with glowing lights, interconnected by luminous pathways, and protective domes over certain areas, suggesting a highly controlled and self-sufficient environment. 

In the background, Earth looms large, symbolizing a close connection or oversight from the home planet. The surrounding space with stars and distant celestial bodies highlights the city’s location beyond Earth, emphasizing human ingenuity and adaptation to extraterrestrial living.

This city exemplifies a vision of humankind’s evolution—complete with advanced architecture, space travel capabilities, and a thriving lunar metropolis—making it a hub of political, scientific, and cultural activity within a broader interplanetary civilization.
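The upload-to-message steps above can be folded into one helper. This is a sketch under assumptions: `build_image_content` is a hypothetical name, and it relies on the upload dict having the `content` and `type` keys shown in the `uploader.value` output earlier. It returns the same list-of-dicts structure that was passed to `HumanMessage(content=...)`, taking the MIME type from the upload instead of hard-coding it.

```python
import base64

def build_image_content(prompt: str, uploaded_file: dict) -> list[dict]:
    """Build a text + image content list from a FileUpload-style dict."""
    img_bytes = bytes(uploaded_file["content"])            # memoryview -> bytes
    img_b64 = base64.b64encode(img_bytes).decode("utf-8")  # bytes -> Base64 text
    return [
        {"type": "text", "text": prompt},
        {"type": "image", "base64": img_b64, "mime_type": uploaded_file["type"]},
    ]

# Stand-in for uploader.value[0]; only the keys used by the helper are needed
fake_upload = {
    "name": "tiny.png",
    "type": "image/png",
    "content": memoryview(b"\x89PNG\r\n\x1a\n"),
}
content = build_image_content("Tell me about this capital", fake_upload)
print(content[1]["mime_type"])  # image/png
```

With a real upload, the result could be used as `HumanMessage(content=build_image_content("Tell me about this capital", uploader.value[0]))`.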


## Audio input

In [11]:
#!pip install sounddevice

In [12]:
import sounddevice as sd
from scipy.io.wavfile import write
import base64
import io
import time
from tqdm import tqdm

# Recording settings
duration = 5  # seconds
sample_rate = 44100

print("Recording...")
# sd.rec starts recording in the background and returns immediately
audio = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
# Progress bar for the duration
for _ in tqdm(range(duration * 10)):   # update 10× per second
    time.sleep(0.1)
sd.wait()  # block until the recording has finished
print("Done.")

# Write WAV to an in-memory buffer
buf = io.BytesIO()
write(buf, sample_rate, audio)
wav_bytes = buf.getvalue()

aud_b64 = base64.b64encode(wav_bytes).decode("utf-8")

Recording...


100%|██████████| 50/50 [00:05<00:00,  9.88it/s]

Done.
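If no microphone is available, the rest of the audio pipeline can still be exercised with a synthetic clip. The sketch below is an alternative to the recording cell, not what the notebook does: it uses only the standard library (`wave`, `struct`, `math`) instead of `sounddevice` and `scipy`, and the function name `synth_wav_bytes` and its parameters are hypothetical. It writes a 16-bit mono sine tone to an in-memory WAV and Base64-encodes it the same way as above. A pure tone gives the model little to describe, so treat this as a plumbing test only.

```python
import base64
import io
import math
import struct
import wave

def synth_wav_bytes(duration=1.0, sample_rate=44100, freq=440.0) -> bytes:
    """Return a mono 16-bit PCM WAV file (as bytes) containing a sine tone."""
    n_frames = int(duration * sample_rate)
    frames = bytearray()
    for i in range(n_frames):
        # Half-amplitude sine wave, scaled to the signed 16-bit range
        sample = 0.5 * math.sin(2 * math.pi * freq * i / sample_rate)
        frames += struct.pack("<h", int(sample * 32767))

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return buf.getvalue()

tone_bytes = synth_wav_bytes(duration=1.0)
tone_b64 = base64.b64encode(tone_bytes).decode("utf-8")
print(tone_bytes[:4])  # b'RIFF'
```

`tone_b64` can then take the place of `aud_b64` in an audio content block with `"mime_type": "audio/wav"`.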





In [13]:
# Recreate the agent with an audio-capable model (no system prompt this time)
agent = create_agent(
    model='gpt-4o-audio-preview',
)

multimodal_question = HumanMessage(content=[
    {"type": "text", "text": "Tell me about this audio file"},
    {"type": "audio", "base64": aud_b64, "mime_type": "audio/wav"}
])

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response['messages'][-1].content)

Silent paws, soft evening’s grace,  
Whiskers twitching, starry chase.  
Golden eyes in moonlight’s gleam,  
Dancing subtle in a dream.

Velvet steps on midnight floors,  
Shadow glides by open doors.  
Gentle purrs like soothing streams,  
Guardians of forgotten dreams.

With tails held high, they roam so free,  
Masters of curiosity.  
In sunbeams bright or shadows deep,  
They softly wander, then softly sleep.

A world of wonder in each stride,  
With secrets kept and hearts that bide.  
Oh playful muse of velvet grace,  
A cat’s soft magic fills the place.
