## Text input

OpenAI model reference: https://platform.openai.com/docs/models

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from langchain.agents import create_agent

agent = create_agent(
    model="gpt-5-nano",
    system_prompt="You are a science fiction writer. Create a capital city at the user's request."
)

In [3]:
from langchain.messages import HumanMessage

question = HumanMessage(
    content=[
        {"type": "text", "text": "What is the capital of the Moon?"}
    ]
)

response = agent.invoke(
    {"messages": [question]}
)

print(response["messages"][-1].content)

In this sci‑fi setting, the Moon’s capital is Selene Prime (also known as Selene City). It sits in Shackleton Crater at the lunar south pole, a hub of governance and culture for the Lunar Confederation. The city is ringed with glass-domed districts, ice-powered infrastructure, and solar-collector rings. The Lunar Assembly and Central Archive meet in the Council Dome, while a magnetically levitated transit network carries people and goods between districts.
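
The cell above prints `.content` directly, which works when the model returns a plain string. Message content can also arrive as a list of content blocks, so a small helper that handles both shapes may be useful. The following is a minimal sketch; `content_to_text` is a hypothetical name, not part of LangChain, and it assumes text blocks follow the same `{"type": "text", "text": ...}` shape used for the input message.

```python
from typing import Any


def content_to_text(content: Any) -> str:
    """Flatten message content (a string or a list of blocks) into plain text."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            # Keep only text blocks; image and audio blocks are skipped.
            parts.append(block.get("text", ""))
    return "".join(parts)
```

With this in place, `print(content_to_text(response["messages"][-1].content))` would behave the same for either content shape.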


## Image input

In [4]:
from ipywidgets import FileUpload
from IPython.display import display

uploader = FileUpload(accept=".png", multiple=False) # Use /files/dog.png
display(uploader)

FileUpload(value=(), accept='.png', description='Upload')

In [5]:
print(uploader.value)

({'name': 'dog.png', 'type': 'image/png', 'size': 940294, 'content': <memory at 0x123b9aa40>, 'last_modified': datetime.datetime(2025, 12, 30, 17, 15, 25, 414000, tzinfo=datetime.timezone.utc)},)


In [None]:
import base64

# Get the first (and only) uploaded file dict
uploaded_file = uploader.value[0]

# The widget exposes the file content as a memoryview
content_mv = uploaded_file["content"]

# Convert memoryview -> bytes
img_bytes = bytes(content_mv)  # equivalent to content_mv.tobytes()

# Base64-encode the raw bytes and decode to a text string for the message payload
img_b64 = base64.b64encode(img_bytes).decode("utf-8")
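
The encoding step above can be wrapped in a reusable helper that also derives the MIME type from the file name instead of hard-coding `image/png`. This is a sketch under assumptions: `make_image_block` is a hypothetical name, and the returned dict simply mirrors the `{"type": "image", "base64": ..., "mime_type": ...}` block used in this notebook.

```python
import base64
import mimetypes


def make_image_block(data, filename: str) -> dict:
    """Build an image content block from raw bytes (or a memoryview) and a file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{filename!r} does not look like an image file")
    return {
        "type": "image",
        # bytes() accepts both bytes and memoryview, as returned by FileUpload
        "base64": base64.b64encode(bytes(data)).decode("ascii"),
        "mime_type": mime_type,
    }
```

For the upload above, the call would look like `make_image_block(uploaded_file["content"], uploaded_file["name"])`.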

In [8]:
multimodal_question = HumanMessage(
    content=[
        {"type": "text", "text": "Tell me about the features you notice on this dog."},
        {"type": "image", "base64": img_b64, "mime_type": "image/png"}
    ]
)

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response["messages"][-1].content)

Here are the features I notice:

- Size and stance: A small, compact dog lounging on a cushioned metal bench. Front paws rest over the edge, giving a relaxed, curious look.
- Coat: Dense, tight curls all over the body. Solid black color with a bit of lighter gray around the muzzle and chin.
- Face: Round, with fur that hides much of the eyes. Small, dark nose centered on the snout.
- Ears: Folded into the fluffy fur; not clearly shaped or pointed.
- Eyes: Dark and partly obscured by the hair, giving a softly scruffy expression.
- Body build: Stocky and sturdy for a small dog; appears well-groomed.
- Paws: Small and fluffy, proportionate to the body.
- Tail: Fluffy and visible at the rear, blending into the rest of the coat.
- Overall vibe: Calm, content, and a bit inquisitive, enjoying a sunny day on the bench.


## Audio input

In [None]:
import sounddevice as sd
from scipy.io.wavfile import write
import base64
import io
import time
from tqdm import tqdm

# Recording settings
duration = 5 # seconds
sample_rate = 44100

print("Recording...")
audio = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)

# Progress bar for the duration
for _ in tqdm(range(duration * 10)): # Update 10x per second
    time.sleep(0.1)

sd.wait()
print("Done.")

# Write WAV to an in-memory buffer
buf = io.BytesIO()
write(buf, sample_rate, audio)
wav_bytes = buf.getvalue()

aud_b64 = base64.b64encode(wav_bytes).decode("utf-8") # Convert raw bytes into base64 text string.

Recording...


100%|██████████| 50/50 [00:05<00:00,  9.51it/s]


Done.
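
`sd.rec` returns float32 samples by default, and `scipy.io.wavfile.write` stores such an array as a 32-bit floating-point WAV. If a consumer expects 16-bit PCM instead, the buffer can be built with the standard-library `wave` module. The sketch below is an illustration under assumptions: `pcm16_wav_bytes` is a hypothetical helper, and a synthetic 440 Hz tone stands in for the microphone recording so the block runs without audio hardware.

```python
import base64
import io
import math
import struct
import wave


def pcm16_wav_bytes(samples, sample_rate: int) -> bytes:
    """Encode float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV file in memory."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against out-of-range values
        frames += struct.pack("<h", int(clipped * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()


# One second of a 440 Hz tone as stand-in audio
tone_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / tone_rate) for n in range(tone_rate)]
tone_wav = pcm16_wav_bytes(tone, tone_rate)
tone_b64 = base64.b64encode(tone_wav).decode("ascii")
```

For the recording above, `pcm16_wav_bytes(audio[:, 0], sample_rate)` would take the place of the `write(buf, sample_rate, audio)` step, since `audio` has shape `(frames, 1)`.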


In [10]:
agent = create_agent(
    model="gpt-4o-audio-preview"
)

multimodal_question = HumanMessage(
    content=[
        {"type": "text", "text": "Tell me about this audio file"},
        {"type": "audio", "base64": aud_b64, "mime_type": "audio/wav"}
    ]
)

response = agent.invoke(
    {"messages": [multimodal_question]}
)

print(response["messages"][-1].content)

Neural networks have an interesting history that starts with the early exploration of how the human brain works. 

In 1943, Warren McCulloch and Walter Pitts published a paper that proposed a simplified model of neurons, essentially creating the first computational model of neural networks.

In the 1950s, Frank Rosenblatt developed the Perceptron, which was an early type of neural network that could learn certain patterns. However, it wasn’t capable of solving more complex problems, which led to a decline in interest by the 1970s.

In the 1980s, interest resurfaced with the development of new algorithms, like backpropagation, which allowed neural networks to adjust their weights and learn more effectively.

By the 2000s and 2010s, with the advent of big data and more powerful computers, deep learning—using many-layered neural networks—became possible. This led to breakthroughs in tasks like image recognition, speech recognition, and natural language processing.

Today, neural networks 