<a href="https://colab.research.google.com/github/ahsanrazi/LangChain/blob/main/09_Multimodal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multimodality

In [9]:
from google.colab import userdata
gemini_api_key = userdata.get('GEMINI_API_KEY').strip()

In [2]:
!pip install -qU langchain-google-genai


In [3]:
# Multimodality refers to the ability to work with data that comes in different forms, such as text, audio, images, and video.

In [10]:
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", api_key=gemini_api_key)

In [25]:
image_url = "https://images.ctfassets.net/hrltx12pl8hq/28ECAQiPJZ78hxatLTa7Ts/2f695d869736ae3b0de3e56ceaca3958/free-nature-images.jpg?fit=fill&w=1200&h=630"

In [26]:
from langchain_core.messages import HumanMessage

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model.invoke([message])

In [27]:
response.content

'The weather in the image appears to be sunny with a clear blue sky and some fluffy white clouds. It looks like a pleasant, bright day.'
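The message above is a list of content blocks: one `text` block followed by one `image_url` block. A minimal sketch of a helper that builds the same structure is shown below. The name `build_image_content` and the example URL are hypothetical and not part of LangChain, and whether a given model accepts more than one image per message is an assumption to verify.

```python
def build_image_content(prompt_text, image_urls):
    """Build a content list: one text block, then one image block per URL."""
    content = [{"type": "text", "text": prompt_text}]
    for url in image_urls:
        # Same block shape as the HumanMessage example above.
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content


# Placeholder URL for illustration only.
content = build_image_content(
    "describe the weather in this image",
    ["https://example.com/sky.jpg"],
)
print(content)
```

The returned list could then be passed as `HumanMessage(content=content)` in place of the inline list.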

In [28]:
# The most widely supported way to pass an image is inline, as a base64-encoded string wrapped in a data URL. This should work for most model integrations.

import base64
import httpx

image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
image_data[:500]

'/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgMCAgMDAwMEAwMEBQgFBQQEBQoHBwYIDAoMDAsKCwsNDhIQDQ4RDgsLEBYQERMUFRUVDA8XGBYUGBIUFRT/2wBDAQMEBAUEBQkFBQkUDQsNFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBT/wAARCAJ2BLADAREAAhEBAxEB/8QAHgAAAgMBAQEBAQEAAAAAAAAAAwQBAgUABgcICQr/xABJEAACAQMDAgQEAwYEBgEBBQkBAgMABBEFEiExQQYTUWEUInGBBzKRCBUjQqHBUrHR8BYkM2Lh8XJDCRclU4KSosI0sjVEc2P/xAAcAQADAQEBAQEBAAAAAAAAAAABAgMABAUGBwj/xABEEQACAgICAQMCAwYFAQcDAQkAAQIRAyESMUEEE1EiYQUycQYUgZGh8CNCscHR4QcVM1KCkvEWctIk'

In [29]:
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
    ],
)
response = model.invoke([message])
print(response.content)

The weather in the image appears to be sunny with a clear blue sky and some scattered white, fluffy clouds. The presence of a shadow cast by the tree suggests that the sun is shining brightly. It looks like a pleasant and mild day.
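The data URL above hardcodes `image/jpeg`, which only matches JPEG images. A minimal sketch of a helper that picks the MIME type from a file name is shown below. The name `to_data_url` is hypothetical, and falling back to `image/jpeg` when the type cannot be guessed is an assumption, not something the model API requires.

```python
import base64
import mimetypes


def to_data_url(image_bytes: bytes, filename: str = "image.jpg") -> str:
    """Return a base64 data URL for raw image bytes (hypothetical helper)."""
    # Guess the MIME type from the file name; fall back to JPEG (an assumption).
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes standing in for a real image download.
sample_url = to_data_url(b"\xff\xd8\xff", "photo.png")
print(sample_url)
```

With real data, the bytes would come from `httpx.get(image_url).content` and the result would go into the `url` field of the `image_url` block.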


# How to use multimodal prompts

In [30]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Describe the image provided"),
        (
            "user",
            [
                {"type": "text", "text": "Describe this image"},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{image_data}"}},
            ],
        ),
    ]
)

In [31]:
chain = prompt | model

response = chain.invoke({"image_data": image_data})
print(response.content)

The image showcases a serene and picturesque landscape. A solitary, full-bodied tree stands prominently in the center of the scene, casting a shadow on the vibrant green, rolling hills that stretch across the foreground. Above, a bright blue sky is dotted with fluffy white clouds, adding depth and dimension to the composition. The overall impression is one of tranquility and natural beauty.
