#Multimodal Application with LangChain using Google Gemini

#🧠 What Are Multimodal Models?
Multimodal models can handle multiple types of input, such as:

- Text
- Images
- Video (some models)
- Audio

The model used here, `gemini-1.5-flash-latest` (Gemini 1.5 Flash), is multimodal: it accepts text and image input and returns text responses.

#🎯 Goal:
Build a simple LangChain app that sends a text prompt and an image to Gemini and gets a meaningful response.

##✅ Step-by-Step Practical with Gemini + Base64 Image

#✅ Step 1: Install Required Libraries


In [12]:
!pip install langchain langchain-google-genai pillow
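
After installing, it can help to confirm that the packages are visible to the running kernel. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the distribution names are the ones from the `pip install` line above.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises PackageNotFoundError if absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Distribution names taken from the pip command above
required = ["langchain", "langchain-google-genai", "pillow"]
print("Missing packages:", missing_packages(required) or "none")
```

If the list is not empty, re-running the install cell and restarting the kernel is usually the first thing to try.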






#✅ Import Required Modules

In [14]:
from langchain_google_genai import ChatGoogleGenerativeAI  # Gemini chat model wrapper
from langchain_core.messages import HumanMessage  # carries the text + image content
from PIL import Image  # image loading


#✅ Step 2: Encode Local Image to Base64

In [27]:
import base64
from PIL import Image
from io import BytesIO

# Load your image
image_path = "sample_image.jpg"
image = Image.open(image_path)

# Convert to base64 string
buffered = BytesIO()
image.convert("RGB").save(buffered, format="JPEG")  # JPEG has no alpha channel, so normalize to RGB first
img_b64 = base64.b64encode(buffered.getvalue()).decode()

# Prepare final string for Gemini
base64_image_str = f"data:image/jpeg;base64,{img_b64}"
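
Before sending the string to the model, a quick round-trip check can confirm that the data URL is well formed. This is a sketch under assumptions: `split_data_url` is a hypothetical helper, and it only handles the `data:<mime>;base64,<payload>` form built above.

```python
import base64


def split_data_url(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes).

    Only handles the "data:<mime>;base64,<payload>" form.
    """
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload, validate=True)


# Stand-in bytes so the sketch runs on its own; with the notebook's
# variables, call split_data_url(base64_image_str) instead.
sample_bytes = b"\xff\xd8\xff\xe0 not a real image"
sample_url = "data:image/jpeg;base64," + base64.b64encode(sample_bytes).decode()

mime_type, decoded = split_data_url(sample_url)
print(mime_type, decoded == sample_bytes)  # prints: image/jpeg True
```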


#✅ Step 3: Initialize Gemini API keys and Model

In [23]:
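import os
from getpass import getpass
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage

# Set your API key. Never hard-code a real key in a notebook; prompt for it instead.
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass("Google API key: ")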

# Load Gemini multimodal model
vision_model = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest")
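
The model picks up the key from the `GOOGLE_API_KEY` environment variable. For runs where nobody is at the keyboard (scripts, scheduled jobs), failing early with a clear message may be easier to debug than an authentication error at call time. A minimal sketch; `require_api_key` is a hypothetical helper, not part of LangChain.

```python
import os


def require_api_key(var_name="GOOGLE_API_KEY", env=None):
    """Return the API key stored in `var_name`, or raise a clear error if unset."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before starting the notebook."
        )
    return key


# Demo with an explicit mapping so the sketch runs without a real key
demo_env = {"GOOGLE_API_KEY": "example-key-not-real"}
print("key found:", bool(require_api_key(env=demo_env)))
```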


#✅ Step 4: Send Image + Text Prompt to Gemini

In [24]:
# Create a multimodal input message
multimodal_input = HumanMessage(content=[
    {"type": "text", "text": "What is shown in this image?"},
    {"type": "image_url", "image_url": base64_image_str}
])

# Invoke Gemini with image + prompt
response = vision_model.invoke([multimodal_input])
print("Gemini Response:\n", response.content)


Gemini Response:
 That's a Golden Retriever puppy lying in the grass.  It's a light cream or almost white color, and it's looking directly at the camera.  The background is blurred, focusing attention on the adorable puppy.
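
The cell above prints `response.content` directly, which works when it is a plain string. In LangChain, message content may also be a list of strings or dict parts, so a small normalizer can make downstream code more robust. This is a sketch under assumptions: `content_to_text` is a hypothetical helper, and it assumes text parts look like `{"type": "text", "text": ...}`.

```python
def content_to_text(content):
    """Flatten message content (str, or list of str/dict parts) into one string."""
    if isinstance(content, str):
        return content
    pieces = []
    for part in content:
        if isinstance(part, str):
            pieces.append(part)
        elif isinstance(part, dict) and part.get("type") == "text":
            pieces.append(part.get("text", ""))
        # Non-text parts (for example images) are skipped
    return "".join(pieces)


# Stand-in values so the sketch runs without calling the API;
# in the notebook, call content_to_text(response.content).
print(content_to_text("A puppy lying in the grass."))
print(content_to_text([{"type": "text", "text": "A puppy "}, "in the grass."]))
```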



#🧠 Example: Use Multiple Images in One Prompt (Base64)
We'll pass several base64-encoded images along with the text prompt in a single message.

#✅ Step-by-Step: Compare Two Images with Gemini
#🔧 Step 1: Encode Multiple Local Images to Base64

#Create Sample Images Programmatically (Best for Testing)

In [29]:
from PIL import Image, ImageDraw

# Create image1
img1 = Image.new("RGB", (200, 100), color="white")
draw1 = ImageDraw.Draw(img1)
draw1.text((10, 40), "This is image 1", fill="black")
img1.save("image1.jpg")

# Create image2
img2 = Image.new("RGB", (200, 100), color="white")
draw2 = ImageDraw.Draw(img2)
draw2.text((10, 40), "This is image 2", fill="black")
img2.save("image2.jpg")
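
The next cell calls `encode_image_to_base64`, which this notebook does not define. Below is a minimal sketch of what such a helper might look like, assuming it should return a full data URL (the form used for `image_url` in the single-image example) and that the MIME type can be guessed from the file extension. The implementation is an assumption, not the author's original.

```python
import base64
import mimetypes


def encode_image_to_base64(image_path):
    """Read an image file and return it as a base64 data URL.

    Assumed implementation: the MIME type is guessed from the file
    extension, falling back to image/jpeg when it is unknown.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = "image/jpeg"
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Reading the file bytes directly avoids re-encoding the image through Pillow, so the model receives exactly what is on disk.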


In [30]:
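# encode_image_to_base64(path) is expected to return a data URL string of the form
# "data:image/jpeg;base64,..." (the same encoding used in the single-image example)
img1_b64 = encode_image_to_base64("image1.jpg")
img2_b64 = encode_image_to_base64("image2.jpg")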


#🔧 Step 2: Initialize Gemini model

In [31]:
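import os
from getpass import getpass
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage

# Set your API key. Never hard-code a real key in a notebook; prompt for it instead.
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass("Google API key: ")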

# Load Gemini multimodal model
vision_model = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest")



#🔧 Step 3: Create Prompt with Multiple Images

In [32]:
# Construct the multimodal message with both images
multi_image_prompt = HumanMessage(content=[
    {"type": "text", "text": "Compare these two images and describe the key differences."},
    {"type": "image_url", "image_url": img1_b64},
    {"type": "image_url", "image_url": img2_b64}
])
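
The same pattern extends to any number of images. The following minimal sketch builds the `content` list from a prompt and a list of data URLs; `build_multimodal_content` is a hypothetical helper, and it produces only plain dicts in the shape used above.

```python
def build_multimodal_content(prompt, image_data_urls):
    """Return a content list: one text part followed by one image_url part per image."""
    content = [{"type": "text", "text": prompt}]
    for url in image_data_urls:
        if not url.startswith("data:image/"):
            raise ValueError("expected an image data URL")
        content.append({"type": "image_url", "image_url": url})
    return content


# Placeholder data URLs so the sketch runs on its own
demo_urls = ["data:image/jpeg;base64,AAAA", "data:image/jpeg;base64,BBBB"]
demo_content = build_multimodal_content("Compare these two images.", demo_urls)
print(len(demo_content))  # 3 parts: one text, two images
```

In the notebook, the result could be passed as `HumanMessage(content=build_multimodal_content(prompt, [img1_b64, img2_b64]))`.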


#🔧 Step 4: Get Gemini’s Multimodal Response

In [33]:
response = vision_model.invoke([multi_image_prompt])
print("Gemini Response:\n", response.content)


Gemini Response:
 The key difference between the two images is the number.  Image 1 is labeled "This is image 1," while Image 2 is labeled "This is image 2."  Beyond that, both images are identical; they contain only text on a white background.
