# Multi-Modal extraction with Gemini-Flash-1.5 📸 & Langchain ⛓️‍💥
In this notebook we demo how to use the Gemini-Flash-1.5 model to extract structured information from images. The model is multi-modal, so it accepts both text and images as input.

In this example notebook we will touch on the following topics:
1. Extracting image metadata using LangChain and Gemini-Flash-1.5
2. Running the extraction process in parallel across all images in a dataset
3. Adding variation to the generated text without using the model temperature parameter

In [9]:
# Images to extract data from
fruits = ['https://storage.googleapis.com/vectrix-public/fruit/apple.jpeg',
          'https://storage.googleapis.com/vectrix-public/fruit/banana.jpeg',
          'https://storage.googleapis.com/vectrix-public/fruit/kiwi.jpeg',
          'https://storage.googleapis.com/vectrix-public/fruit/peach.jpeg',
          'https://storage.googleapis.com/vectrix-public/fruit/plum.jpeg']

## Passing an image directly to the model
[As described in the LangChain documentation](https://python.langchain.com/v0.2/docs/how_to/multimodal_inputs/), we can use the code below to pass an image directly to the model. This sends our multi-modal input along with our chat contents to the model.

The same message format should also work with other multi-modal chat models in LangChain, such as GPT-4o.


In [10]:
from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI
import base64, httpx

model = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
image_data = base64.b64encode(httpx.get(fruits[0]).content).decode("utf-8")
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the fruit in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
)
response = model.invoke([message])
print(response.content)


The fruit is a red and yellow apple. It has a dimple in the middle and is slightly bruised.


In [13]:
len(image_data)

3972324
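
The length printed above counts base64 characters, not bytes of the image file. Base64 encodes every 3 bytes as 4 characters, so the encoded payload is roughly a third larger than the raw image. The sketch below wraps the encoding step in a reusable helper and recovers the raw size from the encoded string. The names `to_data_url` and `decoded_size` are hypothetical and not part of LangChain, and the sample bytes are a stand-in so that the example runs without a network call.

```python
import base64


def to_data_url(raw: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(raw).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def decoded_size(encoded: str) -> int:
    """Return the number of raw bytes represented by a padded base64 string."""
    padding = len(encoded) - len(encoded.rstrip("="))
    return len(encoded) * 3 // 4 - padding


# Stand-in bytes (100 in total) instead of a downloaded image.
sample = b"\xff\xd8\xff\xe0" + b"\x00" * 96
url = to_data_url(sample)
payload = url.split(",", 1)[1]
print(len(payload), decoded_size(payload))
```

Applied to the 3,972,324-character string above, the same arithmetic gives just under 3 MB of image data per request, which is worth keeping in mind before sending many images at once.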

## Extracting structured data from images
The next step is to extract structured data from the image. We can achieve this by combining a [Pydantic parser](https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/types/pydantic/) with a multi-modal message. First we define a Pydantic data model. The parser built from it supplies the format instructions for the prompt and then validates the model's response against that data model.
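
Conceptually, the parsing step turns the model's text response into a typed object and fails loudly when a field is missing. The sketch below is a simplified stand-in that uses only the standard library. It is not how `PydanticOutputParser` is implemented, and `FruitRecord` and `parse_fruit` are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class FruitRecord:
    name: str
    color: str
    taste: str
    marketing_description: str


def parse_fruit(model_text: str) -> FruitRecord:
    """Parse a JSON object from model text and check that every field is present."""
    # Tolerate prose around the JSON by slicing from the first "{" to the last "}".
    start, end = model_text.find("{"), model_text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    data = json.loads(model_text[start : end + 1])
    expected = [f.name for f in fields(FruitRecord)]
    missing = [key for key in expected if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return FruitRecord(**{key: str(data[key]) for key in expected})


raw = (
    'Here you go: {"name": "Kiwi", "color": "Green", '
    '"taste": "Sweet and tangy", "marketing_description": "A tangy treat."}'
)
print(parse_fruit(raw))
```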

In [6]:
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
import json

class Fruit(BaseModel):
    name: str = Field(description="The name of the fruit shown in the image")
    color: str = Field(description="The color of the fruit shown in the image")
    taste: str = Field(description="The taste of the fruit shown in the image")
    marketing_description: str = Field(description="A marketing description of the fruit shown in the image")

    @classmethod
    def model_json_schema(cls):
        return json.loads(cls.schema_json())

parser = PydanticOutputParser(pydantic_object=Fruit)



prompt = ChatPromptTemplate.from_messages([
    (
        "system","Return the requested response object in {language}.\n'{format_instructions}'\n"
    ),
    (
        "human", [
            {
                "type": "image_url",
                "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
            },
        ],
    )
])

chain = prompt | model | parser

# Retrieve the encoded image data
image_data = base64.b64encode(httpx.get(fruits[3]).content).decode("utf-8")


# Run the chain and print the result
result = chain.invoke({"language": "English",
                       "format_instructions": parser.get_format_instructions(),
                       "image_data": image_data})
print(result.json(indent=2))

{
  "name": "Apricot",
  "color": "Orange",
  "taste": "Sweet",
  "marketing_description": "A juicy and flavorful apricot, perfect for a summer snack or dessert."
}


## Processing all images in parallel and translating the description in two languages
This notebook only has 5 images, but on a larger dataset a regular for loop would be very slow. Instead, we use LangChain's built-in batching to process all images in parallel.

### Run the requests from above in parallel
By using chain.batch we can run the extraction for all images in parallel, which is much faster than sending the requests one after another.

Note that chain.batch expects a list of inputs, so we build all_images as a list of dictionaries, one per image, each with the same keys we passed to chain.invoke above.
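
To illustrate why bounded concurrency helps with network-bound calls, here is a standard-library sketch of the same pattern: a list of input dictionaries is mapped over a thread pool with a fixed number of workers, and results come back in input order. It is a conceptual stand-in and not LangChain's code. `fake_extract` is a hypothetical replacement for a real model call and simply sleeps to mimic latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_extract(inputs: dict) -> dict:
    """Hypothetical stand-in for a model call: sleeps to mimic network latency."""
    time.sleep(0.2)
    return {"language": inputs["language"], "image_chars": len(inputs["image_data"])}


def run_batch(items: list[dict], max_concurrency: int = 5) -> list[dict]:
    """Run fake_extract over all inputs with a bounded thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(fake_extract, items))


items = [{"language": "English", "image_data": "x" * n} for n in (10, 20, 30, 40, 50)]

start = time.perf_counter()
results = run_batch(items)
elapsed = time.perf_counter() - start

print(results)
print(f"{elapsed:.2f}s for {len(items)} simulated requests")
```

With five workers the five simulated requests overlap, so the total time stays close to a single request instead of the sum of all five. A lower concurrency limit trades speed for fewer simultaneous requests, which can matter when the API enforces rate limits.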

In [7]:
# Now run this chain in parallel for all the images
all_images = [{"language":"English", 
  "format_instructions": parser.get_format_instructions(),
  "image_data":  base64.b64encode(httpx.get(url).content).decode("utf-8")} for url in fruits]

results = chain.batch(all_images, config={"max_concurrency": 5})

for result in results:
    print(result.json(indent=2))

{
  "name": "Apple",
  "color": "Red and green",
  "taste": "Sweet and tart",
  "marketing_description": "A crisp and juicy apple with a perfect balance of sweet and tart flavors. Enjoy it fresh, in salads, or baked into delicious desserts."
}
{
  "name": "Banana",
  "color": "Yellow",
  "taste": "Sweet",
  "marketing_description": "A delicious and nutritious fruit, perfect for a healthy snack or a tasty addition to your breakfast."
}
{
  "name": "Kiwi",
  "color": "Green",
  "taste": "Sweet and tangy",
  "marketing_description": "The kiwi is a delicious and healthy fruit that is packed with nutrients. It is a good source of vitamin C, potassium, and fiber. The kiwi has a unique flavor that is both sweet and tangy. It is a versatile fruit that can be enjoyed in many different ways, such as in smoothies, salads, or as a snack."
}
{
  "name": "Apricot",
  "color": "Orange",
  "taste": "Sweet",
  "marketing_description": "A juicy and flavorful apricot, perfect for a summer snack or desser

## Create variations on the output data
As seen in the examples above, most of the marketing descriptions start with the letter "A", and the last two start almost identically:
> A juicy and flavorful apricot...

> A juicy and flavorful plum...

Near-identical output is not great for SEO purposes. When annotating images, adjusting the temperature of the model isn't the best way to get varied results.

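#### Instead, we write helper functions that force the model to be more creative. We ask it to start the description with a randomly chosen letter. A second helper picks a random word count between 30 and 45 that can be used the same way, although the prompt below only uses the starting letter.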
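
One way to combine the starting letter and the word count into a single instruction is sketched below. The letter list and the 30 to 45 range mirror the helper functions in the next cell. The function name `build_style_instruction`, the wording of the word-count sentence, and the seeded generator are assumptions made for illustration.

```python
import random

LETTERS = ["A", "B", "C", "D", "M", "P", "R", "S", "T"]


def build_style_instruction(rng: random.Random) -> dict:
    """Pick a random starting letter and word count and render them as a prompt sentence."""
    letter = rng.choice(LETTERS)
    word_count = rng.randint(30, 45)  # randint is inclusive on both ends
    instruction = (
        f"Make sure the marketing description starts with the letter '{letter}' "
        f"and is about {word_count} words long."
    )
    return {"starting_letter": letter, "word_count": word_count, "instruction": instruction}


rng = random.Random(42)  # seeded only so that the example is repeatable
for _ in range(3):
    print(build_style_instruction(rng)["instruction"])
```

The returned dictionary keeps the raw values next to the rendered sentence, so either form can be fed into a prompt template as a variable.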

In [8]:
import random

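def generate_random_letter():
    # Pick the letter the marketing description should start with
    letters = ['A', 'B', 'C', 'D', 'M', 'P', 'R', 'S', 'T']
    return random.choice(letters)

def generate_random_number():
    # Pick a target word count between 30 and 45 (inclusive)
    return random.randint(30, 45)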



# A new prompt template that includes the marketing description starting with a letter given in the variable
prompt = ChatPromptTemplate.from_messages([
    (
        "system","Return the requested response object in {language}. Make sure the marketing description starts with the letter '{starting_letter}'\n'{format_instructions}'\n"
    ),
    (
        "human", [
            {
                "type": "image_url",
                "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
            },
        ],
    )
])

# Now run this chain in parallel for all the images
all_images = [{"language":"English", 
  "format_instructions": parser.get_format_instructions(),
  "image_data":  base64.b64encode(httpx.get(url).content).decode("utf-8"),
  "starting_letter": generate_random_letter()} for url in fruits] # Make sure you add the starting letter as a variable for the call

chain = prompt | model | parser



results = chain.batch(all_images, config={"max_concurrency": 5})


for result in results:
    print(result.json(indent=2))

{
  "name": "Apple",
  "color": "Red and Green",
  "taste": "Sweet and Tart",
  "marketing_description": "A crisp and juicy apple with a perfect balance of sweet and tart flavors. Enjoy it fresh, baked into a pie, or in a delicious salad."
}
{
  "name": "Banana",
  "color": "Yellow",
  "taste": "Sweet",
  "marketing_description": "Deliciously sweet and creamy, this banana is perfect for a quick snack or a healthy addition to your favorite smoothie."
}
{
  "name": "Kiwi",
  "color": "Green",
  "taste": "Sweet and tangy",
  "marketing_description": "The taste of sunshine! This kiwi is bursting with juicy, tangy flavor that's sure to brighten your day. Perfect for snacking, smoothies, or adding a touch of sweetness to your favorite dishes."
}
{
  "name": "Apricot",
  "color": "Orange",
  "taste": "Sweet",
  "marketing_description": "The apricot is a stone fruit with a sweet and juicy flavor. It's a popular choice for desserts, jams, and preserves. Apricots are also a good source of vitami