# L4: Multimodal Retrieval Augmented Generation (MM-RAG)

<p style="background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px"> ⏳ <b>Note <code>(Kernel Starting)</code>:</b> This notebook takes about 30 seconds to be ready to use. You may start and watch the video while you wait.</p>

>In this lesson you'll learn how to leverage Weaviate and Google Gemini to carry out a simple multimodal RAG workflow.

* In this classroom, the libraries have already been installed for you.
* If you would like to run this code on your own machine, you need to install the following:
```
    !pip install -U weaviate-client
    !pip install google-generativeai
```

In [1]:
import warnings
warnings.filterwarnings("ignore")

## Setup
### Load environment variables and API keys

In [2]:
# load the necessary API keys
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

EMBEDDING_API_KEY = os.getenv("EMBEDDING_API_KEY")
GOOGLE_API_KEY=os.getenv("GOOGLE_API_KEY")

> Note: To run this notebook locally, you need your own [GOOGLE_API_KEY](https://ai.google.dev/).
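
Before connecting, it can help to confirm that both keys were actually loaded. The following is a minimal sketch that is not part of the original notebook: the helper name `missing_keys` is hypothetical, and it only reports which variable names are unset, without printing any key values.

```python
import os

def missing_keys(names, env=None):
    """Return the names from `names` that are unset or empty in `env`."""
    # Default to the process environment; a dict can be passed in for testing.
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Hypothetical usage with the two variables this notebook reads.
missing = missing_keys(["EMBEDDING_API_KEY", "GOOGLE_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Reporting only the names keeps the secrets out of the notebook output.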

### Connect to Weaviate

In [3]:
import weaviate

client = weaviate.connect_to_embedded(
    version="1.24.21",
    environment_variables={
        "ENABLE_MODULES": "backup-filesystem,multi2vec-palm",
        "BACKUP_FILESYSTEM_PATH": "/home/jovyan/work/L4/backups", # where prevectorized data are
    },
    headers={
        "X-PALM-Api-Key": EMBEDDING_API_KEY,
    }
)

client.is_ready()

Started /home/jovyan/.cache/weaviate-embedded: process ID 240


{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2025-11-05T16:54:58Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2025-11-05T16:54:58Z"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2025-11-05T16:54:58Z"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50050","time":"2025-11-05T16:54:58Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://127.0.0.1:8079","time":"2025-11-05T16:54:58Z"}
            Please consider upgrading to the latest version. See https://weaviate.io/developers/weaviate/client-libraries/python for details.


True

{"action":"telemetry_push","level":"info","msg":"telemetry started","payload":"\u0026{MachineID:53c090dc-8416-4d0d-bcbc-99f22f32f371 Type:INIT Version:1.24.21 NumObjects:0 OS:linux Arch:amd64 UsedModules:[]}","time":"2025-11-05T16:54:59Z"}


### Restore 13k+ prevectorized resources

In [4]:
client.collections.delete("Resources")

client.backup.restore(
    backup_id="resources-img-and-vid",
    include_collections="Resources", # the collection to restore the dataset into
    backend="filesystem"
)

# It can take a few seconds for the "Resources" collection to be ready.
# We add 5 seconds of sleep to make sure it is ready for the next cells to use.
import time
time.sleep(5)
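
A fixed sleep works in the classroom, but on a slower machine the collection may need longer. One alternative is to poll until a readiness check passes. The sketch below is an assumption-laden illustration, not the notebook's method: `wait_until` is a hypothetical helper, and the stand-in predicate only simulates readiness so the block runs on its own.

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Stand-in predicate so the sketch is self-contained: ready on the third call.
# In the notebook, an assumed equivalent check would be:
#   lambda: client.collections.exists("Resources")
calls = {"n": 0}

def ready_on_third_call():
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_until(ready_on_third_call, timeout=5.0, interval=0.01)
print(ready, calls["n"])
```

The helper returns `False` instead of raising on timeout, so the caller decides how to handle a collection that never becomes ready.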

{"action":"try_restore","backend":"filesystem","backup_id":"resources-img-and-vid","level":"info","msg":"","time":"2025-11-05T16:56:10Z","took":1282117}
{"action":"hnsw_prefill_cache_async","level":"info","msg":"not waiting for vector cache prefill, running in background","time":"2025-11-05T16:56:11Z","wait_for_cache_prefill":false}
{"level":"info","msg":"Completed loading shard resources_YR4XdMe3ODX3 in 4.969316ms","time":"2025-11-05T16:56:11Z"}
{"action":"restore","backup_id":"resources-img-and-vid","class":"Resources","level":"info","msg":"successfully restored","time":"2025-11-05T16:56:11Z"}
{"action":"restore","backup_id":"resources-img-and-vid","level":"info","msg":"backup restored successfully","time":"2025-11-05T16:56:11Z"}
{"action":"hnsw_vector_cache_prefill","count":16062,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2025-11-05T16:56:11Z","took":150554948}
{"action":"restore","backup_id":"resources-img-and-vid","level":"info","

### Preview data count

In [5]:
from weaviate.classes.aggregate import GroupByAggregate

resources = client.collections.get("Resources")

response = resources.aggregate.over_all(
    group_by=GroupByAggregate(prop="mediaType") # count all objects and group them by media type
)

# print each media type and its object count
for group in response.groups:
    print(f"{group.grouped_by.value} count: {group.total_count}")

image count: 13394
video count: 200


## Multimodal RAG

### Step 1 - Retrieve content from the database with a query

In [7]:
# first step of running full multimodal RAG
from IPython.display import Image
from weaviate.classes.query import Filter

# given a query we want to retrieve an image
def retrieve_image(query):
    resources = client.collections.get("Resources")
# ============
    response = resources.query.near_text(
        query=query,
        # restrict results to images, since the result is passed to the vision model later
        filters=Filter.by_property("mediaType").equal("image"),
        # only the path to the object is needed
        return_properties=["path"],
        # return just one object
        limit=1,
    )
# ============
    # grab the first object
    result = response.objects[0].properties
    return result["path"] # return the path of the image that matches our query
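
To make the retrieval step more concrete, here is a toy sketch of what a filtered vector search does conceptually: drop the objects that fail the filter, then rank the rest by similarity to the query vector. This is not Weaviate's implementation (the startup logs above show it uses an HNSW index, and the query text is embedded by the vectorizer module). The function names, the two-dimensional vectors, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def toy_near_vector(objects, query_vector, media_type, limit=1):
    """Keep objects of one media type, then rank them by similarity to the query."""
    candidates = [o for o in objects if o["mediaType"] == media_type]
    ranked = sorted(
        candidates,
        key=lambda o: cosine_similarity(o["vector"], query_vector),
        reverse=True,
    )
    return ranked[:limit]

# Made-up objects and vectors, purely for illustration.
toy_objects = [
    {"path": "a.jpg", "mediaType": "image", "vector": [1.0, 0.0]},
    {"path": "b.jpg", "mediaType": "image", "vector": [0.6, 0.8]},
    {"path": "c.mp4", "mediaType": "video", "vector": [0.0, 1.0]},
]

best = toy_near_vector(toy_objects, query_vector=[0.0, 1.0], media_type="image")
print(best[0]["path"])
```

Note that the video is the closest object to the query vector, but the filter excludes it, which mirrors why `retrieve_image` only ever returns images.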

### Run image retrieval

In [None]:
# Try different queries to retrieve an image
img_path = retrieve_image("fishing with my buddies")
display(Image(img_path))

<p style="background-color:#fff6ff; padding:15px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px"> 💻 &nbsp; <b>Access Files and Helper Functions:</b> To access the files for this notebook, 1) click on the <em>"File"</em> option on the top menu of the notebook and then 2) click on <em>"Open"</em>. For more help, please see the <em>"Appendix - Tips and Help"</em> Lesson.</p>


### Step 2 - Generate a description of the image

In [None]:
# set API key for generative model

import google.generativeai as genai
from google.api_core.client_options import ClientOptions

# Set the Vision model key
genai.configure(
        api_key=GOOGLE_API_KEY,
        transport="rest",
        client_options=ClientOptions(
            api_endpoint=os.getenv("GOOGLE_API_BASE"),
        ),
)

>**Note:** In late 2024, the `'gemini-pro-vision'` model (originally shown in the video's notebook) was replaced with `'gemini-1.5-flash'`.
As part of this change, the parameter `stream=True` was changed to `stream=False`.

In [9]:
# Helper function
import textwrap
import PIL.Image
from IPython.display import Markdown, Image

# to convert output to markdown
def to_markdown(text):
    text = text.replace("•", "  *")
    return Markdown(textwrap.indent(text, "> ", predicate=lambda _: True))

# given an image path and a prompt, return a nice description of the image as Markdown
def call_LMM(image_path: str, prompt: str) -> Markdown:
    img = PIL.Image.open(image_path)

    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content([prompt, img], stream=False)
    response.resolve()

    return to_markdown(response.text)    
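
The `predicate=lambda _: True` argument in `to_markdown` matters more than it looks. By default, `textwrap.indent` skips whitespace-only lines, so a blank line between paragraphs would stay unprefixed and split the Markdown blockquote in two. The short standard-library sketch below compares the two behaviors; the sample text is made up.

```python
import textwrap

text = "First paragraph.\n\nSecond paragraph."

# Default predicate: whitespace-only lines are left without a prefix.
default_quote = textwrap.indent(text, "> ")

# predicate=lambda _: True: every line, including the blank one, gets the prefix.
full_quote = textwrap.indent(text, "> ", predicate=lambda _: True)

print(repr(default_quote))
print(repr(full_quote))
```

With the always-true predicate, the blank line becomes `> `, so the whole model response renders as one continuous blockquote.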

<p style="background-color:#cceecc; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px"> 🚨 <b>Different Run Results:</b> The output generated by AI models can vary with each execution due to their probabilistic nature. Don't be surprised if your results differ from those shown in the video.</p>

### Run vision request

In [10]:
# pass the image path from Step 1 together with a description prompt
call_LMM(img_path, "Please describe this image in detail.")

NameError: name 'img_path' is not defined



> Note: Please be aware that the output from the previous cell may differ from what is shown in the video. This variation is normal and should not cause concern.

## All together

In [None]:
# combine all together
# create a function where the first step will be to call the retrieve image function 

def mm_rag(query):
    # Step 1 - retrieve an image - Weaviate
    SOURCE_IMAGE = retrieve_image(query)
    # the retrieved image path is stored in SOURCE_IMAGE
    display(Image(SOURCE_IMAGE))
#===========

    # Step 2 - generate a description - Gemini
    # call the LMM with the source image from the previous step and the prompt;
    # it should return the description
    description = call_LMM(SOURCE_IMAGE, "Please describe this image in detail.")
    return description
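
The same retrieve-then-generate shape can be exercised without a database or an API key by passing the two steps in as functions. This is a hedged sketch rather than the notebook's implementation: `mm_rag_pipeline`, `fake_retrieve`, and `fake_describe` are hypothetical names, and the stubs return canned values.

```python
from typing import Callable

def mm_rag_pipeline(
    query: str,
    retrieve: Callable[[str], str],
    describe: Callable[[str, str], str],
    prompt: str = "Please describe this image in detail.",
) -> dict:
    """Run retrieval, then generation, and return both results together."""
    image_path = retrieve(query)                 # Step 1: query -> image path
    description = describe(image_path, prompt)   # Step 2: image + prompt -> text
    return {"query": query, "image_path": image_path, "description": description}

# Stub steps with canned outputs, standing in for the real retrieval and model calls.
def fake_retrieve(query: str) -> str:
    return "images/example.jpg"

def fake_describe(image_path: str, prompt: str) -> str:
    return f"[description of {image_path}]"

result = mm_rag_pipeline("paragliding through the mountains", fake_retrieve, fake_describe)
print(result["image_path"])
print(result["description"])
```

In the notebook, `retrieve_image` and `call_LMM` could be passed in place of the stubs, since they take the same arguments.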

In [None]:
# Call mm_rag function
mm_rag("paragliding through the mountains")

In [None]:
# Remember to close the Weaviate instance
client.close()

### Try it yourself! 

Rerun the cells above with a different query to retrieve another image from the database and generate a description for it!