In [2]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Multimodal Retrieval Augmented Generation (RAG) with Gemini, Vertex AI Vector Search, and LangChain

<table align="left">
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fretrieval-augmented-generation%2Fmultimodal_rag_langchain.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
    <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb">
      <img width="32px" src="https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

| | |
|-|-|
|Author(s) | [Holt Skinner](https://github.com/holtskinner) |

## Overview

Retrieval augmented generation (RAG) has become a popular paradigm for giving LLMs access to external data and for grounding their responses to mitigate hallucinations.

In this notebook, you will learn how to perform multimodal RAG by running Q&A over a financial document that contains both text and images.

### Gemini

Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases. The Gemini API gives you access to Gemini models such as Gemini 1.0 Pro Vision and Gemini 1.0 Pro; this notebook uses `gemini-1.5-flash` (see "Define model information").

### Comparing text-based and multimodal RAG

Multimodal RAG offers several advantages over text-based RAG:

1. **Enhanced knowledge access:** Multimodal RAG can access and process both textual and visual information, providing a richer and more comprehensive knowledge base for the LLM.
2. **Improved reasoning capabilities:** By incorporating visual cues, multimodal RAG can make better informed inferences across different types of data modalities.

This notebook shows you how to use RAG with the Vertex AI Gemini API and [multimodal embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/multimodal-embeddings) to build a document search engine.

Through hands-on examples, you will discover how to construct a multimedia-rich metadata repository of your document sources, enabling search, comparison, and reasoning across diverse information streams.

### Objectives

This notebook provides a guide to building a document search engine using multimodal retrieval augmented generation (RAG), step by step:

1. Extract and store metadata of documents containing both text and images, and generate embeddings for the documents
2. Search the metadata with text queries to find similar text or images
3. Search the metadata with image queries to find similar images
4. Using a text query as input, search for contextual answers using both text and images
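
The search steps in the objectives above come down to comparing a query embedding against stored embeddings. A minimal sketch of that comparison, assuming tiny hand-made 3-dimensional vectors in place of real model embeddings (the `corpus` entries and the `cosine_similarity` and `top_k` helpers are hypothetical and for illustration only):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k stored vectors most similar to the query."""
    ranked = sorted(
        corpus,
        key=lambda doc_id: cosine_similarity(query, corpus[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings"; a real system would get these from an embedding model.
corpus = {
    "text-chunk": [0.9, 0.1, 0.0],
    "table": [0.1, 0.9, 0.0],
    "chart-image": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.2, 0.0], corpus, k=2))
```

A managed service such as Vector Search answers the same question approximately and at much larger scale; the ranking idea is the same.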

### Costs

This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Getting Started

### Install Vertex AI SDK for Python and other dependencies

In [2]:
%pip install -U -q google-cloud-aiplatform langchain-core langchain-google-vertexai langchain-text-splitters langchain-community "unstructured[all-docs]" pypdf pydantic lxml pillow matplotlib opencv-python tiktoken

In [5]:
!pip install --upgrade google-cloud-storage
!pip install --upgrade langchain langchain-core langchain-google-vertexai unstructured
!apt-get install -y poppler-utils
!apt install -y tesseract-ocr
!pip install pytesseract
!pip install --upgrade nltk
!pip install chromadb
!pip install --upgrade pydantic

Collecting google-cloud-storage
  Using cached google_cloud_storage-2.18.2-py2.py3-none-any.whl.metadata (9.1 kB)
Using cached google_cloud_storage-2.18.2-py2.py3-none-any.whl (130 kB)
Installing collected packages: google-cloud-storage
  Attempting uninstall: google-cloud-storage
    Found existing installation: google-cloud-storage 1.44.0
    Uninstalling google-cloud-storage-1.44.0:
      Successfully uninstalled google-cloud-storage-1.44.0
Successfully installed google-cloud-storage-2.18.2


Collecting langchain
  Using cached langchain-0.3.7-py3-none-any.whl.metadata (7.1 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain)
  Downloading langsmith-0.1.139-py3-none-any.whl.metadata (13 kB)
Using cached langchain-0.3.7-py3-none-any.whl (1.0 MB)
Downloading langsmith-0.1.139-py3-none-any.whl (302 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 302.2/302.2 kB 4.8 MB/s eta 0:00:00
Installing collected packages: langsmith, langchain
  Attempting uninstall: langsmith
    Found existing installation: langsmith 0.0.92
    Uninstalling langsmith-0.0.92:
      Successfully uninstalled langsmith-0.0.92
  Attempting uninstall: langchain
    Found existing installation: langchain 0.0.300
    Uninstalling langchain-0.0.300:
      Successfully uninstalled langchain-0.0.300
Successfully installed langchain-0.3.7 langsmith-0.1.139
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
poppler-utils 

### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [6]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).

In [1]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Define Google Cloud project information

In [2]:
PROJECT_ID = "gen-lang-client-0784670847"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

# For Vector Search Staging
GCS_BUCKET = "multimodalrag_lucid"  # @param {type:"string"}
GCS_BUCKET_URI = f"gs://{GCS_BUCKET}"

### Initialize the Vertex AI SDK

In [3]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION, staging_bucket=GCS_BUCKET_URI)

### Import libraries

In [4]:
import base64
import os
import re
import uuid

from IPython.display import Image, Markdown, display
from langchain.prompts import PromptTemplate
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_core.documents import Document
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_google_vertexai import (
    ChatVertexAI,
    VectorSearchVectorStore,
    VertexAI,
    VertexAIEmbeddings,
)
from langchain_text_splitters import CharacterTextSplitter
from unstructured.partition.pdf import partition_pdf

# from langchain_community.vectorstores import Chroma  # Optional

### Define model information

- [Vertex AI - Model Information](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models)

In [5]:
MODEL_NAME = "gemini-1.5-flash"
GEMINI_OUTPUT_TOKEN_LIMIT = 8192

EMBEDDING_MODEL_NAME = "text-embedding-004"
EMBEDDING_TOKEN_LIMIT = 2048

TOKEN_LIMIT = min(GEMINI_OUTPUT_TOKEN_LIMIT, EMBEDDING_TOKEN_LIMIT)

## Data Loading

### Get documents and images from GCS

In [6]:
# Download documents and images used in this notebook
!gsutil -m rsync -r gs://multimodalrag_lucid/lucid_data .
print("Download completed")


both the source and destination. Your crcmod installation isn't using the
module's C extension, so checksumming will run very slowly. If this is your
first rsync since updating gsutil, this rsync can take significantly longer than
usual. For help installing the extension, please see "gsutil help crcmod".

Building synchronization state...
Starting synchronization...
Download completed


## Partition PDF tables, text, and images

### The data

The source data that you will use in this notebook is a modified version of [Google-10K](https://abc.xyz/assets/investor/static/pdf/20220202_alphabet_10K.pdf) which provides a comprehensive overview of the company's financial performance, business operations, management, and risk factors. As the original document is rather large, you will be using [a modified version with only 14 pages](https://storage.googleapis.com/github-repo/rag/multimodal_rag_langchain/google-10k-sample-14pages.pdf) instead. Although it's truncated, the sample document still contains text along with images such as tables, charts, and graphs.

In [7]:
pdf_folder_path = "/content/data/" if "google.colab" in sys.modules else "data/"
# Full paths of all PDF files in the folder (previously a single document:
# "About the Handbook _ The GitLab Handbook.pdf")
pdf_file_paths = [
    os.path.join(pdf_folder_path, f)
    for f in sorted(os.listdir(pdf_folder_path))
    if f.endswith(".pdf")
]

# Extract tables and chunk text from each PDF file
# (image extraction is disabled via extract_images_in_pdf=False).
# partition_pdf takes a single file path, so the files are processed one at a time.
raw_pdf_elements = []
for pdf_file_path in pdf_file_paths:
    raw_pdf_elements.extend(
        partition_pdf(
            filename=pdf_file_path,
            extract_images_in_pdf=False,
            infer_table_structure=True,
            chunking_strategy="by_title",
            max_characters=4000,
            new_after_n_chars=3800,
            combine_text_under_n_chars=2000,
            image_output_dir_path=pdf_folder_path,
        )
    )

# Categorize extracted elements from a PDF into tables and texts.
tables = []
texts = []
for element in raw_pdf_elements:
    if "unstructured.documents.elements.Table" in str(type(element)):
        tables.append(str(element))
    elif "unstructured.documents.elements.CompositeElement" in str(type(element)):
        texts.append(str(element))

# Optional: re-split the joined text into chunks of at most 10,000 tokens
# (the `texts_4k_token` name is kept because later cells refer to it)
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=10000, chunk_overlap=0
)
joined_texts = " ".join(texts)
texts_4k_token = text_splitter.split_text(joined_texts)
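
The cell above relies on `partition_pdf` and `CharacterTextSplitter` for chunking. To illustrate the underlying idea only, here is a minimal sketch of fixed-size chunking with overlap in plain Python. `chunk_text` is a hypothetical helper that counts characters, whereas the notebook's splitter counts tiktoken tokens and splits on separators.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start : start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
print(chunk_text(sample, chunk_size=4))
print(chunk_text(sample, chunk_size=4, chunk_overlap=2))
```

Overlap trades extra storage for a lower chance that a sentence relevant to a query is cut in half at a chunk boundary; the notebook sets `chunk_overlap=0`.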

In [8]:
# Generate summaries of text elements


def generate_text_summaries(
    texts: list[str], tables: list[str], summarize_texts: bool = False
) -> tuple[list, list]:
    """
    Summarize text elements
    texts: List of str
    tables: List of str
    summarize_texts: Bool to summarize texts
    """

    # Prompt
    prompt_text = """You are an assistant tasked with summarizing tables and text for retrieval. \
    These summaries will be embedded and used to retrieve the raw text or table elements. \
    Give a concise summary of the table or text that is well optimized for retrieval. Table or text: {element} """
    prompt = PromptTemplate.from_template(prompt_text)
    empty_response = RunnableLambda(
        lambda x: AIMessage(content="Error processing document")
    )
    # Text summary chain
    model = VertexAI(
        temperature=0, model_name=MODEL_NAME, max_output_tokens=TOKEN_LIMIT
    ).with_fallbacks([empty_response])
    summarize_chain = {"element": lambda x: x} | prompt | model | StrOutputParser()

    # Initialize empty summaries
    text_summaries = []
    table_summaries = []

    # Apply to text if texts are provided and summarization is requested
    if texts:
        if summarize_texts:
            text_summaries = summarize_chain.batch(texts, {"max_concurrency": 1})
        else:
            text_summaries = texts

    # Apply to tables if tables are provided
    if tables:
        table_summaries = summarize_chain.batch(tables, {"max_concurrency": 1})

    return text_summaries, table_summaries


# Get text, table summaries
text_summaries, table_summaries = generate_text_summaries(
    texts_4k_token, tables, summarize_texts=True
)
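
The summarization model above is wrapped with `.with_fallbacks(...)` so that a failed call yields a placeholder instead of aborting the whole batch. A minimal sketch of the same pattern in plain Python; `with_fallback`, `flaky_summarize`, and the sample inputs are hypothetical and not part of LangChain:

```python
from typing import Callable


def with_fallback(
    primary: Callable[[str], str], fallback: Callable[[str], str]
) -> Callable[[str], str]:
    """Return a function that tries primary and uses fallback on any error."""

    def wrapped(item: str) -> str:
        try:
            return primary(item)
        except Exception:
            return fallback(item)

    return wrapped


def flaky_summarize(text: str) -> str:
    """Stand-in for a model call that fails on empty input."""
    if not text:
        raise ValueError("nothing to summarize")
    return text[:20]


safe_summarize = with_fallback(flaky_summarize, lambda _: "Error processing document")
print([safe_summarize(t) for t in ["A long passage about the handbook", ""]])
```

One consequence worth keeping in mind: a placeholder summary still gets embedded and indexed, so it may be worth filtering such entries out before loading the vector store.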

In [9]:
def encode_image(image_path: str) -> str:
    """Getting the base64 string"""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def image_summarize(model: ChatVertexAI, base64_image: str, prompt: str) -> str:
    """Make image summary"""
    msg = model.invoke(
        [
            HumanMessage(
                content=[
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                    },
                ]
            )
        ]
    )
    return msg.content


def generate_img_summaries(path: str) -> tuple[list[str], list[str]]:
    """
    Generate summaries and base64 encoded strings for images
    path: Path to the directory containing the .png image files to summarize
    """

    # Store base64 encoded images
    img_base64_list = []

    # Store image summaries
    image_summaries = []

    # Prompt
    prompt = """You are an assistant tasked with summarizing images for retrieval. \
    These summaries will be embedded and used to retrieve the raw image. \
    Give a concise summary of the image that is well optimized for retrieval.
    If it's a table, extract all elements of the table.
    If it's a graph, explain the findings in the graph.
    Do not include any numbers that are not mentioned in the image.
    """

    model = ChatVertexAI(model_name=MODEL_NAME, max_output_tokens=TOKEN_LIMIT)

    # Apply to images
    for img_file in sorted(os.listdir(path)):
        if img_file.endswith(".png"):
            base64_image = encode_image(os.path.join(path, img_file))
            img_base64_list.append(base64_image)
            image_summaries.append(image_summarize(model, base64_image, prompt))

    return img_base64_list, image_summaries


# Image summaries
img_base64_list, image_summaries = generate_img_summaries(".")
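
The image helpers above pass images to Gemini as base64 data URLs. A minimal sketch of that encoding round trip, using the 8-byte PNG file signature as stand-in image bytes (the `to_data_url` helper and the stand-in bytes are assumptions for illustration, not part of the notebook):

```python
import base64

# Stand-in for real image bytes: the 8-byte PNG file signature.
fake_png_bytes = b"\x89PNG\r\n\x1a\n"


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


data_url = to_data_url(fake_png_bytes)
print(data_url)

# Decoding the payload recovers the original bytes exactly.
payload = data_url.split(",", 1)[1]
assert base64.b64decode(payload) == fake_png_bytes
```

Because base64 is lossless, the same string can be stored in the document store and later decoded for display or sent back to the model.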

## Create & Deploy Vertex AI Vector Search Index & Endpoint

Skip this step if you already have Vector Search set up.

- https://console.cloud.google.com/vertex-ai/matching-engine/indexes

- Create [`MatchingEngineIndex`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndex)
  - https://cloud.google.com/vertex-ai/docs/vector-search/create-manage-index

In [10]:
# https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings
DIMENSIONS = 768  # Dimensions output by text-embedding-004 (EMBEDDING_MODEL_NAME)

index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="mm_rag_langchain_index",
    dimensions=DIMENSIONS,
    approximate_neighbors_count=150,
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
    description="Multimodal RAG LangChain Index",
    index_update_method="STREAM_UPDATE",
)

INFO:google.cloud.aiplatform.matching_engine.matching_engine_index:Creating MatchingEngineIndex
INFO:google.cloud.aiplatform.matching_engine.matching_engine_index:Create MatchingEngineIndex backing LRO: projects/61515807285/locations/us-central1/indexes/8034399744996409344/operations/3415520343542988800
INFO:google.cloud.aiplatform.matching_engine.matching_engine_index:MatchingEngineIndex created. Resource name: projects/61515807285/locations/us-central1/indexes/8034399744996409344
INFO:google.cloud.aiplatform.matching_engine.matching_engine_index:To use this MatchingEngineIndex in another session:
INFO:google.cloud.aiplatform.matching_engine.matching_engine_index:index = aiplatform.MatchingEngineIndex('projects/61515807285/locations/us-central1/indexes/8034399744996409344')
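
To get a feel for the index parameters above, the following back-of-the-envelope sketch estimates how many leaf nodes a query touches. It assumes, as a simplification, that every leaf holds exactly `leaf_node_embedding_count` embeddings and that `leaf_nodes_to_search_percent` is applied to the total leaf count; `estimate_leaves_searched` is a hypothetical helper, not part of the Vertex AI SDK, and the real index layout may differ.

```python
import math


def estimate_leaves_searched(
    num_datapoints: int,
    leaf_node_embedding_count: int = 500,
    leaf_nodes_to_search_percent: int = 7,
) -> tuple[int, int]:
    """Return (total leaf nodes, leaf nodes searched per query).

    Simplifying assumption: every leaf is filled to capacity.
    """
    total_leaves = max(1, math.ceil(num_datapoints / leaf_node_embedding_count))
    searched = max(1, math.ceil(total_leaves * leaf_nodes_to_search_percent / 100))
    return total_leaves, searched


print(estimate_leaves_searched(1_000_000))
```

Under these assumptions, raising the search percentage improves recall at the cost of latency, while a handful of datapoints (as in this notebook) fits in a single leaf.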


- Create [`MatchingEngineIndexEndpoint`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndexEndpoint)
  - https://cloud.google.com/vertex-ai/docs/vector-search/deploy-index-public

In [11]:
DEPLOYED_INDEX_ID = "mm_rag_langchain_index_endpoint"

# index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
#     display_name=DEPLOYED_INDEX_ID,
#     description="Multimodal RAG LangChain Index Endpoint",
#     public_endpoint_enabled=True,
#     location=LOCATION
# )


# DEPLOYED_INDEX_ID (defined above) is used here as the display name of your existing index endpoint

# List all index endpoints with the specified display name
index_endpoints = aiplatform.MatchingEngineIndexEndpoint.list(
    filter=f'display_name="{DEPLOYED_INDEX_ID}"',
    location=LOCATION
)

# Check if any endpoints are found
if index_endpoints:
    # Use the first matching endpoint
    index_endpoint = index_endpoints[0]
    print(f"Using existing index endpoint: {index_endpoint.resource_name}")
else:
    print("No index endpoint found with the specified display name.")
    # Optionally, handle the case where no endpoint is found


Using existing index endpoint: projects/61515807285/locations/us-central1/indexEndpoints/7140857431428431872


In [12]:

# Replace with your existing Index's display name
INDEX_DISPLAY_NAME = "mm_rag_langchain_index"

# Replace with your existing Index Endpoint's display name
INDEX_ENDPOINT_DISPLAY_NAME = "mm_rag_langchain_index_endpoint"

# Function to retrieve an existing index by display name
def get_existing_index(display_name):
    indexes = aiplatform.MatchingEngineIndex.list()
    for idx in indexes:
        if idx.display_name == display_name:
            return idx
    raise ValueError(f"No index found with display name '{display_name}'")

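# Function to retrieve an existing index endpoint by display name
def get_existing_index_endpoint(display_name):
    index_endpoints = aiplatform.MatchingEngineIndexEndpoint.list()
    # NOTE: the [1:] slice skips the first endpoint returned by list(). In this
    # project more than one endpoint shares the display name, and the slice
    # selects a later one. Iterate over the full list if you have only one.
    for endpoint in index_endpoints[1:]:
        if endpoint.display_name == display_name:
            return endpoint
    raise ValueError(f"No index endpoint found with display name '{display_name}'")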

# Retrieve the existing index
index = get_existing_index(INDEX_DISPLAY_NAME)
print(f"Using existing index: {index.resource_name}")

# Retrieve the existing index endpoint
index_endpoint = get_existing_index_endpoint(INDEX_ENDPOINT_DISPLAY_NAME)
print(f"Using existing index endpoint: {index_endpoint.resource_name}")

# Now you can use 'index' and 'index_endpoint' in your application


Using existing index: projects/61515807285/locations/us-central1/indexes/8034399744996409344
Using existing index endpoint: projects/61515807285/locations/us-central1/indexEndpoints/7796131177210839040


- Deploy Index to Index Endpoint
  - NOTE: This will take a while to run.
  - You can stop this cell after starting it instead of waiting for deployment.
  - You can check the status at https://console.cloud.google.com/vertex-ai/matching-engine/indexes

In [14]:
index_endpoint = index_endpoint.deploy_index(
    index=index, deployed_index_id="mm_rag_langchain_index_endpoint2"
)
index_endpoint.deployed_indexes

INFO:google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint:Deploying index MatchingEngineIndexEndpoint index_endpoint: projects/61515807285/locations/us-central1/indexEndpoints/7796131177210839040
INFO:google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint:Deploy index MatchingEngineIndexEndpoint index_endpoint backing LRO: projects/61515807285/locations/us-central1/indexEndpoints/7796131177210839040/operations/4307233069762347008
INFO:google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint:MatchingEngineIndexEndpoint index_endpoint Deployed index. Resource name: projects/61515807285/locations/us-central1/indexEndpoints/7796131177210839040


[id: "multimodal_lucid_1730481025996"
index: "projects/61515807285/locations/us-central1/indexes/7765309667261022208"
display_name: "multimodal_lucid"
create_time {
  seconds: 1730481050
  nanos: 746459000
}
index_sync_time {
  seconds: 1730563514
  nanos: 504909000
}
automatic_resources {
  min_replica_count: 2
  max_replica_count: 2
}
deployment_group: "default"
, id: "mm_rag_langchain_index_endpoint1"
index: "projects/61515807285/locations/us-central1/indexes/5289455772114092032"
create_time {
  seconds: 1730560888
  nanos: 776293000
}
index_sync_time {
  seconds: 1730563565
  nanos: 244132000
}
automatic_resources {
  min_replica_count: 2
  max_replica_count: 2
}
deployment_group: "default"
, id: "mm_rag_langchain_index_endpoint2"
index: "projects/61515807285/locations/us-central1/indexes/8034399744996409344"
create_time {
  seconds: 1730563479
  nanos: 723943000
}
index_sync_time {
  seconds: 1730563724
  nanos: 378237000
}
automatic_resources {
  min_replica_count: 2
  max_replic

## Create retriever & load documents

- Create [`VectorSearchVectorStore`](https://api.python.langchain.com/en/latest/vectorstores/langchain_google_vertexai.vectorstores.vectorstores.VectorSearchVectorStore.html) with Vector Search Index ID and Endpoint ID.
- Use [`text-embedding-004`](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings) (set in `EMBEDDING_MODEL_NAME`) as the embedding model.

In [15]:
# print(index.name)
print(index_endpoint.name)

7796131177210839040


In [16]:
# The vectorstore to use to index the summaries

vectorstore = VectorSearchVectorStore.from_components(
    project_id=PROJECT_ID,
    region=LOCATION,
    gcs_bucket_name=GCS_BUCKET,
    index_id=index.name,
    endpoint_id=index_endpoint.name,
    embedding=VertexAIEmbeddings(model_name=EMBEDDING_MODEL_NAME),
    stream_update=True,
)

- Alternatively, use Chroma for a local vector store.

In [1]:
# # !pip install pydantic==1.10.12

# from langchain_community.vectorstores import Chroma
# from langchain_google_vertexai import VertexAIEmbeddings

# vectorstore = Chroma(
#     collection_name="mm_rag_test",
#     embedding_function=VertexAIEmbeddings(model_name=EMBEDDING_MODEL_NAME),
# )



ImportError: cannot import name 'model_validator' from 'pydantic' (/usr/local/lib/python3.10/dist-packages/pydantic/__init__.cpython-310-x86_64-linux-gnu.so)

- Create Multi-Vector Retriever using the vector store you created.
- Since vector stores only contain the embedding and an ID, you'll also need to create a document store indexed by ID to get the original source documents after searching for embeddings.

In [17]:
docstore = InMemoryStore()

id_key = "doc_id"
# Create the multi-vector retriever
retriever_multi_vector_img = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    id_key=id_key,
)
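
To make the division of labor concrete: the vector store is searched by summary, and the `doc_id` stored with each summary is then used to fetch the raw content from the document store. A minimal sketch of that two-step lookup using plain Python containers (the `ToyMultiVectorRetriever` class and its word-overlap scoring are hypothetical stand-ins; the real `MultiVectorRetriever` relies on embedding similarity):

```python
class ToyMultiVectorRetriever:
    """Search over summaries, return the raw documents they point to."""

    def __init__(self) -> None:
        self.summaries: list[tuple[str, str]] = []  # (summary text, doc_id)
        self.docstore: dict[str, str] = {}  # doc_id -> raw content

    def add(self, doc_id: str, summary: str, raw_content: str) -> None:
        self.summaries.append((summary, doc_id))
        self.docstore[doc_id] = raw_content

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        query_words = set(query.lower().split())

        def score(entry: tuple[str, str]) -> int:
            # Stand-in for vector similarity: count shared words.
            return len(query_words & set(entry[0].lower().split()))

        # Step 1: rank the summaries against the query.
        best = sorted(self.summaries, key=score, reverse=True)[:k]
        # Step 2: swap each matching summary for its raw source content.
        return [self.docstore[doc_id] for _, doc_id in best]


retriever = ToyMultiVectorRetriever()
retriever.add("id-1", "history of the handbook", "<raw text chunk>")
retriever.add("id-2", "chart of page counts per year", "<base64 image>")
print(retriever.retrieve("how did page counts change"))
```

Searching summaries while returning raw content is what lets an image be found through its text description and still be handed to the model as an image.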

- Load data into Document Store and Vector Store

In [18]:
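# Raw document contents, in the same order as the summaries built below.
# text_summaries was generated from texts_4k_token (not texts), so use the
# same list here to keep each summary aligned with its source and doc ID.
doc_contents = texts_4k_token + tables + img_base64_list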

doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(text_summaries + table_summaries + image_summaries)
]

retriever_multi_vector_img.docstore.mset(list(zip(doc_ids, doc_contents)))

# If using Vertex AI Vector Search, this will take a while to complete.
# You can cancel this cell and continue later.
retriever_multi_vector_img.vectorstore.add_documents(summary_docs)

INFO:google.cloud.aiplatform.matching_engine.matching_engine_index:Upserting datapoints MatchingEngineIndex index: projects/61515807285/locations/us-central1/indexes/8034399744996409344
INFO:google.cloud.aiplatform.matching_engine.matching_engine_index:MatchingEngineIndex index Upserted datapoints. Resource name: projects/61515807285/locations/us-central1/indexes/8034399744996409344


['57fd1851-cb34-4713-82f7-362117e29a60',
 '2909c300-faa4-4db1-ac05-b5202077ebac',
 'c77a4064-be2d-495d-a1fa-dea26d5aee5b',
 '200d9443-20e5-403a-a671-88996a3f6a8f']

## Create Chain with Retriever and Gemini LLM

In [19]:
def looks_like_base64(sb):
    """Check if the string looks like base64"""
    return re.match("^[A-Za-z0-9+/]+[=]{0,2}$", sb) is not None


def is_image_data(b64data):
    """
    Check if the base64 data is an image by looking at the start of the data
    """
    image_signatures = {
        b"\xFF\xD8\xFF": "jpg",
        b"\x89\x50\x4E\x47\x0D\x0A\x1A\x0A": "png",
        b"\x47\x49\x46\x38": "gif",
        b"\x52\x49\x46\x46": "webp",
    }
    try:
        header = base64.b64decode(b64data)[:8]  # Decode and get the first 8 bytes
        # True if the header starts with any known image signature
        return any(header.startswith(sig) for sig in image_signatures)
    except Exception:
        return False


def split_image_text_types(docs):
    """
    Split base64-encoded images and texts
    """
    b64_images = []
    texts = []
    for doc in docs:
        # Check if the document is of type Document and extract page_content if so
        if isinstance(doc, Document):
            doc = doc.page_content
        if looks_like_base64(doc) and is_image_data(doc):
            b64_images.append(doc)
        else:
            texts.append(doc)
    return {"images": b64_images, "texts": texts}


def img_prompt_func(data_dict):
    """
    Join the context into a single string
    """
    formatted_texts = "\n".join(data_dict["context"]["texts"])
    messages = [
        {
            "type": "text",
            "text": (
                "You are a financial analyst tasked with providing investment advice.\n"
                "You will be given a mix of text, tables, and image(s) usually of charts or graphs.\n"
                "Use this information to provide investment advice related to the user's question. \n"
                f"User-provided question: {data_dict['question']}\n\n"
                "Text and / or tables:\n"
                f"{formatted_texts}"
            ),
        }
    ]

    # Adding image(s) to the messages if present
    if data_dict["context"]["images"]:
        for image in data_dict["context"]["images"]:
            messages.append(
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image}"},
                }
            )
    return [HumanMessage(content=messages)]


# Create RAG chain
chain_multimodal_rag = (
    {
        "context": retriever_multi_vector_img | RunnableLambda(split_image_text_types),
        "question": RunnablePassthrough(),
    }
    | RunnableLambda(img_prompt_func)
    | ChatVertexAI(
        temperature=0,
        model_name=MODEL_NAME,
        max_output_tokens=TOKEN_LIMIT,
    )  # Multi-modal LLM
    | StrOutputParser()
)
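
The chain above fans the query out into a `context` branch and a `question` branch, then pipes the combined dictionary through the prompt function and the model. A minimal sketch of that data flow in plain Python, with stub functions standing in for the retriever and Gemini (`fake_retriever`, `build_prompt`, `fake_model`, and `run_chain` are hypothetical names, not LangChain APIs):

```python
def fake_retriever(question: str) -> dict[str, list[str]]:
    """Stub for the retriever followed by split_image_text_types."""
    return {"images": [], "texts": [f"passage relevant to: {question}"]}


def build_prompt(data: dict) -> str:
    """Stub for img_prompt_func: join the context and the question."""
    context = "\n".join(data["context"]["texts"])
    return f"Question: {data['question']}\nContext:\n{context}"


def fake_model(prompt: str) -> str:
    """Stub for the LLM: report how much prompt it received."""
    return f"answer based on {len(prompt.splitlines())} prompt lines"


def run_chain(question: str) -> str:
    # Fan-out: both branches receive the same input.
    data = {"context": fake_retriever(question), "question": question}
    # Sequential steps, equivalent in spirit to `a | b | c`.
    return fake_model(build_prompt(data))


print(run_chain("What is the handbook for?"))
```

Writing the flow out this way can help when debugging: each intermediate value (retrieved context, assembled prompt) can be inspected before the model is called.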

## Process user query

In [21]:
query = """
 - What is the purpose of the GitLab Handbook?
 - Can you explain why GitLab prefers documenting information in the handbook?
 - When did GitLab start the handbook and why?
 - How has the size of the GitLab Handbook changed over the years?
 - What are some benefits of using a handbook according to GitLab?
 - How does GitLab believe the handbook aids in team onboarding?
"""

### Get Retrieved documents

In [22]:
# List of source documents
# `invoke` replaces the deprecated `get_relevant_documents`
docs = retriever_multi_vector_img.invoke(query, limit=10)

source_docs = split_image_text_types(docs)

print(source_docs["texts"])

for i in source_docs["images"]:
    display(Image(base64.b64decode(i)))


['22/10/2024, 14:54\n\nAbout the Handbook | The GitLab Handbook\n\nThe GitLab Handbook\n\nGitLab TeamOps Handbook Job Families Reports\n\nAbout the Handbook\n\nHistory of the handbook\n\nThe handbook started when GitLab was a company of just ten people to make sharing information efficient and easy. We knew that future GitLab team-members wouldnʼt be able to see emails about process changes that were being sent before they joined and that most of the people who would eventually join GitLab likely hadnʼt even heard of us yet. The handbook was our way of ensuring that all of our company information was accessible to everyone regardless of when they became part of the team.\n\nAdvantages\n\nAt GitLab our handbook is extensive and keeping it relevant is an important part of\n\neveryoneʼs job. It is a vital part of who we are and how we communicate. We established these processes because we saw these benefits:\n\n. Reading is much faster than listening.\n\n. Reading is async, you donʼt have

### Get generative response

In [23]:
result = chain_multimodal_rag.invoke(query)

Markdown(result)

## GitLab Handbook: A Deep Dive into its Purpose and Benefits

The GitLab Handbook is a comprehensive document that serves as the central repository of information for the company. It was created in the early days of GitLab, when the company was only ten people strong, to ensure efficient and accessible information sharing. 

Here's a breakdown of your questions and their answers:

**1. Purpose of the GitLab Handbook:**

The handbook's primary purpose is to provide a single source of truth for all company information, processes, and guidelines. It aims to:

* **Make information accessible to everyone:** Regardless of when they joined the company.
* **Ensure consistency and clarity:** By documenting processes and procedures.
* **Facilitate onboarding:** By providing new hires with a comprehensive overview of the company's culture and operations.
* **Promote transparency and collaboration:** By allowing everyone to contribute to the handbook through merge requests.

**2. Why GitLab prefers documenting information in the handbook:**

GitLab believes that documenting information in a handbook offers several advantages over other methods:

* **Efficiency:** Reading is faster than listening, allowing for quicker information absorption.
* **Asynchronous access:** Information can be accessed anytime, anywhere, without interrupting others.
* **Improved talent acquisition:** Potential candidates can understand GitLab's values and operations before joining.
* **Enhanced retention:** Employees are better informed about the company's culture and expectations.
* **Simplified onboarding:** New hires can easily find relevant information.
* **Streamlined teamwork:** Understanding how different parts of the company operate fosters collaboration.
* **Easier change management:** Discussing and communicating changes is simplified by referencing the handbook.

**3. When did GitLab start the handbook and why?**

The GitLab Handbook was started when the company was only ten people. The founders recognized the need for a centralized source of information to ensure that everyone, regardless of their start date, had access to the same information. This was particularly important as GitLab was growing rapidly and new employees needed to be quickly integrated into the company culture.

**4. How has the size of the GitLab Handbook changed over the years?**

The handbook has grown significantly over the years, reflecting the growth of GitLab itself. It is currently over two thousand pages long, showcasing the vast amount of information it contains. The provided tables show the historical word and page counts, demonstrating the continuous expansion of the handbook.

**5. Benefits of using a handbook according to GitLab:**

GitLab highlights several benefits of using a handbook:

* **Transparency:** Everyone has access to the same information, fostering a culture of openness.
* **Consistency:** Standardized processes and guidelines ensure consistency across the company.
* **Empowerment:** Documenting processes allows for easier identification of areas for improvement and encourages contributions from all team members.
* **Flexibility:** While the handbook provides a framework, it is not rigid and is constantly evolving to reflect changing needs.

**6. How does GitLab believe the handbook aids in team onboarding:**

The handbook serves as a comprehensive onboarding resource for new hires. It provides information on:

* **Company culture and values:** Helping new employees understand the company's philosophy and expectations.
* **Processes and procedures:** Providing a clear understanding of how things work at GitLab.
* **Team structure and roles:** Giving new hires context for their role within the organization.
* **Tools and resources:** Providing access to the necessary tools and resources for success.

By providing this comprehensive information, the handbook helps new hires quickly integrate into the company and become productive members of the team.

**Investment Advice:**

While the GitLab Handbook itself is not an investment opportunity, understanding its purpose and benefits can provide valuable insights into the company's culture and operations. This information can be helpful for investors looking to assess GitLab's long-term growth potential and its ability to attract and retain talent. 


## Conclusions

Congratulations on making it through this multimodal RAG notebook!

While multimodal RAG can be quite powerful, note that it can face some limitations:

* **Data dependency:** Needs high-accuracy data from the text and visuals.
* **Computationally demanding:** Generating embeddings from multimodal data is resource-intensive.
* **Domain specific:** Models trained on general data may not shine in specialized fields like medicine.
* **Black box:** Understanding how these models work can be tricky, hindering trust and adoption.


Despite these challenges, multimodal RAG represents a significant step towards search and retrieval systems that can handle diverse, multimodal data.