In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Multimodal Retrieval Augmented Generation (RAG) with Gemini, Vertex AI Vector Search, and LangChain

<table align="left">
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fretrieval-augmented-generation%2Fmultimodal_rag_langchain.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
    <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb">
      <img width="32px" src="https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

| | |
|-|-|
|Author(s) | [Holt Skinner](https://github.com/holtskinner) |

## Overview

Retrieval augmented generation (RAG) has become a popular paradigm for enabling LLMs to access external data and also as a mechanism for grounding to mitigate against hallucinations.

In this notebook, you will learn how to perform multimodal RAG where you will perform Q&A over a financial document filled with both text and images.

### Gemini

Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases. The Gemini API gives you access to the Gemini 1.0 Pro Vision and Gemini 1.0 Pro models.

### Comparing text-based and multimodal RAG

Multimodal RAG offers several advantages over text-based RAG:

1. **Enhanced knowledge access:** Multimodal RAG can access and process both textual and visual information, providing a richer and more comprehensive knowledge base for the LLM.
2. **Improved reasoning capabilities:** By incorporating visual cues, multimodal RAG can make better informed inferences across different types of data modalities.

This notebook shows you how to use RAG with Gemini API in Vertex AI, and [multimodal embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/multimodal-embeddings), to build a document search engine.

Through hands-on examples, you will discover how to construct a multimedia-rich metadata repository of your document sources, enabling search, comparison, and reasoning across diverse information streams.

### Objectives

This notebook provides a guide to building a document search engine using multimodal retrieval augmented generation (RAG), step by step:

1. Extract and store metadata of documents containing both text and images, and generate embeddings the documents
2. Search the metadata with text queries to find similar text or images
3. Search the metadata with image queries to find similar images
4. Using a text query as input, search for contextual answers using both text and images

### Costs

This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Getting Started

### Install Vertex AI SDK for Python and other dependencies

In [4]:
%pip install -U -q google-cloud-aiplatform langchain-core langchain-google-vertexai langchain-text-splitters langchain-community "unstructured[all-docs]" pypdf pydantic lxml pillow matplotlib opencv-python tiktoken

### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [5]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).

In [1]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

In [None]:
# """### Define Google Cloud project information"""

# PROJECT_ID = "gen-ai-project-1-435519"  # @param {type:"string"}
# LOCATION = "us-central1"  # @param {type:"string"}

# # For Vector Search Staging
# GCS_BUCKET = "gen_ai_2"  # @param {type:"string"}
# GCS_BUCKET_URI = f"gs://{GCS_BUCKET}"

### Define Google Cloud project information

In [2]:
PROJECT_ID = "gen-ai-project-1-435519"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

# For Vector Search Staging
GCS_BUCKET = "gen_ai_2"  # @param {type:"string"}
GCS_BUCKET_URI = f"gs://{GCS_BUCKET}"

### Initialize the Vertex AI SDK

In [3]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION, staging_bucket=GCS_BUCKET_URI)

### Import libraries

In [4]:
import base64
import os
import re
import uuid

from IPython.display import Image, Markdown, display
from langchain.prompts import PromptTemplate
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_core.documents import Document
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_google_vertexai import (
    ChatVertexAI,
    VectorSearchVectorStore,
    VertexAI,
    VertexAIEmbeddings,
)
from langchain_text_splitters import CharacterTextSplitter
from unstructured.partition.pdf import partition_pdf

# from langchain_community.vectorstores import Chroma  # Optional

### Define model information

- [Vertex AI - Model Information](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models)

In [5]:
MODEL_NAME = "gemini-1.5-flash"
GEMINI_OUTPUT_TOKEN_LIMIT = 8192

EMBEDDING_MODEL_NAME = "text-embedding-004"
EMBEDDING_TOKEN_LIMIT = 2048

TOKEN_LIMIT = min(GEMINI_OUTPUT_TOKEN_LIMIT, EMBEDDING_TOKEN_LIMIT)

## Data Loading

#### Get documents and images from GCS

In [6]:
# Download documents and images used in this notebook
!gsutil -m rsync -r gs://github-repo/rag/intro_multimodal_rag/ .
print("Download completed")


both the source and destination. Your crcmod installation isn't using the
module's C extension, so checksumming will run very slowly. If this is your
first rsync since updating gsutil, this rsync can take significantly longer than
usual. For help installing the extension, please see "gsutil help crcmod".

Building synchronization state...
Starting synchronization...
Copying gs://github-repo/rag/intro_multimodal_rag/data/med_gemini.pdf...
Copying gs://github-repo/rag/intro_multimodal_rag/images/2022-alphabet-annual-report.pdf_image_10_0_79.jpeg...
Copying gs://github-repo/rag/intro_multimodal_rag/data/Google Cloud TPU blog.pdf...
Copying gs://github-repo/rag/intro_multimodal_rag/images/2022-alphabet-annual-report.pdf_image_12_2_90.jpeg...
Copying gs://github-repo/rag/intro_multimodal_rag/data/gemma_technical_paper.pdf...
Copying gs://github-repo/rag/intro_multimodal_rag/images/2022-alphabet-annual-report.pdf_image_14_0_95.jpeg...
Copying gs://github-repo/rag/intro_multimodal_rag/data/g

## Partition PDF tables, text, and images

### The data

The source data that you will use in this notebook is a modified version of [Google-10K](https://abc.xyz/assets/investor/static/pdf/20220202_alphabet_10K.pdf) which provides a comprehensive overview of the company's financial performance, business operations, management, and risk factors. As the original document is rather large, you will be using [a modified version with only 14 pages](https://storage.googleapis.com/github-repo/rag/multimodal_rag_langchain/google-10k-sample-14pages.pdf) instead. Although it's truncated, the sample document still contains text along with images such as tables, charts, and graphs.

In [7]:
!apt-get install -y poppler-utils


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  poppler-utils
0 upgraded, 1 newly installed, 0 to remove and 49 not upgraded.
Need to get 186 kB of archives.
After this operation, 696 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 poppler-utils amd64 22.02.0-2ubuntu0.5 [186 kB]
Fetched 186 kB in 1s (239 kB/s)
Selecting previously unselected package poppler-utils.
(Reading database ... 123629 files and directories currently installed.)
Preparing to unpack .../poppler-utils_22.02.0-2ubuntu0.5_amd64.deb ...
Unpacking poppler-utils (22.02.0-2ubuntu0.5) ...
Setting up poppler-utils (22.02.0-2ubuntu0.5) ...
Processing triggers for man-db (2.10.2-1) ...


In [8]:
!pip install pytesseract


Collecting pytesseract
  Downloading pytesseract-0.3.13-py3-none-any.whl.metadata (11 kB)
Downloading pytesseract-0.3.13-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.13


In [9]:
import os
import sys

# Define the folder path
pdf_folder_path = "/content/data/" if "google.colab" in sys.modules else "data/"

# Function to display the size of each file and calculate the total size
def display_file_sizes(folder_path):
    total_size = 0
    try:
        for dirpath, dirnames, filenames in os.walk(folder_path):
            for filename in filenames:
                file_path = os.path.join(dirpath, filename)
                if os.path.isfile(file_path):
                    file_size = os.path.getsize(file_path)
                    total_size += file_size
                    print(f"File: {filename}, Size: {file_size / (1024 * 1024):.2f} MB")
        print(f"\nTotal size of all files in '{folder_path}': {total_size / (1024 * 1024):.2f} MB")
    except Exception as e:
        print(f"Error: {e}")

# Call the function
display_file_sizes(pdf_folder_path)


File: Google Cloud TPU blog.pdf, Size: 0.96 MB
File: med_gemini.pdf, Size: 6.53 MB
File: gemini_v1_5_report_technical.pdf, Size: 3.95 MB
File: gemma_technical_paper.pdf, Size: 0.55 MB

Total size of all files in '/content/data/': 11.99 MB


In [10]:
pdf_folder_path = "/content/data/" if "google.colab" in sys.modules else "data/"
pdf_file_name = "gemma_technical_paper.pdf"

In [11]:
pdf_file_name

'gemma_technical_paper.pdf'

In [12]:
!pip install PyPDF2


Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m232.6/232.6 kB[0m [31m3.1 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [13]:
import os
from PyPDF2 import PdfReader

# Define folder path and file name
pdf_folder_path = "/content/data/" if "google.colab" in sys.modules else "data/"
pdf_file_name = "gemma_technical_paper.pdf"

# Full path to the PDF file
pdf_file_path = os.path.join(pdf_folder_path, pdf_file_name)

# Function to print the first page of a PDF
def print_first_page(pdf_path):
    try:
        # Check if the file exists
        if not os.path.exists(pdf_path):
            raise FileNotFoundError(f"File not found: {pdf_path}")

        # Load the PDF and extract the first page
        reader = PdfReader(pdf_path)
        first_page = reader.pages[0]
        text = first_page.extract_text()

        # Print the content of the first page
        print("First Page Content:\n")
        print(text if text else "No text found on the first page.")
    except Exception as e:
        print(f"Error: {e}")

# Call the function
print_first_page(pdf_file_path)


First Page Content:

2024-02-21
Gemma: Open Models Based on Gemini
Research and Technology
Gemma Team, Google DeepMind1
ThisworkintroducesGemma,afamilyoflightweight,state-of-theartopenmodelsbuiltfromtheresearch
and technology used to create Gemini models. Gemma models demonstrate strong performance across
academicbenchmarksforlanguageunderstanding, reasoning, andsafety. Wereleasetwosizesofmodels
(2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma
outperformssimilarlysizedopenmodelson11outof18text-basedtasks, andwepresentcomprehensive
evaluations of safety and responsibility aspects of the models, alongside a detailed description of model
development. We believe the responsible release of LLMs is critical for improving the safety of frontier
models, and for enabling the next wave of LLM innovations.
Introduction
We present Gemma, a family of open models
based on Google’s Gemini models (Gemini Team,
2023).
We trained Gemma models on up to 6

In [None]:
#/content/data/gemma_technical_paper.pdf

In [14]:
pdf_folder_path = "/content/data/"
pdf_file_name = "gemma_technical_paper.pdf"

In [15]:
import os
from unstructured.partition.pdf import partition_pdf

# Install Tesseract (if running in Colab)
!apt-get update
!apt-get install -y tesseract-ocr
!pip install pytesseract

# Set Tesseract path for Python (optional)
import pytesseract
pytesseract.pytesseract.tesseract_cmd = "/usr/bin/tesseract"

0% [Working]            Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:3 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Hit:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:8 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:10 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,148 kB]
Get:11 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:12 https://r2u.stat.illinois.edu/ubuntu jammy/main all Packages [8,486 kB]
Get:13 https://r2u.stat

In [19]:
# Define folder path and file name
pdf_folder_path = "/content/data/" if "google.colab" in sys.modules else "data/"
pdf_file_name = "gemma_technical_paper.pdf"

# Full file path
pdf_file_path = os.path.join(pdf_folder_path, pdf_file_name)

# Check if file exists
if not os.path.exists(pdf_file_path):
    print(f"Error: File '{pdf_file_path}' does not exist.")
else:
    # Process the PDF file
    try:
        raw_pdf_elements = partition_pdf(
            filename=pdf_file_path,
            extract_images_in_pdf=False,
            infer_table_structure=True,
            chunking_strategy="by_title",
            max_characters=4000,
            new_after_n_chars=3800,
            combine_text_under_n_chars=2000,
            image_output_dir_path=pdf_folder_path,
        )
        print("PDF processed successfully.")
    except Exception as e:
        print(f"Error processing PDF: {e}")

PDF processed successfully.


In [46]:
pdf_file_path

'/content/data/gemma_technical_paper.pdf'

In [20]:
# pdf_folder_path = "/content/data/" if "google.colab" in sys.modules else "data/"
# pdf_file_name = "gemma_technical_paper.pdf"

# Extract images, tables, and chunk text from a PDF file.
# raw_pdf_elements = partition_pdf(
#     filename=pdf_file_name,
#     extract_images_in_pdf=False,
#     infer_table_structure=True,
#     chunking_strategy="by_title",
#     max_characters=4000,
#     new_after_n_chars=3800,
#     combine_text_under_n_chars=2000,
#     image_output_dir_path=pdf_folder_path,
# )

# Categorize extracted elements from a PDF into tables and texts.
tables = []
texts = []
for element in raw_pdf_elements:
    if "unstructured.documents.elements.Table" in str(type(element)):
        tables.append(str(element))
    elif "unstructured.documents.elements.CompositeElement" in str(type(element)):
        texts.append(str(element))

# Optional: Enforce a specific token size for texts
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=10000, chunk_overlap=0
)
joined_texts = " ".join(texts)
texts_4k_token = text_splitter.split_text(joined_texts)

In [21]:
# Generate summaries of text elements


def generate_text_summaries(
    texts: list[str], tables: list[str], summarize_texts: bool = False
) -> tuple[list, list]:
    """
    Summarize text elements
    texts: List of str
    tables: List of str
    summarize_texts: Bool to summarize texts
    """

    # Prompt
    prompt_text = """You are an assistant tasked with summarizing tables and text for retrieval. \
    These summaries will be embedded and used to retrieve the raw text or table elements. \
    Give a concise summary of the table or text that is well optimized for retrieval. Table or text: {element} """
    prompt = PromptTemplate.from_template(prompt_text)
    empty_response = RunnableLambda(
        lambda x: AIMessage(content="Error processing document")
    )
    # Text summary chain
    model = VertexAI(
        temperature=0, model_name=MODEL_NAME, max_output_tokens=TOKEN_LIMIT
    ).with_fallbacks([empty_response])
    summarize_chain = {"element": lambda x: x} | prompt | model | StrOutputParser()

    # Initialize empty summaries
    text_summaries = []
    table_summaries = []

    # Apply to text if texts are provided and summarization is requested
    if texts:
        if summarize_texts:
            text_summaries = summarize_chain.batch(texts, {"max_concurrency": 1})
        else:
            text_summaries = texts

    # Apply to tables if tables are provided
    if tables:
        table_summaries = summarize_chain.batch(tables, {"max_concurrency": 1})

    return text_summaries, table_summaries


# Get text, table summaries
text_summaries, table_summaries = generate_text_summaries(
    texts_4k_token, tables, summarize_texts=True
)

In [22]:
def encode_image(image_path: str) -> str:
    """Getting the base64 string"""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def image_summarize(model: ChatVertexAI, base64_image: str, prompt: str) -> str:
    """Make image summary"""
    msg = model.invoke(
        [
            HumanMessage(
                content=[
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                    },
                ]
            )
        ]
    )
    return msg.content


def generate_img_summaries(path: str) -> tuple[list[str], list[str]]:
    """
    Generate summaries and base64 encoded strings for images
    path: Path to list of .jpg files extracted by Unstructured
    """

    # Store base64 encoded images
    img_base64_list = []

    # Store image summaries
    image_summaries = []

    # Prompt
    prompt = """You are an assistant tasked with summarizing images for retrieval. \
    These summaries will be embedded and used to retrieve the raw image. \
    Give a concise summary of the image that is well optimized for retrieval.
    If it's a table, extract all elements of the table.
    If it's a graph, explain the findings in the graph.
    Do not include any numbers that are not mentioned in the image.
    """

    model = ChatVertexAI(model_name=MODEL_NAME, max_output_tokens=TOKEN_LIMIT)

    # Apply to images
    for img_file in sorted(os.listdir(path)):
        if img_file.endswith(".png"):
            base64_image = encode_image(os.path.join(path, img_file))
            img_base64_list.append(base64_image)
            image_summaries.append(image_summarize(model, base64_image, prompt))

    return img_base64_list, image_summaries


# Image summaries
img_base64_list, image_summaries = generate_img_summaries(".")

## Create & Deploy Vertex AI Vector Search Index & Endpoint

Skip this step if you already have Vector Search set up.

- https://console.cloud.google.com/vertex-ai/matching-engine/indexes

- Create [`MatchingEngineIndex`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndex)
  - https://cloud.google.com/vertex-ai/docs/vector-search/create-manage-index

In [None]:
# # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings
# DIMENSIONS = 768  # Dimensions output from textembedding-gecko

# index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
#     display_name="mm_rag_langchain_index",
#     dimensions=DIMENSIONS,
#     approximate_neighbors_count=150,
#     leaf_node_embedding_count=500,
#     leaf_nodes_to_search_percent=7,
#     description="Multimodal RAG LangChain Index",
#     index_update_method="STREAM_UPDATE",
# )

- Create [`MatchingEngineIndexEndpoint`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndexEndpoint)
  - https://cloud.google.com/vertex-ai/docs/vector-search/deploy-index-public

In [None]:
# DEPLOYED_INDEX_ID = "mm_rag_langchain_index_endpoint"

# index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
#     display_name=DEPLOYED_INDEX_ID,
#     description="Multimodal RAG LangChain Index Endpoint",
#     public_endpoint_enabled=True,
# )

- Deploy Index to Index Endpoint
  - NOTE: This will take a while to run.
  - You can stop this cell after starting it instead of waiting for deployment.
  - You can check the status at https://console.cloud.google.com/vertex-ai/matching-engine/indexes

In [None]:
# index_endpoint = index_endpoint.deploy_index(
#     index=index, deployed_index_id="mm_rag_langchain_deployed_index"
# )
# index_endpoint.deployed_indexes

## Create retriever & load documents

- Create [`VectorSearchVectorStore`](https://api.python.langchain.com/en/latest/vectorstores/langchain_google_vertexai.vectorstores.vectorstores.VectorSearchVectorStore.html) with Vector Search Index ID and Endpoint ID.
- Use [`textembedding-gecko`](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings) as embedding model.

In [None]:
# # The vectorstore to use to index the summaries
# vectorstore = VectorSearchVectorStore.from_components(
#     project_id=PROJECT_ID,
#     region=LOCATION,
#     gcs_bucket_name=GCS_BUCKET,
#     index_id=index.name,
#     endpoint_id=index_endpoint.name,
#     embedding=VertexAIEmbeddings(model_name=EMBEDDING_MODEL_NAME),
#     stream_update=True,
# )

- Alternatively, use Chroma for a local vector store.

In [None]:
# vectorstore = Chroma(
#     collection_name="mm_rag_test",
#     embedding_function=VertexAIEmbeddings(model_name=EMBEDDING_MODEL_NAME),
# )

- Create Multi-Vector Retriever using the vector store you created.
- Since vector stores only contain the embedding and an ID, you'll also need to create a document store indexed by ID to get the original source documents after searching for embeddings.

In [None]:
# !pip install -U langchain langchain-community pypdf pinecone-client langchain-google-genai PyMuPDF
# !pip install langchain-google-genai
# !pip install langchain-community --force-reinstall # Force reinstall to update dependencies
# !pip install langchain-pinecone --force-reinstall # Force reinstall to update dependencies

In [3]:
!pip install langchain-pinecone

Collecting langchain-pinecone
  Downloading langchain_pinecone-0.2.0-py3-none-any.whl.metadata (1.7 kB)
Collecting aiohttp<3.10,>=3.9.5 (from langchain-pinecone)
  Downloading aiohttp-3.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.5 kB)
Collecting pinecone-client<6.0.0,>=5.0.0 (from langchain-pinecone)
  Downloading pinecone_client-5.0.1-py3-none-any.whl.metadata (19 kB)
Collecting pinecone-plugin-inference<2.0.0,>=1.0.3 (from pinecone-client<6.0.0,>=5.0.0->langchain-pinecone)
  Downloading pinecone_plugin_inference-1.1.0-py3-none-any.whl.metadata (2.2 kB)
Collecting pinecone-plugin-interface<0.0.8,>=0.0.7 (from pinecone-client<6.0.0,>=5.0.0->langchain-pinecone)
  Downloading pinecone_plugin_interface-0.0.7-py3-none-any.whl.metadata (1.2 kB)
Downloading langchain_pinecone-0.2.0-py3-none-any.whl (11 kB)
Downloading aiohttp-3.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m

In [4]:
# Configuration dictionary remains the same
cred = {
    "PINECONE_API_KEY": "c2da52e9-c957-4584-89d5-bd8e6c3a4038",
    "PINECONE_HOST": "https://gemini-test-du051w4.svc.gcp-us-central1-4a9f.pinecone.io",
    "PINECONE_ENVIRONMENT": "us-central1",
    "PINECONE_NAMESPACE": "alectify",
    "PINECONE_INDEX": "operations-stage-768",
    "GEMINI_API_KEY": "AIzaSyD_XMBYHKI6qRNbmWd6TNbGerIJ2YFMPUE",
    "GOOGLE_API_KEY": "AIzaSyD_XMBYHKI6qRNbmWd6TNbGerIJ2YFMPUE"
}

In [40]:
from pinecone import Pinecone
from langchain_pinecone import PineconeVectorStore
pc = Pinecone(api_key=cred["PINECONE_API_KEY"])
# Get Pinecone index
index = pc.Index(cred["PINECONE_INDEX"])
namespace = "32b69824-1342-4776-8c0f-c56587881021"
  #Initialize the Pinecone vector store
formatted_namespace = f"company-{namespace}" if namespace else "company-default"
# vector_store = PineconeVectorStore(
#               index=index,
#               embedding=VertexAIEmbeddings(model_name=EMBEDDING_MODEL_NAME),
#           )
# Get stats for the entire index
all_stats = index.describe_index_stats()

# Extract and print the total number of embeddings in the entire index
total_embedding_count = all_stats.get("total_vector_count", 0)
print(f"Total number of embeddings in the entire index: {total_embedding_count}")

# Get list of namespaces in the index
namespaces = all_stats.get("namespaces", [])
print(namespace_stats)

Total number of embeddings in the entire index: 3158602
{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 24},
                '32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 332},
                'amar': {'vector_count': 83},
                'company-00cf4dc2-22aa-43ad-baa7-ffaf4a9f3df6': {'vector_count': 451},
                'company-03267e75-ac06-4a5b-b1c1-3167330e9b23': {'vector_count': 153071},
                'company-03e13178-7099-47b6-9a41-d2eb77e60f32': {'vector_count': 1032241},
                'company-08c7b330-09f1-4e98-9c7a-84a0d6066784': {'vector_count': 446},
                'company-30465ecf-44d3-4483-b463-2e9f4392e689': {'vector_count': 14},
                'company-32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 71611},
                'company-341e3e63-ad79-4d8f-8add-922688f7e9c8': {'vector_count': 645},
                'company-3753b47c-37a6-4e80-9a1a-5368db8a8f48': {'vector_count': 99},
                'company-477c1f

In [42]:
from pinecone import Pinecone

# Initialize Pinecone
pc = Pinecone(api_key=cred["PINECONE_API_KEY"])

# Get Pinecone index
index = pc.Index(cred["PINECONE_INDEX"])

# Define the namespace
namespace = "32b69824-1342-4776-8c0f-c56587881021"
formatted_namespace = f"company-{namespace}" if namespace else "company-default"

# Get stats for the entire index
all_stats = index.describe_index_stats()

# Extract and print the total number of embeddings in the entire index
total_embedding_count = all_stats.get("total_vector_count", 0)
print(f"Total number of embeddings in the entire index: {total_embedding_count}")

# Get namespace-specific stats
namespace_stats = all_stats.get("namespaces", {}).get(formatted_namespace, {})

# Extract and print the total vector count for the specific namespace
namespace_vector_count = namespace_stats.get("vector_count", 0)
print(f"Total number of embeddings in the namespace '{formatted_namespace}': {namespace_vector_count}")

# Also, print all namespace stats for debugging if needed
print("Namespace stats:", all_stats.get("namespaces", {}))


Total number of embeddings in the entire index: 3158614
Total number of embeddings in the namespace 'company-32b69824-1342-4776-8c0f-c56587881021': 71659
Namespace stats: {'company-c615a354-695b-4e31-b86d-0aa461f721b1': {'vector_count': 1239}, '': {'vector_count': 24}, '32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 332}, 'company-852a2995-85f3-4ef2-932c-3ecbdbd29084': {'vector_count': 288}, 'company-fa2e7a88-b169-4107-8b8a-394066fd0738': {'vector_count': 13755}, 'company-32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 71659}, 'company-8db47b3d-41ff-4544-b5f8-0731cd5b546b': {'vector_count': 166}, 'company-b55922e4-af40-49f8-a8ad-9fcf5680ca46': {'vector_count': 10}, 'company-08c7b330-09f1-4e98-9c7a-84a0d6066784': {'vector_count': 446}, 'company-341e3e63-ad79-4d8f-8add-922688f7e9c8': {'vector_count': 645}, 'company-7a5decbb-0002-4944-926a-e9470a76bd4c': {'vector_count': 90}, 'company-abfaa9a3-3ada-46f7-a84f-112cfb8b7bf2': {'vector_count': 2021}, 'company-abc': {'vector_c

In [44]:
from pinecone import Pinecone

# Initialize Pinecone
pc = Pinecone(api_key=cred["PINECONE_API_KEY"])

# Get Pinecone index
index = pc.Index(cred["PINECONE_INDEX"])

# Define the namespace
namespace = "32b69824-1342-4776-8c0f-c56587881021"
formatted_namespace = f"company-{namespace}" if namespace else "company-default"

# Get stats for the entire index
all_stats = index.describe_index_stats()

# Extract and print the total number of embeddings in the entire index
total_embedding_count = all_stats.get("total_vector_count", 0)
print(f"Total number of embeddings in the entire index: {total_embedding_count}")

# Get namespace-specific stats
namespace_stats = all_stats.get("namespaces", {}).get(formatted_namespace, {})

# Extract and print the total vector count for the specific namespace
namespace_vector_count = namespace_stats.get("vector_count", 0)
print(f"Total number of embeddings in the namespace '{formatted_namespace}': {namespace_vector_count}")

# Also, print all namespace stats for debugging if needed
print("Namespace stats:", all_stats.get("namespaces", {}))


Total number of embeddings in the entire index: 3158626
Total number of embeddings in the namespace 'company-32b69824-1342-4776-8c0f-c56587881021': 71671
Namespace stats: {'company-7e8d4361-a6ab-4b80-be92-a7ff5850ad44': {'vector_count': 638}, 'company-amar': {'vector_count': 860}, 'company-3753b47c-37a6-4e80-9a1a-5368db8a8f48': {'vector_count': 99}, 'company-852a2995-85f3-4ef2-932c-3ecbdbd29084': {'vector_count': 288}, 'company-9dea9a95-430b-408d-baee-21d98e9a0896': {'vector_count': 455}, 'company-341e3e63-ad79-4d8f-8add-922688f7e9c8': {'vector_count': 645}, 'company-8db47b3d-41ff-4544-b5f8-0731cd5b546b': {'vector_count': 166}, 'company-e2dd0852-599c-43cc-b64b-c361366324a4': {'vector_count': 989}, 'company-700300400': {'vector_count': 406}, 'company-a86ce217-3b83-4bc2-89ac-45881dca1ce2': {'vector_count': 262071}, 'company-d28fc4f7-8dac-44a5-8753-4727ef8fc1a7': {'vector_count': 24920}, 'company-da69369c-137d-46af-b8af-d7aba34b9d73': {'vector_count': 1080}, 'company-03267e75-ac06-4a5b-b1

In [41]:
from pinecone import Pinecone
from langchain_pinecone import PineconeVectorStore
pc = Pinecone(api_key=cred["PINECONE_API_KEY"])
# Get Pinecone index
index = pc.Index(cred["PINECONE_INDEX"])
namespace = "32b69824-1342-4776-8c0f-c56587881021"
  #Initialize the Pinecone vector store
formatted_namespace = f"company-{namespace}" if namespace else "company-default"
# vector_store = PineconeVectorStore(
#               index=index,
#               embedding=VertexAIEmbeddings(model_name=EMBEDDING_MODEL_NAME),
#           )
# Get stats for the entire index
all_stats = index.describe_index_stats()

# Extract and print the total number of embeddings in the entire index
total_embedding_count = all_stats.get("total_vector_count", 0)
print(f"Total number of embeddings in the entire index: {total_embedding_count}")

# Get list of namespaces in the index
namespaces = all_stats.get("namespaces", [])
print(namespace_stats)
# # Iterate through each namespace and print stats for each
# for namespace in namespaces:
#     namespace_stats = index.describe_index_stats(namespace=namespace)
#     namespace_embedding_count = namespace_stats.get("total_vector_count", 0)
#     print(f"Total number of embeddings in namespace '{namespace}': {namespace_embedding_count}")


Total number of embeddings in the entire index: 3158614
{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 24},
                '32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 332},
                'amar': {'vector_count': 83},
                'company-00cf4dc2-22aa-43ad-baa7-ffaf4a9f3df6': {'vector_count': 451},
                'company-03267e75-ac06-4a5b-b1c1-3167330e9b23': {'vector_count': 153071},
                'company-03e13178-7099-47b6-9a41-d2eb77e60f32': {'vector_count': 1032241},
                'company-08c7b330-09f1-4e98-9c7a-84a0d6066784': {'vector_count': 446},
                'company-30465ecf-44d3-4483-b463-2e9f4392e689': {'vector_count': 14},
                'company-32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 71611},
                'company-341e3e63-ad79-4d8f-8add-922688f7e9c8': {'vector_count': 645},
                'company-3753b47c-37a6-4e80-9a1a-5368db8a8f48': {'vector_count': 99},
                'company-477c1f

In [None]:
Total number of embeddings in the entire index: 3158602
{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 24},
                '32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 332},
                'amar': {'vector_count': 83},
                'company-00cf4dc2-22aa-43ad-baa7-ffaf4a9f3df6': {'vector_count': 451},
                'company-03267e75-ac06-4a5b-b1c1-3167330e9b23': {'vector_count': 153071},
                'company-03e13178-7099-47b6-9a41-d2eb77e60f32': {'vector_count': 1032241},
                'company-08c7b330-09f1-4e98-9c7a-84a0d6066784': {'vector_count': 446},
                'company-30465ecf-44d3-4483-b463-2e9f4392e689': {'vector_count': 14},
                'company-32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 71611},
                'company-341e3e63-ad79-4d8f-8add-922688f7e9c8': {'vector_count': 645},
                'company-3753b47c-37a6-4e80-9a1a-5368db8a8f48': {'vector_count': 99},
                'company-477c1fac-8f68-4344-b4e8-aeae8430d3bf': {'vector_count': 1563142},
                'company-700300400': {'vector_count': 406},
                'company-7a5decbb-0002-4944-926a-e9470a76bd4c': {'vector_count': 90},
                'company-7e8d4361-a6ab-4b80-be92-a7ff5850ad44': {'vector_count': 638},
                'company-852a2995-85f3-4ef2-932c-3ecbdbd29084': {'vector_count': 288},
                'company-8db47b3d-41ff-4544-b5f8-0731cd5b546b': {'vector_count': 166},
                'company-9dea9a95-430b-408d-baee-21d98e9a0896': {'vector_count': 455},
                'company-a7cd4ab1-3e17-4c2f-adb5-b4af76427d32': {'vector_count': 60},
                'company-a86ce217-3b83-4bc2-89ac-45881dca1ce2': {'vector_count': 262071},
                'company-abc': {'vector_count': 249},
                'company-abfaa9a3-3ada-46f7-a84f-112cfb8b7bf2': {'vector_count': 2021},
                'company-amar': {'vector_count': 860},
                'company-b55922e4-af40-49f8-a8ad-9fcf5680ca46': {'vector_count': 10},
                'company-b75d2156-15b6-4782-b4c5-c8334cab0574': {'vector_count': 24445},
                'company-c615a354-695b-4e31-b86d-0aa461f721b1': {'vector_count': 1239},
                'company-d28fc4f7-8dac-44a5-8753-4727ef8fc1a7': {'vector_count': 24920},
                'company-da69369c-137d-46af-b8af-d7aba34b9d73': {'vector_count': 1080},
                'company-dbc0c3dc-de92-40ae-82a8-e72562fc804d': {'vector_count': 473},
                'company-e2dd0852-599c-43cc-b64b-c361366324a4': {'vector_count': 989},
                'company-fa2e7a88-b169-4107-8b8a-394066fd0738': {'vector_count': 13755},
                'default-namespace': {'vector_count': 2192}},
 'total_vector_count': 3158566}

In [38]:
# Get stats for the entire index
all_stats = index.describe_index_stats()

# Extract and print the total number of embeddings in the entire index
total_embedding_count = all_stats.get("total_vector_count", 0)
print(f"Total number of embeddings in the entire index: {total_embedding_count}")

Total number of embeddings in the entire index: 3158590


In [22]:
# Get stats for the entire index
all_stats = index.describe_index_stats()

# Extract and print the total number of embeddings in the entire index
total_embedding_count = all_stats.get("total_vector_count", 0)
print(f"Total number of embeddings in the entire index: {total_embedding_count}")

# Get list of namespaces in the index
namespaces = all_stats.get("namespaces", [])

# Iterate through each namespace and print stats for each
for namespace in namespaces:
    namespace_stats = index.describe_index_stats(namespace=namespace)
    namespace_embedding_count = namespace_stats.get("total_vector_count", 0)
    print(f"Total number of embeddings in namespace '{namespace}': {namespace_embedding_count}")


Total number of embeddings in the entire index: 3158542
Total number of embeddings in namespace 'company-08c7b330-09f1-4e98-9c7a-84a0d6066784': 3158542
Total number of embeddings in namespace '': 3158542
Total number of embeddings in namespace 'company-b55922e4-af40-49f8-a8ad-9fcf5680ca46': 3158542
Total number of embeddings in namespace 'company-da69369c-137d-46af-b8af-d7aba34b9d73': 3158542
Total number of embeddings in namespace 'company-e2dd0852-599c-43cc-b64b-c361366324a4': 3158542
Total number of embeddings in namespace 'company-amar': 3158542
Total number of embeddings in namespace 'company-30465ecf-44d3-4483-b463-2e9f4392e689': 3158542
Total number of embeddings in namespace 'company-d28fc4f7-8dac-44a5-8753-4727ef8fc1a7': 3158542
Total number of embeddings in namespace 'company-7a5decbb-0002-4944-926a-e9470a76bd4c': 3158542
Total number of embeddings in namespace 'default-namespace': 3158542
Total number of embeddings in namespace 'amar': 3158542
Total number of embeddings in n

In [23]:
print(namespace_stats)

{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 24},
                '32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 332},
                'amar': {'vector_count': 83},
                'company-00cf4dc2-22aa-43ad-baa7-ffaf4a9f3df6': {'vector_count': 451},
                'company-03267e75-ac06-4a5b-b1c1-3167330e9b23': {'vector_count': 153071},
                'company-03e13178-7099-47b6-9a41-d2eb77e60f32': {'vector_count': 1032241},
                'company-08c7b330-09f1-4e98-9c7a-84a0d6066784': {'vector_count': 446},
                'company-30465ecf-44d3-4483-b463-2e9f4392e689': {'vector_count': 14},
                'company-32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 71587},
                'company-341e3e63-ad79-4d8f-8add-922688f7e9c8': {'vector_count': 645},
                'company-3753b47c-37a6-4e80-9a1a-5368db8a8f48': {'vector_count': 99},
                'company-477c1fac-8f68-4344-b4e8-aeae8430d3bf': {'vector_count': 156314

In [19]:
print(namespace_stats)

{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 24},
                '32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 332},
                'amar': {'vector_count': 83},
                'company-00cf4dc2-22aa-43ad-baa7-ffaf4a9f3df6': {'vector_count': 451},
                'company-03267e75-ac06-4a5b-b1c1-3167330e9b23': {'vector_count': 153071},
                'company-03e13178-7099-47b6-9a41-d2eb77e60f32': {'vector_count': 1032241},
                'company-08c7b330-09f1-4e98-9c7a-84a0d6066784': {'vector_count': 446},
                'company-30465ecf-44d3-4483-b463-2e9f4392e689': {'vector_count': 14},
                'company-32b69824-1342-4776-8c0f-c56587881021': {'vector_count': 71575},
                'company-341e3e63-ad79-4d8f-8add-922688f7e9c8': {'vector_count': 645},
                'company-3753b47c-37a6-4e80-9a1a-5368db8a8f48': {'vector_count': 99},
                'company-477c1fac-8f68-4344-b4e8-aeae8430d3bf': {'vector_count': 156314

In [None]:
# Total number of embeddings in the entire index: 3157687

In [36]:
vectorstore = vector_store

In [37]:
docstore = InMemoryStore()

id_key = "doc_id"
# Create the multi-vector retriever
retriever_multi_vector_img = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    id_key=id_key,
)

- Load data into Document Store and Vector Store

In [44]:
# Raw Document Contents
doc_contents = texts + tables + img_base64_list

doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(text_summaries + table_summaries + image_summaries)
]

retriever_multi_vector_img.docstore.mset(list(zip(doc_ids, doc_contents)))

# If using Vertex AI Vector Search, this will take a while to complete.
# You can cancel this cell and continue later.
retriever_multi_vector_img.vectorstore.add_documents(summary_docs, namespace=formatted_namespace)

['12300e0b-ba9d-4725-881d-30665e395196',
 '1e977045-4db4-447e-a24b-d6b538b805fd',
 '68845e63-7adc-4e11-aa5d-cc9294c09883',
 'd8944260-f4c9-4fcb-a4cb-950f88dfb3ab',
 'cd14f8c8-9eba-4c7f-ba78-1fb9d1f07e24',
 'ad24da5b-0bdf-441e-8b39-8a192dd94926',
 '87be9387-68ac-4d33-9e0e-ce06dfae6d83',
 '98368213-ca71-4e25-8010-d0467c5a2a13',
 '73676e01-87d7-4ed1-9b8c-9c7f54507231',
 '6f92d3c0-0047-441f-9998-e40963982bdb',
 '05a70748-57a4-4c0b-ac4a-2ec4138e8209',
 'ff963f18-d9d3-4b34-b041-52b674add48a']

## Create Chain with Retriever and Gemini LLM

In [29]:
def looks_like_base64(sb):
    """Check if the string looks like base64"""
    return re.match("^[A-Za-z0-9+/]+[=]{0,2}$", sb) is not None


def is_image_data(b64data):
    """
    Check if the base64 data is an image by looking at the start of the data
    """
    image_signatures = {
        b"\xFF\xD8\xFF": "jpg",
        b"\x89\x50\x4E\x47\x0D\x0A\x1A\x0A": "png",
        b"\x47\x49\x46\x38": "gif",
        b"\x52\x49\x46\x46": "webp",
    }
    try:
        header = base64.b64decode(b64data)[:8]  # Decode and get the first 8 bytes
        for sig, format in image_signatures.items():
            if header.startswith(sig):
                return True
        return False
    except Exception:
        return False


def split_image_text_types(docs):
    """
    Split base64-encoded images and texts
    """
    b64_images = []
    texts = []
    for doc in docs:
        # Check if the document is of type Document and extract page_content if so
        if isinstance(doc, Document):
            doc = doc.page_content
        if looks_like_base64(doc) and is_image_data(doc):
            b64_images.append(doc)
        else:
            texts.append(doc)
    return {"images": b64_images, "texts": texts}


def img_prompt_func(data_dict):
    """
    Join the context into a single string
    """
    formatted_texts = "\n".join(data_dict["context"]["texts"])
    messages = [
        {
            "type": "text",
            "text": (
                "You are financial analyst tasking with providing investment advice.\n"
                "You will be given a mix of text, tables, and image(s) usually of charts or graphs.\n"
                "Use this information to provide investment advice related to the user's question. \n"
                f"User-provided question: {data_dict['question']}\n\n"
                "Text and / or tables:\n"
                f"{formatted_texts}"
            ),
        }
    ]

    # Adding image(s) to the messages if present
    if data_dict["context"]["images"]:
        for image in data_dict["context"]["images"]:
            messages.append(
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image}"},
                }
            )
    return [HumanMessage(content=messages)]


# Create RAG chain
chain_multimodal_rag = (
    {
        "context": retriever_multi_vector_img | RunnableLambda(split_image_text_types),
        "question": RunnablePassthrough(),
    }
    | RunnableLambda(img_prompt_func)
    | ChatVertexAI(
        temperature=0,
        model_name=MODEL_NAME,
        max_output_tokens=TOKEN_LIMIT,
    )  # Multi-modal LLM
    | StrOutputParser()
)

## Process user query

In [40]:
query = """
How does the Gemma family of open models compare to similarly-sized open models in terms of performance on text-based tasks, and what are the key advancements in its development?
"""

In [30]:
query = """
 - What are the critical difference between various graphs for Class A Share?
 - Which index best matches Class A share performance closely where Google is not already a part? Explain the reasoning.
 - Identify key chart patterns for Google Class A shares.
 - What is cost of revenues, operating expenses and net income for 2020. Do mention the percentage change
 - What was the effect of Covid in the 2020 financial year?
 - What are the total revenues for APAC and USA for 2021?
 - What is deferred income taxes?
 - How do you compute net income per share?
 - What drove percentage change in the consolidated revenue and cost of revenue for the year 2021 and was there any effect of Covid?
 - What is the cause of 41% increase in revenue from 2020 to 2021 and how much is dollar change?
"""

### Get Retrieved documents

In [41]:
# List of source documents
docs = retriever_multi_vector_img.get_relevant_documents(query, limit=10)

source_docs = split_image_text_types(docs)

print(source_docs["texts"])

for i in source_docs["images"]:
    display(Image(base64.b64decode(i)))

['2024\n\n2403.08295v4 [cs.CL] 16 Apr\n\narXiv\n\nGoogle DeepMind\n\n2024-02-21\n\nGemma: Open Models Based on Gemini Research and Technology\n\nGemma Team, Google DeepMind!\n\nThis work introduces Gemma, a family of lightweight, state-of-the art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.\n\n(2', 'Table 3 | Relevant formatting control to

### Get generative response

In [42]:
result = chain_multimodal_rag.invoke(query)

Markdown(result)

Based on the provided information, the Gemma family of open models demonstrates strong performance on text-based tasks compared to similarly-sized open models. Here's a breakdown:

**Performance:**

* **Outperforms:** Gemma models outperform similarly sized open models on 11 out of 18 text-based tasks. This suggests they are highly competitive in terms of language understanding, reasoning, and overall performance.
* **Benchmark Performance:** The paper mentions strong performance across academic benchmarks, indicating the models have been rigorously tested and validated.

**Key Advancements:**

* **Gemini Research and Technology:** Gemma models are built using research and technology from the Gemini models, which are known for their advanced capabilities. This suggests Gemma models benefit from cutting-edge advancements in language modeling.
* **Safety and Responsibility:** The paper emphasizes the importance of responsible release and provides comprehensive evaluations of safety and responsibility aspects. This indicates a focus on ethical considerations and mitigating potential risks associated with large language models.
* **Detailed Model Development:** The paper includes a detailed description of model development, providing insights into the design choices and training processes. This transparency is valuable for understanding the model's strengths and limitations.

**Investment Advice:**

While the provided information is limited, the strong performance and focus on safety and responsibility suggest Gemma models have potential for various applications. However, it's important to consider the following:

* **Model Size:** The paper mentions two sizes (billion and 7 billion parameters).  Understanding the specific performance differences between these sizes is crucial for choosing the right model for your needs.
* **Specific Tasks:**  The paper mentions 11 out of 18 text-based tasks where Gemma models outperform.  Knowing the specific tasks where they excel is essential for determining if they are suitable for your application.
* **Future Development:** The paper highlights the importance of responsible release and innovation.  Continued development and improvements in Gemma models are likely, which could further enhance their capabilities.

**Overall:**

The Gemma family of open models shows promise for text-based tasks.  Further research and analysis are needed to fully assess their potential and suitability for specific applications.  However, the strong performance, focus on safety, and detailed development information suggest they are worth considering for investment in the field of open language models. 


## Conclusions

Congratulations on making it through this multimodal RAG notebook!

While multimodal RAG can be quite powerful, note that it can face some limitations:

* **Data dependency:** Needs high-accuracy data from the text and visuals.
* **Computationally demanding:** Generating embeddings from multimodal data is resource-intensive.
* **Domain specific:** Models trained on general data may not shine in specialized fields like medicine.
* **Black box:** Understanding how these models work can be tricky, hindering trust and adoption.


Despite these challenges, multimodal RAG represents a significant step towards search and retrieval systems that can handle diverse, multimodal data.