In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Multimodal Retrieval Augmented Generation (RAG) with Gemini, Vertex AI, and LangChain

<table align="left">
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fretrieval-augmented-generation%2F%2Fmultimodal_rag_langchain.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
    <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb">
      <img width="32px" src="https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

| | | 
|-|-|
|Author(s) | [Holt Skinner](https://github.com/holtskinner) |

## Overview

Retrieval augmented generation (RAG) has become a popular paradigm for giving LLMs access to external data and for grounding their responses to mitigate hallucinations.

In this notebook, you will learn how to perform multimodal RAG by running Q&A over a financial document that contains both text and images.

### Gemini

Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases. The Gemini API gives you access to the Gemini 1.0 Pro Vision and Gemini 1.0 Pro models.

### Comparing text-based and multimodal RAG

Multimodal RAG offers several advantages over text-based RAG:

1. **Enhanced knowledge access:** Multimodal RAG can access and process both textual and visual information, providing a richer and more comprehensive knowledge base for the LLM.
2. **Improved reasoning capabilities:** By incorporating visual cues, multimodal RAG can make better informed inferences across different types of data modalities.

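This notebook shows you how to use RAG with the Vertex AI Gemini API to build a document search engine. The retrieval index in this notebook is built from text embeddings (`textembedding-gecko`) of Gemini-generated summaries; Vertex AI also offers [multimodal embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/multimodal-embeddings), which can embed images directly.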

Through hands-on examples, you will discover how to construct a multimedia-rich metadata repository of your document sources, enabling search, comparison, and reasoning across diverse information streams.

### Objectives

This notebook provides a guide to building a document search engine using multimodal retrieval augmented generation (RAG), step by step:

1. Extract and store metadata of documents containing both text and images, and generate embeddings for the documents
2. Search the metadata with text queries to find similar text or images
3. Search the metadata with image queries to find similar images
4. Using a text query as input, search for contextual answers using both text and images

### Costs

This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.


## Getting Started


### Install Vertex AI SDK for Python and other dependencies


In [8]:
%pip install -U -q google-cloud-aiplatform langchain-google-vertexai langchain-community langchain-text-splitters langchain-experimental "unstructured[all-docs]" pypdf pydantic lxml pillow matplotlib opencv-python chromadb tiktoken


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3.1[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [None]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>



### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).


In [5]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Define Google Cloud project information

In [1]:
PROJECT_ID = "cloud-llm-preview1"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}
# For Vector Search Staging
GCS_BUCKET = "gs://holtskinner-test-datasets"  # @param {type:"string"}

### Initialize the Vertex AI SDK

In [2]:
import vertexai
from google.cloud import aiplatform

vertexai.init(project=PROJECT_ID, location=LOCATION)
aiplatform.init(project=PROJECT_ID, location=LOCATION, staging_bucket=GCS_BUCKET)

### Import libraries


In [2]:
import base64
import os
from typing import List, Tuple

from langchain_text_splitters import CharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain_google_vertexai import VertexAI
from langchain_google_vertexai import ChatVertexAI
from langchain_core.messages import AIMessage
from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda

from unstructured.partition.pdf import partition_pdf

## Data Loading

### Get documents and images from GCS

In [None]:
# Download documents and images used in this notebook
!gsutil -m rsync -r gs://github-repo/rag/intro_multimodal_rag/ .
print("Download completed")

## Partition PDF tables, text, and images

### The data

The source data that you will use in this notebook is a modified version of [Google-10K](https://abc.xyz/assets/investor/static/pdf/20220202_alphabet_10K.pdf) which provides a comprehensive overview of the company's financial performance, business operations, management, and risk factors. As the original document is rather large, you will be using [a modified version with only 14 pages](https://storage.googleapis.com/github-repo/rag/multimodal_rag_langchain/google-10k-sample-14pages.pdf) instead. Although it's truncated, the sample document still contains text along with images such as tables, charts, and graphs.

In [9]:
pdf_folder_path = "/content/data/" if "google.colab" in sys.modules else "data/"
pdf_file_name = "google-10k-sample-14pages.pdf"

# Extract images, tables, and chunk text from a PDF file.
raw_pdf_elements = partition_pdf(
    filename=pdf_file_name,
    extract_images_in_pdf=False,
    infer_table_structure=True,
    chunking_strategy="by_title",
    max_characters=4000,
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
    image_output_dir_path=pdf_folder_path,
)

# Categorize extracted elements from a PDF into tables and texts.
tables = []
texts = []
for element in raw_pdf_elements:
    if "unstructured.documents.elements.Table" in str(type(element)):
        tables.append(str(element))
    elif "unstructured.documents.elements.CompositeElement" in str(type(element)):
        texts.append(str(element))

# Optional: Enforce a specific token size for texts.
# Note: chunk_size is 10,000 tokens here, despite the `texts_4k_token` variable name.
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=10000, chunk_overlap=0
)
joined_texts = " ".join(texts)
texts_4k_token = text_splitter.split_text(joined_texts)

# # Chunking is not needed with a large context window.
# joined_texts = [" ".join(texts)]

This function will be deprecated in a future release and `unstructured` will simply use the DEFAULT_MODEL from `unstructured_inference.model.base` to set default model name
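The optional splitter step above caps chunk size by token count. As a rough illustration of the greedy packing idea (pieces are added to a chunk until the budget would be exceeded), here is a simplified stand-in. It is not LangChain's implementation: it counts whitespace-separated words instead of tiktoken tokens, and `greedy_merge` and `word_count` are hypothetical names introduced only for this sketch.

```python
from typing import Callable, List


def greedy_merge(
    pieces: List[str],
    budget: int,
    length: Callable[[str], int],
    sep: str = "\n\n",
) -> List[str]:
    """Pack consecutive pieces into chunks whose total length stays within budget.

    In this sketch, a single piece larger than the budget is kept whole
    as its own chunk rather than being split further.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for piece in pieces:
        piece_len = length(piece)
        # Close the current chunk if adding this piece would exceed the budget.
        if current and current_len + piece_len > budget:
            chunks.append(sep.join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += piece_len
    if current:
        chunks.append(sep.join(current))
    return chunks


def word_count(text: str) -> int:
    """Crude length function: whitespace-separated words (an assumption, not tokens)."""
    return len(text.split())


paragraphs = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota", "kappa"]
chunks = greedy_merge(paragraphs, budget=5, length=word_count)
for chunk in chunks:
    print(repr(chunk))
```

A larger budget yields fewer, longer chunks, which is the same trade-off `chunk_size` controls in the cell above.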


In [10]:
# Generate summaries of text elements
def generate_text_summaries(
    texts: List[str], tables: List[str], summarize_texts: bool = False
) -> Tuple[List, List]:
    """
    Summarize text elements
    texts: List of str
    tables: List of str
    summarize_texts: Bool to summarize texts
    """

    # Prompt
    prompt_text = """You are an assistant tasked with summarizing tables and text for retrieval. \
    These summaries will be embedded and used to retrieve the raw text or table elements. \
    Give a concise summary of the table or text that is well optimized for retrieval. Table or text: {element} """
    prompt = PromptTemplate.from_template(prompt_text)
    empty_response = RunnableLambda(
        lambda x: AIMessage(content="Error processing document")
    )
    # Text summary chain
    model = VertexAI(
        temperature=0, model_name="gemini-1.5-pro-preview-0215", max_output_tokens=1024
    ).with_fallbacks([empty_response])
    summarize_chain = {"element": lambda x: x} | prompt | model | StrOutputParser()

    # Initialize empty summaries
    text_summaries = []
    table_summaries = []

    # Apply to text if texts are provided and summarization is requested
    if texts:
        if summarize_texts:
            text_summaries = summarize_chain.batch(texts, {"max_concurrency": 1})
        else:
            text_summaries = texts

    # Apply to tables if tables are provided
    if tables:
        table_summaries = summarize_chain.batch(tables, {"max_concurrency": 1})

    return text_summaries, table_summaries


# Get text, table summaries
text_summaries, table_summaries = generate_text_summaries(
    texts_4k_token, tables, summarize_texts=True
)

In [12]:
table_summaries

['## Summary of Stock Purchase Program (October - December)\n\nThis table details a stock purchase program over three months, showing the number and value of Class A and Class C shares purchased, along with average prices. \n\n**Key Points:**\n\n* A total of 665,000 shares were purchased for $4.672 million.\n* More Class A shares were purchased than Class C shares.\n* The average price per share was slightly higher for Class C shares.\n* The number of shares purchased and total value spent decreased each month.\n\n**Additional Information:**\n\n* The table does not specify the company or the purpose of the stock purchase program.\n* There may be additional shares available for purchase in the future.\n',
 '## Unvested Restricted Stock Units Summary:\n\n**As of December 31, 2021, there were 16,894,713 unvested restricted stock units with a weighted average fair value of $1,626.13 per share.** This represents a decrease from 2020, when there were 19,288,793 unvested units with a fair val

In [13]:
text_summaries

['## Alphabet Inc. 2021 Annual Report (10-K) Summary: Market & Financial Performance\n\nThis document summarizes Alphabet Inc.\'s 2021 annual report, focusing on market performance, stock information, and key financial results.\n\n**Market & Stock:**\n\n* Alphabet Inc. became the successor issuer of Google Inc. in 2015.\n* Class A common stock trades on Nasdaq under "GOOGL" and "GOOG".\n* Class C capital stock trades on Nasdaq under "GOOG".\n* Class B common stock is not publicly traded.\n* No cash dividends have been declared or paid.\n* The company actively repurchases Class A and Class C shares.\n* Both Class A and Class C stock outperformed major indices over the past 5 years.\n\n**Financial Highlights:**\n\n* **Revenues:** $257.6 billion, up 41% year-over-year, driven by Google Services and Google Cloud.\n* **Operating Income:** $78.7 billion, up 91% year-over-year.\n* **Net Income:** $76.0 billion, up 89% year-over-year.\n* **Diluted EPS:** $112.20, up 91% year-over-year.\n* **Op
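In `generate_text_summaries`, the model is wrapped with `with_fallbacks`, so a failed model call produces the placeholder text "Error processing document" instead of raising. One consequence is that placeholder strings can end up embedded alongside real summaries. The plain-Python sketch below mimics that behavior and shows one way to spot the placeholders afterwards; `summarize_all` and `flaky_summarizer` are hypothetical names, and no model is called.

```python
from typing import Callable, List

FALLBACK_TEXT = "Error processing document"


def summarize_all(
    elements: List[str],
    summarize: Callable[[str], str],
    fallback_text: str = FALLBACK_TEXT,
) -> List[str]:
    """Summarize each element, substituting fallback_text when a call raises."""
    summaries = []
    for element in elements:
        try:
            summaries.append(summarize(element))
        except Exception:
            summaries.append(fallback_text)
    return summaries


def flaky_summarizer(text: str) -> str:
    """Hypothetical stand-in for a model call that rejects empty input."""
    if not text.strip():
        raise ValueError("empty element")
    return text[:20]


results = summarize_all(["Revenue rose in 2021.", "   "], flaky_summarizer)

# Indices whose summary is only the placeholder; these are candidates to retry or drop.
failed = [i for i, summary in enumerate(results) if summary == FALLBACK_TEXT]
print(results, failed)
```

Filtering on the placeholder before indexing keeps error strings out of the vector store.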

In [20]:
def encode_image(image_path):
    """Getting the base64 string"""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def image_summarize(img_base64, prompt):
    """Make image summary"""
    model = ChatVertexAI(
        model_name="gemini-pro-vision", max_output_tokens=1024
    )

    msg = model.invoke(
        [
            HumanMessage(
                content=[
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{img_base64}"},
                    },
                ]
            )
        ]
    )
    return msg.content


def generate_img_summaries(path):
    """
    Generate summaries and base64 encoded strings for images
    path: Directory containing the .png images to summarize
    """

    # Store base64 encoded images
    img_base64_list = []

    # Store image summaries
    image_summaries = []

    # Prompt
    prompt = """You are an assistant tasked with summarizing images for retrieval. \
    These summaries will be embedded and used to retrieve the raw image. \
    Give a concise summary of the image that is well optimized for retrieval.
    If it's a table, extract all elements of the table.
    If it's a graph, explain the findings in the graph.
    Do not include any numbers that are not mentioned in the image.
    """

    # Apply to images
    for img_file in sorted(os.listdir(path)):
        if img_file.endswith(".png"):
            img_path = os.path.join(path, img_file)
            base64_image = encode_image(img_path)
            img_base64_list.append(base64_image)
            image_summaries.append(image_summarize(base64_image, prompt))

    return img_base64_list, image_summaries


# Image summaries
img_base64_list, image_summaries = generate_img_summaries(".")

In [21]:
image_summaries

[' The image is a line chart that shows the stock price of Alphabet Inc. Class A, RDG Internet Composite, S&P 500, and NASDAQ Composite from December 2016 to December 2021. The stock price of Alphabet Inc. Class A and RDG Internet Composite outperformed the S&P 500 and NASDAQ Composite during this time period.',
 ' The image shows a table with four rows. The first row is labeled "TAC" and contains the values "$32,778" and "$45,566" in the "2020" and "2021" columns, respectively. The second row is labeled "Other cost of revenues" and contains the values "$51,954" and "$65,373" in the "2020" and "2021" columns, respectively. The third row is labeled "Total cost of revenues" and contains the values "$84,732" and "$110,939" in the "2020" and "2021" columns, respectively. The fourth row is labeled "Total cost of revenues as a percentage of revenues" and contains the values "46.4%" and "43.1%" in the "2020" and "2021" columns, respectively.']
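`image_summarize` hard-codes `image/png` in the data URL it sends to the model. If the image folder ever holds mixed formats, the MIME type could instead be derived from the first bytes of each file. The following is a minimal sketch of that idea using only the standard library; `sniff_mime` and `to_data_url` are hypothetical helper names, and the signature table covers only three formats.

```python
import base64

# Leading bytes ("magic numbers") for the formats this sketch recognizes.
_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF8": "image/gif",
}


def sniff_mime(data: bytes, default: str = "application/octet-stream") -> str:
    """Return a MIME type based on the leading bytes, or default if unknown."""
    for signature, mime in _SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return default


def to_data_url(data: bytes) -> str:
    """Build a base64 data URL whose MIME type matches the actual bytes."""
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{sniff_mime(data)};base64,{encoded}"


# A fake payload that starts with the PNG signature; not a complete image.
png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 4
url = to_data_url(png_header)
print(url)
```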

In [23]:
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_core.documents import Document
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_google_vertexai import VectorSearchVectorStore

In [24]:
def create_multi_vector_retriever(
    vectorstore, text_summaries, texts, table_summaries, tables, image_summaries, images
):
    """
    Create retriever that indexes summaries, but returns raw images or texts
    """

    # Initialize the storage layer
    store = InMemoryStore()
    id_key = "doc_id"

    # Create the multi-vector retriever
    retriever = MultiVectorRetriever(
        vectorstore=vectorstore,
        docstore=store,
        id_key=id_key,
    )

    # Helper function to add documents to the vectorstore and docstore
    def add_documents(retriever, doc_summaries, doc_contents):
        doc_ids = [str(uuid.uuid4()) for _ in doc_contents]
        summary_docs = [
            Document(page_content=s, metadata={id_key: doc_ids[i]})
            for i, s in enumerate(doc_summaries)
        ]
        retriever.vectorstore.add_documents(summary_docs)
        retriever.docstore.mset(list(zip(doc_ids, doc_contents)))

    # Add texts, tables, and images
    # Check that text_summaries is not empty before adding
    if text_summaries:
        add_documents(retriever, text_summaries, texts)
    # Check that table_summaries is not empty before adding
    if table_summaries:
        add_documents(retriever, table_summaries, tables)
    # Check that image_summaries is not empty before adding
    if image_summaries:
        add_documents(retriever, image_summaries, images)

    return retriever
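The function above indexes summaries but returns raw content, with a shared `doc_id` linking the two. The toy sketch below shows the same pattern without LangChain or Vertex AI. Everything in it is an assumption made for illustration: `toy_embed` is a letter-frequency stand-in for a real embedding model, and `ToyMultiVectorIndex` is a hypothetical class. Unlike the helper above, it also raises when the summary and content lists differ in length, since the pairing relies on them lining up one to one.

```python
import math
import uuid
from typing import Dict, List, Tuple


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an embedding model: a letter-frequency vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyMultiVectorIndex:
    """Search over summary vectors, but return the raw content they point to."""

    def __init__(self) -> None:
        self.vectors: List[Tuple[str, List[float]]] = []  # (doc_id, summary vector)
        self.docstore: Dict[str, str] = {}  # doc_id -> raw content

    def add(self, summaries: List[str], contents: List[str]) -> None:
        if len(summaries) != len(contents):
            raise ValueError("each summary needs exactly one raw content item")
        for summary, content in zip(summaries, contents):
            doc_id = str(uuid.uuid4())
            self.vectors.append((doc_id, toy_embed(summary)))
            self.docstore[doc_id] = content

    def search(self, query: str, k: int = 1) -> List[str]:
        query_vector = toy_embed(query)
        ranked = sorted(
            self.vectors, key=lambda item: cosine(query_vector, item[1]), reverse=True
        )
        return [self.docstore[doc_id] for doc_id, _ in ranked[:k]]


index = ToyMultiVectorIndex()
index.add(
    summaries=["stock price line chart", "cost of revenues table"],
    contents=["<base64 image bytes>", "TAC | Other cost of revenues | Total"],
)
top = index.search("table of revenue costs")
print(top)
```

The query is matched against the short summaries, yet the caller receives the raw table text (or image bytes), which is what gets passed to the model later in this notebook.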

## Create & Deploy Vertex AI Vector Search Index & Endpoint

Skip this step if you already have Vector Search set up.

- https://console.cloud.google.com/vertex-ai/matching-engine/indexes

- Create [`MatchingEngineIndex`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndex)
  - https://cloud.google.com/vertex-ai/docs/vector-search/create-manage-index

In [3]:
# https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings
DIMENSIONS = 768  # Dimensions output from textembedding-gecko

index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="mm_rag_langchain_index",
    dimensions=DIMENSIONS,
    approximate_neighbors_count=150,
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
    description="Multimodal RAG LangChain Index",
)

Creating MatchingEngineIndex
Create MatchingEngineIndex backing LRO: projects/908687846511/locations/us-central1/indexes/5333409848946065408/operations/6832789700848123904
MatchingEngineIndex created. Resource name: projects/908687846511/locations/us-central1/indexes/5333409848946065408
To use this MatchingEngineIndex in another session:
index = aiplatform.MatchingEngineIndex('projects/908687846511/locations/us-central1/indexes/5333409848946065408')


- Create [`MatchingEngineIndexEndpoint`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndexEndpoint)
  - https://cloud.google.com/vertex-ai/docs/vector-search/deploy-index-public

In [4]:
index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="mm_rag_langchain_index_endpoint",
    description="Multimodal RAG LangChain Index Endpoint",
    public_endpoint_enabled=True,
)

Creating MatchingEngineIndexEndpoint
Create MatchingEngineIndexEndpoint backing LRO: projects/908687846511/locations/us-central1/indexEndpoints/7088687803713716224/operations/2729333141150892032
MatchingEngineIndexEndpoint created. Resource name: projects/908687846511/locations/us-central1/indexEndpoints/7088687803713716224
To use this MatchingEngineIndexEndpoint in another session:
index_endpoint = aiplatform.MatchingEngineIndexEndpoint('projects/908687846511/locations/us-central1/indexEndpoints/7088687803713716224')


- Deploy Index to Index Endpoint
  - NOTE: You can stop this cell after starting it instead of waiting for deployment
  - You can check the status at https://console.cloud.google.com/vertex-ai/matching-engine/indexes 

In [5]:
index_endpoint = index_endpoint.deploy_index(index=index, deployed_index_id="mm_rag_langchain_deployed_index")
index_endpoint.deployed_indexes

Deploying index MatchingEngineIndexEndpoint index_endpoint: projects/908687846511/locations/us-central1/indexEndpoints/7088687803713716224
Deploy index MatchingEngineIndexEndpoint index_endpoint backing LRO: projects/908687846511/locations/us-central1/indexEndpoints/7088687803713716224/operations/4152866447585968128


KeyboardInterrupt: 

In [None]:
# If the request or runtime times out, then use the index endpoint number shown at:
# https://console.cloud.google.com/vertex-ai/matching-engine/index-endpoints
INDEX_ENDPOINT_ID = "948680622677688320"  # @param {type:"string"}
index_endpoint = aiplatform.MatchingEngineIndexEndpoint(INDEX_ENDPOINT_ID)
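The creation log earlier in this section prints the endpoint as a fully qualified resource name, while the cell above passes a bare numeric ID, which relies on the project and location set in `aiplatform.init`. When copying values between sessions or projects, it can help to convert between the two forms explicitly. The helpers below are a small sketch based only on the name format visible in that log output; the function names are hypothetical and nothing here calls Google Cloud.

```python
import re

# Format taken from the log output above:
# projects/<project>/locations/<location>/indexEndpoints/<id>
_ENDPOINT_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/indexEndpoints/(?P<endpoint_id>\d+)$"
)


def endpoint_resource_name(project: str, location: str, endpoint_id: str) -> str:
    """Build a fully qualified index endpoint resource name."""
    return f"projects/{project}/locations/{location}/indexEndpoints/{endpoint_id}"


def parse_endpoint_resource_name(name: str) -> dict:
    """Split a resource name into project, location, and endpoint_id."""
    match = _ENDPOINT_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an index endpoint resource name: {name!r}")
    return match.groupdict()


name = endpoint_resource_name("908687846511", "us-central1", "7088687803713716224")
parts = parse_endpoint_resource_name(name)
print(name)
print(parts)
```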

## Load `VectorSearchVectorStore`


In [66]:
# # The vectorstore to use to index the summaries
# vectorstore = VectorSearchVectorStore.from_components(
#     project_id=PROJECT_ID,
#     region=LOCATION,
#     gcs_bucket_name="holtskinner-test",
#     index_id="langchain_mm_rag_deployed__1710182761418",
#     endpoint_id="6649108552487010304",
#     embedding=VertexAIEmbeddings(model_name="textembedding-gecko@latest"),
# )

from langchain_community.vectorstores import Chroma

# The vectorstore to use to index the summaries
vectorstore = Chroma(
    collection_name="mm_rag_test",
    embedding_function=VertexAIEmbeddings(model_name="textembedding-gecko@latest"),
)

# Create retriever
retriever_multi_vector_img = create_multi_vector_retriever(
    vectorstore,
    text_summaries,
    texts_4k_token,  # the chunks that text_summaries were generated from
    table_summaries,
    tables,
    image_summaries,
    img_base64_list,
)

In [68]:
import io
import re

from IPython.display import HTML, display
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from PIL import Image


def plt_img_base64(img_base64):
    """Display base64 encoded string as image"""
    # Create an HTML img tag with the base64 string as the source
    image_html = f'<img src="data:image/jpeg;base64,{img_base64}" />'
    # Display the image by rendering the HTML
    display(HTML(image_html))


def looks_like_base64(sb):
    """Check if the string looks like base64"""
    return re.match("^[A-Za-z0-9+/]+[=]{0,2}$", sb) is not None


def is_image_data(b64data):
    """
    Check if the base64 data is an image by looking at the start of the data
    """
    image_signatures = {
        b"\xFF\xD8\xFF": "jpg",
        b"\x89\x50\x4E\x47\x0D\x0A\x1A\x0A": "png",
        b"\x47\x49\x46\x38": "gif",
        b"\x52\x49\x46\x46": "webp",
    }
    try:
        header = base64.b64decode(b64data)[:8]  # Decode and get the first 8 bytes
        for sig in image_signatures:
            if header.startswith(sig):
                return True
        return False
    except Exception:
        return False


def resize_base64_image(base64_string, size=(128, 128)):
    """
    Resize an image encoded as a Base64 string
    """
    # Decode the Base64 string
    img_data = base64.b64decode(base64_string)
    img = Image.open(io.BytesIO(img_data))

    # Resize the image
    resized_img = img.resize(size, Image.LANCZOS)

    # Save the resized image to a bytes buffer
    buffered = io.BytesIO()
    resized_img.save(buffered, format=img.format)

    # Encode the resized image to Base64
    return base64.b64encode(buffered.getvalue()).decode("utf-8")


def split_image_text_types(docs):
    """
    Split base64-encoded images and texts
    """
    b64_images = []
    texts = []
    for doc in docs:
        # Check if the document is of type Document and extract page_content if so
        if isinstance(doc, Document):
            doc = doc.page_content
        if looks_like_base64(doc) and is_image_data(doc):
            doc = resize_base64_image(doc, size=(1300, 600))
            b64_images.append(doc)
        else:
            texts.append(doc)
    return {"images": b64_images, "texts": texts}


def img_prompt_func(data_dict):
    """
    Join the context into a single string
    """
    formatted_texts = "\n".join(data_dict["context"]["texts"])
    messages = []

    # Adding the text for analysis
    text_message = {
        "type": "text",
        "text": (
            "You are a financial analyst tasked with providing investment advice.\n"
            "You will be given a mix of text, tables, and image(s), usually of charts or graphs.\n"
            "Use this information to provide investment advice related to the user question.\n"
            f"User-provided question: {data_dict['question']}\n\n"
            "Text and / or tables:\n"
            f"{formatted_texts}"
        ),
    }
    messages.append(text_message)
    # Adding image(s) to the messages if present
    if data_dict["context"]["images"]:
        for image in data_dict["context"]["images"]:
            image_message = {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image}"},
            }
            messages.append(image_message)
    return [HumanMessage(content=messages)]


def multi_modal_rag_chain(retriever):
    """
    Multi-modal RAG chain
    """

    # Multi-modal LLM
    model = ChatVertexAI(
        temperature=0, model_name="gemini-pro-vision", max_output_tokens=1024
    )

    # RAG pipeline
    chain = (
        {
            "context": retriever | RunnableLambda(split_image_text_types),
            "question": RunnablePassthrough(),
        }
        | RunnableLambda(img_prompt_func)
        | model
        | StrOutputParser()
    )

    return chain


# Create RAG chain
chain_multimodal_rag = multi_modal_rag_chain(retriever_multi_vector_img)

In [69]:
query = """
 - What are the critical difference between various graphs for Class A Share?
 - Which index best matches Class A share performance closely where Google is not already a part? Explain the reasoning.
 - Identify key chart patterns for Google Class A shares.
 - What is cost of revenues, operating expenses and net income for 2020. Do mention the percentage change
 - What was the effect of Covid in the 2020 financial year?
 - What are the total revenues for APAC and USA for 2021?
 - What is deferred income taxes?
 - How do you compute net income per share?
 - What drove percentage change in the consolidated revenue and cost of revenue for the year 2021 and was there any effect of Covid?
 - What is the cause of 41% increase in revenue from 2020 to 2021 and how much is dollar change?

"""

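# `limit` is not a retriever parameter and appears to be ignored, so it is dropped here.
# To change the number of results, adjust `retriever_multi_vector_img.search_kwargs`.
docs = retriever_multi_vector_img.invoke(query)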

result = chain_multimodal_rag.invoke(query)

### Print Retrieved documents

In [70]:
source_docs = split_image_text_types(docs)

print(source_docs["texts"])

['source: https://abc.xyz/assets/investor/static/pdf/20220202_alphabet_10K.pdf\n\nMARKET FOR REGISTRANT’S COMMON EQUITY, RELATED STOCKHOLDER MATTERS AND ISSUER PURCHASES OF EQUITY SECURITIES\n\nAs of October 2, 2015, Alphabet Inc. became the successor issuer of Google Inc. pursuant to Rule 12g-3(a) under the Exchange Act. Our Class A common stock has been listed on the Nasdaq Global Select Market under the symbol “GOOG” since August 19, 2004 and under the symbol "GOOGL" since April 3, 2014. Prior to August 19, 2004, there was no public market for our stock. Our Class B common stock is neither listed nor traded. Our Class C capital stock has been listed on the Nasdaq Global Select Market under the symbol “GOOG” since April 3, 2014.\n\nHolders of Record\n\nAs of December 31, 2021, there were approximately 4,907 and 1,733 stockholders of record of our Class A common stock and Class C capital stock, respectively. Because many of our shares of Class A common stock and Class C capital stock 

In [71]:
print(docs)

['source: https://abc.xyz/assets/investor/static/pdf/20220202_alphabet_10K.pdf\n\nMARKET FOR REGISTRANT’S COMMON EQUITY, RELATED STOCKHOLDER MATTERS AND ISSUER PURCHASES OF EQUITY SECURITIES\n\nAs of October 2, 2015, Alphabet Inc. became the successor issuer of Google Inc. pursuant to Rule 12g-3(a) under the Exchange Act. Our Class A common stock has been listed on the Nasdaq Global Select Market under the symbol “GOOG” since August 19, 2004 and under the symbol "GOOGL" since April 3, 2014. Prior to August 19, 2004, there was no public market for our stock. Our Class B common stock is neither listed nor traded. Our Class C capital stock has been listed on the Nasdaq Global Select Market under the symbol “GOOG” since April 3, 2014.\n\nHolders of Record\n\nAs of December 31, 2021, there were approximately 4,907 and 1,733 stockholders of record of our Class A common stock and Class C capital stock, respectively. Because many of our shares of Class A common stock and Class C capital stock 

In [72]:
from IPython.display import Markdown

Markdown(result)


 **1. What are the critical differences between various graphs for Class A Share?**

The critical differences between the various graphs for Class A Share are:

- The line graph shows the stock price over time, while the bar graph shows the total volume of shares traded over time.
- The pie chart shows the percentage of shares held by different types of investors, while the table shows the financial performance of the company.

**2. Which index best matches Class A share performance closely where Google is not already a part? Explain the reasoning.**

The index that best matches Class A share performance closely where Google is not already a part is the S&P 500. The S&P 500 is a stock market index that tracks the 500 largest publicly traded companies in the United States. Google is a member of the S&P 500, and its performance is closely correlated with the performance of the index.

**3. Identify key chart patterns for Google Class A shares.**

The key chart patterns for Google Class A shares are:

- An uptrend since the company's IPO in 2004
- A series of higher highs and higher lows
- A strong correlation with the S&P 500
- A beta of 1.2, which means that the stock is more volatile than the S&P 500

**4. What is cost of revenues, operating expenses and net income for 2020. Do mention the percentage change**

The cost of revenues for 2020 was $84,732 million, the operating expenses were $43,564 million, and the net income was $40,269 million. The percentage change in cost of revenues from 2019 to 2020 was 46.4%, the percentage change in operating expenses from 2019 to 2020 was 23.6%, and the percentage change in net income from 2019 to 2020 was 16.4%.

**5. What was the effect of Covid in the 2020 financial year?**

The effect of Covid in the 2020 financial year was significant. The company's revenue declined by 13% year-over-year, and its net income declined by 21% year-over-year. The decline in revenue was due to a decrease in advertising spending by businesses, while the decline in net income was due to an increase in expenses related to the pandemic.

**6. What are the total revenues for APAC and USA for 2021?**

The total revenues for APAC and USA for 2021 were $20,462 million and $51,297 million, respectively.

**7. What is deferred income taxes?**

Deferred income taxes are taxes that are owed on income that has been earned but not yet received. These taxes are recorded as a liability on the balance sheet and are paid when the income is received.

**8. How do you compute net income per share?**

Net income per share is computed by dividing the net income by the number of shares outstanding.

**9. What drove percentage change in the consolidated revenue and cost of revenue for the year 2021 and was there any effect of Covid?**

The percentage change in the consolidated revenue and cost of revenue for the year 2021 was driven by a number of factors, including the growth of the company's advertising business, the launch of new products and services, and the impact of the COVID-19 pandemic. The COVID-19 pandemic had a negative impact on the company's revenue and cost of revenue in 2021, as it led to a decrease in advertising spending by businesses and an increase in the cost of doing business.

**10. What is the cause of 41% increase in revenue from 2020 to 2021 and how much is dollar change?**

The 41% increase in revenue from 2020 to 2021 was due to a number of factors, including the growth of the company's advertising business, the launch of new products and services, and the impact of the COVID-19 pandemic. The COVID-19 pandemic had a positive impact on the company's revenue in 2021, as it led to an increase in the demand for the company's products and services. The dollar change in revenue from 2020 to 2021 was $12,788 million.
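Generated answers like the ones above are worth cross-checking against the retrieved context. Answer 10 reports a dollar change of $12,788 million, while the text summary printed earlier in this notebook gives revenues of $257.6 billion, up 41% year-over-year. Taking those two summary figures at face value (they are themselves model-generated and should be confirmed against the PDF), a short calculation shows what dollar change they imply. The function name is hypothetical.

```python
from typing import Tuple


def implied_prior_and_change(current: float, pct_increase: float) -> Tuple[float, float]:
    """Given a current value and its percent increase, return (prior value, absolute change)."""
    prior = current / (1 + pct_increase / 100)
    return prior, current - prior


revenue_2021_bn = 257.6  # from the text summary shown earlier in this notebook
growth_pct = 41.0  # from the same summary (a rounded figure)

prior_bn, change_bn = implied_prior_and_change(revenue_2021_bn, growth_pct)

generated_change_bn = 12.788  # "$12,788 million" as stated in answer 10
relative_gap = abs(generated_change_bn - change_bn) / change_bn

print(f"Implied 2020 revenue: ${prior_bn:.1f}B")
print(f"Implied dollar change: ${change_bn:.1f}B")
print(f"Generated answer differs from implied change by {relative_gap:.0%}")
```

The implied change is roughly $75 billion, so the figure in answer 10 is not consistent with the summary and should be verified against the source document.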

## Conclusions

Congratulations on making it through this multimodal RAG notebook!

While multimodal RAG can be quite powerful, it has some limitations:

* **Data dependency:** Needs high-quality paired text and visuals.
* **Computationally demanding:** Processing multimodal data is resource-intensive.
* **Domain specific:** Models trained on general data may not shine in specialized fields like medicine.
* **Black box:** Understanding how these models work can be tricky, hindering trust and adoption.


Despite these challenges, multimodal RAG represents a significant step towards search and retrieval systems that can handle diverse, multimodal data.