In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Interactive Loan Application Assistant (Financial Services)


<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/multimodal-live-api/real_time_rag_bank_loans_gemini_2_0.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fmultimodal-live-api%2Freal_time_rag_bank_loans_gemini_2_0.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/multimodal-live-api/real_time_rag_bank_loans_gemini_2_0.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/multimodal-live-api/real_time_rag_bank_loans_gemini_2_0.ipynb">
      <img width="32px" src="https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

<br/>

<br/>

<div style="clear: both;"></div>

| | |
|-|-|
| Author(s) | [Koushik Ghosh](https://github.com/Koushik25feb) |

<div class="alert alert-block alert-warning">
<b>
⚠️ Gemini 2.0 Flash (Model ID: <code>gemini-2.0-flash-exp</code>) and the Google Gen AI SDK are currently experimental and output can vary ⚠️</b>
</div>

## Overview

Navigating complex loan documents can be overwhelming for users, with lengthy terms, intricate conditions, and financial jargon that often create confusion. Our Interactive Loan Application Assistant demo app is designed to simplify this process using Gemini 2.0.

By leveraging cutting-edge **text-to-text** capabilities, this solution enhances accessibility, enabling users to:

- Make informed financial decisions with clarity.
- Save time by avoiding the need for manual document review.
- Ensure transparency and confidence when dealing with critical loan information.

Whether it's for individuals seeking to understand their loan agreements or financial advisors assisting clients, this demo app bridges the gap between complex documentation and user comprehension, making financial services more transparent and user-centric.

This notebook provides a comprehensive demonstration of how Gemini 2.0 can act as your personal file assistant across various storage platforms (local storage, Google Cloud Storage, and Google Drive). It empowers users to seamlessly understand and interact with their loan documents. This notebook focuses on two key features:

- **Retrieval Augmented Generation (RAG):** Text output generation grounded in the provided documents.
  - **Multimodal Live API:** Text output generation.

- **Large Context Window (entire document):** Text output generation over the full document in a single request.
  - **Multimodal Live API:** Text output generation.


## High-Level Flow

This notebook simplifies understanding complex loan documents using Gemini. Here's how it works:

1.   Initialization:
      * Installs required libraries (like tools for reading PDFs and using Gemini).
      * Connects to your Google Cloud account for secure access.
      * Chooses the Gemini model for generating answers.

2.   Document Processing:
      * Loads your loan documents (from cloud storage or your computer).
      * Divides documents into smaller, manageable chunks.
      * Creates searchable representations of chunks for easy retrieval.

3. Question Answering:
      * You ask a question related to the loan documents.
      * The system quickly finds relevant parts of the documents.
      * The Gemini model combines your question and relevant information to generate an answer.
      * The answer is presented to you in text format.

4. Output:
      * The answer is returned as text and can also be played back as audio.
      * Audio playback is useful for accessibility or personal preference.

5. Large Context Window:
      * For longer documents, the entire content can be analyzed at once.
      * This might be slower, but can provide more comprehensive answers.

**More in-depth technical details are in the code below.**


## Get started

### Install Google Gen AI SDK


In [None]:
# Install the SDK
%pip install google-genai==0.1.0 PyPDF2


### Authenticate your notebook environment (Colab only)

If you're running this notebook on Google Colab, run the cell below to authenticate your environment.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
import os

PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

## Essential imports and model configuration

### Import libraries

In [None]:
import os
import subprocess
from typing import Any

from IPython.display import Audio, Markdown, display
import PyPDF2
import gcsfs
from google import genai
from google.cloud import storage
from google.genai.types import (
    EmbedContentConfig,
    GenerateContentConfig,
    LiveConnectConfig,
    Retrieval,
    Tool,
    VertexAISearch,
    VertexRagStore,
)
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from tenacity import retry, stop_after_attempt, wait_random_exponential

### Define Gemini 2.0 model and text embedding

In [None]:
MODEL_ID = "gemini-2.0-flash-exp"  # @param {type: "string"}

MODEL = (
    f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}"
)
text_embedding_model = "text-embedding-004"  # @param {type:"string", isTemplate: true}

### Initialize the Gen AI client

* The client is used to call the Vertex AI Gen AI APIs.
* `vertexai=True` indicates that the client should communicate with the Vertex AI API endpoints.

In [None]:
client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
)

## Multimodal Live API Implementation


### Authentication and token setup

In [None]:
def get_access_token():
    """Fetches the Google Cloud access token."""
    try:
        return subprocess.check_output(
            ["gcloud", "auth", "print-access-token"], text=True
        ).strip()
    except subprocess.CalledProcessError as e:
        print(f"Error getting access token: {e}")
        return None


def get_api_endpoint():
    """Retrieves the API endpoint from the environment variable."""
    api_endpoint = os.environ.get("API_ENDPOINT")
    if not api_endpoint:
        print("Error: API_ENDPOINT environment variable not set.")
        return None
    return api_endpoint

### Multimodal Live API - Text in text out implementation

In [None]:
def generate_text(prompt: str) -> str:
    """Generates text using the specified model and prompt.

    This function uses the module-level `client` to call the Gemini model
    through `generate_content`. It takes a prompt as input and returns the
    generated text response.

    Args:
        prompt: The input prompt for text generation.

    Returns:
        The generated text response.
    """
    modality = "TEXT"
    response = client.models.generate_content(
        model=MODEL,
        contents=prompt,
        config=GenerateContentConfig(
            response_modalities=[modality],
        ),
    )
    return response.text

### Multimodal Live API - Text to Audio implementation

In [None]:
async def generate_n_play_audio(client, prompt):
    """Generates audio from text using Gemini and plays it.

    Args:
      client: The Gen AI client instance.
      prompt: The text to convert to audio.

    The model is taken from the module-level `MODEL_ID`.

    Returns:
      None. Plays the generated audio directly.
    """
    config = LiveConnectConfig(response_modalities=["AUDIO"])
    async with client.aio.live.connect(
        model=MODEL_ID,
        config=config,
    ) as session:
        text_input = prompt
        display(Markdown(f"**Input:** {text_input}"))

        await session.send(input=text_input, end_of_turn=True)

        audio_data = []
        async for message in session.receive():
            if message.server_content and message.server_content.model_turn:
                for part in message.server_content.model_turn.parts:
                    if part.inline_data:
                        audio_data.append(
                            np.frombuffer(part.inline_data.data, dtype=np.int16)
                        )

        if audio_data:
            display(Audio(np.concatenate(audio_data), rate=24000, autoplay=True))

## Quick Usage

Verify the initialization with a simple question.

In [None]:
test_prompt = "How many days are there in year 2025?"

### Text In Text Out

Quickly verify the setup before proceeding.

In [None]:
output = generate_text(test_prompt)
print(output)

### Text in Audio Out

In [None]:
await generate_n_play_audio(client, test_prompt)

## Option 1: Custom RAG (Retrieval-Augmented Generation)

This section demonstrates how to build a custom RAG system using Gemini. RAG combines information retrieval (finding relevant information) with text generation (creating answers).

## What is RAG?

Imagine having a vast library of loan documents. RAG helps you quickly find the answers to your questions within these documents. It's like having a smart assistant that can pinpoint the specific sections you need, instead of having to read everything yourself.

## **How RAG Works:**

1. Get Your Documents:

  * Specify the location of your loan documents (Google Cloud Storage or local files).

2. Create a Searchable Index:

  * The system breaks down the documents into smaller chunks of text.
  * It then creates "embeddings" – numerical representations of these chunks – which allow for fast and efficient searching. Think of it like creating a detailed index for your library, making it easy to find what you're looking for.

3. Ask Questions and Get Answers:

  * When you ask a question, the system uses the embeddings to quickly identify the most relevant chunks of text.
  * These chunks, along with your question, are given to the Gemini AI model, which generates an answer. The answer is based on the specific information within the documents, ensuring accuracy and relevance.
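The retrieval step described above can be sketched with plain NumPy. This is a minimal illustration, assuming tiny hand-written 3-dimensional vectors in place of real embeddings; `cosine_sim` and `top_k_chunks` are hypothetical helper names, not part of this notebook or any SDK.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks most similar to the query, best match first."""
    scores = [cosine_sim(query_vec, vec) for vec in chunk_vecs]
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [chunks[i] for i in order]


# Hypothetical toy "embeddings"; real ones come from an embedding model.
chunks = ["Processing fee details", "Types of home loans", "Phone banking contacts"]
chunk_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
query_vec = np.array([0.1, 0.9, 0.2])

best = top_k_chunks(query_vec, chunk_vecs, chunks, k=2)
print(best)  # ['Types of home loans', 'Phone banking contacts']
```

Only the selected chunks are passed to the model, which keeps the prompt short compared with sending every document in full.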


## **Benefits of RAG:**

  * Efficiency: RAG allows for faster and more targeted information retrieval compared to manual searching.
  * Accuracy: Answers are grounded in the provided documents, reducing the risk of hallucinations or incorrect information.
  * Flexibility: You can easily update or add new documents to the system, enhancing its capabilities over time.


## **Example:**
Let's say you want to know about the types of home loans available. Using RAG, you can simply ask the question, and the system will:

  1. Identify the relevant sections in your loan documents.
  2. Use Gemini to extract and generate an answer based on those sections.
  3. Present the answer to you in a clear and concise format.

This approach significantly streamlines the process of extracting information from complex documents, enabling more efficient and informed decision-making.
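As a rough sketch of the last two steps in the example, the retrieved sections and the question can be combined into a single prompt string. The helper name `build_rag_prompt` and the sample chunk are hypothetical and only for illustration; the citation format mirrors the `[Page N, Chunk M]` labels used later in this notebook.

```python
def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Combine retrieved chunks and the user question into one prompt string."""
    context = "\n\n".join(
        f"[Page {c['page_number']}, Chunk {c['chunk_number']}]: {c['chunk_text']}"
        for c in chunks
    )
    return (
        "Based on the following context, please answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


# Hypothetical retrieved chunk, for illustration only.
example_chunks = [
    {
        "page_number": 6,
        "chunk_number": 0,
        "chunk_text": "Example chunk text about home loans.",
    }
]
prompt = build_rag_prompt("What are the types of home loans?", example_chunks)
print(prompt)
```

Keeping the page and chunk labels in the context lets the model cite where each part of its answer came from.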


#### Get your documents



1.   Local Content
2.   GCS Files


In [None]:
# GCS paths for the demo documents, listed here for reference only.

# gs://github-repo/generative-ai/gemini2/use-cases/loan_example_documents/DEMO-BANK-LOAN-DETAILS.pdf
# gs://github-repo/generative-ai/gemini2/use-cases/loan_example_documents/Demo-bank-home-loan-agreement.pdf

document = [
    "gs://github-repo/generative-ai/gemini2/use-cases/loan_example_documents/DEMO-BANK-LOAN-DETAILS.pdf",
    "gs://github-repo/generative-ai/gemini2/use-cases/loan_example_documents/Demo-bank-home-loan-agreement.pdf",
]

In [None]:
## If your documents are stored locally, use and modify the example below.
## Note: build_index() opens paths through gcsfs, so it must be adapted for local files.

# document = [
#     "/content/DEMO-BANK-LOAN-DETAILS.pdf",
#     "/content/Demo-bank-home-loan-agreement.pdf"
# ]

In [None]:
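# Read a document from GCS


def extract_text_from_gcs(gcs_path):
    """Downloads a PDF from a gs:// path and returns its extracted text."""
    import io

    bucket_name = gcs_path.split("/")[2]
    file_name = "/".join(gcs_path.split("/")[3:])

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(file_name)

    # Download the raw PDF bytes, then extract the text page by page
    pdf_reader = PyPDF2.PdfReader(io.BytesIO(blob.download_as_bytes()))
    pages = [page.extract_text() or "" for page in pdf_reader.pages]
    return "\n".join(text.strip() for text in pages if text.strip())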

In [None]:
def extract_text_from_pdf(pdf_path):
    """Extracts text from a PDF file, skipping blanks and handling empty PDFs."""
    try:
        with open(pdf_path, "rb") as pdf_file:
            pdf_reader = PyPDF2.PdfReader(pdf_file)
            text = []

            # Check if PDF has any pages
            if not pdf_reader.pages:
                return "Error: PDF file is empty."

            for page_number in range(len(pdf_reader.pages)):
                page_text = pdf_reader.pages[page_number].extract_text()

                # Only append non-empty, non-blank pages
                if page_text and not page_text.isspace():
                    clean_text = page_text.strip()
                    if clean_text:
                        text.append(clean_text)

            # Join the extracted text from all pages into a single string
            final_text = "\n".join(text)
            return (
                final_text
                if final_text
                else "Error: No readable text found in the PDF."
            )

    except FileNotFoundError:
        return "Error: PDF file not found."
    except PyPDF2.errors.PdfReadError:
        return "Error: Could not read the PDF file. It may be corrupted or encrypted."

#### RAG Creation

This RAG pipeline splits large files into chunks, embeds each chunk with `text-embedding-004`, and keeps the results in a pandas DataFrame that acts as a simple in-memory vector store.
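The chunking step can be illustrated on its own. This is a minimal sketch of fixed-size, non-overlapping character chunking; the function name `chunk_text` is a hypothetical helper, and the 500-character default is simply the chunk size this notebook uses.

```python
def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


sample = "a" * 1200
pieces = chunk_text(sample, chunk_size=500)
print([len(p) for p in pieces])  # [500, 500, 200]
```

Fixed-size chunks are simple, but they can cut a sentence in half. Splitting on paragraph boundaries or adding overlap between chunks are common alternatives when that hurts retrieval quality.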

In [None]:
@retry(wait=wait_random_exponential(multiplier=1, max=120), stop=stop_after_attempt(4))
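def get_embeddings(
    embedding_client: Any, embedding_model: str, text: str, output_dim: int = 768
) -> list[list[float]] | None:
    """
    Generate an embedding for a single text.

    Returns a one-row 2-D list (the shape `cosine_similarity` expects), or None
    if the API reports RESOURCE_EXHAUSTED. Other errors are re-raised, which
    triggers the retry decorator.
    """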
    try:
        response = embedding_client.models.embed_content(
            model=embedding_model,
            contents=[text],
            config=EmbedContentConfig(output_dimensionality=output_dim),
        )
        return [response.embeddings[0].values]
    except Exception as e:
        if "RESOURCE_EXHAUSTED" in str(e):
            return None
        print(f"Error generating embeddings: {str(e)}")
        raise

In [None]:
def build_index(
    document_paths: list[str],
    embedding_client: Any,
    embedding_model: str,
    chunk_size: int = 500,
) -> pd.DataFrame:
    """
    Build searchable index from documents with page-wise processing.
    """
    all_chunks = []
    gcs_file_system = gcsfs.GCSFileSystem(project=PROJECT_ID)

    for doc_path in document_paths:
        try:
            with gcs_file_system.open(doc_path, "rb") as file:
                pdf_reader = PyPDF2.PdfReader(file)

                for page_num in range(len(pdf_reader.pages)):
                    page = pdf_reader.pages[page_num]
                    page_text = page.extract_text()

                    chunks = [
                        page_text[i : i + chunk_size]
                        for i in range(0, len(page_text), chunk_size)
                    ]

                    for chunk_num, chunk_text in enumerate(chunks):
                        embeddings = get_embeddings(
                            embedding_client, embedding_model, chunk_text
                        )

                        if embeddings is None:
                            print(
                                f"Warning: Could not generate embeddings for chunk {chunk_num} on page {page_num + 1}"
                            )
                            continue

                        chunk_info = {
                            "document_name": doc_path,
                            "page_number": page_num + 1,
                            "page_text": page_text,
                            "chunk_number": chunk_num,
                            "chunk_text": chunk_text,
                            "embeddings": embeddings,
                        }
                        all_chunks.append(chunk_info)

        except Exception as e:
            print(f"Error processing document {doc_path}: {str(e)}")
            continue

    if not all_chunks:
        raise ValueError("No chunks were created from the documents")

    return pd.DataFrame(all_chunks)

In [None]:
def get_relevant_chunks(
    query: str,
    vector_db: pd.DataFrame,
    embedding_client: Any,
    embedding_model: str,
    top_k: int = 3,
) -> str:
    """
    Retrieve most relevant document chunks for a query using similarity search.
    """
    try:
        query_embedding = get_embeddings(embedding_client, embedding_model, query)

        if query_embedding is None:
            return "Could not process query due to quota issues"

        similarities = [
            cosine_similarity(query_embedding, chunk_emb)[0][0]
            for chunk_emb in vector_db["embeddings"]
        ]

        # Take the top_k highest scores and order them most similar first
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        relevant_chunks = vector_db.iloc[top_indices]

        context = []
        for _, row in relevant_chunks.iterrows():
            context.append(
                {
                    "document_name": row["document_name"],
                    "page_number": row["page_number"],
                    "chunk_number": row["chunk_number"],
                    "chunk_text": row["chunk_text"],
                }
            )

        return "\n\n".join(
            [
                f"[Page {chunk['page_number']}, Chunk {chunk['chunk_number']}]: {chunk['chunk_text']}"
                for chunk in context
            ]
        )

    except Exception as e:
        print(f"Error getting relevant chunks: {str(e)}")
        return "Error retrieving relevant chunks"

In [None]:
@retry(wait=wait_random_exponential(multiplier=1, max=120), stop=stop_after_attempt(4))
def generate_answer(query: str, context: str, llm_client: Any, llm_model: str) -> str:
    """
    Generate answer using LLM with retry logic for API quota management.
    """
    try:
        # If context indicates earlier quota issues, return early
        if context in [
            "Could not process query due to quota issues",
            "Error retrieving relevant chunks",
        ]:
            return "Can't Process, Quota Issues"

        prompt = f"""Based on the following context, please answer the question.
        Include page number and chunk number in your citations when referring to specific information.

        Context:
        {context}

        Question: {query}

        Answer:"""

        response = llm_client.models.generate_content(model=llm_model, contents=prompt)
        return response.text

    except Exception as e:
        if "RESOURCE_EXHAUSTED" in str(e):
            return "Can't Process, Quota Issues"
        print(f"Error generating answer: {str(e)}")
        return "Error generating answer"

In [None]:
def rag(
    document_name: str,
    question_set: list[dict],
    vector_db: pd.DataFrame,
    embedding_client: Any,
    embedding_model: str,
    llm_client: Any,
    top_k: int,
    llm_model: str,
) -> pd.DataFrame:
    """
    RAG Pipeline.

    Args:
        document_name: Name of the document being queried
        question_set: List of question dictionaries
        vector_db: DataFrame containing document chunks and embeddings
        embedding_client: Client for accessing embedding API
        embedding_model: Name of the embedding model
        llm_client: Client for accessing LLM API
        top_k: Number of relevant chunks to retrieve
        llm_model: Name of the LLM used for answer generation

    Returns:
        DataFrame containing questions and generated answers
    """
    results = []

    # Note: only the first question is processed; remove the [:1] slice to run the full set
    for question in question_set[:1]:
        try:
            # Get relevant context for question
            relevant_context = get_relevant_chunks(
                question["question"],
                vector_db,
                embedding_client,
                embedding_model,
                top_k=top_k,
            )

            # Generate answer using LLM
            generated_answer = generate_answer(
                question["question"], relevant_context, llm_client, llm_model
            )

            # Store results
            results.append(
                {
                    "document_name": document_name,
                    "question": question["question"],
                    "source_page_num": question["page"],
                    "answer": question["answer"],
                    "generated_answer": generated_answer,
                }
            )

        except Exception as e:
            print(f"Error processing question '{question['question']}': {str(e)}")
            results.append(
                {
                    "document_name": document_name,
                    "question": question["question"],
                    "source_page_num": question["page"],
                    "answer": question["answer"],
                    "generated_answer": "Error processing question",
                }
            )

    return pd.DataFrame(results)

In [None]:
vector_db_mini_vertex = build_index(
    document, embedding_client=client, embedding_model=text_embedding_model
)
vector_db_mini_vertex.head()

In [None]:
question_set_1 = [
    {
        "question": "What are the loan products available?",
        "answer": "Home Loan, Smart Loan, Loan Against Property, Smart loan against property",
        "page": 6,
    },
    {
        "question": "How much is the Processing fee for the loan?",
        "answer": "1% of the sanctioned loan amount or 10000 INR, which ever is higher",
        "page": 7,
    },
    {
        "question": "Documents to submit as proof of identity?",
        "answer": "Passport, Election/voters IDs, Permanent Driving license, permanent account number, Adhaar card",
        "page": 2,
    },
    {
        "question": "How many days does it take for the Loan Pay Order?",
        "answer": "1 day",
        "page": 5,
    },
    {
        "question": "Phone number for phone banking service?",
        "answer": "+91-49-3111-1111",
        "page": 16,
    },
]

In [None]:
%%time

results_df_vertex = rag(
    document_name=document[0].split("/")[-1],
    question_set=question_set_1,
    vector_db=vector_db_mini_vertex,
    embedding_client=client,  # For embedding generation
    embedding_model=text_embedding_model,  # For embedding model
    llm_client=client,  # For answer generation,
    top_k=10,
    llm_model=MODEL,
)

### Loan QnA with Gemini 2.0 Model - RAG

In [None]:
question = "What are different types of home loan?"

In [None]:
relevant_context = get_relevant_chunks(
    question, vector_db_mini_vertex, client, text_embedding_model, top_k=10
)
rag_prompt = f"""Based on the following context, please answer the question.

Context:
{relevant_context}

Question: {question}

Answer:"""

#### Text output - RAG

In [None]:
response = generate_text(prompt=rag_prompt)
print(response)

#### Audio Output

In [None]:
await generate_n_play_audio(client, rag_prompt)

## Option 2: Large Context Window

This section demonstrates Gemini's ability to handle larger documents for more comprehensive question answering.

## **What is a Large Context Window?**

Think of a context window as the amount of information a model can consider at once. With a larger context window, Gemini can process entire documents instead of just smaller chunks.

## **Why Use a Large Context Window?**

  * More Comprehensive Answers: By seeing the full context of a document, Gemini can provide more nuanced and detailed answers to your questions.

  * Better Understanding: The model gains a better understanding of the relationships and dependencies within the document, leading to more accurate responses.

  * Reduced Fragmentation: You can avoid potential issues with information being fragmented across different chunks, ensuring a smoother flow of information.


## **How it Works:**

**1. Load the Document:**
  * Select the loan document you want to use (from Google Cloud Storage or your local files).

**2. Prepare the Input:**

* The system reads the entire document and formats it for input into the Gemini model.

**3. Ask Your Question:**

* Pose your question related to the document.

**4. Generate Answer:**

* Gemini processes the entire document along with your question to generate a response. This response is informed by the full context of the document, leading to more in-depth answers.
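Before sending a whole document, it can help to estimate whether it is likely to fit in the model's context window. The sketch below uses a rough heuristic of about 4 characters per token, which is an assumption and not an exact tokenizer count; `fits_context_window` and the `max_tokens` values are hypothetical, so check the limit documented for the model you use.

```python
def fits_context_window(
    document_text: str,
    question: str,
    max_tokens: int,
    chars_per_token: float = 4.0,  # rough heuristic, not an exact tokenizer count
) -> bool:
    """Estimate whether the document plus question fits within max_tokens."""
    estimated_tokens = (len(document_text) + len(question)) / chars_per_token
    return estimated_tokens <= max_tokens


# Hypothetical 40,000-character document, for illustration only.
doc = "x" * 40_000
question = "What are the types of loans?"

print(fits_context_window(doc, question, max_tokens=1_000_000))  # True
print(fits_context_window(doc, question, max_tokens=5_000))  # False
```

If the estimate is close to the limit, an exact count from the model's tokenizer is the safer check, and falling back to RAG is an option when the document does not fit.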


## **Benefits:**

* Enhanced Detail: Get more comprehensive answers that reflect the full content of the document.
* Improved Accuracy: Reduce the chances of misinterpretations by providing the model with complete context.
* Greater Flexibility: Explore complex questions that require understanding information across different sections of the document.


## **Example:**

Let's say you want to know about various loan types and their specific details. By using a large context window, Gemini can analyze the entire loan agreement document and provide you with a more comprehensive answer, including specific clauses and conditions related to each type of loan.

## **Limitations:**
  * Processing Time: Working with large documents can take longer compared to using smaller chunks.
  * Resource Requirements: It may require more computing power and memory depending on the document size.

## **When to Use It:**

Consider using a large context window when:

  * You need deeper insights and comprehensive answers.
  * Your questions involve understanding information spread across the entire document.
  * Accuracy and detailed analysis are critical for your task.

In [None]:
# Taking document from the GCS path
# document_path = "gs://github-repo/generative-ai/gemini2/use-cases/loan_example_documents/Demo-bank-home-loan-agreement.pdf"
# document_content = extract_text_from_gcs(document_path)

In [None]:
# Taking document from the local path
# Download the example file and upload it to the Colab file system first.
document_path = "/content/DEMO-BANK-LOAN-DETAILS.pdf"
document_content = extract_text_from_pdf(document_path)

### Text Output - Large Context

In [None]:
query = "What are the types of loans?"

large_context_prompt = f"""Based on the following context, please answer the question.

  Context:
  {document_content}

  Question: {query}

  Answer:"""

In [None]:
response = client.models.generate_content(model=MODEL, contents=large_context_prompt)
display(Markdown(response.text))

### Text In Audio Out, Multimodal Live API

In [None]:
await generate_n_play_audio(client, large_context_prompt)

## Option 3: With Vertex AI Search

<div class="alert alert-block alert-warning">

<b>⚠️ This option assumes you already have a data store containing your documents. If not, follow this guide to create one: <a href="https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#cloud-storage">Create a search data store</a> ⚠️</b>
</div>

This section demonstrates how to use Gemini with Vertex AI Search for question answering. Vertex AI Search is a fully managed search and retrieval service that indexes your documents and returns the content most relevant to a query.

- What is Vertex AI Search?

  - Vertex AI Search is a fully managed search service, not a database that you query directly. You import documents into a data store, and the service handles parsing, indexing, and ranking. In this context, the data store holds the loan documents, and Gemini uses the retrieved results to ground its answers.

- Why Use Vertex AI Search?

  - Scalability: Vertex AI Search can handle large datasets and high traffic loads.
  - Reliability: Vertex AI Search is a fully managed service, so you don't have to worry about managing the infrastructure.
  - Integration: Vertex AI Search integrates well with other Google Cloud services, such as Gemini.

- How it Works:

  1. Store the Documents: The loan documents are imported into a Vertex AI Search data store, which indexes them.
  2. Ask Your Question: You ask a question about the documents.
  3. Retrieve Relevant Information: The Vertex AI Search tool attached to the request retrieves the most relevant content from the data store.
  4. Generate Answer: Gemini processes the retrieved content along with your question to generate a response.

- Benefits

  - Efficiency: Vertex AI Search enables fast and efficient retrieval of relevant information.
  - Scalability: You can easily handle large datasets and high traffic loads.
  - Integration: It seamlessly integrates with Gemini and other Google Cloud services.

### Example

Let's say you want to know about the different types of loans available. With Vertex AI Search attached as a tool, you can simply ask the question, and the pipeline will:

1. Identify the relevant documents in Vertex AI Search.
2. Use Gemini to extract and generate an answer based on those documents.
3. Present the answer to you in a clear and concise format.
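
The three steps above can be sketched offline with a toy example. This is only an illustration of the retrieve-then-generate pattern, not how Vertex AI Search works internally: the documents, the word-overlap scoring, and the helper names (`tokenize`, `retrieve`, `build_grounded_prompt`) are all hypothetical stand-ins.

```python
import re

# Hypothetical snippets standing in for documents in a data store.
DOCUMENTS = {
    "personal_loans.pdf": "Personal loans are unsecured loans with fixed monthly payments.",
    "home_loans.pdf": "Home loans are secured by the property and repaid over many years.",
    "branch_hours.pdf": "Branches are open Monday to Friday from nine to five.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for real search)."""
    query_words = tokenize(query)
    scored = [
        (len(query_words & tokenize(text)), name) for name, text in documents.items()
    ]
    # Keep documents sharing at least one word; best match first, ties by name.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:top_k]]


def build_grounded_prompt(query: str, documents: dict[str, str], sources: list[str]) -> str:
    """Combine the retrieved passages and the question into a single prompt."""
    context = "\n".join(f"[{name}] {documents[name]}" for name in sources)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"


question = "What types of loans are available?"
sources = retrieve(question, DOCUMENTS)
prompt = build_grounded_prompt(question, DOCUMENTS, sources)
print(sources)
print(prompt)
```

In the notebook itself, retrieval and prompt assembly are handled by the Vertex AI Search tool, so none of this code is needed there.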

### Limitations

* Cost: Vertex AI Search is a paid service, so you will need to pay for storage and retrieval costs.

### When to Use It

Consider using Vertex AI Search when:

* You don't want to manage the RAG infrastructure and its complex logic yourself.
* You have a large dataset of documents.
* You need high performance and scalability.
* You want to integrate with other Google Cloud services.

### Initialize the Vertex AI Search data store

In [None]:
## Vertex AI Search

datastore_id = "Your-datastore"  # @param {type: "string", isTemplate: true}

datastore_path = f"projects/{PROJECT_ID}/locations/global/collections/default_collection/dataStores/{datastore_id}"

vertex_ai_search_tool = Tool(
    retrieval=Retrieval(vertex_ai_search=VertexAISearch(datastore=datastore_path))
)
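
The data store path above follows a fixed pattern, so a small helper can catch an empty or malformed ID before any API call is made. The sketch below is an assumption-laden convenience, not part of the SDK: `build_datastore_path` is a hypothetical name, and its validation rules are illustrative rather than the service's actual naming rules.

```python
def build_datastore_path(
    project_id: str,
    datastore_id: str,
    location: str = "global",
    collection: str = "default_collection",
) -> str:
    """Build a data store resource path in the same format used in this notebook."""
    for label, value in (("project_id", project_id), ("datastore_id", datastore_id)):
        # Illustrative checks only: reject empty values, slashes, and stray whitespace.
        if not value or "/" in value or value != value.strip():
            raise ValueError(f"{label} must be a non-empty ID without '/' or padding: {value!r}")
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/{collection}/dataStores/{datastore_id}"
    )


# Hypothetical IDs, used only to show the resulting format.
print(build_datastore_path("my-project", "my-datastore"))
```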

### Get your QnA with Vertex AI Search and Vertex AI Multimodal Live API


In [None]:
query = "what are the types of loans?"  # @param  {type: "string", isTemplate: true, label: "Select Modality"}

modality = "TEXT"

response = client.models.generate_content(
    model=MODEL_ID,
    contents=query,
    config=GenerateContentConfig(
        tools=[vertex_ai_search_tool], response_modalities=[modality]
    ),
)

display(Markdown(response.text))
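
When an answer is grounded in a data store, it is often useful to see which documents it drew on. The sketch below assumes the response exposes `candidates[*].grounding_metadata.grounding_chunks[*].retrieved_context` with `title` and `uri` fields, which is how the Gen AI SDK response types appear to be structured; verify the attribute names against your installed SDK version. `list_grounding_sources` is a hypothetical helper, and the fake response exists only so the example runs offline.

```python
from types import SimpleNamespace


def list_grounding_sources(response) -> list[tuple[str, str]]:
    """Collect (title, uri) pairs from a response's grounding metadata, if any."""
    sources = []
    for candidate in getattr(response, "candidates", None) or []:
        metadata = getattr(candidate, "grounding_metadata", None)
        for chunk in getattr(metadata, "grounding_chunks", None) or []:
            context = getattr(chunk, "retrieved_context", None)
            if context is not None:
                sources.append((context.title, context.uri))
    return sources


# A hand-built stand-in for a real response (all values are made up).
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            grounding_metadata=SimpleNamespace(
                grounding_chunks=[
                    SimpleNamespace(
                        retrieved_context=SimpleNamespace(
                            title="Loan types overview",
                            uri="gs://example-bucket/loan_types.pdf",
                        )
                    )
                ]
            )
        )
    ]
)

print(list_grounding_sources(fake_response))
```

In the notebook, you could call `list_grounding_sources(response)` right after the cell above; an empty list would suggest the answer was not grounded in retrieved documents.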

## Option 4: With Vertex AI Vector Search

This section demonstrates how to use Gemini with Vertex AI Vector Search for question answering. Vertex AI Vector Search is a fully managed, scalable vector similarity search service that can be used to power retrieval augmented generation (RAG) applications.

- What is Vertex AI Vector Search?

  - Vertex AI Vector Search is a vector similarity search service. It indexes embeddings (numeric vector representations of your data) and finds the vectors closest to a query embedding using approximate nearest neighbor search. Because semantically similar text maps to nearby vectors, it can power applications that respond to natural language queries, including RAG.

- Why Use Vertex AI Vector Search?

  - Scalability: Vertex AI Vector Search can handle large datasets and high traffic loads, so your application can grow with the needs of your business.
  - Reliability: Vertex AI Vector Search is a fully managed service, so you can focus on building your application instead of managing infrastructure.
  - Integration: Vertex AI Vector Search integrates well with other Google Cloud services, such as Gemini, which makes it easier to build applications that combine several of them.

- How it Works:

  1. Create a Search Index: You first need to create a search index for your data. This is done using the Vertex AI Vector Search API.
  2. Ask Your Question: You ask a question about your data.
  3. Retrieve Relevant Information: Vertex AI Vector Search uses its index to quickly identify the most relevant information to your query.
  4. Generate Answer: Gemini processes the relevant information along with your question to generate a response.

- Benefits

  - Efficiency: Vertex AI Vector Search enables fast and efficient retrieval of relevant information.
  - Scalability: You can easily handle large datasets and high traffic loads.
  - Integration: It seamlessly integrates with Gemini and other Google Cloud services.

### Example

Let's say you want to know about the different types of loans available. With Vertex AI Vector Search behind a RAG corpus, you can simply ask the question, and the pipeline will:

1. Identify the relevant information in your search index.
2. Use Gemini to extract and generate an answer based on that information.
3. Present the answer to you in a clear and concise format.
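
To make step 1 concrete, the sketch below performs an exact, brute-force nearest-neighbor search over a handful of toy embeddings. Vector Search solves the same problem approximately and at far larger scale; the 3-dimensional vectors, IDs, and helper names here are made up for illustration and are not produced by a real embedding model.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("Cannot normalize a zero vector.")
    return [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension.")
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2):
    """Return the k entries most similar to the query, best match first."""
    q = l2_normalize(query)
    scored = [(dot(q, l2_normalize(vec)), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: -pair[0])
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy embeddings (hypothetical values).
toy_index = {
    "personal_loan": [0.9, 0.1, 0.0],
    "home_loan": [0.8, 0.3, 0.1],
    "branch_hours": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.2, 0.0]

results = top_k(query_embedding, toy_index)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```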

### Limitations

* Cost: Vertex AI Vector Search is a paid service, so you will need to pay for storage and retrieval costs.

### When to Use It

Consider using Vertex AI Vector Search when:

* You have a large dataset of documents.
* You need high performance and scalability.
* You want to integrate with other Google Cloud services.
* You want to improve search result diversity, quality, and ranking through ranking and recall tuning features of vector search.

### Import for Vertex AI

In [None]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

### Setup Vertex AI Vector Search index and index endpoint

This section provides helper code to set up your Vector Search index.

You can skip it if you already have a Vector Search index ready to use.

The index has to meet the following criteria:

1. `IndexUpdateMethod` must be `STREAM_UPDATE`, see [Create stream index]({{docs_path}}vector-search/create-manage-index#create-stream-index).

2. Distance measure type must be explicitly set to one of the following:

   * `DOT_PRODUCT_DISTANCE`
   * `COSINE_DISTANCE`

3. The vector dimension must match the output dimension of the embedding model
   you plan to use in the RAG corpus. Other parameters can be tuned to suit
   your use case; some of your choices determine which additional parameters
   are available for tuning.
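
As a sanity check, the three criteria above can be expressed as a small validation function. This is a sketch only: `check_index_config` is a hypothetical helper, the config is a plain dictionary rather than an SDK object, and the embedding dimension of 768 is an assumption that mirrors the index settings used in this notebook.

```python
ALLOWED_DISTANCE_MEASURES = {"DOT_PRODUCT_DISTANCE", "COSINE_DISTANCE"}


def check_index_config(config: dict, embedding_dimensions: int) -> list[str]:
    """Return a list of problems; an empty list means all three criteria are met."""
    problems = []
    if config.get("index_update_method") != "STREAM_UPDATE":
        problems.append("index_update_method must be STREAM_UPDATE")
    if config.get("distance_measure_type") not in ALLOWED_DISTANCE_MEASURES:
        problems.append("distance_measure_type must be DOT_PRODUCT_DISTANCE or COSINE_DISTANCE")
    if config.get("dimensions") != embedding_dimensions:
        problems.append(
            f"dimensions is {config.get('dimensions')}, "
            f"but the embedding model outputs {embedding_dimensions}"
        )
    return problems


# Settings mirroring the index created in this notebook.
index_config = {
    "dimensions": 768,
    "distance_measure_type": "DOT_PRODUCT_DISTANCE",
    "index_update_method": "STREAM_UPDATE",
}

print(check_index_config(index_config, embedding_dimensions=768))
```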

In [None]:
# create the index
my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="loanDemoRag",
    description="Demo app for rag with banking use case",
    dimensions=768,
    approximate_neighbors_count=10,
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
    feature_norm_type="UNIT_L2_NORM",
    index_update_method="STREAM_UPDATE",
)

### Create a Vector Search index endpoint with a [public endpoint](https://cloud.google.com/vertex-ai/docs/vector-search/deploy-index-public)

In [None]:
# create IndexEndpoint
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="loanDemoRag", public_endpoint_enabled=True
)

### Deploy the index to the index endpoint

When deploying an index to an index endpoint for the first time, it takes approximately 30 minutes to automatically build and initialize the backend. Subsequent deployments are significantly faster, with the index becoming ready in seconds.

To monitor the status of the index deployment:

1. Open the Vector Search Console.
2. Navigate to the Index endpoints tab.
3. Select your index endpoint to view its details.

### Resource name formats

Ensure you have the correct resource names for your index and index endpoint in the following formats:

* `projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexes/${INDEX_ID}`
* `projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexEndpoints/${INDEX_ENDPOINT_ID}`


If you're unsure of the resource names, run the following cell to print them:

In [None]:
print(my_index_endpoint.resource_name)
print(my_index.resource_name)
print(my_index.name)
print(my_index)
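
If you paste resource names in by hand, a quick format check can catch mistakes such as passing an index name where an endpoint name is expected. The sketch below uses regular expressions derived from the two formats listed above; `parse_resource_name` is a hypothetical helper and the example names are made up.

```python
import re

INDEX_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/indexes/(?P<resource_id>[^/]+)$"
)
INDEX_ENDPOINT_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/indexEndpoints/(?P<resource_id>[^/]+)$"
)


def parse_resource_name(name: str) -> dict[str, str]:
    """Classify a resource name as an index or index endpoint and split out its parts."""
    for kind, pattern in (("index", INDEX_PATTERN), ("index_endpoint", INDEX_ENDPOINT_PATTERN)):
        match = pattern.match(name)
        if match:
            return {"kind": kind, **match.groupdict()}
    raise ValueError(f"Not a recognized index or index endpoint name: {name!r}")


# Hypothetical resource names, used only to show the parsing.
print(parse_resource_name("projects/my-project/locations/us-central1/indexes/1234567890"))
print(parse_resource_name("projects/my-project/locations/us-central1/indexEndpoints/987654321"))
```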

In [None]:
# Deploy Index
my_index_endpoint.deploy_index(index=my_index, deployed_index_id="loanDemoRag")

In [None]:
from vertexai.preview import rag

In [None]:
vector_db = rag.VertexVectorSearch(
    index=my_index.resource_name, index_endpoint=my_index_endpoint.resource_name
)

# Name your corpus
DISPLAY_NAME = "loan_app_corpus"  # @param  {type:"string"}

# Create RAG Corpus
rag_corpus = rag.create_corpus(display_name=DISPLAY_NAME, vector_db=vector_db)
print(f"Created RAG Corpus resource: {rag_corpus.name}")

### Import the files from the GCS

In [None]:
GCS_BUCKET = "gs://demo-loan-documents/"  # @param {type:"string", "placeholder": "your-gs-bucket"}

response = rag.import_files(  # noqa: F704
    corpus_name=rag_corpus.name,
    paths=[GCS_BUCKET],
    chunk_size=512,
    chunk_overlap=50,
)
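
The `chunk_size` and `chunk_overlap` arguments control how each file is split before embedding. The sketch below illustrates the general sliding-window idea on a plain list of tokens. It is not the service's implementation: the real tokenizer and boundary handling may differ, and `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, chunk_overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive.")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size).")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the current window already reached the end
    return chunks


# 1,000 placeholder tokens standing in for a tokenized document.
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
print([len(chunk) for chunk in chunks])
```

Overlap means the end of one chunk is repeated at the start of the next, which helps keep a sentence that straddles a boundary retrievable from at least one chunk.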

In [None]:
# Check the files just imported. It may take a few seconds to process the imported files.
rag.list_files(corpus_name=rag_corpus.name)

### Add RAG corpus to the context

In [None]:
rag_resource = rag.RagResource(
    rag_corpus=rag_corpus.name,
)

vertex_ai_rag_tool = Tool(
    retrieval=Retrieval(
        vertex_rag_store=VertexRagStore(
            rag_resources=[rag_resource],  # Currently only 1 corpus is allowed.
            similarity_top_k=10,
            vector_distance_threshold=0.4,
        ),
    )
)
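
The two retrieval settings above work together: one caps how many contexts are returned, the other drops contexts that are too far from the query. The sketch below shows that interaction on made-up distances. It assumes that a smaller distance means a closer match and that the threshold is a strict upper bound; the service's exact semantics may depend on the index's distance measure, and `filter_contexts` is a hypothetical helper.

```python
def filter_contexts(
    scored_contexts: list[tuple[str, float]],
    similarity_top_k: int = 10,
    vector_distance_threshold: float = 0.4,
) -> list[tuple[str, float]]:
    """Keep contexts closer than the threshold, then return the top-k closest."""
    kept = [
        (text, distance)
        for text, distance in scored_contexts
        if distance < vector_distance_threshold
    ]
    kept.sort(key=lambda pair: pair[1])  # closest first
    return kept[:similarity_top_k]


# Hypothetical (text, distance) pairs.
candidates = [
    ("Personal loans are unsecured.", 0.12),
    ("Branches open at nine.", 0.71),
    ("Home loans are secured by property.", 0.25),
    ("Auto loans finance a vehicle.", 0.39),
]

print(filter_contexts(candidates, similarity_top_k=2))
```

Lowering the threshold trades recall for precision; raising `similarity_top_k` gives the model more context at the cost of a longer prompt.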

In [None]:
query = "what are the types of loans?"  # @param  {type: "string", isTemplate: true, label: "Select Modality"}

modality = "TEXT"


response = client.models.generate_content(
    model=MODEL_ID,
    contents=query,
    config=GenerateContentConfig(
        tools=[vertex_ai_rag_tool], response_modalities=[modality]
    ),
)

display(Markdown(response.text))

## Conclusion

In this tutorial, you have learned how to leverage the capabilities of Gemini 2.0, covering the following topics:

* Using the Google Gen AI SDK: Interacting with Gemini models through the SDK.

* Utilizing the Multimodal Live API: Exploring its features for handling multimodal data.

* Creating and Implementing Retrieval Augmented Generation (RAG): Defining and applying RAG techniques.

* Extracting Information from PDFs:
    * Text in, text out.
    * Text in, audio out.

* Leveraging a Large Context Window:
    * Using the Gen AI SDK and the Multimodal Live API.

* Working with Vertex AI Features:
    * Vertex AI Search for managed document retrieval.
    * Vertex AI Vector Search for embedding-based retrieval through a RAG corpus.

* Developing Q&A Applications: Building a question-and-answer application with Gemini 2.0.

This comprehensive guide equips you with practical knowledge for utilizing Gemini 2.0 in diverse scenarios, from multimodal data handling to advanced AI-powered application development.
