In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# RAG - QnA using Query Routing

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/qna_using_query_routing/qna_using_query_routing.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fretrieval-augmented-generation%2Fqna_using_query_routing%2Fqna_using_query_routing.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/retrieval-augmented-generation/qna_using_query_routing/qna_using_query_routing.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/use-cases/retrieval-augmented-generation/qna_using_query_routing/qna_using_query_routing.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>


| | |
|-|-|
|Author(s) | [Charu Shelar](https://github.com/CharulataShelar) |

## Overview

This notebook showcases query routing techniques that improve retrieval performance in an AI-powered learning assistant for a computer training institute. The assistant uses an LLM to classify the intent of the user query, which in turn determines the appropriate source(s) for answering it. The solution is built using a custom RAG approach and the Gemini model (`Gemini Pro 1.0`).

## Folder Structure

1. qna_using_query_routing/
    - config.ini : Configuration file.
    - qna_using_query_routing.ipynb: Main demo notebook.

2. utils/
    - intent_routing.py : Contains methods for intent classification and for routing the request to the respective components.
    - qna_vector_search.py : Answer QnA Type Questions using indexed documents.
    - qna_using_query_routing_utils.py : Contains other utility functions.

3. images/
    - This folder contains images used in the notebook.

## As a developer, you will learn the following steps to implement the solution

1. Embed the document and create a vector search index using Vector Search (previously known as Matching Engine).
    - Create document embeddings and upload them to GCS bucket.
    - Create/update the vector search index using the embeddings.
    - Python code function used here: `utils.qna_using_query_routing_utils.create_vector_search_index()`

2. Build RAG (Retrieval-Augmented Generation) for intra document search using routing.
    - Identify the intent of the user query and route the query.
    - Answer programming questions using indexed documents i.e. using Vector Search's semantic search.
    - Answer coding questions using the Gemini model if the knowledge base does not have the relevant context/content.
    - To prevent hallucinations and maintain appropriate responses, the solution demonstrates how to guardrail the system's response to predetermined programming languages when handling user queries. The config.ini file can be used to configure the list of supported programming languages.
    - Python code files used here: `utils/intent_routing.py` and `utils/qna_vector_search.py`

3. Build a chat UI using Gradio.
    - Create a chat interface to allow users to interact with the virtual assistant
    - Create a separate tab on the UI to allow end users to index new documents from the GCS bucket
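
The language guardrail mentioned in step 2 can be pictured with a minimal sketch. It assumes a hypothetical `supported_languages` key in the `[default]` section of `config.ini` (the real key name may differ), and the helper names `load_supported_languages` and `check_language` are illustrative only; they are not taken from `utils/intent_routing.py`.

```python
import configparser
from typing import Optional, Set, Tuple

# Hypothetical config content; the key names in the real config.ini may differ.
SAMPLE_CONFIG = """
[default]
default_language = Python
supported_languages = Python, Java, C++
"""

REFUSAL = "I apologize, but I can only help with the supported programming languages."


def load_supported_languages(config_text: str) -> Set[str]:
    """Parse the comma-separated language list into a lowercase set."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser["default"]["supported_languages"]
    return {lang.strip().lower() for lang in raw.split(",") if lang.strip()}


def check_language(
    requested: Optional[str], supported: Set[str], default_language: str
) -> Tuple[Optional[str], Optional[str]]:
    """Return (language_to_use, refusal_message); exactly one of them is None."""
    if requested is None:
        # No language mentioned in the query: fall back to the default.
        return default_language, None
    if requested.strip().lower() in supported:
        return requested.strip(), None
    # Unsupported language: refuse instead of letting the model guess.
    return None, REFUSAL


supported = load_supported_languages(SAMPLE_CONFIG)
print(check_language("Rust", supported, "Python"))
```

Keeping the list in configuration rather than in the prompt text means the set of supported languages can change without touching the routing code.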

## Google Cloud Services used

1. Vector Search (previously Matching Engine)
2. Large Language Models - Gemini Pro 1.0, textembedding-gecko

## Costs
This tutorial uses billable components of Google Cloud:
- Vector Search
- Gemini Pro 1.0
- textembedding-gecko

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/?hl=en) to generate a cost estimate based on your projected usage.

## Solution Design Flow

![genAI Asset Learning assistant](https://storage.googleapis.com/github-repo/generative-ai/gemini/use-cases/rag/qna-using-query-routing/architecture.png)

### Response Generation using Query Routing:
1. The user starts a natural language query through a Gradio Chat User Interface (UI).

2. Intent classification is done using the Gemini model. It classifies the message into one of the following intents: `WELCOME`, `PROGRAMMING_QUESTION_AND_ANSWER`, `WRITE_CODE`, `FOLLOWUP`, or `CLOSE`.

3. For the `WRITE_CODE` intent, the Gemini model is used to generate code using its coding capability.

4. For the `PROGRAMMING_QUESTION_AND_ANSWER` intent, custom orchestration (RAG) retrieves context relevant to the user query from Vector Search and summarises the relevant contexts. If the answer is not found, the user query is routed to the Gemini model, which responds using its own knowledge.

5. For the `FOLLOWUP` intent, such as explaining more or writing code for previous responses, the Gemini Model is used to generate responses using its code capability.

6. For the `WELCOME` and `CLOSE` intents, the Gemini model is used to generate appropriate responses.
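
The routing flow above can be sketched in plain Python. This is an illustration only: `classify_intent_stub`, `retrieve_context_stub`, and `llm_stub` are hypothetical stand-ins for the Gemini classifier, the Vector Search lookup, and the Gemini response call, and the keyword rules are not how the notebook's `IntentRouting` class classifies messages.

```python
from typing import List, Tuple

INTENTS = (
    "WELCOME",
    "PROGRAMMING_QUESTION_AND_ANSWER",
    "WRITE_CODE",
    "FOLLOWUP",
    "CLOSE",
)


def classify_intent_stub(message: str) -> str:
    """Keyword stand-in for the LLM-based intent classifier."""
    text = message.lower().strip()
    if text in ("hi", "hello", "hey"):
        return "WELCOME"
    if text in ("bye", "thanks", "thank you"):
        return "CLOSE"
    if text.startswith("write"):
        return "WRITE_CODE"
    if "explain more" in text:
        return "FOLLOWUP"
    return "PROGRAMMING_QUESTION_AND_ANSWER"


def retrieve_context_stub(message: str) -> List[str]:
    """Stand-in for semantic retrieval; returns snippets whose key appears in the query."""
    knowledge_base = {"list": "A list is an ordered, mutable sequence."}
    return [text for key, text in knowledge_base.items() if key in message.lower()]


def llm_stub(prompt: str) -> str:
    """Stand-in for a call to the generative model."""
    return f"[model answer to: {prompt}]"


def route(message: str) -> Tuple[str, str]:
    """Classify the message, then pick the source that answers it."""
    intent = classify_intent_stub(message)
    if intent == "PROGRAMMING_QUESTION_AND_ANSWER":
        contexts = retrieve_context_stub(message)
        if contexts:
            # Answer grounded in the indexed documents.
            return intent, "[answer from documents] " + " ".join(contexts)
        # Nothing relevant retrieved: fall back to the model's own knowledge.
        return intent, llm_stub(message)
    # WELCOME, CLOSE, WRITE_CODE and FOLLOWUP all go straight to the model.
    return intent, llm_stub(message)


print(route("What is a list?"))
print(route("Write a function that reverses a string"))
```

Only the question-and-answer intent touches the index; every other intent skips retrieval, which is the main cost and latency saving that routing offers.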

## Getting Started

### Install Vertex AI SDK and other required packages


In [None]:
!pip3 install --upgrade --user google-cloud-aiplatform \
langchain==0.1.13 \
pypdf==4.1.0 \
gradio==3.41.2 \
langchain-google-vertexai \
--quiet

### Restart runtime (Colab only)

To use the newly installed packages, you must restart the runtime on Google Colab.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

Authenticate your environment on Google Colab.


In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Import required packages

In [None]:
import configparser
import json  # used below to write and read the JSONL embedding files
import logging
import os
import uuid
from datetime import datetime

import gradio as gr
import pandas as pd
import vertexai
from vertexai.generative_models import GenerativeModel
from vertexai.preview.language_models import TextEmbeddingModel

from utils import qna_using_query_routing_utils
from utils.intent_routing import IntentRouting

### Set Google Cloud project information and initialize Vertex AI SDK

In [None]:
PROJECT_ID = "your-project-id"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

vertexai.init(project=PROJECT_ID, location=LOCATION)

Set up logging for the application

In [None]:
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

### Update the project settings in config file

<div class="alert alert-block alert-warning">
<b>⚠️ Please do not change the configuration file name i.e. `config.ini` ⚠️</b>
</div>

In [None]:
config_file = "config.ini"

#### Update the settings in the config file

**Note:** Some settings in the `config.ini` file are updated from this notebook.
Additional parameters can be modified manually or with the same code.

In [None]:
config = configparser.ConfigParser()
config.read(config_file)

config.set("default", "project_id", PROJECT_ID)
config.set("default", "region", LOCATION)

with open(config_file, "w") as cf:
    config.write(cf)

### [One-time] Setup Vector Search for QnA

- Download sample pdf document and save it in `DOCUMENT_FOLDER`
- Generate document embeddings. This splits the documents into chunks as configured by `chunk_size` and `chunk_overlap` in the `config.ini` file
- Set up a Vector Search index (create the vector search index and endpoint, then deploy the index to the endpoint)
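
To illustrate what `chunk_size` and `chunk_overlap` control, here is a minimal character-based sketch. The function `split_with_overlap` is hypothetical; the splitter actually used by `get_split_documents` in `utils` may split on different boundaries and produce different chunks.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size chunks where neighbours share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
```

A larger overlap keeps more context shared between neighbouring chunks at the cost of more chunks to embed and index.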

#### Download sample pdf document and save it in `DOCUMENT_FOLDER`

In [None]:
# Download the sample document
!wget https://cfm.ehu.es/ricardo/docs/python/Learning_Python.pdf

DOCUMENT_FOLDER = "document"

# Create a "document" directory if it doesn't exist
if not os.path.exists(DOCUMENT_FOLDER):
    os.makedirs(DOCUMENT_FOLDER)

# Move the document to `DOCUMENT_FOLDER` folder
!mv Learning_Python.pdf {DOCUMENT_FOLDER}/Learning_Python.pdf

#### Generate document embeddings

Read document(s) and split into chunks

In [None]:
doc_splits = qna_using_query_routing_utils.get_split_documents(DOCUMENT_FOLDER)

for idx, split in enumerate(doc_splits):
    split.metadata["chunk"] = idx

# Log the number of documents after splitting
print(f"Number of chunks = {len(doc_splits)}")

data = [{"content": ob.page_content, "metadata": ob.metadata} for ob in doc_splits]

local_embedding_filename = config["vector_search"]["embedding_jsonl_file"]

with open(local_embedding_filename, "w") as outfile:
    for item in data:
        json_line = json.dumps(item)
        outfile.write(json_line + "\n")

print("Saving document chunks in json file:", local_embedding_filename)

Update the BUCKET_URI in config.ini file

In [None]:
UID = datetime.now().strftime("%m%d%H%M")
BUCKET_URI = f'gs://{config["default"]["project_id"]}-embedding'

config.set("vector_search", "me_gcs_bucket", BUCKET_URI)

with open(config_file, "w") as cf:
    config.write(cf)

print("UID :", UID)
print("BUCKET_URI :", BUCKET_URI)

Create a GCS bucket and copy the JSON file to it

In [None]:
! gsutil mb -l $LOCATION -p {PROJECT_ID} $BUCKET_URI
! gsutil cp $local_embedding_filename $BUCKET_URI

Create embeddings for all the text chunks in a batch

In [None]:
textembedding_model = TextEmbeddingModel.from_pretrained(
    config["vector_search"]["embedding_model_name"]
)
batch_prediction_job = textembedding_model.batch_predict(
    dataset=[f"{BUCKET_URI}/{local_embedding_filename}"],
    destination_uri_prefix=f"{BUCKET_URI}/vertex-LLM-Batch-Prediction/{UID}",
)
print(batch_prediction_job.display_name)
print(batch_prediction_job.resource_name)
print(batch_prediction_job.state)

Download the embeddings file (jsonl) 

In [None]:
! gsutil cp -r $batch_prediction_job.gca_resource.output_info.gcs_output_directory json_files/

#### Setup Vector Search index

1. Create Vector Search index and Endpoint for Retrieval
2. Create and add document embeddings to Vector Store

Set display name for the vector search index

In [None]:
me_index_name = f"{PROJECT_ID}-index"  # @param {type:"string"}
me_region = "us-central1"  # @param {type:"string"}

config.set("vector_search", "me_index_name", me_index_name)
config.set("vector_search", "me_region", me_region)

with open(config_file, "w") as cf:
    config.write(cf)

Save embeddings from jsonl to json file

In [None]:
jsonl_file_path = "json_files/000000000000.jsonl"

df_list = []
with open("create_index.json", "w") as outfile:
    with open(jsonl_file_path, "r") as infile:
        for line in infile:
            df_l = json.loads(line)
            line_details = {
                "id": df_l["instance"]["metadata"]["chunk"],
                "embedding": df_l["predictions"][0]["embeddings"]["values"],
            }
            json_line = json.dumps(line_details)
            outfile.write(json_line + "\n")

            df_list.append(
                pd.DataFrame(
                    {
                        "id": df_l["instance"]["metadata"]["chunk"],
                        # "embedding": str(df_l["predictions"][0]["embeddings"]["values"]),
                        "page_source": df_l["instance"]["metadata"]["source"],
                        "text": df_l["instance"]["content"],
                    },
                    index=[0],
                )
            )

embeddings = pd.concat(df_list, ignore_index=True)
embeddings.to_csv(config["vector_search"]["embedding_csv_file"], index=False)
print("Embeddings data:", embeddings.shape)
embeddings.head()
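
Before uploading the index file, it can help to sanity-check it. The sketch below assumes the one-JSON-object-per-line layout written above (an `id` and an `embedding` list per record); `validate_index_records` is a hypothetical helper and is not part of the notebook's `utils` package.

```python
import json
from typing import Iterable


def validate_index_records(lines: Iterable[str]) -> int:
    """Check ids are unique and all embeddings share one length; return that length."""
    dimension = None
    seen_ids = set()
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        record_id = record["id"]
        vector = record["embedding"]
        if record_id in seen_ids:
            raise ValueError(f"line {line_number}: duplicate id {record_id!r}")
        seen_ids.add(record_id)
        if not vector or not all(isinstance(v, (int, float)) for v in vector):
            raise ValueError(f"line {line_number}: embedding must be a non-empty list of numbers")
        if dimension is None:
            dimension = len(vector)
        elif len(vector) != dimension:
            raise ValueError(
                f"line {line_number}: expected {dimension} values, got {len(vector)}"
            )
    if dimension is None:
        raise ValueError("no records found")
    return dimension


sample = [
    '{"id": 0, "embedding": [0.1, 0.2, 0.3]}',
    '{"id": 1, "embedding": [0.4, 0.5, 0.6]}',
]
print("Embedding dimension:", validate_index_records(sample))
```

With the file on disk, the same check could be run as `validate_index_records(open("create_index.json"))`.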

Copy the embeddings JSON file to the GCS bucket

In [None]:
! gsutil cp create_index.json {BUCKET_URI}/create_index_json/

Create a new vector search index and deploy it to an endpoint

In [None]:
(
    index,
    index_endpoint,
    deployed_index_id,
) = qna_using_query_routing_utils.create_vector_search_index(BUCKET_URI)
print("Index :", index)
print("Index endpoint :", index_endpoint)
print("Deployed Index Id :", deployed_index_id)

**Note:** If you are re-running the code in this notebook and want to reuse the previously created index without creating a new one, execute the code in the cell below.

This cell assumes that the index has already been created, deployed, and can be used for retrieval.

In [None]:
# # Get details for already deployed index
# me_index_name = config["vector_search"]["me_index_name"]
# me_region = config["vector_search"]["me_region"]

# index_endpoint, deployed_index_id = qna_using_query_routing_utils.get_deployed_index_id(me_index_name, me_region)
# print("Index endpoint :", index_endpoint)
# print("Deployed Index Id :", deployed_index_id)

## Chat interface using gradio app

#### Mount Learning Assistant app with gradio 
Now, let's write a chat interface using Gradio that has two elements:

**Chatbot:**
The `chatbot` element provides the chat UI and shows messages from both the user and the chatbot (the virtual assistant we built).

**Textbox:**
The `msg` element lets the Gradio UI accept input text from the user. This text is passed to the system, which fetches the answer from either Vector Search or the Gemini model.

In [None]:
# Load language models for QnA and conversational interaction
model = GenerativeModel(config["genai_qna"]["model_name"])

# Initialize core components using configuration settings
genai_assistant = IntentRouting(
    model=model,
    index_endpoint=index_endpoint,
    deployed_index_id=deployed_index_id,
    config_file=config_file,
    logger=logger,
)

# Start the chat session and provide initial instructions to the chatbot
default_programming_language = config["default"]["default_language"]
chat = model.start_chat(history=[])
_ = chat.send_message(
    f"""You are a Programming Language Learning Assistant.
        Your task is to understand the question and respond with the descriptive answer for the same.

        Instructions:
        1. If programming language is not mentioned, then use {default_programming_language} as default programming language to write a code.
        2. Strictly follow the instructions mentioned in the question.
        3. If the question is not clear then you can answer "I apologize, but I am not able to understand the question. Please try to elaborate and rephrase your question."

        If the question is about other programming language then DO NOT provide any answer, just say "I apologize, but I am not able to understand the question. Please try to elaborate and rephrase your question."
"""
)

In [None]:
def respond(message, chat_history):
    """Handles user input within a Gradio chatbot interface."""
    (response, intent) = genai_assistant.classify_intent(
        message,
        session_state,
    )

    # append response to history
    chat_history.append((message, response))

    return "", chat_history
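
One possible hardening of the callback above is to catch failures so that the chat UI keeps responding when the backend raises. The sketch is self-contained and makes assumptions: `StubAssistant` is a hypothetical stand-in for `IntentRouting`, and `safe_respond` takes the assistant and session id as explicit arguments instead of reading the notebook's globals.

```python
import logging
from typing import List, Tuple

logger = logging.getLogger(__name__)

FALLBACK = "Sorry, something went wrong while answering. Please try again."


class StubAssistant:
    """Hypothetical stand-in exposing the same `classify_intent` call shape."""

    def classify_intent(self, message: str, session_state: str) -> Tuple[str, str]:
        if not message.strip():
            raise ValueError("empty message")
        return f"echo: {message}", "WELCOME"


def safe_respond(
    message: str,
    chat_history: List[Tuple[str, str]],
    assistant,
    session_state: str,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Answer the message; on any error, log it and show a fallback reply instead."""
    try:
        response, intent = assistant.classify_intent(message, session_state)
        logger.info("intent=%s session=%s", intent, session_state)
    except Exception:  # keep the UI alive whatever the backend raises
        logger.exception("Failed to answer message")
        response = FALLBACK
    chat_history.append((message, response))
    # Empty string clears the textbox; the history refreshes the chat window.
    return "", chat_history


print(safe_respond("hi", [], StubAssistant(), "session-1"))
```

Because a Gradio event handler receives only the declared inputs, the extra arguments would need to be bound first, for example with `functools.partial(safe_respond, assistant=genai_assistant, session_state=session_state)`.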

In [None]:
gr.close_all()  # Ensure a clean Gradio interface

with gr.Blocks() as demo:
    with gr.Tab("Learning Assistant"):
        # Welcome message for the chatbot
        bot_message = "Hi there! I'm Generative AI powered Learning Assistant. I can help you with coding tasks, answer questions, and generate code. Just ask me anything you need, and I'll do my best to help!"  # pylint: disable=C0301:line-too-long

        # Generate a unique session identifier
        session_state = str(uuid.uuid4())
        logger.info("session_state : %s", session_state)

        # Configure the chatbot's appearance using Chatbot element
        chatbot = gr.Chatbot(
            height=600,
            label="",  # No display label for the chatbot
            value=[[None, bot_message]],  # Initialize with the welcome message
            avatar_images=(
                None,
                "https://fonts.gstatic.com/s/i/short-term/release/googlesymbols/smart_assistant/default/24px.svg",
            ),  # Assistant avatar
            elem_classes="message",
            show_label=False,
        )

        # Configure the textbox for the user to enter questions.
        msg = gr.Textbox(
            scale=4,
            label="",
            placeholder="Enter your question here..",
            elem_classes=["form", "message-row"],
        )

        # Event handling, Link the `respond` function to the textbox, enabling interaction
        msg.submit(fn=respond, inputs=[msg, chatbot], outputs=[msg, chatbot])

#### Launch the gradio app to view the chatbot

**Note:**
1. For a better experience, open the demo application in a new tab by clicking the localhost URL generated after running this cell.
2. For debugging mode, set `debug=True`


**Example Questions to try on UI**
1. Where can we use python programming language?
2. What is the difference between list and set?
3. Fix the error in below code:

```
def create_dataset(id: str): -> None
...

SyntaxError: invalid syntax
```

In [None]:
demo.launch(share=True, debug=False)

### Close the demo

**Note:** Stop the previous cell to shut down the running Gradio server, then run this cell to free the port the server was using.

In [None]:
demo.close()

### Cleaning up
To clean up all Google Cloud resources used in this project, you can delete the Google Cloud project you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial.

In [None]:
delete_bucket = False

# Force undeployment of indexes and delete endpoint
index_endpoint.delete(force=True)

# Delete indexes
index.delete()

if delete_bucket:
    ! gsutil rm -rf {BUCKET_URI}