In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Building Search Applications with Vertex AI Search

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/search/vertexai-search-options/vertexai_search_options.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fsearch%2Fvertexai-search-options%2Fvertexai_search_options.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/search/vertexai-search-options/vertexai_search_options.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/search/vertexai-search-options/vertexai_search_options.ipynb">
      <img width="32px" src="https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

<div style="clear: both;"></div>


| | |
|-|-|
| Author(s) | [Megha Agarwal](https://github.com/agarwal22megha/) |

## Overview

Vertex AI Search draws on Google's decades of experience in information retrieval. It combines deep information retrieval, natural language processing, and large language model (LLM) processing to understand user intent and return the most relevant results.

Depending on where developers are in their journey and which orchestration framework they prefer, they can use Vertex AI Search out-of-the-box capabilities, customize their search solutions with Vertex Retrievers, or use the Vertex AI DIY APIs to build an end-to-end RAG application.

This notebook explores how to use Vertex AI Search out-of-the-box capabilities, or to customize your search application with the Vertex AI Search retriever for LangChain or with the grounding service.


First, you create a Vertex AI Agent Builder search app with an unstructured data store in the Google Cloud console.

Once the search app is created, this notebook walks you through three options:

1. Use Vertex AI Search out-of-the-box capabilities through the Python client library (see the [Vertex AI Agent Builder Python API reference documentation](https://cloud.google.com/python/docs/reference/discoveryengine/latest)).

2. Use a Vertex AI Search data store as a [grounding source](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/grounding) with Gemini models to ground responses in your data.

3. Use a Vertex AI Search data store as a [LangChain retriever](https://python.langchain.com/v0.2/docs/integrations/retrievers/google_vertex_ai_search/).


## Get started

### Install Vertex AI SDK and other required packages


In [None]:
%pip install --upgrade --user --quiet google-cloud-aiplatform
%pip install google-cloud-discoveryengine
%pip install langchain langchain-google-vertexai
# The extra pulls in the Vertex AI Search dependencies for the LangChain retriever.
%pip install "langchain-google-community[vertexaisearch]"

### Restart runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.

The restart might take a minute or longer. After it has restarted, continue to the next step.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Wait until it is finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

If you're running this notebook on Google Colab, run the cell below to authenticate your environment.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
PROJECT_ID = "your_project_id"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}


import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)

### Create a Vertex AI Search App On Google Cloud
[Follow these steps](https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search#unstructured-data_1) to create a search app with an unstructured data store in the Google Cloud console using Vertex AI Agent Builder.

Once the search app is created, note its search app ID, data store ID, and location from the Vertex AI Agent Builder console.

In [None]:
# Set to your data store location
VERTEX_AI_SEARCH_LOCATION = "global"  # @param {type:"string"}
# Set to your search app ID
VERTEX_AI_SEARCH_APP_ID = "your_search_app_id"  # @param {type:"string"}
# Set to your data store ID
VERTEX_AI_SEARCH_DATASTORE_ID = "your_datastore_id"  # @param {type:"string"}

MODEL = "gemini-1.5-pro-001"  # @param {type:"string"}

SEARCH_QUERY = "When does Alphabet plan to get to net zero?"  # @param {type:"string"}
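
The IDs and location set above are later combined into an API endpoint and a serving config resource name. The following is a minimal sketch of that composition, using the same string formats that `search_sample` uses in Part 1. The helper names (`api_endpoint`, `serving_config_path`, `find_placeholders`) are hypothetical and are not part of any Google library.

```python
from typing import Optional


def api_endpoint(location: str) -> Optional[str]:
    """Return the regional Discovery Engine endpoint, or None for "global"."""
    if location == "global":
        return None  # None lets the client library fall back to its default endpoint
    return f"{location}-discoveryengine.googleapis.com"


def serving_config_path(project_id: str, location: str, engine_id: str) -> str:
    """Build the default serving config name for a search app (engine)."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/default_collection/engines/{engine_id}"
        "/servingConfigs/default_config"
    )


def find_placeholders(settings: dict) -> list:
    """Return names of settings that are empty or still start with "your_"."""
    return [
        name
        for name, value in settings.items()
        if not value or value.startswith("your_")
    ]
```

Calling `find_placeholders({"PROJECT_ID": PROJECT_ID, "VERTEX_AI_SEARCH_APP_ID": VERTEX_AI_SEARCH_APP_ID})` before running the rest of the notebook is one way to catch values that were left at their defaults.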

### Import libraries

In [None]:
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1 as discoveryengine
from langchain.chains import RetrievalQAWithSourcesChain
from langchain_google_community import VertexAISearchRetriever
from langchain_google_vertexai import VertexAI
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    SafetySetting,
    Tool,
)
import vertexai.preview.generative_models as generative_models

## Part 1: Vertex AI Search Out-of-the-box capabilities
Use the Discovery Engine Python client library (see the [Vertex AI Agent Builder Python API reference documentation](https://cloud.google.com/python/docs/reference/discoveryengine/latest)) to run a search and get a summarized answer with citations.

In [None]:
CUSTOM_PROMPT = """
    <PERSONA_AND_GOAL>
      You are a helpful assistant knowledgeable about Alphabet quarterly earnings reports.
      Be mindful of the time frame in the user's input.
      Help users with their queries related to Alphabet.
      Only respond with relevant information available in the Grounding Knowledge snippet.
      If no relevant snippet is available, respond that you don't know.
      Do not make up information.
    </PERSONA_AND_GOAL>
"""

In [None]:
def search_spec():
    """Build the content search spec: snippets plus an LLM summary with citations."""
    content_search_spec = discoveryengine.SearchRequest.ContentSearchSpec(
        snippet_spec=discoveryengine.SearchRequest.ContentSearchSpec.SnippetSpec(
            return_snippet=True
        ),
        summary_spec=discoveryengine.SearchRequest.ContentSearchSpec.SummarySpec(
            summary_result_count=10,
            include_citations=True,
            ignore_adversarial_query=True,
            ignore_non_summary_seeking_query=True,
            model_prompt_spec=discoveryengine.SearchRequest.ContentSearchSpec.SummarySpec.ModelPromptSpec(
                preamble=CUSTOM_PROMPT
            ),
            model_spec=discoveryengine.SearchRequest.ContentSearchSpec.SummarySpec.ModelSpec(
                version="stable",
            ),
        ),
    )
    return content_search_spec


def search_sample(project_id: str, location: str, engine_id: str, search_query: str):
    """Run a search against the app's default serving config and return the response."""
    # Non-global locations need a regional API endpoint.
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    client = discoveryengine.SearchServiceClient(client_options=client_options)

    # Full resource name of the search app's default serving config.
    serving_config = f"projects/{project_id}/locations/{location}/collections/default_collection/engines/{engine_id}/servingConfigs/default_config"

    content_search_spec = search_spec()

    request = discoveryengine.SearchRequest(
        serving_config=serving_config,
        query=search_query,
        page_size=10,
        content_search_spec=content_search_spec,
        # Let the service decide when to expand the query.
        query_expansion_spec=discoveryengine.SearchRequest.QueryExpansionSpec(
            condition=discoveryengine.SearchRequest.QueryExpansionSpec.Condition.AUTO,
        ),
        # Let the service correct spelling automatically.
        spell_correction_spec=discoveryengine.SearchRequest.SpellCorrectionSpec(
            mode=discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.AUTO
        ),
    )
    response = client.search(request)

    return response

In [None]:
search_response = search_sample(
    PROJECT_ID, VERTEX_AI_SEARCH_LOCATION, VERTEX_AI_SEARCH_APP_ID, SEARCH_QUERY
)

In [None]:
search_response.summary.summary_with_metadata
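
The raw `summary_with_metadata` object can be hard to read. The sketch below shows one way to render it as summary text followed by a numbered reference list. It assumes the object exposes a `summary` string and a `references` sequence whose items carry `title` and `document` fields. The helper name `format_summary` is hypothetical, and the stand-in object holds placeholder values only.

```python
from types import SimpleNamespace


def format_summary(summary_with_metadata) -> str:
    """Render summary text followed by a numbered list of its references."""
    lines = [summary_with_metadata.summary.strip()]
    references = list(summary_with_metadata.references)
    if references:
        lines.append("")
        lines.append("References:")
    for number, ref in enumerate(references, start=1):
        # Assumption: fall back to the document name when no title is set.
        label = ref.title or ref.document
        lines.append(f"[{number}] {label}")
    return "\n".join(lines)


# Stand-in object with placeholder values, used only to exercise the helper.
fake_summary = SimpleNamespace(
    summary="Example summary text. ",
    references=[SimpleNamespace(title="", document="example-document")],
)
formatted = format_summary(fake_summary)
print(formatted)
```

With a real response, the call would be `format_summary(search_response.summary.summary_with_metadata)`.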

## Part 2: Grounding with Gemini on Vertex AI Search Datastore

Leverage Vertex AI Search Datastore as a [Grounding Source](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/grounding) with Gemini Models directly to ground responses in your data.

In [None]:
system_instructions = """
<PERSONA_AND_GOAL>
  You are a helpful assistant knowledgeable about Alphabet quarterly earnings reports.
  Help users with their queries related to Alphabet by following the given <INSTRUCTIONS> and <CONTEXT>.
  Only respond with information available in the Grounding Knowledge store.
</PERSONA_AND_GOAL>

<INSTRUCTIONS>
- Always refer to the tool and ground your answers in it
- Understand the snippet retrieved by the tool and only use that information to help users
- For supporting references, you can provide the Grounding tool snippets verbatim, and any other info like page number
- For information not available in the tool, mention that you don't have access to the information.
- Output "answer" should be "I don't know" when the user question is irrelevant or outside the <CONTEXT>
- Leave "reference_snippet" as null if you are not sure about the page and text snippet
</INSTRUCTIONS>

<CONTEXT>
  The Grounding tool finds the most relevant snippets from the Alphabet earnings reports data store.
  Use the information provided by the tool as your knowledge base.
</CONTEXT>

<CONSTRAINTS>
- ONLY use information available from the Grounding tool
</CONSTRAINTS>

<OUTPUT_FORMAT>
- Response should ALWAYS be in the following JSON format with answer and reference_snippet as keys, e.g. {"answer": , "reference_snippet": }
</OUTPUT_FORMAT>
"""

In [None]:
tools = [
    Tool.from_retrieval(
        retrieval=generative_models.grounding.Retrieval(
            source=generative_models.grounding.VertexAISearch(
                datastore=VERTEX_AI_SEARCH_DATASTORE_ID,
                project=PROJECT_ID,
                location=VERTEX_AI_SEARCH_LOCATION,
            ),
            disable_attribution=False,
        )
    ),
]

generation_config = GenerationConfig(max_output_tokens=8192, temperature=1, top_p=0.95)

safety_settings = [
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        threshold=SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    ),
    SafetySetting(
        category=SafetySetting.HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    ),
]

In [None]:
model = GenerativeModel(MODEL, tools=tools, system_instruction=[system_instructions])

response = model.generate_content(
    [SEARCH_QUERY],
    generation_config=generation_config,
    safety_settings=safety_settings,
    stream=True,
)

In [None]:
# With stream=True, each item is a partial response chunk.
for chunk in response:
    print(chunk)
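
Because the system instructions ask for JSON with `answer` and `reference_snippet` keys, the streamed chunks can be joined and parsed instead of printed. The following is a minimal sketch, assuming each chunk exposes its text through a `.text` attribute and that a chunk without text raises `ValueError` or `AttributeError`. The helper names are hypothetical, and the fake chunks hold placeholder values only.

```python
import json
from types import SimpleNamespace

FENCE = "`" * 3  # Markdown code fence that models sometimes wrap around JSON


def collect_text(chunks) -> str:
    """Concatenate the text of streamed chunks, skipping chunks without text."""
    parts = []
    for chunk in chunks:
        try:
            parts.append(chunk.text)
        except (AttributeError, ValueError):
            # Assumption: a chunk carrying only metadata exposes no usable text.
            continue
    return "".join(parts)


def parse_answer(raw: str) -> dict:
    """Parse the {"answer", "reference_snippet"} object requested in the prompt."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the surrounding fence and an optional "json" language tag.
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Fall back to the raw text when the model did not return a JSON object.
        return {"answer": raw.strip(), "reference_snippet": None}
    return {
        "answer": data.get("answer"),
        "reference_snippet": data.get("reference_snippet"),
    }


# Fake chunks with placeholder values, used only to exercise the helpers.
fake_chunks = [
    SimpleNamespace(text='{"answer": "example", '),
    object(),  # stands in for a chunk that has no text
    SimpleNamespace(text='"reference_snippet": null}'),
]
parsed = parse_answer(collect_text(fake_chunks))
print(parsed)
```

A streamed response is typically consumed as it is iterated, so `parse_answer(collect_text(response))` would take the place of the print loop rather than follow it.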

For another example of grounding Gemini with Vertex AI Search, see [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb).

## Part 3: Vertex AI Search with LangChain

Developers have the flexibility to incorporate Vertex AI Search as a [LangChain Retriever](https://python.langchain.com/v0.2/docs/integrations/retrievers/google_vertex_ai_search/) in their existing LangChain applications.

This means you can keep using your preferred orchestrator while integrating Vertex AI Search data stores into existing RAG pipelines. Vertex AI Search applies Google-quality search to your custom data, which can improve the quality and relevance of results in retrieval-augmented generation workflows.

Find more notebook examples of leveraging VertexAISearchRetriever with LangChain [here](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/search/retrieval-augmented-generation/examples/question_answering.ipynb).


In [None]:
llm = VertexAI(model_name=MODEL)

retriever = VertexAISearchRetriever(
    project_id=PROJECT_ID,
    location_id=VERTEX_AI_SEARCH_LOCATION,
    data_store_id=VERTEX_AI_SEARCH_DATASTORE_ID,
    get_extractive_answers=True,
    max_documents=10,
    max_extractive_segment_count=1,
    max_extractive_answer_count=5,
)

In [None]:
retrieval_qa_with_sources = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever
)

retrieval_qa_with_sources.invoke(SEARCH_QUERY, return_only_outputs=True)
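
The `chain_type="stuff"` setting means that all retrieved documents are placed into a single prompt for the LLM. The sketch below illustrates that idea with plain Python. It is not LangChain's actual prompt template: the `Doc` class, the `stuff_prompt` helper, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document (text plus metadata)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff_prompt(question: str, docs: list) -> str:
    """Place every retrieved document into one prompt, each tagged with its source."""
    blocks = [
        f"Content: {doc.page_content}\nSource: {doc.metadata.get('source', 'unknown')}"
        for doc in docs
    ]
    context = "\n\n".join(blocks)
    return f"{context}\n\nQuestion: {question}\nAnswer (cite sources):"


# Placeholder documents, used only to show the shape of the combined prompt.
example_docs = [
    Doc("First passage.", {"source": "doc-1"}),
    Doc("Second passage."),
]
example_prompt = stuff_prompt("Example question?", example_docs)
print(example_prompt)
```

Because every document lands in one prompt, the prompt grows with `max_documents` and with the length of each extractive answer or segment, which is worth keeping in mind when tuning the retriever settings.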

## Cleaning up

When you are finished, [delete the Vertex AI Search app](https://cloud.google.com/generative-ai-app-builder/docs/delete-engine) and [delete the data store](https://cloud.google.com/generative-ai-app-builder/docs/delete-a-data-store) that you created.
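
The same cleanup can also be scripted. The sketch below assumes the Discovery Engine client library's `EngineServiceClient.delete_engine` and `DataStoreServiceClient.delete_data_store` methods, both of which return long-running operations. It also assumes both resources live in `default_collection` and that the app should be deleted before its data store. The helper names are hypothetical. Deletion cannot be undone, so review the IDs before calling `delete_search_resources`.

```python
def engine_name(project_id: str, location: str, engine_id: str) -> str:
    """Full resource name of a search app (engine)."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/default_collection/engines/{engine_id}"
    )


def data_store_name(project_id: str, location: str, data_store_id: str) -> str:
    """Full resource name of a data store."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/default_collection/dataStores/{data_store_id}"
    )


def delete_search_resources(
    project_id: str, location: str, engine_id: str, data_store_id: str
) -> None:
    """Delete the search app, then its data store, waiting for each operation."""
    # Imported here so the name helpers above work without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine_v1 as discoveryengine

    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    engine_client = discoveryengine.EngineServiceClient(client_options=client_options)
    engine_client.delete_engine(
        name=engine_name(project_id, location, engine_id)
    ).result()

    data_store_client = discoveryengine.DataStoreServiceClient(
        client_options=client_options
    )
    data_store_client.delete_data_store(
        name=data_store_name(project_id, location, data_store_id)
    ).result()
```

With the notebook's variables, the call would be `delete_search_resources(PROJECT_ID, VERTEX_AI_SEARCH_LOCATION, VERTEX_AI_SEARCH_APP_ID, VERTEX_AI_SEARCH_DATASTORE_ID)`.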