# Vertex AI Agent Builder (Search App)

## Deep Dive

This notebook explores how to use the Vertex AI Agent Builder client libraries and APIs, together with LangChain's LLM integrations and retrievers, to build a custom search app.

You'll use these tools to build a question and answer service that takes a user query, retrieves relevant documents from a Search data store in Vertex AI Agent Builder, then returns an LLM-generated answer to the original query along with source documents that were used to generate the answer.
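The flow described above (retrieve, then generate, then return the answer with its sources) can be sketched without any cloud access. This is only an illustration: `Doc`, `answer_with_sources`, `toy_retrieve`, `toy_generate`, and the example bucket path are hypothetical stand-ins, not part of Vertex AI Agent Builder or LangChain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    source: str
    text: str


def answer_with_sources(
    query: str,
    retrieve: Callable[[str], List[Doc]],
    generate: Callable[[str], str],
) -> Dict[str, object]:
    """Retrieve documents, build one prompt from them, and ask the model."""
    docs = retrieve(query)
    context = "\n\n".join(doc.text for doc in docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return {"answer": generate(prompt), "sources": [doc.source for doc in docs]}


# Toy retriever and model so the sketch runs offline.
def toy_retrieve(query: str) -> List[Doc]:
    return [Doc(source="gs://example-bucket/report.pdf", text="Revenue was 10.")]


def toy_generate(prompt: str) -> str:
    return f"(model output for a prompt of {len(prompt)} characters)"


result = answer_with_sources("What was revenue?", toy_retrieve, toy_generate)
print(result["answer"])
print(result["sources"])
```

In the steps that follow, the retriever role is played by the Vertex AI Search retriever and the generator role by a Vertex AI LLM.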

Helpful resources for the lab coding exercise:

- [Vertex AI Agent Builder (Documentation)](https://cloud.google.com/generative-ai-app-builder/docs/introduction)
- [Vertex AI Search Retriever (LangChain Documentation)](https://python.langchain.com/docs/integrations/retrievers/google_vertex_ai_search/)
- [Question Answering Over Documents (GitHub)](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gen-app-builder/retrieval-augmented-generation/examples/question_answering.ipynb)
- [Grounding Generative AI using Search Data Stores (Colab)](https://colab.research.google.com/drive/174YYPNNy1rWdIFvV-_LWZ-cueRB7Q6EC?resourcekey=0-9bYTUjXMbEkHIuduaNjNJw&usp=sharing)
- [Vertex AI Agent Builder - Search App (GitHub)](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gen-app-builder/search-web-app)

## Create an unstructured data search app

### Step 1

Follow the steps to [create an unstructured data search app that uses the Alphabet Investors PDFs dataset](https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search#create_and_preview_a_search_app_for_unstructured_data_from).

### Step 2 (1 min)

Install the Vertex AI Agent Builder, Vertex AI, and LangChain client libraries for Python:

In [1]:
# Install packages
# Installs the latest releases; pin versions if you need a reproducible environment.
!pip install --upgrade --quiet google-cloud-discoveryengine google-cloud-aiplatform langchain langchain-google-vertexai langchain-google-community


In [23]:
# Note: This Colab notebook should be running as a user that has access to the project that contains your search app

import os
import sys

if "google.colab" in sys.modules:
    from google.colab import auth as google_auth

    google_auth.authenticate_user()

### Step 3 (10 mins)

Use the [Vertex AI Search document retriever in LangChain](https://python.langchain.com/docs/integrations/retrievers/google_cloud_enterprise_search) to retrieve documents from your data store based on a query.

Sample query: “What is Alphabet's social and environmental impact?”

In [24]:
PROJECT_ID = "andresousa-pso-upskilling"
LOCATION = "us-central1"
DATA_STORE_ID = "investor-data_1732554963521"

In [25]:
import vertexai

# Initialize the Vertex AI SDK with the project and region defined above.
vertexai.init(project=PROJECT_ID, location=LOCATION)

In [26]:
from langchain_google_vertexai import VertexAI

In [27]:
from langchain_google_community import VertexAISearchRetriever

# Retrieve up to three documents per query from the search data store.
retriever = VertexAISearchRetriever(
    project_id=PROJECT_ID,
    data_store_id=DATA_STORE_ID,
    serving_config_id="demo_1732554835348",
    max_documents=3,
)
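The retriever is configured with a project, a data store, and a serving config. In the Discovery Engine API these IDs combine into a single resource name, and the sketch below shows that layout. The helper name `serving_config_path`, the placeholder IDs, and the default values (`global`, `default_collection`, `default_config`) are assumptions for illustration, not values taken from this notebook.

```python
def serving_config_path(
    project_id: str,
    data_store_id: str,
    serving_config_id: str = "default_config",
    location_id: str = "global",
    collection_id: str = "default_collection",
) -> str:
    """Build the resource name that identifies one serving config."""
    return (
        f"projects/{project_id}/locations/{location_id}"
        f"/collections/{collection_id}/dataStores/{data_store_id}"
        f"/servingConfigs/{serving_config_id}"
    )


# Placeholder IDs; substitute the values for your own search app.
path = serving_config_path("my-project", "my-data-store")
print(path)
```

If the retriever returns no documents, checking each segment of this name against the values shown in the console is a reasonable first debugging step.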

In [28]:
# LLM model
llm = VertexAI(
    model_name="text-bison",
    max_output_tokens=256,
    temperature=0.1,
    top_p=0.8,
    top_k=40,
    verbose=True,
)

In [29]:
# Create chain to answer questions
from langchain.chains import RetrievalQA

# The chain uses the LLM to synthesize an answer from the retrieved documents.
# The LLM is the Vertex AI PaLM text model configured above.
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True
)
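The `chain_type="stuff"` setting means that all retrieved documents are placed ("stuffed") into a single prompt. A minimal sketch of that idea follows. The template text, the function name `stuff_prompt`, and the `max_chars` budget are hypothetical; the real chain ships its own default prompt and does not apply this check.

```python
from typing import List

# Hypothetical template, used here only to illustrate the strategy.
TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\nQuestion: {question}\nAnswer:"
)


def stuff_prompt(question: str, passages: List[str], max_chars: int = 8000) -> str:
    """Concatenate every passage into a single prompt (the "stuff" strategy)."""
    prompt = TEMPLATE.format(context="\n\n".join(passages), question=question)
    if len(prompt) > max_chars:
        # Stuffing has no fallback: too much context simply does not fit.
        raise ValueError(f"prompt has {len(prompt)} characters, limit is {max_chars}")
    return prompt


passages = ["Net income was 23,662.", "Revenues were 80,539."]
prompt = stuff_prompt("What was net income?", passages)
print(prompt)
```

Because every passage ends up in one prompt, keeping the number of retrieved documents small helps the prompt stay within the model's input limit.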

In [30]:
QUERY = "What was Alphabet's net income in 2024?"

result = qa.invoke({"query": QUERY})

print("Answer:", result["result"])
print("\nSource:", result["source_documents"])

Answer:  Alphabet's net income in 2024 was $23,662 million.

Source: [Document(metadata={'id': 'b3f002410b0c9070afef3e147738cb01', 'source': 'gs://andresousa-demo-helder/goog-10-q-q1-2024.pdf6', 'previous_segments': [], 'next_segments': []}, page_content='Alphabet Inc.\nCONSOLIDATED STATEMENTS OF INCOME\n(in millions, except per share amounts; unaudited)\n\nThree Months Ended\nMarch 31,\n\n2023\n\n2024\n\nRevenues\n\n$\n\n69,787 $\n\n80,539\n\nCosts and expenses:\nCost of revenues\n\n30,612\n\n33,712\n\nResearch and development\n\n11,468\n\n11,903\n\nSales and marketing\n\n6,533\n\n6,426\n\nGeneral and administrative\n\n3,759\n\n3,026\n\nTotal costs and expenses\n\n52,372\n\n55,067\n\nIncome from operations\n\n17,415\n\n25,472\n\nOther income (expense), net\n\n790\n\n2,843\n\nIncome before income taxes\n\n18,205\n\n28,315\n\nProvision for income taxes\n\n3,154\n\n4,653\n\nNet income\n\n$\n\n15,051 $\n\n23,662\n\nBasic net income per share of Class A, Class B, and Class C stock\n\n$\n\
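Printing the raw list of source documents is hard to read. A small helper can reduce each document to its source and a short preview instead. The sketch below uses a stand-in `Doc` class with the two attributes visible in the output above (`page_content` and `metadata`); the helper name `summarize_sources`, the `preview_chars` parameter, and the example bucket path are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Stand-in with the two attributes visible in the printed output."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def summarize_sources(docs: List[Doc], preview_chars: int = 80) -> List[str]:
    """Return one 'source: preview' line per document."""
    lines = []
    for doc in docs:
        source = doc.metadata.get("source", "<unknown>")
        # Collapse newlines and repeated spaces, then truncate the preview.
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"{source}: {preview}")
    return lines


docs = [
    Doc(
        page_content="Alphabet Inc.\nCONSOLIDATED STATEMENTS OF INCOME\n(in millions)",
        metadata={"source": "gs://example-bucket/report.pdf"},
    )
]
for line in summarize_sources(docs):
    print(line)
```

In the notebook, the same helper could be called on `result["source_documents"]`, assuming the returned objects expose these two attributes as the output suggests.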

### Step 4 (15 mins)

Given a search query, use [LangChain's LLM integration with Vertex AI](https://python.langchain.com/docs/integrations/llms/google_vertex_ai_palm) to return an answer along with its source documents.

Hint: Use [RetrievalQAWithSourcesChain](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gen-app-builder/retrieval-augmented-generation/examples/question_answering.ipynb) and refer to the “Helpful resources” at the top of this notebook!

Sample query: “Who is the CEO of Google?”

In [31]:
from langchain.chains import RetrievalQAWithSourcesChain

# Reuse the llm and retriever created in Step 3.
# The chain uses the LLM to synthesize an answer from the retrieved documents
# and reports which sources the answer drew on.
qa = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=retriever)

QUERY = "Who is the CEO of Google?"

result = qa.invoke({"question": QUERY})

print("Answer:", result["answer"])
print("\nSources:", result["sources"])

Answer:  The provided text does not mention the CEO of Google.


Sources: gs://andresousa-demo-helder/goog-10-q-q3-2024.pdf11, gs://andresousa-demo-helder/goog-10-q-q1-2024.pdf10, gs://andresousa-demo-helder/goog-10-q-q2-2024.pdf11
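As the output above shows, the chain returns its sources as one comma-separated string. If a list is more convenient, the string can be split with a helper like the one sketched here. The name `split_sources` and the example URIs are hypothetical, and the approach assumes that no source URI contains a comma.

```python
from typing import List


def split_sources(sources: str) -> List[str]:
    """Split a comma-separated sources string into a clean list."""
    return [part.strip() for part in sources.split(",") if part.strip()]


raw = "gs://example-bucket/a.pdf, gs://example-bucket/b.pdf"
source_list = split_sources(raw)
print(source_list)
```

An empty sources string yields an empty list, which makes it easy to detect answers that the chain did not attribute to any document.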


In [35]:
QUERY = "What is the revenue of Google in the last quarter in United States ?"

result = qa.invoke({"question": QUERY})

print("Answer:", result["answer"])
print("\nSources:", result["sources"])

Answer:  The revenue of Google in the United States in the last quarter is 49% of total revenue.


Sources: gs://andresousa-demo-helder/goog-10-q-q2-2024.pdf38
