# Gen App Builder - Enterprise Search - Technical Deep Dive - Lab Exercise

The purpose of this lab is to explore the client libraries and APIs in Gen App Builder, along with the LangChain LLM integrations and retrievers for Enterprise Search and Vertex AI.

You'll use these tools to build a question and answer service that takes a user query, retrieves relevant documents from a search data store in Gen App Builder, then returns an LLM-generated answer to the original query along with source documents that were used to generate the answer.

Helpful resources for the lab coding exercise:

- [Gen App Builder Code Samples (Documentation)](https://cloud.google.com/generative-ai-app-builder/docs/samples)
- [Question Answering Over Documents (GitHub)](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gen-app-builder/retrieval-augmented-generation/examples/question_answering.ipynb)
- [Grounding Generative AI using Enterprise Search Results (Colab)](https://colab.research.google.com/drive/174YYPNNy1rWdIFvV-_LWZ-cueRB7Q6EC?resourcekey=0-9bYTUjXMbEkHIuduaNjNJw&usp=sharing)
- [Gen App Builder - Search Web App (GitHub)](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gen-app-builder/search-web-app)

# Coding exercise (Technical asset)

## Step 1

Follow the steps to [create an unstructured data search app that uses the Alphabet Investors PDFs data](https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search#create_and_preview_an_unstructured_data_search_app).

## Step 2

Install the Python client libraries for Enterprise Search and Vertex AI, plus LangChain pinned to version 0.0.236 (newer versions were broken as of 2023-08-10):

In [1]:
# Install packages
# Note: You might need to restart the runtime after installing these packages
!pip install google-cloud-discoveryengine google-cloud-aiplatform langchain==0.0.236 "shapely<2.0.0" -q

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not 
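
Because the install pins `langchain==0.0.236`, it can help to confirm which versions the runtime actually picked up, especially after a restart. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and is not part of any of the installed packages.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the pip command in this step.
for name in ("google-cloud-discoveryengine", "google-cloud-aiplatform", "langchain"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If `langchain` reports anything other than `0.0.236`, the pin did not take effect in the current runtime.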

In [2]:
# Note: This Colab notebook should be running as a user that has access to the project that contains your search app

import os
import sys

if "google.colab" in sys.modules:
    from google.colab import auth as google_auth

    google_auth.authenticate_user()

In [3]:
from os.path import basename
from typing import Dict, List, Optional, Tuple, Any

## Step 3

Use the [Enterprise Search document retriever in LangChain](https://python.langchain.com/docs/integrations/retrievers/google_cloud_enterprise_search) to retrieve documents from your data store based on a query.

Sample query: “What are Alphabet's social and environmental impact?”

In [15]:
PROJECT_ID = "genai-ml-project"
LOCATION = "us"  # e.g., "us-central1"
DATA_STORE_ID = "alphainvest_1701858245424"  # e.g., "investor-pdfs_1791245104861"

In [31]:
from google.cloud import discoveryengine_v1beta as discoveryengine
from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever

QUERY = "What are Alphabet's social and environmental impact?"

# Code your solution here

retriever = GoogleCloudEnterpriseSearchRetriever(
    project_id=PROJECT_ID,
    search_engine_id=DATA_STORE_ID,
    max_documents=3,
)

result = retriever.get_relevant_documents(QUERY)
for doc in result:
    print(doc)


page_content='Culture and Workforce\nWe’re a company of curious, talented and passionate people. We embrace collaboration and creativity, and encourage the iteration\nof ideas to address complex challenges in technology and society.\nOur people are critical for our continued success. We work hard to provide an environment where Googlers can have fulfilling careers,\nand be happy, healthy and productive. We offer industry-leading benefits and programs to take care of the diverse needs of our\nemployees and their families, including access to excellent healthcare choices, opportunities for career growth and development,\nand resources to support their financial health. Our competitive compensation programs help us to attract and retain top candidates,\nand we will continue to invest in recruiting talented people to technical and non-technical roles and rewarding them well.\nAlphabet is committed to making diversity, equity, and inclusion part of everything we do and we’re committed to gr
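
`print(doc)` dumps the full `page_content`, which is hard to scan when several documents come back. A minimal sketch of a more compact view follows. It uses a small stand-in `Doc` dataclass so that it runs without a data store; LangChain `Document` objects expose the same `page_content` and `metadata` attributes, but the `source` metadata key, the helper name `summarize`, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for a retrieved document (hypothetical, for illustration only)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize(docs: List[Doc], width: int = 80) -> List[str]:
    """Return one line per document: its source plus a shortened preview."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        # Collapse newlines and repeated spaces so the preview fits on one line.
        text = " ".join(doc.page_content.split())
        preview = text if len(text) <= width else text[: width - 3] + "..."
        source = doc.metadata.get("source", "unknown source")
        lines.append(f"[{i}] {source}: {preview}")
    return lines


# Made-up sample data; in the notebook, pass the retriever's `result` instead.
sample = [
    Doc("Culture and Workforce\nWe're a company of curious, talented people.",
        {"source": "gs://example-bucket/report.pdf"}),
    Doc("A document without a source."),
]
for line in summarize(sample, width=40):
    print(line)
```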

## Step 4

Use [LangChain's LLM integration with Vertex AI](https://python.langchain.com/docs/integrations/llms/google_vertex_ai_palm) to answer a search query and return the answer along with its source documents.

Hint: Use [RetrievalQAWithSourcesChain](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gen-app-builder/retrieval-augmented-generation/examples/question_answering.ipynb) and refer to the “Helpful resources” at the top of this notebook!

Sample query: “Who is the CEO of DeepMind?”

In [40]:
import vertexai
from langchain.llms import VertexAI
from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever
from langchain.chains import RetrievalQAWithSourcesChain

PROJECT_ID = "genai-ml-project"
LOCATION = "us-central1"  # e.g., "us-central1"
DATA_STORE_ID = "alphainvest_1701858245424"  # e.g., "investor-pdfs_1791245104861"
MODEL = "text-bison@001"


QUERY = "Who is the CEO of DeepMind?"

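# Code your solution here

# Initialize the Vertex AI SDK for this project and region
vertexai.init(project=PROJECT_ID, location=LOCATION)

# Text generation model that writes the final answer
llm = VertexAI(model_name=MODEL)

# Retrieve extractive answers (short quoted passages) from the data store
retriever = GoogleCloudEnterpriseSearchRetriever(
    project_id=PROJECT_ID,
    search_engine_id=DATA_STORE_ID,
    get_extractive_answers=True,
    max_documents=10,
    max_extractive_segment_count=1,
    max_extractive_answer_count=5,
)

# The "stuff" chain type puts all retrieved documents into a single prompt
retrieval_qa_with_sources = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever
)

# Run the chain; the result is a dict with "answer" and "sources" keys
retrieval_qa_with_sources({"question": QUERY}, return_only_outputs=True)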



{'answer': 'DeepMind is a subsidiary of Alphabet. Sundar Pichai is the CEO of Alphabet.',
 'sources': ''}
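
The `"stuff"` chain type places all retrieved documents into a single prompt. The sketch below is a simplified illustration of that idea in plain Python, not LangChain's actual implementation; the template wording, the `Doc` stand-in, and the function names `build_stuff_prompt` and `split_answer` are assumptions. It also suggests why `sources` can come back empty: if the model's reply contains no sources section, there is nothing to split out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Doc:
    """Stand-in for a retrieved document (hypothetical)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def build_stuff_prompt(question: str, docs: List[Doc]) -> str:
    """Concatenate every document, with its source label, into one prompt."""
    parts = [
        f"Content: {d.page_content}\nSource: {d.metadata.get('source', 'unknown')}"
        for d in docs
    ]
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the extracts below, "
        'then list the sources you used after "SOURCES:".\n\n'
        f"{context}\n\nQUESTION: {question}\nFINAL ANSWER:"
    )


def split_answer(raw: str) -> Tuple[str, str]:
    """Split a model reply into (answer, sources); sources is '' if absent."""
    answer, sep, sources = raw.partition("SOURCES:")
    return answer.strip(), sources.strip() if sep else ""


# Made-up inputs for illustration.
docs = [Doc("Example extract one.", {"source": "doc-a.pdf"})]
print(build_stuff_prompt("Example question?", docs))
print(split_answer("An answer.\nSOURCES: doc-a.pdf"))
print(split_answer("An answer with no citation."))
```

Because every document is inlined, the prompt grows with `max_documents`, which is worth keeping in mind when choosing that value for a model with a limited context window.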

# Questions and answers (Use case asset)

### **Question 1**

What are the pros and cons of using LangChain vs. the native client libraries for Vertex AI and Enterprise Search?

### **Answer 1**

#### Pros of LangChain

LangChain provides a single interface to interact with different AI APIs, reducing the need to learn and use multiple client libraries.

LangChain abstracts away the underlying API details, simplifying development tasks.

LangChain supports various data structures and workflows, offering more flexibility than native client libraries.

LangChain has a growing community and offers tutorials and examples, making it easier to get started.

LangChain integrates with many model providers, vector stores, and document loaders, so components can be swapped with limited changes to the surrounding code.

#### Cons of LangChain

LangChain is still under development, and its feature set may not be as mature as native client libraries.

While LangChain's documentation is improving, it may not be as comprehensive as the documentation for native client libraries.

While the LangChain community is growing, it may not be as large or active as the communities surrounding native client libraries.

LangChain's abstraction can be beneficial, but it may also hide some of the underlying API details, making it harder for developers to understand the nuances of the APIs.

Using LangChain adds an additional dependency to your project, which may not be ideal for all situations.
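
The trade-off above can be made concrete with a small sketch. It assumes a hypothetical `Retriever` protocol and two stub backends; none of these names come from LangChain or the Google Cloud client libraries. The calling code depends only on the shared interface, which is the convenience an abstraction layer offers, while backend-specific options (here, the made-up `page_size`) stay hidden unless the wrapper exposes them.

```python
from typing import List, Protocol


class Retriever(Protocol):
    """Hypothetical common interface shared by all backends."""

    def get_relevant_documents(self, query: str) -> List[str]: ...


class StubSearchBackend:
    """Stands in for a native search client with its own options."""

    def __init__(self, corpus: List[str], page_size: int = 2) -> None:
        self.corpus = corpus
        self.page_size = page_size  # backend-specific knob

    def get_relevant_documents(self, query: str) -> List[str]:
        hits = [t for t in self.corpus if query.lower() in t.lower()]
        return hits[: self.page_size]


class StubInMemoryBackend:
    """A second backend with the same method but different matching rules."""

    def __init__(self, corpus: List[str]) -> None:
        self.corpus = corpus

    def get_relevant_documents(self, query: str) -> List[str]:
        return [t for t in self.corpus if t.lower().startswith(query.lower())]


def fetch(retriever: Retriever, query: str) -> List[str]:
    """Caller code is identical regardless of the backend behind the interface."""
    return retriever.get_relevant_documents(query)


corpus = ["Alpha report", "Beta alpha notes", "alpha summary", "Gamma"]
print(fetch(StubSearchBackend(corpus), "alpha"))
print(fetch(StubInMemoryBackend(corpus), "alpha"))
```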

### **Question 2**

In the lab coding exercise, what is the benefit of using a LangChain retriever with Enterprise Search in the chain as opposed to just asking an LLM directly?

### **Answer 2**

Enterprise Search specializes in indexing and searching an organization's own data, which can be more accurate and factual for that domain than the broad knowledge an LLM was trained on, and it grounds answers in the organization's subject matter expertise.

By first retrieving relevant documents from Enterprise Search, the chain gives the LLM a more focused context, making it less prone to generating output based on incomplete or inaccurate information.

Enterprise Search can be configured to restrict access to sensitive information, which can help to comply with security and privacy regulations.

Enterprise Search can be scaled to handle large amounts of data, which can be important for organizations with large knowledge bases.
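
The difference between asking an LLM directly and grounding it first can be sketched without any cloud services. The example below uses a toy keyword-overlap ranker as a stand-in for Enterprise Search, which is an assumption for illustration only (the real service uses its own ranking), and all function names and passages are made up. It builds the two prompts side by side: the direct prompt contains only the question, while the grounded prompt also carries the top-ranked passages.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set; a deliberately crude notion of relevance."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Rank passages by how many query words they share, keeping the best k."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p)), p) for p in passages]
    # Drop passages with no overlap; the stable sort keeps input order on ties.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [p for _, p in ranked[:k]]


def direct_prompt(query: str) -> str:
    """Prompt with no context: the model must rely on what it already knows."""
    return f"Question: {query}\nAnswer:"


def grounded_prompt(query: str, passages: List[str], k: int = 2) -> str:
    """Prompt that restricts the model to the retrieved passages."""
    context = "\n".join(f"- {p}" for p in top_k(query, passages, k))
    return f"Use only this context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Made-up passages standing in for an indexed data store.
passages = [
    "The cafeteria opens at 8 am.",
    "Travel expenses must be filed within 30 days.",
    "Expenses over the limit need manager approval.",
]
question = "When must travel expenses be filed?"
print(direct_prompt(question))
print(grounded_prompt(question, passages))
```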


### **Question 3**

What are the benefits and risks of using a custom/DIY approach with LangChain on top of Enterprise Search?

### **Answer 3**

#### Benefits

You have complete control over the retrieval process and can tailor it to your specific needs.

LangChain provides a flexible framework that allows you to integrate with other tools and systems. This can be helpful for building custom applications and workflows.

If you have the expertise and resources, a DIY approach can be more cost-effective than using a managed service or pre-built solution.

You have full control over the data and how it is used, allowing for greater transparency and accountability.

Building your own solution can be a valuable learning experience and contribute to your team's expertise in AI and search technologies.

#### Risks

Building and maintaining a custom solution can be complex and require significant technical expertise. This can lead to challenges in development, deployment, and ongoing maintenance.

Developing and refining a custom solution can be time-consuming, especially for organizations with limited resources.

Implementing custom security measures can be a challenge and may require additional effort to ensure data protection and compliance.

Customized solutions are more prone to errors and bugs, which can negatively impact performance and accuracy.

Unlike managed services, DIY solutions often lack readily available support options, making troubleshooting and issue resolution more challenging.
