In [1]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Getting Started with Grounding in Vertex AI

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/doggy8088/generative-ai/blob/main/language/grounding/intro-grounding.zh.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/doggy8088/generative-ai/blob/main/language/grounding/intro-grounding.zh.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/doggy8088/generative-ai/blob/main/language/grounding/intro-grounding.zh.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
</table>


| | |
|-|-|
| Author | [Kristopher Overholt](https://github.com/koverholt) |


**_NOTE_**: This notebook has been tested in the following environment:

* Python version = 3.11


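## Overview

[Grounding in Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/grounding/ground-language-models) lets you use language models (for example, [`text-bison` and `chat-bison`](https://cloud.google.com/vertex-ai/docs/generative-ai/language-model-overview)) to generate content grounded in your own documents and data. This capability gives the model access, at runtime, to information that goes beyond its training data. By grounding model responses in Google Search results or in data stores within [Vertex AI Search](https://cloud.google.com/generative-ai-app-builder/docs/enterprise-search-introduction), LLMs that are grounded in data can produce more accurate, up-to-date, and relevant responses.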

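Grounding provides the following benefits:

- Reduces model hallucinations (instances where the model generates content that isn't factual)
- Anchors model responses to specific information, documents, and data sources
- Improves the trustworthiness, accuracy, and applicability of the generated content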

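In the context of grounding in Vertex AI, you can configure two different sources of grounding:

1. Google Search results for data that is publicly available and indexed
1. [Data stores in Vertex AI Search](https://cloud.google.com/generative-ai-app-builder/docs/create-datastore-ingest), which can include your own data in the form of website data, unstructured data, or structured data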

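**Note:** Some features in this sample notebook require early access through an allowlist. [Grounding with Vertex AI Search](https://cloud.google.com/vertex-ai/docs/generative-ai/grounding/ground-language-models) is available in Public Preview, whereas grounding with Google web search results is available in Private Preview. To request early access to features in Private Preview, contact your account representative or [Google Cloud Support](https://cloud.google.com/contact).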


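### Objectives

In this tutorial, you learn how to:

- Generate LLM text and chat model responses grounded in Google Search results
- Compare the results of ungrounded LLM responses with grounded LLM responses
- Create and use a data store in Vertex AI Search to ground responses in custom documents and data
- Generate LLM text and chat model responses grounded in Vertex AI Search results
- Use the asynchronous text and chat model APIs with grounding

This tutorial uses the following Google Cloud AI services and resources:

- Vertex AI
- Vertex AI Search and Conversation

The steps performed include:

- Configuring the LLM and prompt for various examples
- Sending example prompts to generative text and chat models in Vertex AI
- Setting up a data store in Vertex AI Search with your own data
- Sending example prompts with various levels of grounding (no grounding, web grounding, data store grounding)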

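## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.
1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).
1. Enable the [Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) and the [Vertex AI Search and Conversation API](https://console.cloud.google.com/flows/enableapi?apiid=discoveryengine.googleapis.com).
1. If you want to use grounding with Google web search results, your project must also be allowlisted for this feature while it is in the Private Preview stage.
1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).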


### Installation

Install the following packages required to run this notebook.


In [2]:
!pip install --upgrade --quiet google-cloud-aiplatform==1.38.1

Restart the kernel after installing the packages:


In [3]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)
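
After the kernel restarts, it can help to confirm which version of the SDK the new kernel sees. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the expected version is taken from the install cell above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The install cell above pins google-cloud-aiplatform to this version.
EXPECTED_VERSION = "1.38.1"

found = installed_version("google-cloud-aiplatform")
if found is None:
    print("google-cloud-aiplatform is not installed in this environment")
elif found != EXPECTED_VERSION:
    print(f"Found version {found}; this notebook was written against {EXPECTED_VERSION}")
else:
    print(f"google-cloud-aiplatform {found} is installed")
```

A mismatch is not necessarily a problem, but it is a useful first thing to check if a later cell fails on a missing class or keyword argument.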

### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)


In [4]:
PROJECT_ID = "your-project-id"  # @param {type:"string"}

# Set the project ID
!gcloud config set project {PROJECT_ID}

Updated property [core/project].
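
A common slip at this step is leaving the placeholder value in place or pasting a project name instead of a project ID. The sketch below checks a value against the documented project ID format (6 to 30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen). The helper name `project_id_problem` is hypothetical, and the check does not verify that the project exists or that you can access it.

```python
import re
from typing import Optional

# Format rule for Google Cloud project IDs, as described in the lead-in.
_PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")
_PLACEHOLDER = "your-project-id"


def project_id_problem(project_id: str) -> Optional[str]:
    """Return a short description of what looks wrong, or None if the ID looks plausible."""
    if project_id == _PLACEHOLDER:
        return "the placeholder value has not been replaced"
    if not _PROJECT_ID_PATTERN.fullmatch(project_id):
        return "the value does not match the project ID format"
    return None


for candidate in ["your-project-id", "My Project", "my-project-123"]:
    problem = project_id_problem(candidate)
    print(f"{candidate!r}: {problem or 'looks plausible'}")
```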


### Set the region

You can also change the `REGION` variable used by Vertex AI. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).


In [5]:
REGION = "us-central1"  # @param {type: "string"}

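### Authenticate your Google Cloud account

If you are running this notebook on Google Colab, you need to authenticate your environment. To do so, run the cell below. This step is not required if you are using Vertex AI Workbench.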


In [6]:
import sys

if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Import libraries


In [7]:
import vertexai
from vertexai.language_models import TextGenerationModel, ChatModel, GroundingSource

### Initialize the Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project:


In [8]:
vertexai.init(project=PROJECT_ID, location=REGION)

Initialize the generative text and chat models from Vertex AI:


In [9]:
text_model = TextGenerationModel.from_pretrained("text-bison")
chat_model = ChatModel.from_pretrained("chat-bison")

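## Example: Grounding with Google Search results

In this example, you compare LLM responses with no grounding against responses grounded in Google Search results. You ask a question about a recent hardware release from the Google Store.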


In [10]:
PROMPT = (
    "What are the price, available colors, and storage size options of a Pixel Tablet?"
)

### Text generation without grounding

Make a prediction request to the LLM with no grounding:


In [11]:
response = text_model.predict(PROMPT)
response

 **Price:**

* Starting at $399 for the Wi-Fi-only model with 128GB of storage
* $499 for the Wi-Fi + 5G model with 128GB of storage
* $599 for the Wi-Fi + 5G model with 256GB of storage

**Available Colors:**

* Chalk (white)
* Charcoal (black)
* Sage (green)

**Storage Size Options:**

* 128GB
* 256GB

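### Text generation grounded in Google Search results

Now you can add the `grounding_source` keyword argument with a grounding source of `GroundingSource.WebSearch()` to instruct the LLM to first run a Google Search with the prompt and then construct an answer based on the web search results: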


In [12]:
grounding_source = GroundingSource.WebSearch()

response = text_model.predict(
    PROMPT,
    grounding_source=grounding_source,
)

response, response.grounding_metadata

( The Pixel Tablet starts at $499 in the US, £599 in the UK, €679 throughout selected European regions, and CAD $699 in Canada. It comes in three colors: Porcelain (white), Rose (pink), and Hazel (green). The storage size options are 128GB and 256GB.,
 GroundingMetadata(citations=[GroundingCitation(start_index=1, end_index=129, url='https://www.androidauthority.com/google-pixel-tablet-3163922/', title=None, license=None, publication_date=None), GroundingCitation(start_index=130, end_index=206, url='https://www.androidpolice.com/google-pixel-tablet/', title=None, license=None, publication_date=None)], search_queries=['Pixel Tablet price, colors, and storage size options?']))

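Note that the response without grounding has only limited information from the LLM about the Pixel Tablet, whereas the response grounded in web search results contains up-to-date information from the web search results, which are returned as part of the grounded LLM request.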
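
Each citation in the grounding metadata carries a `start_index`, an `end_index`, and a `url`, so a response can be mapped back to the sources behind it. The sketch below pairs each URL with the text it covers. It assumes the indices are 0-based character offsets with an exclusive end, which the output above suggests but does not confirm. `Citation` is a hypothetical stand-in for the SDK's `GroundingCitation`, used so that the example runs without a Vertex AI call.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Citation:
    """Hypothetical stand-in with the fields shown in the GroundingCitation output."""

    start_index: int
    end_index: int
    url: Optional[str] = None


def cited_spans(text: str, citations: List[Citation]) -> List[Tuple[str, str]]:
    """Pair each cited URL with the slice of `text` it covers, in text order.

    Assumption: indices are 0-based character offsets and end_index is exclusive.
    """
    spans = []
    for citation in sorted(citations, key=lambda c: c.start_index):
        snippet = text[citation.start_index:citation.end_index].strip()
        spans.append((citation.url or "", snippet))
    return spans


# Made-up sample data; real values come from response.text and
# response.grounding_metadata.citations.
sample_text = " Grounded answers cite sources. Each citation covers a span."
sample_citations = [
    Citation(31, 60, "https://example.com/b"),
    Citation(1, 31, "https://example.com/a"),
]

for url, snippet in cited_spans(sample_text, sample_citations):
    print(f"{url}: {snippet!r}")
```

If the printed snippets start or end mid-word on real responses, the offset assumption is likely off by one and should be adjusted.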


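## Example: Grounding with custom documents and data

In this example, you compare LLM responses with no grounding against responses grounded in the [results of a data store](https://cloud.google.com/generative-ai-app-builder/docs/create-datastore-ingest) in Vertex AI Search. You ask a question about a GoogleSQL query that creates an [object table in BigQuery](https://cloud.google.com/bigquery/docs/object-table-introduction).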


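### Create a data store in Vertex AI Search

Follow the steps in the [Vertex AI Search getting started documentation](https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search#create_a_search_app_for_website_data) to create a data store in Vertex AI Search with sample data. In this example, you use a website-based data store that contains content from the Google Cloud website, including documentation.

Once you have created the data store, obtain the data store ID and enter it below.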


In [13]:
DATA_STORE_ID = "your-data-store-id_1234567890123"  # Replace this with your data store ID from Vertex AI Search
DATA_STORE_REGION = "global"
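
The data store ID and its region, together with the project ID, identify one data store. The sketch below shows how these pieces might compose into a full resource name. It assumes the `projects/.../locations/.../collections/default_collection/dataStores/...` layout used by Vertex AI Search resource names. The helper `data_store_path` is hypothetical and makes no API call; it only formats and sanity-checks the strings.

```python
def data_store_path(
    project_id: str,
    location: str,
    data_store_id: str,
    collection: str = "default_collection",
) -> str:
    """Compose a data store resource name from its parts (assumed layout)."""
    parts = {
        "project_id": project_id,
        "location": location,
        "data_store_id": data_store_id,
        "collection": collection,
    }
    for name, value in parts.items():
        # An empty part or a stray slash would silently produce a different path.
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty string without '/': {value!r}")
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/{collection}/dataStores/{data_store_id}"
    )


# Placeholder values copied from the cells above.
print(data_store_path("your-project-id", "global", "your-data-store-id_1234567890123"))
```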

Now you can ask a question about object tables in BigQuery and when to use them:


In [14]:
PROMPT = "When should I use an object table in BigQuery? And how does it store data?"

### Text generation without grounding

Make a prediction request to the LLM with no grounding:


In [15]:
response = text_model.predict(PROMPT)

response, response.grounding_metadata

( **When to use an object table in BigQuery**

Object tables are a specialized type of table in BigQuery that is designed for storing and querying semi-structured data. Semi-structured data is data that does not conform to a fixed schema, such as JSON, XML, or Avro.

Object tables are useful for storing data that is:

* **Complex and hierarchical:** Object tables can store data that is nested or has a complex structure. For example, you could store a JSON object that represents a customer record, which includes the customer's name, address, and order history.
* **Changing frequently:** Object,
 GroundingMetadata(citations=[], search_queries=[]))

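### Text generation grounded in Vertex AI Search results

Now you can add the `grounding_source` keyword argument with a grounding source of `GroundingSource.VertexAISearch()` to instruct the LLM to first search within your custom data store and then construct an answer based on the relevant documents: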


In [16]:
grounding_source = GroundingSource.VertexAISearch(
    data_store_id=DATA_STORE_ID, location=DATA_STORE_REGION
)

response = text_model.predict(
    PROMPT,
    grounding_source=grounding_source,
)

response, response.grounding_metadata

( **When to use an object table in BigQuery**

Object tables are useful for storing and analyzing unstructured data, such as images, videos, and audio files. They can also be used to store semi-structured data, such as JSON or XML files.

Object tables are particularly useful when you need to:

* Store large amounts of unstructured data
* Perform complex analysis on unstructured data
* Share unstructured data with others
* Access unstructured data from multiple locations

**How object tables store data**

Object tables store data in a columnar format, which makes it efficient to query and analyze large amounts of data. Each column,
 GroundingMetadata(citations=[], search_queries=['When should I use an object table in BigQuery?']))

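Note that the response without grounding has only limited information about object tables in BigQuery and might not be accurate, whereas the response grounded in Vertex AI Search results contains up-to-date information about BigQuery from the Google Cloud documentation.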
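
The two metadata outputs above differ in a way that is easy to miss: the ungrounded call has no search queries, while the grounded call ran a search but attached no citations. The sketch below summarizes a metadata object so that such cases stand out when comparing responses. It relies only on the `citations` and `search_queries` attributes visible in the outputs; `grounding_summary` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real `response.grounding_metadata` values.

```python
from types import SimpleNamespace


def grounding_summary(metadata) -> dict:
    """Summarize grounding metadata via its `citations` and `search_queries` attributes."""
    citations = list(getattr(metadata, "citations", None) or [])
    queries = list(getattr(metadata, "search_queries", None) or [])
    return {
        # True when a search was issued, even if nothing was cited.
        "searched": bool(queries),
        "num_citations": len(citations),
        "urls": sorted({c.url for c in citations if getattr(c, "url", None)}),
    }


# Stand-ins mirroring the two metadata outputs shown above.
ungrounded = SimpleNamespace(citations=[], search_queries=[])
grounded = SimpleNamespace(
    citations=[],
    search_queries=["When should I use an object table in BigQuery?"],
)

print("ungrounded:", grounding_summary(ungrounded))
print("grounded:  ", grounding_summary(grounded))
```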


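## Example: Grounded chat responses

You can also use grounding when working with chat models in Vertex AI. In this example, you compare LLM responses with no grounding against responses grounded in Google Search results and in a data store in Vertex AI Search.

You ask a question about Vertex AI and a follow-up question about managed datasets in Vertex AI: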


In [17]:
PROMPT = "What are managed datasets in Vertex AI?"
PROMPT_FOLLOWUP = "What types of data can I use"

### Chat session without grounding

Start a chat session and send messages to the LLM with no grounding:


In [18]:
chat = chat_model.start_chat()

response = chat.send_message(PROMPT)
print(response.text)

response = chat.send_message(PROMPT_FOLLOWUP)
print(response.text)

 Managed datasets are a feature of Vertex AI that allows you to easily create, manage, and version your datasets. With managed datasets, you can:

* **Easily create datasets:** You can create datasets from a variety of sources, including Cloud Storage, BigQuery, and CSV files.
* **Manage datasets:** You can view, edit, and delete datasets. You can also add and remove columns from datasets.
* **Version datasets:** You can create new versions of datasets. This allows you to track changes to your datasets over time.
* **Share datasets:** You can share datasets with other users in your organization.
* **Use datasets in Vertex AI models:** You can use managed datasets to train and evaluate Vertex AI models.

Managed datasets are a powerful tool that can help you to improve the performance of your Vertex AI models.
 You can use a variety of data types with managed datasets, including:

* **Structured data:** Structured data is data that is organized in a tabular format. Examples of structure

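### Chat session grounded in Google Search results

Now you can add the `grounding_source` keyword argument with a grounding source of `GroundingSource.WebSearch()` to instruct the chat model to first run a Google Search with the prompt and then construct an answer based on the web search results: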


In [19]:
chat = chat_model.start_chat()
grounding_source = GroundingSource.WebSearch()

response = chat.send_message(
    PROMPT,
    grounding_source=grounding_source,
)
print(response.text)
print(response.grounding_metadata)

response = chat.send_message(
    PROMPT_FOLLOWUP,
    grounding_source=grounding_source,
)
print(response.text)
print(response.grounding_metadata)

 Managed datasets in Vertex AI are a way to store and manage your data for use in machine learning models. They provide a number of benefits, including:

- **Centralized storage:** Managed datasets are stored in a central location, making them easy to access and manage.
- **Data versioning:** Managed datasets support data versioning, so you can easily track changes to your data over time.
- **Data security:** Managed datasets are encrypted at rest and in transit, so you can be sure that your data is safe.
- **Data processing:** Managed datasets can be processed using a variety of tools, including Vertex AI's built-in data processing tools.
GroundingMetadata(citations=[], search_queries=['Vertex AI managed datasets?'])
 You can use a variety of data types in managed datasets, including:

- **Structured data:** Structured data is data that is organized in a tabular format, such as CSV files or SQL tables.
- **Unstructured data:** Unstructured data is data that is not organized in a tabul

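### Chat session grounded in Vertex AI Search results

Now you can add the `grounding_source` keyword argument with a grounding source of `GroundingSource.VertexAISearch()` to instruct the chat model to first search within your custom data store and then construct an answer based on the relevant documents: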


In [20]:
chat = chat_model.start_chat()
grounding_source = GroundingSource.VertexAISearch(
    data_store_id=DATA_STORE_ID, location=DATA_STORE_REGION
)

response = chat.send_message(
    PROMPT,
    grounding_source=grounding_source,
)
print(response.text)
print(response.grounding_metadata)

response = chat.send_message(
    PROMPT_FOLLOWUP,
    grounding_source=grounding_source,
)
print(response.text)
print(response.grounding_metadata)

 Managed datasets in Vertex AI are used to provide the source data for training AutoML and custom models.
GroundingMetadata(citations=[GroundingCitation(start_index=1, end_index=105, url='https://cloud.google.com/vertex-ai/docs/datasets/overview', title=None, license=None, publication_date=None)], search_queries=['Vertex AI managed datasets?'])
 Managed datasets in Vertex AI are used to provide the source data for training AutoML and custom models.
GroundingMetadata(citations=[GroundingCitation(start_index=1, end_index=105, url='https://cloud.google.com/vertex-ai/docs/datasets/overview', title=None, license=None, publication_date=None)], search_queries=['Vertex AI managed datasets'])
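
The three chat cells above repeat the same pattern: start a session, then send each prompt with or without a `grounding_source`. The sketch below factors that pattern into one hypothetical helper, `send_all`, which passes `grounding_source` only when one is provided. `EchoChat` is a stand-in so the example runs offline; in the notebook you would pass the object returned by `chat_model.start_chat()`.

```python
def send_all(chat, prompts, grounding_source=None):
    """Send prompts in order on one chat session and return (prompt, response) pairs."""
    kwargs = {}
    if grounding_source is not None:
        # Only forward the keyword when grounding is requested.
        kwargs["grounding_source"] = grounding_source
    return [(prompt, chat.send_message(prompt, **kwargs)) for prompt in prompts]


class EchoChat:
    """Hypothetical stand-in for a chat session; records calls and echoes the prompt."""

    def __init__(self):
        self.calls = []

    def send_message(self, message, **kwargs):
        self.calls.append((message, kwargs))
        return f"echo: {message}"


demo_chat = EchoChat()
prompts = ["What are managed datasets in Vertex AI?", "What types of data can I use"]
for prompt, response in send_all(demo_chat, prompts, grounding_source="web"):
    print(prompt, "->", response)
```

Reusing one session for both prompts matters here, because the follow-up question only makes sense with the first turn in the chat history.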


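## Example: Grounded asynchronous text and chat responses

You can also use grounding in Vertex AI when working with the asynchronous APIs for the text and chat models. In this example, you generate responses grounded in Google Search results and in the results of a data store in Vertex AI Search.

You ask a question about the different services available in Google Cloud.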


In [21]:
PROMPT = "What are the different types of databases available in Google Cloud?"

### Asynchronous text generation grounded in Google Search results


In [22]:
grounding_source = GroundingSource.WebSearch()

response = await text_model.predict_async(
    PROMPT,
    grounding_source=grounding_source,
)

response, response.grounding_metadata

( The different types of databases available in Google Cloud are:

1. Cloud Spanner: It provides all the relational database capabilities of Cloud SQL along with horizontal scalability which usually comes with NoSQL databases.

2. Cloud Bigtable: Users can store different types of data, including time-series, marketing, financial, IoT, and graph data. Cloud Bigtable also integrates with popular big data.

3. Cloud SQL: Provides managed MySQL, PostgreSQL, and SQL Server databases on Google Cloud.

4. AlloyDB: It is a fully managed PostgreSQL-compatible database service that offers high performance and scalability for PostgreSQL workloads.,
 GroundingMetadata(citations=[GroundingCitation(start_index=69, end_index=226, url='https://medium.com/google-cloud/choose-the-right-database-service-in-gcp-8e3803245e1d', title=None, license=None, publication_date=None), GroundingCitation(start_index=230, end_index=352, url='https://www.techtarget.com/searchcloudcomputing/feature/7-Google-Cloud-datab

### Asynchronous text generation grounded in Vertex AI Search results


In [23]:
grounding_source = GroundingSource.VertexAISearch(
    data_store_id=DATA_STORE_ID, location=DATA_STORE_REGION
)

response = await text_model.predict_async(
    PROMPT,
    grounding_source=grounding_source,
)

response, response.grounding_metadata

( The different types of databases available in Google Cloud are:

1. **Cloud SQL**: A fully-managed database service that supports MySQL, PostgreSQL, and SQL Server.
2. **BigQuery**: A serverless, highly scalable data warehouse that can handle petabytes of data.
3. **Spanner**: A globally distributed, highly scalable relational database that supports ACID transactions.
4. **Firestore**: A NoSQL document database that is ideal for real-time applications.
5. **Memorystore**: An in-memory data store that is ideal for applications that require fast access to data.,
 GroundingMetadata(citations=[GroundingCitation(start_index=1, end_index=65, url='https://cloud.google.com/learn/what-is-a-cloud-database', title=None, license=None, publication_date=None)], search_queries=['What are the different types of databases available in Google Cloud?']))

### Asynchronous chat session grounded in Google Search results


In [24]:
chat = chat_model.start_chat()

grounding_source = GroundingSource.WebSearch()
response = await chat.send_message_async(
    PROMPT,
    grounding_source=grounding_source,
)

response, response.grounding_metadata

( - Cloud Spanner
- Cloud Bigtable
- Cloud SQL
- AlloyDB,
 GroundingMetadata(citations=[], search_queries=['What are the different types of databases available in Google Cloud?']))

### Asynchronous chat session grounded in Vertex AI Search results


In [25]:
chat = chat_model.start_chat()

grounding_source = GroundingSource.VertexAISearch(
    data_store_id=DATA_STORE_ID, location=DATA_STORE_REGION
)
response = await chat.send_message_async(
    PROMPT,
    grounding_source=grounding_source,
)

response, response.grounding_metadata

( The different types of databases available in Google Cloud are:

1. [2] **Cloud SQL**: A fully-managed database service that supports MySQL, PostgreSQL, and SQL Server.
2. [2] **BigQuery**: A serverless, highly scalable data warehouse that can handle large amounts of data.
3. [2] **Spanner**: A globally-distributed, highly available relational database.
4. [2] **Firestore**: A NoSQL document database that is ideal for real-time applications.
5. [2] **Memorystore**: An in-memory data store that is optimized for high-performance applications.,
 GroundingMetadata(citations=[GroundingCitation(start_index=1, end_index=65, url='https://cloud.google.com/learn/what-is-a-cloud-database', title=None, license=None, publication_date=None)], search_queries=['What are the different types of databases available in Google Cloud?']))

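## Cleaning up

To avoid incurring charges to your Google Cloud account for the resources used in this notebook, follow these steps:

1. To avoid unnecessary Google Cloud charges, use the [Google Cloud console](https://console.cloud.google.com/) to delete your project if you do not need it. Learn more in the Google Cloud documentation for [managing and deleting your project](https://cloud.google.com/resource-manager/docs/creating-managing-projects).
1. If you used an existing Google Cloud project, delete the resources you created to avoid incurring charges to your account. For more information, refer to the documentation to [Delete data from a data store in Vertex AI Search](https://cloud.google.com/generative-ai-app-builder/docs/delete-datastores), then delete your data store.
1. Disable the [Vertex AI Search and Conversation API](https://pantheon.corp.google.com/apis/api/discoveryengine.googleapis.com) and [Vertex AI API](https://pantheon.corp.google.com/apis/api/aiplatform.googleapis.com) in the Google Cloud console.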
