##### Copyright 2023 Google LLC.


In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Document Q&A with ChromaDB


<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/doggy8088/gemini-api-cookbook/blob/zh-tw/examples/vectordb_with_chroma.zh.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/doggy8088/gemini-api-cookbook/blob/zh-tw/examples/vectordb_with_chroma.zh.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>


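## Overview

This tutorial shows how to use the Gemini API to build a vector database and retrieve answers to questions from it. You will also use [ChromaDB](https://docs.trychroma.com/){:.external}, an open-source Python tool for creating embedding databases. ChromaDB lets you:

* Store embeddings together with their metadata
* Embed documents and queries
* Search the embedding database

In this tutorial, you will use embeddings to retrieve an answer from a vector database created with ChromaDB.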

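## Prerequisites

You can run this quickstart in Google Colab.

To complete this quickstart in your own development environment, make sure it meets the following requirements:

-  Python 3.9+
-  An installation of `jupyter` to run the notebook.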


## Setup

First, download and install ChromaDB and the Gemini API Python library.


In [None]:
!pip install -U -q google.generativeai

In [None]:
!pip install -q chromadb==0.4.24

Then import the modules you will use in this tutorial.


In [None]:
import textwrap
import chromadb
import numpy as np
import pandas as pd

import google.generativeai as genai
import google.ai.generativelanguage as glm

# Used to securely store your API key
from google.colab import userdata

from IPython.display import Markdown
from chromadb import Documents, EmbeddingFunction, Embeddings

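### Get an API key

Before you can use the Gemini API, you must obtain an API key. If you don't already have one, you can create a key with one click in Google AI Studio.

<a class="button button-primary" href="https://makersuite.google.com/app/apikey" target="_blank" rel="noopener noreferrer">Get an API key</a>

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Name it `API_KEY`.

Once you have the API key, pass it to the SDK. You can do this in two ways:

* Put the key in the `GOOGLE_API_KEY` environment variable (the SDK picks it up from there automatically).
* Pass the key to `genai.configure(api_key=...)`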
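
As a rough sketch of how the two options could be combined, the helper below prefers a key passed directly and otherwise falls back to the environment variable. `resolve_api_key` is a hypothetical name, not part of the SDK. It only chooses the key string, which would then be handed to `genai.configure(api_key=...)`.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the API key to use, preferring an explicitly passed key.

    Hypothetical helper for illustration; not part of the Gemini SDK.
    """
    env = os.environ if env is None else env
    # A key passed directly takes priority.
    if explicit_key:
        return explicit_key
    # Otherwise fall back to the GOOGLE_API_KEY environment variable.
    key = env.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("No API key found: pass one explicitly or set GOOGLE_API_KEY.")
    return key


# An in-memory mapping stands in for the real environment here.
print(resolve_api_key(env={"GOOGLE_API_KEY": "key-from-env"}))
print(resolve_api_key("key-passed-directly", env={"GOOGLE_API_KEY": "key-from-env"}))
```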


In [None]:
# Or use `os.getenv('API_KEY')` to fetch an environment variable.
API_KEY=userdata.get('API_KEY')

genai.configure(api_key=API_KEY)

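Key point: Next, you will choose a model. Any embedding model works for this tutorial, but in a real application it is important to pick one specific model and stick with it. The outputs of different models are not compatible with each other.

**Note**: At this time, the Gemini API is [only available in certain regions](https://ai.google.dev/available_regions).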


In [None]:
for m in genai.list_models():
  if 'embedContent' in m.supported_generation_methods:
    print(m.name)

models/embedding-001
models/embedding-001


### Data

Here is the small set of documents you will use to build the embedding database:


In [None]:
DOCUMENT1 = "Operating the Climate Control System  Your Googlecar has a climate control system that allows you to adjust the temperature and airflow in the car. To operate the climate control system, use the buttons and knobs located on the center console.  Temperature: The temperature knob controls the temperature inside the car. Turn the knob clockwise to increase the temperature or counterclockwise to decrease the temperature. Airflow: The airflow knob controls the amount of airflow inside the car. Turn the knob clockwise to increase the airflow or counterclockwise to decrease the airflow. Fan speed: The fan speed knob controls the speed of the fan. Turn the knob clockwise to increase the fan speed or counterclockwise to decrease the fan speed. Mode: The mode button allows you to select the desired mode. The available modes are: Auto: The car will automatically adjust the temperature and airflow to maintain a comfortable level. Cool: The car will blow cool air into the car. Heat: The car will blow warm air into the car. Defrost: The car will blow warm air onto the windshield to defrost it."
DOCUMENT2 = "Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon.  For example, you can touch the \"Navigation\" icon to get directions to your destination or touch the \"Music\" icon to play your favorite songs."
DOCUMENT3 = "Shifting Gears Your Googlecar has an automatic transmission. To shift gears, simply move the shift lever to the desired position.  Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions."

documents = [DOCUMENT1, DOCUMENT2, DOCUMENT3]

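## Creating the embedding database with ChromaDB

You will create a [custom function](https://docs.trychroma.com/embeddings#custom-embedding-functions){:.external} that performs embedding with the Gemini API. When you pass a set of documents to this custom function, it returns their vectors, that is, the embeddings of the documents.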


### API changes to Embeddings with model embedding-001

The new embeddings model, embedding-001, adds a task type parameter and an optional title (valid only with task_type=`RETRIEVAL_DOCUMENT`).

These new parameters apply only to the newest embeddings models. The task types are:

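Task Type | Description
---       | ---
RETRIEVAL_QUERY | Specifies that the given text is a query in a search or retrieval setting.
RETRIEVAL_DOCUMENT | Specifies that the given text is a document in a search or retrieval setting.
SEMANTIC_SIMILARITY | Specifies that the given text will be used for semantic textual similarity (STS).
CLASSIFICATION | Specifies that the embeddings will be used for classification.
CLUSTERING | Specifies that the embeddings will be used for clustering.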
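
As a minimal sketch of how these task types might be applied, the hypothetical helper below (`build_embed_kwargs` is not part of the SDK) assembles keyword arguments for `genai.embed_content` and attaches a title only for the document task type. It makes no API call, and the lowercase task type strings follow the style used in this tutorial's embedding function.

```python
def build_embed_kwargs(text, task_type, title=None, model="models/embedding-001"):
    """Assemble keyword arguments for genai.embed_content (hypothetical helper)."""
    kwargs = {"model": model, "content": text, "task_type": task_type}
    # A title is only valid for documents being indexed for retrieval.
    if title is not None:
        if task_type.lower() != "retrieval_document":
            raise ValueError("title is only valid with task_type=RETRIEVAL_DOCUMENT")
        kwargs["title"] = title
    return kwargs


# Documents stored in the database use the document task type...
doc_kwargs = build_embed_kwargs("The shift lever has five positions.",
                                "retrieval_document", title="Shifting gears")
# ...while a user's question uses the query task type and carries no title.
query_kwargs = build_embed_kwargs("How do I shift gears?", "retrieval_query")
print(doc_kwargs)
print(query_kwargs)
```

The resulting dictionaries could then be unpacked into a call such as `genai.embed_content(**doc_kwargs)`.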


In [None]:
class GeminiEmbeddingFunction(EmbeddingFunction):
  def __call__(self, input: Documents) -> Embeddings:
    model = 'models/embedding-001'
    title = "Custom query"
    return genai.embed_content(model=model,
                                content=input,
                                task_type="retrieval_document",
                                title=title)["embedding"]

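Now you will create the vector database. In the `create_chroma_db` function, you instantiate a [Chroma client](https://docs.trychroma.com/getting-started){:.external}. From there, you create a collection, which is where you store your embeddings, documents, and any metadata. Note that the embedding function defined above is passed as an argument to `create_collection`.

Next, you use the `add` method to add the documents to the collection.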


In [None]:
def create_chroma_db(documents, name):
  chroma_client = chromadb.Client()
  db = chroma_client.create_collection(name=name, embedding_function=GeminiEmbeddingFunction())

  for i, d in enumerate(documents):
    db.add(
      documents=d,
      ids=str(i)
    )
  return db

In [None]:
# Set up the DB
db = create_chroma_db(documents, "googlecarsdatabase")
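
The loop in `create_chroma_db` adds one document per call. Chroma's `add` method also accepts lists, so the inputs could instead be prepared in one pass. The sketch below only builds those lists: `prepare_batch` is a hypothetical helper, the short strings are placeholders for the real documents, and the `db.add(...)` call is left as a comment because it needs a live collection.

```python
def prepare_batch(documents, metadatas=None):
    """Build parallel lists of ids, documents, and metadata (hypothetical helper)."""
    # Chroma ids are strings, so use each document's position as its id.
    ids = [str(i) for i in range(len(documents))]
    if metadatas is not None and len(metadatas) != len(documents):
        raise ValueError("metadatas must have one entry per document")
    return {"ids": ids, "documents": list(documents), "metadatas": metadatas}


batch = prepare_batch(
    ["Climate control text", "Touchscreen text", "Shifting gears text"],
    metadatas=[{"topic": "climate"}, {"topic": "touchscreen"}, {"topic": "gears"}],
)
print(batch["ids"])
# With a live collection, this could then be a single call:
# db.add(ids=batch["ids"], documents=batch["documents"], metadatas=batch["metadatas"])
```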

Confirm that the data was inserted by looking at the database:


In [None]:
pd.DataFrame(db.peek(3))

Unnamed: 0,ids,embeddings,metadatas,documents,uris,data
0,0,"[-0.020994942635297775, -0.03876612335443497, ...",,Operating the Climate Control System Your Goo...,,
1,1,"[0.017410801723599434, -0.04757162556052208, -...",,Your Googlecar has a large touchscreen display...,,
2,2,"[-0.03194405511021614, -0.023281503468751907, ...",,Shifting Gears Your Googlecar has an automatic...,,


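## Getting the relevant document

`db` is a Chroma collection object. You can call `query` on it to run a nearest neighbor search for similar embeddings or documents.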
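
To illustrate what a nearest neighbor search does conceptually, here is a brute-force sketch over toy vectors. It is not Chroma's implementation; the three-dimensional vectors and the squared Euclidean distance are assumptions chosen for readability.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(query_vec, stored, n_results=1):
    """Return the ids of the n_results stored vectors closest to query_vec."""
    ranked = sorted(stored, key=lambda doc_id: squared_distance(query_vec, stored[doc_id]))
    return ranked[:n_results]


# Toy embeddings keyed by document id (made-up values, not real model output).
stored = {
    "0": [0.9, 0.1, 0.0],
    "1": [0.1, 0.8, 0.1],
    "2": [0.0, 0.2, 0.9],
}
print(nearest_neighbors([0.2, 0.7, 0.1], stored))  # document "1" is closest
```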


In [None]:
def get_relevant_passage(query, db):
  passage = db.query(query_texts=[query], n_results=1)['documents'][0][0]
  return passage
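
The double index `[0][0]` in `get_relevant_passage` reflects how `query` groups its results: one inner list per query text, ordered from closest to farthest. The dictionary below is hand-written to mimic that shape; its values are illustrative, not real output.

```python
# Hand-written stand-in for the value returned by db.query(...) when called
# with two query texts and n_results=2. Values are illustrative only.
result = {
    "ids": [["1", "0"], ["2", "0"]],
    "documents": [
        ["Touchscreen passage", "Climate control passage"],      # matches for query 0
        ["Shifting gears passage", "Climate control passage"],   # matches for query 1
    ],
}

# First index selects the query, second index selects the rank of the match.
best_for_first_query = result["documents"][0][0]
best_for_second_query = result["documents"][1][0]
print(best_for_first_query)
print(best_for_second_query)
```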

In [None]:
# Perform embedding search
passage = get_relevant_passage("touch screen features", db)
Markdown(passage)

Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon.  For example, you can touch the "Navigation" icon to get directions to your destination or touch the "Music" icon to play your favorite songs.

Now that you have found the relevant passage in your set of documents, you can use it to build a prompt to pass to the Gemini API.


In [None]:
def make_prompt(query, relevant_passage):
  escaped = relevant_passage.replace("'", "").replace('"', "").replace("\n", " ")
  prompt = ("""You are a helpful and informative bot that answers questions using text from the reference passage included below. \
  Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. \
  However, you are talking to a non-technical audience, so be sure to break down complicated concepts and \
  strike a friendly and conversational tone. \
  If the passage is irrelevant to the answer, you may ignore it.
  QUESTION: '{query}'
  PASSAGE: '{relevant_passage}'

    ANSWER:
  """).format(query=query, relevant_passage=escaped)

  return prompt

Pass a query to the prompt:


In [None]:
query = "How do you use the touchscreen in the Google car?"
prompt = make_prompt(query, passage)
Markdown(prompt)

You are a helpful and informative bot that answers questions using text from the reference passage included below.   Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.   However, you are talking to a non-technical audience, so be sure to break down complicated concepts and   strike a friendly and conversational tone.   If the passage is irrelevant to the answer, you may ignore it.
  QUESTION: 'How do you use the touchscreen in the Google car?'
  PASSAGE: 'Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon.  For example, you can touch the Navigation icon to get directions to your destination or touch the Music icon to play your favorite songs.'

    ANSWER:
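
Before formatting, `make_prompt` removes quote characters and replaces newlines in the passage, which is why the quotes around Navigation and Music are missing from the prompt above. That step can be tried in isolation; `strip_quotes_and_newlines` is a hypothetical name for the same three `replace` calls.

```python
def strip_quotes_and_newlines(text):
    """Drop single and double quotes and turn newlines into spaces."""
    return text.replace("'", "").replace('"', "").replace("\n", " ")


sample = 'Touch the "Music" icon\nto play your favorite songs.'
print(strip_quotes_and_newlines(sample))
# Touch the Music icon to play your favorite songs.
```

One side effect is that apostrophes in the passage are dropped as well, so a word such as "don't" would become "dont".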
  

Now use the `generate_content` method to generate a response from the model.


In [None]:
model = genai.GenerativeModel('gemini-pro')
answer = model.generate_content(prompt)
Markdown(answer.text)
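
The retrieval and generation steps can be chained into a single function. The sketch below relies only on the two calls used in this tutorial, `db.query(query_texts=..., n_results=...)` and `model.generate_content(prompt).text`. The names `answer_question`, `FakeCollection`, and `FakeModel` are hypothetical; the fakes stand in for the real collection and model so the flow can be run offline, and the prompt is a shortened stand-in for `make_prompt`.

```python
from types import SimpleNamespace


def answer_question(query, db, model, n_results=1):
    """Retrieve the closest passages for a query, then ask the model to answer."""
    passages = db.query(query_texts=[query], n_results=n_results)["documents"][0]
    context = " ".join(passages)
    # Shortened stand-in for the tutorial's make_prompt template.
    prompt = f"Answer using the passage.\nQUESTION: {query}\nPASSAGE: {context}\nANSWER:"
    return model.generate_content(prompt).text


class FakeCollection:
    """Offline stand-in for a Chroma collection (always returns one passage)."""

    def query(self, query_texts, n_results):
        return {"documents": [["Touch the desired icon."] for _ in query_texts]}


class FakeModel:
    """Offline stand-in for a generative model (reports the prompt length)."""

    def generate_content(self, prompt):
        return SimpleNamespace(text=f"(model reply to a {len(prompt)}-character prompt)")


print(answer_question("How do I use the touchscreen?", FakeCollection(), FakeModel()))
```

Passing the collection and model as arguments keeps the function easy to test and lets the real `db` and `model` objects be swapped in unchanged.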

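## Next steps

To learn more about how you can use embeddings, check out the available [examples](https://ai.google.dev/examples?keywords=embed). To learn how to use other services in the Gemini API, see the [Python quickstart](https://ai.google.dev/gemini-api/docs/get-started/python).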
