##### Copyright 2024 Google LLC.


In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://ai.google.dev/gemma/docs/integrations/langchain"><img src="https://ai.google.dev/static/site-assets/images/docs/notebook-site-button.png" height="32" width="32" />View on ai.google.dev</a>
  </td>
    <td>
    <a target="_blank" href="https://colab.research.google.com/github/doggy8088/generative-ai-docs/blob/main/site/zh/gemma/docs/integrations/langchain.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/doggy8088/generative-ai-docs/blob/main/site/zh/gemma/docs/integrations/langchain.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>


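# Get started with Gemma and LangChain

This tutorial shows you how to get started with [Gemma](https://ai.google.dev/gemma/docs) and [LangChain](https://python.langchain.com/docs/get_started/introduction), running in Google Cloud or in your Colab environment. Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. LangChain is a framework for building and deploying context-aware applications backed by language models.

**Note:** This tutorial was run on an A100 GPU in Google Colab. Free Colab hardware acceleration is *insufficient* to run all of the code.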


## Run Gemma in Google Cloud

The [`langchain-google-vertexai`](https://pypi.org/project/langchain-google-vertexai/) package provides LangChain integration with Google Cloud models.


### Install dependencies


In [None]:
!pip install --upgrade -q langchain langchain-google-vertexai

### Authenticate

Unless you're using Colab Enterprise, you need to authenticate.


In [None]:
from google.colab import auth
auth.authenticate_user()

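### Deploy the model

Vertex AI is a platform for training and deploying AI models and applications. Model Garden is a curated collection of models that you can explore in the Google Cloud console.

To deploy Gemma, [open the model](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335) in Model Garden for Vertex AI and complete the following steps: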

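1. Select **Deploy**.
2. Make any desired changes to the deployment form fields, or leave them as they are if the defaults work for you. Note the following fields, which you'll need later:
   * **Endpoint name** (for example, `google_gemma-7b-it-mg-one-click-deploy`)
   * **Region** (for example, `us-west1`)
3. Select **Deploy** to deploy the model to Vertex AI. The deployment takes a few minutes to complete.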

When the endpoint is ready, copy its project ID, endpoint ID, and location, and enter them as parameters.


In [None]:
# @title Basic parameters
project: str = ""  # @param {type:"string"}
endpoint_id: str = ""  # @param {type:"string"}
location: str = "" # @param {type:"string"}
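
Before calling the endpoint, it can help to confirm that none of the three values was left empty. The sketch below uses a hypothetical helper, `endpoint_resource_name`, which is not part of LangChain or the Vertex AI SDK. It validates the parameters and assembles them into the resource-name form that Vertex AI uses for endpoints, `projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID`. The example values are placeholders.

```python
def endpoint_resource_name(project: str, location: str, endpoint_id: str) -> str:
    """Return the Vertex AI resource name for an endpoint.

    Hypothetical helper for illustration; raises ValueError if any value is blank.
    """
    values = {"project": project, "location": location, "endpoint_id": endpoint_id}
    missing = [name for name, value in values.items() if not value.strip()]
    if missing:
        raise ValueError(f"Missing required parameter(s): {', '.join(missing)}")
    return (
        f"projects/{project.strip()}"
        f"/locations/{location.strip()}"
        f"/endpoints/{endpoint_id.strip()}"
    )


# Placeholder values for illustration only.
print(endpoint_resource_name("my-project", "us-west1", "1234567890"))
```

Failing early with a clear message is usually easier to debug than a request error from an endpoint that was addressed with an empty ID.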

### Run the model


In [None]:
from langchain_google_vertexai import GemmaVertexAIModelGarden, GemmaChatVertexAIModelGarden

llm = GemmaVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)

output = llm.invoke("What is the meaning of life?")
print(output)

Prompt:
What is the meaning of life?
Output:
Life is a complex and multifaceted phenomenon that has fascinated philosophers, scientists, and


You can also use Gemma for multi-turn chat:


In [None]:
from langchain_core.messages import (
    HumanMessage
)

llm = GemmaChatVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)

message1 = HumanMessage(content="How much is 2+2?")
answer1 = llm.invoke([message1])
print(answer1)

message2 = HumanMessage(content="How much is 3+3?")
answer2 = llm.invoke([message1, answer1, message2])

print(answer2)

content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4'
content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nPrompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4<end_of_turn>\n<start_of_turn>user\nHow much is 3+3?<end_of_turn>\n<start_of_turn>model\nOutput:\nSure, the answer is 6.\n\n3 + 3 = 6'
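
The raw output above shows the turn markers that wrap each message. As a rough illustration of that layout, the sketch below rebuilds the prompt string from a list of turns. `format_gemma_prompt` is a hypothetical function written for this example, inferred only from the printed output, and is not the library's own formatter.

```python
def format_gemma_prompt(turns):
    """Build a Gemma-style chat prompt from (role, text) pairs.

    Hypothetical sketch: each turn is wrapped in <start_of_turn>/<end_of_turn>
    markers, and the prompt ends with an open model turn for the reply.
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"Unknown role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Leave the final model turn open so the model continues from here.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(format_gemma_prompt([("user", "How much is 2+2?")]))
```

Because the unparsed `answer1` already contains its own prompt, passing it back as history is what makes the first question appear twice in the second output.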


You can post-process the response to avoid repetition:


In [None]:
answer1 = llm.invoke([message1], parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], parse_response=True)

print(answer2)

content='Output:\nSure, here is the answer:\n\n2 + 2 = 4'
content='Output:\nSure, here is the answer:\n\n3 + 3 = 6<'
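
To make the effect of post-processing concrete, here is a minimal sketch of one way to drop the echoed prompt from a raw response: keep only the text after the last model-turn marker. `extract_last_model_turn` is a hypothetical helper, and this is not necessarily how `parse_response=True` is implemented.

```python
MODEL_TURN = "<start_of_turn>model\n"


def extract_last_model_turn(raw: str) -> str:
    """Return the text that follows the last model-turn marker.

    Hypothetical sketch. str.rpartition returns ("", "", raw) when the marker
    is absent, so text without markers passes through (apart from stripping).
    """
    return raw.rpartition(MODEL_TURN)[2].strip()


# Raw response copied from the unparsed example earlier in this section.
raw = (
    "Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n"
    "<start_of_turn>model\nOutput:\nSure, the answer is 4.\n\n2 + 2 = 4"
)
print(repr(extract_last_model_turn(raw)))
```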


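## Run Gemma from a Kaggle download


This section shows you how to download Gemma from Kaggle and then run the model.

To complete this section, you first need to follow the setup instructions at [Gemma setup](https://ai.google.dev/gemma/docs/setup).

Then move on to the next section, where you'll set environment variables for your Colab environment.

**Note:** This section of the tutorial was run on an A100 GPU in Google Colab.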


### Set environment variables

Set environment variables for `KAGGLE_USERNAME` and `KAGGLE_KEY`.


In [None]:
import os
from google.colab import userdata

# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
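
Outside Colab, `userdata.get` is not available. One option, sketched below, is to read the two values from a `kaggle.json` API token file, which stores them under the `username` and `key` fields. `load_kaggle_credentials` is a hypothetical helper, and the default path `~/.kaggle/kaggle.json` is an assumption about where the file was saved.

```python
import json
import os
from pathlib import Path


def load_kaggle_credentials(path=None):
    """Set KAGGLE_USERNAME and KAGGLE_KEY from a kaggle.json file.

    Hypothetical helper: does nothing if both variables are already set.
    """
    if os.environ.get("KAGGLE_USERNAME") and os.environ.get("KAGGLE_KEY"):
        return
    # Assumed default location of the downloaded API token file.
    token_file = Path(path) if path else Path.home() / ".kaggle" / "kaggle.json"
    credentials = json.loads(token_file.read_text())
    os.environ["KAGGLE_USERNAME"] = credentials["username"]
    os.environ["KAGGLE_KEY"] = credentials["key"]


# Usage (uncomment once the token file exists):
# load_kaggle_credentials()
```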

### Install dependencies


In [None]:
# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
!pip install -q -U keras-nlp
# Quote the requirement so the shell doesn't treat ">" as output redirection.
!pip install -q -U "keras>=3"
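
Because the install order matters here, a quick check of the installed Keras major version can catch an environment where an older release ended up in place. The sketch below uses only the standard library; `installed_major_version` is a hypothetical helper that assumes the version string starts with a numeric major component.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_major_version(package: str):
    """Return the major version of an installed package, or None if absent."""
    try:
        # Assumes a version string such as "3.0.5" with a numeric first part.
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


keras_major = installed_major_version("keras")
if keras_major is None:
    print("keras is not installed")
elif keras_major < 3:
    print(f"keras {keras_major}.x found; this section expects Keras 3 or later")
else:
    print(f"keras {keras_major}.x found")
```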

### Run the model


In [None]:
from langchain_google_vertexai import GemmaLocalKaggle

You can specify the Keras backend (the default is `tensorflow`, but you can change it to `jax` or `torch`).


In [None]:
# @title Basic parameters
keras_backend: str = "jax"  # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}
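
For context, when Keras 3 is used on its own, the backend is chosen through the `KERAS_BACKEND` environment variable, which Keras reads the first time it is imported. The sketch below shows that mechanism with a hypothetical helper, `select_keras_backend`, limited to the three backends named above. How `GemmaLocalKaggle` applies its `keras_backend` argument internally is not shown in this tutorial.

```python
import os

# The three backends named in this tutorial.
SUPPORTED_BACKENDS = ("tensorflow", "jax", "torch")


def select_keras_backend(name: str) -> str:
    """Validate a backend name and export it through KERAS_BACKEND.

    Hypothetical helper; call it before the first `import keras`.
    """
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported backend {name!r}; choose from {SUPPORTED_BACKENDS}")
    os.environ["KERAS_BACKEND"] = name
    return name


select_keras_backend("jax")
# import keras  # Import only after the environment variable is set.
```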

In [None]:
llm = GemmaLocalKaggle(model_name=model_name, keras_backend=keras_backend)

Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...


In [None]:
output = llm.invoke("What is the meaning of life?", max_tokens=30)
print(output)

What is the meaning of life?

The question is one of the most important questions in the world.

It’s the question that has


### Run the chat model


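As in the Google Cloud example above, you can use multi-turn chat with a local deployment of Gemma. You might need to restart the notebook and clear your GPU memory to avoid out-of-memory (OOM) errors: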


In [None]:
from langchain_google_vertexai import GemmaChatLocalKaggle

In [None]:
# @title Basic parameters
keras_backend: str = "jax"  # @param {type:"string"}
model_name: str = "gemma_2b_en" # @param {type:"string"}

In [None]:
llm = GemmaChatLocalKaggle(model_name=model_name, keras_backend=keras_backend)

Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'config.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'model.weights.h5' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'tokenizer.json' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...
Attaching 'assets/tokenizer/vocabulary.spm' from model 'keras/gemma/keras/gemma_2b_en/2' to your Colab notebook...


In [None]:
from langchain_core.messages import (
    HumanMessage
)

message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=30)
print(answer1)


content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model."


In [None]:
message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=60)

print(answer2)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model.<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model"


If you don't want the earlier turns repeated in the output, you can post-process the response:


In [None]:
answer1 = llm.invoke([message1], max_tokens=30, parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], max_tokens=60, parse_response=True)
print(answer2)

content="I'm a model.\n Tampoco\nI'm a model."
content='I can help you with your modeling.\n Tampoco\nI can'
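
Passing `[message1, answer1, message2]` by hand gets tedious as a conversation grows. The sketch below keeps the history in a list and appends both sides of each exchange. `chat_turn` is a hypothetical helper that only assumes the model object exposes `invoke(messages, **kwargs)`, as the chat models in this tutorial do.

```python
def chat_turn(llm, history, user_message, **invoke_kwargs):
    """Send the history plus one new user message and record the exchange.

    Hypothetical helper: `history` is a plain list that is updated in place.
    """
    history.append(user_message)
    # Pass a copy so the model call never sees later mutations of `history`.
    answer = llm.invoke(list(history), **invoke_kwargs)
    history.append(answer)
    return answer


# Usage with the objects defined in this section (not run here):
# history = []
# chat_turn(llm, history, message1, max_tokens=30, parse_response=True)
# chat_turn(llm, history, message2, max_tokens=60, parse_response=True)
```

Using `parse_response=True` on every turn keeps the stored answers free of the echoed prompt, so the history does not grow with duplicated text.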


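## Run Gemma from a Hugging Face download


### Setup

Like Kaggle, Hugging Face requires that you accept the Gemma terms and conditions before accessing the model. To get access to Gemma through Hugging Face, go to the [Gemma model card](https://huggingface.co/google/gemma-2b).

You'll also need a [user access token](https://huggingface.co/docs/hub/en/security-tokens) with read permissions, which you can enter below.

**Note:** This section of the tutorial was run on an A100 GPU in Google Colab.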


In [None]:
# @title Basic parameters
hf_access_token: str = ""  # @param {type:"string"}
model_name: str = "google/gemma-2b" # @param {type:"string"}
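
If you would rather not paste the token into a notebook cell, one alternative is to fall back to the `HF_TOKEN` environment variable, which the Hugging Face client libraries also recognize. `resolve_hf_token` below is a hypothetical helper sketched for this purpose.

```python
import os


def resolve_hf_token(explicit: str = "") -> str:
    """Return the explicit token if given, otherwise the HF_TOKEN variable.

    Hypothetical helper; raises ValueError when no token can be found.
    """
    token = explicit.strip() or os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise ValueError("No Hugging Face token found; pass one or set HF_TOKEN.")
    return token


# Usage (not run here, since it raises when no token is configured):
# hf_access_token = resolve_hf_token(hf_access_token)
```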

### Run the model


In [None]:
from langchain_google_vertexai import GemmaLocalHF, GemmaChatLocalHF

In [None]:
llm = GemmaLocalHF(model_name=model_name, hf_access_token=hf_access_token)

tokenizer_config.json:   0%|          | 0.00/1.11k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/555 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/627 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/13.5k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/67.1M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]
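
The shard sizes in the download log give a rough lower bound on the memory the weights need, which is useful when judging whether a given GPU is large enough. The sketch below adds up the two shard sizes from the log, assuming the progress bars report decimal units. The 20% headroom and the GPU sizes are illustrative assumptions, and activations and caches need memory on top of the weights.

```python
# Shard sizes copied from the download log, assuming decimal units (1 GB = 1e9 bytes).
SHARD_SIZES_BYTES = {
    "model-00001-of-00002.safetensors": 4.95e9,
    "model-00002-of-00002.safetensors": 67.1e6,
}


def checkpoint_gigabytes(shards) -> float:
    """Total checkpoint size in gigabytes."""
    return sum(shards.values()) / 1e9


def fits_in_memory(shards, gpu_memory_gb: float, headroom: float = 0.2) -> bool:
    """True if the weights leave at least `headroom` of GPU memory free.

    The headroom fraction is an arbitrary assumption for illustration.
    """
    return checkpoint_gigabytes(shards) <= gpu_memory_gb * (1 - headroom)


total_gb = checkpoint_gigabytes(SHARD_SIZES_BYTES)
print(f"Checkpoint size: {total_gb:.2f} GB")
print("Fits on a hypothetical 40 GB GPU:", fits_in_memory(SHARD_SIZES_BYTES, 40))
```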

In [None]:
output = llm.invoke("What is the meaning of life?", max_tokens=50)
print(output)

What is the meaning of life?

The question is one of the most important questions in the world.

It’s the question that has been asked by philosophers, theologians, and scientists for centuries.

And it’s the question that


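### Run the chat model


As in the examples above, you can use multi-turn chat with a local deployment of Gemma. You might need to restart the notebook and clear your GPU memory to avoid out-of-memory (OOM) errors: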


In [None]:
llm = GemmaChatLocalHF(model_name=model_name, hf_access_token=hf_access_token)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
from langchain_core.messages import (
    HumanMessage
)

message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=60)
print(answer1)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean"


In [None]:
message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=140)

print(answer2)

content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model\nI can help you with anything.\n<"


As in the previous examples, you can post-process the response:


In [None]:
answer1 = llm.invoke([message1], max_tokens=60, parse_response=True)
print(answer1)

answer2 = llm.invoke([message1, answer1, message2], max_tokens=120, parse_response=True)
print(answer2)

content="I'm a model.\n<end_of_turn>\n"
content='I can help you with anything.\n<end_of_turn>\n<end_of_turn>\n'
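
The parsed output above still ends with `<end_of_turn>` markers. If you want plain text for display, a small extra step can cut the content at the first marker. `strip_turn_markers` is a hypothetical helper for this example, applied to the `content` string of a response.

```python
END_TURN = "<end_of_turn>"


def strip_turn_markers(text: str) -> str:
    """Cut the text at the first end-of-turn marker and trim whitespace."""
    return text.split(END_TURN, 1)[0].strip()


# Content strings copied from the parsed output above.
print(strip_turn_markers("I'm a model.\n<end_of_turn>\n"))
print(strip_turn_markers("I can help you with anything.\n<end_of_turn>\n<end_of_turn>\n"))
```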


## What's next

* Learn how to [fine-tune a Gemma model](https://ai.google.dev/gemma/docs/lora_tuning).
* Learn how to perform [distributed fine-tuning and inference](https://ai.google.dev/gemma/docs/distributed_tuning) on a Gemma model.
* Learn how to [use Gemma models with Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/open-models/use-gemma).
