In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Question Answering with Generative Models on Vertex AI

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/doggy8088/generative-ai/blob/main/language/prompts/examples/question_answering.zh.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/doggy8088/generative-ai/blob/main/language/prompts/examples/question_answering.zh.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/doggy8088/generative-ai/blob/main/language/prompts/examples/question_answering.zh.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
</table>


| | |
|-|-|
|Author | [Polong Lin](https://github.com/polong-lin) |


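## Overview

Large language models can be used for a variety of natural language processing tasks, including question answering (Q&A). These models are trained on large amounts of text data and can generate high-quality responses to a wide range of questions. Note that most models have a knowledge cutoff date, so asking about anything too recent may yield an incomplete, imagined, or incorrect answer (that is, a hallucination).

This notebook covers the essentials of writing prompts for answering questions with a generative model. It also contrasts "open domain" questions (knowledge publicly available on the internet) with "closed domain" questions (more private knowledge, typically enterprise or personal).

Learn more about prompt design in the [official documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/text/text-overview#prompt_structure).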


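### Objectives

By the end of the notebook, you should be able to write prompts for the following:

* **Open domain** questions:
    * Zero-shot prompting
    * Few-shot prompting


* **Closed domain** questions:
    * Providing custom knowledge as context
    * Instruction-tuning the output
    * Few-shot prompting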


## Getting Started


### Install the Vertex AI SDK


In [None]:
!pip install google-cloud-aiplatform --upgrade --user

**Colab only:** Uncomment the following cell to restart the kernel, or use the button to restart the kernel. For Vertex AI Workbench, you can restart the kernel using the button at the top.


In [None]:
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

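### Authenticate your notebook environment
* If you are using **Colab** to run this notebook, uncomment the cell below and continue.
* If you are using **Vertex AI Workbench**, see the setup instructions [here](https://github.com/doggy8088/generative-ai/tree/main/setup-env).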


In [None]:
# from google.colab import auth
# auth.authenticate_user()

### Import libraries


**Colab only:** Uncomment the following cell to initialize the Vertex AI SDK. For Vertex AI Workbench, you do not need to run this.


In [None]:
# import vertexai

# PROJECT_ID = "[your-project-id]"  # @param {type:"string"}
# vertexai.init(project=PROJECT_ID, location="us-central1")

In [None]:
import pandas as pd
from vertexai.language_models import TextGenerationModel

### Load the model


In [None]:
generation_model = TextGenerationModel.from_pretrained("text-bison@001")

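## Question Answering


Question answering requires a prompt or question from which the model can generate a response. The prompt can be a few words or several complete sentences, depending on the complexity of the question.

When creating a question-answering prompt, it is important to be specific and to provide as much context as possible. This helps the model understand the intent behind the question and generate a relevant response. For example, if you want to ask:

```
"What is the capital of France?"
```

then a good prompt could be:

```
"Please tell me which city is the capital of France."
```

In addition to being specific, the prompt should be grammatically correct and free of spelling errors. This helps the model generate responses that are easier to understand and contain fewer errors or inaccuracies.

By providing specific, context-rich prompts, you help the model understand the intent behind the question and generate accurate, relevant responses.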


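Below are some differences between the **open domain** and **closed domain** categories of question-answering prompts.

* **Open domain**: Questions whose answers are already available online. They can belong to any category, such as history, geography, countries, politics, or chemistry. These include trivia and general knowledge questions, for example:

```
Q: Who won the Olympic gold medal in swimming?
Q: Who is the President of [given country]?
Q: Who wrote [specific book]?
```

Keep the model's training cutoff in mind: questions that involve information more recent than the training data may produce incorrect or fanciful answers.

* **Closed domain**: Questions about an internal knowledge base that is not available on the public internet fall into the _closed domain_ category.
You can pass this "private" knowledge to the model as context. If prompted correctly, the model is more likely to answer from within the provided context and less likely to answer from general internet knowledge outside it.

Consider the example of building a Q&A bot over your internal product documentation. In this case, you can pass the complete documentation to the model and prompt it to answer questions based only on that documentation.

A typical prompt for the **closed domain**:

```
Prompt: f""" Answer from the context below: \n\n
     Context: {{your knowledge base}} \n
     Question: {{a question specific to that knowledge base}} \n
     Answer: {{to be predicted by the model}} \n
"""
```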
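The template above can be wrapped in a small function so that the same structure is reused for every question. This is a minimal sketch: `build_closed_domain_prompt` is a hypothetical helper (not part of the Vertex AI SDK), and the exact instruction wording is an assumption you would tune for your own documents.

```python
def build_closed_domain_prompt(context: str, question: str) -> str:
    """Assemble a closed-domain prompt (hypothetical helper, not an SDK function)."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context: {context.strip()}\n\n"
        f"Question: {question.strip()}\n\n"
        "Answer:"
    )


# Illustrative inputs only; substitute your own knowledge base and question.
example_prompt = build_closed_domain_prompt(
    context="Cloud Storage is designed for 99.999999999% annual durability.",
    question="What durability is Cloud Storage designed for?",
)
print(example_prompt)
```

The returned string could then be passed to the model in the same way as any other prompt in this notebook.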

Below are some examples that illustrate these different types of prompts.


### Open Domain


#### Zero-shot prompting


In [None]:
prompt = """Q: Who was President of the United States in 1955? Which party did he belong to?\n
            A:
         """
print(
    generation_model.predict(
        prompt,
        max_output_tokens=256,
        temperature=0.1,
    ).text
)

In [None]:
prompt = """Q: What is the tallest mountain in the world?\n
            A:
         """
print(
    generation_model.predict(
        prompt,
        max_output_tokens=20,
        temperature=0.1,
    ).text
)

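#### Few-shot prompting


Suppose you want a short answer from the model (for example, only a specific name). To get this, you can use a few-shot prompt that gives the model a handful of examples illustrating the expected behavior.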


In [None]:
prompt = """Q: Who is the current President of France?\n
            A: Emmanuel Macron \n\n

            Q: Who invented the telephone? \n
            A: Alexander Graham Bell \n\n

            Q: Who wrote the novel "1984"?
            A: George Orwell

            Q: Who discovered penicillin?
            A:
         """
print(
    generation_model.predict(
        prompt,
        max_output_tokens=20,
        temperature=0.1,
    ).text
)

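#### Zero-shot prompting vs few-shot prompting


Zero-shot prompting is useful for quickly generating text for a new task, but the quality of the output may be lower than that of a few-shot prompt with well-chosen examples. Few-shot prompting is usually better suited to tasks that require a high degree of specificity or domain-specific knowledge, but it takes some extra thought, and possibly data, to set up the prompt.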
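The difference between the two styles is only in how the prompt string is assembled. The sketch below builds both from the same function; `build_qa_prompt` is a hypothetical helper introduced for illustration, and the `Q:`/`A:` layout simply mirrors the prompts used earlier in this notebook.

```python
from typing import Sequence, Tuple


def build_qa_prompt(question: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a Q/A prompt: zero-shot when `examples` is empty, few-shot otherwise."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


zero_shot = build_qa_prompt("Who discovered penicillin?")
few_shot = build_qa_prompt(
    "Who discovered penicillin?",
    examples=[
        ("Who invented the telephone?", "Alexander Graham Bell"),
        ('Who wrote the novel "1984"?', "George Orwell"),
    ],
)
print(few_shot)
```

Keeping the examples in a list makes it easy to swap them out when the answers need a different format or length.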


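### Closed Domain


#### Adding internal knowledge as context in the prompt


Imagine a scenario in which you want to build a question-answering bot that ingests internal documentation and lets users ask questions about it.

In the example below, the Google Cloud Storage and content policy documentation is added to the prompt so that the PaLM API can use it to answer subsequent questions within the provided context.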


In [None]:
context = """
Storage and content policy \n
How durable is my data in Cloud Storage? \n
Cloud Storage is designed for 99.999999999% (11 9's) annual durability, which is appropriate for even primary storage and
business-critical applications. This high durability level is achieved through erasure coding that stores data pieces redundantly
across multiple devices located in multiple availability zones.
Objects written to Cloud Storage must be redundantly stored in at least two different availability zones before the
write is acknowledged as successful. Checksums are stored and regularly revalidated to proactively verify that the data
integrity of all data at rest as well as to detect corruption of data in transit. If required, corrections are automatically
made using redundant data. Customers can optionally enable object versioning to add protection against accidental deletion.
"""

question = "How is high availability achieved?"

prompt = f"""Answer the question given in the context below:
Context: {context}\n
Question: {question} \n
Answer:
"""

print("[Prompt]")
print(prompt)

print("[Response]")
print(
    generation_model.predict(
        prompt,
    ).text
)

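#### Instruction-tuning the output


Another way to help a language model is to include additional instructions in the prompt that frame the output. To make sure the model does not answer with anything outside the context, the prompt can specify that, in that case, the response should be "Information not available in provided context."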


In [None]:
question = "What machines are required for hosting Vertex AI models?"
prompt = f"""Answer the question given the context below as {{Context:}}. \n
If the answer is not available in the {{Context:}} and you are not confident about the output,
please say "Information not available in provided context". \n\n
Context: {context}\n
Question: {question} \n
Answer:
"""

print("[Prompt]")
print(prompt)

print("[Response]")
print(
    generation_model.predict(
        prompt,
        max_output_tokens=256,
        temperature=0.3,
    ).text
)
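Once the prompt tells the model to reply with a fixed sentence when the context lacks the answer, downstream code can check for that sentence. The following is a minimal sketch, assuming the fallback wording used in the prompt above. `is_fallback` is a hypothetical helper, not part of the Vertex AI SDK, and a model may phrase its refusal differently, in which case this exact comparison will miss it.

```python
# Assumed to match the sentence requested in the prompt.
FALLBACK = "Information not available in provided context"


def is_fallback(response: str) -> bool:
    """Return True if the response is the instructed fallback sentence."""
    # Ignore surrounding whitespace, quotes, a trailing period, and letter case.
    normalized = response.strip().strip(".\"'").lower()
    return normalized == FALLBACK.lower()


print(is_fallback("Information not available in provided context."))
print(is_fallback("Erasure coding across multiple availability zones."))
```

A bot could use such a check to route unanswered questions to a human or to a search over additional documents.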

#### Few-shot prompting


In [None]:
prompt = """
Context:
The term "artificial intelligence" was first coined by John McCarthy in 1956. Since then, AI has developed into a vast
field with numerous applications, ranging from self-driving cars to virtual assistants like Siri and Alexa.

Question:
What is artificial intelligence?

Answer:
Artificial intelligence refers to the simulation of human intelligence in machines that are programmed to think and learn like humans.

---

Context:
The Wright brothers, Orville and Wilbur, were two American aviation pioneers who are credited with inventing and
building the world's first successful airplane and making the first controlled, powered and sustained heavier-than-air human flight,
 on December 17, 1903.

Question:
Who were the Wright brothers?

Answer:
The Wright brothers were American aviation pioneers who invented and built the world's first successful airplane
and made the first controlled, powered and sustained heavier-than-air human flight, on December 17, 1903.

---

Context:
The Mona Lisa is a 16th-century portrait painted by Leonardo da Vinci during the Italian Renaissance. It is one of
the most famous paintings in the world, known for the enigmatic smile of the woman depicted in the painting.

Question:
Who painted the Mona Lisa?

Answer:

"""
print(
    generation_model.predict(
        prompt,
    ).text
)

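### Extractive Question Answering

In the example below, the generative model is guided to understand the intent of the question and the passage, and to locate the relevant information in the passage that answers the question. The model receives a question and a passage of text and is asked to find the answer within the passage. The answer is usually a phrase or a sentence.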


In [None]:
prompt = """
Background: There is evidence that there have been significant changes in Amazon rainforest vegetation over the last 21,000 years through the Last Glacial Maximum (LGM) and subsequent deglaciation.
Analyses of sediment deposits from Amazon basin paleo lakes and from the Amazon Fan indicate that rainfall in the basin during the LGM was lower than for the present, and this was almost certainly
associated with reduced moist tropical vegetation cover in the basin. There is debate, however, over how extensive this reduction was. Some scientists argue that the rainforest was reduced to small,
isolated refugia separated by open forest and grassland; other scientists argue that the rainforest remained largely intact but extended less far to the north, south, and east than is seen today.
This debate has proved difficult to resolve because the practical limitations of working in the rainforest mean that data sampling is biased away from the center of the Amazon basin, and both
explanations are reasonably well supported by the available data.

Q: What does LGM stand for?
A: Last Glacial Maximum.

Q: What did the analysis from the sediment deposits indicate?
A: Rainfall in the basin during the LGM was lower than for the present.

Q: What are some of the scientists' arguments?
A: The rainforest was reduced to small, isolated refugia separated by open forest and grassland.

Q: There have been major changes in Amazon rainforest vegetation over the last how many years?
A: 21,000.

Q: What caused changes in the Amazon rainforest vegetation?
A: The Last Glacial Maximum (LGM) and subsequent deglaciation

Q: What has been analyzed to compare Amazon rainfall in the past and present?
A: Sediment deposits.

Q: What has the lower rainfall in the Amazon during the LGM been attributed to?
A:
"""

print(
    generation_model.predict(
        prompt,
    ).text
)
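Because an extractive answer is expected to be a span of the passage, one simple sanity check is to confirm that the answer appears verbatim in the passage. The sketch below is an illustration under that assumption; `is_extractive` is a hypothetical helper, and a model that paraphrases a correct answer would fail this check.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_extractive(answer: str, passage: str) -> bool:
    """Return True if the answer (minus trailing periods) occurs verbatim in the passage."""
    needle = _normalize(answer).rstrip(".")
    return bool(needle) and needle in _normalize(passage)


# A shortened excerpt of the background text, for illustration.
passage = (
    "Analyses of sediment deposits indicate that rainfall in the basin during\n"
    "the LGM was lower than for the present."
)
print(is_extractive("Rainfall in the basin during the LGM was lower than for the present.", passage))
print(is_extractive("Higher temperatures.", passage))
```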

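### Evaluation


You can evaluate the output of a question-answering task if the ground truth answer to each question is available. With zero-shot prompting, you can only use "open domain" questions. For "closed domain" questions, however, you can add context and evaluate in a similar way. To show how this works, first create a simple dataframe of questions and ground truth answers.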


In [None]:
# Questions paired with their expected (ground truth) answers.
qa_data = {
    "question": [
        "In a website browser address bar, what does “www” stand for?",
        "Who was the first woman to win a Nobel Prize?",
        "What is the name of the Earth’s largest ocean?",
    ],
    "answer_groundtruth": ["World Wide Web", "Marie Curie", "The Pacific Ocean"],
}
qa_data_df = pd.DataFrame(qa_data)
qa_data_df

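Now that you have data with questions and ground truth answers, you can use the `apply` function to call the PaLM 2 generation model on each row. Each row uses a dynamic prompt to predict an answer with the PaLM API. The results are stored in the `answer_prediction` column.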


In [None]:
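def get_answer(question):
    """Ask the model a single question and return the text of its answer."""
    # `Series.apply` passes each question string, not a whole DataFrame row.
    prompt = f"""Answer the following question as precisely as possible.\n\n
            question: {question}
            answer:
              """
    return generation_model.predict(
        prompt=prompt,
    ).text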


qa_data_df["answer_prediction"] = qa_data_df["question"].apply(get_answer)
qa_data_df

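You may want to evaluate the answers predicted by the PaLM API. This is more complicated than evaluating text classification, however, because an answer can differ from the ground truth and may be phrased with slightly more or fewer words.

For example, look at the question "What is the name of the Earth's largest ocean?": the model may predict "Pacific Ocean" while the ground truth label is "The Pacific Ocean", with an extra "The". If you used a simple classification metric, you would count this as a wrong prediction because the original and predicted strings differ. Yet the answer is clearly correct, and only the extra "The" causes the mismatch. This is a simple string comparison problem.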
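To make the mismatch concrete, the sketch below compares the two strings exactly and then again after a simple normalization step. The normalization (lowercasing, dropping punctuation and the articles "a", "an", "the") is an illustrative assumption and not something this notebook relies on; `normalize_answer` is a hypothetical helper.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


ground_truth = "The Pacific Ocean"
prediction = "Pacific Ocean"

# Exact comparison treats the correct answer as wrong.
print(ground_truth == prediction)  # False
# After normalization the two strings agree.
print(normalize_answer(ground_truth) == normalize_answer(prediction))  # True
```

Normalization only handles the differences it was written for, which is why a more tolerant similarity score is often preferred.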

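For string comparisons in which the ground truth and the prediction may differ by a few extra or missing letters, one approach is to use a fuzzy matching algorithm.

Fuzzy string matching uses [Levenshtein distance](https://zh.wikipedia.org/wiki/Levenshtein%E8%B7%9D%E7%A6%BB) to quantify the difference between two strings.

For example, the Levenshtein distance between "kitten" and "sitting" is 3, because the following 3 edits turn one word into the other, and the conversion cannot be done with fewer than 3 edits:

* kitten → sitten (substitute "s" for "k"),
* sitten → sittin (substitute "i" for "e"),
* sittin → sitting (insert "g" at the end).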
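The edit count in the example above can be reproduced with the standard dynamic-programming formulation of Levenshtein distance. This is a plain-Python sketch for illustration; the `levenshtein` function is written here and is not the implementation used by any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits that turn `a` into `b`."""
    # previous[j] holds the distance between the processed prefix of `a` and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```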


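Here is another example, this time using the `fuzzywuzzy` library, which reports a Levenshtein-based similarity between two strings as a ratio rather than as an edit count. The raw ratio score expresses the similarity of the strings as an integer in the range [0, 100]. For two strings X and Y, the score is defined as int(round((2.0 * M / T) * 100)), where T is the total number of characters in both strings and M is the number of matching characters in the two strings.

Learn more about the [ratio formula](https://anhaidgroup.github.io/py_stringmatching/v0.3.x/Ratio.html).

The following example illustrates this:
```
String 1: "this is a test"
String 2: "this is a test!"

Fuzz ratio => 97  # M = 14 and T = 29, so round(2 * 14 / 29 * 100) = 97

Fuzz partial ratio => 100  # String 1 matches a substring of String 2 exactly, so the single added character is ignored.
```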
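The ratio formula can be tried out with only the standard library. The sketch below uses `difflib.SequenceMatcher` to count matching characters and applies int(round((2.0 * M / T) * 100)) to the strings "this is a test" and "this is a test!". Both functions are simplified stand-ins written for illustration: `simple_partial_ratio` slides a window over the longer string, which is an assumption about how a partial score can be computed and not the exact procedure `fuzzywuzzy` follows, and empty strings are not treated specially.

```python
from difflib import SequenceMatcher


def simple_ratio(x: str, y: str) -> int:
    """Score similarity as int(round(2.0 * M / T * 100))."""
    matcher = SequenceMatcher(None, x, y, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())  # M
    total = len(x) + len(y)  # T
    return int(round(2.0 * matched / total * 100)) if total else 100


def simple_partial_ratio(x: str, y: str) -> int:
    """Best simple_ratio of the shorter string against same-length windows of the longer."""
    shorter, longer = sorted((x, y), key=len)
    width = len(shorter)
    windows = (longer[i : i + width] for i in range(len(longer) - width + 1))
    return max(simple_ratio(shorter, window) for window in windows)


print(simple_ratio("this is a test", "this is a test!"))          # 97
print(simple_partial_ratio("this is a test", "this is a test!"))  # 100
```

The partial score reaches 100 because the shorter string matches the first 14 characters of the longer one exactly.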


First, install the `fuzzywuzzy` and `python-Levenshtein` packages:


In [None]:
!pip install -q python-Levenshtein --upgrade --user
!pip install -q fuzzywuzzy --upgrade --user

Then compute a score to perform the fuzzy match:


In [None]:
from fuzzywuzzy import fuzz


def get_fuzzy_match(row):
    """Return the partial-ratio score (0-100) between ground truth and prediction."""
    return fuzz.partial_ratio(row["answer_groundtruth"], row["answer_prediction"])


qa_data_df["match_score"] = qa_data_df.apply(get_fuzzy_match, axis=1)
qa_data_df

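Now that you have the individual (partial) match scores, you can compute the mean of the whole column to get an overall picture of the data.
A score close to 100 means PaLM 2 predicts answers close to the ground truth; a score near 50 or 0 means it did not perform well.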


In [None]:
print(
    "The average match score of all predicted answers from PaLM 2 is:",
    qa_data_df["match_score"].mean(),
    "%",
)

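In this case, the average score comes out to 100% even though some predictions are missing a few words. This means the predictions are very close to the ground truth; some answers simply leave out the more verbose wording of the ground truth.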
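An average can hide individual misses, so it can help to also report how many answers clear a chosen score. The sketch below uses made-up scores and an assumed threshold of 90 purely for illustration; neither comes from this notebook's results.

```python
from statistics import mean

# Hypothetical per-question match scores on the 0-100 scale, for illustration only.
match_scores = [100, 100, 62]
threshold = 90  # assumed cut-off for counting a prediction as correct

average = mean(match_scores)
share_above = sum(score >= threshold for score in match_scores) / len(match_scores)

print(f"average match score: {average:.1f}")
print(f"share of answers scoring >= {threshold}: {share_above:.0%}")
```

With real data, the same two numbers could be computed from the `match_score` column of the dataframe.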
