This notebook demonstrates the OpenAI Assistants API.

https://platform.openai.com/docs/assistants/overview

In [None]:
# from google.colab import userdata
# openai_api_key = userdata.get('openai_api_key')

In [11]:
# Import necessary libraries
## Set the OpenAI API key variable
from dotenv import load_dotenv
import os

# Load the environment variables from .env file
load_dotenv()

# Access the API key
openai_api_key = os.getenv('OPENAI_API_KEY')
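
`os.getenv` returns `None` when the variable is missing, so a misconfigured `.env` file only shows up later as an authentication error. The following is a minimal sketch of failing early instead; `require_env` is a hypothetical helper introduced here for illustration, not part of `python-dotenv` or the OpenAI SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file.")
    return value
```

In this notebook it could be used as `openai_api_key = require_env("OPENAI_API_KEY")` right after `load_dotenv()`.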


In [2]:
import requests
import json
from pprint import pp

## 0. Upload the reference files for RAG (this can also be done in the Playground dashboard)

In [69]:
# Grab a sample file: https://report.nat.gov.tw/ReportFront/ReportDetail/detail?sysId=C11201557

!wget -O 'C11201717_1.pdf' https://dlcenter.gotop.com.tw/PDFSample/A792.pdf


--2025-07-25 19:50:24--  https://dlcenter.gotop.com.tw/PDFSample/A792.pdf
Resolving dlcenter.gotop.com.tw (dlcenter.gotop.com.tw)... 125.227.59.43
Connecting to dlcenter.gotop.com.tw (dlcenter.gotop.com.tw)|125.227.59.43|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2258954 (2.2M) [application/pdf]
Saving to: ‘C11201717_1.pdf’


2025-07-25 19:50:24 (8.30 MB/s) - ‘C11201717_1.pdf’ saved [2258954/2258954]



## 1. Create a vector store and upload files

In [None]:
from openai import OpenAI

# Initialize the OpenAI client
client = OpenAI(api_key=openai_api_key)

# Step 1: Create a vector store and upload files
vector_store = client.vector_stores.create(name="My Knowledge Store")

file_paths = ["1130219.pdf", "1130513.pdf"]  # Put your file paths here
file_streams = [open(path, "rb") for path in file_paths]

# Upload the files to the vector store and wait for processing to finish
file_batch = client.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=file_streams
)
print("File batch status:", file_batch.status)
print("File count:", file_batch.file_counts)
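
The list comprehension above leaves every file handle open after the upload finishes. One way to close them reliably is `contextlib.ExitStack`. This is a sketch only; `with_open_files` and its `action` parameter are hypothetical names introduced for illustration.

```python
from contextlib import ExitStack
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def with_open_files(paths: Sequence[str], action: Callable[[list], T]) -> T:
    """Open every path in binary mode, pass the handles to `action`, then close them all."""
    with ExitStack() as stack:
        # Each handle is registered with the stack, so all of them are closed
        # on exit even if `action` raises.
        streams = [stack.enter_context(open(path, "rb")) for path in paths]
        return action(streams)
```

Here `action` could be a lambda that calls `client.vector_stores.file_batches.upload_and_poll(vector_store_id=vector_store.id, files=streams)` with the handles it receives.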

## Step 2: Create an Assistant and attach the vector store

In [None]:


# Step 2: Create an Assistant and attach the vector store
assistant = client.beta.assistants.create(
    name="My FileSearch Assistant",
    instructions="You are a customer support chatbot. Use your knowledge base to best respond to customer queries.",
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)
print("Assistant created. ID:", assistant.id)

## Step 3: Create a conversation thread and send a question

In [None]:

# Step 3: Create a conversation thread and send a question
thread = client.beta.threads.create(
    messages=[{
        "role": "user",
        "content": "Which files are in the database?"
    }]
)

In [None]:



# Step 4: Run the Assistant to get an answer
# --- First question ---
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Retrieve and display the reply
messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

message_content = messages[0].content[0].text
annotations = message_content.annotations
citations = []
for index, annotation in enumerate(annotations):
    message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
    if file_citation := getattr(annotation, "file_citation", None):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f"[{index}] {cited_file.filename}")

print(message_content.value)
print("\n".join(citations))



File batch status: completed
File count: FileCounts(cancelled=0, completed=2, failed=0, in_progress=0, total=2)
Assistant created. ID: asst_N3PMjKiPC3XarXmUMDg7QULn
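Two files have been uploaded:

1. 1130513.pdf
2. 1130219.pdf

These files cover fund investment risks, economic outlook forecasts, information on non-investment-grade bonds, and related topics[0] [1].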
[0] 1130513.pdf
[1] 1130219.pdf
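
The citation loop in the cell above mixes string formatting with API calls, and the same loop is repeated for the second question. A minimal sketch of the formatting step as a pure function follows. The dict shape for `annotations` and the `filename_for` callback are assumptions made for this sketch, not the SDK's own types.

```python
def render_citations(text, annotations, filename_for):
    """Replace each annotation marker in `text` with [n] and build a matching source list.

    `annotations` is a list of dicts shaped like {"text": marker, "file_id": id};
    `filename_for` maps a file id to a display name. Both are assumptions of
    this sketch.
    """
    sources = []
    for index, annotation in enumerate(annotations):
        # Swap the raw marker for a short numeric reference.
        text = text.replace(annotation["text"], f"[{index}]")
        file_id = annotation.get("file_id")
        if file_id is not None:
            sources.append(f"[{index}] {filename_for(file_id)}")
    return text, sources
```

With the SDK objects used in this notebook, `filename_for` could wrap `client.files.retrieve(file_id).filename`.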


In [111]:
# --- Second question (continuing the conversation) ---
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Please summarize the first file in the knowledge base"
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Retrieve and display the reply
messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

message_content = messages[0].content[0].text
annotations = message_content.annotations
citations = []
for index, annotation in enumerate(annotations):
    message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
    if file_citation := getattr(annotation, "file_citation", None):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f"[{index}] {cited_file.filename}")

print(message_content.value)
print("\n".join(citations))

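A summary of the first file, "1130513.pdf", follows:

1. **Risk disclosure**: The information in the document is for internal and client reference only. It is not a recommendation to buy or sell any financial product and should not be treated as investment advice. For fund-related risks, refer to the prospectus; a fund's past performance does not guarantee future returns[0].

2. **Investment strategy**: The report covers multi-asset allocation strategies spanning equities, bonds, and REITs, and stresses that such an allocation can provide stability and potential returns during market volatility[1].

3. **Global market analysis**: It analyzes economic conditions in the major global markets, including trends and forecasts for the United States, Japan, and the eurozone, such as PMI readings and expected interest-rate changes[2][3].

4. **Healthcare market outlook**: It examines trends in the healthcare market, noting that innovative technology and an aging population will drive demand, and that future innovative healthcare will be faster and more accurate[4].

5. **US economic watch**: It tracks slowing US employment and consumption, rate-cut expectations lifting US equities, and remarks by several Federal Reserve officials stressing that persistent inflation continues to shape rate policy[5][6].

Overall, the report highlights financial market dynamics and forecasts, and points to investment opportunities and risks across sectors, such as the advantages of innovative healthcare and multi-asset allocation[0][1].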
[0] 1130513.pdf
[1] 1130513.pdf
[2] 1130513.pdf
[3] 1130513.pdf
[4] 1130513.pdf
[5] 1130513.pdf
[6] 1130513.pdf
[7] 1130513.pdf
[8] 1130513.pdf
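
The source list above repeats the same file name once per annotation. If a more compact list is preferred, the entries can be grouped by file name. This is a sketch that assumes each entry has the `"[n] filename"` shape printed by the cell above; `group_citations` is a hypothetical helper.

```python
def group_citations(citations):
    """Group "[n] filename" lines by filename, keeping first-seen order."""
    grouped = {}
    for line in citations:
        # Split at the first space: "[0]" on the left, the file name on the right.
        label, _, filename = line.partition(" ")
        grouped.setdefault(filename, []).append(label)
    return [f"{''.join(labels)} {filename}" for filename, labels in grouped.items()]
```

For example, `print("\n".join(group_citations(citations)))` would print one line per file instead of one line per annotation.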


# 1. Upload the file (REST API with requests)

In [112]:
import requests

api_key = openai_api_key
file_path = "1130219.pdf"

headers = {
    "Authorization": f"Bearer {api_key}",
    "OpenAI-Beta": "assistants=v2"
}
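data = {
    "purpose": "assistants"
}
# Open the file in a context manager so the handle is closed after the upload
with open(file_path, "rb") as fh:
    response = requests.post(
        "https://api.openai.com/v1/files",
        headers=headers,
        files={"file": fh},
        data=data
    )
response.raise_for_status()  # surface HTTP errors before reading the body
file_id = response.json()["id"]
print("file_id:", file_id)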


file_id: file-PptTDEb1GHuqcvp8PjaBcf
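
Each REST cell in this section reads `response.json()["id"]` directly, which raises a bare `KeyError` when the API returns an error body instead. A small sketch of a more informative check follows, assuming the usual `{"error": {"message": ...}}` error shape; `extract_id` is a hypothetical helper.

```python
def extract_id(body: dict) -> str:
    """Return body["id"], or raise with the API's error message if the call failed."""
    error = body.get("error")
    if error:
        # Error bodies carry a human-readable message; fall back if it is absent.
        raise RuntimeError(f"OpenAI API error: {error.get('message', 'unknown error')}")
    return body["id"]
```

It could replace the direct lookups, for example `file_id = extract_id(response.json())`.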


# 2. Create a vector store and add the file

In [113]:
import json

payload = {
    "name": "My Knowledge Store",
    "file_ids": [file_id]
}
response = requests.post(
    "https://api.openai.com/v1/vector_stores",
    headers={**headers, "Content-Type": "application/json"},
    data=json.dumps(payload)
)
vector_store_id = response.json()["id"]
print("vector_store_id:", vector_store_id)


vector_store_id: vs_688378248ea08191b6ffab51148a07f7
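
Unlike `upload_and_poll` in the SDK version, this REST call returns before the file has necessarily been indexed. The sketch below checks readiness from a vector store JSON object, assuming it carries `status` and `file_counts` fields as the API reference describes; `vector_store_ready` is a hypothetical helper.

```python
def vector_store_ready(store: dict) -> bool:
    """True when the store reports completion and no files are still being processed."""
    counts = store.get("file_counts", {})
    return store.get("status") == "completed" and counts.get("in_progress", 0) == 0
```

One option is to re-fetch `https://api.openai.com/v1/vector_stores/{vector_store_id}` with a short sleep between attempts until this returns `True`, before creating the first run.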


# 3. Create an Assistant and attach the vector store

In [114]:
payload = {
    "name": "My FileSearch Assistant",
    "instructions": "You are a customer support chatbot. Use your knowledge base to best respond to customer queries.",
    "model": "gpt-4o",
    "tools": [{"type": "file_search"}],
    "tool_resources": {
        "file_search": {
            "vector_store_ids": [vector_store_id]
        }
    }
}
response = requests.post(
    "https://api.openai.com/v1/assistants",
    headers={**headers, "Content-Type": "application/json"},
    data=json.dumps(payload)
)
assistant_id = response.json()["id"]
print("assistant_id:", assistant_id)


assistant_id: asst_8ip3gSIBrKsFAVyeW9YMsuW4


# 4. Create a new thread

In [115]:
payload = {
    "messages": [
        {
            "role": "user",
            "content": "Which files are in your knowledge base?"
        }
    ]
}
response = requests.post(
    "https://api.openai.com/v1/threads",
    headers={**headers, "Content-Type": "application/json"},
    data=json.dumps(payload)
)
thread_id = response.json()["id"]
print("thread_id:", thread_id)


thread_id: thread_pUpbqy0KucpGW7IUDSBMSd7q


# 5. Create a run (triggers the assistant's answer)

In [116]:
payload = {
    "assistant_id": assistant_id
}
response = requests.post(
    f"https://api.openai.com/v1/threads/{thread_id}/runs",
    headers={**headers, "Content-Type": "application/json"},
    data=json.dumps(payload)
)
run_id = response.json()["id"]
print("run_id:", run_id)


run_id: run_sg2NFDu1dGWMCVcSrkH3vn3u


# 6. Poll the run status until it finishes

In [117]:
import time

while True:
    response = requests.get(
        f"https://api.openai.com/v1/threads/{thread_id}/runs/{run_id}",
        headers=headers
    )
    run_status = response.json()["status"]
    print("run status:", run_status)
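    # Also stop on "incomplete" and "requires_action"; otherwise the loop never exits for those runs
    if run_status in ["completed", "failed", "cancelled", "expired", "incomplete", "requires_action"]: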
        break
    time.sleep(2)


run status: in_progress
run status: completed
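
The polling loop above has no upper bound on how long it waits. A minimal sketch of the same idea with a timeout follows. `wait_for_run` and its parameters are hypothetical, and `fetch_status` stands for any callable that returns the current run status string.

```python
import time

# Statuses after which polling should stop. "requires_action" is not final,
# but the run will not progress until tool outputs are submitted.
STOP_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete", "requires_action"}


def wait_for_run(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Call `fetch_status()` until it returns a stop status or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in STOP_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout} seconds")
        sleep(interval)
```

In this notebook, `fetch_status` could be a lambda that performs the `requests.get` call on the run URL and returns `response.json()["status"]`. The `sleep` and `clock` parameters exist only so the function can be exercised without real waiting.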


# 7. Retrieve the assistant's reply

In [118]:
response = requests.get(
    f"https://api.openai.com/v1/threads/{thread_id}/messages",
    headers=headers
)
messages = response.json()["data"]
for msg in messages:
    if msg["role"] == "assistant":
        print("Assistant:", msg["content"][0]["text"]["value"])


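Assistant: The knowledge base currently contains one file named "1130219.pdf". It mainly covers the content of a weekly investment research report, including a market review, focus topics, asset views, and related financial market data and analysis【4:4†1130219.pdf】. If you have other questions or need more detail, just let me know!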
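
The raw text returned over REST still contains inline markers such as `【4:4†1130219.pdf】`. The sketch below strips them with a regular expression and collects the cited names. The pattern is inferred only from the markers visible in this notebook's output, so treat it as an assumption; the `annotations` array in the message JSON is the more reliable source.

```python
import re

# Matches markers such as 【4:4†1130219.pdf】 as they appear in the output above.
CITATION_MARKER = re.compile(r"【[^】]*†([^】]*)】")


def split_citations(text):
    """Return (text without markers, list of cited names in first-seen order)."""
    names = []
    for name in CITATION_MARKER.findall(text):
        if name not in names:
            names.append(name)
    return CITATION_MARKER.sub("", text), names
```

For example, `clean, names = split_citations(msg["content"][0]["text"]["value"])` gives display text plus a de-duplicated source list.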


# 8. Multi-turn conversation (follow-up question)

In [121]:
payload = {
    "role": "user",
    "content": "Please summarize the first file in the knowledge base"
}
response = requests.post(
    f"https://api.openai.com/v1/threads/{thread_id}/messages",
    headers={**headers, "Content-Type": "application/json"},
    data=json.dumps(payload)
)
# Then repeat steps 5-7
# Create the run
payload = {
    "assistant_id": assistant_id
}
response = requests.post(
    f"https://api.openai.com/v1/threads/{thread_id}/runs",
    headers={**headers, "Content-Type": "application/json"},
    data=json.dumps(payload)
)
run_id = response.json()["id"]
print("run_id:", run_id)

# Poll
while True:
    response = requests.get(
        f"https://api.openai.com/v1/threads/{thread_id}/runs/{run_id}",
        headers=headers
    )
    run_status = response.json()["status"]
    print("run status:", run_status)
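    # Also stop on "incomplete" and "requires_action"; otherwise the loop never exits for those runs
    if run_status in ["completed", "failed", "cancelled", "expired", "incomplete", "requires_action"]: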
        break
    time.sleep(2)

# Retrieve the assistant's replies (the thread returns every message, newest first)
response = requests.get(
    f"https://api.openai.com/v1/threads/{thread_id}/messages",
    headers=headers
)
messages = response.json()["data"]
for msg in messages:
    if msg["role"] == "assistant":
        print("Assistant:", msg["content"][0]["text"]["value"])


run_id: run_rowUXkHNFKXOuueXNVIxGWem
run status: queued
run status: in_progress
run status: in_progress
run status: in_progress
run status: in_progress
run status: in_progress
run status: in_progress
run status: in_progress
run status: in_progress
run status: in_progress
run status: in_progress
run status: in_progress
run status: in_progress
run status: completed
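Assistant: This file is mainly a weekly investment research report from Mega Bank. It covers:

1. **Market review and focus**:
   - In January 2024, investment-grade bonds drew heavy inflows totaling about US$30 billion, 2.3 times the average for the same period over the past five years. This signals investor confidence, especially with the Federal Reserve possibly cutting rates【8:2†1130219.pdf】.
   - The default rate on US non-investment-grade bonds fell to its lowest level since July 2023 and is expected to decline further as corporate funding pressure eases【8:6†1130219.pdf】.

2. **Asset views**:
   - The rate cuts the Federal Reserve is expected to make could benefit markets, especially high-quality bonds【8:10†1130219.pdf】.
   - ESG (environmental, social, and governance) is no longer just a slogan; stocks of companies that put ESG into practice have a long-term trend advantage【8:4†1130219.pdf】.

3. **Global markets**:
   - In the United States, although US equities are volatile in the short term as Fed rate expectations adjust, fundamentals are strong and supported by the AI theme, so the long-term outlook is positive【8:15†1130219.pdf】.
   - In Taiwan, TSMC's performance has driven the stock market sharply higher, and with the boost from the global AI supply chain, Taiwan's semiconductor companies are also expected to have good market prospects【8:14†1130219.pdf】.

The weekly report highlights the challenges and opportunities facing various market factors and asset classes, offering reference points for investors' asset allocation and strategy. Note that this material is for reference only and does not constitute investment advice【8:0†1130219.pdf】.
Assistant: The knowledge base currently contains one file named 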
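
The loop in the cell above prints every assistant message in the thread, which is why the earlier answer shows up again after the new one. A minimal sketch of selecting only the newest reply follows, assuming each message dict carries `role`, `created_at`, and a `content` list of typed parts as in the REST response; `latest_assistant_text` is a hypothetical helper.

```python
def latest_assistant_text(messages):
    """Return the text of the newest assistant message, or None if there is none."""
    replies = [m for m in messages if m["role"] == "assistant"]
    if not replies:
        return None
    # Pick by timestamp so the result does not depend on the list order.
    newest = max(replies, key=lambda m: m["created_at"])
    parts = [p["text"]["value"] for p in newest["content"] if p.get("type") == "text"]
    return "\n".join(parts)
```

With the variables from this notebook, `print("Assistant:", latest_assistant_text(messages))` would show only the answer to the follow-up question.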