<a href="https://colab.research.google.com/github/NOOTNOOTPINGUUU/NOOTNOOTPINGUUU.github.io/blob/main/2025AIcourse/%E7%AC%AC%E5%85%AB%E9%80%B1%E7%94%9F%E6%88%90%E5%8A%9F%E8%AA%B2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 1. Install and import the required packages

In [None]:
!pip install -U langchain langchain-community sentence-transformers faiss-cpu gradio openai

Collecting langchain-community
  Downloading langchain_community-0.3.21-py3-none-any.whl.metadata (2.4 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-4.1.0-py3-none-any.whl.metadata (13 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Collecting gradio
  Downloading gradio-5.25.2-py3-none-any.whl.metadata (16 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.9.1-py3-none-any.whl.metadata (3.8 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting aiofiles<25.0,>=22.0 (from gradio)
  Downloading aiofiles-24.1.0-py3-none-any.whl.metadata (10 kB)
Collecting fastapi<1.0,>=0.115.2 (from gradio)
  Downloading fastapi-0.115.12-

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import gdown

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

In [None]:
from openai import OpenAI
import gradio as gr

In [None]:
!gdown --folder 'https://drive.google.com/drive/folders/1HC_XVFDa6WVb9en6fpx0z3NrSQHuOa_X?usp=share_link'

Retrieving folder contents
Processing file 170RcPHN1flEbcyBEWRd4IxL9V1vSpvek index.faiss
Processing file 1GBfH01BCUZ3z4Wj9Y6lgc9HauCwehkkT index.pkl
Retrieving folder contents completed
Building directory structure
Building directory structure completed
Downloading...
From: https://drive.google.com/uc?id=170RcPHN1flEbcyBEWRd4IxL9V1vSpvek
To: /content/faiss_db/index.faiss
100% 40.0k/40.0k [00:00<00:00, 70.5MB/s]
Downloading...
From: https://drive.google.com/uc?id=1GBfH01BCUZ3z4Wj9Y6lgc9HauCwehkkT
To: /content/faiss_db/index.pkl
100% 32.1k/32.1k [00:00<00:00, 76.7MB/s]
Download completed


### 2. Define a custom E5 embedding class

In [None]:
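class CustomE5Embedding(HuggingFaceEmbeddings):
    """HuggingFaceEmbeddings wrapper that adds the prefixes E5 models expect."""

    def embed_documents(self, texts):
        # Stored passages get the "passage: " prefix.
        texts = [f"passage: {t}" for t in texts]
        return super().embed_documents(texts)

    def embed_query(self, text):
        # Search queries get the "query: " prefix.
        return super().embed_query(f"query: {text}")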
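The effect of the two overrides can be checked without downloading a model. The sketch below is only an illustration: `RecordingEmbeddings` and `PrefixedE5` are hypothetical stand-ins (not part of LangChain) that record the strings they receive, so the prefixes become visible.

```python
class RecordingEmbeddings:
    """Hypothetical stand-in for HuggingFaceEmbeddings: it records the
    texts it receives and returns dummy vectors instead of running a model."""

    def __init__(self):
        self.seen = []

    def embed_documents(self, texts):
        self.seen.extend(texts)
        return [[0.0] for _ in texts]

    def embed_query(self, text):
        self.seen.append(text)
        return [0.0]


class PrefixedE5(RecordingEmbeddings):
    """Same prefixing logic as CustomE5Embedding, on the stand-in base."""

    def embed_documents(self, texts):
        return super().embed_documents([f"passage: {t}" for t in texts])

    def embed_query(self, text):
        return super().embed_query(f"query: {text}")


demo_embedder = PrefixedE5()
demo_embedder.embed_documents(["元宵節要吃湯圓"])
demo_embedder.embed_query("元宵節吃什麼？")
print(demo_embedder.seen)
```

The printed list shows the passage carrying `passage: ` and the question carrying `query: `. These must match the prefixes used when the index was built.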

### 3. Load `faiss_db`

In [None]:
embedding_model = CustomE5Embedding(model_name="intfloat/multilingual-e5-small")
# index.pkl is a pickle file, so only load an index from a source you trust.
db = FAISS.load_local("faiss_db", embedding_model, allow_dangerous_deserialization=True)
retriever = db.as_retriever()
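As a rough picture of what the retriever does with a query vector, here is a minimal NumPy sketch of nearest-neighbour search. It assumes the index ranks by L2 (Euclidean) distance and returns the four closest chunks; both are assumptions about this particular `faiss_db`. `top_k_l2` and the toy vectors are made up for illustration.

```python
import numpy as np


def top_k_l2(query_vec, doc_vecs, k=4):
    """Return (indices, distances) of the k rows of doc_vecs closest to query_vec."""
    query_vec = np.asarray(query_vec, dtype=float)
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    # Euclidean distance from the query to every stored vector.
    dists = np.linalg.norm(doc_vecs - query_vec, axis=1)
    order = np.argsort(dists)[:k]
    return order.tolist(), dists[order].tolist()


# Toy 2-D "embeddings"; real E5 vectors have many more dimensions.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
idx, dist = top_k_l2([1.0, 0.0], toy_docs, k=2)
print(idx, dist)
```

FAISS performs the same kind of ranking with optimized index structures, which matters once the collection grows beyond a toy example.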

### 4. Configure the LLM

In [None]:
import os
from google.colab import userdata

In [None]:
api_key = userdata.get('Groq')

In [None]:
# The OpenAI client reads its key from OPENAI_API_KEY, so the Groq key goes there.
os.environ["OPENAI_API_KEY"] = api_key

In [None]:
model = "llama3-70b-8192"
base_url = "https://api.groq.com/openai/v1"

In [None]:
client = OpenAI(
    base_url=base_url  # not needed when calling OpenAI itself
)
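If the key is missing, the failure only shows up later as an authentication error. One possible guard is sketched below. `client_kwargs` and `GROQ_BASE_URL` are hypothetical names introduced here, and the helper uses only the standard library so it can be tried on its own.

```python
import os

GROQ_BASE_URL = "https://api.groq.com/openai/v1"  # same endpoint as in the cell above


def client_kwargs(env=None, key_name="OPENAI_API_KEY", base_url=GROQ_BASE_URL):
    """Collect the arguments for the client, failing early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get(key_name)
    if not api_key:
        raise RuntimeError(f"{key_name} is not set; store the key before creating the client.")
    return {"api_key": api_key, "base_url": base_url}


# Example with a fake environment and a placeholder key.
print(client_kwargs({"OPENAI_API_KEY": "gsk-example"}))
```

In the notebook this could be used as `client = OpenAI(**client_kwargs())`, which passes the key explicitly instead of relying on the environment lookup.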

### 5. Design the `prompt`

In [None]:
# The system prompt stays in Traditional Chinese because the assistant must answer in it.
# English gloss: "You are a professional 'Taiwan festival assistant', familiar with Taiwan's
# Han festivals and Indigenous seasonal rituals. Answer only in Traditional Chinese, in a
# friendly and concise tone, combining customs, food, and activity suggestions. If a question
# is beyond your knowledge, say so honestly and offer related suggestions."
system_prompt = "你是一個專業的「台灣節日小助手」，熟悉台灣的漢人節慶與原住民族歲時祭儀。所有回答必須使用繁體中文，語氣親切、簡潔，結合節日的習俗、食物、活動建議，展現台灣文化魅力。若問題超出知識範圍，誠實表示並提供相關建議。請勿使用英文或其他語言回答。"

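# English gloss of the template: "Answer the question based on the material below:
# {retrieved_chunks} / The user's question is: {question} / Answer in Traditional Chinese,
# friendly and concise, with customs, food, or activity suggestions. If the material is
# insufficient, say so and suggest resources such as the Ministry of Culture website."
prompt_template = """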
根據下列資料回答問題：
{retrieved_chunks}

使用者的問題是：{question}

請以繁體中文回答，確保語氣親切、內容簡潔，並結合節日習俗、食物或活動建議。若資料不足以回答問題，請誠實表示並建議相關資源或行動，例如查詢文化部網站。請勿使用英文或其他語言。
"""
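To see what the model actually receives, the template can be filled with a couple of toy chunks. This sketch uses a shortened, hypothetical English template (`demo_template`) with the same two placeholders; the chunk texts are invented.

```python
# Hypothetical, shortened template with the same placeholders as prompt_template.
demo_template = """
Answer the question using the material below:
{retrieved_chunks}

The user's question is: {question}
"""

# Invented stand-ins for the page_content of retrieved documents.
chunks = ["Chunk A about 天公生.", "Chunk B about 元宵節."]

filled = demo_template.format(
    retrieved_chunks="\n\n".join(chunks),  # chunks separated by a blank line
    question="What is 天公生?",
)
print(filled)
```

`str.format` only parses the template, so braces inside a retrieved chunk are inserted verbatim. A literal brace in the template itself would need to be doubled (`{{` or `}}`).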

### 6. Respond using RAG

Retrieve the passages relevant to the user's question, then have the LLM answer according to our prompt template.

In [None]:
chat_history = []

def chat_with_rag(user_input):
    # Retrieve the relevant passages
    # (invoke() replaces the deprecated get_relevant_documents()).
    docs = retriever.invoke(user_input)
    print(docs)
    retrieved_chunks = "\n\n".join(doc.page_content for doc in docs)

    # Fill the custom prompt template
    final_prompt = prompt_template.format(retrieved_chunks=retrieved_chunks, question=user_input)

    # Call the OpenAI-compatible API (the Groq endpoint configured above)
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": final_prompt},
        ],
        temperature=0.7,  # controls sampling randomness
    )
    answer = response.choices[0].message.content

    # Log the turn; the history is recorded but not sent back to the model.
    chat_history.append((user_input, answer))
    return answer
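`chat_with_rag` records each turn in `chat_history`, but the request contains only the system prompt and the current prompt, so the model does not see earlier turns. If multi-turn context is wanted, one possible approach is to replay the most recent turns in the `messages` list. The helper below is a hypothetical sketch, not part of the notebook, and `max_turns` is an assumed knob for keeping the prompt short.

```python
def build_messages(system_prompt, history, final_prompt, max_turns=3):
    """Build a chat-completions message list that replays recent turns.

    history is a list of (user_text, assistant_text) tuples, oldest first.
    """
    messages = [{"role": "system", "content": system_prompt}]
    # history[-0:] would return the whole list, so treat max_turns <= 0 explicitly.
    recent = history[-max_turns:] if max_turns > 0 else []
    for user_text, assistant_text in recent:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The current retrieval-augmented prompt always goes last.
    messages.append({"role": "user", "content": final_prompt})
    return messages


toy_history = [("q1", "a1"), ("q2", "a2"), ("q3", "a3"), ("q4", "a4")]
for m in build_messages("system text", toy_history, "current prompt", max_turns=2):
    print(m["role"], "|", m["content"])
```

The result could be passed as `messages=` in the `client.chat.completions.create` call. Replaying raw user questions rather than the full retrieval-augmented prompts keeps earlier turns compact.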

### 7. Build a web app with Gradio

In [None]:
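with gr.Blocks() as demo:
    gr.Markdown("## 台灣節日小助手")  # title: "Taiwan Festival Assistant"
    chatbot = gr.Chatbot()
    # Placeholder: "Ask me about Taiwanese festivals! e.g. give me short introductions to 10 Taiwanese festivals"
    msg = gr.Textbox(placeholder="問我關於台灣節日的問題吧！例如：給我10個台灣節日的簡介")

    def respond(message, chat_history_local):
        # Answer with RAG, then append the (user, assistant) pair to the UI history.
        response = chat_with_rag(message)
        chat_history_local.append((message, response))
        return "", chat_history_local  # clear the textbox, refresh the chat

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

demo.launch(debug=True)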

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://a26be95e9181160c4d.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


[Document(id='15847955-205d-47ee-9c10-d6b05c1194e4', metadata={'source': 'uploaded_docs/節日.txt'}, page_content='童謠中的春節就是拜年、玩樂，過一個歡樂新年，體現早期農業社會裡從容、樸素的生活情調。當前的過年習俗隨著景氣好轉，是一個創新〈過年謠〉的好年代。\n\n天公生\n\n新年謠有「初九天公生」，為玉皇上帝的生日，古稱「祝天誕」而俗稱「天公生」。天公為古書中的「天帝」，本為帝王所專祀的祭天之禮；後來道教神統譜以三清為宇宙創始運化之象，統領萬神的昊天上帝，就被尊稱為「玉皇上帝」、「玄穹高上帝」。在道壇上有玉皇位，玉皇上帝作帝王像，戴九旒冕冠，著黃色冕服，法像莊嚴而雙手執圭，與萬星之主的紫微大帝，同為至尊的天神。\n\n民間奉祀則有天公壇，都在宮廟的至高處；而廟門外向天處，則有天公爐，向外拜天公後就插於爐內。民家也拜天公，都是桌案朝外或向天，其中閩南族群多在正廳前樑懸掛「天公爐」，客家族群則多奉祀於內埕龍邊牆上。\n\n天公生時值正月初九，時辰則是子時。九為陽數之極，而子時則是十二時辰之始。一陽復始的時刻，在廟埕或家中中庭的向天處，鄭重的擺香案，由家長率同一家大小在案前上香祭拜，祈求天公庇佑新一年家運亨通。這種「春祈」的信仰習俗由來已久，各地皆然，是到天公廟祈願的佳日。\n\n元宵節'), Document(id='1611442a-c5c1-4d72-b74d-e250a61228d2', metadata={'source': 'uploaded_docs/節日.txt'}, page_content='臺灣的地理位置與氣候\n\n南臺灣恆春中元普渡的「搶孤」活動。\n南臺灣恆春中元普渡的「搶孤」活動。（圖／文化部提供）\n\n臺灣位在東亞的大海上，處於亞熱帶氣候中，終年溫熱，除冬季高山地區外，不見霜雪。海島上林木蒼鬱，河流蜿蜒，16世紀的大航海時代，葡萄牙人航海經過，就稱為「福爾摩沙」（Ilha Formosa），意指一個美麗的島嶼。早期先有不同的原住民族散居島上，各有其生活習俗；而位處於中國東南沿海的福建、廣東，山多田少，謀生不易，居民開始渡海而來移居島上。17世紀歐洲人在東方展開海上大貿易，西班牙人、荷蘭人都想擁有島上獨占其資源，經歷一段時間的侵奪

