Performs Retrieval-Augmented Generation (RAG) over the organized course data generated by ChatGPT.
<br> Background on RAG: https://www.ibm.com/docs/en/watsonx/saas?topic=solutions-retrieval-augmented-generation

In [1]:
# import packages
from dotenv import load_dotenv
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain_ibm import WatsonxLLM
from langchain_community.vectorstores import FAISS  # the langchain.vectorstores import path is deprecated
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes, DecodingMethods
from ibm_watsonx_ai import Credentials

In [2]:
# Load the organized textbook data
with open(r"C:\Users\ediso\OneDrive\Desktop\IBM Call for Code\rita-cfc-2024\ai\course-prep\textbook-organized\康數五上_organized.txt", "r", encoding="utf-8") as file:
    extracted_text = file.read()

In [6]:
# Create a RecursiveCharacterTextSplitter object to split the text into chunks

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # Maximum number of characters in each chunk
    chunk_overlap=200,     # Number of characters that overlap between consecutive chunks
    length_function=len    # Function to measure the length of chunks
)

texts = text_splitter.split_text(extracted_text)

# Display the first few chunks to ensure proper splitting
# for i, chunk in enumerate(texts[:5]):
#     print(f"Chunk {i+1}:\n{chunk}\n")
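
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` first tries to split on separators such as paragraph and line breaks before falling back to smaller pieces, whereas this sketch only slides a character window. The function name `chunk_with_overlap` and the sample string are hypothetical and not part of the notebook.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical sample input, small enough to inspect by eye
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

The last four characters of each chunk reappear at the start of the next one, so a sentence that straddles a chunk boundary is still seen whole in at least one chunk.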

In [7]:
# Convert text chunks into embeddings (dense vector representations of the text that capture semantic information)

# Initialize the embedding model from a Hugging Face sentence-transformers checkpoint
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Build a FAISS (Facebook AI Similarity Search) vector store by embedding the raw text chunks
faiss_store = FAISS.from_texts(texts, embedding_model)

# Save the FAISS vector store to disk
faiss_store.save_local('faiss-vector-store')

# Load the FAISS store from disk
# allow_dangerous_deserialization=True is required because loading unpickles saved data;
# only enable it for files you created yourself
faiss_store = FAISS.load_local('faiss-vector-store', embedding_model, allow_dangerous_deserialization=True)

# Create a retriever from the vector store
retriever = faiss_store.as_retriever()
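
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below illustrates that idea with a brute-force Euclidean (L2) nearest-neighbor search in plain Python. It is an illustration only, not how FAISS is implemented internally; the three-dimensional vectors, the texts, and the names `l2_distance` and `top_k` are made up for the example and stand in for real embeddings.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are closest to query_vec."""
    ranked = sorted(store, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, vector) pairs standing in for embedded chunks
toy_store = [
    ("decimal place values", [0.9, 0.1, 0.0]),
    ("adding fractions", [0.1, 0.9, 0.0]),
    ("reading decimals aloud", [0.8, 0.2, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]  # pretend embedding of a question about decimals

nearest = top_k(toy_query, toy_store, k=2)
print(nearest)
```

A vector index such as FAISS exists to make this lookup fast over many high-dimensional vectors; the ranking idea is the same.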

In [10]:
# Load credentials from the .env file into environment variables
load_dotenv()
API_KEY = os.getenv('API_KEY')
URL = os.getenv('URL')
PROJECT_ID = os.getenv('PROJECT_ID')
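
`os.getenv` returns `None` when a variable is missing, which would only surface later as an authentication error. One possible safeguard, sketched here under the assumption that all three variables are mandatory, is to fail early with a clear message. The helper `require_env` and the dummy values are hypothetical and not part of the notebook.

```python
import os


def require_env(names, environ=None):
    """Return the requested variables as a dict, or raise if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Demonstration with a fake mapping so no real secrets are needed
fake_env = {
    "API_KEY": "dummy-key",
    "URL": "https://example.invalid",
    "PROJECT_ID": "dummy-project",
}
settings = require_env(["API_KEY", "URL", "PROJECT_ID"], fake_env)
print(sorted(settings))
```

In the notebook itself the helper would be called without the second argument, after `load_dotenv()`, so that it reads from the real environment.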

In [34]:
# Initialize the watsonx.ai LLM interface

credentials = Credentials.from_dict({
    'url': URL,
    'apikey': API_KEY
})

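params = {
    GenParams.MAX_NEW_TOKENS: 4095,                      # Upper bound on the number of generated tokens
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,   # Always pick the most likely next token (deterministic)
    GenParams.REPETITION_PENALTY: 1.2                    # Values above 1 discourage repeating tokens
}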

# Initialize the LLM
llm = WatsonxLLM(
    model_id=ModelTypes.LLAMA_3_70B_INSTRUCT.value,
    params=params,
    # credentials=credentials,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=PROJECT_ID
)

# Define the QA chain; the "stuff" chain type places all retrieved chunks into a single prompt
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
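
The `"stuff"` chain type concatenates every retrieved chunk into one prompt and sends it to the LLM in a single call. The sketch below shows that idea in plain Python. The template wording is illustrative only and is not LangChain's actual default prompt; `build_stuff_prompt` and the sample chunks are hypothetical.

```python
def build_stuff_prompt(question, documents):
    """Join all retrieved chunks into one context block followed by the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved chunks
toy_docs = [
    "Lesson 1-1 introduces place values of multi-digit decimals.",
    "Students practice writing and reading decimals.",
]
prompt = build_stuff_prompt("What does Lesson 1-1 cover?", toy_docs)
print(prompt)
```

Because everything goes into one prompt, the combined length of the retrieved chunks must fit within the model's context window, which is one reason the chunk size chosen during splitting matters.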

In [35]:
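# Define the query ("Tell me more about the content of Lesson 1-1, entirely in Traditional Chinese")
query = "跟我說更多關於課程1-1的內容，全部使用繁體中文"

# Run the chain: retrieve the most relevant chunks, then generate an answer from them
response = qa.invoke({"query": query})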

print(response['result'])

 

根據單元 1: 多位小數與加減的課程架構，第 1 章「認識多位小數」旨在讓學生理解多位小數的位值和表示方法，以及掌握千分位、萬分位等概念。

在這章節中，老師可能會通過以下方式教學：

* 使用定位板或數位卡片認識小數位值，幫助學生理解小數的概念和表示方法。
* 實際操作：寫出和讀出多位小數，讓學生熟悉小數的書寫和閱讀格式。
* 例題：寫出 3.456 的各個位值並解釋其意義，或者將 0.007 表示為分數，讓學生練習小數的表示和計算。

此外，本章節還包括 Fun Fact 部分，旨在激發學生的興趣和好奇心。在這裡，老師可能會提及一些有趣的事實，例如，大部分現代計算器可以顯示並計算多位小數，這是因為它們內置了高精度的小數運算功能。這些 Fun Facts 可以幫助學生更好地理解小數的概念和應用。


In [36]:
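# Define the query ("Give me some example problems about Lesson 1-1, entirely in Traditional Chinese")
query = "給我一些關於課程1-1的例題，全部使用繁體中文"

# Run the chain: retrieve the most relevant chunks, then generate an answer from them
response = qa.invoke({"query": query})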

print(response['result'])

 Here are some example questions related to Lesson 1-1:

* 寫出 4.278 的各個位值並解釋其意義。
* 將 0.009 表示為分數。
* 讀出以下小數：3.145、0.067、9.234
* 寫出小數 6.543 的十倍和百倍。

Let me know if you need more! 😊
