This notebook performs Retrieval-Augmented Generation (RAG) over the organized course data generated by ChatGPT.
<br> For an explanation of RAG, see: https://www.ibm.com/docs/en/watsonx/saas?topic=solutions-retrieval-augmented-generation

In [2]:
# import packages
from dotenv import load_dotenv
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain_ibm import WatsonxLLM
from langchain_community.vectorstores import FAISS
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes, DecodingMethods
from ibm_watsonx_ai import Credentials

In [3]:
# Load the organized textbook data
textbook_extracted_path = r"C:\Users\ediso\OneDrive\Desktop\IBM Call for Code\rita-cfc-2024\ai\course-prep\textbook-extracted\nan_math_5th_2nd_extracted.txt"

with open(textbook_extracted_path, "r", encoding="utf-8") as file:
    extracted_text = file.read()    

In [4]:
# Create a RecursiveCharacterTextSplitter object to split the text into chunks

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # Maximum number of characters in each chunk
    chunk_overlap=200,     # Number of characters that overlap between consecutive chunks
    length_function=len    # Function to measure the length of chunks
)

texts = text_splitter.split_text(extracted_text)

# Display the first few chunks to ensure proper splitting
# for i, chunk in enumerate(texts[:5]):
#     print(f"Chunk {i+1}:\n{chunk}\n")
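
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification, not what RecursiveCharacterTextSplitter actually does: the real splitter first tries to break on separators such as paragraph and line breaks, whereas this sketch cuts purely by character count. The function name chunk_with_overlap and the sample string are hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into character windows of at most chunk_size characters,
    where each window repeats the last chunk_overlap characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Hypothetical sample: 26 characters, windows of 10 with 4 shared characters
sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for i, chunk in enumerate(demo_chunks, start=1):
    print(f"Chunk {i}: {chunk}")
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.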

In [5]:
# Convert text chunks into embeddings (dense vector representations of the text that capture semantic information)

# Initialize the embedding model from a sentence-transformers checkpoint hosted on Hugging Face
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Build a FAISS (Facebook AI Similarity Search) vector store by embedding the raw text chunks
faiss_store = FAISS.from_texts(texts, embedding_model)

# TODO: replace the absolute path with a relative one
# Define the save path and the name for the vector store
save_path = r'C:\Users\ediso\OneDrive\Desktop\IBM Call for Code\rita-cfc-2024\ai\course-prep\RAG\vector-stores'
vector_store_name = 'nan_math_5th_2nd_vector_store'

full_save_path = os.path.join(save_path, vector_store_name)
os.makedirs(full_save_path, exist_ok=True)

# Save FAISS vector store to disk with a name
faiss_store.save_local(full_save_path)

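# Reload the FAISS store from disk. allow_dangerous_deserialization=True is needed
# because part of the store is pickled; only load files you created yourself.
faiss_store = FAISS.load_local(full_save_path, embedding_model, allow_dangerous_deserialization=True)

# Create a retriever from the vector store
retriever = faiss_store.as_retriever()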
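
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that idea as a brute-force search in plain Python. It assumes Euclidean (L2) distance and a small top-k, which is believed to match the defaults of the LangChain FAISS wrapper but should be treated as an assumption; the toy 3-dimensional vectors and the names l2_distance and top_k are hypothetical (the real embeddings have hundreds of dimensions, and FAISS uses optimized index structures rather than a Python loop).

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k stored vectors nearest to query_vec, nearest first."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: l2_distance(query_vec, doc_vecs[i]))
    return ranked[:k]


# Hypothetical toy embeddings, one per text chunk
toy_vectors = [
    [0.9, 0.1, 0.0],  # chunk 0: close to the query direction
    [0.0, 1.0, 0.0],  # chunk 1: unrelated
    [0.8, 0.2, 0.1],  # chunk 2: fairly close to the query direction
    [0.0, 0.0, 1.0],  # chunk 3: unrelated
]
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```

With these toy values, chunks 0 and 2 are returned because their vectors point in nearly the same direction as the query.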

In [6]:
# Load credentials (API key, service URL, project ID) from a .env file
load_dotenv()
API_KEY = os.getenv('API_KEY')
URL = os.getenv('URL')
PROJECT_ID = os.getenv('PROJECT_ID')
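
os.getenv returns None when a variable is not set, so a missing entry in the .env file only surfaces later as an authentication error. A small check like the following can fail early instead. This is a sketch, not part of the original notebook; the helper name missing_settings is hypothetical, and the example values are placeholders.

```python
def missing_settings(env, names=("API_KEY", "URL", "PROJECT_ID")):
    """Return the setting names that are absent or empty in the mapping env."""
    return [name for name in names if not env.get(name)]


# Placeholder mapping standing in for os.environ
example_env = {"API_KEY": "placeholder-key", "URL": ""}
missing = missing_settings(example_env)
print(missing)
```

In the notebook itself, one might call missing_settings(os.environ) right after load_dotenv() and raise an error if the returned list is not empty.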

In [7]:
# Initialize WatsonX LLM Interface

credentials = Credentials.from_dict({
    'url': URL,
    'apikey': API_KEY
})

params = {
    GenParams.MAX_NEW_TOKENS: 4095,
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.REPETITION_PENALTY: 1.2
}

# Initialize the LLM model
llm = WatsonxLLM(
    model_id=ModelTypes.LLAMA_3_70B_INSTRUCT.value,
    params=params,
    # credentials=credentials,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=PROJECT_ID
)

# Define the QA chain
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
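
The "stuff" chain type places all retrieved chunks into a single prompt and sends that prompt to the LLM in one call. The sketch below illustrates the idea in plain Python. The template wording only approximates a typical question-answering prompt and is an assumption, not the exact text LangChain uses; build_stuff_prompt and the sample chunks are hypothetical.

```python
def build_stuff_prompt(chunks: list[str], question: str) -> str:
    """Concatenate all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Hypothetical retrieved chunks
example_prompt = build_stuff_prompt(
    ["Unit 1 covers division of whole numbers.", "Lesson 1-1 introduces quotients with decimals."],
    "What does lesson 1-1 cover?",
)
print(example_prompt)
```

Because every chunk goes into the same prompt, the combined length of the retrieved chunks plus the question has to fit in the model's context window, which is one reason to keep chunk_size moderate.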

In [8]:
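# Define the query
# (English: "Tell me more about the content of lesson 1-1, entirely in Traditional Chinese")
query = "跟我說更多關於課程1-1的內容，全部使用繁體中文"

# Run the QA chain: retrieve the relevant chunks, then generate an answer from them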
response = qa.invoke({"query": query})

print(response['result'])

 

根據提供的資料，我們可以看到課程1-1的內容如下：

**單元：** 第1單元 分數

**學習重點：**

* 認識分數的概念
* 能夠寫出分數的不同形式（如：1/2、2/4）
* 能夠將分數化簡到最簡形式
* 能夠比較大小不同的分數

**學習目標：**

* 能夠理解分數的基本概念
* 能夠運用分數來表示部分和整體之間的關係
* 能夠將分數轉化為小數，並且能夠進行相關的計算

**教學資源：**

* 南一電子
* 書籍

**評量方式：**

* 觀察評量
* 運算評量
* 實踐評量
* 口頭評量
* 發表評量

**融入議題：**

* 環境教育
* 家庭教育
* 品德教育
* 生涯規劃教育
* 閱讀素養教育
* 戶外教育
* 性別平等教育
* 人權教育

如果您需要更多信息，請隨時詢問！


In [9]:
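# Define the query
# (English: "Give me some example problems for lesson 1-1, entirely in Traditional Chinese")
query = "給我一些關於課程1-1的例題，全部使用繁體中文"

# Run the QA chain: retrieve the relevant chunks, then generate an answer from them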
response = qa.invoke({"query": query})

print(response['result'])

 

Based on the provided curriculum outline, here are some example questions related to Lesson 1-1:

**Topic:** 整數除以整數 (Integer Division)

**Example Questions:**

1. 48 ÷ 6 = ?
Answer: 8
2. 24 ÷ 4 = ?
Answer: 6
3. 90 ÷ 15 = ?
Answer: 6
4. 72 ÷ 9 = ?
Answer: 8
5. 120 ÷ 20 = ?
Answer: 6

These questions assess students' understanding of integer division and their ability to perform calculations accurately.


In [10]:
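# Define the query
# (English: "Give me some example problems for lesson 1-1, all in Traditional Chinese")
query = "給我一些關於課程1-1的例題，全使用繁體中文"

# Run the QA chain: retrieve the relevant chunks, then generate an answer from them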
response = qa.invoke({"query": query})

print(response['result'])

 Here are some example questions related to Lesson 1-1:

**Example Questions**

1. 整數除以整數，商為三位小數以內，沒有餘數的計算：
   (e.g.) 45 ÷ 15 = ?

2. 使用直式解決整數除以整數，商為兩位小數沒有餘數的計算：
   (e.g.) 24 ÷ 6 = ?

3. 解決生活中的除法問題：
   (e.g.) 如果有一袋麵粉重30kg，每包需要250g，那麼可以裝多少包？

Let me know if you need more examples! 😊


In [11]:
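# Define the query
# (English: "Give me some example problems for lesson 1-1 that incorporate real-life situations, entirely in Traditional Chinese")
query = "給我一些關於課程1-1的例題，結合一些生活情境，全部使用繁體中文"

# Run the QA chain: retrieve the relevant chunks, then generate an answer from them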
response = qa.invoke({"query": query})

print(response['result'])

 Here are some examples related to Course 1-1:

**Example 1:** 
Tommy has 3 bags of apples. Each bag contains 1/4 kilograms of apples. How many kilograms of apples does Tommy have in total?

**Answer:** 3 x 1/4 = 3/4 kg

**Example 2:** 
A bookshelf has 5 shelves, and each shelf can hold 1/2 boxes of books. If the bookshelf is currently empty, how many boxes of books can it hold in total?

**Answer:** 5 x 1/2 = 5/2 or 2 1/2 boxes

**Example 3:** 
May wants to buy a cake that weighs 3/4 kilograms. The bakery only sells cakes by whole kilograms. Can May buy the cake she wants? Why or why not?

**Answer:** No, because the bakery only sells cakes by whole kilograms, but May wants a cake that weighs 3/4 kilograms, which is less than 1 kilogram.

These examples incorporate real-life scenarios with fractions, making them relatable and engaging for students! 😊


