Performs RAG (Retrieval-Augmented Generation) using the organized course data generated by ChatGPT
<br> Explanation of RAG: https://www.ibm.com/docs/en/watsonx/saas?topic=solutions-retrieval-augmented-generation

In [1]:
# import packages
from dotenv import load_dotenv
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain_ibm import WatsonxLLM
from langchain_community.vectorstores import FAISS
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes, DecodingMethods
from ibm_watsonx_ai import Credentials

In [2]:
# Load the organized textbook data
textbook_extracted_path = r"C:\Users\ediso\OneDrive\Desktop\ibm-cfc-2024\rita-cfc-2024\ai\course-prep\textbook-extracted\azure_document_intelligence\kang_math_5th_1st_extracted.txt"

with open(textbook_extracted_path, "r", encoding="utf-8") as file:
    extracted_text = file.read()    
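The TODO in the embedding cell asks for paths relative to the repository root. One way to do that is sketched below. It assumes the repository root is the nearest ancestor directory that contains a `.git` folder, and that `rita-cfc-2024` is that root. `find_repo_root` is a hypothetical helper and is not part of this project.

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = ".git") -> Path:
    """Walk up from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found at or above {start}")


# Hypothetical usage: assumes the notebook runs inside the rita-cfc-2024 checkout
# and that rita-cfc-2024 is the repository root.
# repo_root = find_repo_root(Path.cwd())
# textbook_extracted_path = (
#     repo_root / "ai" / "course-prep" / "textbook-extracted"
#     / "azure_document_intelligence" / "kang_math_5th_1st_extracted.txt"
# )
# save_path = repo_root / "ai" / "course-prep" / "RAG" / "vector-stores"
```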

In [3]:
# Create a RecursiveCharacterTextSplitter object to split the text into chunks

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # Maximum number of characters in each chunk
    chunk_overlap=200,     # Number of characters that overlap between consecutive chunks
    length_function=len    # Function to measure the length of chunks
)

texts = text_splitter.split_text(extracted_text)

# Display the first few chunks to ensure proper splitting
# for i, chunk in enumerate(texts[:5]):
#     print(f"Chunk {i+1}:\n{chunk}\n")
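`RecursiveCharacterTextSplitter` first tries to split on larger separators, by default `"\n\n"`, then `"\n"`, then `" "`, then individual characters. It then merges the pieces back together up to `chunk_size`, and consecutive chunks share up to `chunk_overlap` characters. The overlap idea is easier to see in a plain sliding window. The sketch below shows only that simplified technique. It is not LangChain's algorithm and it ignores separators entirely. With `chunk_size=1000` and `chunk_overlap=200`, each window starts 800 characters after the previous one.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split `text` into fixed-size windows; consecutive windows share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
        start += step
    return chunks


chunks = sliding_window_chunks("x" * 2600)
print([len(c) for c in chunks])  # → [1000, 1000, 1000]
```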

In [9]:
# Convert text chunks into embeddings (dense vector representations of the text that capture semantic information)

# Initialize the embedding model: a multilingual sentence-transformers model from Hugging Face
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Initialize FAISS (Facebook AI Similarity Search) vector store, converting raw text chunks into embeddings
faiss_store = FAISS.from_texts(texts, embedding_model)

# Define the save path and the name for the vector store
# TODO: convert to a path relative to the repository root
save_path = r'C:\Users\ediso\OneDrive\Desktop\ibm-cfc-2024\rita-cfc-2024\ai\course-prep\RAG\vector-stores'
vector_store_name = 'kang_math_5th_1st_vector_store_with_info'

full_save_path = os.path.join(save_path, vector_store_name)
os.makedirs(full_save_path, exist_ok=True)

# Save FAISS vector store to disk with a name
faiss_store.save_local(full_save_path)

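# Load the FAISS store from disk (allow_dangerous_deserialization is required because the docstore is pickled; only load stores you created yourself)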
faiss_store = FAISS.load_local(full_save_path, embedding_model, allow_dangerous_deserialization=True)

# Wrap the vector store as a retriever (similarity search; returns the top 4 chunks by default)
retriever = faiss_store.as_retriever()
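`FAISS.from_texts` stores one embedding per chunk. As far as we can tell from LangChain's defaults, it builds a flat (exhaustive) index that compares vectors by Euclidean distance. `as_retriever()` then returns the 4 nearest chunks unless `search_kwargs={"k": ...}` is passed. The sketch below reproduces only that nearest-neighbour step, in plain Python. Toy 2-D vectors stand in for the 384-dimensional MiniLM embeddings. This is an illustration, not how FAISS is implemented internally.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return indices of the k chunk vectors closest to query_vec (exhaustive search)."""
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: l2_distance(query_vec, chunk_vecs[i]))
    return ranked[:k]


# Toy 2-D "embeddings"; real MiniLM vectors have 384 dimensions.
chunk_vecs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2], [10.0, 0.0], [1.5, 1.5]]
print(top_k([1.0, 1.0], chunk_vecs))  # → [1, 3, 5, 0]
```

Retrieving more context per question could be done with `faiss_store.as_retriever(search_kwargs={"k": 6})`. Every extra chunk also lengthens the prompt sent to the LLM.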

In [10]:
# Load sensitive info from .env
# TODO: the API key should not be committed to the repository; acceptable for now
load_dotenv()
API_KEY = os.getenv('API_KEY')
URL = os.getenv('URL')
PROJECT_ID = os.getenv('PROJECT_ID')
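`os.getenv` returns `None` silently when a variable is missing from `.env`, so a typo only shows up later, when the watsonx.ai call fails. A small fail-fast check is sketched below with a hypothetical `require_env` helper. Listing `.env` in `.gitignore` also addresses the TODO above.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, failing fast if any is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Hypothetical usage, right after load_dotenv():
# config = require_env("API_KEY", "URL", "PROJECT_ID")
# API_KEY, URL, PROJECT_ID = config["API_KEY"], config["URL"], config["PROJECT_ID"]
```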

In [26]:
# Build the prompt

# LLM OUTPUT
# type MessageRitaResponse = {
#  reply: string;
#  // The 'extra' field holds the content that the GUI widget will parse
#  extra: {
#    widgetId: string;
#    content: JSON;
#  };
# };

# LLM INPUT
# type MessageRitaRequestServerToWatson = {
#  prompt: string;
#  widget: {
#    id: string;
#    type: number;
#    content: JSON;
#  };
#  // use classroomId and lectureId
#  // to query for actual data from db
#  classroom: {
#    name: string;
#    subject: string;
#    grade: string;
#    publisher: string;
#    credits: number;
#  };
#  lecture: {
#    name: string;
#    type: number;
#  };
# };

# sample
# type MessageRitaRequestServerToWatson = {
#  prompt: "跟我說更多關於課程1-1的內容，全部使用繁體中文";
#  widget: {
#    id: string;
#    type: number;
#    content: JSON;
#  };
#  // use classroomId and lectureId
#  // to query for actual data from db
#  classroom: {
#    name: 親愛的511班;
#    subject: 數學;
#    grade: 五上;
#    publisher: 康軒;
#    credits: 5;
#  };
#  lecture: {
#    name: string;
#    type: number;
#  };
# };

def create_prompt(input):
    input_output_instruction = """

    Sample LLM Input: 

    input = {
        "prompt": "幫我在這個計畫的第一周及第三周後裡面安插第一次和第二次段考",
        "widget": {
            "id": "12",
            "type": 1,
            "content": {
                "headings": ["週目", "目標", "教材"],
                "rows": [
                    {"週目": 1, "目標": "讓學生能認識多位小數與比較小數", "教材": "1-1, 1-2"},
                    {"週目": 2, "目標": "讓學生學習多位小數的加減及日常應用", "教材": "1-3"},
                    {"週目": 3, "目標": "讓學生了解小數與概數", "教材": "1-4"}]
            }
        },
        "classroom": {
            "name": "親愛的511班",
            "subject": "數學",
            "grade": "五上",
            "publisher": "康軒",
            "credits": 5
        },
        "lecture": {
            "name": "string",
            "type": 0
        }
    }

    Sample LLM Output:

    output = {
        "reply": '沒問題!已幫你在這個計畫的第一周及第三周後裡面安插第一次和第二次段考',
        "widgetId": '12',
        "content": {
            "headings": ["週目", "目標", "教材"],
            "rows": [
                {"週目": "1", "目標": "讓學生能認識多位小數與比較小數", "教材": "1-1, 1-2"},
                {"週目": "2", "目標": "第一次段考", "教材": "無"},
                {"週目": "3", "目標": "讓學生學習多位小數的加減及日常應用", "教材": "1-3"},
                {"週目": "4", "目標": "讓學生了解小數與概數", "教材": "1-4"},
                {"週目": "5", "目標": "第二次段考", "教材": "無"},
            ]
        }
    }
    
    Use the fields "subject", "grade" and "publisher" to locate the relevant information in the textbook when performing the retrieval. Fill the "reply" field with the answer to the request, and fill the "widgetId" and "content" fields with the "id" and the updated "content" of the input's "widget".
    
    Help me generate an output based on my input below:"""

    return input_output_instruction + input
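The sample inputs below are hand-written JSON strings, which makes a missing bracket easy to overlook. The sketch below builds the request as a Python dict instead. Its `TypedDict` classes mirror the `MessageRitaRequestServerToWatson` TypeScript type in the comments above; the Python classes themselves are our own, hypothetical definitions. The dict is serialized with `json.dumps(..., ensure_ascii=False)` so the Chinese text stays readable in the prompt.

```python
import json
from typing import Any, TypedDict


class Widget(TypedDict):
    id: str
    type: int
    content: Any  # table-like content, e.g. {"headings": [...], "rows": [...]}


class Classroom(TypedDict):
    name: str
    subject: str
    grade: str
    publisher: str
    credits: int


class Lecture(TypedDict):
    name: str
    type: int


class MessageRitaRequestServerToWatson(TypedDict):
    prompt: str
    widget: Widget
    classroom: Classroom
    lecture: Lecture


def to_prompt_json(request: MessageRitaRequestServerToWatson) -> str:
    """Serialize the request; ensure_ascii=False keeps characters like 數學 instead of \\uXXXX escapes."""
    return json.dumps(request, ensure_ascii=False, indent=4)


request: MessageRitaRequestServerToWatson = {
    "prompt": "幫我在第二周後連續增加三次段考",
    "widget": {
        "id": "12",
        "type": 1,
        "content": {
            "headings": ["週目", "目標", "教材"],
            "rows": [
                {"週目": 1, "目標": "讓學生能認識多位小數與比較小數", "教材": "1-1, 1-2"},
                {"週目": 2, "目標": "讓學生學習多位小數的加減及日常應用", "教材": "1-3"},
            ],
        },
    },
    "classroom": {"name": "親愛的511班", "subject": "數學", "grade": "五上", "publisher": "康軒", "credits": 5},
    "lecture": {"name": "string", "type": 0},
}
request_json = to_prompt_json(request)
# Hypothetical usage with the chain defined later in this notebook:
# response = qa.invoke({"query": create_prompt(request_json)})["result"]
```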


In [27]:
# Initialize the watsonx.ai LLM interface

credentials = Credentials.from_dict({
    'url': URL,
    'apikey': API_KEY
})

params = {
    GenParams.MAX_NEW_TOKENS: 4095,
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.REPETITION_PENALTY: 1.0
}

# Initialize the LLM model
llm = WatsonxLLM(
    model_id=ModelTypes.LLAMA_3_70B_INSTRUCT.value,
    params=params,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=PROJECT_ID
)

# Define the QA chain
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
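With `chain_type="stuff"`, RetrievalQA places every retrieved chunk into a single prompt next to the question. The question string, here the whole `create_prompt(...)` output, is also the text that gets embedded to run the retrieval. The sketch below is a simplified version of the stuffing step. Its template text approximates LangChain's default "stuff" prompt and may differ between versions.

```python
# Approximation of LangChain's default "stuff" QA prompt; exact wording may vary by version.
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def stuff_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Concatenate every retrieved chunk into one context block and fill in the template."""
    context = separator.join(chunks)
    # str.format only parses the template, so braces inside chunks or the question are kept as-is
    return STUFF_TEMPLATE.format(context=context, question=question)


prompt = stuff_prompt(["1-1 了解倍數的意義 ...", "1-2 找出某數的倍數 ..."], "給我一些關於課程1-1的例題")
print(prompt)
```

Because everything is concatenated, the few-shot instructions, the retrieved chunks (about 1,000 characters each) and `MAX_NEW_TOKENS` must together fit within the model's context window. That window is 8,192 tokens for the original Llama 3 70B Instruct release.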

In [28]:
# Define the query
# query = "跟我說更多關於課程1-1的內容，全部使用繁體中文\n"

# Get the response using the query embedding
# response = qa.invoke({"query": query})

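# prompt: "In this plan, insert the first and second periodic exams (段考) back to back after week 2"
sample_input_1 = """{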
    "prompt": "幫我在這個計畫裡的第二周後連續安插第一次和第二次段考",

    "widget": {
        "id": "12",
        "type": 1,
        "content": {
            "headings": ["週目", "目標", "教材"],
            "rows": [
                {"週目": 1, "目標": "讓學生能認識多位小數與比較小數", "教材": "1-1, 1-2"},
                {"週目": 2, "目標": "讓學生學習多位小數的加減及日常應用", "教材": "1-3"},
                {"週目": 3, "目標": "讓學生了解小數與概數", "教材": "1-4"}]
        }
    },
    "classroom": {
        "name": "我最討厭的607班",
        "subject": "數學",
        "grade": "五上",
        "publisher": "康軒",
        "credits": 5
    },
    "lecture": {
        "name": "string",
        "type": 0
    }
} """

response = qa.invoke({"query": create_prompt(sample_input_1)})["result"]

In [29]:
print(response)

 

output = {
    "reply": '沒問題!已幫你在這個計畫裡的第二周後連續安插第一次和第二次段考',
    "widgetId": '12',
    "content": {
        "headings": ["週目", "目標", "教材"],
        "rows": [
            {"週目": "1", "目標": "讓學生能認識多位小數與比較小數", "教材": "1-1, 1-2"},
            {"週目": "2", "目標": "讓學生學習多位小數的加減及日常應用", "教材": "1-3"},
            {"週目": "3", "目標": "第一次段考", "教材": "無"},
            {"週目": "4", "目標": "第二次段考", "教材": "無"},
            {"週目": "5", "目標": "讓學生了解小數與概數", "教材": "1-4"}
        ]
    }
}
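The model answers with a Python-style literal rather than strict JSON: an `output = {...}` assignment with single-quoted strings and, in the third run, a trailing comma. `json.loads` would reject it before the GUI could use it. The sketch below parses and sanity-checks the reply with `ast.literal_eval`. It assumes the response contains nothing but that literal. `parse_llm_output` and its checks are hypothetical, based on the fields named in the prompt.

```python
import ast

REQUIRED_KEYS = {"reply", "widgetId", "content"}


def parse_llm_output(response: str) -> dict:
    """Turn a response like 'output = {...}' into a dict and check its shape."""
    text = response.strip()
    if text.startswith("output"):
        # drop the "output =" assignment the model copies from the few-shot example
        text = text.split("=", 1)[1].strip()
    # literal_eval accepts single-quoted strings and trailing commas; json.loads does not
    data = ast.literal_eval(text)
    if not isinstance(data, dict):
        raise ValueError("response is not a dict literal")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    headings = data["content"]["headings"]
    for row in data["content"]["rows"]:
        if set(row) != set(headings):
            raise ValueError(f"row {row} does not match headings {headings}")
    return data


sample = """output = {
    "reply": '已刪除第二周和第三周',
    "widgetId": '12',
    "content": {
        "headings": ["週目", "目標", "教材"],
        "rows": [
            {"週目": "1", "目標": "讓學生能認識多位小數與比較小數", "教材": "1-1, 1-2"},
        ]
    }
}"""
parsed = parse_llm_output(sample)
print(parsed["widgetId"], len(parsed["content"]["rows"]))  # → 12 1
```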


In [30]:
# prompt: "Delete weeks 2 and 3 for me"
sample_input_2 = """{
    "prompt": "幫我刪除第二周和第三周",

    "widget": {
        "id": "12",
        "type": 1,
        "content": {
            "headings": ["週目", "目標", "教材"],
            "rows": [
                {"週目": 1, "目標": "讓學生能認識多位小數與比較小數", "教材": "1-1, 1-2"},
                {"週目": 2, "目標": "讓學生學習多位小數的加減及日常應用", "教材": "1-3"},
                {"週目": 3, "目標": "讓學生了解小數與概數", "教材": "1-4"}]
        }
    },
    "classroom": {
        "name": "我最討厭的607班",
        "subject": "數學",
        "grade": "五上",
        "publisher": "康軒",
        "credits": 5
    },
    "lecture": {
        "name": "string",
        "type": 0
    }
} """

response = qa.invoke({"query": create_prompt(sample_input_2)})["result"]

In [32]:
print(response)

 

output = {
    "reply": '已刪除第二周和第三周',
    "widgetId": '12',
    "content": {
        "headings": ["週目", "目標", "教材"],
        "rows": [
            {"週目": "1", "目標": "讓學生能認識多位小數與比較小數", "教材": "1-1, 1-2"}
        ]
    }
}


In [33]:
# prompt: "Add three periodic exams (段考) in a row after week 2"
sample_input_3 = """{
    "prompt": "幫我在第二周後連續增加三次段考",

    "widget": {
        "id": "12",
        "type": 1,
        "content": {
            "headings": ["週目", "目標", "教材"],
            "rows": [
                {"週目": 1, "目標": "讓學生能認識多位小數與比較小數", "教材": "1-1, 1-2"},
                {"週目": 2, "目標": "讓學生學習多位小數的加減及日常應用", "教材": "1-3"}]
        }
    },
    "classroom": {
        "name": "我最討厭的607班",
        "subject": "數學",
        "grade": "五上",
        "publisher": "康軒",
        "credits": 5
    },
    "lecture": {
        "name": "string",
        "type": 0
    }
} """

response = qa.invoke({"query": create_prompt(sample_input_3)})["result"]

In [34]:
print(response)

 

output = {
    "reply": '沒問題!已幫你在第二周後連續增加三次段考',
    "widgetId": '12',
    "content": {
        "headings": ["週目", "目標", "教材"],
        "rows": [
            {"週目": "1", "目標": "讓學生能認識多位小數與比較小數", "教材": "1-1, 1-2"},
            {"週目": "2", "目標": "讓學生學習多位小數的加減及日常應用", "教材": "1-3"},
            {"週目": "3", "目標": "第一次段考", "教材": "無"},
            {"週目": "4", "目標": "第二次段考", "教材": "無"},
            {"週目": "5", "目標": "第三次段考", "教材": "無"},
        ]
    }
}


Query 1: "跟我說更多關於課程1-1的內容，全部使用繁體中文" (Tell me more about the content of lesson 1-1; answer entirely in Traditional Chinese.)
 
<details>
<summary>1. REPETITION_PENALTY: 1.2</summary> 
根據提供的內容，我可以看到課程1-1是「倍數」，它位於第三單元「倍數與公倍數」。以下是我找到的一些信息：

* 活動名稱：倍數
* 教學節數：1節
* 頁碼：31、32
* 活動目標：
	+ 能了解倍數的意義
	+ 能找出某數的倍數
* 內容包括：
	+ 了解倍數的意義
	+ 找出某數的倍數
	+ 在指定區間中，找出某數的倍數
	+ 透過乘與除，了解倍數和因數的關係
	+ 解決倍數的應用問題

如果你需要更多信息，請告訴我！
</details>
 
<details>
<summary>2. REPETITION_PENALTY: 1.1</summary> 
根據提供的內容，我可以看到課程1-1是「倍數」的第一個活動，名為「了解倍數的意義」。這個活動的目標是讓學生能夠：

* 認識倍數的意義
* 找出某數的倍數

在這個活動中，學生們會通過一些問題和活動，例如：

* 找出某數的倍數，例如找出10的倍數
* 了解倍數和因數的關係
* 透過乘與除，了解倍數和因數的關係

這個活動的目的是讓學生初步了解倍數的概念，並且能夠找到某數的倍數。
</details>

<details>
<summary>3. REPETITION_PENALTY: 1.0</summary> 
根據提供的內容，課程1-1是「倍數」的第一個活動，名為「了解倍數的意義」。這個活動的目標是讓學生認識倍數的意義，並能找出某數的倍數。

在這個活動中，學生將通過以下步驟來學習倍數：

1. 開門活動：透過「土鳳梨酥特價中」的情境，複習整數的乘法計算。
2. 活動1-1：了解倍數的意義，讓學生認識倍數的概念。
3. 活動1-2：找出某數的倍數，讓學生能夠找出某數的倍數。

這個活動的知識脈絡是認識倍數，找出某數的倍數，並了解倍數和因數的關係。

在教學指導計畫中，這個活動的教學節數是1節，頁碼是31、32。教師可以根據學生的需求和進度，調整教學的步驟和時間。
</details>
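The three Query 1 runs above differ only in `REPETITION_PENALTY`. The sketch below is a rough illustration of what that parameter does. It applies the CTRL-style penalty used by Hugging Face's `RepetitionPenaltyLogitsProcessor`: positive logits of already-generated tokens are divided by the penalty, and negative ones are multiplied by it. It then picks greedily, matching the `DecodingMethods.GREEDY` setting. Whether watsonx.ai applies exactly this formula is an assumption, and the tokens and logits are made up.

```python
def apply_repetition_penalty(logits: dict[str, float], generated: list[str], penalty: float) -> dict[str, float]:
    """CTRL-style penalty: lower the scores of tokens that were already generated (penalty > 1)."""
    adjusted = dict(logits)  # leave the caller's logits untouched
    for token in set(generated):
        if token in adjusted:
            score = adjusted[token]
            # dividing a positive logit or multiplying a negative one both make the token less likely
            adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted


def greedy_pick(logits: dict[str, float]) -> str:
    """Greedy decoding: take the highest-scoring token."""
    return max(logits, key=logits.get)


logits = {"倍數": 2.0, "因數": 1.9, "小數": -1.0}  # made-up next-token scores
history = ["倍數"]  # "倍數" has already been generated
for penalty in (1.0, 1.1, 1.2):
    print(penalty, greedy_pick(apply_repetition_penalty(logits, history, penalty)))
# → 1.0 倍數 / 1.1 因數 / 1.2 因數
```

Even a small penalty can flip a greedy choice between two close candidates. This may be why the runs above diverge in wording while covering similar content.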

In [33]:
# TODO: try repetition penalty values between 1.0 and 1.1
# Define the query
query = "給我一些關於課程1-1的例題，全部使用繁體中文"
# the teacher needs to specify which lesson "1-1" refers to so the model can retrieve the correct section

# Get the response using the query embedding
response = qa.invoke({"query": query})

# print(response['result'])

Query 2: "給我一些關於課程1-1的例題，全部使用繁體中文" (Give me some example problems for lesson 1-1; answer entirely in Traditional Chinese.)


 
<details>
<summary>1. REPETITION_PENALTY: 1.2</summary> 
 Here are some example questions related to Lesson 1-1 "認識多位小數" in a Taiwanese math textbook:

**Example Questions**

1. 下列表示的是哪種小數？
（A）0.45    （B）4.05   （C）405     （D）40.5

Answer: （B）4.05

2. 計算：2.56 + 1.78 = ?

Answer: 4.34

3. 比較下列小數的大小：
（A）3.42 和 3.49      （B）2.15 和 2.51

Answers:
(A) 3.42 < 3.49
(B) 2.15 < 2.51

These examples illustrate basic understanding and operations with multi-digit decimals, which is the focus of Lesson 1-1.
</details>
 
<details>
<summary>2. REPETITION_PENALTY: 1.1</summary> 
 Here are some example questions related to Lesson 1-1 "認識多位小數" (Recognizing Multi-Digit Decimals):

**Example 1**
What is the value of each digit in the decimal number 45.27?

* 十位 (tens place): _______
* 個位 (units place): _______
* 十分位 (tenths place): _______
* 百分位 (hundredths place): _______

**Example 2**
Write the decimal number 3.456 in words.

**Example 3**
Compare the two decimal numbers 2.58 and 2.59. Which one is larger?

**Example 4**
Round the decimal number 4.823 to the nearest tenth.

These examples are designed to help students recognize and understand multi-digit decimals, including their values, notation, and comparisons.
</details>

<details>
<summary>3. REPETITION_PENALTY: 1.0</summary> 
 Here are some example questions related to Lesson 1-1 "認識多位小數" (Recognizing Multi-Digit Decimals):

**Example 1**
小明買了一本書，價格是12.5元。小明付了15元，找回多少錢？

(Answer: 2.5元)

**Example 2**
一瓶水的重量是3.75公斤。你需要3瓶水，總重量是多少公斤？

(Answer: 11.25公斤)

**Example 3**
一個蛋糕的長度是2.8米，寬度是1.4米。蛋糕的周長是多少米？

(Answer: 8.4米)

**Example 4**
小紅有12.8元錢，她買了一個玩具需要8.5元錢，剩下多少錢？

(Answer: 4.3元)

These examples are designed to help students recognize and understand multi-digit decimals in real-life situations.
</details>
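The arithmetic in the four `REPETITION_PENALTY: 1.0` answers above is correct. If generated examples are to be checked automatically, exact decimal arithmetic avoids binary floating-point rounding noise. The sketch below uses Python's `decimal` module, with the pairs transcribed from those answers.

```python
from decimal import Decimal

# (recomputed value, answer claimed in the REPETITION_PENALTY 1.0 output)
checks = [
    (Decimal("15") - Decimal("12.5"), Decimal("2.5")),        # Example 1: change from 15元
    (Decimal("3.75") * 3, Decimal("11.25")),                  # Example 2: weight of three bottles
    (2 * (Decimal("2.8") + Decimal("1.4")), Decimal("8.4")),  # Example 3: rectangle perimeter
    (Decimal("12.8") - Decimal("8.5"), Decimal("4.3")),       # Example 4: money left over
]
for computed, claimed in checks:
    print(computed, claimed, computed == claimed)
```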

In [31]:
# Define the query
query = "給我一些關於課程1-1的例題，結合一些生活情境，全部使用繁體中文"

# Get the response using the query embedding
response = qa.invoke({"query": query})

# print(response['result'])

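Query 3: "給我一些關於課程1-1的例題，結合一些生活情境，全部使用繁體中文" (Give me some example problems for lesson 1-1 that use everyday-life scenarios; answer entirely in Traditional Chinese.)
<br>In this case the model assumed lesson 1-1 is "認識線對稱圖形" (recognizing line-symmetric figures), which is incorrect.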
 
<details>
<summary>1. REPETITION_PENALTY: 1.2</summary> 
 Here are some example questions related to Lesson 1-1 "認識線對稱圖形" with real-life scenarios:

**Example Question 1**
🌊 海灘上的沙堡可以被看作線對稱圖形嗎？如果可以，请畫出它的對稱軸。

Translation: Can a sandcastle on the beach be considered as a line-symmetric figure? If so, please draw its axis of symmetry.

**Example Question 2**
🏠 家裡的窗戶通常都是線對稱圖形。你能找到你家窗戶的對稱軸嗎？

Translation: The windows in our homes are usually line-symmetric figures. Can you find the axis of symmetry of your home window?

**Example Question 3**
👜 女孩們喜歡穿戴對稱的服裝。如果她們穿了一件線對稱圖形的裙子，你能畫出它的對稱軸嗎？

Translation: Girls like to wear symmetrical clothes. If they wore a dress with a line-symmetric pattern, can you draw its axis of symmetry?

These examples aim to help students connect mathematical concepts to their everyday lives and develop problem-solving skills.
</details>
 
<details>
<summary>2. REPETITION_PENALTY: 1.1</summary> 
 Here are some example questions related to Lesson 1-1 "認識線對稱圖形" with real-life scenarios:

**Example 1:** 南北朝的建築設計
In ancient Chinese architecture, symmetrical designs were often used in building facades. Identify the line of symmetry in the following diagram:



**Answer:** The vertical line in the middle of the diagram is the line of symmetry.

**Example 2:** 湖中倒影
When taking a photo of a mountain reflected in a lake, we can observe symmetry. Identify the line of symmetry in the following diagram:



**Answer:** The horizontal line where the mountain meets its reflection is the line of symmetry.

**Example 3:** 花卉圖案
Symmetry is also found in flower patterns. Identify the line of symmetry in the following diagram:



**Answer:** The diagonal line from top-left to bottom-right is the line of symmetry.

These examples illustrate how symmetry appears in various aspects of life, making it easier for students to understand and appreciate the concept of line symmetry.
</details>

<details>
<summary>3. REPETITION_PENALTY: 1.0</summary> 
 Here are some example questions related to Lesson 1-1 "認識線對稱圖形" (Recognizing Line Symmetry) with real-life scenarios, all in Traditional Chinese:

**Example 1:**
在一家餐廳的菜單上，有一個線對稱的圖形，請問這個圖形是否具有線對稱性？(Is the shape on the menu of a restaurant line symmetric?)

[Image of a menu with a line symmetric shape]

**Example 2:**
小明的妹妹喜歡穿線對稱的衣服，請問這件衣服是否具有線對稱性？(Does the dress worn by Little Ming's sister have line symmetry?)

[Image of a dress with line symmetry]

**Example 3:**
在一個公園裡，有一個線對稱的遊戲設施，請問這個設施是否具有線對稱性？(Is the playground equipment in the park line symmetric?)

[Image of a playground equipment with line symmetry]

**Example 4:**
小華的爸爸是一個設計師，他設計了一個線對稱的Logo，請問這個Logo是否具有線對稱性？(Does the logo designed by Little Hua's father have line symmetry?)

[Image of a logo with line symmetry]

These examples aim to help students recognize line symmetry in real-life scenarios, making the concept more relatable and interesting.
</details>

In [41]:
# Define the query
query = "給我一些關於課程1-1(了解倍數的意義)的例題，結合一些生活情境，全部使用繁體中文"

# Get the response using the query embedding
response = qa.invoke({"query": query})

# print(response['result'])

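Query 4: "給我一些關於課程1-1(了解倍數的意義)的例題，結合一些生活情境，全部使用繁體中文" (Give me some example problems for lesson 1-1 (understanding the meaning of multiples) that use everyday-life scenarios; answer entirely in Traditional Chinese.)
<br>In this case the query tells the model explicitly that lesson 1-1 is "了解倍數的意義" (understanding the meaning of multiples).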
 
<details>
<summary>1. REPETITION_PENALTY: 1.2</summary> 
 Here are some example questions related to Lesson 1-1 (Understanding Multiples) with real-life scenarios:

**Example Question 1**
Tommy has 18 pencils in his pencil case. He wants to put them into boxes of 3 pencils each. How many boxes can he fill?

**Answer**: Tommy can fill 6 boxes because 18 ÷ 3 = 6.

**Real-Life Scenario**: Imagine you have a bunch of toys or stationery and want to organize them into smaller groups. Understanding multiples helps us divide things evenly!

**Example Question 2**
A bakery sells bread loaves in packs of 4. If they need to pack 24 loaves for delivery, how many packs do they need?

**Answer**: The bakery needs 6 packs because 24 ÷ 4 = 6.

**Real-Life Scenario**: Think about buying items like snacks or drinks in bulk packaging. Knowing multiples helps us calculate how many packages we need!

**Example Question 3**
Lily's mom is making cookies for her birthday party. She needs to package 15 cookies into bags of 5 cookies each. Can she do it?

**Answer**: Yes, Lily's mom can package all 15 cookies because 15 ÷ 5 = 3.

**Real-Life Scenario**: When preparing food or treats for events, understanding multiples ensures we have enough portions for everyone!

These examples illustrate how understanding multiples applies to everyday situations, such as organizing objects, packing goods, and portioning food.
</details>
 
<details>
<summary>2. REPETITION_PENALTY: 1.1</summary> 
 Here are some example questions related to Lesson 1-1 (Understanding the concept of multiples) with real-life scenarios:

**Example 1**
Tommy has 18 pencils in his pencil case. He wants to put them into boxes of 3 pencils each. How many boxes can he fill?

**Answer**: Since 3 is a multiple of 18 (3 × 6 = 18), Tommy can fill 6 boxes.

**Example 2**
A bakery sells bread loaves in packs of 4. If they have 24 bread loaves, how many packs can they make?

**Answer**: Since 4 is a multiple of 24 (4 × 6 = 24), the bakery can make 6 packs.

**Example 3**
Lily has 15 stickers and she wants to share them equally among her 5 friends. How many stickers will each friend get?

**Answer**: Since 5 is a multiple of 15 (5 × 3 = 15), each friend will get 3 stickers.

**Example 4**
A bookshelf has 21 books on it, arranged in rows of 3 books each. How many rows of books are there?

**Answer**: Since 3 is a multiple of 21 (3 × 7 = 21), there are 7 rows of books.

These examples illustrate how the concept of multiples is used in everyday life, making it more relatable and engaging for students.
</details>
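The `REPETITION_PENALTY: 1.1` answers above state the relationship backwards. "3 is a multiple of 18" should read "18 is a multiple of 3" (18 = 3 × 6), and the same holds for 24 and 4, 15 and 5, and 21 and 3. A one-line check makes the direction explicit. `is_multiple_of` is a hypothetical helper.

```python
def is_multiple_of(n: int, d: int) -> bool:
    """True if n is a multiple of d, i.e. n = d × k for some integer k."""
    return d != 0 and n % d == 0


# (total, group size) pairs taken from the REPETITION_PENALTY 1.1 answers
for n, d in [(18, 3), (24, 4), (15, 5), (21, 3)]:
    print(f"{n} is a multiple of {d}: {is_multiple_of(n, d)}; {d} is a multiple of {n}: {is_multiple_of(d, n)}")
```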

<details>
<summary>3. REPETITION_PENALTY: 1.0</summary> 
 Here are some example questions related to Lesson 1-1 (Understanding the concept of multiples) with real-life scenarios, all in Traditional Chinese:

**Example 1:**
情境：小明的生日派對上有12個朋友，每個人要分配3個蛋糕。你需要準備多少個蛋糕？
問題：如果每個人需要3個蛋糕，12個朋友需要多少個蛋糕？

**Answer:** 12 x 3 = 36個蛋糕

**Example 2:**
情境：一盒麵包有12個麵包，每個麵包需要2個麵包包裝。你需要準備多少個麵包包裝？
問題：如果每個麵包需要2個麵包包裝，12個麵包需要多少個麵包包裝？

**Answer:** 12 x 2 = 24個麵包包裝

**Example 3:**
情境：一本書有15頁，每頁有3個圖片。你需要準備多少個圖片？
問題：如果每頁有3個圖片，15頁需要多少個圖片？

**Answer:** 15 x 3 = 45個圖片

**Example 4:**
情境：一個水果店需要準備24個水果盒，每個盒需要4個水果。你需要準備多少個水果？
問題：如果每個盒需要4個水果，24個盒需要多少個水果？

**Answer:** 24 x 4 = 96個水果

These examples aim to help students understand the concept of multiples in real-life scenarios, making it more relatable and engaging.
</details>
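Queries 3 and 4 suggest that the bare label "1-1" is not enough for retrieval to find the intended section, while naming the lesson title is. The comment in the Query 2 cell leaves it to the teacher to type the title. The expansion could instead be automated from a lookup table, as sketched below. `LESSON_TITLES` is hypothetical; in practice it would come from the organized course data. Its single entry is the title supplied by hand in Query 4.

```python
import re

# Hypothetical lookup table; the title for 1-1 is the one supplied manually in Query 4.
LESSON_TITLES = {"1-1": "了解倍數的意義"}

# "課程" followed by a lesson number such as 1-1, not already followed by a digit or "("
LESSON_REF = re.compile(r"課程(\d+-\d+)(?![\d(])")


def expand_lesson_refs(query: str, titles: dict[str, str]) -> str:
    """Append the lesson title after each '課程X-Y' reference that has a known title."""
    def replace(match: re.Match) -> str:
        lesson = match.group(1)
        title = titles.get(lesson)
        return f"課程{lesson}({title})" if title else match.group(0)

    return LESSON_REF.sub(replace, query)


query_3 = "給我一些關於課程1-1的例題，結合一些生活情境，全部使用繁體中文"
print(expand_lesson_refs(query_3, LESSON_TITLES))
# → 給我一些關於課程1-1(了解倍數的意義)的例題，結合一些生活情境，全部使用繁體中文
```

The expanded string is exactly the Query 4 prompt. Because of the lookahead, running the function twice does not add the title a second time.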