### Load data

In [27]:
import PyPDF2

file_path = "../data/data.pdf"

try:
    with open(file_path, "rb") as f:
        reader = PyPDF2.PdfReader(f)

        text = ""
        for i, page in enumerate(reader.pages):
            try:
                text += page.extract_text() + "\n"
            except Exception:
                print(f"⚠️ Extract failed on page {i}")

    print("PDF loaded successfully!")
    print(text[:200])

except FileNotFoundError:
    print("❌ File not found:", file_path)
except Exception as e:
    print("❌ Unexpected error:", e)


PDF loaded successfully!
(Translation)
 
 
1
 
 
 
 
 
 
Kasetsart
 
University Regulations on
 
Kasetsart University Undergraduate Studies
 
B.E. 25
66
 
(20
23
)
 
--------------------------------------
 
Whereas it is expe


### Clean text
- Strip each line, collapse runs of blank lines into a single blank line, and squeeze repeated spaces

In [28]:
import re

def clean_text(text):
    # Strip leading/trailing whitespace from every line
    lines = [line.strip() for line in text.splitlines()]

    cleaned_lines = []
    for line in lines:
        if line != "":
            cleaned_lines.append(line)
        # Keep at most one blank line between non-blank lines
        elif len(cleaned_lines) > 0 and cleaned_lines[-1] != "":
            cleaned_lines.append("")

    cleaned = "\n".join(cleaned_lines)
    # Collapse runs of spaces into a single space
    cleaned = re.sub(r" {2,}", " ", cleaned)
    return cleaned


In [29]:
cleaned_text = clean_text(text)
print(cleaned_text[:200])


(Translation)

1

Kasetsart

University Regulations on

Kasetsart University Undergraduate Studies

B.E. 25
66

(20
23
)

--------------------------------------

Whereas it is expedient to establish t
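
A quick sanity check of the cleaning rules on a small string can help here. The sketch below repeats the same logic so it runs standalone; the `sample` input is made up to mimic the extraction output. Note that words or numbers split across lines (such as `25` / `66` above) stay split, since the function only normalizes whitespace.

```python
import re

def clean_text(text: str) -> str:
    """Same rules as the cell above, repeated so this check runs on its own."""
    cleaned_lines = []
    for line in (raw.strip() for raw in text.splitlines()):
        if line:
            cleaned_lines.append(line)
        elif cleaned_lines and cleaned_lines[-1] != "":
            # Keep at most one blank line between non-blank lines
            cleaned_lines.append("")
    return re.sub(r" {2,}", " ", "\n".join(cleaned_lines))

# Hypothetical input: leading blanks, repeated spaces, several blank lines in a row
sample = "\n \nB.E.   25\n \n \n \n66\n"
result = clean_text(sample)
print(repr(result))
```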


### Chunking

In [32]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=950,
    chunk_overlap=130
)

chunks = splitter.split_text(cleaned_text)
print(f"Total chunks: {len(chunks)}\n")
print(chunks[1])


Total chunks: 46

3.

The fol
lowing announcements shall be cancelled:

3.1

The Announcement of Kasetsart University Council on Kasetsart University
Undergraduate Studies, B.E. 2548 (2005)

3.2

The Announcement of Kasetsart University Council on Kasetsart

University
Undergraduate Studies (No. 2), B.E. 2548 (2005)

3.3

The Announcement of Kasetsart University Council on Kasetsart University
Undergraduate Studies (No. 3), B.E. 2557 (2014)

4.

In this set of regulations:

“University” refers to Kasetsart University.

“Faculty” refers to faculty or college.

“Student” refers to a student of Kasetsart University.

“Academic Committee” refers to Kasetsart University Academic Committee.

“
Working Unit

Committee” refers to
a

faculty or college committee.
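
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of plain fixed-size windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as blank lines), and `sliding_chunks` is a hypothetical helper used only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo)

# Same parameters as the notebook, applied to 2000 characters of dummy text
dummy = "".join(str(i % 10) for i in range(2000))
big = sliding_chunks(dummy, chunk_size=950, chunk_overlap=130)
print(len(big), big[0][-130:] == big[1][:130])
```

With an overlap of 130, the tail of one chunk reappears at the head of the next, so a sentence cut at a boundary is still seen whole in at least one chunk when it is shorter than the overlap.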


### Embedding

In [33]:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

vectordb = Chroma.from_texts(
    texts=chunks,
    embedding=embeddings,
    persist_directory="chroma_db"
)

print(vectordb._collection.count())

46


In [109]:
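# With Chroma 0.4+ the store is persisted automatically when persist_directory is set,
# so this call is deprecated and only triggers a deprecation warning.
vectordb.persist()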


### Testing vector search

In [110]:
results = vectordb.similarity_search("good students")
for r in results:
    print(r.page_content)

credits throu ghout the curriculum.
29.2 Students awarded with honors are allowed to wear a medal of honor.
Section 6
Student Conduct and Discipline
---------------------------
30. Student conduct
30.1 All students must follow the laws, rules, announcements, and regulations of the
University in all respect and always strictly observe discipline.
30.2 All students must always behave morally in accordance with the Thai social norm .
30.3 All students must protect the reputation of th e University by refraining from any
behavior which might bring or lead to damage to the students themselves and the University.
30.4 Students must be able to present their student identification card promptly when
they are in the University area or upon request by the University staff.
30.5 Students must inform the University immediately when they change their personal
or address information.
26.3.11 Being sentenced to imprisonment, except for petty offenses or offenses from
negligence.
26.4 Those with a stu
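
Conceptually, a similarity search embeds the query and ranks stored chunks by how close their vectors are. The sketch below illustrates that ranking with cosine similarity on made-up 3-dimensional vectors; the real embeddings have far more dimensions, and Chroma's actual distance metric and index are not reproduced here. `cosine`, `top_k`, and `toy_docs` are hypothetical names.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the names of the k vectors most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in doc_vecs.items()),
        reverse=True,
    )
    return [name for _, name in scored[:k]]

# Toy "embeddings" invented for this example
toy_docs = {
    "conduct": [0.9, 0.1, 0.0],
    "grading": [0.1, 0.9, 0.0],
    "registration": [0.0, 0.2, 0.9],
}
ranked = top_k([1.0, 0.2, 0.0], toy_docs, k=2)
print(ranked)
```

A ranking like this always returns the nearest items, even when none of them is actually relevant to the query.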

### Creating RAG model

- Create a retriever that returns the 3 chunks most similar to the query

In [79]:
retriever = vectordb.as_retriever(
    search_kwargs={"k": 3}
)

- Select the LLM

In [80]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.2
)



- Create the prompt template

In [104]:
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""
Use ONLY the information provided in the context to answer the question.
Extract relevant details and summarize clearly. Do not invent any information.

If the answer cannot be found in the context, reply:
"No information"

Context:
{context}

Question:
{question}
"""
)



In [105]:
def rag_chain(question: str, show_context=True):
    # 🔍 Retrieve docs matched to question
    docs = retriever.invoke(question)
    context = "\n".join([d.page_content for d in docs])

    # 🧩 Format prompt
    formatted_prompt = prompt.format(
        context=context,
        question=question
    )

    # 🤖 LLM generate answer
    response = llm.invoke(formatted_prompt)
    answer = response.content

    # 🖨 Display nicely
    print("📌 Input Question:")
    print(question)
    print("\n📚 Retrieved Context:")
    if show_context:
        print(context)
    else:
        print("(hidden)")
    print("\n🧠 Model Output:")
    print(answer)

    return answer
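
The retrieve, format, generate flow can also be exercised without an API key. The sketch below swaps in stand-ins: `FakeDoc`, `FakeRetriever`, `FakeLLM`, and `answer_question` are all hypothetical and only mimic the `.invoke(...)`, `.page_content`, and `.content` attributes used above. The keyword-overlap retriever and the echoing "LLM" are placeholders, not how Chroma or `gpt-4o-mini` behave.

```python
from dataclasses import dataclass

TEMPLATE = (
    "Use ONLY the information provided in the context to answer the question.\n"
    'If the answer cannot be found in the context, reply: "No information"\n\n'
    "Context:\n{context}\n\nQuestion:\n{question}\n"
)

@dataclass
class FakeDoc:
    page_content: str

@dataclass
class FakeResponse:
    content: str

class FakeRetriever:
    """Stand-in retriever: returns docs sharing at least one word with the question."""
    def __init__(self, docs):
        self.docs = docs

    def invoke(self, question):
        words = set(question.lower().split())
        return [d for d in self.docs if words & set(d.page_content.lower().split())]

class FakeLLM:
    """Stand-in model: echoes the context, or the fallback when the context is empty."""
    def invoke(self, prompt):
        context = prompt.split("Context:\n", 1)[1].split("\n\nQuestion:", 1)[0]
        return FakeResponse(context if context.strip() else "No information")

def answer_question(question, retriever, llm):
    """Same three steps as rag_chain, but returns (answer, context) instead of printing."""
    docs = retriever.invoke(question)
    context = "\n".join(d.page_content for d in docs)
    prompt_text = TEMPLATE.format(context=context, question=question)
    return llm.invoke(prompt_text).content, context

docs = [FakeDoc("Students must not carry weapons."), FakeDoc("Grade A means excellent.")]
retriever, llm = FakeRetriever(docs), FakeLLM()
print(answer_question("What does grade A mean?", retriever, llm))
print(answer_question("Is Jimmy here?", retriever, llm))
```

Returning the context together with the answer, rather than printing it, makes it easier to check which chunks an answer was based on.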


In [106]:
ans = rag_chain("How many credits are there per semester?")

📌 Input Question:
How many credits are there per semester?

📚 Retrieved Context:
per one regular semester).
8.3 Internship or field training (career internship) which spans three –six hours p er week
or 45 –90 hours in one regular semester is equivalent to one credit in the bi -semester system .
8.4 An assigned project or any other academic activity that requires a minimum of 45
hours of time during one regular semester is equivalent to one credit in the bi -semester system.
8.5 Lectures, discussions or laboratory sessions take 50 minutes per one hour.
8.6 Approval from the Academic Committee and University Council is required for the
administration of academic semesters that differ from Items 8.1 -8.2, or in situations
where there are variations in instruction or teaching administration.
9. Registration
9.1 The schedule and method of registration shall be in accordance with the University
prescription in each semester.
9.2 To register courses of study, the sched ule of each 

In [102]:
ans = rag_chain("What are the grading criteria?")

📌 Input Question:
What are the grading criteria?

📚 Retrieved Context:
accordance with the University regulations and are not able to withdraw such
course.
14. Evaluation and assessment
14.1 Assessment of each course can be done by evaluating the learning outco mes of
students as specified in each course and is in the form of grades, which can be
interpreted as follows:
(Translation)
6 Grade Meaning Points
A Excellent 4.0
B+ Very good 3.5
B Good 3.0
C+ Fairly good 2.5
C Fair 2.0
D+ Poor 1.5
D Very poor 1.0
F Fail 0.0
I Incomplete -
S Satisfactory -
U Unsatisfactory -
P Passed -
NP Not passed -
N Grade not reported -
Grade “I” is used only in the case where some works of a student in a particular course
are incomplete, but such student has been assessed for other tasks throughout the semester, which met
the instructor’s satisfaction.
Grades “S” and “U” are used for audit courses.
Grade “P” and “NP” is used for the courses whose grades are not calculated in th

In [107]:
ans = rag_chain("Is my friend named Jimmy studying here?")

📌 Input Question:
Is my friend named Jimmy studying here?

📚 Retrieved Context:
26.3.11 Being sentenced to imprisonment, except for petty offenses or offenses from
negligence.
26.4 Those with a student status must possess a student identification card as prescribed by
the University to avail themselves of the privileges and benefits offered by the University.
Section 5
Graduation, Degree Conferral , and Giving of Academic Excellence A wards
------------------------
27. Giving of Academic Excellence Awards
Students entitled to receive an academic excellence award must obtain the GPA for that
academic year of 3.50 or over and pass every course. However, the study result s of the summer
(Translation)
13 session shall not be included for calculation. Students must register not less than 32 credits throughout
the two regular semesters of that academic year, excluding the credits from internship. In addition, the
registered courses must not be retaken course s, courses awarded with an 

In [108]:
ans = rag_chain("How to be a good student?")

📌 Input Question:
How to be a good student?

📚 Retrieved Context:
30.5 Students must inform the University immediately when they change their personal
or address information.
30.6 Students must maintain unity and refrain from any behavior which might bring or
lead to disunity.
30.7 Students must follow the exa mination regulations and not commit any act as a sign
of dishonesty or as dishonesty.
30.8 Students must not possess, take, or sell liquor and drugs in the University area.
30.9 Students must not carry weapons or explosives when they are in the University
area.
30.10 Students must not quarrel with one another or with other persons inside or outside
the University area.
30.11 Students must not engage in gambling of all kinds with or without wagers in the
University area.
30.12 Students must not publish any printed matters, drawings, writings or electronic
media which may affect others without seeking approval from the University.
(Translation)
credits throu ghout the curricu