# Week 11: Capstone Project Part 4

- Done by: A Alkaff Ahamed
- Grade: Pending
- 25 June 2025


## Learning Outcome Addressed
- Design and implement a Retrieval-Augmented Generation (RAG) pipeline using generative AI tools. 
- Load and preprocess domain-specific documents, generate vector embeddings, and store them in a vector database. 
- Deploy a memory-enabled AI assistant capable of delivering contextually grounded responses. 
- Analyse the application of generative AI in real-world workflows and critically reflect on creative opportunities and ethical considerations. 


## Module Overview 

This capstone marks a significant step in developing your AI assistant. Moving beyond basic model outputs, you will integrate document ingestion, semantic search, and short-term conversational memory. 

Using the RAG architecture, your assistant will combine generative capabilities with real-time access to domain-specific documents. This approach reflects how modern AI systems are deployed in industries such as human resources, customer service, legal support, healthcare, and education. 

Additionally, you will compare this approach to other generative models like Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), apply it to creative domains, and evaluate the ethical implications associated with such systems. 


### Main Tasks 

#### Task 1: Document Loading and Preprocessing 

- Select a document in PDF, TXT, or DOCX format related to a professional or creative field (e.g., HR policy, legal documentation, training material, or artistic critique). 
- Load the document using LangChain loaders such as PyMuPDFLoader or UnstructuredFileLoader. 
- Preprocess the text using RecursiveCharacterTextSplitter to divide content into meaningful chunks (approximately 500 tokens each). 
- Display at least three representative chunks from the processed text. 
 

#### Task 2: Text Embedding and Vector Store Setup 

- Use a model such as OpenAIEmbeddings, InstructorEmbeddings, or an equivalent HuggingFace embedding model to generate vector representations of each chunk. 
- Store the embeddings in a vector database such as FAISS or ChromaDB. 
- Execute a semantic search query and retrieve the top three most relevant document chunks. 
- Present the query, the retrieved results, and a brief explanation of your embedding and vector search setup. 
 

#### Task 3: Retrieval-Augmented Generation with Conversational Memory 

- Integrate the vector store with a language model (e.g., GPT-3.5 or GPT-4) using LangChain’s RetrievalQA or ConversationalRetrievalChain. 
- Add memory capability using ConversationBufferMemory or a similar tool. 
- Simulate a three-turn conversation where each user query builds upon previous context. Ensure that the assistant references retrieved document content accurately. 
- Highlight how conversational memory was used to enhance context-awareness and continuity.


**Estimated time:** 60-90 minutes

**Submission Instructions:**

- Select the **Start Assignment** button at the top right of this page.
- Upload your answers in the form of a Word or PDF file.
- Upload a zipped folder to the learning platform containing:
  - **Jupyter Notebook (.ipynb)**
    - Code for document loading, chunking, embeddings 
    - Semantic search queries and outputs 
    - RAG conversation pipeline with memory 
- Select the **Submit Assignment** button to submit your responses.

*This is a graded assignment and counts towards programme completion. You may attempt this assignment only once.*


## Import Libraries and Setup

In [1]:
import os
#from dotenv import load_dotenv
os.environ["TRANSFORMERS_NO_TF"] = "1"

from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
#from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.schema import Document
from langchain.memory import ConversationBufferMemory
from langchain.llms import HuggingFacePipeline

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline




In [4]:
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Using GPU:", torch.cuda.get_device_name(0))


CUDA available: False


**NOTE:**

No environment variables are needed because the entire setup runs locally.


## 📌 Task 1: Document Loading and Preprocessing 

- Select a document in PDF, TXT, or DOCX format related to a professional or creative field (e.g., HR policy, legal documentation, training material, or artistic critique). 
- Load the document using LangChain loaders such as PyMuPDFLoader or UnstructuredFileLoader. 
- Preprocess the text using RecursiveCharacterTextSplitter to divide content into meaningful chunks (approximately 500 tokens each). 
- Display at least three representative chunks from the processed text. 

In [5]:
# Load PDF
# --------

pdf_path = "company_policy.pdf"  # Update this if the path differs
loader = PyMuPDFLoader(pdf_path)
documents = loader.load()

print(f"Total pages: {len(documents)}\n") # 1 LangChain document per page


Total pages: 24



In [6]:
print("=== Page 1 Contents ===")
print(documents[0].page_content[:1000], "...")   # preview first 1 000 chars


=== Page 1 Contents ===
SPIL Corporate HR Policies  
 
 
SIRCA PAINTS INDIA LTD 
NEW DELHI  
 
 
 
 
CORPORATE  
  HUMAN RESOURCES 
POLICIES & MANUALS ...


In [7]:
print("=== Page 2 Contents ===")
print(documents[1].page_content[:1000], "...")   # preview first 1 000 chars


=== Page 2 Contents ===
SPIL Corporate HR Policies  
 
 
 
 
Section 1: Introduction  
 
This handbook is the summary of the policies, procedures, guidance and benefits to the employees 
and organization. It is an introduction to our vision, mission, values, what you expect from us and 
what we expect from you. We believe that employees are the assets of the organization and to 
understand them the positive work environment play an important role. 
 
This Employee Hand Book(EHB) is the confidential property of Sirca Paints India Limited (SPIL) 
 
and any use, distributing, copying or disclosure by any person to outsiders without any proper 
authorization is strictly prohibited. 
 
Any query or doubt concerning the content of the EHB should be forwarded to the Human 
Resources Department of SPIL.  
 
Applicability  
 
This EHB will be applicable to the employees working in Sirca Paints India Limited (SPIL) w.e.f 
August 21, 2020. This book contains all the notices/circulars/extracts/mee

In [8]:
# Split into Chunks
# -----------------

splitter = RecursiveCharacterTextSplitter(
    chunk_size    = 500,
    chunk_overlap = 100,
    #separators    = ["\n\n", "\n", ".", " ", ""],   # fall-back hierarchy
)

chunks = splitter.split_documents(documents)

print(f"Total chunks created: {len(chunks)}\n")


Total chunks created: 122
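`RecursiveCharacterTextSplitter` measures `chunk_size` with Python's `len` by default, so the 500 above counts characters rather than tokens. The sketch below is a deliberately simplified illustration of how `chunk_size` and `chunk_overlap` interact. It uses a plain sliding window and ignores the separator hierarchy that the real splitter applies; `sliding_chunks` is a hypothetical helper, not a LangChain function.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap (simplified sketch)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap          # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):    # last window reached the end
            break
    return chunks


# Synthetic 1,200-character sample so the overlap is easy to inspect.
sample = "".join(chr(97 + i % 26) for i in range(1200))
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])
print(pieces[0][-100:] == pieces[1][:100])  # consecutive windows share 100 characters
```

If token-sized chunks are required, the length function would need to count tokens instead of characters.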



In [9]:
print("=== Chunk 1 ===")
print(chunks[0].page_content)


=== Chunk 1 ===
SPIL Corporate HR Policies  
 
 
SIRCA PAINTS INDIA LTD 
NEW DELHI  
 
 
 
 
CORPORATE  
  HUMAN RESOURCES 
POLICIES & MANUALS


In [10]:
print("\n=== Chunk 2 ===")
print(chunks[1].page_content)



=== Chunk 2 ===
SPIL Corporate HR Policies  
 
 
 
 
Section 1: Introduction  
 
This handbook is the summary of the policies, procedures, guidance and benefits to the employees 
and organization. It is an introduction to our vision, mission, values, what you expect from us and 
what we expect from you. We believe that employees are the assets of the organization and to 
understand them the positive work environment play an important role.


In [11]:
print("\n=== Last Chunk ===")
print(chunks[-1].page_content)



=== Last Chunk ===
Section : 20  Review and Amendment  
Management shall review this policy periodically and amendments required, if any shall be made 
accordingly.  
Section : 21 Residual Power 
This policy is basically guidelines and the management reserves the right to withdraw / modify to 
suit organization’s philosophy at any time without assigning any reason whatsoever. 
EFFECTIVE 
Commencement Of Policy  
August 21, 2018  
 
 
 
Approved By : ___________SD/-_______________ 
Mr Sanjay Agarwal - CMD


## 📌 Task 2: Text Embedding and Vector Store Setup 

- Use a model such as OpenAIEmbeddings, InstructorEmbeddings, or an equivalent HuggingFace embedding model to generate vector representations of each chunk. 
- Store the embeddings in a vector database such as FAISS or ChromaDB. 
- Execute a semantic search query and retrieve the top three most relevant document chunks. 
- Present the query, the retrieved results, and a brief explanation of your embedding and vector search setup.

In [12]:
# Embedding - BGE Small
# ---------------------

embedder = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    model_kwargs={"device": "cpu"}        # use "cuda" if you prefer GPU
)




In [13]:
# Vector DB - FAISS
# -----------------

vector_db = FAISS.from_documents(chunks, embedder)


In [14]:
# Semantic Search Query
# ---------------------

query = "What is the probation period for new employees?"
results: list[Document] = vector_db.similarity_search(query, k=3)

print(f"Query: {query}\n")
for i, doc in enumerate(results, 1):
    print(f"=== Result {i} ===")
    print(doc.page_content[:800], "...\n")


Query: What is the probation period for new employees?

=== Result 1 ===
a) Appointment Letter given at the time of joining shows the clause of Probation Period. Every 
Employee has to complete the Probation Period on the basis of the following parameters.  
Job Knowledge, Quality of Work, Initiative and Creativity, Punctuality, Interpersonal Skills  
b) On the beginning of the six month, the email will given to the Departmental Heads for the 
assessment of the new entrants on the above parameters. ...

=== Result 2 ===
SPIL Corporate HR Policies  
 
 
 
b) The main purpose of the probation period is to bring an effective employee on board and 
thorough monitoring and performance management process. 
 
c) It covers all the on roll new entrants in the organization and the candidate will be on 
probation of six months  
 
Process of Confirmation  
a) Appointment Letter given at the time of joining shows the clause of Probation Period. Every ...

=== Result 3 ===
productivity.  
 
f) If t

### ✍️ Explanation (brief, for the rubric)

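- **Embedding model**: `BAAI/bge-small-en-v1.5` (about 33 M parameters). Chosen because it performs strongly on English semantic-similarity benchmarks for its size, is lightweight, and runs locally (no cloud/API keys).
- **Vector DB**: **FAISS**, built in memory with LangChain’s `FAISS.from_documents`. Chosen for simple local persistence, speed, and zero server overhead. When the embeddings are unit-normalized (as BGE’s sentence-transformers configuration is understood to do), ranking by L2 distance is equivalent to ranking by cosine similarity.
- **Workflow**
  1. Every chunk from Task 1 (up to 500 characters, because the splitter’s default length function counts characters rather than tokens) is embedded into a 384-dimensional vector.
  2. Vectors are stored in a flat (exhaustive-search) FAISS index. No distance strategy was passed, so LangChain’s default Euclidean strategy applies, which corresponds to `IndexFlatL2` rather than `IndexFlatIP`.
  3. A user query is embedded the same way; FAISS returns the top-k nearest vectors, which are mapped back to the original chunk text.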
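The retrieval step can be sketched without FAISS. The example below is a minimal pure-Python illustration, using made-up 3-dimensional vectors in place of real embeddings, of exhaustive top-k search. It also checks that, for unit-length vectors, squared L2 distance and inner product rank neighbours identically, since ‖a − b‖² = 2 − 2·(a·b). All function names here are hypothetical.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2_sq(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: list[float], b: list[float]) -> float:
    """Inner product (equals cosine similarity for unit vectors)."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return indices of the k nearest vectors by squared L2 distance (exhaustive scan)."""
    order = sorted(range(len(doc_vecs)), key=lambda i: l2_sq(query_vec, doc_vecs[i]))
    return order[:k]


# Toy "embeddings"; real ones would come from the embedding model.
docs = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])]
query = normalize([0.9, 0.1, 0.0])

nearest_by_l2 = top_k(query, docs, k=3)
nearest_by_cosine = sorted(range(len(docs)), key=lambda i: -dot(query, docs[i]))[:3]
print(nearest_by_l2, nearest_by_cosine)
```

A flat index performs this same exhaustive comparison, only vectorised; approximate indexes trade some accuracy for speed on much larger collections than the 122 chunks here.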


## 📌 Task 3: Retrieval-Augmented Generation with Conversational Memory 

- Integrate the vector store with a language model (e.g., GPT-3.5 or GPT-4) using LangChain’s RetrievalQA or ConversationalRetrievalChain. 
- Add memory capability using ConversationBufferMemory or a similar tool. 
- Simulate a three-turn conversation where each user query builds upon previous context. Ensure that the assistant references retrieved document content accurately. 
- Highlight how conversational memory was used to enhance context-awareness and continuity.

In [15]:
# Load LLM - Qwen 1.5 1.8B Chat
# -----------------------------

model_id = "Qwen/Qwen1.5-1.8B-Chat"
tok  = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
            model_id,
            trust_remote_code=True,
            device_map="auto",          # "cuda" if GPU, else "cpu"
            torch_dtype="auto"
        )

gen_pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tok,
    max_new_tokens=512,
    temperature=0.2,
    #device=0  # device=0 for GPU
)

llm = HuggingFacePipeline(pipeline=gen_pipe)


Device set to use cpu


In [16]:
# Setup Retriever and Memory
# --------------------------

retriever = vector_db.as_retriever(search_kwargs={"k": 3})

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key      = "answer"
)




In [17]:
# Build the RAG Chain

rag_chain = ConversationalRetrievalChain.from_llm(
    llm          = llm,
    retriever    = retriever,
    memory       = memory,
    #verbose      = True,
    return_source_documents = True
)

def rag_agent(query):
    response = rag_chain.invoke({"question": query})   # <-- use .invoke()

    raw_answer = response["answer"]
    sources    = response.get("source_documents", [])

    # Remove the unnecessary junk
    if "Helpful Answer:" in raw_answer:
        answer = raw_answer.split("Helpful Answer:")[-1].strip()
    else:
        answer = raw_answer.strip()

    # Print Everything
    print(f"\n🗣️  USER: {query}\n")
    print(f"🤖 AGENT:\n{answer}\n")
    
    if sources:
        print("📚  Source chunks used:")
        for idx, doc in enumerate(sources, 1):
            preview = doc.page_content[:250].replace("\n", " ")
            print(f"  [{idx}] {preview} ...")
    else:
        print("📚  No source chunks returned.")
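The prefix stripping inside `rag_agent` is needed because the local text-generation pipeline echoes the prompt before the model's reply. As a small sketch, that step can be isolated into a helper and checked without loading the model. `extract_answer` is a hypothetical name, and the sample strings are illustrative rather than real model output.

```python
def extract_answer(raw_output: str, marker: str = "Helpful Answer:") -> str:
    """Return the text after the last marker, or the whole output if the marker is absent."""
    # str.rpartition returns ("", "", raw_output) when the marker is missing,
    # so `tail` is the full string in that case.
    _, _, tail = raw_output.rpartition(marker)
    return tail.strip()


# Illustrative strings only; a real output would contain the full echoed prompt.
echoed = "Use the following pieces of context...\nQuestion: ...\nHelpful Answer: Six months."
print(extract_answer(echoed))
print(extract_answer("  Six months.  "))
```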


In [18]:
# Turn 1
# ------

rag_agent("Summarise the probation policy for new hires.")



🗣️  USER: Summarise the probation policy for new hires.

🤖 AGENT:
c) It covers all the on roll new entrants in the organization and the candidate will be on probation of six months. The main purpose of the probation period is to bring an effective employee on board and thorough monitoring and performance management process. The department head will assess the performance and submit the review of the employee to the HR Department. The employee has to earn the minimum category of "Average" and maximum category of "Excellent" for the confirmation. Depending on the performance of the probationers and discretion of the management, the probationer's compensation, grade, designation can be reviewed for motivation and better productivity.

📚  Source chunks used:
  [1] a) Appointment Letter given at the time of joining shows the clause of Probation Period. Every  Employee has to complete the Probation Period on the basis of the following parameters.   Job Knowledge, Quality of Work,

In [19]:
# Turn 2
# ------

rag_agent("Does the policy mention how performance is evaluated during probation?")



🗣️  USER: Does the policy mention how performance is evaluated during probation?

🤖 AGENT:
Yes, the policy mentions that the performance of the probationers is evaluated by the department head and submitted to the HR Department for review. The employee has to earn the minimum category of "Average" and maximum category of "Excellent" for the confirmation, depending on their performance during the probation period. This evaluation is crucial as it helps in determining the employee's eligibility for promotion or further development within the company. The HR department reviews the performance based on these criteria and provides feedback to both the employee and the department head to ensure continuous improvement and growth.

📚  Source chunks used:
  [1] SPIL Corporate HR Policies         b) The main purpose of the probation period is to bring an effective employee on board and  thorough monitoring and performance management process.    c) It covers all the on roll new entran

In [20]:
# Turn 3
# ------

rag_agent("Based on that, what happens if an employee's performance is below expectations?")



🗣️  USER: Based on that, what happens if an employee's performance is below expectations?

🤖 AGENT:
If an employee's performance is below expectations, they may face a probationary period during which they are expected to improve their skills and meet the required standards set by the company. During this period, the employee may receive additional training, mentorship, or coaching from experienced colleagues to help them enhance their performance. The probationary period typically lasts for six months, after which the HR department will evaluate the employee's progress and determine whether they have met the necessary criteria for promotion or further development within the company. If the employee fails to meet the expectations, their performance may be reviewed again, and if necessary, they may be terminated or offered a lower position with less responsibilities. The goal of the probationary period is to provide an opportunity for employees to learn from their mistakes, ide

In [26]:
# Check Memory History
# --------------------

for i, m in enumerate(memory.load_memory_variables({})["chat_history"]):
    print(f"[{i}]  {m}\n")


[0]  content='Summarise the probation policy for new hires.' additional_kwargs={} response_metadata={}

[1]  content='Use the following pieces of context to answer the question at the end. If you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\n\na) Appointment Letter given at the time of joining shows the clause of Probation Period. Every \nEmployee has to complete the Probation Period on the basis of the following parameters.  \nJob Knowledge, Quality of Work, Initiative and Creativity, Punctuality, Interpersonal Skills  \nb) On the beginning of the six month, the email will given to the Departmental Heads for the \nassessment of the new entrants on the above parameters.\n\nSPIL Corporate HR Policies  \n \n \n \nb) The main purpose of the probation period is to bring an effective employee on board and \nthorough monitoring and performance management process. \n \nc) It covers all the on roll new entrants in the organization and the candidate wi

**How does conversational memory improve continuity?**

The `ConversationBufferMemory` keeps a running log of user-assistant messages (see the cell above).

- Turn 2 references “the policy” without restating which policy; because the buffer already contains turn 1, the chain can rewrite the follow-up question into a standalone form and retrieve the correct chunks.
- In turn 3, the user asks “Based on that…”, again relying on prior context kept in memory. Without the buffer, the retriever would not have enough information to link the pronoun “that” to the probation policy.
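The question-rewriting step described above can be sketched in plain Python. This is a minimal illustration, assuming a simple append-only buffer and a hand-written prompt; it is not LangChain's actual template or class. `BufferMemory` and `build_condense_prompt` are hypothetical names, and the stored answer is abbreviated for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BufferMemory:
    """Hypothetical stand-in for a conversation buffer: an append-only list of turns."""
    turns: list[tuple[str, str]] = field(default_factory=list)

    def save(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_text(self) -> str:
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def build_condense_prompt(memory: BufferMemory, follow_up: str) -> str:
    """Assemble a prompt asking the LLM to rewrite a follow-up as a standalone question."""
    return (
        "Given the conversation below, rephrase the follow-up question "
        "as a standalone question.\n\n"
        f"Chat history:\n{memory.as_text()}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


memory = BufferMemory()
memory.save(
    "Summarise the probation policy for new hires.",
    "New entrants are on probation for six months.",  # abbreviated answer
)
prompt = build_condense_prompt(
    memory, "Based on that, what happens if performance is below expectations?"
)
print(prompt)
```

The standalone question produced by the model from such a prompt is what gets embedded for retrieval, which is why a follow-up containing only “that” can still retrieve the probation chunks.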