# Generative AI Study Assistant using RAG

**Team Members:** Rahaf Kanaan, Shifaa Al-zu'bi, Thabet Zamari, Rafah Ali, Tasneem Alassaf.

This Jupyter Notebook presents a Retrieval-Augmented Generation (RAG) based study assistant developed as part of a Generative AI course project.  
The system answers student questions by retrieving relevant information from the course lecture materials and generating accurate, context-aware responses with a large language model.

The notebook demonstrates the complete pipeline, including document preprocessing, semantic retrieval, prompt engineering, and response generation.  
Both API-based and local open-source language models are supported, allowing flexibility in experimentation while maintaining the same RAG architecture.


## 1. Environment Setup and Library Installation

In this step, we install the Python libraries required to build the project.  
These libraries support:

- LangChain framework for building the RAG pipeline
- Document loading and text splitting
- Embedding models and vector storage using FAISS
- Integration with OpenAI / GitHub Models APIs
- Evaluation utilities and PDF processing

This setup ensures that the environment contains all dependencies before starting the implementation.


In [1]:
!pip -q install -U langchain langchain-community langchain-text-splitters

!pip -q install -U faiss-cpu sentence-transformers

!pip -q install -U langchain-openai tiktoken python-dotenv

!pip -q install -U rouge-score

!pip -q install pypdf

!pip -q install gradio



## 2. Language Model Selection and Initialization

This step initializes the Large Language Model (LLM) used for answer generation.  
The implementation is designed to be flexible, allowing the system to switch between:

- **API-based models** (GPT-4o via GitHub Models API) for advanced reasoning and explanation.
- **Local open-source models** (flan-t5-base) for offline inference without external API dependencies.

The selection is controlled using a configuration variable (`LLM_MODE`), enabling seamless comparison between different LLM backends while keeping the rest of the RAG pipeline unchanged.


In [2]:
LLM_MODE = "api"

if LLM_MODE == "api":
    from getpass import getpass
    import os
    from langchain_openai import ChatOpenAI

    os.environ["OPENAI_API_KEY"] = getpass("Paste your GitHub PAT: ")

    llm = ChatOpenAI(
        model="gpt-4o",
        base_url="https://models.github.ai/inference/v1",
        api_key=os.environ["OPENAI_API_KEY"],
        temperature=0.2
    )

    print("LLM loaded: GPT-4o via GitHub Models API")

elif LLM_MODE == "local":
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
    from langchain_community.llms import HuggingFacePipeline

    model_name = "google/flan-t5-base"

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    model = AutoModelForSeq2SeqLM.from_pretrained(
        model_name,
        device_map="auto"
    )

    pipe = pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=1024,
        do_sample=True,  # temperature only takes effect when sampling is enabled
        temperature=0.3
    )

    llm = HuggingFacePipeline(pipeline=pipe)

    print("LLM loaded: Flan-T5-base (local HuggingFace)")

Paste your GitHub PAT: ··········
LLM loaded: GPT-4o via GitHub Models API
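
The cell above defines `llm` only when `LLM_MODE` is exactly `"api"` or `"local"`. Any other value (a typo or different casing) would leave `llm` undefined and surface later as a `NameError`. The following is a minimal sketch of an early check, not part of the original notebook: the helper name `resolve_llm_mode` and the optional `LLM_MODE` environment-variable override are assumptions made for illustration.

```python
import os

SUPPORTED_MODES = ("api", "local")  # the two backends used in this notebook


def resolve_llm_mode(default="api", env=None):
    """Return a validated LLM mode (hypothetical helper, not from the notebook).

    Reads an optional LLM_MODE environment variable (an assumption of this
    sketch), normalizes case and whitespace, and fails fast on unknown values.
    """
    env = os.environ if env is None else env
    mode = env.get("LLM_MODE", default).strip().lower()
    if mode not in SUPPORTED_MODES:
        raise ValueError(
            f"Unsupported LLM_MODE {mode!r}; expected one of {SUPPORTED_MODES}"
        )
    return mode


# A dict stands in for os.environ so the example does not depend on the shell.
LLM_MODE = resolve_llm_mode(env={"LLM_MODE": " Local "})
print(LLM_MODE)
```

Failing at configuration time keeps the error close to its cause instead of deferring it to the first call that uses `llm`.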


## 3. Uploading Course Materials (PDF Files)

In this step, the course lecture slides and reference materials are uploaded to the Google Colab environment.  
These PDF files serve as the **knowledge source** for the Retrieval-Augmented Generation (RAG) system and will later be processed, indexed, and queried by the chatbot.


In [3]:
from google.colab import files

uploaded = files.upload()

Saving HTU - CPD - GenAI - Module2-A.pdf to HTU - CPD - GenAI - Module2-A.pdf
Saving HTU - CPD - GenAI - Module2-B.pdf to HTU - CPD - GenAI - Module2-B.pdf
Saving HTU - CPD - GenAI - Module3.pdf to HTU - CPD - GenAI - Module3.pdf
Saving HTU - CPD - GenAI - Module4.pdf to HTU - CPD - GenAI - Module4.pdf
Saving HTU - CPD - GenAI - Module5.pdf to HTU - CPD - GenAI - Module5.pdf
Saving HTU - CPD - GenAI - Module6.pdf to HTU - CPD - GenAI - Module6.pdf
Saving HTU - CPD - GenAI - Module6-B-RAG.pdf to HTU - CPD - GenAI - Module6-B-RAG.pdf
Saving HTU - CPD - GenAI - Module1.pdf to HTU - CPD - GenAI - Module1.pdf


## 4. Organizing Uploaded Files into a Data Directory

After uploading the PDF files, they are organized into a dedicated directory (`data/`).  
This step ensures a clean and structured project layout, making it easier to load, process, and manage the documents consistently throughout the pipeline.



In [4]:
import os
import shutil

os.makedirs("data", exist_ok=True)

for file in os.listdir():
    if file.endswith(".pdf"):
        shutil.move(file, "data/" + file)

print("Files inside data:", os.listdir("data"))

Files inside data: ['HTU - CPD - GenAI - Module2-A.pdf', 'HTU - CPD - GenAI - Module6.pdf', 'HTU - CPD - GenAI - Module3.pdf', 'HTU - CPD - GenAI - Module5.pdf', 'HTU - CPD - GenAI - Module1.pdf', 'HTU - CPD - GenAI - Module2-B.pdf', 'HTU - CPD - GenAI - Module6-B-RAG.pdf', 'HTU - CPD - GenAI - Module1 (1).pdf', 'HTU - CPD - GenAI - Module4.pdf']
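
The listing above contains both `HTU - CPD - GenAI - Module1.pdf` and `HTU - CPD - GenAI - Module1 (1).pdf`, which suggests the same lecture may have been uploaded twice. If so, its pages would be loaded and indexed twice. The sketch below shows one way to detect byte-identical files before loading; `find_duplicate_files` is a hypothetical helper, and comparing SHA-256 digests is an assumption about what counts as a duplicate.

```python
import hashlib
from pathlib import Path


def find_duplicate_files(folder, pattern="*.pdf"):
    """Group files with byte-identical content (hypothetical helper).

    Returns a list of lists; each inner list holds the paths that share
    one SHA-256 digest.
    """
    by_digest = {}
    for path in sorted(Path(folder).glob(pattern)):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_digest.setdefault(digest, []).append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Only report when the notebook's data directory actually exists.
if Path("data").is_dir():
    for group in find_duplicate_files("data"):
        keep, *extras = group
        print(f"{keep.name} is duplicated by: {[p.name for p in extras]}")
```

The sketch only reports duplicates; whether to delete the extra copies is left to the user.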


## 5. Loading and Parsing PDF Documents

This step loads the uploaded PDF files from the data directory and extracts their textual content.  
Each PDF is processed page by page, converting the raw documents into structured text objects that can be further analyzed and indexed by the RAG pipeline.


In [5]:
import os
from langchain_community.document_loaders import PyPDFLoader

DATA_PATH = "data/"
documents = []

for file_name in os.listdir(DATA_PATH):
    if file_name.endswith(".pdf"):
        file_path = os.path.join(DATA_PATH, file_name)
        loader = PyPDFLoader(file_path)
        documents.extend(loader.load())

print(f"Loaded {len(documents)} pages from PDFs")



Loaded 514 pages from PDFs


## 6. Text Chunking for Efficient Retrieval

In this step, the extracted document text is divided into smaller overlapping chunks.  
Chunking improves retrieval accuracy by allowing the system to match user queries with the most relevant portions of the documents rather than entire pages.

The overlap between chunks helps preserve contextual continuity across adjacent text segments.



In [6]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)

chunks = text_splitter.split_documents(documents)

print(f"Created {len(chunks)} text chunks")

Created 640 text chunks
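
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter in plain Python. It is an illustration only: `RecursiveCharacterTextSplitter` prefers to break at separators such as paragraph breaks, newlines, and spaces, so its chunks vary in length, whereas the hypothetical `chunk_text` function below always cuts at fixed character offsets.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=100):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for piece in chunk_text(sample, chunk_size=10, chunk_overlap=4):
    print(piece)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last four characters of one chunk reappear at the start of the next. This repetition is what preserves context across chunk boundaries.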


## 7. Creating Embeddings and Building the Vector Store

In this step, semantic embeddings are generated for each text chunk using a pre-trained sentence transformer model.  
These embeddings represent the meaning of the text in a numerical vector space, enabling effective similarity-based retrieval.

The vectors are then stored in a FAISS index, which allows fast and efficient retrieval of the most relevant document chunks in response to user queries.
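
The idea behind similarity-based retrieval can be sketched with toy vectors in plain Python. The three-dimensional vectors and the helper names `cosine_similarity` and `top_k` are invented for illustration; real embeddings from `all-MiniLM-L6-v2` have far more dimensions. Cosine similarity is used here for clarity, while the FAISS index may rank by a different metric (for example Euclidean distance), which yields the same ordering when vectors are normalized to unit length.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "index": (chunk text, made-up embedding) pairs.
toy_index = [
    ("RAG combines retrieval with generation", [0.9, 0.1, 0.0]),
    ("Transformers use self-attention", [0.1, 0.9, 0.1]),
    ("FAISS stores dense vectors", [0.7, 0.0, 0.6]),
]
query = [1.0, 0.0, 0.2]  # made-up embedding of a question about RAG
print(top_k(query, toy_index, k=2))
```

This brute-force scan compares the query against every stored vector. An index such as FAISS exists to make the same lookup fast on large collections.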


In [7]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

vectorstore = FAISS.from_documents(
    chunks,
    embedding=embeddings
)

retriever = vectorstore.as_retriever()



## 8. Prompt Engineering and Instruction Design

This step defines the prompt template that controls how the language model interprets and answers user questions.  
The system message restricts the model to the retrieved course context, allows it to reason across multiple passages, and defines a fixed fallback sentence for questions the material cannot answer. The human message supplies the retrieved context and the question.

These instructions steer the model toward answers that are grounded in the retrieved documents and reduce the risk of external or hallucinated information, although a prompt alone cannot guarantee it.


In [8]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([

    (
        "system",
        "You are a study assistant for a Generative AI course. "
        "Your task is to answer questions strictly using the provided course context.\n\n"
        "Instructions:\n"
        "- Carefully analyze the question, even if it is indirect, rephrased, or explanatory.\n"
        "- You are allowed to reason, explain, and infer logically, but ONLY using information from the context.\n"
        "- If the answer requires combining multiple parts of the context, do so clearly.\n"
        "- If the question cannot be fully answered using the context, say exactly:\n"
        "  'I don't know based on the course material.'\n"
        "- Do NOT use any external knowledge.\n"
        "- Do NOT leave the question unanswered.\n"
        "- Answer briefly and directly.\n"
        "- Do NOT provide detailed explanations unless the user explicitly asks for explanation, details, examples, or clarification.\n"
        "- If a short answer is sufficient, limit the response to 1–3 sentences.\n"
        "- If the question is 'hello', reply: Hello, how can I help you with the course material?\n"
        "- If the question is 'thank you', reply: You are welcome, have a nice day ^-^.\n"
        "- If the question is 'goodbye', reply: Goodbye, have a nice day.\n"
        "- If the question is 'exit', reply: Goodbye, have a nice day."
    ),

    (

        "human",
        "Context:\n{context}\n\nQuestion:\n{question}\n\n"
        "Please provide a clear, step-by-step explanation."
    )

])
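
The prompt above asks the model itself to answer greetings and farewells, so even a plain "hello" still goes through retrieval and an LLM call. One alternative design, sketched below under stated assumptions, is to answer such messages in code before the chain is invoked. The names `CANNED_REPLIES`, `canned_reply`, `answer`, and `ask_rag` are hypothetical and do not appear in the notebook.

```python
CANNED_REPLIES = {
    "hello": "Hello, how can I help you with the course material?",
    "thank you": "You are welcome, have a nice day ^-^",
    "goodbye": "Goodbye, have a nice day.",
    "exit": "Goodbye, have a nice day.",
}


def canned_reply(message):
    """Return a fixed reply for small-talk messages, or None for real questions."""
    key = message.strip().lower().rstrip("!.?").strip()
    return CANNED_REPLIES.get(key)


def answer(message, ask_rag):
    """Route small talk to a canned reply; send everything else to ask_rag."""
    reply = canned_reply(message)
    return reply if reply is not None else ask_rag(message)


# ask_rag stands in for a call such as rag_chain.invoke (assumption).
print(answer("Hello!", ask_rag=lambda q: "<RAG answer>"))
print(answer("What is RAG?", ask_rag=lambda q: "<RAG answer>"))
```

The exact-match lookup is deliberately strict: a message such as "hello, what is RAG?" is not treated as small talk and still reaches the pipeline.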

## 9. Constructing the Retrieval-Augmented Generation (RAG) Pipeline

This step combines all previously defined components into a single Retrieval-Augmented Generation (RAG) pipeline.  
The pipeline retrieves the most relevant document chunks based on the user query, injects them into the prompt as context, and then generates a grounded response using the selected language model.

This modular chain ensures a clear separation between retrieval, prompting, and generation stages.


In [9]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {
        "context": retriever,
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)
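
In the chain above, the retriever's output (a list of document objects) is passed to the `{context}` placeholder as-is, so the prompt receives the list's string representation, metadata included. A common refinement is to join the chunk texts into one readable string first. The sketch below is a suggestion rather than part of the original pipeline: `format_docs` is a hypothetical helper, and the `Doc` dataclass only imitates the `page_content` and `metadata` attributes of a LangChain document so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join retrieved chunks into one context string, labelling each with its source."""
    blocks = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown source")
        page = doc.metadata.get("page")  # PyPDFLoader stores a zero-based page index
        label = source if page is None else f"{source}, page {page}"
        blocks.append(f"[{label}]\n{doc.page_content.strip()}")
    return "\n\n".join(blocks)


docs = [
    Doc("RAG retrieves relevant chunks first.", {"source": "lecture.pdf", "page": 3}),
    Doc("The chunks are added to the prompt.", {"source": "lecture.pdf", "page": 4}),
]
print(format_docs(docs))

# In the chain, the helper would sit right after the retriever, for example:
#   {"context": retriever | format_docs, "question": RunnablePassthrough()}
```

Labelling each chunk with its source also lets the model, or the reader of a debug log, see which lecture a statement came from.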

## 10. Graphical User Interface (GUI) for the Study Assistant

This step extends the RAG-based study assistant with a simple and user-friendly graphical interface to enable real-time interaction with the chatbot. The interface allows users to enter questions through a text-based input and receive responses instantly, demonstrating the practical usability of the system.

The GUI is implemented as a custom web-based interface using HTML/CSS and Gradio, and it is connected directly to the same LangChain RAG backend without any modification to the retrieval or generation components. This ensures that all responses are generated using the existing RAG pipeline, preserving the system’s reliability while enhancing accessibility and user experience.

In [10]:
import gradio as gr

def respond(message, history):
    if message.strip() == "":
        return history, ""
    answer = rag_chain.invoke(message)
    history.append((message, answer))
    return history, ""

with gr.Blocks(
    title="Generative AI Chatbot Assistant",
    theme=gr.themes.Soft(),
    css="""
        body {
            background: linear-gradient(135deg, #eef2ff, #f4f7fb);
        }

        .platform-header {
            background: linear-gradient(90deg, #7f7cff, #a0a0ff); /* lighter gradient for logo */
            padding: 30px;
            border-radius: 14px;
            color: white;
            text-align: center;
            margin-bottom: 25px;
            box-shadow: 0 8px 20px rgba(0,0,0,0.12);
        }

        .platform-header h1 {
            margin-bottom: 6px;
            font-size: 32px;
        }

        .platform-header p {
            font-size: 15px;
            opacity: 0.95;
            max-width: 700px;
            margin: 0 auto;
        }

        .chat-card {
            background: white;
            border-radius: 14px;
            padding: 20px;
            box-shadow: 0 10px 30px rgba(0,0,0,0.08);
        }

        .footer-text {
            text-align: center;
            font-size: 12px;
            color: #6b7280;
            margin-top: 15px;
        }

        /* Send Button aligned with textbox */
        button.primary {
            background: linear-gradient(90deg, #7f7cff, #a0a0ff) !important;
            border: none !important;
            padding: 6px 16px !important;
            font-size: 13px !important;
            min-height: 36px !important;
            min-width: 100px !important;
        }
    """
) as demo:

    # Header with Logo
    with gr.Column(elem_classes="platform-header"):
        gr.Image(
            value="LogoGUI.png",
            height=100,
            show_label=False,
            show_download_button=False,
            container=False
        )

        gr.Markdown(
            """
            <h1>Generative AI Study Assistant</h1>
            <p>
                A professional retrieval-augmented learning platform that delivers
                accurate, context-aware answers from approved academic resources.
            </p>
            """
        )

    # Chat card
    with gr.Column(elem_classes="chat-card"):
        chatbot = gr.Chatbot(
            label="AI Assistant",
            height=300,
            show_copy_button=True
        )

        with gr.Row():
            user_input = gr.Textbox(
                placeholder="Ask a question about the course content...",
                label="Your Question",
                scale=4
            )
            send_btn = gr.Button(
                "Send",
                variant="primary",
                scale=1  # scale expects an integer; 4:1 keeps the button narrow
            )

    gr.Markdown(
        """
        <div class="footer-text">
            Powered by Retrieval-Augmented Generation (RAG).
            Responses are generated exclusively from approved academic resources.
        </div>
        """
    )

    # Event handlers
    send_btn.click(
        fn=respond,
        inputs=[user_input, chatbot],
        outputs=[chatbot, user_input]
    )

    user_input.submit(
        fn=respond,
        inputs=[user_input, chatbot],
        outputs=[chatbot, user_input]
    )

demo.launch()




It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://ad0b686119eeb3e1ca.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
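
The `respond` handler above calls the chain directly, so an API error (for example an expired token or a rate limit) would raise an exception inside the event handler instead of producing a chat message. The sketch below shows one way to keep the handler logic testable without Gradio and to surface failures in the chat window. `make_respond` and `fake_chain` are hypothetical names, and the `(message, answer)` tuple history mirrors the format used in the cell above.

```python
def make_respond(invoke):
    """Build a chat handler around any callable that maps a question to an answer."""
    def respond(message, history):
        history = list(history or [])  # copy so the caller's list is not mutated
        if not message or not message.strip():
            return history, ""
        try:
            answer = invoke(message.strip())
        except Exception as exc:  # show backend failures as a chat message
            answer = f"Sorry, the request failed: {exc}"
        history.append((message, answer))
        return history, ""  # the empty string clears the textbox
    return respond


def fake_chain(question):
    """Hypothetical stand-in for rag_chain.invoke."""
    if "fail" in question:
        raise RuntimeError("backend unavailable")
    return f"Answer to: {question}"


respond = make_respond(fake_chain)
print(respond("What is RAG?", []))
print(respond("please fail", []))
```

In the notebook, the same factory could be called as `make_respond(rag_chain.invoke)` and the result wired to `send_btn.click` and `user_input.submit`.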




## 11. Interactive Chatbot Interface

In this final step, an interactive command-line chatbot is implemented.  
Users can dynamically input questions related to the course material, and the system responds in real time using the constructed RAG pipeline.

This interface demonstrates the practical application of the system as a study assistant rather than a static question-answering script.


In [None]:
print("Generative AI Study Chatbot")
print("Type your question below.")
print("Type 'exit' to stop.\n")

while True:
    user_question = input("Your question: ").strip()

    if user_question.lower() == "exit":
        print("Chatbot session ended.")
        break

    if not user_question:
        continue  # ignore empty input instead of querying the chain

    answer = rag_chain.invoke(user_question)

    print("\nAnswer:")
    print(answer)
    print("\n" + "-" * 60 + "\n")


Generative AI Study Chatbot
Type your question below.
Type 'exit' to stop.



## Short Reflection

The RAG-based study assistant retrieved relevant course content effectively and generated grounded, context-aware answers. Prompt engineering played a key role in improving explanation quality and in handling indirect or rephrased questions.

The main limitation was the choice of local open-source language model: *flan-t5-small* was fast but often imprecise, while *mistralai/Mistral-7B-Instruct-v0.2* reasoned better but was very slow at inference. After experimentation, *flan-t5-base* offered the best balance between accuracy and response time for local execution.

Retrieval-Augmented Generation improved answer reliability: grounding responses in the course materials, rather than relying on the model’s parametric knowledge, reduced hallucinations and made the generated answers more relevant and consistent.
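
The setup step installs `rouge-score` for evaluation, but the notebook does not include an evaluation cell. As a rough illustration of how answer quality could be quantified, the sketch below computes a unigram-overlap F1 score between a reference answer and a generated answer. It is a simplified stand-in for ROUGE-1 (whitespace tokenization only, no stemming), and the function name `unigram_f1` and both example sentences are invented for this sketch.

```python
from collections import Counter


def unigram_f1(reference, candidate):
    """F1 over overlapping lowercase word counts (simplified ROUGE-1-style score)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())  # clipped word matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Invented example pair; real evaluation would use curated reference answers.
reference = "rag grounds answers in retrieved course material"
candidate = "rag grounds answers in the course material"
print(round(unigram_f1(reference, candidate), 3))
```

Scoring the same question set with each backend would turn the qualitative comparison above into numbers that can be compared directly.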
