# AI-Powered Academic RAG Study Assistant
## Experiment Results Document
### Subject: Database Management Systems (DBMS)

This document summarizes the experiments conducted during the development of the RAG-based study assistant. Each experiment lists its setup, results, observations, and the rationale for the final decision.

------------------------------------------------------------

# Experiment 1: Chunking Strategies Comparison

## Objective
To evaluate how different chunking strategies affect retrieval quality and answer generation.

## Strategies Compared
1. Fixed-size chunking (500 characters per chunk, 100-character overlap)
2. Sentence-based chunking (split using sentence boundaries)
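
The two strategies can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the naive regex sentence splitter, and the `max_chars` grouping limit for sentence chunks are assumptions. Only the 500-character size and 100-character overlap come from the setup above.

```python
import re


def fixed_size_chunks(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` characters; neighbours share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def sentence_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Group whole sentences into chunks; a sentence is never cut in the middle."""
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The fixed-size version can end a chunk in the middle of a definition, while the sentence version yields chunks of uneven length. A single sentence longer than `max_chars` is kept whole in this sketch.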

## Evaluation Metrics
- Relevance (1–5)
- Correctness (1–5)
- Completeness (1–5)
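
As an illustration of how per-question ratings could be aggregated into the averages reported in this document, here is a small sketch. The rating rows are placeholders made up for the example and are not the experiment's data; the function name and the dictionary layout are also assumptions.

```python
from statistics import mean

METRICS = ("relevance", "correctness", "completeness")

# Placeholder ratings for illustration only; NOT the experiment's data.
ratings = [
    {"question": "q1", "relevance": 4, "correctness": 3, "completeness": 4},
    {"question": "q2", "relevance": 5, "correctness": 4, "completeness": 3},
]


def average_scores(rows: list[dict]) -> dict[str, float]:
    """Average each 1-5 metric over all rated questions, rounded to one decimal."""
    for row in rows:
        for metric in METRICS:
            if not 1 <= row[metric] <= 5:
                raise ValueError(f"{metric} must be between 1 and 5, got {row[metric]}")
    return {metric: round(mean(row[metric] for row in rows), 1) for metric in METRICS}
```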

## Summary Results

| Strategy            | Avg Relevance | Avg Correctness | Avg Completeness |
|--------------------|---------------|-----------------|------------------|
| Fixed Chunking     | 3.8           | 3.6             | 3.5              |
| Sentence Chunking  | 4.2           | 3.9             | 3.8              |

## Observations

- Sentence chunking preserved semantic boundaries better.
- Fixed chunking sometimes cut definitions mid-sentence.
- Sentence chunks produced more complete definitions (e.g., Normalization).
- For structure-heavy topics such as B-trees, both strategies struggled slightly.

## Trade-Off

- Fixed chunking is simple and fast.
- Sentence chunking improves semantic clarity but produces variable chunk sizes.

## Final Decision

Sentence-based chunking was selected for the final production system due to better semantic alignment and improved answer completeness.

------------------------------------------------------------

# Experiment 2: Prompt Engineering Comparison

## Objective
To evaluate whether a more structured prompt improves generation quality.

## Prompts Compared

### Basic Prompt
"Answer the question using only the context below."

### Improved Structured Prompt
- Explicit instructions
- 3–6 sentence constraint
- Clear instruction to avoid outside knowledge
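
A structured prompt with these three properties might be assembled as in the sketch below. The exact wording, the numbered context format, and the function name `build_structured_prompt` are assumptions for illustration; the prompt text used in the experiment is not reproduced in this document.

```python
def build_structured_prompt(question: str, chunks: list[str]) -> str:
    """Build a prompt with explicit instructions, a length limit, and a context-only rule."""
    # Number the retrieved chunks so they stay visually separate in the prompt.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "You are a study assistant for a Database Management Systems course.\n"
        "Instructions:\n"
        "1. Answer using only the context below.\n"
        "2. Do not use outside knowledge.\n"
        "3. Write between 3 and 6 sentences.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

For example, `build_structured_prompt("What is 2PL?", retrieved_chunks)` returns a single string that can be passed to the generation model.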

## Summary Results

| Prompt Type     | Avg Relevance | Avg Correctness | Avg Completeness |
|-----------------|---------------|-----------------|------------------|
| Basic Prompt    | 3.4           | 3.2             | 2.9              |
| Improved Prompt | 4.0           | 3.8             | 3.6              |

## Observations

- Basic prompt produced very short answers.
- Improved prompt generated more structured responses.
- Improved prompt reduced hallucination.
- ACID and 2PL answers improved significantly.

## Trade-Off

- More detailed prompts increase token usage.
- Generation time is slightly longer.

## Final Decision

Improved structured prompting was selected for production because it produced more complete and reliable answers.

------------------------------------------------------------

# Experiment 3: Retrieval Strategy (Top-k Comparison)

## Objective
To determine the optimal number of retrieved chunks.

## Configurations Tested
- Top-k = 3
- Top-k = 5
- Top-k = 8
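
Top-k retrieval over chunk embeddings can be sketched as below. Cosine similarity is an assumption here, since this document does not state which similarity measure or vector store was used, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3):
    """Return (chunk_index, similarity) pairs for the k most similar chunks."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]
```

Raising `k` only appends lower-scoring chunks to the end of the result, which is one way to see why a larger k adds context that is less related to the query.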

## Summary Results

| Top-k | Avg Relevance | Noise Level | Answer Quality |
|-------|---------------|------------|----------------|
| 3     | 4.1           | Low        | High           |
| 5     | 3.5           | Medium     | Moderate       |
| 8     | 3.2           | High       | Degraded       |

## Observations

- Top-k=3 produced focused answers.
- Higher k introduced noisy chunks.
- Larger k increased irrelevant context.
- Answers on B-trees and DELETE vs TRUNCATE degraded with k > 5 (that is, at k = 8, the only tested value above 5).

## Trade-Off

- Small k may miss edge-case context.
- Large k increases noise and token usage.

## Final Decision

Top-k = 3 was selected for the final system because it gave the best balance between precision and context coverage.

------------------------------------------------------------

# Preprocessing Impact Analysis

## Noise Handling Results

After implementing text cleaning:

- Page references reduced significantly.
- Repeated headers removed.
- Retrieval stability improved.
- Average relevance increased by approximately 0.4 points on the 1–5 scale.

## Conclusion on Cleaning

Simple regex-based cleaning significantly improved chunk quality and embedding efficiency.
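
A regex-based cleaning step of this kind might look like the sketch below. The page-reference pattern, the repeated-header heuristic (a line that appears on at least 60% of pages), and the function name are assumptions for illustration, not the rules actually used in the project.

```python
import math
import re
from collections import Counter

# Matches lines such as "Page 3", "page 3 of 12", or a bare "3" (assumed formats).
PAGE_REF = re.compile(r"^\s*(?:page\s+)?\d+(?:\s+of\s+\d+)?\s*$", re.IGNORECASE)


def clean_pages(pages: list[str], header_threshold: float = 0.6) -> str:
    """Drop page-reference lines and lines repeated across most pages, then join the rest."""
    line_counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        line_counts.update({line.strip() for line in page.splitlines() if line.strip()})
    min_pages = max(2, math.ceil(header_threshold * len(pages)))
    repeated = {line for line, n in line_counts.items() if n >= min_pages}

    kept = []
    for page in pages:
        for line in page.splitlines():
            stripped = line.strip()
            if not stripped or PAGE_REF.match(stripped) or stripped in repeated:
                continue
            kept.append(stripped)
    # Collapse any remaining runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", " ".join(kept))
```

One caveat of this heuristic is that a content line consisting only of a number would also be dropped, so the pattern should be checked against the actual source PDFs.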

------------------------------------------------------------

# Final Production Configuration

Based on the experiments above, the final configuration is:

- Sentence-based chunking
- Improved structured prompt
- Top-k = 3
- Text preprocessing enabled
- Open-source transformer model
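
The configuration above could be captured in code roughly as follows. The class name, field names, and the `model_name` placeholder are hypothetical; this document does not name the specific open-source model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Final settings chosen by the experiments (field names are illustrative)."""
    chunking: str = "sentence"        # Experiment 1
    prompt_style: str = "structured"  # Experiment 2
    top_k: int = 3                    # Experiment 3
    clean_text: bool = True           # preprocessing enabled
    model_name: str = "open-source-transformer"  # placeholder; model not named in this document

    def __post_init__(self):
        # Reject values outside the options that were actually compared.
        if self.chunking not in {"sentence", "fixed"}:
            raise ValueError(f"unknown chunking strategy: {self.chunking}")
        if self.prompt_style not in {"structured", "basic"}:
            raise ValueError(f"unknown prompt style: {self.prompt_style}")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
```

Freezing the dataclass keeps the settings from being changed accidentally at runtime; a variant for a new experiment can be created with `dataclasses.replace(config, top_k=5)`.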

------------------------------------------------------------

# Overall Key Findings

1. Chunking strategy has a major impact on semantic accuracy.
2. Prompt engineering significantly affects completeness.
3. Increasing retrieval depth does not always improve performance.
4. Preprocessing plays a critical role in RAG stability.

------------------------------------------------------------

# Recommendation for Production Use

If deployed as a real academic assistant:

- Use layout-aware PDF parsing.
- Implement hybrid search (semantic + keyword).
- Add a reranking model.
- Use a stronger LLM (Mistral / Llama 3 / GPT-4) if budget allows.
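
One common way to combine a semantic ranking with a keyword ranking for hybrid search is reciprocal rank fusion (RRF). The sketch below is a suggestion rather than something that was tested in these experiments; the function name, the chunk IDs, and the conventional constant of 60 are assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], rrf_k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one list using RRF scores."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns more from each list the closer it is to the top.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (rrf_k + rank)
    # Highest fused score first; ties are broken by chunk ID for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


semantic_ranking = ["c1", "c2", "c3"]  # e.g. from vector search (illustrative IDs)
keyword_ranking = ["c3", "c1", "c4"]   # e.g. from a keyword index (illustrative IDs)
fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
```

Note that `rrf_k` is the smoothing constant of the fusion formula and is unrelated to the top-k setting; the fused list would still be truncated to the top 3 chunks before generation.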

------------------------------------------------------------

# Final Conclusion

Through systematic experimentation and evaluation, the RAG system was improved step-by-step using data-driven decisions.

Each architectural choice in the final system was justified through measurable comparison rather than assumption.

These experiments demonstrate practical understanding of:
- Retrieval-Augmented Generation
- Prompt Engineering
- Vector Search Optimization
- Real-world document preprocessing challenges
- ML evaluation lifecycle