# 04 - RAG Implementation (Knowledge Retrieval)

This notebook focuses on building and testing your Retrieval-Augmented Generation (RAG) system.

## Objectives:
- Prepare text chunks from your extracted documents (`data/processed/extracted_raw_documents.jsonl`).
- Embed these chunks with a Sentence Transformers model.
- Store the chunks and their embeddings in a ChromaDB collection.
- Test the retrieval mechanism with sample queries.
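Chunk preparation can be sketched as follows. This is a minimal, hypothetical sketch, not the contents of `prepare_rag_chunks.py`: it assumes each line of the JSONL file is an object with a `content` text field and an optional `source` field, and it splits text into fixed-size character windows that overlap, so text cut at a window boundary also appears in the neighbouring chunk. The names `chunk_text` and `iter_chunks` and the default sizes are illustrative.

```python
import json
from typing import Iterator


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def iter_chunks(jsonl_path: str, chunk_size: int = 500, overlap: int = 50) -> Iterator[dict]:
    """Yield one dict per chunk. Assumes hypothetical `content` and `source` fields per line."""
    with open(jsonl_path, encoding="utf-8") as f:
        for doc_idx, line in enumerate(f):
            if not line.strip():
                continue  # ignore blank lines in the JSONL file
            doc = json.loads(line)
            source = doc.get("source", "unknown")
            text = doc.get("content") or ""
            for i, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
                yield {
                    "id": f"doc{doc_idx}-chunk{i}",  # unique per input line and window
                    "content": chunk,
                    "metadata": {"source": source, "chunk_index": i},
                }
```

For example, `chunks = list(iter_chunks("data/processed/extracted_raw_documents.jsonl"))`. Character windows are the simplest choice; the real script may split on tokens, sentences, or function boundaries instead, which can keep code chunks more coherent.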

## Instructions:
1.  **Ensure extracted data is ready:** Make sure `data/processed/extracted_raw_documents.jsonl` exists.
2.  **Run `prepare_rag_chunks.py`:**
    ```bash
    python scripts/prepare_rag_chunks.py
    ```
3.  **Run `build_vector_db.py`:**
    ```bash
    python scripts/build_vector_db.py
    ```
4.  **Test retrieval:** Use the `scripts/rag_inference.py` script or implement a simple retrieval test here.
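A simple retrieval test can be a handful of queries paired with the chunk you expect to get back, scored by hit rate (the share of queries whose expected chunk appears in the top k results). The sketch below is hypothetical: `hit_rate`, `toy_retriever`, and `toy_embed` are illustrative names, and the toy retriever ranks chunks by cosine similarity over bag-of-words counts with a tiny stopword list. It stands in for the Sentence Transformers embeddings and ChromaDB only so the check runs without either.

```python
import math
import re
from collections import Counter
from typing import Callable

# Tiny, illustrative stopword list so common words don't dominate the toy scores.
STOPWORDS = frozenset({"a", "an", "and", "the", "is", "of", "what", "how", "does"})


def toy_embed(text: str) -> Counter:
    """Toy stand-in for a real embedding: counts of lowercase word tokens."""
    return Counter(t for t in re.findall(r"[a-z0-9_]+", text.lower()) if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_retriever(chunks: dict[str, str]) -> Callable[[str, int], list[str]]:
    """Build a retrieve(query, k) function returning the ids of the k most similar chunks."""
    vectors = {chunk_id: toy_embed(text) for chunk_id, text in chunks.items()}

    def retrieve(query: str, k: int) -> list[str]:
        q = toy_embed(query)
        ranked = sorted(vectors, key=lambda cid: cosine(q, vectors[cid]), reverse=True)
        return ranked[:k]

    return retrieve


def hit_rate(retrieve: Callable[[str, int], list[str]], cases: list[tuple[str, str]], k: int = 3) -> float:
    """Share of (query, expected_id) cases whose expected id is in the top-k results."""
    if not cases:
        return 0.0
    hits = sum(expected_id in retrieve(query, k) for query, expected_id in cases)
    return hits / len(cases)


# Hand-written sample chunks for illustration only.
chunks = {
    "fib-doc": "Fibonacci sequence: each number is the sum of the two preceding numbers.",
    "fib-code": "calculate_fibonacci function: loops n times and returns the nth Fibonacci number.",
    "readme": "Install dependencies with pip and run the scripts.",
}
retrieve = toy_retriever(chunks)
cases = [
    ("What is the Fibonacci sequence?", "fib-doc"),
    ("How does the calculate_fibonacci function work?", "fib-code"),
]
print(f"hit@1 = {hit_rate(retrieve, cases, k=1):.2f}")  # prints: hit@1 = 1.00
```

To run the same check against the real index, pass any function with the `(query, k) -> list of ids` shape, for example one that returns `results["ids"][0]` from a ChromaDB `collection.query(query_texts=[query], n_results=k)` call. That form assumes the collection has an embedding function attached; if the embeddings were computed separately, embed the query with the same model and pass `query_embeddings` instead.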

## Code (Example of what you'd put here):

```python
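# import sys
# from pathlib import Path

# # Make scripts/ importable whether the notebook runs from the repository
# # root or from a subfolder such as notebooks/ (assumed project layout).
# scripts_dir = Path.cwd() / 'scripts'
# if not scripts_dir.is_dir():
#     scripts_dir = Path.cwd().parent / 'scripts'
# sys.path.append(str(scripts_dir))

# from prepare_rag_chunks import prepare_chunks_for_embedding
# from build_vector_db import build_vector_database
# from rag_inference import run_rag_inference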

# # Step 1: Prepare RAG Chunks
# print("Preparing RAG chunks...")
# prepared_chunks = prepare_chunks_for_embedding()
# if prepared_chunks:
#     print(f"Sample chunk: {prepared_chunks[0]['content'][:200]}...")
# else:
#     print("No chunks prepared.")

# # Step 2: Build Vector Database
# print("\nBuilding vector database...")
# build_vector_database()

# # Step 3: Test RAG Inference
# print("\nTesting RAG inference...")
# query = "What is the Fibonacci sequence?"
# run_rag_inference(query, n_results=3)

# query_code = "How does the calculate_fibonacci function work?"
# run_rag_inference(query_code, n_results=3)
```
