# RAGKit Demo

This notebook shows how to use RAGKit to build a simple document Q&A system.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IIIDman/ragkit/blob/main/examples/RAGKit_Demo.ipynb)

## Installation

In [None]:
!pip install -q ragkit pypdf

## Getting started

The simplest way to use RAGKit:

In [None]:
from ragkit import RAGKit

# The first run downloads the embedding model (~90 MB)
rag = RAGKit()
print(f"Initialized: {rag}")

## Adding content

Let's add some text to query against:

In [None]:
content = """
Retrieval-Augmented Generation (RAG) Explained

RAG is a technique that combines retrieval-based and generation-based approaches
for natural language processing tasks. It was introduced by Facebook AI Research
in their 2020 paper.

How RAG Works:
1. Document Indexing: Documents are split into chunks and converted to vector embeddings
2. Query Processing: User queries are also converted to embeddings
3. Retrieval: The most similar document chunks are retrieved using vector similarity
4. Generation: An LLM generates an answer using the retrieved context

Benefits of RAG:
- Access to up-to-date information beyond the LLM's training data
- Reduced hallucination by grounding responses in actual documents
- Ability to cite sources for generated answers
- Cost-effective alternative to fine-tuning

Common Use Cases:
- Question answering over documents
- Customer support chatbots
- Research assistants
- Code documentation search
"""

num_chunks = rag.add_text(content, metadata={"source": "rag_explainer.txt"})
print(f"Added {num_chunks} chunks")
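
To make the "chunks" count above more concrete, here is a minimal sketch of one common chunking strategy: fixed-size character windows with overlap. The helper `split_into_chunks` and its `chunk_size` and `overlap` parameters are hypothetical and chosen for illustration; RAGKit's actual chunking logic may differ.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Hypothetical helper for illustration only; not RAGKit's implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size].strip()
        if piece:
            chunks.append(piece)
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "RAG combines retrieval with generation. " * 20
pieces = split_into_chunks(sample, chunk_size=200, overlap=50)
print(f"{len(sample)} characters -> {len(pieces)} chunks")
```

The overlap means that a sentence cut at a window boundary still appears whole in at least one neighboring chunk, at the cost of storing some text twice.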

## Querying

In [None]:
answer = rag.query("What are the benefits of RAG?")

print("Q: What are the benefits of RAG?")
print(f"\nA: {answer.text}")
print(f"\nSources: {answer.sources}")

In [None]:
answer = rag.query("How does the retrieval step work?")

print("Q: How does the retrieval step work?")
print(f"\nA: {answer.text}")
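
Behind a call like `query`, a RAG system typically places the retrieved chunks into the prompt sent to the language model. The sketch below shows one plausible way to assemble such a prompt. The function `build_prompt` and the prompt wording are assumptions made for illustration, not RAGKit's actual template.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Assemble a grounded prompt from retrieved chunks.

    Hypothetical helper; the wording is illustrative, not RAGKit's template.
    """
    # Number each chunk so the model can cite it as [1], [2], ...
    numbered = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(contexts, 1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite passages by their bracketed number.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "How does the retrieval step work?",
    [
        "User queries are also converted to embeddings.",
        "The most similar document chunks are retrieved using vector similarity.",
    ],
)
print(prompt)
```

Numbering the passages is what lets the generated answer point back to specific sources.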

## Searching without generation

You can also retrieve the most relevant chunks directly, without generating an answer:

In [None]:
chunks = rag.search("use cases", top_k=2)

print("Search results for 'use cases':")
for i, chunk in enumerate(chunks, 1):
    print(f"\n--- Result {i} ---")
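    # Show a 200-character preview; add "..." only if the chunk was truncated
    suffix = "..." if len(chunk.content) > 200 else ""
    print(chunk.content[:200] + suffix)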
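
For intuition about what a top-k search does, here is a self-contained sketch that ranks documents by cosine similarity. It uses a toy word-count "embedding" as a stand-in for a neural embedding model; the functions `embed`, `cosine_similarity`, and `top_k` are hypothetical and do not reflect RAGKit's internals.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, documents: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


docs = [
    "Common use cases include customer support chatbots and research assistants.",
    "Documents are split into chunks and converted to vector embeddings.",
    "RAG reduces hallucination by grounding responses in actual documents.",
]
results = top_k("use cases", docs, k=2)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

Unlike this word-count version, a neural embedding can match a query to a chunk that shares its meaning but none of its words.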

## Working with PDFs

To index a PDF, pass its path to `add_document`:

In [None]:
# If you're in Colab, you can upload a file:
# from google.colab import files
# uploaded = files.upload()
# pdf_path = list(uploaded.keys())[0]
# rag.add_document(pdf_path)

print("To add a PDF: rag.add_document('your_file.pdf')")

## Saving your index

You can save the index to disk and load it later:

In [None]:
# Save
rag.save("my_ragkit_index")

# Load later with:
# rag = RAGKit.load("my_ragkit_index")
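
As a rough mental model of what saving and loading involves, the sketch below writes a list of chunks with their metadata to a directory as JSON and reads it back. The helpers `save_index` and `load_index` and the `chunks.json` file name are hypothetical; RAGKit's on-disk format is not shown in this notebook and may be entirely different (a real index would also store the embedding vectors).

```python
import json
import tempfile
from pathlib import Path


def save_index(chunks: list[dict], directory: Path) -> None:
    """Write chunks and their metadata to <directory>/chunks.json (illustrative format)."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "chunks.json").write_text(
        json.dumps(chunks, indent=2), encoding="utf-8"
    )


def load_index(directory: Path) -> list[dict]:
    """Read back the chunks written by save_index."""
    return json.loads((directory / "chunks.json").read_text(encoding="utf-8"))


chunks = [
    {"content": "RAG combines retrieval and generation.",
     "metadata": {"source": "rag_explainer.txt"}},
    {"content": "Documents are split into chunks.",
     "metadata": {"source": "rag_explainer.txt"}},
]

# Round-trip through a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "toy_index"
    save_index(chunks, index_dir)
    restored = load_index(index_dir)

print(f"Round trip preserved data: {restored == chunks}")
```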

## Summary

RAGKit handles:
- Loading documents (PDF, text, markdown)
- Creating embeddings
- Vector search
- Answer generation with sources
- Saving/loading indexes

Everything runs locally.