A comprehensive collection of Retrieval-Augmented Generation (RAG) implementations using different methods, embedding models, and data sources.
This repository contains various RAG implementation examples that demonstrate different approaches to building question-answering systems with document retrieval. Each example showcases a different combination of:
- Embedding Models: Google Gemini, HuggingFace, OpenAI
- Data Sources: PDFs, Web pages, Local documents
- Vector Stores: In-memory vector stores
- LLM Models: Google Gemini, OpenAI GPT
free-RAG/
├── app/
│ ├── gemini-localembed.py # RAG with Gemini embeddings + local docs
│ ├── gemini+huggingapi.py # RAG with Gemini + HuggingFace embeddings
│ ├── gemini+websedeta.py # RAG with web data using Gemini
│ └── normal-openai.py # RAG implementation with OpenAI
├── retreival.py # Main retrieval example with PDF
├── indexing.py # Document indexing utilities
├── AI_rec.pdf # Sample PDF document
├── example.env # Environment variables template
├── pyproject.toml # Project dependencies
└── README.md # This file
- Python 3.13+
- UV package manager
- Clone the repository:
git clone <your-repo-url>
cd free-RAG
- Install dependencies using UV:
uv sync
- Set up environment variables:
cp example.env .env
- Edit .env and add your API keys:
GOOGLE_API_KEY="your-google-api-key"
HUGGINGFACEHUB_API_TOKEN="your-huggingface-token"
OPENAI_API_KEY="your-openai-key"
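Before running an example, it can help to confirm that the keys are actually visible to Python. The following is a minimal sketch using only the standard library; the helper name `missing_keys` is hypothetical and not part of this repository. It assumes the variables are already in the process environment (for instance after calling `load_dotenv()` from python-dotenv), and not every example needs all three keys.

```python
import os

# Key names taken from the example .env contents above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "HUGGINGFACEHUB_API_TOKEN", "OPENAI_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.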
- Description: Basic RAG implementation using PDF documents
- Embedding: HuggingFace Sentence Transformers
- Model: Supports both Gemini and local embeddings
- Usage:
uv run python retreival.py
- Description: RAG system that loads data from multiple web pages
- Data Source: Multiple documentation URLs
- Embedding: Google Gemini embeddings
- Usage:
uv run python app/gemini+websedeta.py
- Description: Hybrid approach using both Gemini and HuggingFace
- Features: Combines different embedding approaches
- Usage:
uv run python app/gemini+huggingapi.py
- Description: RAG with local Gemini embeddings
- Features: Optimized for local deployment
- Usage:
uv run python app/gemini-localembed.py
- Description: Traditional RAG implementation using OpenAI
- Embedding: OpenAI text embeddings
- Model: GPT models for generation
- Usage:
uv run python app/normal-openai.py
The project uses the following key dependencies:
- langchain-community - Document loaders and utilities
- langchain-core - Core LangChain functionality
- langchain-google-genai - Google Gemini integration
- langchain-huggingface - HuggingFace integration
- langchain-text-splitters - Text chunking utilities
- sentence-transformers - Local embedding models
- pypdf - PDF processing
- python-dotenv - Environment variable management
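As a rough illustration of what text chunking means in a RAG pipeline, here is a plain-Python sketch of fixed-size chunks with overlap. It is not how langchain-text-splitters is implemented; the function name `chunk_text` and its default sizes are assumptions chosen for the example.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.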
- Multiple Embedding Options: Compare different embedding models
- Flexible Data Sources: Support for PDFs, web pages, and local documents
- Vector Store Integration: In-memory vector storage for fast retrieval
- Interactive Querying: Command-line interfaces for testing
- Environment Configuration: Easy setup with environment variables
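To make the idea of in-memory vector storage concrete, the sketch below builds a tiny store in pure Python. It is a conceptual illustration only: the word-count "embedding" stands in for a real embedding model, and `ToyVectorStore` is a hypothetical class, not the store used by the examples.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps (text, vector) pairs in a list and ranks them per query."""

    def __init__(self):
        self._items = []

    def add_texts(self, texts):
        for text in texts:
            self._items.append((text, embed(text)))

    def similarity_search(self, query, k=2):
        query_vector = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add_texts(["cats purr and sleep", "dogs bark loudly", "stock markets fell today"])
    print(store.similarity_search("why do dogs bark", k=1))
```

Everything lives in a Python list, so nothing persists between runs; that is the trade-off an in-memory store makes for simplicity and speed.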
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

# Load your documents
loader = PyPDFLoader("your_document.pdf")
docs = loader.load()
# Create embeddings and vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vector_store = InMemoryVectorStore(embeddings)
vector_store.add_documents(documents=docs)
# Query the system
query = "Your question here"
results = vector_store.similarity_search(query)

from langchain_community.document_loaders import WebBaseLoader
# Load multiple web pages
loader = WebBaseLoader([
"https://example.com/page1",
"https://example.com/page2"
])
docs = loader.load()

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is open source and available under the MIT License.
- Make sure to set up your API keys before running the examples
- Some examples require internet connectivity for web scraping
- The USER_AGENT environment variable is recommended for web scraping
- Each example is designed to be self-contained and runnable independently
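One way to follow the USER_AGENT recommendation is to set a default at the top of a script, before any web loader is imported. This is a sketch; the value `free-rag-examples/0.1` is a made-up placeholder, and a value already present in the environment or in `.env` is left untouched.

```python
import os

# Placeholder identifier (an assumption for this example); replace it with
# something that describes your own project.
DEFAULT_USER_AGENT = "free-rag-examples/0.1"

# setdefault only writes the value when USER_AGENT is not already set,
# so an existing setting takes precedence.
os.environ.setdefault("USER_AGENT", DEFAULT_USER_AGENT)

if __name__ == "__main__":
    print("USER_AGENT =", os.environ["USER_AGENT"])
```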
If you encounter any issues or have questions, please:
- Check the existing issues in the repository
- Create a new issue with a detailed description
- Include error messages and environment information
Happy coding! 🎉