RAG AI Agent in Python
Production-Ready Retrieval-Augmented Generation (RAG) AI Agent in Python

This repository contains a production-oriented reference implementation of a Retrieval-Augmented Generation (RAG) AI agent built in Python. It is designed as a practical starting point for building, deploying, and operating RAG-powered assistants that combine vector search over your knowledge sources with a large language model (LLM) for accurate, up-to-date responses.
Key goals:
- Modular, maintainable architecture suitable for production.
- Clear examples for ingesting data, building a vector store, and serving a queryable RAG agent.
- Best-practice considerations for configuration, logging, testing, and deployment.
Table of contents
- Features
- Architecture overview
- Quickstart
- Installation
- Configuration
- Typical workflows
- Ingesting documents
- Building/using a vector store
- Querying the agent
- Deployment
- Testing & CI
- Observability & Monitoring
- Security & privacy
- Performance & scaling tips
- Best practices for production
- Contributing
- Recommended repository layout
- Troubleshooting
- License
- Acknowledgements & References
- Contact
Features
- Document ingestion pipeline (file system, PDFs, text, HTML — extendable)
- Embeddings + vector store (FAISS / Chroma / Milvus / Weaviate friendly)
- LLM orchestration layer for RAG responses (plug any LLM: OpenAI, Hugging Face, local)
- Conversation memory/session management
- API server example (FastAPI) for production endpoints
- Dockerfile and container-friendly configuration
- Example scripts for common tasks (ingest, index, query)
- Tests and example CI workflow
Architecture overview
- Data ingestion: preprocess raw docs -> text chunks -> metadata
- Embeddings: convert text chunks to vectors via an embeddings model
- Vector store: store and query embeddings (approximate nearest neighbors)
- RAG agent: retrieve context, construct prompts, call LLM, optionally perform chain-of-thought / tool use
- API & orchestration: expose endpoints, session handling, rate-limiting
- Observability: logging, request tracing, metrics for latency / error rates
Quickstart (fast path)

Prerequisites
- Python 3.10+
- Git
- Optional: Docker for containerized deployment
- LLM/embedding credentials (e.g., OPENAI_API_KEY) or local model runtime
Clone and install
```bash
git clone https://github.com/Parrot90/Production_Ready_RAG_AI_Agent_in_Python.git
cd Production_Ready_RAG_AI_Agent_in_Python
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Ingest your documents (example)

```bash
python scripts/ingest.py --source ./data/docs --output ./data/chunks
```

Build or update vector index

```bash
python scripts/index.py --chunks ./data/chunks --store ./data/vectorstore
```

Run the API server

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

Installation (detailed)
- Create and activate a virtual environment: python -m venv .venv, then source .venv/bin/activate
- Install dependencies: pip install -r requirements.txt
- Optional: install system packages required by some vector stores (e.g., FAISS).
Configuration

The project uses environment variables and a small config loader (config.py). Typical config options (an illustrative loader sketch follows the list):
- OPENAI_API_KEY or other provider credentials
- EMBEDDING_MODEL (provider/model for embeddings)
- LLM_MODEL (model for generation)
- VECTOR_STORE (faiss | chroma | milvus | weaviate)
- PERSIST_DIRECTORY (where to persist vector stores / metadata)
- MAX_CONTEXT_TOKENS, CHUNK_SIZE, CHUNK_OVERLAP, etc.
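For illustration only, an environment-driven loader along these lines could sit behind config.py. The field names mirror the variables above; the default model names and values are assumptions for this sketch, not the repository's actual interface.

```python
# Illustrative config loader; defaults and field names are assumptions for this sketch.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    embedding_model: str
    llm_model: str
    vector_store: str
    persist_directory: str
    chunk_size: int
    chunk_overlap: int
    max_context_tokens: int

    @classmethod
    def from_env(cls) -> "Settings":
        # Read every option from the environment, with conservative placeholder defaults.
        return cls(
            openai_api_key=os.getenv("OPENAI_API_KEY", ""),
            embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-small"),
            llm_model=os.getenv("LLM_MODEL", "gpt-4o-mini"),
            vector_store=os.getenv("VECTOR_STORE", "faiss"),
            persist_directory=os.getenv("PERSIST_DIRECTORY", "./data/vectorstore"),
            chunk_size=int(os.getenv("CHUNK_SIZE", "800")),
            chunk_overlap=int(os.getenv("CHUNK_OVERLAP", "100")),
            max_context_tokens=int(os.getenv("MAX_CONTEXT_TOKENS", "4000")),
        )


settings = Settings.from_env()
```

Loading the settings once at startup keeps the rest of the code free of scattered os.getenv calls.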
Typical workflows
- Ingesting documents
- Purpose: convert raw documents into cleaned text chunks with metadata (filename, source, modified time).
- Steps:
- Detect file type (txt, md, pdf, html, docx)
- Extract text
- Split into chunks (size + overlap tuned for your LLM context window)
- Save chunk JSON/CSV ready for embedding (see the chunking sketch below)
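A minimal sketch of the chunking step is shown below, assuming plain-text input and character-based splitting. The function names, default sizes, and JSON layout are illustrative rather than the exact behavior of scripts/ingest.py.

```python
# Illustrative chunker: split extracted text into overlapping chunks with metadata.
import json
from pathlib import Path


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into character-based chunks with a fixed overlap."""
    chunks, start = [], 0
    step = max(chunk_size - overlap, 1)
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks


def ingest_file(path: Path, out_dir: Path) -> None:
    """Write one JSON record per chunk, carrying source metadata for later indexing."""
    text = path.read_text(encoding="utf-8", errors="ignore")
    out_dir.mkdir(parents=True, exist_ok=True)
    records = [
        {"source": str(path), "chunk_id": i, "text": chunk}
        for i, chunk in enumerate(chunk_text(text))
    ]
    (out_dir / f"{path.stem}.json").write_text(json.dumps(records, indent=2))
```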
- Building/using a vector store
- Create embeddings for chunks using your chosen embedder
- Save vectors & metadata to a vector index (FAISS/Chroma); see the indexing sketch below
- For production, consider a database-backed vector store (Milvus/Weaviate) for persistence, scaling, and multi-node support
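The indexing sketch below assumes the openai client for embeddings and a local FAISS index; the helper names and file layout (index.faiss plus metadata.json) are assumptions for this example, not necessarily what scripts/index.py produces.

```python
# Illustrative indexing step: embed chunk texts and store them in a FAISS index.
# Assumes the openai and faiss-cpu packages are installed.
import json
from pathlib import Path

import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def embed_texts(texts: list[str], model: str = "text-embedding-3-small") -> np.ndarray:
    """Batch-embed texts and return a float32 matrix of shape (n, dim)."""
    response = client.embeddings.create(model=model, input=texts)
    return np.array([item.embedding for item in response.data], dtype="float32")


def build_index(chunks_dir: Path, store_dir: Path) -> None:
    """Read chunk records, embed them, and persist the index plus metadata."""
    records = []
    for file in chunks_dir.glob("*.json"):
        records.extend(json.loads(file.read_text()))
    vectors = embed_texts([r["text"] for r in records])
    index = faiss.IndexFlatL2(vectors.shape[1])  # exact search; swap for HNSW/IVF at scale
    index.add(vectors)
    store_dir.mkdir(parents=True, exist_ok=True)
    faiss.write_index(index, str(store_dir / "index.faiss"))
    (store_dir / "metadata.json").write_text(json.dumps(records))
```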
- Querying the agent (RAG flow)
- Receive a user query via API
- Embed the query and retrieve top-N relevant chunks
- Construct a prompt combining retrieved context and a system/instruction template
- Call the LLM for generation
- Optionally store the conversational turn and any analytics/metrics (a retrieval-and-generation sketch follows below)
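Putting those steps together, a minimal query flow might look like the sketch below. It reuses the index layout from the indexing sketch above; the prompt template and model names are placeholders, not the repository's exact implementation.

```python
# Illustrative RAG query flow: retrieve top-k chunks, build a prompt, call the LLM.
import json
from pathlib import Path

import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()


def retrieve(query: str, store_dir: Path, top_k: int = 4) -> list[dict]:
    """Embed the query and return the metadata records of the nearest chunks."""
    index = faiss.read_index(str(store_dir / "index.faiss"))
    records = json.loads((store_dir / "metadata.json").read_text())
    q = client.embeddings.create(model="text-embedding-3-small", input=[query])
    q_vec = np.array([q.data[0].embedding], dtype="float32")
    _, ids = index.search(q_vec, top_k)
    return [records[i] for i in ids[0] if i != -1]


def answer(query: str, store_dir: Path) -> str:
    """Combine retrieved context with an instruction template and generate a reply."""
    context = "\n\n".join(r["text"] for r in retrieve(query, store_dir))
    messages = [
        {"role": "system", "content": "Answer using only the provided context; say so if it is missing."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ]
    completion = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return completion.choices[0].message.content
```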
Example usage snippet (pseudo)

```python
from rag_agent import RAGAgent

agent = RAGAgent(config)
response = agent.query("How do I rotate a vector in 3D?", session_id="user-123")
print(response.answer)
```

Deployment
- Containerization: Dockerfile provided. Build and push image to your registry.
- Orchestration: Kubernetes manifests are not included by default, but you can use a simple Deployment + HPA, Service, and Ingress.
- Consider autoscaling, GPU-equipped nodes for local LLMs, or managed hosting for third-party LLM APIs.
- Secrets management: use your cloud provider or HashiCorp Vault for keys.
- Persistent volumes: for vector store persistence (if using self-hosted FAISS/Chroma).
Testing & CI
- Unit tests: pytest
- Integration tests: small sample document set + local vector store + mocked LLM responses (see the mocked-LLM test sketch below)
- Example GitHub Actions workflow (ci.yml) may include:
- linting (flake8/ruff)
- unit tests
- build and push Docker image on release
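As a sketch of the mocked-LLM approach, a unit test could look like the following. The RAGAgent interface, the _call_llm attribute, and the fixture paths are assumptions based on the usage snippet earlier in this README; adjust them to the repository's actual module layout.

```python
# Illustrative unit test: mock the LLM call so tests stay fast and offline.
from unittest.mock import patch

import pytest

from rag_agent import RAGAgent  # hypothetical import path


@pytest.fixture
def agent():
    # Hypothetical constructor arguments; point at a small fixture vector store.
    return RAGAgent(config={"vector_store": "faiss", "persist_directory": "./tests/fixtures/store"})


def test_query_returns_grounded_answer(agent):
    # _call_llm and the response.answer shape are assumed parts of the agent interface.
    with patch.object(agent, "_call_llm", return_value="Paris is the capital of France."):
        response = agent.query("What is the capital of France?", session_id="test")
    assert "Paris" in response.answer
```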
Observability & Monitoring
- Logging: structured JSON logs via Python logging; configurable level via LOG_LEVEL
- Metrics: expose Prometheus metrics (request latency, error count, embedding/LLM call latencies); see the instrumentation sketch below
- Tracing: add OpenTelemetry for distributed tracing through ingestion -> retrieval -> LLM calls
- Alerts: set up alerts for high error rates, high latency, or low recall (if you maintain a set of test queries)
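A minimal instrumentation sketch with prometheus_client is shown below; the metric names and the wrapper function are illustrative, not the project's existing instrumentation.

```python
# Illustrative metrics instrumentation using prometheus_client.
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUEST_LATENCY = Histogram("rag_request_latency_seconds", "End-to-end query latency")
LLM_ERRORS = Counter("rag_llm_errors_total", "Failed LLM calls")


def timed_query(agent, question: str, session_id: str):
    """Record latency for every query and count failures raised by the LLM layer."""
    start = time.perf_counter()
    try:
        return agent.query(question, session_id=session_id)
    except Exception:
        LLM_ERRORS.inc()
        raise
    finally:
        REQUEST_LATENCY.observe(time.perf_counter() - start)


# Expose /metrics on a separate port for Prometheus to scrape.
start_http_server(9100)
```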
Security & privacy
- Do not log sensitive user data; sanitize before logging.
- If storing user conversations, provide a retention policy and opt-out mechanisms.
- Secure APIs with authentication (API keys, OAuth); a minimal API-key check is sketched below.
- Network: use TLS in production; limit access to vector store endpoints.
- Rate limit LLM/API calls to avoid cost spikes.
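For the API-key option, a minimal FastAPI dependency could look like the sketch below; the header name, environment variable, and route are assumptions for this example.

```python
# Illustrative API-key check for a FastAPI endpoint.
import os

from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import APIKeyHeader

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)


def require_api_key(api_key: str = Depends(api_key_header)) -> None:
    """Reject requests whose key does not match the server-side secret."""
    if not api_key or api_key != os.environ.get("SERVICE_API_KEY"):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key")


@app.post("/query", dependencies=[Depends(require_api_key)])
def query_endpoint(payload: dict):
    # Hand off to the RAG agent here; sanitize anything you log.
    return {"answer": "..."}
```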
Performance & scaling tips
- Cache embeddings for repeated queries or common documents (see the caching sketch below).
- Use an approximate nearest neighbor (ANN) index such as HNSW or IVF in FAISS.
- Use batching for embedding requests to reduce API overhead.
- Use streaming responses (if your LLM supports them) to reduce perceived latency.
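A simple in-process embedding cache might look like the sketch below. It is keyed by a content hash and takes the embedding function as a parameter (for example, the embed_texts helper from the indexing sketch), so it is only a starting point before moving to Redis or disk-backed caching.

```python
# Illustrative embedding cache: avoid re-embedding identical texts across requests.
import hashlib
from typing import Callable

import numpy as np

_CACHE: dict[str, np.ndarray] = {}


def cached_embed(texts: list[str], embed_fn: Callable[[list[str]], np.ndarray]) -> np.ndarray:
    """Embed texts through embed_fn, skipping any text already cached by content hash."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    missing = [(k, t) for k, t in zip(keys, texts) if k not in _CACHE]
    if missing:
        fresh = embed_fn([t for _, t in missing])  # one batched provider call
        for (k, _), vector in zip(missing, fresh):
            _CACHE[k] = vector
    return np.array([_CACHE[k] for k in keys])
```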
Best practices for production
- Keep ingestion reproducible: store hashes & source metadata so you can rebuild the index deterministically (a manifest sketch follows this list).
- Version your vector store and model choices.
- Have a small set of evaluation queries for continuous monitoring of retrieval quality.
- Implement canary releases when changing LLMs or prompt templates.
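One way to make ingestion reproducible is a manifest that records a content hash per source plus the model and config used, as sketched below; the filename and fields are assumptions for this example.

```python
# Illustrative ingestion manifest: record content hashes and the embedding model so
# the index can be rebuilt (or invalidated) deterministically.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_manifest(source_dir: Path, manifest_path: Path, embedding_model: str) -> None:
    """Hash every source file and write a JSON manifest alongside the vector store."""
    entries = []
    for file in sorted(source_dir.rglob("*")):
        if file.is_file():
            digest = hashlib.sha256(file.read_bytes()).hexdigest()
            entries.append({"path": str(file), "sha256": digest})
    manifest = {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "embedding_model": embedding_model,
        "sources": entries,
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))
```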
Contributing

Contributions welcome! Please follow these steps:
- Fork the repo and create a branch: git checkout -b feat/my-feature
- Write tests for any new behavior.
- Run linting and tests: tox or pytest
- Submit a pull request describing the change.
Code of conduct: be respectful. See CODE_OF_CONDUCT.md (add one if the repository does not yet include it).
Recommended repository layout
- app/ # API server and agent orchestration
- scripts/ # Command-line ingestion / index / utils
- libs/ # core RAG logic: retriever, prompt templates, llm wrappers
- tests/ # unit and integration tests
- Dockerfile
- requirements.txt
- README.md
Troubleshooting
- If embeddings or LLM calls fail: check credentials and network access.
- If retrieval quality is poor: tune chunk size/overlap, increase top_k, or experiment with different embedding models.
- If performance is slow: profile embedding calls and vector queries, consider batching and index tuning.
License

This repository is provided under the MIT License. See LICENSE for details.
Acknowledgements & References
- LangChain: Patterns and chain design influenced by the LangChain ecosystem
- FAISS, Chroma, Milvus: popular vector stores
- OpenAI / Hugging Face: LLM and embeddings providers
Contact

Maintainer: Parrot90. For questions and help, open an issue in this repository.