
RAG AI Agent in Python

Production-Ready Retrieval-Augmented Generation (RAG) AI Agent in Python

This repository contains a production-oriented reference implementation of a Retrieval-Augmented Generation (RAG) AI agent built in Python. It is designed as a practical starting point for building, deploying, and operating RAG-powered assistants that combine vector search over your knowledge sources with a large language model (LLM) for accurate, up-to-date responses.

Key goals:

  • Modular, maintainable architecture suitable for production.
  • Clear examples for ingesting data, building a vector store, and serving a queryable RAG agent.
  • Best-practice considerations for configuration, logging, testing, and deployment.

Table of contents

  • Features
  • Architecture overview
  • Quickstart
  • Installation
  • Configuration
  • Typical workflows
    • Ingesting documents
    • Building/using a vector store
    • Querying the agent
  • Deployment
  • Testing & CI
  • Observability & Monitoring
  • Security & privacy
  • Performance & scaling tips
  • Best practices for production
  • Contributing
  • Recommended repository layout
  • Troubleshooting
  • License
  • Acknowledgements & References
  • Contact

Features

  • Document ingestion pipeline (file system, PDFs, text, HTML — extendable)
  • Embeddings + vector store (FAISS / Chroma / Milvus / Weaviate friendly)
  • LLM orchestration layer for RAG responses (plug in any LLM: OpenAI, Hugging Face, or a local model)
  • Conversation memory/session management
  • API server example (FastAPI) for production endpoints
  • Dockerfile and container-friendly configuration
  • Example scripts for common tasks (ingest, index, query)
  • Tests and example CI workflow

Architecture overview

  1. Data ingestion: preprocess raw docs -> text chunks -> metadata
  2. Embeddings: convert text chunks to vectors via an embeddings model
  3. Vector store: store and query embeddings (approximate nearest neighbors)
  4. RAG agent: retrieve context, construct prompts, call the LLM, and optionally perform chain-of-thought / tool use (sketched just after this list)
  5. API & orchestration: expose endpoints, session handling, rate-limiting
  6. Observability: logging, request tracing, metrics for latency / error rates
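
As a rough illustration of steps 3 and 4, the retrieve-then-generate core can be as small as the sketch below. The retriever and llm objects are stand-ins for whatever vector store and model client you wire in; their method names are assumptions, not APIs defined by this repository.

def answer(query: str, retriever, llm, top_k: int = 4) -> str:
    # Step 3: pull the most relevant chunks for the query (retriever.search is a stand-in)
    chunks = retriever.search(query, top_k=top_k)
    context = "\n\n".join(chunk["text"] for chunk in chunks)

    # Step 4: combine a system/instruction template with the retrieved context
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return llm.generate(prompt)  # llm.generate is likewise a stand-in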

Quickstart (fast path)

Prerequisites

  • Python 3.10+
  • Git
  • Optional: Docker for containerized deployment
  • LLM/embedding credentials (e.g., OPENAI_API_KEY) or local model runtime

Clone and install

git clone https://github.com/Parrot90/Production_Ready_RAG_AI_Agent_in_Python.git
cd Production_Ready_RAG_AI_Agent_in_Python
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Ingest your documents (example)

python scripts/ingest.py --source ./data/docs --output ./data/chunks

Build or update vector index

python scripts/index.py --chunks ./data/chunks --store ./data/vectorstore

Run the API server

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
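
For orientation, a minimal app/main.py behind the command above might look roughly like the sketch below. The /query route, request model, and stubbed response are illustrative assumptions; the real application would call the RAG agent instead of echoing the question.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="RAG AI Agent")

class QueryRequest(BaseModel):
    question: str
    session_id: str | None = None

@app.post("/query")
def query(req: QueryRequest) -> dict:
    # Stubbed answer; in the real app this is where retrieval + generation happen.
    return {"answer": f"(stub) you asked: {req.question}", "session_id": req.session_id}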

Installation (detailed)

  1. Create a virtual environment: python -m venv .venv, then activate it: source .venv/bin/activate
  2. Install dependencies: pip install -r requirements.txt
  3. Optional: install system packages required by some vector stores (e.g., FAISS).

Configuration

The project uses environment variables and a small config loader (config.py); a sketch of that loader follows the option list below. Typical config options:

  • OPENAI_API_KEY or other provider credentials
  • EMBEDDING_MODEL (provider/model for embeddings)
  • LLM_MODEL (model for generation)
  • VECTOR_STORE (faiss | chroma | milvus | weaviate)
  • PERSIST_DIRECTORY (where to persist vector stores / metadata)
  • MAX_CONTEXT_TOKENS, CHUNK_SIZE, CHUNK_OVERLAP, etc.
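
A minimal sketch of what that config loader could look like, assuming a plain dataclass over environment variables (the default model names and paths are illustrative, not the project's actual defaults):

import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
    embedding_model: str = os.getenv("EMBEDDING_MODEL", "text-embedding-3-small")
    llm_model: str = os.getenv("LLM_MODEL", "gpt-4o-mini")
    vector_store: str = os.getenv("VECTOR_STORE", "faiss")
    persist_directory: str = os.getenv("PERSIST_DIRECTORY", "./data/vectorstore")
    chunk_size: int = int(os.getenv("CHUNK_SIZE", "1000"))
    chunk_overlap: int = int(os.getenv("CHUNK_OVERLAP", "200"))
    max_context_tokens: int = int(os.getenv("MAX_CONTEXT_TOKENS", "4000"))

settings = Settings()  # import this object wherever configuration is needed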

Typical workflows

  1. Ingesting documents
  • Purpose: convert raw documents into cleaned text chunks with metadata (filename, source, modified time).
  • Steps:
    • Detect file type (txt, md, pdf, html, docx)
    • Extract text
    • Split into chunks (size and overlap tuned for your LLM context window; see the chunking sketch after this list)
    • Save chunk JSON/CSV ready for embedding
  2. Building/using a vector store
  • Create embeddings for chunks using your chosen embedder
  • Save vectors & metadata to a vector index (FAISS/Chroma)
  • For production, consider a database-backed vector store (Milvus/Weaviate) for persistence, scaling, and multi-node support
  3. Querying the agent (RAG flow)
  • Receive a user query via API
  • Embed the query and retrieve the top-N relevant chunks
  • Construct a prompt combining the retrieved context and a system/instruction template
  • Call the LLM for generation
  • Optionally store the conversational turn and any analytics/metrics
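
The chunking step in workflow 1 can be as simple as a sliding window over the extracted text. Sizes here are character counts and purely illustrative; in practice you would usually split on sentence or token boundaries, but the output shape (text plus positional metadata) stays the same.

def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[dict]:
    """Split text into overlapping chunks with enough metadata to trace provenance."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append({"text": piece, "start": start, "end": start + len(piece)})
    return chunks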

Example usage snippet (pseudo)

from rag_agent import RAGAgent

agent = RAGAgent(config)
response = agent.query("How do I rotate a vector in 3D?", session_id="user-123")
print(response.answer)

Deployment

  • Containerization: Dockerfile provided. Build and push image to your registry.
  • Orchestration: Kubernetes manifests are not included by default, but you can use a simple Deployment + HPA, Service, and Ingress.
  • Consider autoscaling, GPU-equipped nodes for local LLMs, or managed hosting for third-party LLM APIs.
  • Secrets management: use your cloud provider or HashiCorp Vault for keys.
  • Persistent volumes: for vector store persistence (if using self-hosted FAISS/Chroma).

Testing & CI

  • Unit tests: pytest
  • Integration tests: small sample document set + local vector store + mocked LLM responses (see the example after this list)
  • Example GitHub Actions workflow (ci.yml) may include:
    • linting (flake8/ruff)
    • unit tests
    • build and push Docker image on release
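
The mocked-LLM pattern can be shown against the answer() sketch from the architecture overview; the libs.rag import path is a placeholder for wherever that logic lives in your tree.

# tests/test_rag_flow.py (illustrative)
class FakeRetriever:
    def search(self, query: str, top_k: int = 4):
        return [{"text": "Paris is the capital of France."}]

class FakeLLM:
    def generate(self, prompt: str) -> str:
        # Check that the retrieved context was actually injected into the prompt.
        assert "Paris is the capital of France." in prompt
        return "Paris"

def test_answer_uses_retrieved_context():
    from libs.rag import answer  # placeholder path; adjust to your layout
    result = answer("What is the capital of France?", FakeRetriever(), FakeLLM())
    assert result == "Paris"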

Observability & Monitoring

  • Logging: structured JSON logs via Python logging; configurable level via LOG_LEVEL (see the sketch after this list)
  • Metrics: expose Prometheus metrics (request latency, error count, embedding/LLM call latencies)
  • Tracing: add OpenTelemetry for distributed tracing across ingestion -> retrieval -> LLM calls
  • Alerts: set up alerts for high error rates, high latency, and low recall (if you have test queries)
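
One minimal way to wire up the logging and metrics points above, assuming prometheus_client is installed (the metric names and port are illustrative; swap the format string for a real JSON formatter such as python-json-logger in production):

import logging
import os
from prometheus_client import Counter, Histogram, start_http_server

logging.basicConfig(
    level=os.getenv("LOG_LEVEL", "INFO"),
    format='{"time": "%(asctime)s", "level": "%(levelname)s", "msg": "%(message)s"}',
)
log = logging.getLogger("rag_agent")

QUERIES = Counter("rag_queries_total", "Total RAG queries served")
LLM_LATENCY = Histogram("rag_llm_latency_seconds", "Latency of LLM calls")

start_http_server(9100)  # Prometheus scrapes :9100/metrics

# Usage inside the request handler:
#   QUERIES.inc()
#   with LLM_LATENCY.time():
#       response = llm.generate(prompt)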

Security & privacy

  • Do not log sensitive user data; sanitize before logging.
  • If storing user conversations, provide a retention policy and opt-out mechanisms.
  • Secure APIs with authentication (API keys, OAuth).
  • Network: use TLS in production; limit access to vector store endpoints.
  • Rate limit LLM/API calls to avoid cost spikes.

Performance & scaling tips

  • Cache embeddings for repeated queries or common documents (see the sketch after this list).
  • Use an approximate nearest neighbor (ANN) index such as HNSW or IVF with FAISS.
  • Batch embedding requests to reduce per-call API overhead.
  • Use streaming responses (if your LLM supports them) to reduce perceived latency.
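
The caching and batching tips can be combined in one small helper; embed_fn stands in for whatever embeddings client you use and is assumed to accept a list of strings and return one vector per string.

from hashlib import sha256

_cache: dict[str, list[float]] = {}

def embed_with_cache(texts: list[str], embed_fn, batch_size: int = 64) -> list[list[float]]:
    """Embed texts, reusing cached vectors and batching the remaining calls."""
    keys = [sha256(t.encode()).hexdigest() for t in texts]
    missing = [(k, t) for k, t in zip(keys, texts) if k not in _cache]
    for i in range(0, len(missing), batch_size):
        batch = missing[i:i + batch_size]
        vectors = embed_fn([t for _, t in batch])  # one request per batch
        for (k, _), vec in zip(batch, vectors):
            _cache[k] = vec
    return [_cache[k] for k in keys]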

Best practices for production

  • Keep ingestion reproducible: store content hashes and source metadata so you can rebuild the index deterministically (see the sketch after this list).
  • Version your vector store and model choices.
  • Have a small set of evaluation queries for continuous monitoring of retrieval quality.
  • Implement canary releases when changing LLMs or prompt templates.
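
To make the first point concrete, recording a content hash per source file is enough to detect changes and rebuild an index deterministically. The manifest layout below is just one possible shape.

import hashlib
import json
from pathlib import Path

def build_manifest(source_dir: str) -> dict[str, dict]:
    """Map each source file to its SHA-256 hash and size for reproducible re-indexing."""
    manifest = {}
    for path in sorted(Path(source_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path)] = {"sha256": digest, "bytes": path.stat().st_size}
    return manifest

# Persist the manifest next to the vector store so an index can be traced to exact inputs:
# Path("data/vectorstore/manifest.json").write_text(json.dumps(build_manifest("data/docs"), indent=2))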

Contributing

Contributions welcome! Please follow these steps:

  1. Fork the repo and create a branch: git checkout -b feat/my-feature
  2. Write tests for any new behavior.
  3. Run linting and tests: tox or pytest
  4. Submit a pull request describing the change.

Code of conduct: Be respectful. See CODE_OF_CONDUCT.md (add if needed).

Recommended repository layout

  • app/ # API server and agent orchestration
  • scripts/ # Command-line ingestion / index / utils
  • libs/ # core RAG logic: retriever, prompt templates, llm wrappers
  • tests/ # unit and integration tests
  • Dockerfile
  • requirements.txt
  • README.md

Troubleshooting

  • If embeddings or LLM calls fail: check credentials and network access.
  • If retrieval quality is poor: tune chunk size/overlap, increase top_k, experiment with different embeddings.
  • If performance is slow: profile embedding calls and vector queries, consider batching and index tuning.

License

This repository is provided under the MIT License. See LICENSE for details.

Acknowledgements & references

  • LangChain: Patterns and chain design influenced by the LangChain ecosystem
  • FAISS, Chroma, Milvus: popular vector stores
  • OpenAI / Hugging Face: LLM and embeddings providers

Contact

Maintainer: Parrot90. For questions and help, open an issue in this repository.


