A modular Retrieval-Augmented Generation (RAG) pipeline for document search, chunking, embedding, and conversational AI.
- Document Ingestion: Flexible ingestion for markdown, text, and more.
- Chunking: Smart document chunking for optimal retrieval.
- Embeddings: Integrates with state-of-the-art embedding models.
- Vector Search: Fast, scalable semantic search using ChromaDB.
- Conversational Engine: Chat interface for natural language queries.
- Extensible: Modular core for easy customization and extension.
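To give a feel for the chunking step, here is a minimal sketch of fixed-size chunking with character overlap. It is illustrative only: the actual logic lives in core/document_processor.py and may use smarter, structure-aware splitting; the function name and defaults here are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Overlap keeps context that straddles a chunk boundary retrievable
    from both neighboring chunks.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

For a 1000-character document with chunk_size=400 and overlap=100, this yields windows starting at offsets 0, 300, 600, and 900.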
ai-tech-rag/
├── assets/ # Visuals, custom commands, and static assets
├── chroma_db/ # ChromaDB vector database files
├── qwerty/ # Python virtual environment
├── rag-assistant/ # Main app and core modules
│ ├── app.py # Flask app entry point
│ ├── core/ # Core RAG logic (chat, vector, document)
│ ├── static/ # Frontend static files (CSS, JS)
│ └── templates/ # Jinja2 HTML templates
├── techcorp-docs/ # Example documents for ingestion
├── test_*.py # Unit and integration tests
├── requirements.txt # Python dependencies
└── README # This file
# 1. Clone the repo
git clone https://github.com/MansurPro/RAG-assistant.git
cd ai-tech-rag
# 2. Create and activate a virtual environment (optional)
python3 -m venv qwerty
source qwerty/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Run the app
python rag-assistant/app.py
- Ingest Documents:
  - Place your files in techcorp-docs/ or use ingest_documents.py.
- Search & Chat:
- Use the web UI at http://localhost:5000 to chat and search.
- Test the Pipeline:
  - Run pytest or use the provided test scripts.
- core/document_processor.py – Document loading, chunking, and preprocessing
- core/vector_engine.py – Embedding and vector search logic
- core/chat_engine.py – Conversational AI and retrieval logic
- app.py – Flask app and API endpoints
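To illustrate the retrieval step that core/vector_engine.py is responsible for, here is a stdlib-only sketch of ranking stored chunk embeddings by cosine similarity to a query embedding. In the real pipeline this is delegated to ChromaDB; the function names and index shape here are hypothetical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k chunks most similar to the query."""
    ranked = sorted(index, key=lambda cid: cosine_similarity(query, index[cid]),
                    reverse=True)
    return ranked[:k]
```

The retrieved chunk ids map back to document text, which the chat engine then feeds to the language model as context.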
pytest
# or run individual test files, e.g.
python test_chunking.py

Pull requests and issues are welcome! For major changes, please open an issue first to discuss what you would like to change.
MIT License. See LICENSE for details.

