Production-ready hybrid RAG (Retrieval-Augmented Generation) system with intelligent reranking, multi-backend vector database support, and comprehensive evaluation framework.
- Hybrid Search: Combines BM25/sparse and dense vector retrieval with configurable fusion methods (Alpha, RRF, relative scoring)
- Intelligent Reranking: Local BGE reranker (bge-reranker-v2-m3) with Cohere Rerank 3.5 fallback for optimal relevance scoring
- Query Enhancement: HyDE and RAG-Fusion techniques for improved recall and comprehensive coverage
- Multi-Backend Support: Weaviate (primary), Milvus 2.5+ (with sparse BM25), and Qdrant vector databases
- Comprehensive Evaluation: TruLens RAG Triad, Ragas v0.2+, offline local LLM judges, and embedding similarity metrics with A/B testing capabilities
- Production Ready: FastAPI service with health checks, Docker deployment, monitoring, and extensive configuration
Documents → Ingest → Vector DB (Hybrid Index) → Query → Retrieve (topK=50) →
Rerank (topN=8) → LLM Generate → Answer with Citations → Evaluate
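Expressed as a minimal, runnable Python sketch (every helper here is an illustrative stand-in, not askme's internal API), the flow looks like this:

```python
"""Toy end-to-end flow mirroring the diagram above; every stage is a stand-in."""
from typing import Dict, List


def hybrid_retrieve(query: str, topk: int = 50) -> List[Dict]:
    # Stand-in for hybrid (sparse BM25 + dense vector) retrieval from the vector DB.
    corpus = ["BM25 is a sparse ranking function.", "Dense vectors capture semantics."]
    return [{"id": i, "text": text} for i, text in enumerate(corpus)][:topk]


def rerank(query: str, candidates: List[Dict], topn: int = 8) -> List[Dict]:
    # Stand-in for cross-encoder reranking; a real reranker scores (query, passage) pairs.
    return candidates[:topn]


def generate(query: str, passages: List[Dict]) -> str:
    # Stand-in for LLM generation; a real call would cite the supporting passages.
    cited = ", ".join(f"[{p['id']}]" for p in passages)
    return f"Answer to {query!r}, grounded in passages {cited}"


query = "What is BM25?"
print(generate(query, rerank(query, hybrid_retrieve(query))))
```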
- Embeddings: BGE-M3 multilingual model (dense + sparse support)
- Vector Database: Weaviate (primary), Milvus 2.5+/Qdrant alternatives with hybrid search support
- Reranking: BAAI/bge-reranker-v2-m3 cross-encoder (local), Cohere Rerank 3.5 (cloud fallback)
- Framework: FastAPI with Python 3.10+, uvicorn ASGI server
- Evaluation: TruLens + Ragas with local embedding metrics, configurable LLM judges, and automated quality thresholds
- Generation: OpenAI-compatible, local Ollama, or template-based approaches
- Python 3.10 or higher
- uv package manager (recommended) or pip
- Docker and Docker Compose (for vector database and full deployment)
# Clone the repository
git clone https://github.com/deadjoe/askme.git
cd askme
# Install dependencies
uv sync --dev
# Start vector database (Weaviate)
docker compose -f docker/docker-compose.yaml --profile weaviate up -d weaviate
# Start API server (development mode)
ASKME_SKIP_HEAVY_INIT=1 uv run uvicorn askme.api.main:app --port 8080 --reload
# Ingest documents
./scripts/ingest.sh /path/to/documents --tags="project,documentation"
# Ask questions
./scripts/answer.sh "What is machine learning?"
# Retrieve documents only (for debugging/tuning)
./scripts/retrieve.sh "hybrid search techniques" --topk=100 --alpha=0.7
# Quick end-to-end smoke test (requires curl + jq)
./query_test.sh "Who created BM25?"
# Run evaluation
./scripts/evaluate.sh --suite=baseline
The `./scripts/answer.sh` script supports extensive parameter tuning:
| Parameter | Default | Range/Options | Description | Impact |
|---|---|---|---|---|
| `--topk=N` | 100 | 1-100 | Number of initial retrieval candidates | Higher = better recall, slower response |
| `--alpha=X` | 0.5 | 0.0-1.0 | Hybrid search weight (0=sparse, 1=dense) | 0.0=keyword matching, 1.0=semantic similarity |
| `--rrf` / `--no-rrf` | `--rrf` | boolean | Use RRF vs alpha fusion | RRF=stable ranking, alpha=direct weighting |
| `--rrf-k=N` | 60 | 1-200 | RRF fusion smoothing parameter | Lower=aggressive reranking, higher=conservative |
| `--reranker=TYPE` | `bge_local` | `bge_local`, `cohere` | Reranking model selection | Local=private, Cohere=higher quality |
| `--max-passages=N` | 8 | 1-20 | Final passages for LLM generation | More=richer context, risk of attention dilution |
| `--hyde` | disabled | boolean | Enable HyDE query expansion | Better for abstract/conceptual queries |
| `--rag-fusion` | disabled | boolean | Multi-query generation and fusion | Better coverage for complex questions |
| `--debug` | disabled | boolean | Include retrieval debug information | Shows timing and score details |
| `--format=FORMAT` | `text` | `text`, `json`, `markdown` | Output format selection | Choose based on consumption needs |
| `--verbose` | disabled | boolean | Enable verbose logging | Detailed execution information |
| `--api-url=URL` | `localhost:8080` | URL | Target API base URL | Override for remote deployments |
# Dense semantic search for conceptual queries
./scripts/answer.sh "Explain machine learning principles" --alpha=0.8 --max-passages=12
# Sparse keyword search for specific terms
./scripts/answer.sh "BGE-M3 model architecture" --alpha=0.2 --topk=80
# Enhanced query with expansion techniques
./scripts/answer.sh "How does attention mechanism work?" --hyde --rag-fusion --format=markdown
# Debug mode with detailed retrieval information
./scripts/answer.sh "Vector database comparison" --debug --verbose --format=json
# High-quality reranking with cloud fallback
./scripts/answer.sh "Production RAG best practices" --reranker=cohere --max-passages=10
Configuration is managed through `configs/askme.yaml`, with environment variable overrides. Key parameters:
# Vector backend selection
vector_backend: weaviate # weaviate | milvus | qdrant
# Hybrid search configuration
hybrid:
mode: rrf # rrf | alpha | relative_score | ranked
alpha: 0.5 # 0=sparse only, 1=dense only, 0.5=balanced
rrf_k: 60 # RRF fusion parameter
topk: 50 # Initial retrieval candidates
# Embedding model
embedding:
model: BAAI/bge-m3
dimension: 1024
normalize_embeddings: true
# Reranking
rerank:
local_model: BAAI/bge-reranker-v2-m3
local_enabled: true
cohere_enabled: false # Enable via ASKME_ENABLE_COHERE=1
top_n: 8
# Generation
generation:
provider: ollama # simple | ollama | openai
ollama_model: gpt-oss:20b
ollama_endpoint: http://localhost:11434
openai_model: gpt-4o-mini
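For intuition on what the reranking stage configured above does, here is a minimal cross-encoder sketch; it assumes `BAAI/bge-reranker-v2-m3` loads through `sentence-transformers` and skips the batching and fallback logic the service itself needs:

```python
from sentence_transformers import CrossEncoder

# Assumption: the model is fetched from the Hugging Face hub on first use.
reranker = CrossEncoder("BAAI/bge-reranker-v2-m3", max_length=512)

query = "What is hybrid search?"
passages = [
    "Hybrid search combines sparse keyword retrieval with dense vector retrieval.",
    "The weather today is sunny with light winds.",
]

# Higher score = more relevant; keep only the top_n passages for generation.
scores = reranker.predict([(query, passage) for passage in passages])
top = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)[:8]
for passage, score in top:
    print(f"{score:.3f}  {passage}")
```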
- `ASKME_SKIP_HEAVY_INIT=1` - Skip heavy service initialization during development
- `ASKME_API_URL` / `ASKME_API_KEY` - Target API base URL and optional auth for CLI scripts
- `ASKME_ENABLE_COHERE=1` + `COHERE_API_KEY` - Enable Cohere reranking
- `ASKME_ENABLE_OLLAMA=1` - Enable local Ollama generation
- `ASKME_RAGAS_LLM_MODEL` - Override the local LLM judge used for evaluations (default `gpt-oss:20b`)
- `ASKME_RAGAS_EMBED_MODEL` - Override the embedding model for Ragas metrics (default `BAAI/bge-m3`)
- `ASKME_TRULENS_LLM_MODEL` - Override the TruLens evaluation model (falls back to `ASKME_RAGAS_LLM_MODEL`)
- `OPENAI_BASE_URL` / `OPENAI_API_KEY` - OpenAI-compatible endpoint and key for evaluation
- `GET /health/` - Basic health check
- `GET /health/ready` - Readiness check for orchestration
- `GET /health/live` - Liveness check for orchestration

- `POST /ingest/` - Universal document ingestion (file/directory)
- `POST /ingest/file` - Single file ingestion
- `POST /ingest/directory` - Directory ingestion with recursion
- `GET /ingest/status/{task_id}` - Task status monitoring (see the client sketch below)
- `GET /ingest/stats` - Global ingestion statistics

- `POST /query/` - Hybrid search + reranking + generation pipeline
- `POST /query/retrieve` - Retrieval-only endpoint for debugging
- `GET /query/similar/{doc_id}` - Similar document discovery
- `POST /query/explain` - Retrieval explanation (debugging)

- `POST /eval/run` - Execute evaluation pipeline with TruLens + Ragas
- `GET /eval/runs/{run_id}` - Retrieve evaluation results
- `POST /eval/compare` - A/B test comparison between runs
- `GET /eval/runs` - List recent evaluation runs
- `DELETE /eval/runs/{run_id}` - Delete evaluation run
- `GET /eval/metrics` - Available evaluation metrics
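A small Python client sketch for the ingestion endpoints: the request body fields and the `task_id`/`status` response fields are assumptions inferred from `scripts/ingest.sh`, so check the FastAPI schema at `/docs` for the authoritative shapes.

```python
import time

import requests

BASE = "http://localhost:8080"

# Assumed request body; mirrors what scripts/ingest.sh passes (a path plus tags).
resp = requests.post(
    f"{BASE}/ingest/",
    json={"path": "/path/to/documents", "tags": ["project", "documentation"]},
    timeout=60,
)
resp.raise_for_status()
task_id = resp.json().get("task_id")  # response field name is an assumption

# Poll the documented status endpoint until the task reports a terminal state.
for _ in range(30):
    status = requests.get(f"{BASE}/ingest/status/{task_id}", timeout=30).json()
    print(status)
    if status.get("status") in {"completed", "failed"}:  # status values are assumptions
        break
    time.sleep(2)

print(requests.get(f"{BASE}/ingest/stats", timeout=30).json())
```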
- `./scripts/evaluate.sh` provides a unified CLI for starting suites, overriding retrieval parameters (`--alpha`, `--topk`, `--topn`), and choosing output formats (`text`, `json`, `table`).
- Embedding similarity metrics run locally when `embedding_service` is available, adding groundedness, context precision/recall, and answer relevance without leaving the host (see the sketch after this list).
- Local LLM judge metrics (faithfulness, answer relevancy, context precision/recall) default to Ollama via the OpenAI-compatible API; override with `ASKME_RAGAS_LLM_MODEL`, `OPENAI_BASE_URL`, or standard OpenAI keys as needed.
- TruLens metrics automatically fall back to LiteLLM/OpenAI providers and respect `ASKME_TRULENS_LLM_MODEL`, enabling fully offline evaluation when paired with Ollama.
- Use `ASKME_RAGAS_EMBED_MODEL` to swap in alternative embedding backends for evaluation-only workloads without touching production retrieval settings.
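As a rough illustration of what the local embedding-similarity metrics measure, here is a cosine-similarity groundedness proxy; it assumes `BAAI/bge-m3` loads through `sentence-transformers` and is far simpler than the metrics askme actually reports:

```python
from sentence_transformers import SentenceTransformer

# Assumption: dense-only BGE-M3 embeddings are sufficient for this illustration.
model = SentenceTransformer("BAAI/bge-m3")

answer = "BM25 is a sparse ranking function based on term frequency and document length."
contexts = [
    "BM25 scores documents using term frequency, inverse document frequency, and length normalization.",
    "Dense retrieval encodes text into vectors and ranks by cosine similarity.",
]

vectors = model.encode([answer] + contexts, normalize_embeddings=True)
answer_vec, context_vecs = vectors[0], vectors[1:]

# Groundedness proxy: how close the answer is to its best-supporting retrieved context.
similarities = context_vecs @ answer_vec
print({"max_context_similarity": float(similarities.max())})
```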
curl -X POST "http://localhost:8080/query/" \
-H "Content-Type: application/json" \
-d '{
"q": "What is machine learning?",
"topk": 50,
"alpha": 0.5,
"use_rrf": true,
"reranker": "bge_local",
"max_passages": 8,
"include_debug": true
}'
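The same request from Python: the request fields mirror the curl example above, while the response field names (`answer`, `citations`) are assumptions, so adapt them to the actual schema.

```python
import requests

payload = {
    "q": "What is machine learning?",
    "topk": 50,
    "alpha": 0.5,
    "use_rrf": True,
    "reranker": "bge_local",
    "max_passages": 8,
    "include_debug": True,
}

resp = requests.post("http://localhost:8080/query/", json=payload, timeout=60)
resp.raise_for_status()
data = resp.json()

# Field names below are assumptions; print the full payload if they differ.
print(data.get("answer"))
for citation in data.get("citations", []):
    print(citation)
```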
# Full stack with the default Milvus backend
docker compose -f docker/docker-compose.yaml up -d
# Alternative vector databases
docker compose -f docker/docker-compose.yaml --profile weaviate up -d
docker compose -f docker/docker-compose.yaml --profile qdrant up -d
# With monitoring
docker compose -f docker/docker-compose.yaml --profile monitoring up -d
- Performance Targets: P95 < 1500ms retrieval, < 1800ms with reranking
- Scaling: ~50k documents per node, horizontal scaling via load balancing
- Security: Local-only by default, cloud services require explicit opt-in
- Monitoring: Prometheus metrics and Grafana dashboards included
# Install development dependencies
uv sync --dev
# Install pre-commit hooks
uv run pre-commit install
# Run tests with coverage
uv run pytest --cov=askme --cov-report=term --cov-report=html
# Code formatting and type checking
uv run black askme tests && uv run isort askme tests
uv run mypy askme
The project includes comprehensive test coverage with pytest:
# Run all tests
uv run pytest -ra
# Run specific test categories
uv run pytest -m "unit and not slow"
uv run pytest -m integration
uv run pytest -m "slow"
# Run with coverage reporting
uv run pytest --cov=askme --cov-report=html
Test Markers (see the example below):
- `unit`: Unit tests for individual components
- `integration`: Cross-component integration tests
- `slow`: Time-intensive tests (model loading, evaluation)
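A minimal sketch of how these markers are applied in a test module (a hypothetical test, not one from the repository):

```python
import pytest


@pytest.mark.unit
def test_rrf_contributions_decrease_with_rank() -> None:
    # Hypothetical unit test: lower ranks should always yield higher RRF contributions.
    contributions = [1.0 / (60 + rank) for rank in range(1, 6)]
    assert contributions == sorted(contributions, reverse=True)


@pytest.mark.integration
@pytest.mark.slow
def test_end_to_end_query_pipeline() -> None:
    # Hypothetical slow integration test that would exercise retrieval + reranking.
    pytest.skip("requires a running vector database")
```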
The project maintains high code quality standards:
- Formatting: Black and isort for consistent code style
- Type Checking: MyPy with strict configuration
- Linting: Flake8 with Black-compatible settings
- Security: Bandit security analysis
- Pre-commit Hooks: Automated quality checks on commit
# Comprehensive baseline evaluation
./scripts/evaluate.sh --suite=baseline
# Quick evaluation for CI/CD
./scripts/evaluate.sh --suite=quick --sample-size=3
# Custom dataset evaluation
./scripts/evaluate.sh --dataset="/path/to/qa_dataset.jsonl" --metrics="faithfulness,context_precision"
# Parameter tuning evaluation
./scripts/evaluate.sh --suite=baseline --alpha=0.3 --topk=75 --topn=10
TruLens RAG Triad:
- Context Relevance: How relevant retrieved context is to the query
- Groundedness: How well the answer is supported by retrieved context
- Answer Relevance: How relevant the answer is to the original query
Ragas Metrics:
- Faithfulness: Factual consistency of answer with retrieved context
- Answer Relevancy: Semantic relevance of answer to query
- Context Precision: Precision of retrieved context chunks
- Context Recall: Recall of relevant context chunks
- TruLens Triad: ≥ 0.7 (configurable)
- Ragas Faithfulness: ≥ 0.7 (configurable)
- Context Precision: ≥ 0.6 (configurable)
- Answer Consistency: ≥ 90% with fixed seeds
We welcome contributions to the askme project! Please follow these guidelines:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes following the code style guidelines
- Add or update tests for your changes
- Ensure all tests pass and code quality checks succeed
- Update documentation as needed
- Submit a pull request with a clear description
- Language: All code, documentation, and commit messages must be in English
- Formatting: Use Black (line length 88) and isort for imports
- Type Hints: All functions must include proper type annotations
- Documentation: Follow Google-style docstrings
- Testing: Maintain or improve test coverage (currently ~88%)
Before submitting a PR, ensure:
# All tests pass
uv run pytest
# Code formatting
uv run black askme tests && uv run isort askme tests
# Type checking
uv run mypy askme
# Security check
uv run bandit -r askme
# Basic evaluation passes
./scripts/evaluate.sh --suite=quick
- Milvus Container Startup Issues
  - If Milvus fails to start with port binding errors, try using Weaviate instead:
    - `docker run --name weaviate -p 8081:8080 -p 8082:50051 -d cr.weaviate.io/semitechnologies/weaviate:1.24.1 --host 0.0.0.0 --port 8080 --scheme http`
  - Update `configs/askme.yaml` to use `vector_backend: weaviate`
  - Ensure both HTTP (8081) and gRPC (8082) ports are exposed for Weaviate

- Script Command Syntax
  - Use the `--param=value` format for script parameters:
    - Correct: `./scripts/retrieve.sh "query" --topk=25 --alpha=0.7`
    - Incorrect: `./scripts/retrieve.sh "query" --topk 25 --alpha 0.7`

- Slow Retrieval Performance
  - Check hybrid search parameters (alpha, RRF vs alpha fusion)
  - Verify vector database connection and indexing
  - Monitor embedding service latency

- Poor Reranking Quality
  - Ensure the local BGE reranker model is properly loaded
  - Check Cohere API key and fallback configuration
  - Verify reranking score thresholds

- Memory Issues
  - Adjust batch sizes in `configs/askme.yaml`
  - Use `ASKME_SKIP_HEAVY_INIT=1` for development
  - Monitor model memory usage (BGE-M3 + reranker)

- Evaluation Failures
  - Check TruLens and Ragas library versions
  - Verify the evaluation dataset format (JSONL with required fields; see the sketch after this list)
  - Ensure OpenAI-compatible API access for the evaluation LLM
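If the dataset format is in doubt, the following sketch shows one way to write a JSONL evaluation file; the field names (`question`, `ground_truth`, `contexts`) are assumptions based on common RAG evaluation datasets, so align them with what `scripts/evaluate.sh` actually expects.

```python
import json

# Assumed field names; adjust to match the fields the evaluation suite requires.
examples = [
    {
        "question": "What is BM25?",
        "ground_truth": "BM25 is a sparse ranking function used for keyword retrieval.",
        "contexts": ["BM25 ranks documents by term frequency and length normalization."],
    },
]

with open("qa_dataset.jsonl", "w", encoding="utf-8") as fh:
    for example in examples:
        fh.write(json.dumps(example, ensure_ascii=False) + "\n")
```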
- Submit issues via GitHub Issues
- Review CLAUDE.md for development context
- Check the codebase for implementation details and examples
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- BAAI for BGE-M3 embeddings and reranker models
- Milvus for hybrid search capabilities with sparse BM25 support
- TruLens and Ragas for evaluation frameworks
- FastAPI for the high-performance web framework