Production-grade semantic search system for internal documents using embeddings, pgvector, and Retrieval-Augmented Generation.
```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    React     │─────▶│   FastAPI    │─────▶│  PostgreSQL  │
│   Frontend   │      │   REST API   │      │  + pgvector  │
└──────────────┘      └──────┬───────┘      └──────────────┘
                             │
                     ┌───────┴───────┐
                     │               │
               ┌─────▼─────┐   ┌─────▼─────┐
               │ Embedding │   │    RAG    │
               │  Service  │   │ Generator │
               │  (MiniLM) │   │(Flan-T5-L)│
               └─────┬─────┘   └─────┬─────┘
                     │               │
               ┌─────▼─────┐   ┌─────▼─────┐
               │   Redis   │   │  MLflow   │
               │   Cache   │   │ Tracking  │
               └───────────┘   └───────────┘
```
| Component | Technology |
|---|---|
| API | FastAPI (async) |
| Database | PostgreSQL + pgvector |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| LLM | google/flan-t5-large |
| Experiment Tracking | MLflow |
| Cache | Redis |
| Auth | JWT (python-jose + passlib) |
| Frontend | React + Vite |
| Containerization | Docker + Docker Compose |
| Orchestration | Kubernetes |
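Retrieval is plain pgvector similarity search over chunk embeddings. As a rough sketch of the query shape (not the app's actual code, which goes through SQLAlchemy async sessions; the table name, column names, and DSN here are assumptions):

```python
# Hypothetical cosine-similarity lookup with pgvector's `<=>` operator.
# Assumes a `chunks` table with a vector(384) `embedding` column and a
# local DSN; the real schema lives in app/models/ and may differ.
import asyncio
import asyncpg

async def search_chunks(query_embedding: list[float], top_k: int = 5):
    conn = await asyncpg.connect("postgresql://postgres:postgres@localhost:5432/semantic_search")
    try:
        vec = "[" + ",".join(map(str, query_embedding)) + "]"  # pgvector text format
        return await conn.fetch(
            """
            SELECT content, 1 - (embedding <=> $1::vector) AS similarity
            FROM chunks
            ORDER BY embedding <=> $1::vector
            LIMIT $2
            """,
            vec,
            top_k,
        )
    finally:
        await conn.close()

# Example: asyncio.run(search_chunks([0.0] * 384, top_k=3))
```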
- Python 3.11+ (tested on 3.13)
- Docker & Docker Compose
- Node.js 18+ (for frontend)
```bash
docker compose -f infra/docker/docker-compose.yml up -d postgres redis mlflow
```

This starts:
- PostgreSQL 16 with pgvector extension on port 5432
- Redis 7 on port 6379
- MLflow tracking server on port 5000
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

The first request will download the ML models (~90MB for embeddings, ~3GB for flan-t5-large); subsequent starts use the cached models.
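To avoid the first-request stall, you can optionally pre-download both models with a short warm-up script (model names are the defaults from the configuration table below):

```python
# Warm-up sketch: pull both models into the local Hugging Face cache so
# the first API request doesn't block on a ~3GB download.
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
AutoTokenizer.from_pretrained("google/flan-t5-large")
AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
```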
```bash
cd frontend
npm install
npm run dev
```

The dev server opens at http://localhost:3000 with an API proxy to port 8000.
First grab a token, then upload:
```bash
TOKEN=$(curl -s -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"admin"}' | python -c "import sys,json; print(json.load(sys.stdin)['access_token'])")

curl -X POST http://localhost:8000/upload -H "Authorization: Bearer $TOKEN" -F "file=@data/sample_ml_basics.txt"
curl -X POST http://localhost:8000/upload -H "Authorization: Bearer $TOKEN" -F "file=@data/sample_kubernetes.txt"
```

Alternatively, build and run the full stack with Docker:

```bash
docker compose -f infra/docker/docker-compose.yml up --build
```

Check service health:

```bash
curl http://localhost:8000/health
# {"status":"healthy","version":"1.0.0","database":"connected"}
```

Obtain a token:

```bash
curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "admin"}'
# {"access_token":"eyJ...","token_type":"bearer"}
```

Upload a document:

```bash
curl -X POST http://localhost:8000/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@data/sample_ml_basics.txt"
# {"document_id":"...","filename":"sample_ml_basics.txt","chunk_count":3,"message":"Document indexed successfully with 3 chunks."}
```

Supported formats: PDF, TXT, DOCX. Max file size: 50MB.
List indexed documents:

```bash
curl http://localhost:8000/documents \
  -H "Authorization: Bearer $TOKEN"
```

Returns all indexed documents, paginated via the `skip` and `limit` query params.
Delete a document:

```bash
curl -X DELETE http://localhost:8000/documents/{document_id} \
  -H "Authorization: Bearer $TOKEN"
```

Deletes the document and all of its chunks.
Run a search:

```bash
curl -X POST http://localhost:8000/search \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is RAG?", "top_k": 3}'
```

The response includes:

- `generated_answer`: AI-generated answer grounded in the retrieved context
- `retrieved_chunks`: top-k source chunks with similarity scores
- `latency_ms`: end-to-end latency
- `model_info`: embedding model, LLM, and top-k used
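For scripting, the same flow works from Python with httpx (already installed for the test suite); a minimal sketch mirroring the curl calls above:

```python
# End-to-end client sketch: authenticate, search, print the answer.
# Endpoints and payloads mirror the curl examples; exact chunk fields
# depend on the API schema.
import httpx

BASE = "http://localhost:8000"

token = httpx.post(
    f"{BASE}/auth/token",
    json={"username": "admin", "password": "admin"},
).json()["access_token"]

resp = httpx.post(
    f"{BASE}/search",
    headers={"Authorization": f"Bearer {token}"},
    json={"query": "What is RAG?", "top_k": 3},
    timeout=60.0,  # first query can be slow while models load
).json()

print(resp["generated_answer"])
for chunk in resp["retrieved_chunks"]:
    print(chunk)
```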
```bash
kubectl apply -f infra/k8s/namespace.yaml
kubectl apply -f infra/k8s/postgres.yaml
kubectl apply -f infra/k8s/redis.yaml
kubectl apply -f infra/k8s/app.yaml
```

Includes an Ingress configured for the host search.local; update the host or add your own domain (for local testing, e.g., point search.local at your ingress IP in /etc/hosts).
```
project-root/
├── app/
│   ├── api/              # FastAPI route handlers (health, auth, upload, search)
│   ├── core/             # Config, logging, JWT auth
│   ├── services/         # Ingestion, retrieval, chunking, caching, MLflow tracking
│   ├── models/           # SQLAlchemy models + Pydantic schemas
│   └── db/               # Async session + DB initialization
├── ml/
│   ├── embedding/        # sentence-transformers embedding service
│   └── rag/              # Flan-T5 answer generation with prompt engineering
├── infra/
│   ├── docker/           # Dockerfile + docker-compose
│   └── k8s/              # Namespace, Postgres, Redis, App + Ingress
├── frontend/             # React (Vite), premium warm-tone UI
│   └── src/components/   # Header, SearchBar, ResultsPanel, UploadModal, HealthBadge
├── data/                 # Sample documents for testing
├── tests/                # Unit tests (chunker, parser, API)
├── uploads/              # Temporary upload directory (gitignored)
├── requirements.txt
├── .env.example          # Environment variable template
└── README.md
```
All settings are in `.env` (or environment variables):

| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://...` | Async DB connection string |
| `SYNC_DATABASE_URL` | `postgresql://...` | Sync DB connection string (used by MLflow) |
| `REDIS_URL` | `redis://localhost:6379/0` | Redis cache URL |
| `EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | HF embedding model |
| `LLM_MODEL` | `google/flan-t5-large` | HF text generation model |
| `EMBEDDING_DIMENSION` | `384` | Embedding vector dimension |
| `MLFLOW_TRACKING_URI` | `http://localhost:5000` | MLflow server URL |
| `CHUNK_SIZE` | `200` | Words per chunk |
| `CHUNK_OVERLAP` | `30` | Overlapping words between chunks (see the chunking sketch below) |
| `TOP_K` | `5` | Default retrieval count |
| `SECRET_KEY` | – | JWT signing key (change in production) |
| `ACCESS_TOKEN_EXPIRE_MINUTES` | `60` | JWT token lifetime |
| `UPLOAD_DIR` | `./uploads` | Temp directory for uploaded files |
| `LOG_LEVEL` | `INFO` | Application log level |
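`CHUNK_SIZE` and `CHUNK_OVERLAP` are counted in words. A minimal sketch of the sliding-window scheme these settings imply (the real implementation lives in app/services/ and may differ):

```python
# Word-based sliding-window chunker: each chunk holds up to `chunk_size`
# words and repeats the last `overlap` words of its predecessor.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 30) -> list[str]:
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks
```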
```bash
pip install pytest pytest-asyncio anyio httpx
pytest tests/ -v
```
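A new chunker test might look like this sketch (the import path and function signature are assumptions; adapt them to the actual module under app/services/):

```python
# Hypothetical chunker test: 450 words with chunk_size=200/overlap=30
# should yield 3 chunks, with consecutive chunks sharing 30 words.
from app.services.chunker import chunk_text  # assumed import path

def test_chunks_overlap():
    text = " ".join(f"w{i}" for i in range(450))
    chunks = chunk_text(text, chunk_size=200, overlap=30)
    assert len(chunks) == 3
    assert chunks[0].split()[-30:] == chunks[1].split()[:30]
```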
Performance notes:

- The first query is slow (~20s on CPU) due to model loading; subsequent queries take 2-5s.
- With GPU, inference drops to under 1s.
- Redis caching returns repeated queries instantly (300s TTL); see the sketch after this list.
- Chunk deduplication prevents duplicate results from re-uploaded documents.
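The caching pattern behind the TTL note is a hash-keyed Redis lookup; a sketch, with the key format and wiring as assumptions rather than the app's exact code:

```python
# Query-cache sketch: hash (query, top_k) into a Redis key, return hits,
# and store misses with a 300-second TTL.
import hashlib
import json

import redis

r = redis.Redis.from_url("redis://localhost:6379/0")

def cached_search(query: str, top_k: int, run_search):
    key = "search:" + hashlib.sha256(f"{query}|{top_k}".encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)
    result = run_search(query, top_k)
    r.setex(key, 300, json.dumps(result))
    return result
```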
Endpoints are rate-limited per client IP via Redis:
| Endpoint | Limit |
|---|---|
| `POST /upload` | 5 requests / 60s |
| `POST /search` | 20 requests / 60s |
If Redis is unavailable, rate limiting is silently skipped.
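A fixed-window limiter matching the table and the fail-open behavior above might look like this sketch (the key layout is an assumption):

```python
# Fixed-window rate limiter: INCR a per-IP counter, set its expiry on
# first use, and fail open if Redis is unreachable.
import redis

r = redis.Redis.from_url("redis://localhost:6379/0")

def allow(ip: str, endpoint: str, limit: int, window: int = 60) -> bool:
    key = f"ratelimit:{endpoint}:{ip}"
    try:
        count = r.incr(key)
        if count == 1:
            r.expire(key, window)
        return count <= limit
    except redis.RedisError:
        return True  # Redis unavailable: skip limiting, as documented above
```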
Visit http://localhost:5000 after starting MLflow to view:
- Search query metrics (latency, result count, top similarity score)
- Document ingestion tracking (chunk count, model params)
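In MLflow terms these are ordinary params and metrics; a sketch of what a logged search run might look like (run, experiment, and metric names here are illustrative, not the app's exact ones):

```python
# Illustrative MLflow logging for one search request.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("semantic-search")

with mlflow.start_run(run_name="search"):
    mlflow.log_param("top_k", 3)
    mlflow.log_metric("latency_ms", 2400)
    mlflow.log_metric("result_count", 3)
    mlflow.log_metric("top_similarity", 0.87)
```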