Hrk84ya/Search-Engine

πŸ” Nexus Search β€” Smart Semantic Search Engine with RAG

Production-grade semantic search system for internal documents using embeddings, pgvector, and Retrieval-Augmented Generation.

Architecture

┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│     React     │────▶│    FastAPI    │────▶│  PostgreSQL   │
│   Frontend    │     │   REST API    │     │  + pgvector   │
└───────────────┘     └───────┬───────┘     └───────────────┘
                              │
                      ┌───────┴───────┐
                      │               │
                ┌─────▼─────┐   ┌─────▼─────┐
                │ Embedding │   │    RAG    │
                │  Service  │   │ Generator │
                │  (MiniLM) │   │(Flan-T5-L)│
                └─────┬─────┘   └─────┬─────┘
                      │               │
                ┌─────▼─────┐   ┌─────▼─────┐
                │   Redis   │   │  MLflow   │
                │   Cache   │   │ Tracking  │
                └───────────┘   └───────────┘

Tech Stack

Component             Technology
API                   FastAPI (async)
Database              PostgreSQL + pgvector
Embeddings            sentence-transformers/all-MiniLM-L6-v2
LLM                   google/flan-t5-large
Experiment Tracking   MLflow
Cache                 Redis
Auth                  JWT (python-jose + passlib)
Frontend              React + Vite
Containerization      Docker + Docker Compose
Orchestration         Kubernetes

Quick Start

Prerequisites

  • Python 3.11+ (tested on 3.13)
  • Docker & Docker Compose
  • Node.js 18+ (for frontend)

1. Start Infrastructure

docker compose -f infra/docker/docker-compose.yml up -d postgres redis mlflow

This starts:

  • PostgreSQL 16 with pgvector extension on port 5432
  • Redis 7 on port 6379
  • MLflow tracking server on port 5000

2. Install Python Dependencies

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

3. Run the API

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The first request downloads the ML models (~90 MB for the embedding model, ~3 GB for flan-t5-large); subsequent starts reuse the cached models.

4. Run the Frontend

cd frontend
npm install
npm run dev

The dev server opens at http://localhost:3000 and proxies API requests to port 8000.

5. Index Sample Documents

First grab a token, then upload:

TOKEN=$(curl -s -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"admin"}' | python -c "import sys,json; print(json.load(sys.stdin)['access_token'])")

curl -X POST http://localhost:8000/upload -H "Authorization: Bearer $TOKEN" -F "file=@data/sample_ml_basics.txt"
curl -X POST http://localhost:8000/upload -H "Authorization: Bearer $TOKEN" -F "file=@data/sample_kubernetes.txt"

6. Full Docker Deployment

docker compose -f infra/docker/docker-compose.yml up --build

API Reference

Health Check

curl http://localhost:8000/health
# {"status":"healthy","version":"1.0.0","database":"connected"}

Get Auth Token

curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "admin"}'
# {"access_token":"eyJ...","token_type":"bearer"}

Upload a Document

curl -X POST http://localhost:8000/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@data/sample_ml_basics.txt"
# {"document_id":"...","filename":"sample_ml_basics.txt","chunk_count":3,"message":"Document indexed successfully with 3 chunks."}

Supported formats: PDF, TXT, DOCX. Max file size: 50MB.

List Documents

curl http://localhost:8000/documents \
  -H "Authorization: Bearer $TOKEN"

Returns the indexed documents, paginated via the skip and limit query parameters.

Delete a Document

curl -X DELETE http://localhost:8000/documents/{document_id} \
  -H "Authorization: Bearer $TOKEN"

Deletes the document and all its chunks.

Semantic Search

curl -X POST http://localhost:8000/search \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is RAG?", "top_k": 3}'

Response includes:

  • generated_answer – AI-generated answer grounded in the retrieved context
  • retrieved_chunks – top-k source chunks with similarity scores
  • latency_ms – end-to-end latency
  • model_info – embedding model, LLM, and top-k used
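Ranking itself comes down to vector similarity between the 384-dimensional query embedding and each stored chunk embedding. A plain-Python sketch of the top-k step using cosine similarity (illustrative only, not this repo's code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_embedding, chunks, k=5):
    # chunks: iterable of (text, embedding) pairs
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In practice pgvector computes this inside Postgres (ordering by its cosine-distance operator), so the ranking can use a vector index rather than a Python-side scan.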

Kubernetes Deployment

kubectl apply -f infra/k8s/namespace.yaml
kubectl apply -f infra/k8s/postgres.yaml
kubectl apply -f infra/k8s/redis.yaml
kubectl apply -f infra/k8s/app.yaml

The manifests include an Ingress configured for the host search.local; update it to your own domain before deploying.

Project Structure

project-root/
├── app/
│   ├── api/            # FastAPI route handlers (health, auth, upload, search)
│   ├── core/           # Config, logging, JWT auth
│   ├── services/       # Ingestion, retrieval, chunking, caching, MLflow tracking
│   ├── models/         # SQLAlchemy models + Pydantic schemas
│   └── db/             # Async session + DB initialization
├── ml/
│   ├── embedding/      # sentence-transformers embedding service
│   └── rag/            # Flan-T5 answer generation with prompt engineering
├── infra/
│   ├── docker/         # Dockerfile + docker-compose
│   └── k8s/            # Namespace, Postgres, Redis, App + Ingress
├── frontend/           # React (Vite) – premium warm-tone UI
│   └── src/components/ # Header, SearchBar, ResultsPanel, UploadModal, HealthBadge
├── data/               # Sample documents for testing
├── tests/              # Unit tests (chunker, parser, API)
├── uploads/            # Temporary upload directory (gitignored)
├── requirements.txt
├── .env.example        # Environment variable template
└── README.md

Configuration

All settings are in .env (or environment variables):

Variable                      Default                                  Description
DATABASE_URL                  postgresql+asyncpg://...                 Async DB connection string
SYNC_DATABASE_URL             postgresql://...                         Sync DB connection string (used by MLflow)
REDIS_URL                     redis://localhost:6379/0                 Redis cache URL
EMBEDDING_MODEL               sentence-transformers/all-MiniLM-L6-v2   HF embedding model
LLM_MODEL                     google/flan-t5-large                     HF text generation model
EMBEDDING_DIMENSION           384                                      Embedding vector dimension
MLFLOW_TRACKING_URI           http://localhost:5000                    MLflow server URL
CHUNK_SIZE                    200                                      Words per chunk
CHUNK_OVERLAP                 30                                       Overlapping words between chunks
TOP_K                         5                                        Default retrieval count
SECRET_KEY                    (none)                                   JWT signing key (change in production)
ACCESS_TOKEN_EXPIRE_MINUTES   60                                       JWT token lifetime (minutes)
UPLOAD_DIR                    ./uploads                                Temp directory for uploaded files
LOG_LEVEL                     INFO                                     Application log level
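CHUNK_SIZE and CHUNK_OVERLAP describe a sliding-window word chunker: each chunk holds up to CHUNK_SIZE words and repeats the last CHUNK_OVERLAP words of its predecessor, so no sentence loses context at a chunk boundary. A sketch of that behavior (not necessarily this repo's exact implementation):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 30) -> list[str]:
    # Slide a chunk_size-word window forward by (chunk_size - overlap) words,
    # so consecutive chunks share `overlap` words of context.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

With the defaults, a 500-word document yields three chunks starting at words 0, 170, and 340.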

Running Tests

pip install pytest pytest-asyncio anyio httpx
pytest tests/ -v

Performance Notes

  • The first query is slow (~20s on CPU) because the models load lazily; subsequent queries take 2-5s.
  • On a GPU, inference drops to under 1s.
  • Redis caching serves repeated queries almost instantly (300s TTL).
  • Chunk deduplication prevents duplicate results when documents are re-uploaded.
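The query cache above is Redis with a 300s TTL; the cache-aside behavior it provides can be sketched in memory like this (the injectable clock exists only to make the sketch testable and is not part of the real service):

```python
import time

class TTLCache:
    """Cache-aside store with per-entry expiry (an in-memory stand-in for Redis SETEX)."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)
```

A search handler would check get(query) before doing any embedding work, and set(query, response) after generation on a miss.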

Rate Limiting

Endpoints are rate-limited per client IP via Redis:

Endpoint       Limit
POST /upload   5 requests / 60s
POST /search   20 requests / 60s

If Redis is unavailable, rate limiting is silently skipped.
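A common Redis implementation of this pattern is INCR on a per-IP key plus EXPIRE to bound the window. The same fixed-window logic, sketched in memory (the clock is injected purely for testing; this repo's actual keys and windows may differ):

```python
import time

class FixedWindowLimiter:
    """Per-key fixed-window rate limiter (in-memory analogue of Redis INCR + EXPIRE)."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counters: dict = {}  # key -> (window_start, request_count)

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self._counters.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window elapsed: start a fresh one
        count += 1
        self._counters[key] = (start, count)
        return count <= self.limit
```

allow() returns False once a key exceeds its limit inside the current window, and the counter resets when the window elapses.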

MLflow Dashboard

Visit http://localhost:5000 after starting MLflow to view:

  • Search query metrics (latency, result count, top similarity score)
  • Document ingestion tracking (chunk count, model params)
