A high-performance Neural Archive and Retrieval-Augmented Generation (RAG) engine, built for precision, speed, and a bold Neo-Brutalist aesthetic.
The Stanza Engine is a decoupled, production-ready RAG pipeline that separates ingestion from query-time processing, so uploads never block the UI and queries stay fast.
| Layer | Technology |
|---|---|
| Frontend | Next.js 15 (App Router), Tailwind CSS |
| Backend | FastAPI (Python 3.10+) |
| Vector DB | Qdrant |
| Background Jobs | Inngest |
| Embeddings | BGE-Small-EN (ONNX, local) |
- PDFs are queued, not processed instantly
- Background processing via Inngest
- Zero UI blocking
- Uses quantized `bge-small-en-v1.5`
- Runs locally via ONNX
- Reduces cost and latency
- Cosine similarity search
- Semantic matching (not just keywords)
- 4px borders
- Hard shadows (`#000000`)
- Monospace-first design
- Upload a PDF document
- Automatically queued for background processing
- No waiting — system stays responsive
- Ask natural language questions
- Query is embedded locally
- Relevant chunks retrieved from Qdrant
- Context-aware answers generated from your documents
- Uses retrieved "stanzas" for grounded responses
- Smart context trimming for token efficiency
- Full Neo-Brutalist design preserved
- Clean high-contrast light theme
- Readable and minimal
| Dark Mode | Light Mode |
|---|---|
- User uploads PDF
- Backend stores file
- Emits `stanza.ingest` event → Inngest
- Processing runs asynchronously
- Recursive character splitting
- Preserves semantic meaning
- Converts text into 384-dimensional vectors
- Collection: `stanzas_archive`
- Distance metric: Cosine Similarity
- Enables semantic retrieval
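
The chunking step above (recursive character splitting) can be illustrated with a minimal, standard-library sketch. This is not the project's actual splitter: the function name `recursive_split`, the `max_chars` default, and the separator order are assumptions chosen for illustration. The idea is to try coarse separators (paragraphs) first and fall back to finer ones (lines, sentences, words) only when a piece is still too long.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    max_chars: int = 500,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split text into chunks of at most max_chars, preferring coarse separators.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []

    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks: List[str] = []
        current = ""
        for piece in text.split(sep):
            candidate = piece if not current else current + sep + piece
            if len(candidate) <= max_chars:
                # Keep packing pieces into the current chunk while they fit.
                current = candidate
                continue
            if current:
                chunks.append(current)
                current = ""
            if len(piece) <= max_chars:
                current = piece
            else:
                # The piece alone is too long: recurse with finer separators.
                chunks.extend(recursive_split(piece, max_chars, separators[i + 1:]))
        if current:
            chunks.append(current)
        return [c.strip() for c in chunks if c.strip()]

    # No separator applies: fall back to a hard cut every max_chars characters.
    return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]
```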
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Unique chunk ID |
| `vector` | Float[384] | Embedding |
| `content` | String | Raw text chunk |
| `source` | String | File origin |
| `page` | Integer | Page reference |
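
To make the schema and the cosine-similarity metric concrete, here is a small in-memory sketch. It mirrors the field names from the table, but the `StanzaPoint` class and the `top_k` helper are hypothetical and do not use the Qdrant client; in the real system Qdrant performs this ranking over 384-dimensional vectors. Short toy vectors work the same way.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class StanzaPoint:
    """One stored chunk; fields follow the schema table (class name is assumed)."""
    id: str
    vector: List[float]
    content: str
    source: str
    page: int


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return cos(angle) between a and b, or 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], points: List[StanzaPoint], k: int = 3
) -> List[Tuple[float, StanzaPoint]]:
    """Rank points by cosine similarity to the query and keep the best k."""
    scored = [(cosine_similarity(query, p.vector), p) for p in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```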
| Endpoint | Description |
|---|---|
| `POST /api/ingest` | Start ingestion pipeline |
| `POST /api/query` | Retrieve + generate response |
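
As a rough illustration of calling the query endpoint, the sketch below builds a `POST /api/query` request with the standard library. The request body is an assumption: this README does not document the payload, so the `question` field is hypothetical, as are the helper names. The base URL matches the local backend address used in the setup commands.

```python
import json
import urllib.request

# Local backend address, as used in the setup commands of this README.
BASE_URL = "http://127.0.0.1:8000"


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/query.

    Assumption: the endpoint accepts JSON with a "question" field.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> dict:
    """Send the request and decode the JSON response (requires a running backend)."""
    with urllib.request.urlopen(build_query_request(question), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```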

Start Qdrant:

    docker run -p 6333:6333 qdrant/qdrant

Create `.env` inside `backend/`:

    QDRANT_URL=http://localhost:6333
    QDRANT_API_KEY=your_optional_cloud_key
    OPENAI_API_KEY=your_key_here
    INNGEST_EVENT_KEY=your_key_here

Start the backend:

    cd backend
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    uvicorn main:app --reload

Start the Inngest dev server:

    npx inngest-cli@latest dev -u http://127.0.0.1:8000/api/inngest

Start the frontend:

    cd frontend
    npm install
    npm run dev

- ~3× faster than PyTorch
- Lower memory usage
- Token-aware pruning
- Keeps only most relevant chunks
- Prevents LLM overflow
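
Token-aware pruning as described above could look roughly like the following sketch. It is not the project's implementation: `estimate_tokens` uses a whitespace word count as a crude stand-in for a real tokenizer, and the function names and the greedy strategy are assumptions. Chunks are visited from most to least relevant and kept only while they fit in the budget.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude token estimate (assumption): count whitespace-separated words."""
    return len(text.split())


def prune_context(scored_chunks: List[Tuple[float, str]], max_tokens: int) -> List[str]:
    """Keep the highest-scoring chunks whose combined size fits max_tokens.

    scored_chunks holds (relevance_score, chunk_text) pairs.
    """
    kept: List[str] = []
    used = 0
    for _score, text in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > max_tokens:
            # Skip this chunk; a smaller, lower-ranked chunk may still fit.
            continue
        kept.append(text)
        used += cost
    return kept
```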

Stanza embraces Neo-Brutalism, rejecting soft UI trends.

- Borders: `border-4 border-black`
- Shadows: `shadow-[8px_8px_0px_0px_rgba(0,0,0,1)]`
- Palette: `#FFFFFF` (White), `#000000` (Black)

To contribute:

    # 1. Fork repository
    # 2. Create feature branch
    git checkout -b feature/YourFeature
    # 3. Commit changes
    git commit -m "Add feature"
    # 4. Push branch
    git push origin feature/YourFeature

Open a Pull Request 🚀

Muhammad Magdy
Stanza Engine is built to:
- ⚡ Eliminate ingestion bottlenecks
- 🔍 Enable fast semantic retrieval
- 🧠 Run efficiently with local embeddings
- 🎨 Deliver a bold, opinionated UI



