PardusDB

A fast, SQLite-like embedded vector database with graph-based approximate nearest neighbor search

PardusDB is designed for developers building local AI applications — RAG pipelines, semantic search, recommendation systems, or any project that needs lightweight, persistent vector storage without external dependencies.

While Pardus AI gives non-technical users a powerful no-code platform to ask questions of their CSV, JSON, and PDF data in plain English, PardusDB gives developers the same speed and privacy in an embeddable, fully open-source vector database.

Features

  • Single-file storage — Everything lives in one .pardus file, just like SQLite
  • Multiple tables — Store different vector dimensions and metadata in the same database
  • Familiar SQL-like syntax — CREATE, INSERT, SELECT, UPDATE, DELETE feel natural
  • UNIQUE constraints — O(1) duplicate detection using HashSet
  • GROUP BY with aggregates — O(n) hash aggregation with COUNT, SUM, AVG, MIN, MAX
  • JOINs — O(n+m) hash join algorithm for INNER, LEFT, RIGHT joins
  • Fast vector similarity search — Graph-based approximate nearest neighbor search
  • Thread-safe — Safe concurrent reads in multi-threaded applications
  • Full transactions — BEGIN/COMMIT/ROLLBACK for atomic operations
  • Optional GPU acceleration — For large batch inserts and queries
  • Python MCP server — 18 tools for AI agent integration (OpenCode, Claude Desktop)
  • Import documents from disk — PDF, CSV, DOCX, XLSX, XLS, JSON, JSONL, MD, TXT with auto-embeddings and parent-child tracking
  • Smart sentence-aware chunking — Intelligent text splitting at sentence boundaries with configurable overlap
  • Async document ingestion — Background thread processing for large files (50MB+) without timeout
  • Joplin note integration — Direct ingestion of Joplin notes with metadata preservation
  • Tmp directory conversion — All file conversions performed in isolated tmp directories before DB ingestion, cleaned up on success
  • Database health checks — Verify integrity, detect orphans, check dimensions
  • Optional dependency installers — Install document parsing libraries and sentence-transformers via setup.sh/install.sh
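
The sentence-aware chunking with overlap listed above can be pictured with a small sketch. This is not PardusDB's implementation: the names `split_sentences` and `chunk_sentences`, the regex-based splitter, and the `max_chars` and `overlap` parameters are all assumptions chosen for illustration.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(text, max_chars=200, overlap=1):
    """Group whole sentences into chunks of about max_chars characters.

    The last `overlap` sentences of a chunk are repeated at the start of the
    next one. max_chars is a target, not a hard cap: a chunk can exceed it
    when the carried-over sentences plus one new sentence are already longer.
    """
    chunks = []
    current = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk and carry the overlap forward.
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting only at sentence boundaries keeps each chunk readable on its own, and the repeated sentences give neighboring chunks shared context when they are embedded separately.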

Installation

Three installers are provided. Each installs the binary, helper script, MCP server, Python SDK, and config. They differ in how the binary is obtained and, on macOS, in how the MCP server is installed.

Option 1: setup.sh — Build from source (requires Rust)

git clone https://github.com/FErArg/PardusDB
cd PardusDB
./setup.sh --install

Compiles pardusdb from Rust source with cargo build --release. Use this if you want the latest code or have modified the source. Rust is installed automatically if missing.

Option 2: install.sh — Use precompiled binary (no Rust)

git clone https://github.com/FErArg/PardusDB
cd PardusDB
./install.sh --install

Copies the precompiled binary from bin/pardus-v0.4.21-linux-x86_64 to ~/.local/bin/pardusdb. No Rust compilation — faster but requires a pre-existing binary in the repo.

Option 3: install-macos.sh — macOS with venv-based MCP (auto-installs Python 3.10+ if needed)

git clone https://github.com/FErArg/PardusDB
cd PardusDB
./install-macos.sh --install

Requires the precompiled macOS binary bin/pardus-v0.4.21-darwin-arm64 in the repo. If not present, compile on your Mac with cargo build --release and copy to that path. Installs the MCP server inside a Python virtual environment (~/.pardus/mcp/venv/). If Python < 3.10 is detected, automatically offers to install Python 3.13 via Homebrew.

| | setup.sh | install.sh | install-macos.sh |
|---|---|---|---|
| Requires Rust | Yes (auto-installed) | No | No |
| Requires Python 3.10+ | No | No | Yes (auto-installed via Homebrew) |
| Compiles source | Yes | No | Only if macOS binary missing |
| Binary from | bin/pardus-v*-{platform}-{arch} | bin/pardus-v*-linux-x86_64 | bin/pardus-v*-darwin-arm64 |
| MCP installation | global pip | global pip | virtual environment |
| macOS compatibility | Partial | Partial | Recommended |
| Speed | ~1-3 min | <1 sec | <1 sec + Python install if needed |

See INSTALL.md for detailed instructions.

Quick Start

Using the Helper (Recommended)

The pardus helper automatically manages the default database at ~/.pardus/pardus-rag.db. This is the recommended way to use PardusDB:

pardus                    # Opens ~/.pardus/pardus-rag.db (creates if missing)
pardus mi.db              # Opens specific file
# Inside a session, exit with: quit or Ctrl+C

Using the Binary Directly

The pardusdb binary has two modes:

File-backed mode (with path argument):

pardusdb mi.db            # Opens file, processes SQL from stdin, saves on quit

REPL mode (no arguments):

pardusdb                  # Opens project database (database.pardus in CWD) or in-memory

REPL Session Example

╔═══════════════════════════════════════════════════════════════╗
║                    PardusDB REPL                         ║
║          Vector Database with SQL Interface               ║
╚═══════════════════════════════════════════════════════════════╝

pardusdb [memory]> CREATE TABLE docs (embedding VECTOR(384), content TEXT);
Table 'docs' created

pardusdb [memory]> INSERT INTO docs (embedding, content)
VALUES ([0.1, 0.2, ...], 'Hello World');
Inserted row

pardusdb [memory]> SELECT * FROM docs
WHERE embedding SIMILARITY [0.1, 0.2, ...] LIMIT 5;

Found 1 similar rows:
  id=1, distance=0.0000, content=Hello World

pardusdb [memory]> .open mi.db
Database opened: mi.db

pardusdb [mi.db]> .save
Saved to: mi.db

pardusdb [mi.db]> quit
Goodbye!

Helper vs Binary: What's the Difference?

| Command | Behavior |
|---|---|
| pardus | Helper script that ensures ~/.pardus/pardus-rag.db exists and opens it |
| pardusdb (no args) | REPL with in-memory DB or project database.pardus if found in CWD |
| pardusdb <path> | Opens specific file, reads SQL from stdin until quit |
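
Because `pardusdb <path>` reads SQL from stdin until `quit`, it can be driven from a script. The sketch below is one way to do that from Python under stated assumptions: the helper names `build_script` and `run_sql` are hypothetical, and it assumes the binary accepts one semicolon-terminated statement per line. Exit codes and output format are not documented here, so the function simply returns whatever the process prints.

```python
import shutil
import subprocess


def build_script(statements):
    # One statement per line, each ending in ';', followed by `quit`
    # so that the file-backed session saves and exits.
    lines = [s.rstrip().rstrip(";") + ";" for s in statements]
    return "\n".join(lines + ["quit"]) + "\n"


def run_sql(db_path, statements, binary="pardusdb"):
    """Pipe SQL into `pardusdb <db_path>` and return its stdout.

    Returns None when the binary is not on PATH.
    """
    exe = shutil.which(binary)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, db_path],
        input=build_script(statements),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A call such as `run_sql("mi.db", ["CREATE TABLE docs (embedding VECTOR(384), content TEXT)"])` would then behave like typing the statement followed by `quit` into the file-backed mode.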

SQL Syntax

Data Types

| Type | Description | Example |
|---|---|---|
| VECTOR(n) | n-dimensional float vector | VECTOR(768) |
| TEXT | UTF-8 string | 'hello world' |
| INTEGER | 64-bit integer | 42 |
| FLOAT | 64-bit float | 3.14 |
| BOOLEAN | true/false | true |

Basic Operations

CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    embedding VECTOR(768),
    title TEXT,
    category TEXT,
    score FLOAT
);

INSERT INTO documents (embedding, title, category, score)
VALUES ([0.1, 0.2, ...], 'Introduction to Rust', 'tutorial', 0.95);

SELECT * FROM documents WHERE category = 'tutorial' LIMIT 10;

UPDATE documents SET score = 0.99 WHERE id = 1;

DELETE FROM documents WHERE id = 1;
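
When statements are generated from application code, the vector and metadata values have to be rendered into the literal syntax shown above. The sketch below is a hypothetical helper, not part of PardusDB or its SDK. It assumes that single quotes inside TEXT values are escaped by doubling them (standard SQL practice, not confirmed for PardusDB) and that floats in Python's default `repr` form are accepted, which may not hold for exponent notation such as `1e-05`.

```python
def sql_literal(value):
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: quotes are escaped by doubling, as in standard SQL.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def vector_literal(values):
    # Renders [0.1, 0.2] in the bracketed form used by the examples above.
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"


def insert_statement(table, vector_column, vector, metadata):
    columns = [vector_column] + list(metadata)
    values = [vector_literal(vector)] + [sql_literal(v) for v in metadata.values()]
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({', '.join(values)});"


print(insert_statement("documents", "embedding", [0.1, 0.2],
                       {"title": "Introduction to Rust", "score": 0.95}))
```

Table and column names are interpolated as-is here, so this helper is only suitable for identifiers you control.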

Vector Similarity Search

SELECT * FROM documents
WHERE embedding SIMILARITY [0.12, 0.24, ...]
LIMIT 10;

Results are automatically ordered by distance (closest first).
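
As a reference point for what "closest first" means, the sketch below computes an exact nearest-neighbor ranking by brute force. It is not how PardusDB searches (the README describes a graph-based approximate index), and the Euclidean metric is an assumption, since the distance function is not stated here. A linear scan like this is mainly useful for sanity-checking approximate results on small data sets.

```python
import math


def euclidean(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(rows, query, k):
    """Return the k closest (row_id, distance) pairs, closest first.

    `rows` maps a row id to its vector. Every row is scanned: O(n) per query.
    """
    scored = sorted((euclidean(vec, query), row_id) for row_id, vec in rows.items())
    return [(row_id, dist) for dist, row_id in scored[:k]]


rows = {1: [0.0, 0.0], 2: [3.0, 4.0], 3: [1.0, 0.0]}
print(nearest(rows, [0.0, 0.0], k=2))
```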

UNIQUE Constraint

CREATE TABLE users (
    embedding VECTOR(128),
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE
);

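-- First insert succeeds
INSERT INTO users (embedding, id, email) VALUES ([0.1, ...], 1, 'test@example.com');

-- This will fail - duplicate email
INSERT INTO users (embedding, id, email) VALUES ([0.1, ...], 2, 'test@example.com');
-- Error: Duplicate value for UNIQUE column 'email'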
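
The feature list attributes the O(1) duplicate check to a HashSet. The idea can be sketched in a few lines of Python with a `set`; the `UniqueIndex` class is hypothetical and only illustrates the technique, not PardusDB's internal structure.

```python
class UniqueIndex:
    """Tracks the values seen in one UNIQUE column using a hash set."""

    def __init__(self, column):
        self.column = column
        self._seen = set()

    def check_and_add(self, value):
        # Set membership is O(1) on average, so the check does not
        # slow down as the table grows.
        if value in self._seen:
            raise ValueError(f"Duplicate value for UNIQUE column '{self.column}'")
        self._seen.add(value)


index = UniqueIndex("email")
index.check_and_add("test@example.com")      # accepted
try:
    index.check_and_add("test@example.com")  # rejected
except ValueError as err:
    print(err)
```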

GROUP BY with Aggregates

SELECT category, COUNT(*), AVG(score), SUM(amount)
FROM sales
GROUP BY category;

SELECT category, SUM(amount) as total
FROM sales
GROUP BY category
HAVING SUM(amount) > 1000;
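
The O(n) hash aggregation mentioned in the feature list can be sketched as a single pass that accumulates one running total per group key, with the HAVING filter applied to the finished groups. The function name `group_sum` and its parameters are illustrative assumptions, not PardusDB code.

```python
from collections import defaultdict


def group_sum(rows, key, value, having_gt=None):
    """SUM(value) GROUP BY key, optionally keeping only totals > having_gt."""
    totals = defaultdict(float)
    for row in rows:  # one pass over the input: O(n)
        totals[row[key]] += row[value]
    if having_gt is None:
        return dict(totals)
    # HAVING runs after aggregation, on one entry per group.
    return {k: total for k, total in totals.items() if total > having_gt}


sales = [
    {"category": "books", "amount": 600},
    {"category": "games", "amount": 200},
    {"category": "books", "amount": 500},
]
print(group_sum(sales, "category", "amount", having_gt=1000))
```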

JOINs

SELECT * FROM orders
INNER JOIN users ON orders.user_id = users.id;

SELECT users.email, orders.product
FROM users
LEFT JOIN orders ON users.id = orders.user_id;
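
A hash join reaches O(n+m) by indexing one table by the join key (build phase) and then looking up each row of the other table in that index (probe phase). The sketch below illustrates the technique for INNER and LEFT joins; `hash_join` is a hypothetical function, and the cost of emitting the matched pairs comes on top of the O(n+m) build and probe work.

```python
def hash_join(left, right, left_key, right_key, how="inner"):
    """Return (left_row, right_row) pairs; right_row is None for unmatched LEFT rows."""
    # Build phase: index the right table by its join key, O(m).
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    # Probe phase: one dictionary lookup per left row, O(n).
    pairs = []
    for lrow in left:
        matches = index.get(lrow[left_key], [])
        for rrow in matches:
            pairs.append((lrow, rrow))
        if not matches and how == "left":
            pairs.append((lrow, None))
    return pairs


users = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
orders = [{"user_id": 1, "product": "pen"}]
for user, order in hash_join(users, orders, "id", "user_id", how="left"):
    print(user["email"], order["product"] if order else None)
```

A RIGHT join can be expressed with the same function by swapping the two inputs and their keys.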

REPL Commands

| Command | Description |
|---|---|
| .create <file> | Create and open a new database |
| .open <file> | Open an existing database |
| .save | Force save current database |
| .tables | List tables |
| .clear | Clear screen |
| help | Show help |
| quit | Exit (auto-saves if file open) |

MCP Server for AI Agents

PardusDB includes an MCP server that allows AI agents (OpenCode, Claude Desktop, etc.) to interact with the database using natural language.

Tools Available

| Tool | Description |
|---|---|
| pardusdb_create_database | Create a new database file |
| pardusdb_open_database | Open an existing database |
| pardusdb_create_table | Create a new table |
| pardusdb_insert_vector | Insert a single vector |
| pardusdb_batch_insert | Batch insert multiple vectors |
| pardusdb_search_similar | Search by vector similarity |
| pardusdb_execute_sql | Execute raw SQL |
| pardusdb_list_tables | List all tables |
| pardusdb_use_table | Set active table |
| pardusdb_status | Show connection status |
| pardusdb_import_text | Import documents from a directory (PDF, CSV, DOCX, XLSX, JSON, JSONL, MD, TXT) with auto-embeddings |
| pardusdb_health_check | Run integrity checks on tables and data |
| pardusdb_get_schema | Show table schema and structure |
| pardusdb_import_status | View or manage import history |
| pardusdb_ingest_chunked | Ingest a document with smart sentence-aware chunking |
| pardusdb_ingest_joplin | Ingest a Joplin note (use after joplin_read_note) |
| pardusdb_ingest_async | Async ingest for large PDFs (avoids timeout) |
| pardusdb_ingest_status | Check async ingest job progress |

OpenCode Configuration

Add to your opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "pardusdb": {
      "type": "local",
      "command": ["python3", "/home/${USER}/.pardus/mcp/server.py"],
      "enabled": true
    }
  }
}

Adjust the path to match your installation. Tools are automatically available to the LLM.
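
If the `${USER}` placeholder is not expanded in your setup, the path has to be written out in full. The sketch below generates the same configuration with the resolved home directory. The function name `opencode_config` is hypothetical, and the default path assumes the MCP server was installed at ~/.pardus/mcp/server.py. With the macOS installer, which uses a virtual environment, the interpreter in the command may also need adjusting.

```python
import json
from pathlib import Path


def opencode_config(server_path=None):
    """Build the opencode.json structure shown above with an absolute server path."""
    if server_path is None:
        # Assumed install location, taken from the example configuration.
        server_path = Path.home() / ".pardus" / "mcp" / "server.py"
    return {
        "$schema": "https://opencode.ai/config.json",
        "mcp": {
            "pardusdb": {
                "type": "local",
                "command": ["python3", str(server_path)],
                "enabled": True,
            }
        },
    }


# Print the JSON so it can be reviewed before being copied into opencode.json.
print(json.dumps(opencode_config(), indent=2))
```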

SDKs

Python SDK

pip install -e sdk/python

from pardusdb import PardusDB

client = PardusDB()
client.create_table("docs", vector_dim=768, metadata_schema={"content": "TEXT"})
client.insert("docs", [0.1, 0.2, ...], {"content": "Hello"})
results = client.search("docs", [0.1, 0.2, ...], k=10)

Benchmarks

For detailed benchmarks, see BENCHMARKS.md.

Performance Summary (Apple Silicon M-series)

| Operation | Time |
|---|---|
| Single insert | ~160 µs/doc |
| Batch insert (1,000 docs) | ~6 ms |
| Query (k=10) | ~3 µs |

Speed Comparison

| vs Neo4j | PardusDB Advantage |
|---|---|
| Insert | 1983x faster |
| Search | 431x faster |

| vs HelixDB | PardusDB Advantage |
|---|---|
| Insert | 200x faster |
| Search | 62x faster |

| Batch Size | Speedup vs Individual |
|---|---|
| 100 | 45x |
| 500 | 149x |
| 1000 | 220x |

Examples

Rust

cargo run --example simple_rag --release

Python

cd examples/python
pip install requests
python simple_rag.py

Why We Built PardusDB

The Pardus AI team built PardusDB because we believe private, local-first AI tools should be accessible to everyone — from individual developers to large teams.

PardusDB gives you the low-level building block for fast, private vector search, while Pardus AI delivers the high-level no-code experience for analysts, marketers, and business users who just want answers from their data.

If you enjoy working with PardusDB, we'd love for you to try Pardus AI — upload your spreadsheets or documents and ask questions in plain English. Free tier available, no credit card required.

License

MIT License — use it freely in personal and commercial projects.


⭐ Star us on GitHub if you find this useful! 🚀 Building something cool with PardusDB? Share it with us on X or Discord — we'd love to hear from you.

Pardus AI: https://pardusai.org/
