Embedded SQLite micro-framework for hybrid search (FTS5 + vectors), IoT and RAG.
No server. No Docker. No network. One .db file. Runs on anything — Raspberry Pi, industrial PCs, that old Celeron laptop collecting dust on a shelf.
GitHub: github.com/SiberXXX/sqfox | README in Russian
```shell
# Core only (zero dependencies, includes SqliteHnswBackend + SqliteFlatBackend)
pip install sqfox

# With hybrid search (English)
pip install sqfox[search]

# With hybrid search (English + Russian)
pip install sqfox[search-ru]
```

```python
from sqfox import SQFox

with SQFox("app.db") as db:
    db.write("CREATE TABLE sensors (ts TEXT, value REAL)", wait=True)

    # Non-blocking writes from any thread
    db.write("INSERT INTO sensors VALUES (?, ?)", ("2026-03-01T10:00:00", 23.5))

    # Reads — parallel, non-blocking
    rows = db.fetch_all("SELECT * FROM sensors ORDER BY ts DESC LIMIT 10")
```

All writes go through a single writer thread with batching. Multiple threads can read in parallel via WAL mode. PRAGMAs are tuned automatically.
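A minimal sketch of the single-writer idea described above: one thread drains a queue and commits several writes per transaction, so concurrent producers never contend for the write lock. This is an illustrative model under stated assumptions, not sqfox's implementation; the filename `demo.db` is arbitrary.

```python
import queue
import sqlite3
import threading

def writer_loop(db_path: str, q: "queue.Queue[tuple]") -> None:
    """Drain the queue; commit many queued writes in one transaction."""
    conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit mode
    while True:
        sql, params = q.get()              # block for the first write...
        if sql is None:
            break                          # sentinel: shut down
        batch = [(sql, params)]
        while not q.empty():               # ...then grab whatever else is queued
            item = q.get_nowait()
            if item[0] is None:
                q.put(item)                # keep the sentinel for the outer loop
                break
            batch.append(item)
        conn.execute("BEGIN IMMEDIATE")    # take the write lock once per batch
        for s, p in batch:
            conn.execute(s, p)
        conn.commit()                      # one fsync for the whole batch
    conn.close()

q: "queue.Queue[tuple]" = queue.Queue()
t = threading.Thread(target=writer_loop, args=("demo.db", q), daemon=True)
t.start()
q.put(("CREATE TABLE IF NOT EXISTS sensors (ts TEXT, value REAL)", ()))
q.put(("INSERT INTO sensors VALUES (?, ?)", ("2026-03-01T10:00:00", 23.5)))
q.put((None, ()))                          # sentinel: stop the writer
t.join()
```

Batching is what makes this pattern friendly to SD cards: N inserts cost one fsync instead of N.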
```python
from sqfox import SQFox

def my_embed(texts):
    """Your embedding function — any model, any API."""
    raise NotImplementedError("Implement your embedding function")

with SQFox("knowledge.db") as db:
    db.ingest("Short note about SQLite WAL mode", embed_fn=my_embed, wait=True)

    results = db.search("database optimization", embed_fn=my_embed)
    for r in results:
        print(f"[{r.score:.3f}] {r.text}")
```

Always use a chunker for documents longer than ~500 words. Without chunking, the entire document becomes a single vector — this kills search precision.
```python
from sentence_transformers import SentenceTransformer

class QwenEmbedder:
    def __init__(self, dim=256):
        self.model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", truncate_dim=dim)

    def embed_documents(self, texts):
        return self.model.encode(texts).tolist()

    def embed_query(self, text):
        return self.model.encode(text, prompt_name="query").tolist()

embedder = QwenEmbedder()

with SQFox("rag.db") as db:
    db.ingest("Document text...", embed_fn=embedder, wait=True)
    results = db.search("query text", embed_fn=embedder)
```

sqfox detects `embed_documents` / `embed_query` methods and calls the right one automatically. Plain callables work too.
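The dispatch rule described above can be modeled with a simple `hasattr` check: prefer the purpose-specific method, fall back to calling the object directly. A hypothetical sketch of the idea, not sqfox's internals; `embed_for_ingest` and `embed_for_query` are invented names.

```python
def embed_for_ingest(embed_fn, texts: list[str]) -> list[list[float]]:
    """Prefer the document-specific method; fall back to a plain callable."""
    if hasattr(embed_fn, "embed_documents"):
        return embed_fn.embed_documents(texts)
    return embed_fn(texts)

def embed_for_query(embed_fn, text: str) -> list[float]:
    """Prefer the query-specific method (asymmetric models), else call directly."""
    if hasattr(embed_fn, "embed_query"):
        return embed_fn.embed_query(text)
    return embed_fn([text])[0]

# A plain callable works through the fallback path
def fake_embed(texts):
    return [[float(len(t))] for t in texts]

print(embed_for_query(fake_embed, "abc"))  # [3.0]
```

Duck typing keeps the API open: any object with the right method names, or any callable, plugs in without subclassing.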
sqfox ships with two pure-Python vector backends — zero C dependencies:
- HNSW (auto-selected for most environments) — O(log N) approximate nearest-neighbor search, graph stored as BLOB in SQLite.
- Flat — brute-force KNN, simple and exact.
```python
from sqfox import SQFox, SqliteHnswBackend

# Auto-selected for desktop/server — HNSW (O(log N), pure Python, graph stored in SQLite as BLOB)
db = SQFox("app.db")

# Flat — brute-force KNN, exact search
db = SQFox("app.db", vector_backend="flat")

# HNSW with custom parameters
backend = SqliteHnswBackend(M=16, ef_construction=200, ef_search=64)
db = SQFox("app.db", vector_backend=backend)
```

The HNSW graph is stored alongside documents in the same .db file. SQLite is the source of truth — if the graph is lost or corrupted, it rebuilds automatically from the embedding BLOBs on the next startup.
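What the Flat backend does is plain exhaustive KNN. A dependency-free sketch of that idea using cosine similarity; this is illustrative only, and the `store` dict stands in for whatever storage the real backend uses.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def flat_knn(query: list[float],
             vectors: dict[int, list[float]],
             k: int) -> list[tuple[int, float]]:
    """Score every stored vector against the query: O(N), but exact."""
    scored = [(key, cosine(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:k]

store = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
print([key for key, _ in flat_knn([1.0, 0.1], store, k=2)])  # [1, 3]
```

The linear scan is why Flat stays exact and simple, and why HNSW's O(log N) graph search wins once collections grow past a few thousand vectors.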
Write your own backend — implement the VectorBackend protocol:
```python
from sqfox import SQFox, VectorBackend

class MyBackend:
    def set_writer_conn(self, conn) -> None: ...
    def initialize(self, db_path: str, ndim: int) -> None: ...
    def add(self, keys: list[int], vectors: list[list[float]]) -> None: ...
    def remove(self, keys: list[int]) -> None: ...
    def search(self, query: list[float], k: int, **kwargs) -> list[tuple[int, float]]: ...
    def flush(self) -> None: ...
    def count(self) -> int: ...
    def close(self) -> None: ...

db = SQFox("app.db", vector_backend=MyBackend())
```

```python
from sqfox import SQFox, sentence_chunker, markdown_chunker, html_to_text

with SQFox("docs.db") as db:
    db.ingest(text, chunker=sentence_chunker(chunk_size=500, overlap=1), embed_fn=my_embed, wait=True)
    db.ingest(md_text, chunker=markdown_chunker(max_level=2), embed_fn=my_embed, wait=True)

    clean = html_to_text(raw_html)
    db.ingest(clean, chunker=sentence_chunker(), embed_fn=my_embed, wait=True)
```

Available: `sentence_chunker`, `paragraph_chunker`, `markdown_chunker`, `recursive_chunker`, `html_to_text`. Custom `(str) -> list[str]` callables also work.
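Since any `(str) -> list[str]` callable is accepted, a custom chunker is just a function. Below is a hypothetical fixed-size word chunker with overlap; the name, parameters, and splitting rule are illustrative, not one of sqfox's built-ins.

```python
def word_chunker(chunk_words: int = 120, overlap_words: int = 20):
    """Return a (str) -> list[str] callable that splits on word count."""
    def chunk(text: str) -> list[str]:
        words = text.split()
        if not words:
            return []
        step = max(1, chunk_words - overlap_words)
        chunks = []
        for i in range(0, len(words), step):
            chunks.append(" ".join(words[i:i + chunk_words]))
            if i + chunk_words >= len(words):
                break               # last chunk already reaches the end
        return chunks
    return chunk

print(word_chunker(chunk_words=5, overlap_words=2)(
    "one two three four five six seven eight"
))  # ['one two three four five', 'four five six seven eight']
```

It would plug in the same way as the built-ins, e.g. `db.ingest(text, chunker=word_chunker(), embed_fn=my_embed, wait=True)`.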
```python
from sqfox import SQFoxManager

with SQFoxManager("./databases") as mgr:
    mgr.ingest_to("sensors", "Temperature: 25.3C", embed_fn=my_embed, wait=True)
    mgr.ingest_to("manuals", "Replace bearing at 50k RPM", embed_fn=my_embed, wait=True)

    results = mgr.search_all("bearing vibration", embed_fn=my_embed)
    for db_name, r in results:
        print(f"[{db_name}] [{r.score:.3f}] {r.text}")

    mgr.drop("old_logs", delete_file=True)
```

```python
db.ingest(
    "Pressure sensor calibration procedure",
    metadata={"source": "manual_v3", "equipment": "PS-100"},
    embed_fn=my_embed, wait=True,
)

results = db.search("calibration", embed_fn=my_embed)
print(results[0].metadata)  # {"source": "manual_v3", "equipment": "PS-100"}
```

Two-stage retrieval: fast hybrid search retrieves candidates, then a cross-encoder re-scores them:
```python
from sentence_transformers import CrossEncoder

reranker_model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def my_reranker(query: str, texts: list[str]) -> list[float]:
    pairs = [(query, t) for t in texts]
    return reranker_model.predict(pairs).tolist()

results = db.search(
    "database optimization",
    embed_fn=my_embed,
    reranker_fn=my_reranker,
    rerank_top_n=20,
)
```

```python
with SQFox("app.db") as db:
    db.backup("backup.db")

    def progress(status, remaining, total):
        print(f"Copied {total - remaining}/{total} pages")

    db.backup("backup.db", progress=progress)
```

```python
with SQFox("app.db") as db:
    # Rewrite DB file, reclaim space from deleted rows
    db.vacuum()

    # Or write a compacted copy (no 2x disk space needed)
    db.vacuum(into="compacted.db")
```

AsyncSQFox — async facade with dual thread pools. I/O reads are never blocked by CPU-heavy embedding/reranking:
```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from sqfox import AsyncSQFox

db = AsyncSQFox("data.db", max_cpu_workers=2)

@asynccontextmanager
async def lifespan(app: FastAPI):
    async with db:
        yield

app = FastAPI(lifespan=lifespan)

@app.post("/ingest")
async def ingest(text: str):
    doc_id = await db.ingest(text, embed_fn=my_embed)
    return {"id": doc_id}

@app.get("/search")
async def search(q: str):
    return await db.search(q, embed_fn=my_embed)
```

Two pools: an I/O pool (the default asyncio executor) for `fetch_one` / `fetch_all` / `backup`, and a CPU pool (a bounded `ThreadPoolExecutor`) for `ingest` / `search` with embeddings. Even with 200 concurrent ingests, only `max_cpu_workers` embeddings run at once.

Thread safety: if your model doesn't support concurrent calls, set `max_cpu_workers=1`.
Single process: use `uvicorn app:app --workers 1`. For multi-worker deployments, use PostgreSQL.
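The dual-pool pattern itself is plain asyncio: route heavy calls to a bounded executor and leave light I/O on the default one. A minimal standalone sketch of the idea described above, not AsyncSQFox's code; `fake_embed` is a stand-in for real model inference.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

cpu_pool = ThreadPoolExecutor(max_workers=2)  # CPU-heavy work: at most 2 at once

def fake_embed(text: str) -> list[float]:
    time.sleep(0.05)                # stand-in for model inference
    return [float(len(text))]

async def ingest(text: str) -> list[float]:
    loop = asyncio.get_running_loop()
    # Heavy work goes to the bounded CPU pool...
    return await loop.run_in_executor(cpu_pool, fake_embed, text)

async def fetch(sql: str) -> str:
    loop = asyncio.get_running_loop()
    # ...while I/O uses the default executor, so reads are never starved.
    return await loop.run_in_executor(None, lambda: sql.upper())

async def main() -> None:
    results = await asyncio.gather(
        ingest("hello"), ingest("world"), fetch("select 1"),
    )
    print(results)  # [[5.0], [5.0], 'SELECT 1']

asyncio.run(main())
```

Because the CPU pool is bounded, even hundreds of concurrent `ingest` awaits queue up behind `max_workers` slots instead of spawning threads, while `fetch` keeps running on the default executor.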
```python
def on_error(sql: str, exc: Exception):
    with open("failed_queries.log", "a") as f:
        f.write(f"{sql}\n{exc}\n\n")

db = SQFox("app.db", error_callback=on_error)
```

Fires on syntax errors, constraint violations, batch aborts, and writer thread crashes. Without a callback, errors still go to the Future object and Python logging.
```
Writer Thread (single)          Reader Threads (many)
        |                               |
  PriorityQueue                 threading.local()
        |                               |
  Batching (N writes             One connection
  per transaction)               per thread
        |                               |
 Writer Connection             Reader Connections
        |                               |
        +--------- SQLite WAL ---------+
                       |
                  One .db file
                       |
            +----------+----------+
            |                     |
      Flat backend           HNSW graph
      (brute KNN)        (BLOB in SQLite)
```
- Writes: All writes go through a `PriorityQueue` to a single writer thread, batched into one transaction (`BEGIN IMMEDIATE`). No `database is locked` errors.
- Reads: Each thread gets its own connection via `threading.local()`. Parallel, non-blocking (WAL mode).
- Search: FTS5 (BM25) and the vector backend (HNSW or Flat) run independently. Results are merged via Relative Score Fusion with adaptive alpha.
- Vector backends: `SqliteHnswBackend` (default, pure-Python HNSW, O(log N), graph serialized as a CSR BLOB in SQLite) or `SqliteFlatBackend` (brute-force KNN). Custom backends via the `VectorBackend` protocol. Pure Python, zero C dependencies.
- Crash safety: Embedding BLOBs are always stored in SQLite (source of truth). The HNSW graph is a rebuildable cache — if corrupted, it auto-recovers from the BLOBs on startup.
- Auto-adaptive: Detects RAM, CPU count, platform (Desktop / Raspberry Pi / Android Termux / SBC), and SD cards. Tunes PRAGMAs automatically: LOW (<1 GB) → 4 MB cache, no mmap; MEDIUM (1–4 GB) → 16 MB cache, 64 MB mmap; HIGH (>4 GB) → 64 MB cache, 256 MB mmap. The vector backend is auto-selected on the first `ingest()`. Incremental vacuum runs silently in the background. Zero config needed.
- Auto-PRAGMA: WAL, `synchronous=NORMAL`, `busy_timeout=5000`, `temp_store=MEMORY`, cache/mmap auto-tuned to RAM, `foreign_keys=ON`.
- Schema: Evolves automatically — `EMPTY → BASE → INDEXED → SEARCHABLE → ENRICHED`. No manual migrations. Idempotent. Resumable backfill.
- Lemmatization: Mixed RU+EN text is handled per word — pymorphy3 for Cyrillic, simplemma for Latin.
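The fusion step can be sketched generically: min-max normalize each result list so BM25 and cosine scores become comparable, then blend with an alpha weight. The normalization choice and the fixed alpha below are illustrative assumptions; sqfox adapts alpha internally.

```python
def _normalize(scores: dict[int, float]) -> dict[int, float]:
    """Min-max scale scores to [0, 1] so the two score spaces are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def relative_score_fusion(bm25: dict[int, float],
                          vec: dict[int, float],
                          alpha: float = 0.5) -> list[tuple[int, float]]:
    """Blend normalized keyword and vector scores; alpha weights the vector side."""
    nb, nv = _normalize(bm25), _normalize(vec)
    keys = set(nb) | set(nv)
    fused = {k: alpha * nv.get(k, 0.0) + (1 - alpha) * nb.get(k, 0.0) for k in keys}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Doc 2 ranks highly in both lists, so it dominates the fused ranking
print(relative_score_fusion({1: 2.0, 2: 8.0}, {2: 0.9, 3: 0.4}, alpha=0.6)[0])  # (2, 1.0)
```

Documents found by only one of the two searches still participate; they just score zero on the side that missed them.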
| Scenario | Description | Key features |
|---|---|---|
| DIY smart home / automation | Old laptop or Celeron mini-PC on a shelf — boiler control, sensor logging, climate automation. No cloud, no subscription, just Python + sqfox on hardware you already own | Concurrent writes, WAL, batching, backup, Grafana via SQLite plugin |
| Telegram / Discord bot | Local knowledge base for bots, desktop tools, CLI utilities, offline assistants | AsyncSQFox, hybrid search, chunking, metadata, reranking |
| Industrial IoT + Grafana | Edge gateways (Raspberry Pi, Jetson, industrial PCs). Grafana reads the same .db via frser-sqlite-datasource | Concurrent writes, WAL, batching, backup |
| Self-diagnosing edge agent | Two AsyncSQFox instances: telemetry (I/O pool) + knowledge base (CPU pool). Threshold triggers auto-RAG search | Dual-pool isolation, auto-RAG |
| OBD-II smart mechanic | ELM327 reads engine data, sqfox auto-searches service manual on DTC codes | See demo/obd_smart_mechanic.py |
sqfox is not a replacement for PostgreSQL. It is designed for a specific niche.
| Requirement | Use instead |
|---|---|
| Multiple processes writing to one DB | PostgreSQL, MySQL |
| More than ~500K vectors | pgvector, Qdrant, Milvus |
| Multi-server / distributed / HA | PostgreSQL, Elasticsearch, CockroachDB |
| High-frequency data (>10K points/sec) | InfluxDB, ClickHouse |
- SqliteHnswBackend — pure Python HNSW. Faster than brute-force on 10K+ vectors, but slower than C/Rust implementations (usearch, hnswlib). Acceptable for edge/embedded scenarios.
- Lemmatization without context — pymorphy3 picks the most frequent word form (~79% accuracy). Fine for search, not perfect.
- `synchronous=NORMAL` — the last few seconds of commits may be lost on power failure; the DB file won't corrupt. Use `synchronous=FULL` manually if you need guaranteed durability.
- SD card wear — batching reduces fsyncs, but plan card replacement for long-running IoT deployments.
Single process, multiple threads. Up to ~100K+ docs with HNSW backend, or ~1M rows without vectors. Local/embedded deployment. Runs fine on dual-core Celerons, old Core 2 Duo laptops, Raspberry Pi — anything with Python 3.10+. Perfect for DIY home automation, boiler controllers, sensor logging, local knowledge bases. If you're thinking "maybe I need PostgreSQL" — you probably do. sqfox is for cases where PostgreSQL is overkill or impossible.
```python
with SQFox("app.db") as db:
    print(db.diagnostics())
```

```json
{
  "sqfox_version": "0.3.0",
  "python_version": "3.13.7",
  "platform": "Linux-6.1.0-rpi-aarch64",
  "sqlite_version": "3.50.4",
  "path": "app.db",
  "is_running": true,
  "vector_backend": "hnsw",
  "queue_size": 0,
  "schema_state": "ENRICHED",
  "auto": {
    "total_ram_mb": 512,
    "memory_tier": "LOW",
    "cpu_count": 4,
    "platform_class": "RASPBERRY_PI",
    "is_sd_card": false,
    "fts5_available": true,
    "resolved_cache_size_kb": 4000,
    "resolved_mmap_size_mb": 0
  }
}
```

sqfox auto-detects your environment at startup and configures itself:
| Detected | How | Fallback |
|---|---|---|
| RAM | `/proc/meminfo`, `sysctl`, `GlobalMemoryStatusEx` | 1 GB |
| CPU count | `sched_getaffinity` (Docker-aware), `cpu_count()` | 1 |
| Platform | `TERMUX_VERSION` env, `/proc/device-tree/model`, arch | Desktop |
| SD card | Path heuristic (`/media/`, `/mnt/sd`, etc.) | No |
| FTS5 | Probe in `:memory:` | Vector-only search |
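The Linux path of the RAM row can be sketched as a `/proc/meminfo` parse with the documented 1 GB fallback. A simplified illustration covering only one of the probes listed above; the function name is hypothetical.

```python
def detect_ram_mb(meminfo_path: str = "/proc/meminfo", fallback_mb: int = 1024) -> int:
    """Read MemTotal from /proc/meminfo; fall back to 1 GB when unavailable."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    kb = int(line.split()[1])   # MemTotal is reported in kB
                    return kb // 1024
    except OSError:
        pass                                    # non-Linux or unreadable: use fallback
    return fallback_mb

print(detect_ram_mb())
```

The try/except-plus-fallback shape is what makes zero-config detection safe: a failed probe degrades to a conservative default instead of crashing at startup.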
PRAGMA tuning by memory tier:
| Tier | RAM | cache_size | mmap_size |
|---|---|---|---|
| LOW | < 1 GB | 4 MB | 0 (disabled) |
| MEDIUM | 1–4 GB | 16 MB | 64 MB |
| HIGH | > 4 GB | 64 MB | 256 MB |
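The tier table maps directly to a small selection function. A sketch reproducing the documented thresholds; the function name and tuple return shape are illustrative, with cache and mmap sizes expressed in MB as in the table.

```python
def pragma_tier(ram_mb: int) -> tuple[str, int, int]:
    """Map total RAM to (tier, cache_size_mb, mmap_size_mb) per the tier table."""
    if ram_mb < 1024:
        return ("LOW", 4, 0)        # < 1 GB: 4 MB cache, mmap disabled
    if ram_mb <= 4096:
        return ("MEDIUM", 16, 64)   # 1-4 GB: 16 MB cache, 64 MB mmap
    return ("HIGH", 64, 256)        # > 4 GB: 64 MB cache, 256 MB mmap

print(pragma_tier(512))   # ('LOW', 4, 0)
```

A 512 MB Raspberry Pi lands in LOW, which matches the `"memory_tier": "LOW"` seen in the diagnostics example earlier.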
Zero config needed. Just SQFox("data.db") — it works on a Raspberry Pi, an Android phone, or a 64 GB desktop.
To override auto-detection, pass explicit values:
```python
# Force specific settings (disables auto-tuning for these params)
db = SQFox(
    "sensors.db",
    cache_size_kb=8_000,  # override auto: 8 MB
    mmap_size_mb=0,       # override auto: disable mmap
)
```

Additional tips:
- SD card durability: the default `synchronous=NORMAL` may lose the last transaction on power loss (the file won't corrupt). For hard guarantees, run `db.write("PRAGMA synchronous=FULL", wait=True)` after start.
- Disk space: sqfox enables incremental auto-vacuum on new databases and reclaims pages silently during ingestion. For manual compaction: `db.vacuum()` or `db.vacuum(into="compacted.db")`.
- Android / Termux: FTS5 support depends on the Python build. Most Kivy, BeeWare, and Termux Python packages include FTS5. If FTS5 is unavailable, sqfox detects this and works with vector-only search — no crash, no config needed.
- Graceful shutdown: sqfox registers an `atexit` hook so the writer thread is drained on normal process exit. For `SIGKILL` or hard power-off, SQLite WAL journaling protects the database file.
- FTS5 self-healing: On every startup, sqfox runs an FTS integrity check. If the index is corrupt, it auto-rebuilds.
| Platform | Status |
|---|---|
| Linux x86-64 | works |
| Linux ARM64 (Raspberry Pi, industrial PCs) | works |
| macOS Intel / Apple Silicon | works |
| Windows x86-64 | works |
| Alpine Linux (musl) | works |
| 32-bit / Windows ARM64 | works (pure Python, no C extensions) |
All vector backends are pure Python with zero C dependencies, so sqfox runs on any platform with Python 3.10+.
```shell
# IoT + RAG (requires sentence-transformers)
python demo/run_demo.py

# HNSW X-Ray — interactive graph inspector (requires sentence-transformers)
python demo/run_hnsw_xray.py

# Crash & Recovery — corrupt HNSW graph, auto-rebuild (requires sentence-transformers)
python demo/run_crash_recovery.py

# Smart home IoT emulation
python demo/run_smart_home.py

# OBD-II smart mechanic
python demo/run_smart_mechanic.py
```

- Python >= 3.10
- Core: zero dependencies (stdlib only)
- `[search]`: simplemma
- `[search-ru]`: + pymorphy3
- `[search-hnsw]`: usearch, numpy (optional USearch backend)
- SqliteHnswBackend + SqliteFlatBackend: included in core (pure Python, no extra deps)
MIT
