Remex turns any folder of documents — PDFs, Word files, notes, spreadsheets, code — into a private, searchable knowledge base. Ask questions in plain language and get answers grounded in your own files, with sources cited.
Everything runs on your machine. No cloud account. No API key required for search. Bring your own AI provider (Anthropic, OpenAI, or a local Ollama instance) only when you want synthesised answers.
Native desktop app for Windows. No terminal required.
⚠️ Windows SmartScreen warning: Windows may display a "Windows protected your PC" warning when you download or install Remex Studio. This happens because the app is not yet code-signed with a paid certificate. The software is safe and fully open source, and you are welcome to audit the source code in this repository. To proceed, click "More info", then "Run anyway".
| Feature | Description |
|---|---|
| 🔍 Semantic search | Vector similarity search across one or more collections simultaneously |
| 🤖 AI Answer | Ask a question, get a synthesised answer with cited sources (Anthropic · OpenAI · Ollama) |
| 📄 12 file formats | .pdf .docx .md .txt .csv .json .jsonl + .html .pptx .xlsx .epub .odt (optional package) |
| 🗄 SQLite ingest | Embed rows from any table alongside your files |
| ♻️ Incremental ingest | SHA-256 hash check — only changed files are re-processed |
| 🎯 Source filter | Narrow results to one or more documents before searching or asking AI |
| 🔎 Chunk viewer | Expand any result to read the full chunk, navigate with keyboard arrows |
| 📦 Collections manager | Rename, describe, purge, bulk-delete sources, one-click re-ingest |
| 📤 Export | JSON · CSV · Markdown · BibTeX · RIS · CSL-JSON · Obsidian vault |
| 👁️ Watch folders | Re-ingest automatically when files change inside chosen directories |
| 🔬 All embedding models | MiniLM, bge-base, bge-large, multilingual, nomic-embed long-context, custom HuggingFace/FastEmbed names |
| 🌙 Themes | Light, dark, auto (follows OS) + sixteen accent colours |
| 🔎 Searchable query history | Filter past queries by substring |
| ⌨️ Keyboard-driven | Press ? anywhere in Studio for the full shortcuts reference |
| ⚙️ Optional packages | Install extra file formats, AI integrations, or sentence chunking from Settings → General at any time |
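The incremental-ingest row above says only files whose SHA-256 hash has changed are re-processed. A minimal sketch of that idea follows. It assumes a previously saved mapping from file path to digest; `sha256_of` and `files_to_reingest` are hypothetical helpers for illustration, not part of Remex's API, and Remex may store and compare hashes differently.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def files_to_reingest(folder: Path, known_hashes: dict[str, str]) -> list[Path]:
    """Return files under `folder` that are new or whose content changed.

    `known_hashes` is a hypothetical record of {path string: digest} saved at
    the previous ingest; a file is re-processed only if its digest differs.
    """
    changed = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        if known_hashes.get(str(path)) != sha256_of(path):
            changed.append(path)
    return changed
```

Reading in blocks keeps memory use flat even for large PDFs. Comparing content digests rather than modification times means a file that was touched but not edited is skipped.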
Remex is free and open-source. Every feature ships in the box — no tiers, no license keys, no payment required.
```bash
pip install remex-cli                 # core: ingest + query (7 formats)
pip install "remex-cli[formats]"      # + .pptx .xlsx .epub .html .odt
pip install "remex-cli[ai]"           # + Anthropic & OpenAI embeddings / generation
pip install "remex-cli[sentence]"     # + sentence-aware chunking (NLTK)
pip install "remex-cli[api]"          # + FastAPI sidecar (used by Studio)
pip install "remex-cli[all]"          # everything above
```

```bash
# Scaffold a project
remex init

# Ingest a folder of documents
remex ingest ./docs

# Semantic search
remex query "how does authentication work?"

# AI-synthesised answer (requires ANTHROPIC_API_KEY, OPENAI_API_KEY, or a running Ollama)
remex query "how does authentication work?" --ai
```

| Command | Description |
|---|---|
| `remex init [path]` | Scaffold `docs/`, `remex.toml`, and `.gitignore` entries |
| `remex ingest <dir>` | Ingest files from a directory into a collection |
| `remex ingest-sqlite <db>` | Ingest rows from a SQLite table |
| `remex query <text>` | Semantic search; add `--ai` for an AI-synthesised answer |
| `remex sources` | List all ingested source paths in a collection |
| `remex stats` | Show chunk and source counts |
| `remex delete-source <path>` | Remove all chunks for a specific source |
| `remex purge` | Remove chunks whose source file no longer exists on disk |
| `remex reset` | Wipe an entire collection |
| `remex list-collections` | List all collections in a database |
| `remex serve` | Start the FastAPI sidecar on `localhost:8000` |
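`remex purge` removes chunks whose source file no longer exists on disk. The check behind that can be sketched as follows. This assumes chunks are plain dicts carrying a `"source"` path; `purge_missing` is a hypothetical helper for illustration, not the command's actual implementation.

```python
from pathlib import Path


def purge_missing(chunks: list[dict]) -> tuple[list[dict], int]:
    """Drop chunks whose "source" file is gone; return (kept_chunks, removed_count)."""
    exists: dict[str, bool] = {}
    kept = []
    for chunk in chunks:
        src = chunk["source"]
        if src not in exists:
            # Hit the filesystem once per source, not once per chunk.
            exists[src] = Path(src).is_file()
        if exists[src]:
            kept.append(chunk)
    return kept, len(chunks) - len(kept)
```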
```bash
remex <command> --help    # full option reference for any command
```

```python
from remex import ingest, query

# Ingest a folder
result = ingest("./docs", collection_name="my-kb")
print(f"{result.chunks_stored} chunks stored")

# Search
results = query("how does auth work?", collection_name="my-kb")
for r in results:
    print(f"[{r.score:.3f}] {r.source} → {r.text[:120]}")
```

Drop a `remex.toml` in your project root (or run `remex init` to generate one):
```toml
[remex]
db = "./remex_db"
collection = "my-kb"
embedding_model = "all-MiniLM-L6-v2"
# chunk_size = 768          # characters per chunk (512-1024 works well)
# overlap = 150             # ~20% overlap preserves context at boundaries
# min_chunk_size = 50       # discard chunks shorter than this
# chunking = "recursive"    # "recursive" (default) | "sentence" | "word"
```

CLI flags always override `remex.toml` values.
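To see how `chunk_size`, `overlap`, and `min_chunk_size` interact, here is a minimal sketch of a plain fixed-size sliding-window character chunker. This is a swapped-in technique for illustration: Remex's default `"recursive"` strategy splits on natural boundaries and will produce different chunks. `sliding_window_chunks` is a hypothetical name, and the defaults mirror the commented values in the config above.

```python
def sliding_window_chunks(
    text: str, chunk_size: int = 768, overlap: int = 150, min_chunk_size: int = 50
) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Each window starts `chunk_size - overlap` characters after the previous one,
    so the last `overlap` characters of a chunk reappear at the start of the next.
    Windows shorter than `min_chunk_size` are discarded.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if len(chunk) >= min_chunk_size:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, a 2,000-character document yields three chunks starting at offsets 0, 618, and 1,236. Each neighbouring pair shares 150 characters, so a sentence cut at one boundary still appears whole in one of the two chunks.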
| Preset | Model | Size | Notes |
|---|---|---|---|
| Light | `all-MiniLM-L6-v2` | 22 MB | Default; fast, good accuracy |
| Balanced | `intfloat/e5-base-v2` | 438 MB | Better retrieval quality |
| Multilingual | `paraphrase-multilingual-MiniLM-L12-v2` | 470 MB | 50+ languages |
| Large (Pro) | `BAAI/bge-large-en-v1.5` | 1.3 GB | Best English accuracy |
| E5 Large (Pro) | `intfloat/e5-large-v2` | 1.3 GB | Strong on retrieval benchmarks |
| Long ctx (Pro) | `nomic-ai/nomic-embed-text-v1.5` | 547 MB | 8,192-token context window |
Any model from SBERT, HuggingFace sentence-similarity, or Ollama can be used by typing the model name directly.
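Whichever model you pick, it maps each chunk to a vector, and search then ranks chunks by vector similarity to the query's vector. Below is a minimal sketch of ranking by cosine similarity, with the source filter applied before ranking. Toy 2-D vectors stand in for real embeddings. `Chunk`, `cosine`, and `top_k` are hypothetical names, and Remex's vector store may use a different similarity measure or index.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    source: str
    text: str
    vector: list[float]  # embedding from whichever model is configured


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two same-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float], chunks: list[Chunk], k: int = 3, sources: Optional[set[str]] = None
) -> list[tuple[float, Chunk]]:
    """Return the k most similar chunks, optionally restricted to `sources` first."""
    pool = [c for c in chunks if sources is None or c.source in sources]
    scored = [(cosine(query_vec, c.vector), c) for c in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Filtering before ranking, rather than after, guarantees that up to `k` results come from the chosen documents instead of being crowded out by higher-scoring chunks elsewhere.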
Studio requires Rust, Node.js 20+, and the Tauri prerequisites for Windows.
```bash
# Python CLI
pip install -e ".[dev]"
pytest

# Studio (dev server with hot-reload)
cd studio
npm install
npm run tauri dev

# Studio (production build)
npm run tauri build
```

See `studio/README.md` for the full build guide.
Python CLI: Apache-2.0 · Studio (v1.3.0+): FSL-1.1-MIT — see LICENSES.md
