Description
Is your feature request related to a problem? Please describe.
The original C++ SimString distribution shipped with a CLI for indexing/searching corpora, but this project currently exposes only library bindings (Rust/Python). When experimenting or demoing functionality, users end up writing small bespoke programs just to insert strings or run similarity queries, which slows down exploration and onboarding.
Describe the solution you'd like
Add a first-class command-line interface (e.g., simstring-rs) that can build databases from text/JSON/CSV sources, persist them, and run interactive or batch queries using the existing measures and feature extractors. Ideally it would also expose options for configuring n-gram parameters, thresholds, and output formats, so that the CLI reaches feature parity with the original project.
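To make the request concrete, here is a rough sketch of how a `search` subcommand could parse its options and run a query. Everything in it is an assumption for illustration: the flag names (`--ngram`, `--threshold`), the `SearchOpts` struct, and the helper functions are hypothetical, and the code deliberately does not call the `simstring_rust` API. It uses only the standard library and a brute-force cosine scan over character n-grams as a stand-in for the crate's real measures and feature extractors.

```rust
use std::collections::HashSet;

/// Hypothetical options for a `search` subcommand (names are not from the project).
#[derive(Debug, PartialEq)]
pub struct SearchOpts {
    pub ngram: usize,
    pub threshold: f64,
    pub query: String,
}

/// Parse assumed flags such as `--ngram 2 --threshold 0.5 <query>`.
pub fn parse_search_args<I: IntoIterator<Item = String>>(args: I) -> Result<SearchOpts, String> {
    let mut opts = SearchOpts { ngram: 3, threshold: 0.7, query: String::new() };
    let mut query = None;
    let mut it = args.into_iter();
    while let Some(arg) = it.next() {
        match arg.as_str() {
            "--ngram" => {
                let v = it.next().ok_or("--ngram needs a value")?;
                opts.ngram = v.parse().map_err(|_| format!("invalid --ngram: {}", v))?;
            }
            "--threshold" => {
                let v = it.next().ok_or("--threshold needs a value")?;
                opts.threshold = v.parse().map_err(|_| format!("invalid --threshold: {}", v))?;
            }
            // Any other argument is treated as the query string.
            _ => query = Some(arg),
        }
    }
    if opts.ngram == 0 {
        return Err("--ngram must be at least 1".to_string());
    }
    opts.query = query.ok_or("missing query string")?;
    Ok(opts)
}

/// Character n-grams of `s` as a set; strings shorter than `n` yield themselves.
pub fn ngrams(s: &str, n: usize) -> HashSet<String> {
    let n = n.max(1);
    let chars: Vec<char> = s.chars().collect();
    if chars.len() < n {
        return std::iter::once(s.to_string()).collect();
    }
    chars.windows(n).map(|w| w.iter().collect::<String>()).collect()
}

/// Set-based cosine similarity: |A ∩ B| / sqrt(|A| * |B|).
pub fn cosine(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    let shared = a.intersection(b).count() as f64;
    shared / ((a.len() as f64) * (b.len() as f64)).sqrt()
}

/// Brute-force scan (a stand-in, not the library's indexed search):
/// returns entries scoring at or above the threshold, best first.
pub fn search<'a>(corpus: &'a [String], opts: &SearchOpts) -> Vec<(&'a str, f64)> {
    let q = ngrams(&opts.query, opts.ngram);
    let mut hits: Vec<(&str, f64)> = corpus
        .iter()
        .map(|s| (s.as_str(), cosine(&q, &ngrams(s, opts.ngram))))
        .filter(|(_, score)| *score >= opts.threshold)
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits
}
```

A real binary would feed `std::env::args().skip(1)` into the parser, load the corpus from a file or a persisted database, and print the hits in the requested output format. In practice an argument-parsing crate and the crate's own database and measure types would replace the hand-rolled pieces above.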
Describe alternatives you've considered
- Writing ad-hoc Rust binaries that link `simstring_rust` for each experiment (works, but wastes time and leads to duplicated code).
- Using the Python bindings and building a CLI in Python (adds a Python runtime dependency and loses the ability to distribute a single native binary).
Additional context
This would restore parity with the legacy SimString UX, make it easier to validate new features against large corpora, and offer a friendlier entry point for users who just want searchable similarity without fully embedding the crate.