feat: add support for standalone cli #51

@PyDataBlog

Is your feature request related to a problem? Please describe.
The original C++ SimString distribution shipped with a CLI for indexing/searching corpora, but this project currently exposes only library bindings (Rust/Python). When experimenting or demoing functionality, users end up writing small bespoke programs just to insert strings or run similarity queries, which slows down exploration and onboarding.

Describe the solution you'd like
Add a first-class command-line interface (e.g., simstring-rs) that can build databases from text, JSON, or CSV sources, persist them, and run interactive or batch queries using the existing measures and feature extractors. Ideally it would also expose options for configuring n-gram parameters, thresholds, and output formats, so that the CLI reaches feature parity with the original project.
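
To make the request concrete, here is a rough sketch of what the argument surface of such a CLI could look like, using only the Rust standard library. Everything in it is an assumption for discussion: the `index` and `search` subcommands, the `--ngram`, `--measure`, `--threshold`, and `--json` flags, their defaults, and the `Command` and `parse_args` names are hypothetical, and nothing here calls into simstring_rust. A real implementation would likely use an argument-parsing crate and dispatch to the existing measures and feature extractors.

```rust
// Hypothetical CLI surface for illustration only: none of these subcommands,
// flags, or defaults exist in the project today.
#[derive(Debug, PartialEq)]
enum Command {
    /// Build a database from an input file and persist it.
    Index { input: String, db: String, ngram: usize },
    /// Query a persisted database.
    Search { db: String, query: String, measure: String, threshold: f64, json: bool },
}

/// Parse the arguments that follow the program name.
/// For brevity, every flag is accepted after either subcommand; flags that
/// do not apply to the chosen subcommand are silently ignored.
fn parse_args<I>(args: I) -> Result<Command, String>
where
    I: IntoIterator<Item = String>,
{
    let mut it = args.into_iter();
    let sub = it.next().ok_or("missing subcommand: expected `index` or `search`")?;

    // Assumed defaults, chosen only for the sketch.
    let mut positional: Vec<String> = Vec::new();
    let mut ngram = 3usize;
    let mut measure = String::from("cosine");
    let mut threshold = 0.7f64;
    let mut json = false;

    while let Some(arg) = it.next() {
        match arg.as_str() {
            "--ngram" => {
                let v = it.next().ok_or("--ngram needs a value")?;
                ngram = v.parse().map_err(|_| format!("invalid --ngram value: {v}"))?;
            }
            "--measure" => measure = it.next().ok_or("--measure needs a value")?,
            "--threshold" => {
                let v = it.next().ok_or("--threshold needs a value")?;
                threshold = v.parse().map_err(|_| format!("invalid --threshold value: {v}"))?;
            }
            "--json" => json = true,
            flag if flag.starts_with("--") => return Err(format!("unknown flag: {flag}")),
            other => positional.push(other.to_string()),
        }
    }

    match (sub.as_str(), positional.as_slice()) {
        ("index", [input, db]) => Ok(Command::Index {
            input: input.clone(),
            db: db.clone(),
            ngram,
        }),
        ("search", [db, query]) => Ok(Command::Search {
            db: db.clone(),
            query: query.clone(),
            measure,
            threshold,
            json,
        }),
        ("index", _) => Err("usage: index <input> <db> [--ngram N]".into()),
        ("search", _) => {
            Err("usage: search <db> <query> [--measure M] [--threshold T] [--json]".into())
        }
        (other, _) => Err(format!("unknown subcommand: {other}")),
    }
}
```

With this shape, a binary's `main` would pass `std::env::args().skip(1)` to `parse_args`, so an invocation such as `simstring-rs search names.db "jon smith" --threshold 0.8 --json` (file name and query are made up) would yield a `Command::Search` value ready to hand to the library.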

Describe alternatives you've considered

  • Writing ad-hoc Rust binaries that link simstring_rust for each experiment (works but wastes time and leads to duplicated code).
  • Using the Python bindings and building a CLI in Python (adds a Python runtime dependency and loses the ability to distribute a single native binary).

Additional context
This would restore parity with the legacy SimString UX, make it easier to validate new features against large corpora, and offer a friendlier entry point for users who just want similarity search without embedding the crate in their own code.
