Fast, lightweight, and relevance-ranked search for Luau datasets.
Hiro is a fast, client-optimized in-memory search engine for Luau datasets (like item catalogs, localization keys, or config records).
It combines:
- Token normalization + tokenization
- Bigram vocabulary indexing
- Multi‑strategy token expansion (exact / prefix / fuzzy / single-char heuristic)
- Simple weighted heuristic scoring
…to produce fast, relevance‑ranked search results suitable for real-time UI filtering inside a Roblox experience.
Use Hiro when you need:
- Fast local search/filter for medium-sized client datasets
- Lightweight, dependency-free indexing
- Extensible normalization / tokenization for domain-specific search behavior
- Early fuzzy tolerance (misspellings / omissions: e.g. `swrd` → “Sword”)
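The fuzzy tolerance above is the kind of behavior bigram overlap with Jaccard similarity provides. A minimal sketch of that general technique in plain Luau (illustrative only; the `bigrams` and `jaccard` helpers are hypothetical and not part of Hiro's API):

```lua
-- Split a token into its character bigrams, returned as a set.
local function bigrams(token: string): {[string]: boolean}
	local set = {}
	for i = 1, #token - 1 do
		set[string.sub(token, i, i + 1)] = true
	end
	return set
end

-- Jaccard similarity of two tokens' bigram sets:
-- |intersection| / |union|, in [0, 1].
local function jaccard(a: string, b: string): number
	local setA, setB = bigrams(a), bigrams(b)
	local intersection, union = 0, 0
	for gram in setA do
		union += 1
		if setB[gram] then
			intersection += 1
		end
	end
	for gram in setB do
		if not setA[gram] then
			union += 1
		end
	end
	return if union == 0 then 0 else intersection / union
end
```

With this measure, `swrd` vs. `sword` shares 2 bigrams (`sw`, `rd`) out of 5 total, scoring 0.4 and clearing the default 0.35 `FuzzyThreshold`.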
Hiro isn't ideal (yet) for:
- Massive datasets (tens of thousands of long documents)
- Full semantic / phrase / positional search
- Handling dynamic datasets without rebuilding
- Outside of Roblox development (not tested yet)
Hiro works in most cases, but hasn’t been fully tested for production use. You’re free to use it in any project, but keep in mind that there may still be unexpected issues or edge cases. Contributions are welcome; feel free to open an issue or submit a pull request if you find something!
```lua
local Hiro = require(path.to.Hiro)

local DATASET = table.freeze {
	"Iron Sword",
	"Steel Shield",
	"Potion of Healing",
	"Magic Staff",
}

local Engine = Hiro.new(DATASET, { Keys = { "." } })
local Results = Engine:Search("swrd")

for i, hit in Results do
	print(i, hit.Item)
end
```

Example output:

```
1 Iron Sword
```
| Pattern | Meaning |
|---|---|
| `"."` | Use the entire value directly if the dataset element itself is a string |
| `[fieldName]` | Index `item.[fieldName]` if it is a string |
| `[some].[nested].[path]` | Dotted traversal |
| `[path].*` | Collect all string leaf values under `item.[path]` (recursively), concatenate, normalize, index |
| `".*"` | For table documents: collect all string leaf values anywhere |
All concatenated forms are joined with a space before normalization.
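A hedged example of indexing table documents with these patterns (the `Name` and `Stats` fields are invented for illustration; the `Hiro.new` signature matches the quickstart above):

```lua
-- Hypothetical table dataset; field names are illustrative only.
local ITEMS = table.freeze {
	{ Name = "Iron Sword", Stats = { Damage = "High", Rarity = "Common" } },
	{ Name = "Magic Staff", Stats = { Damage = "Medium", Rarity = "Rare" } },
}

-- Index the Name field plus every string leaf under Stats.
local Engine = Hiro.new(ITEMS, { Keys = { "Name", "Stats.*" } })
```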
All major behaviors (tokenization, normalization, sorting) can be overridden by injecting custom functions.
| Option | Type | Default | Description |
|---|---|---|---|
| `FuzzyThreshold` | `number` | `0.35` | Minimum Jaccard similarity |
| `EnablePrefix` | `boolean` | `true` | Enable prefix expansion for tokens of length > 1 |
| `AndLogic` | `boolean` | `false` | Intersection (AND) instead of union across tokens (under review) |
| `Keys` | `{string}` | `{".*"}` | Field patterns (see table above) |
| `Weights` | `{[string]: number}` | `nil` | Per-field weighting in scoring |
| `Tokenizer` | `function?` | `nil` | Custom tokenizer |
| `Normalizer` | `function?` | `nil` | Custom normalizer |
| `Sort` | `function?` | `nil` | Custom result ordering |
| `Limit` | `number?` | `nil` | Limit the result count after sorting |
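A hedged example combining several of these options (the field names, the `Score` field on hits, and the two-argument comparator shape are assumptions, not documented behavior):

```lua
local Engine = Hiro.new(DATASET, {
	Keys = { "Name", "Description" }, -- assumed field names
	Weights = { Name = 2, Description = 1 }, -- weight Name matches higher
	FuzzyThreshold = 0.4, -- stricter than the 0.35 default
	EnablePrefix = true,
	Limit = 10, -- keep only the top 10 results after sorting
	-- Assumed comparator shape: receives two hits, returns true when `a`
	-- should order before `b`. The `Score` field is hypothetical.
	Sort = function(a, b)
		return a.Score > b.Score
	end,
})
```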
Tested on a dataset of 66 values, producing:
- 242 unique bigrams
- 210 unique tokens
Performance (averages, Roblox Luau runtime):
- Indexing: ~0.68 ms per init (dataset load + indexing)
- Querying: ~0.02 ms per search (longer query)
Query length behavior:
- Single-character queries: cost depends on how many bigrams share that starting character.
  - Can be ~0.25× faster if rare (fewer candidates).
  - Up to ~5× slower if very common (e.g. `a`).
- Longer queries: stay consistent, with only small ±0.002 ms variations.
These results align with the expected complexity profile. Even at this early stage, Hiro delivers sub-millisecond query times, suitable for real-time client-side search.
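A rough way to reproduce these timings against your own dataset (a sketch using `os.clock`; `Engine` is the instance from the quickstart above):

```lua
local ITERATIONS = 1000

local start = os.clock()
for _ = 1, ITERATIONS do
	Engine:Search("sword")
end
local elapsedMs = (os.clock() - start) * 1000

print(string.format("avg query: %.4f ms", elapsedMs / ITERATIONS))
```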
- Add extended operators, like Fuse.js (exact `=term`, negation `!term`, prefix `^term`)
- Support logical grouping and phrase/proximity matching
- Better heuristic scoring and configurable strategies
- Field-level weights with more transparency
- Async/batched search for large sets
- Incremental dataset updates (add/remove without full rebuild)
- Clearer stats (token/bigram counts, index size)
- Structured result metadata (matched tokens, match kinds)
- Expand unit tests and documentation
- Benchmark tools for scaling scenarios
- Stronger Unicode support
- Preserve CJK characters properly
- Datasets must be fully reindexed after changes (no incremental updates yet)
- Limited query operators (raw tokens only, for now)
- Scoring is intentionally simple — designed for speed over sophistication
Issues and PRs welcome. Please:
- Open an issue describing intent (performance, feature, refactor).
- Include benchmark diffs if performance-related.
- Try to keep added dependencies zero.