Fast, lightweight, and relevance-ranked search for Luau datasets.
Hiro is a fast, client-optimized in-memory search engine for Luau datasets (like item catalogs, localization keys, or config records).
It combines:
- Token normalization + tokenization
- Bigram vocabulary indexing
- Multi‑strategy token expansion (exact / prefix / fuzzy / single-char heuristic)
- Simple weighted heuristic scoring
…to produce fast, relevance‑ranked search results suitable for real-time UI filtering inside a Roblox experience.
Use Hiro when you need:
- Fast local search/filter for medium-sized client datasets
- Lightweight, dependency-free indexing
- Extensible normalization / tokenization for domain-specific search behavior
- Early fuzzy tolerance (misspellings / omissions: e.g. `swrd` → “Sword”)
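The fuzzy tolerance above is the kind of behavior bigram overlap with Jaccard similarity provides. A minimal sketch of that general technique in plain Luau (illustrative only; the `bigrams` and `jaccard` helpers are hypothetical and not part of Hiro's API):

```lua
-- Split a token into its character bigrams, returned as a set.
local function bigrams(token: string): {[string]: boolean}
	local set = {}
	for i = 1, #token - 1 do
		set[string.sub(token, i, i + 1)] = true
	end
	return set
end

-- Jaccard similarity of two tokens' bigram sets:
-- |intersection| / |union|, in [0, 1].
local function jaccard(a: string, b: string): number
	local setA, setB = bigrams(a), bigrams(b)
	local intersection, union = 0, 0
	for gram in setA do
		union += 1
		if setB[gram] then
			intersection += 1
		end
	end
	for gram in setB do
		if not setA[gram] then
			union += 1
		end
	end
	return if union == 0 then 0 else intersection / union
end
```

With this measure, `swrd` vs. `sword` shares 2 bigrams (`sw`, `rd`) out of 5 total, scoring 0.4 and clearing the default 0.35 `FuzzyThreshold`.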
Hiro isn't ideal (yet) for:
- Massive datasets (tens of thousands of long documents)
- Full semantic / phrase / positional search
- Handling dynamic datasets without rebuilding
- Outside of Roblox development (not tested yet)
Hiro works in most cases, but hasn’t been fully tested for production use. You’re free to use it in any project, but keep in mind that there may still be unexpected issues or edge cases. Contributions are welcome; feel free to open an issue or submit a pull request if you find something!
```lua
local Hiro = require(path.to.Hiro)

local DATASET = table.freeze {
	"Iron Sword",
	"Steel Shield",
	"Potion of Healing",
	"Magic Staff",
}

local Engine = Hiro.new(DATASET, { Keys = { "." } })
local Results = Engine:Search("swrd")

for i, hit in Results do
	print(i, hit.Item)
end
```

Example output:

```
1 Iron Sword
```
| Pattern | Meaning |
|---|---|
| `"."` | Use the entire value directly if the dataset element itself is a string |
| `[fieldName]` | Index `item.[fieldName]` if it is a string |
| `[some].[nested].[path]` | Dotted traversal |
| `[path].*` | Collect all string leaf values under `item.[path]` (recursively), concatenate, normalize, index |
| `".*"` | For table documents: collect all string leaf values anywhere |
All concatenated forms are joined with a space before normalization.
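A hedged example of indexing table documents with these patterns (the `Name` and `Stats` fields are invented for illustration; the `Hiro.new` signature matches the quickstart above):

```lua
-- Hypothetical table dataset; field names are illustrative only.
local ITEMS = table.freeze {
	{ Name = "Iron Sword", Stats = { Damage = "High", Rarity = "Common" } },
	{ Name = "Magic Staff", Stats = { Damage = "Medium", Rarity = "Rare" } },
}

-- Index the Name field plus every string leaf under Stats.
local Engine = Hiro.new(ITEMS, { Keys = { "Name", "Stats.*" } })
```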
All major behaviors (tokenization, normalization, sorting) can be overridden by injecting custom functions.
| Option | Type | Default | Description |
|---|---|---|---|
| `FuzzyThreshold` | `number` | `0.35` | Minimum Jaccard similarity |
| `EnablePrefix` | `boolean` | `true` | Enable prefix expansion for tokens of length > 1 |
| `AndLogic` | `boolean` | `false` | Intersection (AND) instead of union across tokens (under review) |
| `Keys` | `{string}` | `{".*"}` | Field patterns (see table above) |
| `Weights` | `{[string]: number}` | `nil` | Per-field weighting in scoring |
| `Tokenizer` | `function?` | `nil` | Custom tokenizer |
| `Normalizer` | `function?` | `nil` | Custom normalizer |
| `Sort` | `function?` | `nil` | Custom result ordering |
| `Limit` | `number?` | `nil` | Limit the result count after sorting |
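A hedged example combining several of these options (the field names, the `Score` field on hits, and the two-argument comparator shape are assumptions, not documented behavior):

```lua
local Engine = Hiro.new(DATASET, {
	Keys = { "Name", "Description" }, -- assumed field names
	Weights = { Name = 2, Description = 1 }, -- weight Name matches higher
	FuzzyThreshold = 0.4, -- stricter than the 0.35 default
	EnablePrefix = true,
	Limit = 10, -- keep only the top 10 results after sorting
	-- Assumed comparator shape: receives two hits, returns true when `a`
	-- should order before `b`. The `Score` field is hypothetical.
	Sort = function(a, b)
		return a.Score > b.Score
	end,
})
```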
Tested on a dataset of 66 values, producing:
- 242 unique bigrams
- 210 unique tokens
Performance (averages, Roblox Luau runtime):
- Indexing: ~0.68 ms per init (dataset load + indexing)
- Querying: ~0.02 ms per search (longer query)
Query length behavior:
- Single-character queries: cost depends on how many bigrams share that starting character.
  - Can be ~0.25× faster if rare (fewer candidates).
  - Up to ~5× slower if very common (e.g. `a`).
- Longer queries: stay consistent, with only small ±0.002 ms variations.
These results align with the expected complexity profile. Even at this early stage, Hiro delivers sub-millisecond query times, suitable for real-time client-side search.
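A rough way to reproduce these timings against your own dataset (a sketch using `os.clock`; `Engine` is the instance from the quickstart above):

```lua
local ITERATIONS = 1000

local start = os.clock()
for _ = 1, ITERATIONS do
	Engine:Search("sword")
end
local elapsedMs = (os.clock() - start) * 1000

print(string.format("avg query: %.4f ms", elapsedMs / ITERATIONS))
```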
- Add extended operators, like Fuse.js (exact `=term`, negation `!term`, prefix `^term`)
- Support logical grouping and phrase/proximity matching
- Better heuristic scoring and configurable strategies
- Field-level weights with more transparency
- Async/batched search for large sets
- Incremental dataset updates (add/remove without full rebuild)
- Clearer stats (token/bigram counts, index size)
- Structured result metadata (matched tokens, match kinds)
- Expand unit tests and documentation
- Benchmark tools for scaling scenarios
- Stronger Unicode support
- Preserve CJK characters properly
- Datasets must be fully reindexed after changes (no incremental updates yet)
- Limited query operators (raw tokens only, for now)
- Scoring is intentionally simple — designed for speed over sophistication
Issues and PRs welcome. Please:
- Open an issue describing intent (performance, feature, refactor).
- Include benchmark diffs if performance-related.
- Try to keep added dependencies zero.