Using a memmap for the dictionary #148

kirawi · 2024-02-03T15:10:18Z

Is your feature request related to a problem? Please describe.

Supporting lower-end hardware with limited memory is harder, particularly with larger dictionaries.

Describe the solution you'd like

I would like the option to use a memory map to refer to an uncompressed dictionary, since storage is usually cheaper than memory. My application does not need extreme performance, so I think the I/O penalty would be acceptable. If Vibrato processes the dictionary into some other form, it would also be nice to be able to serialize that to a file and memory-map it as well. fst offers something like that: https://docs.rs/fst/latest/fst/#example-stream-to-a-file-and-memory-map-it-for-searching
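
To illustrate the general idea (serialize once, then search the file through a memory map instead of loading it), here is a minimal sketch using only Python's standard `mmap` module. It is not Vibrato's dictionary format or API: the fixed-width record layout and the `write_dictionary` and `lookup` helpers are hypothetical and exist only for this example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical on-disk layout (NOT Vibrato's format): records sorted by key,
# each a 16-byte NUL-padded UTF-8 key followed by a little-endian u32 value.
KEY_SIZE = 16
RECORD = struct.Struct(f"<{KEY_SIZE}sI")


def write_dictionary(path, entries):
    """Serialize a {word: value} mapping to `path`, sorted by encoded key."""
    encoded = sorted((k.encode("utf-8"), v) for k, v in entries.items())
    with open(path, "wb") as f:
        for key, value in encoded:
            if len(key) > KEY_SIZE:
                raise ValueError(f"key too long: {key!r}")
            f.write(RECORD.pack(key, value))  # "16s" pads short keys with NULs


def lookup(mm, word):
    """Binary-search the mapped records; returns the value or None.

    Only the pages touched by the search need to be resident in memory.
    """
    target = word.encode("utf-8")
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, value = RECORD.unpack_from(mm, mid * RECORD.size)
        key = key.rstrip(b"\0")
        if key == target:
            return value
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dict.bin")
        write_dictionary(path, {"東京": 1, "京都": 2, "大阪": 3})
        with open(path, "rb") as f:
            # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                print(lookup(mm, "京都"))
                print(lookup(mm, "名古屋"))
```

The trade-off is the one described above: lookups may hit disk on a cold page, but the process does not need to hold the whole dictionary in RAM. A real dictionary would need a richer layout than fixed-width records.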

Describe alternatives you've considered

None that I'm aware of.

Additional context

None

kirawi · 2024-02-03T15:36:15Z

Actually, this might not make sense since it's a niche requirement. I'll explore it in my own fork though.

vbkaisetsu · 2024-02-07T02:44:20Z

@kirawi I think the Vaporetto tokenizer is a better choice for small devices.
Lattice-based tokenizers (including Vibrato) require large dictionaries, while pointwise tokenizers (including Vaporetto) work with smaller models.

There is an example that works on STM32F3DISCOVERY.
https://github.com/daac-tools/vaporetto/tree/main/examples/embedded_device

kirawi closed this as completed Feb 3, 2024