[Rust] Automaton File Compression support #85

Open
stevefan1999-personal opened this issue Sep 23, 2023 · 5 comments

@stevefan1999-personal
Contributor

stevefan1999-personal commented Sep 23, 2023

We can leverage SOF3/include-flate (a variant of include_bytes!/include_str! with compile-time deflation and runtime lazy inflation) for this.

From my preliminary test, the C# grammar shrinks by more than 90% (7.57 MB to 230 KB, about a 97% reduction), at the cost of a larger runtime memory allocation.

I'm not sure if this could be a new research direction haha, but I'm curious what the automata would look like after trie compression and Huffman coding.

@stevefan1999-personal
Contributor Author

Yeeeeeeeeeepppppppp...savings are pretty huge

@stevefan1999-personal
Contributor Author

I think I realized the reason why. 

We store a lot of u32s, but we definitely don't have enough states to need all 32 bits.

So, simply put, there are long runs of zero bits scattered throughout the data, and text compression loves exactly this kind of pattern!
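
A minimal sketch of that effect, assuming a hypothetical table of 1000 transitions over 200 states. The names `encode_states`, `zero_byte_ratio`, and `sample_table` are made up for illustration and are not taken from the generated code; the real automaton layout may differ.

```rust
/// Serialize state ids as little-endian u32 values, the way a table of
/// 32-bit entries would be laid out in a binary blob.
fn encode_states(states: &[u32]) -> Vec<u8> {
    states.iter().flat_map(|s| s.to_le_bytes()).collect()
}

/// Fraction of bytes in `data` that are zero (0.0 for empty input).
fn zero_byte_ratio(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let zeros = data.iter().filter(|&&b| b == 0).count();
    zeros as f64 / data.len() as f64
}

/// Hypothetical table: 1000 transitions whose targets are all below 200,
/// so each value fits in the lowest byte of its u32.
fn sample_table() -> Vec<u32> {
    (0..1000u32).map(|i| (i * 7) % 200).collect()
}
```

Because every entry in the sample table fits in one byte, at least three of every four serialized bytes are zero, so `zero_byte_ratio(&encode_states(&sample_table()))` is at least 0.75. That is the kind of redundancy a deflate-style compressor removes cheaply.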

@woutersl
Member

Wow, this looks great. I'll try this out. The advantage is that the compression is done when the generated code is compiled, so it does not require support in .NET and Java.

@SOF3

SOF3 commented Sep 27, 2023

Just curious, was it really a well-thought-out decision to use include_flate? While it significantly reduces the size of the static files, both the compressed data and the (lazily allocated) decompressed data remain in process memory and are never dropped. Is it really that meaningful to produce a small binary at the cost of large runtime memory?
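
To make the trade-off concrete, here is a rough sketch of lazy inflation using only the standard library. It is not include_flate's actual implementation: a toy run-length scheme stands in for deflate, and `COMPRESSED`, `DECODED`, `rle_decode`, `automaton_bytes`, and `resident_bytes` are hypothetical names chosen for illustration.

```rust
use std::sync::OnceLock;

/// Stand-in for the compressed blob embedded in the binary: (count, byte)
/// pairs of a toy run-length encoding. A real build would embed deflate data.
static COMPRESSED: [u8; 6] = [3, 0, 1, 42, 4, 0];

/// Decoded table, filled in on first access and never freed afterwards.
static DECODED: OnceLock<Vec<u8>> = OnceLock::new();

/// Expand (count, byte) pairs into the original bytes.
fn rle_decode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in input.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Decode on the first call; later calls return the same cached buffer.
fn automaton_bytes() -> &'static [u8] {
    DECODED.get_or_init(|| rle_decode(&COMPRESSED)).as_slice()
}

/// Bytes resident after first use: the compressed copy plus the decoded one.
fn resident_bytes() -> usize {
    COMPRESSED.len() + automaton_bytes().len()
}
```

Once `automaton_bytes()` has been called, the static compressed array and the heap-allocated decoded buffer both stay alive for the rest of the process, so resident memory ends up larger than with a plain uncompressed static, even though the binary on disk is smaller.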

@woutersl
Member

To clarify a little: this feature is not there yet. In addition, it will be gated behind a flag and disabled by default, so the current behavior does not change, but users who care about binary size (at the expense of runtime memory, indeed) can take advantage of it.

@woutersl woutersl self-assigned this Oct 2, 2023
@woutersl woutersl added the enhancement New feature or request label Oct 2, 2023
@woutersl woutersl added this to the 5.0.0.release milestone Oct 10, 2023