Binary size gets bloated by mapping tables #36

Closed
d3lm opened this issue Sep 7, 2023 · 6 comments

Comments

@d3lm
Member

d3lm commented Sep 7, 2023

I have noticed that when compiling Ada to a Wasm module (via the wasi-sdk), a lot of the binary size comes from data sections, most likely the mapping tables. Here's the output from twiggy when profiling the code size of the Wasm module:

[Image: twiggy code-size profile of the Wasm module]

As you can see, more than 60% is just data.

I was wondering if there's a way to minimize the bloat from these tables, or if there are other ways to reduce the binary size.

CC @lemire @anonrig
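One general way to shrink a per-code-point mapping table is to store runs of code points that share the same value as ranges and look them up with a binary search. The following is a minimal sketch of that idea only. The `Range` struct, the `kRanges` table, the status codes, and `lookup` are all hypothetical names and values chosen for illustration; this is not ada's actual table layout, and ada may already use a similar or better encoding.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical range-compressed table. This is NOT ada's actual data layout.
struct Range {
  std::uint32_t first;  // first code point in the run (inclusive)
  std::uint32_t last;   // last code point in the run (inclusive)
  std::uint8_t status;  // made-up status code shared by the whole run
};

// Sorted by `first` and non-overlapping. Values are illustrative only:
// 1 stands for "mapped", 2 for "valid", and 0 is reserved for "not found".
constexpr std::array<Range, 3> kRanges{{
    {0x0041, 0x005A, 1},  // A-Z
    {0x0061, 0x007A, 2},  // a-z
    {0x4E00, 0x9FFF, 2},  // a large block collapses into a single entry
}};

// Returns the status for `cp`, or 0 when no range covers it.
inline std::uint8_t lookup(std::uint32_t cp) {
  // Find the first range that starts after `cp`; the only range that can
  // contain `cp` is the one immediately before it.
  auto it = std::upper_bound(
      kRanges.begin(), kRanges.end(), cp,
      [](std::uint32_t value, const Range& r) { return value < r.first; });
  if (it == kRanges.begin()) return 0;
  --it;
  return cp <= it->last ? it->status : 0;
}
```

The trade-off is a logarithmic search per lookup instead of a direct index, in exchange for a table whose size grows with the number of runs rather than the number of code points.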

@d3lm changed the title from "Reducing binary size" to "Binary size gets bloated by mapping tables" Sep 7, 2023
@lemire
Member

lemire commented Sep 11, 2023

Pull request invited!

@lemire
Member

lemire commented Sep 11, 2023

The binary should be about 280 KB. For context, ICU (a standard dependency which we avoid) is over 40 MB, or nearly 150 times larger.
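
As a rough check on the ratio quoted above, assuming decimal units (1 MB = 1000 KB) and taking the two sizes at face value:

$$\frac{40\ \text{MB}}{280\ \text{KB}} = \frac{40\,000}{280} \approx 143$$

With binary units (1 MB = 1024 KB) the figure is about 146. Since ICU is described as over 40 MB, both values are lower bounds and are consistent with "nearly 150 times".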

@d3lm
Member Author

d3lm commented Sep 12, 2023

Yes, that's true. ICU is a lot bigger. I don't know off the top of my head how we could avoid that "bloat", except for relying on host functionality if compiled to Wasm / WASI. So that we don't have to include the tables, and toUnicode and toAscii are something that could be provided by the host?
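
To make the host-provided idea concrete, here is a minimal sketch of how a module could call an imported function instead of embedding the tables. Everything named here is an assumption for illustration: the `host` import module, the `to_ascii` import name, the `host_to_ascii` signature, its "0 means failure" convention, and the buffer sizing are all hypothetical, and neither ada nor WASI defines such an interface. The `import_module` and `import_name` attributes are Clang's WebAssembly function attributes; outside Wasm the sketch falls back to an ASCII-only stand-in so that it can be compiled and run natively.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

#if defined(__wasm__)
// Hypothetical import: the embedder would have to supply "host"."to_ascii"
// when instantiating the module. Returns the number of bytes written to
// `out`, or 0 on failure.
extern "C" __attribute__((import_module("host"), import_name("to_ascii")))
std::size_t host_to_ascii(const char* in, std::size_t in_len,
                          char* out, std::size_t out_cap);
#else
// Native stand-in so the sketch runs outside Wasm. It only lowercases
// ASCII input and rejects everything else; it is not an IDNA implementation.
extern "C" std::size_t host_to_ascii(const char* in, std::size_t in_len,
                                     char* out, std::size_t out_cap) {
  if (in_len > out_cap) return 0;
  for (std::size_t i = 0; i < in_len; ++i) {
    const unsigned char c = static_cast<unsigned char>(in[i]);
    if (c >= 0x80) return 0;  // non-ASCII would need the real mapping tables
    out[i] = static_cast<char>((c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c);
  }
  return in_len;
}
#endif

// Thin wrapper the rest of the module would call. An empty result means the
// host reported failure (or the input was empty).
std::string to_ascii_via_host(std::string_view domain) {
  // Output capacity is an arbitrary guess for this sketch, not a proven bound.
  std::string out(domain.size() * 4 + 16, '\0');
  const std::size_t n =
      host_to_ascii(domain.data(), domain.size(), out.data(), out.size());
  assert(n <= out.size());  // the host must not claim more than the capacity
  out.resize(n);
  return out;
}
```

With this shape the tables live wherever the host implements `to_ascii`, so the Wasm data section shrinks, but the host still has to carry an equivalent implementation somewhere.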

@lemire
Member

lemire commented Sep 12, 2023

After standard compression, the entire ada library (including idna) is well under 200 KB. The compressed WASM is about 100 KB. This is much less than the aggregated JavaScript of most websites. A typical website these days requires megabytes of downloads (with caching, of course). It is also much, much smaller than ICU, which is already almost everywhere.

> Except for relying on host functionality if compiled to Wasm / WASI. So that we don't have to include the tables, and toUnicode and toAscii is something that could be provided by the host?

What is contained in your host?

Ada (in its entirety) is distributed with every version of Node.js. Why don't you just distribute ada as a binary?

If you are concerned with the 100KB WASM download and you cannot install ada locally, why don't you cache it?

@d3lm
Member Author

d3lm commented Sep 13, 2023

Yeah, my main concern is not the compressed size. It's small, as you said. But we have a multi-process architecture, and userland can spawn any number of processes or threads. Each process instantiates a WebAssembly module that includes ada, which means every process has to parse it, and every process then includes its own copy of the mapping tables. There is also no standard Web API that we could use to provide toUnicode or toAscii. The only alternative would be implementing them in JS, and that would also mean having mapping tables in every process.

@lemire
Member

lemire commented Sep 13, 2023

> Each process instantiates a WebAssembly module that includes ada

Even processes that do not need ada? How is ada different, as a library, from any other library like, say, libicu, libzstd or libssh? Or the rust-url library for example?

As far as I know, ada/idna is the most compact library available currently with this functionality.

I am going to close this. You are invited to provide code or specific actionable advice on how to make this library smaller. A pull request would be appreciated.

@lemire lemire closed this as completed Sep 13, 2023