
[wasmfs] Add support for squashfs #24670

Open: wants to merge 11 commits into main

martenrichter (Author)

This is an attempt to add support for squashfs to wasmfs.

There are a couple of questions I would like to resolve during the review:

  1. Should support for squashfs be included in emscripten, or should it be an external port?
  2. Licensing: squashfs support is built on libsquashfs, which is licensed under LGPLv3. Should it therefore only be included behind a flag? (As far as I can tell, LGPLv3 is not compatible with the standard Emscripten licenses, or is it?)
  3. Placement of source files: I have placed the WasmFS-related code under the 'system' directory, alongside the other backends. But since libsquashfs is implemented as a port, this may be inconsistent. So where should it go? Everything inside the port? Or libsquashfs also under system?
  4. I may have missed some of Emscripten's machinery for including only the code that is actually needed, so please point me towards it.
  5. I am not sure whether a WasmFS backend needs any JS glue code. If it does, please let me know.

My primary motivation is to avoid decompressing large file systems in JupyterLite deployments, which may or may not result in a speedup (potentially by serving the squashfs image through the fetch backend).

kripken (Member) commented Jul 10, 2025

First, it's good to see that it is practical to add a WasmFS backend like this!

Otherwise, my guess is that this makes more sense as an external/contrib port, that is, with the main code out of tree (like glfw3), both because of the licensing and because I'm not sure how much general interest there is. I don't feel strongly, though; curious to hear thoughts from @tlively @sbc100

How would this compare to LZ4, btw? We have such a backend for the old FS.

martenrichter (Author)

> First, it's good to see that it is practical to add a WasmFS backend like this!

> Otherwise, I would guess that this makes more sense as an external/contrib port, that is, with the main code out of tree, like e.g. glfw3, both because of the licensing and that I'm not sure how much general interest there is for it. I don't feel strongly though, curious to hear thoughts from @tlively @sbc100

I am not sure what the best approach would be either, but it would be helpful if you could take a look at the code anyway and let me know if I've missed anything.

> How would this compare to LZ4, btw? We have such a backend for the old FS.

I did not know about the LZ4 backend. However, the idea is mainly aimed at Python packages for JupyterLite, so that the image can be generated outside of the toolchain. My idea is that backing the squashfs image (the .sqfs file) with the WasmFS fetch backend can have benefits, since the whole file does not have to be downloaded.
I have not benchmarked it yet, though.

Currently, I have only included zlib as a compressor, which may not perform as well as LZ4.
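The hoped-for benefit comes from fetching only the parts of the image that are actually read. A minimal sketch of that idea, assuming the image is served over HTTP with Range support: reads are mapped to fixed-size blocks, and each block is requested at most once. `FetchRange`, `LazyRemoteFile`, and the block size are hypothetical names and parameters for illustration; they are not part of the WasmFS fetch backend's API, and squashfs's own block layout is not modeled.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical fetch hook: returns bytes [offset, offset + length) of the remote
// image, e.g. via an HTTP GET with a "Range: bytes=..." header. Not a WasmFS API.
using FetchRange = std::function<std::vector<uint8_t>(uint64_t offset, uint64_t length)>;

// Serves reads from fixed-size blocks of a remote file, fetching each block at
// most once, so bytes that are never read are never downloaded.
class LazyRemoteFile {
public:
  LazyRemoteFile(uint64_t fileSize, uint64_t blockSize, FetchRange fetch)
      : fileSize_(fileSize), blockSize_(blockSize), fetch_(std::move(fetch)) {}

  // Copies up to `length` bytes starting at `offset` into `out`; returns bytes copied.
  uint64_t read(uint64_t offset, uint64_t length, uint8_t* out) {
    if (offset >= fileSize_) return 0;
    length = std::min(length, fileSize_ - offset);
    uint64_t copied = 0;
    while (copied < length) {
      uint64_t pos = offset + copied;
      const std::vector<uint8_t>& block = getBlock(pos / blockSize_);
      uint64_t inBlock = pos % blockSize_;
      if (inBlock >= block.size()) break;  // short fetch: stop instead of overrunning
      uint64_t n = std::min<uint64_t>(length - copied, block.size() - inBlock);
      std::copy_n(block.begin() + static_cast<std::ptrdiff_t>(inBlock), n, out + copied);
      copied += n;
    }
    return copied;
  }

  // Number of range requests issued so far.
  std::size_t fetchCount() const { return fetches_; }

private:
  // Returns the cached block, fetching it on first use. std::map references
  // stay valid when other blocks are inserted later.
  const std::vector<uint8_t>& getBlock(uint64_t index) {
    auto it = cache_.find(index);
    if (it != cache_.end()) return it->second;
    uint64_t start = index * blockSize_;
    uint64_t len = std::min(blockSize_, fileSize_ - start);
    ++fetches_;
    return cache_.emplace(index, fetch_(start, len)).first->second;
  }

  uint64_t fileSize_;
  uint64_t blockSize_;
  FetchRange fetch_;
  std::map<uint64_t, std::vector<uint8_t>> cache_;  // block index -> block bytes
  std::size_t fetches_ = 0;
};
```

In a real deployment, coalescing adjacent missing blocks into one request would cut down on round trips; the sketch issues one request per block to keep it short.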

kripken (Member) commented Jul 11, 2025

> but if you could take a look at the code anyway, and let me know if I've missed anything, that would also be helpful.

I looked now. Looks pretty good to me. The only two thoughts I have are

  • For errors during backend creation, reporting them from the C++ constructor of the backend can't work, as your comments say. But maybe wasmfs_create_squashfs_backend can do something other than just create an instance of the backend, like call a static method that returns the backend or an error. Then wasmfs_create_squashfs_backend could at least return null on error.
  • A simpler approach might be to apply compression to data files only, not directories. That is what the old LZ4 backend did: metadata worked normally, but the contents of data files were decompressed on the fly. (But that isn't good enough if you are worried about keeping large directories in memory.)
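The data-files-only approach can be sketched as follows: each file's contents are split into fixed-size chunks that are compressed independently, so a read decompresses only the chunks it touches, while metadata stays uncompressed. This is an illustration, not the old LZ4 backend's code; a trivial run-length codec stands in for LZ4 so the example needs no external library, and `ChunkedCompressedData` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in codec: run-length encoding as (count, value) byte pairs. The old LZ4
// backend used LZ4; RLE only keeps this sketch free of external libraries.
std::vector<uint8_t> rleCompress(const uint8_t* data, std::size_t size) {
  std::vector<uint8_t> out;
  std::size_t i = 0;
  while (i < size) {
    uint8_t value = data[i];
    std::size_t run = 1;
    while (i + run < size && data[i + run] == value && run < 255) ++run;
    out.push_back(static_cast<uint8_t>(run));
    out.push_back(value);
    i += run;
  }
  return out;
}

std::vector<uint8_t> rleDecompress(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out;
  for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
    out.insert(out.end(), static_cast<std::size_t>(in[i]), in[i + 1]);
  }
  return out;
}

// Contents of one data file, stored as independently compressed chunks. Metadata
// (names, sizes, directories) would stay uncompressed; a read decompresses only
// the chunks it touches.
class ChunkedCompressedData {
public:
  ChunkedCompressedData(const std::vector<uint8_t>& contents, std::size_t chunkSize)
      : size_(contents.size()), chunkSize_(chunkSize) {
    for (std::size_t off = 0; off < contents.size(); off += chunkSize) {
      std::size_t n = std::min(chunkSize, contents.size() - off);
      chunks_.push_back(rleCompress(contents.data() + off, n));
    }
  }

  std::size_t size() const { return size_; }

  // Copies up to `length` bytes starting at `offset` into `out`; returns bytes copied.
  std::size_t read(std::size_t offset, std::size_t length, uint8_t* out) const {
    if (offset >= size_) return 0;
    length = std::min(length, size_ - offset);
    std::size_t copied = 0;
    while (copied < length) {
      std::size_t pos = offset + copied;
      std::vector<uint8_t> chunk = rleDecompress(chunks_[pos / chunkSize_]);
      ++chunksDecompressed_;
      std::size_t inChunk = pos % chunkSize_;
      std::size_t n = std::min(length - copied, chunk.size() - inChunk);
      std::copy_n(chunk.begin() + static_cast<std::ptrdiff_t>(inChunk), n, out + copied);
      copied += n;
    }
    return copied;
  }

  std::size_t compressedBytes() const {
    std::size_t total = 0;
    for (const auto& c : chunks_) total += c.size();
    return total;
  }

  std::size_t chunksDecompressed() const { return chunksDecompressed_; }

private:
  std::size_t size_;
  std::size_t chunkSize_;
  std::vector<std::vector<uint8_t>> chunks_;
  mutable std::size_t chunksDecompressed_ = 0;  // instrumentation for the sketch
};
```

Chunk size is the main knob here: smaller chunks waste less decompression work per small read, while larger chunks usually compress better.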

martenrichter (Author) commented Jul 11, 2025

> but if you could take a look at the code anyway, and let me know if I've missed anything, that would also be helpful.

> I looked now. Looks pretty good to me. The only two thoughts I have are

Thanks!

> • For errors during backend creation, reporting them from the C++ constructor of the backend can't work, as your comments say. But maybe wasmfs_create_squashfs_backend can do something other than just create an instance of the backend, like call a static method that returns the backend or an error. Then wasmfs_create_squashfs_backend could at least return null on error.

That is a good idea! I can check whether initialization succeeded and tear the backend down otherwise.
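The factory idea can be sketched like this: the constructor only stores state, a static `create` method runs the fallible initialization and returns `nullptr` on failure, and the C-facing creation function simply forwards that result. `SquashFSBackend`, `create`, `init`, and `example_create_squashfs_backend` are hypothetical names; the sketch does not use the real WasmFS backend base class or its registration mechanism.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the PR's squashfs backend class (not its real code).
class SquashFSBackend {
public:
  // Factory: runs the fallible initialization and returns nullptr on failure,
  // since a constructor has no return value through which to report an error.
  static std::unique_ptr<SquashFSBackend> create(std::string imagePath) {
    std::unique_ptr<SquashFSBackend> backend(new SquashFSBackend(std::move(imagePath)));
    if (!backend->init()) {
      return nullptr;  // the unique_ptr destroys the half-initialized backend
    }
    return backend;
  }

  const std::string& imagePath() const { return imagePath_; }

private:
  explicit SquashFSBackend(std::string imagePath) : imagePath_(std::move(imagePath)) {}

  // Placeholder for opening the image and validating it; here it only
  // accepts paths ending in ".sqfs" so the failure path can be exercised.
  bool init() {
    const std::string suffix = ".sqfs";
    return imagePath_.size() > suffix.size() &&
           imagePath_.compare(imagePath_.size() - suffix.size(), suffix.size(), suffix) == 0;
  }

  std::string imagePath_;
};

// Hypothetical C entry point in the spirit of wasmfs_create_squashfs_backend:
// returns null on error. In this sketch the caller owns the returned object.
extern "C" SquashFSBackend* example_create_squashfs_backend(const char* imagePath) {
  return SquashFSBackend::create(imagePath ? imagePath : "").release();
}
```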

> • A simpler approach might be to just apply compression on data files, not directories. That is what the old LZ4 backend did: metadata worked normally, but the contents of data files were compressed on the fly. (But that isn't good enough if you are worried about keeping long directories in memory.)

Sure, but my motivation comes from JupyterLite. I use Python in it (I think C/C++ may also work) for applets in my lectures, and the startup times are not great (30 s or more). Since it essentially emulates a Python environment, it has to keep the directory and file structure of the standard setup. Currently, the implementation downloads a tar.gz file and extracts it. My idea is that not all files are needed at startup, so I may benefit from using squashfs (sorting the files within the image may also be important).
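To make the point about sorting concrete, here is a small sketch under simplifying assumptions: the image is treated as files laid out back to back, every startup file must be fetched, and adjacent needed files are coalesced into one range request. `startupRuns` and `startupFirst` are hypothetical helpers, and the real squashfs layout (blocks, fragments, compression) is not modeled.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Counts the contiguous runs of startup files in an image layout. If needed
// files that sit next to each other can be fetched with one request, this is
// the number of range requests needed at startup.
std::size_t startupRuns(const std::vector<std::string>& layout,
                        const std::set<std::string>& startupFiles) {
  std::size_t runs = 0;
  bool inRun = false;
  for (const std::string& path : layout) {
    bool needed = startupFiles.count(path) > 0;
    if (needed && !inRun) ++runs;
    inRun = needed;
  }
  return runs;
}

// Moves startup files to the front of the layout, keeping the relative order
// within both groups, so they form a single contiguous run.
std::vector<std::string> startupFirst(std::vector<std::string> layout,
                                      const std::set<std::string>& startupFiles) {
  std::stable_partition(layout.begin(), layout.end(), [&](const std::string& path) {
    return startupFiles.count(path) > 0;
  });
  return layout;
}
```

With the layout `a.py, b.py, c.so, d.py, e.txt, f.py` and startup files `a.py, d.py, f.py`, the original order needs three runs, while the reordered layout needs one.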

martenrichter (Author) commented Jul 12, 2025

I am trying to move the WasmFS backend code into a port (as a step towards an external port).
However, the WasmFS internals are not exposed in the public wasmfs.h header.
Is it currently possible to create a backend outside system?

EDIT: I would say it can't go outside system.
So should the external port only cover libsquashfs?

martenrichter (Author)

OK, I can locate the upstream code via the EMSDK environment variable.

martenrichter (Author) commented Jul 12, 2025

Moved the WasmFS code from system into the port to prepare for converting it to an external port. (This also means LGPL code is only included when the port is used.)

martenrichter (Author) commented Jul 12, 2025

The external port is now at:
https://github.com/martenrichter/emscripten_wasm_squashfs
I'll leave this PR here in case the discussion results in a different opinion about integration.

sbc100 (Collaborator) left a comment


I think ideally this kind of work could be done completely out-of-tree, with no changes to emscripten itself, except perhaps exposing all the required parts of the WasmFS API so that extensions like this can be authored using only the public headers.

// 1 = use emscripten_wasmfs_squashfs from emscripten-ports
// Alternate syntax: --use-port=emscripten_wasmfs_squashfs
// [compile+link]
var USE_EMSCRIPTEN_WASMFS_SQUASHFS = false;
Collaborator (inline review comment on the setting above):

For new ports, we don't add new settings like this. Instead, we just make them available via --use-port.

martenrichter (Author)

> I think ideally this kind of work could be done completely out-of-tree, with no changes to emscripten itself, except perhaps to make all the required parts of wasmfs API exposed such that extensions like this can be authored based only on the public headers.

In the external repository, I was able to compile it out of tree, but I had to include the internal WasmFS headers via a hack.
In principle, all base classes would have to be available via a public header.

Labels: none yet · Projects: none yet · 3 participants