Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace 7z with gzip #3824

Open
StefanKarpinski opened this issue Mar 4, 2024 · 13 comments
Open

replace 7z with gzip #3824

StefanKarpinski opened this issue Mar 4, 2024 · 13 comments
Assignees
Labels

Comments

@StefanKarpinski
Copy link
Sponsor Member

It seems from reports on discourse that 7z is banned for security reasons at many organizations. We used to use a lot of its functionality, but now we only use it in ways that gzip can replace. We should do that. The changes to Pkg are actually trivial, the most annoying part is removing 7z as a Base dependency and adding gzip.

@stevengj
Copy link
Member

stevengj commented Mar 4, 2024

Or something more modern like zstd?

@giordano
Copy link
Contributor

giordano commented Mar 4, 2024

But we want to be able to unpack more than gzipped tarballs, no?

if !(ext in ["tar", "gz", "tgz", "bz2", "xz"])

@StefanKarpinski
Copy link
Sponsor Member Author

Or something more modern like zstd?

Adding support for zstd would be fine, but we can't drop support for gzip in the client because there are private package servers in the wild that will continue serving gzip tarballs for the foreseeable future and we also can't drop gzip as a format on servers because they have to keep serving to older Julia versions that expect gzip. So adding zstd would force servers to generate and serve both gzip and zstd. I'm also not convinced that zstd is such a huge win for our use case. I haven't seen massively better compression ratios. The compression time is better, but since we'll have to do gzip and zstd on the server, that doesn't actually save us any compute. Decompression time is also better, but I don't think decompression time is a meaningful bottleneck. My inclination is to add support for negotiating the compression format so that at some point in the future we can migrate over.

But we want to be able to unpack more than gzipped tarballs, no?

I'm not sure how big an issue this is in practice (i.e. how much do we actually use/rely on this), but what we can do is ship gzip with Julia itself and then dynamically install JLLs for the other compression formats as required. That allows us to bootstrap from gzip to install and handle other compression formats as needed.

@mnemnion
Copy link

mnemnion commented Mar 6, 2024

Bundling libarchive would cover gzip, and leave plenty of flexibility for switching to another, or decoding existing data in other formats.

@StefanKarpinski
Copy link
Sponsor Member Author

I'm a bit hesitant to swap out one swiss-army-knife dependency that handles multiple compression formats but isn't the standard tool that's guaranteed to be patched and maintained and allowed by sysadmins for another one with the exact same description. Of course, libarchive looks much better maintained than 7z, but you get my point. It also seems likely that gzip is going to be faster and more efficient than anything else.

@stevengj
Copy link
Member

stevengj commented Mar 6, 2024

Adding support for zstd would be fine, but we can't drop support for gzip in the client

The zstd web page says that its library/command-line tool supports .gz files? Or does it do that by linking another library?

Looks like zstd include a minigzip.c implementation, but maybe it requires zlib?

But it seems safer/simpler to stick with gzip alone if there's not a compelling reason to support other formats.

@mnemnion
Copy link

mnemnion commented Mar 6, 2024

Of course, libarchive looks much better maintained than 7z, but you get my point.

Worth noting that libarchive has been integrated into Windows 11, which has implications both for continuity of maintenance and the likelihood that sysadmins will green-light it. I can't speak to gzip's speed and efficiency vs. libarchive, but it's clearly both a more narrowly-tailored tool and some of the most battle-tested code in existence.

@StefanKarpinski
Copy link
Sponsor Member Author

Ok, seems like we've got a few solid options here—gzip, zstd (with built-in gzip), or libarchive.

@nsajko
Copy link

nsajko commented Mar 9, 2024

A pure Julia solution would allow decreasing the amount of external dependencies of the Julia install. The memory-safety would also look better security-wise, given no @inbounds or other unchecked operations.

@StefanKarpinski
Copy link
Sponsor Member Author

If it's implemented in Julia we have to build it into the Julia sysimg, which we would rather not do. It's also safer to do decompression in an external transient process. Ideally, we would put the decompression process in a jail where it can't open files and all it can do is read from stdin and write to stdout. Putting it in the main Julia process is the opposite of what we want from a security perspective.

@nhz2
Copy link
Contributor

nhz2 commented May 15, 2024

There is a new version of 7z that supports unpacking zstd https://github.com/ip7z/7zip/releases/tag/24.05

@giordano
Copy link
Contributor

Note that we use https://github.com/p7zip-project/p7zip, that's a different project.

@nhz2
Copy link
Contributor

nhz2 commented May 16, 2024

From what I can tell we are already using 7zip 23.01 for Windows instead of p7zip. According to https://github.com/ip7z/7zip/blob/24.05/DOC/readme.txt compiling 7zip for Unix is possible now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants