Enable repodata.json.zst by default #13256

Closed
2 tasks done
dholth opened this issue Oct 27, 2023 · 7 comments · Fixed by #13452 · May be fixed by #13273
Labels
source::anaconda created by members of Anaconda, Inc. type::feature request for a new feature or capability

Comments

@dholth
Contributor

dholth commented Oct 27, 2023

Checklist

  • I added a descriptive title
  • I searched open requests and couldn't find a duplicate

What is the idea?

repodata.json.zst is faster to download than repodata.json and will always be a win over the latter. Depending on your network versus disk bandwidth it may be faster than jlap, and it is usable on the first request e.g. in a CI system when repodata.json is not yet cached. Enable repodata.json.zst by default.

We could use the jlap network code, which already knows how to fetch from alternate URLs depending on whether .zst and jlap are available, and add a separate "zst but not jlap" flag.

Why is this needed?

@dholth dholth added type::feature request for a new feature or capability source::anaconda created by members of Anaconda, Inc. labels Oct 27, 2023
@jezdez
Member

jezdez commented Oct 28, 2023

Let's first review the current flagged feature, and how JLAP adoption in the mamba and rattler projects has been going, before we commit to shipping it by default, e.g. in case there are learnings that we should apply to the CEP before rollout.

@jezdez
Member

jezdez commented Oct 28, 2023

@baszalmstra

@jezdez I guess this issue is about whether to use zstd compressed repodata by default. Not about JLAP?

We have been using zstd-compressed repodata in rattler (and thus pixi) since the start. I think mamba is also doing this? (@AntoinePrv) This has drastically reduced the wait time for repodata.

I think the biggest file is the conda-forge linux-64 zstd file, which is roughly 28 MB. For reference, on a fairly typical 100 Mbit/s connection this takes about 2.5-3 seconds to download and decompress, which I think is fairly fast. I have a 500 Mbit/s internet connection, so downloading and decompressing takes 0.5-1 second!
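As a rough sanity check on those numbers, here is a minimal sketch of the raw transfer time alone. It assumes the file is about 28 MB and that the link runs at its nominal rate; it ignores latency, protocol overhead, and decompression, so the real figures quoted above are expected to be somewhat higher. The function name `transfer_seconds` is made up for illustration.

```python
def transfer_seconds(size_mb: float, link_mbit_per_s: float) -> float:
    """Idealized download time: megabytes -> megabits, divided by link rate."""
    return size_mb * 8 / link_mbit_per_s


if __name__ == "__main__":
    # Assumed size of the conda-forge linux-64 repodata.json.zst, per the comment above.
    size_mb = 28
    for rate in (100, 500):
        print(f"{rate} Mbit/s: {transfer_seconds(size_mb, rate):.2f} s")
```

The idealized results (about 2.2 s and 0.4 s) sit just under the observed ranges, which fits with the remainder being spent on decompression and connection setup.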

From what I understand, the on-the-fly gzip-compressed file is rate limited, which is why it takes longer to download.

I think it makes total sense to enable this by default!

@baszalmstra

@jezdez We have been discussing JLAP in the CEP here.

@AntoinePrv
Contributor

@dholth
Contributor Author

dholth commented Oct 28, 2023

Yes, this is a separate issue from "enable jlap by default" because repodata.json.zst is an overwhelming win over Content-Encoding: gzip. Since the jlap download code also understands everything about repodata.json.zst and repodata_has_zst we could consider having an option for "zstd but not jlap", for users who are not sure about jlap's read-modify-write.
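To make the "zstd but not jlap" idea concrete, here is a minimal sketch of how such a preference could be expressed. This is not conda's actual code: the function `repodata_candidates` and its parameters are hypothetical, and the only names taken from the discussion are the file names and the notion of a cached "has zst" check.

```python
def repodata_candidates(
    base_url: str,
    has_zst: bool,
    has_jlap: bool,
    use_zst: bool = True,
    use_jlap: bool = False,
) -> list[str]:
    """Return repodata URLs to try, most preferred first (hypothetical helper).

    has_zst / has_jlap: whether the channel is known to serve those files.
    use_zst / use_jlap: user preferences; the defaults model "zstd but not jlap".
    """
    base = base_url.rstrip("/")
    candidates = []
    if use_jlap and has_jlap:
        candidates.append(f"{base}/repodata.jlap")
    if use_zst and has_zst:
        candidates.append(f"{base}/repodata.json.zst")
    # Plain repodata.json is always the last resort.
    candidates.append(f"{base}/repodata.json")
    return candidates


if __name__ == "__main__":
    url = "https://conda.anaconda.org/conda-forge/linux-64"
    for candidate in repodata_candidates(url, has_zst=True, has_jlap=True):
        print(candidate)
```

With the defaults, a channel that serves both formats yields the .zst URL first and repodata.json as the fallback, and jlap is only tried when explicitly enabled.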

The community learned that repodata.json (uncompressed) is faster than repodata.json (Content-Encoding: gzip) if your bandwidth is > ~300Mbps.

We may have to consider whether anaconda.org's on-the-fly zstd compression on non-CDN-mirrored channels puts too much load on that server. It will be much less server load than the old on-the-fly bzip2 compression, however.

@dholth
Contributor Author

dholth commented Dec 18, 2023

% time curl --compressed https://conda.anaconda.org/conda-forge/linux-64/repodata.json > /dev/null
100 35.1M    0 35.1M    0     0  7326k      0 --:--:--  0:00:04 --:--:-- 7623k
curl --compressed  > /dev/null  1.29s user 0.33s system 32% cpu 4.930 total
% time curl --compressed https://conda.anaconda.org/conda-forge/linux-64/repodata.json.zst > /dev/null
100 29.8M  100 29.8M    0     0  16.5M      0  0:00:01  0:00:01 --:--:-- 16.6M
curl --compressed  > /dev/null  0.32s user 0.10s system 22% cpu 1.828 total
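For a quick read of the two runs above, this small sketch compares the wall-clock and CPU times copied from the `time` output. It is a single measurement on one connection, so treat the ratio as indicative only; the variable names are made up for illustration.

```python
# Values copied from the `time curl` output above (one sample each).
gzip_wall_s = 4.930  # repodata.json with Content-Encoding: gzip
zst_wall_s = 1.828   # repodata.json.zst

gzip_cpu_s = 1.29 + 0.33  # user + system
zst_cpu_s = 0.32 + 0.10

wall_speedup = gzip_wall_s / zst_wall_s
cpu_ratio = gzip_cpu_s / zst_cpu_s

if __name__ == "__main__":
    print(f"wall-clock speedup: {wall_speedup:.1f}x")
    print(f"client CPU time ratio: {cpu_ratio:.1f}x")
```

In this sample the .zst download finishes in well under half the wall-clock time of the gzip-encoded one.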
