Explanation of info- & pkg ordering #182

wolfv · 2022-12-20T09:31:41Z

I see that the recent release changed the order of info and pkg archives in the .conda format (it's mentioned in the Changelog as well). I tried to go through some PRs but couldn't find the reasoning for the change. Would be curious to hear why this was done :)


mbargull · 2022-12-20T10:12:44Z

Would be good to document this, yes.
I can't say why it was changed. But a good explanation for it is that the outer archive is a Zip file, and a Zip file's index (the central directory) is at the end of the file. So, if you put the info-*.tar.zst at the end too, you can fetch the metadata with a single fetch (from disk or from an HTTP server).
(In case of the former .tar.bz2 you'd want info at the beginning of the index-less tarball, of course.)
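A minimal sketch of the tail-first read, assuming a plain Zip with no zip64 records and no archive comment. It builds a stand-in archive with the pkg member first and the info member last (placeholder bytes, not real tarballs), keeps only the last 8 KiB, and recovers each member's header offset and size from the end-of-central-directory record and the central directory. The helper names `build_demo_conda` and `parse_tail` are hypothetical.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CDIR_SIG = b"PK\x01\x02"  # central directory file header


def build_demo_conda(pkg_size: int = 200_000, info_size: int = 2_000) -> bytes:
    """Hypothetical helper: a zip shaped like a .conda, pkg member first and
    info member last. Payloads are placeholder bytes, not real tarballs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("pkg-demo-1.0-0.tar.zst", b"p" * pkg_size)
        zf.writestr("info-demo-1.0-0.tar.zst", b"i" * info_size)
    return buf.getvalue()


def parse_tail(tail: bytes, tail_start: int) -> list:
    """Hypothetical helper: list (name, local_header_offset, compressed_size)
    for every member using only the last bytes of the archive.
    Assumes no zip64 records and no archive comment."""
    eocd = tail.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end record not in tail; fetch more bytes")
    # signature, disk, cd disk, entries here, entries total, cd size, cd offset, comment len
    _, _, _, _, count, _, cd_offset, _ = struct.unpack("<4sHHHHIIH", tail[eocd:eocd + 22])
    pos = cd_offset - tail_start  # central directory position inside the tail
    if pos < 0:
        raise ValueError("central directory starts before the tail; fetch more bytes")
    members = []
    for _ in range(count):
        fields = struct.unpack("<4sHHHHHHIIIHHHHHII", tail[pos:pos + 46])
        if fields[0] != CDIR_SIG:
            raise ValueError("malformed central directory entry")
        csize, name_len, extra_len, comment_len = fields[8], fields[10], fields[11], fields[12]
        name = tail[pos + 46:pos + 46 + name_len].decode("utf-8")
        members.append((name, fields[16], csize))  # fields[16]: local header offset
        pos += 46 + name_len + extra_len + comment_len
    return members


if __name__ == "__main__":
    data = build_demo_conda()
    tail_start = max(0, len(data) - 8192)  # one ranged read of the last 8 KiB
    for name, offset, size in parse_tail(data[tail_start:], tail_start):
        # The tail runs to end of file and members precede the central directory,
        # so a member whose header starts inside the tail is already fully fetched.
        print(name, offset, size, "in tail" if offset >= tail_start else "needs another read")
```

With info last, its local header falls inside the tail, so the same read that delivered the index also delivered the metadata.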

wolfv · 2022-12-22T13:46:51Z

Hmm, although you don't know beforehand how large the info.tar.zst file is, right? You mean one would fetch N bytes and hope that it covers both the zip-index and info.tar.zst part?
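With a Zip, the info member's size can be read from the central directory rather than known in advance: once a tail read includes the central directory, every member's header offset and compressed size are known, so what remains is one check. Either the info member is already inside the tail, or a single further ranged read covers it (and if the central directory itself is larger than the tail, that costs one more read first). A sketch of that check; `missing_range` and its `extra_slack` allowance are hypothetical, the slack existing because the local header's extra-field length is not recorded in the central directory.

```python
from typing import Optional, Tuple

LOCAL_HEADER_FIXED = 30  # fixed-size part of a zip local file header


def missing_range(
    tail_start: int,
    header_offset: int,
    name_len: int,
    compressed_size: int,
    extra_slack: int = 64,
) -> Optional[Tuple[int, int]]:
    """Byte range [start, end) still needed for one member after a tail fetch
    that began at tail_start, or None if the tail already covers it.

    header_offset, name_len and compressed_size come from the member's central
    directory entry. extra_slack is a hypothetical allowance for the local
    header's extra field; a real reader would re-check that header after reading.
    """
    if header_offset >= tail_start:
        return None  # the member lies entirely inside the bytes already fetched
    member_end = header_offset + LOCAL_HEADER_FIXED + name_len + extra_slack + compressed_size
    # Bytes at or after tail_start were already fetched, so cap the range there.
    return (header_offset, min(member_end, tail_start))


if __name__ == "__main__":
    # info member near the end of the archive: the tail already holds it
    print(missing_range(1_000_000, 1_020_000, 23, 2_000))
    # info member at the front: one more ranged request is needed
    print(missing_range(1_000_000, 60, 23, 2_000))
```

The first call prints `None` and the second prints `(60, 2177)`.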

baszalmstra · 2022-12-22T14:00:03Z

Wouldn't it make much more sense to make sure that you put it at the start? If I understand zip correctly, every file in the zip is preceded by a zip local file header. If we always put the info archive at the start of the zip, we could stream the contents of the entire file with a regular GET request, since the local file header contains all the information you need. There would be no need to inspect the zip's central directory at all, which would really simplify the handling. It would actually be similar to how the tar.bz2 files are handled currently.

Having the central directory of the zip at the end really makes things hard.

Obviously too late now because .conda files are already widespread. 🤷
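A minimal sketch of the front-to-back reading described in this comment, using only local file headers and never the central directory. It assumes stored (uncompressed) members and that sizes are present in each local header. A writer streaming to a non-seekable output typically sets general-purpose flag bit 3 and puts the sizes in a data descriptor after the data instead; the sketch refuses that case. `iter_local_entries` is a hypothetical name.

```python
import io
import struct
import zipfile
from typing import BinaryIO, Iterator, Tuple

LOCAL_SIG = b"PK\x03\x04"      # local file header signature
LOCAL_FMT = "<4sHHHHHIIIHH"    # 30-byte fixed part of a local file header
DATA_DESCRIPTOR_FLAG = 0x08    # general-purpose bit 3: sizes follow the data


def iter_local_entries(stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, raw_bytes) for each member by reading local file headers
    front to back, without consulting the central directory.

    Sketch assumptions: stored members, sizes present in the local header,
    no zip64, UTF-8/ASCII names.
    """
    while True:
        header = stream.read(30)
        if len(header) < 30 or header[:4] != LOCAL_SIG:
            return  # hit the central directory (or the end of the input)
        _, _, flags, method, _, _, _, csize, _, name_len, extra_len = struct.unpack(LOCAL_FMT, header)
        if flags & DATA_DESCRIPTOR_FLAG:
            raise ValueError("sizes are in a trailing data descriptor; cannot stream this way")
        if method != zipfile.ZIP_STORED:
            raise ValueError("this sketch only handles stored members")
        name = stream.read(name_len).decode("utf-8")
        stream.read(extra_len)  # skip the extra field
        yield name, stream.read(csize)


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("info-demo-1.0-0.tar.zst", b"i" * 100)  # info first, as proposed
        zf.writestr("pkg-demo-1.0-0.tar.zst", b"p" * 1000)
    buf.seek(0)
    for name, payload in iter_local_entries(buf):
        print(name, len(payload))
        break  # a streaming GET could be abandoned once info has been read
```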

dholth · 2023-01-05T19:04:07Z

conda-package-streaming has good support for reading partial remote zip archives, and uses this to get the info out of a .conda in a maximum of 3 remote requests, but it doesn't matter where the info is inside the zip.
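One way to read a remote zip partially with the standard library is to hand `zipfile.ZipFile` a seekable file-like object whose reads would be served by HTTP Range requests. In this sketch a hypothetical in-memory `RangeLoggingFile` stands in for such an object and logs each read as `(offset, length)`; it is an illustration, not conda-package-streaming's implementation. For either member order, zipfile's first read targets the end of the archive, because it locates members through the central directory.

```python
import io
import zipfile


class RangeLoggingFile(io.RawIOBase):
    """Hypothetical stand-in for a remote file: every read is logged as
    (offset, length), as if each were served by an HTTP Range request."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0
        self.log = []

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: len(self._data)}[whence]
        self._pos = base + offset
        return self._pos

    def tell(self) -> int:
        return self._pos

    def readinto(self, b) -> int:
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self.log.append((self._pos, len(chunk)))
        self._pos += len(chunk)
        return len(chunk)


def make_zip(names) -> bytes:
    """Hypothetical helper: zip whose members appear in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, name.encode() * 1000)
    return buf.getvalue()


if __name__ == "__main__":
    for order in (["info-x.tar.zst", "pkg-x.tar.zst"], ["pkg-x.tar.zst", "info-x.tar.zst"]):
        remote = RangeLoggingFile(make_zip(order))
        with zipfile.ZipFile(remote) as zf:
            info = zf.read("info-x.tar.zst")
        # The first logged reads sit at the end of the file for both orders.
        print(order[0], len(info), remote.log[:3])
```

The exact number of reads the stdlib issues can vary between Python versions, so the sketch prints the log rather than relying on a fixed count.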

It was done so that this transmute implementation https://github.com/conda/conda-package-streaming/blob/main/conda_package_streaming/transmute.py#L72 could buffer the usually-small info in memory while writing the pkg-*.tar.zst member directly to the zip archive.
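A minimal sketch of that buffering pattern, assuming the input arrives as a sequence of `(path, data)` pairs: pkg files are streamed into an open zip member as they arrive, while `info/` files go into an in-memory tarball written as the last zip member. The stdlib `zipfile` allows only one open write handle at a time, so the buffered info can only be added after the pkg handle closes, which puts it last. `transmute_sketch` is hypothetical, and plain `.tar` members stand in for the zstd-compressed `.tar.zst` ones, so this is not the linked transmute code.

```python
import io
import tarfile
import zipfile
from typing import BinaryIO, Iterable, Tuple


def transmute_sketch(entries: Iterable[Tuple[str, bytes]], out: BinaryIO, stem: str) -> None:
    """Hypothetical sketch: split (path, data) entries into two inner tarballs.

    pkg files are streamed straight into the outer zip; info/ files are
    buffered in memory and written as the final zip member. Real code writing
    members that may exceed 2 GiB would pass force_zip64=True to zf.open.
    """
    info_buf = io.BytesIO()  # usually small, so buffering it is cheap
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as zf:
        with zf.open(f"pkg-{stem}.tar", mode="w") as pkg_member, \
                tarfile.open(fileobj=pkg_member, mode="w|") as pkg_tar, \
                tarfile.open(fileobj=info_buf, mode="w") as info_tar:
            for path, data in entries:
                member = tarfile.TarInfo(path)
                member.size = len(data)
                target = info_tar if path.startswith("info/") else pkg_tar
                target.addfile(member, io.BytesIO(data))
        # The pkg write handle is closed now, so the buffered info can follow it.
        zf.writestr(f"info-{stem}.tar", info_buf.getvalue())


if __name__ == "__main__":
    out = io.BytesIO()
    transmute_sketch(
        [("info/index.json", b"{}"), ("lib/a.py", b"x = 1\n"), ("info/files", b"lib/a.py\n")],
        out,
        "demo-1.0-0",
    )
    with zipfile.ZipFile(out) as zf:
        print(zf.namelist())  # pkg member first, info member last
```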

There are streaming zip implementations for Python that ignore the central directory, but the excellent standard-library zipfile is not one of them: it relies on the central directory.
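A small demonstration of that dependence, using only the standard library: an archive cut off right before its central directory (as a partially received download would be) still holds every local header and payload, yet `zipfile` refuses to open it. `opens_with_zipfile` is a hypothetical helper, and the offset arithmetic assumes the local headers carry no extra field, which holds for this demo.

```python
import io
import zipfile


def opens_with_zipfile(blob: bytes) -> bool:
    """True if the stdlib zipfile can open the bytes as an archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(blob)):
            return True
    except zipfile.BadZipFile:
        return False


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("info-demo-1.0-0.tar.zst", b"i" * 100)
    zf.writestr("pkg-demo-1.0-0.tar.zst", b"p" * 1000)
data = buf.getvalue()

# The end of the last member's data is where the central directory begins
# (30-byte local header + name + payload; no extra field in this demo).
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    last = zf.infolist()[-1]
    cd_start = last.header_offset + 30 + len(last.filename.encode()) + last.compress_size

truncated = data[:cd_start]  # every local header and payload, no central directory
print(opens_with_zipfile(data))       # complete archive opens
print(opens_with_zipfile(truncated))  # same members, missing index: rejected
print(truncated.startswith(b"PK\x03\x04"))  # local headers are still intact
```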

The order doesn't matter for conda-package-handling's create because it asks for a complete list of info and pkg members ahead of time. https://github.com/conda/conda-package-handling/blob/main/src/conda_package_handling/conda_fmt.py
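A sketch of why a complete member list makes the order irrelevant: with all paths known up front, they can be partitioned before anything is written, and each inner archive can then be produced in whichever order the writer prefers. `split_members` is hypothetical and keys on the `info/` prefix; it is not conda-package-handling's code.

```python
from typing import Iterable, List, Tuple


def split_members(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: partition package-relative paths into the files
    bound for info-*.tar.zst (under info/) and those bound for pkg-*.tar.zst."""
    info: List[str] = []
    pkg: List[str] = []
    for path in paths:
        (info if path.startswith("info/") else pkg).append(path)
    return info, pkg


if __name__ == "__main__":
    info, pkg = split_members(["info/index.json", "lib/libz.so", "info/paths.json", "bin/tool"])
    # Both lists exist before any bytes are written, so either inner archive
    # can be emitted first; the outer zip's order becomes a free choice.
    print(info)  # ['info/index.json', 'info/paths.json']
    print(pkg)   # ['lib/libz.so', 'bin/tool']
```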

mbargull added the type::documentation (request for improved documentation) label Dec 20, 2022

dholth closed this as completed Oct 18, 2023
