Draft CEP for .conda package format #42

Open
jakirkham opened this issue Nov 18, 2022 · 7 comments


@jakirkham
Member

It would be good to have a CEP that spells out what is in the .conda format, as this is missing atm, especially as we increasingly rely on the format and depend on a few tools to read and write these packages. Currently the info we have, which could be used for this CEP, is...

Would be good to pull this together to provide a single point of truth.

Independently, there are some things we might want to consider when amending the specification, such as generating or reusing a Zstandard dictionary for faster and more compact compression/decompression, and having per-file-format dictionaries (text files, for example, may benefit a lot from this).

@leofang

leofang commented Nov 20, 2022

It'd be nice to also get this page updated: https://docs.conda.io/projects/conda-build/en/latest/resources/package-spec.html

@jakirkham
Member Author

Would suggest raising a new conda-build doc issue

@dholth
Contributor

dholth commented Nov 22, 2022

So .conda packages are ZIP-format containers with a metadata.json file containing just the version number, and then an info and pkg file that are always .tar.zst even though some earlier documentation hoped to support "any libarchive filter". The order of metadata, info and pkg inside the ZIP does not matter.
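The layout described above can be inspected with nothing but the standard library, since the outer layer is a plain ZIP. The following is a minimal sketch, not a reference implementation: the function names, the `example-1.0-0` file names, and the version value `2` are assumptions made for illustration, and the inner members are placeholder bytes because decompressing `.tar.zst` needs a Zstandard library.

```python
import io
import json
import zipfile


def describe_conda_container(fileobj):
    """Summarize the outer ZIP layer of a .conda file without unpacking the inner tarballs."""
    with zipfile.ZipFile(fileobj) as zf:
        names = zf.namelist()
        metadata = json.loads(zf.read("metadata.json"))
    # Member order is not significant, so look members up by name, not position.
    info = [n for n in names if n.startswith("info-") and n.endswith(".tar.zst")]
    pkg = [n for n in names if n.startswith("pkg-") and n.endswith(".tar.zst")]
    if len(info) != 1 or len(pkg) != 1:
        raise ValueError(f"expected one info- and one pkg- member, got {names}")
    return {
        "format_version": metadata.get("conda_pkg_format_version"),
        "info_member": info[0],
        "pkg_member": pkg[0],
    }


def make_example_container():
    """Build a stand-in .conda in memory (hypothetical names, placeholder payloads)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        # Deliberately written out of order to show that order does not matter.
        zf.writestr("pkg-example-1.0-0.tar.zst", b"placeholder")
        zf.writestr("metadata.json", json.dumps({"conda_pkg_format_version": 2}))
        zf.writestr("info-example-1.0-0.tar.zst", b"placeholder")
    buf.seek(0)
    return buf


print(describe_conda_container(make_example_container()))
```

A real package path can be passed to `describe_conda_container` in place of the in-memory buffer.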

Put together, the pkg- and info- tarballs have exactly the same contents as old-format .tar.bz2 conda packages. Generally the info/ subdirectory of a .tar.bz2 package goes into the info- tarball of a .conda.

conda-package-handling uses a list of regular expressions to determine which files go into info/, but this list excludes some files that obviously belong in info/ - for example info/LICENSE vs info/LICENSE.txt. We should audit the existing packages to see whether we can drop this behavior and simply include info/ wholesale. Do packages include significant application data in info/ (besides test data, which is already intentionally in info/)?
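To make the difference concrete, here is a rough sketch comparing a pattern-based split with a wholesale `info/` split. The patterns below are hypothetical and are not the list conda-package-handling ships; which of `info/LICENSE` and `info/LICENSE.txt` they happen to match is arbitrary and only there to show how a near-identical name can fall through.

```python
import re

# Hypothetical patterns for illustration only; NOT the real conda-package-handling list.
HYPOTHETICAL_INFO_PATTERNS = [
    re.compile(r"^info/[^/]+\.json$"),
    re.compile(r"^info/(files|paths|LICENSE\.txt)$"),
    re.compile(r"^info/recipe/"),
]


def split_by_patterns(paths, patterns=HYPOTHETICAL_INFO_PATTERNS):
    """Send a path to the info group only if one of the patterns matches it."""
    info, pkg = [], []
    for path in paths:
        matched = any(p.search(path) for p in patterns)
        (info if matched else pkg).append(path)
    return info, pkg


def split_wholesale(paths):
    """Send everything under info/ to the info group, no pattern list needed."""
    info, pkg = [], []
    for path in paths:
        (info if path.startswith("info/") else pkg).append(path)
    return info, pkg


paths = ["info/index.json", "info/LICENSE", "info/LICENSE.txt", "bin/tool"]
print(split_by_patterns(paths))
print(split_wholesale(paths))
```

With these made-up patterns, `info/LICENSE` lands in the pkg group while `info/LICENSE.txt` lands in the info group; the wholesale split keeps both together.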

A regular conda install unpacks both inner .tar.zst archives and does not use the "easy to inspect just the metadata" feature provided by the info/pkg split. The format is still a win, because zst is much, much faster to extract than bz2.

We might want to standardize whether info- or pkg- gets extracted first, or enforce that one cannot overwrite the other (that no filename appears in both inner tarballs).
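The "no filename appears in both inner tarballs" rule could be checked at extraction time along these lines. This is only a sketch under stated assumptions: `shared_member_names` and `make_tar` are hypothetical helpers, the archives are uncompressed in-memory tars standing in for the decompressed `.tar.zst` members, and names are compared verbatim (a real check would probably also normalize forms such as `./info/x`).

```python
import io
import tarfile


def shared_member_names(info_tar, pkg_tar):
    """Return the file names that appear in both archives, sorted."""
    info_names = {m.name for m in info_tar.getmembers() if not m.isdir()}
    pkg_names = {m.name for m in pkg_tar.getmembers() if not m.isdir()}
    return sorted(info_names & pkg_names)


def make_tar(names):
    """Build an uncompressed in-memory tar holding empty files with the given names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name in names:
            tf.addfile(tarfile.TarInfo(name), io.BytesIO(b""))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")


info_tar = make_tar(["info/index.json", "info/LICENSE"])
pkg_tar = make_tar(["bin/tool", "info/LICENSE"])

clobbered = shared_member_names(info_tar, pkg_tar)
if clobbered:
    print("would clobber:", clobbered)
```

An extractor enforcing the rule would raise instead of printing; if no overlap is possible, extraction order stops mattering.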

Separate from the .conda container is the shared question of what the metadata looks like. This probably has to be a different, longer document.

@jakirkham
Member Author

Forget where this was discussed atm, but recall one point of confusion was whether conda_pkg_format_version should be an int or a str. Would be nice to resolve this as part of this work
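Until the int-versus-str question is settled, a reader could accept both spellings. The sketch below is one possible tolerant parser, not a proposal for what the CEP should say; `read_format_version` is a hypothetical helper name.

```python
import json
import re


def read_format_version(metadata_text):
    """Return conda_pkg_format_version as an int, accepting an int or a digit-only str."""
    value = json.loads(metadata_text)["conda_pkg_format_version"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("conda_pkg_format_version must not be a boolean")
    if isinstance(value, int):
        return value
    if isinstance(value, str) and re.fullmatch(r"[0-9]+", value):
        return int(value)
    raise TypeError(f"unsupported conda_pkg_format_version: {value!r}")


print(read_format_version('{"conda_pkg_format_version": 2}'))
print(read_format_version('{"conda_pkg_format_version": "2"}'))
```

Both calls yield the same integer, so downstream comparisons do not depend on how the writer encoded the value.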

@jaimergp
Contributor

jaimergp commented May 4, 2023

> We might want to standardize whether info- or pkg- gets extracted first, or enforce that one cannot overwrite the other (that no filename appears in both inner tarballs).

Yea, clobbered files in info/ (i.e. the package overwrites conda metadata) should be prevented with an error by conda-build (and similar tools) before the artifact is generated.

@dholth
Contributor

dholth commented May 4, 2023

I don't think the normal way of creating .conda can create clobbered files. It takes a list of filenames and categorizes them into two groups. The check would need to be on extraction.

@jaimergp
Contributor

jaimergp commented May 4, 2023

No, but conda-build can infer which files have gotten into info/ and flag those that would result in a clobber error, I think?


4 participants