Lazy artifact without unpacking (non-tarball) #2764
Comments
The DataDeps.jl package (https://github.com/oxinabox/DataDeps.jl) might be a good solution for your use case.
In theory we could look at the magic bytes to see whether it is a gzipped file and otherwise assume it is uncompressed?
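A minimal sketch of such a magic-byte check in Julia, assuming only gzip needs to be detected. The helper name `is_gzipped` is made up for this illustration and is not part of Pkg; the two-byte signature `0x1f 0x8b` comes from the gzip format (RFC 1952).

```julia
# Gzip streams always start with the two bytes 0x1f 0x8b (RFC 1952).
# `is_gzipped` is a hypothetical helper, not an existing Pkg function.
function is_gzipped(io::IO)
    mark(io)                 # remember the current position
    header = read(io, 2)     # reads at most 2 bytes
    reset(io)                # rewind so the caller can still read the full stream
    return length(header) == 2 && header[1] == 0x1f && header[2] == 0x8b
end

# Convenience method for a file on disk.
is_gzipped(path::AbstractString) = open(is_gzipped, path)
```

A caller could then decompress and extract only when `is_gzipped(path)` is true and otherwise treat the download as a plain file. Note that this identifies gzip only; it says nothing about whether the decompressed payload is a tarball.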
Another layer of compression shouldn't really hurt though, and you can use
But then you need to rehost the files.
Agreed, there are many dataset hosting providers which expect you to upload the file directly, rather than uploading a tarball wrapping a file.
If we allow artifacts to be arbitrary container and non-container formats with arbitrary compression schemes, there's really a never-ending stream of features that would have to be added, which is not something I think is acceptable for a feature like artifacts that's built into the package manager.
Consider something apparently simple like allowing artifacts to be just a single file. This seems straightforward enough: you just use the git blob hash of the file as its content address and put the file at the artifact path instead of an extracted artifact directory like we do currently. So the path to this file will be something like
Different compression and container formats are more reasonable, imo, since they only complicate the model of how to deliver an artifact, rather than complicating the model of what an artifact is. The main issue with that is that Pkg needs to be able to extract other container formats. Julia is shipped with the dependencies required to decompress and extract tarballs, but we don't really want to add more dependencies to Julia for every format someone happens to want to use.
But we could have a plugin system where a download stanza specifies a registered package/function for handling the content of the download stanza, and then lets the package acquire the artifact content however it wants. For example, we could support downloading a single file something like this:
[data_mat]
git-tree-sha1 = "83f7499f0e79ac39a1a34d3e6ac119f5389ee66d"
[[data_mat.download]]
plugin = "FileArtifacts"
url = "https://example.com/path/to/data.mat"
sha256 = "ab2332e1005836afb236bf8515adf1b0522b640a51c9b8a401d64e3f5fc4478c"
What this would do is use the package called
The end result is that
This is the way forward, but I'm not sure I really want to do this. Among other things, this would entail either not serving such artifacts through the package server system, or running arbitrary package code for artifact downloading in the package server system. Neither option is super appealing to me. We could maybe approve specific packages as "blessed" downloaders that we allow running on the package servers.
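For reference, the "git blob hash" mentioned above for a single-file artifact is the SHA-1 of a short header (`blob`, a space, the byte length, and a NUL byte) followed by the file's bytes. A minimal sketch in Julia using the SHA standard library; `git_blob_sha1` is a name invented for this illustration and is not a Pkg function.

```julia
using SHA

# Compute the git blob object ID of some bytes: SHA-1 over "blob <size>\0" * content.
# `git_blob_sha1` is a hypothetical helper name, not part of Pkg.
function git_blob_sha1(data::Vector{UInt8})
    header = Vector{UInt8}("blob $(length(data))\0")
    return bytes2hex(sha1(vcat(header, data)))
end

# Convenience method: hash the contents of a file on disk.
git_blob_sha1(path::AbstractString) = git_blob_sha1(read(path))

# The empty blob has a well-known ID in every git repository.
println(git_blob_sha1(UInt8[]))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

This differs from the `git-tree-sha1` used in `Artifacts.toml` today, which hashes a directory tree rather than a single file. That is why a bare-file artifact would need its own content-address scheme.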
Isn't this exactly what is required now already from a user's perspective? In order to access anything from an artifact, the user has to
I think I don't quite understand what you mean. If two artifact files (does this refer to "descriptors", i.e.
The considerations you described sound more like implementation details to me, no offense. All I am asking for is an option to skip a certain part of the download/registration/creation process of an artifact, namely archive inflation. I am not questioning what an artifact is. An artifact remains a single file before and during download (a compressed or uncompressed tarball, or an arbitrary file) which becomes a content-addressed directory. This doesn't change at all. And from a user's perspective it doesn't change either. The user shouldn't need to care how the content hash comes to be, because a user never gets in touch with it anyway. This is a detail hidden within
I would like to "deliver" some mat data set that I need during package testing as an artifact. The data set happened to be hosted already, though as is and not as a tar.gz. IIRC, .mat files support compression on their own, so wrapping them in a tar.gz feels odd.
How do I declare a lazy artifact (containing only a single file) that doesn't need to be unpacked?
If that's not possible (yet), I would like to propose adding a new keyword unpack (default: true, which matches the current behavior) to the Artifacts.toml.
Somewhat related: