CQ1 - Container image #9
Comments
For containers, capturing the digest (checksum) of the actual container run should be the minimum, along with the host OS and version (previous Linux kernel versions have had math bugs); CPU info (basically the contents of |
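As a rough illustration of the host details mentioned above, here is a minimal Python sketch that gathers the OS name, kernel release and version, and machine architecture using only the standard library. The function name `host_info` and the dictionary keys are assumptions made for this example and are not part of any proposal in this thread; CPU details beyond the architecture are deliberately left out.

```python
import platform


def host_info() -> dict:
    """Collect basic host details to record next to the image digest.

    The key names are hypothetical, chosen only for this sketch.
    """
    uname = platform.uname()
    return {
        "os": uname.system,               # e.g. "Linux"
        "kernel_release": uname.release,  # kernel release string
        "kernel_version": uname.version,  # kernel build/version string
        "machine": uname.machine,         # CPU architecture, e.g. "x86_64"
    }


info = host_info()
print(info)
```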
One should be able to reference a container image in any remote repository. Also, it seems handy to be able to define what type of container image it is (e.g., Singularity, Docker, etc.). Both these requirements could be satisfied by using a full URI, where the scheme is used to identify the image type. This approach is also used by Snakemake. |
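To make the "scheme identifies the image type" idea concrete, here is a small Python sketch. The `docker://`, `shub://` and `library://` schemes are ones Singularity accepts; the `SCHEME_TO_TYPE` mapping, the type label `SingularityImage` and the function name `image_type` are assumptions made up for this example, not something agreed in this thread.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URI scheme to an image type label.
SCHEME_TO_TYPE = {
    "docker": "DockerImage",
    "shub": "SingularityImage",
    "library": "SingularityImage",
}


def image_type(uri: str) -> str:
    """Return an image type label based on the URI scheme, or "unknown"."""
    scheme = urlsplit(uri).scheme
    return SCHEME_TO_TYPE.get(scheme, "unknown")


print(image_type("docker://biocontainers/samtools:v1.9-4-deb_cv1"))
print(image_type("image.sif"))
```

A plain file name has no scheme, so it falls through to "unknown"; a consumer would need another rule for local files.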
Thanks @ilveroluca. Following your link, it looks like Snakemake, in turn, accepts what's supported by Singularity. So the spec could say something like "values for |
The registry where the container is (e.g., Dockerhub, GitHub, etc.) is quite important here as well. I propose capturing it (in case a file is not used, just the id in that registry) |
I think the idea discussed yesterday was to capture, using separate properties:
|
Since the question is what container images were used by the run, the source entity should be
As for the property used to link to the image, Reusing image does not feel quite right, since it's meant for pictures. We could define a
For the image type we could use additionalType, and define
For the registry we should define a custom property, which could be
Referring to the previous comment, the problem with the "organization" bit is that it's not always an organization. Keeping as reference the
For the image name we can use name, mapping to text like "debian", "biocontainers/samtools", etc. Note that the terminology is not always consistent in the Docker docs: e.g., what is referred to as "name" in the
The tag needs a new custom property that we can call
For the digest, we already have
Here is a possible example:
{
"@id": "#cb04c897-eb92-4c53-8a38-bcc1a16fd650",
"@type": "CreateAction",
"instrument": {"@id": "bam2fastq.cwl"},
...
"containerImage": {"@id": "#samtools-image"}
},
{
"@id": "#samtools-image",
"@type": "ContainerImage",
"additionalType": "DockerImage",
"registry": "docker.io",
"name": "biocontainers/samtools",
"tag": "v1.9-4-deb_cv1",
"sha256": "da61624fda230e94867c9429ca1112e1e77c24e500b52dfc84eaf2f5820b4a2a"
} |
While I think your proposal for ContainerImage will work in practice, I have some considerations. The only thing that really identifies the image is the checksum. On the other hand, it's possible that images are mirrored in multiple locations, or that over time they migrate across repositories. Tags can also be reused (while this is not a best practice, it can happen). Also, I question the value added by splitting the image URL into its components (i.e., registry, name, tag). I would therefore consider defining a ContainerImage that: 1) uses a "simple" URL to reference image locations; and 2) allows referencing secondary image locations. An example might look like this:
|
One problem with "https://docker.io/biocontainers/samtools:v1.9-4-deb_cv1" is that it does not represent a resource on the web: it leads to a "page not found" if entered on a browser and you cannot
The separate fields would allow the consumer to build the preferred pull syntax easily by joining the relevant parts, and also to perform more articulate queries (e.g., all images from That's for Docker images at least, since Singularity allows pulling by URL. |
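As an illustration of joining the separate fields into a pull reference, here is a minimal Python sketch. It uses the property names from the example entity earlier in the thread (`registry`, `name`, `tag`, `sha256`); the function name `pull_reference` and the `prefer_digest` switch are hypothetical. It also assumes the recorded `sha256` is the digest the registry accepts in a `name@sha256:...` reference, which would need to be confirmed for a given image.

```python
def pull_reference(image: dict, prefer_digest: bool = True) -> str:
    """Join the separate image fields into a Docker-style reference.

    Assumes `image` carries "registry" and "name", plus "tag" and/or
    "sha256" (assumed here to be a digest usable for pulling).
    """
    repo = f'{image["registry"]}/{image["name"]}'
    digest = image.get("sha256")
    if prefer_digest and digest:
        # A digest pins the exact content, unlike a tag that may be reused.
        return f"{repo}@sha256:{digest}"
    return f'{repo}:{image.get("tag", "latest")}'


samtools = {
    "registry": "docker.io",
    "name": "biocontainers/samtools",
    "tag": "v1.9-4-deb_cv1",
    "sha256": "da61624fda230e94867c9429ca1112e1e77c24e500b52dfc84eaf2f5820b4a2a",
}
print(pull_reference(samtools))
print(pull_reference(samtools, prefer_digest=False))
```

With the fields kept apart, a query such as "all images from a given registry" becomes a simple comparison on `registry` rather than URL parsing.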
I'm more in favor of this approach, since it describes the image in more detail, giving richer metadata that can be used later. |
@stain any thoughts on this one? |
…esearchObject/workflow-run-crate#9 (comment) Also, common.py has been thinned, moving several declarations to their "natural" places. This has led to a major code reorganization, which has raised an issue unmarshalling some instances from the working directory state files. So, yaml loader has been taught how to deal with this mismatch.
As noted by Stian, |
What container images (e.g., Docker) were used by the run?
File
if the image is a tarball from docker save