CQ4 - environment/container file #12

Open
simleo opened this issue Jun 23, 2022 · 15 comments
Labels
Requirement Something we want to capture in the spec

Comments

@simleo
Collaborator

simleo commented Jun 23, 2022

What is the environment/container file used in a specific workflow execution step?

Similar to the configuration file (#11) problem. Need env dump support from workflow engine.

@simleo simleo added the Requirement Something we want to capture in the spec label Jul 6, 2022
@stain
Contributor

stain commented Jul 21, 2022

This could be a resolved Conda environment (conda env export), a resolved Docker image (#9), or others. We need sub-types per environment.

Renske has started modelling this for CWLprov.

@simleo
Collaborator Author

simleo commented Oct 13, 2022

One complication here is how many stack levels to represent. E.g. a Conda environment running in a Docker container running on Kubernetes running on OpenStack running on an HPC cluster.
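
To make the question of stack depth concrete, here is a minimal sketch of how nested layers could be modelled and truncated at a chosen depth. The `EnvironmentLayer` class, its fields, and the `stack_levels` helper are hypothetical names used only for illustration; nothing here is part of the spec.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EnvironmentLayer:
    """One level of the execution stack (hypothetical model, not from the spec)."""
    kind: str                                     # e.g. "conda", "docker", "kubernetes"
    runs_on: Optional["EnvironmentLayer"] = None  # the layer directly underneath, if any


def stack_levels(layer: EnvironmentLayer, max_depth: Optional[int] = None) -> List[str]:
    """List layer kinds from the innermost outwards, stopping after max_depth levels."""
    kinds: List[str] = []
    current: Optional[EnvironmentLayer] = layer
    while current is not None and (max_depth is None or len(kinds) < max_depth):
        kinds.append(current.kind)
        current = current.runs_on
    return kinds


# The example stack from the comment above, built from the outermost layer inwards.
hpc = EnvironmentLayer("hpc-cluster")
openstack = EnvironmentLayer("openstack", runs_on=hpc)
k8s = EnvironmentLayer("kubernetes", runs_on=openstack)
docker = EnvironmentLayer("docker", runs_on=k8s)
conda = EnvironmentLayer("conda", runs_on=docker)

print(stack_levels(conda))               # all five levels
print(stack_levels(conda, max_depth=2))  # only the two innermost levels
```

A depth limit like `max_depth` is one possible way to record only the levels that matter for reproducing a step while leaving the rest of the infrastructure out.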

@dgarijo
Contributor

dgarijo commented Oct 13, 2022

In https://github.com/osoc-es/c2t#ontology-diagram we addressed the basic representation of packages in a container (Docker). But this may be out of the scope of what we intend to do here.

I suggest allowing a pointer to the file that creates the container/environment, or to the ID and registry where the container is stored.

@dgarijo
Contributor

dgarijo commented Oct 13, 2022

Also, the command used to invoke the container or create the environment is quite useful.
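
As a rough sketch of the suggestion above, a per-step record could hold either the definition file or the image ID plus registry, together with the command that was run. The `EnvironmentReference` class, its field names, and the registry, image, and file names in the example are all made up for illustration.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EnvironmentReference:
    """Hypothetical record of the environment used by one workflow step."""
    definition_file: Optional[str] = None  # e.g. a Dockerfile or Conda environment file
    registry: Optional[str] = None         # where the container image is stored
    image_id: Optional[str] = None         # image name/tag or digest within the registry
    command: List[str] = field(default_factory=list)  # argv used to invoke or create it

    def image_reference(self) -> Optional[str]:
        """Join registry and image ID into a single pullable reference, if known."""
        if self.image_id is None:
            return None
        return f"{self.registry}/{self.image_id}" if self.registry else self.image_id

    def command_line(self) -> str:
        """Render the recorded argv as a shell-safe command line."""
        return shlex.join(self.command)


# Placeholder values only; not taken from any real workflow run.
ref = EnvironmentReference(
    registry="registry.example.org",
    image_id="tools/aligner:1.0",
    command=["docker", "run", "--rm", "registry.example.org/tools/aligner:1.0",
             "align", "--input", "reads 1.fq"],
)
print(ref.image_reference())
print(ref.command_line())
```

Storing the command as an argument list rather than a single string avoids ambiguity about quoting when an argument contains spaces.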

@simleo
Collaborator Author

simleo commented Oct 13, 2022

It's a bit confusing because we have #9 specifically for Docker images. I guess this is more general, maybe it should be split into multiple ones depending on the level of abstraction (minus Docker, since we already have #9).

@ilveroluca
Contributor

Maybe #9 and this issue should be merged. It is necessary to reference container images; the discussion in #9 has so far been dedicated to this. CQ1/#9 is concerned with compiling a list of all images used by the run. That list can be compiled by collecting the images from all workflow execution steps, as described by the part of the spec that will come out of this issue.

@GlassOfWhiskey
Contributor

GlassOfWhiskey commented Oct 16, 2022

Yes, but container images are not the only remote environment supported by WMSs. For example, StreamFlow can offload each step of a workflow to a different environment (a cloud VM, a bare-metal node, an HPC queue manager, a local Docker container, etc.). Plus, with the upcoming release it will be possible to stack them (e.g., a SLURM queue manager over an SSH-connected node, a Singularity container over a queue manager over SSH, etc.).

I fear that the pure schema.org ontology cannot represent these things effectively enough. I think that if we want to capture these scenarios we have to move to external ontologies. One example is the very recent GAIA-X Ontology. However, it is more provider-oriented than consumer-oriented. Indeed, the link between the resource and a software consumer seems to be missing.

@ilveroluca
Contributor

Ok. I think we're mixing two orthogonal things though. On the one hand we have the environment in which the process is executed, while on the other we have the method through which the compute infrastructure is accessed to get resources and instantiate the environment.

The first is the essential part. For that, the container image, Conda environment, or indeed VM image seems sufficient. For the latter, it seems debatable to me whether it is even interesting enough to be captured in most cases.

Taking containers as an example, my point is that generally I would not care much whether you executed a container over SSH, over a batch queue, or over k8s; I still only need the container image and the command to reproduce what you did.

@simleo
Collaborator Author

simleo commented Jan 16, 2023

We could model part of what CWL offers: https://www.commonwl.org/v1.2/CommandLineTool.html#Requirements_and_hints, especially SoftwareRequirement and ResourceRequirement (see RenskeW/runcrate-analysis#4 (comment))

@simleo
Collaborator Author

simleo commented Feb 16, 2023

> We could model part of what CWL offers: https://www.commonwl.org/v1.2/CommandLineTool.html#Requirements_and_hints, especially SoftwareRequirement and ResourceRequirement (see RenskeW/runcrate-analysis#4 (comment))

That's more about the prospective part actually. Unless we can make some sort of mapping from prospective to retrospective.

@kinow
Member

kinow commented Dec 21, 2023

I attended a NIST conference a few days ago where they discussed FAIR containerized computational software. They created a manifest there: https://www.nist.gov/system/files/documents/2023/12/08/DayOne_Plugin-manifest_MyleneSimon_0.pdf

Maybe there's something interesting there for this issue too.


@stain
Contributor

stain commented Apr 11, 2024

Could @jmfernandez have a look at whether there is a way to indicate the type of environment file?

@stain
Contributor

stain commented Apr 25, 2024

Nextflow uses both Conda channels and a list of packages. In Nextflow this is defined per step in the workflow. In other cases it can be an external environment file, as in Snakemake. CWL lists packages in SoftwareRequirement, but without a Conda channel.

@jmfernandez
Contributor

About CWL: due to its versatility, there is no fixed way to represent it. The main path is "SoftwarePackage", a definition that hangs off SoftwareRequirement and can be written in different ways.

So, the CWL engine implementation has to know what Conda is, it needs a "conda" mode, and it has to implement a dependency resolver which recognizes the prefixes of the Conda package IRIs.
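
A minimal sketch of what such prefix recognition could look like is shown below. The `package`, `version`, and `specs` fields follow the CWL SoftwarePackage schema, but the prefix `https://anaconda.org/`, the `conda_spec` helper, and the example packages are assumptions for illustration; an actual engine may use a different IRI convention.

```python
from typing import Dict, List, Optional

# Assumed IRI prefix marking a Conda package; real resolvers may recognize other prefixes.
CONDA_IRI_PREFIX = "https://anaconda.org/"

# A SoftwareRequirement written as a Python dict (illustrative packages only).
software_requirement = {
    "class": "SoftwareRequirement",
    "packages": [
        {"package": "samtools", "version": ["1.9"],
         "specs": ["https://anaconda.org/bioconda/samtools"]},
        {"package": "sometool", "specs": ["https://example.org/sometool"]},
    ],
}


def conda_spec(package: Dict) -> Optional[str]:
    """Return 'channel::name[=version]' if any spec IRI carries the Conda prefix."""
    for iri in package.get("specs", []):
        if iri.startswith(CONDA_IRI_PREFIX):
            parts = iri[len(CONDA_IRI_PREFIX):].strip("/").split("/")
            if len(parts) == 2:  # expect exactly <channel>/<name> after the prefix
                channel, name = parts
                versions = package.get("version", [])
                suffix = f"={versions[0]}" if versions else ""
                return f"{channel}::{name}{suffix}"
    return None  # not recognizable as a Conda package


conda_specs: List[str] = [
    spec
    for spec in (conda_spec(p) for p in software_requirement["packages"])
    if spec is not None
]
print(conda_specs)
```

Packages whose IRIs do not match the prefix are skipped here, which illustrates why an engine without a "conda" mode cannot recover the channel from SoftwareRequirement alone.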


7 participants