This repository documents the analysis of Workflow Run RO-Crates (WRROC) converted from CWLProv RO Bundles using runcrate. The results of this analysis are also published on Zenodo: https://doi.org/10.5281/zenodo.12689424.
The analysis follows the same methodology as previous work, in which we conducted a qualitative evaluation of metadata coverage in CWLProv (version 0.6.0). This earlier analysis was based on concrete examples of ROs associated with a realistic bioinformatics workflow. Here, we repeated the analysis for Workflow Run RO-Crate and compared the WRROC RDF representation (in `ro-crate-metadata.json`) with the CWLProv RDF provenance graph.
We used the following approach and documented it in the Issues:
- Provenance metadata was classified into 6 categories: T1-6.
- For each category, we made an inventory of the metadata contained in CWLProv RO Bundles, both in RDF and in structured, non-RDF documents (`packed.cwl` and `primary-job.json`/`primary-output.json`).
- Subsequently, we assessed if and how this information is represented in Workflow Run RO-Crates converted by runcrate, based on a number of examples (see below).
- Finally, we provided suggestions on how to represent metadata that is present in CWLProv but missing in RO-Crate.
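As a starting point for such an inventory, the entity types present in a crate can be counted directly from `ro-crate-metadata.json`. The following is a minimal sketch, not the procedure used in this analysis: it relies only on the JSON-LD keys `@graph` and `@type`, and the `example` crate and the function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_metadata(path: Path) -> dict:
    """Read an ro-crate-metadata.json file into a dictionary."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def count_entity_types(metadata: dict) -> Counter:
    """Count how often each @type occurs in the crate's @graph."""
    counts = Counter()
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        # "@type" may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts


# Hypothetical, heavily truncated crate used only for illustration.
example = {
    "@graph": [
        {"@id": "./", "@type": "Dataset"},
        {
            "@id": "packed.cwl",
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        },
        {"@id": "#run", "@type": "CreateAction"},
    ]
}

if __name__ == "__main__":
    for type_name, n in sorted(count_entity_types(example).items()):
        print(type_name, n)
```

For a real crate, `count_entity_types(load_metadata(Path("ro-crate-metadata.json")))` would give the same kind of overview.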
Scenario 1: Analyze representation of CWL metadata fields, human agent, file characteristics, execution details
Scenario 4: Analyze representation of `String`, `File`, `Directory` and `File array` input parameters and `ResourceRequirement`
Overview of each category of the provenance taxonomy and its representation in RO-Crate. For a detailed explanation of each category, see https://doi.org/10.5281/zenodo.7014950.
Explanation of the design of the workflow and its steps can be included in the CWL metadata fields (`doc`, `label`, `intent`).
- CWL (non-RDF): full representation
- `ro-crate-metadata.json` (RDF): full representation
- CWLProv RDF: no representation
Explanation of the meaning of individual input/output data entities can be represented as structured annotations in the CWL input parameter file (not propagated to `ro-crate-metadata.json`), but the CWL standards v1.2 give no clear guideline on how to make these annotations.
- CWL (non-RDF): partial representation (structured annotations but no CWL metadata fields)
- `ro-crate-metadata.json` (RDF): no representation
- CWLProv RDF: no representation
Workflow execution annotations (why was this combination of input parameters chosen?) can be represented as annotations in the CWL input parameter file (unstructured, not propagated to `ro-crate-metadata.json`).
- CWL (non-RDF): unstructured representation
- `ro-crate-metadata.json` (RDF): no representation
- CWLProv RDF: no representation
This information can be added in the CWL input parameter file as structured annotations, but the CWL standards v1.2 give no clear guideline on how to make these annotations.
- CWL (non-RDF): partial representation
- `ro-crate-metadata.json` (RDF): no representation
- CWLProv RDF: no representation
Filename and checksum are represented for all files; creation timestamps are available for output files. Additional structured annotations may be made in the CWL input parameter file. Filename and checksum are propagated to `ro-crate-metadata.json`.
- CWL (non-RDF): partial representation
- `ro-crate-metadata.json` (RDF): partial representation
- CWLProv RDF: partial representation
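To check the recorded file characteristics against the actual data files, filename and checksum can be recomputed locally. The sketch below is illustrative only: the function name is hypothetical, and the default of SHA-1 is an assumption that should be checked against the algorithm recorded in the bundle or crate being inspected.

```python
import hashlib
from pathlib import Path


def file_characteristics(path: Path, algorithm: str = "sha1") -> dict:
    """Return filename, size in bytes, and hex checksum of a file.

    The default algorithm (SHA-1) is an assumption for illustration.
    """
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "name": path.name,
        "size": path.stat().st_size,
        "checksum": digest.hexdigest(),
    }
```

The returned checksum can then be compared with the value stored for the corresponding file entity.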
The CWL standards v1.2 allow specification of a remote location for data, which would serve as access to a downloadable form of the data.
- CWL (non-RDF): partial representation
- `ro-crate-metadata.json` (RDF): no representation
- CWLProv RDF: no representation
Mapping of input/output data to workflow parameters is represented in `ro-crate-metadata.json`.
- CWL (non-RDF): full representation
- `ro-crate-metadata.json` (RDF): full representation
- CWLProv RDF: full representation
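One way to inspect this mapping is to follow the links from data entities to parameter definitions. The sketch below assumes the convention of the Workflow Run Crate profiles, in which a data entity points to a `FormalParameter` via `exampleOfWork`. Whether a given crate produced by runcrate follows exactly this layout should be verified against the crate itself. The `example` graph and the function name are hypothetical.

```python
def map_data_to_parameters(metadata: dict) -> dict:
    """Map each data entity @id to the names of the parameters it realizes."""
    by_id = {entity["@id"]: entity for entity in metadata.get("@graph", [])}
    mapping = {}
    for entity in by_id.values():
        refs = entity.get("exampleOfWork")
        if refs is None:
            continue
        # A single reference may appear as a dict rather than a list.
        if isinstance(refs, dict):
            refs = [refs]
        for ref in refs:
            param = by_id.get(ref.get("@id"), {})
            types = param.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "FormalParameter" in types:
                name = param.get("name", ref["@id"])
                mapping.setdefault(entity["@id"], []).append(name)
    return mapping


# Hypothetical, truncated graph used only for illustration.
example = {
    "@graph": [
        {"@id": "#param-input", "@type": "FormalParameter", "name": "input_file"},
        {
            "@id": "data/reads.fastq",
            "@type": "File",
            "exampleOfWork": {"@id": "#param-input"},
        },
    ]
}

if __name__ == "__main__":
    print(map_data_to_parameters(example))
```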
The `SoftwareRequirement` field is propagated to `ro-crate-metadata.json`. `SoftwareRequirement` contains a `specs` field with an IRI that resolves to a landing page with metadata about the tool (see CWL standards v1.2).
- CWL (non-RDF): full representation
- `ro-crate-metadata.json` (RDF): full representation
- CWLProv RDF: no representation
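To see which package names, versions and `specs` IRIs a CWL process declares, the `SoftwareRequirement` entries can be collected from its `requirements` and `hints`. This is a minimal sketch that assumes both sections are in list form, as is typical for a packed JSON document; the map form that CWL also allows is not handled. The `example_tool` document, its package name and its IRI are hypothetical.

```python
def software_specs(process: dict) -> dict:
    """Collect version and specs for each package in SoftwareRequirement."""
    found = {}
    for section in ("requirements", "hints"):
        # Assumes list form; CWL also permits a map form, not handled here.
        for requirement in process.get(section, []):
            if requirement.get("class") != "SoftwareRequirement":
                continue
            for package in requirement.get("packages", []):
                found[package["package"]] = {
                    "version": package.get("version", []),
                    "specs": package.get("specs", []),
                }
    return found


# Hypothetical tool description used only for illustration.
example_tool = {
    "class": "CommandLineTool",
    "hints": [
        {
            "class": "SoftwareRequirement",
            "packages": [
                {
                    "package": "exampletool",
                    "version": ["1.0.0"],
                    "specs": ["https://example.org/tools/exampletool"],
                }
            ],
        }
    ],
}

if __name__ == "__main__":
    print(software_specs(example_tool))
```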
The `SoftwareRequirement` field is propagated to `ro-crate-metadata.json`.
- CWL (non-RDF): full representation
- `ro-crate-metadata.json` (RDF): full representation
- CWLProv RDF: no representation
The `SoftwareRequirement` field is propagated to `ro-crate-metadata.json`.
- CWL (non-RDF): full representation
- `ro-crate-metadata.json` (RDF): full representation
- CWLProv RDF: no representation
The workflow itself (`packed.cwl`) is contained in the CWLProv RO Bundle, as well as in the RO-Crate produced by runcrate. Metadata/documentation about the workflow can be represented in CWL metadata fields (`doc`, `label`, `intent`), which are propagated to `ro-crate-metadata.json`. `ro-crate-metadata.json` also contains a description of the workflow and all its parameters and steps. The representation of the workflow in CWLProv RDF is incomplete.
- CWL (non-RDF): full representation
- `ro-crate-metadata.json` (RDF): full representation
- CWLProv RDF: partial representation
Information about the workflow parameters can be represented in the CWL metadata fields (`doc`, `label`, `format`).
- CWL (non-RDF): full representation
- `ro-crate-metadata.json` (RDF): full representation
- CWLProv RDF: no representation
The CWL `ResourceRequirement` field is partially propagated to `ro-crate-metadata.json` (Scenario 4).
- CWL (non-RDF): full representation
- `ro-crate-metadata.json` (RDF): partial representation
- CWLProv RDF: no representation
Absent.
Absent.
The container image is partially represented in the CWL `DockerRequirement` field, which is propagated to `ro-crate-metadata.json` (Scenario 3).
- CWL (non-RDF): partial representation
- `ro-crate-metadata.json` (RDF): partial representation
- CWLProv RDF: no representation
- CWL (non-RDF): no representation
- `ro-crate-metadata.json` (RDF): full representation
- CWLProv RDF: full representation
Absent.
- CWL (non-RDF): no representation
- `ro-crate-metadata.json` (RDF): partial representation
- CWLProv RDF: partial representation (only name and version, no IRI resolving to landing page with metadata)
- CWL (non-RDF): no representation
- `ro-crate-metadata.json` (RDF): full representation
- CWLProv RDF: full representation