CQ11 - Parameter connections #25

Closed
simleo opened this issue Aug 2, 2022 · 4 comments · Fixed by #35
Labels
Requirement Something we want to capture in the spec

@simleo
Collaborator

simleo commented Aug 2, 2022

Knowing how workflow parameters were passed to individual tools is important for finding out how they affected the outputs.

We are currently linking workflow and tool parameters with connectedTo from the source tool / workflow to the target tool / workflow. For instance, in revsort:

[Image: graph of the revsort workflow]

we currently have:

{
    "@id": "packed.cwl#revtool.cwl",
    "@type": "SoftwareApplication",
    "input": [
        {"@id": "packed.cwl#revtool.cwl/input"}
    ],
    "output": [
        {"@id": "packed.cwl#revtool.cwl/output"}
    ]
},
{
    "@id": "packed.cwl#sorttool.cwl",
    "@type": "SoftwareApplication",
    "input": [
        {"@id": "packed.cwl#sorttool.cwl/reverse"},
        {"@id": "packed.cwl#sorttool.cwl/input"}
    ],
    "output": [
        {"@id": "packed.cwl#sorttool.cwl/output"}
    ]
},
{
    "@id": "packed.cwl#revtool.cwl/output",
    "@type": "FormalParameter",
    "connectedTo": {"@id": "packed.cwl#sorttool.cwl/input"}
}

but that's inaccurate, since such links only exist within the revsort workflow. packed.cwl#revtool.cwl and packed.cwl#sorttool.cwl represent standalone software tools that happen to be connected this way in revsort, but might be used differently in another workflow.

@simleo
Collaborator Author

simleo commented Aug 2, 2022

We need something like:

{
    "@id": "packed.cwl#main/sorted",
    "@type": "HowToStep",
    "position": "1",
    "workExample": {"@id": "packed.cwl#sorttool.cwl"},
    "parameterConnections": [
        {"@id": "#pc1"},
        ...
    ]
},
{
    "@id": "#pc1",
    "@type": "ParameterConnection",
    "source": {"@id": "packed.cwl#revtool.cwl/output"},
    "target": {"@id": "packed.cwl#sorttool.cwl/input"}
}

@simleo simleo added the Requirement Something we want to capture in the spec label Aug 31, 2022
@simleo simleo added this to the 0.1 milestone Sep 28, 2022
@simleo
Collaborator Author

simleo commented Sep 29, 2022

See proposal in ResearchObject/ro-terms#12. I changed the property's name from parameterConnections to connections and its domain from HowToStep to ComputationalWorkflow because:

  • "External" connections (from workflow parameter to tool parameter) belong in the workflow anyway, so otherwise we'd have to modify two classes
  • This should make them easier to consume, since they're not scattered among multiple entities
  • This way, at some point, we could propose them for a Bioschemas extension (it's unlikely we'd ever get that on Schema.org's HowToStep)

@simleo
Collaborator Author

simleo commented Sep 30, 2022

Also changed source to sourceParameter and target to targetParameter: they're more specific and there is no clash with http://schema.org/target.
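
As a rough illustration of how a consumer might read these, here is a minimal Python sketch. It assumes the connections are listed under a connections property on the workflow entity, as in the proposal above, and that the crate's @graph is available as a list of dicts. The workflow id packed.cwl and the helper name workflow_connections are hypothetical, chosen only for this example.

```python
def workflow_connections(graph, workflow_id):
    """Return (source, target) parameter id pairs for a workflow entity.

    Assumes the workflow lists its ParameterConnection entities under a
    "connections" property (the name used in the proposal; an assumption here).
    """
    by_id = {entity["@id"]: entity for entity in graph}
    refs = by_id[workflow_id].get("connections", [])
    if isinstance(refs, dict):  # JSON-LD allows a single value instead of a list
        refs = [refs]
    pairs = []
    for ref in refs:
        pc = by_id[ref["@id"]]
        pairs.append((pc["sourceParameter"]["@id"], pc["targetParameter"]["@id"]))
    return pairs


# Hypothetical minimal graph, reusing the ids from the revsort example.
graph = [
    {
        "@id": "packed.cwl",
        "@type": "ComputationalWorkflow",
        "connections": [{"@id": "#pc1"}],
    },
    {
        "@id": "#pc1",
        "@type": "ParameterConnection",
        "sourceParameter": {"@id": "packed.cwl#revtool.cwl/output"},
        "targetParameter": {"@id": "packed.cwl#sorttool.cwl/input"},
    },
]

pairs = workflow_connections(graph, "packed.cwl")
```

With everything reachable from one entity, the consumer needs a single lookup table and one pass over the list.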

@simleo
Collaborator Author

simleo commented Oct 20, 2022

The problem with ResearchObject/ro-terms#12 (implemented in #29) is that all links to parameter connections are in the workflow. While this makes sense, it may not be enough to derive the actual data flow, especially when a tool is reused in different steps.

As an example, consider this workflow, which takes a list of hostnames as input and produces a sorted list of top-level domains. It uses a CWL equivalent of the classic rev | cut -f 1 | rev trick to work around cut's inability to select the last field:

[Image: graph of the top-level domain workflow]

The connections for this workflow are:

{
    "@id": "#pc1",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "packed.cwl#revtool.cwl/rev_out"},
    "targetParameter": {"@id": "packed.cwl#cuttool.cwl/cut_in"}
},
{
    "@id": "#pc2",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "packed.cwl#main/hostnames"},
    "targetParameter": {"@id": "packed.cwl#revtool.cwl/rev_in"}
},
{
    "@id": "#pc3",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "packed.cwl#cuttool.cwl/cut_out"},
    "targetParameter": {"@id": "packed.cwl#revtool.cwl/rev_in"}
},
{
    "@id": "#pc4",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "packed.cwl#main/reverse_sort"},
    "targetParameter": {"@id": "packed.cwl#sorttool.cwl/reverse"}
},
{
    "@id": "#pc5",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "packed.cwl#revtool.cwl/rev_out"},
    "targetParameter": {"@id": "packed.cwl#sorttool.cwl/sort_in"}
},
{
    "@id": "#pc6",
    "@type": "ParameterConnection",
    "sourceParameter": {"@id": "packed.cwl#sorttool.cwl/sort_out"},
    "targetParameter": {"@id": "packed.cwl#main/tlds"}
}

Suppose a consumer tries to build the workflow's diagram from this information. Since order is not guaranteed, the connections might be processed as #pc1, #pc5, #pc2, #pc3, #pc4, #pc6. This leads to the same revtool-executing step being linked to both cuttool and sorttool. Only when processing #pc3 does the consumer realize that there must be another revtool-executing step, since connecting to the existing one would create a cycle. The resulting diagram is:

[Image: graph-bad-connection, the incorrectly reconstructed workflow]

which is a different workflow that computes an entirely different output.
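
The ambiguity can be seen without drawing anything. The following Python sketch takes the six connections above as (source, target) pairs and groups them by parameter id, which is all a consumer can key on when the connections are only linked from the workflow. The helper name ambiguous_parameters is hypothetical.

```python
from collections import defaultdict

# The six connections listed above, as (sourceParameter, targetParameter) ids.
CONNECTIONS = [
    ("packed.cwl#revtool.cwl/rev_out", "packed.cwl#cuttool.cwl/cut_in"),    # pc1
    ("packed.cwl#main/hostnames", "packed.cwl#revtool.cwl/rev_in"),         # pc2
    ("packed.cwl#cuttool.cwl/cut_out", "packed.cwl#revtool.cwl/rev_in"),    # pc3
    ("packed.cwl#main/reverse_sort", "packed.cwl#sorttool.cwl/reverse"),    # pc4
    ("packed.cwl#revtool.cwl/rev_out", "packed.cwl#sorttool.cwl/sort_in"),  # pc5
    ("packed.cwl#sorttool.cwl/sort_out", "packed.cwl#main/tlds"),           # pc6
]


def ambiguous_parameters(connections):
    """Find parameters that take part in more than one connection on one side."""
    sources_of = defaultdict(set)  # target parameter -> set of its sources
    targets_of = defaultdict(set)  # source parameter -> set of its targets
    for source, target in connections:
        sources_of[target].add(source)
        targets_of[source].add(target)
    multi_in = {p: s for p, s in sources_of.items() if len(s) > 1}
    multi_out = {p: t for p, t in targets_of.items() if len(t) > 1}
    return multi_in, multi_out


multi_in, multi_out = ambiguous_parameters(CONNECTIONS)
```

Here multi_in contains only revtool's rev_in (fed by both hostnames and cut_out) and multi_out contains only revtool's rev_out (feeding both cut_in and sort_in). Nothing in the data says which source pairs with which target, and since fanning one output out to several steps is legal, the consumer cannot tell tool reuse from fan-out.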

To avoid this problem, we should add connection to the relevant HowToStep instances. Note that we need to retain the ability to place them in ComputationalWorkflow as well, since some languages (e.g. CWL) allow passthrough links with no steps involved:

[Image: graph of a workflow with a passthrough link]
