Workflow #7

Merged
merged 10 commits into from
Jul 23, 2020

Conversation

alaninmcr

Workflow and FormalParameter type

@stain

stain commented Jul 6, 2020

When can this get merged?

<div typeof="rdfs:Class" resource="http://schema.org/FormalParameter">
<link property="http://schema.org/isPartOf" href="http://bio.schema.org" />
<span class="h" property="rdfs:label">FormalParameter</span>
<span property="rdfs:comment">A formal parameter is a slot that may be satisfied when the workflow is run. It appears as an Input or Output of a Workflow</span>
Collaborator

@ljgarcia ljgarcia Jul 8, 2020

Is there any chance to extend this to, for instance, software? I can imagine we are talking here about more than data/variable types, but it may be worth giving it some thought.

Yes, perhaps input and output to a FormalParameter also make sense on a SoftwareApplication - (considering it could be a pretend subclass CommandLineTool)

..but we found it hard to put it on SoftwareSourceCode without adding an intermediate Script - what are the formal parameters for code.c? We would need to break it into functions etc..! If someone adds Function again they can reuse FormalParameter.

So that was one of the motivations for making a new subclass Workflow rather than just profiling SoftwareSourceCode.

Collaborator

Creating a new type Workflow rather than profiling SoftwareSourceCode is well-motivated. However, the definition and description of FormalParameter could be more generic so that it can later be attached to SoftwareSourceCode via a new property.

Are you suggesting renaming it to some kind of ExpectedValue wrapper, or just using this kind of wording without mentioning "Workflow"?

Collaborator

I am suggesting taking other compatible cases, such as the software one, into account when defining the type and the corresponding descriptions.

<span>Range: <a property="http://schema.org/rangeIncludes" href="http://schema.org/Thing">Thing</a></span>
-->
</div>

Collaborator

@ljgarcia ljgarcia Jul 8, 2020

What about a property specifying the nature/type of the input/output data? It could be a schema/Bioschemas type or EDAM (or both). Would it be useful for the Workflows case? I know it has been mentioned for the Software/Tools case. Of course, the input/output might not be data but another research object. Still, knowing its nature could help findability and connectivity.

Better to do that in the BioSchemas profile for FormalParameter - where we say we should use additionalType to link to EDAM.

Collaborator

Good point about the profile, but not via additionalType. The motivation for additionalType is to allow multiple classes in a serialization that does not natively support them (microdata, I think). That is why I thought of adding a new property here... although, if done via profiling, additionalType could actually work.

@stain stain Jul 8, 2020

A good thing about the softer additionalType here is that it does not have the same RDF semantics as @type - in particular, a FormalParameter with an additionalType of EDAM Genome Sequence is not actually a genome sequence, as it is a parameter slot which would accept (or produce) values of that type. In the same way, additionalType can point to identifiers that are not classes, e.g. a skos:Concept.

As it's only in the profile this is straightforward. If we want it in the FormalParameter type registration (this repo) then at best we would need to describe that pattern in free text for the class itself.

Say we added new properties instead, I don't know a better property that would work well on both input (expectsValuesOfType ?) and output (producesValuesByType ?)..
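
As a rough illustration of the additionalType pattern discussed above, here is a minimal Python sketch that builds a JSON-LD node for a parameter slot. The helper name `formal_parameter` is hypothetical, and the EDAM identifiers are simply the ones used in the example later in this thread; nothing here is part of the proposed type definition.

```python
import json

# EDAM identifiers copied from the example later in this thread.
EDAM_DATA_TERM = "http://edamontology.org/data_2977"
EDAM_FORMAT_TERM = "http://edamontology.org/format_1929"


def formal_parameter(name, value_type, encoding_format=None):
    """Build a JSON-LD node for a parameter slot (hypothetical helper)."""
    node = {
        "@type": "FormalParameter",
        "name": name,
        # additionalType says what kind of values the slot accepts or
        # produces; it is deliberately NOT added to @type, because the
        # slot itself is not an instance of that data type.
        "additionalType": {"@id": value_type},
    }
    if encoding_format is not None:
        node["encodingFormat"] = {"@id": encoding_format}
    return node


param = formal_parameter("genome_sequence", EDAM_DATA_TERM, EDAM_FORMAT_TERM)
print(json.dumps(param, indent=2))
```

The same helper would serve for both inputs and outputs, which is one reason a single softer property may be easier than separate input and output properties.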

Collaborator

Indeed additionalType makes sense here. @AlasdairGray any thoughts on this point?

* | Issues:
-->

<div typeof="rdfs:Class" resource="http://schema.org/FormalParameter">
Collaborator

If a generalization covering other types that also work with input/output is not possible, maybe changing the name could be considered.

I think it can be used with other things having input and output, but I don't know what domains to add there as here we only need it on Workflow (although it would be nice to have also on an individual step referenced from hasPart).

Collaborator

What about the software case? As it will be a ComputationalWorkflow, pieces of software will be important there. Could you keep it in mind when describing domains, ranges and so on? I know it adds work, but it will be more usable, even by others.

<span>Source: <a property="dc:source" href="http://bioschemas.org">Bioschemas</a></span>
</div>

<div typeof="rdf:Property" resource="http://schema.org/softwareRequirements">
Collaborator

What about hardware/materials requirements? From the broad workflow definition, I would say baking a cake fits in there (to bake a cake I organize resources (ingredients) and follow a process, a sequence of steps, so I can transform those materials, the input, into a cake, the output). It is possible I am missing something here, but maybe the workflow definition needs to be narrowed down. Could programmatic workflow maybe work here?

This touches on other kinds of workflows that are not computational, and thus do not have formal parameters. Even in the Bio* world there is lots of confusion on this, as people have lab protocols called workflows - more like business workflows and even what Wikipedia just calls https://en.wikipedia.org/wiki/Workflow

Such workflows are presumably not really subclasses of SoftwareSourceCode (unless formalized in BPM etc). If we imagine someone comes along and wants to describe these, they may find our Workflow in the way, so a name like ComputationalWorkflow makes more sense.

However, we want to avoid duality, as both might have inputs, outputs and steps - could a future Workflow be injected later as a second parent of ComputationalWorkflow?

Collaborator

ComputationalWorkflow is a better option, indeed. And, yes, a future generic Workflow type could become a parent of ComputationalWorkflow. It is hard to know whether a generic Workflow type will come later, so model your ComputationalWorkflow on its own. It may also be worth looking at the HowTo class in schema.org, which allows describing things that can be split into steps (like a how-to section); some things there could be useful for ComputationalWorkflow as well, or you could decide to inherit from it.

@stain

stain commented Jul 8, 2020

Agree with @ljgarcia that a future WetlabWorkflow or similar would make sense to model over https://schema.org/HowTo just like its specialization https://schema.org/Recipe - in fact many of the lab protocols are really recipes just not for edibles!

It makes me think that it is too early to say anything at all about a general Workflow, and our ComputationalWorkflow is not required to become related to it, so the simplest thing is to just do the rename, perhaps make FormalParameter not sound like it is just about workflows, and leave it at that.

@stain

stain commented Jul 8, 2020

Playing Devil's advocate, just running with the idea of https://schema.org/HowTo for computational workflows we would have:

It's quite a close fit, particularly in the terminology of step and tool. The biggest difference is that our FormalParameter is replaced by a HowToSupply for inputs, but by a weaker numerical QuantitativeValue for outputs (which has no property to say what kind of thing it is making) - so the yield would need to be extended.

The other is the indirection through itemListElement of the https://schema.org/HowToStep to its inner https://schema.org/HowToDirection - this is similar to how in CWL we have a list of WorkflowSteps that run a CommandLineTool or a nested Workflow.

It is confusing that a single HowToStep can have multiple HowToDirections - but that's how recipes work as well ("Mix the eggs together in a bowl and add the sugar"). Possibly some workflows might do more than one thing in a single step, e.g. pre/post processing, but mostly we would have a single itemListElement equivalent to the CWL run (which we could add as subproperty)
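
To make the indirection described above concrete, here is a small Python sketch that maps a heavily simplified, CWL-like `steps` mapping (step name to a `run` target) onto HowToStep nodes, each wrapping a single HowToDirection. The function name `cwl_steps_to_howto` and the step name `convert_image` are hypothetical, and real CWL steps carry much more (inputs, outputs, scatter and so on) that is ignored here.

```python
def cwl_steps_to_howto(steps):
    """Map {step_name: {"run": tool_id}} to a list of HowToStep nodes.

    Hypothetical helper: each step becomes one HowToStep holding exactly
    one HowToDirection, mirroring the single CWL `run` per step.
    """
    howto_steps = []
    for name, step in steps.items():
        howto_steps.append({
            "@type": "HowToStep",
            "itemListElement": {
                "@type": "HowToDirection",
                "name": name,
                # The tool is referenced by identifier only.
                "tool": {"@id": step["run"]},
            },
        })
    return howto_steps


# Simplified stand-in for the `steps` section of a CWL workflow.
steps = {
    "analyze_csv": {"run": "scripts/analyse_csv.py"},
    "convert_image": {"run": "https://www.imagemagick.org/"},
}
howto_steps = cwl_steps_to_howto(steps)
```

Every step costs two nodes (the HowToStep and its HowToDirection), which is the extra layer that a direct `run`-style subproperty would avoid.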

Here is my attempt to adapt the example in https://github.com/ResearchObject/ro-crate/blob/bioschemas-workflow-0.5/docs/1.1-DRAFT/index.md#complying-with-bioschemas-workflow-profile as a HowTo:

{
  "@context": "https://schema.org/",
  "id": "workflow/retropath.knime",
  "type": [
    "HowTo",
    "SoftwareSourceCode",
    "ScientificWorkflow"
  ],
  "creator": {
    "id": "#alice"
  },
  "dateCreated": "2020-05-23",
  "license": "https://spdx.org/licenses/CC-BY-NC-SA-4.0",
  "name": "Sequence alignment workflow",
  "programmingLanguage": {
    "id": "#knime"
  },
  "sdPublisher": {
    "id": "#workflow-hub"
  },
  "step": [
    {
      "type": "HowToStep",
      "itemListElement": {
        "type": "HowToDirection",
        "name": "analyze_csv",
        "tool": {
          "id": "scripts/analyse_csv.py",
          "type": [
            "SoftwareSourceCode",
            "HowToTool"
          ],
          "name": "Analyze CSV files",
          "programmingLanguage": {
            "id": "https://www.python.org/downloads/release/python-380/"
          }
        }
      }
    },
    {
      "type": "HowToStep",
      "itemListElement": {
        "type": "HowToDirection",
        "name": "analyze_csv",
        "tool": {
          "id": "https://www.imagemagick.org/",
          "type": [
            "SoftwareApplication",
            "HowToTool"
          ],
          "name": "ImageMagick",
          "url": "https://www.imagemagick.org/",
          "version": "ImageMagick 6.9.7-4 Q16 x86_64 20170114 http://www.imagemagick.org"
        }
      }
    }
  ],
  "supply": {
    "type": [
      "HowToSupply",
      "FormalParameter"
    ],
    "additionalType": "http://edamontology.org/data_2977",
    "encodingFormat": {
      "id": "http://edamontology.org/format_1929"
    },
    "name": "genome_sequence"
  },
  "tool": [
    {
      "id": "#python",
      "type": [
        "HowToTool",
        "SoftwareApplication"
      ],
      "name": "Python"
    },
    {
      "id": "#knime",
      "type": [
        "HowToTool",
        "SoftwareApplication"
      ],
      "name": "Knime"
    }
  ],
  "url": "http://example.com/workflows/alignment",
  "version": "0.5.0",
  "yield": [
    {
      "id": "#6c703fee-6af7-4fdb-a57d-9e8bc4486044",
      "type": "FormalParameter",
      "additionalType": "http://edamontology.org/data_2977",
      "encodingFormat": {
        "id": "http://edamontology.org/format_2572"
      },
      "name": "cleaned_sequence"
    },
    {
      "id": "#2f32b861-e43c-401f-8c42-04fd84273bdf",
      "type": "FormalParameter",
      "additionalType": "http://edamontology.org/data_1383",
      "encodingFormat": {
        "id": "http://edamontology.org/format_1982"
      },
      "name": "sequence_alignment"
    }
  ]
}

Needless to say these intermediate HowToStep and use of itemListElement mean you get a lot of _:bnodes in RO-Crate's flattened JSON-LD:

{
  "@context": "https://w3id.org/ro/crate/1.1-DRAFT/context",
  "@graph": [
    {
      "@id": "_:b0",
      "@type": "HowToStep",
      "itemListElement": {
        "@id": "_:b1"
      }
    },
    {
      "@id": "_:b1",
      "@type": "HowToDirection",
      "name": "analyze_csv",
      "tool": {
        "@id": "scripts/analyse_csv.py"
      }
    },
    {
      "@id": "_:b2",
      "@type": "HowToStep",
      "itemListElement": {
        "@id": "_:b3"
      }
    },
    {
      "@id": "_:b3",
      "@type": "HowToDirection",
      "name": "analyze_csv",
      "tool": {
        "@id": "https://www.imagemagick.org/"
      }
    },
    {
      "@id": "_:b4",
      "@type": [
        "HowToSupply",
        "schema:FormalParameter"
      ],
      "additionalType": {
        "@id": "http://edamontology.org/data_2977"
      },
      "encodingFormat": {
        "@id": "http://edamontology.org/format_1929"
      },
      "name": "genome_sequence"
    },
    {
      "@id": "#2f32b861-e43c-401f-8c42-04fd84273bdf",
      "@type": "schema:FormalParameter",
      "additionalType": {
        "@id": "http://edamontology.org/data_1383"
      },
      "encodingFormat": {
        "@id": "http://edamontology.org/format_1982"
      },
      "name": "sequence_alignment"
    },
    {
      "@id": "#6c703fee-6af7-4fdb-a57d-9e8bc4486044",
      "@type": "schema:FormalParameter",
      "additionalType": {
        "@id": "http://edamontology.org/data_2977"
      },
      "encodingFormat": {
        "@id": "http://edamontology.org/format_2572"
      },
      "name": "cleaned_sequence"
    },
    {
      "@id": "#knime",
      "@type": [
        "HowToTool",
        "SoftwareApplication"
      ],
      "name": "Knime"
    },
    {
      "@id": "#python",
      "@type": [
        "HowToTool",
        "SoftwareApplication"
      ],
      "name": "Python"
    },
    {
      "@id": "scripts/analyse_csv.py",
      "@type": [
        "SoftwareSourceCode",
        "HowToTool"
      ],
      "name": "Analyze CSV files",
      "programmingLanguage": {
        "@id": "https://www.python.org/downloads/release/python-380/"
      }
    },
    {
      "@id": "workflow/retropath.knime",
      "@type": [
        "HowTo",
        "SoftwareSourceCode",
        "schema:ScientificWorkflow"
      ],
      "creator": {
        "@id": "#alice"
      },
      "dateCreated": {
        "@type": "Date",
        "@value": "2020-05-23"
      },
      "license": {
        "@id": "https://spdx.org/licenses/CC-BY-NC-SA-4.0"
      },
      "name": "Sequence alignment workflow",
      "programmingLanguage": {
        "@id": "#knime"
      },
      "sdPublisher": {
        "@id": "#workflow-hub"
      },
      "step": [
        {
          "@id": "_:b0"
        },
        {
          "@id": "_:b2"
        }
      ],
      "supply": {
        "@id": "_:b4"
      },
      "tool": [
        {
          "@id": "#python"
        },
        {
          "@id": "#knime"
        }
      ],
      "url": {
        "@id": "http://example.com/workflows/alignment"
      },
      "version": "0.5.0",
      "yield": [
        {
          "@id": "#6c703fee-6af7-4fdb-a57d-9e8bc4486044"
        },
        {
          "@id": "#2f32b861-e43c-401f-8c42-04fd84273bdf"
        }
      ]
    },
    {
      "@id": "https://www.imagemagick.org/",
      "@type": [
        "SoftwareApplication",
        "HowToTool"
      ],
      "name": "ImageMagick",
      "url": {
        "@id": "https://www.imagemagick.org/"
      },
      "version": "ImageMagick 6.9.7-4 Q16 x86_64 20170114 http://www.imagemagick.org"
    }
  ]
}
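
The growth in blank nodes can be reproduced with a naive flattening sketch in Python. This is not the JSON-LD flattening algorithm (a real JSON-LD processor should be used for that); it only assumes that every nested node without an `@id` receives a generated `_:bN` label, which is enough to show where the extra nodes come from. The function names are hypothetical.

```python
from itertools import count


def flatten(node, graph, counter):
    """Move `node` and its nested nodes into `graph`; return a reference."""
    node_id = node.get("@id") or f"_:b{next(counter)}"
    flat = {"@id": node_id}
    for key, value in node.items():
        if key != "@id":
            flat[key] = flatten_value(value, graph, counter)
    graph.append(flat)
    return {"@id": node_id}


def flatten_value(value, graph, counter):
    if isinstance(value, list):
        return [flatten_value(item, graph, counter) for item in value]
    if isinstance(value, dict):
        # Plain references and literal value objects are left alone.
        if set(value) == {"@id"} or "@value" in value:
            return value
        return flatten(value, graph, counter)
    return value


workflow = {
    "@id": "workflow/retropath.knime",
    "@type": "HowTo",
    "step": [{
        "@type": "HowToStep",
        "itemListElement": {
            "@type": "HowToDirection",
            "name": "analyze_csv",
            "tool": {"@id": "scripts/analyse_csv.py"},
        },
    }],
}

graph = []
flatten(workflow, graph, count())
blank_ids = sorted(n["@id"] for n in graph if n["@id"].startswith("_:"))
print(len(graph), blank_ids)
```

In this sketch a single workflow step already yields two blank nodes, one for the HowToStep and one for its HowToDirection.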

We also see that yield here again needs the equivalent of FormalParameter, so we can't easily escape such a type, and particularly for Computational Workflows it makes sense that both inputs and outputs should use the same or a similar class.
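
A quick way to see that inputs and outputs lean on the same class is to collect them from the compacted HowTo example. The sketch below assumes the compact `type`, `supply` and `yield` keys used in that example; the helper names are hypothetical.

```python
def as_list(value):
    """Wrap a single value in a list so both shapes can be iterated."""
    return value if isinstance(value, list) else [value]


def has_type(node, type_name):
    return type_name in as_list(node.get("type", []))


def formal_parameters(howto):
    """Return (inputs, outputs) that are typed as FormalParameter."""
    inputs = [n for n in as_list(howto.get("supply", []))
              if has_type(n, "FormalParameter")]
    outputs = [n for n in as_list(howto.get("yield", []))
               if has_type(n, "FormalParameter")]
    return inputs, outputs


# Trimmed-down version of the compacted example in this thread.
howto = {
    "type": ["HowTo", "SoftwareSourceCode"],
    "supply": {"type": ["HowToSupply", "FormalParameter"],
               "name": "genome_sequence"},
    "yield": [
        {"type": "FormalParameter", "name": "cleaned_sequence"},
        {"type": "FormalParameter", "name": "sequence_alignment"},
    ],
}
inputs, outputs = formal_parameters(howto)
print([n["name"] for n in inputs], [n["name"] for n in outputs])
```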

@stain

stain commented Jul 8, 2020

See also my email to workflowhub list.

tl;dr:

(..)
However I am not personally convinced, and think the simplest for now is:

  • Rename Workflow type to ComputationalWorkflow
  • Rename Workflow profile to ComputationalWorkflow profile
  • Soften FormalParameter references to "workflow" so it potentially could be used with other things

I am not sure if we need to rename FormalParameter - but something similar would be needed for "yield" in HowTo to support non-quantitative "Expected Values"

@alaninmcr
Author

I agree with

  • Rename Workflow type to ComputationalWorkflow
  • Rename Workflow profile to ComputationalWorkflow profile
  • Soften FormalParameter references to "workflow" so it potentially could be used with other things

I think any harmonization with HowTo should wait for a later draft version.

@ljgarcia
Collaborator

@stain regarding "Playing Devil's advocate, just running with the idea of https://schema.org/HowTo for computational workflows we would have:", you can still have the FormalParameter. Inheriting from HowTo does not mean that you will use all of the properties defined there (see all the properties available in CreativeWork and how many make sense for some of its child types); they will be available just in case, but if FormalParameter makes more sense (I think it does), add it as a new property. How to use the whole thing will be clarified via the corresponding profile.

@AlasdairGray
Member

I'm changing the status of this PR to draft. Once the changes to ComputationalWorkflow are complete please change it back again.

@alaninmcr
Author

I have renamed Workflow to ComputationalWorkflow and edited the descriptions.

@alaninmcr alaninmcr marked this pull request as ready for review July 21, 2020 09:00
@alaninmcr alaninmcr merged commit 84af282 into BioChemEntity Jul 23, 2020