Workflow #7
Avoid restating properties from schema.rdfa
Avoid claiming softwareRequirements
.xl file to BAM file
When can this get merged?
data/ext/bio/FormalParameter.rdfa
Outdated
<div typeof="rdfs:Class" resource="http://schema.org/FormalParameter">
<link property="http://schema.org/isPartOf" href="http://bio.schema.org" />
<span class="h" property="rdfs:label">FormalParameter</span>
<span property="rdfs:comment">A formal parameter is a slot that may be satisfied when the workflow is run. It appears as an Input or Output of a Workflow</span>
Is there any chance to extend this to, for instance, software? I can imagine we are talking here of more than data/variable types but maybe worth to give it a thought.
Yes, perhaps input and output to a FormalParameter also make sense on a SoftwareApplication (considering it could be a pretend subclass CommandLineTool), but we found it hard to put them on SoftwareSourceCode without adding an intermediate Script: what are the formal parameters for code.c? We would need to break it into functions etc! If someone adds Function again they can reuse FormalParameter.
So that was one of the motivations for making a new subclass Workflow rather than just profiling SoftwareSourceCode.
Creating a new type Workflow rather than profiling SoftwareSourceCode is well-motivated. However, the definition and description of FormalParameter could be more generic so it can later be integrated via a new property to SoftwareSourceCode
Are you suggesting renaming it to some kind of ExpectedValue wrapper, or just using these kinds of words without mentioning "Workflow"?
I am suggesting to take more compatible cases, such as the software one, into account when defining the type and corresponding descriptions
<span>Range: <a property="http://schema.org/rangeIncludes" href="http://schema.org/Thing">Thing</a></span>
-->
</div>
What about a property specifying the nature/type of the input/output data? It could be a schema/Bioschemas type or EDAM (or both). Would it be useful for the Workflows case? I know it has been mentioned for the Software/Tools case. Of course, the input/output might not be data but another research object. Still, having the nature could help findability and connectivity.
Better to do that in the Bioschemas profile for FormalParameter, where we say we should use additionalType to link to EDAM.
Good point about the profile, but not via additionalType. The motivation for additionalType is to allow multiple classes in a serialization that does not natively support them (microdata, I think). That is why I thought of adding a new property here... although, if done via profiling, additionalType could actually work.
A good thing about the softer additionalType here is that it does not have the same RDF semantics as @type: in particular a FormalParameter with an additionalType of EDAM Genome Sequence is not actually a genome sequence, as it is a parameter slot which would accept (or produce) values of that type. In the same way additionalType can type using identifiers of non-class instances, e.g. a skos:Concept.
As it's only in the profile this is straightforward. If we want it in the FormalParameter type registration (this repo) then at best we would need to describe that pattern in free text for the class itself.
Say we added new properties instead: I don't know a better property that would work well on both input (expectsValuesOfType?) and output (producesValuesByType?).
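To make the distinction concrete, here is a minimal Python sketch (an illustration only, not part of the PR). It builds a FormalParameter node whose additionalType points at an EDAM identifier that also appears in the JSON-LD examples in this thread, and keeps the node's own types separate from the types of values the slot accepts. The `#genome_sequence` identifier and both helper functions are hypothetical names chosen for the example.

```python
import json

# A parameter slot: its RDF type is FormalParameter, while additionalType
# only hints at the kind of value the slot accepts or produces.
formal_parameter = {
    "@context": "https://schema.org/",
    "@id": "#genome_sequence",  # hypothetical identifier
    "@type": "FormalParameter",
    "name": "genome_sequence",
    "additionalType": {"@id": "http://edamontology.org/data_2977"},
}


def rdf_types(node):
    """Return what the node itself is, i.e. the values of @type."""
    types = node.get("@type", [])
    return [types] if isinstance(types, str) else list(types)


def accepted_value_types(node):
    """Return the identifiers listed under additionalType (hypothetical helper)."""
    raw = node.get("additionalType", [])
    items = raw if isinstance(raw, list) else [raw]
    return [item["@id"] if isinstance(item, dict) else item for item in items]


if __name__ == "__main__":
    print(json.dumps(formal_parameter, indent=2))
    print("node is a:", rdf_types(formal_parameter))
    print("slot takes values typed as:", accepted_value_types(formal_parameter))
```

A consumer that only inspects `@type` never concludes that the slot itself is a sequence, which is the softer semantics described above.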
Indeed additionalType makes sense here. @AlasdairGray any thoughts on this point?
* Issues:
-->

<div typeof="rdfs:Class" resource="http://schema.org/FormalParameter">
If a generalization covering other types also working with input/output is not possible, maybe changing the name could be considered
I think it can be used with other things having input and output, but I don't know what domains to add there as here we only need it on Workflow (although it would be nice to have also on an individual step referenced from hasPart).
What about the software case? As it will be a ComputationalWorkflow, pieces of software will be important there. Could you keep it in mind when describing domains, ranges and so on? I know it adds work but it will be more usable even by others.
data/ext/bio/Workflow.rdfa
Outdated
<span>Source: <a property="dc:source" href="http://bioschemas.org">Bioschemas</a></span>
</div>

<div typeof="rdf:Property" resource="http://schema.org/softwareRequirements">
What about hardware/materials requirements? Under the broad workflow definition, I would say baking a cake fits in there: to bake a cake I organize resources (ingredients) and follow a process, a sequence of steps, so that I can transform those materials (input) into a cake (output). It is possible I am missing something here, but maybe the workflow definition needs to be narrowed down. Could programmatic workflow maybe work here?
This touches on other kinds of workflows that are not computational, and thus do not have formal parameters. Even in the Bio* world there is lots of confusion on this, as people have lab protocols called workflows, more like business workflows, and even what Wikipedia just calls https://en.wikipedia.org/wiki/Workflow
Such workflows are presumably not really subclasses of SoftwareSourceCode (unless formalized in BPM etc). If we imagine someone comes along and wants to describe these, they may find our Workflow in the way, and a name like ComputationalWorkflow makes more sense.
However we want to avoid duality, as both might have inputs and outputs and steps: would it be possible to inject a future Workflow later as a new second parent of ComputationalWorkflow?
ComputationalWorkflow is a better option, indeed. And, yes, a future generic Workflow type could become a parent of ComputationalWorkflow. It is hard to know whether a generic Workflow type will come later, so model your ComputationalWorkflow on its own. It may also be worth looking at the HowTo class in schema.org, which allows describing things that can be split into steps (like a how-to section); some things there could be useful for ComputationalWorkflow as well, or you could decide to inherit from it.
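The "second parent later" idea can be sketched in a few lines of Python, under stated assumptions. The subclass edges for SoftwareSourceCode and CreativeWork follow schema.org; ComputationalWorkflow under SoftwareSourceCode follows this discussion; the generic Workflow type and its placement directly under Thing are purely hypothetical. The `ancestors` helper is also just an illustrative name.

```python
def ancestors(cls, parents):
    """Breadth-first list of all superclasses of cls, without duplicates."""
    seen = []
    queue = list(parents.get(cls, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(parents.get(parent, []))
    return seen


# Hierarchy as discussed in this thread.
parents = {
    "ComputationalWorkflow": ["SoftwareSourceCode"],
    "SoftwareSourceCode": ["CreativeWork"],
    "CreativeWork": ["Thing"],
}
before = ancestors("ComputationalWorkflow", parents)

# Later: a hypothetical generic Workflow type is injected as a second parent.
parents["Workflow"] = ["Thing"]  # assumption: placement under Thing is made up
parents["ComputationalWorkflow"].append("Workflow")
after = ancestors("ComputationalWorkflow", parents)

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
```

In this toy model every superclass that held before the change still holds afterwards, so existing descriptions typed as ComputationalWorkflow would keep their SoftwareSourceCode ancestry.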
Agree with @ljgarcia that a future It makes me think that it is too early to say anything at all about general
Playing Devil's advocate, just running with the idea of https://schema.org/HowTo for computational workflows we would have:
It's quite a close fit, particularly in terminology of step and tool. The biggest difference is that our The other is the indirection through It is confusing that a single Here is my attempt to adapt the example in https://github.com/ResearchObject/ro-crate/blob/bioschemas-workflow-0.5/docs/1.1-DRAFT/index.md#complying-with-bioschemas-workflow-profile as a
{
"@context": "https://schema.org/",
"id": "workflow/retropath.knime",
"type": [
"HowTo",
"SoftwareSourceCode",
"ScientificWorkflow"
],
"creator": {
"id": "#alice"
},
"dateCreated": "2020-05-23",
"license": "https://spdx.org/licenses/CC-BY-NC-SA-4.0",
"name": "Sequence alignment workflow",
"programmingLanguage": {
"id": "#knime"
},
"sdPublisher": {
"id": "#workflow-hub"
},
"step": [
{
"type": "HowToStep",
"itemListElement": {
"type": "HowToDirection",
"name": "analyze_csv",
"tool": {
"id": "scripts/analyse_csv.py",
"type": [
"SoftwareSourceCode",
"HowToTool"
],
"name": "Analyze CSV files",
"programmingLanguage": {
"id": "https://www.python.org/downloads/release/python-380/"
}
}
}
},
{
"type": "HowToStep",
"itemListElement": {
"type": "HowToDirection",
"name": "analyze_csv",
"tool": {
"id": "https://www.imagemagick.org/",
"type": [
"SoftwareApplication",
"HowToTool"
],
"name": "ImageMagick",
"url": "https://www.imagemagick.org/",
"version": "ImageMagick 6.9.7-4 Q16 x86_64 20170114 http://www.imagemagick.org"
}
}
}
],
"supply": {
"type": [
"HowToSupply",
"FormalParameter"
],
"additionalType": "http://edamontology.org/data_2977",
"encodingFormat": {
"id": "http://edamontology.org/format_1929"
},
"name": "genome_sequence"
},
"tool": [
{
"id": "#python",
"type": [
"HowToTool",
"SoftwareApplication"
],
"name": "Python"
},
{
"id": "#knime",
"type": [
"HowToTool",
"SoftwareApplication"
],
"name": "Knime"
}
],
"url": "http://example.com/workflows/alignment",
"version": "0.5.0",
"yield": [
{
"id": "#6c703fee-6af7-4fdb-a57d-9e8bc4486044",
"type": "FormalParameter",
"additionalType": "http://edamontology.org/data_2977",
"encodingFormat": {
"id": "http://edamontology.org/format_2572"
},
"name": "cleaned_sequence"
},
{
"id": "#2f32b861-e43c-401f-8c42-04fd84273bdf",
"type": "FormalParameter",
"additionalType": "http://edamontology.org/data_1383",
"encodingFormat": {
"id": "http://edamontology.org/format_1982"
},
"name": "sequence_alignment"
}
]
}
Needless to say these intermediate
{
"@context": "https://w3id.org/ro/crate/1.1-DRAFT/context",
"@graph": [
{
"@id": "_:b0",
"@type": "HowToStep",
"itemListElement": {
"@id": "_:b1"
}
},
{
"@id": "_:b1",
"@type": "HowToDirection",
"name": "analyze_csv",
"tool": {
"@id": "scripts/analyse_csv.py"
}
},
{
"@id": "_:b2",
"@type": "HowToStep",
"itemListElement": {
"@id": "_:b3"
}
},
{
"@id": "_:b3",
"@type": "HowToDirection",
"name": "analyze_csv",
"tool": {
"@id": "https://www.imagemagick.org/"
}
},
{
"@id": "_:b4",
"@type": [
"HowToSupply",
"schema:FormalParameter"
],
"additionalType": {
"@id": "http://edamontology.org/data_2977"
},
"encodingFormat": {
"@id": "http://edamontology.org/format_1929"
},
"name": "genome_sequence"
},
{
"@id": "#2f32b861-e43c-401f-8c42-04fd84273bdf",
"@type": "schema:FormalParameter",
"additionalType": {
"@id": "http://edamontology.org/data_1383"
},
"encodingFormat": {
"@id": "http://edamontology.org/format_1982"
},
"name": "sequence_alignment"
},
{
"@id": "#6c703fee-6af7-4fdb-a57d-9e8bc4486044",
"@type": "schema:FormalParameter",
"additionalType": {
"@id": "http://edamontology.org/data_2977"
},
"encodingFormat": {
"@id": "http://edamontology.org/format_2572"
},
"name": "cleaned_sequence"
},
{
"@id": "#knime",
"@type": [
"HowToTool",
"SoftwareApplication"
],
"name": "Knime"
},
{
"@id": "#python",
"@type": [
"HowToTool",
"SoftwareApplication"
],
"name": "Python"
},
{
"@id": "scripts/analyse_csv.py",
"@type": [
"SoftwareSourceCode",
"HowToTool"
],
"name": "Analyze CSV files",
"programmingLanguage": {
"@id": "https://www.python.org/downloads/release/python-380/"
}
},
{
"@id": "workflow/retropath.knime",
"@type": [
"HowTo",
"SoftwareSourceCode",
"schema:ScientificWorkflow"
],
"creator": {
"@id": "#alice"
},
"dateCreated": {
"@type": "Date",
"@value": "2020-05-23"
},
"license": {
"@id": "https://spdx.org/licenses/CC-BY-NC-SA-4.0"
},
"name": "Sequence alignment workflow",
"programmingLanguage": {
"@id": "#knime"
},
"sdPublisher": {
"@id": "#workflow-hub"
},
"step": [
{
"@id": "_:b0"
},
{
"@id": "_:b2"
}
],
"supply": {
"@id": "_:b4"
},
"tool": [
{
"@id": "#python"
},
{
"@id": "#knime"
}
],
"url": {
"@id": "http://example.com/workflows/alignment"
},
"version": "0.5.0",
"yield": [
{
"@id": "#6c703fee-6af7-4fdb-a57d-9e8bc4486044"
},
{
"@id": "#2f32b861-e43c-401f-8c42-04fd84273bdf"
}
]
},
{
"@id": "https://www.imagemagick.org/",
"@type": [
"SoftwareApplication",
"HowToTool"
],
"name": "ImageMagick",
"url": {
"@id": "https://www.imagemagick.org/"
},
"version": "ImageMagick 6.9.7-4 Q16 x86_64 20170114 http://www.imagemagick.org"
}
]
}
We also see that
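The second listing above appears to be a flattened form of the first: nested nodes are lifted to the top level and those without an identifier receive blank-node labels such as _:b0. The following Python sketch shows that idea on a cut-down copy of the example. It is a simplification under stated assumptions, not the full JSON-LD flattening algorithm: it ignores contexts, value objects and lists-of-lists, and `flatten` is a hypothetical helper name.

```python
from itertools import count


def flatten(root):
    """Lift nested nodes to the top level, replacing them by {"id": ...} references."""
    graph = {}
    blank_ids = count()

    def visit(value):
        if isinstance(value, list):
            return [visit(item) for item in value]
        if not isinstance(value, dict):
            return value  # plain literal such as a string
        if set(value) == {"id"}:
            return value  # already a bare reference
        # Nodes without an identifier get a blank-node label.
        node_id = value.get("id") or f"_:b{next(blank_ids)}"
        node = graph.setdefault(node_id, {"id": node_id})
        for key, child in value.items():
            if key != "id":
                node[key] = visit(child)
        return {"id": node_id}

    visit(root)
    return list(graph.values())


# Cut-down version of the nested example from this thread.
workflow = {
    "id": "workflow/retropath.knime",
    "type": ["HowTo", "SoftwareSourceCode"],
    "step": [
        {
            "type": "HowToStep",
            "itemListElement": {
                "type": "HowToDirection",
                "name": "analyze_csv",
                "tool": {
                    "id": "scripts/analyse_csv.py",
                    "type": ["SoftwareSourceCode", "HowToTool"],
                },
            },
        }
    ],
}

nodes = flatten(workflow)

if __name__ == "__main__":
    for node in nodes:
        print(node["id"], "->", node.get("type"))
```

Even this small input yields two blank nodes (the step and its direction) between the workflow and the tool it uses, which illustrates the extra indirection that the HowTo modelling introduces.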
See also my email to workflowhub list. tl;dr:
I agree with
I think any harmonization with HowTo should wait for a later draft version.
@stain regarding "Playing Devil's advocate, just running with the idea of https://schema.org/HowTo for computational workflows we would have:", you can still have the FormalParameter. Inheriting from HowTo does not mean that you will use all of the properties defined there (see all the properties available in CreativeWork and how many make sense for some of its child types); they will be available just in case, but if FormalParameter makes more sense (I think it does), add it as a new property. How to use the whole thing will be clarified via the corresponding profile.
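As a rough illustration of how a profile can narrow down which inherited properties matter, here is a small Python sketch. Everything in it is hypothetical: the property lists in `PROFILE` are invented for the example and do not reproduce any actual Bioschemas profile, and `check_against_profile` is not an existing tool.

```python
# Hypothetical profile: these property lists are made up for illustration.
PROFILE = {
    "required": {"name", "programmingLanguage"},
    "recommended": {"input", "output"},
}


def check_against_profile(node, profile):
    """Report which profile properties are missing and which used ones fall outside it."""
    present = {key for key in node if key not in ("id", "type")}
    known = profile["required"] | profile["recommended"]
    return {
        "missing_required": sorted(profile["required"] - present),
        "missing_recommended": sorted(profile["recommended"] - present),
        "outside_profile": sorted(present - known),
    }


# A description that uses an inherited HowTo property (supply) the
# hypothetical profile says nothing about.
description = {
    "id": "workflow/retropath.knime",
    "type": ["HowTo", "SoftwareSourceCode"],
    "name": "Sequence alignment workflow",
    "programmingLanguage": {"id": "#knime"},
    "supply": {"type": "HowToSupply", "name": "genome_sequence"},
}

report = check_against_profile(description, PROFILE)

if __name__ == "__main__":
    print(report)
```

The inherited property remains valid on the type; the profile merely does not ask for it, which matches the point that inheritance makes properties available without obliging anyone to use them.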
I'm changing the status of this PR to draft. Once the changes to ComputationalWorkflow are complete please change it back again.
I have renamed Workflow to ComputationalWorkflow and edited the descriptions.
Workflow and FormalParameter type