Alignment with Bioschemas profile Workflow 0.5 #81
Remove Workflow and Script
As discussed in the meeting on 2020-05-28, we should reduce or avoid the need for @type arrays. Aligning with the BioSchemas Workflow profile means we don't need wf4ever terms for Workflow, Script or Sketch, however at the cost of losing some precision.
This must be augmented for pending changes in 0.5 for `FormalParameter`
From draft of the draft BioSchemas profile Workflow 0.5
See also:
* <https://docs.google.com/spreadsheets/d/1VKntwxkgjvup6yfzaPZsBN3h5aKWU5nuTKGEds3GzeQ/edit>
* <https://docs.google.com/spreadsheets/d/1MBNye9xqXDAe2Q8uBtioY4-17J-vKV8UA_PNprSMAVY/edit>
It was decided in BioSchemas Workflow working group that Other proposed types from BioSchemas, e.g. https://bioschemas.org/types/Taxon/0.3-RELEASE-2019_11_18/ have a separate namespace
however no equivalent seems to exist for its new properties, e.g. https://bioschemas.org/childTaxon does not exist, whereas https://bioschemas.org/Taxon#childTaxon does refer to the right HTML row:
<tr id="childTaxon">
<th style="color: #0B794B;">childTaxon</th>
<td>
<a style="color: #0B794B;" href="/types/drafts/Taxon">Taxon</a> or<br/>
<a href="http://schema.org/Text">Text</a> or<br/>
<a href="http://schema.org/URL">URL</a>
</td>
<td>
Closest child taxa of the taxon in question. <br/>
Inverse property: <span style="color: #0B794B;">parentTaxon</span>
</td>
</tr>
So assuming it will appear on the bioschemas.org website, perhaps we should use https://bioschemas.org/Workflow as type and the new properties in the style of https://bioschemas.org/Workflow#input ? We still refer to which version of the profile at the
..although it is consistent with the http://schema.org/Grant example 3, see schemaorg/schemaorg#383
Using https://bioschemas.org/Workflow#input etc
Added a softer distinction between workflow and script
I thought that we resolved to keep @type arrays, so that it was easy to identify Data Entities; that was my recollection of the conclusion in the last call. If you remove "File" then it makes it harder for developers to identify things that might need to be fetched from their @id URIs and to build interfaces like the one in Describo.
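The concern above can be illustrated with a minimal sketch. It assumes a hypothetical helper `is_data_entity` and made-up example entities, none of which come from RO-Crate tooling: a consumer has to accept `@type` as either a single string or an array, and it loses its marker for fetchable things once `File` is dropped.

```python
def types_of(entity):
    """Return the entity's @type as a list, whether it was a string or an array."""
    value = entity.get("@type", [])
    return [value] if isinstance(value, str) else list(value)


def is_data_entity(entity):
    """Hypothetical check: treat anything typed File or Dataset as a data entity."""
    return bool({"File", "Dataset"} & set(types_of(entity)))


# Made-up entities for illustration only.
triple_typed = {"@id": "workflow.ga", "@type": ["File", "SoftwareSourceCode", "Script"]}
single_typed = {"@id": "workflow.ga", "@type": "SoftwareSourceCode"}

print(is_data_entity(triple_typed))  # True
print(is_data_entity(single_typed))  # False
```

With only `SoftwareSourceCode` as the type, a check like this would need some other signal (for instance membership in `hasPart`) to recognise the workflow file as a data entity.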
@ptsefton Reflecting #83 I changed it to use the
@alaninmcr has sent the Workflow and FormalParameter types to bioschemas. I am waiting for their response.
Latest discussion in BioSchemas/bioschemas.github.io#304 and BioSchemas/schemaorg#7 (incl. @ljgarcia @alaninmcr @AlasdairGray) concludes to rename Those pull requests for updating bioschemas.org are blocked mainly by that rename.
@alaninmcr's pull request BioSchemas/bioschemas.github.io#304 is updated with new snapshot dates, so I will use the URLs which will appear at https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21 and https://bioschemas.org/profiles/ComputationalWorkflow/0.5-DRAFT-2020_07_21
Based on ResearchObject/ro-crate#81 7c61c2bf49b11badf03d9445ee9a3fc94d346ad0 and https://schema.org/version/9.0/
See also review edits in #100
Updating our Workflow section to map closer to BioSchemas profile for Workflows.
It also removes wf4ever references for `Workflow` and `Script`.
Reflecting the desire to avoid `@type: [arrays]` (which issue, @ptsefton ?) - this also gets rid of the previous triple-typing of `@type: [File, SoftwareSourceCode, Script]` to just `@type: SoftwareSourceCode` (other changes needed elsewhere for that change)
Work in progress
* `@type: Workflow` or `@type: SoftwareSourceCode`
* array-less `@type` overall in RO-Crate (not blocker)
Author vs creator
https://bioschemas.org/profiles/Workflow/0.4-DRAFT-2020_05_11/ specifies `creator` rather than `author` - however we have made https://researchobject.github.io/ro-crate/1.0/ consistent to use `author` for other types including the `Dataset` itself and also https://bioschemas.org/profiles/ScholarlyArticle/0.1-DRAFT-2019_03_15/ - however their https://bioschemas.org/profiles/Dataset/0.3-RELEASE-2019_06_14/ recommends `creator` rather than `author`.
Occasionally the author of a workflow may be different from the creator, e.g. Alice writes the workflow in Galaxy, then Bob rewrites it in Snakemake, which is quite a different workflow language. However the conceptual workflow could remain the same. As a workflow is typed as http://schema.org/SoftwareSourceCode then perhaps it makes most sense to note who made the code lines as the creator - so I left our workflow examples to also use that.
See section Authoring in our PAV paper for discussion of author vs creator vs curator vs contributor.
Multiple type array
Removing the multiple `@type: [arrays]` meant I also got rid of `WorkflowSketch`, so the diagrams are now in a sense untyped except for their `about` relation to a workflow, which again is now just a `SoftwareSourceCode`.
When we change this generally in RO-Crate we also have to soften the requirement that data entities from `hasPart` have to have type `File` - as workflows are generally saved in files. (Same applies to `ImageObject` and `ScholarlyArticle` if embedded).
Note that the FormalParameter proposal uses `additionalType` and `format` with links to EDAM ontology. In the example I used them as full URIs, but I am not sure if we need to recommend their `@type: Thing` contextual entities as in my example, as their URIs generally give a readable description (e.g. http://edamontology.org/format_1929 )
Script vs Workflow
Removing the `wf4ever` terms and the multiple types makes it harder to distinguish workflows from scripts. Perhaps that was always tricky, e.g. https://snakemake.readthedocs.io/ workflows look a lot like a script anyway.
It is unclear from https://bioschemas.org/profiles/Workflow/0.4-DRAFT-2020_05_11/ if they are proposing a new type `Workflow` (to become http://schema.org/Workflow) or specifying how https://schema.org/SoftwareSourceCode should be used under this profile.
This text suggests the second:
compared to https://bioschemas.org/profiles/ChemicalSubstance/0.4-RELEASE/
However clarity needs to be sought from BioSchemas as this is inconsistent across their site - perhaps @alaninmcr @AlasdairGray can chip in.
Assuming this, this pull request uses just `SoftwareSourceCode` and removes the previous distinction between `Script` and `Workflow` from wf4ever.
BioSchemas as an optional profile in RO-Crate
https://bioschemas.org/profiles/Workflow/0.4-DRAFT-2020_05_11/ specifies these mandatory properties:
creator
dateCreated
input
license
name
output
programmingLanguage
sdPublisher
url
version
I think having all of this information is a bit excessive for any RO-Crate that happens to have a workflow, as our other types are not as restrictive. Therefore I added the BioSchemas compliance as a new, in a way optional section.
However I did use the word SHOULD in this wording, so it may need to be softened to make it clear they don't have to follow this section?
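To make the cost of full compliance concrete, here is a minimal sketch that reports which of the mandatory properties listed above a workflow entity lacks. The helper name `missing_mandatory` and the example entity are made up for illustration; they are not part of RO-Crate or BioSchemas tooling.

```python
# Mandatory properties as listed for the Workflow 0.4 draft profile above.
MANDATORY = [
    "creator", "dateCreated", "input", "license", "name",
    "output", "programmingLanguage", "sdPublisher", "url", "version",
]


def missing_mandatory(entity):
    """Return the mandatory profile properties that the entity does not provide."""
    return [prop for prop in MANDATORY if prop not in entity]


# Made-up, deliberately sparse workflow entity.
workflow = {
    "@id": "workflow.cwl",
    "@type": "SoftwareSourceCode",
    "name": "Example workflow",
    "creator": {"@id": "#alice"},
}

print(missing_mandatory(workflow))
```

A crate that merely happens to contain a workflow would typically look like this sparse entity, which is the argument for keeping the profile optional.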
Namespaces
This pull request is work in progress because it is reflecting changes for planned release Workflows DRAFT 0.5 which adds the `FormalParameter` type for inputs and outputs. Once that is released on bioschemas.org we can insert the date in this `@context` mapping:
It is not pretty, but as these terms will be proposed to schema.org, we don't know if they will change in the process (e.g. `format` might be dropped for http://schema.org/encodingFormat and `input` might become `http://schema.org/inputParameter` rather than the intended `http://schema.org/input`).
These URIs are all 404 now, so the idea was to map to https://bioschemas.org/profiles/Workflow/0.4-DRAFT-2020_05_11/#input etc - even if strangely there is no `id="input"` HTML anchor on that page (there probably should be! Views, @AlasdairGray ?).
If we release RO-Crate 1.1 we have to be stable in what we map to, just like https://w3id.org/ro/crate/1.0/context has a fixed mapping to https://schema.org/version/5.0/ terms - these crates might end up on tape drives etc. and should be able to have a long life.
I've updated the context for schema.org release 8.0 which adds some extra terms (see diff). I also added a new `isBasedOn` property to reflect basing our context on schema.org, pcdm and the BioSchemas Workflow profile (again once the 0.5 URI is known).
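The idea of pinning terms to one dated profile snapshot can be sketched as follows. This is only an illustration: the helper `pinned_context` is hypothetical, the `#term` fragment style is the convention proposed in this thread rather than a confirmed one, and the snapshot URL is the 0.4 draft mentioned above, not the final 0.5 URI.

```python
# Snapshot URL taken from the discussion above; the fragment style (#term)
# is the proposed convention, not a confirmed one.
PROFILE = "https://bioschemas.org/profiles/Workflow/0.4-DRAFT-2020_05_11/"


def pinned_context(profile_url, terms):
    """Map each term to a fragment URI under one fixed profile snapshot."""
    return {term: f"{profile_url}#{term}" for term in terms}


context = pinned_context(PROFILE, ["input", "output"])
print(context["input"])
# https://bioschemas.org/profiles/Workflow/0.4-DRAFT-2020_05_11/#input
```

Because the snapshot date is part of every URI, a later rename on the schema.org side would not change what an already published crate's terms mean.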