Document challenges found using incremental FDOs in workflows #59

Closed
Tracked by #78
PaulBrack opened this issue Nov 2, 2021 · 5 comments
Labels
D8.4 Work associated with final deliverable - e.g. testing, sustainability, and documentation. documentation Improvements or additions to documentation wontfix This will not be worked on

Comments

@PaulBrack
Contributor

From #47 :

Another point for discussion: at the moment, every tool takes openDS as input and code.py maps it to opends_properties. The drawback is that every input variable needs to be munged into the openDS structure.
Take GEORG, for example: it takes a locality text string as input. To run it as a standalone tool on a spreadsheet of data, the spreadsheet would need to contain (or be converted to) openDS objects.

Would it be better for tools to accept the inputs they actually require? Instead of each tool converting the openDS into its own inputs, we would have an openDS mapper tool that takes the openDS input, defines the expected outputs to be plucked from the openDS JSON, and feeds these into the subsequent tool.

So rather than openDS => tool we would have openDS => openDS mapper => tool.

Advantages: more in line with the way Galaxy is designed to be used. Also, there is a lot of code redundancy - each tool needs code.py to extract the openDS properties. Instead, there would be one tool that did this.
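
A minimal sketch of what such a mapper could look like, to make the openDS => openDS mapper => tool idea concrete. Everything here is an assumption for illustration: the `pluck` and `map_opends` helpers, the `georeference` stand-in for a tool like GEORG, and the `authoritative/locality` layout of the openDS JSON are all hypothetical and not taken from the real schema or codebase.

```python
from typing import Any, Mapping, Sequence


def pluck(document: Mapping[str, Any], path: Sequence[str]) -> Any:
    """Walk a nested mapping along `path`; raise KeyError if any step is missing."""
    current: Any = document
    for key in path:
        if not isinstance(current, Mapping) or key not in current:
            raise KeyError("/".join(path))
        current = current[key]
    return current


def map_opends(document: Mapping[str, Any],
               wanted: Mapping[str, Sequence[str]]) -> dict:
    """Return plain tool inputs, keyed by the tool's own parameter names."""
    return {name: pluck(document, path) for name, path in wanted.items()}


def georeference(locality: str) -> dict:
    """Hypothetical stand-in for a tool that only needs a locality string."""
    return {"locality": locality, "georeferenced": False}


# Hypothetical openDS layout, used only for this example.
opends = {"authoritative": {"locality": "Kew Gardens, London"}}

# The mapper is the only step that knows about openDS.
inputs = map_opends(opends, {"locality": ("authoritative", "locality")})
result = georeference(**inputs)
```

With this split, the tool can also be run directly on a spreadsheet column of locality strings, since it never sees openDS.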

@PaulBrack PaulBrack added this to Backlog in Current Milestone Nov 2, 2021
@PaulBrack
Contributor Author

Decided to leave this for now and reconsider in the next milestone.

@Cubey0
Collaborator

Cubey0 commented Nov 4, 2021

OK - works for me. Thanks for your response.

@PaulBrack PaulBrack moved this from Backlog to Current sprint in Current Milestone Nov 10, 2021
@PaulBrack PaulBrack moved this from Current sprint to In progress in Current Milestone Nov 18, 2021
@PaulBrack
Contributor Author

Writing proposed solution today

@stain stain moved this from In progress to Backlog in Current Milestone Mar 23, 2022
@stain
Collaborator

stain commented Mar 23, 2022

We agreed to delay this till after the May 2022 deliverable.

Perhaps have two inputs/outputs: openDS (JSON only) and data (a dataset of files, possibly an RO-Crate). This would make data flow within Galaxy explicit rather than implicit for files not yet published with DISSCO (currently stored in a temporary directory within Galaxy).

If we move the common openDS processing to a pip-installable module, then a per-tool Galaxy wrapper could use it, and the tool itself (where appropriate) can be less openDS-aware. In effect there could be two wrappers: one that passes openDS+data, and another that handles the files natively, which is better for testing and for use in other workflows.
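
A rough sketch of the two-wrapper idea, assuming a tool that only works on files. All names are hypothetical: `count_bytes` stands in for a real tool, `run_native` and `run_opends` stand in for the two wrappers, and the `images` and `annotations` keys are invented for the example rather than taken from the openDS schema.

```python
import json
import tempfile
from pathlib import Path


def count_bytes(image_path: Path) -> dict:
    """Stand-in tool: works on a plain file and knows nothing about openDS."""
    return {"size_bytes": image_path.stat().st_size}


def run_native(image_path: Path) -> dict:
    """Native wrapper: passes the file straight to the tool."""
    return count_bytes(image_path)


def run_opends(opends_path: Path, data_dir: Path) -> dict:
    """openDS wrapper: takes openDS JSON plus an explicit data directory."""
    record = json.loads(opends_path.read_text())
    image_path = data_dir / record["images"][0]  # hypothetical key
    updated = dict(record)
    updated["annotations"] = count_bytes(image_path)  # hypothetical key
    return updated


# Small demonstration using a temporary directory as the data input.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "specimen.jpg").write_bytes(b"\x00" * 16)
    opends_path = data_dir / "opends.json"
    opends_path.write_text(json.dumps({"images": ["specimen.jpg"]}))

    native = run_native(data_dir / "specimen.jpg")
    wrapped = run_opends(opends_path, data_dir)
```

In this sketch the openDS handling lives only in `run_opends`, which is the part that could move into a shared module, while `run_native` keeps the tool testable and reusable outside openDS workflows.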

@llivermore llivermore added D8.4 Work associated with final deliverable - e.g. testing, sustainability, and documentation. wontfix This will not be worked on labels Jul 6, 2022
@llivermore llivermore changed the title Determine if current openDS datatype model is suitable Document challenges found using incremental FDOs in workflows Oct 26, 2022
@llivermore llivermore added the documentation Improvements or additions to documentation label Oct 26, 2022
@llivermore
Contributor

llivermore commented Oct 26, 2022

Also described on this slide: FDO Challenges.

Also here: Livermore, Laurence; Brack, Paul; Scott, Ben; Soiland-Reyes, Stian; Woolland, Oliver (2022): The Specimen Data Refinery: Using a scientific workflow approach for information extraction. figshare. Presentation. https://doi.org/10.6084/m9.figshare.21312345.v1

Need to write up for final report.
