Secondary index files and directories in WDL #2269
Comments
Not keen on the specific implementation suggestion. But the general idea is obviously a good one. |
Not sure if you mean the unicorn or the file-finding setup; I am of course open to alternate suggestions on both. This would bring much author joy forth into the world. |
Hah, I'd prefer to provide a clean way to specify any collection of files (see CWL's secondary files concept) and then syntactic sugar in the form of specialized types, e.g. BamFile, which knows to look for an index. Having it be configurable at the Cromwell level implies a potential lack of portability for WDLs. |
Oh hmm very good point, hadn't viewed it from the portability angle. |
@davidbenjamin has an interesting proposal for user-defined / explicit sets of params for WDL: https://github.com/broadinstitute/wdl/issues/102 Depending on how "CWL support" addresses the secondaryFiles mentioned above, it's supposed that similar WDL features will follow. |
FYI this was a key item in feedback from our WDL sessions in the UK workshops; having to specify accessory files is a big source of annoyance. Not that it's any surprise, but we're definitely getting confirmation from real users. |
We should certainly heed the lesson that CWL learned to provide both the concepts of directory and secondary files. They wound up implementing the former because people were also trying to do that and shoehorning it into the latter. |
+100 on support for secondary files! BAM + Index, VCF + Index would be
super helpful!
|
Not strictly secondary files, but it's close. |
Is there any progress on secondary files? I see we have structs, which I could probably use, but I'm looking for a concept that makes it simpler to pick up index files rather than writing more globs and more mappings. We got directory support in WDL (openwdl/wdl#241) and in Cromwell (#3980). I understand that the language and the engine are different, but Cromwell already has some concept of secondary files, since its CWL implementation supports them. |
@illusional While I cannot speak on behalf of the Cromwell team on what they are implementing, I can say that there have been no discussions around secondary files for WDL. My inclination is that we will try to steer clear of it within WDL. However, I encourage you to create an issue or make a PR in the WDL repo suggesting this change, and we can allow the community to determine whether or not it should be supported. |
I would agree w/ @patmagee that this is a matter for the OpenWDL group. Any Cromwell-level constructs to get at the underlying functionality would require non-portable WDLs to be written. I'll tag @cjllanwarne in case he has any clever ideas on how to express the concept in portable WDL in a less sucky way. I disagree with @patmagee that WDL should steer clear of the concept - IMO not doing this in the first place was one of the larger mistakes we made in the early days of WDL. Perhaps something with Object. We're seeing something similar play out in GA4GH land w/ DRS ... the concept of a file bundle seems inescapable and it's not quite the same thing as Directory. |
Yeah, from an end user POV it is still a pain not to have a file bundle concept, and it is something I wish we had at the WDL level.
|
Thanks @patmagee, @geoffjentry and all for directing me to the correct place, I've created a discussion over at openwdl/wdl#289 as a place to have the conversation. If anyone finds this conversation, I'd love to see any thoughts you have on how accessory files may be specified in WDL. |
Sounds great @illusional. I'm going to close this issue from the Cromwell side. |
@vdauwera commented on Mon Apr 24 2017
Need to put in a Cromwell ticket for this. Basic ask: have Cromwell automatically look for (and co-localize) accessory files when given files with specific extensions. E.g. if I give it a foo.bam file, it should look for foo.bai.
Note that sometimes it's just a matter of swapping the extension, but sometimes it's adding another extension, and there can be multiple accessory files, e.g. reference.fasta is always accompanied by both reference.fasta.fai and reference.dict.
This would ideally be configurable by the Cromwell admin, who would set up a list of primary file extensions and their accessory file naming patterns. Bonus points if the user can provide their own config on the command line to override the server's config. And also I want a pet unicorn that farts glitter.
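To make the naming rules above concrete, here is a minimal Python sketch of how such a lookup table might resolve accessory file names. It is only an illustration, not anything Cromwell implements: the `ACCESSORY_PATTERNS` table, the function names, and the `overrides` parameter are all hypothetical. The pattern syntax is borrowed from CWL's `secondaryFiles` convention, where each leading `^` strips one extension from the primary file name before the rest of the pattern is appended.

```python
import posixpath

# Hypothetical admin-level config: primary extension -> accessory patterns.
# Pattern syntax follows CWL's secondaryFiles convention: each leading "^"
# removes one extension from the primary name, then the remainder is appended.
ACCESSORY_PATTERNS = {
    ".bam": ["^.bai"],             # swap the extension: foo.bam -> foo.bai
    ".fasta": [".fai", "^.dict"],  # append one, swap the other
}


def apply_pattern(primary, pattern):
    """Return the accessory file name for one primary file and one pattern."""
    base = primary
    while pattern.startswith("^"):
        # splitext works on plain strings, so URL-style paths are left intact
        base, _ = posixpath.splitext(base)
        pattern = pattern[1:]
    return base + pattern


def accessory_files(primary, overrides=None):
    """List expected accessory files; user overrides win over the server table."""
    patterns = {**ACCESSORY_PATTERNS, **(overrides or {})}
    _, extension = posixpath.splitext(primary)
    return [apply_pattern(primary, p) for p in patterns.get(extension, [])]


print(accessory_files("foo.bam"))          # ['foo.bai']
print(accessory_files("reference.fasta"))  # ['reference.fasta.fai', 'reference.dict']
# A user-supplied override for sites that name BAM indexes foo.bam.bai:
print(accessory_files("foo.bam", overrides={".bam": [".bai"]}))  # ['foo.bam.bai']
```

The sketch only computes names; checking that the files exist and co-localizing them next to the primary file would still be up to the engine.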
WDL folks;
This is a followup from a recent discussion about getting compatible bcbio generated WDL (http://gatkforums.broadinstitute.org/wdl/discussion/9257/object-attribute-access-and-secondary-index-files). Thanks to all the great help you've provided we now have compatible WDL output that passes validation:
https://github.com/bcbio/test_bcbio_cwl/blob/master/run_info-cwl-wdl
This is brilliant, and I'd like to move into testing runs with Cromwell. Before starting this, there is one major area I know we're missing in the conversion, handling of secondary files and directories of files. CWL has the notion of secondaryFiles (http://www.commonwl.org/v1.0/Workflow.html#File) which you can use to block these and ensure they get staged/run next to each other. I use this in bcbio and wanted to figure out the best way to map it into WDL.
There are two cases we use these for:
What is the recommended way to deal with these cases in WDL? I'll have to re-engineer bcbio to be able to represent and pass these and wanted to do so in a way that was forward compatible with WDL's thoughts and plans. I've seen recommendations on current hacks like explicitly declaring the indexes as separate files, or tarring up a directory of files and passing that as input. I'm not clear enough on staging files from WDL/Cromwell to understand if these are guaranteed to always go in the right place (bai next to bam, all indexes in the same directory).
Thanks for any thoughts/suggestions/tips.
This Issue was generated from your [forums]
[forums]: http://gatkforums.broadinstitute.org/wdl/discussion/9299/secondary-index-files-and-directories-in-wdl/p1
@vdauwera commented on Thu May 04 2017
@katevoss this is a very common request from the Cromwell user community