Preserve folder hierarchies on the platform #415
Comments
The Directory output feature from WDL 2.0 is supported:
Hi Stanley, I did try to use this but I wasn't able to get it to work. I will give it another try and let you know if I am still having issues.
Hi Rachel, you may find the test case helpful as a template:
Hi Stanley,
Thanks for sharing this. It doesn't look like there is a way to be able to
use the Directory outputs to output the files to a specific directory but
then to still be able to specify individual files from that task as inputs
to the next task - is that correct or is there a way to do this?
Many thanks,
Rachel
On Tue, 10 Jan 2023, 22:05 Stanley Lan wrote:
Hi Rachel, you may find the test case helpful as a template:
https://github.com/dnanexus/dxCompiler/tree/develop/test/wdl_2_0
We include the tests from the test folder in our integration tests, so all
tests have to pass before a dxCompiler release is possible. Please make
sure that the latest release is used to compile, since the directory output
support is relatively new.
Could you elaborate on what specifically does not work?
    mkdir -p outputs/folder_{1..4}
    for i in {1..4}; do
      echo "Hello" > outputs/folder_${i}/my_file_${i}
    done
then make your task's outputs look like this:
    output {
      Directory outdir = "outputs/"
    }
your next task will have
I think Rachel's goal is to be able to specify (or find a way to specify) one individual file from a task's directory output and use it as the input for the next task. I don't believe such a feature is supported by either WDL or dxCompiler/executor. If a task generates a directory output, that output can only be passed to the downstream task(s) as a directory input via WDL. I can't find any case in the WDL spec that allows picking and choosing among the contents of a directory output: https://github.com/openwdl/wdl/blob/main/versions/development/SPEC.md
Then simply use a Directory as the input and point to the specific file in the command section of your task.
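A minimal sketch of what that suggestion could look like in the downstream task's command section. It assumes the Directory input has been localized to a path held in a hypothetical variable `indir` (in a real WDL task this would be the placeholder for the Directory input) and that the directory has the `folder_N/my_file_N` layout from the earlier example. The layout is recreated in a temporary directory here only so the snippet can run on its own.

```shell
set -eu

# Hypothetical stand-in for the localized Directory input.
indir="$(mktemp -d)/outputs"

# Recreate the upstream layout so the sketch is self-contained.
for i in 1 2 3 4; do
  mkdir -p "${indir}/folder_${i}"
  echo "Hello" > "${indir}/folder_${i}/my_file_${i}"
done

# Point at one specific file inside the directory and use only that file.
wanted="${indir}/folder_2/my_file_2"
picked_content="$(cat "${wanted}")"
echo "${picked_content}"
```

Note that the whole directory is still localized in this approach; only the command decides which file to read.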
My goal is to be able to specify relative output locations for individual files. Using a directory as an output would not work because in my case the workflow uses both WDL tasks and native applets imported as WDL tasks (these have file inputs). The native applets require file inputs, whereas the outputs from the previous task passed as inputs to the next task would be directories. It isn't feasible to go through and change all our existing apps to take a different input, given the lengthy validation / release / quality management process we have to go through to make changes in a clinical setting. Also, depending on the task, you may not always want the outputs in the same directory.

We would also only want the next task/app to run if the inputs consisted of all the required files, i.e. the input files exist. If you were to run a workflow that took a directory input, a previous task/app could finish without outputting all the files required for the next task to run (not all required files exist within that directory). With a directory input the next task would still run and fail with an internal error, as opposed to just failing to start. This incurs a cost to the customer.

Not only that, but if you want to group all files within one directory, e.g. a 'bams' directory, and there are multiple bams within that directory (e.g. across many samples), specifying that directory as input would mean that all files within that directory would be uploaded to the worker, again incurring greater cost and a greater runtime for the app despite potentially only needing a single file from that directory. In my case I would be running many concurrent workflows - the app would have no way of knowing which bam file would be the correct one to use for the command, i.e. the bam file that was specific to just that workflow.
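For the missing-file case described above, one partial mitigation is to check for the required files at the very top of the command section, so the task fails quickly with a clear message. This is only a sketch under assumptions: `require_files` is a hypothetical helper, the `bams/sample_A.bam` paths are made up for illustration, and the check does not stop the job from starting or the directory from being transferred, so it does not address the cost concerns.

```shell
set -eu

# Hypothetical helper: succeeds only if every listed relative path
# exists as a regular file under the directory given as $1.
require_files() {
  dir="$1"; shift
  missing=0
  for rel in "$@"; do
    if [ ! -f "${dir}/${rel}" ]; then
      echo "missing required input: ${dir}/${rel}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Self-contained demo: a directory holding a BAM but no index.
demo_dir="$(mktemp -d)"
mkdir -p "${demo_dir}/bams"
: > "${demo_dir}/bams/sample_A.bam"

if require_files "${demo_dir}" bams/sample_A.bam; then
  status_present="ok"
else
  status_present="missing"
fi

if require_files "${demo_dir}" bams/sample_A.bam bams/sample_A.bam.bai 2>/dev/null; then
  status_absent="ok"
else
  status_absent="missing"
fi

echo "present check: ${status_present}, absent check: ${status_absent}"
```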
I know it is possible to output files from a task to a specific subdirectory using the '--stage-relative-output-folder' argument. However, this specifies the output directory for all files output by that task, which is not always desirable, and it doesn't allow output locations to be specified for individual files within that task (something that is possible, and that we already do, for native applets). This behaviour makes WDL workflows inflexible in comparison to native workflows, and I think users would benefit if the relative output locations specified in the task output section were reflected in the project hierarchy, relative to the specified destination folder. I also know that a reorg app is an option, but this is not ideal either, as it only moves the files once the workflow has completed. When compiling/running a workflow using Cromwell, if a relative output location is specified within the task for an output file, e.g. "output/file.txt", the file is placed within an output directory relative to the execution directory. Is it not possible to replicate similar behaviour in dxCompiler, but relative to the project?
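To make the requested behaviour concrete, here is a small sketch of the mapping being asked for: each declared output keeps its path relative to the execution directory when it is placed under the destination folder. This only illustrates the idea with local temporary directories; it is not how dxCompiler or Cromwell implement delocalisation, and `exec_dir`, `dest_dir`, and the file names are hypothetical.

```shell
set -eu

exec_dir="$(mktemp -d)"   # stand-in for the task's execution directory
dest_dir="$(mktemp -d)"   # stand-in for the project destination folder

# Outputs written by the task at relative locations.
mkdir -p "${exec_dir}/output" "${exec_dir}/qc/reports"
echo "result" > "${exec_dir}/output/file.txt"
echo "report" > "${exec_dir}/qc/reports/summary.txt"

# Place each declared output at the same relative path under the destination.
for rel in output/file.txt qc/reports/summary.txt; do
  mkdir -p "${dest_dir}/$(dirname "${rel}")"
  cp "${exec_dir}/${rel}" "${dest_dir}/${rel}"
done

copied_listing="$(cd "${dest_dir}" && find . -type f | sort)"
echo "${copied_listing}"
```

With this mapping, two outputs of the same task can land in different subfolders, which is what a single per-task output folder cannot express.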
If your downstream / native app needs an individual file from the output directory, can you declare a File output alongside the Directory output in the (upstream) task? That way you get both the output directory and the individual output File from the task, and the latter can be used as the input for the downstream task.
Hi, I suppose this would be possible. However, it could be misleading, as we have multiple apps that output files into the same folder - if a folder is declared as an output from one app, I don't think it would be clear which files within that folder came from that app and which came from a different app. I think that allowing you to specify a relative output location for individual files is the desired behaviour, as it is explicit.
My team would find the previously requested behaviour in dnanexus/dxWDL#168 very useful.
We are investigating switching over to WDL workflows and are almost there for one of our pipelines. However, one of the problems is that whilst the '--stage-relative-output-folder' argument allows specifying the relative output folder per task for all files produced by that task, it does not allow for individual files to be placed in different directories.
It would be very helpful if the platform allowed for this behaviour through specification in the outputs section of each WDL task, rather than needing to create a separate reorg app (this just adds to the code base that needs to be maintained and means that files are only moved to the correct location at the end of the workflow rather than at the point of delocalisation).
I noticed it has been several years since the last comment on closed issue dnanexus/dxWDL#168 and was hoping you would be able to provide an update as to whether the exploratory work went anywhere and whether the professional services department decided that this functionality would be useful to incorporate into the dxCompiler.
Many thanks!