Default limit on file input to read_lines is too small #2768
One person's "way too small" is another's "way too big", so whatever we have is going to be inappropriate for somebody. On the other hand, 128k does seem a little too small (at least until we have actual, real FoFN support).
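To put "128k seems a little too small" in numbers, here is a rough back-of-the-envelope estimate. The error quoted in this issue reports a list of gs:// gVCF paths that is 174730 bytes long; assuming (from its file name) that it holds about 1000 paths, each line averages roughly 175 bytes, so a 128000-byte limit admits only about 730 such paths. The helper name is hypothetical, and the estimate assumes all lines have the average length.

```python
def max_lines_under_limit(limit_bytes: int, total_bytes: int, n_lines: int) -> int:
    """Estimate how many lines of average length fit under limit_bytes.

    Assumes every line is about total_bytes / n_lines bytes long,
    newline included.
    """
    avg_line_bytes = total_bytes / n_lines
    return int(limit_bytes // avg_line_bytes)


if __name__ == "__main__":
    # 174730 bytes comes from the error message in this issue; the count of
    # 1000 lines is an assumption based on the file name "...1000samples.gvcf_list".
    print(max_lines_under_limit(128_000, 174_730, 1_000))  # 732
```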
Hmm, it doesn't seem like a good idea to load a 5GB file over the network into Cromwell's memory 😄
The limit is intentionally small. If you've got that much data, you shouldn't be using read_lines.
@ldgauthier has the actual use case. We've already upgraded the methods Cromwell. Is there a way to scatter over an iterator, so that the whole list does not have to be read into RAM?
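The question turns on the difference between materializing a whole list, which is what read_lines does and why Cromwell caps its input size, and consuming it lazily. WDL itself has no such construct, as the next comment explains. So the following is only a general Python illustration of the idea, with hypothetical function names: the eager version holds every line in memory at once, while the generator yields one path at a time, so its memory use does not grow with the size of the file.

```python
import os
import tempfile
from typing import Iterator, List


def read_all_lines(path: str) -> List[str]:
    """Eager: the whole file is loaded and kept in memory as a list (like read_lines)."""
    with open(path) as f:
        return f.read().splitlines()


def iter_lines(path: str) -> Iterator[str]:
    """Lazy: yields one line at a time, so only the current line is held in memory."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical file of filenames (FoFN) with placeholder gs:// paths.
    with tempfile.NamedTemporaryFile("w", suffix=".fofn", delete=False) as tmp:
        tmp.write("gs://example-bucket/a.g.vcf.gz\ngs://example-bucket/b.g.vcf.gz\n")
    try:
        print(len(read_all_lines(tmp.name)))        # 2
        print(sum(1 for _ in iter_lines(tmp.name)))  # 2
    finally:
        os.remove(tmp.name)
```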
Deleting some comments, since they were interspersed with untrue things. As per #1762, the intention was to have spec-mandated minimums and implementation-level maximums. The former never happened, so technically the limit is not part of the spec at all. And as I noted, the Cromwell team is no longer in charge of the WDL spec, so... That said, it's tunable: you can increase it if you want. I wouldn't recommend going all that high unless you're willing to really jam a lot of memory in there. As for your iterator comment, I come back to the point that the Cromwell team doesn't control WDL anymore, and there's no WDL construct that would allow that. There's been chatter about things which might help, but they're unlikely to arrive until after WDL 1.0.
If it can be specified by a workflow option, then everyone is happy, and we don't have to wait for 1.0. (In reply to Jeff Gentry's comment of Oct 20, 2017, above.)
The point of the setting is to protect the server from users. Putting the power directly in the hands of the users seems unwise.
I am running exactly the WDL and JSON provided on the GATK GitHub page for gatk4-germline-snps-indels locally, and I got this error: "intervals-hg38.even.handcurated.20k.intervals is larger than 128000 Bytes. Maximum read limits can be adjusted in the configuration under system.input-read-limits."
You want to raise the limit under system.input-read-limits in your configuration. For reference, you are setting this:
Thanks very much. Problem solved.
@LeeTL1220, since it's possible to configure this limit as needed, I'm hoping you've got what you needed. Feel free to reopen if I missed something.
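For a local run like the gatk4-germline-snps-indels one reported above, the limit under system.input-read-limits might be raised without editing a config file, assuming Cromwell loads its settings through Typesafe Config, which lets JVM -D system properties override configuration keys. The sketch below only assembles such a command line. The key name lines (assumed to be the read_lines limit inside system.input-read-limits), the jar and file names, and the helper itself are assumptions, not values confirmed in this thread. As the maintainers point out, very large values trade server safety for memory.

```python
from typing import List


def cromwell_run_command(
    jar: str, wdl: str, inputs_json: str, read_limit_bytes: int, key: str = "lines"
) -> List[str]:
    """Build a `java ... -jar <jar> run` command that overrides one input-read limit.

    Assumption: Cromwell reads its config via Typesafe Config, so a JVM system
    property such as -Dsystem.input-read-limits.lines=N overrides the default.
    """
    if read_limit_bytes <= 0:
        raise ValueError("read_limit_bytes must be positive")
    return [
        "java",
        f"-Dsystem.input-read-limits.{key}={read_limit_bytes}",  # must precede -jar
        "-jar", jar,
        "run", wdl,
        "--inputs", inputs_json,
    ]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run(...) to execute it.
    cmd = cromwell_run_command("cromwell.jar", "workflow.wdl", "inputs.json", 1_000_000)
    print(" ".join(cmd))
```

Returning an argument list rather than a single shell string keeps the sketch safe to hand to subprocess.run without shell quoting issues.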
Currently, this defaults to 128000 bytes. This is too small for most practical use, especially given that gs:// URLs can get quite long. Can the new limit be much higher? We'll need 5GB (no joke!) for some of our larger analyses. Or could there at least be a workflow option to override it temporarily?
Otherwise, we get an error such as:
"Workflow has invalid declarations: Could not evaluate workflow declarations:\nSingleSampleGenotyping.gvcfs_list:\n\tUse of WdlSingleFile(gs://broad-dsde-methods/gauthier/Finnish_FE_WGS.1000samples.gvcf_list) failed because the file was too big (174730 bytes when only files of up to 128000 bytes are permissible"
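The check behind this error can be pictured as a size test done before the file is read into memory. The function below is a hypothetical Python sketch of that behaviour: it compares the file size against a byte cap, refuses oversized files, and otherwise reads every line eagerly. It is not Cromwell's actual Scala implementation. The 128000-byte default mirrors the value in the error message, and the message wording is modeled on it.

```python
import os
from typing import List

DEFAULT_LINES_LIMIT = 128_000  # bytes, matching the default quoted in the error


class FileTooBigError(Exception):
    """Raised when an input file exceeds the configured byte limit."""


def read_lines_capped(path: str, limit_bytes: int = DEFAULT_LINES_LIMIT) -> List[str]:
    """Return the file's lines, refusing files larger than limit_bytes."""
    size = os.path.getsize(path)  # checked before anything is read into memory
    if size > limit_bytes:
        raise FileTooBigError(
            f"{path}: the file was too big ({size} bytes when only files of up to "
            f"{limit_bytes} bytes are permissible)"
        )
    with open(path) as f:
        return f.read().splitlines()
```

Checking the size first means an oversized list fails fast instead of being pulled over the network and into memory, which is the server-protection argument made in the thread.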