Default limit on file input to read_lines is too small #2768

Closed
LeeTL1220 opened this issue Oct 20, 2017 · 11 comments

@LeeTL1220

Currently, this defaults to 128000. This is too small for most practical use, especially given that gs URLs can get quite long. Can the new limit be much higher? We'll need 5GB (no joke!) for some of our larger analyses. Or is there at least a workflow option to temporarily override it?

Otherwise, we get an error such as:
"Workflow has invalid declarations: Could not evaluate workflow declarations:\nSingleSampleGenotyping.gvcfs_list:\n\tUse of WdlSingleFile(gs://broad-dsde-methods/gauthier/Finnish_FE_WGS.1000samples.gvcf_list) failed because the file was too big (174730 bytes when only files of up to 128000 bytes are permissible"
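The check behind this error can be pictured with a minimal Python sketch. It assumes the limit is a plain byte-size comparison made before the file is read, which matches the wording of the message but is not taken from Cromwell's code; `read_lines_limited`, `FileTooBigError`, and `DEFAULT_LINES_LIMIT` are hypothetical names used only for illustration.

```python
import os

# Default limit in bytes, as quoted in the error message above.
DEFAULT_LINES_LIMIT = 128_000


class FileTooBigError(Exception):
    """Hypothetical error raised when an input file exceeds the read limit."""


def read_lines_limited(path: str, limit_bytes: int = DEFAULT_LINES_LIMIT) -> list[str]:
    """Return the file's lines, refusing files larger than limit_bytes.

    The size is checked before any content is read, so an oversized file
    is rejected without being loaded into memory.
    """
    size = os.path.getsize(path)
    if size > limit_bytes:
        raise FileTooBigError(
            f"{path} failed because the file was too big "
            f"({size} bytes when only files of up to {limit_bytes} bytes are permissible)"
        )
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()
```

With the default limit, a 174730-byte list is rejected; raising `limit_bytes` to 500000 lets the same file through, which is the trade-off discussed below: a higher limit means more data held in the server's memory.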

@cjllanwarne
Contributor

One person's "way too small" is another's "way too big" so whatever we have is going to be inappropriate for somebody...

On the other hand, 128k does seem a little too small (at least until we have actual, real FoFN support)

@Horneth
Contributor

Horneth commented Oct 20, 2017

We'll need 5GB (no joke!)

Hmm, it doesn't seem like a good idea to load a 5GB file over the network into Cromwell's memory 😄

@geoffjentry
Contributor

The limit is intentionally small. If you’ve got that much data, you shouldn’t be using read_lines.

@LeeTL1220
Author

@ldgauthier has the actual use case. We've already upgraded the methods Cromwell. Is there a way to scatter over an iterator, so that the whole list does not have to be read into RAM?

@geoffjentry
Contributor

Deleting some comments due to being interspersed with untrue things.

@LeeTL1220

As per #1762, the intention was to have spec-mandated minimums and implementation-level maximums. The former never happened, so technically it's not part of the spec at all. And as I noted, the Cromwell team is no longer in charge of the WDL spec, so ...

That said, it's tunable. You can increase it if you want. I wouldn't recommend going all that high unless you're willing to really jam a lot of memory in there.

As for your iterator comment, I'll go back to the fact that the Cromwell team doesn't control WDL anymore, and there's no WDL construct that would allow that. There's been chatter about things that might help, but they're unlikely to arrive until after WDL 1.0.

@LeeTL1220
Author

LeeTL1220 commented Oct 21, 2017 via email

@geoffjentry
Contributor

The point of the setting is to protect the server from users. Putting the power directly in the hands of the users seems unwise.

@shuang-luo

I am running, locally, exactly the WDL and JSON offered on the GATK GitHub page for gatk4-germline-snps-indels, and I got this error: intervals-hg38.even.handcurated.20k.intervals is larger than 128000 Bytes. Maximum read limits can be adjusted in the configuration under system.input-read-limits.
I tried to change it by typing this on the command line: java -Dsystem.input-read-limits=500000 -jar /cromwell-34.jar
It didn't work.
Can anyone tell me how to fix it?

@danbills
Contributor

You want java -Dsystem.input-read-limits.lines=500000 -jar /cromwell-34.jar

For reference you are setting this:

@shuang-luo

Thanks very much. Problem solved.
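Why the first attempt failed can be illustrated with a simplified Python model of dotted-key overrides. This is a sketch under stated assumptions, not Cromwell's configuration loader: it assumes only a `lines` key under `system.input-read-limits` (the key the fix above targets) and assumes that, as when a scalar overrides an object in HOCON, a plain value replaces whatever sits at that path. `DEFAULTS` and `apply_override` are hypothetical names.

```python
import copy
from typing import Any

# Simplified model of the relevant defaults. Only the `lines` limit
# (the one applied to read_lines) is shown; the real config may hold more keys.
DEFAULTS: dict[str, Any] = {"system": {"input-read-limits": {"lines": 128000}}}


def apply_override(config: dict[str, Any], dotted_key: str, value: Any) -> dict[str, Any]:
    """Return a copy of config with the value at dotted_key replaced.

    A non-dict value replaces whatever was at that path, including a whole
    nested section, so the original defaults are never mutated.
    """
    result = copy.deepcopy(config)
    *parents, leaf = dotted_key.split(".")
    node = result
    for part in parents:
        child = node.get(part)
        if not isinstance(child, dict):
            # Create (or overwrite with) an empty section so the path exists.
            child = {}
            node[part] = child
        node = child
    node[leaf] = value
    return result
```

In this model, overriding `system.input-read-limits.lines` changes only the limit used by read_lines, while overriding `system.input-read-limits` swaps the whole section for the number 500000, leaving no `lines` entry to read.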

@ruchim
Contributor

ruchim commented Aug 30, 2018

@LeeTL1220, since it's possible to configure this limit as needed, I'm hoping you've got what you needed. Feel free to reopen if I missed something.
