
Buffer size for sort #299

Open
pappewaio opened this issue Feb 8, 2022 · 0 comments
Great, it seems we agree on most stuff here.

5. Btw, is there a way to control how much memory the sort is allowed to use? Or will it use whatever is available by default?

During cleaning, sort doesn't need much memory, so it runs with the default --buffer-size, which is calculated on the fly. For many of the sorts in the cleaning, adding --buffer-size won't have any effect, because sort reads from a pipe ("|") and only a few MB are used. So far no one has reported any problems with the default settings. It might still be worth adding parallelisation and --buffer-size as options, though. It will likely make things run much faster, provided more CPUs are available.
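For reference, a minimal sketch of what passing both settings to GNU sort on piped input could look like. The values 64M and 2 are illustrative assumptions, not settings taken from the pipeline, and the three input lines are made-up placeholders.

```shell
# Sketch only: sort a few lines arriving through a pipe with an explicit
# memory cap (--buffer-size) and thread count (--parallel).
# 64M and 2 are illustrative values, not the pipeline's settings.
sorted=$(printf 'rs3\nrs1\nrs2\n' | LC_ALL=C sort --buffer-size=64M --parallel=2)

# Print the sorted lines.
printf '%s\n' "$sorted"
```

LC_ALL=C is set here only to make the ordering byte-wise and reproducible across locales.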

I have added sort --buffer-size=20G for the sorting during the snpdb-specific preparation. I think it was needed because adding parallelisation made the default buffer calculation fail. The value should be exposed as an option in nextflow.config, and the behaviour should be explained somewhere.
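One possible way to make this configurable, sketched as a small shell wrapper. SORT_BUFFER_SIZE, SORT_THREADS and sort_with_options are hypothetical names introduced for illustration; they are not existing pipeline options, and how the values would be wired in from nextflow.config is left open. When a variable is unset, the corresponding flag is omitted, so sort falls back to its own default calculation.

```shell
# Hypothetical wrapper: SORT_BUFFER_SIZE and SORT_THREADS are assumed names,
# not options that exist in the pipeline today.
sort_with_options() {
  # Prepend --parallel only when a thread count was provided.
  if [ -n "${SORT_THREADS:-}" ]; then
    set -- "--parallel=${SORT_THREADS}" "$@"
  fi
  # Prepend --buffer-size only when a size was provided; otherwise sort
  # keeps calculating its buffer on the fly.
  if [ -n "${SORT_BUFFER_SIZE:-}" ]; then
    set -- "--buffer-size=${SORT_BUFFER_SIZE}" "$@"
  fi
  LC_ALL=C sort "$@"
}

# Usage: a small illustrative buffer (the issue itself used 20G for snpdb).
SORT_BUFFER_SIZE=20M
printf 'b\na\nc\n' | sort_with_options
```

Leaving the flags out when nothing is configured keeps the current default behaviour unchanged for the cleaning steps, where no problems have been reported.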

--buffer-size was explained well in this forum post:
https://stackoverflow.com/questions/37514283/gnu-sort-default-buffer-size

Originally posted by @pappewaio in #256 (comment)

@pappewaio pappewaio self-assigned this Feb 8, 2022