excessive memory usage when predicting variants #21

Open
aryarm opened this issue Oct 26, 2020 · 2 comments
Labels: bug, help wanted

Comments

aryarm (Owner) commented Oct 26, 2020

The problem

@Jaureguy760 discovered that the predict_RF.R script uses an excessive amount of memory (up to ~400 times the size of its input data!). We should investigate why this is happening.

It might just be a quirk of the R tools we are using (i.e. mlr and ranger), so here are some other solutions we could try if we can't get mlr and ranger to behave themselves.

Solution 1

There should be a way to make predictions on one slice of the data at a time, so that we never load the entire dataset into memory at once. Maybe we could run predictions on just 1000 rows at a time?
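
As a rough sketch of the idea only (in Python rather than R, and not the actual predict_RF.R logic), chunked prediction could look something like the following. The names `iter_chunks`, `predict_in_chunks` and `predict_fn` are hypothetical, and the tab-separated input with a header row is an assumption. The same pattern would need to be expressed with whatever block-wise reading the R script can use.

```python
import csv
from itertools import islice


def iter_chunks(rows, chunk_size=1000):
    """Yield lists of at most chunk_size rows from any iterator of rows."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def predict_in_chunks(in_handle, out_handle, predict_fn, chunk_size=1000):
    """Stream a tab-separated table through predict_fn, one chunk at a time.

    predict_fn is a hypothetical stand-in for the model: it takes a list of
    rows and returns one prediction per row. Only one chunk is held in
    memory at any point. Returns the number of rows processed.
    """
    reader = csv.reader(in_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    header = next(reader)  # assumes the first line is a header
    writer.writerow(header + ["prediction"])
    n_rows = 0
    for chunk in iter_chunks(reader, chunk_size):
        predictions = predict_fn(chunk)
        for row, prediction in zip(chunk, predictions):
            writer.writerow(row + [prediction])
        n_rows += len(chunk)
    return n_rows
```

With this shape, peak memory should depend on the chunk size and the model itself rather than on the total number of rows, though whether that holds for ranger's prediction step is exactly what would need testing.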

Solution 2

We could declare the expected memory usage of the predict rule via the resources directive, so that the cluster environment will know not to run too many of these jobs at the same time. We should add this to the predict rule:

resources:
    mem_mb=lambda wildcards, input, attempt: int(input.size_mb*500*attempt)

And add this parameter to the qsub command in the run.bash script:

-l h_vmem={resources.mem_mb}M
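
To get a feel for how much memory that lambda would request, here is a small standalone sketch of the same arithmetic. The function name `predict_mem_mb` and the 20.5 MB input size are made up for illustration; `attempt` starts at 1 in Snakemake and increases each time a failed job is retried.

```python
def predict_mem_mb(input_size_mb, attempt, factor=500):
    """Mirror of the proposed lambda: input size (MB) * factor * attempt."""
    return int(input_size_mb * factor * attempt)


# Hypothetical 20.5 MB input, over three attempts
for attempt in (1, 2, 3):
    print(attempt, predict_mem_mb(20.5, attempt))
# prints:
# 1 10250
# 2 20500
# 3 30750
```

So even a modest input would ask for roughly 10 GB on the first attempt, and the request grows linearly with each retry.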

aryarm added the bug and enhancement labels on Oct 26, 2020
aryarm removed the bug label on Jan 21, 2021
aryarm added the bug and help wanted labels and removed the enhancement label on Jul 9, 2021

aryarm (Owner, Author) commented Jul 9, 2021

OK, just updating this for posterity:
We tried solution 2, but the only effect was that none of the jobs ever got run. So the next step might be to figure out some way to do solution 1.

aryarm (Owner, Author) commented Jul 10, 2021

update: this may be related to imbs-hl/ranger#202
