@Jaureguy760 discovered that the predict_RF.R script uses an excessive amount of memory (up to ~400 times the size of its input data!). We should investigate why this is happening.
It might just be a quirk of the R tools we are using (i.e. mlr and ranger), so the following are some other solutions we could use if we can't get mlr and ranger to behave themselves.
Solution 1
There should be a way to make predictions on slices of the data at a time, so that we don't load the entire dataset into memory at once. Maybe we could do predictions on just 1000 rows at a time?
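A rough sketch of what slicing could look like, just to make the idea concrete. Everything named here is hypothetical: `predict_in_chunks` and its `predict_fn` hook are not part of predict_RF.R, and the demo uses a base R linear model on `mtcars` as a stand-in for the actual random forest. In the real script, `predict_fn` would wrap whatever mlr/ranger prediction call is made today. Note that this only limits how many rows go through prediction at once; the input data frame is still fully in memory, so reading the file in slices would be a further step.

```r
# Predict in fixed-size row chunks so that only one slice's worth of
# intermediate prediction objects is alive at any time.
# `predict_fn` is a hypothetical hook: it takes (model, chunk) and must
# return one prediction per row of `chunk`.
predict_in_chunks <- function(model, data, predict_fn, chunk_size = 1000L) {
  n <- nrow(data)
  if (n == 0L) return(NULL)
  starts <- seq(1L, n, by = chunk_size)
  results <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    rows <- starts[i]:min(starts[i] + chunk_size - 1L, n)
    results[[i]] <- predict_fn(model, data[rows, , drop = FALSE])
    gc()  # ask R to release the previous chunk's temporaries
  }
  unlist(results, use.names = FALSE)
}

# Stand-in demo with a base R linear model (not the project's random forest).
fit <- lm(mpg ~ wt + hp, data = mtcars)
chunked <- predict_in_chunks(
  fit, mtcars,
  function(m, d) predict(m, newdata = d),
  chunk_size = 10L
)
whole <- unname(predict(fit, newdata = mtcars))
```

Here `chunked` and `whole` should agree, since row-wise predictions do not depend on which other rows are in the same slice. Whether this actually brings the memory use down depends on where the ~400x blow-up comes from, which we haven't pinned down yet.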
Solution 2
We could declare the expected memory usage of the predict rule via the resources directive, so that the cluster environment will know not to run too many of these jobs at the same time. This would mean adding a resources entry to the predict rule and passing the matching parameter to the qsub command in the run.bash script.
ok, just updating this for posterity:
We tried solution 2, but it basically just caused none of the jobs to ever get run. So next up might be to figure out some way to do solution 1.