Mb disc space limit for ensemble #874
Conversation
b887b88 to cb0f306
Codecov Report
@@ Coverage Diff @@
## development #874 +/- ##
===============================================
+ Coverage 84.12% 84.65% +0.52%
===============================================
Files 127 126 -1
Lines 9435 9246 -189
===============================================
- Hits 7937 7827 -110
+ Misses 1498 1419 -79
Continue to review full report at Codecov.
```python
this_model_cost = sum([os.path.getsize(path) for path in paths])

# get the megabytes
return round(this_model_cost / math.pow(1024, 2), 2)
```
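As a self-contained sketch of the size calculation above (the function name `get_disk_cost_mb` is illustrative, not the PR's actual signature):

```python
import math
import os
import tempfile


def get_disk_cost_mb(paths):
    """Return the total on-disk size of the given files in megabytes,
    rounded to two decimal places (mirrors the snippet above)."""
    total_bytes = sum(os.path.getsize(path) for path in paths)
    return round(total_bytes / math.pow(1024, 2), 2)


# Tiny usage example with a throwaway 1 MiB file
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (1024 * 1024))
print(get_disk_cost_mb([f.name]))  # 1.0
os.remove(f.name)
```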
Can this become zero? If yes, it's ambiguous with respect to the initial value of self.read_preds[y_ens_fn], and I suggest changing the initial value to -1 or None.
None will be the default value. If we fail to read the data structure, the prediction will be ignored from the calculation.
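A minimal sketch of why None is a safer sentinel than 0 here (the dict layout and function name are assumptions for illustration, not the PR's actual data structure):

```python
def total_disc_cost(read_preds):
    """Sum per-model disc costs, skipping entries whose cost is the
    None sentinel (a failed read), while keeping genuine zero-size
    models in the sum. With 0 as the sentinel, a real zero-byte model
    and a failed read would be indistinguishable."""
    return sum(cost for cost in read_preds.values() if cost is not None)


# "b.npy" failed to read and is ignored; the 0.0-cost model is kept
print(total_disc_cost({"a.npy": 1.5, "b.npy": None, "c.npy": 0.0}))  # 1.5
```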
Comments have been implemented. The fit-jobs error can be reproduced even on the development branch, so the fit jobs 2 check has to be improved. For instance, on my computer the assertion fails with AssertionError: 73 != 50, as there can only be the top 50 models on disc, and this check performs an ls on the directory. This is something for our todo list, but not related to this feature.
* Mb disc space limit for ensemble
* track disc consumption
* Solved artifacts of rebase
* py3.5 compatible print message
* Don't be pessimistic in Gb calc
* Incomporate comments
* Handle failure cases in ensemble disk space
Allow the user to specify the maximum megabytes of disc space that models are allowed to occupy.
The idea is to re-use the existing max-models-on-disc setting, so that if a float is provided it is interpreted as the maximum megabytes of disc usage allowed. The reason behind this is to simplify the control logic and usability -- for example, it is simpler than adding a new variable.
The functionality revolves around the idea that the worst possible disc usage per model defines the overall disc usage. So we figure out the worst-case disc penalty of keeping a model, and divide the user-specified amount of megabytes by this number.
Test code for this was also added.
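The division described above can be sketched as follows (the function and parameter names are illustrative, not the PR's actual code):

```python
def models_from_mb_budget(memory_limit_mb, worst_model_cost_mb):
    """Translate a megabyte budget into a number of models to keep on
    disc, pessimistically assuming every model may cost as much as the
    worst one observed so far."""
    if worst_model_cost_mb <= 0:
        raise ValueError("worst-case model cost must be positive")
    # Floor-divide the budget by the worst case, keeping at least one model
    return max(1, int(memory_limit_mb // worst_model_cost_mb))


# e.g. a 100 MB budget with a 7.3 MB worst case keeps 13 models
print(models_from_mb_budget(100.0, 7.3))  # 13
```

The `max(1, ...)` guard reflects one reasonable design choice: even when a single model exceeds the budget, at least one model must remain so an ensemble can still be built.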