
Mb disc space limit for ensemble #874

Merged: 7 commits merged into automl:development on Jul 3, 2020

Conversation

franchuterivera (Contributor)

Allow the user to specify the maximum number of megabytes of disc space that models are allowed to occupy.

The idea is to re-use the existing max-models-on-disc argument: if a float is provided, it is interpreted as the maximum number of megabytes of disc usage allowed. This keeps the control logic simple and improves usability, since it is simpler than adding a new argument.

The functionality is built around the worst-case disc usage per model: we determine the largest disc footprint a single model can incur and divide the user-specified megabyte budget by that number to obtain how many models may be kept (see the sketch after this description).

Test code for this is also added.
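A minimal sketch of that logic, with an illustrative function name and a simplified per-model cost estimate rather than the exact implementation:

import math
import os

def models_allowed_on_disc(max_models_on_disc, model_paths):
    # If an integer (or None) is given, it is already a model count.
    if max_models_on_disc is None or isinstance(max_models_on_disc, int):
        return max_models_on_disc
    # Otherwise treat the float as a megabyte budget: divide it by the
    # worst-case (largest) per-model disc footprint observed so far.
    costs_mb = [os.path.getsize(p) / math.pow(1024, 2) for p in model_paths]
    worst_case_mb = max(costs_mb) if costs_mb else 1.0
    return max(1, int(max_models_on_disc / worst_case_mb))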

Five review threads on autosklearn/ensemble_builder.py (outdated, resolved)
codecov-commenter commented Jun 17, 2020

Codecov Report

Merging #874 into development will increase coverage by 0.52%.
The diff coverage is 92.68%.


@@               Coverage Diff               @@
##           development     #874      +/-   ##
===============================================
+ Coverage        84.12%   84.65%   +0.52%     
===============================================
  Files              127      126       -1     
  Lines             9435     9246     -189     
===============================================
- Hits              7937     7827     -110     
+ Misses            1498     1419      -79     
Impacted Files Coverage Δ
autosklearn/ensemble_builder.py 73.72% <92.68%> (+2.63%) ⬆️
autosklearn/data/abstract_data_manager.py 77.02% <0.00%> (-12.17%) ⬇️
...mponents/feature_preprocessing/nystroem_sampler.py 85.29% <0.00%> (-5.89%) ⬇️
..._preprocessing/select_percentile_classification.py 86.20% <0.00%> (-3.45%) ⬇️
autosklearn/evaluation/__init__.py 80.54% <0.00%> (-2.17%) ⬇️
...ine/components/classification/gradient_boosting.py 91.89% <0.00%> (-0.91%) ⬇️
autosklearn/smbo.py 72.72% <0.00%> (-0.70%) ⬇️
autosklearn/data/competition_data_manager.py
autosklearn/estimators.py 90.41% <0.00%> (+0.05%) ⬆️
autosklearn/metrics/__init__.py 87.28% <0.00%> (+0.10%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d313f26...41ab718.

autosklearn/ensemble_builder.py (outdated, resolved review thread):
# Total on-disc size of this model's files, in bytes
this_model_cost = sum(os.path.getsize(path) for path in paths)

# Convert bytes to megabytes, rounded to two decimals
return round(this_model_cost / math.pow(1024, 2), 2)
Contributor:

Can this become zero? If yes, it's ambiguous with respect to the initial value of self.read_preds[y_ens_fn] and I suggest changing the initial value to -1 or None.

Contributor Author:

None will be the default value. If we fail to read the data structure, the prediction will be ignored from the calculation.
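A minimal sketch of how a None default could be handled when summing disc usage (the dictionary layout and the key "disc_space_cost_mb" are assumptions for illustration, not the actual attribute names):

def total_disc_consumption_mb(read_preds):
    # Entries whose disc cost could not be read stay at None and are
    # simply skipped in the total, as described above.
    total = 0.0
    for metadata in read_preds.values():
        cost_mb = metadata.get("disc_space_cost_mb")
        if cost_mb is None:
            continue
        total += cost_mb
    return total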

Two more review threads on autosklearn/ensemble_builder.py (outdated, resolved)
franchuterivera (Contributor Author)

The review comments have been addressed.
Here I also observed the kernel PCA error, as well as a fit jobs error.

This fit jobs error can also be reproduced on the development branch, so the fit jobs 2 test has to be improved. For instance, on my machine the assertion fails with AssertionError: 73 != 50, since only the top 50 models should exist on disc and the check performs an ls on the directory.

This is something for our todo list, but not related to this feature.
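A minimal sketch of the kind of directory check described above (the directory layout and file pattern are assumptions, not the actual test code):

import glob
import os

def count_models_on_disc(models_dir, pattern="*.model"):
    # Equivalent to running ls on the directory and counting model files;
    # a limit check would assert this count stays within the configured
    # maximum (e.g. 50).
    return len(glob.glob(os.path.join(models_dir, pattern)))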

mfeurer merged commit ffead2b into automl:development on Jul 3, 2020
franchuterivera added a commit to franchuterivera/auto-sklearn that referenced this pull request Aug 21, 2020
* Mb disc space limit for ensemble

* track disc consumption

* Solved artifacts of rebase

* py3.5 compatible print message

* Don't be pessimistic in Gb calc

* Incomporate comments

* Handle failure cases in ensemble disk space