Maximum resources #3142

yinweisu · 2023-04-14T17:56:08Z

Issue #, if available:
Training of torch models is slowed down when virtual cores are used.

Description of changes:

Add a maximum resources check when running in distributed mode.
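For readers skimming the thread, the idea of a maximum resources check can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `cap_resources` is a hypothetical helper, and the 96 virtual / 48 physical core figures are an assumed description of an m5.24xlarge.

```python
from typing import Dict, Union

Number = Union[int, float]


def cap_resources(requested: Dict[str, Number], maximum: Dict[str, Number]) -> Dict[str, Number]:
    """Clamp each requested resource to its maximum, if one is given.

    Hypothetical helper for illustration only; not part of AutoGluon.
    Resources that have no entry in `maximum` are passed through unchanged.
    """
    capped: Dict[str, Number] = {}
    for name, amount in requested.items():
        limit = maximum.get(name)
        capped[name] = amount if limit is None else min(amount, limit)
    return capped


# Assumption: the machine reports 96 virtual cores backed by 48 physical cores,
# and we want torch models to use no more than the physical core count.
requested = {"num_cpus": 96, "num_gpus": 0}
maximum = {"num_cpus": 48}
print(cap_resources(requested, maximum))  # {'num_cpus': 48, 'num_gpus': 0}
```

Keeping the cap in one small function means the same limit can be applied regardless of how the request was produced, which is the property the distributed case seems to need.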

Example run output on a newly launched cluster of 8 m5.24xlarge machines:
The training time matches a local run and appears to be normal now.

Fitting 1 L1 models ...
Fitting model: NeuralNetFastAI_BAG_L1 ...
        Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelDistributedFoldFittingStrategy
        0.9224   = Validation score   (accuracy)
        286.06s  = Training   runtime
        3.44s    = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
        0.9224   = Validation score   (accuracy)
        0.08s    = Training   runtime
        0.04s    = Validation runtime
AutoGluon training complete, total runtime = 298.45s ... Best model: "WeightedEnsemble_L2"

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2023-04-14T20:26:34Z

Job PR-3142-7a800f8 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3142/7a800f8/index.html

tabular/src/autogluon/tabular/models/knn/knn_model.py

core/src/autogluon/core/models/abstract/abstract_model.py

tabular/src/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py

tabular/src/autogluon/tabular/models/fastainn/tabular_nn_fastai.py


Innixma · 2023-04-14T22:13:43Z

core/src/autogluon/core/models/abstract/abstract_model.py

+        total_resources: Optional[Dict[str, Union[int, float]]] = None,
+        parallel_hpo: bool = False,
+        **kwargs
+        ):


add docstring explaining this, add return type

Innixma · 2023-04-14T22:13:50Z

core/src/autogluon/core/models/abstract/abstract_model.py

        return kwargs

+    def _preprocess_fit_resources(self, silent=False, total_resources=None, parallel_hpo=False, **kwargs):


add return type

Innixma

LGTM! Added a few minor comments

Innixma · 2023-04-14T22:17:29Z

core/src/autogluon/core/models/abstract/abstract_model.py

        return kwargs

+    def _preprocess_fit_resources(self, silent=False, total_resources=None, parallel_hpo=False, **kwargs):


add type hints
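One possible shape for the requested docstring, type hints, and return type is sketched below. The `total_resources` annotation is taken from the diff above; everything else is an assumption: `ModelSketch` is a stand-in class, the parameter descriptions are guesses at the intended meaning, the `Dict[str, Any]` return type is assumed, and the body is a placeholder rather than the PR's logic.

```python
from typing import Any, Dict, Optional, Union


class ModelSketch:
    """Stand-in class for illustration only; not AutoGluon's AbstractModel."""

    def _preprocess_fit_resources(
        self,
        silent: bool = False,
        total_resources: Optional[Dict[str, Union[int, float]]] = None,
        parallel_hpo: bool = False,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        """Resolve the resources used for fitting (placeholder body).

        Parameters
        ----------
        silent
            Assumed meaning: suppress resource-related log messages.
        total_resources
            Assumed meaning: upper bound per resource, e.g. {"num_cpus": 48}.
        parallel_hpo
            Assumed meaning: whether the call is part of a parallel HPO run.
        **kwargs
            Remaining fit arguments, possibly including `num_cpus` / `num_gpus`.

        Returns
        -------
        Dict[str, Any]
            The fit arguments with resource values clamped to `total_resources`.
        """
        if total_resources is not None:
            for key in ("num_cpus", "num_gpus"):
                if key in kwargs and key in total_resources:
                    # Placeholder logic: never exceed the stated maximum.
                    kwargs[key] = min(kwargs[key], total_resources[key])
        return kwargs
```

Annotating `**kwargs` as `Any` describes the type of each individual keyword value, so the returned dictionary is typed as `Dict[str, Any]`.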

github-actions · 2023-04-14T22:52:06Z

Job PR-3142-bd6372d is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3142/bd6372d/index.html

yinweisu · 2023-04-14T22:52:41Z

Merging since unit tests for the previous commits have passed; the most recent commit only added comments.

github-actions · 2023-04-14T23:22:24Z

Job PR-3142-f76a039 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3142/f76a039/index.html

github-actions · 2023-04-15T00:00:18Z

Job PR-3142-178cf94 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3142/178cf94/index.html

github-actions · 2023-04-15T00:57:18Z

Job PR-3142-d22642f is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3142/d22642f/index.html

maximum resources

7a800f8

yinweisu requested a review from Innixma April 14, 2023 17:56

update

bd6372d