[Tabular] Resource Allocation Fix #2536

Merged on Dec 12, 2022 (14 commits)

yinweisu
Collaborator

@yinweisu yinweisu commented Dec 8, 2022

Issue #, if available:
#2446

Description of changes:

  • Fixed the case where the user did not specify total resources but did specify lower-level resources. Previously, the model-based default value was passed to the lower level as the total resources, which is incorrect. Now we check whether the user passed lower-level requirements and use those.
  • Added a special case for when the user provides both num_resource and ag_args_fit while doing neither bagging nor HPO. We might think about a better way to handle it in the future...
  • Updated unit tests and added two more tests covering the case where no resource requirement is specified, in which we should use the default value based on the model.
  • Added end-to-end tests...
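
As a rough illustration of the fallback order described in the list above, the sketch below resolves a CPU budget for a single non-bagged fit. It is not the code from this PR: the function name resolve_num_cpus and all of its parameters are hypothetical, and taking the smaller value when both a total and a model-level value are given is an assumption, since the description does not say how that special case is resolved.

```python
from typing import Optional


def resolve_num_cpus(
    total_num_cpus: Optional[int],
    model_level_num_cpus: Optional[int],
    model_default_num_cpus: int,
    system_num_cpus: int,
) -> int:
    """Hypothetical helper: choose the CPU budget for one non-bagged fit."""
    if total_num_cpus is not None and model_level_num_cpus is not None:
        # ASSUMPTION: when both are given, use the smaller of the two.
        budget = min(total_num_cpus, model_level_num_cpus)
    elif total_num_cpus is not None:
        budget = total_num_cpus
    elif model_level_num_cpus is not None:
        # The case fixed by the PR: prefer the user's lower-level value
        # over the model-based default.
        budget = model_level_num_cpus
    else:
        # Nothing specified by the user: fall back to the model default.
        budget = model_default_num_cpus
    # Never request more than the machine has.
    return min(budget, system_num_cpus)
```

For example, resolve_num_cpus(None, 2, 8, 16) returns 2 rather than the model default of 8, which corresponds to the behavior the first bullet describes.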

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

core/src/autogluon/core/models/abstract/abstract_model.py (outdated)
user_specified_lower_level_num_cpus = min(user_specified_model_level_num_cpus * k_fold, system_num_cpus)
if user_specified_lower_level_num_gpus is not None:
    if user_specified_model_level_num_gpus is not None:
        user_specified_lower_level_num_gpus = min(user_specified_lower_level_num_gpus * k_fold, user_specified_model_level_num_gpus)
Contributor

Is this correct?

Why are we changing the user specified num_gpus for ag_args_fit_ensemble?
Why are we multiplying ag_args_fit_ensemble by k_fold??
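
To make the question concrete, the GPU branch quoted above can be traced with made-up values. The wrapper function scale_lower_level_gpus and the numbers are hypothetical; only the nested condition and the min(...) expression are copied from the quoted lines.

```python
def scale_lower_level_gpus(lower_level_num_gpus, model_level_num_gpus, k_fold):
    """Hypothetical wrapper around the quoted GPU branch, for tracing only."""
    if lower_level_num_gpus is not None:
        if model_level_num_gpus is not None:
            # Same expression as in the quoted diff lines.
            lower_level_num_gpus = min(lower_level_num_gpus * k_fold, model_level_num_gpus)
    return lower_level_num_gpus


# With 8 folds, a user-specified lower-level value of 1 is overwritten by 4.
traced = scale_lower_level_gpus(lower_level_num_gpus=1, model_level_num_gpus=4, k_fold=8)
```

Here traced is 4, so the value the user supplied (1) does not survive the branch whenever both values are set.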

Contributor

Please don't mark as resolved without responding with an answer. This is a major bug if it is indeed a bug.

Another question would be: Why didn't the unit tests catch this bug?

@@ -190,7 +210,7 @@ def test_nonbagged_model_with_total_resources_and_model_resources(mock_system_re
         hyperparameters={
             'ag_args_fit': {
                 'num_cpus': 1,
-                'num_gpus': 1
+                'num_gpus': 0
Contributor

Why is this hard-coded? Wouldn't it be way easier to just have this be in the arguments of the test?

model_base = DummyModel()
bagged_model = DummyBaggedModel(
    model_base,
    hyperparameters={}
Contributor

Why is this hard-coded? Wouldn't it be way easier to just have this be in the arguments of the test?

This way we can specify ag_args_fit_ensemble for example.

Example for ag_args_fit_ensemble:

hyperparameters={
    'ag_args_fit': {'num_cpus': 4, 'num_gpus': 1}
}
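
One possible shape for that suggestion is a builder that takes hyperparameters as an argument, so each test case passes its own dict. This is only a sketch: StubBaggedModel and make_bagged_model are hypothetical stand-ins, not the DummyModel and DummyBaggedModel classes from the test file, and the stub does no resource allocation at all.

```python
class StubBaggedModel:
    """Hypothetical stand-in for the DummyBaggedModel used in the tests."""

    def __init__(self, model_base, hyperparameters=None):
        self.model_base = model_base
        # Copy so that test cases cannot mutate each other's input.
        self.hyperparameters = dict(hyperparameters or {})


def make_bagged_model(hyperparameters=None):
    """Build the model under test from caller-supplied hyperparameters."""
    model_base = object()  # placeholder for DummyModel()
    return StubBaggedModel(model_base, hyperparameters=hyperparameters)


# Each case supplies its own hyperparameters instead of hard-coding them.
cases = [
    None,  # no user-specified resources
    {'ag_args_fit': {'num_cpus': 4, 'num_gpus': 1}},
]
models = [make_bagged_model(hp) for hp in cases]
```

With a test framework, the same list of cases could be fed in through the framework's parametrization mechanism instead of a plain list.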

Collaborator Author

Here we are intentionally testing the case where no user-specified resources are provided, so hyperparameters is an empty dict. I should just remove the hyperparameters argument.

@github-actions

github-actions bot commented Dec 9, 2022

Job PR-2536-0898da1 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2536/0898da1/index.html

@github-actions

github-actions bot commented Dec 9, 2022

Job PR-2536-adc4354 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2536/adc4354/index.html

@github-actions

github-actions bot commented Dec 9, 2022

Job PR-2536-fd6a989 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2536/fd6a989/index.html

@yinweisu yinweisu merged commit 8116f27 into autogluon:master Dec 12, 2022
@yinweisu yinweisu deleted the fix_resource_allocation branch December 12, 2022 17:40
3 participants