[AutoMM] Support HPO presets #2839

Merged 13 commits into autogluon:master on Feb 7, 2023

Conversation

@zhiqiangdon (Contributor) commented on Feb 4, 2023

Issue #, if available:

Description of changes:

  1. Support HPO presets, e.g., high_quality_hpo, medium_quality_hpo, best_quality_hpo, or hpo.
  2. Unit tests.
  3. Support combining preset hyperparameters and user-provided hyperparameters.
  4. Support combining preset hyperparameter_tune_kwargs and user-provided hyperparameter_tune_kwargs. (A sketch of items 3 and 4 follows the usage example below.)

from autogluon.multimodal import MultiModalPredictor

# train_data: a pandas DataFrame with the feature columns and a label column.
predictor = MultiModalPredictor(problem_type="classification", presets="high_quality_hpo")
predictor.fit(train_data=train_data)
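
As an illustration of items 3 and 4 above (this is not code from the PR's diff), a minimal sketch of layering user-provided values on top of an HPO preset; the keys "optimization.learning_rate" and "num_trials" come from snippets discussed later in this review, while the chosen values are assumptions:

predictor = MultiModalPredictor(problem_type="classification", presets="medium_quality_hpo")
predictor.fit(
    train_data=train_data,  # same pandas DataFrame as above
    # Overrides part of the preset search space: fix the learning rate instead of tuning it.
    hyperparameters={"optimization.learning_rate": 1e-4},
    # Overrides part of the preset HPO settings: run fewer trials than the preset default.
    hyperparameter_tune_kwargs={"num_trials": 16},
)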

Text backbone candidates with their parameter counts (a sketch of how these lists could enter the search space follows them):

medium quality

  • "google/electra-small-discriminator" (13.5M)
  • "google/flan-t5-small" (35.3M)
  • "microsoft/deberta-v3-xsmall" (22M)
  • "microsoft/MiniLM-L12-H384-uncased" (33.4M)
  • "albert-base-v2" (11.7M)

high quality

  • "google/electra-base-discriminator" (108.9M)
  • "google/flan-t5-base" (109.6M)
  • "microsoft/deberta-v3-small" (141M)
  • "roberta-base" (124.6M)
  • "albert-xlarge-v2" (58.8M)

best quality

  • "microsoft/deberta-v3-base" (183.8M)
  • "google/flan-t5-large" (341.2M)
  • "google/electra-large-discriminator" (334.1M)
  • "roberta-large" (355.4M)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@zhiqiangdon changed the title from [AutoMM] Add HPO presets to [AutoMM] Support HPO presets on Feb 4, 2023
@zhiqiangdon added the "model list checked" label (You have updated the model list after modifying multimodal unit tests/docs) on Feb 4, 2023
github-actions bot commented on Feb 4, 2023

Job PR-2839-7c0fb39 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2839/7c0fb39/index.html

github-actions bot commented on Feb 5, 2023

Job PR-2839-c9586fe is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2839/c9586fe/index.html

github-actions bot commented on Feb 6, 2023

Job PR-2839-116d34d is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2839/116d34d/index.html

@bryanyzhu (Contributor) left a comment:

LGTM with a small comment.

"model.hf_text.checkpoint_name": "google/electra-small-discriminator",
"model.timm_image.checkpoint_name": "mobilenetv3_large_100",
"model.document_transformer.checkpoint_name": "microsoft/layoutlmv2-base-uncased",
"optimization.learning_rate": 4e-4,

Contributor: Do we want to specify the learning rate here? It seems the other default settings only include checkpoint names.

zhiqiangdon (Author): This learning rate is used for small backbones; it has been here since the last release.

default_tunable_hyperparameters = {
    "optimization.learning_rate": tune.loguniform(1e-5, 1e-2),
    "optimization.optim_type": tune.choice(["adamw", "sgd"]),
    "optimization.max_epochs": tune.choice(list(range(5, 31))),

Contributor: @FANGAreNotGnu I'm not sure if this default setting works for detection, since detection usually requires more training epochs. We might need to override these values in detection_hpo presets.

zhiqiangdon (Author): This just provides the defaults. Each problem type can further customize them.
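
To make "further customize" concrete, a hypothetical sketch (not code from this PR) of how a problem-type-specific preset could override the shared defaults; tune refers to Ray Tune's search space API as used in the snippet above, and the detection-specific epoch range is an assumed value for illustration:

from ray import tune

# Shared defaults, as in the snippet above.
default_tunable_hyperparameters = {
    "optimization.learning_rate": tune.loguniform(1e-5, 1e-2),
    "optimization.optim_type": tune.choice(["adamw", "sgd"]),
    "optimization.max_epochs": tune.choice(list(range(5, 31))),
}

# Hypothetical detection-specific preset: reuse the defaults but search over more epochs.
detection_tunable_hyperparameters = {
    **default_tunable_hyperparameters,
    "optimization.max_epochs": tune.choice(list(range(20, 101))),  # assumed range
}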

default_hyperparameter_tune_kwargs = {
    "searcher": "bayes",
    "scheduler": "ASHA",
    "num_trials": 512,

Contributor: Will this number be too large?

zhiqiangdon (Author): I'm not sure. It kind of depends on our search space. How many trials do you think are reasonable?
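
To put 512 trials in perspective, a rough back-of-the-envelope count of the discrete choices visible in the snippets in this review (the learning rate is continuous, so this understates the true search-space size, and treating the backbone list as a tuned choice is an assumption):

# Discrete choices from the snippets above; the learning rate is continuous and excluded.
num_backbones = 5                    # e.g., the medium-quality text backbone candidates
num_optim_types = 2                  # "adamw", "sgd"
num_max_epochs = len(range(5, 31))   # 26 choices
num_batch_sizes = 4                  # 32, 64, 128, 256

discrete_combinations = num_backbones * num_optim_types * num_max_epochs * num_batch_sizes
print(discrete_combinations)  # 1040, so 512 trials cover at most about half of these combinations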

"optimization.learning_rate": tune.loguniform(1e-5, 1e-2),
"optimization.optim_type": tune.choice(["adamw", "sgd"]),
"optimization.max_epochs": tune.choice(list(range(5, 31))),
"env.batch_size": tune.choice([32, 64, 128, 256]),

Contributor: Also try smaller batch sizes?

Collaborator: We can consider not tuning the batch size for now (and have a separate batch-size tuning logic), and focus on selecting the learning rate.

zhiqiangdon (Author): Batch size can also affect the performance. @sxjscience Do you mean tuning per_gpu_batch_size (which should not affect the performance) with Lightning's tuner?
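
For context on the distinction drawn above, an illustrative sketch (not AutoMM internals) of why the per-GPU batch size can be tuned for memory without changing the effective batch size; the concrete numbers are assumptions:

# The effective batch size (cf. env.batch_size above) is what typically affects model quality;
# the per-GPU batch size mainly trades memory for gradient-accumulation steps.
effective_batch_size = 128   # one of the env.batch_size values searched above
per_gpu_batch_size = 16      # assumed value that fits in GPU memory
num_gpus = 2                 # assumed value

# Gradient accumulation keeps the effective batch size fixed regardless of the per-GPU size.
accumulate_grad_batches = effective_batch_size // (per_gpu_batch_size * num_gpus)
assert per_gpu_batch_size * num_gpus * accumulate_grad_batches == effective_batch_size
print(accumulate_grad_batches)  # 4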


def parse_presets_str(presets: str):
    use_hpo = False
    if presets.endswith("_hpo"):

Contributor: Will this be case sensitive?

zhiqiangdon (Author): presets is already converted to lower case in the predictor's init.
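
For readers following along, a minimal sketch of how the truncated parse_presets_str snippet above could continue; this is an assumption based on the visible lines, not necessarily the exact code merged in this PR:

def parse_presets_str(presets: str):
    # Split an HPO preset such as "high_quality_hpo" into a base preset and an HPO flag.
    use_hpo = False
    if presets.endswith("_hpo"):
        use_hpo = True
        presets = presets[: -len("_hpo")]  # assumed: strip the suffix to recover, e.g., "high_quality"
    return presets, use_hpo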

@@ -604,6 +718,10 @@ def get_automm_presets(problem_type: str, presets: str):
    hyperparameter_tune_kwargs
        Hyperparameter tuning strategy and kwargs (for example, how many HPO trials to run).
    """
    if not presets:
        presets = DEFAULT
    if presets == "hpo":

Contributor: This is case sensitive; consider moving the line presets = presets.lower() before it.
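
A minimal sketch of the suggested reordering (illustrative; the surrounding code is abbreviated from the diff excerpt above, and DEFAULT is defined elsewhere in the module):

def get_automm_presets(problem_type: str, presets: str):
    ...
    if not presets:
        presets = DEFAULT
    presets = presets.lower()  # normalize case before comparing against "hpo"
    if presets == "hpo":
        ...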

@sxjscience (Collaborator) commented:

We can pick a selection criterion for each catalog; for example, we can measure the model size / training time (ideally we should report a curve of training throughput vs. performance).

For now, we can consider adopting the following rules:

  • medium quality (<50M backbones)
  • high quality (>=50M, <200M backbones)
  • best quality (>=200M backbones)

We can thus also consider backbones in https://www.sbert.net/docs/pretrained_models.html, and use other common backbones like

@sxjscience (Collaborator) left a comment:

LGTM overall. I'm not sure whether we should also search over 'batch_size', since we usually need to use as large a batch size as possible. And the HPO is mainly centered on model selection / searching for the best tuning method.

github-actions bot commented on Feb 7, 2023

Job PR-2839-4aded4d is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2839/4aded4d/index.html

@zhiqiangdon merged commit 5acf693 into autogluon:master on Feb 7, 2023
@zhiqiangdon deleted the mm-presets branch on February 8, 2023