[AutoMM] Support HPO presets #2839
Conversation
LGTM with a small comment.
"model.hf_text.checkpoint_name": "google/electra-small-discriminator", | ||
"model.timm_image.checkpoint_name": "mobilenetv3_large_100", | ||
"model.document_transformer.checkpoint_name": "microsoft/layoutlmv2-base-uncased", | ||
"optimization.learning_rate": 4e-4, |
Do we want to specify the learning rate here? It seems the other default settings only include checkpoint names.
This learning rate is used for small backbones; it has been here since the last release.
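(For context, a hedged sketch of how these preset keys map to fit-time overrides; the dataset and label name below are hypothetical, not from this PR:)

```python
from autogluon.multimodal import MultiModalPredictor

# Hypothetical data/label; illustrates that the same dotted keys used in
# the presets can also be passed directly as fit-time hyperparameters.
predictor = MultiModalPredictor(label="label")
predictor.fit(
    train_data=train_df,  # assumed pandas DataFrame
    hyperparameters={
        "model.hf_text.checkpoint_name": "google/electra-small-discriminator",
        "optimization.learning_rate": 4e-4,
    },
)
```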
default_tunable_hyperparameters = {
    "optimization.learning_rate": tune.loguniform(1e-5, 1e-2),
    "optimization.optim_type": tune.choice(["adamw", "sgd"]),
    "optimization.max_epochs": tune.choice(list(range(5, 31))),
@FANGAreNotGnu I'm not sure if this default setting works for detection, since detection usually requires more training epochs. We might need to override these values in detection_hpo presets.
This just provides the defaults; each problem type can further customize them.
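(A hedged sketch of the kind of per-problem override discussed here; detection_tunable_hyperparameters and the epoch range are assumed names/values for illustration, not from this PR:)

```python
from ray import tune

# Start from the shared defaults and widen the epoch range for detection,
# which usually needs longer training schedules.
detection_tunable_hyperparameters = dict(default_tunable_hyperparameters)
detection_tunable_hyperparameters["optimization.max_epochs"] = tune.choice(
    list(range(30, 101))  # assumed range for illustration
)
```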
default_hyperparameter_tune_kwargs = {
    "searcher": "bayes",
    "scheduler": "ASHA",
    "num_trials": 512,
Will this number be too large?
I'm not sure. It kind of depends on our search space. How many trials do you think are reasonable?
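(Worth noting that the preset value is only a default; a hedged sketch of capping the budget at fit time, assuming fit-time kwargs take precedence over the preset:)

```python
predictor.fit(
    train_data=train_df,  # assumed DataFrame
    presets="medium_quality_hpo",
    hyperparameter_tune_kwargs={"num_trials": 20},  # override the 512 default
)
```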
"optimization.learning_rate": tune.loguniform(1e-5, 1e-2), | ||
"optimization.optim_type": tune.choice(["adamw", "sgd"]), | ||
"optimization.max_epochs": tune.choice(list(range(5, 31))), | ||
"env.batch_size": tune.choice([32, 64, 128, 256]), |
Also try smaller batch sizes?
We can consider not tuning batch size for now (and having a separate batch-size tuning logic) and focus on selecting the learning rate.
Batch size can also affect the performance. @sxjscience Do you mean tuning per_gpu_batch_size (which should not affect the performance) with Lightning's tuner?
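(For reference, a minimal sketch of the Lightning batch-size tuner mentioned above, as of pytorch_lightning 1.x; `model` is assumed to be a LightningModule exposing a batch_size attribute:)

```python
import pytorch_lightning as pl

# Doubles batch_size until OOM and keeps the largest size that fits;
# this finds a feasible per-GPU batch size rather than searching it as
# a performance hyperparameter.
trainer = pl.Trainer(auto_scale_batch_size="power")
trainer.tune(model)  # assumed LightningModule with a `batch_size` attribute
```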
def parse_presets_str(presets: str):
    use_hpo = False
    if presets.endswith("_hpo"):
Will this be case sensitive?
presets is already converted to lower case in predictor init.
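(A plausible completion of the snippet, for readers following along; the actual return shape in the PR may differ:)

```python
def parse_presets_str(presets: str):
    # `presets` is lower-cased by the predictor before reaching here,
    # so endswith() is effectively case-insensitive.
    use_hpo = False
    if presets.endswith("_hpo"):
        use_hpo = True
        presets = presets[: -len("_hpo")]  # "best_quality_hpo" -> "best_quality"
    return presets, use_hpo
```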
@@ -604,6 +718,10 @@ def get_automm_presets(problem_type: str, presets: str):
    hyperparameter_tune_kwargs
        Hyperparameter tuning strategy and kwargs (for example, how many HPO trials to run).
    """
    if not presets:
        presets = DEFAULT
    if presets == "hpo":
This is case sensitive; consider moving the line (presets = presets.lower()) before it.
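(A sketch of the suggested reordering; the `...` stands for the existing hpo handling not shown in the diff:)

```python
if not presets:
    presets = DEFAULT
presets = presets.lower()  # normalize before any string comparison
if presets == "hpo":
    ...  # existing handling continues unchanged
```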
We can pick a selection criterion for each catalog; for example, we can measure the model size / training time (ideally we should report a curve of training throughput vs. performance). For now, we can consider adopting the following rules:
We can thus also consider backbones in https://www.sbert.net/docs/pretrained_models.html, and use other common backbones like
LGTM overall. I'm not sure whether we should also search over 'batch_size', since we usually want to use as large a batch size as possible. The HPO is mainly centered on model selection / searching for the best tuning method.
Issue #, if available:
Description of changes:
Support HPO presets: high_quality_hpo, medium_quality_hpo, best_quality_hpo, or hpo.
Text backbone candidates with the number of parameters:
medium quality
high quality
best quality
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.