[multimodal] Enable backbone freezing to boost finetuning speed and save GPU usage #3220
Conversation
Thanks @FANGAreNotGnu, can you run some quick tests to make sure the memory footprint has been reduced after this change?
Yes I did. The YOLOX-L GPU usage with batch_size=8 is reduced from >16GB to <10GB (trainable parameters from 54M to 27M).
@@ -390,6 +391,18 @@ def get_layer_ids(

        return name_to_id

    def get_backbone_layer_names(self):
        backbone_layer_names = []
        backbone_layers_patterns = [
does this cover all backbone layer names used by mmdet?
It works for common models in mmdet2. Just found out that it may not work for DINO (in mmdet3); will fix this in the next PR that enables DINO support.
"encoder", | ||
] | ||
for n, _ in self.named_parameters(): | ||
for pattern in backbone_layers_patterns: |
What if the layer name has both "backbone" and "encoder" in it? The layer name will be added twice in current impl? Is that intended?
Thanks for pointing this out. Made the edits to use any().
Ideally, the "model.backbone.xxx" parameter name is used by non-transformer models and "model.encoder.xxx" by transformer-based models, so there should be no conflicts. It also won't cause a problem even if there is an overlap.
@@ -628,6 +628,73 @@ def get_trainable_params_efficient_finetune(

    return trainable_param_names


def apply_freeze_backbone_lr(
Is freezing the backbone a special case of two-stage lr? The difference is whether the requires_grad of the backbone parameters is set to False.
The frozen layers in the freeze-backbone setting should be a subset of the low-learning-rate layers in the two-stage setting. See the PR description for details.
I see. But in your example, the neck_lr and head_lr are still the same, so the corresponding layers, including the neck, can be treated as "head" layers.
I guess most users don't need to set different learning rates for neck and head. So, from the learning-rate point of view, it's still two-stage.
I think we may need to improve apply_two_stages_lr or apply_layerwise_lr_decay to support requires_grad=False for the parameters whose learning rate is 0. For object detection models, the head_layer_names attribute can have both neck and head layers.
Our current two-stage lr implementation has two issues:
- It only supports head layers vs. other layers.
- It uses the base lr on other layers, and lr * lr_mult on head layers.

To support backbone freezing in the current two-stage implementation, we would need to:
- Add a hyperparameter providing the option to freeze layers with lr=0.
- Add a hyperparameter to support using head layers vs. other layers, or backbone layers vs. other layers.
- Use the base lr on head/non-backbone layers, and lr / lr_mult on non-head/backbone layers.

The third change involves an implicit change in API and may cause user confusion.
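For reference, a rough sketch of the two-stage grouping described above; this is illustrative only, not AutoGluon's actual apply_two_stages_lr, and the freeze_non_head flag stands in for the proposed lr=0 + requires_grad=False behavior.

```python
def two_stage_param_groups(model, head_layer_names, lr, lr_mult, freeze_non_head=False):
    """Sketch: head layers get lr * lr_mult, other layers get the base lr.
    freeze_non_head illustrates the proposed freezing of non-head parameters."""
    head_params, other_params = [], []
    for name, param in model.named_parameters():
        if any(name.startswith(h) for h in head_layer_names):
            head_params.append(param)
        else:
            if freeze_non_head:
                # Skipping gradients entirely is what actually saves memory.
                param.requires_grad = False
            other_params.append(param)
    return [
        {"params": other_params, "lr": 0.0 if freeze_non_head else lr},  # base lr or frozen
        {"params": head_params, "lr": lr * lr_mult},                     # boosted head lr
    ]

# Usage idea: torch.optim.AdamW(two_stage_param_groups(model, ["head."], 1e-4, 10))
```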
Two-stage lr on head layers vs. other layers may still be useful since it converges faster.
Right. apply_two_stages_lr can't directly support lr=0 for backbone parameters due to lr * lr_mult. I think there are two choices:
- Change the design of lr_mult in two-stage lr: let lr be the head lr, and have the backbone lr use lr * lr_mult but with 0 <= lr_mult < 1.
- Use apply_layerwise_lr_decay instead of apply_two_stages_lr. apply_layerwise_lr_decay can simulate the case where the head uses lr and the backbone uses a 0 learning rate. In general, layerwise learning rate decay should perform better than two-stage if we can have a good lr_decay.
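As a toy illustration of the second option (not the real apply_layerwise_lr_decay, whose grouping is more involved), a small lr_decay drives the backbone learning rates toward 0 while requires_grad stays True, so no memory is saved:

```python
def layerwise_lrs(num_layers, lr, lr_decay):
    """Toy illustration: layer 0 is the head, higher ids are closer to the input.
    Each layer's lr is lr * lr_decay ** layer_id, so a small lr_decay pushes the
    backbone lrs toward 0 (parameters still require grad, hence no memory saving)."""
    return [lr * (lr_decay ** layer_id) for layer_id in range(num_layers)]

print(layerwise_lrs(4, 1e-4, 0.1))  # roughly [1e-04, 1e-05, 1e-06, 1e-07]
print(layerwise_lrs(4, 1e-4, 0.0))  # head keeps 1e-4, every other layer gets 0.0
```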
I think the definition of head layers can be model-specific. For object detection models, we can count the neck layers as head since we want to finetune them together with the real head layers.
Looks like this PR only addresses the second point, i.e., setting requires_grad=False for backbone parameters.
Changed the design based on some offline discussions.
@@ -149,6 +149,7 @@ model:
      - "image"
    max_img_num_per_col: 1
    output_bbox_format: "xyxy" # now support xyxy or xywh, for bbox format details see https://keras.io/api/keras_cv/bounding_box/formats/
    frozen_layers: null
What if we remove null? What's the difference between nothing and null?
I think there's no difference. We are using both in the configs.
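A quick way to confirm this (AutoGluon's multimodal configs are YAML loaded via OmegaConf): in YAML an empty value and an explicit null both parse to None, so the two spellings behave the same.

```python
from omegaconf import OmegaConf

# An empty value and an explicit `null` both parse to None.
cfg_null = OmegaConf.create("frozen_layers: null")
cfg_empty = OmegaConf.create("frozen_layers:")
print(cfg_null.frozen_layers, cfg_empty.frozen_layers)  # None None
```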
LGTM.
The current two-stage/layerwise-decay learning rate settings support using different lr in head and non-head layers. They also allow users to set the lr of non-head layers to 0. However:
- backbone_lr=1e-5, neck_lr=1e-5, head_lr=1e-3 and backbone_lr=0, neck_lr=1e-4, head_lr=1e-4 may both converge fast and nicely, but backbone_lr=0, neck_lr=0, head_lr=1e-4 may not be a good one.
- Setting require_grad=False for backbone parameters can save lots of GPU memory and enables us to run a larger batch_size or a larger model.

So here we introduce a new hyperparameter model.mmdet.frozen_layers that disables gradient updates for the backbone. It can be used together with any lr_choice, i.e. "single_lr", "two_stage", or "layerwise_decay".

Future work: due to bandwidth limits, this is only added to lit_mmdet. We will need to benchmark on other problem types and add it to the corresponding lit modules. The lit modules may also need a refactor (add a base module for a better OOP design).
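A hedged usage sketch of the new option; the value format for model.mmdet.frozen_layers and the other hyperparameter keys below are assumptions, not confirmed by this thread.

```python
from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(problem_type="object_detection")
predictor.fit(
    train_data="train_coco.json",  # placeholder path to COCO-format annotations
    hyperparameters={
        # Assumed value format: a list of name patterns to freeze; None/null keeps
        # the default behavior of training all layers.
        "model.mmdet.frozen_layers": ["backbone"],
        # Per the description above, freezing works with any lr_choice.
        "optimization.lr_choice": "two_stage",
    },
)
```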
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.