[WIP] Code implementation of Conv-LoRA #3933
Conversation
Great work! One real quick question: have you tried Conv-LoRA on standard text tasks instead of image/SAM tasks? If so, how was it? If you haven't tried, do you think it is a more general-purpose PEFT method, or is it more of a SAM/CV-specific approach? Thanks a lot!
Thank you for your question. While I haven't tried text tasks yet, my understanding is that Conv-LoRA is primarily designed for image tasks. Conv-LoRA incorporates local priors into image features at appropriate scales, considering potential variations in object scale. This involves interpolating image features to larger scales than default and subsequently employing convolution operations to inject local priors. In our paper, we find that interpolating features to larger scales for local prior injection is more beneficial, given that features in the Vision Transformer (ViT) are downscaled by a factor (e.g., 16) from the original continuous image. However, texts are 1-D discrete sequences and lack a concept akin to "object scale". Consequently, considering the feature processing in Conv-LoRA and its motivation, it is unsuitable for text tasks.
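For readers following the thread, the interpolate-then-convolve idea described above can be sketched roughly as follows. This is a simplified, hypothetical rendition (the class name `ConvLoRASketch`, the single 3x3 conv, and the single fixed scale are my own for illustration; the actual PR's `ConvLoRALinear` uses MoE gating over multiple expert scales):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLoRASketch(nn.Module):
    """Hypothetical sketch of a Conv-LoRA-style adapter on a frozen linear layer.

    The low-rank features are reshaped to a 2-D map, interpolated to a
    larger scale, passed through a small convolution (the "local prior"),
    and interpolated back before the LoRA up-projection.
    """

    def __init__(self, in_features, out_features, r=4, scale=2):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op
        self.conv = nn.Conv2d(r, r, kernel_size=3, padding=1)
        self.scale = scale

    def forward(self, x):
        # x: (batch, H, W, in_features) — a ViT patch grid kept in 2-D form
        res = self.lora_A(x)                          # down-project to rank r
        res = res.permute(0, 3, 1, 2).contiguous()    # to (B, r, H, W)
        h, w = res.shape[-2:]
        res = F.interpolate(res, scale_factor=self.scale, mode="bilinear",
                            align_corners=False)      # go to a larger scale
        res = self.conv(res)                          # inject the local prior
        res = F.interpolate(res, size=(h, w), mode="bilinear",
                            align_corners=False)      # back to the token grid
        res = res.permute(0, 2, 3, 1).contiguous()    # to (B, H, W, r)
        return self.base(x) + self.lora_B(res)
```

Because the up-projection is zero-initialized, the module initially reproduces the frozen layer exactly, which is the standard LoRA warm-start behavior.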
Excellent! Thanks for the explanations.
Job PR-3933-17d9af4 is done.
```python
def train(self, mode: bool = True):
    super().train(mode)
    for module in self.modules():
        if isinstance(module, ConvLoRALinear):
            self.output_moe_loss = True
    return self
```
This function sets `output_moe_loss` to `True` for training. During inference it should be `False`, but it seems to be always `True`?
We need the MoE loss when calculating the validation loss. During the validation process, the module mode is set to "eval", so we cannot distinguish between validation and inference here.
```python
# Calculate the gating values.
lora_res = lora_res.permute(0, 3, 1, 2).contiguous()
gates, moe_loss = self.lora_moe_gating(lora_res)
```
Avoid computing the MoE loss during inference for better efficiency?
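One way to realize this suggestion, given the constraint that validation also runs in eval mode, is to guard the auxiliary-loss computation on the `output_moe_loss` flag rather than on `self.training`. This is a hypothetical pattern, not the code in this PR; the class name `GatedAdapter` and the toy balance loss are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GatedAdapter(nn.Module):
    """Hypothetical pattern: compute the auxiliary MoE loss only when a
    flag is set (training/validation), and skip it at inference."""

    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # toy gating network
        self.output_moe_loss = False  # a predictor could set this False for inference

    def forward(self, x):
        gates = self.gate(x).softmax(dim=-1)
        moe_loss = None
        if self.output_moe_loss:
            # Toy load-balancing penalty: squared coefficient of variation
            # of per-expert importance (how evenly the experts are used).
            importance = gates.sum(dim=0)
            moe_loss = importance.var() / (importance.mean() ** 2 + 1e-10)
        return gates, moe_loss
```

With this shape, the predictor can flip `output_moe_loss` off explicitly before deployment instead of relying on `train()`/`eval()`, which cannot tell validation apart from inference.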
```python
from torch.distributions.normal import Normal

class MoEConv(nn.Module):
```
Need a more accurate name? `MoEGate`? This class doesn't contain convolutions; its job is to determine the gates.
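The `Normal` import above suggests noisy gating in the style of sparsely-gated MoE. A minimal gate-only module under the suggested name might look like the following; this is a hypothetical sketch (the layer names `w_gate`/`w_noise`, the expert count, and the balance loss are my assumptions, not the PR's code):

```python
import torch
import torch.nn as nn
from torch.distributions.normal import Normal

class MoEGate(nn.Module):
    """Hypothetical gate-only module (renamed from MoEConv as suggested):
    produces per-expert gating weights plus a load-balancing loss, and
    contains no convolutions itself."""

    def __init__(self, dim, num_experts=8, noise_eps=1e-2):
        super().__init__()
        self.w_gate = nn.Linear(dim, num_experts, bias=False)
        self.w_noise = nn.Linear(dim, num_experts, bias=False)
        self.noise_eps = noise_eps

    def forward(self, x):  # x: (batch, dim)
        clean_logits = self.w_gate(x)
        if self.training:
            # Noisy gating: perturb logits with learned, per-expert noise
            # so that expert selection is explored during training.
            noise_std = nn.functional.softplus(self.w_noise(x)) + self.noise_eps
            logits = Normal(clean_logits, noise_std).rsample()
        else:
            logits = clean_logits
        gates = logits.softmax(dim=-1)
        # Balance loss: squared coefficient of variation of expert importance.
        importance = gates.sum(dim=0)
        moe_loss = importance.var() / (importance.mean() ** 2 + 1e-10)
        return gates, moe_loss
```

In eval mode the noise branch is skipped, so the gates are deterministic functions of the input.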
Resolved (outdated) review threads on:
multimodal/src/autogluon/multimodal/models/conv_lora/modeling_sam.py
multimodal/src/autogluon/multimodal/models/conv_lora/adaptation_layers.py
```python
lora_alpha: int = 1,
lora_dropout: float = 0.0,
fan_in_fan_out: bool = False,  # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
merge_weights: bool = False,
```
Is Conv-LoRA reparameterizable? It is more complicated than LoRA: LoRA just merges the weights by multiplying the matrices, but here we have convolutions.
Conv-LoRA is not reparameterizable, mainly due to its interpolation operation. Actually, the convolutions are not the main obstacle, because a convolution layer can be re-parameterized into an FC layer in some cases. You could refer to papers on structural re-parameterization for more details.
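The point that convolutions are not the obstacle can be checked directly: a 1x1 convolution is exactly a linear layer applied per spatial position, so its weights fold into an FC weight matrix, whereas an interpolation step has no fixed weight matrix to merge. A small sanity-check sketch (the shapes here are arbitrary illustrations):

```python
import torch
import torch.nn as nn

# A 1x1 convolution over C channels is the same map as a Linear layer
# applied independently at every spatial position, so its weights can be
# merged ("re-parameterized") into an FC weight matrix.
conv = nn.Conv2d(8, 16, kernel_size=1, bias=False)
fc = nn.Linear(8, 16, bias=False)
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(16, 8))  # fold (16, 8, 1, 1) into (16, 8)

x = torch.randn(2, 8, 5, 5)                   # (B, C, H, W)
y_conv = conv(x)
# Apply the FC layer per position: move channels last, project, move back.
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
assert torch.allclose(y_conv, y_fc, atol=1e-5)
```

Larger kernels can similarly be absorbed in some cases (see the structural re-parameterization literature mentioned above), but a resolution-changing interpolation between the down- and up-projections breaks the single-matrix-product form that plain LoRA merging relies on.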
We also need to add examples of using Conv-LoRA under the path: https://github.com/autogluon/autogluon/tree/master/examples/automm/Conv-LoRA
Job PR-3933-24ee8b2 is done.
LGTM!
…tch-4 * 'master' of https://github.com/awslabs/autogluon: (46 commits)
[core] move transformers to setup_utils, bump dependency version (autogluon#3984)
[AutoMM] Fix one lightning upgrade issue (autogluon#3991)
[CI][Feature] Create a package version table (autogluon#3972)
[v.1.1][Upgrade] PyTorch 2.1 and CUDA 12.1 upgrade (autogluon#3982)
[WIP] Code implementation of Conv-LoRA (autogluon#3933)
[timeseries] Ensure that all metrics handle missing values in the target (autogluon#3966)
[timeseries] Fix path and device bugs (autogluon#3979)
[AutoMM] Remove grounding-dino (autogluon#3974)
[Docs] Update install modules content (autogluon#3976)
Add note on pd.to_datetime (autogluon#3975)
[AutoMM] Improve DINO performance (autogluon#3970)
Minor correction in differ to pick correct environment (autogluon#3968)
Fix windows python 3.11 issue by removing ray (autogluon#3956)
[CI][Feature] Package Version Comparator (autogluon#3962)
[timeseries] Add support for categorical covariates (autogluon#3874)
[timeseries] Add method for plotting forecasts (autogluon#3889)
Update conf.py copyright to reflect current year (autogluon#3932)
[Timeseries][CI] Refactor CI to skip AutoMM and Tabular tests w.r.t timeseries changes (autogluon#3942)
Fix HPO crash in memory check (autogluon#3931)
[AutoMM][CI] Capping scikit-learn to avoid HPO test failure (autogluon#3947)
...
Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-160.us-west-2.compute.internal> Co-authored-by: Zhiqiang Tang <zhiqiang.tang@rutgers.edu>
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.