
[WIP] Code implementation of Conv-LoRA #3933

Merged: 5 commits into autogluon:master on Mar 18, 2024

Conversation

@Harry-zzh (Collaborator)

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@hohoCode

Great work! One real quick question: have you tried Conv-LoRA on standard text tasks instead of image/SAM tasks? If so, how was it? If you haven't tried, do you think it is a more general-purpose PEFT method, or is it more of a SAM/CV-specific approach? Thanks a lot!

@Harry-zzh (Collaborator, Author)

> Great work! One real quick question: have you tried Conv-LoRA on standard text tasks instead of image/SAM tasks? If so, how was it? If you haven't tried, do you think it is a more general-purpose PEFT method, or is it more of a SAM/CV-specific approach? Thanks a lot!

Thank you for your question. While I haven't tried text tasks yet, my understanding is that Conv-LoRA is primarily designed for image tasks.

Conv-LoRA incorporates local priors into image features at appropriate scales, accounting for potential variations in object scale. It interpolates image features to scales larger than the default and then applies convolution operations to inject local priors. In our paper, we find that interpolating features to larger scales for local-prior injection is more beneficial, given that features in the Vision Transformer (ViT) are downscaled by a factor (e.g., 16) from the original continuous image.

However, text is a 1-D discrete sequence and lacks a concept akin to "object scale". Consequently, given Conv-LoRA's feature processing and its motivation, it is unsuitable for text tasks.
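
A minimal, self-contained sketch of this feature flow, assuming a square ViT patch grid; the class and argument names below are illustrative, not the actual ConvLoRALinear implementation in this PR:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLoRASketch(nn.Module):
    """Illustrative only; not the implementation added in this PR."""

    def __init__(self, in_dim: int, out_dim: int, r: int = 8, scale_factor: int = 2):
        super().__init__()
        self.lora_A = nn.Linear(in_dim, r, bias=False)          # LoRA down-projection
        self.lora_B = nn.Linear(r, out_dim, bias=False)         # LoRA up-projection
        self.conv = nn.Conv2d(r, r, kernel_size=3, padding=1)   # injects local priors
        self.scale_factor = scale_factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, height, width, in_dim) ViT patch features arranged on a 2-D grid
        b, h, w, _ = x.shape
        res = self.lora_A(x)                                     # (b, h, w, r)
        res = res.permute(0, 3, 1, 2).contiguous()               # (b, r, h, w)
        res = F.interpolate(res, scale_factor=self.scale_factor,
                            mode="bilinear", align_corners=False)  # upsample beyond the default scale
        res = self.conv(res)                                     # convolution on the enlarged map
        res = F.interpolate(res, size=(h, w),
                            mode="bilinear", align_corners=False)  # back to the original grid
        res = res.permute(0, 2, 3, 1).contiguous()               # (b, h, w, r)
        return self.lora_B(res)                                  # added to the frozen layer's output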

@hohoCode

> we find that interpolating features to larger scales for local-prior injection is more beneficial, given that features in the Vision Transformer (ViT) are downscaled by a factor (e.g., 16) from the original continuous image

Excellent! Thanks for the explanations.

@zhiqiangdon added the model list checked and run-multi-gpu labels on Feb 20, 2024
@zhiqiangdon self-requested a review on February 27, 2024

Job PR-3933-17d9af4 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3933/17d9af4/index.html

Comment on lines +372 to +380
def train(self, mode: bool = True):
    super().train(mode)
    for module in self.modules():
        if isinstance(module, ConvLoRALinear):
            self.output_moe_loss = True
            return self

    return self

Contributor

This function sets output_moe_loss to True for training. During inference it should be False, but it seems to always be True?

Collaborator Author

We need the MoE loss when calculating the validation loss. During validation, the module mode is set to "eval", so we cannot distinguish between the validation and inference processes here.
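
One possible workaround, sketched below purely as an assumption and not part of this PR, is an explicit toggle that the prediction path flips, since module.training is False for both validation and inference:

import torch.nn as nn

def set_moe_loss_enabled(model: nn.Module, enabled: bool) -> None:
    """Hypothetical helper: flip the output_moe_loss flag off before pure inference."""
    for module in model.modules():
        if hasattr(module, "output_moe_loss"):
            module.output_moe_loss = enabled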


# Calculate the gating values.
lora_res = lora_res.permute(0, 3, 1, 2).contiguous()
gates, moe_loss = self.lora_moe_gating(lora_res)
Contributor

Avoid computing the moe loss during inference for better efficiency?
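
As one illustration of this suggestion, a hypothetical gating module (not the MoEConv / lora_moe_gating code in this PR) could compute the auxiliary loss only when requested:

import torch
import torch.nn as nn

class GateSketch(nn.Module):
    """Hypothetical gating module that can skip the auxiliary MoE loss."""

    def __init__(self, in_channels: int, num_experts: int):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, num_experts, kernel_size=1)

    def forward(self, x: torch.Tensor, compute_loss: bool = True):
        # x: (batch, channels, height, width); gates: (batch, num_experts)
        gates = torch.softmax(self.proj(x).mean(dim=(2, 3)), dim=-1)
        if not compute_loss:
            return gates, None  # inference path: skip the auxiliary loss
        # Simple load-balancing penalty (squared coefficient of variation of
        # per-expert importance), a common choice for MoE auxiliary losses.
        importance = gates.sum(dim=0)
        moe_loss = importance.var() / (importance.mean() ** 2 + 1e-10)
        return gates, moe_loss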

from torch.distributions.normal import Normal


class MoEConv(nn.Module):
Contributor

Need a more accurate name, e.g., MoEGate? This class doesn't contain convolutions and is used to determine the gates.

multimodal/src/autogluon/multimodal/constants.py (outdated review thread, resolved)
lora_alpha: int = 1,
lora_dropout: float = 0.0,
fan_in_fan_out: bool = False, # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
merge_weights: bool = False,

Is Conv-LoRA reparameterizable? It is more complicated than LoRA: LoRA just merges the weights by multiplying the matrices, but here we have convolutions.

Collaborator Author

Conv-LoRA is not reparameterizable, mainly due to its interpolation operation.
Actually, the convolutions are not the main obstacle, because a convolution layer can be re-parameterized into an FC layer in some cases. You can refer to papers on structural re-parameterization for more details.
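
For completeness, a small numeric check of why plain LoRA is reparameterizable; the values and shapes below are illustrative only:

import torch

# Plain LoRA merges because its update is itself a linear map:
#   y = W x + (alpha / r) * B (A x) == (W + (alpha / r) * B @ A) x
torch.manual_seed(0)
W = torch.randn(64, 64, dtype=torch.float64)
A = torch.randn(8, 64, dtype=torch.float64)   # down-projection (r x in_dim)
B = torch.randn(64, 8, dtype=torch.float64)   # up-projection (out_dim x r)
alpha, r = 16, 8
x = torch.randn(64, dtype=torch.float64)

y_unmerged = W @ x + (alpha / r) * (B @ (A @ x))
W_merged = W + (alpha / r) * (B @ A)
assert torch.allclose(y_unmerged, W_merged @ x)

# Conv-LoRA inserts interpolate -> conv -> interpolate between A and B. That branch
# mixes information across the spatial grid of tokens, so it is no longer a fixed
# per-token matrix that can be folded into W; the interpolation in particular depends
# on the feature-map size at run time.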

@zhiqiangdon (Contributor) left a comment

We also need to add examples of using Conv-LoRA under this path: https://github.com/autogluon/autogluon/tree/master/examples/automm/Conv-LoRA


Job PR-3933-24ee8b2 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3933/24ee8b2/index.html

@zhiqiangdon (Contributor) left a comment

LGTM!

@zhiqiangdon merged commit 7d8cef5 into autogluon:master on Mar 18, 2024
36 checks passed
ddelange added a commit to ddelange/autogluon that referenced this pull request Mar 21, 2024
…tch-4

* 'master' of https://github.com/awslabs/autogluon: (46 commits)
  [core] move transformers to setup_utils, bump dependency version (autogluon#3984)
  [AutoMM] Fix one lightning upgrade issue (autogluon#3991)
  [CI][Feature] Create a package version table (autogluon#3972)
  [v.1.1][Upgrade] PyTorch 2.1 and CUDA 12.1 upgrade (autogluon#3982)
  [WIP] Code implementation of Conv-LoRA (autogluon#3933)
  [timeseries] Ensure that all metrics handle missing values in the target (autogluon#3966)
  [timeseries] Fix path and device bugs (autogluon#3979)
  [AutoMM]Remove grounding-dino (autogluon#3974)
  [Docs] Update install modules content (autogluon#3976)
  Add note on pd.to_datetime (autogluon#3975)
  [AutoMM] Improve DINO performance (autogluon#3970)
  Minor correction in differ to pick correct environment (autogluon#3968)
  Fix windows python 3.11 issue by removing ray (autogluon#3956)
  [CI][Feature] Package Version Comparator (autogluon#3962)
  [timeseries] Add support for categorical covariates (autogluon#3874)
  [timeseries] Add method for plotting forecasts (autogluon#3889)
  Update conf.py copyright to reflect current year (autogluon#3932)
  [Timeseries][CI]Refactor CI to skip AutoMM and Tabular tests w.r.t timeseries changes (autogluon#3942)
  Fix HPO crash in memory check (autogluon#3931)
  [AutoMM][CI] Capping scikit-learn to avoid HPO test failure (autogluon#3947)
  ...
prateekdesai04 pushed a commit to prateekdesai04/autogluon that referenced this pull request Apr 3, 2024
Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-160.us-west-2.compute.internal>
Co-authored-by: Zhiqiang Tang <zhiqiang.tang@rutgers.edu>
LennartPurucker pushed a commit to LennartPurucker/autogluon that referenced this pull request Jun 1, 2024
Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-160.us-west-2.compute.internal>
Co-authored-by: Zhiqiang Tang <zhiqiang.tang@rutgers.edu>