
Add an example of few-shot memory bank model with MultiModal Feature Extraction #2822

Merged: 15 commits into autogluon:master on Feb 10, 2023

Conversation

Contributor

@Linuxdex Linuxdex commented Feb 2, 2023

This example provides a simple and clear way of implementing a memory-bank-powered few-shot learning model with AutoGluon MultiModal, following Tip-Adapter.

The idea is to store <feature, label> pairs from the training data in a key-value memory bank. In the prediction phase, we compare the similarity between the test image features and the memory bank keys, and aggregate the prediction logits. The logits obtained via feature similarity are combined with the logits from a classification model that directly predicts the label from the features.

Experiments show that adding a memory-bank can improve the performance of image, text and image-text classification in the few-shot learning scenario.
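The logit combination described above can be sketched as follows. This is a minimal NumPy illustration with toy shapes, not the example's actual code; the `exp(-beta * (1 - sim))` sharpening follows the Tip-Adapter paper.

```python
import numpy as np

def memory_bank_predict(test_feat, keys, values, clf_logits, alpha=1.0, beta=5.5):
    """Blend classifier logits with memory-bank similarity logits.

    test_feat:  (N, D) L2-normalized test features
    keys:       (K, D) L2-normalized training features (memory bank keys)
    values:     (K, C) one-hot training labels (memory bank values)
    clf_logits: (N, C) logits from the linear classifier
    """
    affinity = test_feat @ keys.T                 # (N, K) cosine similarities
    weights = np.exp(-beta * (1.0 - affinity))    # sharpen, as in Tip-Adapter
    cache_logits = weights @ values               # (N, C) aggregated labels
    return clf_logits + alpha * cache_logits

# Toy memory bank: 4 unit-norm features, the first two labeled class 0.
keys = np.array([[1.0, 0.0, 0.0],
                 [0.8, 0.6, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
values = np.eye(2)[[0, 0, 1, 1]]
test = keys[:1]  # a test point identical to a class-0 shot
logits = memory_bank_predict(test, keys, values, clf_logits=np.zeros((1, 2)))
print(logits.argmax(axis=1))  # [0] -- class 0 wins via feature similarity
```

With `alpha = 0` this reduces to the plain classifier; larger `alpha` trusts the memory bank more.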

@github-actions

github-actions bot commented Feb 2, 2023

Job PR-2822-9dbea0c is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/9dbea0c/index.html

Contributor

@bryanyzhu bryanyzhu left a comment


Thanks for the PR, performance looks good.

examples/automm/cache_adapter/README.md (3 outdated review threads, resolved)
examples/automm/cache_adapter/cache_adapter.py (2 outdated review threads, resolved)
@sxjscience
Collaborator

@Linuxdex, let's add more description of the memory cache structure.

@@ -0,0 +1,65 @@
# Use MultiModal Feature Extraction to Create a Few-shot Cache Adapter Model
Collaborator


I also feel that cache adapter is not a good name.

@github-actions

github-actions bot commented Feb 3, 2023

Job PR-2822-9a7909b is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/9a7909b/index.html

@github-actions

github-actions bot commented Feb 3, 2023

Job PR-2822-20b107f is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/20b107f/index.html

@github-actions

github-actions bot commented Feb 3, 2023

Job PR-2822-f3ca830 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/f3ca830/index.html

@github-actions

github-actions bot commented Feb 3, 2023

Job PR-2822-b7841fe is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/b7841fe/index.html

@sxjscience
Collaborator

@Linuxdex You may also compare with the baseline of feature extraction + SVM, which has been added in #2850

@github-actions

github-actions bot commented Feb 7, 2023

Job PR-2822-576cd0a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/576cd0a/index.html

@github-actions

github-actions bot commented Feb 7, 2023

Job PR-2822-86fbc2d is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/86fbc2d/index.html

@github-actions

github-actions bot commented Feb 9, 2023

Job PR-2822-339f34a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/339f34a/index.html

@Linuxdex Linuxdex changed the title from "[WIP] Add an example of few-shot cache model with MultiModal Feature Extraction" to "Add an example of few-shot memory bank model with MultiModal Feature Extraction" on Feb 9, 2023
Collaborator

@sxjscience sxjscience left a comment


LGTM. We can revise the textual descriptions in another PR.


The memory bank follows the excellent design of [Tip-Adapter](https://arxiv.org/pdf/2207.09519.pdf), which stores the image features of the few-shot training set to improve the performance of zero-shot CLIP through feature similarity. The stored features can also serve as the initialization of a trainable classifier. This ProtoNet-like design makes full use of the few-shot training information and leads to good performance [3]. We believe the effectiveness of this design is not limited to CLIP and can be widely applied to few-shot classification tasks on images and texts.

The memory bank, a derivative application of Tip-Adapter, obtains diversified multi-modal features through MultiModal Feature Extraction. In this example, we first train a linear classifier on the multi-modal features to establish a baseline accuracy. Then, the similarity between the test features and the memory bank is added to the baseline predicted probabilities. Finally, an additional linear adapter, initialized from the memory bank, is trained to further help few-shot classification.
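The adapter initialization mentioned in this paragraph can be illustrated with a rough sketch. The shapes and names below are hypothetical, not taken from the example code; the pattern of initializing a trainable weight matrix with the stored keys follows Tip-Adapter-F.

```python
import numpy as np

# Hypothetical few-shot memory bank: K stored features (keys), one-hot labels (values).
K, D, C = 8, 16, 4
rng = np.random.default_rng(0)
keys = rng.normal(size=(K, D))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = np.eye(C)[rng.integers(0, C, size=K)]

# The trainable adapter is a (K, D) weight matrix initialized with the keys, so
# before any training its output equals the plain test-to-key affinity.
W = keys.copy()

test_feat = rng.normal(size=(2, D))
test_feat /= np.linalg.norm(test_feat, axis=1, keepdims=True)
affinity = test_feat @ W.T                # (2, K); training W would refine this
cache_logits = np.exp(affinity) @ values  # (2, C) aggregated label evidence
print(cache_logits.shape)  # (2, 4)
```

Because `W` starts as the keys, the adapter's initial predictions match the training-free memory-bank lookup, and gradient updates can only refine from there.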
Collaborator


We need to rephrase this paragraph with proper English.



Hyper-parameters `alpha` and `beta`, which control the memory bank's contribution, are tuned through a grid search on the validation set to attain the best performance.
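The grid search could look roughly like this; the parameter grids and function names are illustrative assumptions, not the example's code.

```python
import numpy as np

def search_alpha_beta(val_feat, val_labels, keys, values, clf_logits,
                      alphas=(0.5, 1.0, 2.0), betas=(1.0, 5.5, 10.0)):
    """Return the (alpha, beta, accuracy) maximizing validation accuracy."""
    best = (None, None, -1.0)
    affinity = val_feat @ keys.T                  # precompute similarities once
    for beta in betas:
        cache = np.exp(-beta * (1.0 - affinity)) @ values
        for alpha in alphas:
            preds = (clf_logits + alpha * cache).argmax(axis=1)
            acc = float((preds == val_labels).mean())
            if acc > best[2]:
                best = (alpha, beta, acc)
    return best

# Toy validation set: 4 orthogonal unit features, two per class.
keys = np.eye(4)
values = np.eye(2)[[0, 0, 1, 1]]
alpha, beta, acc = search_alpha_beta(keys, np.array([0, 0, 1, 1]),
                                     keys, values, np.zeros((4, 2)))
print(acc)  # 1.0
```

Precomputing the affinity matrix keeps the search cheap: each grid point only costs one matrix product and an argmax.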
Collaborator


Same here. Need to revise the paragraph.

return features


def generate_clip_weights(args, classnames, template, predictor):
Collaborator


We can consider just introducing a variable called semantic_label_embedding and comparing the similarity between semantic_label_embedding and the embeddings of the labels stored in the memory bank.
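A rough sketch of that suggestion follows. Everything here is hypothetical: `semantic_label_embedding` stands in for text embeddings of the class names produced by the same feature extractor, and the toy vectors are made up for illustration.

```python
import numpy as np

def zero_shot_logits(image_feat, semantic_label_embedding):
    """Cosine similarity between image features and class-name embeddings.

    image_feat:               (N, D) L2-normalized image features
    semantic_label_embedding: (C, D) L2-normalized class-name text embeddings
    """
    return image_feat @ semantic_label_embedding.T   # (N, C)

# Toy check: a feature aligned with class 1's embedding selects class 1.
emb = np.eye(3)[:2]                    # two hypothetical class embeddings in 3-D
feat = np.array([[0.1, 0.9, 0.0]])
feat /= np.linalg.norm(feat)
print(zero_shot_logits(feat, emb).argmax())  # 1
```

Keeping the label embeddings in a single named variable, as suggested, makes the similarity step explicit and easy to swap out.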

@github-actions

github-actions bot commented Feb 9, 2023

Job PR-2822-91d77d2 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/91d77d2/index.html

@github-actions

github-actions bot commented Feb 9, 2023

Job PR-2822-aff0452 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/aff0452/index.html

@github-actions

Job PR-2822-de7be1c is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/de7be1c/index.html

@github-actions

Job PR-2822-5d44696 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2822/5d44696/index.html

@sxjscience sxjscience merged commit 23d4f97 into autogluon:master Feb 10, 2023