[AutoMM] Add support for loading pre-trained weights in ft_transformer #3859
Conversation
- How would users obtain a pretrained ft_transformer that aligns with our implementation? We haven't provided the pretraining functionality.
- Consider adding a test.
# init transformer backbone from provided checkpoint
if checkpoint_name:
    ckpt = torch.load(checkpoint_name)
    self.transformer.load_state_dict(ckpt["state_dict"])
This only loads weights for self.transformer. What if the saved weights also have self.categorical_adapter, self.numerical_adapter, and self.head?
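If the checkpoint were extended to cover the adapters and head, one way to handle it would be to partition the flat state_dict by submodule prefix. A minimal sketch, not part of this PR; the helper name and prefix list are hypothetical:

```python
# Hypothetical helper: split a flat checkpoint state_dict by submodule prefix
# so each part can be loaded into its matching submodule (backbone, adapters, head).
def split_state_dict(state_dict, prefixes=("transformer", "categorical_adapter", "numerical_adapter", "head")):
    parts = {p: {} for p in prefixes}
    for key, value in state_dict.items():
        for p in prefixes:
            if key.startswith(p + "."):
                # strip the prefix so keys match the submodule's own state_dict
                parts[p][key[len(p) + 1:]] = value
                break
    return parts
```

Each non-empty part could then be passed to the corresponding submodule's load_state_dict, so only the components the checkpoint actually contains get loaded.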
Currently, this is mostly for XTab pretraining. The use case is that users load XTab pre-trained ft_transformer weights and then finetune.
Can we add tests?
Can we benchmark the performance of the pretrained weights and compare against a randomly initialized ft_transformer?
Added unit test. The XTab repo has the benchmarking results.
The README in the XTab repo only shows the result on one toy dataset; benchmark results on more tabular datasets are unclear. It's also unclear whether we can reproduce the results in the paper, due to details such as light vs. heavy finetuning.
@@ -677,6 +678,31 @@ def test_load_ckpt():
    npt.assert_equal(predictions_prob, predictions2_prob)


def test_fttransformer_load_ckpt():
    download("https://automl-mm-bench.s3.amazonaws.com/ft_transformer_pretrained_ckpt/iter_2k.ckpt", "./")
There is no documentation for how to download a pretrained ft_transformer checkpoint. If the benchmark results show that the pretrained weights beat random initialization, we could use the pretrained checkpoint as the default, i.e., download the checkpoint and initialize the ft_transformer internally. That would make it easier for users.
If you'd like to run a benchmark of pre-trained vs random, we can run it on AutoML Benchmark after the code is in a state where I can specify the pre-trained weights to load via the model hyperparameters through TabularPredictor.
The code supports specifying the pre-trained weights in the model hyperparameters via "model.ft_transformer.checkpoint_name": "path_to_checkpoint.ckpt". I am currently benchmarking on multimodal datasets. It would be great if you could run it on AutoML Benchmark.
Can we remove this line since we already support downloading it internally?
Being able to pass a URL path for the weights file would be a nice ease-of-use improvement. Otherwise, looks good!
)
hyperparameters = {
    "model.names": ["ft_transformer"],
    "model.ft_transformer.checkpoint_name": "./iter_2k.ckpt",
It would be a nice ease-of-use improvement if the user could specify a URL here. Instead of:

download(...)
"model.ft_transformer.checkpoint_name": "./iter_2k.ckpt",

just do:

"model.ft_transformer.checkpoint_name": "https://automl-mm-bench.s3.amazonaws.com/ft_transformer_pretrained_ckpt/iter_2k.ckpt"

Internally we can call download and save the file to some predefined directory for model weights. For example, if this were the case, I wouldn't need to edit my benchmarking code to include the download call, which would simplify benchmarking a lot.
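One way to implement the "predefined directory" idea is a small resolver that caches URL checkpoints locally so repeated loads skip the download. A sketch only; the function name and cache location are hypothetical, and s3:// URLs would need a separate handler:

```python
import hashlib
import os
from urllib.request import urlretrieve

def resolve_checkpoint_path(checkpoint_name, cache_dir="~/.cache/ft_transformer"):
    """Return a local path for checkpoint_name; HTTPS checkpoints are
    downloaded once into cache_dir and reused on later calls."""
    if not checkpoint_name.startswith("https://"):
        return checkpoint_name  # already a local path
    cache_dir = os.path.expanduser(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    # derive a stable file name from the URL so the same checkpoint is reused
    fname = hashlib.sha1(checkpoint_name.encode()).hexdigest() + ".ckpt"
    local_path = os.path.join(cache_dir, fname)
    if not os.path.exists(local_path):
        urlretrieve(checkpoint_name, local_path)
    return local_path
```

With something like this, "model.ft_transformer.checkpoint_name" could accept either form, and benchmarking configs would not need a separate download step.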
Currently I get the following error if I try to pass the URL:
FileNotFoundError: [Errno 2] No such file or directory: 'https://automl-mm-bench.s3.amazonaws.com/ft_transformer_pretrained_ckpt/iter_2k.ckpt'
This also would allow me to add the weights as part of the default hyperparameter configs.
LGTM!
Upon further testing, I noticed that the pretrained weights are re-downloaded when loading the fitted model from disk, leading to a substantial inference slowdown. Beyond the re-downloading, I believe the weights are also loaded from disk even when already downloaded, which shouldn't happen after fit.
# init transformer backbone from provided checkpoint
if checkpoint_name:
    if "https://" in checkpoint_name or is_s3_url(checkpoint_name):
        with tempfile.TemporaryDirectory() as tmpdirname:
            checkpoint_path = os.path.join(tmpdirname, "./ft_transformer_pretrained.ckpt")
            download(checkpoint_name, checkpoint_path)
            ckpt = torch.load(checkpoint_path)
    else:
        ckpt = torch.load(checkpoint_name)
    self.transformer.load_state_dict(ckpt["state_dict"])
This is called every time the model is loaded from disk, even after finetuning. We should ensure we are not using the pretrained weights in any way after we do training, as the pretrained weights are no longer necessary.
Minimal reproducible example:
from autogluon.tabular import TabularPredictor, TabularDataset

if __name__ == '__main__':
    label = 'class'
    train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
    test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
    subsample_size = 500  # subsample subset of data for faster demo, try setting this to much larger values
    if subsample_size is not None and subsample_size < len(train_data):
        train_data = train_data.sample(n=subsample_size, random_state=0)
    hyperparameters = {
        'FT_TRANSFORMER': [
            {"model.ft_transformer.checkpoint_name": "https://automl-mm-bench.s3.amazonaws.com/ft_transformer_pretrained_ckpt/iter_2k.ckpt"},
        ],
    }
    predictor = TabularPredictor(
        label=label,
        eval_metric='roc_auc',
    )
    predictor: TabularPredictor = predictor.fit(train_data, hyperparameters=hyperparameters)
    # predictor.leaderboard(data=test_data, display=True)
    print("###### Prior to predict ######")
    predictor.predict(test_data)
    print("###### After Predict (1) ######")
    predictor.predict(test_data)
    print("###### After Predict (2) ######")
    predictor.predict(test_data)
    print("###### After Predict (3) ######")
Output:
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240122_233921")
###### Prior to predict ######
Downloading /tmp/tmplmmfeb91/./ft_transformer_pretrained.ckpt from https://automl-mm-bench.s3.amazonaws.com/ft_transformer_pretrained_ckpt/iter_2k.ckpt...
100%|██████████| 3.13M/3.13M [00:00<00:00, 5.01MiB/s]
Load pretrained checkpoint: /home/ubuntu/workspace/code/scratch/AutogluonModels/ag-20240122_233921/models/FTTransformer/automm_model/model.ckpt
Predicting DataLoader 0: 100%|██████████| 20/20 [00:00<00:00, 47.68it/s]
###### After Predict (1) ######
Downloading /tmp/tmpf5zqen3i/./ft_transformer_pretrained.ckpt from https://automl-mm-bench.s3.amazonaws.com/ft_transformer_pretrained_ckpt/iter_2k.ckpt...
100%|██████████| 3.13M/3.13M [00:00<00:00, 6.26MiB/s]
Load pretrained checkpoint: /home/ubuntu/workspace/code/scratch/AutogluonModels/ag-20240122_233921/models/FTTransformer/automm_model/model.ckpt
Predicting DataLoader 0: 100%|██████████| 20/20 [00:00<00:00, 30.74it/s]
###### After Predict (2) ######
Downloading /tmp/tmp2479loia/./ft_transformer_pretrained.ckpt from https://automl-mm-bench.s3.amazonaws.com/ft_transformer_pretrained_ckpt/iter_2k.ckpt...
100%|██████████| 3.13M/3.13M [00:00<00:00, 4.46MiB/s]
Load pretrained checkpoint: /home/ubuntu/workspace/code/scratch/AutogluonModels/ag-20240122_233921/models/FTTransformer/automm_model/model.ckpt
Predicting DataLoader 0: 100%|██████████| 20/20 [00:00<00:00, 46.99it/s]
###### After Predict (3) ######
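The re-download can be avoided by gating checkpoint loading on a pretrained flag, so that a model restored after fit skips the external weights entirely. The standalone helper below is just an illustration of that condition, not the PR's actual code:

```python
# Illustration of the gating condition for pretrained-weight loading:
# external weights are only needed when building a fresh model for training
# (pretrained=True); a fitted model restores its own saved weights instead.
def should_load_pretrained_ckpt(pretrained: bool, checkpoint_name) -> bool:
    return bool(pretrained) and bool(checkpoint_name)
```

Applied inside the model constructor, this makes loading from disk after fit a no-op with respect to the pretrained checkpoint.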
Thanks for the fix to the model load!
from autogluon.common.loaders._utils import download
from autogluon.common.utils.s3_utils import is_s3_url
if self.numerical_feature_tokenizer:
    self.numerical_adapter.apply(init_weights)
if self.categorical_feature_tokenizer:
    self.categorical_adapter.apply(init_weights)
self.head.apply(init_weights)
# init transformer backbone from provided checkpoint
if pretrained and checkpoint_name:
    if "https://" in checkpoint_name or is_s3_url(checkpoint_name):
This can be simplified by using https://github.com/autogluon/autogluon/blob/master/multimodal/src/autogluon/multimodal/utils/download.py#L31
"source": [
    "### model.ft_transformer.checkpoint_name\n",
    "\n",
    "Provide a pre-trained weights to initialize ft_transformer backbone."
"Provide a pre-trained weights" is not accurate. Consider describing it as a local checkpoint path or a URL?
"# by default, AutoMM doesn't use pre-trained weights\n",
"predictor.fit(hyperparameters={\"model.ft_transformer.checkpoint_name\": None})\n",
"# initialize the ft_transformer backbone from the give checkpoint\n",
"predictor.fit(hyperparameters={\"model.ft_transformer.checkpoint_name\": 'my_checkpoint.ckpt'})\n",
Consider providing an example with our s3 url.
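For instance, the docs snippet could use the checkpoint URL already exercised in this PR's test (the URL is taken from the test above; whether it stays the recommended example is up to the maintainers):

```python
# Example hyperparameters using the S3 checkpoint URL from this PR's unit test
hyperparameters = {
    "model.names": ["ft_transformer"],
    "model.ft_transformer.checkpoint_name": "https://automl-mm-bench.s3.amazonaws.com/ft_transformer_pretrained_ckpt/iter_2k.ckpt",
}
```

These hyperparameters would then be passed to predictor.fit, with the download handled internally.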
LGTM
Co-authored-by: Zhiqiang Tang <zhiqiang.tang@rutgers.edu>
Issue #, if available:
Resolves #3847
Description of changes:
- Add the checkpoint_name argument in ft_transformer so that users could load pre-trained ft_transformer weights.
- Fix layer_id when model_prefix is None.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.