Add TabPFN #3270
Conversation
Hello @noahho and @SamuelGabriel, I am looking to add TabPFN to AutoGluon for the upcoming v0.8 release on June 13th! If you had time, it would be great if you could give a quick look at the PR and mention any suggestions/changes you'd recommend. In particular, I'd be curious to hear your thoughts on the sample_rows value and how to best handle missing values.
```python
)
return X, y

# TODO: Should we fillna 0? what about -1 or mean? Does TabPFN automatically fill missing values?
```
TabPFNClassifier can handle nans and fills them more or less with the mean of the feature. So, I think it is best to just hand the nans over.
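The mean-filling behavior described above can be approximated in a few lines (a rough numpy sketch of per-feature mean imputation, not TabPFN's actual code path):

```python
import numpy as np

# Rough illustration of per-feature mean imputation, approximating what
# TabPFN is described as doing internally with NaN inputs (assumption:
# this mirrors the behavior only loosely, not the actual implementation).
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])

col_means = np.nanmean(X, axis=0)              # per-feature mean, ignoring NaNs
X_filled = np.where(np.isnan(X), col_means, X)  # replace NaNs with the feature mean
```

Because the classifier handles this itself, the caller can simply pass X with NaNs intact.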
Great! I did a quick test and it appears to work without doing the fillna, with comparable performance, so I've updated the PR to not fillna and instead pass the NaNs to TabPFN directly.
```python
# TODO: Make sample_rows generic
if sample_rows is not None and len(X) > sample_rows:
    X, y = self._subsample_train(X=X, y=y, num_rows=sample_rows)
```
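For context, a row-subsampling helper along these lines could look like the following (a minimal sketch; subsample_train here is a hypothetical stand-in, not AutoGluon's actual _subsample_train):

```python
import numpy as np

def subsample_train(X, y, num_rows, random_state=0):
    """Randomly keep at most num_rows training rows.

    Hypothetical helper for illustration only.
    """
    if len(X) <= num_rows:
        return X, y
    rng = np.random.default_rng(random_state)
    idx = rng.choice(len(X), size=num_rows, replace=False)
    return X[idx], y[idx]

X = np.arange(20).reshape(10, 2)   # rows: [0, 1], [2, 3], ..., [18, 19]
y = np.arange(10)
X_sub, y_sub = subsample_train(X, y, num_rows=4)
```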
We have a bit of data on this. The best strategy is something like building multiple trees with a TabPFN in each leaf, but plain subsampling is not too much worse in our experience.
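The idea of an ensemble of TabPFNs, each fit on its own subsample, can be sketched generically as bagging. In this sketch MeanRateClassifier is a trivial stand-in for a per-leaf TabPFN, and all names are illustrative, not from TabPFN or AutoGluon:

```python
import numpy as np

class MeanRateClassifier:
    """Trivial binary stand-in for a per-leaf TabPFN in this sketch:
    predicts the class-1 rate seen in its training subsample."""
    def fit(self, X, y):
        p1 = float(np.mean(y))
        self.proba_ = np.array([1.0 - p1, p1])
        return self

    def predict_proba(self, X):
        return np.tile(self.proba_, (len(X), 1))

def bagged_predict_proba(X, y, X_test, n_members=5, sample_rows=4, seed=0):
    """Fit one member per random row subsample, then average probabilities."""
    rng = np.random.default_rng(seed)
    probas = []
    for _ in range(n_members):
        idx = rng.choice(len(X), size=min(sample_rows, len(X)), replace=False)
        member = MeanRateClassifier().fit(X[idx], y[idx])
        probas.append(member.predict_proba(X_test))
    return np.mean(probas, axis=0)

X = np.arange(16).reshape(8, 2)
y = np.array([0, 0, 0, 1, 1, 1, 0, 1])
proba = bagged_predict_proba(X, y, X_test=X[:2])
```

Swapping the stand-in for a real TabPFNClassifier would give the subsample-bagging variant discussed here.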
That makes sense; I had a similar assumption as to what would probably work best. For now, I've kept it as is, since the TabPFN Forest approach is very similar to using bagging in AutoGluon on TabPFN. I'd expect TabPFN_Forest to be slightly better, but I will keep things simple for now. If you do have data that shows a compelling performance improvement, I'd definitely take another look.
If TabPFN can do this logic internally, that could be a very nice addition. For example, when N_ensemble_configurations=10, each of the N TabPFN ensemble members could use a different sample from the training data, which should maximize diversity.
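That per-member-subsample idea can be sketched as follows (illustrative only; this is not how N_ensemble_configurations currently behaves in TabPFN):

```python
import numpy as np

def member_subsamples(n_rows, sample_rows, n_members, seed=0):
    """Draw an independent row subsample for each ensemble member,
    so each member sees different training rows (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_rows, size=sample_rows, replace=False)
            for _ in range(n_members)]

# e.g. 10 ensemble members, each fit on its own 100-row subsample
samples = member_subsamples(n_rows=1000, sample_rows=100, n_members=10)
```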
Hey :) I have left two comments on your points. It looks good in general, I think, besides the 0 imputation strategy.
Generally, I think it is important to know that the main costs of TabPFN are incurred when loading the model into (GPU) memory at the first construction of the TabPFNClassifier, and in the fit function, where the full forward pass happens.
Best,
Sam
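Given that cost profile, it may pay to construct the classifier once and reuse it across fits. A minimal caching sketch, where get_classifier is a hypothetical stand-in for the expensive TabPFNClassifier construction:

```python
import functools

load_count = 0

@functools.lru_cache(maxsize=1)
def get_classifier():
    """Construct the expensive model exactly once and reuse it
    (placeholder for TabPFNClassifier construction, which loads
    the model into (GPU) memory)."""
    global load_count
    load_count += 1
    return object()   # stands in for the loaded model

clf_a = get_classifier()
clf_b = get_classifier()   # cache hit: same object, no second load
```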
LGTM!
* 'master' of https://github.com/awslabs/autogluon: (24 commits)
  [WIP] 0.8.0 release notes (autogluon#3303)
  Add model keys doc (autogluon#3321)
  Fix NaN warning in np.array(X) (autogluon#3315)
  [Draft] Upgrade networkx to 3.x (autogluon#3317)
  Add calibrate_decision_threshold tutorial (autogluon#3316)
  [Doc] AutoMM FAQ Updates (autogluon#3314)
  Update to v0.8.0 (autogluon#3313)
  Add Experimental Zeroshot HPO (autogluon#3312)
  Update GPU installation guide to use CUDA 11.7 (autogluon#3306)
  [Tutorial] Update tutorials for object detection (autogluon#3305)
  [timeseries] Update documentation (autogluon#3297)
  Update mac cpu install instructions (autogluon#3280)
  Add docstring for hyperparameter_tune_kwargs (autogluon#3307)
  [Doc] Add Search Space Page (autogluon#3311)
  Fewshot learning predict proba (autogluon#3267)
  Fix log to file Windows tests (autogluon#3302)
  Add missing doc pages (autogluon#3304)
  Add calibrate_decision_threshold (autogluon#3298)
  continuous training tutorial update (autogluon#3300)
  Add TabPFN (autogluon#3270)
  ...
Issue #, if available:
#2806
Description of changes:
TODO:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.