Add TabPFN #3270

Merged
merged 8 commits into autogluon:master on Jun 12, 2023

Innixma
Contributor

@Innixma Innixma commented Jun 4, 2023

Issue #, if available:
#2806

Description of changes:

TODO:

  • Run AutoMLBenchmark to verify correctness & performance
  • Verify implementation with TabPFN authors

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added enhancement New feature or request module: tabular priority: 1 High priority labels Jun 4, 2023
@Innixma Innixma added this to the 0.8 Release milestone Jun 4, 2023
@Innixma
Contributor Author

Innixma commented Jun 4, 2023

Hello @noahho and @SamuelGabriel, I am looking to add TabPFN to AutoGluon for the upcoming v0.8 release on June 13th!

If you have time, it would be great if you could take a quick look at the PR and mention any suggestions or changes you'd recommend.

In particular, I'd be curious to hear your thoughts on the sample_rows value and how best to handle missing values.

)
return X, y

# TODO: Should we fillna 0? what about -1 or mean? Does TabPFN automatically fill missing values?


TabPFNClassifier can handle nans and fills them more or less with the mean of the feature. So, I think it is best to just hand the nans over.

Contributor Author

Great! I did a quick test and it appears to work without doing the fillna, with comparable performance, so I've updated the PR to not fillna and instead pass the NaNs to TabPFN directly.
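
For readers unfamiliar with the term, here is a minimal sketch of what filling NaNs "with the mean of the feature" means. This is plain per-column mean imputation written for illustration only; it is not TabPFN's internal code, and the helper name `fill_nan_with_column_mean` is hypothetical.

```python
import math


def fill_nan_with_column_mean(rows):
    """Replace NaN entries with the mean of the non-NaN values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Assumption: fall back to 0.0 when a column has no observed values.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)] for r in rows]


X = [[1.0, float("nan")], [3.0, 4.0], [float("nan"), 8.0]]
X_filled = fill_nan_with_column_mean(X)
```

Since TabPFNClassifier is reported above to do something similar internally, the PR passes the NaNs through unchanged rather than applying a step like this beforehand.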


# TODO: Make sample_rows generic
if sample_rows is not None and len(X) > sample_rows:
    X, y = self._subsample_train(X=X, y=y, num_rows=sample_rows)


We have a bit of data on this. The best strategy is something like having multiple trees with a TabPFN in each leaf, but simple subsampling is not much worse in our experience.
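
A rough sketch of the simple subsampling strategy discussed here, assuming uniform sampling without replacement. The function `subsample_train` below is a hypothetical stand-in; the PR's actual `_subsample_train` may behave differently (for example, it might stratify by class).

```python
import random


def subsample_train(X, y, num_rows, seed=0):
    """Return at most num_rows rows of (X, y), sampled without replacement."""
    if num_rows is None or len(X) <= num_rows:
        return X, y
    rng = random.Random(seed)
    # Sort the chosen indices so the original row order is preserved.
    idx = sorted(rng.sample(range(len(X)), num_rows))
    return [X[i] for i in idx], [y[i] for i in idx]


X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
X_sub, y_sub = subsample_train(X, y, num_rows=4)
```

Rows and labels are selected with the same indices, so each label in `y_sub` still belongs to its row in `X_sub`.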

Contributor Author

That makes sense, I had a similar assumption as to what would probably work best.

For now, I've kept it as is, since the TabPFN Forest approach is very similar to using bagging in AutoGluon on TabPFN. I'd expect TabPFN_Forest to be slightly better, but I will keep things simple for now. If you do have data that shows a compelling performance improvement, I'd definitely take another look.

If TabPFN could do this logic internally, that would be a very nice addition. For example, when N_ensemble_configurations=10, each of the 10 ensemble members could use a different sample of the input training data, which should maximize diversity.
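
To make the idea concrete, here is a hedged sketch of drawing a different row subsample for each ensemble member. It only produces index sets; it is not an existing TabPFN feature, and `member_subsamples` and its parameters are hypothetical names chosen for illustration.

```python
import random


def member_subsamples(n_rows, sample_rows, n_members, seed=0):
    """Return one sorted list of row indices per ensemble member."""
    rng = random.Random(seed)
    k = min(sample_rows, n_rows)
    # Each member gets its own draw, so members see different training rows.
    return [sorted(rng.sample(range(n_rows), k)) for _ in range(n_members)]


subsets = member_subsamples(n_rows=100, sample_rows=20, n_members=10)
```

Each index list would then select the training rows given to one ensemble member, in the same spirit as bagging.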

@SamuelGabriel

SamuelGabriel commented Jun 4, 2023 via email

@github-actions

github-actions bot commented Jun 5, 2023

Job PR-3270-3ed4522 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3270/3ed4522/index.html

@github-actions

github-actions bot commented Jun 5, 2023

Job PR-3270-833da90 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3270/833da90/index.html

@github-actions

github-actions bot commented Jun 6, 2023

Job PR-3270-a15baab is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3270/a15baab/index.html

@github-actions

github-actions bot commented Jun 6, 2023

Job PR-3270-94ad668 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3270/94ad668/index.html

@github-actions

github-actions bot commented Jun 7, 2023

Job PR-3270-fa2e75b is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3270/fa2e75b/index.html

@github-actions

github-actions bot commented Jun 7, 2023

Job PR-3270-34c9d1d is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3270/34c9d1d/index.html

@Innixma Innixma requested a review from gradientsky June 10, 2023 02:28
@github-actions

Job PR-3270-ffd396d is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3270/ffd396d/index.html

@github-actions

Job PR-3270-d021ba8 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3270/d021ba8/index.html

Collaborator

@yinweisu yinweisu left a comment


LGTM!

@Innixma Innixma merged commit 95f3caa into autogluon:master Jun 12, 2023
28 checks passed
ddelange added a commit to ddelange/autogluon that referenced this pull request Jun 16, 2023
* 'master' of https://github.com/awslabs/autogluon: (24 commits)
  [WIP] 0.8.0 release notes (autogluon#3303)
  Add model keys doc (autogluon#3321)
  Fix NaN warning in np.array(X) (autogluon#3315)
  [Draft] Upgrade networkx to 3.x (autogluon#3317)
  Add calibrate_decision_threshold tutorial (autogluon#3316)
  [Doc] AutoMM FAQ Updates (autogluon#3314)
  Update to v0.8.0 (autogluon#3313)
  Add Experimental Zeroshot HPO (autogluon#3312)
  Update GPU installation guide to use CUDA 11.7 (autogluon#3306)
  [Tutorial]Update tutorials for object detection (autogluon#3305)
  [timeseries] Update documentation (autogluon#3297)
  Update mac cpu install instructions (autogluon#3280)
  Add docstring for hyperparameter_tune_kwargs (autogluon#3307)
  [Doc] Add Search Space Page (autogluon#3311)
  Fewshot learning predict proba (autogluon#3267)
  Fix log to file Windows tests (autogluon#3302)
  Add missing doc pages (autogluon#3304)
  Add calibrate_decision_threshold (autogluon#3298)
  continuous training tutorial update (autogluon#3300)
  Add TabPFN (autogluon#3270)
  ...
Labels
enhancement New feature or request module: tabular priority: 1 High priority
3 participants