
Conversation

@bejaeger bejaeger commented Nov 4, 2025

Issue

This fixes 3 tests for upcoming V2.5:

  • The get_embeddings test did not account for thinking tokens. This change ensures that "train_embeddings" returns only the embeddings for the data points.
  • The balance-probabilities test was flawed: we don't actually expect the probabilities to be uniform for any model. This introduces a new test for the balance-probabilities functionality, plus a softer criterion for the classifier that checks whether the probability distribution becomes more uniform when balance_probabilities is enabled.
  • test_classifier_in_pipeline: this test reused the balanced-probabilities check from the previous test, which made little sense because here we only want to verify that the classifier works in a pipeline.
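The balancing utility the reviews refer to (balance_probas_by_class_counts) can be sketched roughly as follows. The signature, shapes, and the exact reweighting scheme are assumptions for illustration, not the actual code in src/tabpfn/utils.py:

```python
import numpy as np

def balance_probas_by_class_counts(probas: np.ndarray, class_counts: np.ndarray) -> np.ndarray:
    """Reweight predicted probabilities by inverse class frequency, then renormalize.

    probas: shape (n_samples, n_classes); class_counts: shape (n_classes,).
    """
    balanced = probas / class_counts  # down-weight over-represented classes
    return balanced / balanced.sum(axis=1, keepdims=True)  # rows sum to 1 again

# When the predicted probabilities mirror the class frequencies exactly,
# balancing yields a uniform distribution.
counts = np.array([30, 10])
probas = np.array([[0.75, 0.25]])
print(balance_probas_by_class_counts(probas, counts))  # rows become [0.5, 0.5]
```

Note that the result is only exactly uniform in this degenerate case; in general, balancing just shifts mass toward under-represented classes, which is why the softer "more uniform" criterion makes sense.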

Copilot AI review requested due to automatic review settings November 4, 2025 16:35
@bejaeger bejaeger requested a review from a team as a code owner November 4, 2025 16:35
@bejaeger bejaeger requested review from LeoGrin and simo-prior and removed request for a team and simo-prior November 4, 2025 16:35
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses issues in two tests. The get_embeddings test is updated to correctly handle thinking tokens by adjusting the slice for training embeddings. The balance_probabilities test is rewritten with a more robust criterion, checking for increased uniformity in probability distributions rather than expecting a perfectly uniform one. This change also involves refactoring the balancing logic into a new utility function, balance_probas_by_class_counts, and adding a corresponding unit test.

The changes are logical and improve the test suite. I've provided a critical fix for a new test that would otherwise fail due to incorrect parameters, and a suggestion to improve the robustness of the new utility function.

Contributor

Copilot AI left a comment


Pull Request Overview

This PR refactors the probability balancing logic by extracting it into a standalone utility function and improves the corresponding tests. The changes focus on making the code more maintainable and testable.

Key Changes:

  • Extracted balance_probas_by_class_counts utility function from classifier-specific code
  • Added dedicated unit test for the probability balancing function
  • Improved test coverage for balanced probabilities with more realistic imbalanced dataset scenarios
  • Fixed train embeddings extraction in transformer to account for thinking tokens offset
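A minimal sketch of the thinking-token offset fix, assuming the transformer output stacks optional thinking-token rows before the train rows and the test rows; the layout, names, and helper below are assumptions for illustration, not the actual transformer.py code:

```python
import numpy as np

# Hypothetical output layout along axis 0:
#   output[: n_thinking]                       -> thinking-token rows (not data)
#   output[n_thinking : n_thinking + n_train]  -> train embeddings
#   output[n_thinking + n_train :]             -> test embeddings
def split_embeddings(output: np.ndarray, n_thinking: int, n_train: int):
    """Drop thinking-token rows so train embeddings cover only the data points."""
    train = output[n_thinking : n_thinking + n_train]
    test = output[n_thinking + n_train :]
    return train, test

out = np.arange(7 * 2).reshape(7, 2)  # 2 thinking + 3 train + 2 test rows
train, test = split_embeddings(out, n_thinking=2, n_train=3)
print(train.shape, test.shape)  # (3, 2) (2, 2)
```

Without the offset, a slice like output[:n_train] would silently include thinking-token rows in the returned train embeddings, which is the bug the PR description points at.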

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

File summary:

  • src/tabpfn/utils.py: Added new balance_probas_by_class_counts function to centralize probability balancing logic
  • src/tabpfn/classifier.py: Refactored to use the extracted utility function instead of the inline implementation
  • tests/test_utils.py: Added unit test for the new balance_probas_by_class_counts function
  • tests/test_classifier_interface.py: Updated test to use a synthetic imbalanced dataset and improved the assertion logic
  • src/tabpfn/architectures/base/transformer.py: Fixed train embeddings extraction to properly skip thinking-token rows
Comments suppressed due to low confidence (1)

src/tabpfn/architectures/base/transformer.py:574

  • This assignment to 'train_encoder_out' is unnecessary as it is redefined before this value is used.
                train_encoder_out = self.encoder_compression_layer(


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses two testing issues. The get_embeddings logic is corrected to account for thinking tokens, and the balance_probabilities test is significantly improved to be more robust and meaningful. The refactoring of the balancing logic into a separate utility function with its own unit test is a great improvement for code organization and maintainability.

I've provided a few suggestions to enhance code conciseness, improve numerical stability, and ensure test reproducibility. Overall, these are solid changes.

Collaborator

@LeoGrin LeoGrin left a comment


LGTM! Agree that the previous balance probability test was too strong. (The new one might be a bit weak, but I don't have a great solution for that.)

@bejaeger bejaeger merged commit 788bbdf into main Nov 5, 2025
26 of 28 checks passed
oscarkey pushed a commit that referenced this pull request Nov 12, 2025
…dings' test (#231)

* Record copied public PR 589

* Update 'balance probabilities test' and 'get_embeddings' test (#589)

(cherry picked from commit 788bbdf)

---------

Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com>
Co-authored-by: Benjamin Jaeger <benjamin@priorlabs.ai>
