
Hotfix: numerical stability of non-log-stabilized Sinkhorn plan #531


Open
wants to merge 21 commits into main
Conversation

LarsKue
Contributor

@LarsKue LarsKue commented Jul 4, 2025

This is an important hotfix, so sending it directly to main. Thanks @daniel-habermann for the report!

I also did some benchmarking and it seems the current implementation is optimal with respect to performance.

Edit: I also fixed an issue with the convergence check in log_sinkhorn_plan which takes its runtime down from ~4s to ~0.02s.
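For readers following along, an early-exit convergence check of the kind described can be sketched in plain NumPy. This is an illustrative sketch, not the bayesflow implementation; the function names, uniform marginals, and tolerances here are assumptions:

```python
import numpy as np

def logsumexp(a, axis):
    # numerically stable log-sum-exp along one axis
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def log_sinkhorn_plan(cost, regularization=1.0, max_steps=1000, atol=1e-6):
    # Log-domain Sinkhorn with uniform marginals. The key point is the
    # convergence check: stop as soon as the log-potentials stabilize,
    # instead of always running the full max_steps iterations.
    n, m = cost.shape
    log_kernel = -cost / regularization
    f = np.zeros(n)            # log scaling for rows
    g = np.zeros(m)            # log scaling for columns
    log_mu = -np.log(n)        # uniform target marginals, in log-space
    log_nu = -np.log(m)
    for _ in range(max_steps):
        f_new = log_mu - logsumexp(log_kernel + g[None, :], axis=1)
        g_new = log_nu - logsumexp(log_kernel + f_new[:, None], axis=0)
        converged = (np.max(np.abs(f_new - f)) < atol
                     and np.max(np.abs(g_new - g)) < atol)
        f, g = f_new, g_new
        if converged:
            break
    return log_kernel + f[:, None] + g[None, :]
```

With a check like this, well-conditioned problems terminate after a handful of iterations, which is consistent with a runtime drop of the reported magnitude.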

vpratz and others added 8 commits June 22, 2025 04:02
* add log_gamma diagnostic

* add missing export for log_gamma

* add missing export for gamma_null_distribution, gamma_discrepancy

* fix broken unit tests

* rename log_gamma module to sbc

* add test_log_gamma unit test

* add return information to log_gamma doc string

* fix typo in docstring, use fixed-length np array to collect log_gammas instead of appending to an empty list
…525)

* standardization: add test for multi-input values (failing)

This test reveals two bugs in the standardization layer:

- count is updated multiple times
- batch_count is too small, as the sizes from reduce_axes have to be
  multiplied

* breaking: fix bugs regarding count in standardization layer

Fixes #524

This fixes the two bugs described in c4cc133:

- count was accidentally updated, leading to wrong values
- count was calculated wrongly, as only the batch size was used. The
  correct value is the product of all reduce dimensions. This led to
  wrong standard deviations

While the batch dimension is the same for all inputs, the size of the
second dimension might vary. For this reason, we need to introduce an
input-specific `count` variable. This breaks serialization.

* fix assert statement in test
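As a rough illustration of the fixed accounting, the following NumPy sketch updates running moments where `count` grows by the product of all reduced dimensions, not just the batch size. It uses Chan et al.'s parallel-moments update; the function name and state layout are assumptions, not the actual layer:

```python
import numpy as np

def update_moments(state, batch, reduce_axes=(0, 1)):
    # state = (count, mean, m2), where m2 is the running sum of squared
    # deviations. The batch contributes prod(sizes of reduce_axes)
    # observations, not just batch.shape[0].
    batch_count = int(np.prod([batch.shape[a] for a in reduce_axes]))
    batch_mean = batch.mean(axis=reduce_axes)
    batch_var = batch.var(axis=reduce_axes)
    count, mean, m2 = state
    total = count + batch_count
    delta = batch_mean - mean
    mean = mean + delta * batch_count / total
    m2 = m2 + batch_var * batch_count + delta**2 * count * batch_count / total
    return total, mean, m2
```

Using only `batch.shape[0]` for `batch_count` here would shrink the effective sample size and inflate the resulting standard deviations, which matches the symptom described above.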
@LarsKue LarsKue requested a review from Copilot July 4, 2025 09:24
@LarsKue LarsKue self-assigned this Jul 4, 2025
@LarsKue LarsKue added the fix Pull request that fixes a bug label Jul 4, 2025
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This hotfix ensures numerical stability for the non-log-stabilized Sinkhorn implementation and aligns tests to cover both the new and existing normalization methods.

  • Updated the Sinkhorn plan initialization and normalization logic for better stability.
  • Changed max_steps default to None to run until convergence.
  • Extended existing tests to parameterize over "log_sinkhorn" and "sinkhorn" methods.
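The first two bullets can be sketched together in plain NumPy: sum-based row/column normalization and a `max_steps` of None that iterates until convergence. This is a sketch of the general technique under assumed names and defaults, not the bayesflow code:

```python
import numpy as np

def sinkhorn_plan(cost, regularization=1.0, max_steps=None, atol=1e-6):
    # Non-log Sinkhorn: alternately rescale rows and columns by their
    # sums (rather than applying a softmax). max_steps=None means the
    # loop only stops when the plan has converged.
    plan = np.exp(-cost / regularization)
    steps = 0
    while max_steps is None or steps < max_steps:
        prev = plan
        plan = plan / plan.sum(axis=1, keepdims=True)  # normalize rows
        plan = plan / plan.sum(axis=0, keepdims=True)  # normalize columns
        steps += 1
        if np.max(np.abs(plan - prev)) < atol:
            break
    return plan
```

For a square, strictly positive kernel this alternation converges to a doubly stochastic matrix (Sinkhorn-Knopp), which is why an unconditional iteration count is unnecessary once a convergence check is in place.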

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • tests/test_utils/test_optimal_transport.py: Parameterized two tests over both Sinkhorn variants and removed a skip decorator.
  • bayesflow/utils/optimal_transport/sinkhorn.py: Added numerical-stability steps, switched normalization from softmax to sum-based, and updated the max_steps default.
Comments suppressed due to low confidence (1)

bayesflow/utils/optimal_transport/sinkhorn.py:45

  • The default value of None conflicts with the int annotation. Consider changing the signature to max_steps: Optional[int] = None and importing Optional from typing for clarity.
    max_steps: int = None,
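The suggested signature change would look like the following stub; the surrounding parameter names are illustrative, not the actual function:

```python
from typing import Optional

def sinkhorn_plan(
    x1,
    x2,
    regularization: float = 1.0,
    max_steps: Optional[int] = None,  # None: iterate until convergence
    atol: float = 1e-6,
):
    ...
```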


codecov bot commented Jul 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines Coverage Δ
bayesflow/diagnostics/metrics/__init__.py 100.00% <100.00%> (ø)
...sflow/diagnostics/metrics/calibration_log_gamma.py 100.00% <100.00%> (ø)
bayesflow/distributions/diagonal_normal.py 95.34% <100.00%> (-0.11%) ⬇️
bayesflow/distributions/diagonal_student_t.py 95.91% <100.00%> (-0.09%) ⬇️
bayesflow/distributions/mixture.py 98.03% <ø> (ø)
...esflow/networks/standardization/standardization.py 95.45% <100.00%> (ø)
bayesflow/scores/multivariate_normal_score.py 97.72% <100.00%> (ø)
bayesflow/utils/optimal_transport/log_sinkhorn.py 100.00% <100.00%> (ø)
bayesflow/utils/optimal_transport/sinkhorn.py 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

@LarsKue LarsKue requested a review from stefanradev93 July 4, 2025 10:14
@daniel-habermann
Contributor

daniel-habermann commented Jul 5, 2025

I have added some additional commits. Most notable changes are:

  • sinkhorn_plan and log_sinkhorn_plan now return proper transport plans, that is, the row and column sums match the marginals.
  • the assignments in sinkhorn and log_sinkhorn are now calculated as assignments = keras.ops.categorical(log_plan), because keras.ops.categorical expects log-probs instead of probabilities as inputs (which they confusingly call logits).
  • added some additional unit tests to ensure that the transport plans are correct.

@LarsKue could you please double check that the comment in this function definition is what you intended? (the "such that..." part)

def sinkhorn(x1: Tensor, x2: Tensor, seed: int = None, **kwargs) -> Tensor:
    """
    Matches elements from x2 onto x1 using the Sinkhorn-Knopp algorithm.

    Sinkhorn-Knopp is an iterative algorithm that repeatedly normalizes the cost matrix into a
    transport plan, containing assignment probabilities.
    The permutation is then sampled randomly according to the transport plan.

    :param x1: Tensor of shape (n, ...)
        Samples from the first distribution.

    :param x2: Tensor of shape (m, ...)
        Samples from the second distribution.

    :param seed: Random seed to use for sampling indices.
        Default: None, which means the seed will be auto-determined for non-compiled contexts.

    :param kwargs:
        Additional keyword arguments that are passed to :py:func:`sinkhorn_plan`.

    :return: Tensor of shape (n,)
        Assignment indices for x2.

    """
    plan = sinkhorn_plan(x1, x2, **kwargs)

    # we sample from log(plan) to receive assignments of length n, corresponding to indices of x2
    # such that x2[assignments] matches x1
    assignments = keras.random.categorical(keras.ops.log(plan), num_samples=1, seed=seed)
    assignments = keras.ops.squeeze(assignments, axis=1)

    return assignments
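For intuition on the log-probability point above: keras.random.categorical expects log-probabilities (which it calls logits), hence the keras.ops.log(plan) call. The equivalent step in plain NumPy, where np.random samples from probabilities directly, can be sketched as follows (a hypothetical helper, not part of bayesflow):

```python
import numpy as np

def sample_assignments(plan, rng):
    # Sample one index of x2 per row of x1 from the transport plan.
    # NumPy's choice takes probabilities, so no log transform is needed;
    # with keras.random.categorical the rows must be passed as log-probs.
    probs = plan / plan.sum(axis=1, keepdims=True)
    return np.array([rng.choice(plan.shape[1], p=row) for row in probs])
```

With a (near-)deterministic plan, each row of x1 maps to the column holding its probability mass, which is the behavior the unit tests above check.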

From my point of view this is ready to be merged now.

Labels: fix (Pull request that fixes a bug)
4 participants