@Yicong-Huang
Contributor

What changes were proposed in this pull request?

This PR adds explicit .sort_index() calls to pandas API tests for bfill() and backfill() operations to ensure deterministic test results.

Why are the changes needed?

The bfill() (backward fill) and backfill() operations in the pandas API on Spark do not guarantee a specific output row order. Tests that compared results directly, without sorting, could fail intermittently because the natural row order can differ between the pandas and PySpark implementations.

By adding .sort_index() to both sides of the test assertions, we ensure tests are order-independent and deterministic.
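
The idea can be sketched with plain pandas. This is only an illustration, not the test code touched by this PR: the `shuffled` frame is a hypothetical stand-in for a result that comes back from Spark with the same rows in a different order, and `frames_match` is a helper invented for this example.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Reference result computed with plain pandas.
pdf = pd.DataFrame({"x": [1.0, None, 3.0, None, 5.0]}, index=[10, 20, 30, 40, 50])
expected = pdf.bfill()

# Hypothetical stand-in for a distributed result: same rows, different row order.
# A real test would obtain this from a pandas-on-Spark DataFrame instead.
shuffled = expected.loc[[30, 10, 50, 20, 40]]


def frames_match(left, right):
    """Return True if the two frames compare equal, False otherwise."""
    try:
        assert_frame_equal(left, right)
        return True
    except AssertionError:
        return False


# Direct comparison depends on row order and fails here.
order_dependent = frames_match(shuffled, expected)

# Sorting both sides by index removes the dependence on row order.
order_independent = frames_match(shuffled.sort_index(), expected.sort_index())
```

Sorting both sides, rather than only the Spark side, keeps the assertion valid even when the pandas input index is not already sorted.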

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@zhengruifeng
Contributor

merged to master

zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
…al order

Closes apache#52924 from Yicong-Huang/no-ticket/test/no-dependent-on-natural-order.

Authored-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025