[SPARK-47184][PYTHON][CONNECT][TESTS] Make test_repartitionByRange_dataframe reusable #45281

zhengruifeng

What changes were proposed in this pull request?

Make test_repartitionByRange_dataframe reusable

Why are the changes needed?

To make the test reusable in Spark Connect.

Does this PR introduce any user-facing change?

no, test-only

How was this patch tested?

Updated the existing unit test (UT).

Was this patch authored or co-authored using generative AI tooling?

no

self.assertEqual(df3.rdd.first(), df2.rdd.first())
self.assertEqual(df3.rdd.take(3), df2.rdd.take(3))

self.assertEqual(df3.select(spark_partition_id()).distinct().count(), 2)
df.select(spark_partition_id()).distinct().count() is not always equivalent to df.rdd.getNumPartitions() (e.g. when there are empty partitions, or when AQE rules apply), but in this UT the two give the same result.
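
To illustrate the empty-partition caveat in the comment above, here is a minimal plain-Python sketch. It does not use Spark at all: partitions are modeled as lists of rows, and the helper names `num_partitions` and `distinct_partition_ids` are hypothetical stand-ins for `df.rdd.getNumPartitions()` and `df.select(spark_partition_id()).distinct().count()` respectively. It only models why the two counts can differ; it does not reproduce AQE behavior.

```python
# Plain-Python model of partition counting; no Spark involved.
# Each inner list stands for the rows held by one partition (hypothetical data).


def num_partitions(partitions):
    """Stand-in for df.rdd.getNumPartitions(): counts every partition, empty or not."""
    return len(partitions)


def distinct_partition_ids(partitions):
    """Stand-in for df.select(spark_partition_id()).distinct().count().

    A partition id is only observed when at least one row carries it,
    so empty partitions never contribute an id.
    """
    return len({pid for pid, rows in enumerate(partitions) for _ in rows})


balanced = [[1, 2], [3]]          # every partition holds at least one row
with_empty = [[1, 2, 3], [], []]  # two of the three partitions are empty

print(num_partitions(balanced), distinct_partition_ids(balanced))
print(num_partitions(with_empty), distinct_partition_ids(with_empty))
```

In the model, the two counts agree only when every partition holds at least one row, which is the situation the comment describes for this UT.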

@zhengruifeng

merged to master

@zhengruifeng zhengruifeng deleted the connect_test_repartitionByRange_dataframe branch February 27, 2024 10:15
TakawaAkirayo pushed a commit to TakawaAkirayo/spark that referenced this pull request Mar 4, 2024
…ataframe` reusable

### What changes were proposed in this pull request?
Make `test_repartitionByRange_dataframe` reusable

### Why are the changes needed?
to make it reusable in Spark Connect

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
updated ut

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#45281 from zhengruifeng/connect_test_repartitionByRange_dataframe.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024