Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-44708][PYTHON] Migrate test_reset_index assert_eq to use assertDataFrameEqual #45629

Closed
wants to merge 4 commits into from

Conversation

gdhuper
Copy link
Contributor

@gdhuper gdhuper commented Mar 21, 2024

What changes were proposed in this pull request?

This PR updates the python/pyspark/pandas/tests/test_sql.py to use the new PySpark test util function, assertDataFrameEqual, introduced in SPARK-44042.

Why are the changes needed?

Use the new assertDataFrameEqual util function across all tests in PySpark

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing Tests

Was this patch authored or co-authored using generative AI tooling?

No

@HyukjinKwon HyukjinKwon changed the title [SPARK-44708] [PySpark] Migrate test_reset_index assert_eq to use assertDataFrameEqual [SPARK-44708][PYTHON] Migrate test_reset_index assert_eq to use assertDataFrameEqual Mar 21, 2024
@HyukjinKwon
Copy link
Member

cc @itholic

@@ -22,6 +22,7 @@
from pyspark import pandas as ps
from pyspark.testing.pandasutils import PandasOnSparkTestCase
from pyspark.testing.sqlutils import SQLTestUtils
from pyspark.testing.utils import assertDataFrameEqual
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a very big deal, but I personally recommend to use assert_frame_equal instead of assertDataFrameEqual for comparing Pandas API on Spark objects because assert_frame_equal directly leverage the Pandas testing utils internally so it is more proper to compare the Pandas objects:

from pyspark.pandas.testing import assert_frame_equal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #43398 which introduces Pandas-like testing utils FYI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Changed it to assert_frame_equal.

Copy link
Contributor

@itholic itholic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI passed. Also it would be great if we can migrate other pandas tests as well in batch. We can do it in a separate ticket.

@xinrong-meng
Copy link
Member

LGTM thanks!

@HyukjinKwon
Copy link
Member

Merged to master.

sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024
…tDataFrameEqual

### What changes were proposed in this pull request?

This PR updates the [python/pyspark/pandas/tests/test_sql.py](https://github.com/apache/spark/blob/42e5daddf3ba16ff7d08e82e51cd8924cc56e180/python/pyspark/pandas/tests/indexes/test_reset_index.py#L46) to use the new PySpark test util function, assertDataFrameEqual, introduced in [SPARK-44042](https://issues.apache.org/jira/browse/SPARK-44042).

### Why are the changes needed?

Use the new `assertDataFrameEqual` util function across all tests in PySpark

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing Tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#45629 from gdhuper/gdhuper/SPARK-44708.

Authored-by: Gurpreet Singh <gdhuper@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants