-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-44708][PYTHON] Migrate test_reset_index assert_eq to use assertDataFrameEqual #45629
Conversation
cc @itholic |
@@ -22,6 +22,7 @@ | |||
from pyspark import pandas as ps | |||
from pyspark.testing.pandasutils import PandasOnSparkTestCase | |||
from pyspark.testing.sqlutils import SQLTestUtils | |||
from pyspark.testing.utils import assertDataFrameEqual |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a very big deal, but I personally recommend to use assert_frame_equal
instead of assertDataFrameEqual
for comparing Pandas API on Spark objects because assert_frame_equal
directly leverage the Pandas testing utils internally so it is more proper to compare the Pandas objects:
from pyspark.pandas.testing import assert_frame_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #43398 which introduces Pandas-like testing utils FYI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Changed it to assert_frame_equal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if CI passed. Also it would be great if we can migrate other pandas tests as well in batch. We can do it in a separate ticket.
LGTM thanks! |
Merged to master. |
…tDataFrameEqual ### What changes were proposed in this pull request? This PR updates the [python/pyspark/pandas/tests/test_sql.py](https://github.com/apache/spark/blob/42e5daddf3ba16ff7d08e82e51cd8924cc56e180/python/pyspark/pandas/tests/indexes/test_reset_index.py#L46) to use the new PySpark test util function, assertDataFrameEqual, introduced in [SPARK-44042](https://issues.apache.org/jira/browse/SPARK-44042). ### Why are the changes needed? Use the new `assertDataFrameEqual` util function across all tests in PySpark ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing Tests ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#45629 from gdhuper/gdhuper/SPARK-44708. Authored-by: Gurpreet Singh <gdhuper@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This PR updates the python/pyspark/pandas/tests/test_sql.py to use the new PySpark test util function, assertDataFrameEqual, introduced in SPARK-44042.
Why are the changes needed?
Use the new
assertDataFrameEqual
util function across all tests in PySparkDoes this PR introduce any user-facing change?
No
How was this patch tested?
Existing Tests
Was this patch authored or co-authored using generative AI tooling?
No