[SPARK-45554][PYTHON] Introduce flexible parameter to `assertSchemaEqual` #43450

itholic · 2023-10-19T08:08:32Z

What changes were proposed in this pull request?

This PR proposes adding three new parameters to `assertSchemaEqual`: `ignoreNullable`, `ignoreColumnOrder`, and `ignoreColumnName`, to give users more flexibility in schema testing.

Why are the changes needed?

To enhance the utility of assertSchemaEqual by accommodating various common schema comparison scenarios that users might encounter, without necessitating manual adjustments or workarounds.

Does this PR introduce any user-facing change?

Yes. `assertSchemaEqual` now has the option to use the three new parameters:

| Parameter | Type | Comment |
|---|---|---|
| ignoreNullable | Boolean [optional] | Specifies whether a column’s nullable property is included when checking for schema equality. When set to True (default), the nullable property of the columns being compared is not taken into account and the columns will be considered equal even if they have different nullable settings. When set to False, columns are considered equal only if they have the same nullable setting. |
| ignoreColumnOrder | Boolean [optional] | Specifies whether to compare columns in the order they appear in the DataFrames or by column name. When set to False (default), columns are compared in the order they appear in the DataFrames. When set to True, a column in the expected DataFrame is compared to the column with the same name in the actual DataFrame. ignoreColumnOrder cannot be set to True if ignoreColumnName is also set to True. |
| ignoreColumnName | Boolean [optional] | Specifies whether to fail the initial schema equality check if the column names in the two DataFrames are different. When set to False (default), column names are checked and the function fails if they are different. When set to True, the function will succeed even if column names are different. Column data types are compared for columns in the order they appear in the DataFrames. ignoreColumnName cannot be set to True if ignoreColumnOrder is also set to True. |
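
To make the interaction of the three flags concrete, here is a minimal pure-Python sketch of the comparison rules described in the table. It is a toy model, not the PySpark implementation: `Field` and `schemas_match` are hypothetical names invented for illustration, and raising `ValueError` when both `ignoreColumnOrder` and `ignoreColumnName` are True is an assumption about how the "cannot be set" rule might be enforced.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one schema column (not a PySpark class)."""
    name: str
    dtype: str
    nullable: bool = True


def schemas_match(
    actual: List[Field],
    expected: List[Field],
    ignoreNullable: bool = True,
    ignoreColumnOrder: bool = False,
    ignoreColumnName: bool = False,
) -> bool:
    """Toy model of the flag semantics from the table above."""
    # Assumption: the mutually exclusive flags are rejected up front.
    if ignoreColumnOrder and ignoreColumnName:
        raise ValueError("ignoreColumnOrder and ignoreColumnName cannot both be True")
    if len(actual) != len(expected):
        return False
    if ignoreColumnOrder:
        # Pair columns by name instead of by position.
        actual = sorted(actual, key=lambda f: f.name)
        expected = sorted(expected, key=lambda f: f.name)
    for a, e in zip(actual, expected):
        if not ignoreColumnName and a.name != e.name:
            return False
        if a.dtype != e.dtype:
            return False
        if not ignoreNullable and a.nullable != e.nullable:
            return False
    return True


base = [Field("id", "bigint", False), Field("name", "string", True)]
reordered = [Field("name", "string", True), Field("id", "bigint", False)]
renamed = [Field("key", "bigint", False), Field("label", "string", True)]
relaxed = [Field("id", "bigint", True), Field("name", "string", True)]

order_strict = schemas_match(base, reordered)                          # False
order_ignored = schemas_match(base, reordered, ignoreColumnOrder=True)  # True
name_strict = schemas_match(base, renamed)                             # False
name_ignored = schemas_match(base, renamed, ignoreColumnName=True)      # True
null_default = schemas_match(base, relaxed)                            # True
null_strict = schemas_match(base, relaxed, ignoreNullable=False)        # False
```

With PySpark itself, the corresponding calls would pass the same keyword arguments to `assertSchemaEqual`, for example `ignoreColumnOrder=True`, and a mismatch would surface as an assertion failure rather than a `False` return value.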

How was this patch tested?

Added usage examples into doctest for each parameter.

Was this patch authored or co-authored using generative AI tooling?

No.

itholic · 2023-10-19T13:33:28Z

cc @HyukjinKwon @allanf-db

itholic · 2023-10-24T23:47:39Z

CI has passed for this one as well. Gentle reminder for @HyukjinKwon, also cc @ueshin @zhengruifeng.

allisonwang-db

These parameters will be super helpful!

python/pyspark/testing/utils.py

HyukjinKwon · 2023-10-30T02:06:43Z

Merged to master.

[SPARK-45554][PYTHON] Introduce flexible parameter to assertSchemaEqual

d97cb1f

github-actions bot added the PYTHON label Oct 19, 2023

Merge branch 'master' of https://github.com/apache/spark into SPARK-4…

dcd915b

…5554

itholic changed the title ~~[SPARK-45554][PYTHON] Introduce flexible parameter to assertSchemaEqual~~ [SPARK-45554][PYTHON] Introduce flexible parameter to assertSchemaEqual Oct 19, 2023

allisonwang-db reviewed Oct 27, 2023

applied the comments

ea1e8f5

HyukjinKwon approved these changes Oct 30, 2023

HyukjinKwon closed this in 0245b84 Oct 30, 2023

itholic deleted the SPARK-45554 branch November 20, 2023 01:36
