[SPARK-45554][PYTHON] Introduce flexible parameter to assertSchemaEqual
#43450
What changes were proposed in this pull request?
This PR proposes to add three new parameters to `assertSchemaEqual`: `ignoreNullable`, `ignoreColumnOrder`, and `ignoreColumnName`. They give users more flexibility in schema testing.
Why are the changes needed?
To make `assertSchemaEqual` more useful in common schema comparison scenarios, such as differences in nullability, column order, or column names, without requiring manual adjustments or workarounds.
Does this PR introduce any user-facing change?
Yes.
`assertSchemaEqual` now accepts three new parameters:

`ignoreNullable` (default `True`):
- When set to `True` (default), the nullable property of the columns being compared is not taken into account, so columns are considered equal even if their nullable settings differ.
- When set to `False`, columns are considered equal only if they have the same nullable setting.
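The `ignoreNullable` behavior can be illustrated with a minimal pure-Python sketch. It does not call PySpark and is not Spark's implementation. `Field` is a hypothetical stand-in for `StructField`, with the type kept as a plain string and schemas modeled as flat lists. `schemas_match` is a hypothetical helper.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a Spark StructField (type kept as a plain string)."""
    name: str
    data_type: str
    nullable: bool


def schemas_match(actual: List[Field], expected: List[Field],
                  ignore_nullable: bool = True) -> bool:
    """Compare two flat schemas position by position.

    Names and types must always agree; the nullable flag is only
    compared when ignore_nullable is False.
    """
    if len(actual) != len(expected):
        return False
    for a, e in zip(actual, expected):
        if a.name != e.name or a.data_type != e.data_type:
            return False
        if not ignore_nullable and a.nullable != e.nullable:
            return False
    return True


actual = [Field("id", "int", True), Field("name", "string", True)]
expected = [Field("id", "int", False), Field("name", "string", True)]

print(schemas_match(actual, expected))                         # True: nullability ignored by default
print(schemas_match(actual, expected, ignore_nullable=False))  # False: "id" differs in nullability
```

Defaulting to ignoring nullability mirrors the default described above. Schemas that differ only in nullability pass unless the caller opts into the stricter check.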

`ignoreColumnOrder` (default `False`):
- When set to `False` (default), columns are compared in the order they appear in the schemas.
- When set to `True`, each column in the expected schema is compared to the column with the same name in the actual schema.
- `ignoreColumnOrder` cannot be set to `True` if `ignoreColumnName` is also set to `True`.
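The difference between positional and name-based pairing under `ignoreColumnOrder` can be sketched in plain Python. This is not Spark's code and does not use PySpark. `Field` and `schemas_match` are hypothetical. The sketch ignores nullability and assumes column names are unique within each schema.

```python
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a Spark StructField (type kept as a plain string)."""
    name: str
    data_type: str
    nullable: bool


def schemas_match(actual: List[Field], expected: List[Field],
                  ignore_column_order: bool = False) -> bool:
    """Compare names and types of two flat schemas (nullability is ignored here).

    Assumes column names are unique within each schema.
    """
    if len(actual) != len(expected):
        return False
    if ignore_column_order:
        # Pair each expected column with the actual column of the same name.
        by_name: Dict[str, Field] = {f.name: f for f in actual}
        pairs = [(by_name.get(e.name), e) for e in expected]
    else:
        # Pair columns by position.
        pairs = list(zip(actual, expected))
    return all(
        a is not None and a.name == e.name and a.data_type == e.data_type
        for a, e in pairs
    )


actual = [Field("name", "string", True), Field("id", "int", True)]
expected = [Field("id", "int", True), Field("name", "string", True)]

print(schemas_match(actual, expected))                            # False: positions differ
print(schemas_match(actual, expected, ignore_column_order=True))  # True: matched by name
```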

`ignoreColumnName` (default `False`):
- When set to `False` (default), column names are checked, and the function fails if they differ.
- When set to `True`, the function succeeds even if column names differ; column data types are compared position by position, in the order the columns appear in the schemas.
- `ignoreColumnName` cannot be set to `True` if `ignoreColumnOrder` is also set to `True`.
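The following plain-Python sketch models `ignoreColumnName` together with the rule that it cannot be combined with `ignoreColumnOrder`. Presumably the rule exists because reordered columns can only be paired by name. It is a hypothetical model rather than Spark's implementation. `Field` and `schemas_match` are illustrative names. `ValueError` stands in for whatever error Spark actually raises. Nullability is ignored, and names are assumed unique within each schema.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a Spark StructField (type kept as a plain string)."""
    name: str
    data_type: str
    nullable: bool


def schemas_match(actual: List[Field], expected: List[Field],
                  ignore_column_order: bool = False,
                  ignore_column_name: bool = False) -> bool:
    """Compare two flat schemas; nullability is ignored here.

    Assumes column names are unique within each schema.
    """
    if ignore_column_order and ignore_column_name:
        # Reordered columns can only be paired by name, so the flags conflict.
        raise ValueError("ignoreColumnOrder and ignoreColumnName cannot both be True")
    if len(actual) != len(expected):
        return False
    if ignore_column_order:
        # Pair each expected column with the actual column of the same name.
        by_name = {f.name: f for f in actual}
        pairs = [(by_name.get(e.name), e) for e in expected]
    else:
        # Pair columns by position.
        pairs = list(zip(actual, expected))
    return all(
        a is not None
        and (ignore_column_name or a.name == e.name)
        and a.data_type == e.data_type
        for a, e in pairs
    )


actual = [Field("user_id", "int", True), Field("user_name", "string", True)]
expected = [Field("id", "int", True), Field("name", "string", True)]

print(schemas_match(actual, expected))                           # False: names differ
print(schemas_match(actual, expected, ignore_column_name=True))  # True: only types compared
```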
How was this patch tested?
Added doctest usage examples for each parameter.
Was this patch authored or co-authored using generative AI tooling?
No.