[Data Synchronization/Matching] Delegate to Spark for checking existence of columns in the given dataframes #515

rdsharma26 · 2023-10-26T20:15:06Z

Description of changes:

Prior to this change, we did case-sensitive equality checks on non-key columns.
This made the utility more restrictive than Spark itself, which by default does not care about the casing of column names.
With this change, we rely on Spark to check whether a column exists in the given dataframe. If Spark can find the column, we proceed with the rest of the check.
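
To illustrate the behavioral difference, here is a minimal, Spark-free sketch in Python. It is not the code from this PR: the function names `strict_missing` and `resolver_missing` are hypothetical, and the case-insensitive lookup only stands in for Spark's column resolution under its default setting (`spark.sql.caseSensitive` is false by default). In the actual change the lookup is delegated to Spark rather than reimplemented.

```python
from typing import Iterable, List


def strict_missing(columns: Iterable[str], requested: Iterable[str]) -> List[str]:
    """Old behavior as described above: exact, case-sensitive name comparison."""
    known = set(columns)
    return [name for name in requested if name not in known]


def resolver_missing(
    columns: Iterable[str],
    requested: Iterable[str],
    case_sensitive: bool = False,
) -> List[str]:
    """Hypothetical stand-in for asking the engine whether it can resolve a name.

    With case_sensitive=False (assumed here to mirror Spark's default),
    names are compared without regard to casing.
    """
    if case_sensitive:
        known = set(columns)
        return [name for name in requested if name not in known]
    known = {name.lower() for name in columns}
    return [name for name in requested if name.lower() not in known]


if __name__ == "__main__":
    df_columns = ["id", "Name", "State"]   # columns present in the dataframe
    requested = ["name", "STATE", "zip"]   # non-key columns the caller asked for

    print(strict_missing(df_columns, requested))    # ['name', 'STATE', 'zip']
    print(resolver_missing(df_columns, requested))  # ['zip']
```

With the strict comparison, `name` and `STATE` are reported as missing even though Spark would resolve them; with the resolver-style lookup only `zip`, which genuinely does not exist, is rejected.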

Issue #, if available:
N/A

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

eycho-am

LGTM

eycho-am approved these changes Oct 26, 2023

rdsharma26 merged commit a529d4b into awslabs:master Oct 27, 2023
1 check passed

rdsharma26 deleted the dataset-match-column-name-case-issue branch October 27, 2023 15:07
