[Data Synchronization/Matching] Delegate to Spark for checking existence of columns in the given dataframes #515

Merged

rdsharma26

Description of changes:

  • Prior to this change, we did case-sensitive equality checks on the names of non-key columns.
  • This made the utility more restrictive than Spark itself, which by default resolves column names case-insensitively.
  • With this change, we rely on Spark to check whether a column exists in the given dataframe. If Spark can resolve the column, we proceed with the rest of the check.
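
The difference between the two behaviours can be illustrated with a small plain-Python model. This is only a sketch under stated assumptions: it does not use Spark and is not the code from this PR. The helper names `find_exact` and `resolve_like_spark` are hypothetical, and the `case_sensitive` flag stands in for Spark's `spark.sql.caseSensitive` setting, which is `false` by default.

```python
from typing import List, Optional


def find_exact(columns: List[str], name: str) -> Optional[str]:
    """Model of the old check: the name must equal a column name exactly."""
    for col in columns:
        if col == name:
            return col
    return None


def resolve_like_spark(columns: List[str], name: str,
                       case_sensitive: bool = False) -> Optional[str]:
    """Model of delegating the lookup to the engine (hypothetical helper).

    With case_sensitive=False (mirroring Spark's default), casing is ignored.
    An ambiguous name (more than one match) is treated as not resolvable.
    """
    if case_sensitive:
        return find_exact(columns, name)
    matches = [col for col in columns if col.lower() == name.lower()]
    return matches[0] if len(matches) == 1 else None


columns = ["id", "Name", "State"]

# Old behaviour: "name" is rejected because the casing differs.
print(find_exact(columns, "name"))

# New behaviour: the lookup succeeds and returns the actual column name.
print(resolve_like_spark(columns, "name"))
```

In a real Spark job the lookup itself would be left to Spark, for example by asking the dataframe for the column and treating a resolution failure as "column not present"; the exact mechanism used in this PR is not shown on this page.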

Issue #, if available:
N/A

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@eycho-am eycho-am left a comment

LGTM

@rdsharma26 rdsharma26 merged commit a529d4b into awslabs:master Oct 27, 2023
1 check passed
@rdsharma26 rdsharma26 deleted the dataset-match-column-name-case-issue branch October 27, 2023 15:07
rdsharma26 added a commit that referenced this pull request Oct 27, 2023
javierdlrm pushed a commit to javierdlrm/deequ that referenced this pull request Oct 31, 2023
rdsharma26 added a commit that referenced this pull request Nov 1, 2023
rdsharma26 added a commit that referenced this pull request Apr 16, 2024
rdsharma26 added a commit that referenced this pull request Apr 16, 2024
rdsharma26 added a commit that referenced this pull request Apr 16, 2024
rdsharma26 added a commit that referenced this pull request Apr 17, 2024