
Bugfix: Escape Column Name in Generated Spark SQL #85

Merged

Conversation


@malcolmgreaves (Contributor) commented Jan 31, 2019

Issue #, if available:
#84

Description of changes:
This PR properly escapes column names in generated Spark SQL code.

Currently, executing an isContainedIn Check on any DataFrame containing a column whose name includes characters that form a reserved keyword in Spark SQL causes a runtime failure. Digging into it, the root cause of the failure is a Spark SQL syntax error (specifically an org.apache.spark.sql.catalyst.parser.ParseException) raised while parsing the column name.

E.g. if the column name starts with [, then the exception will read as:

org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '[' expecting ...

The fix for this bug is to wrap all bare column names in generated Spark SQL with backticks (`` ` ``). I.e. a query such as $column IS NULL is unsafe, but its escaped variant is safe to execute regardless of the value of $column:

`$column` IS NULL
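The escaping idea can be sketched as a small helper (a Python sketch for illustration only; the actual fix lives in deequ's Scala code, and the backtick-doubling rule assumed here is Spark SQL's convention for quoting identifiers that themselves contain a backtick):

```python
def escape_column(name: str) -> str:
    """Wrap a column name in backticks so Spark SQL treats it as a
    quoted identifier. Backticks inside the name are doubled, which
    is Spark SQL's escape for a literal backtick in an identifier."""
    return "`" + name.replace("`", "``") + "`"

# A name starting with '[' would otherwise raise a ParseException:
print(escape_column("[col]") + " IS NULL")  # → `[col]` IS NULL
```

With this in place, every generated predicate interpolates the escaped name rather than the bare one.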

A new test showcasing this functionality is in the CheckTest class named "Check on column names with special characters". This test executes the isContainedIn Check on both a value list as well as a (minimum, maximum) boundary. It succeeds only when the associated generated Spark SQL code has the escaped column name.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sscdotopen (Contributor) left a comment

Thank you for that perfect pull request!

@sscdotopen merged commit f5afe8a into awslabs:master Feb 1, 2019
@malcolmgreaves (Contributor, Author)

:D

Thank you for accepting & merging! deequ is wonderful!
