>,<,>=,<= checks fail on column names with special characters #89
// ensure that `spark` exists as an instance of `SparkSession`
import org.apache.spark.sql.DataFrame
import com.amazon.deequ.analyzers.Analysis
import com.amazon.deequ.analyzers.runners.AnalysisRunner
import com.amazon.deequ.checks.{ Check, CheckLevel, CheckResult }

// Both column names contain characters that must be escaped in Spark SQL.
case class Wpair(`[this] is a value!!!11!#$`: Long, `[ also a column name:`: Long)

val rawValues = Seq(Long.MaxValue, Long.MaxValue, 25252L, 25252L, 66231L)
// The second column is always 10 less than the first, so `>=` should hold on every row.
val weirdNums = spark.createDataFrame(
  rawValues.zip(rawValues.map { _ - 10L }).map { (Wpair.apply _).tupled }
)

// Same check twice: once with backtick-escaped column names, once without.
val escapedCheck = Check(CheckLevel.Error, """escaped isGreaterThanOrEqualTo""")
  .isGreaterThanOrEqualTo("`[this] is a value!!!11!#$`", "`[ also a column name:`")
val nonEscapedCheck = Check(CheckLevel.Error, """non-escaped isGreaterThanOrEqualTo""")
  .isGreaterThanOrEqualTo("[this] is a value!!!11!#$", "[ also a column name:")

// Run the analyzers the check requires, then evaluate the check against the result.
def analyze(df: DataFrame, check: Check): CheckResult =
  check.evaluate(
    AnalysisRunner
      .onData(df)
      .addAnalyzers(Analysis().analyzers ++ check.requiredAnalyzers())
      .run()
  )

println(s"Check on weird column name, escaped: ${analyze(weirdNums, escapedCheck).status}")
println(s"Check on weird column name, non-escaped: ${analyze(weirdNums, nonEscapedCheck).status}")

Observed output is:
Investigating further with print(analyze(weirdNums, nonEscapedCheck).constraintResults.head.message.get) shows:
Fixed by PR #91
The .isGreaterThan, .isLessThan, .isGreaterThanOrEqualTo, and .isLessThanOrEqualTo methods on the Check type will fail with a Spark SQL SyntaxError at runtime when applied to columns whose names contain special characters or keywords.
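For anyone hitting this before upgrading, one possible workaround is to quote the column names yourself before passing them to the check. The sketch below is only an illustration: ColumnQuoting, quote, and greaterThanOrEqual are hypothetical names that are not part of deequ, and this is not necessarily how PR #91 fixed the issue. It assumes the comparison ends up as a Spark SQL expression string, where identifiers are quoted with backticks and an embedded backtick is escaped by doubling it.

```scala
// Hypothetical helper, not part of deequ: quotes column names so they can be
// embedded safely in a Spark SQL expression string.
object ColumnQuoting {

  /** Wraps `name` in backticks, doubling any backtick it already contains. */
  def quote(name: String): String =
    "`" + name.replace("`", "``") + "`"

  /** Builds a `>=` predicate over two quoted column names (illustrative only). */
  def greaterThanOrEqual(columnA: String, columnB: String): String =
    s"${quote(columnA)} >= ${quote(columnB)}"
}
```

With the column names from the reproduction, ColumnQuoting.greaterThanOrEqual("[this] is a value!!!11!#$", "[ also a column name:") yields the string `[this] is a value!!!11!#$` >= `[ also a column name:`, which matches the escaped form passed to the check that was expected to work.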