New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-44210][CONNECT][SQL][PYTHON] Strengthen type checking and better comply with Connect specifications for levenshtein
function
#41724
Conversation
…better comply with Connect specifications
cc @cloud-fan @MaxGekk |
…better comply with Connect specifications
"inputName" -> "threshold", | ||
"inputType" -> toSQLType(IntegerType), | ||
"inputExpr" -> toSQLExpr(e))) | ||
case Some(e) if e.eval().asInstanceOf[Int] < 0 => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where threshold = 0 is ok.
checkAnswer(df.selectExpr("levenshtein(l, r, null)"), Seq(Row(null), Row(null))) | ||
checkAnswer(df.select(levenshtein($"l", $"r", lit(null))), Seq(Row(null), Row(null))) | ||
|
||
checkError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exception check
} | ||
if (children.length == 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems this if
is unnecessary, we can just do threshold match ...
} | ||
if (children.length == 3) { | ||
threshold match { | ||
case Some(e) if !e.foldable => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the API takes Column
now, I think it's fine to allow non-foldable threshold. We can do the check at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay!
…cking and better comply with Connect specifications
} | ||
threshold match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is a constant, let's check in advance whether the value is valid
val vInt = v.asInstanceOf[Int] | ||
if (vInt < 0) { | ||
throw QueryExecutionErrors.invalidThresholdError(vInt) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do the check at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really possible to enter the vInt < 0
branch here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2223,13 +2236,20 @@ case class Levenshtein( | |||
val leftGen = children.head.genCode(ctx) | |||
val rightGen = children(1).genCode(ctx) | |||
val thresholdGen = thresholdExpr.genCode(ctx) | |||
val thresholdCheckCode = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do the check at runtime.
@@ -2101,6 +2101,11 @@ | |||
], | |||
"sqlState" : "428EK" | |||
}, | |||
"THRESHOLD_VALUE_OUT_OF_RANGE" : { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an error class, or should we keep to using DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE
? @cloud-fan
@panbingkun I suggest use a new jira for this one |
OK. |
messageParameters = Map( | ||
"exprName" -> toSQLId("threshold"), | ||
"valueRange" -> s"[0, ${Int.MaxValue}]", | ||
"currentValue" -> toSQLValue(e.eval().asInstanceOf[Int], IntegerType))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is it possible to calculate e.eval().asInstanceOf[Int]
only once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, Let me switch to a different code logic.
levenshtein
function
…er comply with Connect specifications for levenshtein function
…er comply with Connect specifications for levenshtein function
Friendly ping @cloud-fan |
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
Don't we need this PR? |
What changes were proposed in this pull request?
The pr aims to follow up [SPARK-43493][SPARK-43769][SPARK-43773].
Why are the changes needed?
Strengthen type checking and better comply with Connect specifications.
Does this PR introduce any user-facing change?
No.
How was this patch tested?