[SPARK-44210][CONNECT][SQL][PYTHON] Strengthen type checking and better comply with Connect specifications for levenshtein function #41724

Closed
wants to merge 9 commits into from

panbingkun
Contributor

@panbingkun panbingkun commented Jun 25, 2023

What changes were proposed in this pull request?

This PR is a follow-up to [SPARK-43493][SPARK-43769][SPARK-43773].

Why are the changes needed?

Strengthen type checking and better comply with Connect specifications.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Update existing UTs.
  • Add new UTs.
  • Pass GA.

@panbingkun
Contributor Author

cc @cloud-fan @MaxGekk
Apologies again, I only saw the message today.

"inputName" -> "threshold",
"inputType" -> toSQLType(IntegerType),
"inputExpr" -> toSQLExpr(e)))
case Some(e) if e.eval().asInstanceOf[Int] < 0 =>
Contributor Author

A threshold of 0 is OK; only negative values are rejected.
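
To illustrate why a threshold of 0 is a meaningful input, here is a minimal plain-Scala sketch that does not use Spark. It assumes the thresholded variant returns the distance when it is at most the threshold and -1 otherwise. The object and method names are hypothetical, and this is not the code in this PR.

```scala
// Plain-Scala sketch; ThresholdSketch and its methods are hypothetical names.
object ThresholdSketch {
  /** Classic dynamic-programming Levenshtein distance, keeping one row at a time. */
  def distance(a: String, b: String): Int = {
    // prev(j) holds the distance between the first i-1 chars of a and the first j chars of b.
    var prev = Array.tabulate(b.length + 1)(identity)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  /** Assumed semantics: the distance if it is within `threshold`, otherwise -1. */
  def distanceWithin(a: String, b: String, threshold: Int): Int = {
    // 0 is accepted; only negative thresholds are rejected.
    require(threshold >= 0, s"threshold must be in [0, ${Int.MaxValue}], got $threshold")
    val d = distance(a, b)
    if (d <= threshold) d else -1
  }
}
```

Under these assumptions, a threshold of 0 simply asks whether the two strings are identical, so it is a valid value rather than an error case.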

checkAnswer(df.selectExpr("levenshtein(l, r, null)"), Seq(Row(null), Row(null)))
checkAnswer(df.select(levenshtein($"l", $"r", lit(null))), Seq(Row(null), Row(null)))

checkError(
Contributor Author

Exception check.

@panbingkun panbingkun changed the title from "[SPARK-43493][SPARK-43769][SPARK-43773] Strengthen type checking and better comply with Connect specifications" to "[SPARK-43493][SPARK-43769][SPARK-43773][FOLLOWUP] Strengthen type checking and better comply with Connect specifications" on Jun 25, 2023
}
if (children.length == 3) {
Contributor

It seems this if check is unnecessary; we can just do threshold match ...

}
if (children.length == 3) {
threshold match {
case Some(e) if !e.foldable =>
Contributor

Since the API takes a Column now, I think it's fine to allow a non-foldable threshold. We can do the check at runtime.

Contributor Author

Okay!

}
threshold match {
Contributor Author

If it is a constant, let's check in advance whether the value is valid.
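
A minimal sketch of the two-stage validation discussed in this thread: a constant (foldable) threshold can be validated once, before any row is processed, while a column-valued threshold can only be validated per row at runtime. The `Expr`, `Literal`, and `ColumnRef` types below are hypothetical stand-ins, not Spark's Catalyst classes, and the error messages are illustrative only.

```scala
// Hypothetical stand-ins for expressions; this does not use Spark.
object ThresholdCheckSketch {
  sealed trait Expr {
    def foldable: Boolean
    def eval(row: Map[String, Int]): Int
  }

  // A constant: its value is known without looking at any row.
  final case class Literal(value: Int) extends Expr {
    val foldable: Boolean = true
    def eval(row: Map[String, Int]): Int = value
  }

  // A column reference: its value depends on the current row.
  final case class ColumnRef(name: String) extends Expr {
    val foldable: Boolean = false
    def eval(row: Map[String, Int]): Int = row(name)
  }

  /** Up-front check: only constants can be validated before execution. */
  def checkAtAnalysis(threshold: Expr): Either[String, Unit] =
    if (threshold.foldable) {
      val v = threshold.eval(Map.empty) // evaluated once
      if (v < 0) Left(s"threshold out of range: $v") else Right(())
    } else {
      Right(()) // non-foldable: defer to the runtime check
    }

  /** Runtime check: applied to the threshold value of each row. */
  def thresholdForRow(threshold: Expr, row: Map[String, Int]): Int = {
    val v = threshold.eval(row)
    if (v < 0) throw new IllegalArgumentException(s"threshold out of range: $v")
    v
  }
}
```

The design choice in this sketch is that the up-front check is an optimization for early feedback, while the runtime check remains the one that covers every case.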

val vInt = v.asInstanceOf[Int]
if (vInt < 0) {
throw QueryExecutionErrors.invalidThresholdError(vInt)
}
Contributor Author

Do the check at runtime.

Contributor

Is it really possible to enter the vInt < 0 branch here?

Contributor Author

Yes. [screenshot]

@@ -2223,13 +2236,20 @@ case class Levenshtein(
val leftGen = children.head.genCode(ctx)
val rightGen = children(1).genCode(ctx)
val thresholdGen = thresholdExpr.genCode(ctx)
val thresholdCheckCode =
Contributor Author

Do the check at runtime.

@@ -2101,6 +2101,11 @@
],
"sqlState" : "428EK"
},
"THRESHOLD_VALUE_OUT_OF_RANGE" : {
Contributor Author

Should we add a new error class, or keep using DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE? @cloud-fan

@LuciferYang
Contributor

@panbingkun I suggest using a new JIRA for this one.

@panbingkun
Contributor Author

> @panbingkun I suggest using a new JIRA for this one.

OK.

messageParameters = Map(
"exprName" -> toSQLId("threshold"),
"valueRange" -> s"[0, ${Int.MaxValue}]",
"currentValue" -> toSQLValue(e.eval().asInstanceOf[Int], IntegerType)))
Contributor

nit: Is it possible to calculate e.eval().asInstanceOf[Int] only once?

Contributor Author

OK, let me switch to different code logic.
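
One way to address the nit, sketched in plain Scala: bind the evaluated value to a local `val` and reuse it for both the range check and the error parameters. `CountingExpr` and `outOfRangeParams` are hypothetical names, and the parameter values are formatted with plain `toString` rather than Spark's `toSQLId` / `toSQLValue` helpers, so the strings are illustrative only.

```scala
// Plain-Scala sketch; does not use Spark. All names here are hypothetical.
object EvalOnceSketch {
  /** A constant expression that counts how many times it has been evaluated. */
  final class CountingExpr(value: Int) {
    var evalCount: Int = 0
    def eval(): Any = { evalCount += 1; value }
  }

  /** Returns error parameters if the threshold is negative, evaluating `e` only once. */
  def outOfRangeParams(e: CountingExpr): Option[Map[String, String]] = {
    val v = e.eval().asInstanceOf[Int] // single evaluation, reused below
    if (v < 0) {
      Some(Map(
        "exprName" -> "threshold",
        "valueRange" -> s"[0, ${Int.MaxValue}]",
        "currentValue" -> v.toString))
    } else {
      None
    }
  }
}
```

In this sketch the evaluation counter stays at 1 after a call to `outOfRangeParams`, whether or not the value is in range.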

@panbingkun panbingkun changed the title from "[SPARK-43493][SPARK-43769][SPARK-43773][FOLLOWUP] Strengthen type checking and better comply with Connect specifications" to "[SPARK-44210][CONNECT][SQL][PYTHON] Strengthen type checking and better comply with Connect specifications for levenshtein function" on Jun 27, 2023
@github-actions github-actions bot added the DOCS label Jul 4, 2023
@panbingkun
Contributor Author

Friendly ping @cloud-fan

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Oct 14, 2023
@LuciferYang
Contributor

Don't we need this PR?

@github-actions github-actions bot closed this Oct 15, 2023