[SPARK-19912][SQL] String literals should be escaped for Hive metastore partition pruning #17266
Conversation
```scala
  s"""'$str'"""
} else {
  throw new UnsupportedOperationException(
    """Partition filter cannot have both `"` and `'` characters""")
}
```
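For context, the full quoting rule this diff fragment belongs to can be sketched as a standalone snippet (a minimal reconstruction for illustration, not the exact `HiveShim` code):

```scala
// Minimal standalone sketch of the quoting rule under review (a reconstruction,
// not the exact HiveShim code). The Hive metastore filter grammar has no escape
// mechanism, so pick whichever quote character does not occur in the literal,
// and reject literals that contain both.
object QuoteSketch {
  def quoteStringLiteral(str: String): String = {
    if (!str.contains("\"")) {
      s""""$str""""     // quote with double quotes
    } else if (!str.contains("'")) {
      s"""'$str'"""     // fall back to single quotes
    } else {
      throw new UnsupportedOperationException(
        """Partition filter cannot have both `"` and `'` characters""")
    }
  }
}
```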
The current master also raises an exception for this mixed case:
```scala
scala> spark.table("t1").filter($"p" === "'\"").select($"a").show
java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from ...
...
Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Error parsing partition filter : line 1:8 mismatched character '<EOF>' expecting '"'
```
```scala
table.filter($"p" === """a"'b""").select($"a")
```
Does that return the correct result?
On the current master branch, Hive does not accept that:
```scala
scala> Seq((1, "a\"'b")).toDF("a", "p").write.partitionBy("p").saveAsTable("t1")

scala> spark.table("t1").show()
+---+----+
|  a|   p|
+---+----+
|  1|a"'b|
+---+----+

scala> spark.table("t1").filter($"p" === """a"'b""").select($"a").show
java.lang.RuntimeException:
```
We get the exception from Hive because the literals are not escaped.
Test build #74400 has finished for PR 17266 at commit
We should add the escape, instead of adding quotes. Right?
Based on Hive's doc (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types):

Yep. I tried escaping first, but it doesn't work on the Hive side. I mean for the mixed cases.
BTW, I forgot to thank you. :) Thank you for the review.
Have you tried C-style escaping?
Yes. What I meant is that it is not supported correctly by Hive (as described in its documentation). You can test by changing here with the escaped string.
Could you try Hive to double-check it? Is this a bug in Hive?
It's not a bug in the Hive CLI; it seems to be a limitation of that API.
If you want, I will add some other failure test cases in this PR to make it clear for you and the others.
Ok, how about submitting a separate PR by escaping the string? You can show the reviewers the failure cases there.
Yep, I see. Thanks for the guidance, @gatorsmile!
#17271 failed as expected. The Hive API does not handle filters with escaped strings, e.g. two escaped chars like
Could you investigate the impact of the following two Hive JIRAs? https://issues.apache.org/jira/browse/HIVE-11723 Thank you!
Sure!
For HIVE-11723, it is resolved in SemanticAnalyzer. I think it's possible to bring that into @JoshRosen's repo. Let me backport it to see if this is enough.
Can we say something more in the error message? We should explain that it's a Hive bug and put the Hive JIRA in it.
The following is the error message. Since we do not escape in Spark master, the behavior (incorrect filtering or the error message) is the same as in the master branch of Spark.
HIVE-11723 seems to resolve that in SemanticAnalyzer, so I need to try that soon.
For the cases without an error message, an incorrect result is also a problem in this issue.
## What changes were proposed in this pull request?

Since the current `HiveShim`'s `convertFilters` does not escape string literals, the following correctness issues exist. This PR aims to return the correct result and also show a clearer exception message.

**BEFORE**
```scala
scala> Seq((1, "p1", "q1"), (2, "p1\" and q=\"q1", "q2")).toDF("a", "p", "q").write.partitionBy("p", "q").saveAsTable("t1")

scala> spark.table("t1").filter($"p" === "p1\" and q=\"q1").select($"a").show
+---+
|  a|
+---+
+---+

scala> spark.table("t1").filter($"p" === "'\"").select($"a").show
java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from ...
```

**AFTER**
```scala
scala> spark.table("t1").filter($"p" === "p1\" and q=\"q1").select($"a").show
+---+
|  a|
+---+
|  2|
+---+

scala> spark.table("t1").filter($"p" === "'\"").select($"a").show
java.lang.UnsupportedOperationException: Partition filter cannot have both `"` and `'` characters
```

## How was this patch tested?

Pass the Jenkins tests with new test cases.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #17266 from dongjoon-hyun/SPARK-19912.

(cherry picked from commit 21e366a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
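To illustrate why quoting fixes the BEFORE case, here is a small standalone sketch of building a metastore filter string from an attribute and a value (hypothetical helper names, not the actual `HiveShim` internals): quoting prevents a value like `p1" and q="q1` from being parsed as extra filter syntax.

```scala
// Hypothetical sketch: build a metastore partition-filter string using the
// quoting rule this PR introduces. Names are illustrative, not HiveShim code.
object FilterSketch {
  def toMetastoreFilter(attr: String, value: String): String = {
    val quoted =
      if (!value.contains("\"")) s""""$value""""   // safe to use double quotes
      else if (!value.contains("'")) s"'$value'"   // fall back to single quotes
      else throw new UnsupportedOperationException(
        """Partition filter cannot have both `"` and `'` characters""")
    s"$attr = $quoted"
  }
}
```

For example, `toMetastoreFilter("p", "p1\" and q=\"q1")` produces `p = 'p1" and q="q1'`, which the metastore parses as a single literal instead of an injected second predicate.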
Thanks, merging to master/2.1!
Oh, thank you, @cloud-fan!