[SPARK-33876][SQL] Add length-check for reading char/varchar from tables w/ a external location #30882
cc @cloud-fan @maropu @HyukjinKwon thanks for checking this
Kubernetes integration test starting
Kubernetes integration test status success
 */
object ApplyCharTypePadding extends Rule[LogicalPlan] {
object PaddingAndLengthCheckForCharVarChar extends Rule[LogicalPlan] {
CharVarChar -> CharVarchar
sql/core/src/test/scala/org/apache/spark/sql/execution/command/CharVarcharDDLTestBase.scala
sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala
Test build #133194 has finished for PR 30882 at commit
Kubernetes integration test starting
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test status failure
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #133200 has finished for PR 30882 at commit
 *
 * For a CHAR(N) column/field and the length of string value is M
 * If M > N, raise runtime error
 * If M <= N, the value should be right-padded to N characters.
nit: M <= N -> M < N ?
<= looks more specific with the 'right-padded to N', and the padding is actually applied when M = N
lgtm
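The CHAR(N) rule discussed in this thread can be sketched in plain Scala, without Spark. This is only an illustration of the rule as the doc comment states it, not the code in this PR: `CharReadCheck`, `numChars`, and `readChar` are hypothetical names, the error message is made up, and counting Unicode code points as "characters" is an assumption.

```scala
object CharReadCheck {
  /** Number of Unicode code points in `s`; used as the character count in this sketch (assumption). */
  def numChars(s: String): Int = s.codePointCount(0, s.length)

  /**
   * Hypothetical helper mirroring the rule in the doc comment:
   * for a CHAR(n) column and a value of length m,
   *   m > n   raises a runtime error,
   *   m <= n  right-pads the value with spaces to exactly n characters
   *           (which leaves the value unchanged when m == n).
   */
  def readChar(value: String, n: Int): String = {
    val m = numChars(value)
    if (m > n) {
      // Illustrative message only; not Spark's actual error text.
      throw new RuntimeException(s"value of length $m exceeds CHAR($n)")
    }
    value + " " * (n - m)
  }
}
```

With this helper, `CharReadCheck.readChar("12", 5)` yields `"12   "`, `CharReadCheck.readChar("12", 2)` yields `"12"` unchanged, and `CharReadCheck.readChar("123456", 2)` throws. Treating `M = N` as a padding step of zero spaces is why `<=` and `<` describe the same behavior here.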
Test build #133201 has finished for PR 30882 at commit
Seq("char", "varchar").foreach { typ =>
  withTempPath { dir =>
    withTable("t") {
      sql("SELECT '12' as c0").write.option("path", dir.toString).save()
can we write sql("SELECT '12' as c0").write.format(format).save(dir.toString) to be more robust?
LGTM
Seq("char", "varchar").foreach { typ =>
  withTempPath { dir =>
    withTable("t") {
      sql("SELECT '123456' as c0").write.option("path", dir.toString).save()
ditto
withTempPath { dir =>
  withTable("t") {
    sql("SELECT '12' as c0").write.option("path", dir.toString).save()
    sql(s"CREATE TABLE t (c0 $typ(2)) using $format LOCATION '$dir'")
there is only one column, seems col is a better name.
Test build #133202 has finished for PR 30882 at commit
Kubernetes integration test starting
The last commit is a small update to the
…les w/ a external location

### What changes were proposed in this pull request?

This PR adds the length check to the existing ApplyCharPadding rule. Tables will have external locations when users execute SET LOCATION or CREATE TABLE ... LOCATION. If the location contains over length values we should FAIL ON READ.

### Why are the changes needed?

```sql
spark-sql> INSERT INTO t2 VALUES ('1', 'b12345');
Time taken: 0.141 seconds
spark-sql> alter table t set location '/tmp/hive_one/t2';
Time taken: 0.095 seconds
spark-sql> select * from t;
1 b1234
```

the above case should fail rather than implicitly applying truncation

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests

Closes #30882 from yaooqinn/SPARK-33876.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6da5cdf)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Kubernetes integration test status success
Test build #133219 has finished for PR 30882 at commit
Refer to this link for build results (access rights to CI server needed):
What changes were proposed in this pull request?
This PR adds the length check to the existing ApplyCharPadding rule. Tables will have external locations when users execute
SET LOCATION or CREATE TABLE ... LOCATION. If the location contains over length values we should FAIL ON READ.
Why are the changes needed?
the above case should fail rather than implicitly applying truncation
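The difference between silently truncating an over-length value and failing on read can be sketched in plain Scala, without Spark. This only models the two behaviors under discussion: `ReadSideLengthCheck`, `truncatingRead`, and `checkedRead` are hypothetical names, the declared length of 5 is an assumption inferred from the `b1234` output in the spark-sql session quoted in the commit message, and `String.length` stands in for the character count.

```scala
object ReadSideLengthCheck {
  /** Behavior seen in the spark-sql session: keep only the first n characters, no error. */
  def truncatingRead(value: String, n: Int): String = value.take(n)

  /**
   * Behavior this PR argues for: reject any stored value longer than the declared length.
   * Returns Left with an illustrative message (not Spark's error text) on failure.
   */
  def checkedRead(value: String, n: Int): Either[String, String] =
    if (value.length > n) Left(s"'$value' exceeds declared length $n")
    else Right(value)
}
```

For the stored value `b12345` and an assumed declared length of 5, `truncatingRead` returns `b1234` and hides the mismatch, whereas `checkedRead` returns a `Left`, so the caller learns that the data at the external location does not fit the table schema.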
Does this PR introduce any user-facing change?
no
How was this patch tested?
new tests