Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33876][SQL] Add length-check for reading char/varchar from tables w/ a external location #30882

Closed
wants to merge 6 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Dec 22, 2020

What changes were proposed in this pull request?

This PR adds the length check to the existing ApplyCharPadding rule. Tables will have external locations when users execute
SET LOCATION or CREATE TABLE ... LOCATION. If the location contains over length values we should FAIL ON READ.

Why are the changes needed?

spark-sql> INSERT INTO t2 VALUES ('1', 'b12345');
Time taken: 0.141 seconds
spark-sql> alter table t set location '/tmp/hive_one/t2';
Time taken: 0.095 seconds
spark-sql> select * from t;
1 b1234

the above case should fail rather than implicitly applying truncation

Does this PR introduce any user-facing change?

no

How was this patch tested?

new tests

@github-actions github-actions bot added the SQL label Dec 22, 2020
@yaooqinn
Copy link
Member Author

cc @cloud-fan @maropu @HyukjinKwon thanks for checking this

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37792/

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37792/

*/
object ApplyCharTypePadding extends Rule[LogicalPlan] {
object PaddingAndLengthCheckForCharVarChar extends Rule[LogicalPlan] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CharVarChar -> CharVarchar

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Test build #133194 has finished for PR 30882 at commit 4c92503.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37799/

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37798/

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37799/

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37798/

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37801/

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37801/

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Test build #133200 has finished for PR 30882 at commit 85bf81d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*
* For a CHAR(N) column/field and the length of string value is M
* If M > N, raise runtime error
* If M <= N, the value should be right-padded to N characters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: M <= N -> M < N?

Copy link
Member Author

@yaooqinn yaooqinn Dec 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<= looks more specific with the 'right-padded to N' and the padding is actually applied when =

@maropu
Copy link
Member

maropu commented Dec 22, 2020

lgtm

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Test build #133201 has finished for PR 30882 at commit 59b6900.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Seq("char", "varchar").foreach { typ =>
withTempPath { dir =>
withTable("t") {
sql("SELECT '12' as c0").write.option("path", dir.toString).save()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we write sql("SELECT '12' as c0").write.format(format).save(dir.toString) to be more robust?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Seq("char", "varchar").foreach { typ =>
withTempPath { dir =>
withTable("t") {
sql("SELECT '123456' as c0").write.option("path", dir.toString).save()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

withTempPath { dir =>
withTable("t") {
sql("SELECT '12' as c0").write.option("path", dir.toString).save()
sql(s"CREATE TABLE t (c0 $typ(2)) using $format LOCATION '$dir'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is only one column, seems col is a better name.

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Test build #133202 has finished for PR 30882 at commit b48071e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37817/

@cloud-fan
Copy link
Contributor

The last commit is a small update to the CharVarcharTestSuite and I've verified it locally. I'm merging it to master/3.1, thanks!

@cloud-fan cloud-fan closed this in 6da5cdf Dec 22, 2020
cloud-fan pushed a commit that referenced this pull request Dec 22, 2020
…les w/ a external location

### What changes were proposed in this pull request?
This PR adds the length check to the existing ApplyCharPadding rule. Tables will have external locations when users execute
SET LOCATION or CREATE TABLE ... LOCATION. If the location contains over length values we should FAIL ON READ.

### Why are the changes needed?

```sql
spark-sql> INSERT INTO t2 VALUES ('1', 'b12345');
Time taken: 0.141 seconds
spark-sql> alter table t set location '/tmp/hive_one/t2';
Time taken: 0.095 seconds
spark-sql> select * from t;
1 b1234
```
the above case should fail rather than implicitly applying truncation

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests

Closes #30882 from yaooqinn/SPARK-33876.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6da5cdf)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37817/

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Test build #133219 has finished for PR 30882 at commit 8e0184e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133219/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants