[SPARK-33358][SQL] Return code when command process failed #30263

Closed
wants to merge 1 commit into from

artiship
Contributor

@artiship artiship commented Nov 5, 2020

What changes were proposed in this pull request?

Exit the Spark SQL CLI processing loop when one of the commands (a sub SQL statement) fails.
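The intended control flow can be sketched in Scala, the language of the CLI and its test suite. This is a minimal illustration, not the actual patch: it assumes the script has already been split into statements and that processing a statement returns an integer code (0 for success, like a shell exit status). `processStatements`, `runStatement`, and `ignoreErrors` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

object CliLoopSketch {
  /**
   * Runs statements in order and returns the code of the last statement run.
   * Processing stops at the first non-zero code unless ignoreErrors is true
   * (a hypothetical switch, included only to show where such a choice would go).
   */
  def processStatements(
      statements: Seq[String],
      runStatement: String => Int,
      ignoreErrors: Boolean = false): Int = {
    var ret = 0
    var stop = false
    val it = statements.iterator
    while (!stop && it.hasNext) {
      ret = runStatement(it.next())
      // Fail fast: remember the failing code and skip all remaining statements.
      if (ret != 0 && !ignoreErrors) stop = true
    }
    ret
  }

  def main(args: Array[String]): Unit = {
    val executed = ArrayBuffer.empty[String]
    // Stand-in runner: fails for the missing table, succeeds otherwise.
    val runner: String => Int = { stmt =>
      executed += stmt
      if (stmt.contains("nonexistent_table")) 1 else 0
    }
    val code = processStatements(Seq("select * from nonexistent_table", "select 2"), runner)
    // Prints: exit code: 1, executed: select * from nonexistent_table
    println(s"exit code: $code, executed: ${executed.mkString("; ")}")
  }
}
```

With the default `ignoreErrors = false`, `select 2` is never run and the non-zero code is returned, which is what a caller such as a shell script checking the exit status would need.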

Why are the changes needed?

This is a regression in Apache Spark 3.0.0.

```
$ cat 1.sql
select * from nonexistent_table;
select 2;
```

**Apache Spark 2.4.7**
```
spark-2.4.7-bin-hadoop2.7:$ bin/spark-sql -f 1.sql
20/11/15 16:14:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error in query: Table or view not found: nonexistent_table; line 1 pos 14
```

**Apache Spark 3.0.1**
```
$ bin/spark-sql -f 1.sql
Error in query: Table or view not found: nonexistent_table; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [nonexistent_table]

2
Time taken: 2.786 seconds, Fetched 1 row(s)
```

**Apache Hive 1.2.2**
```
apache-hive-1.2.2-bin:$ bin/hive -f 1.sql

Logging initialized using configuration in jar:file:/Users/dongjoon/APACHE/hive-release/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'nonexistent_table'
```

Does this PR introduce any user-facing change?

Yes. This fixes a regression.

How was this patch tested?

Passes the unit tests.

@github-actions github-actions bot added the SQL label Nov 5, 2020
@HyukjinKwon
Member

@artiship how was this patch tested?

@artiship
Contributor Author

artiship commented Nov 5, 2020

@HyukjinKwon I reproduced this issue and then verified the fix manually in my dev environment. I would like to add a unit test to cover this.

@HyukjinKwon
Member

@artiship can you elaborate the reproducible steps and how you tested manually in your env?

@artiship
Contributor Author

artiship commented Nov 5, 2020

@HyukjinKwon You can see that after the first statement failed, the following statement still got executed, and the whole script finally succeeded.

env

```
spark version: 3.0.1
os: centos 7
```

/tmp/tmp.sql

```
select * from nonexistent_table;
select 2;
```

submit command:

```
export HADOOP_USER_NAME=my-hadoop-user
bin/spark-sql  \
--master yarn \
--deploy-mode client \
--queue my.queue.name \
--conf spark.driver.host=$(hostname -i) \
--conf spark.app.name=spark-test  \
--name "spark-test" \
-f /tmp/tmp.sql
```

execution log:

```
# bin/spark-sql  \
> --master yarn \
> --deploy-mode client \
> --queue my.queue.name \
> --conf spark.driver.host=$(hostname -i) \
> --conf spark.app.name=spark-test  \
> --name "spark-test" \
> -f /tmp/tmp.sql
20/11/06 00:06:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.spark.client.rpc.server.address.use.ip does not exist
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.spark.client.submit.timeout.interval does not exist
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.enforce.bucketing does not exist
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.run.timeout.seconds does not exist
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.support.sql11.reserved.keywords does not exist
20/11/06 00:06:20 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
20/11/06 00:06:20 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
20/11/06 00:06:22 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

Error in query: Table or view not found: nonexistent_table; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [nonexistent_table]

2
2
Time taken: 4.437 seconds, Fetched 1 row(s)
```

@dongjoon-hyun
Member

ok to test

@SparkQA

SparkQA commented Nov 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35287/

@SparkQA

SparkQA commented Nov 6, 2020

Test build #130676 has finished for PR 30263 at commit 96c753c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 6, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35287/

@HyukjinKwon
Copy link
Member

Looks like something went wrong during rebase. You can either resolve it here or just open another PR.

@SparkQA

SparkQA commented Nov 6, 2020

Test build #130687 has finished for PR 30263 at commit 3d71aca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35298/

@artiship
Contributor Author

artiship commented Nov 6, 2020

Added a new test case.

@SparkQA

SparkQA commented Nov 6, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35298/

@artiship
Contributor Author

artiship commented Nov 6, 2020

> Looks like something went wrong during rebase. You can either resolve it here or just open another PR.

@HyukjinKwon Sorry for the incorrect rebase. It has been resolved, and the fix and the test case are now squashed into one commit.

@SparkQA

SparkQA commented Nov 6, 2020

Test build #130710 has finished for PR 30263 at commit db231c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35320/

@SparkQA

SparkQA commented Nov 6, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35320/

@dongjoon-hyun
Member

dongjoon-hyun commented Nov 11, 2020

Hi, @artiship. Does this bug happen only in a YARN environment? It seems to work correctly in the local environment.

```
spark-3.0.1-bin-hadoop3.2:$ cat 1.sql
select * from n;
select 2;

spark-3.0.1-bin-hadoop3.2:$ bin/spark-sql -f 1.sql
Error in query: Table or view not found: n; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [n]

2
Time taken: 2.244 seconds, Fetched 1 row(s)
```

@artiship
Contributor Author

artiship commented Nov 12, 2020

@dongjoon-hyun

I've verified it again using a distribution downloaded from the Apache Spark website:

https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz

This bug can still be reproduced in local mode. spark-sql should stop after the first statement fails.

```
➜  spark-3.0.1-bin-hadoop2.7 cat 1.sql
select * from n;
select 2+2;
➜  spark-3.0.1-bin-hadoop2.7 bin/spark-sql -f 1.sql
20/11/12 11:35:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/12 11:35:38 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
20/11/12 11:35:38 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
20/11/12 11:35:46 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
20/11/12 11:35:46 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore lichuanliang@10.104.16.36
Error in query: Table or view not found: n; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [n]

4
Time taken: 4.39 seconds, Fetched 1 row(s)
```

With no failing statement:

```
➜  spark-3.0.1-bin-hadoop2.7 cat 2.sql
select 1+1;
select 2+2;

➜  spark-3.0.1-bin-hadoop2.7 bin/spark-sql -f 2.sql
20/11/12 11:45:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/12 11:46:06 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
20/11/12 11:46:06 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
20/11/12 11:46:14 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
20/11/12 11:46:14 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore lichuanliang@10.104.16.36
2
Time taken: 5.167 seconds, Fetched 1 row(s)
4
Time taken: 0.151 seconds, Fetched 1 row(s)
```

Member

@dongjoon-hyun dongjoon-hyun left a comment

Given @artiship's results and mine, it seems that this does not always happen.

For me, this doesn't happen on Mac (Big Sur) so far.

```
spark-3.0.1-bin-hadoop3.2:$ sw_vers
ProductName:	macOS
ProductVersion:	11.0.1
spark-3.0.1-bin-hadoop3.2:$ bin/spark-sql -f 1.sql
Error in query: Table or view not found: n; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [n]

4
Time taken: 3.251 seconds, Fetched 1 row(s)
spark-3.0.1-bin-hadoop2.7:$ bin/spark-sql -f 1.sql
Error in query: Table or view not found: n; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [n]

4
Time taken: 5.6 seconds, Fetched 1 row(s)
```

I'm looking at the Linux environment.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Nov 15, 2020

Ah, it seems that there is a misunderstanding due to the PR description, @artiship.

In the PR description and your comment (#30263 (comment)), you mentioned that the output 2 is repeated twice. Is that true?


```
@@ -571,4 +571,13 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
    // the date formatter for `java.sql.LocalDate` must output negative years with sign.
    runCliWithin(1.minute)("SELECT MAKE_DATE(-44, 3, 15);" -> "-0044-03-15")
  }

  test("SPARK-33358 CLI should break when have command failed") {
```
Member

@dongjoon-hyun dongjoon-hyun Nov 16, 2020

This test case seems to succeed without your patch.

Member

@dongjoon-hyun dongjoon-hyun left a comment

@artiship and @HyukjinKwon, I verified this patch manually.
However, the test code is wrong because it always succeeds.
In addition, I realized that the current test framework, `runCliWithin`, has a limitation when testing this kind of behavior. So, I'll proceed to merge without the test case now. We can revise the test framework later.
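One plausible way a CLI test can always succeed: if it only waits for the error message, the Spark 3.0.1 output shown earlier in this thread already contains that message, whether or not the following statement ran. A minimal Scala sketch, with hypothetical helper names that operate on already-captured output lines (it does not use `runCliWithin`), shows that such a check must also assert that the result of the statement after the failure is absent. The `fixed` output below is an assumption (the 3.0.1 output truncated after the error), not a captured log.

```scala
object FailFastCheckSketch {
  /** True if any captured output line contains `expected`. */
  def sawExpected(output: Seq[String], expected: String): Boolean =
    output.exists(_.contains(expected))

  /**
   * True only if the error was reported AND no line equals the result of the
   * statement after the failing one (assumed to be a distinctive value).
   */
  def stoppedAfterFailure(output: Seq[String], error: String, laterResult: String): Boolean =
    sawExpected(output, error) && !output.exists(_.trim == laterResult)

  def main(args: Array[String]): Unit = {
    // Output lines for `select * from n; select 2+2;` on Spark 3.0.1, as quoted above.
    val spark301 = Seq(
      "Error in query: Table or view not found: n; line 1 pos 14;",
      "'Project [*]",
      "+- 'UnresolvedRelation [n]",
      "",
      "4",
      "Time taken: 4.39 seconds, Fetched 1 row(s)")
    // Assumed output once the loop stops at the failure.
    val fixed = spark301.take(3)

    println(sawExpected(spark301, "Table or view not found"))             // true, even without the fix
    println(stoppedAfterFailure(spark301, "Table or view not found", "4")) // false
    println(stoppedAfterFailure(fixed, "Table or view not found", "4"))    // true
  }
}
```

Picking a later statement such as `select 2+2` whose result cannot appear anywhere else in the output keeps the absence check meaningful.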

dongjoon-hyun pushed a commit that referenced this pull request Nov 16, 2020
Exit Spark SQL CLI processing loop if one of the commands (sub sql statement) process failed

This is a regression at Apache Spark 3.0.0.

```
$ cat 1.sql
select * from nonexistent_table;
select 2;
```

**Apache Spark 2.4.7**
```
spark-2.4.7-bin-hadoop2.7:$ bin/spark-sql -f 1.sql
20/11/15 16:14:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error in query: Table or view not found: nonexistent_table; line 1 pos 14
```

**Apache Spark 3.0.1**
```
$ bin/spark-sql -f 1.sql
Error in query: Table or view not found: nonexistent_table; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [nonexistent_table]

2
Time taken: 2.786 seconds, Fetched 1 row(s)
```

**Apache Hive 1.2.2**
```
apache-hive-1.2.2-bin:$ bin/hive -f 1.sql

Logging initialized using configuration in jar:file:/Users/dongjoon/APACHE/hive-release/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'nonexistent_table'
```

Yes. This is a fix of regression.

Pass the UT.

Closes #30263 from artiship/SPARK-33358.

Authored-by: artiship <meilziner@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 1ae6d64)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Member

Thank you so much for your contribution, @artiship.
This landed in the master branch for Apache Spark 3.1.0 and in branch-3.0 for Apache Spark 3.0.2.
SPARK-33358 will be assigned to you. What is your JIRA id?

@dongjoon-hyun
Member

I found your id from the commit log.
SPARK-33358 is assigned to you. Thanks.

@artiship
Contributor Author

@dongjoon-hyun Thanks for your careful review. The output `2` repeated twice is a result I got from my production environment. It seems that a configuration there might make it print both the command and its result.
