[SPARK-33358][SQL] Return code when command process failed #30263
Conversation
@artiship how was this patch tested?
@HyukjinKwon I reproduced this issue and then verified the fix manually from my dev env. I would like to add a unit test to cover this.
@artiship can you elaborate on the reproduction steps and how you tested manually in your env?
@HyukjinKwon You can see that after the first statement failed, the following statement still got executed, which finally let the whole script succeed.

/tmp/tmp.sql:

```sql
select * from nonexistent_table;
select 2;
```

submit command:

```shell
export HADOOP_USER_NAME=my-hadoop-user
bin/spark-sql \
  --master yarn \
  --deploy-mode client \
  --queue my.queue.name \
  --conf spark.driver.host=$(hostname -i) \
  --conf spark.app.name=spark-test \
  --name "spark-test" \
  -f /tmp/tmp.sql
```

execution log:

```
# bin/spark-sql \
>   --master yarn \
>   --deploy-mode client \
>   --queue my.queue.name \
>   --conf spark.driver.host=$(hostname -i) \
>   --conf spark.app.name=spark-test \
>   --name "spark-test" \
>   -f /tmp/tmp.sql
20/11/06 00:06:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.spark.client.rpc.server.address.use.ip does not exist
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.spark.client.submit.timeout.interval does not exist
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.enforce.bucketing does not exist
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.run.timeout.seconds does not exist
20/11/06 00:06:20 WARN HiveConf: HiveConf of name hive.support.sql11.reserved.keywords does not exist
20/11/06 00:06:20 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
20/11/06 00:06:20 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
20/11/06 00:06:22 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Error in query: Table or view not found: nonexistent_table; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [nonexistent_table]
2
2
Time taken: 4.437 seconds, Fetched 1 row(s)
```
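A wrapper script only benefits from this fix if it actually gates downstream steps on the CLI's exit status. A minimal sketch in POSIX shell; `bin/spark-sql` is replaced here by a hypothetical stub (`run_spark_sql`) so the snippet is self-contained:

```shell
#!/bin/sh
# Stub standing in for `bin/spark-sql -f script.sql`: exits non-zero when the
# script references the nonexistent table, as the real CLI should after this fix.
run_spark_sql() {
  grep -q 'nonexistent_table' "$1" && return 1
  return 0
}

printf 'select * from nonexistent_table;\nselect 2;\n' > /tmp/tmp_33358.sql

# Gate the rest of the pipeline on the exit status of the SQL script.
if run_spark_sql /tmp/tmp_33358.sql; then
  echo "script succeeded"
else
  echo "script failed; aborting downstream steps"
fi
```

Before this fix, a script like the one above would take the success branch even though the first statement failed, because the CLI process still exited 0.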
ok to test
Kubernetes integration test starting
Test build #130676 has finished for PR 30263 at commit
Kubernetes integration test status success
Looks like something went wrong during rebase. You can either resolve it here or just open another PR.
Test build #130687 has finished for PR 30263 at commit
Kubernetes integration test starting
add a new test case
Kubernetes integration test status success
@HyukjinKwon Sorry for the incorrect rebase. It has been resolved, and the fix and test case have now been merged into one commit.
Test build #130710 has finished for PR 30263 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Hi, @artiship. Does this bug happen only in a YARN environment? It seems to work correctly in the local environment.
I've verified it again using a distribution downloaded from the Apache Spark website: https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz. This bug can still be reproduced in local mode. spark-sql should break after the first statement fails.
With no statement failing:
Given @artiship's and my results, it seems that this does not always happen. For me, this doesn't happen on Mac (Big Sur) so far.

```
spark-3.0.1-bin-hadoop3.2:$ sw_vers
ProductName:    macOS
ProductVersion: 11.0.1
spark-3.0.1-bin-hadoop3.2:$ bin/spark-sql -f 1.sql
Error in query: Table or view not found: n; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [n]
4
Time taken: 3.251 seconds, Fetched 1 row(s)
spark-3.0.1-bin-hadoop2.7:$ bin/spark-sql -f 1.sql
Error in query: Table or view not found: n; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [n]
4
Time taken: 5.6 seconds, Fetched 1 row(s)
```

I'm looking at the Linux environment.
Ah, it seems that there is a misunderstanding due to the PR description, @artiship. In the PR description and your comment (#30263 (comment)), you mentioned that the output
```
@@ -571,4 +571,13 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
     // the date formatter for `java.sql.LocalDate` must output negative years with sign.
     runCliWithin(1.minute)("SELECT MAKE_DATE(-44, 3, 15);" -> "-0044-03-15")
   }

  test("SPARK-33358 CLI should break when have command failed") {
```
This test case seems to succeed without your patch.
@artiship and @HyukjinKwon. I verified this patch manually.
However, the test code is wrong because it always succeeds.
In addition, I realized that the current test framework, runCliWithin, has limitations for testing this kind of behavior. So, I'll proceed to merge without the test case for now. We can revise the test framework later.
Exit the Spark SQL CLI processing loop if one of the commands (sub SQL statements) fails.

This is a regression at Apache Spark 3.0.0.

```
$ cat 1.sql
select * from nonexistent_table;
select 2;
```

**Apache Spark 2.4.7**

```
spark-2.4.7-bin-hadoop2.7:$ bin/spark-sql -f 1.sql
20/11/15 16:14:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error in query: Table or view not found: nonexistent_table; line 1 pos 14
```

**Apache Spark 3.0.1**

```
$ bin/spark-sql -f 1.sql
Error in query: Table or view not found: nonexistent_table; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [nonexistent_table]
2
Time taken: 2.786 seconds, Fetched 1 row(s)
```

**Apache Hive 1.2.2**

```
apache-hive-1.2.2-bin:$ bin/hive -f 1.sql
Logging initialized using configuration in jar:file:/Users/dongjoon/APACHE/hive-release/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'nonexistent_table'
```

Yes. This is a fix of a regression.

Pass the UT.

Closes #30263 from artiship/SPARK-33358.

Authored-by: artiship <meilziner@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 1ae6d64)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Thank you so much for your contribution, @artiship.
I found your id from the commit log.
@dongjoon-hyun Thanks for your careful review. The output 2 repeated twice is a testing result I got from my production environment. It seems there might be a configuration that makes it print both the command and the result.
What changes were proposed in this pull request?
Exit the Spark SQL CLI processing loop if one of the commands (sub SQL statements) fails.
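The proposed change can be pictured outside of Spark as a processing loop that surfaces the first failing statement's return code instead of continuing. A minimal sketch in POSIX shell; the function names (`process_cmd`, `process_lines`) are illustrative, not Spark's actual internals:

```shell
#!/bin/sh
# Illustrative stand-in for the CLI's per-statement processing:
# returns 0 on success, non-zero on an analysis error.
process_cmd() {
  case "$1" in
    *nonexistent_table*) return 1 ;;
    *) return 0 ;;
  esac
}

# After the fix: break out of the loop on the first failure and
# propagate that statement's return code as the process exit status.
process_lines() {
  for cmd in "$@"; do
    process_cmd "$cmd" || return $?
  done
  return 0
}

process_lines 'select * from nonexistent_table' 'select 2'
echo "exit code: $?"
```

Before the fix, the loop effectively ignored the per-statement return code, so the second statement still ran and the process exited 0 even though the first statement had failed.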
Why are the changes needed?
This is a regression at Apache Spark 3.0.0.
Apache Spark 2.4.7
Apache Spark 3.0.1
Apache Hive 1.2.2
Does this PR introduce any user-facing change?
Yes. This is a fix of regression.
How was this patch tested?
Pass the UT.