
[SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows command scripts #19090

Closed
wants to merge 2 commits

Conversation

@minixalpha (Contributor) commented Aug 31, 2017

What changes were proposed in this pull request?

None of the Windows command scripts can handle quotes in their parameters.

Running a Windows command script with a parameter that contains quotes reproduces the bug:

C:\Users\meng\software\spark-2.2.0-bin-hadoop2.7> bin\spark-shell --driver-java-options " -Dfile.encoding=utf-8 "
'C:\Users\meng\software\spark-2.2.0-bin-hadoop2.7\bin\spark-shell2.cmd" --driver-java-options "' is not recognized as an internal or external command,
operable program or batch file.

Windows treats "--driver-java-options" as part of the command name: cmd /C strips the first and last quote characters from its command string unless the string is just a single quoted executable path, so the quoted script path gets fused with the arguments that follow.
Every Windows command script containing the following pattern has this bug.

cmd /V /E /C "<other command>" %*

We should wrap the command and its parameters in an outer pair of quotes:

cmd /V /E /C ""<other command>" %*"

How was this patch tested?

Tested manually on Windows 10 and Windows 7.

The fix can be verified with the following demo:

C:\Users\meng\program\demo>cat a.cmd
@echo off
cmd /V /E /C "b.cmd" %*

C:\Users\meng\program\demo>cat b.cmd
@echo off
echo %*

C:\Users\meng\program\demo>cat c.cmd
@echo off
cmd /V /E /C ""b.cmd" %*"

C:\Users\meng\program\demo>a.cmd "123"
'b.cmd" "123' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\meng\program\demo>c.cmd "123"
"123"

Taking spark-shell.cmd as an example, changing it to the following makes the command succeed:

cmd /V /E /C ""%~dp0spark-shell2.cmd" %*"
C:\Users\meng\software\spark-2.2.0-bin-hadoop2.7> bin\spark-shell  --driver-java-options " -Dfile.encoding=utf-8 "
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
...
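For context, a minimal sketch of how the same quoting fix would apply to the other wrapper scripts in bin\, assuming they delegate to a matching *2.cmd script the way spark-shell.cmd does; spark-submit.cmd is used here only as an illustration, not quoted from the actual patch:

rem Sketch only (illustration, not the verbatim patch): the same quoting fix
rem applied to another wrapper script that delegates to a *2.cmd file.

rem before: breaks when %* contains quoted arguments, because cmd /C strips
rem the first and last quote characters of the whole command string
cmd /V /E /C "%~dp0spark-submit2.cmd" %*

rem after: the extra outer pair of quotes absorbs the stripped characters,
rem so the quoted script path and %* survive intact
cmd /V /E /C ""%~dp0spark-submit2.cmd" %*"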

@HyukjinKwon (Member)

ok to test

@HyukjinKwon (Member) commented Sep 1, 2017

Looks ok given the examples & syntax - https://ss64.com/nt/cmd.html and https://technet.microsoft.com/en-us/library/cc771320(v=ws.11).aspx and my manual tests.

I think this is the very entry point for Windows users, so I will take a closer look a few more times. Meanwhile, @minixalpha, would you mind checking whether there are any potential corner cases that might not work?

@SparkQA commented Sep 1, 2017

Test build #81305 has finished for PR 19090 at commit 26fc756.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@minixalpha (Contributor, Author)

@HyukjinKwon Thanks for your review! Should I provide more test cases to cover the potential corner cases?

@HyukjinKwon (Member)

I believe that would make this PR much more persuasive.

@minixalpha (Contributor, Author)

OK, I will add more test cases later.

@minixalpha (Contributor, Author) commented Sep 2, 2017

I designed two groups of test cases:

  • Test cases for Windows command script options
  • Examples from the Spark documentation

All of these test cases work well.

Environment

  • Windows 10
  • spark-2.2.0-bin-hadoop2.7

Test cases for Windows command script options

All of these test cases use bin\spark-shell as the example, since the other commands work similarly. For each test case, I record all the Java program options used when running org.apache.spark.launcher.Main and org.apache.spark.deploy.SparkSubmit, and check those options.
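One possible way to capture those command lines on Windows (shown only as a sketch of an approach, not necessarily the one used here) is to dump the full command line of every running java.exe process while the script is starting:

@echo off
rem Sketch: list the command lines of all running java.exe processes so the
rem options passed to org.apache.spark.launcher.Main and
rem org.apache.spark.deploy.SparkSubmit can be inspected.
wmic process where "name='java.exe'" get CommandLine /format:list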

No option

bin\spark-shell

"C:\jdk1.8.0_65\bin\java" -Xmx128m -cp ""C:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars"\*" org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell"

C:\jdk1.8.0_65\bin\java -cp "C:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\conf\;C:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7\bin\..\jars\*" "-Dscala.usejavacp=true" -Xmx1g org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" spark-shell

Has options

One option

Option has no parameter
bin\spark-shell --verbose

"C:\jdk1.8.0_65\bin\java" -Xmx128m -cp ""c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars"\*" org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" --verbose

C:\jdk1.8.0_65\bin\java -cp "c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\conf\;c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars\*" "-Dscala.usejavacp=true" -Xmx1g org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" --verbose spark-shell
Option has parameter
Option parameter has no quotes
bin\spark-shell --driver-java-options -Dfile.encoding=utf-8

"C:\jdk1.8.0_65\bin\java" -Xmx128m -cp ""c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars"\*" org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" --driver-java-options -Dfile.encoding=utf-8

C:\jdk1.8.0_65\bin\java -cp "c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\conf\;c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars\*" "-Dscala.usejavacp=true" -Xmx1g "-Dfile.encoding=utf-8" org.apache.spark.deploy.SparkSubmit --conf "spark.driver.extraJavaOptions=-Dfile.encoding=utf-8" --class org.apache.spark.repl.Main --name "Spark shell" spark-shell
Option parameter has quotes
  • quotes around one parameter
bin\spark-shell --driver-java-options "-Dfile.encoding=utf-8"

"C:\jdk1.8.0_65\bin\java" -Xmx128m -cp ""c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars"\*" org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" --driver-java-options "-Dfile.encoding=utf-8"

C:\jdk1.8.0_65\bin\java -cp "c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\conf\;c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars\*" "-Dscala.usejavacp=true" -Xmx1g "-Dfile.encoding=utf-8" org.apache.spark.deploy.SparkSubmit --conf "spark.driver.extraJavaOptions=-Dfile.encoding=utf-8" --class org.apache.spark.repl.Main --name "Spark shell" spark-shell
  • quotes around multiple parameters
bin\spark-shell --driver-java-options "-Dfile.encoding=utf-8 -Dsun.jnu.encoding=utf-8"

"C:\jdk1.8.0_65\bin\java" -Xmx128m -cp ""c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars"\*" org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" --driver-java-options "-Dfile.encoding=utf-8 -Dsun.jnu.encoding=utf-8"

C:\jdk1.8.0_65\bin\java -cp "c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\conf\;c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars\*" "-Dscala.usejavacp=true" -Xmx1g "-Dfile.encoding=utf-8" "-Dsun.jnu.encoding=utf-8" org.apache.spark.deploy.SparkSubmit --conf "spark.driver.extraJavaOptions=-Dfile.encoding=utf-8 -Dsun.jnu.encoding=utf-8" --class org.apache.spark.repl.Main --name "Spark shell" spark-shell

Multiple options

all options without quotes
bin\spark-shell --name spark-shell-fix --driver-memory 2g

"C:\jdk1.8.0_65\bin\java" -Xmx128m -cp ""c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars"\*" org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" --name spark-shell-fix --driver-memory 2g

C:\jdk1.8.0_65\bin\java -cp "c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\conf\;c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars\*" "-Dscala.usejavacp=true" -Xmx2g org.apache.spark.deploy.SparkSubmit --conf "spark.driver.memory=2g" --class org.apache.spark.repl.Main --name "Spark shell" --name spark-shell-fix spark-shell
some options without quotes
bin\spark-shell --name "spark shell fix" --driver-memory 2g

"C:\jdk1.8.0_65\bin\java" -Xmx128m -cp ""c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars"\*" org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" --name "spark shell fix" --driver-memory 2g

C:\jdk1.8.0_65\bin\java -cp "c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\conf\;c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars\*" "-Dscala.usejavacp=true" -Xmx2g org.apache.spark.deploy.SparkSubmit --conf "spark.driver.memory=2g" --class org.apache.spark.repl.Main --name "Spark shell" --name "spark shell fix" spark-shell
all options with quotes
  • all options with quotes around one parameter
bin\spark-shell --name "spark shell fix" --driver-java-options "-Dfile.encoding=utf-8"

"C:\jdk1.8.0_65\bin\java" -Xmx128m -cp ""c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars"\*" org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" --name "spark shell fix" --driver-java-options "-Dfile.encoding=utf-8"

C:\jdk1.8.0_65\bin\java -cp "c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\conf\;c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars\*" "-Dscala.usejavacp=true" -Xmx1g "-Dfile.encoding=utf-8" org.apache.spark.deploy.SparkSubmit --conf "spark.driver.extraJavaOptions=-Dfile.encoding=utf-8" --class org.apache.spark.repl.Main --name "Spark shell" --name "spark shell fix" spark-shell
  • some options with quotes around multiple parameters
bin\spark-shell --driver-java-options "-Dfile.encoding=utf-8 -Dsun.jnu.encoding=utf-8" --name "spark shell fix"

"C:\jdk1.8.0_65\bin\java" -Xmx128m -cp ""c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars"\*" org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" --driver-java-options "-Dfile.encoding=utf-8 -Dsun.jnu.encoding=utf-8" --name "spark shell fix"

C:\jdk1.8.0_65\bin\java -cp "c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\conf\;c:\spark-pr-19090\spark-2.2.0-bin-hadoop2.7-fix\bin\..\jars\*" "-Dscala.usejavacp=true" -Xmx1g "-Dfile.encoding=utf-8" "-Dsun.jnu.encoding=utf-8" org.apache.spark.deploy.SparkSubmit --conf "spark.driver.extraJavaOptions=-Dfile.encoding=utf-8 -Dsun.jnu.encoding=utf-8" --class org.apache.spark.repl.Main --name "Spark shell" --name "spark shell fix" spark-shell

Examples from the Spark documentation

bin\run-example.cmd SparkPi 10

bin\spark-shell --master local[2]

bin\pyspark --master local[2]

bin\spark-submit examples\src\main\python\pi.py 10

bin\sparkR --master local[2]

bin\spark-submit examples\src\main\r\dataframe.R

bin\spark-shell
   val textFile = spark.read.textFile("README.md")
   textFile.count()

bin\spark-shell --master local[4] --jars C:\Users\meng\.ivy2\jars\com.databricks_spark-avro_2.11-3.2.0.jar

bin\spark-submit --class org.apache.spark.examples.SparkPi --master local[8] C:\Users\meng\IdeaProjects\spark\examples\target\original-spark-examples_2.11-2.2.0.jar 100

@HyukjinKwon According to the results of these test cases, I think this PR works well in different situations. Is there anything else that should be tested?

@HyukjinKwon (Member)

Thanks for the thorough testing. Yeah, looks fine. I will take a few more looks myself.

@HyukjinKwon (Member)

@jsnowacki, would you mind double-checking this PR when you have some time?

@jsnowacki (Contributor)

I've also tested the solution and, indeed, it works as intended, though I have never seen a complaint about this, as people tend to omit quotes rather than add them. The changes also look right and are well motivated. The only thing I'd suggest is adding a comment about the quotes, so that someone does not optimize them away in the future.

@minixalpha (Contributor, Author)

@jsnowacki Thanks for reviewing this PR! There are situations where people cannot omit the quotes, such as passing multiple parameters to "--driver-java-options". For example: passing multiple -D arguments to driver-java-options in spark-submit on Windows.

Actually, I found this bug when I tried to start the Spark interpreter in Apache Zeppelin. When SPARK_HOME is set, Zeppelin adds some options to spark-submit, and some of those options contain quotes, which trigger this bug on Windows. I traced the launch process of the Spark interpreter and finally found that it is a bug in Spark. Without this fix, Zeppelin cannot start the Spark interpreter when SPARK_HOME is set on Windows.

I will add some comments about these quotes.
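For illustration, such a comment might look like the following sketch (hypothetical wording, not the exact text added in this PR), using spark-shell.cmd as the example:

rem Hypothetical wording, for illustration only.
rem The outer pair of quotes is intentional: cmd /C strips the first and last
rem quote characters from the command string, so without them an argument such
rem as --driver-java-options "-Dfile.encoding=utf-8" merges the script path and
rem the options into one unrecognized command (SPARK-21877). Do not remove them.
cmd /V /E /C ""%~dp0spark-shell2.cmd" %*"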

@SparkQA commented Oct 2, 2017

Test build #82385 has finished for PR 19090 at commit 2734282.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 2, 2017

Test build #82386 has finished for PR 19090 at commit 27b6e0b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 2, 2017

Test build #82387 has finished for PR 19090 at commit 4795b0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@minixalpha (Contributor, Author)

@jsnowacki I have already added comments to explain the quotes. Could you help me review them? Thanks.

@jsnowacki (Contributor)

I think the comments are fine and sufficiently explain why the extra quotes exist.

@HyukjinKwon (Member)

Thanks for reviewing, @jsnowacki; let me take a final look. I have also checked everything I could, but let me double-check. I just want to be careful, as this is the entry point.

@HyukjinKwon (Member)

@felixcheung, this one LGTM, as I have checked everything I could and am quite confident; however, I will leave it open for a few more days given its importance. Let me cc you here to double-check when you have some time, or to leave comments if you have any concerns.

@felixcheung (Member) commented Oct 5, 2017 via email

@HyukjinKwon (Member)

Build started: PR-19090

Just in case ..

@HyukjinKwon (Member)

Merged to master.

@asfgit closed this in c7b46d4 on Oct 6, 2017
@minixalpha (Contributor, Author)

Thanks, @HyukjinKwon @jsnowacki @felixcheung
