Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11518] [Deploy, Windows] Handle spaces in Windows command scripts #10789

Closed
wants to merge 5 commits into from

Conversation

tritab
Copy link
Contributor

@tritab tritab commented Jan 17, 2016

No description provided.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@JoshRosen
Copy link
Contributor

Seems reasonable to me on the surface, but I'm not a frequent Windows user so it would be good to have someone else look, too. @tsudukim, would you be interested in looking this over, since you've made a lot of modifications to these scripts in the past?

@tsudukim
Copy link
Contributor

@JoshRosen Thank you for your involvement.

It seems a good fix, but it doesn't work for my environment because we should fix more files to handle spaces properly.
For example, in pyspark2.cmd we should also fix these call lines because %SPARK_HOME% contains space.

...(snip)...
call %SPARK_HOME%\bin\load-spark-env.cmd
...(snip)...
call %SPARK_HOME%\bin\spark-submit2.cmd pyspark-shell-main --name "PySparkShell" %*

As this is just a example, there are many other codes that should be double-quoted other than this.

@srowen
Copy link
Member

srowen commented Jan 19, 2016

@tritab can you have a look at other similar instances in other .cmd files that may need quoting? it'd be good to deal with this all at once.

@JoshRosen
Copy link
Contributor

Aside: has someone written something like shellcheck for Windows shell scripts? That'd be a really quick way to identify all of the places that need quoting, provided such a tool exists.

@tritab
Copy link
Contributor Author

tritab commented Jan 20, 2016

Thanks for the feedback. I'll dig in a bit deeper.

@tritab
Copy link
Contributor Author

tritab commented Jan 20, 2016

I made a number of updates to all of the .cmd files, @tsudukim please review to see how it works in your environment. It appears to handle a spark install under "Program Files" now. However, I have a couple of outstanding concerns that could use some work.

  1. If you ctrl-c out of a command, the prompt is left in the spark directory rather than where you started. This isn't optimal.
  2. pyspark wasn't working for me, but I have 3.4 installed,which may be part of the issue
  3. I wasn't able to test R, so that may not be fully functional

Any feedback is welcome.

@JoshRosen I looked briefly, but wasn't able to find a tool similar to shellcheck for Windows. That certainly would be helpful though.

@tsudukim
Copy link
Contributor

Thank you @tritab .
Actually I'm not trying your new PR yet (because my new PC is not set up), but I looked into your code and I've got 2 concerns about using cd or pushd in the scripts.

The 1st is that as @tritab already mentioned, if we ctrl-c, our terminal might be left in the SPARK_HOME folder.

And the 2nd is, if we change current directory, I'm worried that some command which specifies relative path doesn't work properly. For example, when we execute spark-submit on yarn, we specify application JAR file like this:

bin/spark-submit.cmd --master yarn ...(snip)... lib\spark-examples*.jar

If we change current directory, the relative path seems not to work.
Same problem might occur in other situations, like sending JARs when spark-submit, or loading script when spark-shell or when pyspark etc.

Did you face some problems to use double quotations like

cmd /V /E /C "%~dp0spark-shell2.cmd" %*

instead of using pushd ?

@srowen
Copy link
Member

srowen commented Jan 23, 2016

@tsudukim @tritab if changing directory in controversial, shall we just focus on solving the quoting problem?

@tsudukim
Copy link
Contributor

I think just adding the quotation is good to solve this problem.

@tritab
Copy link
Contributor Author

tritab commented Jan 26, 2016

Thanks for the feedback, I'll rework with quotes and push another commit soon.

@tritab
Copy link
Contributor Author

tritab commented Feb 4, 2016

@tsudukim I reworked pushd with quoting as you suggested. This tests successfully on my Windows 10 machine for spark-shell, pyspark, sparkR and run-example. I would appreciate it if you could review as well.

@srowen
Copy link
Member

srowen commented Feb 6, 2016

The changes are mostly quoting and those look fine to me. There are a few cases where you removed %dp0 and I'm not entirely familiar with this, but I think you know what you're doing. The SPARK_HOME change looks good. I'd love to get a +1 from @tsudukim before merging

@tritab
Copy link
Contributor Author

tritab commented Feb 6, 2016

@srowen Thank you for your review. The%~dp0 changes were to get the current directory working properly. I should mention I tested with spark-submit as well. I did not find any issues with the latest commit. I would also like someone else to test on Windows before merging. Perhaps @andrewor14 would be willing to review this?

@srowen
Copy link
Member

srowen commented Feb 10, 2016

Merged to master. This looks well thought out and correct to my knowledge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants