Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-1876] Windows fixes to deal with latest distribution layout changes #819

Closed
wants to merge 4 commits into from

Conversation

mateiz
Copy link
Contributor

@mateiz mateiz commented May 19, 2014

  • Look for JARs in the right place
  • Launch examples the same way as on Unix
  • Load datanucleus JARs if they exist
  • Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
  • Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)

Also fixed an issues where SparkSubmit was trying to parse local files
as URLs, which fails on Windows because they contain backslashes. We
didn't need to treat those as URLs to check if a file exists.
They are excluded in SBT, but the rule added in Maven didn't actually
remove the files from the JAR. The JARs built still worked despite this,
but it's better to remove them than have 2 copies on the classpath.
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15072/

shift
else
echo "Usage: ./bin/run-example <example-class> [example-args]"
echo " - set MASTER=XX to use a specific master"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this deprecated? I thought in general we want people to use --master, since this goes through Spark submit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess that was already there before

@andrewor14
Copy link
Contributor

Did you mean to also remove the hive check here? https://github.com/apache/spark/blob/master/bin/compute-classpath.sh#L93

@mateiz
Copy link
Contributor Author

mateiz commented May 19, 2014

Yes, I don't want to rely on "jar" being installed. It's not installed by default when you grab a JRE (as far as I can tell). I'd like to eventually do that on Unix too but it's okay to do it in a later release.�

@mateiz
Copy link
Contributor Author

mateiz commented May 19, 2014

My worry with Windows is people downloading pre-built Spark and getting a bizarre behavior. I'm assuming most people will work with pre-built Spark (since you'd mostly use Windows for local development) so those who build it by hand can handle a bit more complexity.

)
set "datanucleus_jars="
for %%d in ("%datanucleus_dir%\datanucleus-*.jar") do (
set datanucleus_jars=!datanucleus_jars!;%%d
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mateiz I just tried this on windows 7 and my classpath includes the string !datanucleus_jars!. It should probably be %datanucleus_jars% instead?

(or did you mean to also setlocal enabledelayedexpansion here?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, how did you try it, you just ran compute-classpath? I set this in spark-class.cmd but not compute-classpath but I guess both need to work.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@mateiz
Copy link
Contributor Author

mateiz commented May 19, 2014

Thanks for the review, @andrewor14! I think I've dealt with all the comments (modulo a few I replied to above). The enabledelayedexpansion thing was very weird; if you call compute-classpath from spark-shell2, you should not set it again in compute-classpath, otherwise its changes to variables will not propagate out. But if you run compute-classpath by itself in a new process (as we do to launch executors), you should set it.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15077/

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15078/

@andrewor14
Copy link
Contributor

I tested this again on Windows building with and without hive and I can verify that it works as we expect. I think this is ready to go.

@tdas
Copy link
Contributor

tdas commented May 19, 2014

@mateiz Is this good for merging?

@mateiz
Copy link
Contributor Author

mateiz commented May 19, 2014

@tdas sure, go for it.

@tdas
Copy link
Contributor

tdas commented May 19, 2014

Great! Thanks! Merging it.

@asfgit asfgit closed this in 7b70a70 May 19, 2014
asfgit pushed a commit that referenced this pull request May 19, 2014
…anges

- Look for JARs in the right place
- Launch examples the same way as on Unix
- Load datanucleus JARs if they exist
- Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
- Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)

Author: Matei Zaharia <matei@databricks.com>

Closes #819 from mateiz/win-fixes and squashes the following commits:

d558f96 [Matei Zaharia] Fix comment
228577b [Matei Zaharia] Review comments
d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout

(cherry picked from commit 7b70a70)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…anges

- Look for JARs in the right place
- Launch examples the same way as on Unix
- Load datanucleus JARs if they exist
- Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
- Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)

Author: Matei Zaharia <matei@databricks.com>

Closes apache#819 from mateiz/win-fixes and squashes the following commits:

d558f96 [Matei Zaharia] Fix comment
228577b [Matei Zaharia] Review comments
d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
Agirish pushed a commit to HPEEzmeral/apache-spark that referenced this pull request May 5, 2022
…0 private package branch (apache#819)

Co-authored-by: Egor Krivokon <>
udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024
…0 private package branch (apache#819)

Co-authored-by: Egor Krivokon <>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants