-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4501][Core] - Create build/mvn to automatically download maven/zinc/scalac #3707
Conversation
…will automatically download, install, and execute maven with zinc for easier build capability
…a file complexities, removed help, removed forced install (defaults now), removed double-dash from cli
Can one of the admins verify this patch? |
Hi @brennonyork I was trying this patch out. Seemed good overall. I felt it would be good to print some info messages that indicates what is happening for example "Maven not found, trying to install" etc. And I am not sure but if this script runs on jenkins then curl's progress bar adds to spam in the jenkins log. |
…gle echo output to denote start of a download if download is needed
@ScrapCodes good suggestion on the jenkins output. Modified it to print a single echo statement if a download is needed and quieted or silenced the |
@pwendell how is this looking from your end given the previous discussions through email threads? |
Bump @pwendell |
# check if we have curl installed | ||
# download application | ||
[ ! -f "${local_tarball}" ] && [ -n "`which curl 2>/dev/null`" ] && \ | ||
echo "exec: curl -s ${remote_tarball}" && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these curls show a status bar similar to our sbt download script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove the -s
flag it would show the download progress, but, as @ScrapCodes mentioned, if we leave it in then it would pummel the Jenkins logs with said download progress. @pwendell your call if you'd rather see it in or not, just let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - I think @nchammas removed these progress bars in 19f61c1 for some other cases. I would actually like to have the progress bars because otherwise it's very confusing what is happening when developing locally. However, we can check whether AMPLAB_JENKINS
is set, and not show them in that case. Similar to here:
Line 80 in daaca14
if [ -n "$AMPLAB_JENKINS" ]; then |
If you wanted to fix the other instance as well @brennonyork, that would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I removed them to clear up the Jenkins logs. It would probably be
nicer as Patrick suggested if the bars were suppressed only when Jenkins is
executing the build.
On 2014년 12월 22일 (월) at 오후 5:32 Patrick Wendell notifications@github.com
wrote:
In build/mvn
#3707 (diff):+# Installs any application tarball given a URL, the expected tarball name,
+# and, optionally, a checkable binary path to determine if the binary has
+# already been installed
+## Arg1 - URL
+## Arg2 - Tarball Name
+## Arg3 - Checkable Binary
+install_app() {
- local remote_tarball="$1/$2"
- local local_tarball="${_DIR}/$2"
- local binary="${_DIR}/$3"
- if [ -z "$3" -o ! -f "$binary" ]; then
check if we already have the tarball
check if we have curl installed
download application
- [ ! -f "${local_tarball}" ] && [ -n "
which curl 2>/dev/null
" ] && \echo "exec: curl -s ${remote_tarball}" && \
I see - I think @nchammas https://github.com/nchammas removed these
progress bars in 19f61c1
19f61c1
for some other cases. I would actually like to have the progress bars
because otherwise it's very confusing what is happening when developing
locally. However, we can check whether AMPLAB_JENKINS is set, and not
show them in that case. Similar to here:Line 80 in daaca14
if [ -n "$AMPLAB_JENKINS" ]; then If you wanted to fix the other instance as well @brennonyork
https://github.com/brennonyork, that would be great.—
Reply to this email directly or view it on GitHub
https://github.com/apache/spark/pull/3707/files#r22194686.
Hey I took a pass with some comments. Thanks for doing this, it's looking great and I think it will speed up development a lot for people who don't want to bother themselves with installing zinc. |
Also one other thing - could you move the "sbt/sbt" scripts to the "build" folder, but write a small script that just invokes "build/sbt" with a notice saying that it has moved? Second, could you update the relevant docs to reflect both the sbt change (the old sbt/sbt path probably has references to it peppered throughout Spark) and the new build/mvn tool? I would say right at the top of our doc that we now bundle a correctly configured maven installation (i.e. in docs/building-spark.md). Also, looking at that doc reminds me that we should add the following in the script:
If |
local binary="${_DIR}/$3" | ||
|
||
# setup `curl` and `wget` silent options if we're running on Jenkins | ||
local curl_opts="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might need to pass --progress-bar
to curl and --progress=bar
to wget. At least this is what our old code did.
…ess bar when run on jenkins, moved sbt script to build/sbt, wrote stub and warning under sbt/sbt which calls build/sbt, modified build/sbt to use the correct directory, fixed bug in build/sbt-launch-lib.bash to correctly pull the sbt version
… sbt/sbt was referenced with build/sbt
@pwendell updated the script per the discussion, updated all places where |
It looks like there's a merge conflict in |
|
||
To run test suites of a specific sub project as follows: | ||
|
||
sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver core/test | ||
build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver core/test | ||
|
||
# Speeding up Compilation with Zinc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could mention build/zinc
under this heading, and maybe mention something about how / when it's auto-started?
I'm testing this out on EC2 right now and it's pretty nice, especially since it's kind of a pain to install Maven since it's not in any of the default EC2 package manager repositories. |
Thanks for updating! It looks like this is still merge-conflicted for some reason, but we can probably resolve this ourselves on merge. This looks good to me. @pwendell, are you good to go on this? Also, do you think we should get Jenkins to run this? |
# the build/ folder | ||
install_scala() { | ||
# determine the Scala version used in Spark | ||
local scala_version=`grep "scala.version" "${_DIR}/../pom.xml" | \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be overkill to use a command-line utility like xmlstarlet
to select this field? I'm worried that seemingly-innocuous changes to the POM might break the Maven build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would probably be less brittle, but I guess it would introduce another dependency which we'd have to install in this script (since we want it to be a one-click installer). Since xmlstarlet
binaries aren't portable, the installation logic might be complex since we'd need to have some platform-specific logic to download the right binaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I had originally used python
to grab this out in a more concrete way, but @pwendell recommended we remove as many dependencies as possible so I switched over to the grep | cut
scenario. Let me think it over and I might be able to find bash
-only way to assuredly grab the correct version. That said, we could always add an entry to project/build.properties
(e.g. scala.version=2.10.4
) and confidently pull it from there without all this hassle. What do you guys think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'd lean towards just leaving it as-is. Doesn't seem to be worth the hassle when compared against the requirement to reduce complexity and dependencies.
Jenkins, test this please. |
Test build #24840 has started for PR 3707 at commit
|
[[ -f "$sbt_opts_file" ]] && set -- $(loadConfigFile "$sbt_opts_file") "$@" | ||
|
||
run "$@" | ||
${_DIR}/../build/sbt $@ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to quote the "$@" to preserve argument arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it does, great catch! I'll be back to a computer in a few hours and can commit the changes, but, if you want to merge before that, feel free to update. Thanks!
-----Original Message-----
From: Patrick Wendell [notifications@github.commailto:notifications@github.com]
Sent: Friday, December 26, 2014 04:13 PM Eastern Standard Time
To: apache/spark
Cc: York, Brennon
Subject: Re: [spark] [SPARK-4501][Core] - Create build/mvn to automatically download maven/zinc/scalac (#3707)
In sbt/sbthttps://github.com//pull/3707#discussion-diff-22288516:
- done
Now, ensure sbt version is used.
- [[ "${sbt_version}XXX" != "XXX" ]] && addJava "-Dsbt.version=$sbt_version"
-}-loadConfigFile() {
- cat "$1" | sed '/^#/d'
-}-# if sbtopts files exist, prepend their contents to $@ so it can be processed by this runner
-[[ -f "$etc_sbt_opts_file" ]] && set -- $(loadConfigFile "$etc_sbt_opts_file") "$@"
-[[ -f "$sbt_opts_file" ]] && set -- $(loadConfigFile "$sbt_opts_file") "$@"-run "$@"
+${_DIR}/../build/sbt $@
Does this need to quote the "$@" to preserve argument arrays?
—
Reply to this email directly or view it on GitHubhttps://github.com//pull/3707/files#r22288516.
The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
One small correctness question (around quoting) but looks good to me. I can merge this later today and fix it manually if @brennonyork doesn't get around to it. |
@brennonyork Does this handle relative paths passed to Maven correctly (if that's a valid potential use case)? We had this problem with the |
This will handle relative directories just fine. The last portion of this script changes the directory back to the -----Original Message----- @brennonyorkhttps://github.com/brennonyork Does this handle relative paths passed to Maven correctly (if that's a valid potential use case)? We had this problem with the spark-ec2 script which was caused by the script changing the working directory on the userhttps://github.com//pull/2988. — The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer. |
Test build #24840 has finished for PR 3707 at commit
|
Test FAILed. |
…/sbt, fixed bug where relative paths would fail if passed in from build/mvn
Jenkins, test this please. LGTM pending tests. |
Test build #24842 has started for PR 3707 at commit
|
Test build #24842 has finished for PR 3707 at commit
|
Test PASSed. |
👍 On average, it would be better to avoid changing the working directly at all when possible, or change and restore it using For now, though, I think this is fine. |
@witgo commented on the actual commit: a3e51cc#commitcomment-9101060
Can you investigate? |
@JoshRosen I can't reproduce this error and, after looking through the code, I'm not seeing where an issue like that could crop up :/ @witgo could you help me understand when you're seeing this and provide me the output of |
@brennonyork Sorry, I tried many times, could not reproduce the issue. |
@witgo just for clarity, does this mean you aren't seeing this issue anymore? Want to ensure you aren't having any more troubles! :) |
Seems to be the issue caused by the command line is killed. |
Creates a top level directory script (as
build/mvn
) to automatically download zinc and the specific version of scala used to easily build spark. This will also download and install maven if the user doesn't already have it and all packages are hosted under thebuild/
directory. Tested on both Linux and OSX OS's and both work. All commands pass through to the maven binary so it acts exactly as a traditional maven call would.