Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4501][Core] - Create build/mvn to automatically download maven/zinc/scalac #3707

Closed
wants to merge 20 commits into from

Conversation

brennonyork
Copy link

Creates a top level directory script (as build/mvn) to automatically download zinc and the specific version of scala used to easily build spark. This will also download and install maven if the user doesn't already have it and all packages are hosted under the build/ directory. Tested on both Linux and OSX OS's and both work. All commands pass through to the maven binary so it acts exactly as a traditional maven call would.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@ScrapCodes
Copy link
Member

Hi @brennonyork

I was trying this patch out. Seemed good overall. I felt it would be good to print some info messages that indicates what is happening for example "Maven not found, trying to install" etc. And I am not sure but if this script runs on jenkins then curl's progress bar adds to spam in the jenkins log.

…gle echo output to denote start of a download if download is needed
@brennonyork
Copy link
Author

@ScrapCodes good suggestion on the jenkins output. Modified it to print a single echo statement if a download is needed and quieted or silenced the curl / wget output.

@brennonyork
Copy link
Author

@pwendell how is this looking from your end given the previous discussions through email threads?

@brennonyork
Copy link
Author

Bump @pwendell

# check if we have curl installed
# download application
[ ! -f "${local_tarball}" ] && [ -n "`which curl 2>/dev/null`" ] && \
echo "exec: curl -s ${remote_tarball}" && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these curls show a status bar similar to our sbt download script?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove the -s flag it would show the download progress, but, as @ScrapCodes mentioned, if we leave it in then it would pummel the Jenkins logs with said download progress. @pwendell your call if you'd rather see it in or not, just let me know!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - I think @nchammas removed these progress bars in 19f61c1 for some other cases. I would actually like to have the progress bars because otherwise it's very confusing what is happening when developing locally. However, we can check whether AMPLAB_JENKINS is set, and not show them in that case. Similar to here:

if [ -n "$AMPLAB_JENKINS" ]; then

If you wanted to fix the other instance as well @brennonyork, that would be great.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I removed them to clear up the Jenkins logs. It would probably be
nicer as Patrick suggested if the bars were suppressed only when Jenkins is
executing the build.
On 2014년 12월 22일 (월) at 오후 5:32 Patrick Wendell notifications@github.com
wrote:

In build/mvn
#3707 (diff):

+# Installs any application tarball given a URL, the expected tarball name,
+# and, optionally, a checkable binary path to determine if the binary has
+# already been installed
+## Arg1 - URL
+## Arg2 - Tarball Name
+## Arg3 - Checkable Binary
+install_app() {

  • local remote_tarball="$1/$2"
  • local local_tarball="${_DIR}/$2"
  • local binary="${_DIR}/$3"
  • if [ -z "$3" -o ! -f "$binary" ]; then
  • check if we already have the tarball

  • check if we have curl installed

  • download application

  • [ ! -f "${local_tarball}" ] && [ -n "which curl 2>/dev/null" ] && \
  •  echo "exec: curl -s ${remote_tarball}" && \
    

I see - I think @nchammas https://github.com/nchammas removed these
progress bars in 19f61c1
19f61c1
for some other cases. I would actually like to have the progress bars
because otherwise it's very confusing what is happening when developing
locally. However, we can check whether AMPLAB_JENKINS is set, and not
show them in that case. Similar to here:

if [ -n "$AMPLAB_JENKINS" ]; then

If you wanted to fix the other instance as well @brennonyork
https://github.com/brennonyork, that would be great.


Reply to this email directly or view it on GitHub
https://github.com/apache/spark/pull/3707/files#r22194686.

@pwendell
Copy link
Contributor

Hey I took a pass with some comments. Thanks for doing this, it's looking great and I think it will speed up development a lot for people who don't want to bother themselves with installing zinc.

@pwendell
Copy link
Contributor

Also one other thing - could you move the "sbt/sbt" scripts to the "build" folder, but write a small script that just invokes "build/sbt" with a notice saying that it has moved? Second, could you update the relevant docs to reflect both the sbt change (the old sbt/sbt path probably has references to it peppered throughout Spark) and the new build/mvn tool? I would say right at the top of our doc that we now bundle a correctly configured maven installation (i.e. in docs/building-spark.md). Also, looking at that doc reminds me that we should add the following in the script:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

If MAVEN_OPTS is unset.

local binary="${_DIR}/$3"

# setup `curl` and `wget` silent options if we're running on Jenkins
local curl_opts=""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might need to pass --progress-bar to curl and --progress=bar to wget. At least this is what our old code did.

Brennon York added 2 commits December 23, 2014 10:43
…ess bar when run on jenkins, moved sbt script to build/sbt, wrote stub and warning under sbt/sbt which calls build/sbt, modified build/sbt to use the correct directory, fixed bug in build/sbt-launch-lib.bash to correctly pull the sbt version
@brennonyork
Copy link
Author

@pwendell updated the script per the discussion, updated all places where sbt/sbt was referenced and replaced it with build/sbt, added the sbt/sbt stub, and created a blurb in the docs about how to build Spark with build/sbt. Check it out the revision and let me know what you think!

@JoshRosen
Copy link
Contributor

It looks like there's a merge conflict in dev/scalastyle; mind fixing that?


To run test suites of a specific sub project as follows:

sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver core/test
build/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver core/test

# Speeding up Compilation with Zinc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could mention build/zinc under this heading, and maybe mention something about how / when it's auto-started?

@JoshRosen
Copy link
Contributor

I'm testing this out on EC2 right now and it's pretty nice, especially since it's kind of a pain to install Maven since it's not in any of the default EC2 package manager repositories.

@JoshRosen
Copy link
Contributor

Thanks for updating! It looks like this is still merge-conflicted for some reason, but we can probably resolve this ourselves on merge.

This looks good to me. @pwendell, are you good to go on this? Also, do you think we should get Jenkins to run this?

# the build/ folder
install_scala() {
# determine the Scala version used in Spark
local scala_version=`grep "scala.version" "${_DIR}/../pom.xml" | \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be overkill to use a command-line utility like xmlstarlet to select this field? I'm worried that seemingly-innocuous changes to the POM might break the Maven build.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would probably be less brittle, but I guess it would introduce another dependency which we'd have to install in this script (since we want it to be a one-click installer). Since xmlstarlet binaries aren't portable, the installation logic might be complex since we'd need to have some platform-specific logic to download the right binaries.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I had originally used python to grab this out in a more concrete way, but @pwendell recommended we remove as many dependencies as possible so I switched over to the grep | cut scenario. Let me think it over and I might be able to find bash-only way to assuredly grab the correct version. That said, we could always add an entry to project/build.properties (e.g. scala.version=2.10.4) and confidently pull it from there without all this hassle. What do you guys think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'd lean towards just leaving it as-is. Doesn't seem to be worth the hassle when compared against the requirement to reduce complexity and dependencies.

@pwendell
Copy link
Contributor

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Dec 26, 2014

Test build #24840 has started for PR 3707 at commit d2d41b6.

  • This patch does not merge cleanly.

[[ -f "$sbt_opts_file" ]] && set -- $(loadConfigFile "$sbt_opts_file") "$@"

run "$@"
${_DIR}/../build/sbt $@
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to quote the "$@" to preserve argument arrays?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does, great catch! I'll be back to a computer in a few hours and can commit the changes, but, if you want to merge before that, feel free to update. Thanks!

-----Original Message-----
From: Patrick Wendell [notifications@github.commailto:notifications@github.com]
Sent: Friday, December 26, 2014 04:13 PM Eastern Standard Time
To: apache/spark
Cc: York, Brennon
Subject: Re: [spark] [SPARK-4501][Core] - Create build/mvn to automatically download maven/zinc/scalac (#3707)

In sbt/sbthttps://github.com//pull/3707#discussion-diff-22288516:

- done

  • Now, ensure sbt version is used.

  • [[ "${sbt_version}XXX" != "XXX" ]] && addJava "-Dsbt.version=$sbt_version"
    -}

-loadConfigFile() {

  • cat "$1" | sed '/^#/d'
    -}

-# if sbtopts files exist, prepend their contents to $@ so it can be processed by this runner
-[[ -f "$etc_sbt_opts_file" ]] && set -- $(loadConfigFile "$etc_sbt_opts_file") "$@"
-[[ -f "$sbt_opts_file" ]] && set -- $(loadConfigFile "$sbt_opts_file") "$@"

-run "$@"
+${_DIR}/../build/sbt $@

Does this need to quote the "$@" to preserve argument arrays?


Reply to this email directly or view it on GitHubhttps://github.com//pull/3707/files#r22288516.


The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

@pwendell
Copy link
Contributor

One small correctness question (around quoting) but looks good to me. I can merge this later today and fix it manually if @brennonyork doesn't get around to it.

@nchammas
Copy link
Contributor

@brennonyork Does this handle relative paths passed to Maven correctly (if that's a valid potential use case)? We had this problem with the spark-ec2 script which was caused by the script changing the working directory on the user.

@brennonyork
Copy link
Author

This will handle relative directories just fine. The last portion of this script changes the directory back to the cwd where the user was calling from so this isn't an issue :)

-----Original Message-----
From: Nicholas Chammas [notifications@github.commailto:notifications@github.com]
Sent: Friday, December 26, 2014 04:30 PM Eastern Standard Time
To: apache/spark
Cc: York, Brennon
Subject: Re: [spark] [SPARK-4501][Core] - Create build/mvn to automatically download maven/zinc/scalac (#3707)

@brennonyorkhttps://github.com/brennonyork Does this handle relative paths passed to Maven correctly (if that's a valid potential use case)? We had this problem with the spark-ec2 script which was caused by the script changing the working directory on the userhttps://github.com//pull/2988.


Reply to this email directly or view it on GitHubhttps://github.com//pull/3707#issuecomment-68158971.


The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

@SparkQA
Copy link

SparkQA commented Dec 26, 2014

Test build #24840 has finished for PR 3707 at commit d2d41b6.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24840/
Test FAILed.

…/sbt, fixed bug where relative paths would fail if passed in from build/mvn
@brennonyork
Copy link
Author

@nchammas I spoke too soon earlier regarding it correctly handling relative paths. I fixed it and is now pwd-preserving. @pwendell I also fixed the improper quoting issue in sbt/sbt.

@pwendell
Copy link
Contributor

Jenkins, test this please. LGTM pending tests.

@SparkQA
Copy link

SparkQA commented Dec 27, 2014

Test build #24842 has started for PR 3707 at commit 0e5a0e4.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 27, 2014

Test build #24842 has finished for PR 3707 at commit 0e5a0e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24842/
Test PASSed.

@nchammas
Copy link
Contributor

@nchammas I spoke too soon earlier regarding it correctly handling relative paths. I fixed it and is now pwd-preserving.

👍

On average, it would be better to avoid changing the working directly at all when possible, or change and restore it using pushd and popd as opposed to cd. That makes it less likely that future changes to the script will introduce errors related to path handling.

For now, though, I think this is fine.

@asfgit asfgit closed this in a3e51cc Dec 27, 2014
@JoshRosen
Copy link
Contributor

@witgo commented on the actual commit: a3e51cc#commitcomment-9101060

It seems that every time you run ./build/mvn had to re-download scala-2.10.4.tgz

Can you investigate?

@brennonyork
Copy link
Author

@JoshRosen I can't reproduce this error and, after looking through the code, I'm not seeing where an issue like that could crop up :/ @witgo could you help me understand when you're seeing this and provide me the output of bash -x ./build/mvn clean? With that I can much better understand how to fix this.

@witgo
Copy link
Contributor

witgo commented Dec 30, 2014

@brennonyork Sorry, I tried many times, could not reproduce the issue.

@brennonyork
Copy link
Author

@witgo just for clarity, does this mean you aren't seeing this issue anymore? Want to ensure you aren't having any more troubles! :)

@witgo
Copy link
Contributor

witgo commented Dec 31, 2014

Seems to be the issue caused by the command line is killed.
But,I tested both in the mac os x and centos, and could not reproduce the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants