[SPARKR][PYSPARK] Fix R source package name to match Spark version. Remove pip tar.gz from distribution #16221

Closed

shivaram
Contributor

@shivaram shivaram commented Dec 8, 2016

What changes were proposed in this pull request?

Fixes name of R source package so that the cp in release-build.sh works correctly.

Issue discussed in #16014 (comment)
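
For illustration, a minimal sketch of the kind of rename that makes the archive name follow the Spark version rather than the version in the R package's DESCRIPTION file. The working directory, the version strings, and the variable names `WORK`, `SPARK_VERSION`, and `R_PKG_VERSION` are placeholders assumed for this example; this is not the contents of the actual patch.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder workspace standing in for the R build directory (assumption).
WORK="$(mktemp -d)"
printf 'Package: SparkR\nVersion: 2.1.1\n' > "$WORK/DESCRIPTION"

# Stand-in for the archive a package build would name after the DESCRIPTION version.
touch "$WORK/SparkR_2.1.1.tar.gz"

# Hypothetical Spark build version that the release script expects in the file name.
SPARK_VERSION="2.1.1-SNAPSHOT"

# Read the package version from DESCRIPTION, then rename the archive to match Spark.
R_PKG_VERSION="$(awk '/^Version:/ {print $2}' "$WORK/DESCRIPTION")"
mv "$WORK/SparkR_${R_PKG_VERSION}.tar.gz" "$WORK/SparkR_${SPARK_VERSION}.tar.gz"

ls "$WORK"
```

With a name derived from the Spark version, a later `cp` that builds the file name from that same version can find the archive.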

@shivaram
Contributor Author

shivaram commented Dec 8, 2016

cc @felixcheung @rxin - I tested this locally by running ./dev/make-distribution.sh --name "hadoop-2.6" --tgz --r -Phadoop-2.6 -Psparkr -Phive -Phive-thriftserver -Pyarn -Pmesos

@felixcheung
Member

felixcheung commented Dec 8, 2016 via email

@shivaram
Contributor Author

shivaram commented Dec 9, 2016

cc @holdenk @rxin - I also tried to fix the pip issue in this PR.

The main issue is that PYSPARK_VERSION isn't available inside the make-distribution script, so for now I just remove all pyspark*.tar.gz files. Let me know if there is a better solution. I'm testing this locally now.
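
A minimal sketch of the glob-based removal described above, run against a throwaway directory. The `DIST_DIR` variable and the file names in it are placeholders assumed for this example, not the script's real layout.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder directory standing in for the distribution's python/dist folder.
DIST_DIR="$(mktemp -d)"
touch "$DIST_DIR/pyspark-2.1.1.dev0.tar.gz" "$DIST_DIR/README.txt"

# The exact version is unknown at this point, so match every pyspark
# source tarball by glob. -f keeps rm quiet when nothing matches.
rm -f "$DIST_DIR"/pyspark*.tar.gz

# Only files that do not match the glob remain.
ls "$DIST_DIR"
```

The trade-off is that the glob removes every matching tarball, not just the one for the current build, which is acceptable only if the directory is expected to hold a single build's output.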

@shivaram shivaram changed the title from "[SPARKR][SPARK-18590] Fix R source package name to match Spark version" to "[SPARKR][PYSPARK] Fix R source package name to match Spark version. Remove pip tar.gz from distribution" Dec 9, 2016
@shivaram
Contributor Author

shivaram commented Dec 9, 2016

@holdenk this seems to work on my machine: ./python/dist/pyspark-2.1.1.dev0.tar.gz is no longer present in the spark-2.1.1-SNAPSHOT-bin-hadoop-2.6.tgz that I built using the command [1].

One more question for you: is it expected that the Spark dependency JARs are part of the pip-installable package? That is, when I look at the contents of, say, pyspark-2.1.0+hadoop2.7.tar.gz from [2], I find that it has all the Spark dependencies in pyspark-2.1.0+hadoop2.7/deps/jars/. I just wanted to check whether that is the expected behavior.

[1] ./dev/make-distribution.sh --name "hadoop-2.6" --tgz --pip --r -Phadoop-2.6 -Psparkr -Phive -Phive-thriftserver -Pyarn -Pmesos

[2] http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/pyspark-2.1.0+hadoop2.7.tar.gz
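
One way to check a tarball for bundled JARs without unpacking it is to list its members and count those under `deps/jars/`. The sketch below builds a small stand-in archive so it runs on its own. The `pyspark-example` name and its contents are made up for illustration and say nothing about what the real package contains.

```shell
#!/usr/bin/env bash
set -eu

# Build a tiny stand-in archive with one JAR under deps/jars/ (all names hypothetical).
WORK="$(mktemp -d)"
mkdir -p "$WORK/pyspark-example/deps/jars"
touch "$WORK/pyspark-example/deps/jars/example.jar" "$WORK/pyspark-example/setup.py"
tar -czf "$WORK/pyspark-example.tar.gz" -C "$WORK" pyspark-example

# List the archive members and count the JAR files under deps/jars/.
# grep -c exits non-zero on zero matches, so "|| true" keeps set -e from aborting.
JAR_COUNT="$(tar -tzf "$WORK/pyspark-example.tar.gz" | grep -c '/deps/jars/.*\.jar$' || true)"
echo "bundled JARs: $JAR_COUNT"
```

Pointing the `tar -tzf` line at a downloaded release tarball instead of the stand-in gives the same count for a real package.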

@holdenk
Contributor

holdenk commented Dec 9, 2016 via email

@SparkQA

SparkQA commented Dec 9, 2016

Test build #69889 has finished for PR 16221 at commit 8a349f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 9, 2016

Test build #69890 has finished for PR 16221 at commit c15916a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor Author

shivaram commented Dec 9, 2016

Thanks @holdenk - I'm going to merge this, since this script isn't tested by Jenkins. I will manually test it by triggering a nightly build in branch-2.1.

asfgit pushed a commit that referenced this pull request Dec 9, 2016
…emove pip tar.gz from distribution

## What changes were proposed in this pull request?

Fixes name of R source package so that the `cp` in release-build.sh works correctly.

Issue discussed in #16014 (comment)

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #16221 from shivaram/fix-sparkr-release-build-name.

(cherry picked from commit 4ac8b20)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
@asfgit asfgit closed this in 4ac8b20 Dec 9, 2016
@SparkQA

SparkQA commented Dec 9, 2016

Test build #69896 has finished for PR 16221 at commit 74d779b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 9, 2016

Test build #69897 has finished for PR 16221 at commit 9b97765.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor Author

shivaram commented Dec 9, 2016

FYI, the pip issue is fixed, as you can see from the listing of the nightly build at http://people.apache.org/~pwendell/spark-nightly/spark-branch-2.1-bin/spark-2.1.1-SNAPSHOT-2016_12_08_18_31-ef5646b-bin/

spark-2.1.1-SNAPSHOT-bin-hadoop2.7.tgz    2016-12-09 02:52    187M

Further, the SparkR build was successful [1], but we are currently missing a line to copy the source archive with FTP. I am sending a PR for that.

[1] https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.1-package/7/console

@felixcheung
Member

I tested a source package with a different version in filename vs DESCRIPTION and it seems to be working fine.
