Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-1637: Clean up examples for 1.0 #571

Closed
wants to merge 4 commits into from

Conversation

techaddict
Copy link
Contributor

  • Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
  • Move Python examples into examples/src/main/python
  • Update docs to reflect these changes

….apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
@@ -22,10 +22,10 @@ import org.apache.spark.SparkContext._
import org.apache.spark.util.collection.OpenHashMap
import scala.collection.JavaConversions.mapAsScalaMap

private[streaming]
private[spark]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this, needed by RawNetworkGrep and KafkaWordCount. There has to be better way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't use any private methods in examples. The usage of RawTextHelper in KafkaWordCount is very simple. warmUp is used in RawNetworkGrep, is it necessary? @tdas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the RawTextHelper was essentially the way to provide data for running benchmarks. However, I havent quite maintained it, so that example code has been languishing in unknown state. I am inclined to keep it there as it is, as long as it hidden behind private[*]. Though I dont see the logic behind changing it from private[streaming] to private[spark]

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@techaddict
Copy link
Contributor Author

@mateiz @rxin We already have KMeans, LogisticRegressionWithSGD and other implementation examples as main functions in their respective objects.

@mateiz
Copy link
Contributor

mateiz commented Apr 28, 2014

Jenkins, add to whitelist and test this please

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14539/

@mengxr
Copy link
Contributor

mengxr commented Apr 28, 2014

@techaddict FYI, I moved main methods to examples/mllib in #584

@techaddict
Copy link
Contributor Author

@mateiz @rxin so this is complete then ?

@mengxr
Copy link
Contributor

mengxr commented Apr 29, 2014

One question: is org.apache.spark.examples a good place for examples? Examples here have access to private[spark] classes. If we are not careful, users may have trouble compiling their own apps if they use the examples as templates to build their own.

@techaddict
Copy link
Contributor Author

@mengxr ya exactly, thats why i commented on the private[spark] change. I think better way would be to make the examples use there own methods, in case of private methods.

@mengxr
Copy link
Contributor

mengxr commented Apr 29, 2014

If Apache allows to use a name space not starting with org.apache, I would recommend using spark.examples instead of org.apache.spark.examples. It also saves some typing with spark-submit.

@pwendell
Copy link
Contributor

@mengxr I'm pretty sure we are only allowed to release classes in the org.apache namespace (unfortunately).

@mengxr
Copy link
Contributor

mengxr commented Apr 29, 2014

I know this is a little weird, but either org.apache.examples.spark or org.apache.spark_examples can help prevent using private[spark] classes.

@techaddict
Copy link
Contributor Author

@mengxr IMO we should re-implement the necessary private functions in the examples, they are not too many just 2-3 methods. It will make using/modifying the examples easier for users.

@tdas
Copy link
Contributor

tdas commented Apr 29, 2014

I have a general question about what should file and package organization of the examples be. For example, for Spark Streaming, is org.apache.spark.examples.streaming.* better than org.apache.spark.streaming.examples.* ? I kind of feel that the former makes mores sense, as it fits better when we refer to them as "spark streaming examples", or "graphx examples".

@mengxr
Copy link
Contributor

mengxr commented Apr 29, 2014

@tdas I'm a little confused. Which one do you prefer, examples.streaming or streaming.examples?

@tdas
Copy link
Contributor

tdas commented Apr 29, 2014

Oops, sorry for the confusion. I meant "latter" instead of "former". That is, I prefer org.apache.spark.streaming.examples over org.apache.spark.examples.streaming., because

  1. Its natural to refer to them as "spark streaming examples"
  2. Consistent with rest of the API, where all the streaming related stuff, library and examples alike, is in org.apache.spark.streaming

@tdas
Copy link
Contributor

tdas commented Apr 29, 2014

Bringing @marmbrus into this discussion as he had added the Spark SQL examples in org.apache.spark.sql.examples.

@marmbrus
Copy link
Contributor

No strong opinions here other than we should be consistent.

@tdas
Copy link
Contributor

tdas commented Apr 29, 2014

Okay. As I said earlier, I find it more consistent to keep it as "org.apache.spark.streaming.examples.*"

@techaddict
Copy link
Contributor Author

@mengxr so what changes should i make other than streaming one suggested by tdas.

@mengxr
Copy link
Contributor

mengxr commented Apr 30, 2014

@techaddict I don't see anything necessary except the one using private[spark] or private[streaming] methods. For consistency, org.apache.examples may be better because many examples are already under this package name.

@techaddict
Copy link
Contributor Author

@mengxr what do you suggest how should we resolve the private[spark] problem ? I'm short on idea.

@AmplabJenkins
Copy link

Build triggered.

@AmplabJenkins
Copy link

Build started.

@mengxr
Copy link
Contributor

mengxr commented Apr 30, 2014

I would say using org.apache.spark.examples for now, which requires little code change. Then we try to avoid using private[spark] in examples. Maybe there is an automatic way to detect it.

@techaddict
Copy link
Contributor Author

@mengxr @47ef86c392badc58052a0414115e49c2970b31eb looks good ?

@AmplabJenkins
Copy link

Build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14582/

@mengxr
Copy link
Contributor

mengxr commented Apr 30, 2014

@techaddict It looks good to me.

@techaddict
Copy link
Contributor Author

@mateiz ready for merging

@tdas
Copy link
Contributor

tdas commented May 1, 2014

No, I have strong feelings against "org.apache.spark.examples.streaming.*" This is entirely inconsistent with rest of the API.

@techaddict
Copy link
Contributor Author

@tdas ok I'll move streaming examples to "org.apache.spark.streaming.examples.*" as you suggest. Any other changes @mateiz

@mateiz
Copy link
Contributor

mateiz commented May 6, 2014

@tdas @techaddict let's not move them IMO. There are two reasons: First, when you browse the folder tree, you want to get into spark/examples and immediately see subfolders, instead of wondering "should I go into spark/examples or spark/streaming or spark/mllib or whatever". Second, it makes it less likely for us to use private[streaming] APIs in there.

@mateiz
Copy link
Contributor

mateiz commented May 6, 2014

BTW the other point of this was to make them consistent across packages, and MLlib and all the Python examples are under examples/ already.

@techaddict
Copy link
Contributor Author

@mateiz So this is good to merge. ?
On May 7, 2014 1:13 AM, "Matei Zaharia" notifications@github.com wrote:

BTW the other point of this was to make them consistent across packages,
and MLlib and all the Python examples are under examples/ already.


Reply to this email directly or view it on GitHub.

@tdas
Copy link
Contributor

tdas commented May 6, 2014

@mateiz I dont think that decision is that big a deal. Its fairly obvious if one sees a streaming folder inside spark/ that those are streaming examples. It feels more inconsistent to have spark.streaming.XXX suddenly become spark.XXX.streaming when everything else is spark.streaming.XXX

@mateiz
Copy link
Contributor

mateiz commented May 7, 2014

Talked with TD offline and he was okay with changing the package name, so I've merged this in. Thanks Sandeep!

techaddict added a commit that referenced this pull request May 7, 2014
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
- [x] Move Python examples into examples/src/main/python
- [x] Update docs to reflect these changes

Author: Sandeep <sandeep@techaddict.me>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes #571 from techaddict/SPARK-1637 and squashes the following commits:

47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples
8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples
5f96121 [Sandeep] Move Python examples into examples/src/main/python
0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
techaddict added a commit that referenced this pull request May 7, 2014
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
- [x] Move Python examples into examples/src/main/python
- [x] Update docs to reflect these changes

Author: Sandeep <sandeep@techaddict.me>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes #571 from techaddict/SPARK-1637 and squashes the following commits:

47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples
8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples
5f96121 [Sandeep] Move Python examples into examples/src/main/python
0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)

(cherry picked from commit a000b5c)
Signed-off-by: Matei Zaharia <matei@databricks.com>
@techaddict techaddict closed this May 7, 2014
techaddict added a commit to techaddict/spark that referenced this pull request May 7, 2014
…e PR apache#571 was last updated.

This resulted in Compilation Errors.
asfgit pushed a commit that referenced this pull request May 7, 2014
…e PR #571 was last updated.

This resulted in Compilation Errors.
cc @mateiz project not compiling currently.

Author: Sandeep <sandeep@techaddict.me>

Closes #673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits:

b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR #571 was last updated. This resulted in Compilation Errors.
asfgit pushed a commit that referenced this pull request May 7, 2014
…e PR #571 was last updated.

This resulted in Compilation Errors.
cc @mateiz project not compiling currently.

Author: Sandeep <sandeep@techaddict.me>

Closes #673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits:

b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR #571 was last updated. This resulted in Compilation Errors.
(cherry picked from commit fdae095)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
pwendell pushed a commit to pwendell/spark that referenced this pull request May 12, 2014
SPARK-1072 Use binary search when needed in RangePartioner

Author: Holden Karau <holden@pigscanfly.ca>

Closes apache#571 and squashes the following commits:

f31a2e1 [Holden Karau] Swith to using CollectionsUtils in Partitioner
4c7a0c3 [Holden Karau] Add CollectionsUtil as suggested by aarondav
7099962 [Holden Karau] Add the binary search to only init once
1bef01d [Holden Karau] CR feedback
a21e097 [Holden Karau] Use binary search if we have more than 1000 elements inside of RangePartitioner
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
- [x] Move Python examples into examples/src/main/python
- [x] Update docs to reflect these changes

Author: Sandeep <sandeep@techaddict.me>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes apache#571 from techaddict/SPARK-1637 and squashes the following commits:

47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples
8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples
5f96121 [Sandeep] Move Python examples into examples/src/main/python
0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…e PR apache#571 was last updated.

This resulted in Compilation Errors.
cc @mateiz project not compiling currently.

Author: Sandeep <sandeep@techaddict.me>

Closes apache#673 from techaddict/SPARK-1637-HOTFIX and squashes the following commits:

b512f4f [Sandeep] [SPARK-1637][HOTFIX] There are some Streaming examples added after the PR apache#571 was last updated. This resulted in Compilation Errors.
meisam pushed a commit to meisam/spark that referenced this pull request Sep 24, 2014
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
The new role "install-docker" install docker-ce by using get.docker.com
script, it support multiple OS and arch, and keep consistent docker-ce
version, mark role "install-docker-ce" as deprecated, and apply new role
in all of docker-machine jobs. Update tag of docker-machine job from
latest-release to 18.06 in order to avoid confusions.

Related-Bug: theopenlab/openlab#308
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants