General cleanup of Scala Similarity and Drivers by pferrel · Pull Request #76 · apache/mahout

pferrel · 2015-02-18T19:39:37Z

Many small cleanup changes:

simplified drivers
removed any reference to cross-indicator and most references to indicator
shortened long lines
cleaned up comments and scaladoc annotations
replaced old use of o.a.m.math.Pair with Scala tuples
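
As a rough illustration of the last item, here is a minimal sketch of swapping a pair object for a Scala tuple. `SimplePair`, `minMaxPair`, and `minMax` are hypothetical names made up for this example; `SimplePair` only stands in for a Java-style pair class and is not Mahout's actual API.

```scala
object TupleSketch {
  // Hypothetical stand-in for a Java-style pair class; not Mahout's actual API.
  final class SimplePair[A, B](val first: A, val second: B)

  // Before: wrap the two results in an explicit pair object.
  def minMaxPair(xs: Seq[Int]): SimplePair[Int, Int] =
    new SimplePair(xs.min, xs.max)

  // After: return a plain Scala tuple, which callers can destructure directly.
  def minMax(xs: Seq[Int]): (Int, Int) = (xs.min, xs.max)
}
```

A caller can then write `val (lo, hi) = TupleSketch.minMax(Seq(3, 1, 2))` instead of reading `first` and `second` off a wrapper.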

Tested spark-itemsimilarity on cluster but not naive Bayes stuff

Decided not to remove scopt. Removing it would be more trouble than it's worth given how small the lib is. It still may be worth using case classes for options to get rid of verbose casts but doesn't seem pressing.
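
To make the case-class idea concrete, here is a small sketch that contrasts an untyped option map, which needs casts, with a typed case class. It deliberately leaves scopt out, and `DriverOptions`, `input`, and `maxPrefs` are assumed names for illustration only, not the drivers' real options.

```scala
object OptionsSketch {
  // Untyped style: every read needs a cast, and a mistyped key fails at runtime.
  val untyped: Map[String, Any] = Map("input" -> "in.tsv", "maxPrefs" -> 500)
  val maxPrefsCast: Int = untyped("maxPrefs").asInstanceOf[Int]

  // Typed style: a case class with defaults; field names and types are
  // checked by the compiler, so no casts are needed.
  final case class DriverOptions(input: String = "", maxPrefs: Int = 500)

  // Parsing would build the final value by copying with each supplied option.
  val typed: DriverOptions = DriverOptions().copy(input = "in.tsv")
}
```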

This PR doesn't touch the job.jar assembly in the spark module. I have a pare down of that waiting for other refactoring commits from @dlyubimov.

No other work planned on this PR

dlyubimov · 2015-02-18T20:28:31Z

Use of canonical relative paths is IMO preferred.

I still don't understand why the canonical path is replaced with a non-canonical path here.

No idea. I don't remember doing anything with that, I'll look.

andrewpalumbo · 2015-02-19T00:19:21Z

Tested TrainNBDriver and TestNBDriver from the command line with no problems. I need to do some cleanup of the rest of the Naive Bayes classes as well.

andrewpalumbo · 2015-02-19T00:21:59Z

An accidental replacement of "indicator" with "similarity" here?

thought I got those, thanks

pferrel · 2015-02-19T18:54:22Z

As soon as this looks OK, I'd like to work on getting Spark 1.2.1 in next, so let me know.

andrewpalumbo · 2015-02-19T21:39:18Z

The only thing that stands out to me is the move to scaladoc style doc comments (away from the spark/java style doc comments). I'm not sure if it matters, but as Dmitriy mentioned above, we should really pick one style to go with. Spark style seems to be the de facto convention. It would make it easier to point people to the Spark style guide as discussed on the dev list.

pferrel · 2015-02-19T22:26:10Z

I don't recall a decision on that or I wouldn't have set my line width to 120. If we are going to be that exacting, let's make sure everyone agrees and I'll change this later if necessary. I need to get Spark 1.2.1 in this week if possible.

andrewpalumbo · 2015-02-19T22:34:45Z

I don't think we really made any hard decision on it, we just discussed it a bit. I don't see any reason not to commit and revise later after we decide.

pferrel · 2015-02-19T23:26:08Z

Let me see what I can do. It should be easy to change the scaladoc comments, it's just the first line. 100 chars will be harder. Would be nice to get it over with.

dlyubimov · 2015-02-19T23:30:55Z

I'd suggest being OK with 120 characters. At least our Java style is compliant with 120-character lines.

Spark people were also OK with 120 characters in my contributions (in fact, initially the 100-character constraint there applied only to comment style, not to the code, for reasons nobody truly knows).

pferrel · 2015-02-19T23:33:15Z

That will make it much easier, ok Andrew?

andrewpalumbo · 2015-02-19T23:33:28Z

I was only referring to the comment style ... not the 100 chars limit.

andrewpalumbo · 2015-02-19T23:41:04Z

Basically what we were talking about on this thread:

http://mail-archives.apache.org/mod_mbox/mahout-dev/201501.mbox/%3CBLU436-SMTP226CD60AFFD7C2AE452278694320%40phx.gbl%3E

pferrel · 2015-02-19T23:53:02Z

So something like this?

javadoc style documentation comments like Spark
For Scala follow: http://docs.scala-lang.org/style/
We are relaxing the required use of infix for all appropriate method calls and encouraging
“dot” notation. Spark goes further to say you must not use infix unless it’s an operator
method.
Line lengths not to exceed 120 chars—Spark limits to 100

I'm all for using infix as little as possible. It really does make
scala unnecessarily difficult to read for those coming from other
languages.

I think we're really only using infix (aside from operators) in
tests where it seems useful and relatively straightforward, e.g.:

 InCore(dslCat2, 3) - aggInCore(dslCat2, 3) should be < epsilon

BTW I like being able to use infix for little DSLs. Even though they don't use operators, they are sometimes easier to read, like the test ones.
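
For readers new to the distinction, here is a minimal sketch of the two call styles under discussion. It uses only standard-library collections; the names are illustrative and do not come from the Mahout codebase.

```scala
object InfixSketch {
  val xs: List[Int] = List(1, 2, 3)

  // Operator methods read naturally in infix form.
  val sum: Int = 1 + 2

  // Dot notation for an ordinary (non-operator) method call.
  val doubledDot: List[Int] = xs.map(_ * 2)

  // The same call in infix form: legal Scala, but it can be harder to scan
  // for readers coming from other languages.
  val doubledInfix: List[Int] = xs map (_ * 2)
}
```

Both forms compile to the same call, so the choice is purely about readability.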

dlyubimov · 2015-02-19T23:59:08Z

+1. The general notion is that custom DSLs are big consumers of infix. Indeed.
But OOA should not be one.

andrewpalumbo · 2015-02-20T00:01:05Z

Yes, that's what I was thinking. On the list there is also a link to the Spark Code Style Guide:

https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide

and I was suggesting that we use the (javadoc) comment style from that rather than the scaladoc style, since it seems to be what a lot of projects are adopting and what we presently use in our codebase.
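
For reference, the two doc comment layouts differ mainly in where the text starts and how the asterisks line up. Here is a small sketch with hypothetical method names:

```scala
object DocStyleSketch {
  /**
   * Javadoc style, as in the Spark style guide: the text starts on the second
   * line and each asterisk lines up under the first asterisk of the opener.
   */
  def javadocStyle(x: Int): Int = x + 1

  /** Scaladoc style: the text starts on the first line and each following
    * asterisk lines up under the second asterisk of the opener.
    */
  def scaladocStyle(x: Int): Int = x + 1
}
```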

pferrel · 2015-02-20T00:01:43Z

OOA?

andrewpalumbo · 2015-02-20T00:05:01Z

Oh sorry, I didn't see that you had "javadoc style documentation comments like Spark" in that list.

andrewpalumbo · 2015-02-20T00:15:43Z

+1 on doc commit.

pferrel · 2015-02-20T00:17:57Z

OK, well thanks. Hopefully this cleans my desk up a bit.

andrewpalumbo · 2015-02-20T00:23:19Z

BTW I like being able to use infix for little DSLs. Even though they don't use operators, they are sometimes easier to read, like the test ones.

agreed

pferrel · 2015-02-27T21:08:16Z

Since 1.2.1 was pushed I had to merge those changes, but they break execution of spark-itemsimilarity because the HashBiMap class is missing at deserialization.

This is a known bug in Spark 1.2.x with a workaround that I am trying.

pferrel · 2015-03-02T18:23:16Z

Seems like we should avoid 1.2 and 1.2.1, though we could tell someone how to make it work, and only itemsimilarity and rowsimilarity are broken anyway.

So I'm testing this with Spark 1.1.1 and will push that for now. All that is required to upgrade to 1.2 or 1.2.1 would be changing the Spark version in the root pom and building from source.

pferrel added 14 commits December 21, 2014 09:24

NOJIRA simplified driver api, moved base kryo config in to base class

2edadab

Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/mahout

18a931a

cleaning up driver api a bit to not repeat default kryo and SparkConf config in every driver

449ade5

leaving the job.jar assembly alone for now see MAHOUT-1636

b690a38

simplified driver, made changes to all drivers, note: left job assembly untouched

e625264

Merge branch 'master' into nbfix

1104238

lots of clean up, comment updates, scaladoc annotations, etc

11a0001

minor reformat

0ddd6e9

cleaning up comments

0bc07c1

took out what I hope are unneeded imports

7fc7a54

trivial error in comment fixed

d0a5a86

filled in missing scaladoc annotations

0ed9f04

indentation wrong in NBTestDriver fixed

5e488e7

more comment cleanup

b181080

dlyubimov reviewed Feb 18, 2015

andrewpalumbo reviewed Feb 19, 2015

pferrel added 2 commits February 19, 2015 09:53

more formatting cleanup

2944032

comment rewording

e69a0a7

pferrel added 2 commits February 19, 2015 11:04

comment reformatting

7d0ce29

canonical path used in spark/pom

d964fb8

changed to Spark/javadoc style doc comments

df7b33b

pferrel added 4 commits February 21, 2015 08:40

comment tweaks

564fa53

Merge branch 'master' into remove-pair

edefd8c

creating a dependency-reduced jar for just things needed in spark client and worker

33e36d3

tinkering with a dependency-reduced jar with only scopt and guava, can pare down further since not all guava is needed

5e06d8f

leaving 1.2.1 in since there is a work around for most cases and the driver can be used with the -D

26ff90c

pferrel added 3 commits March 2, 2015 11:13

rolled back to Spark 1.1.1, 1.2.1 will work with a SparkConf work around but is not in the pom.

17cfac8

added master changes for PR compare page

d126cab

fixed a couple things that were lost in a merge with the Spark 1.2.1 PR

dee0f8e

apache#74

pferrel mentioned this pull request Mar 4, 2015

MAHOUT-1636 #69

Closed

pferrel closed this Mar 4, 2015
