Skip to content

General cleanup of Scala Similarity and Drivers#76

Closed
pferrel wants to merge 27 commits intoapache:masterfrom
pferrel:remove-pair
Closed

General cleanup of Scala Similarity and Drivers#76
pferrel wants to merge 27 commits intoapache:masterfrom
pferrel:remove-pair

Conversation

@pferrel
Copy link
Copy Markdown
Contributor

@pferrel pferrel commented Feb 18, 2015

Many small cleanup changes:

  • simplified drivers
  • removed any reference to cross-indicator and most references to indicator
  • shortened long lines
  • cleaned up comments and scaladoc annotations
  • replaced old use of o.a.m.math.Pair with Scala tuples

Tested spark-itemsimilarity on cluster but not naive Bayes stuff

Decided not to remove scopt. Removing it would be more trouble than it's worth given how small the lib is. It still may be worth using case classes for options to get rid of verbose casts but doesn't seem pressing.

This PR doesn't touch the job.jar assembly in the spark module. I have a pare down of that waiting for other refactoring commits from @dlyubimov.

No other work planned on this PR

Comment thread spark/pom.xml Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of canonical relative paths is IMO preferred.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still don't understand why canonical path is replaced with non-canonical path here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. I don't remember doing anything with that, I'll look.

@andrewpalumbo
Copy link
Copy Markdown
Member

Tested TrainNBDriver and TestNBDriver from the command line with no problems. I need to do some cleanup of the rest of the Naive Bayes classes as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An accidental replacement of "indicator" with "similarity" here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought I got those, thanks

@pferrel
Copy link
Copy Markdown
Contributor Author

pferrel commented Feb 19, 2015

As soon as this looks ok, I'd like to work on the getting 1.2.1 in next so let me know

@andrewpalumbo
Copy link
Copy Markdown
Member

The only thing that stands out to me is the move to scaladoc style doc comments (away from the spark/java style doc comments). I'm not sure if it matters, but as Dmitriy mentioned above, we should really pick one style to go with. Spark style seems to be the de facto convention. It would make it easier to point people to the Spark style guide as discussed on the dev list.

@pferrel
Copy link
Copy Markdown
Contributor Author

pferrel commented Feb 19, 2015

I don't recall a decision on that or I wouldn't have set my line width to 120. If we are going to be that exacting let's make sure everyone agrees and I'll change this later if necessary. I need to get Spark 1.2.1 in this week if possible

@andrewpalumbo
Copy link
Copy Markdown
Member

I don't think we really made any hard decision on it- just discussed it a bit.. I don't see any reason not to commit and revise later after we decide.

@pferrel
Copy link
Copy Markdown
Contributor Author

pferrel commented Feb 19, 2015

Let me see what I can do. It should be easy to change the scaladoc comments, it's just the first line. 100 chars will be harder. Would be nice to get it over with.

@dlyubimov
Copy link
Copy Markdown
Contributor

i'd suggest to be ok with 120 characters. at least our java style is 120-line-long compliant.

Spark people were also ok with 120 characters in my contributions (in fact, initially 100 character constraint applied only to comment style, not the code there. For reasons nobody truly knows).

@pferrel
Copy link
Copy Markdown
Contributor Author

pferrel commented Feb 19, 2015

That will make it much easier, ok Andrew?

@andrewpalumbo
Copy link
Copy Markdown
Member

I was only referring to the comment style ... not the 100 chars limit.

@andrewpalumbo
Copy link
Copy Markdown
Member

@pferrel
Copy link
Copy Markdown
Contributor Author

pferrel commented Feb 19, 2015

So something like this?

  • javadoc style documentation comments like Spark
  • For Scala follow: http://docs.scala-lang.org/style/
  • We are relaxing the required use of infix for all appropriate method calls and encouraging
    “dot” notation. Spark goes further to say you must not use infix unless it’s an operator
    method.
  • Line lengths not to exceed 120 chars—Spark limits to 100

I'm all for using infix as little as possible. It really does make
scala unnecessarily difficult to read for those coming from other
languages.

I think we're really only using using infix (aside from operators) in
tests where it seems useful and relatively straightforward. eg:

 InCore(dslCat2, 3) - aggInCore(dslCat2, 3) should be < epsilon

BTW I like being able to use infix for little DSLs even though they don't use operators they are sometimes easier to read--like the test ones.

@dlyubimov
Copy link
Copy Markdown
Contributor

+1. General notion is that custom dsls are big consumers of infix. Indeed.
but OOA should not be one.

On Thu, Feb 19, 2015 at 3:53 PM, Pat Ferrel notifications@github.com
wrote:

So something like this?

  • javadoc style documentation comments like Spark
  • For Scala follow: http://docs.scala-lang.org/style/
  • We are relaxing the required use of infix for all appropriate
    method calls and encouraging “dot” notation. Spark goes further to say you
    must not use infix unless it’s an operator method.
  • Line lengths not to exceed 120 chars—Spark limits to 100

I'm all for using infix as little as possible. It really does make
scala unnecessarily difficult to read for those coming from other
languages.

I think we're really only using using infix (aside from operators) in
tests where it seems useful and relatively straightforward. eg:

InCore(dslCat2, 3) - aggInCore(dslCat2, 3) should be < epsilon


Reply to this email directly or view it on GitHub
#76 (comment).

@andrewpalumbo
Copy link
Copy Markdown
Member

Yes- that's what I was thinking. On the list there is also a link to the Spark Code Style giude:

https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide

and I was suggesting that we use the (javadoc) comment style from that rather than the scaladoc style. Since it seems to be what a lot of projects are adopting and what we presently use in our codebase.

@pferrel
Copy link
Copy Markdown
Contributor Author

pferrel commented Feb 20, 2015

OOA?

@andrewpalumbo
Copy link
Copy Markdown
Member

oh sorry i didn't see that you had "javadoc style documentation comments like Spark" in that list.

@andrewpalumbo
Copy link
Copy Markdown
Member

+1 on doc commit.

@pferrel
Copy link
Copy Markdown
Contributor Author

pferrel commented Feb 20, 2015

OK, well thanks. Hopefully this cleans my desk up a bit.

@andrewpalumbo
Copy link
Copy Markdown
Member

BTW I like being able to use infix for little DSLs even though they don't use operators they are sometimes easier to read--like the test ones.

agreed

@pferrel
Copy link
Copy Markdown
Contributor Author

pferrel commented Feb 27, 2015

Since 1.2.1 was pushed I had to merge those changes but they break execution of spark-itemsimilarity due to HashBiMap class missing at deserialization.

This is a known bug in Spark 1.2.x with at work around that I am trying.

@pferrel
Copy link
Copy Markdown
Contributor Author

pferrel commented Mar 2, 2015

Seems like we should avoid 1.2 and 1.2.1 though we could tell someone how to make it work and only item and rowsimilairty are broken anyway.

So I'm testing this with Spark 1.1.1 and will push that for now. All that is required to upgrade to 1.2 or 1.2.1 would be the root pom Spark version change and build from source

@pferrel pferrel mentioned this pull request Mar 4, 2015
@pferrel pferrel closed this Mar 4, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants