General cleanup of Scala Similarity and Drivers #76
pferrel wants to merge 27 commits into apache:master
Conversation
… config in every driver
use of canonical relative paths is IMO preferred.
I still don't understand why the canonical path is replaced with a non-canonical path here.
No idea. I don't remember doing anything with that, I'll look.
Tested
An accidental replacement of "indicator" with "similarity" here?
thought I got those, thanks
As soon as this looks OK, I'd like to work on getting 1.2.1 in next, so let me know.
The only thing that stands out to me is the move to scaladoc-style doc comments (away from the Spark/Java style doc comments). I'm not sure if it matters, but as Dmitriy mentioned above, we should really pick one style to go with. Spark style seems to be the de facto convention, and it would make it easier to point people to the Spark style guide, as discussed on the dev list.
I don't recall a decision on that, or I wouldn't have set my line width to 120. If we are going to be that exacting, let's make sure everyone agrees, and I'll change this later if necessary. I need to get Spark 1.2.1 in this week if possible.
I don't think we really made any hard decision on it, just discussed it a bit. I don't see any reason not to commit and revise later after we decide.
Let me see what I can do. It should be easy to change the scaladoc comments; it's just the first line. 100 chars will be harder. Would be nice to get it over with.
I'd suggest being OK with 120 characters. At least our Java style is 120-character compliant. Spark people were also OK with 120 characters in my contributions (in fact, the 100-character constraint initially applied only to comment style, not the code, for reasons nobody truly knows).
That will make it much easier. OK, Andrew?
I was only referring to the comment style, not the 100-char limit.
Basically what we were talking about on this thread:
So something like this?
I'm all for using infix as little as possible. It really does make … I think we're really only using infix (aside from operators) in … BTW, I like being able to use infix for little DSLs; even though they don't use operators, they are sometimes easier to read, like the test ones.
+1. The general notion is that custom DSLs are big consumers of infix. Indeed.
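To make the infix point above concrete, here is a small self-contained sketch (the names are illustrative only, not from this PR) contrasting dot notation with the infix style that can read well in test DSLs:

```scala
// Sketch: dot notation vs. infix notation in Scala.
// All identifiers here are made up for illustration.
object InfixDemo extends App {
  val words = List("drm", "", "similarity")

  // Dot notation: preferred for ordinary method calls
  val nonEmpty = words.filter(_.nonEmpty)

  // Infix notation: the same kind of call without dots or parens;
  // mostly worth it for operators and small DSL-like assertions
  val hasDrm = nonEmpty contains "drm"

  println(nonEmpty.mkString(","))  // drm,similarity
  println(hasDrm)                  // true
}
```

The two call styles compile to exactly the same thing; the debate is purely about readability conventions.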
Yes, that's what I was thinking. On the list there is also a link to the Spark Code Style Guide: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide and I was suggesting that we use the (javadoc) comment style from that rather than the scaladoc style, since it seems to be what a lot of projects are adopting and what we presently use in our codebase.
OOA?
Oh sorry, I didn't see that you had "javadoc style documentation comments like Spark" in that list.
+1 on doc commit.
OK, well thanks. Hopefully this cleans my desk up a bit. |
agreed |
…n pare down further since not all guava is needed
Since 1.2.1 was pushed I had to merge those changes, but they break execution of spark-itemsimilarity due to the HashBiMap class missing at deserialization. This is a known bug in Spark 1.2.x with a workaround that I am trying.
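The workaround being tried is not spelled out here. One commonly used mitigation for classes missing at deserialization time on Spark executors is to put the jar that contains them directly on the executor classpath — a sketch only; the jar path and Guava version below are hypothetical, not taken from this PR:

```
# spark-defaults.conf (sketch; jar path and version are hypothetical)
spark.executor.extraClassPath  /opt/mahout/lib/guava-16.0.jar
```

`spark.executor.extraClassPath` is a standard Spark configuration key; whether it resolves this particular HashBiMap failure depends on the actual bug being hit.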
…driver can be used with the -D
Seems like we should avoid 1.2 and 1.2.1, though we could tell someone how to make it work, and only item and rowsimilarity are broken anyway. So I'm testing this with Spark 1.1.1 and will push that for now. All that is required to upgrade to 1.2 or 1.2.1 is the root pom Spark version change and a build from source.
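For reference, the upgrade described above amounts to bumping the Spark version in the root pom and rebuilding — a sketch; the exact property name used in the Mahout pom is assumed here:

```xml
<!-- root pom.xml: bump the Spark dependency version
     (property name is an assumption, not verified against this PR) -->
<properties>
  <spark.version>1.2.1</spark.version>
</properties>
```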
Many small cleanup changes:
Tested spark-itemsimilarity on a cluster, but not the naive Bayes stuff.
Decided not to remove scopt. Removing it would be more trouble than it's worth given how small the lib is. It still may be worth using case classes for options to get rid of verbose casts, but that doesn't seem pressing.
This PR doesn't touch the job.jar assembly in the spark module. I have a pare-down of that waiting for other refactoring commits from @dlyubimov.
No other work planned on this PR.
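The case-class idea mentioned in the description can be sketched as follows (all names are hypothetical, not the actual driver's option types):

```scala
// Sketch: typed options via a case class vs. a Map[String, Any]
// that forces an asInstanceOf cast at every read site.
// SimilarityOptions and its fields are hypothetical names.
case class SimilarityOptions(
    input: String = "",
    output: String = "",
    maxSimilaritiesPerItem: Int = 100)

object OptionsDemo extends App {
  // Map-based options: every access needs a cast
  val optionMap: Map[String, Any] = Map("maxSimilaritiesPerItem" -> 50)
  val viaCast = optionMap("maxSimilaritiesPerItem").asInstanceOf[Int]

  // Case-class options: the field is already typed, and copy()
  // gives cheap overrides while preserving defaults
  val opts = SimilarityOptions(maxSimilaritiesPerItem = 50)

  println(viaCast == opts.maxSimilaritiesPerItem) // true
  println(opts.input.isEmpty)                     // true (default kept)
}
```

The payoff is that a typo in a field name or a wrong type becomes a compile error instead of a runtime ClassCastException.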