-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7212][MLlib]Add sequence learning flag #6997
Conversation
@@ -62,13 +62,14 @@ class FPGrowthModel[Item: ClassTag](val freqItemsets: RDD[FreqItemset[Item]]) ex | |||
@Experimental | |||
class FPGrowth private ( | |||
private var minSupport: Double, | |||
private var numPartitions: Int) extends Logging with Serializable { | |||
private var numPartitions: Int, | |||
private var mineSequences: Boolean) extends Logging with Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
itemsets
and sequences
are correct. The issue is that we have freqItemsets
in the model, no matter whether it stores unordered sets or sequences. This may be confusing to users. I recommend renaming this variable to something like ordered
or preservesOrdering
. cc @jkbradley
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, it would be nice to pass this information to the model. So users know wether the ordering is preserved or not from the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
Test build #35722 has finished for PR 6997 at commit
|
Test build #35761 has finished for PR 6997 at commit
|
Test build #35762 has finished for PR 6997 at commit
|
* @tparam Item item type | ||
*/ | ||
class FreqItemset[Item](val items: Array[Item], val freq: Long) extends Serializable { | ||
class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a break change. Please create an auxiliary constructor with the original signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAVER - http://www.naver.com/
sujkh@naver.com 님께 보내신 메일 <Re: [spark] [SPARK-7212][MLlib]Add sequence learning flag (#6997)> 이 다음과 같은 이유로 전송 실패했습니다.
받는 사람이 회원님의 메일을 수신차단 하였습니다.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
Test build #35828 has finished for PR 6997 at commit
|
@@ -155,7 +165,7 @@ class FPGrowth private ( | |||
.flatMap { case (part, tree) => | |||
tree.extract(minCount, x => partitioner.getPartition(x) == part) | |||
}.map { case (ranks, count) => | |||
new FreqItemset(ranks.map(i => freqItems(i)).toArray, count) | |||
new FreqItemset(ranks.map(i => freqItems(i)).reverse.toArray, count, ordered) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you updated the ordering, we need to update Python doctest. See the Jenkins build log: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35828/consoleFull
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
Test build #35885 has finished for PR 6997 at commit
|
LGTM. Merged into master. Thanks! |
Support mining of ordered frequent item sequences.