[SPARK-7212][MLlib]Add sequence learning flag #6997

feynmanliang · 2015-06-24T21:13:42Z

Support mining of ordered frequent item sequences.

mengxr · 2015-06-24T21:39:01Z

mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala

@@ -62,13 +62,14 @@ class FPGrowthModel[Item: ClassTag](val freqItemsets: RDD[FreqItemset[Item]]) ex
 @Experimental
 class FPGrowth private (
    private var minSupport: Double,
-    private var numPartitions: Int) extends Logging with Serializable {
+    private var numPartitions: Int,
+    private var mineSequences: Boolean) extends Logging with Serializable {


Both "itemsets" and "sequences" are correct terms. The issue is that the model exposes freqItemsets regardless of whether it stores unordered sets or ordered sequences, which may confuse users. I recommend renaming this variable to something like ordered or preservesOrdering. cc @jkbradley

Btw, it would be nice to pass this information to the model, so users can tell from the model whether the ordering is preserved.

SparkQA · 2015-06-24T22:28:26Z

Test build #35722 has finished for PR 6997 at commit 648d4d4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-25T06:23:55Z

Test build #35761 has finished for PR 6997 at commit f04bd50.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean)
- s"Using output committer class $
- logInfo(s"Using user defined output committer class $
- s"Using output committer class $

SparkQA · 2015-06-25T06:29:00Z

Test build #35762 has finished for PR 6997 at commit 34ef8f2.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean)

mengxr · 2015-06-25T15:19:30Z

mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala

   * @tparam Item item type
   */
-  class FreqItemset[Item](val items: Array[Item], val freq: Long) extends Serializable {
+  class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean)


This is a breaking change. Please create an auxiliary constructor with the original signature.
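A minimal sketch of what such an auxiliary constructor could look like, assuming that callers of the original two-argument form expect unordered itemsets (so `ordered` defaults to `false`). The class below is a standalone stand-in for illustration, not the code that was merged; `AuxConstructorDemo` is a hypothetical name.

```scala
// Stand-in for the class under review; illustrative only, not the merged Spark code.
class FreqItemset[Item](
    val items: Array[Item],
    val freq: Long,
    val ordered: Boolean) extends Serializable {

  // Auxiliary constructor that keeps the original (items, freq) signature.
  // Assumption: legacy callers want unordered itemsets, hence `false`.
  def this(items: Array[Item], freq: Long) = this(items, freq, false)
}

object AuxConstructorDemo {
  def main(args: Array[String]): Unit = {
    // Old call sites keep compiling against the two-argument form.
    val legacy = new FreqItemset(Array("a", "b"), 3L)
    // New call sites can state the ordering explicitly.
    val sequence = new FreqItemset(Array("a", "b"), 3L, true)
    println(s"legacy.ordered=${legacy.ordered}, sequence.ordered=${sequence.ordered}")
  }
}
```

An explicit auxiliary constructor is used here rather than a default parameter value because, as far as I know, a Scala default argument does not emit a separate two-argument constructor in bytecode, so a binary compatibility check could still flag the change.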

NAVER - http://www.naver.com/

Your email <Re: [spark] [SPARK-7212][MLlib]Add sequence learning flag (#6997)> to sujkh@naver.com could not be delivered for the following reason.

The recipient has blocked email from you.

SparkQA · 2015-06-26T06:41:54Z

Test build #35828 has finished for PR 6997 at commit ce987cb.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean)

mengxr · 2015-06-26T14:08:42Z

mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala

@@ -155,7 +165,7 @@ class FPGrowth private (
    .flatMap { case (part, tree) =>
      tree.extract(minCount, x => partitioner.getPartition(x) == part)
    }.map { case (ranks, count) =>
-      new FreqItemset(ranks.map(i => freqItems(i)).toArray, count)
+      new FreqItemset(ranks.map(i => freqItems(i)).reverse.toArray, count, ordered)


Since you updated the ordering, we need to update the Python doctest. See the Jenkins build log: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35828/consoleFull
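To illustrate why the added `.reverse` can break a test that compares item order exactly, here is a small sketch using plain collections. The `freqItems` and `ranks` values are made up for the example and are not taken from the PR; `ReverseOrderDemo` is a hypothetical name.

```scala
object ReverseOrderDemo {
  // Hypothetical data: items looked up by rank index.
  val freqItems: Array[String] = Array("a", "b", "c")
  // Hypothetical rank list, standing in for one extracted pattern.
  val ranks: List[Int] = List(2, 0, 1)

  // Mapping as on the removed diff line (no reverse).
  val before: Array[String] = ranks.map(i => freqItems(i)).toArray
  // Mapping as on the added diff line (with reverse).
  val after: Array[String] = ranks.map(i => freqItems(i)).reverse.toArray

  def main(args: Array[String]): Unit = {
    println(before.mkString(",")) // c,a,b
    println(after.mkString(","))  // b,a,c
  }
}
```

Both arrays hold the same items, so a comparison that ignores order still passes, while a doctest that prints the array verbatim would need its expected output updated.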

SparkQA · 2015-06-26T22:41:46Z

Test build #35885 has finished for PR 6997 at commit 7c14e15.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean)

mengxr · 2015-06-29T05:26:23Z

LGTM. Merged into master. Thanks!

Feynman Liang added 2 commits June 24, 2015 12:52

Add sequence learning flag

252a36a

Test case for frequent item sequences

648d4d4

mengxr reviewed Jun 24, 2015

Feynman Liang added 2 commits June 24, 2015 23:09

Naming, add ordered to FreqItemsets, test ordering using Seq

f04bd50

Fix failing test due to reverse orderering

34ef8f2

mengxr reviewed Jun 25, 2015

Backwards compatibility aux constructor

ce987cb

mengxr reviewed Jun 26, 2015

Feynman Liang added 2 commits June 26, 2015 13:35

Fix python test

0d3e4b6

Improve scalatests with R code and Seq

7c14e15

asfgit closed this in 25f574e Jun 29, 2015

feynmanliang deleted the fp-sequence branch August 17, 2015 19:14
