Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7212][MLlib]Add sequence learning flag #6997

Closed
wants to merge 7 commits into from

Conversation

feynmanliang
Copy link
Contributor

Support mining of ordered frequent item sequences.

@@ -62,13 +62,14 @@ class FPGrowthModel[Item: ClassTag](val freqItemsets: RDD[FreqItemset[Item]]) ex
@Experimental
class FPGrowth private (
private var minSupport: Double,
private var numPartitions: Int) extends Logging with Serializable {
private var numPartitions: Int,
private var mineSequences: Boolean) extends Logging with Serializable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

itemsets and sequences are correct. The issue is that we have freqItemsets in the model, no matter whether it stores unordered sets or sequences. This may be confusing to users. I recommend renaming this variable to something like ordered or preservesOrdering. cc @jkbradley

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, it would be nice to pass this information to the model. So users know wether the ordering is preserved or not from the model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Jun 24, 2015

Test build #35722 has finished for PR 6997 at commit 648d4d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 25, 2015

Test build #35761 has finished for PR 6997 at commit f04bd50.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean)
    • s"Using output committer class $
    • logInfo(s"Using user defined output committer class $
    • s"Using output committer class $

@SparkQA
Copy link

SparkQA commented Jun 25, 2015

Test build #35762 has finished for PR 6997 at commit 34ef8f2.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean)

* @tparam Item item type
*/
class FreqItemset[Item](val items: Array[Item], val freq: Long) extends Serializable {
class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a break change. Please create an auxiliary constructor with the original signature.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAVER - http://www.naver.com/

sujkh@naver.com 님께 보내신 메일 <Re: [spark] [SPARK-7212][MLlib]Add sequence learning flag (#6997)> 이 다음과 같은 이유로 전송 실패했습니다.


받는 사람이 회원님의 메일을 수신차단 하였습니다.


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@SparkQA
Copy link

SparkQA commented Jun 26, 2015

Test build #35828 has finished for PR 6997 at commit ce987cb.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean)

@@ -155,7 +165,7 @@ class FPGrowth private (
.flatMap { case (part, tree) =>
tree.extract(minCount, x => partitioner.getPartition(x) == part)
}.map { case (ranks, count) =>
new FreqItemset(ranks.map(i => freqItems(i)).toArray, count)
new FreqItemset(ranks.map(i => freqItems(i)).reverse.toArray, count, ordered)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you updated the ordering, we need to update Python doctest. See the Jenkins build log: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35828/consoleFull

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@SparkQA
Copy link

SparkQA commented Jun 26, 2015

Test build #35885 has finished for PR 6997 at commit 7c14e15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class FreqItemset[Item](val items: Array[Item], val freq: Long, val ordered: Boolean)

@mengxr
Copy link
Contributor

mengxr commented Jun 29, 2015

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 25f574e Jun 29, 2015
@feynmanliang feynmanliang deleted the fp-sequence branch August 17, 2015 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants