chinese-restaurant-process

A highly optimized Scala implementation of non-parametric Bayesian clustering based on the Chinese Restaurant Process.

Include in your project

Add the following to your SBT dependencies:

"com.monsanto.stats" %% "chinese-restaurant-process" % "1.0.2"

For older versions of SBT you may also need to add:

resolvers += Resolver.bintrayRepo("monsanto", "maven")
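
Put together, the two settings above might look like this in a build.sbt file. This is a minimal sketch: `libraryDependencies` is the standard sbt setting for declaring dependencies, and the coordinates and resolver are copied unchanged from the lines above.

```scala
// build.sbt (fragment)

// Only needed on older versions of SBT, as noted above.
resolvers += Resolver.bintrayRepo("monsanto", "maven")

// %% appends the project's Scala binary version to the artifact name.
libraryDependencies += "com.monsanto.stats" %% "chinese-restaurant-process" % "1.0.2"
```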

Basic Usage

import com.monsanto.stats.tables._
import com.monsanto.stats.tables.clustering._

val cannedAllTopicVectorResults: Vector[TopicVectorInput] = MnMGen.cannedData
val cannedCrp = new CRP(ModelParams(5, 2, 2), cannedAllTopicVectorResults)
val crpResult = cannedCrp.findClusters(200, RealRandomNumGen, cannedCrp.selectCluster)

Sample output:

Iteration 1: cluster count was 365, reseat: 35, score: -29578.83920*
Iteration 2: cluster count was 118, reseat: 15, score: -29111.34349*
Iteration 3: cluster count was 61, reseat: 7, score: -28919.62995*
Iteration 4: cluster count was 40, reseat: 6, score: -28852.91482*
Iteration 5: cluster count was 29, reseat: 6, score: -28804.38123*
Iteration 6: cluster count was 24, reseat: 5, score: -28741.68993*
Iteration 7: cluster count was 16, reseat: 5, score: -28734.04974*
Iteration 8: cluster count was 14, reseat: 6, score: -28742.16624
Iteration 9: cluster count was 12, reseat: 5, score: -28739.19560
Iteration 10: cluster count was 10, reseat: 5, score: -28738.64498
...
Iteration 190: cluster count was 4, reseat: 10, score: -28724.77273
Iteration 191: cluster count was 3, reseat: 11, score: -28724.77273
Iteration 192: cluster count was 3, reseat: 10, score: -28724.77273
Iteration 193: cluster count was 3, reseat: 10, score: -28724.77273
Iteration 194: cluster count was 3, reseat: 10, score: -28724.77273
Iteration 195: cluster count was 3, reseat: 10, score: -28724.77273
Iteration 196: cluster count was 3, reseat: 10, score: -28724.77273
Iteration 197: cluster count was 3, reseat: 11, score: -28724.77273
Iteration 198: cluster count was 3, reseat: 10, score: -28724.77273
Iteration 199: cluster count was 3, reseat: 13, score: -28724.77273
Iteration 200: cluster count was 3, reseat: 12, score: -28724.77273
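
For readers new to the technique, the seating rule that gives the Chinese Restaurant Process its name can be sketched on its own. The code below is a standalone illustration of the textbook prior and is not this library's implementation: customer n+1 joins an existing table with probability proportional to the number of people already seated there, or opens a new table with probability proportional to a concentration parameter. The names `CrpSketch`, `seat`, and `alpha` are hypothetical and do not come from the library.

```scala
import scala.util.Random

// Standalone sketch of the Chinese Restaurant Process prior.
// Hypothetical names throughout; this is NOT the library's API or algorithm.
object CrpSketch {

  /** Seats `customers` one at a time and returns the size of each table.
    * `alpha` is the concentration parameter: larger values open more tables.
    */
  def seat(customers: Int, alpha: Double, rng: Random): Vector[Int] =
    (0 until customers).foldLeft(Vector.empty[Int]) { (tables, seated) =>
      // Draw a point in [0, seated + alpha). The first `seated` units of mass
      // are split among existing tables in proportion to their sizes; the
      // remaining `alpha` units correspond to opening a new table.
      val u = rng.nextDouble() * (seated + alpha)
      // Running totals of table sizes, one entry per existing table.
      val cumulative = tables.scanLeft(0)(_ + _).tail
      // First table whose cumulative size exceeds the draw, or -1 if none.
      val idx = cumulative.indexWhere(c => u < c)
      if (idx >= 0) tables.updated(idx, tables(idx) + 1)
      else tables :+ 1
    }

  def main(args: Array[String]): Unit = {
    val tables = seat(customers = 1000, alpha = 2.0, rng = new Random(42))
    println(s"tables: ${tables.size}, sizes: ${tables.mkString(", ")}")
  }
}
```

This covers only the prior over seatings. A clustering procedure such as the one logged above would also need to weigh how well each item fits each cluster, which this sketch leaves out.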