[SPARK-6517][mllib] Implement the Algorithm of Hierarchical Clustering #5267

Closed
wants to merge 77 commits into from
Changes from 76 commits
Commits
295bdde
[SPARK-6517][mllib] Implement the Algorithm of Hierarchical Clustering
yu-iskw Mar 30, 2015
c51017c
Fix some comments
yu-iskw Mar 30, 2015
a8cd7ab
Remove parentheses for getters and add a test for HierarchicalCluster…
yu-iskw Apr 2, 2015
306f603
Remove the static train() method from HierarchicalClustering object
yu-iskw Apr 8, 2015
b2d0369
Add a test for HierarchicalClustering.findClosestCenter()
yu-iskw Apr 8, 2015
0ddfcfb
Add a function to compute Within Set Sum of Squared Error into Scala/…
yu-iskw Apr 7, 2015
ecb3fd7
Change the visibility of constructor parameters of HierarchicalCluste…
yu-iskw Apr 2, 2015
08f0101
Rename getSubIterations to getMaxIterations
yu-iskw Apr 27, 2015
2a14900
Modify how to broadcast variables
yu-iskw Apr 27, 2015
38f07bd
Remove unnecessary parentheses
yu-iskw Apr 27, 2015
99e703b
Add toLinkageMatrix() and toAdjacencyList()
yu-iskw Apr 28, 2015
344d14e
Add a java test file for HierarchicalClustering
yu-iskw May 20, 2015
e7256f5
Support save and load functions in Java
yu-iskw May 20, 2015
e294795
Change the specification of HierarchicalClusteringModel.save()
yu-iskw May 21, 2015
1c66e09
Format code and modify the comments
yu-iskw Apr 27, 2015
59480d3
Format the code because there is a long line
yu-iskw May 21, 2015
ec9f85f
Fix some comments for HierarchicalClustering in Scala
yu-iskw Jun 12, 2015
58999db
Sort `import` statements in HierarchicalClustering.scala and
yu-iskw Jun 12, 2015
a077e99
Format HierarchicalClusteringSuite and HierarchicalClusteringModelSuite
yu-iskw Jun 13, 2015
fa74f20
Rename ClusterTree to ClusterNode
yu-iskw Jun 18, 2015
1143920
Remove save/load from HierarchicalClusteringModel
yu-iskw Jun 18, 2015
16cc823
Fix some miscellaneous code pointed out by IntelliJ
yu-iskw Jun 18, 2015
c02134e
Remove python API. We will implement it at another issue.
yu-iskw Jul 3, 2015
def81e2
Rename HierarchicalClustering to BisectingKMeans
yu-iskw Jul 3, 2015
707609a
Remove the unnecessary parentheses
yu-iskw Jul 3, 2015
4e1653d
Change how the children centers are initialized
yu-iskw Jul 6, 2015
6a51b12
Change the criterion for building a cluster tree from variance vector…
yu-iskw Jul 14, 2015
c8a2a19
Change `toArray` to avoid the TimSort error
yu-iskw Jul 14, 2015
5f899b3
Format the code, since there are some validation problems
yu-iskw Jul 14, 2015
313e87f
Remove unnecessary comment and import
yu-iskw Jul 14, 2015
3f6b14a
Fix a typo and a few comments
yu-iskw Jul 15, 2015
52b4704
Add a new line above spark project classes
yu-iskw Jul 29, 2015
fe87715
Arrange the order of import statements
yu-iskw Jul 29, 2015
eeef1e7
Use `isDefined`, instead of `!= None`
yu-iskw Jul 29, 2015
3156dd7
Update the bisecting k-means
yu-iskw Oct 21, 2015
31623ea
Improve performance
yu-iskw Oct 22, 2015
052c9d6
Remove `toAdjacencyList` and `toLinkageList`
yu-iskw Oct 26, 2015
e13e47f
Remove an unnecessary constructor arg: `clusterMap`
yu-iskw Oct 26, 2015
ad6b9e2
Add `import BisectingKMeans._` inside of `BisectingKMeans` class
yu-iskw Oct 26, 2015
043f5f3
Rename `rddArray` to `updatedDataHistory` in order to make the name m…
yu-iskw Oct 26, 2015
31b05ec
Modify `math.log10` to `math.log`
yu-iskw Oct 26, 2015
12a60cf
Add `getMinimumNumNodeInTree` to calculate the minimum number of node…
yu-iskw Oct 26, 2015
a13a404
Rename `leafClusters` to `leafClusterStats`
yu-iskw Oct 26, 2015
75564b5
Modify `BisectingKMeans.updateClusterIndex`
yu-iskw Oct 26, 2015
084b992
Move checking whether there are dividable clusters or not to below
yu-iskw Oct 26, 2015
b6a952d
Make sure the input data keeps the storage level and unpersist unnece…
yu-iskw Oct 26, 2015
1ba4e45
Remove `closestCenter` from `findClosestCenter` because it was a unne…
yu-iskw Oct 26, 2015
cb4fbfe
Modify `summarizeClusters`
yu-iskw Oct 26, 2015
fbcb9ea
Modify a type
yu-iskw Oct 26, 2015
ffbe399
Replace `criterion` with `cost`
yu-iskw Oct 26, 2015
6f37028
Change the constructor args of `BisectingClusterStats`
yu-iskw Oct 26, 2015
010fd2c
Convert `Long` to `BigInt`
yu-iskw Oct 26, 2015
e39f69a
Modify `BisectingKMeansModel.predict`
yu-iskw Oct 26, 2015
2e8226d
Organize import statements
yu-iskw Oct 26, 2015
622499e
Modify the comment inside of `updateClusterIndex`
yu-iskw Oct 26, 2015
267908a
Change the default value of `k` to 2
yu-iskw Oct 27, 2015
a664f6d
Change `10E-4` to `10e-4`
yu-iskw Oct 27, 2015
165e191
Remove `toLinkageMatrix` and `toAdjacencyList` from `BisectingKMeans`
yu-iskw Oct 27, 2015
c417dd1
Change 10e-4 to 1e-4
yu-iskw Oct 28, 2015
e2f6966
Fix a test for the default value of k
yu-iskw Oct 29, 2015
69e0910
Replace `mapPartition(...).reduceByKey(...)` with `aggregateByKey`
yu-iskw Oct 28, 2015
675bafb
Change `sumOfSquares` from a vector to a scalar in `divideClusters`
yu-iskw Oct 29, 2015
704e145
Replace a chain of `mapPartition` and `reduceByKey` with `aggregateBy…
yu-iskw Oct 29, 2015
ee3ea62
Modify `getMinimumNumNodesInTree` with `1 << multiplier`
yu-iskw Oct 29, 2015
73e2c7a
Rename `numClusters` parameter to `k`
yu-iskw Oct 29, 2015
a876ba2
Rename a variable in `BisectingKMeansModelSuite`
yu-iskw Oct 29, 2015
1985fea
tmp
yu-iskw Oct 29, 2015
629f897
Simplify the implementation
yu-iskw Oct 29, 2015
1f84ded
Reorganize import statements and adjust parameters and return values
yu-iskw Oct 29, 2015
12b3223
Rename `WSSSE` to `computeCost`
yu-iskw Oct 29, 2015
ef4a3e8
Remove `updateClusterIndex`
yu-iskw Oct 29, 2015
57b06ba
Remove BisectingClusterStat object
yu-iskw Oct 29, 2015
5da05d3
Fix minor issues
yu-iskw Oct 29, 2015
a50689a
Improve `initNextCenters`
yu-iskw Oct 29, 2015
d422be7
refactor
mengxr Nov 9, 2015
75ca2a0
Merge pull request #4 from mengxr/SPARK-6517
yu-iskw Nov 9, 2015
29ccdf9
Remove a magic number 63 for level limitation
yu-iskw Nov 9, 2015