[SPARK-22399][ML] update the location of reference paper
## What changes were proposed in this pull request?
Update the URL of the reference paper.

## How was this patch tested?
The change only touches comments, so nothing was tested.

Author: bomeng <bmeng@us.ibm.com>

Closes #19614 from bomeng/22399.
bomeng authored and srowen committed Oct 31, 2017
1 parent 1ff41d8 commit aa6db57
Showing 4 changed files with 7 additions and 6 deletions.
2 changes: 1 addition & 1 deletion docs/mllib-clustering.md
@@ -134,7 +134,7 @@ Refer to the [`GaussianMixture` Python docs](api/python/pyspark.mllib.html#pyspa

Power iteration clustering (PIC) is a scalable and efficient algorithm for clustering vertices of a
graph given pairwise similarities as edge properties,
-described in [Lin and Cohen, Power Iteration Clustering](http://www.icml2010.org/papers/387.pdf).
+described in [Lin and Cohen, Power Iteration Clustering](http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf).
It computes a pseudo-eigenvector of the normalized affinity matrix of the graph via
[power iteration](http://en.wikipedia.org/wiki/Power_iteration) and uses it to cluster vertices.
`spark.mllib` includes an implementation of PIC using GraphX as its backend.
@@ -28,7 +28,8 @@ import org.apache.spark.mllib.clustering.PowerIterationClustering
import org.apache.spark.rdd.RDD

/**
-* An example Power Iteration Clustering http://www.icml2010.org/papers/387.pdf app.
+* An example Power Iteration Clustering app.
+* http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf
* Takes an input of K concentric circles and the number of points in the innermost circle.
* The output should be K clusters - each cluster containing precisely the points associated
* with each of the input circles.
@@ -103,9 +103,9 @@ object PowerIterationClusteringModel extends Loader[PowerIterationClusteringMode

/**
* Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
-* <a href="http://www.icml2010.org/papers/387.pdf">Lin and Cohen</a>. From the abstract: PIC finds
-* a very low-dimensional embedding of a dataset using truncated power iteration on a normalized
-* pair-wise similarity matrix of the data.
+* <a href="http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf">Lin and Cohen</a>.
+* From the abstract: PIC finds a very low-dimensional embedding of a dataset using
+* truncated power iteration on a normalized pair-wise similarity matrix of the data.
*
* @param k Number of clusters.
* @param maxIterations Maximum number of iterations of the PIC algorithm.
2 changes: 1 addition & 1 deletion python/pyspark/mllib/clustering.py
@@ -636,7 +636,7 @@ def load(cls, sc, path):
class PowerIterationClustering(object):
"""
Power Iteration Clustering (PIC), a scalable graph clustering algorithm
-developed by [[http://www.icml2010.org/papers/387.pdf Lin and Cohen]].
+developed by [[http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf Lin and Cohen]].
From the abstract: PIC finds a very low-dimensional embedding of a
dataset using truncated power iteration on a normalized pair-wise
similarity matrix of the data.
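
For readers unfamiliar with the algorithm these comments describe, the following is a minimal sketch of truncated power iteration on a row-normalized similarity matrix. It is not Spark's implementation (which the documentation above says uses GraphX); the function name `power_iteration_embedding`, the degree-based start vector, the fixed iteration count, and the toy `affinity` matrix are all assumptions made for illustration.

```python
def power_iteration_embedding(affinity, num_iterations):
    """Return a 1-D embedding of the vertices after a few power-iteration steps.

    Assumes `affinity` is a symmetric list of lists with non-negative
    entries and that every vertex has a positive total similarity (degree).
    """
    degrees = [sum(row) for row in affinity]
    # Row-normalize the affinity matrix: W = D^-1 A.
    w = [[a / d for a in row] for row, d in zip(affinity, degrees)]
    # Degree-based start vector, scaled to unit L1 norm (an assumption here).
    total = sum(degrees)
    v = [d / total for d in degrees]
    for _ in range(num_iterations):
        # One multiplication step followed by L1 normalization.
        wv = [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]
        norm = sum(abs(x) for x in wv)
        v = [x / norm for x in wv]
    return v


# Hypothetical toy graph: vertices 0-1 and 2-3 form two tight groups.
affinity = [
    [0.0, 1.0, 0.01, 0.01],
    [1.0, 0.0, 0.01, 0.01],
    [0.01, 0.01, 0.0, 0.5],
    [0.01, 0.01, 0.5, 0.0],
]
embedding = power_iteration_embedding(affinity, num_iterations=5)
```

After a handful of iterations the two vertices in each group share the same embedding value while the groups remain apart, so a simple one-dimensional clustering step over `embedding` could then assign the labels. Stopping early matters: run long enough, the vector flattens toward a constant and the separation disappears.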