-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-24810] Add Estimator and Model for the k-means clustering algorithm #27
Conversation
8d514ea
to
deffa84
Compare
ab0d74a
to
43a54e6
Compare
@gaoyunhaii Could you review this PR? |
fc1dd58
to
572eaf5
Compare
flink-ml-api/src/main/java/org/apache/flink/ml/distance/DistanceMeasure.java
Show resolved
Hide resolved
flink-ml-api/src/main/java/org/apache/flink/ml/linalg/Vectors.java
Outdated
Show resolved
Hide resolved
flink-ml-api/src/main/java/org/apache/flink/ml/linalg/typeinfo/DenseVectorSerializer.java
Show resolved
Hide resolved
flink-ml-lib/src/main/java/org/apache/flink/ml/param/HasPredictionCol.java
Outdated
Show resolved
Hide resolved
flink-ml-lib/src/main/java/org/apache/flink/ml/param/HasFeaturesCol.java
Outdated
Show resolved
Hide resolved
...-ml-lib/src/main/java/org/apache/flink/ml/common/datastream/MapPartitionFunctionWrapper.java
Outdated
Show resolved
Hide resolved
...-ml-lib/src/main/java/org/apache/flink/ml/common/datastream/MapPartitionFunctionWrapper.java
Outdated
Show resolved
Hide resolved
flink-ml-lib/src/main/java/org/apache/flink/ml/clustering/kmeans/KMeansModel.java
Show resolved
Hide resolved
flink-ml-lib/src/main/java/org/apache/flink/ml/param/HasFeaturesCol.java
Outdated
Show resolved
Hide resolved
572eaf5
to
50d440a
Compare
50d440a
to
c1ebf28
Compare
Thanks for your review @gaoyunhaii @yunfengzhou-hub! All comments have been addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very thanks @lindong28 for opening the PR! Will merge after the test get green~
public KMeansModel fit(Table... inputs) { | ||
StreamTableEnvironment tEnv = | ||
(StreamTableEnvironment) ((TableImpl) inputs[0]).getTableEnvironment(); | ||
DataStream<DenseVector> points = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe DataStream<Vector>
is enough here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simply making this change will break compilation. Maybe we can make this work by also changing many other places. But I am not sure it is useful to make this change given that we don't support SparseVector yet.
Maybe we can make this change after SparseVector
is supported. Then we can write unit tests to validate that KMeans (as well as other algorithms) can support both SparseVector and DenseVector as inputs.
What is the purpose of the change
This PR adds the Estimator and Model classes for the KMeans algorithm.
Brief change log
This PR mades the following changes:
KMeans
,KMeansModel
and a few other supporting classes.KMeansTest
to validate the behavior ofKMeans
andKMeansModel
.Verifying this change
The changes are validated by unit tests in the
KMeansTest
.Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation