[FLINK-1723] [ml] [WIP] Add cross validation for model evaluation #891
Conversation
…instead of functions. Not too happy with the extra boilerplate of Score as classes, so I will probably revert and have objects like RegressionsScores, ClassificationScores that contain the definitions of the relevant scores.
All predictors must now implement a calculateScore function. For now we are assuming that predictors are supervised learning algorithms; once unsupervised learning algorithms are added, this will need to be reworked. Also added an evaluate dataset operation to ALS, to allow for scoring of the algorithm. The default performance measure for ALS is RMSE.
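Since RMSE is named as the default score for ALS, here is a minimal sketch of the measure in plain Scala. This is only an illustration of the metric, not the actual FlinkML calculateScore implementation; the function name `rmse` is hypothetical.

```scala
// Hypothetical sketch: root mean squared error over (prediction, truth)
// pairs, the kind of score an evaluate operation on ALS would produce.
def rmse(pairs: Seq[(Double, Double)]): Double = {
  require(pairs.nonEmpty, "need at least one (prediction, truth) pair")
  // Mean of squared errors, then square root.
  val mse = pairs.map { case (pred, truth) =>
    val err = pred - truth
    err * err
  }.sum / pairs.length
  math.sqrt(mse)
}
```

In FlinkML the equivalent computation would run distributed over a DataSet of pairs rather than a local Seq.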
.setStepsize(10.0)
.setIterations(100)

println()
What does println do?
It prints a blank line between the consecutive test runs; I just have it there so I can more easily see what is happening. The tests don't really do anything yet, they just print results.
No news?
Closing since flink-ml is effectively frozen.
Cross validation (CV) [1] is a standard tool to estimate the test error for a model. As such it is a crucial tool for every machine learning library.
This builds upon the ongoing work on the evaluation framework for FlinkML.
As such, the current version supports calculating the score of Predictors only; the end goal, however, is to have CV for Estimators as well, to cover the unsupervised learning case.
We are using some code from the Apache Spark project, mostly simple routines for probabilistic sampling of datasets and generation of KFold CV data.
More and better tests need to be added to the implementation, and the current sampling approaches probably will not work if used within an iteration.
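To make the k-fold idea concrete, here is a hedged sketch of k-fold index splitting in plain Scala. This is illustrative only: the PR itself adapts DataSet sampling routines from Spark for distributed data, and the helper name `kFoldIndices` is hypothetical.

```scala
// Hypothetical sketch of k-fold splitting: shuffle n indices, assign each
// to one of k disjoint test folds, and pair every test fold with the union
// of the remaining folds as its training split.
def kFoldIndices(n: Int, k: Int, seed: Long = 0L): Vector[(Vector[Int], Vector[Int])] = {
  require(k >= 2 && k <= n, "need 2 <= k <= n")
  val shuffled = new scala.util.Random(seed).shuffle((0 until n).toVector)
  // Element at shuffled position p goes to fold p % k.
  val folds = (0 until k).toVector.map { i =>
    shuffled.indices.collect { case p if p % k == i => shuffled(p) }.toVector
  }
  // (training indices, test indices) for each of the k folds.
  folds.map(test => (folds.filterNot(_ eq test).flatten, test))
}
```

Each element lands in exactly one test fold, so averaging the score over the k splits estimates the test error without holding out a separate evaluation set.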