[FLINK-1723] [ml] [WIP] Add cross validation for model evaluation #891

Closed
wants to merge 16 commits into from

@thvasilo thvasilo commented Jul 8, 2015

Cross validation (CV) [1] is a standard tool to estimate the test error for a model. As such it is a crucial tool for every machine learning library.

This builds upon the ongoing work on the evaluation framework for FlinkML.
As such, the current version supports calculating the score of Predictors only. The end goal, however, is to support CV for Estimators as well, to cover the unsupervised learning case.

We are using some code from the Apache Spark project, mostly simple routines for probabilistic sampling of datasets and generation of KFold CV data.
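
To make the K-fold generation mentioned above concrete, here is a minimal sketch of the random-cell idea: every element draws one uniform number in [0, 1), and fold i validates on the elements whose draw falls in [i/k, (i+1)/k) while training on the rest. It uses plain Scala collections rather than a Flink DataSet, and the names `KFoldSketch`, `kFold` and `seed` are hypothetical; this is not the code in this PR.

```scala
import scala.util.Random

// Hypothetical illustration on local collections, not the FlinkML API.
object KFoldSketch {
  /** Splits `data` into `k` (training, validation) pairs.
    * Fold i validates on elements whose random draw lies in [i/k, (i+1)/k)
    * and trains on all remaining elements.
    */
  def kFold[T](data: Seq[T], k: Int, seed: Long): Seq[(Seq[T], Seq[T])] = {
    require(k >= 2, "k must be at least 2")
    val rng = new Random(seed)
    // Draw once per element so that every fold sees the same assignment.
    val tagged = data.map(x => (x, rng.nextDouble()))
    (0 until k).map { i =>
      val lower = i.toDouble / k
      val upper = (i + 1).toDouble / k
      val (validation, training) =
        tagged.partition { case (_, u) => u >= lower && u < upper }
      (training.map(_._1), validation.map(_._1))
    }
  }
}
```

Because the draws are random, the folds are only approximately equal in size, but each element lands in exactly one validation set. For example, `KFoldSketch.kFold(1 to 100, 5, 42L)` yields five pairs whose training and validation parts always add up to 100 elements.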

More and better tests need to be added to the implementation, and the current sampling approaches probably will not work if used within an iteration.

mikiobraun and others added 16 commits July 2, 2015 15:08
…instead of functions.

Not too happy with the extra boilerplate of Score as classes; will probably revert,
and have objects like RegressionsScores, ClassificationScores that contain the definitions
of the relevant scores.
All predictors must now implement a calculateScore function.
We are for now assuming that predictors are supervised learning algorithms;
once unsupervised learning algorithms are added this will need to be reworked.

Also added an evaluate dataset operation to ALS, to allow for scoring of the
algorithm. Default performance measure for ALS is RMSE.
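
For reference, RMSE (root mean squared error) is the square root of the mean squared difference between true and predicted values. A minimal sketch on a local collection of (truth, prediction) pairs follows; the names `RmseSketch` and `rmse` are hypothetical and this is not the evaluate operation added to ALS.

```scala
// Hypothetical illustration on local collections, not the FlinkML API.
object RmseSketch {
  /** Root mean squared error over (truth, prediction) pairs. */
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (truth, prediction) pair")
    val sumSquaredError = pairs.map { case (truth, pred) =>
      val diff = truth - pred
      diff * diff
    }.sum
    math.sqrt(sumSquaredError / pairs.size)
  }
}
```
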
.setStepsize(10.0)
.setIterations(100)

println()
Contributor

What does println do?

Author

It prints a blank line between consecutive test runs; I just have it there so I can more easily see what is happening. The tests don't really do anything yet; they just print results.

@chobeat
Contributor

chobeat commented Feb 10, 2016

No news?

@zentol
Contributor

zentol commented Feb 28, 2019

Closing since flink-ml is effectively frozen.

7 participants