Default for max_variables in fit_class_random_forest and fit_regr_random_forest #365
Comments
related tickets:
Just to clarify, the proposal was meant to say: Make it required (and as such have no default value, so the user needs to choose). The implementations are wildly different and as such it seems weird to have a default that does some kind of a black box thing. That is not really reproducible then. Required and without default value is how the processes are defined right now. So I don't see an inconsistency here? Or was this meant to go into openeo-python-client and the back-end to align?
You have a reproducibility problem anyway because of these wildly different implementations. Requiring the user to specify a I think an ML/AI user is aware of the multitude of degrees of freedom in ML tech and is not going to expect reproducible models or results when switching to a different back-end with different technology. But by requiring to set Likewise, it is or will be annoying for documentation/demo/tutorial purposes if we can't give a generic
The inconsistency I want to refer to is between the spec in openeo-processes and the actual implementations as they currently are.
To turn that around: each random forest implementation already has a default for their I understand that you're worried that the behavior for
My argument in short: in this case I think user friendliness and interoperability (of code) are more important than reproducibility of results (which is practically impossible here anyway, I'm afraid).
Okay, but should we add a default value or remove it completely? #358
Well, I proposed to drop it in #358 to have a quick solution so that implementers could proceed in view of the SRR deadlines, with the idea to reintroduce it at a later point once we have a better idea how to tackle the interoperability issues (string versus int, enum values, defaults, ...). To give an implementer's data point: in the VITO backend we practically ignore the
I just noticed this inconsistency:
The process specs of fit_class_random_forest and fit_regr_random_forest do not have a default value for the max_variables parameter, e.g.
openeo-processes/proposals/fit_class_random_forest.json
Lines 27 to 44 in 589bfd2
In the Python client we apparently have default null with behavior defined as:
And in the VITO back-end implementation of fit_class_random_forest we don't even use max_variables yet, and just use the default ("auto") behavior of Spark MLlib's RandomForest.
Obviously, there are a couple of things to be resolved.
For the process spec of fit_class_random_forest and fit_regr_random_forest I would propose to allow a default value (null) for the max_variables parameter, with the behavior "back-end is free to choose the best strategy".