Implementation for #126: supporting evaluation & forward feature selection for custom metrics (e.g. lift) #128
base: develop
Conversation
Well done, good additions! Looks almost finished; I just added a few documentation remarks and some questions about the (code) distinction between y_pred, y_score, etc.
Wrt the mocking, you need to install pytest-mock (pip install pytest-mock).
Thanks, I've added that info to the Contributing guidelines & workflow in case others have the same question as I did.
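For reference, a minimal sketch of how the mocker fixture works once pytest-mock is installed (the test name and the mocked metric below are purely illustrative, not code from this PR):

def test_with_mocked_metric(mocker):
    # pytest-mock injects the `mocker` fixture automatically once the plugin
    # is installed; it does not need to be imported or instantiated anywhere.
    fake_metric = mocker.Mock(return_value=0.42)
    assert fake_metric([0, 1], [0.2, 0.8]) == 0.42
    fake_metric.assert_called_once()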
Hey Sam, just FYI: I'm committing the code changes I made based on your review, but some meetings came up in between (I'm going to start at a new client), so I couldn't finish the unit tests; I'll do that after the long weekend.
…reworking into base + child classes (had too much common code).
Hi Sam. I've finished the changes for this one. I started fixing the failing unit tests, but during the work I noticed how much code LogisticRegressionModel and LinearRegressionModel actually duplicated from each other, so I've also abstracted the duplicate code out of those classes into a base class Model. So maybe have a look at it again before merging, if you have the time. Or you can also just merge; in my opinion it's quite fine, and all existing unit tests + the new unit tests run successfully. Optional read:
(I've added pytest-mock to my development plan to learn this magic!)
... which is why I went for passing a custom scoring function directly, rather than mocking one from sklearn:
So I sort of circumvented the mocking difficulty, but I think the custom function tests the functionality well in the end anyway. Finally: YAAY, a new pull request that is finished!
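For example, a test can sidestep the mocking altogether by defining a small scoring function inline and passing it as the custom metric (the function and test names below are illustrative, not the actual test code):

from sklearn.metrics import accuracy_score


def custom_scoring_function(y_true, y_score):
    # Deliberately simple custom metric: accuracy on scores thresholded at 0.5.
    y_pred = [1 if score >= 0.5 else 0 for score in y_score]
    return accuracy_score(y_true, y_pred)


def test_evaluate_with_custom_scoring_function():
    y_true = [0, 1, 1, 0]
    y_score = [0.1, 0.9, 0.7, 0.4]
    assert custom_scoring_function(y_true, y_score) == 1.0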
if higher_is_better is None:
    if metric is None:
        if self.MLModel == LogisticRegressionModel:
            # If no custom evaluation metric is chosen,
            # the LogisticRegressionModel uses AUC as default metric,
            # so "higher is better" evaluation logic is applied on the
            # evaluation scores.
            self.higher_is_better = True
        elif self.MLModel == LinearRegressionModel:
            # If no custom evaluation metric is chosen,
            # the LinearRegressionModel uses RMSE as default metric,
            # so "lower is better" evaluation logic is applied on the
            # evaluation scores.
            self.higher_is_better = False
I am not really convinced this is the best way to do this. metric should also support str, which makes it a lot more user-friendly. Since higher_is_better is completely dependent on the metric, I think that a mapping of metric -> higher_is_better would be more efficient. The type of model should not be relevant, except for the default value. It is true that some (or most) metrics are only used for classification or regression and not both, but we shouldn't make our code more complex because of this.
default_higher_is_better = self.MLModel == LogisticRegressionModel
metric_map = {
    "RMSE": False,
    "AUC": True,
    None: default_higher_is_better,
}
if metric is None or isinstance(metric, str):
    self.higher_is_better = metric_map[metric]
elif callable(metric):
    # For a custom metric function, higher_is_better is mandatory,
    # since the comparison direction cannot be derived from a callable.
    if higher_is_better is None:
        raise ValueError("higher_is_better is required when metric "
                         "is a custom callable.")
    self.higher_is_better = higher_is_better
else:
    raise ValueError("metric should be a string or a callable.")
raise ValueError("The configured machine learning model is "
                 "not the standard logistic regression or "
                 "linear regression model. "
                 "Therefore, please fill the metric and "
                 "higher_is_better arguments.")
Which other types are supported? L79-L82 only states 2 types.
linear : LinearRegression
    scikit-learn linear regression model.
Only has predictors as an attribute.
@@ -229,42 +251,72 @@ def compute_variable_importance(self, data: pd.DataFrame) -> pd.DataFrame:
         return (df.sort_values(by="importance", ascending=False)
                 .reset_index(drop=True))

-    def _is_valid_dict(self, model_dict: dict) -> bool:
+    def _is_valid_dict(self, model_dict: dict):
-    def _is_valid_dict(self, model_dict: dict):
+    def _is_valid_dict(self, model_dict: dict) -> bool:
self.predictors = []  # placeholder to keep track of a list of predictors
self._eval_metrics_by_split = {}
super().__init__()
self.logit = LogisticRegression(fit_intercept=True, C=1e9,
If you change logit to model in the LogisticRegressionModel class and linear to model in the LinearRegressionModel class, you can place the fit() method in the base Model class. Both self.logit and self.linear have the same functionality, i.e. they hold the actual model of the class, so the more general name should be self.model or something similar (and the same in both classes).
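A minimal sketch of what this refactor could look like; the attribute names, constructor arguments and the way fit() selects columns are assumptions based on the snippets in this PR, not the final implementation:

from sklearn.linear_model import LinearRegression, LogisticRegression


class Model:
    """Base class holding the shared estimator and the common fit logic."""

    def __init__(self):
        self.predictors = []  # placeholder to keep track of a list of predictors
        self._eval_metrics_by_split = {}
        self.model = None  # set by the child classes

    def fit(self, X_train, y_train):
        # Shared fit logic: both child classes delegate to the same attribute,
        # so fit() no longer needs to be duplicated in each of them.
        # (Selecting columns via self.predictors is an assumption here.)
        self.model.fit(X_train[self.predictors], y_train)


class LogisticRegressionModel(Model):
    def __init__(self):
        super().__init__()
        self.model = LogisticRegression(fit_intercept=True, C=1e9)


class LinearRegressionModel(Model):
    def __init__(self):
        super().__init__()
        self.model = LinearRegression(fit_intercept=True)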
return sqrt(
    mean_squared_error(y_true=y_true,
                       # LinearRegressionModel actually returns y_pred
                       # inside y_score:
                       y_pred=y_score)
)
-return sqrt(
-    mean_squared_error(y_true=y_true,
-                       # LinearRegressionModel actually returns y_pred
-                       # inside y_score:
-                       y_pred=y_score)
-)
+return mean_squared_error(y_true=y_true,
+                          # LinearRegressionModel actually returns y_pred
+                          # inside y_score:
+                          y_pred=y_score,
+                          squared=False)
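Side note on this suggestion: a quick sanity check, assuming scikit-learn >= 0.22 where the squared flag was introduced (the most recent releases also offer a dedicated root_mean_squared_error function instead):

from math import sqrt
from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 2.5]
y_score = [2.5, 5.0, 4.0]

# Both expressions compute the same RMSE value.
assert abs(sqrt(mean_squared_error(y_true, y_score))
           - mean_squared_error(y_true, y_score, squared=False)) < 1e-12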
====> Do not merge yet! See below for explanation. <=====
Story Title
See issue #126: Tuning classification model on different metrics (lift, recall, precision,...)
Changes made
I tried out my changes in a notebook on a model, looks good. :-)
But do not merge yet!
I'm stuck on the 25 failing unit tests, mainly due to the cryptic error "mocker not found":
=> Are we using fixtures? With a find, I can't locate where mocker is instantiated.
Anyway, next to this question, fixing the unit tests of course remains to be done before we can merge this into develop.
But meanwhile Hendrik can use the issue branch, and I can discuss with you, Sam, how to fix the above unit test shizzle. :-)
I'll plan that work in for next week.
How does the solution address the problem
This PR gives any data scientist the freedom to pick a different evaluation metric for forward feature selection, or simply for evaluating a model; see the sketch below.
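As an illustration of the kind of custom metric this enables, here is a sketch of a top-decile lift function that could be passed in; the function itself and the way it is wired into forward feature selection (class name, other constructor arguments) are assumptions for illustration, not code from this PR:

import numpy as np


def lift_at_top_decile(y_true, y_score) -> float:
    # Lift in the 10% highest-scoring observations vs. the overall incidence.
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    cutoff = max(1, int(len(y_score) * 0.10))
    top_idx = np.argsort(y_score)[::-1][:cutoff]
    return y_true[top_idx].mean() / y_true.mean()


# Hypothetical wiring: pass the custom metric and, since lift is a
# "higher is better" metric, set higher_is_better explicitly.
# forward_selection = ForwardFeatureSelection(metric=lift_at_top_decile,
#                                             higher_is_better=True)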
Linked issues
Resolves #126