
Remove unused fit_needs_proba parameter from objective class #320

Merged
merged 2 commits into from Jan 31, 2020

Conversation

dsherry
Collaborator

@dsherry dsherry commented Jan 29, 2020

I don't see this parameter used anywhere in our codebase and would like to delete it.


After creating the pull request: in order to pass the changelog_updated check you will need to update the "Future Release" section of docs/source/changelog.rst to include this pull request by adding :pr:`123`.

@dsherry dsherry requested review from kmax12 and angela97lin Jan 29, 2020
@codecov

codecov bot commented Jan 29, 2020

Codecov Report

Merging #320 into master will increase coverage by 0.09%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #320      +/-   ##
==========================================
+ Coverage   97.04%   97.14%   +0.09%     
==========================================
  Files          96       96              
  Lines        2982     2974       -8     
==========================================
- Hits         2894     2889       -5     
+ Misses         88       85       -3
Impacted Files Coverage Δ
evalml/objectives/fraud_cost.py 100% <ø> (ø) ⬆️
evalml/objectives/lead_scoring.py 96.55% <ø> (-0.12%) ⬇️
evalml/objectives/objective_base.py 97.43% <ø> (-0.07%) ⬇️
evalml/pipelines/pipeline_base.py 98.6% <100%> (+1.97%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4874fc...3c09065.

@jeremyliweishih
Contributor

jeremyliweishih commented Jan 29, 2020

My understanding is that fit_needs_proba is for objectives that require predicted probabilities, rather than predicted labels or values, when fitting the objective. The clearest example is any objective that needs to learn the best probability threshold for binary classification. We currently do that for both of our custom objectives (fraud and lead scoring), and I believe sklearn does it internally for binary classification objectives. If we were to standardize objective fitting to use probabilities instead of predicted labels or values, it could break fitting for any custom regression objective, or for any objective that needs to be fit with predicted values or labels.
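For illustration, here is a minimal sketch of how a fit_needs_proba-style flag could gate which estimator output a pipeline passes to an objective when fitting it. The class and function names here (ObjectiveBase, fit_objective) are invented for the sketch and are not evalml's actual API; only the fit_needs_proba flag name comes from this PR.

```python
# Hypothetical sketch (not evalml's actual API) of a flag that
# controls what predictions an objective is fit on.

class ObjectiveBase:
    # If True, fit the objective on predicted probabilities;
    # if False, fit on predicted labels/values (e.g. regression).
    fit_needs_proba = True

    def fit(self, preds, y_true):
        raise NotImplementedError


def fit_objective(objective, estimator, X, y):
    """Pass the objective whichever estimator output it declares it needs."""
    if objective.fit_needs_proba:
        preds = estimator.predict_proba(X)
    else:
        preds = estimator.predict(X)
    return objective.fit(preds, y)
```

The point of the flag is that the pipeline, not the objective, decides which predict call to make, so a regression objective could opt out of probabilities entirely.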

@dsherry
Collaborator Author

dsherry commented Jan 29, 2020

@jeremyliweishih oh, interesting. So you're saying that this flag would be used by a custom objective for regression? My understanding after reading the code was that we don't support custom objectives for regression, but you're causing me to think again. So you'd set fit_needs_proba to False for a regression problem? I will reread the code with this in mind and see if I can summarize the problem I'm seeing.

@jeremyliweishih
Contributor

jeremyliweishih commented Jan 29, 2020

To clarify: the flag is being used to denote whether an objective needs to fit using predicted probabilities or predicted labels/values. In the case of our two custom objectives, they both require the predicted probability but I can imagine in the future where a custom objective would need predicted labels/values instead.

@dsherry
Collaborator Author

dsherry commented Jan 29, 2020

@jeremyliweishih right, I agree. This parameter would be useful to support a future feature where custom objectives can pass non-probabilities into decision_function and objective_function. However, I think there's enough rewriting of other stuff we'd need to do in order to get there (for example, decision_function is not applicable for regression), that the easiest thing to do is to remove this parameter right now, and add that functionality back in later when we have better abstractions in place. Does that seem reasonable?
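For context on the threshold-optimization case discussed above, a rough sketch (an assumed illustration, not evalml's implementation) of why "fitting" a binary classification objective needs probabilities rather than labels: candidate decision thresholds are scanned over the predicted probabilities, keeping the one that maximizes the objective's score. The function name optimize_threshold is hypothetical.

```python
def optimize_threshold(objective_score, probas, y_true, n_steps=100):
    """Grid-search a decision threshold for a binary classifier.

    objective_score(y_pred, y_true) -> float, higher is better.
    probas are predicted probabilities of the positive class;
    this search is impossible if only hard labels are available.
    """
    best_threshold, best_score = 0.5, float("-inf")
    for i in range(n_steps + 1):
        threshold = i / n_steps
        # Threshold the probabilities into hard 0/1 predictions.
        y_pred = [int(p >= threshold) for p in probas]
        score = objective_score(y_pred, y_true)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold
```

This is the kind of work an objective like fraud cost or lead scoring does at fit time, which is why those objectives require predict_proba output.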

@jeremyliweishih
Contributor

jeremyliweishih commented Jan 29, 2020

@dsherry :shipit:

kmax12
kmax12 approved these changes Jan 29, 2020
Contributor

@kmax12 kmax12 left a comment

good to merge, unless we want to hold off for full redesign.

Contributor

@angela97lin angela97lin left a comment

LGTM, but as Max suggested, would it be better to wait until our design is settled on and fix it afterwards? :o

@dsherry dsherry self-assigned this Jan 31, 2020
@dsherry dsherry force-pushed the ds_remove_fit_needs_proba branch from 6ecde08 to 3c09065 Jan 31, 2020
@dsherry
Collaborator Author

dsherry commented Jan 31, 2020

Thanks for the reviews. I'm gonna merge this! Deleting code is one of my favorite things to do 😁🎊

To summarize / for future reference: my understanding of what fit_needs_proba currently does is that it lets users choose whether classification probabilities or classified values are passed to the custom objective's decision_function, during both a) binary classifier threshold optimization (aka "fitting") and b) classifier prediction. I don't understand why that would be desirable. We decided to remove it because 1) it's unused and undocumented in our codebase, and it's difficult to articulate its value, 2) if this parameter was intended to support custom objectives for regression, there's plenty of other things which would need to change or be added to custom metrics in order to support that, and 3) we'll be removing this as part of the objective API project anyways.

@dsherry dsherry merged commit 6451201 into master Jan 31, 2020
2 checks passed
@dsherry dsherry deleted the ds_remove_fit_needs_proba branch Jan 31, 2020
@angela97lin angela97lin mentioned this pull request Mar 9, 2020