Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add predictor.predict_multi and predictor.predict_proba_multi #2727

Merged
merged 4 commits into from
Feb 2, 2023

Conversation

Innixma
Copy link
Contributor

@Innixma Innixma commented Jan 19, 2023

Issue #, if available:

Description of changes:

  • Add predictor.predict_dict and predictor.predict_proba_dict
  • This is an optimized way to get predictions for a list of models, faster than looping through predictor.predict or predictor.predict_proba
  • Also serves as a clean way to vend per-model out-of-fold predictions, validation predictions, and test predictions for the purposes of more advanced logic (Meta-learning / Zero-shot HPO)
  • Fixed inconsistency with predictions, now will round pred_proba ties to 0 instead of 1 always when converting proba to pred, previously if object was pandas DF we would round to 0 and if it was a pandas Series we would round to 1.
  • Fixed bug in save_pkl where it would crash if saving file to working directory root.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added this to the 0.7 Release milestone Jan 19, 2023
@github-actions
Copy link

Job PR-2727-bd4488f is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2727/bd4488f/index.html

@sxjscience
Copy link
Collaborator

Shall we just extend .predict() with more flags?

@Innixma
Copy link
Contributor Author

Innixma commented Jan 19, 2023

Shall we just extend .predict() with more flags?

That is possible, but it would add a significant amount of complexity to the doc of .predict(). I think I'd rather keep it separate.

In general most users wouldn't need to use .predict_dict.

@Innixma
Copy link
Contributor Author

Innixma commented Jan 19, 2023

Another thought: A different name could be predict_multi instead of predict_dict, but unsure what the best name would be

# Using > instead of >= to align with Pandas `.idxmax` logic which picks the left-most column during ties.
# If this is not done, then predictions can be inconsistent when converting in binary classification from multiclass-form pred_proba and
# binary-form pred_proba when the pred_proba is 0.5 for positive and negative classes.
y_pred = [1 if pred > 0.5 else 0 for pred in y_pred_proba]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we move 0.5 into a property or method method call? Rationale: at some point we'll add optimization of the threshold so having either method or named property would be a natural place to change instead of looking for magic value here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this should be a separate PR as it is a major change. I've had this in mind for awhile

@@ -1491,7 +1491,7 @@ def evaluate_predictions(self, y_true, y_pred, sample_weight=None, silent=False,
return self._learner.evaluate_predictions(y_true=y_true, y_pred=y_pred, sample_weight=sample_weight, silent=silent,
auxiliary_metrics=auxiliary_metrics, detailed_report=detailed_report)

def leaderboard(self, data=None, extra_info=False, extra_metrics=None, only_pareto_frontier=False, skip_score=False, silent=False):
def leaderboard(self, data=None, extra_info=False, extra_metrics=None, only_pareto_frontier=False, skip_score=False, silent=False) -> pd.DataFrame:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add parameter types too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, note type hint for data will wait for future PR due to multiple types support

transform_features=transform_features,
inverse_transform=inverse_transform)

def predict_dict(self, data=None, models=None, as_pandas=True, transform_features=True, inverse_transform=True) -> dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ... -> Dict[str, DataFrame] + add parameter types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, note type hint for data and output will wait for future PR due to multiple types support

X: pd.DataFrame,
models: List[str],
record_pred_time: bool = False,
**kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add return type hint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will wait for future PR, the return type is not trivial as it can be a tuple

@Innixma Innixma changed the title Add predictor.predict_dict and predictor.predict_proba_dict Add predictor.predict_multi and predictor.predict_proba_multi Feb 2, 2023
@github-actions
Copy link

github-actions bot commented Feb 2, 2023

Job PR-2727-0caa888 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2727/0caa888/index.html

@Innixma
Copy link
Contributor Author

Innixma commented Feb 2, 2023

Note: renamed to predict_multi and predict_proba_multi to better communicate the intended usage.

@Innixma Innixma merged commit 37e4496 into autogluon:master Feb 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants