Add predictor.predict_multi and predictor.predict_proba_multi #2727

Innixma · 2023-01-19T00:01:53Z

Issue #, if available:

Description of changes:

Add predictor.predict_dict and predictor.predict_proba_dict
This is an optimized way to get predictions for a list of models, and is faster than looping through predictor.predict or predictor.predict_proba.
It also serves as a clean way to vend per-model out-of-fold predictions, validation predictions, and test predictions for more advanced logic (Meta-learning / Zero-shot HPO).
Fixed an inconsistency when converting pred_proba to predictions: pred_proba ties are now always rounded to 0 instead of 1. Previously, ties were rounded to 0 if the object was a pandas DataFrame and to 1 if it was a pandas Series.
Fixed bug in save_pkl where it would crash if saving file to working directory root.
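
For context on the save_pkl fix, here is a minimal sketch of one way such a crash can arise, assuming the function creates the parent directory before writing. For a bare filename such as "model.pkl", `os.path.dirname` returns an empty string, and `os.makedirs("")` raises `FileNotFoundError`. The helper `save_pickle` below is hypothetical and is not AutoGluon's actual `save_pkl` implementation; the PR's real fix may differ.

```python
import os
import pickle


def save_pickle(path: str, obj) -> None:
    """Hypothetical helper: pickle `obj` to `path`, creating parent dirs if any."""
    parent = os.path.dirname(path)
    # For a bare filename such as "model.pkl", dirname() returns "".
    # os.makedirs("") raises FileNotFoundError, so only create a directory
    # when the path actually has a parent component.
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
```

With the guard in place, `save_pickle("model.pkl", obj)` writes into the current working directory, and `save_pickle("out/models/model.pkl", obj)` still creates the nested directories first.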

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2023-01-19T01:46:28Z

Job PR-2727-bd4488f is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2727/bd4488f/index.html

sxjscience · 2023-01-19T16:55:33Z

Shall we just extend .predict() with more flags?

Innixma · 2023-01-19T20:50:23Z

> Shall we just extend .predict() with more flags?

That is possible, but it would add a significant amount of complexity to the doc of .predict(). I think I'd rather keep it separate.

In general most users wouldn't need to use .predict_dict.

Innixma · 2023-01-19T20:51:32Z

Another thought: a different name could be predict_multi instead of predict_dict, but I'm unsure what the best name would be.

gradientsky · 2023-01-20T23:17:30Z

core/src/autogluon/core/utils/utils.py

+        # Using > instead of >= to align with Pandas `.idxmax` logic which picks the left-most column during ties.
+        # If this is not done, then predictions can be inconsistent when converting in binary classification from multiclass-form pred_proba and
+        # binary-form pred_proba when the pred_proba is 0.5 for positive and negative classes.
+        y_pred = [1 if pred > 0.5 else 0 for pred in y_pred_proba]


Shall we move 0.5 into a property or method call? Rationale: at some point we'll add optimization of the threshold, so a method or named property would be a natural place to make that change instead of hunting for a magic value here.

Yes, but this should be a separate PR as it is a major change. I've had this in mind for a while.
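
To illustrate the tie-breaking discussed in this thread, here is a rough sketch in plain Python (no pandas) of why `>` rather than `>=` keeps the two conversion paths consistent. The function names and the `DEFAULT_THRESHOLD` constant are hypothetical, chosen for illustration only; they are not identifiers from this PR. The multiclass path mimics the left-most-maximum behaviour that the diff comment attributes to `.idxmax`.

```python
from typing import List, Sequence

DEFAULT_THRESHOLD = 0.5  # hypothetical named constant, not part of the PR


def pred_from_binary_proba(
    pos_proba: Sequence[float], threshold: float = DEFAULT_THRESHOLD
) -> List[int]:
    """Binary form: one positive-class probability per row; ties go to class 0."""
    return [1 if p > threshold else 0 for p in pos_proba]


def pred_from_multiclass_proba(rows: Sequence[Sequence[float]]) -> List[int]:
    """Multiclass form: one probability per class; the left-most maximum wins."""
    # max() returns the first maximal index, so a 0.5/0.5 tie yields class 0.
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


pos = [0.2, 0.5, 0.9]
two_column = [(1 - p, p) for p in pos]
# Both paths agree, including on the 0.5 tie in the middle row.
assert pred_from_binary_proba(pos) == pred_from_multiclass_proba(two_column)
```

Had the binary path used `>=`, the middle row would become class 1 while the multiclass path would still return class 0. Note that the two paths only agree when the threshold is 0.5; a tuned threshold, as suggested above, would need the multiclass path to change as well.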

gradientsky · 2023-01-20T23:18:44Z

tabular/src/autogluon/tabular/predictor/predictor.py

@@ -1491,7 +1491,7 @@ def evaluate_predictions(self, y_true, y_pred, sample_weight=None, silent=False,
        return self._learner.evaluate_predictions(y_true=y_true, y_pred=y_pred, sample_weight=sample_weight, silent=silent,
                                                  auxiliary_metrics=auxiliary_metrics, detailed_report=detailed_report)

-    def leaderboard(self, data=None, extra_info=False, extra_metrics=None, only_pareto_frontier=False, skip_score=False, silent=False):
+    def leaderboard(self, data=None, extra_info=False, extra_metrics=None, only_pareto_frontier=False, skip_score=False, silent=False) -> pd.DataFrame:


nit: add parameter types too

added, note type hint for data will wait for future PR due to multiple types support

gradientsky · 2023-01-20T23:19:56Z

tabular/src/autogluon/tabular/predictor/predictor.py

+                                                transform_features=transform_features,
+                                                inverse_transform=inverse_transform)
+
+    def predict_dict(self, data=None, models=None, as_pandas=True, transform_features=True, inverse_transform=True) -> dict:


nit: ... -> Dict[str, DataFrame] + add parameter types

added, note type hint for data and output will wait for future PR due to multiple types support

gradientsky · 2023-01-20T23:20:27Z

core/src/autogluon/core/trainer/abstract_trainer.py

+                            X: pd.DataFrame,
+                            models: List[str],
+                            record_pred_time: bool = False,
+                            **kwargs):


add return type hint

will wait for future PR, the return type is not trivial as it can be a tuple
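
As a sketch of why the return type here is awkward to annotate, the example below assumes the tuple arises when `record_pred_time=True`, in which case per-model timings are returned alongside the predictions. That is an assumption; the excerpt does not show the method's name or body. Everything in the block (`predict_multi_sketch`, `model_fns`, the type aliases) is hypothetical and stands in for the trainer's real logic.

```python
from time import perf_counter
from typing import Callable, Dict, List, Sequence, Tuple, Union

Preds = List[float]
PredDict = Dict[str, Preds]   # model name -> predictions
TimeDict = Dict[str, float]   # model name -> seconds spent predicting


def predict_multi_sketch(
    X: Sequence[float],
    models: List[str],
    model_fns: Dict[str, Callable[[Sequence[float]], Preds]],
    record_pred_time: bool = False,
) -> Union[PredDict, Tuple[PredDict, TimeDict]]:
    """Return {model name: predictions}, plus per-model timings if requested."""
    preds: PredDict = {}
    times: TimeDict = {}
    for name in models:
        start = perf_counter()
        preds[name] = model_fns[name](X)
        times[name] = perf_counter() - start
    if record_pred_time:
        return preds, times
    return preds
```

A caller must branch on the flag to know which shape comes back, which is what makes a single precise return hint non-trivial; `typing.overload` with `Literal[True]` / `Literal[False]` is one option for tightening it.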

github-actions · 2023-02-02T03:36:04Z

Job PR-2727-0caa888 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2727/0caa888/index.html

Innixma · 2023-02-02T03:36:54Z

Note: renamed to predict_multi and predict_proba_multi to better communicate the intended usage.

Innixma added this to the 0.7 Release milestone Jan 19, 2023

Innixma requested review from gradientsky and yinweisu January 19, 2023 00:01

gradientsky reviewed Jan 20, 2023


Innixma added 4 commits February 1, 2023 18:10

Add zeroshot simulation methods

9969529

Add predictor.predict_dict and predictor.predict_proba_dict

641f550

reduce code dupe

b83df44

Address PR comments + cleanup self.__get_dataset

0caa888

Innixma force-pushed the zero_shot_hpo branch from bd4488f to 0caa888 Compare February 2, 2023 02:15

Innixma changed the title ~~Add predictor.predict_dict and predictor.predict_proba_dict~~ Add predictor.predict_multi and predictor.predict_proba_multi Feb 2, 2023

gradientsky approved these changes Feb 2, 2023


Innixma merged commit 37e4496 into autogluon:master Feb 2, 2023
