Create binary and multiclass objective classes (#504)
* creating new binary / multiclass variants of pipelines, duplicating code for now

* moving common fxns back to pipeline base

* more moving around fxns in pipelines classes

* capping xgboost

* fixing typo

* more cleanup, making predict_proba standard regardless of binary/multiclass

* renaming other_objectives to objectives

* updating score's objective parameter to calculate all objectives, not just additional

* removing self.objective for scoring

* removing objectives from pipeline initialization, adding objective as predict param

* remove xgboost cap from branch

* changelog

* capping xgboost on local branch since tests timing out

* cleaning up

* more cleanup

* reverting requirements file

* adding classification pipeline subclass, cleaning up via PR comments

* more cleanup for docstrings

* more cleanup of changelog and comments

* putting tests in subfolders and adding few more tests

* Update dependencies  (#412)

* Update latest dependencies

* Hide features with zero importance in plot by default (#413)

* adding functionality and test

* changelog and adding boolean param

* Update dependencies check: package whitelist (#417)

* Add a whitelist for update_deps check

* Remove from expected

* Update deps

* Changelog

* adding skeleton for subclasses

* fixing test and linting

* updating change from master

* fixing fixture

* cherry picked wip remove ROC and confusion matrix

* fixing merge

* fixing merge

* cleaning up

* make test use static attribute instead of instance

* deleting needs_fitting

* updating code to use new objective classes, still broken

* updating threshold, still need to clean up tests

* comment out for now

* more cleanup

* cleaning up

* more cleanup

* still more cleanup

* fixing plot unsuccessful merging

* more cleanup but still some things to work out

* cleaning up using multiclass objectives for binary classification problems

* fixing typo with recall and cleanup

* cleaning up

* adding default

* some more cleaning up

* removing irrelevant test

* forgot to add attribute, breaking things again

* cleanup and change objective of test

* removing objective from predict

* more cleanup :d

* remove unused attribute

* cleaning up via comments

* more comments

* changelog

* order of decorators changed

* fixing copy and paste err

* update random state for binary class pipelines

* updating objective

* typo

* fixing?

* fixing imports

* fixing tests

* adding objective as parameter for predict, removing for fit

* cleaning up test

* more fixing test

* minor linting, need more to go

* more cleanup

* forgot to fix test

* more merging :x

* starting to add tuning logic to automl

* changelog

* cleaning up

* change conditional for objective split

* cleaning up docstrings

* forgot to use classificationobjective class...

* add additional cond

* adding tests

* cleanup

* removing decision function for multiclass

* updating via comments

* removing classification_objective file

* add test + more updates

* use cls instead for pep8 standards

* updating can_optimize to property

* update score

* fix tests

* minor cleanup from comments

* updating predict behavior

* add separate objective check

* fixing some merge conflicts cont

* add fraud test

* patching

* remove old test

* updating for now

* add another test

* add more tests

* adding test structure, still need to fix

* adding test

* fix iloc

* fix tests

* fix import

* fix test?

* removing can_optimize_threshold

* linting

* update docs a little

* remove accuracy

* add more doc fixes

* move binary and multi pipelines in api ref

* revert components notebook

* updating from comments

* oops, fix none set

* update docstring

* update api ref?

* addressing comments

* revert and update

* update docstring

* updating docstrings and lint

* updating unnecessary call to constructor

* pushing empty commit to refresh

Co-authored-by: Jeremy Shih <jeremyliweishih@gmail.com>
Co-authored-by: Dylan Sherry <sharshofski@gmail.com>
3 people committed Apr 3, 2020
1 parent 06bc5a8 commit 55d737a
Showing 41 changed files with 788 additions and 526 deletions.
29 changes: 14 additions & 15 deletions docs/source/api_reference.rst
@@ -104,6 +104,8 @@ Estimators
    LinearRegressor
    RandomForestRegressor
 
+.. currentmodule:: evalml.pipelines
+
 
 .. currentmodule:: evalml.pipelines
 
@@ -185,7 +187,6 @@ Domain Specific
    FraudCost
    LeadScoring
 
-
 Classification
 ~~~~~~~~~~~~~~
 
@@ -194,10 +195,18 @@ Classification
    :template: class.rst
    :nosignatures:
 
+   AUC
+   AUCMacro
+   AUCMicro
+   AUCWeighted
    F1
    F1Micro
    F1Macro
    F1Weighted
+   LogLossBinary
+   LogLossMulticlass
+   MCCBinary
+   MCCMulticlass
    Precision
    PrecisionMicro
    PrecisionMacro
@@ -206,15 +215,6 @@ Classification
    RecallMicro
    RecallMacro
    RecallWeighted
-   AUC
-   AUCMicro
-   AUCMacro
-   AUCWeighted
-   LogLoss
-   MCC
-   ROC
-   ConfusionMatrix
-
 
 Regression
 ~~~~~~~~~~
@@ -224,14 +224,13 @@ Regression
    :template: class.rst
    :nosignatures:
 
-   R2
+   ExpVariance
    MAE
+   MaxError
+   MedianAE
    MSE
    MSLE
-   MedianAE
-   MaxError
-   ExpVariance
-
+   R2
 
 Plot Metrics
 ~~~~~~~~~~~~
4 changes: 1 addition & 3 deletions docs/source/objectives/custom_objectives.ipynb
@@ -34,9 +34,7 @@
     "class FraudCost(ObjectiveBase):\n",
     "    \"\"\"Score the percentage of money lost of the total transaction amount process due to fraud\"\"\"\n",
     "    name = \"Fraud Cost\"\n",
-    "    needs_fitting = True\n",
     "    greater_is_better = False\n",
-    "    uses_extra_columns = True\n",
     "    score_needs_proba = False\n",
     "\n",
     "    def __init__(self, retry_percentage=.5, interchange_fee=.02,\n",
@@ -116,4 +114,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
\ No newline at end of file
+}
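
For reference, the notebook change above reflects the slimmed-down objective API: a minimal sketch of the class attributes a custom objective still declares after this commit (the objective name below is hypothetical, and `ObjectiveBase` is assumed importable from `evalml.objectives`):

    from evalml.objectives import ObjectiveBase

    class MyCustomObjective(ObjectiveBase):
        """Hypothetical objective showing the remaining attribute contract."""
        name = "My Custom Objective"
        greater_is_better = False   # lower is better, as with Fraud Cost
        score_needs_proba = False   # scores class predictions, not probabilities

`needs_fitting` and `uses_extra_columns` are gone entirely; extra data now arrives through the `X` argument of the objective's methods.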
23 changes: 17 additions & 6 deletions evalml/automl/auto_base.py
@@ -5,6 +5,7 @@
 
 import numpy as np
 import pandas as pd
+from sklearn.model_selection import train_test_split
 from tqdm import tqdm
 
 from .pipeline_search_plots import PipelineSearchPlots
@@ -40,12 +41,11 @@ def __init__(self, problem_type, tuner, cv, objective, max_pipelines, max_time,
         self.verbose = verbose
         self.possible_pipelines = get_pipelines(problem_type=self.problem_type, model_families=allowed_model_families)
         self.objective = get_objective(objective)
+        if self.problem_type != self.objective.problem_type:
+            raise ValueError("Given objective {} is not compatible with a {} problem.".format(self.objective.name, self.problem_type.value))
 
         logger.verbose = verbose
 
-        if self.problem_type not in self.objective.problem_types:
-            raise ValueError("Given objective {} is not compatible with a {} problem.".format(self.objective.name, self.problem_type.value))
-
         if additional_objectives is not None:
             additional_objectives = [get_objective(o) for o in additional_objectives]
         else:
@@ -228,10 +228,10 @@ def _check_stopping_condition(self, start):
     def _check_multiclass(self, y):
         if y.nunique() <= 2:
             return
-        if ProblemTypes.MULTICLASS not in self.objective.problem_types:
+        if self.objective.problem_type != ProblemTypes.MULTICLASS:
             raise ValueError("Given objective {} is not compatible with a multiclass problem.".format(self.objective.name))
         for obj in self.additional_objectives:
-            if ProblemTypes.MULTICLASS not in obj.problem_types:
+            if obj.problem_type != ProblemTypes.MULTICLASS:
                 raise ValueError("Additional objective {} is not compatible with a multiclass problem.".format(obj.name))
 
     def _transform_parameters(self, pipeline_class, parameters, number_features):
@@ -290,7 +290,18 @@ def _do_iteration(self, X, y, pbar, raise_errors):
 
         objectives_to_score = [self.objective] + self.additional_objectives
         try:
-            pipeline.fit(X_train, y_train, self.objective)
+            X_threshold_tuning = None
+            y_threshold_tuning = None
+
+            if self.objective.problem_type == ProblemTypes.BINARY and self.objective.can_optimize_threshold:
+                X_train, X_threshold_tuning, y_train, y_threshold_tuning = train_test_split(X_train, y_train, test_size=0.2, random_state=pipeline.estimator.random_state)
+            pipeline.fit(X_train, y_train)
+            if self.objective.problem_type == ProblemTypes.BINARY:
+                pipeline.threshold = 0.5
+                if self.objective.can_optimize_threshold:
+                    y_predict_proba = pipeline.predict_proba(X_threshold_tuning)
+                    y_predict_proba = y_predict_proba[:, 1]
+                    pipeline.threshold = self.objective.optimize_threshold(y_predict_proba, y_threshold_tuning, X=X_threshold_tuning)
             scores = pipeline.score(X_test, y_test, objectives=objectives_to_score)
             score = scores[self.objective.name]
             plot_data.append(pipeline.get_plot_data(X_test, y_test, self.plot_metrics))
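
A simplified, self-contained sketch of the tuning flow added above: hold out 20% of the training data, fit the pipeline, then learn the binary decision threshold on the held-out split. Here `fit_and_tune` and the fixed `random_state=0` are illustrative stand-ins, not evalml's API; `pipeline` and `objective` stand in for the real objects:

    from sklearn.model_selection import train_test_split

    def fit_and_tune(pipeline, objective, X_train, y_train):
        """Fit a binary pipeline, then tune its threshold on a held-out split."""
        X_tune, y_tune = None, None
        if objective.can_optimize_threshold:
            X_train, X_tune, y_train, y_tune = train_test_split(
                X_train, y_train, test_size=0.2, random_state=0)
        pipeline.fit(X_train, y_train)
        pipeline.threshold = 0.5  # default threshold for binary problems
        if objective.can_optimize_threshold:
            proba = pipeline.predict_proba(X_tune)[:, 1]  # positive-class column
            pipeline.threshold = objective.optimize_threshold(proba, y_tune, X=X_tune)
        return pipeline

Note the threshold is tuned on data the estimator never saw during fitting, which avoids an optimistically biased threshold.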
27 changes: 2 additions & 25 deletions evalml/automl/auto_classification_search.py
@@ -86,7 +86,8 @@ def __init__(self,
                 objective = "precision_micro"
                 problem_type = ProblemTypes.MULTICLASS
             else:
-                problem_type = self._set_problem_type(objective, multiclass)
+                objective = get_objective(objective)
+                problem_type = objective.problem_type
 
         super().__init__(
             tuner=tuner,
@@ -110,27 +111,3 @@ def __init__(self,
             self.plot_metrics = [ROC(), ConfusionMatrix()]
         else:
             self.plot_metrics = [ConfusionMatrix()]
-
-    def _set_problem_type(self, objective, multiclass):
-        """Sets the problem type of the AutoClassificationSearch to either binary or multiclass.
-
-        If there is an objective either:
-            a. Set problem_type to MULTICLASS if objective is only multiclass and multiclass is false
-            b. Set problem_type to MUTLICLASS if multiclass is true
-            c. Default to BINARY
-
-        Arguments:
-            objective (Object): the objective to optimize
-            multiclass (bool): boolean representing whether search is for multiclass problems or not
-
-        Returns:
-            ProblemTypes enum representing type of problem to set AutoClassificationSearch to
-        """
-
-        problem_type = ProblemTypes.BINARY
-        # if exclusively multiclass: infer
-        if [ProblemTypes.MULTICLASS] == get_objective(objective).problem_types:
-            problem_type = ProblemTypes.MULTICLASS
-        elif multiclass:
-            problem_type = ProblemTypes.MULTICLASS
-        return problem_type
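
The deleted helper is replaced by reading the problem type straight off the objective, roughly as follows (this assumes `get_objective` accepts an objective name such as "f1"; the exact registered names are not shown in this diff):

    from evalml.objectives import get_objective

    objective = get_objective("f1")        # hypothetical objective name
    problem_type = objective.problem_type  # e.g. ProblemTypes.BINARY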
13 changes: 9 additions & 4 deletions evalml/objectives/__init__.py
@@ -4,17 +4,18 @@
 from .objective_base import ObjectiveBase
 from .standard_metrics import (
     AUC,
-    F1,
-    MCC,
-    R2,
     AUCMacro,
     AUCMicro,
     AUCWeighted,
     ExpVariance,
+    F1,
     F1Macro,
     F1Micro,
     F1Weighted,
-    LogLoss,
+    LogLossBinary,
+    LogLossMulticlass,
+    MCCBinary,
+    MCCMulticlass,
     MaxError,
     MAE,
     MedianAE,
@@ -24,6 +25,7 @@
     PrecisionMacro,
     PrecisionMicro,
     PrecisionWeighted,
+    R2,
     Recall,
     RecallMacro,
     RecallMicro,
@@ -32,3 +34,6 @@
     ConfusionMatrix
 )
 from .utils import get_objective, get_objectives
+from .binary_classification_objective import BinaryClassificationObjective
+from .multiclass_classification_objective import MultiClassificationObjective
+from .regression_objective import RegressionObjective
62 changes: 62 additions & 0 deletions evalml/objectives/binary_classification_objective.py
@@ -0,0 +1,62 @@
import pandas as pd
from scipy.optimize import minimize_scalar

from .objective_base import ObjectiveBase

from evalml.problem_types import ProblemTypes


class BinaryClassificationObjective(ObjectiveBase):
    """
    Base class for all binary classification objectives.

    problem_type (ProblemTypes): Specifies the type of problem this objective is defined for (binary classification)
    can_optimize_threshold (bool): Determines if threshold used by objective can be optimized or not.
    """
    problem_type = ProblemTypes.BINARY

    @property
    def can_optimize_threshold(cls):
        """Returns a boolean determining if we can optimize the binary classification objective threshold. This will be false for any objective that works directly with predicted probabilities, like log loss and AUC. Otherwise, it will be true."""
        return not cls.score_needs_proba

    def optimize_threshold(self, ypred_proba, y_true, X=None):
        """Learn a binary classification threshold which optimizes the current objective.

        Arguments:
            ypred_proba (list): The classifier's predicted probabilities
            y_true (list): The ground truth for the predictions.
            X (pd.DataFrame, optional): Any extra columns that are needed from training data.

        Returns:
            Optimal threshold for this objective
        """
        if not self.can_optimize_threshold:
            raise RuntimeError("Trying to optimize objective that can't be optimized!")

        def cost(threshold):
            predictions = self.decision_function(ypred_proba=ypred_proba, threshold=threshold, X=X)
            cost = self.objective_function(predictions, y_true, X=X)
            return -cost if self.greater_is_better else cost

        optimal = minimize_scalar(cost, method='Golden', options={"maxiter": 100})
        return optimal.x

    def decision_function(self, ypred_proba, threshold=0.5, X=None):
        """Apply a learned threshold to predicted probabilities to get predicted classes.

        Arguments:
            ypred_proba (list): The classifier's predicted probabilities
            threshold (float, optional): Threshold used to make a prediction. Defaults to 0.5.
            X (pd.DataFrame, optional): Any extra columns that are needed from training data.

        Returns:
            predictions
        """
        if not isinstance(ypred_proba, pd.Series):
            ypred_proba = pd.Series(ypred_proba)
        return ypred_proba > threshold
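
A toy usage sketch of the class above (`ToyAccuracy` is hypothetical and assumes `BinaryClassificationObjective` is importable as defined here):

    import numpy as np
    import pandas as pd

    class ToyAccuracy(BinaryClassificationObjective):
        """Hypothetical label-based objective, so its threshold is optimizable."""
        name = "Toy Accuracy"
        greater_is_better = True
        score_needs_proba = False  # makes can_optimize_threshold True

        def objective_function(self, y_predicted, y_true, X=None):
            # fraction of correct class predictions
            return (pd.Series(y_predicted).values == pd.Series(y_true).values).mean()

    objective = ToyAccuracy()
    ypred_proba = np.array([0.1, 0.4, 0.35, 0.8])
    y_true = np.array([0, 0, 1, 1])
    threshold = objective.optimize_threshold(ypred_proba, y_true)      # golden-section search
    predictions = objective.decision_function(ypred_proba, threshold)  # probabilities to labels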
57 changes: 27 additions & 30 deletions evalml/objectives/fraud_cost.py
@@ -1,74 +1,68 @@
 import pandas as pd
 
-from .objective_base import ObjectiveBase
+from .binary_classification_objective import BinaryClassificationObjective
 
-from evalml.problem_types import ProblemTypes
-
 
-class FraudCost(ObjectiveBase):
+class FraudCost(BinaryClassificationObjective):
     """Score the percentage of money lost of the total transaction amount process due to fraud"""
     name = "Fraud Cost"
-    problem_types = [ProblemTypes.BINARY]
-    needs_fitting = True
     greater_is_better = False
-    uses_extra_columns = True
     score_needs_proba = False
 
     def __init__(self, retry_percentage=.5, interchange_fee=.02,
-                 fraud_payout_percentage=1.0, amount_col='amount', verbose=False):
+                 fraud_payout_percentage=1.0, amount_col='amount'):
         """Create instance of FraudCost
 
         Arguments:
-            retry_percentage (float): what percentage of customers will retry a transaction if it
-                is declined? Between 0 and 1. Defaults to .5
+            retry_percentage (float): What percentage of customers that will retry a transaction if it
+                is declined. Between 0 and 1. Defaults to .5
 
-            interchange_fee (float): how much of each successful transaction do you collect?
+            interchange_fee (float): How much of each successful transaction you can collect.
                 Between 0 and 1. Defaults to .02
 
-            fraud_payout_percentage (float): how percentage of fraud will you be unable to collect.
+            fraud_payout_percentage (float): Percentage of fraud you will not be able to collect.
                 Between 0 and 1. Defaults to 1.0
 
-            amount_col (str): name of column in data that contains the amount. defaults to "amount"
+            amount_col (str): Name of column in data that contains the amount. Defaults to "amount"
         """
         self.retry_percentage = retry_percentage
         self.interchange_fee = interchange_fee
         self.fraud_payout_percentage = fraud_payout_percentage
         self.amount_col = amount_col
-        super().__init__(verbose=verbose)
 
-    def decision_function(self, y_predicted, extra_cols, threshold):
-        """Determine if transaction is fraud given predicted probabilities, dataframe with transaction amount, and threshold
+    def decision_function(self, ypred_proba, threshold=0.0, X=None):
+        """Determine if a transaction is fraud given predicted probabilities, threshold, and dataframe with transaction amount
 
         Arguments:
-            y_predicted (pd.Series): predicted labels
-            extra_cols (pd.DataFrame): extra data needed
-            threshold (float): dollar threshold to determine if transaction is fraud
+            ypred_proba (pd.Series): Predicted probablities
+            X (pd.DataFrame): Dataframe containing transaction amount
+            threshold (float): Dollar threshold to determine if transaction is fraud
 
         Returns:
-            pd.Series: series of predicted fraud label using extra cols and threshold
+            pd.Series: Series of predicted fraud labels using X and threshold
         """
-        if not isinstance(extra_cols, pd.DataFrame):
-            extra_cols = pd.DataFrame(extra_cols)
+        if not isinstance(X, pd.DataFrame):
+            X = pd.DataFrame(X)
 
-        if not isinstance(y_predicted, pd.Series):
-            y_predicted = pd.Series(y_predicted)
+        if not isinstance(ypred_proba, pd.Series):
+            ypred_proba = pd.Series(ypred_proba)
 
-        transformed_probs = (y_predicted.values * extra_cols[self.amount_col])
+        transformed_probs = (ypred_proba.values * X[self.amount_col])
         return transformed_probs > threshold
 
-    def objective_function(self, y_predicted, y_true, extra_cols):
+    def objective_function(self, y_predicted, y_true, X):
         """Calculate amount lost to fraud per transaction given predictions, true values, and dataframe with transaction amount
 
         Arguments:
             y_predicted (pd.Series): predicted fraud labels
             y_true (pd.Series): true fraud labels
-            extra_cols (pd.DataFrame): extra data needed
+            X (pd.DataFrame): dataframe with transaction amounts
 
         Returns:
             float: amount lost to fraud per transaction
         """
-        if not isinstance(extra_cols, pd.DataFrame):
-            extra_cols = pd.DataFrame(extra_cols)
+        if not isinstance(X, pd.DataFrame):
+            X = pd.DataFrame(X)
 
         if not isinstance(y_predicted, pd.Series):
             y_predicted = pd.Series(y_predicted)
@@ -77,7 +71,10 @@ def objective_function(self, y_predicted, y_true, extra_cols):
             y_true = pd.Series(y_true)
 
         # extract transaction using the amount columns in users data
-        transaction_amount = extra_cols[self.amount_col]
+        try:
+            transaction_amount = X[self.amount_col]
+        except KeyError:
+            raise ValueError("`{}` is not a valid column in X.".format(self.amount_col))
 
         # amount paid if transaction is fraud
         fraud_cost = transaction_amount * self.fraud_payout_percentage
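
Sketch of the updated FraudCost call pattern, where transaction amounts now travel in `X` instead of `extra_cols` (toy numbers, assuming the class as defined above):

    import pandas as pd

    fraud_objective = FraudCost(amount_col='amount')
    X = pd.DataFrame({'amount': [100.0, 5000.0, 250.0]})
    ypred_proba = pd.Series([0.05, 0.90, 0.20])
    y_true = pd.Series([0, 1, 0])

    # probability * amount must clear the dollar threshold to be flagged as fraud
    predictions = fraud_objective.decision_function(ypred_proba, threshold=50.0, X=X)
    loss = fraud_objective.objective_function(predictions, y_true, X=X)  # lower is better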
