Create binary and multiclass objective classes (#504)
* creating new binary / multiclass variants of pipelines, duplicating code for now

* moving common fxns back to pipeline base

* more moving around fxns in pipelines classes

* capping xgboost

* fixing typo

* more cleanup, making predict_proba standard regardless of binary/multiclass

* renaming other_objectives to objectives

* updating score's objective parameter to calculate all objectives, not just additional

* removing self.objective for scoring

* removing objectives from pipeline initialization, adding objective as predict param

* remove xgboost cap from branch

* changelog

* capping xgboost on local branch since tests timing out

* cleaning up

* more cleanup

* reverting requirements file

* adding classification pipeline subclass, cleaning up via PR comments

* more cleanup for docstrings

* more cleanup of changelog and comments

* putting tests in subfolders and adding few more tests

* Update dependencies  (#412)

* Update latest dependencies

* Hide features with zero importance in plot by default (#413)

* adding functionality and test

* changelog and adding boolean param

* Update dependencies check: package whitelist (#417)

* Add a whitelist for update_deps check

* Remove from expected

* Update deps

* Changelog

* adding skeleton for subclasses

* fixing test and linting

* updating change from master

* fixing fixture

* cherry picked wip remove ROC and confusion matrix

* fixing merge

* fixing merge

* cleaning up

* make test use static attribute instead of instance

* deleting needs_fitting

* updating code to use new objective classes, still broken

* updating threshold, still need to clean up tests

* comment out for now

* more cleanup

* cleaning up

* more cleanup

* still more cleanup

* fixing plot unsuccessful merging

* more cleanup but still some things to work out

* cleaning up using multiclass objectives for binary classification problems

* fixing typo with recall and cleanup

* cleaning up

* adding default

* some more cleaning up

* removing irrelevant test

* forgot to add attribute, breaking things again

* cleanup and change objective of test

* removing objective from predict

* more cleanup :d

* remove unused attribute

* cleaning up via comments

* more comments

* changelog

* order of decorators changed

* fixing copy and paste err

* update random state for binary class pipelines

* updating objective

* typo

* fixing?

* fixing imports

* fixing tests

* adding objective as parameter for predict, removing for fit

* cleaning up test

* more fixing test

* minor linting, need more to go

* more cleanup

* forgot to fix test

* more merging :x

* starting to add tuning logic to automl

* changelog

* cleaning up

* change conditional for objective split

* cleaning up docstrings

* forgot to use classificationobjective class...

* add additional cond

* adding tests

* cleanup

* removing decision function for multiclass

* updating via comments

* removing classification_objective file

* add test + more updates

* use cls instead for pep8 standards

* updating can_optimize to property

* update score

* fix tests

* minor cleanup from comments

* updating predict behavior

* add separate objective check

* fixing some merge conflicts cont

* add fraud test

* patching

* remove old test

* updating for now

* add another test

* add more tests

* adding test structure, still need to fix

* adding test

* fix iloc

* fix tests

* fix import

* fix test?

* removing can_optimize_threshold

* linting

* update docs a little

* remove accuracy

* add more doc fixes

* move binary and multi pipelines in api ref

* revert components notebook

* updating from comments

* oops, fix none set

* update docstring

* update api ref?

* addressing comments

* revert and update

* update docstring

* updating docstrings and lint

* updating unnecessary call to constructor

* pushing empty commit to refresh

Co-authored-by: Jeremy Shih <jeremyliweishih@gmail.com>
Co-authored-by: Dylan Sherry <sharshofski@gmail.com>
3 people committed Apr 3, 2020
1 parent 06bc5a8 commit 55d737a
Showing 41 changed files with 788 additions and 526 deletions.
29 changes: 14 additions & 15 deletions docs/source/api_reference.rst
@@ -104,6 +104,8 @@ Estimators
    LinearRegressor
    RandomForestRegressor
 
+.. currentmodule:: evalml.pipelines
+
 
 .. currentmodule:: evalml.pipelines
 
@@ -185,7 +187,6 @@ Domain Specific
    FraudCost
    LeadScoring
 
-
 Classification
 ~~~~~~~~~~~~~~
 
@@ -194,10 +195,18 @@ Classification
    :template: class.rst
    :nosignatures:
 
+   AUC
+   AUCMacro
+   AUCMicro
+   AUCWeighted
    F1
    F1Micro
    F1Macro
    F1Weighted
+   LogLossBinary
+   LogLossMulticlass
+   MCCBinary
+   MCCMulticlass
    Precision
    PrecisionMicro
    PrecisionMacro
@@ -206,15 +215,6 @@ Classification
    RecallMicro
    RecallMacro
    RecallWeighted
-   AUC
-   AUCMicro
-   AUCMacro
-   AUCWeighted
-   LogLoss
-   MCC
-   ROC
-   ConfusionMatrix
-
 
 Regression
 ~~~~~~~~~~
@@ -224,14 +224,13 @@ Regression
    :template: class.rst
    :nosignatures:
 
-   R2
+   ExpVariance
    MAE
+   MaxError
+   MedianAE
    MSE
    MSLE
-   MedianAE
-   MaxError
-   ExpVariance
-
+   R2
 
 Plot Metrics
 ~~~~~~~~~~~~
4 changes: 1 addition & 3 deletions docs/source/objectives/custom_objectives.ipynb
@@ -34,9 +34,7 @@
     "class FraudCost(ObjectiveBase):\n",
     "    \"\"\"Score the percentage of money lost of the total transaction amount process due to fraud\"\"\"\n",
     "    name = \"Fraud Cost\"\n",
-    "    needs_fitting = True\n",
     "    greater_is_better = False\n",
-    "    uses_extra_columns = True\n",
     "    score_needs_proba = False\n",
     "\n",
     "    def __init__(self, retry_percentage=.5, interchange_fee=.02,\n",
@@ -116,4 +114,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
\ No newline at end of file
+}
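
For reference, the notebook change above reflects the slimmed-down objective API: a minimal sketch of the class attributes a custom objective still declares after this commit (the objective name below is hypothetical, and `ObjectiveBase` is assumed importable from `evalml.objectives`):

    from evalml.objectives import ObjectiveBase

    class MyCustomObjective(ObjectiveBase):
        """Hypothetical objective showing the remaining attribute contract."""
        name = "My Custom Objective"
        greater_is_better = False   # lower is better, as with Fraud Cost
        score_needs_proba = False   # scores class predictions, not probabilities

`needs_fitting` and `uses_extra_columns` are gone entirely; extra data now arrives through the `X` argument of the objective's methods.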
23 changes: 17 additions & 6 deletions evalml/automl/auto_base.py
@@ -5,6 +5,7 @@
 
 import numpy as np
 import pandas as pd
+from sklearn.model_selection import train_test_split
 from tqdm import tqdm
 
 from .pipeline_search_plots import PipelineSearchPlots
@@ -40,12 +41,11 @@ def __init__(self, problem_type, tuner, cv, objective, max_pipelines, max_time,
         self.verbose = verbose
         self.possible_pipelines = get_pipelines(problem_type=self.problem_type, model_families=allowed_model_families)
         self.objective = get_objective(objective)
+        if self.problem_type != self.objective.problem_type:
+            raise ValueError("Given objective {} is not compatible with a {} problem.".format(self.objective.name, self.problem_type.value))
 
         logger.verbose = verbose
 
-        if self.problem_type not in self.objective.problem_types:
-            raise ValueError("Given objective {} is not compatible with a {} problem.".format(self.objective.name, self.problem_type.value))
-
         if additional_objectives is not None:
             additional_objectives = [get_objective(o) for o in additional_objectives]
         else:
@@ -228,10 +228,10 @@ def _check_stopping_condition(self, start):
     def _check_multiclass(self, y):
         if y.nunique() <= 2:
             return
-        if ProblemTypes.MULTICLASS not in self.objective.problem_types:
+        if self.objective.problem_type != ProblemTypes.MULTICLASS:
             raise ValueError("Given objective {} is not compatible with a multiclass problem.".format(self.objective.name))
         for obj in self.additional_objectives:
-            if ProblemTypes.MULTICLASS not in obj.problem_types:
+            if obj.problem_type != ProblemTypes.MULTICLASS:
                 raise ValueError("Additional objective {} is not compatible with a multiclass problem.".format(obj.name))
 
     def _transform_parameters(self, pipeline_class, parameters, number_features):
@@ -290,7 +290,18 @@ def _do_iteration(self, X, y, pbar, raise_errors):
 
         objectives_to_score = [self.objective] + self.additional_objectives
         try:
-            pipeline.fit(X_train, y_train, self.objective)
+            X_threshold_tuning = None
+            y_threshold_tuning = None
+
+            if self.objective.problem_type == ProblemTypes.BINARY and self.objective.can_optimize_threshold:
+                X_train, X_threshold_tuning, y_train, y_threshold_tuning = train_test_split(X_train, y_train, test_size=0.2, random_state=pipeline.estimator.random_state)
+            pipeline.fit(X_train, y_train)
+            if self.objective.problem_type == ProblemTypes.BINARY:
+                pipeline.threshold = 0.5
+                if self.objective.can_optimize_threshold:
+                    y_predict_proba = pipeline.predict_proba(X_threshold_tuning)
+                    y_predict_proba = y_predict_proba[:, 1]
+                    pipeline.threshold = self.objective.optimize_threshold(y_predict_proba, y_threshold_tuning, X=X_threshold_tuning)
             scores = pipeline.score(X_test, y_test, objectives=objectives_to_score)
             score = scores[self.objective.name]
             plot_data.append(pipeline.get_plot_data(X_test, y_test, self.plot_metrics))
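
A simplified, self-contained sketch of the tuning flow added above: hold out 20% of the training data, fit the pipeline, then learn the binary decision threshold on the held-out split. Here `fit_and_tune` and the fixed `random_state=0` are illustrative stand-ins, not evalml's API; `pipeline` and `objective` stand in for the real objects:

    from sklearn.model_selection import train_test_split

    def fit_and_tune(pipeline, objective, X_train, y_train):
        """Fit a binary pipeline, then tune its threshold on a held-out split."""
        X_tune, y_tune = None, None
        if objective.can_optimize_threshold:
            X_train, X_tune, y_train, y_tune = train_test_split(
                X_train, y_train, test_size=0.2, random_state=0)
        pipeline.fit(X_train, y_train)
        pipeline.threshold = 0.5  # default threshold for binary problems
        if objective.can_optimize_threshold:
            proba = pipeline.predict_proba(X_tune)[:, 1]  # positive-class column
            pipeline.threshold = objective.optimize_threshold(proba, y_tune, X=X_tune)
        return pipeline

Note the threshold is tuned on data the estimator never saw during fitting, which avoids an optimistically biased threshold.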
27 changes: 2 additions & 25 deletions evalml/automl/auto_classification_search.py
@@ -86,7 +86,8 @@ def __init__(self,
                 objective = "precision_micro"
                 problem_type = ProblemTypes.MULTICLASS
             else:
-                problem_type = self._set_problem_type(objective, multiclass)
+                objective = get_objective(objective)
+                problem_type = objective.problem_type
 
         super().__init__(
             tuner=tuner,
@@ -110,27 +111,3 @@ def __init__(self,
             self.plot_metrics = [ROC(), ConfusionMatrix()]
         else:
             self.plot_metrics = [ConfusionMatrix()]
-
-    def _set_problem_type(self, objective, multiclass):
-        """Sets the problem type of the AutoClassificationSearch to either binary or multiclass.
-
-        If there is an objective either:
-            a. Set problem_type to MULTICLASS if objective is only multiclass and multiclass is false
-            b. Set problem_type to MUTLICLASS if multiclass is true
-            c. Default to BINARY
-
-        Arguments:
-            objective (Object): the objective to optimize
-            multiclass (bool): boolean representing whether search is for multiclass problems or not
-
-        Returns:
-            ProblemTypes enum representing type of problem to set AutoClassificationSearch to
-        """
-
-        problem_type = ProblemTypes.BINARY
-        # if exclusively multiclass: infer
-        if [ProblemTypes.MULTICLASS] == get_objective(objective).problem_types:
-            problem_type = ProblemTypes.MULTICLASS
-        elif multiclass:
-            problem_type = ProblemTypes.MULTICLASS
-        return problem_type
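
The deleted helper is replaced by reading the problem type straight off the objective, roughly as follows (this assumes `get_objective` accepts an objective name such as "f1"; the exact registered names are not shown in this diff):

    from evalml.objectives import get_objective

    objective = get_objective("f1")        # hypothetical objective name
    problem_type = objective.problem_type  # e.g. ProblemTypes.BINARY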
13 changes: 9 additions & 4 deletions evalml/objectives/__init__.py
@@ -4,17 +4,18 @@
 from .objective_base import ObjectiveBase
 from .standard_metrics import (
     AUC,
-    F1,
-    MCC,
-    R2,
     AUCMacro,
     AUCMicro,
     AUCWeighted,
     ExpVariance,
+    F1,
     F1Macro,
     F1Micro,
     F1Weighted,
-    LogLoss,
+    LogLossBinary,
+    LogLossMulticlass,
+    MCCBinary,
+    MCCMulticlass,
     MaxError,
     MAE,
     MedianAE,
@@ -24,6 +25,7 @@
     PrecisionMacro,
     PrecisionMicro,
     PrecisionWeighted,
+    R2,
     Recall,
     RecallMacro,
     RecallMicro,
@@ -32,3 +34,6 @@
     ConfusionMatrix
 )
 from .utils import get_objective, get_objectives
+from .binary_classification_objective import BinaryClassificationObjective
+from .multiclass_classification_objective import MultiClassificationObjective
+from .regression_objective import RegressionObjective
62 changes: 62 additions & 0 deletions evalml/objectives/binary_classification_objective.py
@@ -0,0 +1,62 @@
import pandas as pd
from scipy.optimize import minimize_scalar

from .objective_base import ObjectiveBase

from evalml.problem_types import ProblemTypes


class BinaryClassificationObjective(ObjectiveBase):
    """
    Base class for all binary classification objectives.

    problem_type (ProblemTypes): Specifies the type of problem this objective is defined for (binary classification)
    can_optimize_threshold (bool): Determines if threshold used by objective can be optimized or not.
    """
    problem_type = ProblemTypes.BINARY

    @property
    def can_optimize_threshold(cls):
        """Returns a boolean determining if we can optimize the binary classification objective threshold. This will be false for any objective that works directly with predicted probabilities, like log loss and AUC. Otherwise, it will be true."""
        return not cls.score_needs_proba

    def optimize_threshold(self, ypred_proba, y_true, X=None):
        """Learn a binary classification threshold which optimizes the current objective.

        Arguments:
            ypred_proba (list): The classifier's predicted probabilities
            y_true (list): The ground truth for the predictions.
            X (pd.DataFrame, optional): Any extra columns that are needed from training data.

        Returns:
            Optimal threshold for this objective
        """
        if not self.can_optimize_threshold:
            raise RuntimeError("Trying to optimize objective that can't be optimized!")

        def cost(threshold):
            predictions = self.decision_function(ypred_proba=ypred_proba, threshold=threshold, X=X)
            cost = self.objective_function(predictions, y_true, X=X)
            return -cost if self.greater_is_better else cost

        optimal = minimize_scalar(cost, method='Golden', options={"maxiter": 100})
        return optimal.x

    def decision_function(self, ypred_proba, threshold=0.5, X=None):
        """Apply a learned threshold to predicted probabilities to get predicted classes.

        Arguments:
            ypred_proba (list): The classifier's predicted probabilities
            threshold (float, optional): Threshold used to make a prediction. Defaults to 0.5.
            X (pd.DataFrame, optional): Any extra columns that are needed from training data.

        Returns:
            predictions
        """
        if not isinstance(ypred_proba, pd.Series):
            ypred_proba = pd.Series(ypred_proba)
        return ypred_proba > threshold
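
A toy usage sketch of the class above (`ToyAccuracy` is hypothetical and assumes `BinaryClassificationObjective` is importable as defined here):

    import numpy as np
    import pandas as pd

    class ToyAccuracy(BinaryClassificationObjective):
        """Hypothetical label-based objective, so its threshold is optimizable."""
        name = "Toy Accuracy"
        greater_is_better = True
        score_needs_proba = False  # makes can_optimize_threshold True

        def objective_function(self, y_predicted, y_true, X=None):
            # fraction of correct class predictions
            return (pd.Series(y_predicted).values == pd.Series(y_true).values).mean()

    objective = ToyAccuracy()
    ypred_proba = np.array([0.1, 0.4, 0.35, 0.8])
    y_true = np.array([0, 0, 1, 1])
    threshold = objective.optimize_threshold(ypred_proba, y_true)      # golden-section search
    predictions = objective.decision_function(ypred_proba, threshold)  # probabilities to labels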
57 changes: 27 additions & 30 deletions evalml/objectives/fraud_cost.py
@@ -1,74 +1,68 @@
 import pandas as pd
 
-from .objective_base import ObjectiveBase
+from .binary_classification_objective import BinaryClassificationObjective
 
-from evalml.problem_types import ProblemTypes
-
 
-class FraudCost(ObjectiveBase):
+class FraudCost(BinaryClassificationObjective):
     """Score the percentage of money lost of the total transaction amount process due to fraud"""
     name = "Fraud Cost"
-    problem_types = [ProblemTypes.BINARY]
-    needs_fitting = True
     greater_is_better = False
-    uses_extra_columns = True
     score_needs_proba = False
 
     def __init__(self, retry_percentage=.5, interchange_fee=.02,
-                 fraud_payout_percentage=1.0, amount_col='amount', verbose=False):
+                 fraud_payout_percentage=1.0, amount_col='amount'):
         """Create instance of FraudCost
 
         Arguments:
-            retry_percentage (float): what percentage of customers will retry a transaction if it
-                is declined? Between 0 and 1. Defaults to .5
+            retry_percentage (float): What percentage of customers that will retry a transaction if it
+                is declined. Between 0 and 1. Defaults to .5
 
-            interchange_fee (float): how much of each successful transaction do you collect?
+            interchange_fee (float): How much of each successful transaction you can collect.
                 Between 0 and 1. Defaults to .02
 
-            fraud_payout_percentage (float): how percentage of fraud will you be unable to collect.
+            fraud_payout_percentage (float): Percentage of fraud you will not be able to collect.
                 Between 0 and 1. Defaults to 1.0
 
-            amount_col (str): name of column in data that contains the amount. defaults to "amount"
+            amount_col (str): Name of column in data that contains the amount. Defaults to "amount"
         """
         self.retry_percentage = retry_percentage
         self.interchange_fee = interchange_fee
         self.fraud_payout_percentage = fraud_payout_percentage
         self.amount_col = amount_col
-        super().__init__(verbose=verbose)
 
-    def decision_function(self, y_predicted, extra_cols, threshold):
-        """Determine if transaction is fraud given predicted probabilities, dataframe with transaction amount, and threshold
+    def decision_function(self, ypred_proba, threshold=0.0, X=None):
+        """Determine if a transaction is fraud given predicted probabilities, threshold, and dataframe with transaction amount
 
         Arguments:
-            y_predicted (pd.Series): predicted labels
-            extra_cols (pd.DataFrame): extra data needed
-            threshold (float): dollar threshold to determine if transaction is fraud
+            ypred_proba (pd.Series): Predicted probablities
+            X (pd.DataFrame): Dataframe containing transaction amount
+            threshold (float): Dollar threshold to determine if transaction is fraud
 
         Returns:
-            pd.Series: series of predicted fraud label using extra cols and threshold
+            pd.Series: Series of predicted fraud labels using X and threshold
         """
-        if not isinstance(extra_cols, pd.DataFrame):
-            extra_cols = pd.DataFrame(extra_cols)
+        if not isinstance(X, pd.DataFrame):
+            X = pd.DataFrame(X)
 
-        if not isinstance(y_predicted, pd.Series):
-            y_predicted = pd.Series(y_predicted)
+        if not isinstance(ypred_proba, pd.Series):
+            ypred_proba = pd.Series(ypred_proba)
 
-        transformed_probs = (y_predicted.values * extra_cols[self.amount_col])
+        transformed_probs = (ypred_proba.values * X[self.amount_col])
         return transformed_probs > threshold
 
-    def objective_function(self, y_predicted, y_true, extra_cols):
+    def objective_function(self, y_predicted, y_true, X):
         """Calculate amount lost to fraud per transaction given predictions, true values, and dataframe with transaction amount
 
         Arguments:
             y_predicted (pd.Series): predicted fraud labels
             y_true (pd.Series): true fraud labels
-            extra_cols (pd.DataFrame): extra data needed
+            X (pd.DataFrame): dataframe with transaction amounts
 
         Returns:
             float: amount lost to fraud per transaction
         """
-        if not isinstance(extra_cols, pd.DataFrame):
-            extra_cols = pd.DataFrame(extra_cols)
+        if not isinstance(X, pd.DataFrame):
+            X = pd.DataFrame(X)
 
         if not isinstance(y_predicted, pd.Series):
             y_predicted = pd.Series(y_predicted)
@@ -77,7 +71,10 @@ def objective_function(self, y_predicted, y_true, extra_cols):
             y_true = pd.Series(y_true)
 
         # extract transaction using the amount columns in users data
-        transaction_amount = extra_cols[self.amount_col]
+        try:
+            transaction_amount = X[self.amount_col]
+        except KeyError:
+            raise ValueError("`{}` is not a valid column in X.".format(self.amount_col))
 
         # amount paid if transaction is fraud
         fraud_cost = transaction_amount * self.fraud_payout_percentage
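
Sketch of the updated FraudCost call pattern, where transaction amounts now travel in `X` instead of `extra_cols` (toy numbers, assuming the class as defined above):

    import pandas as pd

    fraud_objective = FraudCost(amount_col='amount')
    X = pd.DataFrame({'amount': [100.0, 5000.0, 250.0]})
    ypred_proba = pd.Series([0.05, 0.90, 0.20])
    y_true = pd.Series([0, 1, 0])

    # probability * amount must clear the dollar threshold to be flagged as fraud
    predictions = fraud_objective.decision_function(ypred_proba, threshold=50.0, X=X)
    loss = fraud_objective.objective_function(predictions, y_true, X=X)  # lower is better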
