# EvalML Fraud Detection Demo:
This demo shows how to use EvalML to optimize models with a custom objective that estimates realized business value. The model takes in credit card transaction data and decides whether each transaction is fraudulent.

Data: https://www.kaggle.com/c/ieee-fraud-detection/

In [1]:
import os

import evalml
import featuretools as ft
import numpy as np
import pandas as pd

In [2]:
%%time 
train_identity = pd.read_csv('https://featuretools-static.s3.amazonaws.com/evalml/IEEE-CIS+Fraud+Detection/train_identity.csv')
train_transaction = pd.read_csv('https://featuretools-static.s3.amazonaws.com/evalml/IEEE-CIS+Fraud+Detection/train_transaction.csv')

CPU times: user 19.3 s, sys: 5.63 s, total: 24.9 s
Wall time: 1min 1s


In [3]:
display(train_identity.head())
display(train_transaction.head())

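| | TransactionID | id_01 | id_02 | id_03 | id_04 | id_05 | id_06 | id_07 | id_08 | id_09 | ... | id_31 | id_32 | id_33 | id_34 | id_35 | id_36 | id_37 | id_38 | DeviceType | DeviceInfo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2987004 | 0.0 | 70787.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | samsung browser 6.2 | 32.0 | 2220x1080 | match_status:2 | T | F | T | T | mobile | SAMSUNG SM-G892A Build/NRD90M |
| 1 | 2987008 | -5.0 | 98945.0 | NaN | NaN | 0.0 | -5.0 | NaN | NaN | NaN | ... | mobile safari 11.0 | 32.0 | 1334x750 | match_status:1 | T | F | F | T | mobile | iOS Device |
| 2 | 2987010 | -5.0 | 191631.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | ... | chrome 62.0 | NaN | NaN | NaN | F | F | T | T | desktop | Windows |
| 3 | 2987011 | -5.0 | 221832.0 | NaN | NaN | 0.0 | -6.0 | NaN | NaN | NaN | ... | chrome 62.0 | NaN | NaN | NaN | F | F | T | T | desktop | NaN |
| 4 | 2987016 | 0.0 | 7460.0 | 0.0 | 0.0 | 1.0 | 0.0 | NaN | NaN | 0.0 | ... | chrome 62.0 | 24.0 | 1280x800 | match_status:2 | T | F | T | T | desktop | MacOS |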


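| | TransactionID | isFraud | TransactionDT | TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | ... | V330 | V331 | V332 | V333 | V334 | V335 | V336 | V337 | V338 | V339 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2987000 | 0 | 86400 | 68.5 | W | 13926 | NaN | 150.0 | discover | 142.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2987001 | 0 | 86401 | 29.0 | W | 2755 | 404.0 | 150.0 | mastercard | 102.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2987002 | 0 | 86469 | 59.0 | W | 4663 | 490.0 | 150.0 | visa | 166.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2987003 | 0 | 86499 | 50.0 | W | 18132 | 567.0 | 150.0 | mastercard | 117.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2987004 | 0 | 86506 | 50.0 | H | 4497 | 514.0 | 150.0 | mastercard | 102.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |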


## Merge dataframes:

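Each row of the identity data describes a single transaction, so the two dataframes have a one-to-one relationship and can be merged on the `TransactionID` column. Note that `merge` defaults to an inner join, which keeps only the transactions that appear in both dataframes.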

In [4]:
train_df = train_transaction.merge(train_identity)

# Choose the sample size here. `frac=1.0` uses every row and may take a long time to finish.
train_sample = train_df.sample(frac=1.0, random_state=1)
X_train = train_sample.drop('isFraud', axis=1)
y_train = train_sample['isFraud']
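
The merge step can be checked on a small scale. The sketch below uses made-up toy data (not the Kaggle files) to show how the `how` argument changes which rows survive, and how pandas' `validate="one_to_one"` option can confirm the one-to-one assumption. The names `transactions` and `identity` are illustrative only.

```python
import pandas as pd

# Toy stand-ins for the transaction and identity tables (values are made up).
transactions = pd.DataFrame({
    "TransactionID": [1, 2, 3, 4],
    "TransactionAmt": [68.5, 29.0, 59.0, 50.0],
})
identity = pd.DataFrame({
    "TransactionID": [2, 4],
    "DeviceType": ["mobile", "desktop"],
})

# validate="one_to_one" raises pandas.errors.MergeError if TransactionID
# is duplicated on either side of the join.
inner = transactions.merge(identity, on="TransactionID", how="inner",
                           validate="one_to_one")

# how="left" keeps every transaction; missing identity fields become NaN.
left = transactions.merge(identity, on="TransactionID", how="left",
                          validate="one_to_one")

print(len(inner), len(left))
```

With an inner join only the two transactions that have identity rows remain, while the left join keeps all four.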

## Encode Categorical Variables:
Some machine learning models cannot handle categorical variables directly, so here we one-hot encode them into numerical dummy variables.

In [5]:
cat_cols = X_train.select_dtypes(include=['object']).columns

In [6]:
# encode categorical features
X_train = pd.get_dummies(X_train, columns=cat_cols)
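
To make the encoding concrete, here is a minimal sketch on a made-up three-row frame. The column names mirror the dataset, but the values are illustrative. Each distinct category becomes its own indicator column, so columns with many distinct values can widen the frame considerably.

```python
import pandas as pd

# Toy frame with one numeric and one categorical (object) column.
toy = pd.DataFrame({
    "TransactionAmt": [68.5, 29.0, 59.0],
    "card4": pd.Series(["discover", "mastercard", "visa"], dtype="object"),
})

# Same two steps as in the cells above: find object columns, then encode them.
toy_cat_cols = toy.select_dtypes(include=["object"]).columns
encoded = pd.get_dummies(toy, columns=toy_cat_cols)

print(list(encoded.columns))
```

The original `card4` column is replaced by one indicator column per category, named `card4_<value>`.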

In [7]:
# test_size=.8 holds out 80% of the rows for evaluation and trains on the remaining 20%
X_train, X_holdout, y_train, y_holdout = evalml.preprocessing.split_data(X_train, y_train, test_size=.8, random_state=0)

## Model Training With AUC
Here we use a traditional classification objective, AUC, to automatically search for the best model. Further down, we compare this model against one trained with a custom fraud detection objective.

In [8]:
clf = evalml.AutoClassifier(objective="AUC",
                            max_pipelines=10)
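
For intuition about what AUC measures, the following is a minimal pure-Python sketch, not EvalML's implementation. It computes AUC as the fraction of (fraud, non-fraud) pairs in which the fraudulent transaction receives the higher score. The function name `auc` and the toy labels and scores are assumptions made for illustration.

```python
def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = fraud) and model scores, made up for illustration.
toy_labels = [0, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.9]
print(auc(toy_labels, toy_scores))
```

Here 4 of the 6 fraud/non-fraud pairs are ordered correctly, so the AUC is 4/6. Note that this metric depends only on the ranking of the scores and ignores transaction amounts.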

### After fitting our models, we can display the rankings of all the models and also score the holdout data with the best model

In [9]:
%%time
# fit using AutoClassifier
clf.fit(X_train, y_train)

*****************************
* Beginning pipeline search *
*****************************

Optimizing for AUC. Greater score is better.

Searching up to 10 pipelines. No time limit is set. Set one using max_time parameter.

Possible model types: random_forest, xgboost, linear_model

Testing LogisticRegression w/ imputation + scaling: 100%|██████████| 10/10 [14:37:34<00:00, 5265.43s/it]   

✔ Optimization finished
CPU times: user 39min 36s, sys: 1min 28s, total: 41min 4s
Wall time: 14h 37min 34s


In [10]:
clf.rankings

| | id | pipeline_name | score | high_variance_cv | parameters |
|---|---|---|---|---|---|
| 0 | 4 | LogisticRegressionPipeline | 0.752407 | False | {'penalty': 'l2', 'C': 6.239401330891865, 'imp... |
| 1 | 9 | LogisticRegressionPipeline | 0.751914 | False | {'penalty': 'l2', 'C': 8.123565600467177, 'imp... |
| 2 | 2 | LogisticRegressionPipeline | 0.750758 | False | {'penalty': 'l2', 'C': 8.444214828324364, 'imp... |
| 3 | 5 | LogisticRegressionPipeline | 0.748011 | False | {'penalty': 'l2', 'C': 0.5765626434012575, 'im... |
| 4 | 0 | RFClassificationPipeline | 0.715018 | False | {'n_estimators': 569, 'max_depth': 22, 'impute... |
| 5 | 8 | RFClassificationPipeline | 0.705916 | False | {'n_estimators': 926, 'max_depth': 20, 'impute... |
| 6 | 1 | RFClassificationPipeline | 0.69191 | False | {'n_estimators': 369, 'max_depth': 10, 'impute... |
| 7 | 3 | RFClassificationPipeline | 0.67596 | False | {'n_estimators': 609, 'max_depth': 7, 'impute_... |
| 8 | 6 | RFClassificationPipeline | 0.664039 | False | {'n_estimators': 715, 'max_depth': 7, 'impute_... |
| 9 | 7 | RFClassificationPipeline | 0.660179 | False | {'n_estimators': 859, 'max_depth': 6, 'impute_... |


In [11]:
pipeline = clf.best_pipeline
print("Model Score: {}".format(pipeline.score(X_holdout, y_holdout)))

Model Score: 0.7624857964876715


## Custom Objective:

Here we use a fraud detection objective built into EvalML. With it we can define how the model is scored so that training favors the greatest realized business value. Below we specify that `50%` of our customers will retry a declined transaction, that we earn `2%` of each transaction as an interchange fee, and that we will be unable to collect `100%` of the amount of each fraudulent transaction. The pipeline chosen under this objective is therefore the one that best fits these business assumptions.

In [12]:
fraud_objective = evalml.objectives.FraudDetection(
    retry_percentage=.5,
    interchange_fee=.02,
    fraud_payout_percentage=1.0,
    amount_col='TransactionAmt'  # column in data that contains the amount of the transaction
)

clf_fraud = evalml.AutoClassifier(objective=fraud_objective,
                                  max_pipelines=10)
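
To illustrate how these three parameters could translate into dollars, here is a minimal pure-Python sketch. It is an assumed reading of the parameters described above, not EvalML's actual implementation: an approved fraudulent transaction costs its amount times `fraud_payout_percentage`, and a wrongly declined legitimate transaction costs the interchange fee on the share of customers who do not retry. The function name `fraud_cost` and the toy data are hypothetical.

```python
def fraud_cost(y_true, y_pred, amounts,
               retry_percentage=0.5,
               interchange_fee=0.02,
               fraud_payout_percentage=1.0):
    """Hypothetical dollar cost of a set of approve/decline decisions."""
    total = 0.0
    for is_fraud, flagged, amount in zip(y_true, y_pred, amounts):
        if is_fraud and not flagged:
            # Missed fraud: we absorb the configured share of the amount.
            total += amount * fraud_payout_percentage
        elif flagged and not is_fraud:
            # Wrongly declined: we lose the fee from customers who do not retry.
            total += amount * interchange_fee * (1 - retry_percentage)
    return total


# Toy labels, predictions (1 = flagged as fraud) and amounts, made up for illustration.
toy_true = [0, 1, 0, 1]
toy_pred = [0, 0, 1, 1]
toy_amounts = [100.0, 200.0, 50.0, 80.0]
print(fraud_cost(toy_true, toy_pred, toy_amounts))
```

In this toy case the missed 200-dollar fraud dominates the total, while the wrongly declined 50-dollar purchase costs only 0.50 dollars. Under assumptions like these, a model is rewarded for catching large fraudulent transactions even at the price of some false alarms, which a ranking metric alone does not capture.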

In [13]:
%%time
# fit using AutoClassifier
clf_fraud.fit(X_train, y_train)

*****************************
* Beginning pipeline search *
*****************************

Optimizing for Fraud Detection. Lower score is better.

Searching up to 10 pipelines. No time limit is set. Set one using max_time parameter.

Possible model types: random_forest, xgboost, linear_model

Testing LogisticRegression w/ imputation + scaling:   0%|          | 0/10 [00:00<?, ?it/s]
minimzed 8.640486047835703e-12
minimzed 0.00010389518239272876
minimzed 2.3310649088727627e-05
Testing Random Forest w/ imputation:  10%|█         | 1/10 [03:08<28:20, 188.96s/it]
minimzed 1.8505412821789644
minimzed 1.3292898825402952
minimzed 0.6705625826852317
Testing XGBoost w/ imputation:  20%|██        | 2/10 [05:39<23:39, 177.45s/it]
minimzed 1.019025415895075
minimzed 0.9414451333309479
minimzed 1.041921338299301
Testing Random Forest w/ imputation:  30%|███       | 3/10 [27:27<1:00:17, 516.72s/it]
minimzed 0.9939958075856832
minimzed 1.0899572372530426
minimz

### Again we can rank our models and see the performance on our holdout sets. However, this time we will see the predicted amount of dollars lost due to fraudulent transactions!

In [14]:
clf_fraud.rankings

| | id | pipeline_name | score | high_variance_cv | parameters |
|---|---|---|---|---|---|
| 0 | 8 | XGBoostPipeline | 5861.35868 | False | {'eta': 0.38438170729269994, 'min_child_weight... |
| 1 | 2 | XGBoostPipeline | 6577.622 | False | {'eta': 0.5928446182250184, 'min_child_weight'... |
| 2 | 9 | RFClassificationPipeline | 6945.37704 | False | {'n_estimators': 926, 'max_depth': 20, 'impute... |
| 3 | 1 | RFClassificationPipeline | 6956.53431 | False | {'n_estimators': 569, 'max_depth': 22, 'impute... |
| 4 | 5 | RFClassificationPipeline | 7389.09264 | False | {'n_estimators': 715, 'max_depth': 7, 'impute_... |
| 5 | 7 | RFClassificationPipeline | 7390.56741 | False | {'n_estimators': 859, 'max_depth': 6, 'impute_... |
| 6 | 3 | RFClassificationPipeline | 7520.78027 | False | {'n_estimators': 369, 'max_depth': 10, 'impute... |
| 7 | 4 | RFClassificationPipeline | 7713.12141 | False | {'n_estimators': 609, 'max_depth': 7, 'impute_... |
| 8 | 0 | LogisticRegressionPipeline | 8342.1095 | False | {'penalty': 'l2', 'C': 8.444214828324364, 'imp... |
| 9 | 6 | LogisticRegressionPipeline | 9419.89779 | False | {'penalty': 'l2', 'C': 6.239401330891865, 'imp... |


In [15]:
pipeline = clf_fraud.best_pipeline
print("Best Model Dollars Lost: {}".format(pipeline.score(X_holdout, y_holdout)))

Best Model Dollars Lost: 75553.41490999998


### In comparison, the model optimized for AUC loses far more on the same holdout set (about 356,000 dollars versus about 76,000 dollars). This shows how EvalML can get the results you want by optimizing for the right objective!

In [16]:
pipeline = clf.best_pipeline
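# index [1] holds the scores for `other_objectives`; [0] selects the fraud objective, the only one passed
print("AUC Model Dollars Lost: {}".format(pipeline.score(X_holdout, y_holdout, other_objectives=[fraud_objective])[1][0]))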

AUC Model Dollars Lost: 356068.73456000007
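
The size of the gap can be worked out directly from the two holdout scores printed above. The short calculation below only restates those two numbers; the variable names are illustrative.

```python
# Holdout scores copied from the two cells above (dollars lost).
fraud_model_loss = 75553.41490999998    # model optimized for the fraud objective
auc_model_loss = 356068.73456000007     # model optimized for AUC

difference = auc_model_loss - fraud_model_loss
ratio = auc_model_loss / fraud_model_loss

print("Difference: {:.2f} dollars, ratio: {:.2f}x".format(difference, ratio))
```

On this holdout set the AUC-optimized model loses roughly 4.7 times as much as the model optimized for the fraud objective, a difference of about 280,500 dollars.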
