# Fraud Detection

## Introduction

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
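
To make the imbalance concrete, the short sketch below recomputes the fraud share from the two counts quoted above and shows the accuracy of a trivial classifier that never predicts fraud. The variable names are illustrative only.

```python
# Counts quoted in the introduction above.
total_transactions = 284_807
fraud_transactions = 492

# Share of the positive (fraud) class.
fraud_rate = fraud_transactions / total_transactions

# Accuracy of a trivial classifier that labels every transaction as legitimate.
baseline_accuracy = (total_transactions - fraud_transactions) / total_transactions

print("Fraud rate:        {:.4%}".format(fraud_rate))
print("Baseline accuracy: {:.4%}".format(baseline_accuracy))
```

Because the trivial classifier is already very accurate while catching no fraud at all, accuracy alone is a poor yardstick for this dataset.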

It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; it can be used, for example, for example-dependent cost-sensitive learning. The feature 'Class' is the response variable; it takes the value 1 in case of fraud and 0 otherwise.
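
One possible reading of example-dependent cost-sensitive learning is sketched below: each mistake is charged a cost that depends on the transaction itself. The cost model here is an assumption made for illustration (a missed fraud costs the transaction amount, and every flagged transaction costs a fixed review fee); `REVIEW_COST` and `total_cost` are hypothetical names, not part of the dataset or of SageMaker.

```python
REVIEW_COST = 5.0  # hypothetical fixed cost of manually reviewing a flagged transaction

def total_cost(amounts, labels, predictions, review_cost=REVIEW_COST):
    """Sum an example-dependent cost over all transactions.

    Assumed cost model (not from the dataset authors):
      - a missed fraud (label 1, prediction 0) costs the transaction amount
      - any flagged transaction (prediction 1) costs a fixed review fee
    """
    cost = 0.0
    for amount, label, prediction in zip(amounts, labels, predictions):
        if prediction == 1:
            cost += review_cost
        elif label == 1:
            cost += amount
    return cost

toy_amounts = [12.50, 900.00, 45.00, 3.20]
toy_labels = [0, 1, 0, 1]
toy_predictions = [0, 0, 1, 1]
print(total_cost(toy_amounts, toy_labels, toy_predictions))  # 910.0
```

Under this assumed model, a single missed high-value fraud dominates the total, which is why the 'Amount' column can matter beyond plain error counts.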

## Code

In [1]:
# Import the required libraries

import boto3
import botocore
import io
import matplotlib.pyplot as plt
import numpy as np 
import os
import pandas as pd 
import datetime

import sagemaker
import sagemaker.amazon.common as smac
from sagemaker import get_execution_role
from sagemaker.predictor import csv_serializer, json_deserializer

In [2]:
# Setup variable input data

s3_bucket = "cc-fraud-demo-source"
s3_source_data_key = "creditcardfraud.zip"
s3_prefix = "sagemaker/fraud-detection-testing/"
local_file_name = "creditcardfraud.zip"
training_instance_type = "ml.m5.large"
training_instance_count = 1
inference_instance_type = "ml.m5.large"
inference_instance_count = 1

In [3]:
# Setup static variables

s3_training_data_key = "{}training/recordio-pb-data".format(s3_prefix)
s3_training_data_path = os.path.join("s3://", s3_bucket, s3_training_data_key)
role = get_execution_role()
s3 = boto3.resource("s3")
local_file = "/tmp/{}".format(local_file_name)

### Data Prep
Download the source data from S3, unzip it if necessary, split it into training and test data sets, then upload the training data to S3 as protobuf.

In [4]:
# Download the S3 data

try:   
    s3.Bucket(s3_bucket).download_file(s3_source_data_key, local_file)   
except botocore.exceptions.ClientError as e:
    if e.response["Error"]["Code"] == "404":
        print("The object {} does not exist".format(s3_source_data_key))
    elif e.response["Error"]["Code"] == "403":
        print("You do not have permissions to {} ".format(s3_source_data_key))
    else:
        print("Unknown error")
    raise e

In [5]:
# Unzip the source data if it is zipped

if local_file_name.endswith(".zip") :
    import zipfile
    
    # Create the zip file
    zip_ref = zipfile.ZipFile(local_file, "r")
    
    # Get the first file in the zip
    file_name = zip_ref.namelist()[0]
    
    # Extract that file to tmp
    zip_ref.extract(file_name, "/tmp")
    
    # Close the zip
    zip_ref.close()
    
    # Delete the zip
    os.remove(local_file)
    
    # Reset the local file variable
    local_file = "/tmp/{}".format(file_name)
elif local_file_name.endswith(".gz") :
    import gzip
    # Open the zipped file
    in_file = gzip.open(local_file, "rb")
    
    # Get everything to the left of .gz
    temp_local_file = local_file.rsplit(".gz")[0]
    
    # Create the output file
    out_file = open(temp_local_file, "wb")
    
    # Write the uncompressed data to the local file
    out_file.write(in_file.read())
    
    # Close the files
    in_file.close()
    out_file.close()
    
    # Delete the gzip
    os.remove(local_file)
    
    # Reset the local file variable
    local_file = temp_local_file
    

In [6]:
# Read, shuffle, and split the data into training and test sets
# We will also remove the last column, the "label", from the data set features
raw_data = pd.read_csv(local_file).values
raw_data_row_count = raw_data.shape[0]

# Use numpy to shuffle the data
np.random.seed(0)
np.random.shuffle(raw_data)

# Use 70% of the rows for training. For a 2D array, the first item of
# shape is the row count and the second item is the column count
training_row_count = int(raw_data_row_count * 0.7)

# Take rows 0 through training_row_count - 1 for training. In the second
# dimension, ":-1" selects every column except the last (the label),
# and "-1" selects only the last column
training_dataset_features = raw_data[:training_row_count, :-1]
training_dataset_labels = raw_data[:training_row_count, -1]

# Get the remaining rows as from the row count to the end as the
# testing data set
testing_dataset_features = raw_data[training_row_count:, :-1]
testing_dataset_labels = raw_data[training_row_count:, -1]

# Convert the training data set to protobuf and upload to S3
vectors = np.array([t.tolist() for t in training_dataset_features]).astype('float32')
labels = np.array([t.tolist() for t in training_dataset_labels]).astype('float32')

buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, vectors, labels)
buf.seek(0)
s3.Bucket(s3_bucket).Object(s3_training_data_key).upload_fileobj(buf)
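
Because fraud is so rare, a purely random 70/30 split can leave noticeably different fraud shares in the two parts. The sketch below shows one way to check this; it runs on synthetic labels so that it is self-contained, and `fraud_share` is a hypothetical helper. In the notebook, the same function could be applied to `training_dataset_labels` and `testing_dataset_labels`.

```python
import numpy as np

def fraud_share(labels):
    """Return the fraction of rows labelled 1 (fraud)."""
    labels = np.asarray(labels)
    return float(labels.sum()) / labels.shape[0]

# Synthetic stand-in for the label column: 10,000 rows with 20 frauds.
rng = np.random.RandomState(0)
synthetic_labels = np.zeros(10000)
synthetic_labels[:20] = 1
rng.shuffle(synthetic_labels)

# Same 70/30 split rule as the cell above.
split_row = int(synthetic_labels.shape[0] * 0.7)
train_labels = synthetic_labels[:split_row]
test_labels = synthetic_labels[split_row:]

print("train fraud share: {:.4f}".format(fraud_share(train_labels)))
print("test fraud share:  {:.4f}".format(fraud_share(test_labels)))
```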

### Setup The Model Training

In [7]:
# Create a function to train the SageMaker linear learner model

def train_model(
    s3_training_data_location, 
    s3_model_output_path,
    training_instance_type, 
    training_instance_count, 
    hyperparams,
    job_name
):
    from sagemaker.amazon.amazon_estimator import get_image_uri
    
    # Get the docker container for linear learner training in this region
    container = get_image_uri(boto3.Session().region_name, "linear-learner")
    
    # Create the linear learner estimator
    linear_estimator = sagemaker.estimator.Estimator(container,
                                                    role,
                                                    training_instance_count,
                                                    training_instance_type,
                                                    output_path = s3_model_output_path,
                                                    sagemaker_session = sagemaker.Session()
                                                    )
    
    # Set the hyper params
    linear_estimator.set_hyperparameters(**hyperparams)
    
    # Train the model
    linear_estimator.fit({"train" : s3_training_data_location},
                        job_name = job_name)
    
    return linear_estimator

In [8]:
def deploy_endpoint(
    linear_estimator,     
    inference_instance_type, 
    inference_instance_count, 
    endpoint_name
):    
    # Deploy the predictor/inference endpoint
    linear_predictor = linear_estimator.deploy(
        inference_instance_count, 
        inference_instance_type,
        endpoint_name = endpoint_name
    )
    linear_predictor.content_type = "text/csv"
    linear_predictor.serializer = csv_serializer
    linear_predictor.deserializer = json_deserializer
    return linear_predictor

In [9]:
# Create a function to create the SageMaker predictor

def predictor_from_params(
    s3_training_data_location, 
    s3_model_output_path,
    training_instance_type, 
    training_instance_count, 
    hyperparams,
    jobname,
    inference_instance_type, 
    inference_instance_count, 
    endpoint_name
):   
    # Create the linear learner estimator
    linear_estimator = train_model(
        s3_training_data_location, 
        s3_model_output_path,
        training_instance_type,
        training_instance_count,
        hyperparams,
        jobname
    )
    
    # Deploy the predictor/inference endpoint
    linear_predictor = deploy_endpoint(linear_estimator, inference_instance_type, inference_instance_count, endpoint_name)
    
    return linear_predictor
    

In [10]:
# Create a function to run inference testing

def test_inference(
    linear_predictor, 
    test_features, 
    test_labels, 
    model_name, 
    verbose = True
):
    # Split the test data into 100 batches
    # The inference response deserialization is json with one top level property, "predictions"
    # which is an array and contains objects that have the prediction results
    # "predictions":    [
    #    {
    #        "score": 0.4,
    #        "predicted_label": 0
    #    } 
    # ]
    batches = [linear_predictor.predict(batch)['predictions'] for batch in np.array_split(test_features, 100)]
    
    # Now get the predicted label values
    test_predictions = np.concatenate(
        [np.array([x['predicted_label'] for x in batch]) for batch in batches]
    )
    
    # Calculate true and false positives and negatives
    true_positive = np.logical_and(test_labels, test_predictions).sum()
    false_positive = np.logical_and(1-test_labels, test_predictions).sum()
    true_negative = np.logical_and(1-test_labels, 1 - test_predictions).sum()
    false_negative = np.logical_and(test_labels, 1 - test_predictions).sum()
    
    # calculate binary classification metrics
    recall = true_positive / (true_positive + false_negative)
    precision = true_positive / (true_positive + false_positive)
    accuracy = (true_positive + true_negative) / (true_positive + false_positive + true_negative + false_negative)
    f1 = 2 * precision * recall / (precision + recall)
    
    if verbose:
        print(pd.crosstab(test_labels, test_predictions, rownames=['actuals'], colnames=['predictions']))
        print("\n{:<11} {:.3f}".format('Recall:', recall))
        print("{:<11} {:.3f}".format('Precision:', precision))
        print("{:<11} {:.3f}".format('Accuracy:', accuracy))
        print("{:<11} {:.3f}".format('F1:', f1))
        
    return {'TP': true_positive, 'FP': false_positive, 'FN': false_negative, 'TN': true_negative, 'Precision': precision, 'Recall': recall, 'Accuracy': accuracy, 
             'F1': f1, 'Model': model_name}
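
The batching step in `test_inference` can be tried out locally without an endpoint. The sketch below uses `np.array_split`, which accepts row counts that do not divide evenly, and a hypothetical `fake_predict` function that only mimics the shape of the JSON response described in the comments above.

```python
import numpy as np

# Ten toy rows with two features each, standing in for the test features.
toy_features = np.arange(20, dtype="float32").reshape(10, 2)

# array_split tolerates sizes that do not divide evenly: 10 rows into 3 batches.
toy_batches = np.array_split(toy_features, 3)
batch_sizes = [batch.shape[0] for batch in toy_batches]
print(batch_sizes)  # [4, 3, 3]

# Hypothetical stand-in for the endpoint: one label per row, in row order.
def fake_predict(batch):
    return {"predictions": [{"predicted_label": int(row[0] > 9)} for row in batch]}

# Flatten the per-batch responses back into one array of predicted labels.
toy_predictions = np.concatenate(
    [np.array([p["predicted_label"] for p in fake_predict(batch)["predictions"]])
     for batch in toy_batches]
)
print(toy_predictions.shape)  # (10,)
```

Since the batches are concatenated in their original order, the predictions stay aligned with the test labels.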

### Train & Deploy Models

Note that we're setting the number of epochs to 40, which is much higher than the default of 15 epochs shown in the training logs below. With early stopping, we don't have to worry about setting the number of epochs too high. Linear learner will stop training automatically after the model has converged.
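
The idea behind early stopping can be illustrated with a generic patience-based rule. This is a sketch of the general technique, not linear learner's internal implementation; the function name and the `patience` and `tolerance` parameters are chosen here for illustration.

```python
def epochs_run(validation_losses, patience=3, tolerance=0.001):
    """Return how many epochs run before patience-based early stopping triggers.

    Training stops once the loss has failed to improve on the best value
    by more than `tolerance` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if best_loss - loss > tolerance:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(validation_losses)

# A loss curve that flattens out after the fifth epoch, with a 40-epoch budget.
toy_losses = [0.9, 0.6, 0.45, 0.40, 0.39] + [0.39] * 35
print(epochs_run(toy_losses))  # 8
```

In this toy run only 8 of the 40 budgeted epochs are used, which is why a generous epoch limit costs little when early stopping is active.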

In [11]:
# Training a binary classifier with default settings: logistic regression
defaults_hyperparams = {
    'feature_dim': training_dataset_features.shape[1],
    'predictor_type': 'binary_classifier',
    'epochs': 40
}
s3_defaults_output_path = 's3://{}/{}defaults'.format(s3_bucket, s3_prefix)

# Launch training and create the inference endpoint for default predictor
defaults_predictor = predictor_from_params(
    s3_training_data_path, 
    s3_defaults_output_path,
    training_instance_type,
    training_instance_count,
    defaults_hyperparams,
    "cc-fraud-logistic-regression-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
    inference_instance_type,
    inference_instance_count,
    "cc-fraud-logistic-regression-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
)

2019-06-21 16:27:21 Starting - Starting the training job...
2019-06-21 16:27:22 Starting - Launching requested ML instances......
2019-06-21 16:28:27 Starting - Preparing the instances for training......
2019-06-21 16:29:33 Downloading - Downloading input data...
2019-06-21 16:30:12 Training - Training image download completed. Training in progress.
Docker entrypoint called with argument(s): train
[06/21/2019 16:30:15 INFO 140280029837120] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_

And now we'll produce a model with a threshold tuned for the best possible precision with recall fixed at 90%:

In [12]:
# Training a binary classifier with automated threshold tuning
autothresh_hyperparams = {
    'feature_dim': training_dataset_features.shape[1],
    'predictor_type': 'binary_classifier',
    'binary_classifier_model_selection_criteria': 'precision_at_target_recall', 
    'target_recall': 0.9,
    'epochs': 40
}
s3_autothresh_output_path = 's3://{}/{}autothresh'.format(s3_bucket, s3_prefix)

# Launch training and create the inference endpoint for auto threshold
autothresh_predictor = predictor_from_params(
    s3_training_data_path, 
    s3_autothresh_output_path,
    training_instance_type,
    training_instance_count,
    autothresh_hyperparams,
    "cc-fraud-auto-threshold-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
    inference_instance_type,
    inference_instance_count,
    "cc-fraud-auto-threshold-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
)

2019-06-21 16:40:21 Starting - Starting the training job...
2019-06-21 16:40:26 Starting - Launching requested ML instances......
2019-06-21 16:41:28 Starting - Preparing the instances for training......
2019-06-21 16:42:27 Downloading - Downloading input data...
2019-06-21 16:43:09 Training - Training image download completed. Training in progress.
Docker entrypoint called with argument(s): train
[06/21/2019 16:43:12 INFO 139970424719168] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_

Now we'll improve on these results using another linear learner feature: class weights for binary classification. We'll apply it to the credit card fraud dataset by training a new model with balanced class weights:

In [13]:
# Training a binary classifier with class weights and automated threshold tuning
class_weights_hyperparams = {
    'feature_dim': training_dataset_features.shape[1],
    'predictor_type': 'binary_classifier',
    'binary_classifier_model_selection_criteria': 'precision_at_target_recall', 
    'target_recall': 0.9,
    'positive_example_weight_mult': 'balanced',
    'epochs': 40
}
s3_class_weights_output_path = 's3://{}/{}class_weights'.format(s3_bucket, s3_prefix)

# Launch training and create the inference endpoint for class weights
class_weights_predictor = predictor_from_params(
    s3_training_data_path, 
    s3_class_weights_output_path,
    training_instance_type,
    training_instance_count,
    class_weights_hyperparams,
    "cc-fraud-weights-predictor-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
    inference_instance_type,
    inference_instance_count,
    "cc-fraud-weights-predictor" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
)

2019-06-21 16:54:23 Starting - Starting the training job...
2019-06-21 16:54:25 Starting - Launching requested ML instances......
2019-06-21 16:55:28 Starting - Preparing the instances for training...
2019-06-21 16:56:19 Downloading - Downloading input data.....
Docker entrypoint called with argument(s): train
[06/21/2019 16:57:01 INFO 140321829427008] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_minimum_lr': u'auto', u'target_recall': u'0.8', u'num_models': u'auto', u'early_stopping_

The first training examples used the default loss function for binary classification, logistic loss. Now let's train a model with hinge loss. This is also called a support vector machine (SVM) classifier with a linear kernel. Threshold tuning is supported for all binary classifier models in linear learner.
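
The difference between the two loss functions is easiest to see side by side. The sketch below uses the common textbook convention of labels in {-1, +1} and a raw linear score; linear learner's internal label encoding and scaling may differ, so treat this as an illustration of the loss shapes only.

```python
import math

def logistic_loss(label, score):
    """Logistic loss for a label in {-1, +1} and a raw linear score."""
    return math.log(1.0 + math.exp(-label * score))

def hinge_loss(label, score):
    """Hinge loss for a label in {-1, +1} and a raw linear score."""
    return max(0.0, 1.0 - label * score)

# Losses for a positive example at several scores.
for score in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print("score={:+.1f}  logistic={:.3f}  hinge={:.3f}".format(
        score, logistic_loss(1, score), hinge_loss(1, score)))
```

Hinge loss drops to exactly zero once an example is classified correctly with a margin of at least 1, whereas logistic loss keeps shrinking but never reaches zero.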

In [14]:
# Training a binary classifier with hinge loss and automated threshold tuning
svm_hyperparams = {
    'feature_dim': training_dataset_features.shape[1],
    'predictor_type': 'binary_classifier',
    'loss': 'hinge_loss',
    'binary_classifier_model_selection_criteria': 'precision_at_target_recall', 
    'target_recall': 0.9,
    'epochs': 40
}
s3_svm_output_path = 's3://{}/{}svm'.format(s3_bucket, s3_prefix)

# Launch training and create the inference endpoint for svm
svm_predictor = predictor_from_params(
    s3_training_data_path, 
    s3_svm_output_path,
    training_instance_type,
    training_instance_count,
    svm_hyperparams, 
    "cc-fraud-svm" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
    inference_instance_type,
    inference_instance_count,
    "cc-fraud-svm" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
)

2019-06-21 17:07:59 Starting - Starting the training job......
2019-06-21 17:08:34 Starting - Launching requested ML instances......
2019-06-21 17:09:37 Starting - Preparing the instances for training...
2019-06-21 17:10:30 Downloading - Downloading input data......
2019-06-21 17:11:17 Training - Training image download completed. Training in progress.
Docker entrypoint called with argument(s): train
[06/21/2019 17:11:20 INFO 140338944796480] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_schedul

And finally, let's see what happens with balancing the class weights for the SVM model:

In [15]:
# Training a binary classifier with hinge loss, balanced class weights, and automated threshold tuning
svm_balanced_hyperparams = {
    'feature_dim': training_dataset_features.shape[1],
    'predictor_type': 'binary_classifier',
    'loss': 'hinge_loss',
    'binary_classifier_model_selection_criteria': 'precision_at_target_recall', 
    'target_recall': 0.9,
    'positive_example_weight_mult': 'balanced',
    'epochs': 40
}
s3_svm_balanced_output_path = 's3://{}/{}svm_balanced'.format(s3_bucket, s3_prefix)

# Launch training and create the inference endpoint for the balanced SVM
svm_balanced_predictor = predictor_from_params(
    s3_training_data_path, 
    s3_svm_balanced_output_path,
    training_instance_type,
    training_instance_count,
    svm_balanced_hyperparams,
    "cc-fraud-svm-balanced" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
    inference_instance_type,
    inference_instance_count,
    "cc-fraud-svm-balanced" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
)

2019-06-21 17:22:41 Starting - Starting the training job...
2019-06-21 17:22:43 Starting - Launching requested ML instances......
2019-06-21 17:23:47 Starting - Preparing the instances for training...
2019-06-21 17:24:38 Downloading - Downloading input data...
2019-06-21 17:25:08 Training - Downloading the training image..
Docker entrypoint called with argument(s): train
[06/21/2019 17:25:24 INFO 140083853162304] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_minimum_lr': u'auto', u'tar

### Evaluate the Model Performance

In [16]:
# Evaluate the trained models
predictors = {'Logistic': defaults_predictor, 'Logistic with auto threshold': autothresh_predictor, 
              'Logistic with class weights': class_weights_predictor, 'Hinge with auto threshold': svm_predictor, 
              'Hinge with class weights': svm_balanced_predictor}
metrics = {key: test_inference(predictor, testing_dataset_features, testing_dataset_labels, key, False) for key, predictor in predictors.items()}
pd.set_option('display.float_format', lambda x: '%.3f' % x)
display(pd.DataFrame(list(metrics.values())).loc[:, ['Model', 'Recall', 'Precision', 'Accuracy', 'F1']])

|   | Model | Recall | Precision | Accuracy | F1 |
|---|-------|--------|-----------|----------|----|
| 0 | Logistic | 0.677 | 0.861 | 0.999 | 0.758 |
| 1 | Logistic with auto threshold | 0.903 | 0.053 | 0.971 | 0.101 |
| 2 | Logistic with class weights | 0.910 | 0.128 | 0.989 | 0.225 |
| 3 | Hinge with auto threshold | 0.916 | 0.015 | 0.889 | 0.029 |
| 4 | Hinge with class weights | 0.903 | 0.108 | 0.986 | 0.193 |


***True Positives (TP)*** - These are the correctly predicted positive values which means that the value of actual class is yes and the value of predicted class is also yes. E.g. if actual class value indicates that the transaction is fraud and predicted class tells you the same thing.

***True Negatives (TN)*** - These are the correctly predicted negative values which means that the value of actual class is no and value of predicted class is also no. E.g. if actual class says the transaction was not fraud and predicted class tells you the same thing.

False positives and false negatives occur when the actual class contradicts the predicted class.

***False Positives (FP)*** – When actual class is no and predicted class is yes. E.g. if actual class says this transaction is not fraud but predicted class tells you that it is fraud.

***False Negatives (FN)*** – When actual class is yes but predicted class is no. E.g. if actual class value indicates that the transaction is fraud but predicted class tells you that the transaction is not fraud.
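
The four outcomes can be counted directly from a pair of label lists. The following is a minimal sketch on made-up data, where 1 means fraud and 0 means not fraud; `confusion_counts` is a hypothetical helper and not part of the notebook.

```python
def confusion_counts(actuals, predictions):
    """Count TP, FP, TN, FN for binary labels where 1 means fraud."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(actuals, predictions):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1   # fraud, flagged as fraud
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1   # not fraud, flagged as fraud
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1   # not fraud, not flagged
        else:
            counts["FN"] += 1   # fraud, not flagged
    return counts

toy_actuals =     [1, 1, 0, 0, 0, 1, 0, 0]
toy_predicted =   [1, 0, 0, 1, 0, 1, 0, 0]
print(confusion_counts(toy_actuals, toy_predicted))
# {'TP': 2, 'FP': 1, 'TN': 4, 'FN': 1}
```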

====================================================================

***Accuracy*** - Ratio of correct predictions (true positives and true negatives) to all predictions: (TP + TN) / (TP + TN + FP + FN)

***Precision*** - Ratio of correctly predicted positive observations to all predicted positive observations; high precision means few false alarms among the flagged transactions: TP / (TP + FP)

***Recall*** - Ratio of correctly predicted positive observations to all observations in the actual class "yes" (TP / (TP + FN)). What proportion of actual positives was identified correctly?

***F1*** - The F1 score is the harmonic mean of Precision and Recall, so it takes both false positives and false negatives into account. Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful than accuracy, especially if you have an uneven class distribution. Accuracy works best if false positives and false negatives have similar costs. If the costs of false positives and false negatives are very different, it’s better to look at both Precision and Recall. (2 * (Recall * Precision) / (Recall + Precision))
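
A small worked example shows how these four metrics can disagree on imbalanced data. The counts below are made up for illustration (1,000 transactions, 10 of them fraudulent) and are not taken from the results table; `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# The model catches 9 of the 10 frauds but also flags 91 legitimate transactions.
toy_metrics = classification_metrics(tp=9, fp=91, tn=899, fn=1)
for name, value in toy_metrics.items():
    print("{:<10} {:.3f}".format(name, value))
```

Here accuracy is about 0.908 and recall is 0.9, yet precision is only 0.09 and F1 is roughly 0.164, because most flagged transactions are false alarms.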

With threshold tuning, we can accurately predict 85-90% of the fraudulent transactions in the test set (due to randomness in training, recall will vary between 0.85-0.9 across multiple runs). But in addition to those true positives, we'll have a high number of false positives: 90-95% of the transactions we predict to be fraudulent are in fact not fraudulent (precision varies between 0.05-0.1). This model would work well as a first line of defense, flagging potentially fraudulent transactions for further review. If we instead want a model that gives very few false alarms, at the cost of catching far fewer of the fraudulent transactions, then we should optimize for higher precision:

binary_classifier_model_selection_criteria='recall_at_target_precision', 
target_precision=0.9,
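
The two selection criteria can be illustrated with a brute-force threshold search over toy scores. This is only a sketch of the idea, assuming that a score at or above the threshold is predicted as fraud; it is not how linear learner performs its calibration, and both function names are hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def best_threshold(scores, labels, target, fix="recall"):
    """Pick the threshold that maximises one metric while the other meets `target`.

    Returns (threshold, value of the maximised metric), or None if no
    threshold satisfies the target.
    """
    best = None
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall(scores, labels, threshold)
        fixed, free = (recall, precision) if fix == "recall" else (precision, recall)
        if fixed >= target and (best is None or free > best[1]):
            best = (threshold, free)
    return best

toy_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
toy_labels = [0,    0,    1,    0,    0,    1,    0,    1,    1,    1]

# Best precision with recall held at 0.8 or above.
print(best_threshold(toy_scores, toy_labels, target=0.8, fix="recall"))     # (0.6, 0.8)
# Best recall with precision held at 0.9 or above.
print(best_threshold(toy_scores, toy_labels, target=0.9, fix="precision"))  # (0.8, 0.6)
```

Fixing recall pushes the threshold down and admits more false alarms, while fixing precision pushes it up and misses more fraud, which is the trade-off described above.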

And what about the results of using our new feature, class weights for binary classification? Training with class weights has made a huge improvement to this model's performance! The precision is roughly doubled, while recall is still held constant at 85-90%.
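
One way to picture why class weights help is a logistic loss in which errors on the rare positive class are scaled up. The sketch below assumes that a "balanced" weight equals the ratio of negative to positive examples; that definition and the function name are illustrative assumptions, not a description of linear learner's internals.

```python
import math

def weighted_log_loss(labels, probabilities, positive_weight=1.0):
    """Mean logistic loss where errors on positive examples are scaled by `positive_weight`."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        if y == 1:
            total += -positive_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(labels)

# Toy data: 1 fraud among 10 transactions, all scored at 0.1 by a lazy model.
toy_labels = [1] + [0] * 9
toy_probabilities = [0.1] * 10

# Assumed "balanced" weight: ratio of negative to positive examples.
balanced_weight = toy_labels.count(0) / toy_labels.count(1)

unweighted = weighted_log_loss(toy_labels, toy_probabilities)
weighted = weighted_log_loss(toy_labels, toy_probabilities, balanced_weight)
print("unweighted loss: {:.3f}".format(unweighted))
print("weighted loss:   {:.3f}".format(weighted))
```

With the weight applied, the single missed fraud contributes far more to the loss, so a model that scores everything low is penalised much more heavily during training.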

Balancing class weights improved the performance of our SVM predictor, but it still does not match the corresponding logistic regression model for this dataset. Comparing all of the models we've fit so far, logistic regression with class weights and tuned thresholds did the best.



### Cleanup The Endpoints

In [17]:
def delete_endpoint(predictor):
    # Delete the endpoint behind a predictor; report instead of failing if the call errors
    try:
        boto3.client('sagemaker').delete_endpoint(EndpointName=predictor.endpoint)
        print('Deleted {}'.format(predictor.endpoint))
    except Exception:
        print('Already deleted: {}'.format(predictor.endpoint))

In [18]:
for predictor in [defaults_predictor, autothresh_predictor, class_weights_predictor, 
                  svm_predictor, svm_balanced_predictor]:
    delete_endpoint(predictor)

Deleted cc-fraud-logistic-regression-2019-06-21-16-27-20
Deleted cc-fraud-auto-threshold-2019-06-21-16-40-21
Deleted credit-card-fraud-detector-test-weights-predictor
Deleted cc-fraud-svm2019-06-21-17-07-59
Deleted cc-fraud-svm-balanced2019-06-21-17-22-41
