## "Detection of Anomalies in Financial Transactions using Deep Autoencoder Networks"

This GPU Technology Conference (GTC) 2018 lab was developed by Mr. X and Mr. Y.

## 01. Environment Verification

#### 01.1 Python Verification

Before we begin, let's verify that Python is working on your system. To do this, execute the cell block below by giving it focus (clicking on it with your mouse), and hitting Shift-Enter, or pressing the play button in the toolbar above. If all goes well, you should see some output returned below the grey cell.

In [None]:
print('The answer should be forty-two: {}'.format(str(40+2)))

#### 01.2 Import Python Libraries

In [1]:
# importing utilities
import os
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from IPython.display import Image, display

# importing pytorch libraries
import torch
from torch import nn
from torch import autograd
from torch.utils.data import DataLoader

# importing data science libraries
import pandas as pd
import random as rd
import numpy as np

#### 01.3 CUDNN / GPU Verification

In [2]:
# print CUDNN backend version
print('The CUDNN backend version: {}'.format(torch.backends.cudnn.version()))

The CUDNN backend version: None


Let's execute the cell below to display information about the GPUs running on the server.

In [3]:
!nvidia-smi

/bin/sh: nvidia-smi: command not found


Let's execute the cell below to display information about the PyTorch version running on the server.

In [4]:
# print current PyTorch version
now = datetime.utcnow().strftime("%Y%m%d-%H:%M:%S")
print('[LOG {}] PyTorch version: {}'.format(now, torch.__version__))

[LOG 20180213-13:53:00] PyTorch version: 0.3.0.post4


## 02. Lab Overview

ToDo -- Timur and Marco

<img align="middle" style="max-width: 550px; height: auto" src="images/accounting.png">

<img align="middle" style="max-width: 550px; height: auto" src="images/anomalies.png">

<img align="middle" style="max-width: 400px; height: auto" src="images/cube.gif">

## 03. Autoencoder Neural Networks

### 03.1 Autoencoder Neural Network Architecture

An autoencoder or replicator neural network defines a special type of feed-forward multilayer neural network that can be trained to reconstruct its input. The difference between the original input and its reconstruction is referred to as reconstruction error. Figure XX illustrates a schematic view of an autoencoder neural network.

<img align="middle" style="max-width: 600px; height: auto" src="images/autoencoder.png">

**Figure XX:** Schematic view of an autoencoder network comprised of two non-linear mappings (fully connected feed-forward neural networks) referred to as encoder $f_\theta : \mathbb{R}^{d_x} \mapsto \mathbb{R}^{d_z}$ and decoder $g_\theta : \mathbb{R}^{d_z} \mapsto \mathbb{R}^{d_y}$.

Autoencoder networks are usually comprised of two nonlinear mappings referred to as the encoder $f_\theta$ and the decoder $g_\theta$ networks [13]. Most commonly the encoder and the decoder are of symmetrical architecture, consisting of several layers of neurons, each followed by a nonlinear function, and shared parameters $\theta$. The encoder mapping $f_\theta(\cdot)$ maps an input vector $x^i$ to a compressed representation $z^i$ in the latent space $Z$. This hidden representation $z^i$ is then mapped back by the decoder $g_\theta(\cdot)$ to a reconstructed vector $\hat{x}^i$ of the original input space. Formally, the nonlinear mappings of the encoder and the decoder can be defined by:

<center>$f_\theta(x^i) = s(Wx^i + b)$, and $g_\theta(z^i) = s'(W'z^i + d)$,</center>

where $s$ and $s'$ denote non-linear activation functions, $\theta = \{W, b, W', d\}$ are the model parameters, $W \in \mathbb{R}^{d_z \times d_x}$ and $W' \in \mathbb{R}^{d_y \times d_z}$ are weight matrices, and $b \in \mathbb{R}^{d_z}$ and $d \in \mathbb{R}^{d_y}$ are the bias vectors.
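
The two mappings can be traced numerically with a minimal NumPy sketch. It is only an illustration under stated assumptions: the dimensions `d_x = 6` and `d_z = 2`, the random initialisation, the use of LeakyReLU for both $s$ and $s'$, and the squared-error measure of the reconstruction error are all chosen here for the example and are not taken from the lab. The weight matrices are stored with shapes `(d_z, d_x)` and `(d_y, d_z)` so that the products $Wx^i$ and $W'z^i$ are defined.

```python
import numpy as np

rng = np.random.default_rng(0)

d_x, d_z = 6, 2          # hypothetical input and latent dimensions
d_y = d_x                # the reconstruction lives in the input space

# randomly initialised parameters theta = {W, b, W', d}
W = rng.normal(scale=0.1, size=(d_z, d_x))        # encoder weights
b = np.zeros(d_z)                                 # encoder bias
W_prime = rng.normal(scale=0.1, size=(d_y, d_z))  # decoder weights
d = np.zeros(d_y)                                 # decoder bias


def leaky_relu(a, negative_slope=0.4):
    """Element-wise LeakyReLU, used here for both s and s'."""
    return np.where(a > 0, a, negative_slope * a)


def f_theta(x):
    """Encoder: z = s(W x + b)."""
    return leaky_relu(W @ x + b)


def g_theta(z):
    """Decoder: x_hat = s'(W' z + d)."""
    return leaky_relu(W_prime @ z + d)


x = rng.random(d_x)      # one input vector x^i
z = f_theta(x)           # compressed representation z^i
x_hat = g_theta(z)       # reconstruction of x^i

# squared-error reconstruction error for this single vector (an assumption)
reconstruction_error = np.sum((x - x_hat) ** 2)
print(z.shape, x_hat.shape, reconstruction_error)
```

With untrained random weights the reconstruction error is large; training adjusts $\theta$ to reduce it.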

### 03.2 Autoencoder Neural Network Training

### 03.2 Implementing the Encoder Network

In [5]:
class encoder(nn.Module):

    def __init__(self):

        super(encoder, self).__init__()

        self.dropout = nn.Dropout(p=0.0, inplace=True)

        self.encoder_L1 = nn.Linear(186, 512, bias=True)
        nn.init.xavier_uniform(self.encoder_L1.weight)
        self.encoder_R1 = nn.LeakyReLU(negative_slope= 0.4, inplace=True)

        self.encoder_L2 = nn.Linear(512, 256, bias=True)
        nn.init.xavier_uniform(self.encoder_L2.weight)
        self.encoder_R2 = nn.LeakyReLU(negative_slope= 0.4, inplace=True)

        self.encoder_L3 = nn.Linear(256, 128, bias=True)
        nn.init.xavier_uniform(self.encoder_L3.weight)
        self.encoder_R3 = nn.LeakyReLU(negative_slope= 0.4, inplace=True)

        self.encoder_L4 = nn.Linear(128, 64, bias=True)
        nn.init.xavier_uniform(self.encoder_L4.weight)
        self.encoder_R4 = nn.LeakyReLU(negative_slope= 0.4, inplace=True)

        self.encoder_L5 = nn.Linear(64, 32, bias=True)
        nn.init.xavier_uniform(self.encoder_L5.weight)
        self.encoder_R5 = nn.LeakyReLU(negative_slope= 0.4, inplace=True)

        self.encoder_L6 = nn.Linear(32, 16, bias=True)
        nn.init.xavier_uniform(self.encoder_L6.weight)
        self.encoder_R6 = nn.LeakyReLU(negative_slope= 0.4, inplace=True)

        self.encoder_L7 = nn.Linear(16, 8, bias=True)
        nn.init.xavier_uniform(self.encoder_L7.weight)
        self.encoder_R7 = nn.LeakyReLU(negative_slope= 0.4, inplace=True)

        self.encoder_L8 = nn.Linear(8, 4, bias=True)
        nn.init.xavier_uniform(self.encoder_L8.weight)
        self.encoder_R8 = nn.LeakyReLU(negative_slope= 0.4, inplace=True)

        self.encoder_L9 = nn.Linear(4, 3, bias=True)
        nn.init.xavier_uniform(self.encoder_L9.weight)
        self.encoder_R9 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

    def forward(self, x):

        x = self.encoder_R1(self.dropout(self.encoder_L1(x)))
        x = self.encoder_R2(self.dropout(self.encoder_L2(x)))
        x = self.encoder_R3(self.dropout(self.encoder_L3(x)))
        x = self.encoder_R4(self.dropout(self.encoder_L4(x)))
        x = self.encoder_R5(self.dropout(self.encoder_L5(x)))
        x = self.encoder_R6(self.dropout(self.encoder_L6(x)))
        x = self.encoder_R7(self.dropout(self.encoder_L7(x)))
        x = self.encoder_R8(self.dropout(self.encoder_L8(x)))
        x = self.encoder_R9(self.encoder_L9(x))

        return x

### 03.3 Implementing the Decoder Network

In [6]:
class decoder(nn.Module):

    def __init__(self):

        super(decoder, self).__init__()

        self.dropout = nn.Dropout(p=0.0, inplace=True)

        self.decoder_L1 = nn.Linear(3, 4, bias=True)
        nn.init.xavier_uniform(self.decoder_L1.weight)
        self.decoder_R1 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

        self.decoder_L2 = nn.Linear(4, 8, bias=True)
        nn.init.xavier_uniform(self.decoder_L2.weight)
        self.decoder_R2 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

        self.decoder_L3 = nn.Linear(8, 16, bias=True)
        nn.init.xavier_uniform(self.decoder_L3.weight)
        self.decoder_R3 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

        self.decoder_L4 = nn.Linear(16, 32, bias=True)
        nn.init.xavier_uniform(self.decoder_L4.weight)
        self.decoder_R4 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

        self.decoder_L5 = nn.Linear(32, 64, bias=True)
        nn.init.xavier_uniform(self.decoder_L5.weight)
        self.decoder_R5 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

        self.decoder_L6 = nn.Linear(64, 128, bias=True)
        nn.init.xavier_uniform(self.decoder_L6.weight)
        self.decoder_R6 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

        self.decoder_L7 = nn.Linear(128, 256, bias=True)
        nn.init.xavier_uniform(self.decoder_L7.weight)
        self.decoder_R7 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

        self.decoder_L8 = nn.Linear(256, 512, bias=True)
        nn.init.xavier_uniform(self.decoder_L8.weight)
        self.decoder_R8 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

        self.decoder_L9 = nn.Linear(512, 186, bias=True)
        nn.init.xavier_uniform(self.decoder_L9.weight)
        self.decoder_R9 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

    def forward(self, x):

        x = self.decoder_R1(self.dropout(self.decoder_L1(x)))
        x = self.decoder_R2(self.dropout(self.decoder_L2(x)))
        x = self.decoder_R3(self.dropout(self.decoder_L3(x)))
        x = self.decoder_R4(self.dropout(self.decoder_L4(x)))
        x = self.decoder_R5(self.dropout(self.decoder_L5(x)))
        x = self.decoder_R6(self.dropout(self.decoder_L6(x)))
        x = self.decoder_R7(self.dropout(self.decoder_L7(x)))
        x = self.decoder_R8(self.dropout(self.decoder_L8(x)))
        x = self.decoder_R9(self.decoder_L9(x))
        
        return x
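
The symmetry between the two classes can be checked without instantiating them. The following plain-Python sketch lists the layer widths read off the `nn.Linear` definitions above, mirrors them for the decoder, and counts weights plus biases; the helper names `linear_layers` and `count_parameters` are hypothetical and introduced only for this illustration.

```python
# layer widths read off the encoder definition; the decoder mirrors them
encoder_dims = [186, 512, 256, 128, 64, 32, 16, 8, 4, 3]
decoder_dims = encoder_dims[::-1]


def linear_layers(dims):
    """Return (in_features, out_features) pairs for consecutive widths."""
    return list(zip(dims[:-1], dims[1:]))


def count_parameters(dims):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in linear_layers(dims))


print(linear_layers(decoder_dims))
print(count_parameters(encoder_dims), count_parameters(decoder_dims))
```

The weight counts of both halves are identical; the totals differ only because each layer's bias vector has the size of its output, which is not mirrored.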

In [7]:
# init training network classes / architectures
encoder_train = encoder()
decoder_train = decoder()

# push the already initialized networks to the GPU if CUDA is available
if torch.cuda.is_available():
    encoder_train = encoder_train.cuda()
    decoder_train = decoder_train.cuda()

In [8]:
# print the initialized architectures
now = datetime.utcnow().strftime("%Y%m%d-%H:%M:%S")
print('[LOG {}] encoder architecture:\n\n{}\n'.format(now, encoder_train))
print('[LOG {}] decoder architecture:\n\n{}\n'.format(now, decoder_train))

[LOG 20180213-13:53:08] encoder architecture:

encoder(
  (dropout): Dropout(p=0.0, inplace)
  (encoder_L1): Linear(in_features=186, out_features=512)
  (encoder_R1): LeakyReLU(0.4, inplace)
  (encoder_L2): Linear(in_features=512, out_features=256)
  (encoder_R2): LeakyReLU(0.4, inplace)
  (encoder_L3): Linear(in_features=256, out_features=128)
  (encoder_R3): LeakyReLU(0.4, inplace)
  (encoder_L4): Linear(in_features=128, out_features=64)
  (encoder_R4): LeakyReLU(0.4, inplace)
  (encoder_L5): Linear(in_features=64, out_features=32)
  (encoder_R5): LeakyReLU(0.4, inplace)
  (encoder_L6): Linear(in_features=32, out_features=16)
  (encoder_R6): LeakyReLU(0.4, inplace)
  (encoder_L7): Linear(in_features=16, out_features=8)
  (encoder_R7): LeakyReLU(0.4, inplace)
  (encoder_L8): Linear(in_features=8, out_features=4)
  (encoder_R8): LeakyReLU(0.4, inplace)
  (encoder_L9): Linear(in_features=4, out_features=3)
  (encoder_R9): LeakyReLU(0.4, inplace)
)

[LOG 20180213-13:53:08] decoder archit

## 04. Financial Fraud Detection Dataset

ToDo -- Timur

The dataset is taken from the Kaggle datasets repository at the following link:
https://www.kaggle.com/ntnu-testimon/paysim1

In [9]:
# load the dataset
ori_dataset = pd.read_csv("./data/paysim1.zip", sep=",", header=0, encoding="utf-8")

IOError: [Errno 2] No such file or directory: './data/paysim1.zip'

In [None]:
ori_dataset.head()

In [None]:
# dimension of data
ori_dataset.shape 

In [None]:
# plot "step" and "type" attributes
sns.set()
fig, ax =plt.subplots(1,2)
fig.set_figwidth(15)
sns.distplot(ori_dataset['step'], kde=False, bins=100, ax=ax[0])
sns.countplot(x=ori_dataset['type'], ax=ax[1])

In [None]:
# let's have a look at which "type" values fraudulent transactions take
ori_dataset[ori_dataset['isFraud'] == 1]['type'].unique()

In [None]:
# we select a subset of "type": "transfer" ("cash-out" may be selected for an optional exercise)
ori_subset = ori_dataset[ori_dataset['type'] == 'TRANSFER']

In [None]:
# dimension of data
ori_subset.shape

In [None]:
# plot "amount" attribute and its log scale
fig, ax =plt.subplots(1,2)
fig.set_figwidth(15)
sns.distplot(ori_subset['amount'], ax=ax[0])
sns.distplot(ori_subset['amount'].apply(np.log), ax=ax[1])

Preprocess numeric attributes

In [None]:
# select numeric attributes
numeric_attr_names = ['amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']
epsilon = 1e-7

# add a small epsilon to eliminate zero values from data for log scaling
numeric_attr = ori_subset[numeric_attr_names] + epsilon
numeric_attr = numeric_attr.apply(np.log)

# normalize all numeric attributes to the range [0,1]
numeric_attr = (numeric_attr - numeric_attr.min()) / (numeric_attr.max() - numeric_attr.min())

# append 'isFraud' attribute
numeric_attr['isFraud'] = ori_subset['isFraud']
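
The effect of the epsilon shift, the log transform, and the min-max scaling can be seen on a handful of values. This is a minimal NumPy sketch with hypothetical amounts (they are not taken from the dataset); only `epsilon` matches the cell above.

```python
import numpy as np

# hypothetical amounts, including a zero that would break a plain log
amounts = np.array([0.0, 10.0, 250.0, 10000.0])
epsilon = 1e-7

# shift by epsilon so that log(0) never occurs, then take the natural log
log_amounts = np.log(amounts + epsilon)

# min-max scale the log values to the range [0, 1]
scaled = (log_amounts - log_amounts.min()) / (log_amounts.max() - log_amounts.min())
print(scaled)
```

Because log(1e-7) is strongly negative, a single zero stretches the range considerably: in this toy example an amount of 10 already lands at about 0.73 of the scaled interval.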

In [None]:
# plot numeric attributes scaled under natural log
if os.path.exists('./images/numeric_attr_pair_plot.png'):
    display(Image('./images/numeric_attr_pair_plot.png'))
else:
    sns.pairplot(data=numeric_attr, vars=numeric_attr_names, hue='isFraud')

In [None]:
# number of distinct values possessed by 'nameOrig'
ori_subset['nameOrig'].nunique()

In [None]:
# if we select first 2 or 3 characters
nameOrig_slice_2 = ori_subset['nameOrig'].str.slice(0, 2)
nameOrig_slice_3 = ori_subset['nameOrig'].str.slice(0, 3)

nameDest_slice_2 = ori_subset['nameDest'].str.slice(0, 2)
nameDest_slice_3 = ori_subset['nameDest'].str.slice(0, 3)

fig, ax =plt.subplots(1,2)
fig.set_figwidth(20)
sns.countplot(x=nameOrig_slice_2, ax=ax[0])
sns.countplot(x=nameOrig_slice_3, ax=ax[1])
# g.set_xticklabels(rotation=90)
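
Why the identifiers are truncated becomes clearer when counting distinct prefixes. The sketch below uses only the standard library and a few hypothetical identifiers written in the style of `nameOrig` / `nameDest`; they are made up for illustration.

```python
from collections import Counter

# hypothetical account identifiers, not taken from the dataset
names = ["C1231006815", "C1666544295", "C1305486145", "M1979787155", "C840083671"]

# count how often each 2- and 3-character prefix occurs
prefix_2 = Counter(name[:2] for name in names)
prefix_3 = Counter(name[:3] for name in names)

print(len(prefix_2), prefix_2)
print(len(prefix_3), prefix_3)
```

A shorter prefix yields fewer distinct categories and therefore fewer columns once the attribute is one-hot encoded, at the cost of merging more accounts into the same category.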

In [None]:
# select categorical attributes
ori_subset_categ = pd.concat([ori_subset['type'], nameOrig_slice_3, nameDest_slice_3], 
                             axis = 1,
                             names = ['type', 'nameOrig', 'nameDest'])
# select numeric attributes
ori_subset_numeric = numeric_attr.drop('isFraud', axis=1)

In [None]:
# transform categorical attributes into a sparse representation using one-hot transformation
ori_subset_categ_transformed = pd.get_dummies(ori_subset_categ)
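
To make the one-hot transformation concrete, here is a minimal NumPy sketch of the same idea on a single hypothetical categorical column. It is not how `pd.get_dummies` is implemented; it only reproduces the resulting layout of one indicator column per distinct value.

```python
import numpy as np

# hypothetical categorical column
values = np.array(["C12", "C84", "C12", "M19"])

# np.unique returns the sorted categories and, for each row, its category index
categories, codes = np.unique(values, return_inverse=True)

# one row per observation, one column per category, a single 1.0 per row
one_hot = np.zeros((len(values), len(categories)), dtype=np.float32)
one_hot[np.arange(len(values)), codes] = 1.0

print(categories)
print(one_hot)
```

Each row contains exactly one non-zero entry, which is what makes the encoded attributes sparse.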

In [None]:
# join categorical and numeric subsets
ori_subset_transformed = pd.concat([ori_subset_categ_transformed, ori_subset_numeric],
                                   axis = 1)
ori_subset_transformed.shape

In [None]:
# prepare the dataset for pytorch loader
# cast to float32 explicitly so the one-hot columns convert regardless of their dtype
torch_dataset = torch.from_numpy(ori_subset_transformed.values.astype(np.float32))

In [None]:
# features = ["AccountID_Key", "CurrencyCode_Key", "TaxCode_Key", "CompanyKey_Key", "ShipToCountry_Key", "ShipFromCountry_Key"]
# ranges = [0, 62+1, 121+1, 183+1, 234+1, 349+1, 400+1]

# init training results
# columns = ["timestamp", "node", "seed", "architecture", "epoch", "rec_loss", "roc_auc", "anomalies", "normalies", "anomalies_s", "normalies_s", "max_threshold", "max_tpr_s", "max_fpr_s", "precision", "recall", "f1_score", "fpr", "tpr", "thresholds"]
# evaluations = ["err_" + str(element) for element in range(0, (len(features))+1)]
# columns.extend(evaluations)
# evaluations = ["ano_c1_" + str(element) for element in range(0, (len(features))+1)]
# columns.extend(evaluations)
# evaluations = ["ano_c2_" + str(element) for element in range(0, (len(features))+1)]
# columns.extend(evaluations)
# evaluation_results = pd.DataFrame(columns=columns)

## 05. Network Training

ToDo - Timur and Marco

In [None]:
# init deterministic seed
seed_value = 1234 #4444 #3333 #2222 #1111 #1234
rd.seed(seed_value) # set random seed
np.random.seed(seed_value) # set numpy seed
torch.manual_seed(seed_value) # set pytorch seed CPU
# torch.cuda.manual_seed(seed_value) # set pytorch seed GPU

# init training parameters
num_epochs = 50
mini_batch_size = 128
learning_rate = 1e-3

# wrap the tensor in a data loader - non-CUDA version
dataloader = DataLoader(torch_dataset, batch_size=mini_batch_size, shuffle=True, num_workers=0)
# set num_workers to zero to retrieve deterministic results

# move the data to the GPU if CUDA is available at the compute node
if torch.cuda.is_available():

    dataloader = DataLoader(torch_dataset.cuda(), batch_size=mini_batch_size, shuffle=True)

# define optimization criterion and optimizer
criterion = nn.BCEWithLogitsLoss()
encoder_optimizer = torch.optim.Adam(encoder_train.parameters(), lr=learning_rate)
decoder_optimizer = torch.optim.Adam(decoder_train.parameters(), lr=learning_rate)

# train autoencoder model
for epoch in range(num_epochs):

    # init mini batch counter
    mini_batch_count = 0

    # determine if CUDA is available at compute node
    if torch.cuda.is_available():

        # set all networks / models in GPU mode
        encoder_train.cuda()
        decoder_train.cuda()

    # set networks in training mode (apply dropout when needed)
    encoder_train.train()
    decoder_train.train()

    for mini_batch_data in dataloader:

        # increase mini batch counter
        mini_batch_count += 1

        # convert mini batch to torch variable
        mini_batch_torch = autograd.Variable(mini_batch_data)

        # =================== forward pass =====================

        # run forward pass
        z_representation = encoder_train(mini_batch_torch) # encode mini-batch data
        mini_batch_reconstruction = decoder_train(z_representation) # decode mini-batch data

        # determine reconstruction loss
        reconstruction_loss = criterion(mini_batch_reconstruction, mini_batch_torch)

        # =================== backward pass ====================

        # reset graph gradients
        decoder_optimizer.zero_grad()
        encoder_optimizer.zero_grad()

        # run backward pass
        reconstruction_loss.backward()

        # update network parameters
        decoder_optimizer.step()
        encoder_optimizer.step()

        # =================== log ==============================

        if mini_batch_count % 100 == 0:

            # print mini batch reconstruction results
            now = datetime.utcnow().strftime("%Y%m%d-%H_%M_%S")
            print('[LOG {}] training status, epoch: [{:04}/{:04}], batch: {:04}, loss: {:.10f}'.format(now, (epoch+1), num_epochs, mini_batch_count, reconstruction_loss.data[0]))

    # save trained encoder model file to disk
    now = datetime.utcnow().strftime("%Y%m%d-%H_%M_%S")
    encoder_model_name = "{}_ep_{}_encoder_model.pth".format(now, (epoch+1))
    torch.save(encoder_train.state_dict(), os.path.join("./models", encoder_model_name))

    # save trained decoder model file to disk
    decoder_model_name = "{}_ep_{}_decoder_model.pth".format(now, (epoch+1))
    torch.save(decoder_train.state_dict(), os.path.join("./models", decoder_model_name))
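
The training criterion applies a sigmoid to the decoder output and then measures the binary cross entropy against the input. The following NumPy sketch spells out that computation in its numerically stable form. It assumes the loss is averaged over all elements, and the function name `bce_with_logits` as well as the example values are hypothetical.

```python
import numpy as np


def bce_with_logits(logits, targets):
    """Mean of -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))] over all elements,
    written in the stable form max(x, 0) - x*t + log(1 + exp(-|x|))."""
    per_element = (np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))
    return per_element.mean()


# hypothetical raw decoder outputs and targets in the range [0, 1]
logits = np.array([[2.0, -1.0, 0.0], [0.5, 3.0, -2.0]])
targets = np.array([[1.0, 0.0, 0.5], [1.0, 1.0, 0.0]])

loss = bce_with_logits(logits, targets)

# a very confident, correct prediction stays finite in the stable form
extreme = bce_with_logits(np.array([1000.0]), np.array([1.0]))
print(loss, extreme)
```

The targets of a cross-entropy loss are expected to lie in [0, 1], which is consistent with the one-hot encoding and the min-max scaling applied to the attributes earlier in the lab.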

## 06. Result Evaluation

In [None]:
# specify trained models to be loaded
encoder_model_name = "20180211-09_02_16_ep_50_encoder_model.pth"
decoder_model_name = "20180211-09_02_16_ep_50_decoder_model.pth"

# init evaluation network classes / architectures
encoder_eval = encoder()
decoder_eval = decoder()

# load trained models
encoder_eval.load_state_dict(torch.load(os.path.join("models", encoder_model_name)))
decoder_eval.load_state_dict(torch.load(os.path.join("models", decoder_model_name)))

In [None]:
# convert encoded transactional data to torch Variable
data = autograd.Variable(torch_dataset)

# set networks in evaluation mode (don't apply dropout)
encoder_eval.eval()
decoder_eval.eval()

# reconstruct encoded transactional data
reconstruction = decoder_eval(encoder_eval(data))

In [None]:
# determine reconstruction loss - all transactions
reconstruction_loss_all = criterion(reconstruction, data)

# print reconstruction loss - all transactions
now = datetime.utcnow().strftime("%Y%m%d-%H:%M:%S")
print('[LOG {}] collected reconstruction loss of: {:06}/{:06} transactions'.format(now, reconstruction.size()[0], reconstruction.size()[0]))
print('[LOG {}] reconstruction loss: {:.10f}'.format(now, reconstruction_loss_all.data[0]))

In [None]:
# init binary cross entropy errors
reconstruction_loss_transaction = np.zeros(reconstruction.size()[0])

# iterate over all detailed reconstructions
for i in range(0, reconstruction.size()[0]):

    # determine reconstruction loss - individual transactions
    reconstruction_loss_transaction[i] = criterion(reconstruction[i], data[i]).data[0]

    if(i % 100000 == 0):

        # print collection summary
        now = datetime.utcnow().strftime("%Y%m%d-%H:%M:%S")
        print('[LOG {}] collected individual reconstruction loss of: {:06}/{:06} transactions'.format(now, i, reconstruction.size()[0]))
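
The per-transaction loop can be slow on large datasets. As a hedged alternative, the sketch below computes one loss value per row in a single vectorised call on NumPy arrays and checks it against a row-by-row loop. The function `per_row_bce_with_logits` and the random stand-in arrays `reconstruction_np` and `data_np` are hypothetical; in the lab they would correspond to the reconstruction and the encoded transactions converted to NumPy, and the result is assumed to match the criterion only if the criterion averages over the elements of each row.

```python
import numpy as np


def per_row_bce_with_logits(logits, targets):
    """Binary cross entropy with logits, averaged over the columns of each row."""
    per_element = (np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))
    return per_element.mean(axis=1)


rng = np.random.default_rng(1234)
reconstruction_np = rng.normal(size=(5, 4))  # stand-in for raw decoder outputs
data_np = rng.random(size=(5, 4))            # stand-in for inputs in [0, 1]

# one loss value per transaction, computed without a Python loop
vectorised = per_row_bce_with_logits(reconstruction_np, data_np)

# the same values collected row by row, mirroring the loop in the cell above
looped = np.array([
    per_row_bce_with_logits(reconstruction_np[i:i + 1], data_np[i:i + 1])[0]
    for i in range(reconstruction_np.shape[0])
])

print(np.allclose(vectorised, looped))
```

Since every row has the same number of columns, the mean of the per-row values equals the loss over all transactions.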

In [None]:
# prepare plot
fig = plt.figure()
ax = fig.add_subplot(111)

plot_data = np.column_stack((np.arange(len(reconstruction_loss_transaction)), 
                             reconstruction_loss_transaction))

regular_data = plot_data[ori_subset.isFraud == 0]
fraud_data = plot_data[ori_subset.isFraud == 1]

# plot reconstruction error scatter plot
ax.scatter(regular_data[:, 0], regular_data[:, 1], c='C0', alpha=0.4, marker="o") # plot regular transactions
ax.scatter(fraud_data[:, 0], fraud_data[:, 1], c='C1', marker="^") # plot fraudulent transactions


## 07. Optional Exercises

ToDo - Timur and Marco

## 08. Lab Summary

ToDo - Timur and Marco

## 09. Next Steps

ToDo - Timur and Marco