<h1>TECHNICAL REPORT - FINANCIAL FORENSIC DATA ANALYSIS<span class="tocSkip"></span></h1>

Author: Amir Yunus<br>
GitHub: https://github.com/AmirYunus/GA_DSI_Capstone
***

# PREFACE

## Background: Fraud and the Accounting System

The Association of Certified Fraud Examiners estimates in its Global Study on Occupational Fraud and Abuse, 2018 [[1]](../documents/2018-report-to-the-nations.pdf) that organisations lose 5% of their annual revenues to fraud. In that report, the term **"occupational fraud"** refers to:

>_" . . . an attack against the organisation from within, by the very people who were entrusted to protect its assets and resources"_.

A similar study, conducted by the auditors of PwC [[2]](../documents/global-economic-crime-and-fraud-survey-2018.pdf), revealed that 30% of the study respondents experienced losses of between USD 100,000 and USD 5 million in the last 24 months (as of 2018). The study also showed that financial statement fraud caused by far the greatest median loss of the surveyed fraud schemes.

At the same time, organizations are accelerating the digitisation and reconfiguration of business processes [[3]](../documents/accelerating_the_digitization_of_business_processes.pdf). This affects Accounting Information Systems (AIS) in particular and, more generally, Enterprise Resource Planning (ERP) systems.

<img align="middle" style="height: auto" src="../images/accounting.png">

**Figure 1:** Hierarchical view of an Accounting Information System (AIS) that records distinct layers of abstraction, namely (1) the business process information, (2) the accounting information as well as the (3) technical journal entry information in designated database tables.

These systems steadily collect vast quantities of electronic evidence at an almost atomic level. This holds in particular for the journal entries of an organization recorded in its general ledger and sub-ledger accounts. SAP, one of the most prominent ERP software providers, estimates that approximately 76% of the world's transaction revenue touches one of its systems [5].

The illustration in **Figure 1** depicts a hierarchical view of an Accounting Information System (AIS) recording process and journal entry information in designated database tables. In the context of fraud examinations, the data collected by such systems may contain valuable traces of a potential fraud scheme.

## Classification of Financial Anomalies

When conducting a detailed examination of real-world journal entries, usually recorded in large-scale AIS or ERP systems, two prevalent characteristics can be observed:

> - specific transaction attributes exhibit **a high variety of distinct attribute values**, e.g. customer information, posted sub-ledgers, amount information, and 
> - the transactions exhibit **strong dependencies between specific attribute values**, e.g. between customer information and type of payment, posting type and general ledgers. 

From this observation we distinguish two classes of anomalous journal entries, namely **"global"** and **"local" anomalies**, as illustrated in **Figure 2** below:

<img align="middle" style="height: auto" src="../images/anomalies.png">

**Figure 2:** Illustrative example of global and local anomalies portrayed in a feature space of the two transaction features "Posting Amount" (Feature 1) and "Posting Positions" (Feature 2).

***Global Anomalies*** are financial transactions that exhibit **unusual or rare individual attribute values**. These anomalies usually relate to highly skewed attributes, e.g. users who seldom post, rarely used ledgers, or unusual posting times. 

Traditionally, "red-flag" tests performed by auditors during annual audits are designed to capture those types of anomalies. However, such tests might result in a high volume of false positive alerts due to, e.g., regular reverse postings, provisions and year-end adjustments, which are usually associated with a low fraud risk.

***Local Anomalies*** are financial transactions that exhibit an **unusual or rare combination of attribute values** while the individual attribute values occur quite frequently, e.g. unusual accounting records. 

This type of anomaly is significantly more difficult to detect, since perpetrators intend to disguise their activities by imitating regular behaviour. As a result, such anomalies usually pose a high fraud risk, since they might correspond to, e.g., misused user accounts or irregular combinations of general ledger accounts and posting keys that don't follow a usual activity pattern.
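
The distinction between the two classes can be made concrete with a toy example. The sketch below is an illustration only: the journal entries, the two attributes (`user`, `ledger`) and the rarity threshold `min_count` are all made up for this purpose and are not taken from the dataset analysed in this report. An entry is labelled "global" when one of its individual values is rare, and "local" when both values are common but their combination is rare.

```python
from collections import Counter

# Made-up journal entries: (posting user, general ledger account)
entries = (
    [("alice", "GL100")] * 40
    + [("bob", "GL200")] * 40
    + [("alice", "GL200")]      # both values are common, the combination is rare
    + [("mallory", "GL100")]    # the user value itself is rare
)

min_count = 5  # hypothetical rarity threshold

# Frequency of each individual attribute value and of each combination
user_counts = Counter(user for user, _ in entries)
ledger_counts = Counter(ledger for _, ledger in entries)
pair_counts = Counter(entries)


def classify(entry):
    """Label an entry as 'global', 'local' or 'regular'."""
    user, ledger = entry
    if user_counts[user] < min_count or ledger_counts[ledger] < min_count:
        return "global"
    if pair_counts[entry] < min_count:
        return "local"
    return "regular"


labels = {entry: classify(entry) for entry in pair_counts}
for entry, label in labels.items():
    print(entry, label)
```

Simple frequency counts like these resemble a "red-flag" test; they do not scale to many attributes with many distinct values, which is part of the motivation for the learned approach described in this report.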

## Executive Summary

The objective of this lab is to walk you through a deep learning based methodology that can be used to detect global and local anomalies in financial datasets. The proposed method is based on the following assumptions: 

>1. the majority of financial transactions recorded within an organization's ERP system relate to regular day-to-day business activities, and perpetrators need to deviate from the "regular" in order to conduct fraud,
>2. such deviating behaviour will be recorded by a very limited number of financial transactions and their respective attribute values or combinations of attribute values; we refer to such a deviation as an "anomaly".

From these assumptions we conclude that a model of regular journal entries can be learned with minimal "harm" caused by the potentially anomalous ones.

In order to detect such anomalies, we will train deep autoencoder networks to learn a compressed but "lossy" model of regular transactions and their underlying posting pattern. Imposing a strong regularization onto the network's hidden layers limits the network's ability to memorize the characteristics of anomalous journal entries. Once the training process is completed, the network will be able to reconstruct regular journal entries, while failing to do so for the anomalous ones.
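
The scoring step that follows training can be sketched without any neural network. The example below is an illustration under stated assumptions, not the model used in this report: the "reconstructed" array is written by hand as a stand-in for a network's output, the error measure is a per-record mean squared error, and `threshold` is a hypothetical cut-off chosen only for this toy data.

```python
import numpy as np


def reconstruction_error(original, reconstructed):
    """Return the mean squared error of each record (row)."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return np.mean((original - reconstructed) ** 2, axis=1)


# Made-up encoded records (one row per journal entry)
original = np.array([
    [1.0, 0.0, 0.2],
    [0.0, 1.0, 0.3],
    [1.0, 0.0, 0.9],
])

# Hand-written stand-in for the output of a trained autoencoder:
# the first two rows are reconstructed well, the third is not
reconstructed = np.array([
    [0.9, 0.1, 0.2],
    [0.1, 0.9, 0.3],
    [0.2, 0.8, 0.3],
])

errors = reconstruction_error(original, reconstructed)

threshold = 0.1  # hypothetical cut-off for this toy data
flagged = np.where(errors > threshold)[0]

print("errors:", errors)
print("flagged record indices:", flagged)
```

In practice the cut-off would have to be chosen from the distribution of errors on the actual data, and flagged records are candidates for review rather than confirmed fraud.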

After completing the lab you should be familiar with:

>1. the basic concepts, intuitions and major building blocks of autoencoder neural networks,
>2. the techniques of pre-processing financial data in order to learn a model of its characteristics,
>3. the application of autoencoder neural networks to detect anomalies in large-scale financial data, and,
>4. the interpretation of the detection results of the network as well as its reconstruction loss. 

Please note that this lab is neither a complete nor a comprehensive forensic data analysis approach or fraud examination strategy. However, the methodology and code provided in this lab can be modified or adapted to detect anomalous records in a variety of financial datasets. Subsequently, the detected records might serve as a starting point for a more detailed and substantive examination by auditors or compliance personnel. 

For this lab, we assume that you are familiar with the general concepts of deep neural networks (DNN) and GPUs as well as PyTorch and Python. For more information on these concepts please check the relevant labs of NVIDIA's Deep Learning Institute (DLI). 

Think about potential fraud scenarios of your organization:

>1. What scenarios or fraudulent activities could you think of? [3 min]
>2. What data sources might affect or record those potential fraudulent activities? [5 min]
>3. What kind of data analytics techniques could be applied to detect those activities? [5 min]

## Content

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#PREFACE" data-toc-modified-id="PREFACE-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>PREFACE</a></span><ul class="toc-item"><li><span><a href="#Background:-Fraud-and-the-Accounting-System" data-toc-modified-id="Background:-Fraud-and-the-Accounting-System-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Background: Fraud and the Accounting System</a></span></li><li><span><a href="#Classification-of-Financial-Anomalies" data-toc-modified-id="Classification-of-Financial-Anomalies-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Classification of Financial Anomalies</a></span></li><li><span><a href="#Executive-Summary" data-toc-modified-id="Executive-Summary-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Executive Summary</a></span></li><li><span><a href="#Content" data-toc-modified-id="Content-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Content</a></span></li><li><span><a href="#Data-Dictionary" data-toc-modified-id="Data-Dictionary-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Data Dictionary</a></span></li><li><span><a href="#Libraries" data-toc-modified-id="Libraries-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Libraries</a></span></li><li><span><a href="#Functions" data-toc-modified-id="Functions-1.7"><span class="toc-item-num">1.7&nbsp;&nbsp;</span>Functions</a></span><ul class="toc-item"><li><span><a href="#Basic-Utilities" data-toc-modified-id="Basic-Utilities-1.7.1"><span class="toc-item-num">1.7.1&nbsp;&nbsp;</span>Basic Utilities</a></span></li><li><span><a href="#Define-Networks" data-toc-modified-id="Define-Networks-1.7.2"><span class="toc-item-num">1.7.2&nbsp;&nbsp;</span>Define Networks</a></span><ul class="toc-item"><li><span><a href="#Encoder" data-toc-modified-id="Encoder-1.7.2.1"><span class="toc-item-num">1.7.2.1&nbsp;&nbsp;</span>Encoder</a></span></li><li><span><a href="#Decoder" data-toc-modified-id="Decoder-1.7.2.2"><span 
class="toc-item-num">1.7.2.2&nbsp;&nbsp;</span>Decoder</a></span></li><li><span><a href="#Discriminator" data-toc-modified-id="Discriminator-1.7.2.3"><span class="toc-item-num">1.7.2.3&nbsp;&nbsp;</span>Discriminator</a></span></li><li><span><a href="#Autoencoder" data-toc-modified-id="Autoencoder-1.7.2.4"><span class="toc-item-num">1.7.2.4&nbsp;&nbsp;</span>Autoencoder</a></span></li></ul></li></ul></li></ul></li><li><span><a href="#ABOUT-THE-DATASET" data-toc-modified-id="ABOUT-THE-DATASET-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>ABOUT THE DATASET</a></span><ul class="toc-item"><li><span><a href="#Importing-the-Dataset" data-toc-modified-id="Importing-the-Dataset-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Importing the Dataset</a></span></li></ul></li><li><span><a href="#EXPLORATORY-DATA-ANALYSIS" data-toc-modified-id="EXPLORATORY-DATA-ANALYSIS-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>EXPLORATORY DATA ANALYSIS</a></span><ul class="toc-item"><li><span><a href="#Data-Attributes" data-toc-modified-id="Data-Attributes-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Data Attributes</a></span></li><li><span><a href="#Exploring-Dataset-Using-Benford's-Law" data-toc-modified-id="Exploring-Dataset-Using-Benford's-Law-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Exploring Dataset Using Benford's Law</a></span></li><li><span><a href="#Categorical-Features" data-toc-modified-id="Categorical-Features-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Categorical Features</a></span><ul class="toc-item"><li><span><a href="#BELNR" data-toc-modified-id="BELNR-3.3.1"><span class="toc-item-num">3.3.1&nbsp;&nbsp;</span>BELNR</a></span></li><li><span><a href="#WAERS" data-toc-modified-id="WAERS-3.3.2"><span class="toc-item-num">3.3.2&nbsp;&nbsp;</span>WAERS</a></span></li><li><span><a href="#BUKRS" data-toc-modified-id="BUKRS-3.3.3"><span class="toc-item-num">3.3.3&nbsp;&nbsp;</span>BUKRS</a></span></li><li><span><a href="#KTOSL" 
data-toc-modified-id="KTOSL-3.3.4"><span class="toc-item-num">3.3.4&nbsp;&nbsp;</span>KTOSL</a></span></li><li><span><a href="#PRCTR" data-toc-modified-id="PRCTR-3.3.5"><span class="toc-item-num">3.3.5&nbsp;&nbsp;</span>PRCTR</a></span></li><li><span><a href="#BSCHL" data-toc-modified-id="BSCHL-3.3.6"><span class="toc-item-num">3.3.6&nbsp;&nbsp;</span>BSCHL</a></span></li><li><span><a href="#HKONT" data-toc-modified-id="HKONT-3.3.7"><span class="toc-item-num">3.3.7&nbsp;&nbsp;</span>HKONT</a></span></li></ul></li><li><span><a href="#Numerical-Features" data-toc-modified-id="Numerical-Features-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Numerical Features</a></span><ul class="toc-item"><li><span><a href="#DMBTR" data-toc-modified-id="DMBTR-3.4.1"><span class="toc-item-num">3.4.1&nbsp;&nbsp;</span>DMBTR</a></span></li><li><span><a href="#WRBTR" data-toc-modified-id="WRBTR-3.4.2"><span class="toc-item-num">3.4.2&nbsp;&nbsp;</span>WRBTR</a></span></li></ul></li></ul></li><li><span><a href="#FEATURE-ENGINEERING" data-toc-modified-id="FEATURE-ENGINEERING-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>FEATURE ENGINEERING</a></span><ul class="toc-item"><li><span><a href="#One-Hot-Encoding" data-toc-modified-id="One-Hot-Encoding-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>One Hot Encoding</a></span></li><li><span><a href="#Log-Transform" data-toc-modified-id="Log-Transform-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Log Transform</a></span></li><li><span><a href="#Merge-Features" data-toc-modified-id="Merge-Features-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Merge Features</a></span></li></ul></li><li><span><a href="#DBSCAN" data-toc-modified-id="DBSCAN-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>DBSCAN</a></span></li></ul></div>

## Data Dictionary

## Libraries

In [None]:
import os
import warnings
import matplotlib
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf

from sklearn.preprocessing import MinMaxScaler
import joblib  # sklearn.externals.joblib was removed from scikit-learn; use the standalone package
from numpy.random import seed
from keras.layers import Input, Dropout, Dense, LSTM, TimeDistributed, RepeatVector, LeakyReLU
from keras.models import Model
from keras import regularizers
from datetime import datetime
from sklearn.model_selection import train_test_split
from IPython.display import display, Markdown
from keras.callbacks import EarlyStopping, ModelCheckpoint, CSVLogger

In [None]:
sns.set(color_codes=True)
%matplotlib inline
# tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

In [None]:
seed(10)
tf.compat.v1.set_random_seed(10)

In [None]:
# # import utilities
# import os
# import sys
# import random
# import io
# import urllib
# import warnings
# import IPython
# import csv

# from IPython.display import display, Markdown
# from datetime import datetime

# # import data science libraries
# import pandas as pd
# import random as rd
# import numpy as np

# # import pytorch libraries
# import torch
# import torch.optim as optim
# import pytorch_lightning as pl
# from torch import nn
# from torch.utils.data import DataLoader
# from torch.nn import functional as F

# # import python plotting libraries
# import matplotlib.pyplot as plt
# import seaborn as sns

# Prevent warnings from distracting the reader
warnings.filterwarnings('ignore')

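# Colour scheme and style selected
theme = ['#1F306E', '#553772', '#8F3B76', '#C7417B', '#F5487F']
colors_palette = sns.color_palette(theme)
sns.palplot(colors_palette)  # preview only: palplot draws the palette and returns None
plt.style.use('seaborn')  # note: newer Matplotlib releases name this style 'seaborn-v0_8'
sns.set(style="white", color_codes=True)
sns.set_palette(colors_palette)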

# Forces Matplotlib to use high-quality images
ip = get_ipython()
ibe = ip.configurables[-1]
ibe.figure_formats = {'pdf', 'png'}    

## Functions

### Basic Utilities

In [None]:
%autosave 60

In [None]:
nl = "\n"

In [None]:
os.makedirs('../data', exist_ok=True)  # create data directory
os.makedirs('../models', exist_ok=True)  # create trained models directory

In [None]:
##################################
# Define LOG
##################################
def log ():
    now = str(datetime.now())
    print(f'[LOG {now}]')
    return

In [None]:
# ##################################
# # Define Euclidean Distance Calculation
# ##################################
# def compute_euclid_distance(x, y):
    
#     # calculate euclidean distance 
#     euclidean_distance = np.sqrt(np.sum((x - y) ** 2, axis=1))
    
#     # return euclidean distance
#     return euclidean_distance

### Define Networks

#### Encoder

In [None]:
##################################
# Define Encoder Model - in/16/z
##################################
def encoder_model (X):
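    # input: X is expected to be 3-D (samples, time steps, features)
    model_input = Input(shape=(X.shape[1], X.shape[2]))
    
    # encoder: LeakyReLU is an activation layer without units, so the units live in Dense layers
    # Dense acts on the last axis and keeps the time-step axis (return_sequences only exists on recurrent layers)
    L01 = Dense(16, kernel_regularizer=regularizers.l2(0.00))(model_input)
    L01 = LeakyReLU()(L01)
    L02 = Dense(4)(L01)
    model_output = LeakyReLU()(L02)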
    
    # define model
    model = Model(inputs=model_input, outputs=model_output)
    
    # return model
    return model

#### Decoder

In [None]:
##################################
# Define Decoder Model - z/16/out
##################################
def decoder_model (X):
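    # input: shape must be a tuple, hence the trailing comma
    model_input = Input(shape=(X.shape[1],))
    
    # decoder: Dense carries the units, LeakyReLU is applied as a separate activation layer
    L01 = Dense(16)(model_input)
    L01 = LeakyReLU()(L01)
    model_output = Dense(X.shape[2], activation = 'sigmoid')(L01)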
    
    # define model
    model = Model(inputs=model_input, outputs=model_output)
    
    # return model
    return model

#### Discriminator

In [None]:
##################################
# Define Discriminator Model - z/16/z
##################################
def discriminator_model (X):
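    # input: shape must be a tuple, hence the trailing comma
    model_input = Input(shape=(X.shape[1],))
    
    # discriminator: Dense carries the units, LeakyReLU is applied as a separate activation layer
    L01 = Dense(16)(model_input)
    L01 = LeakyReLU()(L01)
    model_output = Dense(4, activation = 'sigmoid')(L01)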
    
    # define model
    model = Model(inputs=model_input, outputs=model_output)
    
    # return model
    return model

#### Autoencoder

In [None]:
##################################
# Define Autoencoder Model
##################################
def autoencoder_model (X):
    # input
    model_input = Input(shape=(X.shape[1], X.shape[2]))
    
    # encoder
    L01 = LSTM(512, activation = tf.nn.leaky_relu, return_sequences=True, kernel_regularizer=regularizers.l2(0.00))(model_input)
    L02 = LSTM(256, activation = tf.nn.leaky_relu, return_sequences=True)(L01)
    L03 = LSTM(128, activation = tf.nn.leaky_relu, return_sequences=True)(L02)
    L04 = LSTM(64, activation = tf.nn.leaky_relu, return_sequences=True)(L03)
    L05 = LSTM(32, activation = tf.nn.leaky_relu, return_sequences=True)(L04)
    L06 = LSTM(16, activation = tf.nn.leaky_relu, return_sequences=True)(L05)
    L07 = LSTM(4, activation = tf.nn.leaky_relu, return_sequences=True)(L06)
    L08 = LSTM(2, activation = tf.nn.leaky_relu, return_sequences=False)(L07)
    
    # latent space
    L09 = RepeatVector(X.shape[1])(L08)
    
    # decoder
    L10 = LSTM(2, activation = tf.nn.leaky_relu, return_sequences=True)(L09)
    L11 = LSTM(4, activation = tf.nn.leaky_relu, return_sequences=True)(L10)
    L12 = LSTM(16, activation = tf.nn.leaky_relu, return_sequences=True)(L11)
    L13 = LSTM(32, activation = tf.nn.leaky_relu, return_sequences=True)(L12)
    L14 = LSTM(64, activation = tf.nn.leaky_relu, return_sequences=True)(L13)
    L15 = LSTM(128, activation = tf.nn.leaky_relu, return_sequences=True)(L14)
    L16 = LSTM(256, activation = tf.nn.leaky_relu, return_sequences=True)(L15)
    L17 = LSTM(512, activation = tf.nn.leaky_relu, return_sequences=True)(L16)
    
    # output
    model_output = TimeDistributed(Dense(X.shape[2]))(L17)
    
    # define model
    model = Model(inputs=model_input, outputs=model_output)
    
    # return model
    return model
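
The model-building functions above index `X.shape[1]` and `X.shape[2]`, so `X` needs three axes: samples, time steps and features. How the journal entries are arranged into that shape is not shown in this section. The sketch below assumes, purely for illustration, that each record is treated as a sequence of length one; the `features` array is made-up data.

```python
import numpy as np

# Made-up feature matrix: 5 records with 3 encoded features each
features = np.arange(15, dtype=float).reshape(5, 3)

# Assumption for this sketch: every record is a sequence with a single time step
X = features.reshape(features.shape[0], 1, features.shape[1])

# These are the two values the model builders read from X
timesteps, n_features = X.shape[1], X.shape[2]

print("X shape:", X.shape)
print("time steps:", timesteps, "features:", n_features)
```

With a single time step the recurrent layers see no temporal context, so this layout only makes the shapes line up; a windowed layout over consecutive postings would be a different design choice.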

In [None]:
# # ##################################
# # # Define Autoencoder Model - 16/4/z/4/16
# # ##################################
# def autoencoder_model (X):
#     # input
#     model_input = Input(shape=(X.shape[1], X.shape[2]))
    
#     # encoder
#     L01 = Dense(16, activation = tf.nn.leaky_relu, return_sequences=True, kernel_regularizer=regularizers.l2(0.00))(model_input)
#     L02 = Dense(4, activation = tf.nn.leaky_relu, return_sequences=False)(L01)
    
#     # latent space
#     L03 = RepeatVector(X.shape[1])(L02)
    
#     # decoder
#     L04 = Dense(4, activation = tf.nn.leaky_relu, return_sequences=True)(L03)
#     L05 = Dense(16, activation = tf.nn.leaky_relu, return_sequences=True)(L04)
    
#     # output
#     model_output = TimeDistributed(Dense(X.shape[2]))(L05)
    
#     # define model
#     model = Model(inputs=model_input, outputs=model_output)
    
#     # return model
#     return model

In [None]:
# # define encoder class
# class Encoder(nn.Module):

#     # define class constructor
#     def __init__(self, input_size, hidden_size):

#         # call super class constructor
#         super(Encoder, self).__init__()

#         # specify first layer - in 618, out 256
#         self.map_L1 = nn.Linear(input_size, hidden_size[0], bias=True) # init linearity
#         nn.init.xavier_uniform_(self.map_L1.weight) # init weights according to [9]
#         nn.init.constant_(self.map_L1.bias, 0.0) # constant initialization of the bias
#         self.map_R1 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity according to [10]

#         # specify second layer - in 256, out 64
#         self.map_L2 = nn.Linear(hidden_size[0], hidden_size[1], bias=True)
#         nn.init.xavier_uniform_(self.map_L2.weight)
#         nn.init.constant_(self.map_L2.bias, 0.0)
#         self.map_R2 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

#         # specify third layer - in 64, out 16
#         self.map_L3 = nn.Linear(hidden_size[1], hidden_size[2], bias=True)
#         nn.init.xavier_uniform_(self.map_L3.weight)
#         nn.init.constant_(self.map_L3.bias, 0.0)
#         self.map_R3 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

#         # specify fourth layer - in 16, out 4
#         self.map_L4 = nn.Linear(hidden_size[2], hidden_size[3], bias=True)
#         nn.init.xavier_uniform_(self.map_L4.weight)
#         nn.init.constant_(self.map_L4.bias, 0.0)
#         self.map_R4 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

#         # specify fifth layer - in 4, out 2
#         self.map_L5 = nn.Linear(hidden_size[3], hidden_size[4], bias=True)
#         nn.init.xavier_uniform_(self.map_L5.weight)
#         nn.init.constant_(self.map_L5.bias, 0.0)
#         self.map_R5 = torch.nn.LeakyReLU(negative_slope=0.4, inplace=True)
        
#     # define forward pass
#     def forward(self, x):

#         # run forward pass through the network
#         x = self.map_R1(self.map_L1(x))
#         x = self.map_R2(self.map_L2(x))
#         x = self.map_R3(self.map_L3(x))
#         x = self.map_R4(self.map_L4(x))
#         x = self.map_R5(self.map_L5(x))

#         # return result
#         return x

In [None]:
# ##################################
# # Define Encoder
# ##################################
# class encoder(nn.Module):

#     # define class constructor
#     def __init__(self,input_size):
        
#         # call super class constructor
#         super(encoder, self).__init__()
        
#         # first layer - in input size, out 512
#         self.l1 = nn.Linear(input_size, 512, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l1.weight) # init weights according to [9]
#         nn.init.constant_(self.l1.bias, 0.0) # constant initialization of the bias
#         self.r1 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity according to [10]

#         # second layer - in 512, out 256
#         self.l2 = nn.Linear(512, 256, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l2.weight) # init weights
#         nn.init.constant_(self.l2.bias, 0.0) # constant initialization of the bias
#         self.r2 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#         # third layer - in 256, out 128
#         self.l3 = nn.Linear(256, 128, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l3.weight) # init weights
#         nn.init.constant_(self.l3.bias, 0.0) # constant initialization of the bias
#         self.r3 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
        
#         # fourth layer - in 128, out 64
#         self.l4 = nn.Linear(128, 64, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l4.weight) # init weights
#         nn.init.constant_(self.l4.bias, 0.0) # constant initialization of the bias
#         self.r4 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
        
#         # fifth layer - in 64, out 32
#         self.l5 = nn.Linear(64, 32, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l5.weight) # init weights
#         nn.init.constant_(self.l5.bias, 0.0) # constant initialization of the bias
#         self.r5 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#         # sixth layer - in 32, out 16
#         self.l6 = nn.Linear(32, 16, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l6.weight) # init weights
#         nn.init.constant_(self.l6.bias, 0.0) # constant initialization of the bias
#         self.r6 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#         # seventh layer - in 16, out 8
#         self.l7 = nn.Linear(16, 8, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l7.weight) # init weights
#         nn.init.constant_(self.l7.bias, 0.0) # constant initialization of the bias
#         self.r7 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
        
#         # eighth layer - in 8, out 4
#         self.l8 = nn.Linear(8, 4, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l8.weight) # init weights
#         nn.init.constant_(self.l8.bias, 0.0) # constant initialization of the bias
#         self.r8 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
 
#         # ninth layer - in 4, out 2
#         self.l9 = nn.Linear(4, 2, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l9.weight) # init weights
#         nn.init.constant_(self.l9.bias, 0.0) # constant initialization of the bias
#         self.r9 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#     # define forward pass
#     def forward(self, x):
        
#         # run forward pass through the network
#         x = self.r1(self.l1(x))
#         x = self.r2(self.l2(x))
#         x = self.r3(self.l3(x))
#         x = self.r4(self.l4(x))
#         x = self.r5(self.l5(x))
#         x = self.r6(self.l6(x))
#         x = self.r7(self.l7(x))
#         x = self.r8(self.l8(x))
#         x = self.r9(self.l9(x))
        
#         # return result
#         return x

In [None]:
# # ##################################
# # # Define Decoder Model
# # ##################################
# def decoder_model (X):
#     # input
#     model_input = Input(shape = X.shape[1])
    
#     # decoder
#     L01 = LSTM(2, activation = tf.nn.leaky_relu, return_sequences=True)(model_input)
#     L02 = LSTM(4, activation = tf.nn.leaky_relu, return_sequences=True)(L01)
#     L03 = LSTM(16, activation = tf.nn.leaky_relu, return_sequences=True)(L02)
#     L04 = LSTM(32, activation = tf.nn.leaky_relu, return_sequences=True)(L03)
#     L05 = LSTM(64, activation = tf.nn.leaky_relu, return_sequences=True)(L04)
#     L06 = LSTM(128, activation = tf.nn.leaky_relu, return_sequences=True)(L05)
#     L07 = LSTM(256, activation = tf.nn.leaky_relu, return_sequences=True)(L06)
#     L08 = LSTM(512, activation = tf.nn.leaky_relu, return_sequences=True)(L07)
    
#     # output
#     model_output = TimeDistributed(Dense(X.shape[2]))(L08)
    
#     # define model
#     model = Model(inputs=model_input, outputs=model_output)
    
#     # return model
#     return model

In [None]:
# # define decoder class
# class Decoder(nn.Module):

#     # define class constructor
#     def __init__(self, output_size, hidden_size):

#         # call super class constructor
#         super(Decoder, self).__init__()

#         # specify first layer - in 2, out 4
#         self.map_L1 = nn.Linear(hidden_size[0], hidden_size[1], bias=True) # init linearity
#         nn.init.xavier_uniform_(self.map_L1.weight) # init weights according to [9]
#         nn.init.constant_(self.map_L1.bias, 0.0) # constant initialization of the bias
#         self.map_R1 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity according to [10]

#         # specify second layer - in 4, out 16
#         self.map_L2 = nn.Linear(hidden_size[1], hidden_size[2], bias=True)
#         nn.init.xavier_uniform_(self.map_L2.weight)
#         nn.init.constant_(self.map_L2.bias, 0.0)
#         self.map_R2 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

#         # specify third layer - in 16, out 64
#         self.map_L3 = nn.Linear(hidden_size[2], hidden_size[3], bias=True)
#         nn.init.xavier_uniform_(self.map_L3.weight)
#         nn.init.constant_(self.map_L3.bias, 0.0)
#         self.map_R3 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

#         # specify fourth layer - in 64, out 256
#         self.map_L4 = nn.Linear(hidden_size[3], hidden_size[4], bias=True)
#         nn.init.xavier_uniform_(self.map_L4.weight)
#         nn.init.constant_(self.map_L4.bias, 0.0)
#         self.map_R4 = nn.LeakyReLU(negative_slope=0.4, inplace=True)
        
#         # specify fifth layer - in 256, out 618
#         self.map_L5 = nn.Linear(hidden_size[4], output_size, bias=True)
#         nn.init.xavier_uniform_(self.map_L5.weight)
#         nn.init.constant_(self.map_L5.bias, 0.0)
#         self.map_S5 = torch.nn.Sigmoid()

#     # define forward pass
#     def forward(self, x):

#         # run forward pass through the network
#         x = self.map_R1(self.map_L1(x))
#         x = self.map_R2(self.map_L2(x))
#         x = self.map_R3(self.map_L3(x))
#         x = self.map_R4(self.map_L4(x))
#         x = self.map_S5(self.map_L5(x))

#         # return result
#         return x

In [None]:
# ##################################
# # Define Decoder
# ##################################
# class decoder(nn.Module):

#     # define class constructor
#     def __init__(self,output_size):
        
#         # call super class constructor
#         super(decoder, self).__init__()
        
#         # first layer - in 2, out 4
#         self.l1 = nn.Linear(2, 4, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l1.weight) # init weights according to [9]
#         nn.init.constant_(self.l1.bias, 0.0) # constant initialization of the bias
#         self.r1 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity according to [10]

#         # second layer - in 4, out 8
#         self.l2 = nn.Linear(4, 8, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l2.weight) # init weights
#         nn.init.constant_(self.l2.bias, 0.0) # constant initialization of the bias
#         self.r2 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#         # third layer - in 8, out 16
#         self.l3 = nn.Linear(8, 16, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l3.weight) # init weights
#         nn.init.constant_(self.l3.bias, 0.0) # constant initialization of the bias
#         self.r3 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
        
#         # fourth layer - in 16, out 32
#         self.l4 = nn.Linear(16, 32, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l4.weight) # init weights
#         nn.init.constant_(self.l4.bias, 0.0) # constant initialization of the bias
#         self.r4 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
        
#         # fifth layer - in 32, out 64
#         self.l5 = nn.Linear(32, 64, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l5.weight) # init weights
#         nn.init.constant_(self.l5.bias, 0.0) # constant initialization of the bias
#         self.r5 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#         # sixth layer - in 64, out 128
#         self.l6 = nn.Linear(64, 128, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l6.weight) # init weights
#         nn.init.constant_(self.l6.bias, 0.0) # constant initialization of the bias
#         self.r6 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#         # seventh layer - in 128, out 256
#         self.l7 = nn.Linear(128, 256, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l7.weight) # init weights
#         nn.init.constant_(self.l7.bias, 0.0) # constant initialization of the bias
#         self.r7 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
        
#         # eighth layer - in 256, out 512
#         self.l8 = nn.Linear(256, 512, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l8.weight) # init weights
#         nn.init.constant_(self.l8.bias, 0.0) # constant initialization of the bias
#         self.r8 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
 
#         # ninth layer - in 512, out output size
#         self.l9 = nn.Linear(512, output_size, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l9.weight) # init weights
#         nn.init.constant_(self.l9.bias, 0.0) # constant initialization of the bias
#         self.s9 = nn.Sigmoid() # sigmoid transformation

#     # define forward pass
#     def forward(self, x):
        
#         # run forward pass through the network
#         x = self.r1(self.l1(x))
#         x = self.r2(self.l2(x))
#         x = self.r3(self.l3(x))
#         x = self.r4(self.l4(x))
#         x = self.r5(self.l5(x))
#         x = self.r6(self.l6(x))
#         x = self.r7(self.l7(x))
#         x = self.r8(self.l8(x))
#         x = self.s9(self.l9(x))
        
#         # return result
#         return x

In [None]:
# # define discriminator class
# class Discriminator(nn.Module):

#     # define class constructor
#     def __init__(self, input_size, hidden_size, output_size):

#         # call super class constructor
#         super(Discriminator, self).__init__()

#         # specify first layer - in 2, out 256
#         self.map_L1 = nn.Linear(input_size, hidden_size[0], bias=True) # init linearity
#         nn.init.xavier_uniform_(self.map_L1.weight) # init weights according to [9]
#         nn.init.constant_(self.map_L1.bias, 0.0) # constant initialization of the bias
#         self.map_R1 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity according to [10]

#         # specify second layer - in 256, out 16
#         self.map_L2 = nn.Linear(hidden_size[0], hidden_size[1], bias=True)
#         nn.init.xavier_uniform_(self.map_L2.weight)
#         nn.init.constant_(self.map_L2.bias, 0.0)
#         self.map_R2 = nn.LeakyReLU(negative_slope=0.4, inplace=True)

#         # specify third layer - in 16, out 4
#         self.map_L3 = nn.Linear(hidden_size[1], hidden_size[2], bias=True)
#         nn.init.xavier_uniform_(self.map_L3.weight)
#         nn.init.constant_(self.map_L3.bias, 0.0)
#         self.map_R3 = nn.LeakyReLU(negative_slope=0.4, inplace=True)
        
#         # specify fourth layer - in 4, out 2
#         self.map_L4 = nn.Linear(hidden_size[2], output_size, bias=True)
#         nn.init.xavier_uniform_(self.map_L4.weight)
#         nn.init.constant_(self.map_L4.bias, 0.0)
#         self.map_S4 = torch.nn.Sigmoid()

#     # define forward pass
#     def forward(self, x):

#         # run forward pass through the network
#         x = self.map_R1(self.map_L1(x))
#         x = self.map_R2(self.map_L2(x))
#         x = self.map_R3(self.map_L3(x))
#         x = self.map_S4(self.map_L4(x))

#         # return result
#         return x

In [None]:
# ##################################
# # Define Discriminator
# ##################################
# class discriminator(nn.Module):

#     # define class constructor
#     def __init__(self):
        
#         # call super class constructor
#         super(discriminator, self).__init__()
        
#         # first layer - in 2, out 512
#         self.l1 = nn.Linear(2, 512, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l1.weight) # init weights according to [9]
#         nn.init.constant_(self.l1.bias, 0.0) # constant initialization of the bias
#         self.r1 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity according to [10]

#         # second layer - in 512, out 256
#         self.l2 = nn.Linear(512, 256, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l2.weight) # init weights
#         nn.init.constant_(self.l2.bias, 0.0) # constant initialization of the bias
#         self.r2 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#         # third layer - in 256, out 128
#         self.l3 = nn.Linear(256, 128, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l3.weight) # init weights
#         nn.init.constant_(self.l3.bias, 0.0) # constant initialization of the bias
#         self.r3 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
        
#         # fourth layer - in 128, out 64
#         self.l4 = nn.Linear(128, 64, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l4.weight) # init weights
#         nn.init.constant_(self.l4.bias, 0.0) # constant initialization of the bias
#         self.r4 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
        
#         # fifth layer - in 64, out 32
#         self.l5 = nn.Linear(64, 32, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l5.weight) # init weights
#         nn.init.constant_(self.l5.bias, 0.0) # constant initialization of the bias
#         self.r5 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#         # sixth layer - in 32, out 16
#         self.l6 = nn.Linear(32, 16, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l6.weight) # init weights
#         nn.init.constant_(self.l6.bias, 0.0) # constant initialization of the bias
#         self.r6 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#         # seventh layer - in 16, out 8
#         self.l7 = nn.Linear(16, 8, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l7.weight) # init weights
#         nn.init.constant_(self.l7.bias, 0.0) # constant initialization of the bias
#         self.r7 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
        
#         # eighth layer - in 8, out 4
#         self.l8 = nn.Linear(8, 4, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l8.weight) # init weights
#         nn.init.constant_(self.l8.bias, 0.0) # constant initialization of the bias
#         self.r8 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity
 
#         # ninth layer - in 4, out 2
#         self.l9 = nn.Linear(4, 2, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l9.weight) # init weights
#         nn.init.constant_(self.l9.bias, 0.0) # constant initialization of the bias
#         self.r9 = nn.LeakyReLU(negative_slope=0.4, inplace=True) # add non-linearity

#         # tenth layer - in 2, out 1
#         self.l0 = nn.Linear(2, 1, bias=True) # init linearity
#         nn.init.xavier_uniform_(self.l0.weight) # init weights
#         nn.init.constant_(self.l0.bias, 0.0) # constant initialization of the bias
#         self.s0 = nn.Sigmoid() # sigmoid transformation

#     # define forward pass
#     def forward(self, x):
        
#         # run forward pass through the network
#         x = self.r1(self.l1(x))
#         x = self.r2(self.l2(x))
#         x = self.r3(self.l3(x))
#         x = self.r4(self.l4(x))
#         x = self.r5(self.l5(x))
#         x = self.r6(self.l6(x))
#         x = self.r7(self.l7(x))
#         x = self.r8(self.l8(x))
#         x = self.r9(self.l9(x))
#         x = self.s0(self.l0(x))
        
#         # return result
#         return x

# ABOUT THE DATASET

In this section, we conduct a descriptive analysis of the lab's financial dataset and apply the pre-processing steps needed to train a deep neural network. The lab is based on a derivation of the **"Synthetic Financial Dataset For Fraud Detection"** by Lopez-Rojas [6], which is available on the Kaggle predictive modelling and analytics competitions platform at the following link: https://www.kaggle.com/ntnu-testimon/paysim1.

Let's start loading the dataset and investigate its structure and attributes:

## Importing the Dataset

In [None]:
log()

# load the dataset into the notebook kernel
ori_dataset = pd.read_csv('../data/fraud_dataset_v2.csv')
# inspect the dataset's dimensions
print(F'Transactional dataset of {ori_dataset.shape[0]} rows and {ori_dataset.shape[1]} columns loaded')

# EXPLORATORY DATA ANALYSIS

We augmented the dataset and renamed the attributes to appear more similar to a real-world dataset that one usually observes in SAP-ERP systems as part of SAP's Finance and Cost controlling (FICO) module. 

The dataset contains a subset of in total 7 categorical and 2 numerical attributes available in the FICO BKPF (containing the posted journal entry headers) and BSEG (containing the posted journal entry segments) tables. Please, find below a list of the individual attributes as well as a brief description of their respective semantics:

>- `BELNR`: the accounting document number,
>- `BUKRS`: the company code,
>- `BSCHL`: the posting key,
>- `HKONT`: the posted general ledger account,
>- `PRCTR`: the posted profit center,
>- `WAERS`: the currency key,
>- `KTOSL`: the general ledger account key,
>- `DMBTR`: the amount in local currency,
>- `WRBTR`: the amount in document currency.

Let's also have a closer look at the first 3 rows of the dataset:

## Data Attributes

In [None]:
# inspect top rows of dataset
log()
ori_dataset.head(3)

You may also have noticed the attribute `label` in the data. We will use this field throughout the lab to evaluate the quality of our trained models. The field describes the true nature of each individual transaction as either a **regular** transaction (denoted by `regular`) or an **anomaly** (denoted by `global` or `local`). Let's have a closer look at the distribution of regular vs. anomalous transactions in the dataset:

In [None]:
log()
# Display object type and values for each feature
display(Markdown(f'<b>{ori_dataset.columns[9]}</b> - {ori_dataset.label.dtype}'))
display(Markdown(f'Values:'))
print(f'{ori_dataset.label.value_counts(normalize=True)}\n')
print()
n_nan = ori_dataset.label.isnull().sum()
if n_nan > 0:
    print(f'label has {n_nan} NaNs')

In [None]:
log()
# plot distribution of feature
plt.figure(figsize=(16,9))
sns.countplot(x=ori_dataset.label)

In [None]:
log()
# number of anomalies vs. regular transactions
ori_dataset.label.value_counts()

In [None]:
log()
# number of anomalies vs. regular transactions
ori_dataset.label.value_counts(normalize = True)

The statistics reveal that, similar to real-world scenarios, we are facing a highly imbalanced dataset. Overall, the dataset contains only a small fraction of **100 (approx. 0.019%)** anomalous transactions. These 100 anomalous entries comprise **70 (approx. 0.013%)** "global" anomalies and **30 (approx. 0.006%)** "local" anomalies, as introduced in section 1.2.
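
To make the degree of imbalance concrete, the shares can be recomputed from the record counts stated in this report (533,009 records in total, 70 "global" and 30 "local" anomalies). The sketch below is purely arithmetic and does not touch the dataset; the helper `share_in_percent` is a hypothetical name introduced for illustration.

```python
# counts taken from the text: 533,009 records, 70 "global" and 30 "local" anomalies
n_total = 533_009
n_global = 70
n_local = 30
n_anomalies = n_global + n_local

def share_in_percent(count, total=n_total):
    """Return the share of `count` in `total`, expressed in percent."""
    return 100.0 * count / total

for name, count in [('all anomalies', n_anomalies), ('global', n_global), ('local', n_local)]:
    print(f'{name}: {count} records, {share_in_percent(count):.3f}% of the dataset')

# a trivial model that labels every transaction as regular is still correct
# for all non-anomalous records, so accuracy alone says little here
baseline_accuracy = (n_total - n_anomalies) / n_total
print(f'accuracy of an "always regular" baseline: {baseline_accuracy:.5f}')
```

With so few anomalies, a model can reach a very high accuracy without detecting a single one, which is why precision and recall of the anomaly class are the more informative figures later on.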

In [None]:
log()
# remove the "ground-truth" label information for the following steps of the lab
label = ori_dataset.pop('label')

## Exploring Dataset Using Benford's Law

Benford's law is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the most significant digit about 30% of the time, while 9 appears as the most significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford's law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on. - From [Wikipedia](https://en.wikipedia.org/wiki/Benford%27s_law)
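
As a minimal sketch of the two ingredients used below, the expected Benford share of a leading digit `d` is `log10(1 + 1/d)`, and the leading digit of an amount is its first non-zero digit. The helper names `leading_digit` and `benford_probability` are hypothetical and introduced only for illustration; the sample amounts are made up.

```python
from decimal import Decimal
import math

def leading_digit(value):
    """Return the first non-zero digit of a number (0 if the number is zero)."""
    # Decimal(str(...)) keeps the printed digits and drops leading zeros
    return Decimal(str(abs(value))).as_tuple().digits[0]

def benford_probability(digit):
    """Expected share of `digit` (1-9) as the leading digit under Benford's law."""
    return math.log10(1 + 1 / digit)

expected = {d: benford_probability(d) for d in range(1, 10)}
print({d: round(p, 3) for d, p in expected.items()})

# illustrative amounts, including one below 1
print([leading_digit(v) for v in (1234.5, 0.042, 9.99)])
```

Note that taking the first character of `str(a)` returns `0` for amounts below 1, whereas the helper above skips leading zeros. Whether that matters depends on whether such amounts occur in the data.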

In [None]:
log()
# make a copy of original dataset
ori_dataset_bf = ori_dataset.copy()

# map out the first digit and display top 5 rows
ori_dataset_bf['FIRST_DIGIT'] = ori_dataset_bf.DMBTR.map(lambda a: str(a)[0]).astype(int)
ori_dataset_bf.head()

In [None]:
log()
# display the actual percentage distribution of dataset
actuals = ori_dataset_bf.FIRST_DIGIT.value_counts(normalize=True).sort_index()
actuals

In [None]:
log()
# calculate the expected distribution based on Benford's Law
digits = list(range(1,10))
benford = [np.log10(1 + 1/d) for d in digits]
plt.figure(figsize = (16,9))

# plot graph to visualise distribution
plt.bar(digits, benford, label='Expected')
plt.plot(actuals, color='r', label='Actual')
plt.xticks(digits)
plt.legend();

## Categorical Features

From the initial data assessment above we can observe that the majority of attributes recorded in AIS and ERP systems are categorical (discrete), e.g. the posting date, the general ledger account, the posting type and the currency. Let's have a more detailed look at the distribution of each categorical attribute in the dataset, including (1) the posting key `BSCHL` and (2) the general ledger account `HKONT`:

### BELNR

In [None]:
log()
# Display object type and values for each feature
display(Markdown(f'<b>{ori_dataset.columns[0]}</b> - {ori_dataset.BELNR.dtype}'))
display(Markdown(f'Values:'))
print(f'{ori_dataset.BELNR.value_counts(normalize=True)}\n')
print()
n_nan = ori_dataset.BELNR.isnull().sum()
if n_nan > 0:
    print(f'BELNR has {n_nan} NaNs')

In [None]:
log()
# plot distribution of feature
plt.figure(figsize=(16,9))
sns.distplot(ori_dataset.BELNR)
plt.title('Distribution of BELNR observations', fontsize = 20)

### WAERS

In [None]:
log()
# Display object type and values for each feature
display(Markdown(f'<b>{ori_dataset.columns[1]}</b> - {ori_dataset.WAERS.dtype}'))
display(Markdown(f'Values:'))
print(f'{ori_dataset.WAERS.value_counts()}\n')
print()
n_nan = ori_dataset.WAERS.isnull().sum()
if n_nan > 0:
    print(f'WAERS has {n_nan} NaNs')

In [None]:
log()
# set up the figure for the currency key distribution

fig, ax = plt.subplots()
fig.set_figwidth(16)
fig.set_figheight(9)

# plot the distribution of the currency key attribute (regular transactions only)
g = sns.countplot(x=ori_dataset.loc[label=='regular', 'WAERS'])
g.set_xticklabels(g.get_xticklabels(), rotation=0)
g.set_title('Distribution of WAERS observations', fontsize = 20)

### BUKRS

In [None]:
log()
# Display object type and values for each feature
display(Markdown(f'<b>{ori_dataset.columns[2]}</b> - {ori_dataset.BUKRS.dtype}'))
display(Markdown(f'Values:'))
print(f'{ori_dataset.BUKRS.value_counts(normalize=True)}\n')
print()
n_nan = ori_dataset.BUKRS.isnull().sum()
if n_nan > 0:
    print(f'BUKRS has {n_nan} NaNs')

In [None]:
log()
# set up the figure for the company code distribution

fig, ax = plt.subplots()
fig.set_figwidth(16)
fig.set_figheight(20)

# plot the distribution of the company code attribute (regular transactions only)
g = sns.countplot(y=ori_dataset.loc[label=='regular', 'BUKRS'])
g.set_title('Distribution of BUKRS observations', fontsize = 20)

### KTOSL

In [None]:
log()
# Display object type and values for each feature
display(Markdown(f'<b>{ori_dataset.columns[3]}</b> - {ori_dataset.KTOSL.dtype}'))
display(Markdown(f'Values:'))
print(f'{ori_dataset.KTOSL.value_counts(normalize=True)}\n')
print()
n_nan = ori_dataset.KTOSL.isnull().sum()
if n_nan > 0:
    print(f'KTOSL has {n_nan} NaNs')

In [None]:
log()
# set up the figure for the general ledger account key distribution

fig, ax = plt.subplots()
fig.set_figwidth(16)
fig.set_figheight(9)

# plot the distribution of the general ledger account key attribute (regular transactions only)
g = sns.countplot(x=ori_dataset.loc[label=='regular', 'KTOSL'])
g.set_xticklabels(g.get_xticklabels(), rotation=0)
g.set_title('Distribution of KTOSL observations', fontsize = 20)

### PRCTR

In [None]:
log()
# Display object type and values for each feature
display(Markdown(f'<b>{ori_dataset.columns[4]}</b> - {ori_dataset.PRCTR.dtype}'))
display(Markdown(f'Values:'))
print(f'{ori_dataset.PRCTR.value_counts(normalize=True)}\n')
print()
n_nan = ori_dataset.PRCTR.isnull().sum()
if n_nan > 0:
    print(f'PRCTR has {n_nan} NaNs')

In [None]:
log()
# set up the figure for the profit center distribution

fig, ax = plt.subplots()
fig.set_figwidth(16)
fig.set_figheight(20)

# plot the distribution of the profit center attribute (regular transactions only)
g = sns.countplot(y=ori_dataset.loc[label=='regular', 'PRCTR'])
g.set_title('Distribution of PRCTR observations', fontsize = 20)

### BSCHL

In [None]:
log()
# Display object type and values for each feature
display(Markdown(f'<b>{ori_dataset.columns[5]}</b> - {ori_dataset.BSCHL.dtype}'))
display(Markdown(f'Values:'))
print(f'{ori_dataset.BSCHL.value_counts(normalize=True)}\n')
print()
n_nan = ori_dataset.BSCHL.isnull().sum()
if n_nan > 0:
    print(f'BSCHL has {n_nan} NaNs')

In [None]:
log()
# set up the figure for the posting key distribution

fig, ax = plt.subplots()
fig.set_figwidth(16)
fig.set_figheight(9)

# plot the distribution of the posting key attribute
g = sns.countplot(x=ori_dataset.loc[label=='regular', 'BSCHL'])
g.set_xticklabels(g.get_xticklabels(), rotation=0)
g.set_title('Distribution of BSCHL observations', fontsize = 20)

### HKONT

In [None]:
log()
# Display object type and values for each feature
display(Markdown(f'<b>{ori_dataset.columns[6]}</b> - {ori_dataset.HKONT.dtype}'))
display(Markdown(f'Values:'))
print(f'{ori_dataset.HKONT.value_counts(normalize=True)}\n')
print()
n_nan = ori_dataset.HKONT.isnull().sum()
if n_nan > 0:
    print(f'HKONT has {n_nan} NaNs')

In [None]:
log()
# set up the figure for the general ledger account distribution

fig, ax = plt.subplots()
fig.set_figwidth(16)
fig.set_figheight(9)

# plot the distribution of the general ledger account attribute (regular transactions only)
g = sns.countplot(x=ori_dataset.loc[label=='regular', 'HKONT'])
g.set_xticklabels(g.get_xticklabels(), rotation=0)
g.set_title('Distribution of HKONT observations', fontsize = 20)

## Numerical Features

Let's now inspect the distributions of the two numerical attributes contained in the transactional dataset namely, the (1) local currency amount `DMBTR` and the (2) document currency amount `WRBTR`:

### DMBTR

In [None]:
log()
# Display object type and values for each feature
display(Markdown(f'<b>{ori_dataset.columns[7]}</b> - {ori_dataset.DMBTR.dtype}'))
display(Markdown(f'Values:'))
print(f'{ori_dataset.DMBTR.value_counts(normalize=True)}\n')
print()
n_nan = ori_dataset.DMBTR.isnull().sum()
if n_nan > 0:
    print(f'DMBTR has {n_nan} NaNs')

In [None]:
log()
# plot distribution of feature
plt.figure(figsize=(16,9))
sns.distplot(ori_dataset.DMBTR)
plt.title('Distribution of DMBTR observations', fontsize = 20)

### WRBTR

In [None]:
log()
# Display object type and values for each feature
display(Markdown(f'<b>{ori_dataset.columns[8]}</b> - {ori_dataset.WRBTR.dtype}'))
display(Markdown(f'Values:'))
print(f'{ori_dataset.WRBTR.value_counts(normalize=True)}\n')
print()
n_nan = ori_dataset.WRBTR.isnull().sum()
if n_nan > 0:
    print(f'WRBTR has {n_nan} NaNs')

In [None]:
log()
# plot distribution of feature
plt.figure(figsize=(16,9))
sns.distplot(ori_dataset.WRBTR)
plt.title('Distribution of WRBTR observations', fontsize = 20)

As expected, the distributions of the amount values are heavy-tailed for both attributes.

# FEATURE ENGINEERING

## One Hot Encoding

Unfortunately, neural networks are in general not designed to be trained directly on categorical data and require the attributes to be trained on to be numeric. One simple way to meet this requirement is by applying a technique referred to as **"one-hot" encoding**. Using this encoding technique, we will derive a numerical representation of each of the categorical attribute values. One-hot encoding creates new binary columns for each categorical attribute value present in the original data. 

Let's work through a brief example: The **categorical attribute “Receiver”** below contains the names "John", "Timur" and "Marco". We "one-hot" encode the names by creating a separate binary column for each possible name value observable in the "Receiver" column. Now, we encode for each transaction that contains the value "John" in the "Receiver" column this observation with 1.0 in the newly created "John" column and 0.0 in all other created name columns.

<img align="middle" style="max-width: 430px; height: auto" src="../images/encoding.png">
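
The "Receiver" example can be reproduced in a few lines as a minimal sketch. The toy data frame below is made up for illustration and is not part of the transactional dataset.

```python
import pandas as pd

# toy data mirroring the "Receiver" example from the text (values are illustrative)
toy = pd.DataFrame({'Receiver': ['John', 'Timur', 'Marco', 'John']})

# one binary column per distinct name; dtype=float yields 1.0 / 0.0 entries
toy_encoded = pd.get_dummies(toy['Receiver'], dtype=float)
print(toy_encoded)
```

Each row contains exactly one 1.0, placed in the column of the name it originally held, so the number of new columns equals the number of distinct attribute values.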

Using this technique, we will "one-hot" encode the 6 categorical attributes in the original transactional dataset. This can be achieved using the `get_dummies()` function available in the Pandas data science library:

In [None]:
# select categorical attributes to be "one-hot" encoded
log()

categorical_attr_names = ['WAERS', 'BUKRS', 'KTOSL', 'PRCTR', 'BSCHL', 'HKONT']

# encode categorical attributes into a binary one-hot encoded representation 
ori_dataset_categ_transformed = pd.get_dummies(ori_dataset[categorical_attr_names])

Finally, let's inspect the encoding of 3 sample transactions to see if we have been successful.

In [None]:
# inspect encoded sample transactions
log()
ori_dataset_categ_transformed.head(3)

## Log Transform

Recall that the numeric features are heavy-tailed. Scaling and normalizing numerical input values is good practice, since it helps the training approach a potential global minimum faster. Therefore, we first log-scale both variables and then min-max normalize the scaled amounts to the interval [0, 1].
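
The effect of the two steps is easiest to see on a handful of made-up amounts that span several orders of magnitude. This is a sketch on toy values only; the epsilon of `1e-7` mirrors the one used for the real data.

```python
import numpy as np
import pandas as pd

# illustrative amounts spanning several orders of magnitude (not taken from the dataset)
toy_amounts = pd.Series([0.0, 10.0, 1_000.0, 1_000_000.0], name='amount')

# step 1: log-scale; the small epsilon avoids log(0)
toy_log = np.log(toy_amounts + 1e-7)

# step 2: min-max normalize to the interval [0, 1]
toy_scaled = (toy_log - toy_log.min()) / (toy_log.max() - toy_log.min())
print(toy_scaled)
```

The smallest amount maps to 0 and the largest to 1, while the logarithm compresses the gap between the large values so that they no longer dominate the scale.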

In [None]:
# select "DMBTR" vs. "WRBTR" attribute
log()

numeric_attr_names = ['DMBTR', 'WRBTR']

# add a small epsilon to eliminate zero values from data for log scaling
numeric_attr = ori_dataset[numeric_attr_names] + 1e-7
numeric_attr = numeric_attr.apply(np.log)

# normalize all numeric attributes to the range [0,1]
ori_dataset_numeric_attr = (numeric_attr - numeric_attr.min()) / (numeric_attr.max() - numeric_attr.min())

Let's now visualize the log-scaled and min-max normalized distributions of both attributes:

In [None]:
# init the plots
log()

fig, ax = plt.subplots(1,2)
fig.set_figwidth(16)
fig.set_figheight(9)

# plot distribution of the local amount attribute
g = sns.distplot(ori_dataset_numeric_attr['DMBTR'].tolist(), ax=ax[0])
g.set_title('Distribution of scaled DMBTR amount values')

# set axis-labels 
ax[0].set_xlabel('DMBTR')
ax[0].set_ylabel('density')

# plot distribution of the document amount attribute
g = sns.distplot(ori_dataset_numeric_attr['WRBTR'].tolist(), ax=ax[1])
g.set_title('Distribution of scaled WRBTR amount values')

# set axis-labels
ax[1].set_xlabel('WRBTR')
ax[1].set_ylabel('density');

In [None]:
log()
# plot distribution of feature
plt.figure(figsize=(16,9))
sns.distplot(ori_dataset_numeric_attr.DMBTR)
plt.title('Distribution of scaled DMBTR observations', fontsize = 20)

In [None]:
log()
# plot distribution of feature
plt.figure(figsize=(16,9))
sns.distplot(ori_dataset_numeric_attr.WRBTR)
plt.title('Distribution of scaled WRBTR observations', fontsize = 20)

Ok, let's now visually investigate the scaled distributions of both attributes in terms of the distinct anomaly classes contained in the population of journal entries:

In [None]:
# append 'label' attribute 
log()

numeric_attr_vis = ori_dataset_numeric_attr.copy()
numeric_attr_vis['label'] = label

# plot the log-scaled and min-max normalized numeric attributes
g = sns.pairplot(data=numeric_attr_vis, vars=numeric_attr_names, hue='label', diag_kind='kde', palette={'regular': 'C0', 'local': 'C3', 'global': 'C1'}, markers=['o', 'x', 'x'])

# set figure title
g.fig.suptitle('Distribution of DMBTR vs. WRBTR amount values', y=1.02, fontsize = 20)

# set figure size
g.fig.set_size_inches(16, 9)

As anticipated, the numeric attribute values of the "global" anomalies (orange) fall outside the range of the regular amount distributions due to their unusually high amount values. In contrast, the numeric attribute values of the "local" anomalies (red) are much more commingled with the regular transaction amounts.
As the `DMBTR` attribute contains a number of extreme values, we might want to visualize its distribution with that set of extreme values omitted.

## Merge Features

Finally, we merge the pre-processed numerical and categorical attributes into a single dataset that we will use for training our deep autoencoder neural network (explained and implemented in the following section 4).

Now, let's again have a look at the dimensionality of the dataset after we applied the distinct pre-processing steps to the attributes:

In [None]:
log ()
# merge categorical and numeric subsets
ori_subset_transformed = pd.concat([ori_dataset_categ_transformed, ori_dataset_numeric_attr], axis = 1)

# inspect final dimensions of pre-processed transactional data
ori_subset_transformed.shape

Note that the attributes `WAERS` and `BUKRS` are already part of `categorical_attr_names` above, so no further encoding is required. For both attributes, the pre-processing consisted of the following steps:

>1. Plot and inspect the distribution of the values of both attributes `WAERS` and `BUKRS`.
>2. Encode both variables using the `get_dummies()` function provided by the Pandas library.
>3. Merge the encoding results into the Pandas `ori_subset_transformed` data frame.

Upon completion of all the pre-processing steps we should end up with an encoded dataset consisting of a total of 533,009 records (rows) and **618 encoded attributes** (columns). Let's keep the number of columns in mind, since it defines the dimensionality of the input and output layers of our deep autoencoder network.

# DBSCAN

In [None]:
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.ensemble import IsolationForest

In [None]:
label_int = label.copy()

In [None]:
# binary target: only "global" anomalies form the positive class (1)
label_int = label.map({'regular': 0, 'global': 1, 'local': 0})

# fail early if the data contains a label value outside the mapping
assert label_int.notna().all(), 'unexpected label value'

In [None]:
df_train, df_test, df_y_train, df_y_test = \
train_test_split(
    ori_subset_transformed, 
    label_int, 
    random_state=42, 
    stratify=label_int
)

In [None]:
X_train, X_val, y_train, y_val = \
train_test_split(
    df_train, 
    df_y_train, 
    random_state=42, 
    stratify=df_y_train
)

In [None]:
ss = MinMaxScaler()
X_train = ss.fit_transform(X_train)
X_val = ss.transform(X_val)

In [None]:
pca = PCA(n_components=200)
X_train = pca.fit_transform(X_train)
X_val = pca.transform(X_val)

In [None]:
# X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
print(f'Training data shape {X_train.shape}')

# X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
print(f'Validation data shape {X_val.shape}')

In [None]:
dbscan = DBSCAN(
    eps = .2,
    metric='euclidean', 
    min_samples = 5,
    n_jobs = -1)

In [None]:
dbscan.fit(X_train, y_train)

In [None]:
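# DBSCAN has no predict() method in scikit-learn; as a workaround we cluster the
# validation set directly and flag noise points (cluster label -1) as anomalies (1)
pred = (dbscan.fit_predict(X_val) == -1).astype(int)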

In [None]:
cmat = confusion_matrix(y_val.astype(int), pred)
print(f'TN - True Negative {cmat[0,0]}')
print(f'FP - False Positive {cmat[0,1]}')
print(f'FN - False Negative {cmat[1,0]}')
print(f'TP - True Positive {cmat[1,1]}')
print(f'Accuracy Rate: {np.divide(np.sum([cmat[0,0],cmat[1,1]]),np.sum(cmat))}')
print(f'Misclassification Rate: {np.divide(np.sum([cmat[0,1],cmat[1,0]]),np.sum(cmat))}')

In [None]:
print(classification_report(y_val.astype(int), pred, digits=5))
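
scikit-learn's `DBSCAN` estimator assigns a cluster label to every row via `fit_predict()` and marks rows that belong to no cluster as noise with the label `-1`. The following minimal sketch, on made-up two-dimensional points, shows how such labels can be turned into binary anomaly flags; the `toy_` names are hypothetical and the data is unrelated to the journal entries.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# toy data: 20 tightly packed points on a line plus one far-away point (illustrative only)
dense = np.column_stack([np.linspace(0.0, 0.19, 20), np.zeros(20)])
outlier = np.array([[5.0, 5.0]])
toy_X = np.vstack([dense, outlier])

# fit_predict returns one cluster label per row; noise points receive the label -1
toy_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(toy_X)

# turn cluster labels into binary anomaly flags: 1 = noise / anomaly, 0 = clustered
toy_flags = (toy_labels == -1).astype(int)
print(toy_flags)
```

Because DBSCAN is unsupervised, any labels passed to `fit()` are ignored; the ground-truth labels only enter when the flags are compared against them in the evaluation.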

In [None]:
label_int = label.copy()

In [None]:
# binary target: only "local" anomalies form the positive class (1)
label_int = label.map({'regular': 0, 'global': 0, 'local': 1})

# fail early if the data contains a label value outside the mapping
assert label_int.notna().all(), 'unexpected label value'

In [None]:
df_train, df_test, df_y_train, df_y_test = \
train_test_split(
    ori_subset_transformed, 
    label_int, 
    random_state=42, 
    stratify=label_int
)

In [None]:
X_train, X_val, y_train, y_val = \
train_test_split(
    df_train, 
    df_y_train, 
    random_state=42, 
    stratify=df_y_train
)

In [None]:
ss = MinMaxScaler()
X_train = ss.fit_transform(X_train)
X_val = ss.transform(X_val)

In [None]:
pca = PCA(n_components=200)
X_train = pca.fit_transform(X_train)
X_val = pca.transform(X_val)

In [None]:
# X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
print(f'Training data shape {X_train.shape}')

# X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
print(f'Validation data shape {X_val.shape}')

In [None]:
dbscan = DBSCAN(
    eps = .2,
    metric='euclidean', 
    min_samples = 5,
    n_jobs = -1)

In [None]:
dbscan.fit(X_train, y_train)

In [None]:
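# DBSCAN has no predict() method in scikit-learn; as a workaround we cluster the
# validation set directly and flag noise points (cluster label -1) as anomalies (1)
pred = (dbscan.fit_predict(X_val) == -1).astype(int)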

In [None]:
cmat = confusion_matrix(y_val.astype(int), pred)
print(f'TN - True Negative {cmat[0,0]}')
print(f'FP - False Positive {cmat[0,1]}')
print(f'FN - False Negative {cmat[1,0]}')
print(f'TP - True Positive {cmat[1,1]}')
print(f'Accuracy Rate: {np.divide(np.sum([cmat[0,0],cmat[1,1]]),np.sum(cmat))}')
print(f'Misclassification Rate: {np.divide(np.sum([cmat[0,1],cmat[1,0]]),np.sum(cmat))}')

In [None]:
print(classification_report(y_val.astype(int), pred, digits=5))
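
For reading the confusion-matrix cells, it helps to recall scikit-learn's layout: rows are true classes and columns are predicted classes, so for binary labels `ravel()` yields the cells in the order TN, FP, FN, TP. A minimal sketch on made-up labels:

```python
from sklearn.metrics import confusion_matrix

# illustrative labels: 0 = regular, 1 = anomaly
toy_true = [0, 0, 0, 1, 1]
toy_pred = [0, 1, 0, 1, 0]

# rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(toy_true, toy_pred, labels=[0, 1]).ravel()

accuracy = (tn + tp) / (tn + fp + fn + tp)
recall = tp / (tp + fn)        # share of anomalies that were found
precision = tp / (tp + fp)     # share of flagged records that are anomalies
print(tn, fp, fn, tp, accuracy, recall, precision)
```

Passing `labels=[0, 1]` explicitly keeps the matrix 2x2 even if one class happens to be absent from the predictions, so the index-based access to the cells stays valid.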

In [None]:
label_int = label.copy()

In [None]:
# binary target: both "global" and "local" anomalies form the positive class (1)
label_int = label.map({'regular': 0, 'global': 1, 'local': 1})

# fail early if the data contains a label value outside the mapping
assert label_int.notna().all(), 'unexpected label value'

In [None]:
df_train, df_test, df_y_train, df_y_test = \
train_test_split(
    ori_subset_transformed, 
    label_int, 
    random_state=42, 
    stratify=label_int
)

In [None]:
ss = MinMaxScaler()
df_train = ss.fit_transform(df_train)
df_test = ss.transform(df_test)

In [None]:
pca = PCA(n_components=200)
df_train = pca.fit_transform(df_train)
df_test = pca.transform(df_test)

In [None]:
# X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
print(f'Training data shape {df_train.shape}')

# X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
print(f'Testing data shape {df_test.shape}')

In [None]:
dbscan = DBSCAN(
    eps = .2,
    metric='euclidean', 
    min_samples = 5,
    n_jobs = -1)

In [None]:
dbscan.fit(df_train, df_y_train)

In [None]:
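# DBSCAN has no predict() method in scikit-learn; as a workaround we cluster the
# test set directly and flag noise points (cluster label -1) as anomalies (1)
pred = (dbscan.fit_predict(df_test) == -1).astype(int)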

In [None]:
cmat = confusion_matrix(df_y_test.astype(int), pred)
print(f'TN - True Negative {cmat[0,0]}')
print(f'FP - False Positive {cmat[0,1]}')
print(f'FN - False Negative {cmat[1,0]}')
print(f'TP - True Positive {cmat[1,1]}')
print(f'Accuracy Rate: {np.divide(np.sum([cmat[0,0],cmat[1,1]]),np.sum(cmat))}')
print(f'Misclassification Rate: {np.divide(np.sum([cmat[0,1],cmat[1,0]]),np.sum(cmat))}')

In [None]:
print(classification_report(df_y_test.astype(int), pred, digits=5))