# Graph Analytics - Fraud Detection

<a href="https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Fraud_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Traditional fraud detection solutions view data through a straw, focusing on discrete data points such as specific accounts, individuals, devices or IP addresses. However, today’s sophisticated fraudsters escape detection by forming fraud rings or curious loops composed of stolen and synthetic identities and circuitous back channels. To uncover such fraud rings, it is essential to look beyond individual data points in individual data sources to a broader view of the connection patterns between them that exist in multiple disparate data sources.
 
ArangoDB’s multi-model graph allows you to easily fuse together disparate data and identify complex fraudulent patterns of connections, such as fraud rings, using the ArangoDB Query Language (AQL). Identifying fraud ring patterns requires very deep (multi-hop) traversals across the graph. The query for detecting a fraud ring can be written in six lines of AQL that are easy to write and maintain, and ArangoDB can execute these queries with sub-second response times.

Thanks to https://twitter.com/arthurakeen for inspiration!

![fraud_overview](img/fraud_detection_collections.png)

# Setup

First, set up the environment.

In [None]:
%%capture
!git clone https://github.com/joerg84/Graph_Powered_ML_Workshop.git
!rsync -av Graph_Powered_ML_Workshop/ ./ --exclude=.git
!pip3 install pyarango

In [None]:
import csv
import json
import requests
import sys
import oasis


from pyArango.connection import *
from pyArango.collection import Collection, Edges, Field
from pyArango.graph import Graph, EdgeDefinition
from pyArango.collection import BulkOperation as BulkOperation

Next, create a temporary database:

In [None]:
# Retrieve tmp credentials from ArangoDB Tutorial Service
login = oasis.getTempCredentials()

# Connect to the temp database
conn = oasis.connect(login)
db = conn[login["dbName"]] 

In [None]:
print("https://"+login["hostname"]+":"+str(login["port"]))
print("Username: " + login["username"])
print("Password: " + login["password"])
print("Database: " + login["dbName"])

Feel free to use the above URL to check out the UI!

# Data Import 

*Note: the included arangorestore only works on Linux. To run this notebook on a different OS, use the appropriate arangorestore from the [Download](https://www.arangodb.com/download-major/) area.*

In [None]:
!./tools/arangorestore -c none --server.endpoint http+ssl://{login["hostname"]}:{login["port"]} --server.username {login["username"]} --server.database {login["dbName"]} --server.password {login["password"]} --default-replication-factor 3  --input-directory "data/fraud_dump"
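
The same restore call can also be assembled as an argument list, which avoids shell quoting problems if a credential contains special characters. The sketch below only reuses the flags from the cell above; the helper name `build_restore_command` and the example credentials are hypothetical.

```python
def build_restore_command(login, input_directory="data/fraud_dump"):
    """Assemble the arangorestore call from the cell above as an argument list."""
    endpoint = "http+ssl://{}:{}".format(login["hostname"], login["port"])
    return [
        "./tools/arangorestore",
        "-c", "none",
        "--server.endpoint", endpoint,
        "--server.username", login["username"],
        "--server.database", login["dbName"],
        "--server.password", login["password"],
        "--default-replication-factor", "3",
        "--input-directory", input_directory,
    ]


# Hypothetical credentials, used only to show the resulting argument list.
example_login = {
    "hostname": "example.arangodb.cloud",
    "port": 8529,
    "username": "demo_user",
    "password": "demo_password",
    "dbName": "demo_db",
}
command = build_restore_command(example_login)
print(command)
```

With the real `login` dictionary, the list could be passed to `subprocess.run(command, check=True)` instead of using the `!` shell escape.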

# Create Graph 

The graph we will be using in the following analysis looks as follows:

![graph](img/fraud_graph.jpeg)

In [None]:
from pyArango.collection import Collection, Edges, Field
from pyArango.graph import Graph, EdgeDefinition


class account(Collection):
    _fields = {
        "Name": Field()
    }
    
class customer(Collection):
    _fields = {
        "Name": Field()
    }
    
class transaction(Edges): 
    _fields = {
    }

class accountHolder(Edges): 
    _fields = {
    }

class FraudDetection(Graph):
    _edgeDefinitions = [
        EdgeDefinition("accountHolder", fromCollections=["customer"], toCollections=["account"]),
        EdgeDefinition("transaction", fromCollections=["account"], toCollections=["account"]),
    ]
    _orphanedCollections = []

fraudGraph = db.createGraph("FraudDetection")

print("Collection/Graph Setup done.")

# Fraud Detection

We will look at four different techniques to identify fraudulent behavior:

1. Find long loops (potential fraud rings) from a suspicious account.
2. Find all long loops (potential fraud rings).
3. Find suspicious accounts (e.g. dormant or orphan accounts).
4. Find disaggregation (fan-out) / re-aggregation (fan-in) money laundering patterns.



## Long Loops

This query searches for long loops that start at one suspicious account: transactions that leave the account and, after a long cycle of 5 to 10 hops (IN 5..10 OUTBOUND), bring the money back to the account where it started. This looping behavior is important to us because it is a method for attempting to circumvent local laws. In really large datasets with millions of data points this can be a long-running query, but it should complete quickly with this example dataset.

In [None]:
# Find loops that start and end at account 10000032
loop_query = """
WITH account, transaction
FOR acct, tx, path IN 5..10 OUTBOUND 'account/10000032' GRAPH 'FraudDetection'
   FILTER tx._to == 'account/10000032'
RETURN DISTINCT path
"""

queryResult = db.AQLQuery(loop_query, rawResults=True)
for result in queryResult:
    print(result)
    print()
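
To check a different account without editing the query text, the start vertex can be passed as an AQL bind variable. This is a minimal sketch, assuming the `db` handle created during setup; the helper name `loops_for_account` is hypothetical, and `bindVars` is the keyword argument pyArango's `AQLQuery` uses for bind variables.

```python
LOOP_QUERY = """
WITH account, transaction
FOR acct, tx, path IN 5..10 OUTBOUND @start GRAPH 'FraudDetection'
   FILTER tx._to == @start
RETURN DISTINCT path
"""


def loops_for_account(db, account_key):
    """Run the loop query for one account, passing the start vertex as a bind variable."""
    start = "account/" + str(account_key)
    return db.AQLQuery(LOOP_QUERY, rawResults=True, bindVars={"start": start})


# Usage with the database handle from the setup section:
# for path in loops_for_account(db, 10000032):
#     print(path)
```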

You can also execute the AQL part of all the queries directly in the ArangoDB UI (using the above link and login). The result will include a handy graph representation for visual inspection.

![result](img/fraud_loop.png)
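
To make the traversal logic concrete, here is a plain-Python sketch of the same idea on an in-memory edge list. It is an illustration under stated assumptions, not how ArangoDB executes the query: each edge is used at most once per path, and a path stops as soon as it returns to the start account (comparable to pruning in AQL). The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def find_loops(edges, start, min_depth=5, max_depth=10):
    """Return paths that leave `start` and come back to it within the depth range.

    `edges` is a list of (from_account, to_account) pairs.
    """
    outgoing = defaultdict(list)
    for index, (src, dst) in enumerate(edges):
        outgoing[src].append((index, dst))

    loops = []

    def walk(node, path, used):
        depth = len(path) - 1
        if depth > 0 and node == start:
            # Back at the start: keep the loop only if it is long enough, then stop.
            if depth >= min_depth:
                loops.append(list(path))
            return
        if depth == max_depth:
            return
        for index, nxt in outgoing[node]:
            if index in used:
                continue  # each edge may appear only once per path
            used.add(index)
            path.append(nxt)
            walk(nxt, path, used)
            path.pop()
            used.remove(index)

    walk(start, [start], set())
    return loops


# Toy data: a six-hop ring A -> B -> ... -> F -> A and a short loop A -> X -> A.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "A"),
             ("A", "X"), ("X", "A")]
loops = find_loops(toy_edges, "A")
print(loops)
```

Only the six-hop ring is reported here; the two-hop loop through X falls below the minimum depth of 5.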

## Find All Suspicious Long Loops
This step extends the search to every account: for each account in the collection, the query looks for long loops of transactions that lead back to that account.

In [None]:
# Find loops for all accounts
loop_all_query = """
WITH transaction, account
FOR suspicious_account IN account
   FOR acct, tx, path IN 5..10 OUTBOUND suspicious_account._id GRAPH 'FraudDetection'
   PRUNE tx._to == suspicious_account._id
   FILTER tx._to == suspicious_account._id
RETURN path
"""

queryResult = db.AQLQuery(loop_all_query, rawResults=True)
for result in queryResult:
    print(result)
    print()
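
Because every account on a ring is tried as a starting point, the same ring is likely to show up once per member account. The sketch below collapses such duplicates by comparing the set of edge `_id` values in each returned path. It assumes the usual traversal path shape with `vertices` and `edges` lists; the function name and the shortened toy paths are hypothetical.

```python
def unique_rings(paths):
    """Collapse paths that use the same set of edges into a single ring."""
    rings = {}
    for path in paths:
        key = frozenset(edge["_id"] for edge in path["edges"])
        # Keep the vertex sequence of the first path seen for each edge set.
        rings.setdefault(key, [vertex["_id"] for vertex in path["vertices"]])
    return list(rings.values())


# Toy paths (shortened to two hops): the same ring found from two start accounts.
toy_paths = [
    {"vertices": [{"_id": "account/1"}, {"_id": "account/2"}, {"_id": "account/1"}],
     "edges": [{"_id": "transaction/t1"}, {"_id": "transaction/t2"}]},
    {"vertices": [{"_id": "account/2"}, {"_id": "account/1"}, {"_id": "account/2"}],
     "edges": [{"_id": "transaction/t2"}, {"_id": "transaction/t1"}]},
]
rings = unique_rings(toy_paths)
print(rings)
```

With the real result, `unique_rings(list(queryResult))` would give one entry per distinct ring.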

## Find Orphan Accounts

Orphan accounts are accounts that have few or no transactions. These accounts may be set up in advance of money laundering operations. The query below finds accounts with no transactions.

Note that the query below is not a graph traversal; it combines the transaction and account collections the way a JOIN would. In AQL, joins are expressed with FOR loops and subqueries rather than with a JOIN keyword.

In [None]:
# find orphaned accounts
orphaned_query = """
LET usedResources = UNION_DISTINCT(
  (FOR relationship IN transaction RETURN relationship._from),
  (FOR relationship IN transaction RETURN relationship._to)
)
FOR resource IN account 
  FILTER resource._id NOT IN usedResources 
  SORT resource.account_type, resource.customer_id 
  RETURN {
  "customerName" : DOCUMENT(CONCAT("customer/", resource.customer_id)).Name, 
  "customerID": resource.customer_id,
  "accountID": resource._id, 
  "type": resource.account_type 
  }
"""

queryResult = db.AQLQuery(orphaned_query, rawResults=True)
for result in queryResult:
    print(result)
    print()   
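
The logic of the query is a set difference: collect every account id that appears on either end of a transaction, then keep the accounts that are not in that set. A minimal plain-Python sketch of the same idea, using hypothetical toy data:

```python
def orphan_accounts(account_ids, transactions):
    """Return the account ids that never appear as `_from` or `_to` of a transaction."""
    used = {tx["_from"] for tx in transactions} | {tx["_to"] for tx in transactions}
    return sorted(account_id for account_id in account_ids if account_id not in used)


# Toy data: accounts 3 and 4 never send or receive money.
toy_accounts = ["account/1", "account/2", "account/3", "account/4"]
toy_transactions = [
    {"_from": "account/1", "_to": "account/2"},
    {"_from": "account/2", "_to": "account/1"},
]
orphans = orphan_accounts(toy_accounts, toy_transactions)
print(orphans)
```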

# Anti Money Laundering Pattern Detection

Find transaction patterns that contain a disaggregation and re-aggregation of funds.

This pattern is characterized by transactions that disaggregate funds from a source account to multiple accounts in amounts that are below a reporting threshold (e.g., just below $10,000), followed by a series of small transactions into one or more accounts, followed by re-aggregation of the small transactions into a destination account.

In [None]:
# Find the account with the highest fan-out and follow its outgoing transactions
aml_query = """
WITH account, transaction
// Number of outgoing transactions per account (fan-out)
LET accountOutDegree = (FOR tx IN transaction
    COLLECT accountOut = tx._from WITH COUNT INTO outDegree
    RETURN {account : accountOut, outDegree : outDegree})
// Number of incoming transactions per account (fan-in)
LET accountInDegree = (FOR tx IN transaction
    COLLECT accountIn = tx._to WITH COUNT INTO inDegree
    RETURN {account : accountIn, inDegree : inDegree})
// Accounts that both receive and send funds (not used by the traversal below)
LET accountDegree = (FOR inRecord IN accountInDegree
   FOR outRecord IN accountOutDegree
   FILTER inRecord.account == outRecord.account
   RETURN MERGE(inRecord, outRecord))
// Account with the highest fan-out
LET maxAccount = (FOR maxDegree IN accountOutDegree
                    FILTER maxDegree.outDegree == MAX(accountOutDegree[*].outDegree)
                    RETURN maxDegree)[0]
// Follow the funds up to four hops away from that account
FOR acct, tx IN 1..4 OUTBOUND maxAccount.account transaction
RETURN tx
"""

queryResult = db.AQLQuery(aml_query, rawResults=True)
for result in queryResult:
    print(result)
    print()  
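
The degree computation at the start of the query can be sketched in plain Python with two counters: one for how often an account sends funds (fan-out) and one for how often it receives them (fan-in). The toy transactions below are hypothetical and model one source splitting funds across three accounts that then pay into one destination.

```python
from collections import Counter


def degree_profile(transactions):
    """Count outgoing and incoming transactions per account."""
    out_degree = Counter(tx["_from"] for tx in transactions)
    in_degree = Counter(tx["_to"] for tx in transactions)
    return out_degree, in_degree


# Toy data: S fans out to A, B and C, which all pay into D.
toy_transactions = [
    {"_from": "account/S", "_to": "account/A"},
    {"_from": "account/S", "_to": "account/B"},
    {"_from": "account/S", "_to": "account/C"},
    {"_from": "account/A", "_to": "account/D"},
    {"_from": "account/B", "_to": "account/D"},
    {"_from": "account/C", "_to": "account/D"},
]
out_degree, in_degree = degree_profile(toy_transactions)
source, fan_out = out_degree.most_common(1)[0]
sink, fan_in = in_degree.most_common(1)[0]
print(source, fan_out, sink, fan_in)
```

An account with a high fan-out whose funds converge on an account with a high fan-in is the shape the description above refers to; the AQL query starts its traversal from the highest fan-out account.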

# Cleanup

In [None]:
# Delete collections
db.dropAllCollections() 
db.reload()