# FlowScope: Spotting Money Laundering Based on Graphs

## Abstract

Given a graph of the money transfers between accounts of a bank, how can we detect money laundering? Money laundering refers to criminals using the bank’s services to move massive amounts of illegal money to untraceable destination accounts, in order to inject their illegal money into the legitimate financial system. Existing graph fraud detection approaches focus on dense subgraph detection, without considering the fact that money laundering involves high-volume flows of funds through chains of bank accounts, thereby decreasing their detection accuracy. Instead, we propose to model the transactions using a multipartite graph, and detect the complete flow of money from source to destination using a scalable algorithm, FlowScope. Theoretical analysis shows that FlowScope provides guarantees in terms of the amount of money that fraudsters can transfer without being detected. FlowScope outperforms state-of-the-art baselines in accurately detecting the accounts involved in money laundering, in both injected and real-world data settings.


In [None]:
import spartan as st

In [None]:
# load graph data
fs1_tensor_data = st.loadTensor(path = "./inputData/fs_in_data.csv.gz", header=None)
fs2_tensor_data = st.loadTensor(path = "./inputData/fs_out_data.csv.gz", header=None)

"tensor_data.data" has two attribute columns and a single value column. The following table shows an example with 1000 rows: each row holds a two-tuple (source account id, destination account id), and the third column is the amount of money transferred.

| row id |    0    |    1    |    2   |
| :----  | :----   | :----   | :----  |
| 0 | 0  |  3009   | 1000 |
| 1 | 1  |  915   | 937 |
| 2 | 2  |  3061   | 0 |
| 3 | 3  |  55   | 6000 |
| 4 | 4  |  939   | 157 |
| ... | ...  |  ...  | ... |
| 995 | 621 | 3328 | 50000 |
| 996 | 622 | 1278 | 3100 |
| 997 | 623 | 2470 | 3000 |
| 998 | 375 | 1350 | 20000 |
| 999 | 624 | 3329 | 1000 |
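To make the layout concrete, the sketch below builds a sparse matrix (stored as a plain dictionary) from a few of the rows shown above. It is illustrative only and does not use spartan: the helper `triples_to_sparse` is hypothetical, and it assumes that account ids start at 0 and that repeated (source, destination) pairs are summed.

```python
from collections import defaultdict

# A few (source account id, destination account id, amount) rows from the table above.
rows = [
    (0, 3009, 1000),
    (1, 915, 937),
    (3, 55, 6000),
    (4, 939, 157),
]


def triples_to_sparse(triples):
    """Hypothetical helper: sum the amounts for each (source, destination) pair."""
    matrix = defaultdict(int)
    for src, dst, amount in triples:
        matrix[(src, dst)] += amount
    # Infer the shape from the largest ids seen (assumes ids start at 0).
    n_rows = max(src for src, _ in matrix) + 1
    n_cols = max(dst for _, dst in matrix) + 1
    return dict(matrix), (n_rows, n_cols)


matrix, shape = triples_to_sparse(rows)
print(shape)  # (5, 3010)
```

The matrix has one row per source account and one column per destination account, which is why its shape is determined by the largest ids rather than by the number of transfers.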


In [None]:
fs1_stensor = fs1_tensor_data.toSTensor(hasvalue=True)
fs2_stensor = fs2_tensor_data.toSTensor(hasvalue=True)

In [None]:
#fs1_stensor._data
#fs2_stensor._data

Sparse tensors "fs1_stensor" and "fs2_stensor" are matrices constructed from tensor_data. The amounts of money are elements in those matrices.

The size of fs1_stensor in this example is $1334 \times 3430 $, and the size of fs2_stensor is $2203 \times 1909 $.

In [None]:
maxshape = max(fs1_stensor.shape[1], fs2_stensor.shape[0])
fs1_stensor.shape = (fs1_stensor.shape[0], maxshape)
fs2_stensor.shape = (maxshape, fs2_stensor.shape[1])

Reshape the two sparse tensors so that they have the same size in the middle dimension.

In this case, the matrices have sizes $1334 \times 3430$ and $3430 \times 1909$.
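The padding is needed because the columns of the first matrix and the rows of the second both index the middle-layer accounts, so the two dimensions have to agree. The same arithmetic can be sketched in plain Python with the shapes quoted above (the function name `align_middle` is hypothetical and not part of spartan):

```python
def align_middle(shape_in, shape_out):
    """Hypothetical helper: pad both shapes to a shared middle dimension."""
    middle = max(shape_in[1], shape_out[0])
    return (shape_in[0], middle), (middle, shape_out[1])


new_in, new_out = align_middle((1334, 3430), (2203, 1909))
print(new_in, new_out)  # (1334, 3430) (3430, 1909)
```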

In [None]:
graph_1 = st.Graph(fs1_stensor, bipartite=True, weighted=True, modet=None)
graph_2 = st.Graph(fs2_stensor, bipartite=True, weighted=True, modet=None)

Get graph instances from sparse tensors.

In [None]:
step2list = [graph_1, graph_2]

Create a graph list and add the graphs in order: the graph built from `fs_in_data` first, then the one built from `fs_out_data`.

### Run FlowScope as a single model

In [None]:
fs = st.FlowScope(step2list)

Note: This model does not support GPU, so it will not be accelerated in GPU mode

In [None]:
print(fs)

Default parameters are: {'alpha': '4'}

alpha is equivalent to $\lambda$ in the paper.
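As a rough intuition for what this parameter controls, the sketch below assumes that each middle account contributes the money it passes through (the smaller of its inflow and outflow) minus alpha times the imbalance between the two. This form is an assumption made for illustration; the function `middle_account_score` is hypothetical, and the paper should be consulted for the exact definition of the score.

```python
def middle_account_score(inflow, outflow, alpha):
    """Assumed per-account term: matched flow minus alpha times the imbalance."""
    matched = min(inflow, outflow)
    imbalance = max(inflow, outflow) - matched
    return matched - alpha * imbalance


# An account that forwards almost everything it receives scores high ...
print(middle_account_score(10000, 9900, alpha=4))  # 9500
# ... while one that retains most of the money is heavily penalised.
print(middle_account_score(10000, 2000, alpha=4))  # -30000
```

Under this assumption, a larger alpha makes the model stricter about how closely the outflow of a middle account must match its inflow.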

In [None]:
res = fs.run(k=3, alpha=4, maxsize=(10, 10, 10))

$k$ is the number of blocks you want to detect.

$res$ is a list of blocks. Each block contains [[detected nodes in each partite], score].
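A minimal sketch of how a result in this layout could be unpacked. The node ids and scores below are invented placeholders rather than real FlowScope output, and `summarize` is a hypothetical helper:

```python
# Placeholder data in the documented layout [[nodes in each partite], score].
# The ids and scores are invented for illustration only.
example_res = [
    [[[0, 1], [10, 11, 12], [20]], 8.5],
    [[[2], [13], [21, 22]], 3.0],
]


def summarize(blocks):
    """Return (number of nodes per partite, score) for each block."""
    return [
        (tuple(len(nodes) for nodes in partites), score)
        for partites, score in blocks
    ]


for sizes, score in summarize(example_res):
    print(sizes, score)
```

With two graphs in the input list, each block has three partites: source accounts, middle accounts, and destination accounts.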

### Run FlowScope as an anomaly detection task

In [None]:
ad_model = st.AnomalyDetection.create(step2list, st.ADPolicy.FlowScope, 'flowscope')

In [None]:
# run the model
# defaults: k=3, alpha=4
res = ad_model.run(k=3, alpha=4, maxsize=(-1, -1, 100))

$maxsize$ is the block size limit.

$maxsize$ can be an integer (-1 or positive), and $maxsize==-1$ means no size limit.

$maxsize$ can be a tuple which contains the node size limit for each dimension of the block. Similarly, each element of $maxsize$ should be an integer (-1 or positive).
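The rules above can be sketched as a small validation helper. This is not spartan's implementation: `normalize_maxsize` and `within_limits` are hypothetical names, and the sketch assumes that a single integer applies the same limit to every dimension of the block.

```python
def normalize_maxsize(maxsize, n_dims=3):
    """Hypothetical helper: expand maxsize into one limit per block dimension."""
    if isinstance(maxsize, int):
        # Assumption: a single integer applies to every dimension.
        limits = (maxsize,) * n_dims
    else:
        limits = tuple(maxsize)
    if len(limits) != n_dims:
        raise ValueError(f"expected {n_dims} limits, got {len(limits)}")
    for limit in limits:
        if limit != -1 and limit <= 0:
            raise ValueError("each limit must be -1 or a positive integer")
    return limits


def within_limits(block_sizes, limits):
    """True if every dimension respects its limit (-1 means unlimited)."""
    return all(
        limit == -1 or size <= limit
        for size, limit in zip(block_sizes, limits)
    )


limits = normalize_maxsize((-1, -1, 100))
print(within_limits((500, 40, 100), limits))  # True
print(within_limits((5, 5, 101), limits))     # False
```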

The result is a list of the top-k suspicious blocks. Each block contains [[detected nodes in each partite], score].

Then we can visualize the subgraphs as follows.

In [None]:
# visualize the subgraphs with networkx
import matplotlib.pyplot as plt
import numpy as np
for r in res:
    one, two, three = r[0]
    one = np.array(one)
    two = np.array(two)
    three = np.array(three)
    # to subgraph
    sg_1 = graph_1.get_sub_graph(one, two)
    sg_2 = graph_2.get_sub_graph(two, three)
    # networkx plot
    fig_1 = st.plot_graph(sg_1, bipartite=True, labels=[*one, *two])
    fig_2 = st.plot_graph(sg_2, bipartite=True, labels=[*two, *three])

| Block 1| Block 2| Block 3|
|:--:|:---:|:--:|
|<img src="images/flowResGraph_1.png" />|<img src="images/flowResGraph_2.png" />|<img src="images/flowResGraph_3.png" />|

## Experimental results:

-----
|FlowScope (result in CBank)|
|:-------------------------:|
|<img src="images/flowscopeRes_1.png" />|
|<b>Model analysis of FlowScope.</b>|
|<img src="images/flowscopeRes2.png" />|

### Cite:
------
1. Li, Xiangfeng, Shenghua Liu, Zifeng Li, Xiaotian Han, Chuan Shi, Bryan Hooi, He Huang, and Xueqi Cheng. "FlowScope: Spotting Money Laundering Based on Graphs." In AAAI, pp. 4731-4738. 2020.
<details>
    <summary><span style="color:blue">click for BibTex...</span></summary>

    ```bibtex
    @inproceedings{li2020flowscope,
      title={FlowScope: Spotting Money Laundering Based on Graphs.},
      author={Li, Xiangfeng and Liu, Shenghua and Li, Zifeng and Han, Xiaotian and Shi, Chuan and Hooi, Bryan and Huang, He and Cheng, Xueqi},
      booktitle={AAAI},
      pages={4731--4738},
      year={2020}
    }
    ```
    </details>  