# Transaction Data Analysis

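This notebook builds a temporal transaction graph for money laundering detection. Each transaction becomes a node, and a directed edge links two transactions when the account that received money in the first one sends money in the second one within 24 hours.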

## Setup

Install the required library, Polars.

In [1]:
!pip install polars

## Load Data

Import Polars and read the transaction data from the CSV file.

In [5]:
import polars as pl

df = pl.read_csv('data/HI-Small_Trans.csv')

In [6]:
df

| Timestamp | From Bank | Account | To Bank | Account_duplicated_0 | Amount Received | Receiving Currency | Amount Paid | Payment Currency | Payment Format | Is Laundering |
|---|---|---|---|---|---|---|---|---|---|---|
| str | i64 | str | i64 | str | f64 | str | f64 | str | str | i64 |
| 2022/09/01 00:20 | 10 | 8000EBD30 | 10 | 8000EBD30 | 3697.34 | US Dollar | 3697.34 | US Dollar | Reinvestment | 0 |
| 2022/09/01 00:20 | 3208 | 8000F4580 | 1 | 8000F5340 | 0.01 | US Dollar | 0.01 | US Dollar | Cheque | 0 |
| 2022/09/01 00:00 | 3209 | 8000F4670 | 3209 | 8000F4670 | 14675.57 | US Dollar | 14675.57 | US Dollar | Reinvestment | 0 |
| 2022/09/01 00:02 | 12 | 8000F5030 | 12 | 8000F5030 | 2806.97 | US Dollar | 2806.97 | US Dollar | Reinvestment | 0 |
| 2022/09/01 00:06 | 10 | 8000F5200 | 10 | 8000F5200 | 36682.97 | US Dollar | 36682.97 | US Dollar | Reinvestment | 0 |
| … | … | … | … | … | … | … | … | … | … | … |
| 2022/09/10 23:57 | 54219 | 8148A6631 | 256398 | 8148A8711 | 0.154978 | Bitcoin | 0.154978 | Bitcoin | Bitcoin | 0 |
| 2022/09/10 23:35 | 15 | 8148A8671 | 256398 | 8148A8711 | 0.108128 | Bitcoin | 0.108128 | Bitcoin | Bitcoin | 0 |
| 2022/09/10 23:52 | 154365 | 8148A6771 | 256398 | 8148A8711 | 0.004988 | Bitcoin | 0.004988 | Bitcoin | Bitcoin | 0 |
| 2022/09/10 23:46 | 256398 | 8148A6311 | 256398 | 8148A8711 | 0.038417 | Bitcoin | 0.038417 | Bitcoin | Bitcoin | 0 |


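## Data Preparation

Convert the `Timestamp` column from strings such as `2022/09/01 00:20` to Polars `Datetime` values. The source format has minute resolution, so the seconds are always `00`.

In [12]:
df = df.with_columns(
    pl.col('Timestamp').str.strptime(pl.Datetime, format='%Y/%m/%d %H:%M')
)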

In [13]:
df

| Timestamp | From Bank | Account | To Bank | Account_duplicated_0 | Amount Received | Receiving Currency | Amount Paid | Payment Currency | Payment Format | Is Laundering |
|---|---|---|---|---|---|---|---|---|---|---|
| datetime[μs] | i64 | str | i64 | str | f64 | str | f64 | str | str | i64 |
| 2022-09-01 00:20:00 | 10 | 8000EBD30 | 10 | 8000EBD30 | 3697.34 | US Dollar | 3697.34 | US Dollar | Reinvestment | 0 |
| 2022-09-01 00:20:00 | 3208 | 8000F4580 | 1 | 8000F5340 | 0.01 | US Dollar | 0.01 | US Dollar | Cheque | 0 |
| 2022-09-01 00:00:00 | 3209 | 8000F4670 | 3209 | 8000F4670 | 14675.57 | US Dollar | 14675.57 | US Dollar | Reinvestment | 0 |
| 2022-09-01 00:02:00 | 12 | 8000F5030 | 12 | 8000F5030 | 2806.97 | US Dollar | 2806.97 | US Dollar | Reinvestment | 0 |
| 2022-09-01 00:06:00 | 10 | 8000F5200 | 10 | 8000F5200 | 36682.97 | US Dollar | 36682.97 | US Dollar | Reinvestment | 0 |
| … | … | … | … | … | … | … | … | … | … | … |
| 2022-09-10 23:57:00 | 54219 | 8148A6631 | 256398 | 8148A8711 | 0.154978 | Bitcoin | 0.154978 | Bitcoin | Bitcoin | 0 |
| 2022-09-10 23:35:00 | 15 | 8148A8671 | 256398 | 8148A8711 | 0.108128 | Bitcoin | 0.108128 | Bitcoin | Bitcoin | 0 |
| 2022-09-10 23:52:00 | 154365 | 8148A6771 | 256398 | 8148A8711 | 0.004988 | Bitcoin | 0.004988 | Bitcoin | Bitcoin | 0 |
| 2022-09-10 23:46:00 | 256398 | 8148A6311 | 256398 | 8148A8711 | 0.038417 | Bitcoin | 0.038417 | Bitcoin | Bitcoin | 0 |


## Create Nodes

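Treat each transaction as a graph node with a row-index ID (`node_id`), the sending account `f_i`, the receiving (beneficiary) account `b_i`, the time `t_i`, the amount received `a_i` (in the row's `Receiving Currency`), and the ground-truth `Is Laundering` label.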

In [14]:
nodes = df.with_row_index("node_id").select([
    pl.col("node_id"),                             # 0-based row index, used as the node ID
    pl.col("Account").alias("f_i"),                # Sending account
    pl.col("Account_duplicated_0").alias("b_i"),   # Receiving (beneficiary) account
    pl.col("Timestamp").alias("t_i"),              # Transaction time
    pl.col("Amount Received").alias("a_i"),        # Amount, in the receiving currency
    pl.col("Is Laundering")                        # Ground-truth label
])

In [15]:
nodes

| node_id | f_i | b_i | t_i | a_i | Is Laundering |
|---|---|---|---|---|---|
| u32 | str | str | datetime[μs] | f64 | i64 |
| 0 | 8000EBD30 | 8000EBD30 | 2022-09-01 00:20:00 | 3697.34 | 0 |
| 1 | 8000F4580 | 8000F5340 | 2022-09-01 00:20:00 | 0.01 | 0 |
| 2 | 8000F4670 | 8000F4670 | 2022-09-01 00:00:00 | 14675.57 | 0 |
| 3 | 8000F5030 | 8000F5030 | 2022-09-01 00:02:00 | 2806.97 | 0 |
| 4 | 8000F5200 | 8000F5200 | 2022-09-01 00:06:00 | 36682.97 | 0 |
| … | … | … | … | … | … |
| 5078340 | 8148A6631 | 8148A8711 | 2022-09-10 23:57:00 | 0.154978 | 0 |
| 5078341 | 8148A8671 | 8148A8711 | 2022-09-10 23:35:00 | 0.108128 | 0 |
| 5078342 | 8148A6771 | 8148A8711 | 2022-09-10 23:52:00 | 0.004988 | 0 |
| 5078343 | 8148A6311 | 8148A8711 | 2022-09-10 23:46:00 | 0.038417 | 0 |


## Create Edges

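Add a directed edge from transaction `v_s` to transaction `v_d` when the beneficiary of `v_s` is the sender of `v_d` (`b_i` of `v_s` equals `f_i` of `v_d`), i.e. money received in one transaction may be passed on in the other. Columns from the `v_d` side carry the suffix `_d`. This inner self-join pairs every incoming transaction of an account with every outgoing one regardless of time, so it also links transactions in the wrong order and joins a self-transfer with itself (first row of the output).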

In [16]:
edges = nodes.join(
    nodes,
    left_on="b_i", 
    right_on="f_i",
    suffix="_d"
).rename({"node_id": "v_s", "node_id_d": "v_d"})

In [17]:
edges

| v_s | f_i | b_i | t_i | a_i | Is Laundering | v_d | b_i_d | t_i_d | a_i_d | Is Laundering_d |
|---|---|---|---|---|---|---|---|---|---|---|
| u32 | str | str | datetime[μs] | f64 | i64 | u32 | str | datetime[μs] | f64 | i64 |
| 0 | 8000EBD30 | 8000EBD30 | 2022-09-01 00:20:00 | 3697.34 | 0 | 0 | 8000EBD30 | 2022-09-01 00:20:00 | 3697.34 | 0 |
| 1124178 | 80F7CE4C0 | 8000EBD30 | 2022-09-02 00:20:00 | 33.64 | 0 | 0 | 8000EBD30 | 2022-09-01 00:20:00 | 3697.34 | 0 |
| 1124182 | 80F7D6660 | 8000EBD30 | 2022-09-02 00:18:00 | 44.08 | 0 | 0 | 8000EBD30 | 2022-09-01 00:20:00 | 3697.34 | 0 |
| 1624528 | 8000EBBB0 | 8000EBD30 | 2022-09-02 16:25:00 | 145.14 | 0 | 0 | 8000EBD30 | 2022-09-01 00:20:00 | 3697.34 | 0 |
| 1874382 | 80C1C4030 | 8000EBD30 | 2022-09-03 00:04:00 | 37.92 | 0 | 0 | 8000EBD30 | 2022-09-01 00:20:00 | 3697.34 | 0 |
| … | … | … | … | … | … | … | … | … | … | … |
| 411414 | 8148A6311 | 8148A6311 | 2022-09-01 02:46:00 | 0.15669 | 0 | 5078343 | 8148A8711 | 2022-09-10 23:46:00 | 0.038417 | 0 |
| 1515967 | 8148A6181 | 8148A6311 | 2022-09-02 12:27:00 | 0.038922 | 0 | 5078343 | 8148A8711 | 2022-09-10 23:46:00 | 0.038417 | 0 |
| 4590253 | 8148A6181 | 8148A6311 | 2022-09-09 13:01:00 | 0.038922 | 0 | 5078343 | 8148A8711 | 2022-09-10 23:46:00 | 0.038417 | 0 |
| 478259 | 8148A6091 | 8148A6091 | 2022-09-01 04:36:00 | 0.000987 | 0 | 5078344 | 8148A8711 | 2022-09-10 23:37:00 | 0.281983 | 0 |


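## Filter Edges

Keep only edges where the second transaction (`v_d`) occurs strictly after the first (`v_s`) and less than 24 hours later. Because timestamps have minute resolution, pairs recorded in the same minute are dropped, and so are self-pairs such as row `0 → 0` in the table above.

In [18]:
window = pl.duration(hours=24)  # maximum gap between two linked transactions

In [19]:
edges = edges.filter(
    (pl.col("t_i_d") > pl.col("t_i")) &
    (pl.col("t_i_d") < pl.col("t_i") + window)
)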

## Temporal View Results

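Print the final nodes and edges. The resulting temporal graph has 5,078,345 nodes, one per transaction, and 34,739,332 directed edges.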

In [22]:
print(nodes)

shape: (5_078_345, 6)
┌─────────┬───────────┬───────────┬─────────────────────┬──────────┬───────────────┐
│ node_id ┆ f_i       ┆ b_i       ┆ t_i                 ┆ a_i      ┆ Is Laundering │
│ ---     ┆ ---       ┆ ---       ┆ ---                 ┆ ---      ┆ ---           │
│ u32     ┆ str       ┆ str       ┆ datetime[μs]        ┆ f64      ┆ i64           │
╞═════════╪═══════════╪═══════════╪═════════════════════╪══════════╪═══════════════╡
│ 0       ┆ 8000EBD30 ┆ 8000EBD30 ┆ 2022-09-01 00:20:00 ┆ 3697.34  ┆ 0             │
│ 1       ┆ 8000F4580 ┆ 8000F5340 ┆ 2022-09-01 00:20:00 ┆ 0.01     ┆ 0             │
│ 2       ┆ 8000F4670 ┆ 8000F4670 ┆ 2022-09-01 00:00:00 ┆ 14675.57 ┆ 0             │
│ 3       ┆ 8000F5030 ┆ 8000F5030 ┆ 2022-09-01 00:02:00 ┆ 2806.97  ┆ 0             │
│ 4       ┆ 8000F5200 ┆ 8000F5200 ┆ 2022-09-01 00:06:00 ┆ 36682.97 ┆ 0             │
│ …       ┆ …         ┆ …         ┆ …                   ┆ …        ┆ …             │
│ 5078340 ┆ 8148A6631 ┆ 8148A8711 ┆ 2022-09

In [21]:
print(edges.select(["v_s", "v_d", "t_i", "t_i_d"]))

shape: (34_739_332, 4)
┌─────────┬─────────┬─────────────────────┬─────────────────────┐
│ v_s     ┆ v_d     ┆ t_i                 ┆ t_i_d               │
│ ---     ┆ ---     ┆ ---                 ┆ ---                 │
│ u32     ┆ u32     ┆ datetime[μs]        ┆ datetime[μs]        │
╞═════════╪═════════╪═════════════════════╪═════════════════════╡
│ 280625  ┆ 3       ┆ 2022-09-01 00:00:00 ┆ 2022-09-01 00:02:00 │
│ 30133   ┆ 5       ┆ 2022-09-01 00:01:00 ┆ 2022-09-01 00:03:00 │
│ 94948   ┆ 7       ┆ 2022-09-01 00:11:00 ┆ 2022-09-01 00:16:00 │
│ 97148   ┆ 15      ┆ 2022-09-01 00:05:00 ┆ 2022-09-01 00:09:00 │
│ 23      ┆ 18      ┆ 2022-09-01 00:02:00 ┆ 2022-09-01 00:28:00 │
│ …       ┆ …       ┆ …                   ┆ …                   │
│ 5018387 ┆ 5078330 ┆ 2022-09-10 16:12:00 ┆ 2022-09-10 23:52:00 │
│ 5018388 ┆ 5078330 ┆ 2022-09-10 16:24:00 ┆ 2022-09-10 23:52:00 │
│ 4945754 ┆ 5078335 ┆ 2022-09-10 07:20:00 ┆ 2022-09-10 23:37:00 │
│ 4945755 ┆ 5078335 ┆ 2022-09-10 07:07:00 ┆ 2022-09-1