Results: Statistical Graph Features

Hiroki Kanezashi edited this page Jan 14, 2019 · 1 revision

Statistical Features of the Generated Graphs

Example 1: Sparse Graph

27,770 accounts (vertices), 55,963 transactions (edges), 100 fraud patterns (triangles), 1000 false alert patterns (fan-in/out with 5 vertices)

Figures: Transaction Count, Degree Distribution, Diameter and Average Distance

In this sparse network, the degree distribution follows a power law (as in a scale-free network). The diameter fluctuates as new transaction edges are added.
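A rough way to check a power-law degree distribution is to build the degree histogram and fit a straight line on log-log axes. The sketch below is only an illustration: the function names `degree_histogram` and `loglog_slope` and the toy edge list are assumptions, not part of the generator, and a least-squares fit on log-log data is a coarse diagnostic rather than a rigorous test.

```python
import math
from collections import Counter

def degree_histogram(edges):
    """Map each total degree (in + out) to the number of vertices with that degree."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return Counter(degree.values())

def loglog_slope(hist):
    """Least-squares slope of log(vertex count) against log(degree)."""
    xs = [math.log(k) for k in hist]
    ys = [math.log(c) for c in hist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy edge list (hypothetical, not the generated data).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c")]
hist = degree_histogram(edges)
slope = loglog_slope(hist)
```

A clearly negative slope with points close to the fitted line is consistent with a power law; curvature at small degrees would show up as points falling away from the line.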

GPML Result

The F1-scores are nearly constant across many combinations of graph features. Recall is mostly 1.0, while precision is 0.18.
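For reference, the reported precision and recall imply an F1-score of roughly 0.31, assuming the standard harmonic-mean definition of F1 (the page does not state which definition was used, and this value is derived here rather than reported). A minimal sketch, with `f1_score` as an illustrative helper name:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

implied_f1 = f1_score(0.18, 1.0)
print(round(implied_f1, 3))  # 0.305
```

With recall pinned near 1.0, F1 is driven almost entirely by precision, which is why the two move together across feature combinations.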

Example 2: Medium Graph

27,770 accounts (vertices), 90,457 transactions (edges), 100 fraud patterns (triangles), 1000 false alert patterns (fan-in/out with 5 vertices)

Figures: Transaction Count, Degree Distribution, Diameter and Average Distance

The degree distribution deviates from the power law for some account vertices with small degrees (less than 5).

GPML Result

The GPML results are more stable than those for the sparse network. The F1-scores are nearly constant across many combinations of graph features. Recall is mostly 1.0, while precision is 0.18. The combinations with the highest F1-score include the 1-hop Egonet and 2-hop Egonet features.
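To make the Egonet terminology concrete: a k-hop egonet of a vertex is the subgraph induced by that vertex and everything within k hops of it. The sketch below counts vertices and edges in such a subgraph. It is an assumption-laden illustration, not the GPML feature extractor: the exact features GPML computes are not given on this page, edges are treated as undirected, and `build_adjacency` and `egonet_features` are hypothetical names.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets; self-loops are skipped (an assumption)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def egonet_features(adj, center, hops=1):
    """Return (vertex count, edge count) of the k-hop egonet around center."""
    members = {center}
    frontier = {center}
    for _ in range(hops):
        # Expand one hop outward, keeping only newly reached vertices.
        frontier = {n for v in frontier for n in adj[v]} - members
        members |= frontier
    # Each undirected edge inside the egonet is seen from both endpoints.
    edge_count = sum(len(adj[v] & members) for v in members) // 2
    return len(members), edge_count

# Toy graph (hypothetical): a triangle 1-2-3 with a tail 3-4-5.
adj = build_adjacency([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)])
one_hop = egonet_features(adj, 1, hops=1)
two_hop = egonet_features(adj, 1, hops=2)
```

In this toy graph the 1-hop egonet of vertex 1 is exactly the triangle, which hints at why such features can help separate triangle-shaped fraud patterns from fan-in/out alerts.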

Example 3: Dense Graph

27,770 accounts (vertices), 161,299 transactions (edges), 100 fraud patterns (triangles), 1000 false alert patterns (fan-in/out with 5 vertices)

Figures: Transaction Count, Degree Distribution, Diameter and Average Distance

There are few accounts (vertices) with small degrees (less than 10). The diameter decreases steadily as transactions are added, and the average distance stays almost constant.
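Diameter and average distance can be computed with a breadth-first search from every vertex. The sketch below shows one way to do that under stated assumptions: edges are treated as undirected, unreachable pairs are ignored, and the function names are illustrative. The page does not say how these metrics were actually measured.

```python
from collections import defaultdict, deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for n in adj[v]:
            if n not in dist:
                dist[n] = dist[v] + 1
                queue.append(n)
    return dist

def diameter_and_average_distance(edges):
    """Longest and mean shortest-path length over all reachable vertex pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    longest, total, pairs = 0, 0, 0
    for source in list(adj):
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, (total / pairs if pairs else 0.0)

# Toy path graph 1-2-3-4 (hypothetical data).
diameter, avg_distance = diameter_and_average_distance([(1, 2), (2, 3), (3, 4)])
```

Running this after each batch of added transactions would trace how the two metrics evolve. All-pairs BFS costs on the order of vertices times edges, so on graphs of this size sampling a subset of source vertices is a common approximation.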

GPML Result

Recall varies only slightly (from 0.8 to 1.0) across almost all combinations of graph features. With 1-hop Egonet features, the precision and F1-score increase slightly.

Example 4: Very Dense Graph

27,770 accounts (vertices), 275,356 transactions (edges), 100 fraud patterns (triangles), 1000 false alert patterns (fan-in/out with 5 vertices)
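The four examples share the same 27,770 vertices and differ only in edge count, so the sparse-to-very-dense labels can be quantified as transactions per account. The sketch below does that arithmetic from the counts listed on this page; the dictionary layout and variable names are illustrative.

```python
VERTICES = 27_770

# Edge counts as listed for each example on this page.
edge_counts = {
    "sparse": 55_963,
    "medium": 90_457,
    "dense": 161_299,
    "very dense": 275_356,
}

# Transactions per account; the mean total degree (in + out) is twice this value.
edges_per_vertex = {name: count / VERTICES for name, count in edge_counts.items()}

for name, ratio in edges_per_vertex.items():
    print(f"{name}: {ratio:.2f} transactions per account")
```

The ratio grows from about 2 in the sparse graph to about 10 in the very dense graph.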

Figures: Transaction Count, Degree Distribution, Diameter and Average Distance

GPML Result