Data Schema for Input Parameters and Generated Data Set
To generate a data set with AMLSim, you first prepare the input parameter files for the simulator. The specifications of the input parameter files are given first, followed by the specification of the generated (output) data set.
The account list parameter file has the following columns:

- `count`: Number of accounts
- `min_balance`: Minimum initial balance
- `max_balance`: Maximum initial balance
- `start_day`: The day when the account is opened
- `end_day`: The day when the account is closed
- `country`: Alpha-2 country code
- `business_type`: Business type
- `suspicious`: Suspicious account or not (currently unused)
- `model`: Account behavior model ID (see also `AbstractTransactionModel.java`)
  - 0: Single transactions
  - 1: Fan-out
  - 2: Fan-in
  - 3: Mutual
  - 4: Forward
  - 5: Periodical
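As a rough illustration of this schema, the Python sketch below reads account parameter rows into typed records and maps each `model` ID to the name listed above. It assumes the file is a CSV whose header uses exactly the column names above; `AccountParams`, `load_account_params`, `MODEL_NAMES` and the sample row are hypothetical and are not part of AMLSim.

```python
import csv
import io
from dataclasses import dataclass

# Behavior model IDs as listed in the schema (see AbstractTransactionModel.java).
MODEL_NAMES = {
    0: "Single transactions",
    1: "Fan-out",
    2: "Fan-in",
    3: "Mutual",
    4: "Forward",
    5: "Periodical",
}


@dataclass
class AccountParams:
    """One row of the account parameter file (hypothetical helper type)."""
    count: int
    min_balance: float
    max_balance: float
    start_day: int
    end_day: int
    country: str
    business_type: str
    model: int


def load_account_params(text: str) -> list[AccountParams]:
    """Parse account parameter rows; the unused `suspicious` column is skipped."""
    params = []
    for rec in csv.DictReader(io.StringIO(text)):
        p = AccountParams(
            count=int(rec["count"]),
            min_balance=float(rec["min_balance"]),
            max_balance=float(rec["max_balance"]),
            start_day=int(rec["start_day"]),
            end_day=int(rec["end_day"]),
            country=rec["country"],
            business_type=rec["business_type"],
            model=int(rec["model"]),
        )
        # Basic sanity checks implied by the column meanings.
        if p.min_balance > p.max_balance:
            raise ValueError("min_balance must not exceed max_balance")
        if p.model not in MODEL_NAMES:
            raise ValueError(f"unknown model ID: {p.model}")
        params.append(p)
    return params


# Hypothetical sample: 100 accounts using the fan-out model (ID 1).
SAMPLE = """count,min_balance,max_balance,start_day,end_day,country,business_type,suspicious,model
100,1000,5000,0,720,US,I,false,1
"""

rows = load_account_params(SAMPLE)
print(rows[0].count, MODEL_NAMES[rows[0].model])  # 100 Fan-out
```

Validating while parsing catches swapped balance bounds or an unknown model ID before the simulator runs; whether AMLSim itself performs such checks is not stated here.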
The degree distribution file, `degree.csv`, has three columns with the header names `Count`, `In-degree` and `Out-degree`.
Each row specifies how many account vertices with the given in-degree and out-degree should be generated.
Here is an example of degree.csv.

```
Count,In-degree,Out-degree
2,0,2
1,1,1
2,2,0
```

From this parameter file, the transaction graph generator generates a directed graph with five vertices (accounts) and five edges: summing `Count` gives 2 + 1 + 2 = 5 vertices, and summing `Count` × `In-degree` (or `Count` × `Out-degree`) gives 5 edges. Two of the five vertices have no outgoing edges, and two have no incoming edges.
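The vertex and edge counts follow directly from the degree table. The sketch below computes them for a table matching the five-vertex, five-edge description; `summarize_degrees` is a hypothetical helper, not part of AMLSim. It also checks that the in-degree and out-degree sums agree, which is a necessary (but not sufficient) condition for the table to describe a directed graph, because every edge has exactly one sender and one receiver.

```python
import csv
import io


def summarize_degrees(text: str) -> dict:
    """Count the vertices and edges implied by a degree.csv-style table."""
    vertices = edges_in = edges_out = no_in = no_out = 0
    for row in csv.DictReader(io.StringIO(text)):
        count = int(row["Count"])
        indeg = int(row["In-degree"])
        outdeg = int(row["Out-degree"])
        vertices += count
        edges_in += count * indeg
        edges_out += count * outdeg
        if indeg == 0:
            no_in += count
        if outdeg == 0:
            no_out += count
    # Each edge contributes one to the in-degree sum and one to the out-degree sum.
    if edges_in != edges_out:
        raise ValueError(f"in-degree sum {edges_in} != out-degree sum {edges_out}")
    return {"vertices": vertices, "edges": edges_in,
            "no_incoming": no_in, "no_outgoing": no_out}


TABLE = """Count,In-degree,Out-degree
2,0,2
1,1,1
2,2,0
"""
summary = summarize_degrees(TABLE)
print(summary)  # {'vertices': 5, 'edges': 5, 'no_incoming': 2, 'no_outgoing': 2}
```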
The alert pattern parameter file has the following columns:

- `count`: Number of transaction sets
- `type`: Fraud pattern name (`fan_in`, `fan_out` or `cycle`)
- `schedule_id`: Transaction scheduling ID
  - 0: All accounts send money in order with the same interval
  - 1: All accounts send money in order with random intervals
  - 2: All accounts send money randomly
- `accounts`: Number of involved accounts
- `individual_amount`: Minimum individual amount
- `aggregated_amount`: Minimum aggregated amount
- `transaction_count`: Minimum transaction count
- `amount_difference`: Proportion of transaction difference
- `period`: Lookback period (days)
- `amount_rounded`: Proportion of transactions with rounded amounts
- `orig_country`: Whether the originator country is suspicious
- `bene_country`: Whether the beneficiary country is suspicious
- `orig_business`: Whether the originator business type is suspicious
- `bene_business`: Whether the beneficiary business type is suspicious
- `is_fraud`: Whether the alert is fraud (`True`) or a false alert (`False`)
The transaction type file, `transactionType.csv`, has two columns with the header names `Type` (transaction type name) and `Frequency` (relative frequency).
Here is an example of transactionType.csv.

```
Type,Frequency
WIRE,5
CREDIT,10
DEPOSIT,15
CHECK,20
```

In this case, a `WIRE` transaction appears with a probability of 10% (5 / (5 + 10 + 15 + 20) = 0.1).
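The probability calculation above is a normalization of the frequencies, and drawing transaction types in those proportions is weighted sampling. The sketch below shows both with Python's standard `random.choices`; `load_type_weights` and `type_probabilities` are hypothetical helpers, and AMLSim may sample transaction types differently.

```python
import csv
import io
import random


def load_type_weights(text: str) -> dict[str, float]:
    """Read Type/Frequency rows into a {type: weight} mapping."""
    return {row["Type"]: float(row["Frequency"])
            for row in csv.DictReader(io.StringIO(text))}


def type_probabilities(weights: dict[str, float]) -> dict[str, float]:
    """Normalize frequencies so that they sum to 1."""
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


weights = load_type_weights("""Type,Frequency
WIRE,5
CREDIT,10
DEPOSIT,15
CHECK,20
""")
probs = type_probabilities(weights)
print(probs["WIRE"])  # 0.1

# Draw transaction types in proportion to their frequencies.
rng = random.Random(42)
types = list(weights)
draws = rng.choices(types, weights=[weights[t] for t in types], k=10_000)
print(draws.count("WIRE") / len(draws))  # should be close to 0.1
```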
The generated account list has the following columns. Columns marked (optional) are added only if the input account list file has columns with the same names.

- `ACCOUNT_ID`: Account ID (int)
- `CUSTOMER_ID`: Customer ID (string)
- `INIT_BALANCE`: Initial balance (float)
- `START_DATE`: Start timestamp (int)
- `END_DATE`: End timestamp (int)
- `COUNTRY`: Country code (string)
- `ACCOUNT_TYPE`: Account type (string)
- `IS_SUSPICIOUS`: Whether this account is suspicious (boolean)
- `IS_FRAUD`: Whether this account is fraudulent (boolean)
- `TX_BEHAVIOR_ID`: Transaction behavior model code (int)
- `SEQ`: (optional) Account index (int)
- `FIRST_NAME`: (optional) First name of the customer (string)
- `LAST_NAME`: (optional) Last name of the customer (string)
- `STREET_ADDR`: (optional) Detailed address including the street name (string)
- `CITY`: (optional) City name (string)
- `STATE`: (optional) State name (string)
- `ZIP`: (optional) ZIP code (string)
- `GENDER`: (optional) Gender (string)
- `PHONE_NUMBER`: (optional) Phone number (string)
- `BIRTH_DATE`: (optional) Birth date (string)
- `SSN`: (optional) Social security number (string)
- `LON`: (optional) Longitude of the address
- `LAT`: (optional) Latitude of the address
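When loading the generated account list, the required columns can be converted to their documented types while any optional columns are passed through. The sketch below assumes the output is a CSV with the header names above and that booleans are written as `true`/`false` (or `1`/`0`); that encoding, the helpers `parse_bool` and `load_accounts`, and the sample row are all assumptions for illustration.

```python
import csv
import io


def parse_bool(value: str) -> bool:
    """Accept common spellings of a boolean (an assumption about the output)."""
    v = value.strip().lower()
    if v in ("true", "1"):
        return True
    if v in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


# Required columns and the type each one is documented to have.
REQUIRED = {
    "ACCOUNT_ID": int,
    "CUSTOMER_ID": str,
    "INIT_BALANCE": float,
    "START_DATE": int,
    "END_DATE": int,
    "COUNTRY": str,
    "ACCOUNT_TYPE": str,
    "IS_SUSPICIOUS": parse_bool,
    "IS_FRAUD": parse_bool,
    "TX_BEHAVIOR_ID": int,
}


def load_accounts(text: str) -> list[dict]:
    """Convert required columns; keep optional columns as raw strings."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(REQUIRED) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    accounts = []
    for row in reader:
        rec = {col: conv(row[col]) for col, conv in REQUIRED.items()}
        # Optional columns (SEQ, FIRST_NAME, ...) appear only if present in the input.
        rec.update({k: v for k, v in row.items() if k not in REQUIRED})
        accounts.append(rec)
    return accounts


# Hypothetical output with one optional column (FIRST_NAME).
SAMPLE = """ACCOUNT_ID,CUSTOMER_ID,INIT_BALANCE,START_DATE,END_DATE,COUNTRY,ACCOUNT_TYPE,IS_SUSPICIOUS,IS_FRAUD,TX_BEHAVIOR_ID,FIRST_NAME
0,C_0,1500.0,0,720,US,I,false,false,1,Alice
"""
acct = load_accounts(SAMPLE)[0]
print(acct["INIT_BALANCE"], acct["IS_FRAUD"], acct["FIRST_NAME"])  # 1500.0 False Alice
```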
The generated transaction list has the following columns:

- `TX_ID`: Transaction ID (int)
- `SENDER_ACCOUNT_ID`: Sender account ID (int)
- `RECEIVER_ACCOUNT_ID`: Receiver account ID (int)
- `TXN_SOURCE_TYPE_CODE`: Transaction type (string)
- `TX_TYPE`: Number of aggregated transactions (should be 1)
- `TX_AMOUNT`: Transaction amount (float)
- `TIMESTAMP`: Simulation step at which the transaction is made (int)
- `IS_FRAUD`: Whether this transaction is fraudulent (boolean)
- `ALERT_ID`: ID of the alert this transaction is involved in (int, -1 if the transaction is not part of any alert)
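Because `ALERT_ID` is -1 for transactions outside any alert, grouping by that column recovers the transactions belonging to each alert. The sketch below does this with the standard `csv` module; `transactions_by_alert` and the sample rows are hypothetical, and the file is assumed to be a CSV with the header names above.

```python
import csv
import io
from collections import defaultdict


def transactions_by_alert(text: str) -> dict[int, list[int]]:
    """Map each ALERT_ID to its TX_IDs, skipping -1 (not alerted)."""
    groups: dict[int, list[int]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        alert_id = int(row["ALERT_ID"])
        if alert_id == -1:
            continue  # transaction is not part of any alert
        groups[alert_id].append(int(row["TX_ID"]))
    return dict(groups)


# Hypothetical sample: one ordinary transaction and two belonging to alert 7.
SAMPLE = """TX_ID,SENDER_ACCOUNT_ID,RECEIVER_ACCOUNT_ID,TXN_SOURCE_TYPE_CODE,TX_TYPE,TX_AMOUNT,TIMESTAMP,IS_FRAUD,ALERT_ID
1,10,11,WIRE,1,500.0,3,false,-1
2,12,13,CHECK,1,900.0,4,true,7
3,13,12,WIRE,1,880.0,9,true,7
"""
groups = transactions_by_alert(SAMPLE)
print(groups)  # {7: [2, 3]}
```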
The generated alert list has the following columns:

- `ALERT_ID`: Alert ID (int)
- `ALERT_TYPE`: Alert type (string)
- `IS_FRAUD`: Whether this alert is fraudulent (boolean)
- `TX_ID`: Transaction ID (int)
- `SENDER_ACCOUNT_ID`: Sender account ID (int)
- `RECEIVER_ACCOUNT_ID`: Receiver account ID (int)
- `TX_TYPE`: Transaction type (string)
- `TX_AMOUNT`: Transaction amount (float)
- `TIMESTAMP`: Date when the transaction is made (int)
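Since each alert row carries both alert-level fields and one transaction's details, a per-alert summary can be built by aggregating rows that share an `ALERT_ID`. The sketch below assumes one row per alerted transaction and booleans written as `true`/`false`; `summarize_alerts` and the sample rows are hypothetical.

```python
import csv
import io


def summarize_alerts(text: str) -> dict[int, dict]:
    """Aggregate alert rows (assumed to be one row per alerted transaction)."""
    summary: dict[int, dict] = {}
    for row in csv.DictReader(io.StringIO(text)):
        alert_id = int(row["ALERT_ID"])
        # Alert-level fields are taken from the first row seen for each alert.
        s = summary.setdefault(alert_id, {
            "type": row["ALERT_TYPE"],
            "is_fraud": row["IS_FRAUD"].strip().lower() == "true",
            "tx_count": 0,
            "total_amount": 0.0,
            "accounts": set(),
        })
        s["tx_count"] += 1
        s["total_amount"] += float(row["TX_AMOUNT"])
        s["accounts"].update({int(row["SENDER_ACCOUNT_ID"]),
                              int(row["RECEIVER_ACCOUNT_ID"])})
    return summary


# Hypothetical sample: a two-transaction cycle alert.
SAMPLE = """ALERT_ID,ALERT_TYPE,IS_FRAUD,TX_ID,SENDER_ACCOUNT_ID,RECEIVER_ACCOUNT_ID,TX_TYPE,TX_AMOUNT,TIMESTAMP
7,cycle,true,2,12,13,CHECK,900.0,4
7,cycle,true,3,13,12,WIRE,880.0,9
"""
alerts = summarize_alerts(SAMPLE)
a7 = alerts[7]
print(a7["tx_count"], a7["total_amount"], sorted(a7["accounts"]))  # 2 1780.0 [12, 13]
```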