Data Schema for Input Parameters and Generated Data Set
To generate a data set with AMLSim, first prepare the input parameter files that drive the simulator. This page specifies the input parameter files first, and then the data set generated as output.
The input account list file has the following columns:
- count: Number of accounts
- min_balance: Minimum initial balance
- max_balance: Maximum initial balance
- start_day: The day when the account is opened
- end_day: The day when the account is closed
- country: Alpha-2 country code
- business_type: Business type
- suspicious: Suspicious account or not (currently unused)
- model: Account behavior model ID (see also AbstractTransactionModel.java)
  - 0: Single transactions
  - 1: Fan-out
  - 2: Fan-in
  - 3: Mutual
  - 4: Forward
  - 5: Periodical
- bank_id: Bank ID which these accounts belong to (optional, default is 0)
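As a rough illustration of how the columns above fit together, the following sketch reads a small account parameter table and totals the number of accounts per bank. The sample rows, their values, and the `summarize` helper are hypothetical and exist only for this example; only the column names and the default `bank_id` of 0 come from the list above.

```python
import csv
import io

# Hypothetical sample rows: the header names follow the column list above,
# but every value here is invented for illustration.
SAMPLE = """count,min_balance,max_balance,start_day,end_day,country,business_type,suspicious,model,bank_id
100,1000,5000,0,365,US,I,false,1,0
50,2000,8000,0,365,US,I,false,2,
"""


def summarize(text):
    """Return (total number of accounts, number of accounts per bank ID)."""
    total = 0
    per_bank = {}
    for row in csv.DictReader(io.StringIO(text)):
        n = int(row["count"])
        # bank_id is optional; fall back to the documented default of 0.
        bank = row.get("bank_id") or "0"
        total += n
        per_bank[bank] = per_bank.get(bank, 0) + n
    return total, per_bank


total, per_bank = summarize(SAMPLE)
print(total, per_bank)
```

Each row describes a group of accounts rather than a single account, so the total number of generated accounts is the sum of the count column.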
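The degree distribution file (degree.csv) has three columns with header names: Count, In-degree and Out-degree.
Each row gives a degree value in the Count column; the In-degree and Out-degree columns give how many account vertices should be generated with that in-degree and that out-degree, respectively.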
Here is an example of degree.csv.
Count,In-degree,Out-degree
0,2,2
1,1,1
2,2,2
From this parameter file, the transaction graph generator generates a directed graph with five vertices (accounts) and five edges. Two of the five vertices have no outgoing edges and two of the five vertices have no incoming edges (these may be the same two vertices).
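The vertex and edge counts stated above can be checked with a short sketch. It assumes, as the example suggests, that the first column is a degree value and the other two columns are the numbers of vertices with that in-degree and out-degree; the `degree_sequences` helper is hypothetical and is not AMLSim's own code.

```python
import csv
import io

DEGREE_CSV = """Count,In-degree,Out-degree
0,2,2
1,1,1
2,2,2
"""


def degree_sequences(text):
    """Expand the table into one in-degree and one out-degree per vertex."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    in_deg, out_deg = [], []
    for degree, n_in, n_out in reader:
        # Assumed meaning: n_in vertices have this in-degree,
        # and n_out vertices have this out-degree.
        in_deg.extend([int(degree)] * int(n_in))
        out_deg.extend([int(degree)] * int(n_out))
    return in_deg, out_deg


in_deg, out_deg = degree_sequences(DEGREE_CSV)
print(len(in_deg), sum(in_deg))    # number of vertices, number of edges
print(in_deg.count(0), out_deg.count(0))
```

In a directed graph the in-degrees and out-degrees must sum to the same number of edges, which makes this a convenient sanity check before running the graph generator.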
The parameter file for AML typologies has the following columns:
- count: Number of typologies (transaction sets)
- type: Name of the transaction type used as the AML typology (fan_in, fan_out, cycle, ...)
- schedule_id: Transaction scheduling ID of the typology
  - 0: All accounts make transactions in order with the same interval
  - 1: All accounts make transactions in order with random intervals
  - 2: All accounts make transactions randomly
- accounts: Number of involved accounts
- individual_amount: Initial individual transaction amount
- aggregated_amount: Minimum aggregated (total) transaction amount
- transaction_count: Minimum number of transactions
- amount_difference: Proportion of the maximum difference of overall transaction amounts
- period: Period of overall transactions (number of days)
- amount_rounded: Proportion of the number of transactions with rounded amounts (optional)
- orig_country: Whether the country of the originator account is suspicious (optional)
- bene_country: Whether the country of the beneficiary account is suspicious (optional)
- orig_business: Whether the business type of the originator account is suspicious (optional)
- bene_business: Whether the business type of the beneficiary account is suspicious (optional)
- is_internal: Whether all involved accounts belong to the same bank (optional, default is False)
- is_sar: Whether the alert is a SAR (True) or a false alert (False)
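The following sketch shows one way such typology rows could be read, including the documented default for the optional is_internal column. The sample rows, the `SCHEDULES` labels, and the `load_patterns` helper are hypothetical; the sketch also assumes that booleans are written as the literal strings True and False.

```python
import csv
import io

# Short labels for the three schedule IDs described above.
SCHEDULES = {
    0: "in order, same interval",
    1: "in order, random intervals",
    2: "random",
}

# Hypothetical sample rows; optional columns are left out on purpose.
SAMPLE = """count,type,schedule_id,accounts,individual_amount,aggregated_amount,transaction_count,amount_difference,period,is_sar
2,fan_in,1,5,100.0,400.0,4,0.1,10,True
1,cycle,2,4,200.0,800.0,4,0.1,20,False
"""


def load_patterns(text):
    """Parse typology rows into dictionaries with typed values."""
    patterns = []
    for row in csv.DictReader(io.StringIO(text)):
        patterns.append({
            "count": int(row["count"]),
            "type": row["type"],
            "schedule": SCHEDULES[int(row["schedule_id"])],
            "accounts": int(row["accounts"]),
            # is_internal is optional; the documented default is False.
            "is_internal": (row.get("is_internal") or "False") == "True",
            "is_sar": row["is_sar"] == "True",
        })
    return patterns


patterns = load_patterns(SAMPLE)
print(sum(p["count"] for p in patterns), "typologies in total")
```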
The transaction type file (transactionType.csv) has two columns with header names: Type (transaction type name) and Frequency (relative frequency).
We currently support the following types: WIRE, CREDIT, DEPOSIT, ACH and TRANSFER.
However, since we don't have real data, we only generate transactions of the WIRE type.
To avoid confusing AMLSim users, we simply use "TRANSFER" as the default transaction type.
"TRANSFER" is just a symbolic name indicating that someone sends money to someone else.
Here is an example of transactionType.csv.
Type, Frequency
WIRE,5
CREDIT,10
DEPOSIT,10
In this case, a WIRE transaction will appear with a probability of 20% (5 / (5+10+10) = 0.2).
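The same arithmetic can be applied to every row of the file. The sketch below normalizes the relative frequencies into probabilities; the `type_probabilities` helper is hypothetical and is not part of AMLSim.

```python
import csv
import io

TYPES_CSV = """Type, Frequency
WIRE,5
CREDIT,10
DEPOSIT,10
"""


def type_probabilities(text):
    """Convert relative frequencies into probabilities that sum to 1."""
    # skipinitialspace tolerates the space after the comma in the header.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    freq = {name: int(value) for name, value in reader}
    total = sum(freq.values())
    return {name: value / total for name, value in freq.items()}


probs = type_probabilities(TYPES_CSV)
for name, p in probs.items():
    print(f"{name}: {p:.0%}")
```

Because the frequencies are relative, multiplying every value by the same constant leaves the resulting probabilities unchanged.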
The resulting data is written as several CSV files under the output directory.
- Account list (accounts.csv)
- Transaction list (transactions.csv)
- Alert account list
- Alert transaction list
The data schema (columns and data types) can be defined by editing the data schema definition file (schema.json) under the parameter file directory (paramFiles).
Note: Columns marked (optional) are added only if the input account list file has columns with the same names.
Account list (accounts.csv) columns:
- acct_id: Account ID (int)
- dsply_nm: Customer ID (string)
- type: Account type (str)
- acct_stat: Account status (str)
- acct_rptng_crncy: Default currency (str)
- prior_sar_count: Whether this account is involved in SAR transactions (boolean)
- branch_id: Bank branch ID (int)
- open_dt: Date when this account is opened
- close_dt: Date when this account is closed
- initial_deposit: Initial balance (float)
- tx_behavior_id: Transaction behavior model code (int): See also Normal transaction models and AML typology models
- bank_id: Bank ID which this account belongs to
- first_name: (optional) first name of the customer (string)
- last_name: (optional) last name of the customer (string)
- street_addr: (optional) detailed address including street name (string)
- city: (optional) city name (string)
- state: (optional) state name (string)
- country: (optional) Alpha-2 country code (string)
- zip: (optional) zip code (string)
- gender: (optional) gender (string)
- birth_date: (optional) birth date (string)
- ssn: (optional) social security number (string)
- lon: (optional) longitude of the address (float)
- lat: (optional) latitude of the address (float)
The latter half of the columns is omitted for brevity because those columns are optional.
Transaction list (transactions.csv) columns:
- tran_id: Transaction ID (int)
- orig_acct: Originator account ID (int)
- bene_acct: Beneficiary account ID (int)
- tx_type: Transaction type (string)
- base_amt: Transaction amount (float)
- tran_timestamp: Simulation step when the transaction is done (int)
- is_sar: Whether this transaction is SAR (boolean)
- alert_id: Alert ID which this transaction is involved in (int; the value is -1 if this transaction is not involved in any alert)
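Since an alert_id of -1 marks transactions outside any alert, alerted transactions can be selected with a simple filter. The sample rows and the `alerted` helper below are hypothetical, and the column order and boolean spelling are assumptions, because the actual layout depends on schema.json.

```python
import csv
import io

# Hypothetical sample of a transaction list with the columns described above.
SAMPLE = """tran_id,orig_acct,bene_acct,tx_type,base_amt,tran_timestamp,is_sar,alert_id
1,10,11,TRANSFER,120.5,0,False,-1
2,11,12,TRANSFER,300.0,3,True,7
3,12,13,TRANSFER,50.0,4,False,7
"""


def alerted(text):
    """Return the rows that belong to some alert (alert_id is not -1)."""
    return [
        row
        for row in csv.DictReader(io.StringIO(text))
        if int(row["alert_id"]) != -1
    ]


rows = alerted(SAMPLE)
print([row["tran_id"] for row in rows])
```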
Alert account list columns:
- alert_id: Alert ID (int)
- alert_type: Alert type (string)
- acct_id: Account ID (int)
- acct_name: Account name (string)
- is_sar: SAR flag (boolean)
- model_id: AML typology model ID (int)
- start: Simulation step when the account is activated (int)
- end: Simulation step when the account is deactivated (int)
- schedule_id: Schedule ID of the AML typology (int)
- bank_id: Bank ID which this account belongs to (int)
Alert transaction list columns:
- alert_id: Alert ID (int)
- alert_type: Alert type (string)
- is_sar: Whether this alert is SAR (boolean)
- tran_id: Transaction ID (int)
- orig_acct: Originator account ID (int)
- bene_acct: Beneficiary account ID (int)
- tx_type: Transaction type (string)
- base_amt: Transaction amount (float)
- tran_timestamp: Date when the transaction is done (int)
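Because each row of the alert transaction list carries its alert_id, transactions can be grouped per alert. The sketch below sums base_amt for each alert; the sample rows and the `total_per_alert` helper are hypothetical, and the column order is an assumption.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of an alert transaction list with the columns above.
SAMPLE = """alert_id,alert_type,is_sar,tran_id,orig_acct,bene_acct,tx_type,base_amt,tran_timestamp
7,fan_in,True,2,11,12,TRANSFER,300.0,3
7,fan_in,True,3,13,12,TRANSFER,50.0,4
9,cycle,False,8,20,21,TRANSFER,75.0,6
"""


def total_per_alert(text):
    """Sum the transaction amounts (base_amt) for each alert ID."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        totals[int(row["alert_id"])] += float(row["base_amt"])
    return dict(totals)


totals = total_per_alert(SAMPLE)
print(totals)
```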