
Data Schema for Input Parameters and Generated Data Set

hkanezashi edited this page Nov 12, 2019 · 15 revisions

To generate a data set with AMLSim, first prepare the input parameter files used to run the simulator. This page specifies the format of those input files, followed by the format of the generated (output) data set.



Input (Parameter) Files

Account List (accounts.csv)

  • count Number of accounts
  • min_balance Minimum initial balance
  • max_balance Maximum initial balance
  • start_day The day when the account is opened
  • end_day The day when the account is closed
  • country Alpha-2 country code
  • business_type Business type
  • suspicious Suspicious account or not (currently unused)
  • model Account behavior model ID (See also AbstractTransactionModel.java)
    • 0: Single transactions
    • 1: Fan-out
    • 2: Fan-in
    • 3: Mutual
    • 4: Forward
    • 5: Periodical
  • bank_id Bank ID which these accounts belong to (optional, default is 0)
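
As a rough illustration of the columns above, the following sketch checks one parameter row for basic consistency. The sample values, the helper name check_account_row and the rule that count must be positive are assumptions made for this example, not taken from AMLSim; only the column names and model IDs come from the list above.

```python
import csv
import io

# Hypothetical sample row; only the column names come from the list above.
SAMPLE = """count,min_balance,max_balance,start_day,end_day,country,business_type,suspicious,model,bank_id
100,1000,5000,0,365,US,I,False,1,0
"""

# Model IDs as listed above.
MODEL_NAMES = {
    0: "Single transactions",
    1: "Fan-out",
    2: "Fan-in",
    3: "Mutual",
    4: "Forward",
    5: "Periodical",
}


def check_account_row(row):
    """Return a list of problems found in one accounts.csv parameter row."""
    problems = []
    # Assumption: a row must describe at least one account.
    if int(row["count"]) <= 0:
        problems.append("count must be positive")
    if float(row["min_balance"]) > float(row["max_balance"]):
        problems.append("min_balance exceeds max_balance")
    if int(row["model"]) not in MODEL_NAMES:
        problems.append("unknown model ID")
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["count"], MODEL_NAMES[int(row["model"])], check_account_row(row))
```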

Degree Distribution List (degree.csv)

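This CSV file has three columns with the header names Count, In-degree and Out-degree. Each row indicates how many account vertices with a given in-degree or out-degree should be generated. Judging from the example below, the Count column holds the degree value, while the In-degree and Out-degree columns hold the number of vertices with that in-degree and out-degree, respectively.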

Here is an example of degree.csv.

Count,In-degree,Out-degree
0,2,2
1,1,1
2,2,2

From this parameter file, the transaction graph generator generates a directed graph with five vertices (accounts) and five edges. Two of the five vertices have no outgoing edges and two have no incoming edges (these may be the same two vertices).
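
The vertex and edge counts stated above can be checked with a short sketch. It assumes, based on the example, that the first column is a degree value and that the other two columns are the numbers of vertices with that in-degree and out-degree. The function name summarize_degrees is hypothetical and not part of AMLSim.

```python
import csv
import io

# The degree.csv example from this page.
DEGREE_CSV = """Count,In-degree,Out-degree
0,2,2
1,1,1
2,2,2
"""


def summarize_degrees(text):
    """Summarize a degree file.

    Assumption: column 1 is a degree value; columns 2 and 3 are the numbers
    of vertices having that in-degree and out-degree.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = [tuple(int(v) for v in row) for row in reader]
    return {
        "vertices_by_in": sum(n_in for _, n_in, _ in rows),
        "vertices_by_out": sum(n_out for _, _, n_out in rows),
        "edges_by_in": sum(d * n_in for d, n_in, _ in rows),
        "edges_by_out": sum(d * n_out for d, _, n_out in rows),
        "no_incoming": sum(n_in for d, n_in, _ in rows if d == 0),
        "no_outgoing": sum(n_out for d, _, n_out in rows if d == 0),
    }


summary = summarize_degrees(DEGREE_CSV)
print(summary)
```

For the example file, both vertex totals and both edge totals come out as 5, and two vertices each have no incoming and no outgoing edges. In a directed graph the in-degree total must equal the out-degree total, so comparing edges_by_in with edges_by_out is a cheap sanity check for a hand-written file.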

AML Typology List (alertPatterns.csv)

  • count Number of typologies (transaction sets)
  • type Name of the AML typology (fan_in, fan_out, cycle, ...)
  • schedule_id Transaction scheduling ID of the typology
    • 0: All accounts make transactions in order with the same interval
    • 1: All accounts make transactions in order with random intervals
    • 2: All accounts make transactions randomly
  • accounts Number of involved accounts
  • individual_amount Initial individual transaction amount
  • aggregated_amount Minimum aggregated (total) transaction amount
  • transaction_count Minimum number of transactions
  • amount_difference Proportion of the maximum difference of overall transaction amounts
  • period Period of overall transactions (number of days)
  • amount_rounded Proportion of the number of transactions with rounded amounts (optional)
  • orig_country Whether the country of the originator account is suspicious (optional)
  • bene_country Whether the country of the beneficiary account is suspicious (optional)
  • orig_business Whether the business type of the originator account is suspicious (optional)
  • bene_business Whether the business type of the beneficiary account is suspicious (optional)
  • is_internal Whether all involved accounts belong to the same bank (optional: default is False)
  • is_sar Whether the alert is SAR (True) or false alert (False)
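
A minimal sketch of reading one typology row is shown here, mainly to illustrate how the optional is_internal column can fall back to its default. The sample values and the helper name parse_pattern are hypothetical, and the "True"/"False" text encoding of booleans is an assumption based on the descriptions above.

```python
import csv
import io

# Hypothetical sample row; optional columns are left out on purpose.
SAMPLE = """count,type,schedule_id,accounts,individual_amount,aggregated_amount,transaction_count,amount_difference,period,is_sar
2,fan_in,1,5,100.0,500.0,5,0.1,30,True
"""

# Schedule IDs as listed above.
SCHEDULES = {
    0: "in order, same interval",
    1: "in order, random intervals",
    2: "random",
}


def parse_pattern(row):
    """Convert one alertPatterns.csv row (a dict of strings) to typed values."""
    return {
        "count": int(row["count"]),
        "type": row["type"],
        "schedule": SCHEDULES[int(row["schedule_id"])],
        "accounts": int(row["accounts"]),
        "individual_amount": float(row["individual_amount"]),
        "period": int(row["period"]),
        # Optional column: defaults to False when absent.
        "is_internal": row.get("is_internal", "False") == "True",
        "is_sar": row["is_sar"] == "True",
    }


patterns = [parse_pattern(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(patterns[0])
```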

Transaction Type List (transactionType.csv)

This CSV file has two columns with the header names Type (transaction type name) and Frequency (relative frequency). The currently supported type names are WIRE, CREDIT, DEPOSIT, ACH and TRANSFER. However, because we do not have real data, we only generate transactions of the WIRE type. To avoid confusing users of AMLSim, we simply use "TRANSFER" as the default transaction type. "TRANSFER" is just a symbolic name indicating that someone sends money to someone else.

Here is an example of transactionType.csv.

Type, Frequency
WIRE,5
CREDIT,10
DEPOSIT,10

In this case, a WIRE transaction appears with a probability of 20% (5 / (5+10+10) = 0.2).
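
The arithmetic above generalizes to any frequency table: each type's probability is its frequency divided by the sum of all frequencies. The sketch below computes these probabilities for the example and draws a few types with the standard library. It illustrates relative-frequency sampling in general and is not necessarily how AMLSim selects types internally; type_probabilities is a hypothetical helper name.

```python
import random

# Frequencies from the transactionType.csv example above.
frequencies = {"WIRE": 5, "CREDIT": 10, "DEPOSIT": 10}


def type_probabilities(freq):
    """Normalize relative frequencies so that they sum to 1."""
    total = sum(freq.values())
    return {name: weight / total for name, weight in freq.items()}


probs = type_probabilities(frequencies)
print(probs)  # {'WIRE': 0.2, 'CREDIT': 0.4, 'DEPOSIT': 0.4}

# Draw ten transaction types in proportion to their frequencies.
rng = random.Random(0)  # fixed seed so repeated runs give the same draw
sample = rng.choices(list(frequencies), weights=list(frequencies.values()), k=10)
print(sample)
```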

Output Files

The resulting data is written as several CSV files under the output directory.

  • Account list (accounts.csv)
  • Transaction list (transactions.csv)
  • Alert account list (alert_accounts.csv)
  • Alert transaction list (alert_transactions.csv)

Output data schema definition

The data schema (columns and data types) can be defined by editing the data schema definition file (schema.json) under the parameter file directory (paramFiles).

Accounts (accounts.csv)

CSV Schema (Column Names)

Note: Columns marked (optional) are added only if the input account list file has columns with the same names.

  • acct_id Account ID (int)
  • dsply_nm Customer ID (string)
  • type Account type (string)
  • acct_stat Account status (string)
  • acct_rptng_crncy Default currency (string)
  • prior_sar_count Whether this account is involved in SAR transactions (boolean)
  • branch_id Bank branch ID (int)
  • open_dt Date when this account is opened
  • close_dt Date when this account is closed
  • initial_deposit Initial balance (float)
  • tx_behavior_id Transaction behavior model code (int): See also Normal transaction models and AML typology models
  • bank_id Bank ID which this account belongs to
  • first_name (optional) first name of the customer (string)
  • last_name (optional) last name of the customer (string)
  • street_addr (optional) detailed address including street name (string)
  • city (optional) city name (string)
  • state (optional) state name (string)
  • country (optional) Alpha-2 country code (string)
  • zip (optional) zip code (string)
  • gender (optional) gender (string)
  • birth_date (optional) birth date (string)
  • ssn (optional) social security number (string)
  • lon (optional) longitude of the address (float)
  • lat (optional) latitude of the address (float)

Example

The latter half of the columns is omitted for brevity because those columns are optional.

accounts.csv

Transactions (transactions.csv)

CSV Schema (Column Names)

  • tran_id Transaction ID (int)
  • orig_acct Originator account ID (int)
  • bene_acct Beneficiary account ID (int)
  • tx_type Transaction type (string)
  • base_amt Transaction amount (float)
  • tran_timestamp Simulation step when the transaction is done (int)
  • is_sar Whether this transaction is SAR (boolean)
  • alert_id ID of the alert this transaction is involved in (int; -1 if the transaction is not involved in any alert)
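
Because alert_id is -1 for transactions outside any alert, that column is enough to separate normal from alerted transactions. The sketch below does this on a few made-up rows. The sample values and the helper name split_by_alert are hypothetical, and the way booleans are written in real output files may differ from the sample.

```python
import csv
import io

# Hypothetical rows using the column names listed above.
SAMPLE = """tran_id,orig_acct,bene_acct,tx_type,base_amt,tran_timestamp,is_sar,alert_id
1,10,20,TRANSFER,150.0,0,False,-1
2,20,30,TRANSFER,99.5,1,True,7
3,30,10,TRANSFER,20.0,1,False,-1
"""


def split_by_alert(rows):
    """Split rows into (normal, alerted) using the alert_id == -1 convention."""
    normal, alerted = [], []
    for row in rows:
        if int(row["alert_id"]) == -1:
            normal.append(row)
        else:
            alerted.append(row)
    return normal, alerted


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
normal, alerted = split_by_alert(rows)
print(len(normal), len(alerted))  # 2 1
```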

Example

transactions.csv

Alert members (alert_accounts.csv)

  • alert_id Alert ID (int)
  • alert_type Alert type (string)
  • acct_id Account ID (int)
  • acct_name Account name (string)
  • is_sar SAR flag (boolean)
  • model_id AML typology model ID (int)
  • start Simulation step when the account is activated (int)
  • end Simulation step when the account is deactivated (int)
  • schedule_id Schedule ID of the AML typology (int)
  • bank_id Bank ID which this account belongs to (int)
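
Each row of this file links one account to one alert, so grouping rows by alert_id recovers the member accounts of every alert. The following sketch shows one way to do that. All sample values (including the model_id and acct_name values) and the helper name members_by_alert are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using the column names listed above.
SAMPLE = """alert_id,alert_type,acct_id,acct_name,is_sar,model_id,start,end,schedule_id,bank_id
0,fan_in,11,C_11,True,2,0,100,1,0
0,fan_in,12,C_12,True,2,0,100,1,0
1,cycle,13,C_13,False,3,5,80,0,0
"""


def members_by_alert(rows):
    """Map each alert ID to the list of account IDs that belong to it."""
    groups = defaultdict(list)
    for row in rows:
        groups[int(row["alert_id"])].append(int(row["acct_id"]))
    return dict(groups)


members = members_by_alert(csv.DictReader(io.StringIO(SAMPLE)))
print(members)  # {0: [11, 12], 1: [13]}
```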

Example

alert_accounts.csv

Alert transactions (alert_transactions.csv)

  • alert_id Alert ID (int)
  • alert_type Alert type (string)
  • is_sar Whether this alert is SAR (boolean)
  • tran_id Transaction ID (int)
  • orig_acct Originator account ID (int)
  • bene_acct Beneficiary account ID (int)
  • tx_type Transaction type (string)
  • base_amt Transaction amount (float)
  • tran_timestamp Date when the transaction is done (int)

Example

alert_transactions.csv