
Data Schema for Input Parameters and Generated Data Set

Hiroki Kanezashi edited this page Oct 13, 2021 · 15 revisions

To generate a data set with AMLSim, you first prepare the input parameter files for the simulator. This page specifies the input parameter files first, and then the data set generated as output.



Input (Parameter) Files

Account List (accounts.csv)

  • count: Number of accounts
  • min_balance: Minimum initial balance
  • max_balance: Maximum initial balance
  • start_day: The day when the account is opened
  • end_day: The day when the account is closed
  • country: Alpha-2 country code
  • business_type: Business type
  • suspicious: Suspicious account or not (currently unused)
  • model: Account behavior model ID (See also AbstractTransactionModel.java)
    • 0: Single
    • 1: Fan-out
    • 2: Fan-in
    • 3: Mutual
    • 4: Forward
    • 5: Periodical
  • bank_id: Bank ID (string) to which these accounts belong (optional; defaults to an empty string, or to a value defined in conf.json)
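A minimal sketch of reading this file with Python's standard `csv` module. It assumes the header uses exactly the column names listed above; the sample rows and the `load_account_params` name are hypothetical. Each row's `count` is how many accounts it describes, and `model` is mapped to the behavior names listed above.

```python
import csv
import io

# Account behavior model IDs as listed above (see AbstractTransactionModel.java)
MODEL_NAMES = {0: "Single", 1: "Fan-out", 2: "Fan-in", 3: "Mutual", 4: "Forward", 5: "Periodical"}

# Hypothetical sample rows; the second row leaves bank_id empty
SAMPLE = """count,min_balance,max_balance,start_day,end_day,country,business_type,suspicious,model,bank_id
800,50000,100000,0,365,US,RETAIL,false,1,bank_a
200,50000,100000,0,365,US,RETAIL,false,0,
"""

def load_account_params(f):
    """Parse account parameter rows into dicts with typed values (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(f):
        rows.append({
            "count": int(row["count"]),
            "min_balance": float(row["min_balance"]),
            "max_balance": float(row["max_balance"]),
            "model": MODEL_NAMES[int(row["model"])],
            # bank_id is optional: an empty or missing value falls back to ""
            "bank_id": row.get("bank_id") or "",
        })
    return rows

params = load_account_params(io.StringIO(SAMPLE))
print(sum(p["count"] for p in params))  # 1000 accounts in total
```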

Degree Distribution List (degree.csv)

This CSV file has three columns with header names: Count, In-degree and Out-degree. Each CSV row specifies how many account vertices with a given in-degree and out-degree should be generated.

Here is an example of degree.csv.

Count,In-degree,Out-degree
0,2,2
1,1,1
2,2,2

From this parameter file, the transaction graph generator generates a directed graph with five vertices (accounts) and five edges. Two of the five vertices have no outgoing edges and two have no incoming edges (these may be the same two vertices).
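The five-vertex, five-edge count above follows if the first column is read as a degree value and the other two columns as the number of vertices having that in-degree and that out-degree. The sketch below reproduces that arithmetic; this reading is inferred from the worked example, not taken from the generator's source, and `summarize_degrees` is a hypothetical name.

```python
import csv
import io

# degree.csv example from above
SAMPLE = """Count,In-degree,Out-degree
0,2,2
1,1,1
2,2,2
"""

def summarize_degrees(f):
    """Return (vertices, edges, no_incoming, no_outgoing).

    Assumed reading: each row is (degree value, number of vertices with that
    in-degree, number of vertices with that out-degree).
    """
    reader = csv.reader(f)
    next(reader)  # skip the header row
    in_vertices = out_vertices = in_edges = out_edges = 0
    no_incoming = no_outgoing = 0
    for deg_text, n_in_text, n_out_text in reader:
        deg, n_in, n_out = int(deg_text), int(n_in_text), int(n_out_text)
        in_vertices += n_in
        out_vertices += n_out
        in_edges += deg * n_in    # each such vertex receives `deg` edges
        out_edges += deg * n_out  # each such vertex sends `deg` edges
        if deg == 0:
            no_incoming, no_outgoing = n_in, n_out
    # Every vertex has one in-degree and one out-degree, and every edge has
    # one source and one destination, so the totals must agree.
    if in_vertices != out_vertices or in_edges != out_edges:
        raise ValueError("in-degree and out-degree columns are inconsistent")
    return in_vertices, in_edges, no_incoming, no_outgoing

print(summarize_degrees(io.StringIO(SAMPLE)))  # (5, 5, 2, 2)
```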

AML Typology List (alertPatterns.csv)

  • count: Number of typologies (transaction sets)
  • type: Name of the AML typology, i.e. the transaction pattern (e.g. fan_in, fan_out, cycle)
  • schedule_id: Transaction scheduling ID of the typology
    • 0: All member accounts send money in order with the same interval (number of days)
    • 1: All member accounts send money in order with random intervals
    • 2: All member accounts send money randomly
  • min_accounts: Minimum number of involved accounts
  • max_accounts: Maximum number of involved accounts
  • min_amount: Minimum initial transaction amount
  • max_amount: Maximum initial transaction amount
  • min_period: Minimum overall transaction period (number of days)
  • max_period: Maximum overall transaction period (number of days)
  • bank_id: Bank ID to which the member accounts belong (optional; if empty, the member accounts are not restricted to any bank)
  • is_sar: Whether the alert is a SAR (True) or a false alert (False)
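To make the three `schedule_id` values concrete, here is a hypothetical sketch that assigns a day to each transaction of one typology within its period. The function name, its arguments, and the exact interval rules are illustrative assumptions; AMLSim's own scheduling code may compute intervals differently.

```python
import random

def schedule_days(schedule_id, n_tx, start, period, rng):
    """Return one day per transaction (in member order) for a typology that
    runs for `period` days starting at day `start`. Illustrative only."""
    end = start + period - 1
    if schedule_id == 0:
        # 0: in order, with the same interval between consecutive transactions
        interval = max(period // n_tx, 1)
        return [min(start + i * interval, end) for i in range(n_tx)]
    if schedule_id == 1:
        # 1: in order, with random intervals (sorting random days keeps the order)
        return sorted(rng.randint(start, end) for _ in range(n_tx))
    if schedule_id == 2:
        # 2: fully random days; transactions need not follow member order
        return [rng.randint(start, end) for _ in range(n_tx)]
    raise ValueError(f"unknown schedule_id: {schedule_id}")

print(schedule_days(0, 4, 10, 20, random.Random(0)))  # [10, 15, 20, 25]
```

Passing an explicit `random.Random` instance keeps runs reproducible, which matters when a generated data set has to be regenerated exactly.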

Transaction Type List (transactionType.csv)

This CSV file has two columns with header names: Type (transaction type name) and Frequency (relative frequency). The supported type names are WIRE, CREDIT, DEPOSIT, ACH and TRANSFER. However, since we don't have real data, we only generate transactions of the WIRE type. To avoid confusing users of AMLSim, we simply use "TRANSFER" as the default transaction type. "TRANSFER" is just a symbolic name indicating that someone sends money to someone else.

Here is an example of transactionType.csv.

Type,Frequency
WIRE,5
CREDIT,10
DEPOSIT,10

In this case, a WIRE transaction appears with probability 20% (5 / (5+10+10) = 0.2).
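The same normalization, plus weighted sampling with `random.choices`, in a short sketch. It only illustrates the arithmetic; it is not AMLSim's own sampling code, and the helper names are hypothetical.

```python
import random

# Relative frequencies from the example transactionType.csv above
FREQ = {"WIRE": 5, "CREDIT": 10, "DEPOSIT": 10}

def type_probabilities(freq):
    """Normalize relative frequencies into probabilities that sum to 1."""
    total = sum(freq.values())
    return {t: f / total for t, f in freq.items()}

def sample_types(freq, k, rng):
    """Draw k transaction types with probability proportional to frequency."""
    return rng.choices(list(freq), weights=list(freq.values()), k=k)

print(type_probabilities(FREQ))  # {'WIRE': 0.2, 'CREDIT': 0.4, 'DEPOSIT': 0.4}
```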

Output Files

The simulator writes its results as CSV files under the output directory.

  • Account list (accounts.csv)
  • Transaction list (transactions.csv)
  • Alert account list (alert_accounts.csv)
  • Alert transaction list (alert_transactions.csv)

Output data schema definition

The data schema (columns and data types) can be defined by editing the data schema definition file (schema.json) in the parameter file directory (paramFiles).

Accounts (accounts.csv)

CSV Schema (Column Names)

Note: columns marked (optional) are added only if the input account list file has columns with the same names.

  • acct_id: Account ID (int)
  • dsply_nm: Customer ID (string)
  • type: Account type (string)
  • acct_stat: Account status (string)
  • acct_rptng_crncy: Default currency (string)
  • prior_sar_count: Whether this account is involved in SAR transactions (boolean)
  • branch_id: Bank branch ID (int)
  • open_dt: Date when this account is opened
  • close_dt: Date when this account is closed
  • initial_deposit: Initial balance (float)
  • tx_behavior_id: Transaction behavior model code (int): See also Normal transaction models and AML typology models
  • bank_id: Bank ID which this account belongs to (string)
  • first_name: (optional) first name of the customer (string)
  • last_name: (optional) last name of the customer (string)
  • street_addr: (optional) detailed address including street name (string)
  • city: (optional) city name (string)
  • state: (optional) state name (string)
  • country: (optional) Alpha-2 country code (string)
  • zip: (optional) zip code (string)
  • gender: (optional) gender (string)
  • birth_date: (optional) birth date (string)
  • ssn: (optional) social security number (string)
  • lon: (optional) longitude of the address (float)
  • lat: (optional) latitude of the address (float)
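A minimal sketch of loading the core columns of the output accounts.csv with their listed types. The sample row is hypothetical, and so is the boolean encoding: the sketch assumes booleans are written as "true"/"false" in some letter case. Dates and optional columns are kept as strings because their format is not specified here.

```python
import csv
import io

def parse_bool(text):
    # Assumption: booleans are written as "true"/"false" (any letter case)
    return text.strip().lower() == "true"

# Column -> converter for typed core columns; any other column stays a string
CONVERTERS = {
    "acct_id": int,
    "branch_id": int,
    "initial_deposit": float,
    "tx_behavior_id": int,
    "prior_sar_count": parse_bool,
}

def load_accounts(f):
    """Return one dict per account with the converters above applied."""
    return [
        {col: CONVERTERS.get(col, str)(val) for col, val in row.items()}
        for row in csv.DictReader(f)
    ]

# Hypothetical row; open_dt and close_dt are left empty
SAMPLE = """acct_id,dsply_nm,type,acct_stat,acct_rptng_crncy,prior_sar_count,branch_id,open_dt,close_dt,initial_deposit,tx_behavior_id,bank_id
0,C_0,I,A,USD,false,1,,,75000.5,1,bank_a
"""

accounts = load_accounts(io.StringIO(SAMPLE))
print(accounts[0]["initial_deposit"])  # 75000.5
```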

Example

The optional columns in the latter half are omitted for brevity.

(Image: sample rows of accounts.csv)

Transactions (transactions.csv)

CSV Schema (Column Names)

  • tran_id: Transaction ID (int)
  • orig_acct: Originator account ID (int)
  • bene_acct: Beneficiary account ID (int)
  • tx_type: Transaction type (string)
  • base_amt: Transaction amount (float)
  • tran_timestamp: Simulation step when the transaction is done (int)
  • is_sar: Whether this transaction is SAR (boolean)
  • alert_id: Alert ID which this transaction is involved in (int: If this transaction is not involved in any alerts, the value is -1)
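Because `alert_id` is -1 for transactions outside any alert, per-alert aggregates must skip that value. A short sketch summing `base_amt` per alert; the sample rows and the `amounts_by_alert` name are hypothetical, and booleans in the sample assume a "true"/"false" encoding.

```python
import csv
import io
from collections import defaultdict

# Hypothetical transactions.csv rows
SAMPLE = """tran_id,orig_acct,bene_acct,tx_type,base_amt,tran_timestamp,is_sar,alert_id
1,10,11,TRANSFER,100.0,0,false,-1
2,12,13,TRANSFER,250.0,3,true,7
3,13,14,TRANSFER,240.0,5,true,7
"""

def amounts_by_alert(f):
    """Sum base_amt per alert_id, skipping transactions not in any alert (-1)."""
    totals = defaultdict(float)
    for row in csv.DictReader(f):
        alert_id = int(row["alert_id"])
        if alert_id == -1:
            continue
        totals[alert_id] += float(row["base_amt"])
    return dict(totals)

print(amounts_by_alert(io.StringIO(SAMPLE)))  # {7: 490.0}
```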

Example

(Image: sample rows of transactions.csv)

Alert members (alert_accounts.csv)

  • alert_id: Alert ID (int)
  • alert_type: Alert type (string)
  • acct_id: Account ID (int)
  • acct_name: Account name (string)
  • is_sar: SAR flag (boolean)
  • model_id: AML typology model ID (int)
  • start: Simulation step when the account is activated (int)
  • end: Simulation step when the account is deactivated (int)
  • schedule_id: Schedule ID of the AML typology (int)
  • bank_id: Bank ID which this account belongs to (string)
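Each alert spans several rows of alert_accounts.csv, one per member account. A sketch that groups member accounts by `alert_id` and records each alert's SAR flag; the sample rows (including the `model_id` values) are made up, and the "true"/"false" boolean encoding is an assumption.

```python
import csv
import io
from collections import defaultdict

# Hypothetical alert_accounts.csv rows
SAMPLE = """alert_id,alert_type,acct_id,acct_name,is_sar,model_id,start,end,schedule_id,bank_id
7,fan_in,12,C_12,true,2,0,30,1,bank_a
7,fan_in,13,C_13,true,2,0,30,1,bank_a
8,cycle,20,C_20,false,5,10,40,2,bank_b
"""

def members_by_alert(f):
    """Return ({alert_id: [acct_id, ...]}, {alert_id: is_sar})."""
    members = defaultdict(list)
    is_sar = {}
    for row in csv.DictReader(f):
        alert_id = int(row["alert_id"])
        members[alert_id].append(int(row["acct_id"]))
        # Assumption: the flag is written as "true"/"false" (any letter case)
        is_sar[alert_id] = row["is_sar"].strip().lower() == "true"
    return dict(members), is_sar

members, is_sar = members_by_alert(io.StringIO(SAMPLE))
print(members)  # {7: [12, 13], 8: [20]}
print(is_sar)   # {7: True, 8: False}
```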

Example

(Image: sample rows of alert_accounts.csv)

Alert transactions (alert_transactions.csv)

  • alert_id: Alert ID (int)
  • alert_type: Alert type (string)
  • is_sar: Whether this alert is SAR (boolean)
  • tran_id: Transaction ID (int)
  • orig_acct: Originator account ID (int)
  • bene_acct: Beneficiary account ID (int)
  • tx_type: Transaction type (string)
  • base_amt: Transaction amount (float)
  • tran_timestamp: Simulation step when the transaction is done (int)
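The alert transaction list repeats columns from transactions.csv, so the two files can be cross-checked. The sketch below assumes every row of alert_transactions.csv should also appear in transactions.csv with the same `alert_id` and `base_amt`; that expectation, the sample rows, and the function name are assumptions for illustration.

```python
import csv
import io

# Hypothetical transactions.csv rows
TRANSACTIONS = """tran_id,orig_acct,bene_acct,tx_type,base_amt,tran_timestamp,is_sar,alert_id
1,10,11,TRANSFER,100.0,0,false,-1
2,12,13,TRANSFER,250.0,3,true,7
3,13,14,TRANSFER,240.0,5,true,7
"""

# Hypothetical alert_transactions.csv rows; tran_id 4 has no match above
ALERT_TRANSACTIONS = """alert_id,alert_type,is_sar,tran_id,orig_acct,bene_acct,tx_type,base_amt,tran_timestamp
7,fan_in,true,2,12,13,TRANSFER,250.0,3
7,fan_in,true,3,13,14,TRANSFER,240.0,5
9,cycle,true,4,20,21,TRANSFER,80.0,8
"""

def inconsistent_alert_transactions(tx_file, alert_tx_file):
    """Return tran_ids from the alert file that are missing from transactions.csv
    or whose alert_id or base_amt differ there."""
    tx = {int(r["tran_id"]): r for r in csv.DictReader(tx_file)}
    bad = []
    for r in csv.DictReader(alert_tx_file):
        tran_id = int(r["tran_id"])
        t = tx.get(tran_id)
        if (t is None
                or int(t["alert_id"]) != int(r["alert_id"])
                or float(t["base_amt"]) != float(r["base_amt"])):
            bad.append(tran_id)
    return bad

print(inconsistent_alert_transactions(io.StringIO(TRANSACTIONS), io.StringIO(ALERT_TRANSACTIONS)))  # [4]
```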

Example

(Image: sample rows of alert_transactions.csv)