
Data Schema for Input Parameters and Generated Data Set

hkanezashi edited this page Apr 25, 2019 · 15 revisions

To generate a data set with AMLSim, first prepare the input parameter files that the simulator reads. This page specifies the format of those input files, followed by the format of the generated output files.



Input (Parameter) Files

Account List (accounts.csv)

  • count Number of accounts
  • min_balance Minimum initial balance
  • max_balance Maximum initial balance
  • start_day The day when the account is opened
  • end_day The day when the account is closed
  • country Alpha-2 country code
  • business_type Business type
  • suspicious Suspicious account or not (currently unused)
  • model Account behavior model ID (See also AbstractTransactionModel.java)
    • 0: Single transactions
    • 1: Fan-out
    • 2: Fan-in
    • 3: Mutual
    • 4: Forward
    • 5: Periodical
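
As a rough illustration, the columns above could be written to an input accounts.csv as follows. The column order, the row values, and the spelling of the suspicious flag are assumptions made for this sketch, not defaults taken from AMLSim.

```python
import csv
import io

# Column names as listed above; the order is an assumption for illustration.
ACCOUNT_COLUMNS = [
    "count", "min_balance", "max_balance", "start_day", "end_day",
    "country", "business_type", "suspicious", "model",
]

# Hypothetical rows: each row describes a group of accounts sharing parameters.
rows = [
    {"count": 1000, "min_balance": 1000, "max_balance": 100000,
     "start_day": 0, "end_day": 100, "country": "US",
     "business_type": "I", "suspicious": "false", "model": 1},  # 1: Fan-out
    {"count": 500, "min_balance": 5000, "max_balance": 50000,
     "start_day": 0, "end_day": 100, "country": "US",
     "business_type": "I", "suspicious": "false", "model": 2},  # 2: Fan-in
]

def render_accounts_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=ACCOUNT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

accounts_csv = render_accounts_csv(rows)
print(accounts_csv)
```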

Degree Distribution List (degree.csv)

This CSV file has three columns with the header names Count, In-degree and Out-degree. Each row specifies how many account vertices with a given in-degree and out-degree should be generated.

Here is an example of degree.csv.

Count,In-degree,Out-degree
0,2,2
1,1,1
2,2,2

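From this parameter file, the transaction graph generator generates a directed graph with five vertices (accounts) and five edges. These totals follow if the first value of each row is read as a degree, and the second and third values as the number of vertices with that in-degree and out-degree: 2 + 1 + 2 = 5 vertices and 0×2 + 1×1 + 2×2 = 5 edges. Two of the five vertices have no incoming edges and two have no outgoing edges (a vertex may belong to both groups).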
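
The vertex and edge totals of the example can be checked with a short script. This is a minimal sketch that assumes each row is read as (degree, number of vertices with that in-degree, number of vertices with that out-degree), which is the reading consistent with the five vertices and five edges stated above. The function name summarize_degrees is hypothetical and not part of AMLSim.

```python
import csv
import io

DEGREE_CSV = """Count,In-degree,Out-degree
0,2,2
1,1,1
2,2,2
"""

def summarize_degrees(text):
    """Return (num_vertices, num_edges) for a degree distribution.

    Assumed row layout: degree value, number of vertices with that
    in-degree, number of vertices with that out-degree.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    in_vertices = out_vertices = in_edges = out_edges = 0
    for row in reader:
        if not row:
            continue  # ignore blank lines
        degree, n_in, n_out = (int(value) for value in row)
        in_vertices += n_in
        out_vertices += n_out
        in_edges += degree * n_in
        out_edges += degree * n_out
    # A directed graph needs matching totals on the in and out side.
    if in_vertices != out_vertices or in_edges != out_edges:
        raise ValueError("in-degree and out-degree totals do not match")
    return in_vertices, in_edges

num_vertices, num_edges = summarize_degrees(DEGREE_CSV)
print(num_vertices, num_edges)  # 5 5
```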

Alert Pattern List (alertPatterns.csv)

  • count Number of transaction sets
  • type Fraud pattern name (fan_in, fan_out or cycle)
  • schedule_id Transaction scheduling ID
    • 0: All accounts send money in order with the same interval
    • 1: All accounts send money in order with random intervals
    • 2: All accounts send money randomly
  • accounts Number of involved accounts
  • individual_amount Minimum individual amount
  • aggregated_amount Minimum aggregated amount
  • transaction_count Minimum transaction count
  • amount_difference Proportion of transaction difference
  • period Lookback period (days)
  • amount_rounded Proportion of transactions with rounded amounts
  • orig_country Whether the originator country is suspicious
  • bene_country Whether the beneficiary country is suspicious
  • orig_business Whether the originator business type is suspicious
  • bene_business Whether the beneficiary business type is suspicious
  • is_fraud Whether the alert is actual fraud (True) or a false alert (False)
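
Before running the simulator, it can help to check the enumerated fields of each row. The sketch below validates only type and schedule_id against the values listed above. The function name check_alert_pattern and the sample rows are hypothetical, and AMLSim itself may apply different or additional checks.

```python
# Allowed values taken from the list above.
ALERT_TYPES = {"fan_in", "fan_out", "cycle"}
SCHEDULE_IDS = {
    0: "in order, same interval",
    1: "in order, random intervals",
    2: "random order",
}

def check_alert_pattern(row):
    """Return a list of problems found in one alertPatterns.csv row.

    `row` is a dict of strings, as produced by csv.DictReader.
    """
    problems = []
    if row["type"] not in ALERT_TYPES:
        problems.append(f"unknown type: {row['type']}")
    if int(row["schedule_id"]) not in SCHEDULE_IDS:
        problems.append(f"unknown schedule_id: {row['schedule_id']}")
    return problems

# Hypothetical rows, reduced to the fields that are checked here.
good_row = {"count": "2", "type": "fan_in", "schedule_id": "1", "accounts": "5"}
bad_row = {"count": "1", "type": "star", "schedule_id": "3", "accounts": "4"}

print(check_alert_pattern(good_row))
print(check_alert_pattern(bad_row))
```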

Transaction Type List (transactionType.csv)

This CSV file has two columns with the header names Type (transaction type name) and Frequency (relative frequency of that type).

Here is an example of transactionType.csv.

Type,Frequency
WIRE,5
CREDIT,10
DEPOSIT,15
CHECK,20

In this case, a WIRE transaction appears with a probability of 10% (5 / (5 + 10 + 15 + 20) = 0.1).
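
The same calculation can be applied to every type in the file. The following sketch only reproduces the arithmetic described above, and the function name type_probabilities is hypothetical. How the simulator samples from these values internally is not shown here.

```python
import csv
import io

TRANSACTION_TYPE_CSV = """Type,Frequency
WIRE,5
CREDIT,10
DEPOSIT,15
CHECK,20
"""

def type_probabilities(text):
    """Map each transaction type to frequency / sum of all frequencies."""
    reader = csv.DictReader(io.StringIO(text))
    frequencies = {row["Type"]: int(row["Frequency"]) for row in reader}
    total = sum(frequencies.values())
    return {name: freq / total for name, freq in frequencies.items()}

probabilities = type_probabilities(TRANSACTION_TYPE_CSV)
print(probabilities["WIRE"])  # 0.1
```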

Output Files

Accounts (accounts.csv)

CSV Schema (Column Names)

Note: Columns marked (optional) are added only if the input account list file has columns with the same names.

  • ACCOUNT_ID Account ID (int)
  • CUSTOMER_ID Customer ID (string)
  • INIT_BALANCE Initial balance (float)
  • START_DATE Start timestamp (int)
  • END_DATE End timestamp (int)
  • COUNTRY Country code (string)
  • ACCOUNT_TYPE Account type (string)
  • IS_SUSPICIOUS Whether this account is suspicious (boolean)
  • IS_FRAUD Whether this account is fraud (boolean)
  • TX_BEHAVIOR_ID Transaction behavior model code (int)
  • SEQ (optional) account index (int)
  • FIRST_NAME (optional) first name of the customer (string)
  • LAST_NAME (optional) last name of the customer (string)
  • STREET_ADDR (optional) detailed address including street name (string)
  • CITY (optional) city name (string)
  • STATE (optional) state name (string)
  • ZIP (optional) zip code (string)
  • GENDER (optional) gender (string)
  • PHONE_NUMBER (optional) phone number (string)
  • BIRTH_DATE (optional) birth date (string)
  • SSN (optional) social security number (string)
  • LON (optional) longitude of the address
  • LAT (optional) latitude of the address

Example

accounts.csv

Transactions (tx.csv) and Cash transactions (cash_tx.csv)

CSV Schema (Column Names)

  • TX_ID Transaction ID (int)
  • SENDER_ACCOUNT_ID Sender ID (int)
  • RECEIVER_ACCOUNT_ID Receiver ID (int)
  • TXN_SOURCE_TYPE_CODE Transaction type (string)
  • TX_TYPE Number of aggregated transactions (should be 1)
  • TX_AMOUNT Transaction amount (float)
  • TIMESTAMP Simulation step when the transaction is done (int)
  • IS_FRAUD Whether this transaction is fraud (boolean)
  • ALERT_ID ID of the alert that this transaction belongs to (int, -1 if the transaction is not part of any alert)
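
Because ALERT_ID is -1 for transactions outside any alert, alerted transactions can be selected and grouped with a simple filter. The sample rows below are hypothetical and the boolean spelling is an assumption. Only the column names come from the schema above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows following the column list above.
SAMPLE_TX = """TX_ID,SENDER_ACCOUNT_ID,RECEIVER_ACCOUNT_ID,TXN_SOURCE_TYPE_CODE,TX_TYPE,TX_AMOUNT,TIMESTAMP,IS_FRAUD,ALERT_ID
1,10,11,WIRE,1,100.0,0,false,-1
2,12,13,WIRE,1,250.0,1,true,0
3,14,13,CHECK,1,300.0,2,true,0
"""

def alerted_amounts(text):
    """Sum TX_AMOUNT per ALERT_ID, skipping transactions with ALERT_ID -1."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        alert_id = int(row["ALERT_ID"])
        if alert_id == -1:
            continue  # not part of any alert
        totals[alert_id] += float(row["TX_AMOUNT"])
    return dict(totals)

totals = alerted_amounts(SAMPLE_TX)
print(totals)  # {0: 550.0}
```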

Example

transactions.csv

Alert (alerts.csv)

  • ALERT_ID Alert ID (int)
  • ALERT_TYPE Alert type (string)
  • IS_FRAUD Whether this alert is fraud (boolean)
  • TX_ID Transaction ID (int)
  • SENDER_ACCOUNT_ID Sender account ID (int)
  • RECEIVER_ACCOUNT_ID Receiver account ID (int)
  • TX_TYPE Transaction type (string)
  • TX_AMOUNT Transaction amount (float)
  • TIMESTAMP Date when the transaction is done (int)

Example

alerts.csv