# Data Generation

### On `Mobile money` transactions for `Mara Bank`

## Sections

- Introduction
    - Project Overview
    - Objectives

- Data Importation
    - Import modules
    - Import Datasets
    - Set Global Constants

- Data Description
    - Data Information
    - Dataset Shape and Size
    - Data Types

- Generator Functions
    - tick
    - random_amount
    - random_location
    - generate_bank_devices
    - new_account
    - new_user
    - generate_users
    - generate_nonce
    - random_user
    - random_merchant
    - random_atm

- Simulations
    - add_transaction
    - credit_account
    - debit_account
    - atm_withdrawal
    - atm_payment
    - atm_deposit
    - pos_payment
    - pos_withdrawal
    - mobile_transfer
    - generate_event
    - simulate


## Introduction

### Project Overview

In the dynamic landscape of digital finance, the ability to detect fraudulent activities amidst a deluge of daily transactions is paramount. 

This notebook embarks on a journey to simulate financial transactions within Nigeria, creating a realistic synthetic dataset to serve as a foundation for building robust fraud detection models.
 
Unmasking Financial Irregularities: `A Synthetic Data Journey into Nigerian Transaction Fraud Detection`

### Project Objective

By mimicking the patterns and complexities of real world ATM card and mobile phone transactions, we aim to generate data that reflects the nuances of consumer behavior and potential fraudulent schemes. 

This synthetic environment allows us to explore various detection techniques without compromising sensitive personal information, providing a safe and ethical space for developing innovative solutions to combat financial crime. 

Join us as we dive into the fascinating world of data generation, fraud detection logic, and the simulation of a bustling financial ecosystem.

Here are the entities considered in this synthetic banking system in order to generate a transactions table mimicking that of an actual bank.

- Banks
    - Bank Devices (ATM Stands)
    
- Users
    - Bank Accounts (Owned by Users)
    - User Devices
    - Merchants (Users with POS)

Here are the data points to generate for each transaction:

- `amount`: The value of the transaction.
- `balance`: The account balance after the transaction.
- `time`: The timestamp of the transaction.
- `holder`: The account number of the transaction's initiator or recipient.
- `kyc`: The KYC level of the account.
- `holder_bvn`: The BVN of the transaction's initiator or recipient.
- `holder_bank`: The bank of the account holder.
- `related`: The account number or entity related to the transaction (e.g., recipient account, ATM bank).
- `related_bvn`: The BVN of the related party.
- `related_bank`: The bank of the related party.
- `state`, `latitude`, `longitude`: Location details of the transaction.
- `status`: The outcome of the transaction (e.g., 'SUCCESS', 'FAILED').
- `type`: The transaction type (e.g., 'DEBIT', 'CREDIT').
- `category`: The specific class of transaction (e.g., 'OPENING', 'WITHDRAWAL', 'PAYMENT', 'TRANSFER', 'REVERSAL', 'BILL').
- `channel`: The channel used for the transaction (e.g., 'CARD', 'APP', 'USSD').
- `device`: The device used for the transaction (e.g., 'ATM-001', 'MOBILE-003').
- `nonce`: An identifier shared by all records that belong to the same transaction event.
- `reported`: Whether the transaction was reported.
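
To make the schema concrete, here is a minimal sketch of one transaction record as a plain dictionary. Only the field names come from the list above; every value is invented for illustration and is not output of the simulation. The identifiers follow the formats produced by the generator functions later in this notebook.

```python
from datetime import datetime

# Illustrative only: all values are made up; the keys mirror the data points listed above.
example_transaction = {
    'amount': 2500.00,
    'balance': 47500.00,
    'time': datetime(2024, 1, 1, 9, 30, 0),
    'holder': 'ACC_0000000001',
    'kyc': 2,
    'holder_bvn': 'USER_000000000001',
    'holder_bank': 'BANK_00003',
    'related': 'ACC_0000000042',
    'related_bvn': 'USER_000000000042',
    'related_bank': 'BANK_00007',
    'state': 'Lagos',
    'latitude': 6.5244,
    'longitude': 3.3792,
    'status': 'SUCCESS',
    'type': 'DEBIT',
    'category': 'TRANSFER',
    'channel': 'APP',
    'device': 'MOBILE_00001_00001',
    'nonce': 123456789012345678901,
    'reported': False,
}

# List the fields so the record can be compared against the schema above
print(list(example_transaction.keys()))
```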

## Data Importation

- Import modules
- Import Datasets
- Set Global Constants

### Import Modules

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Import the required modules

import pandas as pd # Used for data manipulation and analysis.
import numpy as np # Used for numerical operations
import random # Used to randomize situations
from datetime import datetime, timedelta # Used to add timestamp to the transactions.
import time
import os
import sys

In [3]:
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [4]:
from lib import  tracker

### Import datasets

In [5]:
# Load the Nigerian state locations data from a CSV file into a pandas DataFrame.
location_df = pd.read_csv('../datasets/nigerian_state_locations.csv')

# Display the first few rows of the location_df DataFrame.
location_df.head()

,state,latitude,longitude
0,Abia,8.732392,4.630143
1,Abia,8.727039,4.615569
2,Abia,8.738321,4.635168
3,Abia,8.768629,4.611625
4,Abia,8.714793,4.641194


### Set global constants

In [6]:
# Define constant variables used in the data generation process.

SEED = 42 # Seed for random number generation to ensure reproducibility.

NUM_USERS = 1000 # Number of users to generate.

NUM_BANKS = 50 # Number of banks to generate.

TRX_TYPES = ['DEBIT', 'CREDIT'] # Possible transaction types.

CHANNELS = ['CARD', 'APP', 'USSD'] # Possible transaction channels.

SEASONS = 365 # Number of days before now at which the simulated clock starts.

BASE_TIME = datetime.now() - timedelta(days=SEASONS) # The starting time for transaction timestamps.

TICKER = 0 # The number of seconds that have passed

transactions = {} # Initialize an empty dictionary mapping each bank_id to a DataFrame of its transactions.

users_df = pd.DataFrame() # Initialize an empty DataFrame to store the generated users.

accounts_df = pd.DataFrame() # Initialize an empty DataFrame to store the generated user accounts.

bank_device_df = pd.DataFrame() # Initialize an empty DataFrame to store the generated bank devices.

MIN_AMOUNT = 100
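
In the cells shown so far, `SEED` is defined but not passed to any random number generator. The generator functions below draw from both Python's `random` module and NumPy's global generator, so a seeding step would need to cover both. The following is a minimal sketch; `seed_everything` is a hypothetical helper, not part of the notebook. Note that `BASE_TIME` depends on `datetime.now()`, so timestamps would still differ between runs.

```python
import random
import numpy as np

SEED = 42  # same value as the constant above

def seed_everything(seed):
    """Seed both RNGs used by the generator functions (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)

# Draw once from each generator after seeding
seed_everything(SEED)
first = (random.randint(1, 3), np.random.uniform(0, 1))

# Re-seed and draw again: the draws repeat exactly
seed_everything(SEED)
second = (random.randint(1, 3), np.random.uniform(0, 1))

print(first == second)  # True
```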

## Generator Functions

We will be using the following functions to enable the generation of the entities required for this project.

- `tick`: A synthetic clock that advances simulated time.

- `random_amount`: Generates a random amount based on the account's KYC level or a set limit.

- `random_location`: Selects a random location from the locations DataFrame we imported, optionally near a given point.

- `generate_nonce`: Generates an identifier for related transactions.

- `generate_bank_devices`: Sets up the banks and assigns ATM devices to them at random locations.

- `new_account`: Opens an account for a user at a random bank and records the opening deposit.

- `new_user`: Creates a user with devices, a location, and a first account.

- `generate_users`: Generates the users for the project.

- `random_account`: Selects a random account to take part in a transaction.

- `random_merchant`: Selects a random merchant account, preferring one near a given location.

- `random_atm`: Selects a random ATM, preferring one near a given location.

In [7]:
def tick(sec = 1):
    """
        A synthetic clock ticker.

        @param sec: The maximum number of seconds to tick forward.

        @return: The current simulated time after ticking forward.
    """

    # Only the ticker is reassigned, so it is the only global we need to declare
    global TICKER

    # Advance the ticker by a random number of seconds between 0 and sec
    TICKER += random.randint(0, sec)

    # Offset the base time by the total number of seconds elapsed
    current_time = BASE_TIME + timedelta(seconds=TICKER)

    return current_time

In [8]:
def random_amount(level=1, limit=0):
    """
        Generate a random amount.

        @param level: The level of the account.
        @param limit: The maximum amount to generate.

        @return: A random amount.
    """

    # Set the bounds for amount
    max_amount = limit if limit else 10000 * (10 ** level)
    min_amount = max_amount / (100)

    # Generate a random amount and round it to 2 decimal places to mimic money.
    amount = round(np.random.uniform(min_amount, max_amount), 2)

    return amount
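
To see what ranges this produces, the bounds can be computed without drawing any random numbers. The sketch below copies the two bound expressions from `random_amount` into a hypothetical helper, `amount_bounds`. Each KYC level multiplies the ceiling by ten, and the floor is always 1% of the ceiling, so level 1 spans 1,000 to 100,000 and level 4 spans 1,000,000 to 100,000,000. The `MIN_AMOUNT` constant is not consulted by these expressions.

```python
def amount_bounds(level=1, limit=0):
    """Return the (min, max) range that random_amount draws from (hypothetical helper)."""
    max_amount = limit if limit else 10000 * (10 ** level)
    min_amount = max_amount / 100
    return min_amount, max_amount

# Bounds for each KYC level when no explicit limit is given
for level in [1, 2, 3, 4]:
    low, high = amount_bounds(level)
    print(f"KYC {level}: {low:,.2f} to {high:,.2f}")

# An explicit limit overrides the level entirely
print(amount_bounds(level=3, limit=5000))
```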

In [9]:
def random_location(lat=None, lon=None):
    """
        Select a random location from location_df.

        If both lat and lon are given, prefer locations within a randomly
        chosen radius of that point; otherwise pick from all locations.

        @params lat: The latitude
        @params lon: The longitude

        @returns a random location (state, latitude, longitude)
    """

    # Pick a random search radius; small radii are far more likely than large ones
    limits = [1, 10, 100, 1000, 10000]
    radius = random.uniform(0, random.choices(limits, [1, .5, .1, .05, .001], k=1)[0])
    locations = location_df

    if lat is not None and lon is not None:
        # Distance from the given point to every known location
        distances = locations.apply(
            lambda location: 
            tracker.distance(
                latA=lat,
                lonA=lon,
                latB=location['latitude'],
                lonB=location['longitude']
            ),
            axis=1
        )

        # Keep only the locations within the radius, if there are any
        nearby = locations.loc[distances <= radius]
        if not nearby.empty:
            locations = nearby

    # Pick one location from the candidates (all locations if none were nearby)
    return locations.sample(n=1).squeeze()
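
The row-wise `apply` above calls `tracker.distance` once per location. `tracker` is a project module whose implementation is not shown here, so the following is only a sketch under the assumption that it returns a great-circle distance in kilometres. Under that assumption, the same radius filter can be computed for all rows at once with NumPy. `haversine_km` and the three-row sample frame are hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; kilometres are an assumed unit

def haversine_km(lat, lon, lats, lons):
    """Great-circle distance from one point to arrays of points, in km."""
    lat, lon = np.radians(lat), np.radians(lon)
    lats, lons = np.radians(lats), np.radians(lons)
    a = (np.sin((lats - lat) / 2) ** 2
         + np.cos(lat) * np.cos(lats) * np.sin((lons - lon) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Made-up points along the equator: 0, 1 and 10 degrees of longitude away
sample_df = pd.DataFrame({
    'state': ['A', 'B', 'C'],
    'latitude': [0.0, 0.0, 0.0],
    'longitude': [0.0, 1.0, 10.0],
})

# One vectorized call replaces the per-row apply
distances = haversine_km(
    0.0, 0.0,
    sample_df['latitude'].to_numpy(),
    sample_df['longitude'].to_numpy(),
)

# Same filter as in random_location, with a fixed 200 km radius
nearby = sample_df.loc[distances <= 200]
print(nearby['state'].tolist())  # ['A', 'B']
```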


In [10]:
def generate_nonce():
    """
        Generate a random nonce.

        @return: A random nonce.
    """

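    # Use integer bounds so the nonce always has exactly 21 digits;
    # float bounds such as 1e20 lose precision and are rejected by newer Python versions.
    return random.randint(10**20, 10**21 - 1)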
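
A note on the bounds: floats near `1e21` are spaced far more than 1 apart, so an expression like `1e21 - 1` evaluates to exactly `1e21`. Integer bounds such as `10**21 - 1` do not have this problem and always give a 21-digit value. A small check, where `nonce_from_ints` is a hypothetical name:

```python
import random

# Float arithmetic cannot represent the neighbour of 1e21
print(1e21 - 1 == 1e21)  # True

def nonce_from_ints():
    """Draw a 21-digit nonce using exact integer bounds (hypothetical helper)."""
    return random.randint(10**20, 10**21 - 1)

nonce = nonce_from_ints()
print(len(str(nonce)))  # 21
```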

In [11]:
def generate_bank_devices():
    """
        Generate the banks and assign each a random number of ATM devices.

        @return: A DataFrame containing bank device information.
    """

    # Initialize a random list of bank devices
    devices = []

    # Generate based on the set number of banks
    for i in range(NUM_BANKS):
        # Set the unique identifier for each bank
        bank_id = f"BANK_{(i):05}"

        # Assign bank a random number of devices
        num_devices = random.randint(1, 3)

        # Generate devices for each bank
        for _ in range(num_devices):
            # Assign bank device a random location
            location = random_location()

            # Set a unique identifier for the device
            marker = len(devices) + 1
            device_id = f"ATM_{marker:010}"

            # Set device details
            device = {
                'device_id': device_id,
                'bank_id': bank_id,
                **location
            }

            # Add device to the general banks device list
            devices.append(device)

        transactions[bank_id] = pd.DataFrame()

    return pd.DataFrame(devices).set_index('device_id')

# Generate banks data
bank_device_df = generate_bank_devices()

# Preview bank devices data
bank_device_df.head()

device_id,bank_id,state,latitude,longitude
ATM_0000000001,BANK_00000,Kogi,10.073038,3.1714
ATM_0000000002,BANK_00000,Sokoto,12.98202,5.230135
ATM_0000000003,BANK_00000,Anambra,10.675228,4.936083
ATM_0000000004,BANK_00001,Ebonyi,6.405663,3.580839
ATM_0000000005,BANK_00001,Ondo,8.425542,5.977803


In [12]:
def add_transaction(transaction):
    """
        Adds a transaction to the DataFrame of the holder's bank

        @params transaction: The transaction to add
    """

    # Add transaction
    global transactions
    bank_id = transaction.get('holder_bank')
    transactions[bank_id] = pd.concat([transactions[bank_id], pd.DataFrame([transaction])], ignore_index=True)
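
`pd.concat` builds a new frame on every call, so appending one row at a time gets slower as a bank's history grows. One alternative, sketched below under the assumption that the per-bank frames are only needed once the simulation has finished, is to buffer plain dictionaries and build each DataFrame a single time. The names `transaction_rows`, `buffer_transaction` and `build_frames` are hypothetical.

```python
from collections import defaultdict
import pandas as pd

# One list of row dictionaries per bank
transaction_rows = defaultdict(list)

def buffer_transaction(transaction):
    """Store the row under its holder bank; appending to a list is cheap."""
    transaction_rows[transaction['holder_bank']].append(transaction)

def build_frames():
    """Convert every buffered list into a DataFrame in one step."""
    return {bank_id: pd.DataFrame(rows) for bank_id, rows in transaction_rows.items()}

# Made-up rows for illustration
buffer_transaction({'holder_bank': 'BANK_00000', 'amount': 100.0, 'type': 'CREDIT'})
buffer_transaction({'holder_bank': 'BANK_00000', 'amount': 40.0, 'type': 'DEBIT'})
buffer_transaction({'holder_bank': 'BANK_00001', 'amount': 75.0, 'type': 'CREDIT'})

frames = build_frames()
print({bank_id: len(frame) for bank_id, frame in frames.items()})
```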

In [None]:
def new_account(user_id):
    """
        Generates a new account belonging to the user with provided user_id in a random bank.

        @params user_id: The id (BVN) of the user
    """
    global accounts_df

    # Get the user
    user = users_df.loc[user_id]

    # Set a unique identifier for each account
    account_no = f"ACC_{(len(accounts_df) + 1):010}"

    # Set a random kyc level for account
    level = random.choices([1, 2, 3, 4], [1, 1, 1, .2], k=1)[0]

    # Set location for where this account is opened
    location = user[['state', 'latitude', 'longitude']] if random.random() > .3 else random_location(user['latitude'], user['longitude'])

    # Open this account in a random bank
    bank_id = random.choice(bank_device_df['bank_id'].unique())

    # Set a random amount as the opening amount based on the account's kyc level
    opening_balance = random_amount(level)

    device = random.choice(user['devices'])

    # Initialize account with basic information
    account = {
        'account_no': account_no,
        'balance': opening_balance,
        'kyc': level,
        'bvn': user_id,
        'bank_id': bank_id,
        'merchant': random.random() > 0.9,
        'opening_device': device
    }

    # Set the details for first transaction
    opening_transaction = {
        'amount': opening_balance,
        'balance': opening_balance,
        'time': tick(5),
        'holder': account['account_no'],
        'holder_bvn': account['bvn'],
        'holder_bank': account['bank_id'],
        'related': account['bank_id'],
        'related_bvn': account['bank_id'],
        'related_bank': account['bank_id'],
        **location,
        'status': 'SUCCESS',
        'type': 'CREDIT',
        'category': 'OPENING',
        'channel': 'APP',
        'device': device,
        'nonce': generate_nonce(),
        'reported': False
    }

    # Add account to the account dataframe
    accounts_df = pd.concat([accounts_df, pd.DataFrame([account]).set_index('account_no')])

    # Add transaction to the transaction dataframe
    add_transaction(opening_transaction)

In [14]:
def new_user():
    """
        Generates a new user
    """

    # Use the global users dataset
    global users_df

    # Set the unique identifier for each user
    user_marker = len(users_df) + 1
    user_id = f"USER_{(user_marker):012}"

    # Assign user a random number of devices
    num_devices = random.randint(1, 2)

    # Assign user a random location
    location = random_location()

    # Initialize an empty device list to this user
    user_devices = []

    # Generate devices for each user
    for _ in range(num_devices):
        # Set a unique identifier for the device
        device_marker = len(user_devices) + 1
        device_id = f"MOBILE_{user_marker:05}_{device_marker:05}"

        # Add device to the specific user device list
        user_devices.append(device_id)

    # Set user details
    user = {
        'user_id': user_id,
        'devices': user_devices,
        **location
    }

    # Add user to the users dataframe
    users_df = pd.concat([users_df, pd.DataFrame([user]).set_index('user_id')])

    # Create an account for this user
    new_account(user_id)

In [15]:
def generate_users():
    """
        Generate NUM_USERS users, each with a random number of devices and an initial account.

        The users are added to the global users_df; nothing is returned.
    """

    # Generate based on the set number of users
    for _ in range(NUM_USERS):
        new_user()

# Generate users data
generate_users()

# Preview users data
users_df.head()

user_id,devices,state,latitude,longitude
USER_000000000001,"[MOBILE_00001_00001, MOBILE_00001_00002]",Plateau,7.351898,4.05921
USER_000000000002,[MOBILE_00002_00001],Bayelsa,7.538992,4.747898
USER_000000000003,[MOBILE_00003_00001],Ogun,10.907129,3.678199
USER_000000000004,"[MOBILE_00004_00001, MOBILE_00004_00002]",Benue,10.888057,4.366624
USER_000000000005,[MOBILE_00005_00001],Niger,6.463792,7.53153


In [16]:
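def random_account(exclude):
    """
        Select a random account from accounts_df, favouring higher KYC levels.

        @params exclude: The account number(s) to leave out, e.g. the sender

        @return: An account
    """
    # Draw a minimum KYC level; higher thresholds are more likely
    level = random.choices([1, 2, 3, 4], [1, 2, 3, 4], k=1)[0]

    # Remove the excluded account(s), then keep accounts at or above the threshold
    candidates = accounts_df.drop(exclude)
    candidates = candidates.loc[candidates['kyc'] >= level]

    # Select a random account
    return candidates.sample(n=1).squeeze()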
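
Because the threshold `level` is drawn with weights `[1, 2, 3, 4]`, higher KYC accounts pass the `kyc >= level` filter far more often than lower ones. The sketch below is only arithmetic on those weights: for each KYC level it computes the share of draws in which an account of that level is eligible. The chance of one specific account being picked also depends on how many accounts share each level, which is not modelled here.

```python
from fractions import Fraction

levels = [1, 2, 3, 4]
weights = [1, 2, 3, 4]  # same weights as in random_account
total = sum(weights)

# An account with a given kyc is eligible whenever the drawn level is <= kyc
eligibility = {
    kyc: Fraction(sum(w for lvl, w in zip(levels, weights) if lvl <= kyc), total)
    for kyc in levels
}

for kyc, p in eligibility.items():
    print(f"kyc={kyc}: eligible in {float(p):.0%} of draws")
```

So a level 1 account is only a candidate in 10% of draws, while a level 4 account is always a candidate.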

In [17]:
def random_merchant(lat, lon):
    """
        Select a random merchant from the merchants DataFrame

        @params lat: The latitude
        @params lon: The longitude

        @returns: a random account that is a merchant within the set location(lon, lat)
    """

    # Get the list of merchants
    merchants_list = accounts_df[accounts_df['merchant']]

    # Get the users who are merchants
    merchant_users_list = users_df[users_df.index.isin(merchants_list['bvn'])].copy()

    # Pick radius
    limits = [1, 10, 100, 1000, 10000]
    radius = random.uniform(0, random.choices(limits, [1, .5, .1, .05, .001], k=1)[0])

    # Calculate the distance of merchants
    merchant_users_list['distance'] = merchant_users_list.apply(
        lambda merchant: 
        tracker.distance(
            latA=lat,
            lonA=lon,
            latB=merchant['latitude'],
            lonB=merchant['longitude']
        ),
        axis=1
    )

    # Filter by radius
    nearby = merchant_users_list.loc[merchant_users_list['distance'] <= radius]

    # Select a merchant
    if not nearby.empty:
        merchant_id = nearby.sample(n=1).squeeze().name
        merchants_list = merchants_list[merchants_list['bvn'] == merchant_id]

    merchant = merchants_list.sample(n=1).squeeze()

    # Final merchant details
    device_id = f'POS_{merchant.name.split("_")[-1]}'
    user_id = merchant['bvn']
    user = users_df[users_df.index == user_id].squeeze()

    return {
        'state': user['state'],
        'latitude': user['latitude'],
        'longitude': user['longitude'],
        'device_id': device_id,
        'account_no': merchant.name,
        'bvn': user_id,
        'bank_id': merchant['bank_id'],
        'balance': merchant['balance']
    }

In [18]:
def random_atm(lat, lon):
    """
        Select a random bank device from bank_device_df

        @params lat: The latitude
        @params lon: The longitude

        @returns: a random bank device (ATM), preferably within a random radius of the given location (lat, lon)
    """

    # Pick radius
    limits = [1, 10, 100, 1000, 10000]
    radius = random.uniform(0, random.choices(limits, [1, .5, .1, .05, .001], k=1)[0])

    # Calculate the distance to each bank device
    distances = bank_device_df.apply(
        lambda bank_device: 
        tracker.distance(
            latA=lat,
            lonA=lon,
            latB=bank_device['latitude'],
            lonB=bank_device['longitude']
        ),
        axis=1
    )

    # Filter by radius
    candidates = bank_device_df.loc[distances <= radius]

    # Select a bank device
    if candidates.empty:
        candidates = bank_device_df
    
    bank_device = candidates.sample(n=1).squeeze()

    return {
        'state': bank_device['state'],
        'latitude': bank_device['latitude'],
        'longitude': bank_device['longitude'],
        'device_id': bank_device.name,
        'account_no': bank_device.name,
        'bvn': bank_device.name,
        'bank_id': bank_device['bank_id'],
    }
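
`random_location`, `random_merchant` and `random_atm` all pick their search radius with the same two lines. A sketch of that step as a shared helper is shown below, together with the probability of each upper limit being chosen. `pick_radius` and the constant names are hypothetical, and the unit of the radius depends on what `tracker.distance` returns, which is not shown in this notebook. With these weights the limits are chosen roughly 61%, 30%, 6%, 3% and well under 1% of the time, so most searches stay very local.

```python
import random

RADIUS_LIMITS = [1, 10, 100, 1000, 10000]
RADIUS_WEIGHTS = [1, .5, .1, .05, .001]

def pick_radius():
    """Pick an upper limit by weight, then a radius uniformly below it."""
    limit = random.choices(RADIUS_LIMITS, RADIUS_WEIGHTS, k=1)[0]
    return random.uniform(0, limit)

# Normalise the weights to see how often each limit is selected
total_weight = sum(RADIUS_WEIGHTS)
probabilities = [weight / total_weight for weight in RADIUS_WEIGHTS]

for limit, p in zip(RADIUS_LIMITS, probabilities):
    print(f"limit {limit:>5}: chosen {p:.1%} of the time")
```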


## Simulation Functions

We will be using the following functions to simulate the real world.

- `add_transaction`: Add a transaction to the transactions dataframe

- `debit_account`: Debits an account

- `credit_account`: Credits an account

- `atm_withdrawal`: Simulates a card withdrawal at an ATM

- `atm_payment`: Simulates a card payment at an ATM, such as bills and shopping

- `atm_deposit`: Simulates a deposit at an ATM

- `pos_payment`: Simulates a card payment at a merchant's POS, such as bills and shopping

- `pos_withdrawal`: Simulates a card withdrawal at a merchant's POS

- `mobile_transfer`: Simulates a mobile transfer using either USSD or APP

- `generate_event`: Simulates a random event (Withdrawal or Payment or Transfer)

- `simulate`: Simulates the banking process through time

In [19]:
def debit_account(holder, related, amount, device_id, location, category, channel, nonce):
    """
        Debit an account

        @params holder: The details of the account the money is leaving
        @params related: The details of the account the money is going to
        @params amount: The amount to be debited
        @params device_id: The device used for the transaction
        @params location: The location of the transaction
        @params category: The category of the transaction
        @params channel: The channel used for the transaction
        @params nonce: The nonce of the transaction

        @return: The debit transaction details
    """

    # Determine if the transaction will be successful, randomly. All reversals must be successful.
    status = 'SUCCESS' if category == 'REVERSAL' else random.choices(['SUCCESS', 'FAILED'], [0.7, 0.3], k=1)[0]

    # Get the balance of the account after the transaction
    balance = round(holder['balance'] - amount, 2) if status == 'SUCCESS' else holder['balance']

    # Transaction fails if the balance is insufficient; the balance then stays unchanged.
    if balance < 0:
        status = 'FAILED'
        balance = holder['balance']
    else:
        # Update user account balance
        accounts_df.loc[accounts_df.index == holder['account_no'], 'balance'] = balance

    # Randomly report this transaction
    reported = random.random() > .8 if status == 'SUCCESS' else False

    return {
        'amount': amount,
        'balance': balance,
        'time': tick(),
        'holder': holder['account_no'],
        'holder_bvn': holder['bvn'],
        'holder_bank': holder['bank_id'],
        'related': related['account_no'],
        'related_bvn': related['bvn'],
        'related_bank': related['bank_id'],
        **location,
        'channel': channel,
        'device': device_id,
        'status': status,
        'category': category,
        'type': 'DEBIT',
        'nonce': nonce,
        'reported': reported
    }
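
The balance rule in `debit_account` can be isolated as a pure function, which makes its three outcomes easy to check. This is a sketch of the intended rule, in which a failed debit, whether declined or short of funds, leaves the account balance untouched. The names `settle_debit` and `approved` are hypothetical; `approved` stands in for the random 70/30 draw.

```python
def settle_debit(balance, amount, approved=True):
    """Return (status, new_balance) for a debit attempt (hypothetical helper)."""
    # Declined by the random draw: nothing changes
    if not approved:
        return 'FAILED', balance

    new_balance = round(balance - amount, 2)

    # Insufficient funds: the debit fails and the balance is kept
    if new_balance < 0:
        return 'FAILED', balance

    return 'SUCCESS', new_balance

print(settle_debit(1000.00, 250.50))                  # ('SUCCESS', 749.5)
print(settle_debit(1000.00, 250.50, approved=False))  # ('FAILED', 1000.0)
print(settle_debit(100.00, 250.50))                   # ('FAILED', 100.0)
```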

In [20]:
def credit_account(holder, related, amount, device_id, location, category, channel, nonce):
    """
        Credit an account

        @params holder: The details of the account receiving the money
        @params related: The details of the account the money is coming from
        @params amount: The amount to be credited
        @params device_id: The device used for the transaction
        @params location: The location of the transaction
        @params category: The category of the transaction
        @params channel: The channel used for the transaction
        @params nonce: The nonce of the transaction

        @return: The credit transaction details
    """

    # Get the balance of the account after the transaction, rounded to 2 decimal places like debits
    balance = round(holder['balance'] + amount, 2)

    # Update the account balance
    accounts_df.loc[accounts_df.index == holder['account_no'], 'balance'] = balance

    # Randomly report this transaction
    reported = random.random() > .8

    return {
        'amount': amount,
        'balance': balance,
        'time': tick(),
        'holder': holder['account_no'],
        'holder_bvn': holder['bvn'],
        'holder_bank': holder['bank_id'],
        'related': related['account_no'],
        'related_bvn': related['bvn'],
        'related_bank': related['bank_id'],
        **location,
        'channel': channel,
        'device': device_id,
        'status': 'SUCCESS',
        'category': category,
        'type': 'CREDIT',
        'nonce': nonce,
        'reported': reported
    }

In [21]:
def atm_withdrawal(holder, amount, nonce, reverse):
    """
        Simulates card withdrawal transactions

        @params holder: The details of the account the money is leaving
        @params amount: The amount to be debited
        @params nonce: The nonce of the transaction
        @params reverse: Whether the transaction should be reversed afterwards
    """

    # Select a random bank device for this transaction
    bank_device = random_atm(holder['latitude'], holder['longitude'])

    # Get the device id
    device_id = bank_device['device_id']

    # Get the device location
    location = {
        'state': bank_device['state'],
        'longitude': bank_device['longitude'],
        'latitude': bank_device['latitude']
    }

    # Set the related party of this transaction (the ATM and its bank)
    related = {
        'account_no': bank_device['account_no'],
        'bvn': bank_device['bvn'],
        'bank_id': bank_device['bank_id']
    }

    # Debit the holder and add transaction
    debit = debit_account(holder=holder, related=related, location=location, device_id=device_id, amount=amount, category='WITHDRAWAL', channel='CARD', nonce=nonce)
    add_transaction(debit)

    # Reverse the withdrawal if it succeeded and a reversal was requested
    if (debit['status'] == 'SUCCESS') and reverse:
        # Simulate time passing
        tick(60 * 10)

        # Credit the related account and add the transaction
        credit = credit_account(holder={**holder, 'balance': debit['balance']}, related=related, location=location, device_id=device_id, amount=amount, category='REVERSAL', channel='CARD', nonce=nonce)
        add_transaction(credit)

In [22]:
def atm_deposit(holder, amount, nonce, reverse):
    """
        Simulates card deposit transactions

        @params holder: The details of the account receiving the money
        @params amount: The amount to be credited
        @params nonce: The nonce of the transaction
        @params reverse: Unused for deposits
    """

    # Select a random bank device for this transaction
    bank_device = random_atm(holder['latitude'], holder['longitude'])

    # Get the device id
    device_id = bank_device['device_id']

    # Get the device location
    location = {
        'state': bank_device['state'],
        'longitude': bank_device['longitude'],
        'latitude': bank_device['latitude']
    }

    # Set the related party of this transaction (the ATM and its bank)
    related = {
        'account_no': bank_device['account_no'],
        'bvn': bank_device['bvn'],
        'bank_id': bank_device['bank_id']
    }

    # Credit the holder and add transaction
    credit = credit_account(holder=holder, related=related, location=location, device_id=device_id, amount=amount, category='DEPOSIT', channel='CARD', nonce=nonce)
    add_transaction(credit)

In [23]:
def atm_payment(holder, amount, nonce, reverse):
    """
        Simulates card payment transactions

        @params holder: The details of the account the money is leaving
        @params amount: The amount to be debited
        @params nonce: The nonce of the transaction
        @params reverse: A random chance of reversing the transaction
    """

    # Select a random bank device for this transaction
    bank_device = random_atm(holder['latitude'], holder['longitude'])

    # Get the device id
    device_id = bank_device['device_id']

    # Get the device location
    location = {
        'state': bank_device['state'],
        'longitude': bank_device['longitude'],
        'latitude': bank_device['latitude']
    }

    # Select a random recipient account
    account = random_account(exclude=holder['account_no'])

    # Set the related account details
    related = {
        'account_no': account.name,
        'bvn': account['bvn'],
        'bank_id': account['bank_id'],
        'balance': account['balance']
    }

    # Set the category of the transaction randomly
    category = random.choice(['PAYMENT', 'BILL'])

    # Debit the holder and update the transactions dataframe
    debit = debit_account(holder=holder, related=related, location=location, device_id=device_id, amount=amount, category=category, channel='CARD', nonce=nonce)
    add_transaction(debit)

    # Was the transaction a success; balance the books.
    if (debit['status'] == 'SUCCESS'):
        # Simulate time passing
        tick(60 * 2)

        # Credit the related account and update the transactions dataframe
        credit = credit_account(holder=related, related=holder, location=location, device_id=device_id, amount=amount, category=category, channel='CARD', nonce=nonce)
        add_transaction(credit)

        # For some reason a reversal was initiated.
        if reverse:
            # Time passes
            tick(60 * 10)

            # Reverse the debit and update transactions dataframe
            debit_reversal = credit_account(holder={**holder, 'balance': debit['balance']}, related=related, location=location, device_id=device_id, amount=amount, category='REVERSAL', channel='CARD', nonce=nonce)
            add_transaction(debit_reversal)

            # Reverse the credit and update transactions dataframe
            credit_reversal = debit_account(holder={**related, 'balance': credit['balance']}, related=holder, location=location, device_id=device_id, amount=amount, category='REVERSAL', channel='CARD', nonce=nonce)
            add_transaction(credit_reversal)
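
Every leg of one event (the debit, the matching credit, and any reversals) carries the same `nonce`, so the legs can be regrouped afterwards. One consistency check, sketched below on hand-made rows, is that the signed amounts of the successful legs of a reversed payment sum to zero for each account. The rows and the `signed` column are illustrative assumptions. In this notebook the legs are stored per bank, so the per-bank frames would first need to be combined.

```python
import pandas as pd

# Made-up legs: nonce 1 is a payment that was fully reversed, nonce 2 is a failed debit
legs = pd.DataFrame([
    {'nonce': 1, 'holder': 'ACC_A', 'type': 'DEBIT',  'category': 'PAYMENT',  'amount': 500.0, 'status': 'SUCCESS'},
    {'nonce': 1, 'holder': 'ACC_B', 'type': 'CREDIT', 'category': 'PAYMENT',  'amount': 500.0, 'status': 'SUCCESS'},
    {'nonce': 1, 'holder': 'ACC_A', 'type': 'CREDIT', 'category': 'REVERSAL', 'amount': 500.0, 'status': 'SUCCESS'},
    {'nonce': 1, 'holder': 'ACC_B', 'type': 'DEBIT',  'category': 'REVERSAL', 'amount': 500.0, 'status': 'SUCCESS'},
    {'nonce': 2, 'holder': 'ACC_A', 'type': 'DEBIT',  'category': 'BILL',     'amount': 120.0, 'status': 'FAILED'},
])

# Failed legs never moved money, so leave them out
successful = legs[legs['status'] == 'SUCCESS'].copy()

# Credits count as positive, debits as negative
successful['signed'] = successful['amount'].where(
    successful['type'] == 'CREDIT', -successful['amount']
)

# Net movement per event and account; a fully reversed event nets to zero
net = successful.groupby(['nonce', 'holder'])['signed'].sum()
print(net)
```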

In [24]:
def pos_withdrawal(holder, amount, nonce, reverse):
    """
        Simulates card withdrawal transactions at a merchant's POS

        @params holder: The details of the account the money is leaving
        @params amount: The amount to be debited
        @params nonce: The nonce of the transaction
        @params reverse: A random chance of reversing the transaction
    """

    # Select a random merchant for this transaction
    merchant = random_merchant(holder['latitude'], holder['longitude'])

    # Get the device id
    device_id = merchant['device_id']

    # Get the device location
    location = {
        'state': merchant['state'],
        'longitude': merchant['longitude'],
        'latitude': merchant['latitude']
    }

    # Set the related party of this transaction (the merchant's account)
    related = {
        'account_no': merchant['account_no'],
        'bvn': merchant['bvn'],
        'bank_id': merchant['bank_id'],
        'balance': merchant['balance']
    }

    # Debit the holder and update the transactions dataframe
    debit = debit_account(holder=holder, related=related, location=location, device_id=device_id, amount=amount, category='WITHDRAWAL', channel='CARD', nonce=nonce)
    add_transaction(debit)

    # Was the transaction a success; balance the books.
    if (debit['status'] == 'SUCCESS'):
        # Simulate time passing
        tick(60 * 2)

        # Credit the related account and update the transactions dataframe
        credit = credit_account(holder=related, related=holder, location=location, device_id=device_id, amount=amount, category='DEPOSIT', channel='CARD', nonce=nonce)
        add_transaction(credit)

        # For some reason a reversal was initiated.
        if reverse:
            # Time passes
            tick(60 * 10)

            # Reverse the debit and update transactions dataframe
            debit_reversal = credit_account(holder={**holder, 'balance': debit['balance']}, related=related, location=location, device_id=device_id, amount=amount, category='REVERSAL', channel='CARD', nonce=nonce)
            add_transaction(debit_reversal)

            # Reverse the credit and update transactions dataframe
            credit_reversal = debit_account(holder={**related, 'balance': credit['balance']}, related=holder, location=location, device_id=device_id, amount=amount, category='REVERSAL', channel='CARD', nonce=nonce)
            add_transaction(credit_reversal)

In [25]:
def pos_payment(holder, amount, nonce, reverse):
    """
        Simulates card payment transactions

        @params holder: The details of the account the money is leaving
        @params amount: The amount to be debited
        @params nonce: The nonce of the transaction
        @params reverse: A random chance of reversing the transaction
    """

    # Select a random merchant for this transaction
    merchant = random_merchant(holder['latitude'], holder['longitude'])

    # Get the device id
    device_id = merchant['device_id']

    # Get the device location
    location = {
        'state': merchant['state'],
        'longitude': merchant['longitude'],
        'latitude': merchant['latitude']
    }

    # Set the related party of this transaction (the merchant's account)
    related = {
        'account_no': merchant['account_no'],
        'bvn': merchant['bvn'],
        'bank_id': merchant['bank_id'],
        'balance': merchant['balance']
    }

    # Set the category of the transaction randomly
    category = random.choice(['PAYMENT', 'BILL'])

    # Debit the holder and update the transactions dataframe
    debit = debit_account(holder=holder, related=related, location=location, device_id=device_id, amount=amount, category=category, channel='CARD', nonce=nonce)
    add_transaction(debit)

    # Was the transaction a success; balance the books.
    if (debit['status'] == 'SUCCESS'):
        # Simulate time passing
        tick(60 * 2)

        # Credit the related account and update the transactions dataframe
        credit = credit_account(holder=related, related=holder, location=location, device_id=device_id, amount=amount, category=category, channel='CARD', nonce=nonce)
        add_transaction(credit)

        # For some reason a reversal was initiated.
        if reverse:
            # Time passes
            tick(60 * 10)

            # Reverse the debit and update transactions dataframe
            debit_reversal = credit_account(holder={**holder, 'balance': debit['balance']}, related=related, location=location, device_id=device_id, amount=amount, category='REVERSAL', channel='CARD', nonce=nonce)
            add_transaction(debit_reversal)

            # Reverse the credit and update transactions dataframe
            credit_reversal = debit_account(holder={**related, 'balance': credit['balance']}, related=holder, location=location, device_id=device_id, amount=amount, category='REVERSAL', channel='CARD', nonce=nonce)
            add_transaction(credit_reversal)
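
The debit, credit, and reversal legs above all share one nonce, so a reversed payment should leave both balances where they started. The following is a minimal sketch of that invariant. The `post` and `pay` helpers and the in-memory `balances` and `ledger` structures are hypothetical stand-ins, not the notebook's `debit_account`, `credit_account`, or `add_transaction`.

```python
def post(balances, ledger, account, amount, kind, category, nonce):
    """Apply one ledger leg (hypothetical stand-in for debit_account/credit_account)."""
    if kind == 'DEBIT' and balances[account] < amount:
        # Insufficient funds: record the attempt but leave the balance untouched
        status = 'FAILED'
    else:
        balances[account] += amount if kind == 'CREDIT' else -amount
        status = 'SUCCESS'

    entry = {
        'account': account,
        'amount': amount,
        'type': kind,
        'category': category,
        'status': status,
        'nonce': nonce,
        'balance': balances[account],
    }
    ledger.append(entry)
    return entry


def pay(balances, ledger, payer, payee, amount, nonce, reverse=False):
    """Debit the payer, credit the payee, and optionally reverse both legs."""
    debit = post(balances, ledger, payer, amount, 'DEBIT', 'PAYMENT', nonce)
    if debit['status'] != 'SUCCESS':
        return

    post(balances, ledger, payee, amount, 'CREDIT', 'PAYMENT', nonce)

    if reverse:
        # The reversal mirrors the original legs under the same nonce
        post(balances, ledger, payer, amount, 'CREDIT', 'REVERSAL', nonce)
        post(balances, ledger, payee, amount, 'DEBIT', 'REVERSAL', nonce)


balances = {'customer': 1000.0, 'merchant': 500.0}
ledger = []

pay(balances, ledger, 'customer', 'merchant', 200.0, nonce='n1', reverse=True)
pay(balances, ledger, 'customer', 'merchant', 5000.0, nonce='n2')

print(balances)
print([(e['nonce'], e['type'], e['category'], e['status']) for e in ledger])
```

Grouping ledger rows by nonce is then enough to reconstruct each payment and check that its legs cancel out.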

In [26]:
def mobile_transfer(holder, amount, nonce, reverse):
    """
        Simulates mobile transfer transactions

        @params holder: The details of the account the money is leaving
        @params amount: The amount to be debited from the holder and credited to the recipient
        @params nonce: The nonce of the transaction
        @params reverse: Whether the transaction is reversed after it completes
    """

    # Look up the user record that matches the holder's BVN
    user = users_df[users_df.index == holder['bvn']].squeeze()

    # Use one of the user's own devices 5% of the time; otherwise use a device from a randomly sampled user
    device_id = random.choice(user['devices']) if random.random() >= .95 else random.choice(users_df.sample(n=1).squeeze()['devices'])

    # Set a location for the transaction (Randomly or User's Location)
    location = random_location(holder['latitude'], holder['longitude'])

    # Select a random recipient account
    account = random_account(exclude=holder['account_no'])

    # Set the related account details
    related = {
        'account_no': account.name,
        'bvn': account['bvn'],
        'bank_id': account['bank_id'],
        'balance': account['balance']
    }

    # Select a random channel for the transaction
    channel = random.choices(['APP', 'USSD'], [3, 1], k=1)[0]

    # Debit the holder and update the transactions dataframe
    debit = debit_account(holder=holder, related=related, location=location, device_id=device_id, amount=amount, category='WITHDRAWAL', channel=channel, nonce=nonce)
    add_transaction(debit)

    # Update the books if the debit was successful
    if (debit['status'] == 'SUCCESS'):
        tick(60 * 2)
        # Credit the related account and update the transactions dataframe
        credit = credit_account(holder=related, related=holder, location=location, device_id=device_id, amount=amount, category='DEPOSIT', channel=channel, nonce=nonce)
        add_transaction(credit)

        # Should the transaction be reversed
        if reverse:
            # Time passes
            tick(60 * 10)

            # Reverse the debit and update the transactions dataframe
            debit_reversal = credit_account(holder={**holder, 'balance': debit['balance']}, related=related, location=location, device_id=device_id, amount=amount, category='REVERSAL', channel=channel, nonce=nonce)
            add_transaction(debit_reversal)

            # Reverse the credit and update the transactions dataframe
            credit_reversal = debit_account(holder={**related, 'balance': credit['balance']}, related=holder, location=location, device_id=device_id, amount=amount, category='REVERSAL', channel=channel, nonce=nonce)
            add_transaction(credit_reversal)
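
`mobile_transfer` looks the user up with a boolean mask followed by `.squeeze()`. The sketch below, on a toy `users_df` with made-up rows, suggests how that compares with a plain `.loc` lookup: both return the same row for a known BVN, but the mask quietly yields an empty frame for an unknown one, where `.loc` would raise a `KeyError`.

```python
import pandas as pd

# Toy user table indexed by user_id (illustrative values only)
users_df = pd.DataFrame(
    {
        'state': ['Lagos', 'Kano'],
        'devices': [
            ['MOBILE_00001_00001'],
            ['MOBILE_00002_00001', 'MOBILE_00002_00002'],
        ],
    },
    index=pd.Index(['USER_000000000001', 'USER_000000000002'], name='user_id'),
)

bvn = 'USER_000000000002'

# Boolean mask keeps a one-row DataFrame; squeeze() collapses it to a Series
by_mask = users_df[users_df.index == bvn].squeeze()

# Label lookup returns the same row directly as a Series
by_label = users_df.loc[bvn]

# An unknown BVN produces an empty DataFrame instead of an error
missing = users_df[users_df.index == 'USER_UNKNOWN'].squeeze()

print(by_mask['devices'])
print(type(missing).__name__, len(missing))
```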

In [27]:
def take_loan(holder, amount, nonce, reverse):
    """
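        Simulates loan disbursement transactions

        @params holder: The details of the account receiving the loan
        @params amount: The amount to be credited
        @params nonce: The nonce of the transaction
        @params reverse: Unused; kept so that every event handler shares the same signature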
    """

    # Look up the user record that matches the holder's BVN
    user = users_df[users_df.index == holder['bvn']].squeeze()

    # Use one of the user's own devices 5% of the time; otherwise use a device from a randomly sampled user
    device_id = random.choice(user['devices']) if random.random() >= .95 else random.choice(users_df.sample(n=1).squeeze()['devices'])

    # Set a location for the transaction (Randomly or User's Location)
    location = random_location(holder['latitude'], holder['longitude'])

    # Look up the bank that holds the account; the bank itself is the lender
    holder_account = accounts_df.loc[holder['account_no']]
    bank_id = holder_account['bank_id']

    # Set the related account details
    related = {
        'account_no': bank_id,
        'bvn': bank_id,
        'bank_id': bank_id
    }

    # Select a random channel for the transaction
    channel = random.choices(['APP', 'USSD'], [3, 1], k=1)[0]

    # Credit the holder with the loan and update the transactions dataframe
    credit = credit_account(holder=holder, related=related, location=location, device_id=device_id, amount=amount, category='LOAN', channel=channel, nonce=nonce)
    add_transaction(credit)

In [28]:
# Possible types of events to simulate.
EVENTS = ['ATM_WITHDRAWAL', 'ATM_PAYMENT', 'ATM_DEPOSIT', 'POS_WITHDRAWAL', 'POS_PAYMENT', 'MOBILE_TRANSFER', 'TAKE_LOAN']
# Relative weight of each event, in the same order as EVENTS.
OCCURANCE = [1, 1, 2, 3, 3, 7, 3]

# Map each event name to the function that simulates it.
EVENT_MAPS = {
    'ATM_WITHDRAWAL': atm_withdrawal,
    'ATM_DEPOSIT': atm_deposit,
    'ATM_PAYMENT': atm_payment,
    'POS_WITHDRAWAL': pos_withdrawal,
    'POS_PAYMENT': pos_payment,
    'MOBILE_TRANSFER': mobile_transfer,
    'TAKE_LOAN': take_loan
}
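
The weights passed to `random.choices` are relative, not percentages. As a quick check, this sketch normalises the `EVENTS` and `OCCURANCE` lists copied from the cell above into the share of events each type should account for in the long run.

```python
EVENTS = ['ATM_WITHDRAWAL', 'ATM_PAYMENT', 'ATM_DEPOSIT', 'POS_WITHDRAWAL',
          'POS_PAYMENT', 'MOBILE_TRANSFER', 'TAKE_LOAN']
OCCURANCE = [1, 1, 2, 3, 3, 7, 3]

# random.choices divides each weight by the total, so only the ratios matter
total = sum(OCCURANCE)
shares = {event: weight / total for event, weight in zip(EVENTS, OCCURANCE)}

for event, share in shares.items():
    print(f'{event:<16} {share:.0%}')
```

With these weights, mobile transfers make up 7 of every 20 events on average, and the two ATM cash events together only 3 of 20.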

In [29]:
def generate_event():
    """
        Simulates events that can take place in the banking process
    """

    # Selects a random event
    event = random.choices(EVENTS, OCCURANCE, k=1)[0]

    # Selects a random account to initiate event
    account = accounts_df[accounts_df['balance'] >= MIN_AMOUNT].sample(n=1).squeeze()
    balance = account['balance']

    # Cap the amount at a randomly chosen share of the balance (smaller shares are more likely)
    spend_limit = random.choices([.1, .4, .7, 1], [.7, .4, .3, .1], k=1)[0]
    amount = random_amount(level=account['kyc'], limit=spend_limit * balance)

    # Generate transaction nonce
    nonce = generate_nonce()

    # Reverse roughly 10% of transactions
    reverse = random.random() > .9

    # Look up the account owner's user record
    user = users_df.loc[users_df.index == account['bvn']].squeeze()

    # Set the account holders details
    holder = {
        'account_no': account.name,
        'bvn': account['bvn'],
        'bank_id': account['bank_id'],
        'balance': account['balance'],
        'latitude': user['latitude'],
        'longitude': user['longitude']
    }

    # Play event
    EVENT_MAPS[event](holder, amount, nonce, reverse)

    # 5% of the time, either onboard a new user or open another account for the current user
    if random.random() >= .95:
        if random.random() > .5:
            new_user()
        else:
            new_account(account['bvn'])
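
The spend-limit weights in `generate_event` do not sum to 1, so it is not obvious how much of a balance a typical transaction may use. The sketch below normalises the weights and computes the expected cap as a fraction of the balance. It covers only the `limit` argument; any further adjustment inside `random_amount` (for example by KYC level) is outside this calculation.

```python
limits = [.1, .4, .7, 1]      # Share of the balance that may be spent
weights = [.7, .4, .3, .1]    # Relative weights passed to random.choices

# Normalise the weights into probabilities
total = sum(weights)
probabilities = [weight / total for weight in weights]

# Expected value of the cap, as a fraction of the account balance
expected_limit = sum(limit * p for limit, p in zip(limits, probabilities))

for limit, p in zip(limits, probabilities):
    print(f'limit {limit:>4.0%} of balance with probability {p:.3f}')
print(f'expected cap: {expected_limit:.2f} of balance')
```

On average the cap works out to about 36% of the balance, and nearly half of all events are capped at 10%.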

In [30]:
np.random.seed(seed=SEED)
period = 60 * 60 * 24  # Seconds in one simulated day

def simulate(duration):
    """
        Simulate banking process for a given duration

        @params duration: The duration of the simulation, in simulated seconds
    """

    milestone = 0

    # Run until the simulated clock reaches the requested duration
    while (duration > TICKER):
        start_time = time.perf_counter()

        # Account for sleep and low transaction volume: up to 06:59, generate an event
        # only 30% of the time and otherwise let 5 minutes pass
        if BASE_TIME.hour <= 6:
            if random.random() > .7:
                generate_event()
            else:
                tick(60 * 5)
        else:
            generate_event()

        end_time = time.perf_counter()
        progress = TICKER // period
        time_taken = (end_time - start_time) * 1000  # convert seconds to milliseconds

        if milestone < progress:
            milestone = progress
            print(f'Season: {milestone} Time: {time_taken}ms')


# Simulate for 1 year
duration = period * 365
simulate(duration)

Season: 1 Time: 4.1293749964097515ms
Season: 2 Time: 3.727417002664879ms
Season: 3 Time: 4.836833002627827ms
Season: 4 Time: 4.971125003066845ms
Season: 5 Time: 5.594542002654634ms
Season: 6 Time: 3.8776669971412048ms
Season: 7 Time: 6.732416994054802ms
Season: 8 Time: 5.116625005030073ms
Season: 9 Time: 5.072708998341113ms
Season: 10 Time: 4.047582988278009ms
Season: 11 Time: 3.4521250054240227ms
Season: 12 Time: 3.4545840026112273ms
Season: 13 Time: 7.708499993896112ms
Season: 14 Time: 5.405458010500297ms
Season: 15 Time: 7.193750003352761ms
Season: 16 Time: 5.476000005728565ms
Season: 17 Time: 6.195958005264401ms
Season: 18 Time: 7.177666004281491ms
Season: 19 Time: 6.205582991242409ms
Season: 20 Time: 4.394999996293336ms
Season: 21 Time: 5.5442919983761385ms
Season: 22 Time: 3.6593749973690137ms
Season: 23 Time: 7.49550000182353ms
Season: 24 Time: 5.3877079917583615ms
Season: 25 Time: 4.548707991489209ms
Season: 26 Time: 6.5152499882970005ms
Season: 27 Time: 4.559124994557351ms
Season: 28 Time: 7.58066598791
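
The loop in `simulate` runs on simulated time rather than wall-clock time: it keeps going until the simulated clock passes `duration`, and it slows activity down in the early hours. The following is a rough, self-contained sketch of that idea. `SimClock` and `run` are hypothetical names, and the assumption that every event advances the clock by two minutes is made only for illustration; the notebook's own `tick`, `TICKER`, and `BASE_TIME` are defined in an earlier cell and may behave differently.

```python
import random
from datetime import datetime, timedelta


class SimClock:
    """Hypothetical simulated clock that only moves when tick() is called."""

    def __init__(self, start):
        self.start = start
        self.elapsed = 0  # Simulated seconds since start

    @property
    def now(self):
        return self.start + timedelta(seconds=self.elapsed)

    def tick(self, seconds):
        self.elapsed += seconds


def run(clock, duration, rng):
    """Count events generated before the simulated clock reaches duration."""
    events = 0
    while clock.elapsed < duration:
        # Up to 06:59, idle for 5 minutes about 70% of the time
        if clock.now.hour <= 6 and rng.random() <= .7:
            clock.tick(60 * 5)
            continue

        # Assumption: each event advances the clock by 2 minutes
        clock.tick(60 * 2)
        events += 1
    return events


clock = SimClock(datetime(2024, 9, 7))
events = run(clock, duration=60 * 60 * 24, rng=random.Random(42))
print(f'{events} events in one simulated day, ending at {clock.now}')
```

Because the clock is decoupled from real time, a simulated year finishes as fast as the event handlers can run.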

## Saving the datasets

In [31]:
# Let's concatenate the transactions of all banks into one dataset.
transactions_df = pd.concat(transactions.values())
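
`transactions.values()` implies a dictionary of per-bank dataframes. One detail worth knowing is that `pd.concat` keeps each frame's own row labels unless told otherwise, as this small sketch with made-up frames illustrates.

```python
import pandas as pd

# Toy per-bank transaction frames (illustrative values only)
transactions = {
    'BANK_00000': pd.DataFrame({'amount': [100.0, 250.0], 'holder_bank': ['BANK_00000'] * 2}),
    'BANK_00001': pd.DataFrame({'amount': [75.5], 'holder_bank': ['BANK_00001']}),
}

# Default: the original row labels are kept, so labels repeat across banks
stacked = pd.concat(transactions.values())

# ignore_index=True assigns a fresh 0..n-1 index to the combined frame
renumbered = pd.concat(transactions.values(), ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```

Repeated labels are harmless for writing a CSV with `index=False`, but they matter for any later `.loc` lookup on the combined frame.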

In [38]:
transactions_df

,amount,balance,time,holder,holder_bvn,holder_bank,related,related_bvn,related_bank,state,latitude,longitude,status,type,category,channel,device,nonce,reported
0,32458.22,3.245822e+04,2024-09-07 09:42:16.133128,ACC_0000000020,USER_000000000020,BANK_00000,BANK_00000,BANK_00000,BANK_00000,Kaduna,10.541201,7.444884,SUCCESS,CREDIT,OPENING,APP,MOBILE_00020_00001,953858487001905987532,False
1,9060154.27,9.060154e+06,2024-09-07 09:49:25.133128,ACC_0000000196,USER_000000000196,BANK_00000,BANK_00000,BANK_00000,BANK_00000,Niger,6.465194,7.498886,SUCCESS,CREDIT,OPENING,APP,MOBILE_00196_00001,685440985787190852952,False
2,8953.68,8.953680e+03,2024-09-07 09:49:56.133128,ACC_0000000210,USER_000000000210,BANK_00000,BANK_00000,BANK_00000,BANK_00000,Akwa Ibom,6.560176,4.232969,SUCCESS,CREDIT,OPENING,APP,MOBILE_00210_00001,688258551699592298468,False
3,12889.66,1.288966e+04,2024-09-07 09:52:06.133128,ACC_0000000263,USER_000000000263,BANK_00000,BANK_00000,BANK_00000,BANK_00000,Katsina,12.987941,7.622800,SUCCESS,CREDIT,OPENING,APP,MOBILE_00263_00001,687216714536206082135,False
4,9724462.31,9.724462e+06,2024-09-07 09:52:57.133128,ACC_0000000281,USER_000000000281,BANK_00000,BANK_00000,BANK_00000,BANK_00000,Sokoto,13.009407,5.232351,SUCCESS,CREDIT,OPENING,APP,MOBILE_00281_00001,325635600518651313685,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25445,219808.23,1.864606e+08,2025-09-07 07:41:24.133128,ACC_0000026616,USER_000000001535,BANK_00049,ACC_0000012488,USER_000000006751,BANK_00026,Sokoto,13.003718,5.257789,SUCCESS,CREDIT,DEPOSIT,APP,MOBILE_17423_00001,879137369135146530641,False
25446,2614386.63,5.617064e+07,2025-09-07 07:41:28.133128,ACC_0000004154,USER_000000002601,BANK_00049,ACC_0000014180,USER_000000000594,BANK_00021,Kwara,10.881490,6.181557,FAILED,DEBIT,PAYMENT,CARD,ATM_0000000086,571444283623491117909,False
25447,216081.67,1.710556e+06,2025-09-07 09:07:45.133128,ACC_0000014380,USER_000000007688,BANK_00049,ATM_0000000012,ATM_0000000012,BANK_00006,Kwara,10.832767,6.161278,SUCCESS,CREDIT,DEPOSIT,CARD,ATM_0000000012,215928828255331533807,False
25448,3426.48,6.873774e+07,2025-09-07 09:20:12.133128,ACC_0000008987,USER_000000000591,BANK_00049,ACC_0000024209,USER_000000001098,BANK_00038,Benue,10.888057,4.366624,SUCCESS,CREDIT,DEPOSIT,USSD,MOBILE_09671_00001,307141949037618058941,False


In [33]:
transactions_df.to_csv("../datasets/transactions.csv", index=False)

In [34]:
bank_device_df.to_csv("../datasets/bank_devices.csv")

In [35]:
accounts_df = accounts_df.reset_index().merge(
    users_df.reset_index()[['state', 'latitude', 'longitude', 'devices', 'user_id']], 
    how='left', 
    left_on='bvn', 
    right_on='user_id'
).drop(columns='user_id').set_index('account_no')
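
The merge above attaches each owner's location and devices to their accounts. A minimal sketch of the same pattern on toy data follows. The `validate='many_to_one'` argument is an addition not present in the notebook: it asks pandas to raise if a `user_id` appears more than once on the right side, which would otherwise silently duplicate account rows.

```python
import pandas as pd

# Toy tables (illustrative values only); one user owns two accounts
accounts_df = pd.DataFrame(
    {'balance': [100.0, 50.0, 75.0], 'bvn': ['USER_1', 'USER_2', 'USER_1']},
    index=pd.Index(['ACC_1', 'ACC_2', 'ACC_3'], name='account_no'),
)
users_df = pd.DataFrame(
    {'state': ['Lagos', 'Kano']},
    index=pd.Index(['USER_1', 'USER_2'], name='user_id'),
)

enriched = (
    accounts_df.reset_index()
    .merge(
        users_df.reset_index()[['state', 'user_id']],
        how='left',               # Keep every account, even without a matching user
        left_on='bvn',
        right_on='user_id',
        validate='many_to_one',   # Many accounts per user, one row per user
    )
    .drop(columns='user_id')
    .set_index('account_no')
)

print(enriched)
```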

In [49]:
accounts_df

account_no,balance,kyc,bvn,bank_id,merchant,state,latitude,longitude,devices,opening_device
ACC_0000000001,203204.00,1,USER_000000000001,BANK_00010,False,Plateau,7.351898,4.059210,"[MOBILE_00001_00001, MOBILE_00001_00002]",MOBILE_00001_00001
ACC_0000000002,15654.96,1,USER_000000000002,BANK_00030,False,Bayelsa,7.538992,4.747898,[MOBILE_00002_00001],MOBILE_00002_00001
ACC_0000000003,392448.79,1,USER_000000000003,BANK_00034,False,Ogun,10.907129,3.678199,[MOBILE_00003_00001],MOBILE_00003_00001
ACC_0000000004,19740.26,1,USER_000000000004,BANK_00013,False,Benue,10.888057,4.366624,"[MOBILE_00004_00001, MOBILE_00004_00002]",MOBILE_00004_00002
ACC_0000000005,2100421.00,2,USER_000000000005,BANK_00040,False,Niger,6.463792,7.531530,[MOBILE_00005_00001],MOBILE_00005_00001
...,...,...,...,...,...,...,...,...,...,...
ACC_0000040990,82080.60,1,USER_000000004691,BANK_00000,False,Ondo,8.424752,5.939270,"[MOBILE_04691_00001, MOBILE_04691_00002]",MOBILE_04691_00002
ACC_0000040991,52570766.70,4,USER_000000020937,BANK_00038,False,Zamfara,6.181926,5.388727,[MOBILE_20937_00001],MOBILE_20937_00001
ACC_0000040992,71634.96,1,USER_000000020938,BANK_00026,False,FCT,9.192767,3.932866,"[MOBILE_20938_00001, MOBILE_20938_00002]",MOBILE_20938_00001
ACC_0000040993,290396.95,2,USER_000000010864,BANK_00038,False,Ekiti,6.894277,3.605152,"[MOBILE_10864_00001, MOBILE_10864_00002]",MOBILE_10864_00002


In [50]:
accounts_df.to_csv("../datasets/accounts.csv")
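
The `devices` column holds Python lists, and CSV has no list type, so each list is written as its string representation. A sketch of one way to restore the lists when reading the file back, using an in-memory buffer in place of the real file and a toy one-row frame:

```python
import ast
import io

import pandas as pd

# Toy accounts frame with a list-valued column (illustrative values only)
accounts_df = pd.DataFrame(
    {
        'balance': [203204.0],
        'devices': [['MOBILE_00001_00001', 'MOBILE_00001_00002']],
    },
    index=pd.Index(['ACC_0000000001'], name='account_no'),
)

# Write to an in-memory buffer instead of ../datasets/accounts.csv
buffer = io.StringIO()
accounts_df.to_csv(buffer)

# A plain read returns the list as text
buffer.seek(0)
raw = pd.read_csv(buffer, index_col='account_no')

# ast.literal_eval safely parses the text back into a Python list
buffer.seek(0)
parsed = pd.read_csv(buffer, index_col='account_no', converters={'devices': ast.literal_eval})

print(type(raw['devices'].iloc[0]).__name__)
print(parsed['devices'].iloc[0])
```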