
## `df_transactions` Generator

We’ll simulate:

* \~1.75 million transactions (5–30 per customer, about 17.5 on average)
* Across 500 merchants
* Over the past year
* With fraud flags, timestamps, transaction types, and channels

---

### Assumptions

* Each customer has between 5 and 30 transactions
* Each transaction is linked to a customer and a merchant
* Fraud is rare (\~1% of transactions are flagged)
* Channels: POS, Transfer, Web, Mobile
* Types: Deposit, Withdrawal, Transfer, Payment
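
As a quick check on these assumptions, the expected volume follows directly from the uniform 5–30 range. The sketch below assumes roughly 100,000 customers; that figure is not stated in this section and `n_customers` is only illustrative.

```python
# Expected transactions per customer when counts are drawn uniformly from 5..30
low, high = 5, 30
mean_txns = (low + high) / 2

# ASSUMPTION: about 100,000 customers (not stated in this section)
n_customers = 100_000
expected_rows = n_customers * mean_txns

# Expected number of fraud-flagged rows at a 1% flag rate
fraud_rate = 0.01
expected_fraud = expected_rows * fraud_rate

print(f"mean per customer: {mean_txns}")
print(f"expected rows:     {expected_rows:,.0f}")
print(f"expected fraud:    {expected_fraud:,.0f}")
```

Under that assumption the simulation should land near 1.75 million rows, which is close to the row count that `df_transactions.info()` reports at the end of this section.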

---



| Column             | Description                               |
| ------------------ | ----------------------------------------- |
| `transaction_id`   | UUID, unique                              |
| `customer_id`      | Foreign key                               |
| `merchant_id`      | Foreign key                               |
| `transaction_date` | Date/time in past year                    |
| `amount`           | Float, varies by type                     |
| `transaction_type` | deposit / withdrawal / payment / transfer |
| `channel`          | POS / Mobile / Web / Transfer             |
| `is_fraud`         | 0 or 1; \~1% of records are flagged       |
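
The column table can double as a lightweight schema check once the data exists. The helper below is a hypothetical sketch (`check_schema` is not part of the notebook); it only verifies the properties listed in the table.

```python
import pandas as pd

# Expected columns and allowed values, taken from the table above
EXPECTED_COLUMNS = [
    "transaction_id", "customer_id", "merchant_id", "transaction_date",
    "amount", "transaction_type", "channel", "is_fraud",
]
VALID_TYPES = {"deposit", "withdrawal", "payment", "transfer"}
VALID_CHANNELS = {"POS", "Mobile", "Web", "Transfer"}


def check_schema(df):
    """Return a list of problems found; an empty list means the frame looks OK."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if not df["transaction_id"].is_unique:
        problems.append("transaction_id is not unique")
    if not set(df["transaction_type"]) <= VALID_TYPES:
        problems.append("unexpected transaction_type value")
    if not set(df["channel"]) <= VALID_CHANNELS:
        problems.append("unexpected channel value")
    if not set(df["is_fraud"]) <= {0, 1}:
        problems.append("is_fraud must be 0 or 1")
    return problems


# Tiny hand-made frame for illustration only (not real data)
sample = pd.DataFrame({
    "transaction_id": ["t1", "t2"],
    "customer_id": ["c1", "c1"],
    "merchant_id": ["m1", "m2"],
    "transaction_date": pd.to_datetime(["2025-01-11", "2025-02-17"]),
    "amount": [250.0, 1800.5],
    "transaction_type": ["payment", "deposit"],
    "channel": ["POS", "Web"],
    "is_fraud": [0, 1],
})
print(check_schema(sample))
```

Calling `check_schema(df_transactions)` after generation would be one way to catch a typo in the config lists before the file is saved.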

---

### This Powers:

* LTV / CAC
* Churn timeline
* Funnel breakdowns
* Fraud model
* A/B campaign outcomes
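
To make one of these uses concrete, here is a minimal sketch of a churn-timeline style summary built from transaction rows. The sample data, the reporting date `as_of`, and the 90-day inactivity threshold are all assumptions made for illustration.

```python
import pandas as pd

# Tiny hand-made sample; customer IDs and dates are illustrative only
df_demo = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c2", "c3"],
    "transaction_date": pd.to_datetime([
        "2025-01-05", "2025-06-01", "2024-11-20", "2024-12-02", "2025-05-15",
    ]),
    "amount": [1200.0, 300.0, 5000.0, 250.0, 80.0],
})

as_of = pd.Timestamp("2025-06-30")  # hypothetical reporting date

# One row per customer: total value and most recent activity
summary = df_demo.groupby("customer_id").agg(
    total_amount=("amount", "sum"),
    last_txn=("transaction_date", "max"),
)
summary["days_inactive"] = (as_of - summary["last_txn"]).dt.days

# ASSUMPTION: 90 days without a transaction counts as churned
summary["churned"] = summary["days_inactive"] > 90
print(summary)
```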

---

### Dependencies

In [1]:
import pandas as pd
import numpy as np
import random
from faker import Faker
from datetime import datetime, timedelta

In [4]:
fake = Faker()
# Seed every random source so the run is reproducible.
# Faker keeps its own generator, so it needs a separate seed.
Faker.seed(42)
np.random.seed(42)
random.seed(42)

# Load your actual customer and merchant IDs
df_customers = pd.read_csv("customers.csv")
df_merchants = pd.read_csv("merchants.csv")
customer_ids = df_customers['customer_id'].tolist()
merchant_ids = df_merchants['merchant_id'].tolist()

In [None]:
# Config
transaction_types = ['deposit', 'withdrawal', 'payment', 'transfer']
channels = ['POS', 'Mobile', 'Web', 'Transfer']
transaction_data = []

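# For progress logging
n_customers = len(customer_ids)

for i, cust_id in enumerate(customer_ids, start=1):
    # Report progress every 10,000 customers
    if i % 10_000 == 0:
        print(f"Customer {i:,} of {n_customers:,}")
    num_txns = random.randint(5, 30)  # inclusive on both ends
    for _ in range(num_txns):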
        merchant_id = random.choice(merchant_ids)
        txn_date = fake.date_time_between(start_date='-1y', end_date='now')
        txn_type = random.choice(transaction_types)
        channel = random.choice(channels)

        # Amount logic
        if txn_type == 'deposit':
            amount = round(random.uniform(1000, 100000), 2)
        elif txn_type == 'withdrawal':
            amount = round(random.uniform(500, 50000), 2)
        else:  # payment or transfer
            amount = round(random.uniform(100, 25000), 2)

        # Fraud flag (1% chance)
        is_fraud = 1 if random.random() < 0.01 else 0

        transaction_data.append({
            'transaction_id': fake.uuid4(),
            'customer_id': cust_id,
            'merchant_id': merchant_id,
            'transaction_date': txn_date,
            'amount': amount,
            'transaction_type': txn_type,
            'channel': channel,
            'is_fraud': is_fraud
        })

# Create DataFrame
df_transactions = pd.DataFrame(transaction_data)
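
The row-by-row loop is easy to read but slow at this scale, since it makes several Python and Faker calls per transaction. A vectorized alternative is sketched below under stated assumptions: it swaps Faker for NumPy's `default_rng` and the standard-library `uuid` module, and it uses made-up `customer_ids` and `merchant_ids` in place of the CSV files. It produces the same columns and value ranges, not the same random values.

```python
import uuid

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical stand-ins for the IDs loaded from customers.csv / merchants.csv
customer_ids = np.array([f"cust-{i}" for i in range(1_000)])
merchant_ids = np.array([f"merch-{i}" for i in range(50)])

# 5-30 transactions per customer (the upper bound of integers() is exclusive)
txns_per_customer = rng.integers(5, 31, size=len(customer_ids))
n = int(txns_per_customer.sum())

txn_type = rng.choice(["deposit", "withdrawal", "payment", "transfer"], size=n)

# Per-type amount bounds, matching the loop above
low = np.where(txn_type == "deposit", 1000.0,
               np.where(txn_type == "withdrawal", 500.0, 100.0))
high = np.where(txn_type == "deposit", 100_000.0,
                np.where(txn_type == "withdrawal", 50_000.0, 25_000.0))
amount = np.round(rng.uniform(low, high), 2)

# Random timestamps within the past 365 days, at one-second resolution
end = pd.Timestamp.now().floor("s")
offsets = rng.integers(0, 365 * 24 * 3600, size=n)
txn_date = end - pd.to_timedelta(offsets, unit="s")

df_fast = pd.DataFrame({
    # uuid4() is not controlled by rng, so IDs differ between runs
    "transaction_id": [str(uuid.uuid4()) for _ in range(n)],
    "customer_id": np.repeat(customer_ids, txns_per_customer),
    "merchant_id": rng.choice(merchant_ids, size=n),
    "transaction_date": txn_date,
    "amount": amount,
    "transaction_type": txn_type,
    "channel": rng.choice(["POS", "Mobile", "Web", "Transfer"], size=n),
    "is_fraud": (rng.random(n) < 0.01).astype(int),
})
print(df_fast.shape)
```

The trade-off is readability: the loop mirrors the business rules line by line, while the vectorized form draws each column in one call.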


In [10]:
# Optional: Save to CSV
df_transactions.to_csv("df_transactions.csv", index=False)
print("Simulation Successful")

Simulation Successful
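
When the CSV is read back later, the datetime dtype is not restored automatically. The sketch below uses an in-memory buffer and a tiny made-up frame (both are stand-ins for the real file and data) to show a round trip with `parse_dates`.

```python
import io

import pandas as pd

# Small illustrative frame (not the real data)
df_small = pd.DataFrame({
    "transaction_id": ["a", "b"],
    "transaction_date": pd.to_datetime(
        ["2025-01-11 09:29:58", "2025-02-17 14:20:37"]
    ),
    "amount": [29668.65, 17342.40],
})

buffer = io.StringIO()  # in-memory stand-in for a file on disk
df_small.to_csv(buffer, index=False)  # index=False avoids an extra index column
buffer.seek(0)

# Without parse_dates, transaction_date would come back as plain strings
df_back = pd.read_csv(buffer, parse_dates=["transaction_date"])
print(df_back.dtypes)
```

Saving with the default `index=True` and reading the file back is what typically produces an `Unnamed: 0` column.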


In [8]:
# Review
df_transactions.head()


|   | transaction_id | customer_id | merchant_id | transaction_date | amount | transaction_type | channel | is_fraud |
| - | -------------- | ----------- | ----------- | ---------------- | ------ | ---------------- | ------- | -------- |
| 0 | e887c181-22c6-4962-88ce-1c8fb8ca1939 | 1ea79026-d3da-4402-b0a8-89b8a532d475 | 703ff905-bc41-4081-9797-7b4b03df94c8 | 2024-10-18 02:47:47 | 25244.29 | deposit | Web | 0 |
| 1 | 03baaec1-3e97-4e0d-b68c-541de032e13a | 1ea79026-d3da-4402-b0a8-89b8a532d475 | 7f2ecadc-b0e2-4c76-8826-86fa2db6b846 | 2025-06-07 08:30:26 | 4146.49 | deposit | Transfer | 0 |
| 2 | c8f38e66-6811-45f3-8a89-b800e7ee1847 | 1ea79026-d3da-4402-b0a8-89b8a532d475 | a2926fcb-fbd8-4831-a459-75923475d87c | 2024-11-06 01:48:17 | 71885.94 | deposit | Mobile | 0 |
| 3 | e874650c-b4e1-424d-a20f-f4b93e501091 | 1ea79026-d3da-4402-b0a8-89b8a532d475 | 16d73afd-6e9c-4eea-bfd5-e4dfe5405191 | 2025-01-11 09:29:58 | 29668.65 | withdrawal | Transfer | 0 |
| 4 | d48b190a-199f-41dc-8430-0a36f4e21174 | 1ea79026-d3da-4402-b0a8-89b8a532d475 | b0b4b207-abfc-47aa-9eae-ef82276737db | 2025-02-17 14:20:37 | 17342.4 | withdrawal | Transfer | 0 |


In [9]:
df_transactions['is_fraud'].value_counts(normalize=True)

is_fraud
0    0.98994
1    0.01006
Name: proportion, dtype: float64
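
The observed rate of 1.006% is not exactly the configured 1%, which is expected from random sampling. A rough way to confirm the gap is just noise is to compare it with the binomial standard error; the sketch below plugs in the row count reported by `df_transactions.info()` and treats each flag as an independent draw, as the generator does.

```python
import math

n = 1_747_780       # row count reported by df_transactions.info()
p = 0.01            # configured fraud probability
observed = 0.01006  # proportion from value_counts(normalize=True)

# Standard error of a binomial proportion: sqrt(p * (1 - p) / n)
se = math.sqrt(p * (1 - p) / n)

# How many standard errors the observed rate sits from the target
z = (observed - p) / se

print(f"standard error = {se:.6f}, z-score = {z:.2f}")
```

A z-score below 1 in absolute value means the observed rate is well within ordinary sampling variation.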

In [12]:
df_transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1747780 entries, 0 to 1747779
Data columns (total 8 columns):
 #   Column            Dtype         
---  ------            -----         
 0   transaction_id    object        
 1   customer_id       object        
 2   merchant_id       object        
 3   transaction_date  datetime64[ns]
 4   amount            float64       
 5   transaction_type  object        
 6   channel           object        
 7   is_fraud          int64         
dtypes: datetime64[ns](1), float64(1), int64(1), object(5)
memory usage: 106.7+ MB
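
The `+` in the memory figure indicates that the string (`object`) columns are not fully counted; `df_transactions.info(memory_usage="deep")` gives the full number. Low-cardinality columns such as `transaction_type` and `channel` can usually be stored far more compactly as categoricals. The sketch below demonstrates the idea on a small generated sample (`df_demo` is illustrative, not the real frame).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000  # small illustrative sample, not the full 1.7M rows

df_demo = pd.DataFrame({
    "transaction_type": rng.choice(
        ["deposit", "withdrawal", "payment", "transfer"], size=n
    ),
    "channel": rng.choice(["POS", "Mobile", "Web", "Transfer"], size=n),
    "is_fraud": (rng.random(n) < 0.01).astype("int64"),
})

# deep=True counts the actual string storage, not just the pointers
before = df_demo.memory_usage(deep=True).sum()

df_compact = df_demo.astype({
    "transaction_type": "category",  # 4 distinct values
    "channel": "category",           # 4 distinct values
    "is_fraud": "int8",              # only 0 or 1
})
after = df_compact.memory_usage(deep=True).sum()

print(f"{before:,} bytes -> {after:,} bytes")
```

The same `astype` call should apply to `df_transactions`, though the UUID columns will still dominate memory because every value is distinct.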
