### Merchants.csv

| Column          | Type | Notes                       |
| --------------- | ---- | --------------------------- |
| merchant\_id    | UUID | Primary key                 |
| merchant\_name  | str  | Business name               |
| category        | str  | e.g., Food, Transport, Tech |
| city            | str  | Region/city of operation    |
| onboarded\_date | date | When the merchant joined    |
| risk\_flag      | int  | 1 = high risk, 0 = normal   |


In [1]:
# simulate_merchants.py
from faker import Faker
import pandas as pd
import uuid
import random

fake = Faker()
Faker.seed(43)
random.seed(43)

def generate_merchants(n=500):
    categories = ['Food', 'Transport', 'Electronics', 'Fashion', 'Travel', 'Healthcare', 'Tech']
    cities = ['Lagos', 'Abuja', 'Ibadan', 'Kano', 'PH', 'Enugu']

    merchants = []
    for _ in range(n):
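        # Build the UUID from the seeded RNG; uuid.uuid4() ignores random.seed(),
        # so IDs would otherwise change on every run.
        merchant_id = str(uuid.UUID(int=random.getrandbits(128), version=4))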
        name = fake.company()
        category = random.choice(categories)
        city = random.choice(cities)
        onboarded_date = fake.date_between(start_date='-3y', end_date='today')
        risk_flag = random.choices([0, 1], weights=[0.95, 0.05])[0]  # ~5% flagged high risk

        merchants.append({
            'merchant_id': merchant_id,
            'merchant_name': name,
            'category': category,
            'city': city,
            'onboarded_date': onboarded_date,
            'risk_flag': risk_flag
        })

    return pd.DataFrame(merchants)

df_merchants = generate_merchants(500)
df_merchants.to_csv('merchants.csv', index=False)
print("✅ Merchants simulated and saved.")

✅ Merchants simulated and saved.
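
The `risk_flag` weights set an expected rate, not an exact one, so the share of high-risk merchants in any 500-row sample will drift a little around 5%. A minimal sketch of that check, assuming the same seed and weights as above (the names `flags` and `share` are illustrative and not part of the original script):

```python
import random

random.seed(43)

# Draw 500 flags in one call; k= replaces the per-row loop used in the script.
flags = random.choices([0, 1], weights=[0.95, 0.05], k=500)

# Observed share of high-risk merchants in this sample.
share = sum(flags) / len(flags)
print(f"high-risk merchants: {sum(flags)} of {len(flags)} ({share:.1%})")
```

Running the same check on `df_merchants['risk_flag'].mean()` is a quick way to confirm the simulated file is close to the intended rate before using it downstream.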


In [2]:
df_merchants.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   merchant_id     500 non-null    object
 1   merchant_name   500 non-null    object
 2   category        500 non-null    object
 3   city            500 non-null    object
 4   onboarded_date  500 non-null    object
 5   risk_flag       500 non-null    int64 
dtypes: int64(1), object(5)
memory usage: 23.6+ KB
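
The `info()` output shows `onboarded_date` stored as `object`, because Faker returns plain Python `date` values. A small sketch of tightening the dtypes before analysis, using a tiny stand-in frame with the same column names (the rows and the helper name `tidy_dtypes` are hypothetical):

```python
from datetime import date

import pandas as pd

# Stand-in for df_merchants: same column names, made-up rows.
sample = pd.DataFrame({
    "merchant_id": ["m-1", "m-2", "m-3"],
    "category": ["Food", "Tech", "Food"],
    "onboarded_date": [date(2023, 1, 15), date(2024, 6, 1), date(2022, 11, 30)],
    "risk_flag": [0, 1, 0],
})

def tidy_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with analysis-friendly dtypes."""
    out = df.copy()
    out["onboarded_date"] = pd.to_datetime(out["onboarded_date"])  # object -> datetime64
    out["category"] = out["category"].astype("category")           # few distinct values
    out["risk_flag"] = out["risk_flag"].astype("int8")              # only 0 or 1
    return out

typed = tidy_dtypes(sample)
print(typed.dtypes)
```

When loading the saved file instead, `pd.read_csv('merchants.csv', parse_dates=['onboarded_date'])` handles the date conversion at read time.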


| Column           | 💬 Insight                                                                                     |
| ---------------- | ---------------------------------------------------------------------------------------------- |
| `merchant_id`    | Unique key; this is the join column for transactions.                                          |
| `merchant_name`  | Clean. Useful for dashboard labels, and can be masked for compliance.                          |
| `category`       | Good for segmentation (e.g., Food vs. Tech vs. Transport).                                     |
| `city`           | Useful for regional segmentation and city-level fraud heatmaps.                                |
| `onboarded_date` | Helps track merchant lifecycle and activation funnel; stored as `object`, so parse it first.   |
| `risk_flag`      | Binary high-risk indicator (about 5% are 1); useful for fraud modeling and explainability (SHAP/LIME). |

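
The uses listed in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual pipeline: the `transactions` frame, its columns, the `as_of` date, and the `mask_name` helper are all hypothetical, and an unsalted hash is only a simple form of pseudonymization.

```python
import hashlib

import pandas as pd

# Stand-in merchants frame (made-up rows, same columns as merchants.csv).
merchants = pd.DataFrame({
    "merchant_id": ["m-1", "m-2", "m-3"],
    "merchant_name": ["Acme Ltd", "Bolt Rides", "Chow Hub"],
    "category": ["Tech", "Transport", "Food"],
    "city": ["Lagos", "Abuja", "Lagos"],
    "onboarded_date": pd.to_datetime(["2023-01-15", "2024-06-01", "2022-11-30"]),
    "risk_flag": [0, 1, 0],
})

# Hypothetical transactions table; only merchant_id is assumed to be shared.
transactions = pd.DataFrame({
    "transaction_id": ["t-1", "t-2", "t-3", "t-4"],
    "merchant_id": ["m-1", "m-2", "m-2", "m-3"],
    "amount": [1200.0, 300.0, 450.0, 80.0],
})

def mask_name(name: str) -> str:
    """Replace a merchant name with a short SHA-256 digest."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]

# merchant_name: masked label for dashboards.
merchants["merchant_name_masked"] = merchants["merchant_name"].map(mask_name)

# onboarded_date: tenure in days relative to a fixed reference date.
as_of = pd.Timestamp("2025-01-01")
merchants["tenure_days"] = (as_of - merchants["onboarded_date"]).dt.days

# merchant_id: attach merchant attributes to each transaction.
# validate= raises if merchant_id is not unique on the merchants side.
enriched = transactions.merge(
    merchants[["merchant_id", "category", "city", "risk_flag"]],
    on="merchant_id",
    how="left",
    validate="many_to_one",
)

# city + risk_flag: share of transactions at high-risk merchants per city.
by_city = enriched.groupby("city")["risk_flag"].mean()
print(by_city)
```

A left join keeps every transaction even if its merchant is missing from the merchants file, which makes orphaned `merchant_id` values show up as nulls rather than silently disappearing.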