**Data Simulation**

Getting data that realistically simulates a production environment is always challenging. Data found online is often incomplete, and data from other sources is often inaccurate, so assembling what you need can be a nightmare.

This script uses several Python packages, modules, and functions to create simulated supermarket data modeled on Prince Ebeano superstores.

We stick closely to reality by making sure the store's physical branches are represented, but all other information is purely fictitious and for demonstration purposes only.


#### **Pip Install Faker**



In [None]:
pip install Faker


Collecting Faker
  Downloading Faker-35.2.0-py3-none-any.whl.metadata (15 kB)
Downloading Faker-35.2.0-py3-none-any.whl (1.9 MB)
Installing collected packages: Faker
Successfully installed Faker-35.2.0


#### **Import relevant modules**




In [None]:
import csv
import random
from datetime import datetime, timedelta
from faker import Faker

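# Initialize Faker and seed both Faker and the random module for reproducibility.
# Faker keeps its own generator, so random.seed() alone does not make its output repeatable.
fake = Faker()
Faker.seed(42)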
random.seed(42)
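
As a small illustration of why the seed matters, the sketch below draws the same values twice from two generators created with the same seed. The helper name `sample_open_hours` is hypothetical and not part of the script; it only borrows the 7 to 10 range used for opening hours. Faker keeps a generator of its own, which is seeded separately through `Faker.seed()`.

```python
import random

def sample_open_hours(seed, count=5):
    """Draw `count` opening hours from a generator created with `seed`."""
    rng = random.Random(seed)  # local generator, independent of the global one
    return [rng.randint(7, 10) for _ in range(count)]

first_run = sample_open_hours(42)
second_run = sample_open_hours(42)

# Same seed, same sequence
print(first_run == second_run)  # prints True
```

Using a local `random.Random` instance is a design choice for the demonstration only: it leaves the globally seeded state used by the rest of the notebook untouched.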




#### **Branch Data**

Here, the branches are simulated using the actual branch names; the values in the remaining columns are generated.

In [None]:
# ----------------------------
# 1. Branch Data (with enhancements)
# ----------------------------
# For geolocation, we generate random latitudes and longitudes.
# Operating hours: we'll simulate opening and closing times.
def random_operating_hours():
    open_hour = random.randint(7, 10)
    close_hour = random.randint(20, 23)
    return f"{open_hour:02d}:00 - {close_hour:02d}:00"


branches = [
    {
        "branch_id": "BR001",
        "branch_name": "Lekki Phase I",
        "address": "Plot 9, Northern Business District, Admiralty Road, Lekki Phase I",
        "operating_hours": random_operating_hours(),
        "manager_name": fake.name(),
        "manager_email": fake.email(),
        "manager_phone": fake.phone_number(),
        "latitude": round(random.uniform(6.40, 6.60), 6),
        "longitude": round(random.uniform(3.20, 3.40), 6),
        "establishment_date": fake.date_between(start_date='-30y', end_date='-5y').strftime("%Y-%m-%d"),
        "branch_size_sqft": random.randint(1000, 10000)
    },
    {
        "branch_id": "BR002",
        "branch_name": "Chevron Drive",
        "address": "10, Chevron Drive, Lekki Peninsula II",
        "operating_hours": random_operating_hours(),
        "manager_name": fake.name(),
        "manager_email": fake.email(),
        "manager_phone": fake.phone_number(),
        "latitude": round(random.uniform(6.40, 6.60), 6),
        "longitude": round(random.uniform(3.20, 3.40), 6),
        "establishment_date": fake.date_between(start_date='-30y', end_date='-5y').strftime("%Y-%m-%d"),
        "branch_size_sqft": random.randint(1000, 10000)
    },
    {
        "branch_id": "BR003",
        "branch_name": "Oniru Estate",
        "address": "T.F. Kuboye Road, Off Akiogun Road, Oniru Estate, Maroko, Lekki Phase I",
        "operating_hours": random_operating_hours(),
        "manager_name": fake.name(),
        "manager_email": fake.email(),
        "manager_phone": fake.phone_number(),
        "latitude": round(random.uniform(6.40, 6.60), 6),
        "longitude": round(random.uniform(3.20, 3.40), 6),
        "establishment_date": fake.date_between(start_date='-30y', end_date='-5y').strftime("%Y-%m-%d"),
        "branch_size_sqft": random.randint(1000, 10000)
    },
    {
        "branch_id": "BR004",
        "branch_name": "Ikeja GRA",
        "address": "14, Isaac John Street, Ikeja GRA",
        "operating_hours": random_operating_hours(),
        "manager_name": fake.name(),
        "manager_email": fake.email(),
        "manager_phone": fake.phone_number(),
        "latitude": round(random.uniform(6.40, 6.60), 6),
        "longitude": round(random.uniform(3.20, 3.40), 6),
        "establishment_date": fake.date_between(start_date='-30y', end_date='-5y').strftime("%Y-%m-%d"),
        "branch_size_sqft": random.randint(1000, 10000)
    },
    {
        "branch_id": "BR005",
        "branch_name": "Agungi",
        "address": "5, SPG Road, Beside FIRS Office, Agungi, Lekki Phase II",
        "operating_hours": random_operating_hours(),
        "manager_name": fake.name(),
        "manager_email": fake.email(),
        "manager_phone": fake.phone_number(),
        "latitude": round(random.uniform(6.40, 6.60), 6),
        "longitude": round(random.uniform(3.20, 3.40), 6),
        "establishment_date": fake.date_between(start_date='-30y', end_date='-5y').strftime("%Y-%m-%d"),
        "branch_size_sqft": random.randint(1000, 10000)
    },
    {
        "branch_id": "BR006",
        "branch_name": "Adeboyo Doherty Street",
        "address": "Road 14, Plot 13, Block 54, Adeboyo Doherty Street, Lekki Phase I",
        "operating_hours": random_operating_hours(),
        "manager_name": fake.name(),
        "manager_email": fake.email(),
        "manager_phone": fake.phone_number(),
        "latitude": round(random.uniform(6.40, 6.60), 6),
        "longitude": round(random.uniform(3.20, 3.40), 6),
        "establishment_date": fake.date_between(start_date='-30y', end_date='-5y').strftime("%Y-%m-%d"),
        "branch_size_sqft": random.randint(1000, 10000)
    },
    {
        "branch_id": "BR007",
        "branch_name": "Gaduwa Lokogoma Junction",
        "address": "Plot 551 Abdulsalam Abubakar Way, Gaduwa Lokogoma Junction",
        "operating_hours": random_operating_hours(),
        "manager_name": fake.name(),
        "manager_email": fake.email(),
        "manager_phone": fake.phone_number(),
        "latitude": round(random.uniform(9.00, 9.50), 6),  # Abuja coordinates may differ
        "longitude": round(random.uniform(7.30, 7.50), 6),
        "establishment_date": fake.date_between(start_date='-30y', end_date='-5y').strftime("%Y-%m-%d"),
        "branch_size_sqft": random.randint(1000, 10000)
    },
    {
        "branch_id": "BR008",
        "branch_name": "Wuse Zone 4",
        "address": "9, Cape Town Street, Wuse Zone 4",
        "operating_hours": random_operating_hours(),
        "manager_name": fake.name(),
        "manager_email": fake.email(),
        "manager_phone": fake.phone_number(),
        "latitude": round(random.uniform(9.00, 9.50), 6),
        "longitude": round(random.uniform(7.30, 7.50), 6),
        "establishment_date": fake.date_between(start_date='-30y', end_date='-5y').strftime("%Y-%m-%d"),
        "branch_size_sqft": random.randint(1000, 10000)
    }
]
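
The branch records above repeat the same two coordinate ranges. A minimal sketch of how they could be factored into a helper follows. `CITY_BOUNDS` and `random_coordinates` are hypothetical names that do not appear in the script, and the bounding boxes are copied from the ranges used above, so they are rough areas rather than real branch locations.

```python
import random

# Rough bounding boxes copied from the ranges used in the branch list.
# CITY_BOUNDS and random_coordinates are illustrative names only.
CITY_BOUNDS = {
    "Lagos": {"lat": (6.40, 6.60), "lon": (3.20, 3.40)},
    "Abuja": {"lat": (9.00, 9.50), "lon": (7.30, 7.50)},
}

def random_coordinates(city, rng=random):
    """Return a (latitude, longitude) pair drawn uniformly from the city's box."""
    bounds = CITY_BOUNDS[city]
    latitude = round(rng.uniform(*bounds["lat"]), 6)
    longitude = round(rng.uniform(*bounds["lon"]), 6)
    return latitude, longitude

lagos_lat, lagos_lon = random_coordinates("Lagos")
abuja_lat, abuja_lon = random_coordinates("Abuja")
print(lagos_lat, lagos_lon)
print(abuja_lat, abuja_lon)
```

Keeping the ranges in one dictionary means a corrected bounding box only has to be changed in one place.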

#### **Department Data**

We simulate data for the following departments:

* Sales
* Marketing
* Operations
* Finance
* Human Resources
* Inventory
* Customer Service





In [None]:

# ----------------------------
# 2. Department Data (with enhancements)
# ----------------------------
departments = [
    {
        "dept_id": "D001",
        "dept_name": "Sales",
        "department_head": fake.name(),
        "budget": random.randint(100000, 500000),
        "description": "Handles all sales activities and customer transactions.",
        "location_within_store": "Front End / Checkout Area"
    },
    {
        "dept_id": "D002",
        "dept_name": "Marketing",
        "department_head": fake.name(),
        "budget": random.randint(50000, 300000),
        "description": "Responsible for promotions and advertising.",
        "location_within_store": "Marketing Office"
    },
    {
        "dept_id": "D003",
        "dept_name": "Operations",
        "department_head": fake.name(),
        "budget": random.randint(100000, 400000),
        "description": "Oversees daily store operations.",
        "location_within_store": "Operations Hub"
    },
    {
        "dept_id": "D004",
        "dept_name": "Finance",
        "department_head": fake.name(),
        "budget": random.randint(150000, 500000),
        "description": "Manages financial records and cash flow.",
        "location_within_store": "Finance Department"
    },
    {
        "dept_id": "D005",
        "dept_name": "Human Resources",
        "department_head": fake.name(),
        "budget": random.randint(50000, 200000),
        "description": "Handles employee relations and recruitment.",
        "location_within_store": "HR Office"
    },
    {
        "dept_id": "D006",
        "dept_name": "Inventory",
        "department_head": fake.name(),
        "budget": random.randint(80000, 300000),
        "description": "Manages stock levels and reordering.",
        "location_within_store": "Stock Room"
    },
    {
        "dept_id": "D007",
        "dept_name": "Customer Service",
        "department_head": fake.name(),
        "budget": random.randint(50000, 150000),
        "description": "Assists customers and handles inquiries.",
        "location_within_store": "Service Desk"
    }
]

#### **Employee Data**

Employee data is also simulated. The `num_employees` variable controls how many employee records are generated, so the dataset can be scaled as needed.

In [None]:


# ----------------------------
# 3. Employee Data (with enhancements)
# ----------------------------
num_employees = 100
employees = []
job_titles = ["Cashier", "Store Manager", "Stock Associate", "Customer Service Rep", "Sales Associate"]

for i in range(1, num_employees + 1):
    emp_id = f"EMP{i:03d}"
    name = fake.name()
    branch = random.choice(branches)
    branch_id = branch["branch_id"]
    dept = random.choices(
        population=departments,
        weights=[30, 10, 10, 10, 10, 20, 10],
        k=1
    )[0]
    dept_id = dept["dept_id"]
    hire_date = fake.date_between(start_date='-5y', end_date='today')
    email = fake.email()
    phone = fake.phone_number()
    job_title = random.choice(job_titles)
    address = fake.address().replace("\n", ", ")
    birth_date = fake.date_of_birth(minimum_age=21, maximum_age=65)
    gender = random.choice(["Male", "Female", "Other"])
    salary = round(random.uniform(30000, 150000), 2)
    employment_status = random.choice(["Full-time", "Part-time", "Contract"])
    shift = random.choice(["Morning", "Afternoon", "Night"])
    emergency_contact_name = fake.name()
    emergency_contact_phone = fake.phone_number()
    performance_rating = round(random.uniform(1, 5), 1)

    employees.append({
        "employee_id": emp_id,
        "name": name,
        "branch_id": branch_id,
        "department_id": dept_id,
        "hire_date": hire_date.strftime("%Y-%m-%d"),
        "email": email,
        "phone": phone,
        "job_title": job_title,
        "address": address,
        "birth_date": birth_date.strftime("%Y-%m-%d"),
        "gender": gender,
        "salary": salary,
        "employment_status": employment_status,
        "shift": shift,
        "emergency_contact_name": emergency_contact_name,
        "emergency_contact_phone": emergency_contact_phone,
        "performance_rating": performance_rating
    })
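
The employee loop assigns departments with `random.choices` and a positional weight list. The sketch below shows roughly how those weights translate into department shares. It keys the weights by department ID so each weight stays attached to its department; `DEPT_WEIGHTS` is a hypothetical name, and the weights themselves are copied from the loop above.

```python
import random
from collections import Counter

# Weights copied from the employee loop, keyed by department ID.
# DEPT_WEIGHTS is an illustrative name only.
DEPT_WEIGHTS = {
    "D001": 30,  # Sales
    "D002": 10,  # Marketing
    "D003": 10,  # Operations
    "D004": 10,  # Finance
    "D005": 10,  # Human Resources
    "D006": 20,  # Inventory
    "D007": 10,  # Customer Service
}

rng = random.Random(42)  # local generator so the demo leaves global state alone
draws = rng.choices(
    population=list(DEPT_WEIGHTS),
    weights=list(DEPT_WEIGHTS.values()),
    k=10_000,
)

# Fraction of draws that landed in each department
counts = Counter(draws)
shares = {dept: counts.get(dept, 0) / len(draws) for dept in DEPT_WEIGHTS}
for dept, share in shares.items():
    print(dept, f"{share:.1%}")
```

Because the weights sum to 100, each weight can be read directly as an approximate percentage: about 30% of employees land in Sales and about 20% in Inventory.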



#### **Supplier Data**

In [None]:
# ----------------------------
# 4. Supplier Data
# ----------------------------
num_suppliers = 10
suppliers = []
all_categories = ["Grocery", "Household", "Toiletries", "Beverages",
                  "Snacks", "Fruits", "Vegetables", "Meat", "Dairy", "Bakery"]

for i in range(1, num_suppliers + 1):
    sup_id = f"SUP{i:03d}"
    supplier_categories = random.sample(all_categories, k=random.randint(1, 4))
    suppliers.append({
        "supplier_id": sup_id,
        "supplier_name": fake.company(),
        "contact_email": fake.company_email(),
        "contact_phone": fake.phone_number(),
        "address": fake.address().replace("\n", ", "),
        "product_categories_supplied": ", ".join(supplier_categories)
    })


#### **Product Data**

In [None]:

# ----------------------------
# 5. Product Data (with enhancements)
# ----------------------------
num_products = 50
products = []

for i in range(1, num_products + 1):
    prod_id = f"PROD{i:03d}"
    product_name = fake.catch_phrase()
    category = random.choice(all_categories)
    price = round(random.uniform(50, 5000), 2)
    description = fake.text(max_nb_chars=50)
    stock_quantity = random.randint(10, 500)
    reorder_level = random.randint(5, 50)
    supplier = random.choice(suppliers)
    supplier_id = supplier["supplier_id"]
    # Manufacturing date between five years and one year ago
    manufacturing_date = fake.date_between(start_date='-5y', end_date='-1y')

    # For perishables (fruits, vegetables, meat, dairy, bakery), set an expiry date
    if category in ["Fruits", "Vegetables", "Meat", "Dairy", "Bakery"]:
        expiry_date = manufacturing_date + timedelta(days=random.randint(30, 365))
        expiry_date_str = expiry_date.strftime("%Y-%m-%d")
    else:
        expiry_date_str = ""

    weight = round(random.uniform(0.1, 10.0), 2)  # in kg
    discount_percentage = random.choice([0, 5, 10, 15, 20, 25])
    barcode = f"{random.randint(100000000000, 999999999999)}"
    brand = fake.company()
    rating = round(random.uniform(1, 5), 1)

    products.append({
        "product_id": prod_id,
        "product_name": product_name,
        "category": category,
        "price": price,
        "description": description,
        "stock_quantity": stock_quantity,
        "reorder_level": reorder_level,
        "supplier_id": supplier_id,
        "manufacturing_date": manufacturing_date.strftime("%Y-%m-%d"),
        "expiry_date": expiry_date_str,
        "weight_kg": weight,
        "discount_percentage": discount_percentage,
        "barcode": barcode,
        "brand": brand,
        "rating": rating
    })
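
In the loop above, each product gets a supplier chosen at random, regardless of which categories that supplier lists. If the two tables should agree, one possible approach is sketched below. The helpers `suppliers_for` and `pick_supplier` and the three sample records are hypothetical; only the field names match the supplier records built earlier.

```python
import random

# Tiny stand-in supplier list in the same shape as the supplier records.
sample_suppliers = [
    {"supplier_id": "SUP001", "product_categories_supplied": "Dairy, Bakery"},
    {"supplier_id": "SUP002", "product_categories_supplied": "Beverages"},
    {"supplier_id": "SUP003", "product_categories_supplied": "Dairy, Meat, Snacks"},
]

def suppliers_for(category, supplier_list):
    """Return suppliers whose comma-separated category string includes `category`."""
    return [
        supplier for supplier in supplier_list
        if category in supplier["product_categories_supplied"].split(", ")
    ]

def pick_supplier(category, supplier_list, rng=random):
    """Prefer a supplier that lists the category; fall back to any supplier."""
    matching = suppliers_for(category, supplier_list)
    return rng.choice(matching or supplier_list)

dairy_supplier = pick_supplier("Dairy", sample_suppliers)
fallback_supplier = pick_supplier("Fruits", sample_suppliers)
print(dairy_supplier["supplier_id"], fallback_supplier["supplier_id"])
```

The fallback keeps every product linked to some supplier even when no supplier lists its category, which can happen because each supplier only receives one to four categories.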

#### **Customer Data**

In [None]:
# ----------------------------
# 6. Customer Data
# ----------------------------
num_customers = 200
customers = []

for i in range(1, num_customers + 1):
    cust_id = f"CUST{i:04d}"
    name = fake.name()
    email = fake.email()
    phone = fake.phone_number()
    address = fake.address().replace("\n", ", ")
    loyalty_points = random.randint(0, 1000)
    birthday = fake.date_of_birth(minimum_age=18, maximum_age=80)
    registration_date = fake.date_between(start_date='-5y', end_date='today')

    customers.append({
        "customer_id": cust_id,
        "name": name,
        "email": email,
        "phone": phone,
        "address": address,
        "loyalty_points": loyalty_points,
        "birthday": birthday.strftime("%Y-%m-%d"),
        "registration_date": registration_date.strftime("%Y-%m-%d")
    })

#### **Sales Data**

In [None]:

# ----------------------------
# 7. Sales Transactions Data (with enhancements)
# ----------------------------
num_transactions = 10000
transactions = []

end_date = datetime.now()
start_date = end_date - timedelta(days=2 * 365)

for i in range(1, num_transactions + 1):
    txn_id = f"TXN{i:05d}"
    random_seconds = random.randint(0, int((end_date - start_date).total_seconds()))
    txn_datetime = start_date + timedelta(seconds=random_seconds)

    # Choose a branch
    branch = random.choice(branches)
    branch_id = branch["branch_id"]

    # Choose an employee from this branch, preferring the Sales department (D001)
    sales_employees = [emp for emp in employees if emp["branch_id"] == branch_id and emp["department_id"] == "D001"]
    if not sales_employees:
        # Fall back to anyone at this branch, then to any employee, so random.choice never sees an empty list
        sales_employees = [emp for emp in employees if emp["branch_id"] == branch_id] or employees
    employee = random.choice(sales_employees)
    employee_id = employee["employee_id"]

    # For cashier id, we can assume it's the same as employee_id in this simulation
    cashier_id = employee_id

    # Choose a product and its pricing details
    product = random.choice(products)
    product_id = product["product_id"]
    unit_price = product["price"]
    quantity = random.randint(1, 10)

    # Simulate discount applied (use product discount or none)
    discount_applied = random.choice([0, product["discount_percentage"]])
    discount_value = unit_price * quantity * (discount_applied / 100)

    total_without_discount = unit_price * quantity
    total_after_discount = total_without_discount - discount_value

    # Tax at a rate of 7.5%
    tax_amount = round(total_after_discount * 0.075, 2)
    final_total_price = round(total_after_discount + tax_amount, 2)

    # Loyalty points earned: e.g., 1 point per 10 currency units spent
    loyalty_points_earned = int(final_total_price / 10)

    # Return/refund flag: 2% chance
    return_or_refund = "Yes" if random.random() < 0.02 else "No"

    transaction_type = random.choice(["In-store", "Online", "Mobile"])
    customer = random.choice(customers)
    customer_id = customer["customer_id"]

    payment_method = random.choice(["Cash", "Card", "Online Payment"])
    # Simulate the payment breakdown: about 10% of transactions are split
    if random.random() < 0.1:
        # Split payment: divide final_total_price between Cash and Card,
        # and record the method as "Split" so it agrees with the breakdown
        payment_method = "Split"
        cash_amount = round(final_total_price * random.uniform(0.3, 0.7), 2)
        card_amount = round(final_total_price - cash_amount, 2)
        payment_breakdown = f"Cash: {cash_amount}, Card: {card_amount}"
    else:
        payment_breakdown = f"{payment_method}: {final_total_price}"

    transactions.append({
        "transaction_id": txn_id,
        "date": txn_datetime.strftime("%Y-%m-%d %H:%M:%S"),
        "branch_id": branch_id,
        "employee_id": employee_id,
        "cashier_id": cashier_id,
        "product_id": product_id,
        "quantity": quantity,
        "unit_price": unit_price,
        "discount_applied (%)": discount_applied,
        "discount_value": round(discount_value, 2),
        "total_without_discount": round(total_without_discount, 2),
        "total_after_discount": round(total_after_discount, 2),
        "tax_amount": tax_amount,
        "final_total_price": final_total_price,
        "loyalty_points_earned": loyalty_points_earned,
        "return_or_refund": return_or_refund,
        "transaction_type": transaction_type,
        "customer_id": customer_id,
        "payment_method": payment_method,
        "payment_breakdown": payment_breakdown
    })

# ----------------------------
# Utility Function to Write CSV Files
# ----------------------------
def write_csv(filename, data, fieldnames):
    with open(filename, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)

# ----------------------------
# Write Out Datasets to CSV Files
# ----------------------------
write_csv("branches.csv", branches, fieldnames=[
    "branch_id", "branch_name", "address", "operating_hours",
    "manager_name", "manager_email", "manager_phone",
    "latitude", "longitude", "establishment_date", "branch_size_sqft"
])

write_csv("departments.csv", departments, fieldnames=[
    "dept_id", "dept_name", "department_head", "budget", "description", "location_within_store"
])

write_csv("employees.csv", employees, fieldnames=[
    "employee_id", "name", "branch_id", "department_id", "hire_date", "email", "phone",
    "job_title", "address", "birth_date", "gender", "salary", "employment_status", "shift",
    "emergency_contact_name", "emergency_contact_phone", "performance_rating"
])

write_csv("suppliers.csv", suppliers, fieldnames=[
    "supplier_id", "supplier_name", "contact_email", "contact_phone", "address", "product_categories_supplied"
])

write_csv("products.csv", products, fieldnames=[
    "product_id", "product_name", "category", "price", "description", "stock_quantity",
    "reorder_level", "supplier_id", "manufacturing_date", "expiry_date", "weight_kg",
    "discount_percentage", "barcode", "brand", "rating"
])

write_csv("customers.csv", customers, fieldnames=[
    "customer_id", "name", "email", "phone", "address", "loyalty_points", "birthday", "registration_date"
])

write_csv("transactions.csv", transactions, fieldnames=[
    "transaction_id", "date", "branch_id", "employee_id", "cashier_id", "product_id", "quantity",
    "unit_price", "discount_applied (%)", "discount_value", "total_without_discount",
    "total_after_discount", "tax_amount", "final_total_price", "loyalty_points_earned",
    "return_or_refund", "transaction_type", "customer_id", "payment_method", "payment_breakdown"
])

print("Data generation complete. CSV files created:")
print(" - branches.csv")
print(" - departments.csv")
print(" - employees.csv")
print(" - suppliers.csv")
print(" - products.csv")
print(" - customers.csv")
print(" - transactions.csv")




Data generation complete. CSV files created:
 - branches.csv
 - departments.csv
 - employees.csv
 - suppliers.csv
 - products.csv
 - customers.csv
 - transactions.csv
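
To make the pricing arithmetic in the transaction loop easier to check by hand, here is a minimal sketch that packages the same steps into a function: discount first, then 7.5% tax on the discounted amount, then one loyalty point per 10 currency units. The function name `price_line` and the input values are hypothetical; the formulas follow the loop above.

```python
def price_line(unit_price, quantity, discount_pct, tax_rate=0.075):
    """Apply the discount first, then charge tax on the discounted amount."""
    total_without_discount = unit_price * quantity
    discount_value = total_without_discount * (discount_pct / 100)
    total_after_discount = total_without_discount - discount_value
    tax_amount = round(total_after_discount * tax_rate, 2)
    final_total_price = round(total_after_discount + tax_amount, 2)
    return {
        "total_without_discount": round(total_without_discount, 2),
        "discount_value": round(discount_value, 2),
        "total_after_discount": round(total_after_discount, 2),
        "tax_amount": tax_amount,
        "final_total_price": final_total_price,
        # 1 loyalty point per 10 currency units, truncated toward zero
        "loyalty_points_earned": int(final_total_price / 10),
    }

# Illustrative input values, not taken from the generated data
line = price_line(unit_price=1000.00, quantity=3, discount_pct=10)
for field, value in line.items():
    print(f"{field}: {value}")
```

For three units at 1000.00 with a 10% discount, the gross amount is 3000.00, the discount is 300.00, the net amount is 2700.00, the tax is 202.50, and the final total is 2902.50, which earns 290 loyalty points.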


#### **Download Zipped CSV**

In [None]:
!zip data.zip *.csv

  adding: branches.csv (deflated 44%)
  adding: customers.csv (deflated 53%)
  adding: departments.csv (deflated 39%)
  adding: employees.csv (deflated 56%)
  adding: products.csv (deflated 48%)
  adding: suppliers.csv (deflated 39%)
  adding: transactions.csv (deflated 75%)
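
The `!zip` command relies on a `zip` binary being available in the notebook environment. As a portable alternative, the sketch below builds the same kind of archive with Python's standard `zipfile` module. The function name `zip_csv_files` and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path

def zip_csv_files(directory=".", archive_name="data.zip"):
    """Bundle every .csv file in `directory` into one compressed archive."""
    directory = Path(directory)
    archive_path = directory / archive_name
    csv_paths = sorted(directory.glob("*.csv"))
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for csv_path in csv_paths:
            # arcname stores only the file name, not the directory path
            archive.write(csv_path, arcname=csv_path.name)
    return archive_path, [path.name for path in csv_paths]
```

Calling `zip_csv_files()` from the notebook's working directory would write `data.zip` next to the CSV files and return the archive path together with the names of the files it added.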
