This application generates realistic synthetic customer support data for demo analytics and testing purposes. It creates a comprehensive dataset that simulates a customer support operation for an audio equipment company, including support tickets, customer interactions, agent performance data, and communication channel metrics.
The generator produces statistically realistic data with configurable business rules, ensuring that relationships between entities (customers, tickets, interactions) follow real-world patterns. This makes it ideal for:
- Testing and demoing customer support analytics dashboards
- Training machine learning models on support data
- Demonstrating customer service KPIs and metrics
- Creating sample datasets for business intelligence tools
The application generates 8 CSV files in the exports/ folder:
Contains support agent information.
Columns:
- id: Unique agent identifier
- full_name: Agent's full name
- first_name: Agent's first name
- last_name: Agent's last name
- fte: Full-time equivalent (0.75 or 1.0)
- position: Job position (always "support_agent")
- start_date: Agent start date (YYYY-MM-DD)
- status: Employment status (always "active")
- hourly_rate_eur: Hourly rate in EUR
Contains agent-specific working time metrics.
Columns:
- date: Calendar date within the agent's working period
- user_id: Unique agent identifier
- paid_time: Paid time based on the agent's FTE (in minutes)
- scheduled_time: Scheduled work time (in minutes)
- available_time: Time the agent is actually available to work (scheduled time minus breaks and absence) (in minutes)
- interactions_time: Time spent on customer interactions and post-work (in minutes)
- productive_time: Interactions time plus other productive activities (meetings, training, admin work)
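The time columns above lend themselves to ratio metrics. The sketch below computes two of them for a single row. The metric names `occupancy` and `utilization` and their formulas are assumptions chosen for illustration (definitions vary between organisations) and are not taken from the generator itself.

```python
from dataclasses import dataclass


@dataclass
class WfmRow:
    """One WFM row; all values are in minutes, as in the column list above."""
    paid_time: float
    scheduled_time: float
    available_time: float
    interactions_time: float
    productive_time: float


def occupancy(row: WfmRow) -> float:
    # Assumed definition: share of available time spent on interactions.
    return row.interactions_time / row.available_time if row.available_time else 0.0


def utilization(row: WfmRow) -> float:
    # Assumed definition: share of paid time spent on productive work.
    return row.productive_time / row.paid_time if row.paid_time else 0.0


row = WfmRow(paid_time=480, scheduled_time=450, available_time=400,
             interactions_time=300, productive_time=360)
print(round(occupancy(row), 2))    # 0.75
print(round(utilization(row), 2))  # 0.75
```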
Contains customer information.
Columns:
- id: Unique customer identifier
- name: Customer full name
- email: Customer email address
- phone: Customer phone number
- country: Customer country (UK, Germany, Austria, Netherlands, France, Belgium)
Contains support ticket information.
Columns:
- ticket_id: Unique ticket identifier (TKT-XXXXX format)
- origin: Communication channel (email, phone, chat)
- symptom_cat: Issue category (troubleshooting, finance, logistics, rma, product, complaint)
- symptom: Specific issue description
- status: Ticket status (new, open, closed)
- product: Product involved (headphones, speakers, amplifiers, turntables)
- ticket_owner: Agent ID responsible for ticket
- language: Customer language (english, french, german)
- fcr: First Contact Resolution flag (0 or 1)
- escalated: Escalation flag (0 or 1)
- ticket_created: Ticket creation timestamp
- ticket_closed: Ticket closure timestamp (if closed)
- last_interaction_time: Last interaction timestamp
- resolution_after_last_interaction_hours: Hours from last interaction to closure
- lifecycle_hours: Total ticket lifecycle in hours
Contains individual customer-agent interactions.
Columns:
- interaction_id: Unique interaction identifier (INT-XXXXXX format)
- channel: Communication channel (email, phone, chat)
- customer_id: Customer identifier
- interaction_created: Interaction start timestamp
- handle_time: Duration in minutes
- speed_of_answer: Response time (hours for email, seconds for phone/chat)
- interaction_handled: Interaction completion timestamp
- handled_by: Agent ID who handled the interaction
- subject: Interaction subject (currently empty)
- body: Interaction content (currently empty)
- ticket_id: Associated ticket identifier
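Because speed_of_answer uses hours for email but seconds for phone and chat, values usually need a common unit before channels are compared. A minimal sketch of that conversion follows; the helper name `speed_of_answer_seconds` is hypothetical and not part of the project.

```python
def speed_of_answer_seconds(channel: str, value: float) -> float:
    """Convert a speed_of_answer value to seconds based on its channel."""
    if channel == "email":
        return value * 3600.0      # email values are recorded in hours
    if channel in ("phone", "chat"):
        return float(value)        # phone and chat values are already in seconds
    raise ValueError(f"unknown channel: {channel!r}")


print(speed_of_answer_seconds("email", 2))   # 7200.0
print(speed_of_answer_seconds("phone", 45))  # 45.0
```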
Contains quality assurance evaluations.
Columns:
- eval_id: Unique evaluation identifier (QA-XXXXXX format)
- interaction_id: Interaction identifier
- qa_score: Quality assurance score (0-1)
- customer_critical: Customer critical error(s) flag (1 or 0)
- business_critical: Business critical error(s) flag (1 or 0)
- compliance_critical: Compliance critical error(s) flag (1 or 0)
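QA rows carry only an interaction_id, so attributing a score to an agent means going through the interaction's handled_by column. The sketch below shows one way to do that with plain dictionaries; the sample rows and the function name `mean_qa_by_agent` are made up for illustration.

```python
from collections import defaultdict


def mean_qa_by_agent(interactions, qa_entries):
    """Average qa_score per agent, joining QA rows to interactions."""
    handler = {i["interaction_id"]: i["handled_by"] for i in interactions}
    scores = defaultdict(list)
    for entry in qa_entries:
        agent = handler.get(entry["interaction_id"])
        if agent is not None:  # skip QA rows that reference an unknown interaction
            scores[agent].append(float(entry["qa_score"]))
    return {agent: sum(vals) / len(vals) for agent, vals in scores.items()}


# Illustrative rows only; real values come from the exported CSV files.
interactions = [
    {"interaction_id": "INT-000001", "handled_by": "1"},
    {"interaction_id": "INT-000002", "handled_by": "1"},
    {"interaction_id": "INT-000003", "handled_by": "2"},
]
qa_entries = [
    {"interaction_id": "INT-000001", "qa_score": "0.8"},
    {"interaction_id": "INT-000002", "qa_score": "1.0"},
    {"interaction_id": "INT-000003", "qa_score": "0.5"},
]
averages = mean_qa_by_agent(interactions, qa_entries)
```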
Contains phone call data including abandoned calls.
Columns:
- id: Unique call identifier
- initialized: Call start timestamp
- answered: Call answer timestamp (null if abandoned)
- abandoned: Call abandonment timestamp (null if answered)
- is_abandoned: Abandonment flag (0 or 1)
Contains chat session data including abandoned chats.
Columns:
- id: Unique chat identifier
- initialized: Chat start timestamp
- answered: Chat answer timestamp (null if abandoned)
- abandoned: Chat abandonment timestamp (null if answered)
- is_abandoned: Abandonment flag (0 or 1)
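The call and chat tables share the same columns, so a single helper can compute an abandonment rate for either. The sketch below reads an in-memory sample with the standard csv module. The timestamp format and the use of empty fields for null are assumptions, since the exact CSV encoding is not described here.

```python
import csv
import io

# Illustrative sample only; empty fields stand in for null values.
SAMPLE = """id,initialized,answered,abandoned,is_abandoned
1,2024-01-05 09:00:00,2024-01-05 09:00:20,,0
2,2024-01-05 09:05:00,,2024-01-05 09:06:10,1
3,2024-01-05 09:10:00,2024-01-05 09:10:05,,0
4,2024-01-05 09:12:00,2024-01-05 09:12:30,,0
"""


def abandonment_rate(rows) -> float:
    """Share of records flagged as abandoned (0.0 for an empty input)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(int(r["is_abandoned"]) for r in rows) / len(rows)


rate = abandonment_rate(csv.DictReader(io.StringIO(SAMPLE)))
print(rate)  # 0.25
```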
CUSTOMERS (1) ←→ (∞) INTERACTIONS
    ↓                     ↓
 Country               Channel
 Language              Handle Time
                       Speed of Answer

TICKETS (1) ←→ (∞) INTERACTIONS
   ↓                    ↓
 Product             Channel
 Status              Handle Time
 FCR                 Speed of Answer
 Origin

USERS (1) ←→ (∞) TICKETS
  ↓                 ↓
 Agent            Owner

USERS (1) ←→ (∞) INTERACTIONS (1) ←→ (∞) QA_ENTRIES
  ↓                   ↓                      ↓
 Agent             Handler                QA Score

USERS (1) ←→ (∞) WFM_ENTRIES
  ↓                  ↓
 FTE           Scheduled Time

CALLS (phone channel only) - independent
CHATS (chat channel only) - independent
Key Relationships:
- Each ticket is owned by one user
- Each interaction belongs to one ticket and is handled by one user
- Each interaction belongs to one customer; a customer can have many interactions
- Each WFM entry belongs to one user
- Each QA entry belongs to one interaction
- Calls and chats are generated from phone and chat interactions respectively
- FCR tickets have exactly 1 interaction; others have multiple based on symptom category
- Abandoned calls/chats are additional records not linked to tickets
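Two of the rules above are easy to verify on the generated data: every interaction should point at an existing ticket, and FCR tickets should have exactly one interaction. The following is a minimal sketch of such checks on rows loaded as dictionaries; the function names and sample rows are hypothetical, not the project's own validation code.

```python
from collections import Counter


def fcr_violations(tickets, interactions):
    """Ticket IDs flagged fcr=1 that do not have exactly one interaction."""
    counts = Counter(i["ticket_id"] for i in interactions)
    return [t["ticket_id"] for t in tickets
            if int(t["fcr"]) == 1 and counts[t["ticket_id"]] != 1]


def orphan_interactions(tickets, interactions):
    """Interaction IDs whose ticket_id matches no ticket."""
    known = {t["ticket_id"] for t in tickets}
    return [i["interaction_id"] for i in interactions
            if i["ticket_id"] not in known]


# Illustrative rows only.
tickets = [
    {"ticket_id": "TKT-00001", "fcr": "1"},
    {"ticket_id": "TKT-00002", "fcr": "0"},
    {"ticket_id": "TKT-00003", "fcr": "1"},
]
interactions = [
    {"interaction_id": "INT-000001", "ticket_id": "TKT-00001"},
    {"interaction_id": "INT-000002", "ticket_id": "TKT-00002"},
    {"interaction_id": "INT-000003", "ticket_id": "TKT-00002"},
    {"interaction_id": "INT-000004", "ticket_id": "TKT-00003"},
    {"interaction_id": "INT-000005", "ticket_id": "TKT-00003"},
    {"interaction_id": "INT-000006", "ticket_id": "TKT-99999"},
]
print(fcr_violations(tickets, interactions))       # ['TKT-00003']
print(orphan_interactions(tickets, interactions))  # ['INT-000006']
```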
- NUM_TICKETS: Number of tickets to generate (default: 25,000)
- UNIQUE_CUSTOMERS: Number of unique customers (default: 6,000)
- UNIQUE_AGENTS: Number of support agents (default: 12)
- START_DATE: Data generation start date (default: 2023-09-15)
- END_DATE: Data generation end date (default: 2025-08-21)
- MAX_INTERACTION_SPAN_HOURS: Maximum time span for interactions per ticket (default: 6 hours)
- ESCALATION_RATE: Probability of ticket escalation (default: 0.12)
- ANCHOR_CLOSURE_TO: Closure time calculation method ('last_interaction' or 'from_creation')
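The two ANCHOR_CLOSURE_TO values suggest that the closure timestamp is measured either from the last interaction or from ticket creation. The sketch below illustrates that idea only. The function `closure_time`, its `delay_hours` parameter, and the way the delay is applied are assumptions, not the generator's actual logic.

```python
from datetime import datetime, timedelta


def closure_time(anchor: str, created: datetime,
                 last_interaction: datetime, delay_hours: float) -> datetime:
    """Return a closure timestamp measured from the chosen anchor."""
    if anchor == "last_interaction":
        base = last_interaction
    elif anchor == "from_creation":
        base = created
    else:
        raise ValueError(f"unknown anchor: {anchor!r}")
    return base + timedelta(hours=delay_hours)


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600.0


created = datetime(2024, 1, 1, 9, 0)
last_interaction = datetime(2024, 1, 1, 12, 0)
closed = closure_time("last_interaction", created, last_interaction, delay_hours=5)
print(closed)                          # 2024-01-01 17:00:00
print(hours_between(created, closed))  # 8.0, analogous to lifecycle_hours
```

With 'from_creation' a short delay could place the closure before the last interaction, so a real implementation would need a guard for that case.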
CHANNELS: Communication channel weights (email: 30%, phone: 40%, chat: 30%)
COUNTRIES: Customer country distribution (UK: 30%, Germany: 18%, Austria: 12%, Netherlands: 10%, France: 15%, Belgium: 5%)
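Weighted distributions like these can be sampled with the standard library's `random.choices`, which treats weights as relative values and so does not require them to sum to 100. This is a sketch of the technique, not the project's code; the dictionaries simply restate the percentages listed above, and the helper `draw` is hypothetical.

```python
import random
from collections import Counter

CHANNELS = {"email": 30, "phone": 40, "chat": 30}
# The listed country percentages add up to 90; choices() rescales them.
COUNTRIES = {"UK": 30, "Germany": 18, "Austria": 12,
             "Netherlands": 10, "France": 15, "Belgium": 5}


def draw(weights: dict, n: int, rng: random.Random) -> list:
    """Draw n keys with probability proportional to their weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


rng = random.Random(42)
sample = draw(CHANNELS, 10_000, rng)
shares = {name: count / len(sample) for name, count in Counter(sample).items()}
print(shares)  # each share should land close to its configured weight
```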
- troubleshooting: 50% FCR rate, 1.5 avg contacts per case
- finance: 0% FCR rate, 2.3 avg contacts per case
- logistics: 43% FCR rate, 1.8 avg contacts per case
- rma: 10% FCR rate, 4.1 avg contacts per case
- product: 100% FCR rate, 1.2 avg contacts per case
- complaint: 20% FCR rate, 1.1 avg contacts per case
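To compare a generated dataset against these targets, the observed FCR rate and average contacts per case can be recomputed per symptom category. A minimal sketch, assuming tickets and interactions have been loaded as lists of dictionaries; the function name and sample rows are illustrative.

```python
from collections import defaultdict


def summarize_by_category(tickets, interactions):
    """Observed FCR rate and average interactions per ticket, per symptom_cat."""
    contacts = defaultdict(int)
    for row in interactions:
        contacts[row["ticket_id"]] += 1

    totals = defaultdict(lambda: {"tickets": 0, "fcr": 0, "contacts": 0})
    for ticket in tickets:
        bucket = totals[ticket["symptom_cat"]]
        bucket["tickets"] += 1
        bucket["fcr"] += int(ticket["fcr"])
        bucket["contacts"] += contacts[ticket["ticket_id"]]

    return {
        cat: {"fcr_rate": b["fcr"] / b["tickets"],
              "avg_contacts": b["contacts"] / b["tickets"]}
        for cat, b in totals.items()
    }


# Illustrative rows only.
tickets = [
    {"ticket_id": "T1", "symptom_cat": "troubleshooting", "fcr": "1"},
    {"ticket_id": "T2", "symptom_cat": "troubleshooting", "fcr": "0"},
    {"ticket_id": "T3", "symptom_cat": "rma", "fcr": "0"},
]
interactions = ([{"ticket_id": "T1"}]
                + [{"ticket_id": "T2"}] * 2
                + [{"ticket_id": "T3"}] * 4)
summary = summarize_by_category(tickets, interactions)
print(summary["troubleshooting"])  # {'fcr_rate': 0.5, 'avg_contacts': 1.5}
```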
- Calls: 7% average abandonment rate (±3%)
- Chats: 10% average abandonment rate (±3%)
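One way to read "7% (±3%)" is a rate drawn uniformly from 4% to 10%. That reading is an assumption, as are both helper names below. Since abandoned records are added on top of the answered ones, the number to add for a target rate r and N answered records is N * r / (1 - r).

```python
import random


def sample_abandonment_rate(mean: float, spread: float = 0.03,
                            rng=None) -> float:
    """Draw a rate uniformly from [mean - spread, mean + spread], kept in [0, 1]."""
    rng = rng or random.Random()
    return min(1.0, max(0.0, rng.uniform(mean - spread, mean + spread)))


def abandoned_count(answered: int, rate: float) -> int:
    """Abandoned records to add so that abandoned / (answered + abandoned) = rate."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    return round(answered * rate / (1.0 - rate))


rate = sample_abandonment_rate(0.07, rng=random.Random(1))
print(abandoned_count(930, 0.07))  # 70, since 70 / (930 + 70) = 0.07
```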
- Python 3.7 or higher
- pip package manager
- Clone or download the project files
- Set up a virtual environment:
    python -m venv venv
    venv\Scripts\activate          (Windows)
    source venv/bin/activate       (Linux/macOS)
- Install dependencies:
    pip install -r requirements.txt
Edit the configuration in config/settings.py:
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class Config:
        # Change basic counts
        NUM_TICKETS: int = 25000       # Modify this value
        UNIQUE_CUSTOMERS: int = 6000   # Modify this value
        UNIQUE_AGENTS: int = 12        # Modify this value
        # Change date range
        START_DATE: datetime = datetime(2023, 9, 15)  # Modify dates
        END_DATE: datetime = datetime(2025, 8, 21)
        # Modify other parameters in __post_init__ method

Basic execution:
    python main.py
Output:
- Creates exports/ folder automatically
- Generates 8 CSV files in exports/ folder
- Displays generation statistics and analysis in console
- Typical runtime: 30-60 seconds for default dataset size
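After a run, a quick sanity check is to count the data rows in each exported file. The sketch below uses only the standard library and assumes each CSV has a single header row; `csv_row_counts` is a hypothetical helper, not part of the project.

```python
import csv
from pathlib import Path


def csv_row_counts(export_dir: str = "exports") -> dict:
    """Map each CSV file name in export_dir to its number of data rows."""
    folder = Path(export_dir)
    counts = {}
    if not folder.is_dir():
        return counts  # nothing generated yet
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row
            counts[path.name] = sum(1 for _ in reader)
    return counts


for name, rows in csv_row_counts().items():
    print(f"{name}: {rows} rows")
```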
data_generator/
├── main.py                      # Entry point - run this file
├── orchestrator.py              # Main coordination logic
├── utils/
│   ├── __init__.py
│   ├── utils.py                 # Utility functions (date generation, statistics)
│   └── data_exporter            # Data export class
├── config/
│   ├── __init__.py
│   └── settings.py              # All configuration parameters
├── generators/                  # Data generation classes
│   ├── __init__.py
│   ├── base_generator.py        # Abstract base class
│   ├── user_generator.py        # Support agent data
│   ├── customer_generator.py    # Customer data
│   ├── ticket_generator.py      # Support ticket data
│   ├── interaction_generator.py # Customer-agent interactions
│   ├── call_chat_generator.py   # Call and chat channel data
│   ├── wfm_generator            # WFM data
│   └── qa_generator             # QA data
├── models/                      # Data model definitions
│   ├── __init__.py
│   └── entities.py              # Dataclass models for type safety
├── analysis/                    # Export and analysis functionality
│   ├── __init__.py
│   └── metrics.py               # Metrics calculators
└── exports/                     # Generated output files (created automatically)
    ├── users_table.csv
    ├── customers_table.csv
    ├── tickets_table.csv
    ├── interactions_table.csv
    ├── calls_table.csv
    └── chats_table.csv
- Orchestrator: Manages generation workflow and dependencies between data tables
- Generators: Individual classes responsible for generating each data type
- Models: Dataclass definitions providing type safety and business logic methods
- Config: Centralized configuration management
- Utils: Shared utility functions for date/time generation and statistics
- Analysis: Export functionality and data validation
- Python 3.13+
- pandas: Data manipulation and CSV export
- numpy: Statistical distributions and numerical operations
- faker: Realistic fake data generation (names, emails, addresses)
- Factory Pattern: For creating data models
- Strategy Pattern: For different closure time calculation methods
- Builder Pattern: For orchestrating complex data generation workflows
- Dependency Injection: For passing required data between generators
- Type Safety: Dataclass models with type hints
- Data Validation: Built-in integrity checks and business rule validation
- Configurable: All parameters centralized and easily modifiable
- Extensible: Easy to add new data tables or modify existing ones
- Reproducible: Configurable random seed for consistent results
- Realistic: Statistical distributions based on real customer support patterns
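Reproducibility of this kind usually comes from routing all randomness through a generator created from a fixed seed. The sketch below demonstrates the idea with the standard library only; `generate_handle_times` and its value range are made up for illustration. numpy and faker keep their own random state and would need to be seeded separately.

```python
import random


def generate_handle_times(seed: int, n: int = 5) -> list:
    """Return n pseudo-random handle times (minutes) from a seeded generator."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    return [round(rng.uniform(2.0, 15.0), 2) for _ in range(n)]


first = generate_handle_times(42)
second = generate_handle_times(42)
print(first == second)  # True: the same seed yields the same sequence
```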