This project provides tools for generating synthetic power grid data, processing real power grid measurements, and discriminating between real and simulated scenarios using machine learning.
This code supports a paper accepted at the 11th ACM Cyber-Physical System Security Workshop (CPSS'25). The preprint is available on arXiv.
Cite this repository as follows:
@inproceedings{donadel2025simprocess,
title={SimProcess: High Fidelity Simulation of Noisy ICS Physical Processes},
author={Donadel, Denis and Crestanello, Gabriele and Morandini, Giulio and Antonioli, Daniele and Conti, Mauro and Merro, Massimo},
booktitle={Proceedings of the 11th ACM Cyber-Physical System Security Workshop},
pages={1--12},
year={2025},
organization={ACM Press}
}
SimProcess is a comprehensive framework designed to work with power grid data from both simulated and real sources. It enables researchers and engineers to generate synthetic power system data, analyze real-world measurements, and develop models that can distinguish between authentic and synthetic signals.
The repository is organized into three main components:
Discriminator is the core analysis framework. It processes power system data, classifies signals as real or simulated, and contains:
- simprocess: Core library with feature extraction, machine learning, and analysis capabilities
- tools: Utility scripts for data handling
- main.py: Primary entry point for the SimProcess framework
- workflow_example.py: Example script showing a complete analysis pipeline
The Generators directory contains different implementations of power grid data generators:
- Mosaik: Generator based on the Mosaik co-simulation framework
- Pandapower: Generator based on the Pandapower network calculation framework
- VariationalRecurrentNeuralNetwork: VRAE-based generator for power system data
The EPIC directory contains tools for working with the Electric Power and Intelligent Control (EPIC) dataset:
- preprocessor.py: Utility for transforming EPIC dataset measurements to the standardized format used by SimProcess
- Additional support files for EPIC data integration
SimProcess provides the following features:
- Synthetic Data Generation: Multiple approaches to generate realistic power grid data with configurable noise profiles
- Multi-level Noise Modeling: Support for layered noise types including Gaussian, uniform, Laplace, impulse, and more
- Feature Extraction: Statistical feature extraction from time series data
- Machine Learning Classification: Models to discriminate between real and simulated signals
- Visualization Tools: Comprehensive plotting and analysis utilities
- Pipeline Integration: End-to-end workflow from data generation/collection to final classification
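To make the noise modeling and feature extraction ideas above more concrete, here is a minimal, standard-library-only sketch. It is not the SimProcess implementation: the function names `layered_noise` and `summary_features`, all parameter names, and the chosen features are hypothetical and only illustrate how several noise layers could be stacked on a clean signal and then summarized with basic statistics.

```python
import random
import statistics


def layered_noise(signal, gaussian_std=0.0, uniform_width=0.0,
                  laplace_scale=0.0, impulse_prob=0.0, impulse_mag=0.0,
                  seed=None):
    """Return a copy of `signal` with independent noise layers added.

    Hypothetical helper for illustration; a layer is skipped when its
    parameter is left at 0.
    """
    rng = random.Random(seed)
    noisy = []
    for value in signal:
        if gaussian_std > 0:
            # Zero-mean Gaussian layer.
            value += rng.gauss(0.0, gaussian_std)
        if uniform_width > 0:
            # Uniform layer centered on zero.
            value += rng.uniform(-uniform_width / 2, uniform_width / 2)
        if laplace_scale > 0:
            # The difference of two i.i.d. exponentials is Laplace-distributed.
            rate = 1.0 / laplace_scale
            value += rng.expovariate(rate) - rng.expovariate(rate)
        if impulse_prob > 0 and rng.random() < impulse_prob:
            # Occasional spike of fixed magnitude and random sign.
            value += rng.choice((-1.0, 1.0)) * impulse_mag
        noisy.append(value)
    return noisy


def summary_features(series):
    """Compute a few basic statistical features of a time series."""
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return {
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),
        "min": min(series),
        "max": max(series),
        "mean_abs_diff": statistics.fmean(diffs) if diffs else 0.0,
    }


# Illustrative values only: a flat trace with three noise layers on top.
clean = [230.0] * 1000
noisy = layered_noise(clean, gaussian_std=0.5, laplace_scale=0.1,
                      impulse_prob=0.01, impulse_mag=5.0, seed=42)
features = summary_features(noisy)
```

Feature dictionaries like `features`, computed for both real and simulated traces, are the kind of input a real-versus-simulated classifier could be trained on. The component READMEs describe the actual noise profiles, features, and models used by the framework.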
The framework works with both synthetic and real-world datasets:
- Synthetic Data: Generated using the Mosaik and Pandapower simulators
- Real Data: EPIC dataset from the Singapore University of Technology and Design (SUTD)
The EPIC dataset can be requested from iTrust, Centre for Research in Cyber Security at SUTD: https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/
The repository includes several README files with detailed instructions for each component:
- See Discriminator/README.md for SimProcess analysis framework documentation
- See Generators/Mosaik/README.md and Generators/Pandapower/README.md for data generation tools
- See EPIC/README.md for information on processing real-world power grid data
MIT License