This project applies the Learning in the Model Space (LeMo) framework to wind turbine fault diagnosis from irregular sensor time series. Rather than performing diagnosis directly in the raw data space, the project represents each sensor sequence in an induced model space, where the temporal dynamics of the sequence can be captured in a more compact and stable form.
To make LeMo suitable for irregularly sampled wind turbine sensor data, we incorporate CtRes, a Continuous-time Reservoir Network, as the sequence fitting model. For each sensor sequence, CtRes models its temporal evolution through continuous-time reservoir dynamics and produces a corresponding readout model, which is used as the sequence representation in the model space.
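The fitting step can be sketched in NumPy. This is not the CtRes implementation. It assumes a leaky reservoir whose state follows dx/dt = (tanh(W_in u + W x) - x) / tau. Between irregular observations, the state is advanced with a gap-dependent leak rate a_k = 1 - exp(-dt_k / tau). A ridge-regression readout is then trained for one-step-ahead prediction, and the flattened readout plays the role of the model-space representation. `make_reservoir`, `fit_readout`, `tau`, `ridge`, and the reservoir size are hypothetical.

```python
import numpy as np


def make_reservoir(n_inputs, n_units=50, spectral_radius=0.9, seed=0):
    """Random input and recurrent weights shared by all sequences (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.normal(size=(n_units, n_units))
    # Rescale the recurrent matrix to the requested spectral radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def fit_readout(values, timestamps, w_in, w, tau=0.01, ridge=1e-4):
    """Fit a one-step-ahead linear readout to one irregularly sampled sequence.

    values: [length, channels]; timestamps: [length], strictly increasing, in seconds.
    Returns the flattened readout matrix (reservoir weights plus a bias row).
    """
    x = np.zeros(w.shape[0])
    # Gap since the previous observation; the first gap is assumed to be tau.
    gaps = np.diff(timestamps, prepend=timestamps[0] - tau)
    states = []
    for k in range(len(values) - 1):
        drive = np.tanh(w_in @ values[k] + w @ x)
        # Leak rate grows with the gap: exponential-Euler step with the drive held fixed.
        alpha = 1.0 - np.exp(-gaps[k] / tau)
        x = (1.0 - alpha) * x + alpha * drive
        states.append(x)
    design = np.hstack([np.array(states), np.ones((len(states), 1))])
    targets = values[1:]  # predict the next observation from the current state
    gram = design.T @ design + ridge * np.eye(design.shape[1])
    readout = np.linalg.solve(gram, design.T @ targets)  # [n_units + 1, channels]
    return readout.ravel()


# Usage on a synthetic two-channel sequence with random (irregular) timestamps.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 1.0, size=200))
u = np.stack([np.sin(2 * np.pi * 5 * t), np.cos(2 * np.pi * 3 * t)], axis=1)
w_in, w = make_reservoir(n_inputs=2)
rep = fit_readout(u, t, w_in, w)
print(rep.shape)  # (102,)
```

The reservoir weights are shared across sequences. As a result, sequences with different lengths or sampling patterns all map to readouts of the same size, which keeps them comparable in the model space.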
Based on these representations, the project compares sensor sequences according to their underlying temporal dynamics rather than their raw observations alone. This makes the approach effective for separating normal operating behavior from fault-related patterns, even when the sensor data are irregularly sampled.
The clustering behavior of the learned representations is illustrated below.
The resulting framework supports two diagnosis settings:
- Offline diagnosis, where collected samples are used for fault classification
- Online monitoring, where new observations are compared with a reference model space to detect abnormal behavior
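The online setting can be illustrated with a minimal sketch. It assumes each new window has already been mapped to a readout vector. A plain Euclidean distance between readout vectors stands in for the project's model-space distance. The alarm threshold is calibrated as a high quantile of leave-one-out nearest-neighbor distances inside a reference set of normal readouts. `calibrate_threshold`, `is_abnormal`, and `quantile=0.99` are hypothetical choices, not the project's detector.

```python
import numpy as np


def nearest_distances(queries, reference):
    """Euclidean distance from each query readout vector to its nearest reference vector."""
    diffs = queries[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)


def calibrate_threshold(reference, quantile=0.99):
    """Quantile of leave-one-out nearest-neighbor distances within the normal reference set."""
    diffs = reference[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore each vector's zero distance to itself
    return np.quantile(dists.min(axis=1), quantile)


def is_abnormal(new_readout, reference, threshold):
    """Flag a window whose readout is farther than threshold from every reference readout."""
    return bool(nearest_distances(new_readout[None, :], reference)[0] > threshold)


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 0.1, size=(100, 8))  # stand-in for readouts of normal windows
threshold = calibrate_threshold(reference)
print(is_abnormal(rng.normal(0.0, 0.1, size=8), reference, threshold))
print(is_abnormal(np.full(8, 2.0), reference, threshold))  # True
```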
LeMo consists of three key stages for fault diagnosis in irregular sensor time series:
- Sequence fitting in the model space: Each sensor sequence is fitted by CtRes, which captures its temporal dynamics through continuous-time reservoir state evolution. The resulting readout model is then used as the representation of that sequence in the model space.
- Distance modeling between sequence representations: A distance metric is defined between readout model representations so that distances in the model space reflect differences in the intrinsic temporal dynamics of the original sensor sequences.
- Fault diagnosis in the model space: Fault diagnosis is performed directly on the learned representations, supporting both offline fault classification and online streaming fault detection.
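This README does not spell out the distance between readout models. One common choice in model-space learning, assumed here, treats each readout as a linear function f(x) = W x + a of a reservoir state x in [-1, 1]^N. It then measures the L2 distance between two such functions under a uniform measure on that cube. The cross terms integrate to zero, so the integral of ||f1(x) - f2(x)||^2 over the cube equals 2^N (||W1 - W2||_F^2 / 3 + ||a1 - a2||^2). The sketch drops the constant 2^N factor and builds an RBF kernel from the resulting distances. `readout_distance`, `model_space_kernel`, and `gamma` are hypothetical names.

```python
import numpy as np


def readout_distance(w1, a1, w2, a2):
    """L2 distance between readouts f(x) = W x + a for x uniform on [-1, 1]^N.

    Uses E||dW x + da||^2 = ||dW||_F^2 / 3 + ||da||^2; the 2^N volume factor of the
    integral is dropped because it rescales every distance equally.
    """
    dw = w1 - w2
    da = a1 - a2
    return np.sqrt(np.sum(dw ** 2) / 3.0 + np.sum(da ** 2))


def model_space_kernel(readouts, gamma=1.0):
    """RBF kernel matrix exp(-gamma * d^2) over a list of (W, a) readout pairs."""
    n = len(readouts)
    k = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = readout_distance(*readouts[i], *readouts[j])
            k[i, j] = k[j, i] = np.exp(-gamma * d ** 2)
    return k


# Two readouts with 2 outputs over a 3-unit reservoir.
w_a, a_a = np.zeros((2, 3)), np.zeros(2)
w_b, a_b = np.ones((2, 3)), np.array([1.0, 0.0])
print(readout_distance(w_a, a_a, w_b, a_b))  # sqrt(6 / 3 + 1) = sqrt(3), about 1.732
```

A kernel matrix built this way could be passed to scikit-learn's `SVC(kernel="precomputed")`. Alternatively, the bias row can be weighted by sqrt(3) before flattening, so that a standard RBF kernel on readout vectors matches this distance.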
To characterize the operating condition of the wind turbine, vibration acceleration data were collected from its major components using accelerometers, as shown below.
The dataset covers five key components of the wind turbine and describes its operating state from each of them:
- Pitch-bearing: Vibration acceleration signals were collected from the three pitch bearings in both radial and axial directions. Each pitch bearing was monitored by two channels, resulting in six synchronized observation channels sampled at 1280 Hz. This dataset contains three condition labels: normal operation, damage in one pitch bearing, and damage in all three pitch bearings.
- Gearbox: Vibration acceleration signals were collected at six gearbox measurement positions: the radial direction of the low-speed shaft, the radial direction of the first planetary stage, the radial and axial directions of the high-speed shaft, the radial direction of the input shaft, and the radial direction of the intermediate shaft. All channels were sampled at 2560 Hz. This dataset contains three condition labels: normal operation, a fault in the high-speed-end gear, and combined faults in the low-speed shaft and the high-speed-end gear.
- Generator: Vibration acceleration signals were collected from the radial directions of the non-drive end and the drive end of the generator, forming two synchronized observation channels sampled at 25,600 Hz. This dataset contains two condition labels: normal operation and generator-bearing damage.
- Blade: Vibration acceleration signals were collected from the three blades in both flapwise and edgewise directions. Each blade was monitored by two channels, resulting in six synchronized observation channels sampled at 1280 Hz. This dataset contains two condition labels: normal operation and single-blade abnormality.
- Main-bearing: Vibration acceleration signals were collected from the main bearing in the horizontal direction, forming a single observation channel sampled at 2560 Hz. This dataset contains two condition labels: normal operation and main-bearing damage.
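Each channel is recorded at a fixed rate, so the n-th sample of a component sampled at f_s Hz lies at t_n = n / f_s seconds. This README does not describe how the irregular sequences in the bundled splits were produced. The sketch below shows only one hypothetical way to derive an irregularly sampled window from a regular recording, by keeping a random, sorted subset of samples. `SAMPLING_RATES_HZ`, `irregular_window`, and `keep_ratio` are illustrative names, not part of the project.

```python
import numpy as np

# Sampling rates listed above for each component dataset (Hz).
SAMPLING_RATES_HZ = {
    "Pitch-bearing": 1280,
    "Gearbox": 2560,
    "Generator": 25600,
    "Blade": 1280,
    "Main-bearing": 2560,
}


def irregular_window(signal, rate_hz, keep_ratio=0.5, seed=0):
    """Keep a random sorted subset of samples from a regularly sampled window.

    signal: [length, channels]. Returns (values [m, channels], timestamps [m]) in seconds.
    """
    rng = np.random.default_rng(seed)
    length = signal.shape[0]
    m = max(2, int(round(keep_ratio * length)))
    idx = np.sort(rng.choice(length, size=m, replace=False))
    timestamps = idx / rate_hz  # t_n = n / f_s
    return signal[idx], timestamps


# One second of synthetic 6-channel data at the pitch-bearing rate.
window = np.random.default_rng(1).normal(size=(1280, 6))
values, t = irregular_window(window, SAMPLING_RATES_HZ["Pitch-bearing"])
print(values.shape, t.shape)  # (640, 6) (640,)
```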
This project has been tested on Windows 11 with the following Python versions:
- Python 3.11, used for the reproducible setup instructions below
- Python 3.12, tested in a project-local `.venv` environment with dependencies installed from `requirements.txt`
For a consistent reproduction procedure, the instructions in this README use Python 3.11. Typical environment setup time is approximately 10 minutes.
Hardware requirements:
- No non-standard hardware is required. The software can be run on a CPU.
- A CUDA-capable NVIDIA GPU is optional and can accelerate feature extraction and the full demo execution.
- The reported demo runtime below was measured using a GPU in an environment with an NVIDIA driver reporting CUDA 13.1.
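Because the GPU is optional, a script can fall back to the CPU when CUDA is unavailable. A minimal sketch using PyTorch's `torch.cuda.is_available()` follows. `pick_device` is a hypothetical helper, and the `--device` handling in `run.py` may differ.

```python
def pick_device(preferred=None):
    """Return 'cpu' if requested or if CUDA is unusable; otherwise the preferred CUDA device."""
    try:
        import torch
        cuda_ok = torch.cuda.is_available()
    except ImportError:  # PyTorch not installed: only the CPU path is possible
        cuda_ok = False
    if preferred == "cpu" or not cuda_ok:
        return "cpu"
    return preferred or "cuda"


print(pick_device())       # cuda on a CUDA-capable machine, otherwise cpu
print(pick_device("cpu"))  # cpu
```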
Environment setup:
- Create a virtual environment (Python 3.11):
  ```shell
  conda create --prefix="./pure_py311" python=3.11
  ```
- Activate the virtual environment:
  ```shell
  conda activate ./pure_py311/
  ```
- Install dependencies:
  ```shell
  python -m pip install -r requirements.txt
  ```

Run options:
- `python run.py`

  By default, the script loads the five wind turbine component datasets in `WindTurbineDataset` and runs the full offline diagnosis pipeline for:
  - Pitch-bearing
  - Gearbox
  - Generator
  - Blade
  - Main-bearing

  For each dataset, the script:
  - loads the pre-split train/test `.npz` files
  - extracts CtRes features
  - trains an RBF SVM
  - reports training accuracy, test accuracy, classification report, and confusion matrix
  - generates a final t-SNE visualization
- `python run.py path/to/split_data.npz`

  Runs the pipeline on a user-prepared split dataset.
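The per-dataset steps of the default run can be approximated with scikit-learn. This sketch is not `run.py`. Its `extract_features` is a placeholder for CtRes feature extraction that returns only per-channel mean and standard deviation. It assumes the split file uses the `x_train`/`x_test`/`y_train`/`y_test` key names, and the SVM uses scikit-learn defaults rather than the project's hyperparameters. `run_offline` and the synthetic data are hypothetical.

```python
import os
import tempfile

import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.svm import SVC


def extract_features(x):
    """Placeholder for CtRes features: per-channel mean and std of [batch, length, channels]."""
    return np.concatenate([x.mean(axis=1), x.std(axis=1)], axis=1)


def run_offline(npz_path):
    """Fit an RBF SVM on train features, report test metrics, and embed test features."""
    with np.load(npz_path) as data:
        f_train, y_train = extract_features(data["x_train"]), data["y_train"]
        f_test, y_test = extract_features(data["x_test"]), data["y_test"]
    clf = SVC(kernel="rbf").fit(f_train, y_train)
    y_pred = clf.predict(f_test)
    print(f"train acc {accuracy_score(y_train, clf.predict(f_train)):.4f}, "
          f"test acc {accuracy_score(y_test, y_pred):.4f}")
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
    # 2-D t-SNE embedding of the test features, ready for plotting.
    embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(f_test)
    return clf, embedding


# Synthetic two-class split: class 1 has a shifted mean.
rng = np.random.default_rng(0)


def synthetic(n):
    y = np.repeat([0, 1], n // 2)
    x = rng.normal(size=(n, 64, 2)) + y[:, None, None] * 1.5
    return x, y


x_tr, y_tr = synthetic(60)
x_te, y_te = synthetic(40)
path = os.path.join(tempfile.mkdtemp(), "split_data.npz")
np.savez(path, x_train=x_tr, y_train=y_tr, x_test=x_te, y_test=y_te)
clf, emb = run_offline(path)
print(emb.shape)  # (40, 2)
```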
Running the complete bundled-data demo with `python run.py` takes less than 5 minutes on a GPU in the tested environment. A GPU is not required; CPU execution is supported but can take longer.
The command processes all five bundled datasets. For each component, it prints the input shapes, CtRes configuration, extracted feature shapes, SVM accuracy, classification report, and confusion matrix, and it then generates the t-SNE visualization shown above.
Results from one complete demo run are summarized below:
| Dataset | Train Accuracy | Test Accuracy | Test Confusion Matrix |
|---|---|---|---|
| Pitch-bearing | 1.0000 | 0.9420 | [[251, 0, 19], [0, 270, 0], [0, 28, 242]] |
| Gearbox | 0.9767 | 0.9840 | [[270, 0, 0], [3, 264, 3], [0, 7, 263]] |
| Generator | 0.9250 | 0.9611 | [[259, 11], [10, 260]] |
| Blade | 0.9100 | 0.9852 | [[266, 4], [4, 266]] |
| Main-bearing | 0.9400 | 0.9944 | [[269, 1], [2, 268]] |
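Test accuracy equals the trace of a confusion matrix divided by its total count, so the table can be cross-checked directly. The snippet recomputes the Test Accuracy column from the listed matrices. Every row sums to 270, so each class contributes 270 test samples.

```python
import numpy as np

# Confusion matrices copied from the results table.
confusion = {
    "Pitch-bearing": [[251, 0, 19], [0, 270, 0], [0, 28, 242]],
    "Gearbox": [[270, 0, 0], [3, 264, 3], [0, 7, 263]],
    "Generator": [[259, 11], [10, 260]],
    "Blade": [[266, 4], [4, 266]],
    "Main-bearing": [[269, 1], [2, 268]],
}

accuracy = {}
for name, cm in confusion.items():
    cm = np.asarray(cm)
    # Correct predictions lie on the diagonal.
    accuracy[name] = np.trace(cm) / cm.sum()
    print(f"{name}: {accuracy[name]:.4f} ({np.trace(cm)}/{cm.sum()})")

# Prints:
# Pitch-bearing: 0.9420 (763/810)
# Gearbox: 0.9840 (797/810)
# Generator: 0.9611 (519/540)
# Blade: 0.9852 (532/540)
# Main-bearing: 0.9944 (537/540)
```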
Supported arguments:
- `npz_path`: optional path to a split `.npz` dataset
- `--dataset-dir`: override the `WindTurbineDataset` directory
- `--batch-size`: override `CtResConfig.batch_size`
- `--num-workers`: override `CtResConfig.num_workers`
- `--device`: choose the compute device, such as `cpu` or `cuda`
- `--plot-tsne`: display the t-SNE plot for a single-task run or a custom split `.npz`
Custom dataset format:
To load your own split dataset, the `.npz` file only needs to contain one valid key from each of the following groups:
- Train data: `x_train_irregular`, `X_train_irregular`, `x_train`, or `X_train`
- Test data: `x_test_irregular`, `X_test_irregular`, `x_test`, or `X_test`
- Train timestamps: `timestamps_train`, `t_train`, `timestep_train`, or `timesteps_train`
- Test timestamps: `timestamps_test`, `t_test`, `timestep_test`, or `timesteps_test`
- Train labels: `y_train`, `labels_train`, or `train_labels`
- Test labels: `y_test`, `labels_test`, `test_labels`, or `true_labels`
Expected shapes:
- train/test data: `[batch, length, channels]`
- train/test timestamps: `[batch, observed_length]`
- train/test labels: `[batch]`
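A compatible file can be prepared with NumPy and checked so that each required group resolves to a key. The key names and shapes come from the lists above. `KEY_GROUPS` and `resolve_keys` are hypothetical helpers that mirror, but are not taken from, the project's loader. The sketch picks the first listed key when several are present, which may not match the loader's precedence, and it assumes `observed_length` equals `length`.

```python
import os
import tempfile

import numpy as np

# Accepted key names per group, as listed in this README.
KEY_GROUPS = {
    "x_train": ["x_train_irregular", "X_train_irregular", "x_train", "X_train"],
    "x_test": ["x_test_irregular", "X_test_irregular", "x_test", "X_test"],
    "t_train": ["timestamps_train", "t_train", "timestep_train", "timesteps_train"],
    "t_test": ["timestamps_test", "t_test", "timestep_test", "timesteps_test"],
    "y_train": ["y_train", "labels_train", "train_labels"],
    "y_test": ["y_test", "labels_test", "test_labels", "true_labels"],
}


def resolve_keys(npz_path):
    """Map each group to the first accepted key present in the file; raise if one is missing."""
    with np.load(npz_path) as data:
        present = set(data.files)
    resolved = {}
    for group, candidates in KEY_GROUPS.items():
        match = next((k for k in candidates if k in present), None)
        if match is None:
            raise KeyError(f"{npz_path}: no key for {group}; expected one of {candidates}")
        resolved[group] = match
    return resolved


rng = np.random.default_rng(0)


def split(batch, length=100, channels=3):
    """Random irregular sequences with the documented shapes."""
    t = np.sort(rng.uniform(0.0, 1.0, size=(batch, length)), axis=1)  # [batch, observed_length]
    x = rng.normal(size=(batch, length, channels))  # [batch, length, channels]
    y = rng.integers(0, 2, size=batch)  # [batch]
    return x, t, y


x_tr, t_tr, y_tr = split(8)
x_te, t_te, y_te = split(4)
path = os.path.join(tempfile.mkdtemp(), "split_data.npz")
np.savez(path, x_train_irregular=x_tr, timestamps_train=t_tr, y_train=y_tr,
         X_test=x_te, t_test=t_te, true_labels=y_te)
print(resolve_keys(path))
```

The resulting file can then be passed to `python run.py path/to/split_data.npz`.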
Main dependencies:
- python == 3.11
- numpy == 2.1.2
- torch == 2.6.0
- torchcde == 0.2.5
- torchdiffeq == 0.2.5
- scikit-learn == 1.6.1
- scipy == 1.17.1
- h5py == 3.16.0
- matplotlib == 3.10.8
- pandas == 3.0.2
- tqdm == 4.67.3
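To confirm that an environment matches the pinned versions above, the installed distribution versions can be compared using `importlib.metadata` from the standard library. `PINNED` and `check_versions` are hypothetical names. The scikit-learn distribution is looked up as `scikit-learn`, and Python itself is reported through `sys.version`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list above.
PINNED = {
    "numpy": "2.1.2",
    "torch": "2.6.0",
    "torchcde": "0.2.5",
    "torchdiffeq": "0.2.5",
    "scikit-learn": "1.6.1",
    "scipy": "1.17.1",
    "h5py": "3.16.0",
    "matplotlib": "3.10.8",
    "pandas": "3.0.2",
    "tqdm": "4.67.3",
}


def check_versions(pinned=PINNED):
    """Return {package: (installed or None, pinned)} for every package that does not match."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu124" on CUDA wheels are ignored in the comparison.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


print("Python", sys.version.split()[0], "(pinned 3.11)")
for name, (installed, wanted) in check_versions().items():
    print(f"{name}: installed {installed}, pinned {wanted}")
```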
