A Python-based digital twin pipeline that identifies the per-storey stiffness of a three-storey aluminium shear frame from shaking table measurements, using Transitional Markov Chain Monte Carlo (TMCMC) Bayesian model updating.
- Author: Osman Mukuk
- Supervisor: Dr Marco De Angelis
- Institution: University of Strathclyde, BEng Civil Engineering, 2025-26
This repository contains the code and data for an undergraduate dissertation that develops, calibrates and validates a digital twin for a laboratory shear frame tested at the University of Strathclyde. The pipeline is open-source, reproducible and grounded in real experimental data. Every stage, from the raw acceleration records to the final posterior stiffness distributions, is documented and can be regenerated by running the scripts in this repository.
The pipeline processes acceleration records from two independent test sessions on a Centrotecnica shaking table. Each session contains six types of dynamic excitation: free vibration, impact, three harmonic tests (one per mode), and earthquake-type broadband excitation. The natural frequencies of the structure are identified from the free vibration tails at the end of each test. These frequencies, together with their empirical measurement uncertainty, are used to update the three storey stiffness parameters of a shear-building forward model through Bayesian inference.
The pipeline is organised into five modules. Module 1 performs system identification, extracting natural frequencies, mode shapes and damping ratios from the acceleration records. Module 2 builds the forward model and computes an initial stiffness estimate from the measured geometry. Module 3 runs the TMCMC sampler to obtain posterior distributions for the three per-storey stiffnesses. Module 4 validates the calibrated model against independent measurements from Session 2. Module 5 propagates the geometric measurement tolerances through the forward model separately from the Bayesian posterior, providing a combined uncertainty budget.
```
dt/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── .gitignore                # Files excluded from version control
│
├── config.py                 # All structural, experimental and BMU parameters
├── forward_model.py          # 3-DOF eigenvalue solver and stiffness matrix
├── signal_processing.py      # Data loading, FFT, PSD, damping, mode shapes
├── bayesian_updating.py      # TMCMC sampler and posterior diagnostics
├── uncertainty_analysis.py   # Geometric tolerance propagation, sensitivity
├── run_digital_twin.py       # Main pipeline (runs all five modules in order)
├── generate_plots.py         # Detailed amplitude and PSD plots per test
├── convert_data.py           # Converts Centrotecnica xlsx to compressed npz
│
├── data/                     # Experimental data
│   ├── session_1.npz         # Session 1 (calibration), 6 tests, 13 MB
│   └── session_2.npz         # Session 2 (validation), 6 tests, 12 MB
│
├── figures/                  # Generated by run_digital_twin.py (16 plots)
└── output/                   # Generated by run_digital_twin.py and generate_plots.py
    ├── accelerograms/        # Raw acceleration time histories (12 plots)
    ├── free_vib_windows/     # Free vibration tail windows (8 plots)
    ├── amplitude/            # Individual amplitude spectra (36 plots)
    └── psd/                  # Individual PSD plots (36 plots)
```
The figures/ and output/ folders are not tracked by git. They are
regenerated every time the pipeline is run.
The pipeline requires Python 3.9 or newer. To install the dependencies:

```
pip install -r requirements.txt
```

The dependencies are numpy, scipy, matplotlib and openpyxl. The last one is only needed if you want to regenerate the npz files from the original xlsx data using convert_data.py.
The main pipeline runs in one command:

```
python run_digital_twin.py
```

This takes approximately one minute on a typical laptop. It produces console output reporting the identified frequencies, the TMCMC stages, the posterior means and standard deviations, the validation results, and the uncertainty budget. It also writes 16 figures to the figures/ folder and 20 plots to output/free_vib_windows/ and output/accelerograms/.
After the main pipeline finishes, you can optionally generate the full set of detailed per-test amplitude and PSD plots:

```
python generate_plots.py
```

This adds 72 plots to output/amplitude/ and output/psd/ (one plot per floor per test per spectrum type). It takes approximately three minutes because it processes each test individually at high resolution.
The experimental data is stored in two compressed NumPy archive files:

- `data/session_1.npz`: Session 1 (calibration), 6 dynamic tests
- `data/session_2.npz`: Session 2 (validation), 6 dynamic tests
Each archive contains six arrays, one per test. Each array has shape
(N, 4) where column 0 is the time axis (seconds) and columns 1 to 3
are the accelerations at Floor 1, Floor 2 and Floor 3 (arbitrary units,
as output by the DAQ). The sampling rate is 2048 Hz. Record lengths
vary between tests: impact tests are 15-20 seconds, harmonic and
earthquake tests are 40-50 seconds.
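A quick way to see the archive layout in code. The sketch below builds a small synthetic record in place of the real session files (the array name `free_vibration` and the signal content are illustrative, not the actual test names inside the archives):

```python
import io
import numpy as np

fs = 2048.0                                 # sampling rate [Hz]
t = np.arange(0.0, 2.0, 1.0 / fs)           # short synthetic record
rec = np.column_stack([
    t,                                      # column 0: time [s]
    np.sin(2 * np.pi * 7.2 * t),            # column 1: Floor 1 acceleration
    np.sin(2 * np.pi * 21.0 * t),           # column 2: Floor 2 acceleration
    np.sin(2 * np.pi * 30.4 * t),           # column 3: Floor 3 acceleration
])

# Stand-in for data/session_1.npz, written to an in-memory buffer.
buf = io.BytesIO()
np.savez_compressed(buf, free_vibration=rec)
buf.seek(0)

with np.load(buf) as archive:
    for name in archive.files:              # one (N, 4) array per test
        data = archive[name]
        time, acc = data[:, 0], data[:, 1:]
        print(f"{name}: shape {data.shape}, "
              f"fs = {1.0 / (time[1] - time[0]):.0f} Hz")
```

Loading the real archives works the same way, with `np.load("data/session_1.npz")` in place of the buffer.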
The original raw data was supplied as Centrotecnica xlsx files (82 MB
total). These were converted to npz format (25 MB total) for efficient
loading and storage. If you have the original xlsx files, you can
regenerate the npz archives by placing the xlsx files in data/ and
running:

```
python convert_data.py
```

The conversion preserves full float64 precision.
All parameters are defined in config.py. This is the only file you
should need to edit if you want to adapt the pipeline to a different
structure or dataset. The main sections are:
Structural properties — floor masses, Young's modulus, column dimensions, number of columns per storey, storey heights. The default values correspond to the three-storey EN AW-6082-T6 aluminium frame used in this project.
Measurement tolerances — the resolutions of the instruments used to measure the column depth, column width and storey heights. These propagate through Module 5 to give the geometric contribution to the total uncertainty. Young's modulus and floor mass are treated as fixed values, following the approach used by Bonney et al. (2022) for a comparable laboratory structure.
Data files — paths to the two session archives and the sheet names for each of the six tests per session.
TMCMC settings — prior bounds on the stiffness parameters (uniform
distribution between K_LO and K_HI), the number of samples per
stage (NSAMPLES), the proposal scaling factor (TMCMC_BETA) and the
random seed (SEED). The seed ensures that every run produces
identical results.
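For orientation, the TMCMC block of config.py looks roughly like this. The numeric values below are illustrative placeholders, not the defaults shipped with the repository:

```python
# --- TMCMC settings (illustrative values only) ---
K_LO = 1.0e4        # lower prior bound on each storey stiffness [N/m]
K_HI = 2.0e5        # upper prior bound on each storey stiffness [N/m]
NSAMPLES = 1000     # samples per TMCMC stage
TMCMC_BETA = 0.2    # proposal scaling factor for the MH step
SEED = 42           # fixed seed -> bit-identical runs
```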
Frequency identification. Natural frequencies are extracted from the free vibration tail at the end of each dynamic test. The excitation end point is detected by computing the sliding RMS of the Floor 3 acceleration in 0.5-second windows and identifying the last window where the RMS exceeds 25% of its peak value. A 0.5-second buffer is added after this point, and everything that follows is taken as the free vibration tail. The FFT of this tail is computed using the full available length with a Hanning window, and the three strongest peaks above 5 Hz are identified as the natural frequencies.
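The tail-detection and peak-picking steps above can be sketched as follows. This is a simplified stand-in for the actual implementation in signal_processing.py; the function name and defaults are illustrative:

```python
import numpy as np
from scipy.signal import find_peaks

def identify_frequencies(t, a3, fs=2048.0, win=0.5, thresh=0.25, buffer=0.5):
    """Sketch: detect the free-vibration tail from the Floor 3 record,
    then pick the three strongest FFT peaks above 5 Hz."""
    n = int(win * fs)                       # samples per 0.5 s RMS window
    nwin = len(a3) // n
    rms = np.sqrt(np.mean(a3[: nwin * n].reshape(nwin, n) ** 2, axis=1))
    # Last window where the sliding RMS still exceeds 25% of its peak,
    # plus a 0.5 s buffer, marks the start of the free-vibration tail.
    last = np.max(np.where(rms > thresh * rms.max())[0])
    start = int((last + 1) * n + buffer * fs)
    tail = a3[start:] * np.hanning(len(a3) - start)   # Hanning-windowed tail
    spec = np.abs(np.fft.rfft(tail))
    freq = np.fft.rfftfreq(len(tail), 1.0 / fs)
    mask = freq > 5.0                       # ignore low-frequency content
    peaks, props = find_peaks(spec[mask], height=0)
    top3 = peaks[np.argsort(props["peak_heights"])[-3:]]
    return np.sort(freq[mask][top3])        # three natural frequencies [Hz]
```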
Tail quality filter. A tail is considered clean if it is at least 5 seconds long (to provide FFT frequency resolution of 0.2 Hz or better) and its RMS amplitude is less than 20% of the mid-record RMS (confirming the excitation has effectively stopped). In the default dataset, 5 out of 6 Session 1 tests and 3 out of 6 Session 2 tests satisfy both conditions. The excluded tests are retained in the raw data but not used for calibration or validation.
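The two-part quality check is a few lines; a sketch (function name and argument layout are illustrative):

```python
import numpy as np

def tail_is_clean(tail, mid_record, fs=2048.0,
                  min_len_s=5.0, rms_ratio=0.20):
    """Sketch of the tail quality filter: long enough for 0.2 Hz FFT
    resolution, and quiet relative to the mid-record response."""
    rms = lambda x: np.sqrt(np.mean(np.asarray(x) ** 2))
    long_enough = len(tail) / fs >= min_len_s      # df <= 1/5 s = 0.2 Hz
    quiet_enough = rms(tail) < rms_ratio * rms(mid_record)
    return long_enough and quiet_enough
```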
Calibration target. The mean and standard deviation of the frequencies identified from the 5 clean Session 1 tails serve as the calibration target and the empirical measurement uncertainty entering the Bayesian likelihood.
Likelihood function. The likelihood is a frequency-only Gaussian on the residuals between predicted and measured frequencies, using the empirical standard deviations as the noise scale. Mode shapes are computed as a post-hoc diagnostic but are not included in the likelihood, because the three-sensor setup does not produce mode shapes of sufficient quality for two of the three modes.
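In code, the frequency-only Gaussian likelihood amounts to a weighted sum of squared residuals; a sketch (the actual implementation lives in bayesian_updating.py and may differ in detail):

```python
import numpy as np

def log_likelihood(f_pred, f_meas, sigma):
    """Gaussian log-likelihood on frequency residuals (sketch).

    f_pred : model frequencies for a candidate stiffness vector [Hz]
    f_meas : mean frequencies from the clean calibration tails [Hz]
    sigma  : empirical standard deviations of those frequencies [Hz]
    """
    f_pred, f_meas, sigma = map(np.asarray, (f_pred, f_meas, sigma))
    r = (f_pred - f_meas) / sigma          # standardised residuals
    return (-0.5 * np.sum(r ** 2)
            - np.sum(np.log(sigma))
            - 0.5 * len(r) * np.log(2 * np.pi))
```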
Validation. The calibrated posterior is validated against the 3 clean Session 2 tests, which were not used during calibration. Each test is classified as pass or fail for each mode based on whether the identified frequency falls within the 95% posterior credible interval.
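The pass/fail check can be sketched directly from posterior-predicted frequency samples (one row per posterior sample, one column per mode); the function name is illustrative:

```python
import numpy as np

def validate(posterior_freqs, measured, level=0.95):
    """Sketch: does each measured validation frequency fall inside the
    central 95% credible interval of the posterior prediction?"""
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    lo = np.quantile(posterior_freqs, lo_q, axis=0)   # per-mode lower bound
    hi = np.quantile(posterior_freqs, hi_q, axis=0)   # per-mode upper bound
    return (measured >= lo) & (measured <= hi)        # pass/fail per mode
```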
Uncertainty budget. The Bayesian posterior standard deviations capture the measurement variability. The geometric measurement tolerances are propagated separately through the forward model using Monte Carlo simulation, giving a second uncertainty component. The two components are combined via the root sum of squares to give the total stiffness uncertainty.
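The root-sum-of-squares combination is a one-liner; both numbers below are hypothetical, for illustration only:

```python
import numpy as np

sigma_bayes = 1428.0   # N/m, posterior (measurement-variability) component
sigma_geom = 900.0     # N/m, geometric-tolerance component (hypothetical)
sigma_total = np.hypot(sigma_bayes, sigma_geom)   # root sum of squares
```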
Running run_digital_twin.py with the default configuration produces
the following posterior stiffness distributions:
| Parameter | Mean (N/m) | Standard deviation (N/m) | Relative uncertainty |
|---|---|---|---|
| k1 (bottom storey) | 52,508 | 1,428 | 2.7% |
| k2 (middle storey) | 56,956 | 2,074 | 3.6% |
| k3 (top storey) | 66,778 | 2,501 | 3.7% |
The posterior mean reproduces the Session 1 calibration frequencies (7.203, 20.961, 30.435 Hz) to within 0.25% for all three modes. The Session 2 validation reveals a systematic upward shift of 0.1-0.2 Hz in the measured frequencies, attributable to reassembly of the frame between sessions.
Because TMCMC uses a fixed random seed, these values are fully reproducible. Every run of the pipeline on the same data produces identical output.
The pipeline is designed to be adapted to other three-storey shear frames or, with more effort, to structures with different numbers of degrees of freedom. To use a different structure:
- Update the structural properties and measurement tolerances in `config.py`.
- Replace the data files in `data/` with your own, using the same format (npz archive with one array per test, each of shape `(N, 4)` containing time and three floor accelerations).
- Update `SESSION_1_FILE`, `SESSION_2_FILE`, `S1_SHEETS` and `S2_SHEETS` in `config.py` to point to your new files and match your test names.
For structures with a different number of storeys, the forward model and BMU modules would need to be generalised. This is not supported by the current code but is a natural extension.
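The core of such a generalisation is assembling the n-by-n tridiagonal shear-building stiffness matrix and solving the generalised eigenproblem. A minimal sketch, not the project's forward_model.py (the stiffness and mass values in the examples are placeholders):

```python
import numpy as np
from scipy.linalg import eigh

def natural_frequencies(k, m):
    """Natural frequencies [Hz] of an n-storey shear building (sketch).

    k : per-storey stiffnesses [N/m], bottom to top
    m : floor masses [kg], bottom to top
    """
    n = len(k)
    K = np.zeros((n, n))
    for i in range(n):
        # Diagonal: stiffness of this storey plus the one above (if any).
        K[i, i] = k[i] + (k[i + 1] if i + 1 < n else 0.0)
        if i + 1 < n:
            K[i, i + 1] = K[i + 1, i] = -k[i + 1]   # coupling terms
    M = np.diag(m)
    # Generalised eigenproblem K v = w^2 M v; eigenvalues are w^2.
    eigvals = eigh(K, M, eigvals_only=True)
    return np.sqrt(eigvals) / (2 * np.pi)
```

For n = 3 this reduces to the 3-DOF model used throughout the pipeline; the BMU module would additionally need its prior bounds and proposal tuning extended to n stiffness parameters.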
Ching, J., & Chen, Y.-C. (2007). Transitional Markov Chain Monte Carlo method for Bayesian model updating, model class selection, and model averaging. Journal of Engineering Mechanics, 133(7), 816-832.
Bonney, M. S., de Angelis, M., Dal Borgo, M., Andrade, L., Beregi, S., Jamia, N., & Wagg, D. J. (2022). Development of a digital twin operational platform using Python Flask. Data-Centric Engineering, 3, e1.
Chopra, A. K. (2012). Dynamics of Structures: Theory and Applications to Earthquake Engineering (4th ed.). Pearson.
Lye, A., Cicirello, A., & Patelli, E. (2021). Sampling methods for solving Bayesian model updating problems: A tutorial. Mechanical Systems and Signal Processing, 159, 107760.
This code is released as open source to accompany the dissertation "An Open-Source Digital Twin for Structural Dynamics" (University of Strathclyde, 2026). It is free to use, modify and redistribute for academic and educational purposes. If you use the code or data in your own work, please cite the dissertation and contact the author or supervisor for any commercial or derivative applications.