# Population-Based SHM Under Environmental Variability: Self-Supervised Learning for Damage Detection with Explainable AI

# Introduction

## Context:
### Importance of SHM:
Civil infrastructure underpins the functioning of society, encompassing bridges, wind turbines, electrical pylons, and more. Depending on factors such as their importance, usage, associated risks, and potential hazards, these structures often have legally mandated inspection and maintenance plans. Traditional inspection procedures are limited by human bias, cost, and time, which has led to a pivot towards continuous, automated, real-time online systems. Structural health monitoring (SHM) supports the push towards a performance-based maintenance philosophy \cite{brownjohn2007structural}.
### Cost Reduction and the Need for Monitoring:
While visual inspection is easy to implement, it is prone to human error and relies heavily on the inspector's experience and knowledge. These inspections, although regular, may not accurately reflect a structure's true condition, such as its life expectancy, load-bearing capacity, or local and hidden damage. SHM therefore emerges as an effective damage detection strategy for engineering structures.
Beyond supporting performance-based maintenance, SHM can also help improve future designs. Given the increasing dependency on these structures and the evident flaws in conventional inspection techniques, the integration of SHM systems has become essential. Damage, in this context, can be understood as any change that adversely affects the current or future performance of a structure \cite{jayawickrema2022fibre}.
### Challenges of Traditional SHM:
Despite its efficacy, developing an SHM strategy is not without challenges. Each civil structure is typically unique, which means there is no universally applicable baseline. The traditional approach to SHM is built on feature selection and rests on the premise that damage substantially alters certain properties of a structure, which in turn affects the dynamic response of the system. Although this premise seems straightforward, its practical application presents numerous technical challenges. Unsupervised learning is needed in many scenarios because data from damaged systems are often unavailable. The unpredictability of when damage might occur and the long time scales over which it can develop further complicate matters. It is also difficult to define the required properties of the sensing system before deployment, and if the sensors themselves can be damaged, they may need monitoring too. Consequently, the biggest hurdle in SHM system design is identifying which changes to look for and how to detect them. The architecture of an SHM system depends heavily on the characteristics of the damage in a given structure \cite{farrar2007introduction}.


## Sensors Used in SHM:
Many sensor types find their application in SHM, especially those with high sampling rates and spatial resolution. Examples include:

- Lamb wave-based methods utilize piezoelectric transducers (PZT) \cite{kessler2002damage}.

- Inclinometers, used by \cite{zhang2017bridge} to estimate bridge deflections in order to detect and locate damage.

- Accelerometers, widely used in SHM, particularly in output-only contexts, have demonstrated their effectiveness in real-world applications like bridges, wind turbines, and electrical pylons \cite{maes2022validation, erazo2019vibration, magalhaes2012vibration, deraemaeker2008vibration, zhang2022vibration, weijtjens2014dealing, bel2023Feasibility}.

- Strain sensors, including strain gauges and fiber Bragg sensors, are primarily employed for fatigue rate estimation and damage detection \cite{sadeghi2022fatigue, iliopoulos2017fatigue, weil2023embedded}.


## Transition to overcome challenges:
The data obtained from these sensors can be likened to a structure's fingerprint. SHM, as described by \cite{farrar2007introduction, sohn2001applying}, can be viewed through the lens of pattern recognition. Typically in SHM, the data undergo feature extraction and selection, which condenses the structure's "fingerprint". The selected features depend on the specific damage types targeted and must then be normalized to account for environmental and operational variability. Feature selection in SHM, often considered an art \cite{farrar2007introduction}, is susceptible to biases and errors. Sometimes the process even requires experimentally validated finite element models to simulate damage scenarios.

1. **Expertise Requirement**: Manually designing features for intricate domains demands substantial human labor and extensive domain knowledge.

2. **Loss of Fingerprint Signature**: Condensing the information risks losing the structure's unique signature, which impairs the detection of a diverse range of damage types.

3. **Scalability Issues**: The features are often tailored to a specific structure and its associated sensors, creating scalability concerns and limiting generalizability, especially in the evolving landscape of the Internet of Things (IoT).

4. **Complex Modules**: The multi-stage nature of SHM, which encompasses feature design, selection, normalization, and damage index calculation (plus mode tracking for solutions based on operational modal analysis, OMA), makes it challenging to optimize every module simultaneously.

5. **Exploratory Value**: One notable upside of feature selection is its explainability. It allows stakeholders to decipher and comprehend the underlying damage mechanisms.

## Deep Learning Solution:
Deep learning offers a promising avenue by learning features directly from raw data. It can process complex patterns and efficiently handle high-dimensional data. This approach offers a more holistic interpretation of the data that is less exposed to the biases of hand-crafted features: poorly designed features might overlook damage scenarios, missing significant effects on a structure's dynamic response that are present in the original data. While the aspiration for such holistic strategies isn't new (as evident from studies like \cite{basseville1988detecting}), deep learning presents a novel method for complementing traditional SHM. In medical science, deep learning's applications span ECG and EEG data analysis \cite{hosseini2020review}. Condition monitoring of rotating machinery through deep learning is also gaining traction \cite{zhao2019deep, RAO2023110109, Dohi2022}. In civil structural health monitoring, deep learning applications range from image processing for crack detection to feature normalization \cite{ye2019review, dervilis2014damage, janssens2016convolutional}.


## Scope and Research Questions
- **Holistic Damage Detection**: We aim to devise a deep learning model that either directly uses raw data or requires minimal feature engineering to detect damage in civil structures, providing a holistic understanding of structural health. While traditional methods focus on a subset of possible damage types, the proposed method targets a more exhaustive list.

- **Environmental Variability**: The model must be resilient to different environmental conditions. Environmental changes can induce significant alterations in the dynamic response of a structure, often overshadowing the effects of damage. Achieving this would ensure the model's general applicability across varying terrains and conditions.

- **Model Interpretability**: Understandability is vital, especially when decision-making based on the model affects human lives and expensive assets. How can the model's efficacy be gauged in the absence of known damage scenarios? Would there be a way to reliably validate the findings without true positives?

- **Training and Tuning Concerns**: While many datasets might have abundant healthy structure data, damaged data is rare. The proposed model should be trained predominantly on healthy data and be unsupervised concerning damage. The risk of overfitting in the context of such rare event detection is particularly high.

- **Differentiating Damage from Natural Changes**: Structures evolve with time, and not all changes are damage indicators. Distinguishing genuine damage from standard structural changes, perhaps due to aging or changing environmental conditions, becomes a crucial question.

- **Output-only Methodology**: Given the challenges and biases introduced by unknown or inaccurately quantified loads applied to a structure, an output-only approach (based only on the response of the structure and not the input excitations) would provide a robust and unbiased assessment.

## Evolution to population-based SHM: 
- **Introduction to Population-based SHM**: Recently, the concept of "population-based SHM" was introduced in a series of studies \cite{bull2021foundations}. The primary aim is to exploit data from numerous analogous structures.

- **Core Principle**: Instead of relying solely on individual structures, the approach utilizes shared insights either by transferring models across the population or by identifying outliers within the collective dataset.

- **Scope of Our Study**: While many focus on model transfer, our study emphasizes using cumulative data from the entire population.

- **Applications Beyond Similar Structures**:

    - **Symmetrical Structures**: In structures like transmission towers with congruent faces, each face can be treated as an individual within the population.

    - **Varied Load Cases**: For structures exposed to diverse load scenarios (e.g., a railway bridge facing different train types), each distinct load event can be considered an individual.


- **Methodological Approach**:
    - **Leveraging AI Safety Techniques**: We draw parallels with computer vision, especially in the realm of AI safety and out-of-distribution detection \cite{hendrycks2016baseline}.

    - **Classifier Training**: A classifier is trained to identify which individual in the population a given data sample originates from. We refer to this as the "auxiliary classification task."
    
    - **Anomaly Index Derivation**: The classifier's confidence in data categorization is utilized as an anomaly index. The premise being: data from a damaged structure will challenge the model's classification confidence, signaling potential issues.

- **Explainable AI**:
    - **Evaluation Without Structural Damage**: We need to gauge our model's ability to detect damage without access to data from damaged structures.
    
    - **Explainability**: Explainability is an integral aspect of the life cycle of deep learning models and SHM solutions, particularly as they find applications in safety-critical and decision-making processes. Classical Explainable AI (XAI) techniques like LIME \cite{ribeiro2016should} and SHAP \cite{lundberg2017unified} are commonly employed to decipher the model's decision-making process and identify salient features. However, to the best of our knowledge, there has been limited work specifically aimed at explaining and evaluating models' performance in detecting damage within the realm of SHM.

# Methodology
This section delineates the research methodology, which focuses on anomaly detection and Population-Based Structural Health Monitoring (PB-SHM) through the application of deep learning algorithms. Initially, the section elucidates the reasoning behind leveraging the structural population for effective damage detection. Subsequently, an in-depth description of the deep learning model's architecture and training protocol is provided. Following this, the derivation of the anomaly index is clarified, and lastly, the process of model explainability is expounded upon.


## Conceptual Framework
Before detailing the conceptual framework, we briefly explain how population-based Structural Health Monitoring (SHM) is incorporated into the research methodology. A classifier is first trained to categorize the data according to the structure from which it originates. The classifier's confidence level is then used to identify potential structural damage. A significant challenge lies in assessing the model's accuracy in damage detection, as no damaged data is available for validation. To address this limitation, we propose a method that enhances the model's explainability and evaluates its capability to detect damage, even in the absence of actual damaged data: virtual damage is synthesized within the healthy data set itself to test the model's detection abilities. In the case of Power Spectral Density (PSD) data, harmonics or spikes are introduced at varying frequencies and amplitudes. This allows the frequency sensitivity of the model to be assessed, thereby facilitating both its performance evaluation and architectural refinement.
We first explain how the model is trained, and then describe the idea of virtual synthesized damage in the case of PSD data and how we use it to evaluate and improve the model (through hyperparameter tuning).
In the results section, we show that the model's ability to detect structural damage is linked to its ability to detect virtual damage.
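As an illustration of the virtual damage idea, the following is a minimal sketch of how a narrow-band peak could be injected into a healthy PSD. The function name `inject_virtual_peak`, the Gaussian peak shape in the dB domain, the `width_hz` parameter, and the toy spectrum are all assumptions made for this example; they are not taken from the study's implementation.

```python
import numpy as np

def inject_virtual_peak(freqs, psd, f0, gain_db, width_hz=0.5):
    """Return a copy of `psd` with a narrow-band peak centred on `f0`.

    Assumption: the peak is a Gaussian bump in the dB domain, so `gain_db`
    is the amplification exactly at f0 and decays to 0 dB away from it.
    """
    bump_db = gain_db * np.exp(-0.5 * ((freqs - f0) / width_hz) ** 2)
    return psd * 10.0 ** (bump_db / 10.0)

# Toy "healthy" PSD (hypothetical): noise floor plus one resonance near 12 Hz.
freqs = np.linspace(0.0, 50.0, 501)
healthy_psd = 1e-6 + 1e-4 / (1.0 + ((freqs - 12.0) / 0.8) ** 2)

# Sweep over centre frequencies (Hz) and amplitudes (dB) of the virtual damage.
test_cases = [(f0, gain) for f0 in (5.0, 20.0, 35.0) for gain in (3.0, 10.0)]
perturbed = {
    case: inject_virtual_peak(freqs, healthy_psd, *case) for case in test_cases
}
```

Feeding each entry of `perturbed` to the trained model and recording the anomaly index as a function of frequency and amplitude would give one possible picture of the model's frequency sensitivity.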

### Rationale for the Methodology
In our population-based SHM research, we opt for a classifier-based methodology for several reasons. First, because it is trained to distinguish between individual systems or structures, the classifier inherently learns to extract features that are sensitive to structural differences, and such features are consequently likely to be indicators of structural damage. Second, we use an energy function, parameterized by the neural network's weights and computed from the same logits that feed the softmax activation function \cite{liu2020energy}. This function assigns lower energy to in-distribution ("healthy") samples and higher energy to out-of-distribution (anomalous) samples, and thus acts as an efficient anomaly indicator; "in-distribution" and "out-of-distribution" are the terms used by the out-of-distribution detection community. Third, the classifier serves as a high-dimensional probability density estimator, a crucial advantage given that structural data often reside in high-dimensional spaces where traditional density estimators falter. The classifier's confidence score can therefore be leveraged as an anomaly index. Together, these elements form the rationale for using a classifier-based approach in SHM for both structural classification and anomaly detection.
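To make the two candidate anomaly indicators concrete, here is a minimal sketch that computes the maximum softmax probability and the energy score from a classifier's logits. The energy follows the standard definition $E(x) = -T \log \sum_i \exp(f_i(x)/T)$; the function names, the toy logits, and the choice of `1 - confidence` as anomaly index are illustrative assumptions.

```python
import numpy as np

def softmax_confidence(logits):
    """Maximum softmax probability per sample (higher means more confident)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def energy_score(logits, temperature=1.0):
    """Energy E(x) = -T * logsumexp(logits / T); lower for in-distribution data."""
    scaled = logits / temperature
    m = scaled.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(scaled - m).sum(axis=1))
    return -temperature * lse

# Hypothetical logits for a population of three individuals.
logits = np.array([
    [8.0, 0.5, 0.2],  # sample clearly attributed to individual 0
    [1.0, 1.1, 0.9],  # sample the classifier cannot attribute confidently
])
confidence = softmax_confidence(logits)
energy = energy_score(logits)
anomaly_index = 1.0 - confidence  # one possible confidence-based index
```

In this toy case the ambiguous second sample receives both a lower confidence and a higher energy than the first, which is the behaviour the anomaly index relies on.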


### Model training
In the present study, we utilize a deep neural network (DNN) that comprises multiple layers of densely connected neurons, also referred to as fully connected layers. Formally, a DNN is a composition of vector-valued functions, each corresponding to a layer in the network.
#### Forward Propagation
During the forward propagation phase, the input data traverses through the network, generating a model output. 
Within layer $l$, one forward propagation step is defined as:
\begin{equation}
\begin{split}
\mathbf{z}^{[l]} &= \mathbf{W}^{[l]} \mathbf{a}^{[l-1]} + \mathbf{b}^{[l]} \\
\mathbf{a}^{[l]} &= \sigma^{[l]}(\mathbf{z}^{[l]})
\end{split}
\end{equation}

Where:

- $\mathbf{a}^{[l]}$ represents the output of layer $l$.
- $\mathbf{z}^{[l]}$ denotes the pre-activation input to layer $l$.
- $\mathbf{W}^{[l]}$ and $\mathbf{b}^{[l]}$ are the weight matrix and bias vector for layer $l$, respectively. Their dimensions are $d_{l} \times d_{l-1}$ and $d_{l} \times 1$, where $d_{l}$ is the number of neurons in layer $l$.
- $\sigma^{[l]}$ signifies the activation function applied at layer $l$.

After the forward propagation, the model's output is compared to the ground truth via a designated loss function, symbolized by $\mathcal{L}$. The resulting loss quantifies the performance discrepancy between the model predictions and the actual observations. 
In the training phase, the model parameters are updated to minimize the loss function, thereby refining the model's performance.
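The forward pass and loss evaluation described above can be sketched in a few lines of NumPy. This is only an illustration: the layer sizes, the ReLU hidden activation, the softmax output, and the cross-entropy loss are assumptions chosen for a classification setting, not the configuration used in the study.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def forward(x, weights, biases):
    """Apply z = W a + b, a = sigma(z) layer by layer.

    Column convention: x has shape (d_0, n_samples).
    """
    a = x
    last = len(weights) - 1
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ a + b
        a = softmax(z) if l == last else relu(z)
    return a

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    n = labels.size
    return -np.mean(np.log(probs[labels, np.arange(n)] + 1e-12))

rng = np.random.default_rng(0)
layer_sizes = [16, 8, 3]  # hypothetical: 16 inputs, 8 hidden units, 3 classes
# Each W has shape (d_l, d_{l-1}) so that W @ a is defined.
weights = [rng.normal(0.0, 0.1, size=(d_out, d_in))
           for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros((d_out, 1)) for d_out in layer_sizes[1:]]

x = rng.normal(size=(16, 5))          # five input samples
labels = np.array([0, 2, 1, 1, 0])    # ground-truth class of each sample
probs = forward(x, weights, biases)
loss = cross_entropy(probs, labels)
```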




#### Backward Propagation
To minimize the loss, we employ the backpropagation algorithm, which computes the gradient of the loss function, $\mathcal{L}$, with respect to the model's parameters—namely the weights and biases. The algorithm leverages the compositional structure of the DNN to efficiently calculate gradients.


The loss function $\mathcal{L}$ is a function of all the weights and biases in the network:

\begin{equation}
\mathcal{L} = \mathcal{L}(\mathbf{W}^{[1]}, \mathbf{W}^{[2]}, \ldots, \mathbf{W}^{[L]}, \mathbf{b}^{[1]}, \mathbf{b}^{[2]}, \ldots, \mathbf{b}^{[L]})
\end{equation}

To minimize $\mathcal{L}$, we compute its gradient as:

\begin{equation}
\frac{\partial \mathcal{L}}{\partial \mathbf{W}^{[l]}} = \frac{\partial \mathcal{L}}{\partial \mathbf{a}^{[l]}} \frac{\partial \mathbf{a}^{[l]}}{\partial \mathbf{z}^{[l]}} \frac{\partial \mathbf{z}^{[l]}}{\partial \mathbf{W}^{[l]}}
\end{equation}

Each term in the equation is elaborated as follows:

- $\frac{\partial \mathcal{L}}{\partial \mathbf{a}^{[l]}}$: Partial derivative of the loss function with respect to the output of layer $l$.
- $\frac{\partial \mathbf{a}^{[l]}}{\partial \mathbf{z}^{[l]}}$: Derivative of the activation function at layer $l$, formally defined as $\sigma^{[l]'}(\mathbf{z}^{[l]})$.
- $\frac{\partial \mathbf{z}^{[l]}}{\partial \mathbf{W}^{[l]}}$: The output from the preceding layer, $\mathbf{a}^{[l-1]}$.

The term $\frac{\partial \mathcal{L}}{\partial \mathbf{a}^{[l]}}$ is central to the backpropagation algorithm. For the final layer ($l = L$), it is directly calculated from the derivative of the loss function with respect to the model's output. For hidden layers ($l < L$), this term is obtained through recursive application of the chain rule:

\begin{equation}
\begin{split}
 \frac{\partial \mathcal{L}}{\partial \mathbf{a}^{[l]}} &= \left( \frac{\partial \mathbf{z}^{[l+1]}}{\partial \mathbf{a}^{[l]}} \right)^{T} \frac{\partial \mathcal{L}}{\partial \mathbf{z}^{[l+1]}} \\
 &= \mathbf{W}^{[l+1]T} \frac{\partial \mathcal{L}}{\partial \mathbf{z}^{[l+1]}} \\
 &= \mathbf{W}^{[l+1]T} \left( \frac{\partial \mathcal{L}}{\partial \mathbf{a}^{[l+1]}} \odot \sigma^{[l+1]'}(\mathbf{z}^{[l+1]}) \right)
\end{split}
\end{equation}

Here, $\odot$ denotes element-wise multiplication.
In practice, the gradient is not computed for each sample but rather averaged over a batch of samples. This approach is known as mini-batch gradient descent.
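The recursion can be checked numerically. The sketch below implements backpropagation for a small fully connected network and compares one entry of the analytic gradient with a central finite-difference estimate. The `tanh` activation in every layer, the squared-error loss, and the layer sizes are assumptions chosen so that the element-wise formula applies to all layers; they are not the study's configuration.

```python
import numpy as np

def forward(x, Ws, bs):
    """Return pre-activations zs and activations acts (acts[0] is the input)."""
    zs, acts = [], [x]
    for W, b in zip(Ws, bs):
        zs.append(W @ acts[-1] + b)
        acts.append(np.tanh(zs[-1]))
    return zs, acts

def loss_fn(y_hat, y):
    # 0.5 * squared error, summed over outputs and averaged over the batch
    return 0.5 * np.sum((y_hat - y) ** 2) / y.shape[1]

def backward(y, zs, acts, Ws):
    n = y.shape[1]
    grads_W = [None] * len(Ws)
    grads_b = [None] * len(Ws)
    dL_da = (acts[-1] - y) / n  # derivative of the loss w.r.t. the output
    for l in reversed(range(len(Ws))):
        delta = dL_da * (1.0 - np.tanh(zs[l]) ** 2)  # dL/dz = dL/da * sigma'(z)
        grads_W[l] = delta @ acts[l].T               # acts[l] is this layer's input
        grads_b[l] = delta.sum(axis=1, keepdims=True)
        dL_da = Ws[l].T @ delta                      # propagate to the previous layer
    return grads_W, grads_b

rng = np.random.default_rng(1)
sizes = [6, 5, 4, 2]  # hypothetical layer widths
Ws = [rng.normal(0.0, 0.5, (d_out, d_in)) for d_in, d_out in zip(sizes[:-1], sizes[1:])]
bs = [rng.normal(0.0, 0.1, (d_out, 1)) for d_out in sizes[1:]]
x = rng.normal(size=(6, 8))              # mini-batch of 8 samples
y = rng.uniform(-0.5, 0.5, size=(2, 8))

zs, acts = forward(x, Ws, bs)
grads_W, grads_b = backward(y, zs, acts, Ws)

def loss_with(Ws_trial):
    return loss_fn(forward(x, Ws_trial, bs)[1][-1], y)

# Central finite difference on a single weight of the first layer.
eps = 1e-6
W_plus = [W.copy() for W in Ws]
W_minus = [W.copy() for W in Ws]
W_plus[0][2, 3] += eps
W_minus[0][2, 3] -= eps
numeric = (loss_with(W_plus) - loss_with(W_minus)) / (2 * eps)
analytic = grads_W[0][2, 3]
```

Because the output-layer derivative is already divided by the batch size, the matrix products sum over samples and directly yield the mini-batch average of the gradient.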


#### Parameter Updates
Having computed the gradients through the backpropagation algorithm, the next step involves updating the model's parameters to minimize the loss function $\mathcal{L}$. The optimization process generally employs gradient-based methods such as stochastic gradient descent (SGD) or its variants like Adam and RMSprop.
For each layer $l$, the weights and biases are updated as:
\begin{equation}
\begin{split}
\mathbf{W}^{[l]} &\leftarrow \mathbf{W}^{[l]} - \alpha \frac{\partial \mathcal{L}}{\partial \mathbf{W}^{[l]}} \\
\mathbf{b}^{[l]} &\leftarrow \mathbf{b}^{[l]} - \alpha \frac{\partial \mathcal{L}}{\partial \mathbf{b}^{[l]}}
\end{split}
\end{equation}

Where $\alpha$ is the learning rate, a hyperparameter that controls the magnitude of the parameter updates.
Finally, the model is trained by iteratively repeating the forward and backward propagation steps. One full pass over the training data is called an epoch; training runs for a fixed number of epochs or until the loss function converges to a minimum.
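The update rule and the epoch loop can be illustrated with a deliberately small example: mini-batch SGD on a single-layer softmax classifier ($L = 1$) trained on synthetic data. The data, the learning rate, the batch size, and the number of epochs are arbitrary assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic population (hypothetical): three individuals, four features each.
n_classes, n_features, n_per_class = 3, 4, 60
centers = rng.normal(0.0, 2.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(0.0, 1.0, size=(n_per_class, n_features))
               for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

W = np.zeros((n_classes, n_features))
b = np.zeros(n_classes)
alpha, batch_size, n_epochs = 0.05, 16, 30  # assumed hyperparameters

def loss_and_grads(Xb, yb, W, b):
    """Cross-entropy loss of a softmax layer and its gradients w.r.t. W and b."""
    logits = Xb @ W.T + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    rows = np.arange(len(yb))
    loss = -np.mean(np.log(probs[rows, yb] + 1e-12))
    dlogits = probs.copy()
    dlogits[rows, yb] -= 1.0
    dlogits /= len(yb)                           # average over the mini-batch
    return loss, dlogits.T @ Xb, dlogits.sum(axis=0)

history = []
for epoch in range(n_epochs):
    order = rng.permutation(len(y))              # reshuffle every epoch
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        _, grad_W, grad_b = loss_and_grads(X[idx], y[idx], W, b)
        W -= alpha * grad_W                      # W <- W - alpha * dL/dW
        b -= alpha * grad_b                      # b <- b - alpha * dL/db
    history.append(loss_and_grads(X, y, W, b)[0])  # full-data loss per epoch
```

Plotting `history` against the epoch index is a common way to judge whether training has converged or should be stopped earlier.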

#### Conclusion
In summary, a deep neural network is a function whose parameters (weights and biases) are learned from a training dataset by minimizing the loss function with backpropagation and gradient descent.
A network is characterized by its architecture, i.e. the number of layers, the number of neurons in each layer, and the activation function used in each layer, together with training settings such as the mini-batch size and the learning rate.
These quantities are called hyperparameters. They are not learned during training; they are set by the user beforehand and tuned to obtain the best model performance.
However, in the present case, we do not have access to data from damaged structures, so we cannot directly tune the hyperparameters for damage detection performance.


# Pretrain the model to understand the data and not for a particular task

\begin{equation}
\frac{\partial \mathcal{L}}{\partial \mathbf{a}^{[l]}} = \left( \mathbf{W}^{[l+1]} \right)^{T} \left( \frac{\partial \mathcal{L}}{\partial \mathbf{a}^{[l+1]}} \odot \sigma^{[l+1]'}(\mathbf{z}^{[l+1]}) \right)
\end{equation}

