<a href="https://colab.research.google.com/github/cwhitz/ts-trove/blob/master/notebooks/classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Time Series Classification

This notebook explores various time series classification techniques. It makes much fuller use of the bearings dataset also explored in the signal analysis notebook.

## Overview

Time series classification involves assigning time series instances to predefined categories. This notebook will cover:

### Table of Contents

> 1 [Data Preparation](#Data-Preparation)

1.1. [Data Download](##Data-Download)

1.2 [Data Organization](##Data-Organization)

> 2 [Utility Functions](#Training-Functions)

2.1 [Data Loader](##Data-Loader)

> 3 [Time Series Classification with SciKit](#Scikit-Functions)

3.1 [Scikit Trainer and Evaluator](##Scikit-Trainer-and-Evaluator)

> 4 [Deep Learning Models](#Deep-Learning-Models)

4.1 [PyTorch Trainer and Evaluator](##-Trainer-and-Evaluator)

4.2 [Fully Connected Neural Networks]()

4.3 [Recurrent Neural Networks]()

4.3.1 [Classic Recurrent Neural Network]()

4.3.2 [Long Short Term Memory (LSTM) Neural Network]()

4.3.3 [Gated Recurrent Neural Network]()

4.4 [Convolutional Neural Networks]()

4.4.1 [1D Convolutional Neural Network]()

4.4.2 [Temporal Convolutional Network]()

4.5 [Attention Based Models]()

4.5.1 [LSTM with Attention]()

4.5.2 [Time Series Transformer]()



In [1]:
import pandas as pd
import numpy as np
import os
import pathlib
import matplotlib.pyplot as plt
import json
import pathlib
import shutil
import kagglehub

import tqdm

In [2]:
!pip install cesium



# Data Preparation

## Data Overview

**What is this dataset?**

This is a collection of vibration data from electric motor bearings. Bearings those small spinning parts that let machinery rotate smoothly. Think of a bearing like the axle in a wheel: it's got little metal balls inside that roll around, letting a shaft spin with barely any friction.

**Why was this dataset created?**

The researchers at Case Western Reserve University deliberately damaged bearings in different ways, then recorded how the motor vibrated as a result. They made tiny cracks of various sizes (ranging from 7 to 40 thousandths of an inch) in the bearings, then attached vibration sensors to measure what happened.

Cracks of different mm sizes were introduced on the outer race, inner race, and the balls themselves.

![ball_bearing_diagram](https://www.globalspec.com/ImageRepository/LearnMore/20133/ball%20bearing5364b00280ef4db7b85dfba113f04556.png)

The goal was to understand the relationship between bearing damage and vibration patterns, creating a reference library that shows what different types of bearing failure look like.

**Why is it useful?**

This data is incredibly useful for real-world maintenance and diagnostics. In factories and power plants, you can use vibration patterns to detect bearing problems before they cause catastrophic failures. By comparing vibrations from a running machine to patterns in this dataset, maintenance teams can identify early signs of wear, predict when a bearing will fail, and schedule repairs before expensive downtime happens. It's basically like a fingerprint database for bearing damage—once you know what a damaged bearing "sounds like," you can spot trouble coming.

**So what are we actually trying to predict?**

Good question. We will try to train machine learning models to predict three things: 1) Is the bearing in normal operation? 2) If not, where is the crack? 3) And what size is it?

2 and 3 of course become irrelevant if the bearing is in normal operation, but they allow us to go a step beyond simple detection of irregular operation.


## Data Download

The raw data can be downloaded directly from Kaggle.

In [3]:
kagglepath = "sufian79/cwru-mat-full-dataset"
path = kagglehub.dataset_download(kagglepath)


pathlib.Path(f"./{kagglepath.split('/')[-1]}").mkdir(parents=True, exist_ok=True)
shutil.copytree(path, f"./{kagglepath.split('/')[-1]}", dirs_exist_ok=True)

Using Colab cache for faster access to the 'cwru-mat-full-dataset' dataset.


'./cwru-mat-full-dataset'

## Data Organization

The raw data is a collection of numbered mat files and requires reference back to the [original website](https://engineering.case.edu/bearingdatacenter/48k-drive-end-bearing-fault-data) to make sense of. I've gone ahead and done that with the JSON structure below.

The data is organized at top-level describing the type of fault, or lack thereof with "normal" sample files are the motor operating without faults. The next level down is the sampling rate, followed by the location where the crack was introduced (IR being inner race, B being ball, OR being outer race) and then finally, the size of the cracks ranging from 7 to 21 mm.

The code below this cell moves the individual samples into folders matching the structure below, which aligns with how PyTorch's DataSet and DataLoader work (we will make it work for scikit too).

In [4]:
folder_structure = {
  "normal": {
    "48k": ["97", "98", "99", "100"]
  },
  "drive_end_fault": {
    "12k": {
      "IR": {
        "007": ["105", "106", "107", "108"],
        "014": ["169", "170", "171", "172"],
        "021": ["209", "210", "211", "212"]
      },

      "B": {
        "007": ["118", "119", "120", "121"],
        "014": ["185", "186", "187", "188"],
        "021": ["222", "223", "224", "225"]
      },

      "OR": {
        "007": ["130", "131", "132", "133"],
        "014": ["197", "198", "199", "200"],
        "021": ["234", "235", "236", "237"]
      }
    },

    "48k": {
      "IR": {
        "007": ["109", "110", "111", "112"],
        "014": ["174", "175", "176", "177"],
        "021": ["213", "214", "215", "217"]
      },

      "B": {
        "007": ["122", "123", "124", "125"],
        "014": ["189", "190", "191", "192"],
        "021": ["226", "227", "228", "229"]
      },

      "OR": {
        "007": ["135", "136", "137", "138"],
        "014": ["201", "202", "203", "204"],
        "021": ["238", "239", "240", "241"]
      }
    }
  },

  "fan_end_fault": {
    "12k": {
      "IR": {
        "007": ["278", "279", "280", "281"],
        "014": ["274", "275", "276", "277"],
        "021": ["270", "271", "272", "273"]
      },

      "B": {
        "007": ["282", "283", "284", "285"],
        "014": ["286", "287", "288", "289"],
        "021": ["290", "291", "292", "293"]
      },

      "OR": {
        "007": ["298", "299", "300", "301"],
        "014": ["309", "310", "311", "312"],
        "021": ["315", "316", "317", "318"]
      }
    }
  }
}

In [5]:
SOURCE_DIR = "cwru-mat-full-dataset/"
TARGET_DIR = "classification-cwru-mat-organized"
FILE_EXTENSION = ".mat"

def ensure_dir(path):
    os.makedirs(path, exist_ok=True)

def move_file(file_id, dest_dir):
    filename = file_id + FILE_EXTENSION
    src_path = os.path.join(SOURCE_DIR, filename)
    dst_path = os.path.join(dest_dir, filename)

    if not os.path.exists(src_path):
        print(f"⚠️ Missing file: {src_path}")
        return

    ensure_dir(dest_dir)
    shutil.move(src_path, dst_path)

def walk_structure(node, current_path):
    if isinstance(node, list):
        for file_id in node:
            move_file(file_id, current_path)
    elif isinstance(node, dict):
        for key, child in node.items():
            walk_structure(child, os.path.join(current_path, key))
    else:
        raise ValueError("Unexpected structure type")


walk_structure(folder_structure, TARGET_DIR)
print("Done.")

Done.


# Utility Functions

## Data Loader

Before diving into modeling, we first need a consistent way to load and represent our time-series data. Since later sections will experiment with both deep learning and traditional classifiers, we define a reusable dataset structure that keeps preprocessing, sampling rate handling, and labels consistent across all methods.

In [6]:
from torch.utils.data import Dataset
from torch.nn import Module
import scipy.io
import enum

# samplng rate enum
class SamplingRate(enum.Enum):
    sr12K = "12k"
    sr48K = "48k"

class FaultLocation(enum.Enum):
    DE = "drive_end_fault"
    FE = "front_end_fault"


class BearingDataset(Dataset):
    def __init__(self, file_paths, sampling_rate, fault_location, chunk_length, unified_label=True, transform=None):
        self.file_paths = file_paths
        self.sampling_rate = sampling_rate
        self.fault_location = fault_location
        self.chunk_length = chunk_length
        self.transform = transform
        self.unified_label = unified_label

        self.data = []
        self.labels = []

        self._organize_data()

    def _organize_data(self):
        for fp in self.file_paths:
            if not pathlib.Path(fp).exists():
                raise FileNotFoundError(f"File not found: {fp}")

            mat_data = scipy.io.loadmat(fp)

            key_to_match = f"_{str(self.fault_location)[-2:]}_time"
            sensor_key = [key for key in mat_data.keys() if key_to_match in key][0]

            signal = mat_data[sensor_key].squeeze()

            n_chunks = len(signal) // self.chunk_length
            truncated = signal[:n_chunks * self.chunk_length]

            windows = truncated.reshape(n_chunks, self.chunk_length)

            label_parts = fp.parent.parts
            if label_parts[-2] == 'normal':
                label_dict = {
                    'normal': True,
                    'fault_location': 'NA',
                    'crack_size': 'NA'
                }
            else:
                label_dict = {
                    'normal': False,
                    'fault_location': label_parts[-2],
                    'crack_size': label_parts[-1]
                }


            for window in windows:
              self.data.append(window)

              if self.unified_label:
                self.labels.append(f"{label_dict['fault_location']}_{label_dict['crack_size']}" if label_dict['normal'] == False else "normal")
              else:
                self.labels.append(label_dict)


    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        window = self.data[idx]
        label = self.labels[idx]

        if self.transform:
            window = self.transform(window).astype('float32')

        return window, label

The dataset class above seeks to make the most of the data available in the bearings dataset by splitting each sample in the file into multiple overlapping windows. This increases the effective number of training samples and helps models learn more robust patterns. However, care must be taken to avoid data leakage between training and test sets when using overlapping windows - if we were to pull all the data and then split into train/test, windows from the same original sample could end up in both sets.

To prevent this, we ensure that all windows derived from a given file are assigned to either the training or test set exclusively by splitting into train/test at the file level.

In [7]:
from sklearn.model_selection import train_test_split
from pathlib import Path
from collections import Counter

all_files = list(Path("classification-cwru-mat-organized").rglob("*.mat"))

# derive one label per file
file_labels = [
    '_'.join(f.parent.parts[-2:])
    for f in all_files
]

train_files, test_files = train_test_split(
    all_files,
    test_size=.2,
    shuffle=True,
    stratify=file_labels
)

##

We want to set up a class for testing different classification techniques on the bearings dataset. The class will accept a dataset object and classification model, and be able to train and evaluate the model consistently for metrics like accuracy, precision, recall, and F1-score as well as time for training and inference.

In [8]:
from abc import ABC, abstractmethod
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from tqdm import tqdm
import copy


class ClassificationTrainTestEvaluate(ABC):
    def __init__(self, train_dataset: Dataset, test_dataset: Dataset):
        self.train_dataset = train_dataset
        self.test_dataset = test_dataset

        self.model = None

    def load_model(self, model):
        self.model = model

    def classification_report(self):
      """
      Creates a Plotly figure with three tabs, each showing:
      - Confusion matrix heatmap
      - Metrics summary table

      One tab per task: Fault Detection, Fault Location, Crack Size
      """
      from plotly.subplots import make_subplots
      import plotly.graph_objects as go
      from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

      # Define task names and their corresponding predictions/labels
      tasks = {
          'Fault Detection': {
              'predictions': self.predictions_fault_detection,
              'labels': self.test_y_fault_detection,
              'class_names': ['Normal', 'Fault']
          },
          'Fault Location': {
              'predictions': self.predictions_fault_location,
              'labels': self.test_y_fault_location,
              'class_names': ['B', 'IR', 'OR']
          },
          'Crack Size': {
              'predictions': self.predictions_crack_size,
              'labels': self.test_y_crack_size,
              'class_names': ['007', '014', '021']
          }
      }

      # Create subplots for each task
      figs = []

      for task_name, task_data in tasks.items():
          predictions = task_data['predictions']
          labels = task_data['labels']
          class_names = task_data['class_names']

          # Compute confusion matrix
          cm = confusion_matrix(labels, predictions)

          # Compute metrics
          accuracy = accuracy_score(labels, predictions)
          precision = precision_score(labels, predictions, average='weighted', zero_division=0)
          recall = recall_score(labels, predictions, average='weighted', zero_division=0)
          f1 = f1_score(labels, predictions, average='weighted', zero_division=0)

          # Create subplot layout
          fig = make_subplots(
              rows=1, cols=2,
              column_widths=[0.6, 0.4],
              specs=[[{"type": "heatmap"}, {"type": "table"}]],
              subplot_titles=("Confusion Matrix", "Model Performance Metrics")
          )

          # --- Confusion Matrix Heatmap ---
          fig.add_trace(
              go.Heatmap(
                  z=cm,
                  x=class_names,
                  y=class_names,
                  text=cm,
                  texttemplate="%{text}",
                  colorscale="Blues",
                  showscale=False
              ),
              row=1, col=1
          )

          fig.update_xaxes(title_text="Predicted Label", row=1, col=1)
          fig.update_yaxes(title_text="True Label", row=1, col=1)

          # --- Metrics Table ---
          fig.add_trace(
              go.Table(
                  header=dict(
                      values=["Metric", "Value"],
                      fill_color="lightgrey",
                      align="center"
                  ),
                  cells=dict(
                      values=[
                          ["Accuracy", "Precision", "Recall", "F1 Score"],
                          [f"{accuracy:.4f}", f"{precision:.4f}", f"{recall:.4f}", f"{f1:.4f}"]
                      ],
                      align="center"
                  )
              ),
              row=1, col=2
          )

          fig.update_layout(
              title=f"{task_name} - Evaluation Summary",
              height=500,
              width=900,
              showlegend=False
          )

          figs.append((task_name, fig))

      # Display each figure
      for task_name, fig in figs:
          fig.show()

class SciKitCTTE(ClassificationTrainTestEvaluate):
    def prepare_data(self):
        self.train_X, self.train_y = pd.DataFrame(), pd.Series()
        print("Preparing training data...")
        for i in tqdm(range(len(self.train_dataset))):
            X_chunk, label = self.train_dataset[i]

            self.train_X = pd.concat([self.train_X, X_chunk], ignore_index=True)
            self.train_y = pd.concat([self.train_y, pd.Series(label)], ignore_index=True)

        self.test_X, self.test_y = pd.DataFrame(), pd.Series()
        print("Preparing test data...")
        for i in tqdm(range(len(self.test_dataset))):
            X_chunk, labels = self.test_dataset[i]

            self.test_X = pd.concat([self.test_X, X_chunk], ignore_index=True)
            self.test_y = pd.concat([self.test_y, pd.Series(labels)], ignore_index=True)

    def train(self, train_X, train_y):
        self.model.fit(train_X, train_y)
        self.class_names = sorted(self.train_y.unique())

    def evaluate(self, test_X, test_y):
        self.predictions = self.model.predict(test_X)


# Feature Extraction + Feature Based Classification

With a dataset abstraction in place, we can now explore different families of time-series classification techniques. The goal here is not only to compare performance, but also to understand how different representation choices affect model behavior on sensor-like signals.

We begin with feature-based methods, which transform raw time-series into fixed-length statistical representations. These approaches are often strong baselines, easier to interpret, and computationally efficient compared to end-to-end deep learning models.

### Feature Extraction

We will implement a custom transformer class for the PyTorch dataset to extract statistical features using the `cesium` library.

In [9]:
from cesium import featurize

class FeatureExtractionTransform(Module):
    def forward(self, window):
        features_to_use = [
            "amplitude",
            "percent_beyond_1_std",
            "maximum",
            "max_slope",
            "median",
            "median_absolute_deviation",
            "percent_close_to_median",
            "minimum",
            "period_fast",
            "skew",
            "std",
        ]

        fset = featurize.featurize_time_series(
            times=np.arange(len(window)),
            values=window,
            errors=None,
            features_to_use=features_to_use,
        )

        fset = fset.stack(future_stack=True)

        return fset


In [10]:
train_dataset = BearingDataset(
    train_files,
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DE,
    chunk_length=1200,
    unified_label=True,
    transform=FeatureExtractionTransform()
)

test_dataset = BearingDataset(
    test_files,
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DE,
    chunk_length=1200,
    unified_label=True,
    transform=FeatureExtractionTransform()
)

In [11]:
# from sklearn.ensemble import RandomForestClassifier

# sk_ctte = SciKitCTTE(
#     train_dataset,
#     test_dataset)

# sk_ctte.prepare_data()


In [12]:
# rfc = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=10)

# sk_trainer = sk_ctte
# sk_trainer.load_model(rfc)

# sk_trainer.train(sk_trainer.train_X, sk_trainer.train_y)
# sk_trainer.evaluate(sk_trainer.test_X, sk_trainer.test_y)
# sk_trainer.classification_report()

In [13]:
# from sklearn.svm import SVC

# svm = SVC(kernel='linear', C=.1, random_state=42)

# sk_trainer = sk_ctte
# sk_trainer.load_model(svm)

# sk_trainer.train(sk_trainer.train_X, sk_trainer.train_y)
# sk_trainer.evaluate(sk_trainer.test_X, sk_trainer.test_y)
# sk_trainer.classification_report()

# Deep Learning Models

In this section, I will explore a wide variety of neural network models to find which can perform the best at what is essentially a many-to-one problem, where we are giving the model a dataset of many measurements of vibrational movement where ordering matters, because those measurements unfolded across time.

4.1 [PyTorch Trainer and Evaluator](##
PyTorch-Trainer-and-Evaluator)

4.2 [Fully Connected Neural Networks]()

4.3 [Recurrent Neural Networks]()

4.3.1 [Classic Recurrent Neural Network]()

4.3.2 [Long Short Term Memory (LSTM) Neural Network]()

4.3.3 [Gated Recurrent Neural Network]()

4.4 [Convolutional Neural Networks]()

4.4.1 [1D Convolutional Neural Network]()

4.4.2 [Temporal Convolutional Network]()

4.5 [Attention Based Models]()

4.5.1 [LSTM with Attention]()

4.5.2 [Time Series Transformer]()

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

In [14]:
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

CUDA available: True
GPU count: 1
GPU name: Tesla T4


##PyTorch Trainer and Evaluator

In [15]:
from torch.utils.data import DataLoader
from torch import Tensor, float32, LongTensor
import torch
from tqdm import tqdm

class PyTorchCTTE(ClassificationTrainTestEvaluate):
    def __init__(self, train_dataset: Dataset, test_dataset: Dataset, device='cpu', criterion=None, detection_weighting=.5):
        super().__init__(train_dataset, test_dataset)
        self.device = device
        self.criterion = criterion

        self.target_mapping = {
          'fault_location': {'B': 0, 'IR': 1, 'OR': 2, 'NA': 3},
          'crack_size': {'007': 0, '014': 1, '021': 2, 'NA': 3}
          }

        self.train_dataset_mean = None
        self.train_dataset_std = None
        self.detection_weighting = detection_weighting

    def __deepcopy__(self, memo):
        """Deep copy - recursively copies nested objects"""
        return PyTorchCTTE(
            copy.deepcopy(self.train_dataset, memo),
            copy.deepcopy(self.test_dataset, memo),
            copy.deepcopy(self.device, memo),
            copy.deepcopy(self.criterion, memo)
        )

    def load_model(self, model):
        self.model = model

    def load_optimizer(self, optimizer):
        self.optimizer = optimizer

    def prepare_data(self):
        self.train_dataloader = DataLoader(self.train_dataset, batch_size=64, shuffle=True)
        self.test_dataloader = DataLoader(self.test_dataset, batch_size=64, shuffle=False)

    def train(self, epochs: int, batch_size: int):
        self.model.to(self.device)

        # self.train_dataset_mean = np.mean(np.concatenate(self.train_dataset.data))
        # self.train_dataset_std = np.std(np.concatenate(self.train_dataset.data))

        for epoch in range(epochs):
            self.model.train()
            epoch_loss = 0.0

            progress_bar = tqdm(self.train_dataloader, desc=f"Epoch {epoch+1}/{epochs}")

            for batch_X, batch_y in progress_bar:
                # X
                batch_X = Tensor(batch_X.to(float32))
                batch_X = (batch_X - batch_X.mean(dim=1, keepdim=True)) / (batch_X.std(dim=1, keepdim=True) + 1e-6)
                batch_X = batch_X.to(self.device)

                # ys
                batch_y_fault_detection = [l for l in batch_y['normal']]
                batch_y_fault_location = [self.target_mapping['fault_location'].get(l, 2) for l in batch_y['fault_location']]
                batch_y_crack_size = [self.target_mapping['crack_size'].get(l, 2) for l in batch_y['crack_size']]

                # move to GPU
                batch_y_fault_detection = LongTensor(batch_y_fault_detection).to(self.device)
                batch_y_fault_location = LongTensor(batch_y_fault_location).to(self.device)
                batch_y_crack_size = LongTensor(batch_y_crack_size).to(self.device)

                # clear gradients before training
                self.optimizer.zero_grad()

                # run the inputs through the network
                fault_detection, fault_location, crack_size = self.model(batch_X)

                # calculate the loss
                loss_fault_detection = self.criterion(fault_detection, batch_y_fault_detection)
                loss_fault_location = self.criterion(fault_location, batch_y_fault_location)
                loss_crack_size = self.criterion(crack_size, batch_y_crack_size)

                # sum to total loss
                total_loss = (self.detection_weighting * loss_fault_detection) + loss_fault_location + loss_crack_size

                # backpropagate
                total_loss.backward()

                self.optimizer.step()

                epoch_loss += total_loss.item()
                progress_bar.set_postfix(loss=total_loss.item())

            print(f"Epoch {epoch+1} avg loss: {epoch_loss/len(self.train_dataloader):.4f}")

    def evaluate(self):
        self.model.eval()
        self.predictions_fault_detection = []
        self.predictions_fault_location = []
        self.predictions_crack_size = []
        self.test_y_fault_detection = []
        self.test_y_fault_location = []
        self.test_y_crack_size = []

        with torch.no_grad():
            for batch_X, batch_y in self.test_dataloader:
                batch_X = batch_X = Tensor(batch_X.to(float32))
                batch_X = (batch_X - batch_X.mean(dim=1, keepdim=True)) / (batch_X.std(dim=1, keepdim=True) + 1e-6)
                batch_X = batch_X.to(self.device)

                fault_detection, fault_location, crack_size = self.model(batch_X)

                # Get predictions for each task
                _, pred_fd = torch.max(fault_detection, 1)
                _, pred_fl = torch.max(fault_location, 1)
                _, pred_cs = torch.max(crack_size, 1)

                self.predictions_fault_detection.extend(pred_fd.cpu().numpy().tolist())
                self.predictions_fault_location.extend(pred_fl.cpu().numpy().tolist())
                self.predictions_crack_size.extend(pred_cs.cpu().numpy().tolist())

                # Store true labels
                self.test_y_fault_detection.extend([int(l) for l in batch_y['normal']])
                self.test_y_fault_location.extend([self.target_mapping['fault_location'].get(l, 0) for l in batch_y['fault_location']])
                self.test_y_crack_size.extend([self.target_mapping['crack_size'].get(l, 0) for l in batch_y['crack_size']])

### Datasets for Deep Learning

In [16]:
from torch.nn import CrossEntropyLoss

train_dataset = BearingDataset(
    train_files,
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DE,
    unified_label=False,
    chunk_length=1200
)

test_dataset = BearingDataset(
    test_files,
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DE,
    unified_label=False,
    chunk_length=1200
)

pytorch_ctte = PyTorchCTTE(
    train_dataset,
    test_dataset,
    device=device,
    criterion=CrossEntropyLoss()
)

##Fully Connected Neural Network

### Model Intuition

A fully connected neural network treats each vibration sample as a fixed-length vector, learning relationships between all points in the signal simultaneously.

Unlike sequential models, it makes no assumptions about temporal ordering — every input element is connected to every neuron in the next layer, allowing it to discover arbitrary correlations across the entire 1200-point reading. The first layer projects the raw signal into a 512-dimensional space, expanding the representation to capture a rich set of features, while the second layer compresses to 256 dimensions, acting as a bottleneck that forces the network to distill the most discriminative patterns. ReLU activations between layers introduce nonlinearity, enabling the network to learn complex decision boundaries that a simple linear classifier could not. This architecture is well suited for vibration classification when the signal length is fixed and the spatial relationships between measurement points carry meaningful information about fault characteristics, as it is in this project.


### Model Definition

In [17]:
from torch import nn
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

class FCNN(nn.Module):
    def __init__(self, input_dim=1200, num_fault_locations=4, num_crack_sizes=4):
        super(FCNN, self).__init__()
        self.shared = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
        )

        self.fault_detection_output = nn.Linear(256, 2)
        self.fault_location_output = nn.Linear(256, num_fault_locations)
        self.crack_size_output = nn.Linear(256, num_crack_sizes)

        # Xavier initialization
        nn.init.xavier_uniform_(self.fault_detection_output.weight)
        nn.init.xavier_uniform_(self.fault_location_output.weight)
        nn.init.xavier_uniform_(self.crack_size_output.weight)

    def forward(self, x):
        x = self.shared(x)

        return (
            torch.sigmoid(self.fault_detection_output(x)),
            self.fault_location_output(x),
            self.crack_size_output(x)
        )

In [18]:
fcnn_ctte = copy.deepcopy(pytorch_ctte)

fcnn_model = FCNN(
    input_dim=1200
)

fcnn_ctte.load_model(fcnn_model)

fcnn_ctte.load_optimizer(
    torch.optim.Adam(fcnn_model.parameters(), lr=1e-3)
)


### Training

In [19]:
fcnn_ctte.prepare_data()
fcnn_ctte.train(epochs=20, batch_size=64)
fcnn_ctte.evaluate()

Epoch 1/20: 100%|██████████| 269/269 [00:03<00:00, 86.67it/s, loss=1.55]


Epoch 1 avg loss: 1.7216


Epoch 2/20: 100%|██████████| 269/269 [00:02<00:00, 120.84it/s, loss=0.536]


Epoch 2 avg loss: 1.0191


Epoch 3/20: 100%|██████████| 269/269 [00:03<00:00, 82.21it/s, loss=0.518]


Epoch 3 avg loss: 0.6673


Epoch 4/20: 100%|██████████| 269/269 [00:02<00:00, 94.98it/s, loss=0.295] 


Epoch 4 avg loss: 0.4822


Epoch 5/20: 100%|██████████| 269/269 [00:01<00:00, 154.50it/s, loss=0.317]


Epoch 5 avg loss: 0.3903


Epoch 6/20: 100%|██████████| 269/269 [00:01<00:00, 186.85it/s, loss=0.166]


Epoch 6 avg loss: 0.3352


Epoch 7/20: 100%|██████████| 269/269 [00:01<00:00, 187.31it/s, loss=0.162]


Epoch 7 avg loss: 0.3125


Epoch 8/20: 100%|██████████| 269/269 [00:01<00:00, 176.63it/s, loss=0.221]


Epoch 8 avg loss: 0.2619


Epoch 9/20: 100%|██████████| 269/269 [00:01<00:00, 185.85it/s, loss=0.177]


Epoch 9 avg loss: 0.2747


Epoch 10/20: 100%|██████████| 269/269 [00:01<00:00, 170.38it/s, loss=0.406]


Epoch 10 avg loss: 0.2535


Epoch 11/20: 100%|██████████| 269/269 [00:01<00:00, 152.24it/s, loss=0.182]


Epoch 11 avg loss: 0.2974


Epoch 12/20: 100%|██████████| 269/269 [00:01<00:00, 187.63it/s, loss=0.841]


Epoch 12 avg loss: 0.2684


Epoch 13/20: 100%|██████████| 269/269 [00:01<00:00, 187.30it/s, loss=0.224]


Epoch 13 avg loss: 0.3116


Epoch 14/20: 100%|██████████| 269/269 [00:01<00:00, 183.30it/s, loss=0.16]


Epoch 14 avg loss: 0.2415


Epoch 15/20: 100%|██████████| 269/269 [00:01<00:00, 186.82it/s, loss=0.16]


Epoch 15 avg loss: 0.2181


Epoch 16/20: 100%|██████████| 269/269 [00:01<00:00, 180.70it/s, loss=0.157]


Epoch 16 avg loss: 0.2315


Epoch 17/20: 100%|██████████| 269/269 [00:01<00:00, 183.20it/s, loss=0.171]


Epoch 17 avg loss: 0.2253


Epoch 18/20: 100%|██████████| 269/269 [00:01<00:00, 164.27it/s, loss=0.157]


Epoch 18 avg loss: 0.2523


Epoch 19/20: 100%|██████████| 269/269 [00:01<00:00, 154.36it/s, loss=1.64]


Epoch 19 avg loss: 0.2441


Epoch 20/20: 100%|██████████| 269/269 [00:01<00:00, 177.22it/s, loss=0.157]


Epoch 20 avg loss: 0.2999


### Classification Report

In [20]:
fcnn_ctte.classification_report()

## ResNet

### Model Intuition


### Model Anatomy

**Residual Block Internals**

In the architecture, residual blocks are defined as a separate module class outside the main ResNet1D class and then used in aggregate within the network definition block. Let's look at the definition of the internal components of *ResidualBlock1D* to start. It operates in two modes, depending if the output shape differs from the input shape - then it is downsampling - or not. Looking at the components as they are defined in the forward pass:

* `shortcut` is defined differently if the block is performing downsampling or not. If it is not, then the shortcut is simply a passthrough. If it is performing downsampling, then the input is downsampled to the output dimensions using conv1d with a stride of two to halve the input dimension to the needed output dimension. Note that the identity is used to cache the input identity for later, it is not passed further along the network (yet).
* `conv1` A convolutional layer is applied, but critically only to every other input (stride=2) when downsampling. When every other input is skipped, it halves the length of the output. Batch norm, ReLu and dropout are applied after to normalize, incorporate non-linearity, and prevent overfitting respectively.
* `conv2` is a second convolution to help refine the features again without changing dimensions. Batch norm and dropout applied afterward, but not ReLu.
* `out += identity` is the key move that makes this 'res' net. The input that we cached at the start is added in at the end of the convolutional layers. The final ReLU activates the combined result before passing to the next block.

> This addition of the input layers at the end of this block is the core idea of ResNet. Instead of the network learning the full transformation F(x), it only needs to learn the difference from the input: F(x) = H(x) - x, so the output is x + F(x), the residuals in res net. This means if the optimal transformation is close to doing nothing, the network just learns to push F(x) toward zero — which is much easier than learning a full identity mapping from scratch. This is why deep ResNets can train where plain deep networks collapse.

Now that we understand what is going on in a Residual Block, we will look at the whole architecture of the network.

**Initial Convolutional Layer**

To start, 32 one-dimensional [convolutional filters](https://developers.google.com/machine-learning/glossary#convolutional_filter) are applied to the vibration signal. The convolution uses a wide 1-dimensional kernel (16 samples) to learn filter parameters capable of capturing broad structural features like impulse responses, periodic oscillations, and transient events at different scales from the raw input before passing them into the residual stages. This produces a new value from the filter for each part of the original time series, for each of the 32 filters that are learned - the output shape is 1199 (samples, one lost due to padding) by 32 (learned filters). BatchNorm normalizes each of the 32 channels independently to zero mean and unit variance across the batch, and then ReLu zeros out all negative values.

**Residual Blocks Layers**

A series of eight residual blocks forms the heart of the network. They are arranged into four sequential groups, where each group halves the temporal dimension through stride-2 downsampling while progressively widening the channel count (32→32→64→128→128). The kernel size also steadily shrinks across groups (7→7→5→3), allowing later layers to learn increasingly fine-grained features now that earlier layers have already built up broad contextual awareness.

> Each time a group downsamples, the temporal resolution is cut in half (e.g. 1199→600→300→150), but each remaining position now represents a wider window of the original signal. Combined with the residual connections carrying forward earlier representations, this means deeper blocks operate with an increasingly large receptive field. Each value in the compressed network is influenced by a broader stretch of the original vibration signal, allowing the network to pick up on longer-range structural patterns that wouldn't be visible at finer temporal scales.

**Final Learning Layers**

`AvgPool1D` is the adaptive average pooling layer collapses whatever temporal dimension remains (e.g. 150 time steps) down to a single value per channel by averaging across the entire length. This produces a fixed-size vector of 128 values — one summary statistic per learned feature channel — regardless of the original input length. This is what allows the network to transition from convolutional feature extraction into the fully connected classification heads.

`fc_shared` is a fully connected layer that takes the 128-dimensional pooled vector and maps it to another 128-dimensional representation, followed by ReLU and dropout. It acts as a shared bottleneck that gives the network a chance to learn a final combined representation before branching into the three separate classification heads. Without it, each head would be working directly from the pooled convolutional features — this extra layer lets the network learn a task-aware remixing of those features that benefits all three outputs jointly.

From the outputs of the `fc_shared` layer, the multiple classification heads are able to learn.

### Model Definition

In [21]:

class ResidualBlock1D(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=7, downsample=False, dropout=0.1):
        super().__init__()
        stride = 2 if downsample else 1
        padding = kernel_size // 2

        # residual block layer internals - definition
        self.conv1 = nn.Conv1d(in_channels, out_channels, kernel_size,
                               stride=stride, padding=padding)
        self.bn1 = nn.BatchNorm1d(out_channels)
        self.dropout1 = nn.Dropout(dropout)

        self.conv2 = nn.Conv1d(out_channels, out_channels, kernel_size,
                               stride=1, padding=padding)
        self.bn2 = nn.BatchNorm1d(out_channels)
        self.dropout2 = nn.Dropout(dropout)

        self.shortcut = nn.Identity()
        if downsample or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv1d(in_channels, out_channels, kernel_size=1, stride=stride),
                nn.BatchNorm1d(out_channels)
            )

    def forward(self, x):
        # residual block layer internals - implementation
        identity = self.shortcut(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)
        out = self.dropout1(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += identity
        return F.relu(out)


class ResNet1D_pt(nn.Module):
    def __init__(self,
                 input_channels=1,
                 num_location_classes=4,
                 num_size_classes=4):
        super().__init__()

        # Convolutional Layer - definition
        self.stem = nn.Sequential(
            nn.Conv1d(input_channels, 32, kernel_size=16, stride=1, padding=7),
            nn.BatchNorm1d(32),
            nn.ReLU(),
        )

        # residual blocks layers - definition
        self.layer1a = ResidualBlock1D(32, 32, kernel_size=7, downsample=False)
        self.layer1b = ResidualBlock1D(32, 32, kernel_size=7, downsample=False)

        self.layer2a = ResidualBlock1D(32, 64, kernel_size=7, downsample=True)   # 1200 -> 600
        self.layer2b = ResidualBlock1D(64, 64, kernel_size=7, downsample=False)

        self.layer3a = ResidualBlock1D(64, 128, kernel_size=5, downsample=True)  # 600 -> 300
        self.layer3b = ResidualBlock1D(128, 128, kernel_size=5, downsample=False)

        self.layer4a = ResidualBlock1D(128, 128, kernel_size=3, downsample=True) # 300 -> 150
        self.layer4b = ResidualBlock1D(128, 128, kernel_size=3, downsample=False)

        self.pool = nn.AdaptiveAvgPool1d(1)
        self.dropout = nn.Dropout(0.3)

        # Shared representation
        self.fc_shared = nn.Linear(128, 128)

        # Three classification heads
        self.fc_fault_detection = nn.Linear(128, 2)
        self.fc_fault_location = nn.Linear(128, num_location_classes)
        self.fc_crack_size = nn.Linear(128, num_size_classes)

        self._init_weights()

    def _init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv1d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm1d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        if x.dim() == 2:
            x = x.unsqueeze(1)

        # convolutional layer - implementation
        x = self.stem(x)

        # residual blocks layer - implementation
        x = self.layer1a(x)
        x = self.layer1b(x)
        x = self.layer2a(x)
        x = self.layer2b(x)
        x = self.layer3a(x)
        x = self.layer3b(x)
        x = self.layer4a(x)
        x = self.layer4b(x)

        # pooling and final layer
        x = self.pool(x).squeeze(-1)
        x = self.fc_shared(x)
        x = F.relu(x)
        x = self.dropout(x)

        fault_detection = self.fc_fault_detection(x)
        fault_location = self.fc_fault_location(x)
        crack_size = self.fc_crack_size(x)

        return fault_detection, fault_location, crack_size

### Training Preparation

In [22]:
resnet_ctte = copy.deepcopy(pytorch_ctte)

resnet_model = ResNet1D_pt()

resnet_ctte.load_model(resnet_model)

resnet_ctte.load_optimizer(
    torch.optim.Adam(resnet_model.parameters(), lr=1e-3)
)

### Training

In [23]:
resnet_ctte.prepare_data()
resnet_ctte.train(epochs=20, batch_size=64)
resnet_ctte.evaluate()

Epoch 1/20: 100%|██████████| 269/269 [00:14<00:00, 18.49it/s, loss=3.82]


Epoch 1 avg loss: 0.9506


Epoch 2/20: 100%|██████████| 269/269 [00:14<00:00, 18.38it/s, loss=3.27]


Epoch 2 avg loss: 0.4445


Epoch 3/20: 100%|██████████| 269/269 [00:14<00:00, 17.97it/s, loss=2.33]


Epoch 3 avg loss: 0.3374


Epoch 4/20: 100%|██████████| 269/269 [00:14<00:00, 19.18it/s, loss=0.18]


Epoch 4 avg loss: 0.2392


Epoch 5/20: 100%|██████████| 269/269 [00:13<00:00, 19.30it/s, loss=0.955]


Epoch 5 avg loss: 0.1968


Epoch 6/20: 100%|██████████| 269/269 [00:14<00:00, 19.18it/s, loss=0.246]


Epoch 6 avg loss: 0.2463


Epoch 7/20: 100%|██████████| 269/269 [00:14<00:00, 18.98it/s, loss=0.068]


Epoch 7 avg loss: 0.1617


Epoch 8/20: 100%|██████████| 269/269 [00:14<00:00, 18.87it/s, loss=3.73]


Epoch 8 avg loss: 0.1415


Epoch 9/20: 100%|██████████| 269/269 [00:14<00:00, 18.88it/s, loss=0.0959]


Epoch 9 avg loss: 0.1551


Epoch 10/20: 100%|██████████| 269/269 [00:14<00:00, 18.95it/s, loss=0.0737]


Epoch 10 avg loss: 0.1300


Epoch 11/20: 100%|██████████| 269/269 [00:14<00:00, 19.00it/s, loss=0.242]


Epoch 11 avg loss: 0.1230


Epoch 12/20: 100%|██████████| 269/269 [00:14<00:00, 19.06it/s, loss=0.62]


Epoch 12 avg loss: 0.1213


Epoch 13/20: 100%|██████████| 269/269 [00:14<00:00, 18.99it/s, loss=2.51]


Epoch 13 avg loss: 0.1104


Epoch 14/20: 100%|██████████| 269/269 [00:14<00:00, 18.94it/s, loss=0.171]


Epoch 14 avg loss: 0.1655


Epoch 15/20: 100%|██████████| 269/269 [00:14<00:00, 19.00it/s, loss=0.416]


Epoch 15 avg loss: 0.0969


Epoch 16/20: 100%|██████████| 269/269 [00:14<00:00, 18.99it/s, loss=0.363]


Epoch 16 avg loss: 0.1138


Epoch 17/20: 100%|██████████| 269/269 [00:14<00:00, 18.96it/s, loss=1.48]


Epoch 17 avg loss: 0.1094


Epoch 18/20: 100%|██████████| 269/269 [00:14<00:00, 18.84it/s, loss=1.58]


Epoch 18 avg loss: 0.0991


Epoch 19/20: 100%|██████████| 269/269 [00:14<00:00, 18.85it/s, loss=0.351]


Epoch 19 avg loss: 0.1031


Epoch 20/20: 100%|██████████| 269/269 [00:14<00:00, 18.97it/s, loss=0.315]


Epoch 20 avg loss: 0.0789


### Classification Report

In [24]:
resnet_ctte.classification_report()


### Results Interpretation

ResNet's good performance makes sense due to the architecture's natural fit for vibration signal classification. Vibration signals contain diagnostic information at multiple scales — high-frequency transients from crack impacts, medium-frequency resonance patterns, and longer-range periodic structures from rotating components. The progressive downsampling through residual groups means the network builds up representations at each of these scales, from fine-grained waveform features in early layers to broad structural patterns in deeper ones.
The residual connections also help here specifically. Subtle fault signatures can be small perturbations on top of dominant healthy vibration patterns — which is essentially what a residual is. By learning differences from the input rather than full transformations, the network is well-suited to detecting these small but diagnostically meaningful deviations. A healthy signal passes through with near-zero residuals, while a fault introduces learnable differences that propagate through to the classification heads.
The multi-head design also plays a role. Because detection, location, and size classification share the same learned feature backbone, the network can exploit correlations between tasks — for instance, certain frequency signatures that indicate a crack also carry information about where it is. This shared learning likely gives better results than training three separate models, especially when data is limited.

## Gated Recurrent Unit (GRU)

### Model Intuition

### Model Anatomy

### Model Definition

In [25]:
class GRU1D_pt(nn.Module):
    def __init__(self, input_size=1, seq_len=1200, hidden_size=128, num_location_classes=4, num_size_classes=4):
        super(GRU1D_pt, self).__init__()

        self.seq_len = seq_len
        self.hidden_size = hidden_size

        # GRU Layer
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, batch_first=True)

        # Fully Connected Layers
        self.fc1 = nn.Linear(hidden_size, 64)
        self.fc2 = nn.Linear(64, 32)

        # Output heads
        self.fc_fault_detection = nn.Linear(32, 2)
        self.fc_fault_location = nn.Linear(32, num_location_classes)
        self.fc_crack_size = nn.Linear(32, num_size_classes)

        self._init_weights()

    def _init_weights(self):
        for layer in [self.fc_fault_detection, self.fc_fault_location, self.fc_crack_size]:
            nn.init.xavier_uniform_(layer.weight)
            if layer.bias is not None:
                nn.init.zeros_(layer.bias)

    def forward(self, x):
        # Input shape normalization
        if x.ndim == 2:
            x = x.unsqueeze(-1)  # (B, T) -> (B, T, 1)
        elif x.ndim == 3 and x.shape[1] == 1:
            x = x.permute(0, 2, 1)  # (B, 1, T) -> (B, T, 1)

        x = x[:, :self.seq_len, :]

        # GRU forward
        gru_out, _ = self.gru(x)
        out = gru_out[:, -1, :]  # Take last time step

        # Fully connected layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))

        # Multi-head outputs
        fault_detection = self.fc_fault_detection(out)
        fault_location = self.fc_fault_location(out)
        crack_size = self.fc_crack_size(out)

        return fault_detection, fault_location, crack_size

In [56]:
gru_ctte = copy.deepcopy(pytorch_ctte)

gru_model = GRU1D_pt()

gru_ctte.load_model(gru_model)

gru_ctte.load_optimizer(
    torch.optim.Adam(gru_model.parameters(), lr=1e-3)
)

In [57]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

In [None]:
gru_ctte.prepare_data()
gru_ctte.train(epochs=100, batch_size=64)
gru_ctte.evaluate()

Epoch 1/100: 100%|██████████| 269/269 [00:04<00:00, 58.45it/s, loss=2.48]


Epoch 1 avg loss: 2.6788


Epoch 2/100: 100%|██████████| 269/269 [00:04<00:00, 59.54it/s, loss=2.44]


Epoch 2 avg loss: 2.6298


Epoch 3/100: 100%|██████████| 269/269 [00:04<00:00, 57.42it/s, loss=2.22]


Epoch 3 avg loss: 2.5710


Epoch 4/100: 100%|██████████| 269/269 [00:04<00:00, 57.96it/s, loss=1.91]


Epoch 4 avg loss: 2.3397


Epoch 5/100: 100%|██████████| 269/269 [00:04<00:00, 57.35it/s, loss=2.4]


Epoch 5 avg loss: 2.0841


Epoch 6/100: 100%|██████████| 269/269 [00:04<00:00, 56.74it/s, loss=1.79]


Epoch 6 avg loss: 2.1133


Epoch 7/100: 100%|██████████| 269/269 [00:04<00:00, 57.96it/s, loss=1.55]


Epoch 7 avg loss: 2.0717


Epoch 8/100: 100%|██████████| 269/269 [00:04<00:00, 57.28it/s, loss=1.61]


Epoch 8 avg loss: 1.9306


Epoch 9/100: 100%|██████████| 269/269 [00:04<00:00, 58.00it/s, loss=1.98]


Epoch 9 avg loss: 1.7126


Epoch 10/100: 100%|██████████| 269/269 [00:04<00:00, 58.07it/s, loss=1.94]


Epoch 10 avg loss: 2.5806


Epoch 11/100: 100%|██████████| 269/269 [00:04<00:00, 57.45it/s, loss=1.95]


Epoch 11 avg loss: 1.8896


Epoch 12/100: 100%|██████████| 269/269 [00:04<00:00, 58.11it/s, loss=0.827]


Epoch 12 avg loss: 1.7339


Epoch 13/100: 100%|██████████| 269/269 [00:04<00:00, 57.43it/s, loss=1.56]


Epoch 13 avg loss: 1.5966


Epoch 14/100: 100%|██████████| 269/269 [00:04<00:00, 58.25it/s, loss=0.827]


Epoch 14 avg loss: 1.4957


Epoch 15/100: 100%|██████████| 269/269 [00:04<00:00, 58.05it/s, loss=1.9]


Epoch 15 avg loss: 1.4379


Epoch 16/100: 100%|██████████| 269/269 [00:04<00:00, 57.49it/s, loss=0.883]


Epoch 16 avg loss: 1.7868


Epoch 17/100: 100%|██████████| 269/269 [00:04<00:00, 57.86it/s, loss=2.05]


Epoch 17 avg loss: 1.6011


Epoch 18/100: 100%|██████████| 269/269 [00:04<00:00, 57.29it/s, loss=1.77]


Epoch 18 avg loss: 1.4011


Epoch 19/100: 100%|██████████| 269/269 [00:04<00:00, 57.53it/s, loss=0.946]


Epoch 19 avg loss: 1.3059


Epoch 20/100: 100%|██████████| 269/269 [00:04<00:00, 57.99it/s, loss=0.544]


Epoch 20 avg loss: 1.2396


Epoch 21/100: 100%|██████████| 269/269 [00:04<00:00, 57.44it/s, loss=1.25]


Epoch 21 avg loss: 1.2087


Epoch 22/100: 100%|██████████| 269/269 [00:04<00:00, 57.93it/s, loss=0.847]


Epoch 22 avg loss: 1.1168


Epoch 23/100: 100%|██████████| 269/269 [00:04<00:00, 57.49it/s, loss=0.338]


Epoch 23 avg loss: 1.0363


Epoch 24/100: 100%|██████████| 269/269 [00:04<00:00, 57.61it/s, loss=0.753]


Epoch 24 avg loss: 1.0031


Epoch 25/100: 100%|██████████| 269/269 [00:04<00:00, 58.15it/s, loss=0.481]


Epoch 25 avg loss: 0.9062


Epoch 26/100: 100%|██████████| 269/269 [00:04<00:00, 57.25it/s, loss=0.515]


Epoch 26 avg loss: 0.8304


Epoch 27/100: 100%|██████████| 269/269 [00:04<00:00, 57.97it/s, loss=0.117]


Epoch 27 avg loss: 0.7397


Epoch 28/100: 100%|██████████| 269/269 [00:04<00:00, 58.09it/s, loss=1.07]


Epoch 28 avg loss: 0.6865


Epoch 29/100: 100%|██████████| 269/269 [00:04<00:00, 57.31it/s, loss=1.47]


Epoch 29 avg loss: 0.6470


Epoch 30/100: 100%|██████████| 269/269 [00:04<00:00, 57.88it/s, loss=0.000319]


Epoch 30 avg loss: 0.5821


Epoch 31/100: 100%|██████████| 269/269 [00:04<00:00, 57.20it/s, loss=0.184]


Epoch 31 avg loss: 0.5379


Epoch 32/100: 100%|██████████| 269/269 [00:04<00:00, 57.91it/s, loss=0.195]


Epoch 32 avg loss: 0.4997


Epoch 33/100: 100%|██████████| 269/269 [00:04<00:00, 57.63it/s, loss=0.766]


Epoch 33 avg loss: 0.5154


Epoch 34/100: 100%|██████████| 269/269 [00:04<00:00, 56.73it/s, loss=0.133]


Epoch 34 avg loss: 0.4565


Epoch 35/100: 100%|██████████| 269/269 [00:04<00:00, 57.89it/s, loss=0.453]


Epoch 35 avg loss: 0.4037


Epoch 36/100: 100%|██████████| 269/269 [00:04<00:00, 56.92it/s, loss=0.0168]


Epoch 36 avg loss: 0.3902


Epoch 37/100: 100%|██████████| 269/269 [00:04<00:00, 57.40it/s, loss=0.0957]


Epoch 37 avg loss: 0.3372


Epoch 38/100: 100%|██████████| 269/269 [00:04<00:00, 58.11it/s, loss=0.162]


Epoch 38 avg loss: 0.3198


Epoch 39/100: 100%|██████████| 269/269 [00:04<00:00, 56.91it/s, loss=0.021]


Epoch 39 avg loss: 0.3017


Epoch 40/100: 100%|██████████| 269/269 [00:04<00:00, 58.06it/s, loss=0.189]


Epoch 40 avg loss: 0.2835


Epoch 41/100: 100%|██████████| 269/269 [00:04<00:00, 57.93it/s, loss=0.263]


Epoch 41 avg loss: 0.2720


Epoch 42/100:  40%|████      | 108/269 [00:01<00:02, 57.64it/s, loss=0.272]

In [42]:
gru_ctte.classification_report()

## Long Short Term Memory (LSTM) Neural Network

### Model Intuition

LSTMs are a subtype of recurrent neural networks.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

### Model Definition

In [29]:
class LSTM1D_pt(nn.Module):
    def __init__(self,
                 sequence_length=1200,
                 chunk_size=10,
                 hidden_size=128,
                 num_layers=3,
                 dropout_rate=0.3,
                 num_fault_locations=4,
                 num_crack_sizes=4):
        super().__init__()

        self.sequence_length = sequence_length
        self.chunk_size = chunk_size
        self.num_steps = sequence_length // chunk_size  # 1200/10 = 120 steps

        # Input projection: transform each chunk into a richer representation
        self.input_proj = nn.Sequential(
            nn.Linear(chunk_size, hidden_size),
            nn.ReLU(),
            nn.LayerNorm(hidden_size),
        )

        # Bidirectional LSTM
        self.lstm = nn.LSTM(
            input_size=hidden_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout_rate,
            bidirectional=True,
        )

        # Attention pooling over timesteps
        self.attention = nn.Sequential(
            nn.Linear(hidden_size * 2, 64),
            nn.Tanh(),
            nn.Linear(64, 1),
        )

        self.layer_norm = nn.LayerNorm(hidden_size * 2)
        self.dropout = nn.Dropout(dropout_rate)

        # Shared FC
        self.fc_shared = nn.Linear(hidden_size * 2, 256)

        # Output heads — all raw logits
        self.fault_detection_output = nn.Linear(256, 2)
        self.fault_location_output = nn.Linear(256, num_fault_locations)
        self.crack_size_output = nn.Linear(256, num_crack_sizes)

        self._init_weights()

    def _init_weights(self):
        # Orthogonal init for LSTM (proven to help with long sequences)
        for name, param in self.lstm.named_parameters():
            if 'weight_ih' in name:
                nn.init.xavier_uniform_(param)
            elif 'weight_hh' in name:
                nn.init.orthogonal_(param)
            elif 'bias' in name:
                nn.init.zeros_(param)
                # Set forget gate bias to 1 to encourage remembering
                hidden = self.lstm.hidden_size
                param.data[hidden:2*hidden].fill_(1.0)

        for layer in [self.fc_shared, self.fault_detection_output,
                      self.fault_location_output, self.crack_size_output]:
            nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
            if layer.bias is not None:
                nn.init.zeros_(layer.bias)

    def forward(self, x):
        # x: [batch, seq_len] or [batch, 1, seq_len]
        if x.dim() == 3:
            x = x.squeeze(1)

        batch_size = x.size(0)

        # Chunk the signal: [batch, 1200] -> [batch, 120, 10]
        x = x.view(batch_size, self.num_steps, self.chunk_size)

        # Project each chunk: [batch, 120, 10] -> [batch, 120, hidden_size]
        x = self.input_proj(x)

        # Bidirectional LSTM: [batch, 120, hidden_size*2]
        lstm_out, _ = self.lstm(x)

        # Attention pooling: learn which timesteps matter
        attn_weights = self.attention(lstm_out)        # [batch, 120, 1]
        attn_weights = F.softmax(attn_weights, dim=1)  # [batch, 120, 1]
        features = (lstm_out * attn_weights).sum(dim=1) # [batch, hidden_size*2]

        features = self.layer_norm(features)
        features = self.dropout(F.relu(self.fc_shared(features)))

        fault_detection = self.fault_detection_output(features)
        fault_location = self.fault_location_output(features)
        crack_size = self.crack_size_output(features)

        return fault_detection, fault_location, crack_size

In [30]:
lstm_ctte = copy.deepcopy(pytorch_ctte)

lstm_model = LSTM1D_pt(
    sequence_length=1200,
    hidden_size=128,
    num_layers=2,
    dropout_rate=0.1
)

lstm_ctte.load_model(lstm_model)

lstm_ctte.load_optimizer(
    torch.optim.Adam(lstm_model.parameters(), lr=1e-3)
)

In [31]:
lstm_ctte.prepare_data()
lstm_ctte.train(epochs=20, batch_size=64)
lstm_ctte.evaluate()

Epoch 1/20: 100%|██████████| 269/269 [00:05<00:00, 49.45it/s, loss=0.0792]


Epoch 1 avg loss: 1.1172


Epoch 2/20: 100%|██████████| 269/269 [00:05<00:00, 49.76it/s, loss=0.534]


Epoch 2 avg loss: 0.4973


Epoch 3/20: 100%|██████████| 269/269 [00:05<00:00, 49.76it/s, loss=0.195]


Epoch 3 avg loss: 0.3829


Epoch 4/20: 100%|██████████| 269/269 [00:05<00:00, 49.65it/s, loss=0.0116]


Epoch 4 avg loss: 0.3239


Epoch 5/20: 100%|██████████| 269/269 [00:05<00:00, 49.62it/s, loss=0.0103]


Epoch 5 avg loss: 0.2601


Epoch 6/20: 100%|██████████| 269/269 [00:05<00:00, 49.70it/s, loss=0.0223]


Epoch 6 avg loss: 0.2099


Epoch 7/20: 100%|██████████| 269/269 [00:05<00:00, 49.73it/s, loss=0.0229]


Epoch 7 avg loss: 0.2324


Epoch 8/20: 100%|██████████| 269/269 [00:05<00:00, 49.74it/s, loss=0.0266]


Epoch 8 avg loss: 0.1892


Epoch 9/20: 100%|██████████| 269/269 [00:05<00:00, 49.67it/s, loss=0.0618]


Epoch 9 avg loss: 0.1305


Epoch 10/20: 100%|██████████| 269/269 [00:05<00:00, 49.66it/s, loss=0.000472]


Epoch 10 avg loss: 0.1234


Epoch 11/20: 100%|██████████| 269/269 [00:05<00:00, 49.68it/s, loss=4.61e-5]


Epoch 11 avg loss: 0.1130


Epoch 12/20: 100%|██████████| 269/269 [00:05<00:00, 49.70it/s, loss=0.00268]


Epoch 12 avg loss: 0.0909


Epoch 13/20: 100%|██████████| 269/269 [00:05<00:00, 49.74it/s, loss=2.25e-5]


Epoch 13 avg loss: 0.0870


Epoch 14/20: 100%|██████████| 269/269 [00:05<00:00, 49.68it/s, loss=0.00309]


Epoch 14 avg loss: 0.1156


Epoch 15/20: 100%|██████████| 269/269 [00:05<00:00, 49.70it/s, loss=0.086]


Epoch 15 avg loss: 0.0569


Epoch 16/20: 100%|██████████| 269/269 [00:05<00:00, 49.76it/s, loss=0.4]


Epoch 16 avg loss: 0.0645


Epoch 17/20: 100%|██████████| 269/269 [00:05<00:00, 49.69it/s, loss=0.121]


Epoch 17 avg loss: 0.0690


Epoch 18/20: 100%|██████████| 269/269 [00:05<00:00, 49.74it/s, loss=0.000601]


Epoch 18 avg loss: 0.0618


Epoch 19/20: 100%|██████████| 269/269 [00:05<00:00, 49.72it/s, loss=0.00995]


Epoch 19 avg loss: 0.0386


Epoch 20/20: 100%|██████████| 269/269 [00:05<00:00, 49.76it/s, loss=0.0763]


Epoch 20 avg loss: 0.0348


In [34]:
lstm_ctte.classification_report()
