<a href="https://colab.research.google.com/github/cwhitz/ts-trove/blob/master/notebooks/classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Time Series Classification

This notebook explores various time series classification techniques. It makes much fuller use of the bearings dataset also explored in the signal analysis notebook.

## Overview

Time series classification involves assigning time series instances to predefined categories. This notebook will cover:

### Table of Contents

> 1 [Data Preparation](#Data-Preparation)

1.1. [Data Download](##Data-Download)

1.2 [Data Organization](##Data-Organization)

> 2 [Utility Functions](#Training-Functions)

2.1 [Data Loader](##Data-Loader)

> 3 [Time Series Classification with SciKit](#Scikit-Functions)

3.1 [Scikit Trainer and Evaluator](##Scikit-Trainer-and-Evaluator)

> 4 [Deep Learning Models](#Deep-Learning-Models)

4.1 [PyTorch Trainer and Evaluator](##-Trainer-and-Evaluator)

4.2 [Fully Connected Neural Networks]()

4.3 [Recurrent Neural Networks]()

4.3.1 [Classic Recurrent Neural Network]()

4.3.2 [Long Short Term Memory (LSTM) Neural Network]()

4.3.3 [Gated Recurrent Neural Network]()

4.4 [Convolutional Neural Networks]()

4.4.1 [1D Convolutional Neural Network]()

4.4.2 [Temporal Convolutional Network]()

4.5 [Attention Based Models]()

4.5.1 [LSTM with Attention]()

4.5.2 [Time Series Transformer]()



In [79]:
import pandas as pd
import numpy as np
import os
import pathlib
import matplotlib.pyplot as plt
import json
import pathlib
import shutil
import kagglehub

import tqdm

In [80]:
!pip install cesium



# Data Preparation

## Data Overview

**What is this dataset?**

This is a collection of vibration data from electric motor bearings. Bearings those small spinning parts that let machinery rotate smoothly. Think of a bearing like the axle in a wheel: it's got little metal balls inside that roll around, letting a shaft spin with barely any friction.

**Why was this dataset created?**

The researchers at Case Western Reserve University deliberately damaged bearings in different ways, then recorded how the motor vibrated as a result. They made tiny cracks of various sizes (ranging from 7 to 40 thousandths of an inch) in the bearings, then attached vibration sensors to measure what happened.

Cracks of different mm sizes were introduced on the outer race, inner race, and the balls themselves.

![ball_bearing_diagram](https://www.globalspec.com/ImageRepository/LearnMore/20133/ball%20bearing5364b00280ef4db7b85dfba113f04556.png)

The goal was to understand the relationship between bearing damage and vibration patterns, creating a reference library that shows what different types of bearing failure look like.

**Why is it useful?**

This data is incredibly useful for real-world maintenance and diagnostics. In factories and power plants, you can use vibration patterns to detect bearing problems before they cause catastrophic failures. By comparing vibrations from a running machine to patterns in this dataset, maintenance teams can identify early signs of wear, predict when a bearing will fail, and schedule repairs before expensive downtime happens. It's basically like a fingerprint database for bearing damage—once you know what a damaged bearing "sounds like," you can spot trouble coming.

**So what are we actually trying to predict?**

Good question. We will try to train machine learning models to predict three things: 1) Is the bearing in normal operation? 2) If not, where is the crack? 3) And what size is it?

2 and 3 of course become irrelevant if the bearing is in normal operation, but they allow us to go a step beyond simple detection of irregular operation.


## Data Download

The raw data can be downloaded directly from Kaggle.

In [81]:
kagglepath = "sufian79/cwru-mat-full-dataset"
path = kagglehub.dataset_download(kagglepath)


pathlib.Path(f"./{kagglepath.split('/')[-1]}").mkdir(parents=True, exist_ok=True)
shutil.copytree(path, f"./{kagglepath.split('/')[-1]}", dirs_exist_ok=True)

Using Colab cache for faster access to the 'cwru-mat-full-dataset' dataset.


'./cwru-mat-full-dataset'

## Data Organization

The raw data is a collection of numbered mat files and requires reference back to the [original website](https://engineering.case.edu/bearingdatacenter/48k-drive-end-bearing-fault-data) to make sense of. I've gone ahead and done that with the JSON structure below.

The data is organized at top-level describing the type of fault, or lack thereof with "normal" sample files are the motor operating without faults. The next level down is the sampling rate, followed by the location where the crack was introduced (IR being inner race, B being ball, OR being outer race) and then finally, the size of the cracks ranging from 7 to 21 mm.

The code below this cell moves the individual samples into folders matching the structure below, which aligns with how PyTorch's DataSet and DataLoader work (we will make it work for scikit too).

In [82]:
folder_structure = {
  "normal": {
    "48k": ["97", "98", "99", "100"]
  },
  "drive_end_fault": {
    "12k": {
      "IR": {
        "007": ["105", "106", "107", "108"],
        "014": ["169", "170", "171", "172"],
        "021": ["209", "210", "211", "212"]
      },

      "B": {
        "007": ["118", "119", "120", "121"],
        "014": ["185", "186", "187", "188"],
        "021": ["222", "223", "224", "225"]
      },

      "OR": {
        "007": ["130", "131", "132", "133"],
        "014": ["197", "198", "199", "200"],
        "021": ["234", "235", "236", "237"]
      }
    },

    "48k": {
      "IR": {
        "007": ["109", "110", "111", "112"],
        "014": ["174", "175", "176", "177"],
        "021": ["213", "214", "215", "217"]
      },

      "B": {
        "007": ["122", "123", "124", "125"],
        "014": ["189", "190", "191", "192"],
        "021": ["226", "227", "228", "229"]
      },

      "OR": {
        "007": ["135", "136", "137", "138"],
        "014": ["201", "202", "203", "204"],
        "021": ["238", "239", "240", "241"]
      }
    }
  },

  "fan_end_fault": {
    "12k": {
      "IR": {
        "007": ["278", "279", "280", "281"],
        "014": ["274", "275", "276", "277"],
        "021": ["270", "271", "272", "273"]
      },

      "B": {
        "007": ["282", "283", "284", "285"],
        "014": ["286", "287", "288", "289"],
        "021": ["290", "291", "292", "293"]
      },

      "OR": {
        "007": ["298", "299", "300", "301"],
        "014": ["309", "310", "311", "312"],
        "021": ["315", "316", "317", "318"]
      }
    }
  }
}

In [83]:
SOURCE_DIR = "cwru-mat-full-dataset/"
TARGET_DIR = "classification-cwru-mat-organized"
FILE_EXTENSION = ".mat"

def ensure_dir(path):
    os.makedirs(path, exist_ok=True)

def move_file(file_id, dest_dir):
    filename = file_id + FILE_EXTENSION
    src_path = os.path.join(SOURCE_DIR, filename)
    dst_path = os.path.join(dest_dir, filename)

    if not os.path.exists(src_path):
        print(f"⚠️ Missing file: {src_path}")
        return

    ensure_dir(dest_dir)
    shutil.move(src_path, dst_path)

def walk_structure(node, current_path):
    if isinstance(node, list):
        for file_id in node:
            move_file(file_id, current_path)
    elif isinstance(node, dict):
        for key, child in node.items():
            walk_structure(child, os.path.join(current_path, key))
    else:
        raise ValueError("Unexpected structure type")


walk_structure(folder_structure, TARGET_DIR)
print("Done.")

Done.


# Utility Functions

## Data Loader

Before diving into modeling, we first need a consistent way to load and represent our time-series data. Since later sections will experiment with both deep learning and traditional classifiers, we define a reusable dataset structure that keeps preprocessing, sampling rate handling, and labels consistent across all methods.

In [84]:
from torch.utils.data import Dataset
from torch.nn import Module
import scipy.io
import enum

# samplng rate enum
class SamplingRate(enum.Enum):
    sr12K = "12k"
    sr48K = "48k"

class FaultLocation(enum.Enum):
    DE = "drive_end_fault"
    FE = "front_end_fault"


class BearingDataset(Dataset):
    def __init__(self, file_paths, sampling_rate, fault_location, chunk_length, unified_label=True, transform=None):
        self.file_paths = file_paths
        self.sampling_rate = sampling_rate
        self.fault_location = fault_location
        self.chunk_length = chunk_length
        self.transform = transform
        self.unified_label = unified_label

        self.data = []
        self.labels = []

        self._organize_data()

    def _organize_data(self):
        for fp in self.file_paths:
            if not pathlib.Path(fp).exists():
                raise FileNotFoundError(f"File not found: {fp}")

            mat_data = scipy.io.loadmat(fp)

            key_to_match = f"_{str(self.fault_location)[-2:]}_time"
            sensor_key = [key for key in mat_data.keys() if key_to_match in key][0]

            signal = mat_data[sensor_key].squeeze()

            n_chunks = len(signal) // self.chunk_length
            truncated = signal[:n_chunks * self.chunk_length]

            windows = truncated.reshape(n_chunks, self.chunk_length)

            label_parts = fp.parent.parts
            if label_parts[-2] == 'normal':
                label_dict = {
                    'normal': True,
                    'fault_location': 'NA',
                    'crack_size': 'NA'
                }
            else:
                label_dict = {
                    'normal': False,
                    'fault_location': label_parts[-2],
                    'crack_size': label_parts[-1]
                }


            for window in windows:
              self.data.append(window)

              if self.unified_label:
                self.labels.append(f"{label_dict['fault_location']}_{label_dict['crack_size']}" if label_dict['normal'] == False else "normal")
              else:
                self.labels.append(label_dict)


    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        window = self.data[idx]
        label = self.labels[idx]

        if self.transform:
            window = self.transform(window).astype('float32')

        return window, label

The dataset class above seeks to make the most of the data available in the bearings dataset by splitting each sample in the file into multiple overlapping windows. This increases the effective number of training samples and helps models learn more robust patterns. However, care must be taken to avoid data leakage between training and test sets when using overlapping windows - if we were to pull all the data and then split into train/test, windows from the same original sample could end up in both sets.

To prevent this, we ensure that all windows derived from a given file are assigned to either the training or test set exclusively by splitting into train/test at the file level.

In [85]:
from sklearn.model_selection import train_test_split
from pathlib import Path
from collections import Counter

all_files = list(Path("classification-cwru-mat-organized").rglob("*.mat"))

# derive one label per file
file_labels = [
    '_'.join(f.parent.parts[-2:])
    for f in all_files
]

train_files, test_files = train_test_split(
    all_files,
    test_size=.2,
    shuffle=True,
    stratify=file_labels
)

##

We want to set up a class for testing different classification techniques on the bearings dataset. The class will accept a dataset object and classification model, and be able to train and evaluate the model consistently for metrics like accuracy, precision, recall, and F1-score as well as time for training and inference.

In [86]:
from abc import ABC, abstractmethod
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from tqdm import tqdm
import copy


class ClassificationTrainTestEvaluate(ABC):
    def __init__(self, train_dataset: Dataset, test_dataset: Dataset):
        self.train_dataset = train_dataset
        self.test_dataset = test_dataset

        self.model = None

    def load_model(self, model):
        self.model = model

    def classification_report(self):
      """
      Creates a Plotly figure with three tabs, each showing:
      - Confusion matrix heatmap
      - Metrics summary table

      One tab per task: Fault Detection, Fault Location, Crack Size
      """
      from plotly.subplots import make_subplots
      import plotly.graph_objects as go
      from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

      # Define task names and their corresponding predictions/labels
      tasks = {
          'Fault Detection': {
              'predictions': self.predictions_fault_detection,
              'labels': self.test_y_fault_detection,
              'class_names': ['Normal', 'Fault']
          },
          'Fault Location': {
              'predictions': self.predictions_fault_location,
              'labels': self.test_y_fault_location,
              'class_names': ['B', 'IR', 'OR']
          },
          'Crack Size': {
              'predictions': self.predictions_crack_size,
              'labels': self.test_y_crack_size,
              'class_names': ['007', '014', '021']
          }
      }

      # Create subplots for each task
      figs = []

      for task_name, task_data in tasks.items():
          predictions = task_data['predictions']
          labels = task_data['labels']
          class_names = task_data['class_names']

          # Compute confusion matrix
          cm = confusion_matrix(labels, predictions)

          # Compute metrics
          accuracy = accuracy_score(labels, predictions)
          precision = precision_score(labels, predictions, average='weighted', zero_division=0)
          recall = recall_score(labels, predictions, average='weighted', zero_division=0)
          f1 = f1_score(labels, predictions, average='weighted', zero_division=0)

          # Create subplot layout
          fig = make_subplots(
              rows=1, cols=2,
              column_widths=[0.6, 0.4],
              specs=[[{"type": "heatmap"}, {"type": "table"}]],
              subplot_titles=("Confusion Matrix", "Model Performance Metrics")
          )

          # --- Confusion Matrix Heatmap ---
          fig.add_trace(
              go.Heatmap(
                  z=cm,
                  x=class_names,
                  y=class_names,
                  text=cm,
                  texttemplate="%{text}",
                  colorscale="Blues",
                  showscale=False
              ),
              row=1, col=1
          )

          fig.update_xaxes(title_text="Predicted Label", row=1, col=1)
          fig.update_yaxes(title_text="True Label", row=1, col=1)

          # --- Metrics Table ---
          fig.add_trace(
              go.Table(
                  header=dict(
                      values=["Metric", "Value"],
                      fill_color="lightgrey",
                      align="center"
                  ),
                  cells=dict(
                      values=[
                          ["Accuracy", "Precision", "Recall", "F1 Score"],
                          [f"{accuracy:.4f}", f"{precision:.4f}", f"{recall:.4f}", f"{f1:.4f}"]
                      ],
                      align="center"
                  )
              ),
              row=1, col=2
          )

          fig.update_layout(
              title=f"{task_name} - Evaluation Summary",
              height=500,
              width=900,
              showlegend=False
          )

          figs.append((task_name, fig))

      # Display each figure
      for task_name, fig in figs:
          fig.show()

class SciKitCTTE(ClassificationTrainTestEvaluate):
    def prepare_data(self):
        self.train_X, self.train_y = pd.DataFrame(), pd.Series()
        print("Preparing training data...")
        for i in tqdm(range(len(self.train_dataset))):
            X_chunk, label = self.train_dataset[i]

            self.train_X = pd.concat([self.train_X, X_chunk], ignore_index=True)
            self.train_y = pd.concat([self.train_y, pd.Series(label)], ignore_index=True)

        self.test_X, self.test_y = pd.DataFrame(), pd.Series()
        print("Preparing test data...")
        for i in tqdm(range(len(self.test_dataset))):
            X_chunk, labels = self.test_dataset[i]

            self.test_X = pd.concat([self.test_X, X_chunk], ignore_index=True)
            self.test_y = pd.concat([self.test_y, pd.Series(labels)], ignore_index=True)

    def train(self, train_X, train_y):
        self.model.fit(train_X, train_y)
        self.class_names = sorted(self.train_y.unique())

    def evaluate(self, test_X, test_y):
        self.predictions = self.model.predict(test_X)


# Feature Extraction + Feature Based Classification

With a dataset abstraction in place, we can now explore different families of time-series classification techniques. The goal here is not only to compare performance, but also to understand how different representation choices affect model behavior on sensor-like signals.

We begin with feature-based methods, which transform raw time-series into fixed-length statistical representations. These approaches are often strong baselines, easier to interpret, and computationally efficient compared to end-to-end deep learning models.

### Feature Extraction

We will implement a custom transformer class for the PyTorch dataset to extract statistical features using the `cesium` library.

In [87]:
from cesium import featurize

class FeatureExtractionTransform(Module):
    def forward(self, window):
        features_to_use = [
            "amplitude",
            "percent_beyond_1_std",
            "maximum",
            "max_slope",
            "median",
            "median_absolute_deviation",
            "percent_close_to_median",
            "minimum",
            "period_fast",
            "skew",
            "std",
        ]

        fset = featurize.featurize_time_series(
            times=np.arange(len(window)),
            values=window,
            errors=None,
            features_to_use=features_to_use,
        )

        fset = fset.stack(future_stack=True)

        return fset


In [88]:
train_dataset = BearingDataset(
    train_files,
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DE,
    chunk_length=1200,
    unified_label=True,
    transform=FeatureExtractionTransform()
)

test_dataset = BearingDataset(
    test_files,
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DE,
    chunk_length=1200,
    unified_label=True,
    transform=FeatureExtractionTransform()
)

In [89]:
# from sklearn.ensemble import RandomForestClassifier

# sk_ctte = SciKitCTTE(
#     train_dataset,
#     test_dataset)

# sk_ctte.prepare_data()


In [90]:
# rfc = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=10)

# sk_trainer = sk_ctte
# sk_trainer.load_model(rfc)

# sk_trainer.train(sk_trainer.train_X, sk_trainer.train_y)
# sk_trainer.evaluate(sk_trainer.test_X, sk_trainer.test_y)
# sk_trainer.classification_report()

In [91]:
# from sklearn.svm import SVC

# svm = SVC(kernel='linear', C=.1, random_state=42)

# sk_trainer = sk_ctte
# sk_trainer.load_model(svm)

# sk_trainer.train(sk_trainer.train_X, sk_trainer.train_y)
# sk_trainer.evaluate(sk_trainer.test_X, sk_trainer.test_y)
# sk_trainer.classification_report()

# Deep Learning Models

In this section, I will explore a wide variety of neural network models to find which can perform the best at what is essentially a many-to-one problem, where we are giving the model a dataset of many measurements of vibrational movement where ordering matters, because those measurements unfolded across time.

4.1 [PyTorch Trainer and Evaluator](##
PyTorch-Trainer-and-Evaluator)

4.2 [Fully Connected Neural Networks]()

4.3 [Recurrent Neural Networks]()

4.3.1 [Classic Recurrent Neural Network]()

4.3.2 [Long Short Term Memory (LSTM) Neural Network]()

4.3.3 [Gated Recurrent Neural Network]()

4.4 [Convolutional Neural Networks]()

4.4.1 [1D Convolutional Neural Network]()

4.4.2 [Temporal Convolutional Network]()

4.5 [Attention Based Models]()

4.5.1 [LSTM with Attention]()

4.5.2 [Time Series Transformer]()

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

In [92]:
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

CUDA available: True
GPU count: 1
GPU name: Tesla T4


##PyTorch Trainer and Evaluator

In [105]:
from torch.utils.data import DataLoader
from torch import Tensor, float32, LongTensor
import torch
from tqdm import tqdm

class PyTorchCTTE(ClassificationTrainTestEvaluate):
    def __init__(self, train_dataset: Dataset, test_dataset: Dataset, device='cpu', criterion=None, detection_weighting=1):
        super().__init__(train_dataset, test_dataset)
        self.device = device
        self.criterion = criterion

        self.target_mapping = {
          'fault_location': {'B': 0, 'IR': 1, 'OR': 2, 'NA': 3},
          'crack_size': {'007': 0, '014': 1, '021': 2, 'NA': 3}
          }

        self.train_dataset_mean = None
        self.train_dataset_std = None
        self.detection_weighting = detection_weighting

    def __deepcopy__(self, memo):
        """Deep copy - recursively copies nested objects"""
        return PyTorchCTTE(
            copy.deepcopy(self.train_dataset, memo),
            copy.deepcopy(self.test_dataset, memo),
            copy.deepcopy(self.device, memo),
            copy.deepcopy(self.criterion, memo)
        )

    def load_model(self, model):
        self.model = model

    def load_optimizer(self, optimizer):
        self.optimizer = optimizer

    def prepare_data(self):
        self.train_dataloader = DataLoader(self.train_dataset, batch_size=64, shuffle=True)
        self.test_dataloader = DataLoader(self.test_dataset, batch_size=64, shuffle=False)

    def train(self, epochs: int, batch_size: int):
        self.model.to(self.device)

        # self.train_dataset_mean = np.mean(np.concatenate(self.train_dataset.data))
        # self.train_dataset_std = np.std(np.concatenate(self.train_dataset.data))

        for epoch in range(epochs):
            self.model.train()
            epoch_loss = 0.0

            progress_bar = tqdm(self.train_dataloader, desc=f"Epoch {epoch+1}/{epochs}")

            for batch_X, batch_y in progress_bar:
                # X
                batch_X = Tensor(batch_X.to(float32))
                batch_X = (batch_X - batch_X.mean(dim=1, keepdim=True)) / (batch_X.std(dim=1, keepdim=True) + 1e-6)
                batch_X = batch_X.to(self.device)

                # ys
                batch_y_fault_detection = [l for l in batch_y['normal']]
                batch_y_fault_location = [self.target_mapping['fault_location'].get(l, 2) for l in batch_y['fault_location']]
                batch_y_crack_size = [self.target_mapping['crack_size'].get(l, 2) for l in batch_y['crack_size']]

                # move to GPU
                batch_y_fault_detection = LongTensor(batch_y_fault_detection).to(self.device)
                batch_y_fault_location = LongTensor(batch_y_fault_location).to(self.device)
                batch_y_crack_size = LongTensor(batch_y_crack_size).to(self.device)

                # clear gradients before training
                self.optimizer.zero_grad()

                # run the inputs through the network
                fault_detection, fault_location, crack_size = self.model(batch_X)

                # calculate the loss
                loss_fault_detection = self.criterion(fault_detection, batch_y_fault_detection)
                loss_fault_location = self.criterion(fault_location, batch_y_fault_location)
                loss_crack_size = self.criterion(crack_size, batch_y_crack_size)

                # sum to total loss
                total_loss = (self.detection_weighting * loss_fault_detection) + loss_fault_location + loss_crack_size

                # backpropagate
                total_loss.backward()

                self.optimizer.step()

                epoch_loss += total_loss.item()
                progress_bar.set_postfix(loss=total_loss.item())

            print(f"Epoch {epoch+1} avg loss: {epoch_loss/len(self.train_dataloader):.4f}")

    def evaluate(self):
        self.model.eval()
        self.predictions_fault_detection = []
        self.predictions_fault_location = []
        self.predictions_crack_size = []
        self.test_y_fault_detection = []
        self.test_y_fault_location = []
        self.test_y_crack_size = []

        with torch.no_grad():
            for batch_X, batch_y in self.test_dataloader:
                batch_X = batch_X = Tensor(batch_X.to(float32))
                batch_X = (batch_X - batch_X.mean(dim=1, keepdim=True)) / (batch_X.std(dim=1, keepdim=True) + 1e-6)
                batch_X = batch_X.to(self.device)

                fault_detection, fault_location, crack_size = self.model(batch_X)

                # Get predictions for each task
                _, pred_fd = torch.max(fault_detection, 1)
                _, pred_fl = torch.max(fault_location, 1)
                _, pred_cs = torch.max(crack_size, 1)

                self.predictions_fault_detection.extend(pred_fd.cpu().numpy().tolist())
                self.predictions_fault_location.extend(pred_fl.cpu().numpy().tolist())
                self.predictions_crack_size.extend(pred_cs.cpu().numpy().tolist())

                # Store true labels
                self.test_y_fault_detection.extend([int(l) for l in batch_y['normal']])
                self.test_y_fault_location.extend([self.target_mapping['fault_location'].get(l, 0) for l in batch_y['fault_location']])
                self.test_y_crack_size.extend([self.target_mapping['crack_size'].get(l, 0) for l in batch_y['crack_size']])

### Datasets for Deep Learning

In [106]:
from torch.nn import CrossEntropyLoss

train_dataset = BearingDataset(
    train_files,
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DE,
    unified_label=False,
    chunk_length=1200
)

test_dataset = BearingDataset(
    test_files,
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DE,
    unified_label=False,
    chunk_length=1200
)

pytorch_ctte = PyTorchCTTE(
    train_dataset,
    test_dataset,
    device=device,
    criterion=CrossEntropyLoss()
)

##Fully Connected Neural Network

### Model Intuition


### Model Definition

In [107]:
from torch import nn
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

class FCNN(nn.Module):
    def __init__(self, input_dim=1200, num_fault_locations=4, num_crack_sizes=4):
        super(FCNN, self).__init__()
        self.shared = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
        )

        self.fault_detection_output = nn.Linear(256, 2)
        self.fault_location_output = nn.Linear(256, num_fault_locations)
        self.crack_size_output = nn.Linear(256, num_crack_sizes)

        # Xavier initialization
        nn.init.xavier_uniform_(self.fault_detection_output.weight)
        nn.init.xavier_uniform_(self.fault_location_output.weight)
        nn.init.xavier_uniform_(self.crack_size_output.weight)

    def forward(self, x):
        x = self.shared(x)

        return (
            torch.sigmoid(self.fault_detection_output(x)),
            self.fault_location_output(x),
            self.crack_size_output(x)
        )

In [108]:
fcnn_ctte = copy.deepcopy(pytorch_ctte)

fcnn_model = FCNN(
    input_dim=1200
)

fcnn_ctte.load_model(fcnn_model)

fcnn_ctte.load_optimizer(
    torch.optim.Adam(fcnn_model.parameters(), lr=1e-3)
)


### Training

In [109]:
fcnn_ctte.prepare_data()
fcnn_ctte.train(epochs=20, batch_size=64)
fcnn_ctte.evaluate()

Epoch 1/20: 100%|██████████| 261/261 [00:01<00:00, 177.77it/s, loss=2.01]


Epoch 1 avg loss: 1.9155


Epoch 2/20: 100%|██████████| 261/261 [00:01<00:00, 182.21it/s, loss=1.23]


Epoch 2 avg loss: 1.1504


Epoch 3/20: 100%|██████████| 261/261 [00:01<00:00, 177.26it/s, loss=0.752]


Epoch 3 avg loss: 0.8243


Epoch 4/20: 100%|██████████| 261/261 [00:01<00:00, 179.93it/s, loss=0.729]


Epoch 4 avg loss: 0.6469


Epoch 5/20: 100%|██████████| 261/261 [00:01<00:00, 159.53it/s, loss=0.389]


Epoch 5 avg loss: 0.5316


Epoch 6/20: 100%|██████████| 261/261 [00:01<00:00, 150.64it/s, loss=2.07]


Epoch 6 avg loss: 0.5032


Epoch 7/20: 100%|██████████| 261/261 [00:01<00:00, 177.35it/s, loss=0.327]


Epoch 7 avg loss: 0.4838


Epoch 8/20: 100%|██████████| 261/261 [00:01<00:00, 174.16it/s, loss=0.314]


Epoch 8 avg loss: 0.4343


Epoch 9/20: 100%|██████████| 261/261 [00:01<00:00, 176.56it/s, loss=0.583]


Epoch 9 avg loss: 0.4396


Epoch 10/20: 100%|██████████| 261/261 [00:01<00:00, 174.71it/s, loss=0.376]


Epoch 10 avg loss: 0.4549


Epoch 11/20: 100%|██████████| 261/261 [00:01<00:00, 177.44it/s, loss=0.608]


Epoch 11 avg loss: 0.4336


Epoch 12/20: 100%|██████████| 261/261 [00:01<00:00, 177.10it/s, loss=0.328]


Epoch 12 avg loss: 0.4517


Epoch 13/20: 100%|██████████| 261/261 [00:01<00:00, 162.03it/s, loss=0.424]


Epoch 13 avg loss: 0.4000


Epoch 14/20: 100%|██████████| 261/261 [00:01<00:00, 146.12it/s, loss=0.314]


Epoch 14 avg loss: 0.4344


Epoch 15/20: 100%|██████████| 261/261 [00:01<00:00, 177.79it/s, loss=0.553]


Epoch 15 avg loss: 0.3917


Epoch 16/20: 100%|██████████| 261/261 [00:01<00:00, 173.33it/s, loss=0.314]


Epoch 16 avg loss: 0.4081


Epoch 17/20: 100%|██████████| 261/261 [00:01<00:00, 153.39it/s, loss=1.25]


Epoch 17 avg loss: 0.4200


Epoch 18/20: 100%|██████████| 261/261 [00:01<00:00, 131.30it/s, loss=0.624]


Epoch 18 avg loss: 0.4622


Epoch 19/20: 100%|██████████| 261/261 [00:01<00:00, 174.63it/s, loss=0.51]


Epoch 19 avg loss: 0.4132


Epoch 20/20: 100%|██████████| 261/261 [00:01<00:00, 176.85it/s, loss=0.33]


Epoch 20 avg loss: 0.3970


### Classification Report

In [111]:
fcnn_ctte.classification_report()

## ResNet

https://karpathy.github.io/2015/05/21/rnn-effectiveness/

### Model Intuition



### Model Definition

In [118]:
class ResidualBlock1D(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, downsample=False):
        super().__init__()
        stride = 2 if downsample else 1
        padding = kernel_size // 2

        self.conv1 = nn.Conv1d(
            in_channels, out_channels, kernel_size,
            stride=stride, padding=padding
        )
        self.bn1 = nn.BatchNorm1d(out_channels)

        self.conv2 = nn.Conv1d(
            out_channels, out_channels, kernel_size,
            stride=1, padding=padding
        )
        self.bn2 = nn.BatchNorm1d(out_channels)

        self.shortcut = nn.Identity()
        if downsample or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv1d(in_channels, out_channels, kernel_size=1, stride=stride),
                nn.BatchNorm1d(out_channels)
            )

    def forward(self, x):
        identity = self.shortcut(x)
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += identity
        return F.relu(out)


class ResNet1D_pt(nn.Module):
    def __init__(self,
                 input_channels=1,
                 num_location_classes=4,
                 num_size_classes=4):
        super().__init__()

        # Backbone
        self.layer1 = ResidualBlock1D(input_channels, 64)
        self.layer2 = ResidualBlock1D(64, 128, downsample=True)
        self.layer3 = ResidualBlock1D(128, 256, downsample=True)
        self.layer4 = ResidualBlock1D(256, 256, downsample=True)
        self.layer5 = ResidualBlock1D(256, 256, downsample=True)
        self.layer6 = ResidualBlock1D(256, 256, downsample=True)

        self.pool = nn.AdaptiveAvgPool1d(1)
        self.dropout = nn.Dropout(0.3)

        self.fc1 = nn.Linear(256, 256)
        self.fc2 = nn.Linear(256, 128)

        # Three classification heads (trainer-compatible)
        self.fc_fault_detection = nn.Linear(128, 2)
        self.fc_fault_location = nn.Linear(128, num_location_classes)
        self.fc_crack_size = nn.Linear(128, num_size_classes)

        self._init_weights()

    def _init_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv1d, nn.Linear)):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.zeros_(m.bias)

    def forward(self, x):
        # Accept (batch, length) or (batch, 1, length)
        if x.dim() == 2:
            x = x.unsqueeze(1)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        x = self.layer6(x)

        x = self.pool(x).squeeze(-1)
        x = self.dropout(F.relu(self.fc1(x)))
        x = F.relu(self.fc2(x))

        fault_detection = self.fc_fault_detection(x)
        fault_location = self.fc_fault_location(x)
        crack_size = self.fc_crack_size(x)

        return fault_detection, fault_location, crack_size


### Training Preparation

In [119]:
resnet_ctte = copy.deepcopy(pytorch_ctte)

resnet_model = ResNet1D_pt()

resnet_ctte.load_model(resnet_model)

resnet_ctte.load_optimizer(
    torch.optim.Adam(fcnn_model.parameters(), lr=1e-3)
)

### Training

In [120]:
resnet_ctte.prepare_data()
resnet_ctte.train(epochs=20, batch_size=64)
resnet_ctte.evaluate()

Epoch 1/20: 100%|██████████| 261/261 [00:15<00:00, 17.00it/s, loss=3.22]


Epoch 1 avg loss: 3.5499


Epoch 2/20:  18%|█▊        | 46/261 [00:02<00:12, 16.73it/s, loss=3.58]


KeyboardInterrupt: 

### Classification Report

## Long Short Term Memory (LSTM) Neural Network

### Model Intuition

LSTMs are a subtype of recurrent neural networks.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

### Model Definition

In [121]:
class LSTM1D_pt(nn.Module):
    def __init__(self, sequence_length=1024, hidden_size=128, num_layers=2, dropout_rate=0.3, num_fault_locations=4, num_crack_sizes=4):
        super(LSTM1D_pt, self).__init__()
        self.sequence_length = sequence_length
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        self.lstm = nn.LSTM(input_size=1,
                            hidden_size=hidden_size,
                            num_layers=num_layers,
                            batch_first=True,
                            dropout=dropout_rate,
                            bidirectional=False)

        self.bn = nn.BatchNorm1d(hidden_size)

        # Add a fully connected layer to expand features to 256
        self.fc_expand = nn.Linear(hidden_size, 256)

        # Output heads for three classification tasks
        self.fault_detection_output = nn.Linear(256, 2)
        self.fault_location_output = nn.Linear(256, num_fault_locations)
        self.crack_size_output = nn.Linear(256, num_crack_sizes)

        # Xavier initialization
        self._init_weights()

    def _init_weights(self):
        for layer in [self.fc_expand, self.fault_detection_output,
                      self.fault_location_output, self.crack_size_output]:
            nn.init.xavier_uniform_(layer.weight)
            if layer.bias is not None:
                nn.init.zeros_(layer.bias)

    def forward(self, x):
        x = x.unsqueeze(-1)  # [batch, seq_len, 1]
        lstm_out, (h_n, _) = self.lstm(x)  # [batch, seq_len, hidden_size]

        # Average pooling over sequence dimension
        features = lstm_out.mean(dim=1)  # [batch, hidden_size]
        features = self.bn(features)     # [batch, hidden_size]

        # Expand to 256 dimensions
        features = torch.relu(self.fc_expand(features))  # [batch, 256]

        return (
            torch.sigmoid(self.fault_detection_output(features)),
            self.fault_location_output(features),
            self.crack_size_output(features)
        )

In [123]:
lstm_ctte = copy.deepcopy(pytorch_ctte)

lstm_model = LSTM1D_pt(
    sequence_length=1200,
    hidden_size=128,
    num_layers=2,
    dropout_rate=0.1
)

lstm_ctte.load_model(lstm_model)

lstm_ctte.load_optimizer(
    torch.optim.Adam(lstm_model.parameters(), lr=1e-3)
)

In [None]:
lstm_ctte.prepare_data()
lstm_ctte.train(epochs=10, batch_size=64)
lstm_ctte.evaluate()

Epoch 1/10: 100%|██████████| 261/261 [00:14<00:00, 18.43it/s, loss=4.12]


Epoch 1 avg loss: 1.8233


Epoch 2/10: 100%|██████████| 261/261 [00:14<00:00, 18.49it/s, loss=0.646]


Epoch 2 avg loss: 1.2089


Epoch 3/10: 100%|██████████| 261/261 [00:14<00:00, 17.99it/s, loss=0.812]


Epoch 3 avg loss: 1.0264


Epoch 4/10: 100%|██████████| 261/261 [00:14<00:00, 17.96it/s, loss=0.505]


Epoch 4 avg loss: 0.9213


Epoch 5/10: 100%|██████████| 261/261 [00:14<00:00, 17.70it/s, loss=1.31]


Epoch 5 avg loss: 0.8559


Epoch 6/10:  79%|███████▉  | 206/261 [00:11<00:03, 17.54it/s, loss=0.769]

In [None]:
lstm_ctte.classification_report()
