# Time Series Classification

This notebook explores several time series classification techniques. It makes fuller use of the bearings dataset introduced in the signal analysis notebook.

## Overview

Time series classification involves assigning time series instances to predefined categories. This notebook will cover:

- **Feature-Based Methods**: Statistical features (extracted here with `cesium`), tsfresh, Catch22
- **Distance-Based Methods**: DTW-KNN, Euclidean Distance
- **Dictionary-Based Methods**: BOSS, cBOSS
- **Shapelet-Based Methods**: Shapelet Transform
- **Deep Learning Methods**: CNN, ResNet, InceptionTime
- **Ensemble Methods**: HIVE-COTE, TS-CHIEF

In [1]:
import pandas as pd
import numpy as np
import os
import pathlib
import matplotlib.pyplot as plt

## Load Data

Here I implement a PyTorch `Dataset` for the bearings data. The classification techniques will vary, but the loading and preprocessing stay the same throughout, including for the simpler feature-based methods.

In [2]:
from torch.utils.data import Dataset
from torch.nn import Module
import scipy.io
import enum

# Sampling rate of the recording
class SamplingRate(enum.Enum):
    sr12K = "12k"
    sr48K = "48k"

class FaultLocation(enum.Enum):
    DE = "drive_end_fault"
    FE = "front_end_fault"


class BearingDataset(Dataset):
    def __init__(self, file_paths, sampling_rate, fault_location, chunk_length, transform=None):
        self.file_paths = file_paths
        self.sampling_rate = sampling_rate
        self.fault_location = fault_location
        self.chunk_length = chunk_length
        self.transform = transform

    def __len__(self):
        # One item per file; __getitem__ expands each file into several windows
        return len(self.file_paths)

    def __getitem__(self, idx):
        file_path = self.file_paths[idx]

        mat_data = scipy.io.loadmat(file_path)

        # Match the sensor channel by enum name, e.g. a key containing "_DE_time"
        key_to_match = f"_{self.fault_location.name}_time"
        sensor_key = [key for key in mat_data.keys() if key_to_match in key][0]

        signal = mat_data[sensor_key].squeeze()

        n_chunks = len(signal) // self.chunk_length
        truncated = signal[:n_chunks * self.chunk_length]

        windows = truncated.reshape(n_chunks, self.chunk_length)

        label = '_'.join(file_path.parent.parts[-2:])
        labels = [label, ] * n_chunks

        if self.transform:
            windows, label = self.transform(windows, label)

        # Return all windows for this file, with one label per window
        return windows, labels
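
The windowing step in `__getitem__` can be tried in isolation. The sketch below is a minimal illustration on a made-up 10-sample signal; `split_into_windows` is a hypothetical helper name, not part of the dataset class. It shows that the signal is cut into non-overlapping windows and that any samples left over at the end are dropped.

```python
import numpy as np

def split_into_windows(signal, chunk_length):
    """Split a 1-D signal into non-overlapping windows, dropping the remainder."""
    n_chunks = len(signal) // chunk_length
    truncated = signal[: n_chunks * chunk_length]
    return truncated.reshape(n_chunks, chunk_length)

# Hypothetical toy signal: 10 samples, window length 4
signal = np.arange(10)
windows = split_into_windows(signal, chunk_length=4)
print(windows)
# [[0 1 2 3]
#  [4 5 6 7]]
```

Samples 8 and 9 do not fill a whole window, so they are discarded. With `chunk_length=48000` on a 48 kHz recording, each window covers one second of signal.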




## Classification Techniques

### Feature-Based Methods

The first set of techniques extracts summary features from each time series so that traditional classification algorithms can be applied. We will implement a custom transformer class for the PyTorch dataset that extracts statistical features using the `cesium` library.

In [3]:
from cesium import featurize

class FeatureExtractionTransform(Module):
    def forward(self, stacked_chunks, label):
        features_to_use = [
            "amplitude",
            "percent_beyond_1_std",
            "maximum",
            "max_slope",
            "median",
            "median_absolute_deviation",
            "percent_close_to_median",
            "minimum",
            "skew",
            "std",
        ]

        fset = featurize.featurize_time_series(
            times=np.arange(stacked_chunks.shape[1]),
            values=stacked_chunks,
            errors=None,
            features_to_use=features_to_use,
        )

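        # Move the innermost column level into the rows, giving one row per window
        # (the future_stack argument requires pandas >= 2.1)
        fset = fset.stack(future_stack=True)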

        return fset, label
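
The feature names passed to `cesium` are summary statistics of each window. As a rough illustration of what such features look like, the sketch below computes a handful of them with plain NumPy. `simple_features` and the toy windows are hypothetical, and the formulas are common textbook definitions, so they may not match the `cesium` implementations exactly.

```python
import numpy as np

def simple_features(window):
    """Compute a few summary statistics for one 1-D window."""
    median = np.median(window)
    return {
        "maximum": float(np.max(window)),
        "minimum": float(np.min(window)),
        "median": float(median),
        # Population standard deviation (ddof=0); an assumption of this sketch
        "std": float(np.std(window)),
        # Median of the absolute deviations from the median
        "median_absolute_deviation": float(np.median(np.abs(window - median))),
    }

# Hypothetical toy windows: one with an outlier, one constant
windows = np.array([
    [1.0, 2.0, 3.0, 4.0, 10.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
])
feature_rows = [simple_features(w) for w in windows]
for row in feature_rows:
    print(row)
```

Each window collapses to one fixed-length row regardless of how many samples it contains, which is what lets a tabular classifier consume the data.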


In [4]:
from sklearn.model_selection import train_test_split
from pathlib import Path
from collections import Counter

all_files = list(Path("../data/classification/cwru-bearing-full-organized").rglob("*.mat"))

# derive one label per file
file_labels = [
    '_'.join(f.parent.parts[-2:])
    for f in all_files
]

train_files, test_files = train_test_split(
    all_files,
    test_size=.2,
    shuffle=True,
    stratify=file_labels
)
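
The split above operates on files rather than on individual windows, so all windows cut from one recording end up on the same side of the split, and `stratify` keeps the class proportions similar in both parts. The sketch below checks both properties on made-up file names and labels. The names are hypothetical, and `random_state=0` is an assumption added here to make the example repeatable; the cell above does not fix a seed.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical file names and labels, standing in for all_files / file_labels
labels = ["B_007"] * 10 + ["IR_007"] * 10 + ["normal_48k"] * 5
files = [f"file_{i:02d}.mat" for i in range(len(labels))]

train_f, test_f, train_l, test_l = train_test_split(
    files,
    labels,
    test_size=0.2,
    shuffle=True,
    stratify=labels,
    random_state=0,  # assumption: fixed seed so the split is reproducible
)

print("train classes:", Counter(train_l))
print("test classes: ", Counter(test_l))
print("files in both: ", set(train_f) & set(test_f))
```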


In [5]:
train_dataset = BearingDataset(
    train_files,
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DE,
    chunk_length=48000,
    transform=FeatureExtractionTransform()
)

test_dataset = BearingDataset(
    test_files,
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DE,
    chunk_length=48000,
    transform=FeatureExtractionTransform()
)


In [6]:
def collect(dataset, verbose=False):
    """Run every file through the dataset and stack the per-window rows."""
    X_parts, y_parts = [], []
    for i in range(len(dataset)):
        if verbose:
            print(i, "of", len(dataset))
        X_chunk, labels = dataset[i]
        X_parts.append(X_chunk)
        y_parts.extend(labels)
    # Concatenate once; calling pd.concat inside the loop copies the data on every iteration
    return pd.concat(X_parts, ignore_index=True), pd.Series(y_parts)

print("Processing training data...")
train_X, train_y = collect(train_dataset, verbose=True)
print("Training data shape:", train_X.shape)

test_X, test_y = collect(test_dataset)
print("Test data shape:", test_X.shape)


Processing training data...
0 of 89
1 of 89
...
88 of 89
Training data shape: (387, 10)
Test data shape: (103, 10)


In [7]:
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier = rfc.fit(train_X, train_y)

In [9]:
from sklearn.metrics import classification_report, confusion_matrix

test_predictions = rf_classifier.predict(test_X)

print("Classification Report:\n", classification_report(test_y, test_predictions))
print("Confusion Matrix:\n", confusion_matrix(test_y, test_predictions))

Classification Report:
               precision    recall  f1-score   support

       B_007       0.87      0.87      0.87        15
       B_014       0.83      0.83      0.83         6
       B_021       0.31      0.83      0.45         6
      IR_007       0.80      1.00      0.89        12
      IR_014       0.67      0.17      0.27        12
      IR_021       1.00      1.00      1.00        15
      OR_007       0.85      0.92      0.88        12
      OR_014       0.40      0.33      0.36         6
      OR_021       1.00      0.71      0.83        14
  normal_48k       1.00      1.00      1.00         5

    accuracy                           0.78       103
   macro avg       0.77      0.77      0.74       103
weighted avg       0.82      0.78      0.77       103

Confusion Matrix:
 [[13  0  0  0  0  0  0  2  0  0]
 [ 0  5  0  0  1  0  0  0  0  0]
 [ 0  1  5  0  0  0  0  0  0  0]
 [ 0  0  0 12  0  0  0  0  0  0]
 [ 0  0  9  1  2  0  0  0  0  0]
 [ 0  0  0  0  0 15  0  0  0  0]


array([0.09819227, 0.07980937, 0.08927998, 0.09511949, 0.04562255,
       0.15653409, 0.06582102, 0.07662608, 0.05438362, 0.1500449 ,
       0.08856665])
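
The two average rows in the classification report above weight the classes differently: the macro average treats every class equally, while the weighted average scales each class by its support. The short check below recomputes both F1 averages from the per-class values printed in the report; the numbers are copied from that output, and small differences can arise because the report itself is rounded to two decimals.

```python
# Per-class F1 scores and supports copied from the classification report above
f1 = [0.87, 0.83, 0.45, 0.89, 0.27, 1.00, 0.88, 0.36, 0.83, 1.00]
support = [15, 6, 6, 12, 12, 15, 12, 6, 14, 5]

# Macro average: plain mean over classes
macro_f1 = sum(f1) / len(f1)

# Weighted average: mean over classes, weighted by number of test windows
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

print(f"macro F1:    {macro_f1:.2f}")     # 0.74
print(f"weighted F1: {weighted_f1:.2f}")  # 0.77
```

The macro value is lower because the weak classes (`B_021`, `IR_014`, `OR_014`) count as much as the well-classified ones.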

### Distance-Based Methods

Techniques to be implemented:
- Dynamic Time Warping (DTW) with KNN
- Euclidean Distance KNN
- Edit Distance on Real Sequences (EDR)
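
As a starting point for the first item, here is a minimal sketch of DTW with a 1-nearest-neighbour rule. It assumes the textbook dynamic-programming recurrence with an absolute-difference local cost and no warping-window constraint; `dtw_distance`, `predict_1nn`, and the toy series are hypothetical and not taken from any library.

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook DTW: O(len(a) * len(b)) dynamic programme, absolute-difference cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def predict_1nn(train_series, train_labels, query):
    """Return the label of the training series closest to the query under DTW."""
    distances = [dtw_distance(query, s) for s in train_series]
    return train_labels[int(np.argmin(distances))]

# Hypothetical toy data: an upward bump and a downward dip
train_series = [
    np.array([0, 0, 1, 2, 1, 0, 0], dtype=float),
    np.array([0, 0, -1, -2, -1, 0, 0], dtype=float),
]
train_labels = ["bump", "dip"]

# The same bump, shifted one step earlier
query = np.array([0, 1, 2, 1, 0, 0, 0], dtype=float)

print("DTW to bump:", dtw_distance(query, train_series[0]))
print("Euclidean to bump:", float(np.linalg.norm(query - train_series[0])))
print("prediction:", predict_1nn(train_series, train_labels, query))
```

DTW absorbs the time shift that Euclidean distance penalises. The quadratic cost matters for this dataset: two 48,000-sample windows need about 2.3 billion cell updates per comparison, so some form of downsampling or a warping-window constraint would likely be needed in practice.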

In [None]:
# Distance-based methods will be implemented here

### Dictionary-Based Methods

Techniques to be implemented:
- BOSS (Bag of SFA Symbols)
- cBOSS (Contractable BOSS)
- WEASEL

In [None]:
# Dictionary-based methods will be implemented here

### Shapelet-Based Methods

Techniques to be implemented:
- Shapelet Transform
- Learning Shapelets
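
The core operation behind a shapelet transform is the distance from a short pattern (the shapelet) to its best-matching position in a series. The sketch below shows only that distance, on made-up data; it assumes plain Euclidean distance without the per-subsequence normalisation that many implementations apply, and it does not cover how shapelets are searched for or selected. `shapelet_distance` is a hypothetical helper name.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any same-length subsequence."""
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.min(np.linalg.norm(windows - shapelet, axis=1)))

# Hypothetical shapelet: a sharp spike
shapelet = np.array([1.0, 3.0, 1.0])

with_spike = np.array([0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0])
flat = np.zeros(7)

# One feature per (series, shapelet) pair; stacking these gives the transformed dataset
features = [shapelet_distance(s, shapelet) for s in (with_spike, flat)]
print(features)
```

A series that contains the pattern gets a distance near zero, and one that does not gets a large distance, so the distances themselves can serve as input features for a standard classifier.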

In [None]:
# Shapelet-based methods will be implemented here

### Deep Learning Methods

Techniques to be implemented:
- Convolutional Neural Networks (CNN)
- ResNet for Time Series
- InceptionTime
- LSTM/GRU Classifiers

In [None]:
# Deep learning methods will be implemented here

### Ensemble Methods

Techniques to be implemented:
- HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles)
- TS-CHIEF

In [None]:
# Ensemble methods will be implemented here

## Model Comparison

Compare the performance of different classification techniques.

In [None]:
# Model comparison will be implemented here

## Conclusion

Summary of findings and best practices for time series classification.