# Time Series Classification

This notebook explores several time series classification techniques. It makes fuller use of the bearings dataset that is also explored in the signal analysis notebook.

## Overview

Time series classification involves assigning time series instances to predefined categories. This notebook will cover:

- **Feature-Based Methods**: Statistical features, tsfresh, Catch22
- **Distance-Based Methods**: DTW-KNN, Euclidean Distance
- **Dictionary-Based Methods**: BOSS, cBOSS
- **Shapelet-Based Methods**: Shapelet Transform
- **Deep Learning Methods**: CNN, ResNet, InceptionTime
- **Ensemble Methods**: HIVE-COTE, TS-CHIEF

In [1]:
import pandas as pd
import numpy as np
import os
import pathlib
import matplotlib.pyplot as plt

## Load Data

In [None]:
from torch.utils.data import Dataset
import scipy.io
import enum

# Sampling rate of the recordings
class SamplingRate(enum.Enum):
    sr12K = "12k"
    sr48K = "48k"

class FaultLocation(enum.Enum):
    DRIVE_END = "drive_end_fault"
    FAN_END = "front_end_fault"


class BearingDataset(Dataset):
    def __init__(self, data_dir: pathlib.Path, sampling_rate: SamplingRate, fault_location: FaultLocation):
        self.fault_dir = data_dir / fault_location.value / sampling_rate.value
        self.normal_dir = data_dir / "normal" / sampling_rate.value
        self.file_list = list(self.fault_dir.rglob("*.mat")) + list(self.normal_dir.rglob("*.mat"))
    
    def __len__(self):
        # Number of .mat files found recursively under the fault and normal directories
        return len(self.file_list)
    
    def __getitem__(self, idx):
        file_path = self.file_list[idx]
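        # Load the .mat file as a dict; the signal arrays sit under keys such as
        # 'X227_DE_time' (see _find_signal_data_key).
        mat_data = scipy.io.loadmat(file_path)

        # The name of the parent directory serves as the class label.
        label = file_path.parent.name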

        return mat_data, label
    
    @staticmethod
    def _find_signal_data_key(signal_dict, file_id, sensor_location='DE'):
        """
        Finds the key for signal data in a dictionary,
        prioritizing '_<sensor_location>_time' keys that contain file_id and
        falling back to any '_<sensor_location>_time' key.

        Args:
            signal_dict (dict): The dictionary containing signal data (from scipy.io.loadmat).
            file_id (str): The file ID to search for in the key.
            sensor_location (str): The sensor location ('DE' or 'FE'). Defaults to 'DE'.

        Returns:
            str: The key for the signal data.

        Raises:
            KeyError: If no matching key is found.
        """

        # Find keys ending with '_<sensor_location>_time' and containing file_id
        matching_keys_id_time = [
            key for key in signal_dict.keys()
            if key.endswith(f"_{sensor_location}_time") and file_id in key
        ]

        if matching_keys_id_time:
            return matching_keys_id_time[0]
        else:  # Otherwise fall back to any '_<sensor_location>_time' key, ignoring file_id
            matching_keys_time = [
                key for key in signal_dict.keys()
                if key.endswith(f"_{sensor_location}_time")
            ]
            if matching_keys_time:
                return matching_keys_time[0]
            else:
                raise KeyError(
                    f"No key ending with '_{sensor_location}_time' found in the .mat file "
                    f"(file_id '{file_id}'). Available keys: {list(signal_dict.keys())}"
                )


bearing_dataset = BearingDataset(
    data_dir=pathlib.Path("../data/classification/cwru-bearing-full-organized"),
    sampling_rate=SamplingRate.sr48K,
    fault_location=FaultLocation.DRIVE_END
)

bearing_dataset[10]



({'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN, Created on: Fri Jan 28 14:10:08 2000',
  '__version__': '1.0',
  '__globals__': [],
  'X227_DE_time': array([[-0.06195877],
         [-0.03755077],
         [-0.01794092],
         ...,
         [ 0.21174462],
         [ 0.22926831],
         [ 0.21925477]], shape=(486804, 1)),
  'X227_FE_time': array([[ 0.12471091],
         [ 0.11669818],
         [ 0.08115455],
         ...,
         [ 0.02629818],
         [-0.04704909],
         [-0.06060909]], shape=(486804, 1)),
  'X227RPM': array([[1774]], dtype=uint16)},
 '021')

In [32]:
samples

97         normal
98         normal
99         normal
100        normal
105    12k_IR_007
          ...    
204    48k_OR_014
238    48k_OR_021
239    48k_OR_021
240    48k_OR_021
241    48k_OR_021
Length: 76, dtype: object

## Classification Techniques

### Feature-Based Methods

Techniques to be implemented:
- Statistical Feature Extraction
- tsfresh (Time Series FeatuRe Extraction)
- Catch22 Features
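
As a starting point for the statistical features listed above, the following is a minimal sketch rather than the notebook's final implementation. It assumes each long recording is first cut into fixed-length windows; the helper names `window_signal` and `statistical_features`, the window length of 2048 samples, and the chosen statistics are illustrative assumptions, and the input is a synthetic stand-in for a `*_DE_time` column.

```python
import numpy as np


def window_signal(signal, window_size, step):
    """Split a 1-D signal into fixed-length windows (one per row); the tail is dropped."""
    signal = np.asarray(signal, dtype=float).ravel()
    n_windows = (len(signal) - window_size) // step + 1
    if n_windows <= 0:
        raise ValueError("signal is shorter than window_size")
    starts = np.arange(n_windows) * step
    return np.stack([signal[s:s + window_size] for s in starts])


def statistical_features(windows):
    """Return one row of summary statistics per window."""
    mean = windows.mean(axis=1)
    std = windows.std(axis=1)
    rms = np.sqrt((windows ** 2).mean(axis=1))
    peak = np.abs(windows).max(axis=1)
    centered = windows - mean[:, None]
    # Guard against division by zero for constant windows.
    safe_std = np.where(std > 0, std, 1.0)
    kurtosis = (centered ** 4).mean(axis=1) / safe_std ** 4
    crest_factor = peak / np.where(rms > 0, rms, 1.0)
    return np.column_stack([mean, std, rms, peak, kurtosis, crest_factor])


# Synthetic stand-in for one '*_DE_time' array of shape (n_samples, 1).
rng = np.random.default_rng(0)
fake_signal = rng.normal(size=(10_000, 1))

windows = window_signal(fake_signal, window_size=2048, step=2048)
features = statistical_features(windows)
print(features.shape)  # (4, 6)
```

The resulting feature matrix can be passed to any tabular classifier. Libraries such as tsfresh and Catch22 compute much larger, curated feature sets along the same lines.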

In [4]:
# Feature-based methods will be implemented here

### Distance-Based Methods

Techniques to be implemented:
- Dynamic Time Warping (DTW) with KNN
- Euclidean Distance KNN
- Edit Distance on Real Sequences (EDR)
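
To make the first two techniques concrete, here is a small sketch of nearest-neighbour classification under DTW and Euclidean distance. It is an unoptimised textbook version (no warping window, no lower bounds), the names `dtw_distance`, `euclidean_distance`, and `knn_predict` are illustrative, and the toy series are made up for the example.

```python
import numpy as np


def dtw_distance(a, b):
    """Unconstrained DTW with squared pointwise cost, O(len(a) * len(b))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def euclidean_distance(a, b):
    """Pointwise Euclidean distance; requires equal-length series."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def knn_predict(train_X, train_y, query, distance=dtw_distance):
    """Return the label of the single nearest training series."""
    distances = [distance(x, query) for x in train_X]
    return train_y[int(np.argmin(distances))]


# Toy data: a bump, a flat line, and a query that is the bump shifted by one step.
bump = [0, 0, 1, 2, 1, 0, 0]
flat = [0, 0, 0, 0, 0, 0, 0]
shifted_bump = [0, 1, 2, 1, 0, 0, 0]

print(dtw_distance(bump, shifted_bump))        # 0.0
print(euclidean_distance(bump, shifted_bump))  # 2.0
print(knn_predict([bump, flat], ["bump", "flat"], shifted_bump))  # bump
```

DTW absorbs the one-step shift completely, while the Euclidean distance penalises it. On windows of thousands of samples the quadratic cost becomes significant, so a warping-window constraint is usually worth adding.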

In [5]:
# Distance-based methods will be implemented here

### Dictionary-Based Methods

Techniques to be implemented:
- BOSS (Bag of SFA Symbols)
- cBOSS (Contractable BOSS)
- WEASEL
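
The core idea behind BOSS can be sketched in a few lines: slide a window over the series, keep a few low-frequency Fourier coefficients per window, discretise them into symbols, and count the resulting words. The code below is a deliberately simplified illustration, not a faithful BOSS implementation: it skips window normalisation and parameter search, and the function names, window length, and alphabet size are assumptions chosen for the example.

```python
from collections import Counter

import numpy as np


def fourier_values(series, window_size, n_coefs):
    """Real and imaginary parts of the first n_coefs non-DC Fourier
    coefficients of every sliding window (one row per window)."""
    series = np.asarray(series, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(series, window_size)
    spectrum = np.fft.rfft(windows, axis=1)[:, 1:n_coefs + 1]
    return np.column_stack([spectrum.real, spectrum.imag])


def fit_bins(values, alphabet_size):
    """Equi-depth breakpoints per column, learned from training windows."""
    quantiles = np.linspace(0, 1, alphabet_size + 1)[1:-1]
    return np.quantile(values, quantiles, axis=0)


def bag_of_words(series, window_size, n_coefs, bins):
    """Histogram of discretised words, with consecutive repeats dropped."""
    values = fourier_values(series, window_size, n_coefs)
    symbols = np.stack(
        [np.searchsorted(bins[:, k], values[:, k]) for k in range(values.shape[1])],
        axis=1,
    )
    words = [tuple(int(s) for s in row) for row in symbols]
    reduced = [w for i, w in enumerate(words) if i == 0 or w != words[i - 1]]
    return Counter(reduced)


def boss_distance(query_bag, other_bag):
    """Asymmetric histogram distance: only words present in the query count."""
    return sum((count - other_bag.get(word, 0)) ** 2 for word, count in query_bag.items())


# Synthetic stand-ins: a slow and a fast noisy sine wave.
rng = np.random.default_rng(0)
t = np.arange(256)
slow = np.sin(2 * np.pi * t / 64) + 0.1 * rng.normal(size=t.size)
fast = np.sin(2 * np.pi * t / 8) + 0.1 * rng.normal(size=t.size)

train_values = np.vstack([fourier_values(s, 32, 2) for s in (slow, fast)])
bins = fit_bins(train_values, alphabet_size=4)
slow_bag = bag_of_words(slow, 32, 2, bins)
fast_bag = bag_of_words(fast, 32, 2, bins)
print(boss_distance(slow_bag, slow_bag), boss_distance(slow_bag, fast_bag))
```

A 1-NN classifier over these histograms gives a single BOSS-style model. The full method builds an ensemble over many window lengths.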

In [6]:
# Dictionary-based methods will be implemented here

### Shapelet-Based Methods

Techniques to be implemented:
- Shapelet Transform
- Learning Shapelets
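
The building block shared by both techniques is the distance between a series and a shapelet: the smallest distance between the shapelet and any subsequence of the same length. The sketch below shows that distance and the resulting transform. Shapelet discovery and subsequence normalisation are left out, and the names `shapelet_distance` and `shapelet_transform` as well as the toy data are illustrative.

```python
import numpy as np


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    same-length subsequence of the series."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.sqrt(((windows - shapelet) ** 2).sum(axis=1).min()))


def shapelet_transform(X, shapelets):
    """Map each series to a vector of distances, one per shapelet."""
    return np.array([[shapelet_distance(x, s) for s in shapelets] for x in X])


# Toy data: one series contains the spike pattern, the other does not.
spike_series = [0, 0, 0, 1, 3, 1, 0, 0]
flat_series = [0, 0, 0, 0, 0, 0, 0, 0]
spike_shapelet = [1, 3, 1]

transformed = shapelet_transform([spike_series, flat_series], [spike_shapelet])
print(transformed.shape)  # (2, 1)
```

After the transform, each series is a fixed-length feature vector, so any standard classifier can be trained on it.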

In [7]:
# Shapelet-based methods will be implemented here

### Deep Learning Methods

Techniques to be implemented:
- Convolutional Neural Networks (CNN)
- ResNet for Time Series
- InceptionTime
- LSTM/GRU Classifiers

In [8]:
# Deep learning methods will be implemented here

### Ensemble Methods

Techniques to be implemented:
- HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles)
- TS-CHIEF
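
Both ensembles are large systems, but the step that combines their members can be illustrated on its own. The sketch below averages class probabilities from several models, weighting each model by its estimated accuracy raised to a power so that stronger models dominate. The function name `weighted_vote`, the default exponent of 4, and all numbers are assumptions made for illustration, not measured results from this notebook.

```python
import numpy as np


def weighted_vote(probabilities, accuracies, exponent=4.0):
    """Combine class probabilities from several models.

    probabilities: array of shape (n_models, n_samples, n_classes)
    accuracies:    array of shape (n_models,), e.g. cross-validation estimates
    """
    probabilities = np.asarray(probabilities, dtype=float)
    weights = np.asarray(accuracies, dtype=float) ** exponent
    combined = np.tensordot(weights, probabilities, axes=1)  # (n_samples, n_classes)
    combined /= combined.sum(axis=1, keepdims=True)
    return combined


# Made-up outputs of three models for one sample and two classes.
probabilities = [
    [[0.6, 0.4]],  # model A, assumed accuracy 0.9
    [[0.3, 0.7]],  # model B, assumed accuracy 0.6
    [[0.4, 0.6]],  # model C, assumed accuracy 0.6
]
accuracies = [0.9, 0.6, 0.6]

weighted = weighted_vote(probabilities, accuracies)
unweighted = weighted_vote(probabilities, accuracies, exponent=0.0)
print(weighted.argmax(axis=1), unweighted.argmax(axis=1))  # [0] [1]
```

With equal weights the two weaker models outvote the stronger one. With accuracy-based weights the stronger model's prediction prevails.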

In [9]:
# Ensemble methods will be implemented here

## Model Comparison

Compare the performance of different classification techniques.
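
A comparison needs the same metrics computed for every model. The following sketch computes a confusion matrix, accuracy, and macro-averaged F1 from true and predicted labels. The helper names and the toy labels are illustrative assumptions and do not represent results from the bearings dataset.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[index[t], index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal."""
    return matrix.trace() / matrix.sum()


def macro_f1(matrix):
    """Unweighted mean of per-class F1 = 2*TP / (predicted + actual)."""
    tp = np.diag(matrix).astype(float)
    denom = matrix.sum(axis=0) + matrix.sum(axis=1)
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1.mean()


# Toy labels for illustration only.
labels = ["normal", "IR", "OR"]
y_true = ["normal", "normal", "IR", "IR", "OR", "OR"]
y_pred = ["normal", "normal", "IR", "OR", "OR", "OR"]

matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)
print(round(accuracy(matrix), 3), round(macro_f1(matrix), 3))  # 0.833 0.822
```

Macro F1 is reported next to accuracy because the classes may be imbalanced, in which case accuracy alone can hide poor performance on rare fault types.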

In [10]:
# Model comparison will be implemented here

## Conclusion

Summary of findings and best practices for time series classification.