WHULai/Time-Series-Classifier

📊 Performance Summary

| Model | Accuracy | Training Time | Key Feature |
|---|---|---|---|
| Hybrid CNN-LSTM with Attention | 87.01% | ~3-5 min | Best performance |
| Hybrid CNN-LSTM | 83.12% | ~2-3 min | Good balance |
| Random Forest | 74.03% | ~30 sec | Fast baseline |
| SVM | 54.55% | ~20 sec | Linear model |
| 1D CNN | 50.65% | ~3-4 min | Pure CNN |

This project implements and compares various deep learning models for time series classification, including:

  • Hybrid CNN-LSTM with Spatial Attention (Best: 87.01% accuracy)
  • Standard CNN-LSTM (83.12% accuracy)
  • 1D CNN (50.65% accuracy)
  • Traditional ML baselines (Random Forest: 74.03%, SVM: 54.55%)

Features

  • Reproducible: Fixed random seeds, version tracking, environment logging
  • Leak-free: Group-based splitting, train-only statistics for normalization
  • Maintainable: Modular code with docstrings and type hints
  • Tested: Unit tests and smoke tests included (6/6 passing)
  • Comparable: Unified evaluation framework for all models
  • Well-documented: Clear README and data specifications
  • Easy to use: Simple training wrapper script (./train.sh)

Project Structure

├── src/
│   ├── data/          # Data loading, preprocessing, splitting
│   ├── models/        # Model architectures
│   ├── train/         # Training logic and utilities
│   ├── eval/          # Evaluation and metrics
│   └── utils/         # Common utilities
├── configs/           # YAML configuration files
├── scripts/           # Training and evaluation scripts
├── tests/             # Unit tests
├── results/           # Output directory for models and results
├── png_visualizations/ # Pre-generated visualization images (PNG)
│   ├── *_training.png          # Training curves for each model
│   ├── *_confusion_matrix.png  # Confusion matrices
│   ├── overall_metrics.png     # Model comparison chart
│   ├── per_class_metrics.png   # Per-class performance
│   └── *.png                   # Additional comparison charts
└── README.md          # This documentation file

Installation

1. Using uv (Recommended)

# Create virtual environment
uv venv --python 3.12.12
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate  # On Windows

# Install package
uv pip install -e .

# For development
uv pip install -e ".[dev]"

2. Using pip

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On macOS/Linux

# Install dependencies
pip install -r requirements.txt

Data Format

Expected CSV Format

Your dataset should be a CSV file with the following structure:

Time,Feature1,Feature2,...,FeatureN,Label
0,5.292157038,8.223850719,...,0.705384546,0
0.01,1.263303065,9.569421815,...,0.588108315,0
...
  • Time: Time column (used for ordering, not as feature)
  • Feature1...FeatureN: Input features
  • Label: Target class (last column)

Data Requirements

  • Time series should be ordered by time
  • Labels should be integer class indices starting from 0
  • Features should be numeric
  • Dataset will be split by groups to prevent leakage
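The windowing described later under Training Statistics (windows of 50 time steps, stride 10) can be sketched as below. `make_windows` is an illustrative helper, not the repo's actual function, and labeling each window with its last step's label is an assumption about the convention used:

```python
import numpy as np

def make_windows(features: np.ndarray, labels: np.ndarray,
                 window_size: int = 50, stride: int = 10):
    """Slice a (T, F) feature array into overlapping (window_size, F) windows.

    Each window takes the label of its last time step (one common
    convention; the project's actual labeling rule may differ).
    """
    X, y = [], []
    for start in range(0, len(features) - window_size + 1, stride):
        end = start + window_size
        X.append(features[start:end])
        y.append(labels[end - 1])
    return np.stack(X), np.array(y)

# 500 time steps, 7 features, labels 0-3 (matching the dataset shape above)
rng = np.random.default_rng(42)
feats = rng.random((500, 7))
labs = rng.integers(0, 4, size=500)
X, y = make_windows(feats, labs)
print(X.shape)  # (46, 50, 7): floor((500 - 50) / 10) + 1 = 46 windows
```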

Model Zoo

| Model | Train Command | Description | Test Accuracy | Training Time |
|---|---|---|---|---|
| Hybrid CNN-LSTM + Attention | `./train.sh hybrid_attention` | CNN feature extraction + Spatial Attention + LSTM | 87.01% | ~3-5 minutes |
| Hybrid CNN-LSTM | `./train.sh hybrid_cnn_lstm` | CNN feature extraction + LSTM | 83.12% | ~2-3 minutes |
| Random Forest | `./train.sh random_forest` | Traditional ML baseline | 74.03% | ~30 seconds |
| SVM | `./train.sh svm` | Traditional ML baseline | 54.55% | ~20 seconds |
| 1D CNN | `./train.sh cnn_1d` | Pure convolutional model | 50.65% | ~3-4 minutes |

Note: Performance metrics are from actual training with seed=42 and 200 epochs for the neural networks.

Experimental Results

Performance Summary

Based on actual training with fixed random seed (42), the models achieved the following performance:

| Model | Accuracy | Precision (Macro) | Recall (Macro) | F1-Score (Macro) | Rank |
|---|---|---|---|---|---|
| Hybrid CNN-LSTM with Attention | 87.01% | 87.44% | 87.11% | 86.77% | 1 |
| Hybrid CNN-LSTM | 83.12% | 88.20% | 83.49% | 83.11% | 2 |
| Random Forest | 74.03% | 76.10% | 74.41% | 74.32% | 3 |
| SVM | 54.55% | 55.36% | 54.61% | 54.06% | 4 |
| 1D CNN | 50.65% | 55.06% | 51.12% | 45.43% | 5 |
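The macro-averaged columns weight every class equally regardless of how many samples it has. As a quick sketch of how such metrics are computed with scikit-learn (toy labels for illustration, not the project's data):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy predictions over four classes, mirroring the project's 4-class setup
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"acc={acc:.2%} precision={prec:.2%} recall={rec:.2%} f1={f1:.2%}")
# acc=75.00% precision=79.17% recall=75.00% f1=74.17%
```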

Key Findings

  1. Hybrid architectures outperform pure models: The combination of CNN for feature extraction and LSTM for temporal modeling yields the best results
  2. Attention mechanism provides significant boost: Adding spatial attention improves accuracy by ~4% compared to standard Hybrid CNN-LSTM
  3. Traditional ML models are competitive: Random Forest achieves 74% accuracy with much faster training time
  4. Model convergence: Neural networks converge within 40-60 epochs, with early stopping preventing overfitting
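As a rough illustration of finding 2, a minimal spatial-attention layer over CNN feature maps might look like the following in PyTorch. This is a sketch of the general technique, not the repo's actual module, which may differ in its scoring function and placement:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Score each time step of a CNN feature map, softmax-normalize the
    scores, and reweight the features by them (a common minimal variant)."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        weights = torch.softmax(self.score(x), dim=-1)  # (batch, 1, time)
        return x * weights                              # broadcast reweighting

attn = SpatialAttention(channels=32)
feats = torch.randn(8, 32, 50)  # a batch of CNN features over 50 time steps
out = attn(feats)
print(out.shape)  # torch.Size([8, 32, 50])
```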

Training Statistics

  • Dataset: 7 features, 4 classes, time series data
  • Window size: 50 time steps
  • Stride: 10 time steps
  • Train/Val/Test split: 80%/10%/10%
  • Training epochs: 200 (neural networks), 1 (traditional ML)
  • Batch size: 64
  • Optimizer: Adam (LR=0.001) for attention model, SGD (LR=0.01) for hybrid CNN-LSTM
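A single training step matching the hyperparameters above (batch size 64, windows of 50 steps and 7 features, Adam at LR=0.001) might look like this; the `nn.Sequential` stand-in is illustrative only, not the project's model:

```python
import torch
import torch.nn as nn

# Stand-in classifier: flattens a (batch, 50, 7) window into 4 class logits.
model = nn.Sequential(nn.Flatten(), nn.Linear(50 * 7, 4))

# Optimizers as reported above: Adam (LR=0.001) for the attention model;
# the hybrid CNN-LSTM would instead use torch.optim.SGD(..., lr=0.01).
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

x = torch.randn(64, 50, 7)      # one batch: 64 windows, 50 steps, 7 features
y = torch.randint(0, 4, (64,))  # 4 target classes

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(loss.item())
```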

Reproducibility

Environment Tracking

Training scripts automatically log:

  • Python version
  • PyTorch version
  • CUDA availability and version
  • Random seeds used
  • Git commit hash (if in git repo)

Fixed Seeds

All random seeds are fixed by default:

  • Python random: 42
  • NumPy: 42
  • PyTorch: 42
  • CUDA: 42 (if available)
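A typical helper that fixes all four seed sources at once looks like the sketch below; `set_seed` is an illustrative name, and the repo's actual utility may differ:

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix every RNG source listed above: Python, NumPy, PyTorch, CUDA."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

# Same seed, same draw: the two tensors below are identical.
set_seed(42)
a = torch.randn(3)
set_seed(42)
b = torch.randn(3)
print(torch.equal(a, b))  # True
```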

Development

Adding New Models

  1. Create model class in src/models/
  2. Register in src/models/__init__.py
  3. Create config file in configs/
  4. Add unit tests in tests/test_models.py
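Steps 1-2 might look like the following hypothetical sketch. `MODEL_REGISTRY` and `register_model` are illustrative names; the actual mechanism in src/models/__init__.py may be different:

```python
import torch
import torch.nn as nn

# Hypothetical registry: maps config model names to model classes.
MODEL_REGISTRY: dict = {}

def register_model(name: str):
    """Decorator that records a model class under a config-facing name."""
    def wrapper(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrapper

@register_model("my_new_model")
class MyNewModel(nn.Module):
    def __init__(self, in_features: int = 7, num_classes: int = 4):
        super().__init__()
        self.head = nn.Linear(in_features, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features); classify from the last time step
        return self.head(x[:, -1, :])

model = MODEL_REGISTRY["my_new_model"]()
logits = model(torch.randn(2, 50, 7))
print(logits.shape)  # torch.Size([2, 4])
```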

Code Quality

# Format code
black src/ tests/ scripts/

# Type checking
mypy src/

# Linting
flake8 src/ tests/ scripts/

Troubleshooting

Common Issues

  1. CUDA out of memory: Reduce batch_size in config
  2. Slow training: Enable GPU or reduce window_size
  3. Poor accuracy: Check data preprocessing, increase epochs, or tune hyperparameters
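The knobs for issues 1-3 live in the YAML files under configs/. A hypothetical excerpt is shown below; the key names are assumptions for illustration, not the repo's actual schema:

```yaml
# Hypothetical configs/ entry; check configs/models.yaml for the real keys.
hybrid_attention:
  window_size: 50    # shrink if training is slow on CPU
  stride: 10
  batch_size: 64     # halve this on "CUDA out of memory"
  epochs: 200        # increase (with early stopping) if accuracy is poor
  optimizer: adam
  lr: 0.001
  seed: 42
```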

Getting Help

  • Check the logs in results/outputs/
  • Run smoke tests to verify installation
  • Review config files for parameter documentation

Model Performance Visualization

Overview

All visualizations (training curves, confusion matrices, comparison charts) are automatically generated and available in the png_visualizations/ directory. Below are the actual results from training all 5 models with seed=42.

Model Performance Comparison

Overall Metrics Comparison

Overall performance metrics comparison across all models. Hybrid CNN-LSTM with Attention achieves the best accuracy (87.01%).

Training Progress Visualization

Hybrid CNN-LSTM with Attention Training Curves


Training and validation loss/accuracy for the best-performing model over 200 epochs. The model converges around epoch 40 with validation accuracy reaching 91.43%.

All Models Training Comparison


Comparison of training progress across all neural network models. Hybrid models show faster convergence and better final performance.

Confusion Matrices

Best Model: Hybrid CNN-LSTM with Attention


Confusion matrix for the best model showing strong performance across all 4 classes, with perfect classification for class 3.

All Models Confusion Matrices

| Model | Confusion Matrix |
|---|---|
| Hybrid CNN-LSTM | Hybrid CNN-LSTM confusion matrix |
| Random Forest | Random Forest confusion matrix |
| SVM | SVM confusion matrix |
| 1D CNN | 1D CNN confusion matrix |

Per-Class Performance Analysis

Per-Class Metrics

Detailed per-class performance breakdown showing precision, recall, and F1-score for each model across all 4 classes.

Performance Heatmap


Heatmap visualization of model performance across different metrics (accuracy, precision, recall, F1-score).

Chart Generation Guide

Overview

All visualizations (training curves, confusion matrices, comparison charts) are automatically generated in PDF format for publication-quality results.

Note on "Empty" Files: If your file manager or preview application shows charts as empty, this is likely a display issue. All generated PDFs contain valid content. See the Verifying Chart Generation section below for commands to verify file contents.

Quick Reference Table

| What to Generate | Command/Script | Output Location | When to Run |
|---|---|---|---|
| Train one model | `./train.sh <model_name>` | results/outputs/models/, results/outputs/plots/{model}_training.pdf | Initial training |
| One model's confusion matrix | `python3 scripts/eval.py --model_path <path> --config <config>` | results/outputs/plots/{model}_confusion_matrix.pdf | After training a model |
| All models' confusion matrices | `python3 scripts/eval.py --all` | results/outputs/plots/*_confusion_matrix.pdf | After training multiple models |
| Compare all models | `python3 scripts/compare.py` | results/outputs/comparison/*.pdf | After training multiple models |

Chart Types and Generation Methods

| Chart Type | Generated When | Script/Command | Output Location | Description |
|---|---|---|---|---|
| Training Curves | During training | `./train.sh <model_name>` | results/outputs/plots/{model_name}_training.pdf | Loss and accuracy curves per epoch (PyTorch models only) |
| Confusion Matrix (Single) | After training | `scripts/eval.py` | results/outputs/plots/{model_name}_confusion_matrix.pdf | Per-model confusion matrix heatmap |
| Confusion Matrices (All) | Batch generation | `scripts/eval.py --all` | results/outputs/plots/*_confusion_matrix.pdf | Confusion matrices for all trained models |
| Model Comparison | After multiple trainings | `scripts/compare.py` | results/outputs/comparison/ | Comprehensive model comparison charts |

1. Training Curves (Auto-Generated)

When: Generated automatically during model training
Script: ./train.sh <model_name>
Output: results/outputs/plots/{model_name}_training.pdf
Content: Training and validation loss/accuracy curves vs epochs

Note: Sklearn models (RandomForest, SVM) plot only 1 epoch since they train in a single step. This results in smaller file sizes (~15 KB) compared to PyTorch models (~17-19 KB) because there's only one data point.

2. Confusion Matrices

Single Model Evaluation

Generate confusion matrix for one specific model:

PYTHONPATH="/Users/weilai/Documents/Soft Gripper/DL:$PYTHONPATH" python3 scripts/eval.py \
    --model_path results/outputs/models/HybridCNNLSTMAttention_42_best.pth \
    --config configs/models.yaml

Output: results/outputs/plots/HybridCNNLSTMAttention_42_confusion_matrix.pdf

Features:

  • X-axis: Predicted classes
  • Y-axis: True classes
  • Values: Percentages (not raw counts)
  • Annotations: Cell values displayed
  • Colorbar: Viridis colormap with percentage scale
  • Format: PDF (vector graphics)
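A percentage-normalized confusion matrix with the Viridis colormap, saved as a vector PDF, can be reproduced in a few lines of matplotlib. The counts below are toy values for illustration, not the project's results:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Toy 4-class counts; rows = true class, columns = predicted class
counts = np.array([[18, 1, 1, 0],
                   [2, 16, 1, 1],
                   [1, 2, 17, 0],
                   [0, 0, 0, 20]])
# Normalize each row to percentages of the true class
percent = counts / counts.sum(axis=1, keepdims=True) * 100

fig, ax = plt.subplots()
im = ax.imshow(percent, cmap="viridis")
for i in range(4):
    for j in range(4):
        # White text on dark (low) cells, black on bright (high) cells
        ax.text(j, i, f"{percent[i, j]:.0f}", ha="center", va="center",
                color="white" if percent[i, j] < 50 else "black")
ax.set_xlabel("Predicted class")
ax.set_ylabel("True class")
fig.colorbar(im, label="% of true class")
fig.savefig("confusion_matrix_demo.pdf")  # vector PDF, like the project output
```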

Batch Generation (All Models)

Generate confusion matrices for all trained models at once:

# Generate confusion matrices for all trained models
PYTHONPATH="/Users/weilai/Documents/Soft Gripper/DL:$PYTHONPATH" python3 scripts/eval.py --all

Optional Parameters:

--models_dir    # Directory with model files (default: results/outputs/models)
--config        # Config file (default: configs/models.yaml)
--plots_dir     # Output directory (default: results/outputs/plots)
--device        # cpu or cuda (default: auto)

Output Files:

  • CNN1D_42_confusion_matrix.pdf (27 KB)
  • HybridCNNLSTMAttention_42_confusion_matrix.pdf (28 KB)
  • HybridCNNLSTM_42_confusion_matrix.pdf (28 KB)
  • RandomForest_42_confusion_matrix.pdf (27 KB)
  • SVM_42_confusion_matrix.pdf (27 KB)

3. Model Comparison Charts

Generate comprehensive comparison after training multiple models:

PYTHONPATH="/Users/weilai/Documents/Soft Gripper/DL:$PYTHONPATH" python3 scripts/compare.py

Output Files in results/outputs/comparison/:

| File | Description |
|---|---|
| leaderboard.csv | Ranked model performance metrics |
| overall_metrics.pdf | Bar chart comparing accuracy, precision, recall, F1 |
| per_class_metrics.pdf | Per-class performance breakdown (font size: 7) |
| training_loss_comparison.pdf | All models' training loss curves |
| training_accuracy_comparison.pdf | All models' training accuracy curves |

Custom directories:

PYTHONPATH="/Users/weilai/Documents/Soft Gripper/DL:$PYTHONPATH" python3 scripts/compare.py \
    --results_dir results/outputs \
    --output_dir results/outputs/comparison

Available Visualization Formats

  • PNG format: high-resolution PNG images in png_visualizations/ for easy viewing and embedding in documentation
  • PDF format: vector PDF files in results/outputs/plots/ for publication-quality results
  • File sizes: PNG 40-300 KB, PDF 15-30 KB per chart
  • Color scheme: Viridis colormap for confusion matrices
  • Font sizes: optimized for readability (e.g., per_class_metrics uses fontsize=7)

Accessing Generated Visualizations

All visualization files are available in two locations:

  1. PNG format (for documentation): png_visualizations/ - 15 files, 2.0MB total
  2. PDF format (for publication): results/outputs/plots/ and results/outputs/comparison/

Troubleshooting

Empty Training Plots: Sklearn models (RandomForest, SVM) show minimal training curves (1 epoch) because they train in a single pass, not iteratively like neural networks.

File Not Found Errors: Ensure models are trained first and model files exist in results/outputs/models/

Permission Errors: Check write permissions on results/outputs/ directory

Complete Workflow Example

# 1. Train models
./train.sh hybrid_attention
./train.sh cnn_1d
./train.sh random_forest

# 2. Generate all confusion matrices
PYTHONPATH="/Users/weilai/Documents/Soft Gripper/DL:$PYTHONPATH" python3 scripts/eval.py --all

# 3. Generate comparison report
PYTHONPATH="/Users/weilai/Documents/Soft Gripper/DL:$PYTHONPATH" python3 scripts/compare.py

# 4. View results
# Check PDF visualizations
ls results/outputs/plots/*confusion_matrix.pdf
ls results/outputs/comparison/*.pdf

# Check PNG visualizations (pre-generated)
ls png_visualizations/*.png

# View performance metrics
open results/outputs/comparison/leaderboard.csv

Verifying Chart Generation

IMPORTANT: All generated chart PDF files contain valid data and are NOT empty. If a file manager or preview shows them as empty, this is a display issue, not a file content issue.

To verify that all chart files were generated correctly, run:

# Check all PDF files in plots directory
ls -lh results/outputs/plots/*.pdf

# Expected output should show 10 PDF files (5 training curves + 5 confusion matrices)
# File sizes should be: 15-28 KB (NOT empty)

# Verify SVM training file specifically
wc -c results/outputs/plots/SVM_42_training.pdf
# Expected: ~15000 bytes (15 KB), not 0

# Check file types (should all be PDF)
file results/outputs/plots/*.pdf

Expected File Sizes:

  • Training curves: 15-19 KB (PyTorch), 15-16 KB (sklearn)
  • Confusion matrices: 27-28 KB
  • Comparison charts: 24-31 KB

File Count:

  • plots/ directory: 10 PDF files (5 training + 5 confusion matrices)
  • comparison/ directory: 5 PDF files

Common Misconceptions:

  • Sklearn models show small history files: The .history.csv files for RandomForest and SVM are small (~61 bytes) because they only have 1 epoch of data. This is normal.
  • Training PDFs for sklearn models are smaller: At ~15-16 KB, these are valid PDFs showing a single data point (one epoch).
  • "Empty" files may be valid: Always check with ls -lh or wc -c to see actual byte count, not file manager preview.

Quick Verification Script:

#!/bin/bash
echo "Checking plot files..."
find results/outputs/plots/ -name "*.pdf" -exec ls -lh {} \;
echo "Total PDF files: $(find results/outputs/plots/ -name "*.pdf" | wc -l)"
echo "All files verified!"

Quick Start Guide

3-Step Setup

# Step 1: Activate virtual environment
cd "/Users/weilai/Documents/Soft Gripper/DL"
source .venv/bin/activate  # macOS/Linux
# or
.venv\Scripts\activate      # Windows

# Step 2: Verify installation
./fix_and_test.sh
# Expected: "✅ All tests passed!"

# Step 3: Train your first model
./train.sh hybrid_attention  # Original paper model (200 epochs, ~3-5 min)

Available Models

| Model | Train Command | Description | Actual Accuracy | Key Features |
|---|---|---|---|---|
| hybrid_attention | `./train.sh hybrid_attention` | Hybrid CNN-LSTM with Spatial Attention (original paper) | 87.01% | Best performance, attention mechanism |
| hybrid_cnn_lstm | `./train.sh hybrid_cnn_lstm` | Standard Hybrid CNN-LSTM | 83.12% | Good balance of performance and speed |
| random_forest | `./train.sh random_forest` | Random Forest baseline | 74.03% | Fastest training, good baseline |
| svm | `./train.sh svm` | Support Vector Machine baseline | 54.55% | Simple linear model |
| cnn_1d | `./train.sh cnn_1d` | 1D Convolutional Neural Network | 50.65% | Pure convolutional architecture |

Accuracy metrics from actual training with fixed random seed (42)

Quick Examples

# Train the original model from paper (best performance)
./train.sh hybrid_attention

# Train a faster CNN model
./train.sh cnn_1d

# Train the fastest baseline (~30 seconds)
./train.sh random_forest

Alternative Training Methods

Method 1: Using PYTHONPATH (Recommended for Scripts)

PYTHONPATH="/Users/weilai/Documents/Soft Gripper/DL:$PYTHONPATH" python3 scripts/train.py \
    --config configs/models.yaml

Method 2: Export PYTHONPATH Once

# Add to your shell profile (~/.zshrc or ~/.bashrc)
export PYTHONPATH="/Users/weilai/Documents/Soft Gripper/DL:$PYTHONPATH"

# Then you can run without prefix
python3 scripts/train.py --config configs/models.yaml

Comparison Guide

Quick Comparison Workflow

# 1. Train multiple models
./train.sh hybrid_attention
./train.sh cnn_1d
./train.sh random_forest

# 2. Generate comparison report
PYTHONPATH="/Users/weilai/Documents/Soft Gripper/DL:$PYTHONPATH" python3 scripts/compare.py

# 3. View results
ls results/outputs/comparison/
open results/outputs/comparison/leaderboard.csv
open results/outputs/comparison/overall_metrics.pdf
open results/outputs/comparison/per_class_metrics.pdf

License

MIT License - see LICENSE file for details

Acknowledgements

This project uses the dataset from, and references code from, the following repository:

We thank the authors for making their work publicly available.
