Armanfard-Lab/AADSCL

AADSCL: Self-Supervised Acoustic Anomaly Detection via Contrastive Learning

Paper | License: CC BY-NC-SA 3.0 | Python 3.8+

PyTorch implementation of "Self-Supervised Acoustic Anomaly Detection via Contrastive Learning" (ICASSP 2022).

[Figure: AADSCL framework]

Abstract

We propose an acoustic anomaly detection algorithm based on the framework of contrastive learning. Contrastive learning is a recently proposed self-supervised approach that has shown promising results in image classification and speech recognition. However, its application in anomaly detection is underexplored. Earlier studies have demonstrated that it can achieve state-of-the-art performance in image anomaly detection, but its capability in anomalous sound detection is yet to be investigated. For the first time, we propose a contrastive learning-based framework that is suitable for acoustic anomaly detection. Since most existing contrastive learning approaches are targeted toward images, the effect of other data transformations on the performance of the algorithm is unknown. Our framework learns a representation from unlabeled data by applying audio-specific data augmentations. We show that in the resulting latent space, normal and abnormal points are distinguishable. Experiments conducted on the MIMII dataset confirm that our approach can outperform competing methods in detecting anomalies.

Installation

git clone https://github.com/hhojjati/AADSCL.git
cd AADSCL
pip install -e .

This installs all dependencies (PyTorch, torchaudio, librosa, etc.) and registers the aadscl command.

Dataset Setup

This project uses the MIMII dataset.

  1. Download the dataset from the link above.
  2. Place .wav files under data/ following this structure:
data/
└── Pump/                    # or Fan, Valve, Slider
    ├── normal/
    │   ├── 00000000.wav
    │   └── ...
    └── abnormal/
        ├── 00000000.wav
        └── ...

For the full dataset with multiple machine IDs, use: data/<machine>/id_<XX>/normal/. Both layouts are auto-detected. See data/README.md for details.
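As an illustration of how the two layouts could be told apart, here is a minimal sketch. It is not the repository's loader: the function name `find_wav_files` and the zero-padded `id_00` folder naming are assumptions made for this example.

```python
from pathlib import Path
from typing import Dict, List


def find_wav_files(root: Path, machine: str, machine_id: int = 0) -> Dict[str, List[Path]]:
    """Collect normal/abnormal .wav paths for one machine type.

    Tries the per-ID layout first (data/<machine>/id_<XX>/normal/), then
    falls back to the flat layout (data/<machine>/normal/).
    """
    machine_dir = root / machine
    # Assumption: IDs are zero-padded to two digits, e.g. id_00, id_02.
    id_dir = machine_dir / f"id_{machine_id:02d}"
    base = id_dir if id_dir.is_dir() else machine_dir

    files: Dict[str, List[Path]] = {}
    for label in ("normal", "abnormal"):
        label_dir = base / label
        # A missing folder yields an empty list rather than an error.
        files[label] = sorted(label_dir.glob("*.wav")) if label_dir.is_dir() else []
    return files
```

Calling `find_wav_files(Path("data"), "Pump")` would return a dictionary with the sorted `normal` and `abnormal` file lists for whichever layout is present.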

Quick Start

Train and Evaluate

aadscl --machine Pump --epochs 400

Or equivalently:

python -m aadscl --machine Pump --epochs 400

Load a Pretrained Model and Evaluate

aadscl --machine Pump --pretrain --save_path state_dict_model.pt

All Options

Argument      Default               Description
--machine     Pump                  Machine type: Pump, Fan, Valve, Slider
--id          0                     Machine ID (0, 2, 4, 6)
--epochs      400                   Number of training epochs
--pretrain    False                 Load pretrained model instead of training
--save_path   state_dict_model.pt   Path to save/load model weights
--verbosity   1                     0 = silent, 1 = print epoch loss
--num_runs    5                     Number of evaluation runs to average
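For readers who want to script around the CLI, the options listed above could be mirrored with `argparse` roughly as follows. This is a sketch, not the package's actual parser: the `build_parser` name, the `choices` restrictions, and treating `--pretrain` as a boolean flag are assumptions inferred from the option list and the commands in Quick Start.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="aadscl")
    parser.add_argument("--machine", default="Pump",
                        choices=["Pump", "Fan", "Valve", "Slider"],
                        help="Machine type")
    parser.add_argument("--id", type=int, default=0, choices=[0, 2, 4, 6],
                        help="Machine ID")
    parser.add_argument("--epochs", type=int, default=400,
                        help="Number of training epochs")
    # Assumed to be a flag, since Quick Start passes it without a value.
    parser.add_argument("--pretrain", action="store_true",
                        help="Load pretrained model instead of training")
    parser.add_argument("--save_path", default="state_dict_model.pt",
                        help="Path to save/load model weights")
    parser.add_argument("--verbosity", type=int, default=1, choices=[0, 1],
                        help="0 = silent, 1 = print epoch loss")
    parser.add_argument("--num_runs", type=int, default=5,
                        help="Number of evaluation runs to average")
    return parser


args = build_parser().parse_args(["--machine", "Fan", "--epochs", "10"])
print(args.machine, args.epochs, args.pretrain)  # prints: Fan 10 False
```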

Project Structure

├── aadscl/                      # Main package
│   ├── main.py                  # Entry point: train + evaluate pipeline
│   ├── train_test.py            # Data splitting into train/test loaders
│   ├── trainer.py               # Training loop (contrastive + classification loss)
│   ├── test.py                  # Evaluation with Mahalanobis distance & AUC
│   ├── utils.py                 # Transforms, augmentation pipeline, anomaly scoring
│   ├── data_loader.py           # MIMII dataset loader
│   ├── networks/
│   │   ├── resnet18.py          # ResNet-18 encoder (1-channel input for spectrograms)
│   │   ├── projection_head.py   # MLP projection head (512 → 256 → 128)
│   │   └── linear_classifier.py # Linear classifier for transform prediction (8 classes)
│   └── transforms/
│       ├── mel_spec.py          # Mel spectrogram extraction (128 mel bins, 2048 FFT)
│       ├── awgn.py              # Additive White Gaussian Noise
│       ├── fade.py              # Fade in/out
│       ├── freq_mask.py         # Frequency masking
│       ├── time_mask.py         # Time masking
│       ├── time_shift.py        # Time shifting
│       ├── time_stretch.py      # Time stretching
│       └── pitch_shift.py       # Pitch shifting
├── data/                        # Dataset directory (see Dataset Setup)
├── pyproject.toml               # Package configuration & dependencies
├── requirements.txt             # Dependency list (also in pyproject.toml)
├── Makefile                     # Convenience commands
└── LICENSE

Method

  1. Augmentation: Each audio sample is augmented twice using random audio transforms (noise injection, pitch shift, time stretch, fade, frequency masking, time masking, time shift, or identity).
  2. Encoding: The augmented Mel spectrograms are fed through a ResNet-18 encoder.
  3. Contrastive Loss: NT-Xent loss (τ=0.07) pulls together the representations of two augmented views of the same sample and pushes apart views of different samples.
  4. Auxiliary Task: A linear classifier predicts which transform was applied; its cross-entropy loss is added with weight λ=0.1.
  5. Anomaly Scoring: At inference, the Mahalanobis distance from the distribution of normal training samples serves as the anomaly score.
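The contrastive loss and the anomaly score from the steps above can be sketched in a few lines. The example uses NumPy rather than PyTorch so that it runs standalone; it is an illustration, not the code in trainer.py or test.py. The function names, the small ridge added to the covariance, and the choice of which features are scored are assumptions.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, tau: float = 0.07) -> float:
    """NT-Xent loss for N positive pairs (z1[i], z2[i]); both arrays are (N, D)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    sim = z @ z.T / tau                               # (2N, 2N) temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # a view is never its own candidate
    # Row i of the first half is paired with row i of the second half, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    sim = sim - sim.max(axis=1, keepdims=True)        # stabilise the softmax
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


def total_loss(contrastive: float, cross_entropy: float, lam: float = 0.1) -> float:
    """Contrastive loss plus the transform-prediction loss weighted by lambda."""
    return contrastive + lam * cross_entropy


def fit_gaussian(train_feats: np.ndarray):
    """Estimate mean and inverse covariance from features of normal training data."""
    mean = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    cov = cov + 1e-6 * np.eye(cov.shape[0])  # assumption: small ridge for invertibility
    return mean, np.linalg.inv(cov)


def mahalanobis_score(feats: np.ndarray, mean: np.ndarray, cov_inv: np.ndarray) -> np.ndarray:
    """Anomaly score per row: Mahalanobis distance to the normal distribution."""
    diff = feats - mean
    sq = np.einsum("nd,de,ne->n", diff, cov_inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative values from rounding


# Synthetic demo: features far from the training distribution score higher.
rng = np.random.default_rng(0)
mean, cov_inv = fit_gaussian(rng.normal(size=(500, 8)))
normal_scores = mahalanobis_score(rng.normal(size=(5, 8)), mean, cov_inv)
abnormal_scores = mahalanobis_score(rng.normal(loc=4.0, size=(5, 8)), mean, cov_inv)
print(normal_scores.mean(), abnormal_scores.mean())
```

In a PyTorch training loop the same loss is usually written with `torch.nn.functional.cross_entropy` over the similarity matrix, which is numerically equivalent to the log-softmax used here.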

Citation

Paper on IEEE Xplore.

@inproceedings{hojjati2022selfsupervised,
  author={Hojjati, Hadi and Armanfard, Narges},
  booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Self-Supervised Acoustic Anomaly Detection Via Contrastive Learning},
  year={2022},
  pages={3253-3257},
  doi={10.1109/ICASSP43922.2022.9746207}
}

License

See LICENSE for details.
