⭐ MDTAIM: Multi-Dimensional Time-Series Anomaly Detection and Itemset Mining

πŸ” About

This repository implements research on detecting multi-dimensional time-series anomalies with a novel method. It builds on existing anomaly scoring functions (primarily the matrix profile) and converts their output into transactions from which frequent itemsets are mined. The goal is to investigate whether current efficient frequent itemset mining algorithms can serve as a fast way to detect multi-dimensional anomalies.

🔌 Architecture

This is the high-level architecture of MDTAIM:

block-beta
  columns 4
  Data_Preprocessing
  blockArrowId1<["Plots"]>(right) space space  
  blockArrowId2<["Data + Labels"]>(down)   space space space  
  Anomaly_Scoring
  blockArrowId3<["Plots"]>(right) space space 
  blockArrowId4<["Matrix Profile Scores"]>(down) space space space 
  KDP
  blockArrowId5<["Plots"]>(right) space space 
  Anomaly_to_Transactions space space space 
  blockArrowId7<["Transaction Database"]>(down) space space space 
  Itemset_Mining space space space 
  blockArrowId8<["Frequent/High Utility Itemsets"]>(down) space space space 
  Postprocessing
  blockArrowId9<["Plots"]>(right) space space  
  blockArrowId10<["MDTAIM Outputs"]>(down) space space space 

  classDef rightarrow fill:#8cb369,stroke:#28112b;
  classDef mainblocks fill:#f2e9e4,stroke:#28112b;
  classDef downarrow fill:#81c3d7,stroke:#28112b;
  class blockArrowId1,blockArrowId5,blockArrowId3,blockArrowId9 rightarrow
  class Data_Preprocessing,Anomaly_Scoring,KDP,Anomaly_to_Transactions,Itemset_Mining,Postprocessing  mainblocks
  class blockArrowId2,blockArrowId4,blockArrowId7,blockArrowId8,blockArrowId10 downarrow

πŸ› οΈ Installation

# Make sure you have Poetry installed
poetry --version

# Clone the repository
git clone https://github.com/armanbehkish/mdtaim.git

# Navigate to the directory
cd mdtaim

# Install dependencies
poetry install

# Activate the virtual environment
# (Poetry 2.x ships `poetry shell` as a separate plugin; `poetry run python main.py` is an alternative)
poetry shell

# Double-check the configuration file in config/config_<dataset>.yaml

# Run the code
python main.py

# Check the outputs in data/output/final/
# Check the plots in plots/

📊 Dataset

Dataset location: data/input/raw/<dataset_name>

  • Create the path if it doesn't exist.
  • Put the dataset CSV file and the ground-truth CSV file in data/input/raw/ under a directory named after the dataset. The same name must be set in the dataset_title field of the configuration file (toy in the example below).
  • Also set the file names and the ground truth type in the configuration file, like this:
data:
  dataset_title: toy
  dataset_file_name: toy_data.csv
  dataset_gt_file_name: toy_data_GT.csv
  ground_truth_type: range
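
As a quick sanity check before running the pipeline, the layout described above can be verified with a few lines of Python. This is a minimal sketch and not part of MDTAIM: the helper names `expected_dataset_files` and `missing_files` are hypothetical, and the `data_cfg` dictionary simply mirrors the YAML fields shown above.

```python
from pathlib import Path


def expected_dataset_files(data_cfg, raw_root="data/input/raw"):
    """Return the dataset and ground-truth paths implied by the `data` section."""
    dataset_dir = Path(raw_root) / data_cfg["dataset_title"]
    return [
        dataset_dir / data_cfg["dataset_file_name"],
        dataset_dir / data_cfg["dataset_gt_file_name"],
    ]


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not p.is_file()]


# Values copied from the configuration example above.
data_cfg = {
    "dataset_title": "toy",
    "dataset_file_name": "toy_data.csv",
    "dataset_gt_file_name": "toy_data_GT.csv",
    "ground_truth_type": "range",
}

for path in missing_files(expected_dataset_files(data_cfg)):
    print(f"missing: {path.as_posix()}")
```

Run from the repository root, it prints one line per file that still has to be put in place and prints nothing when both files are present.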

📊 Output

MDTAIM's output lists the dimension numbers and the location of each N-dimensional anomaly, plus an importance score when the chosen algorithm outputs utility. MDTAIM output location: data/output/final/...

Raw SPMF algorithm outputs are saved separately. SPMF output location: data/output/spmf/...
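
The raw SPMF files can be inspected with a small parser. The sketch below assumes the usual SPMF text format for itemset results, where each line lists the items followed by fields such as `#SUP:` (support) or `#UTIL:` (utility). The function name `parse_spmf_line` and the sample lines are illustrative and are not taken from the repository.

```python
def parse_spmf_line(line):
    """Split one SPMF result line into (items, metrics).

    Example input: "1 3 #SUP: 2" -> ([1, 3], {"SUP": 2.0})
    """
    parts = line.strip().split(" #")
    items = [int(token) for token in parts[0].split()]
    metrics = {}
    for field in parts[1:]:
        name, _, value = field.partition(":")
        metrics[name.strip()] = float(value)
    return items, metrics


# Illustrative lines, not real MDTAIM output.
sample_lines = ["1 3 #SUP: 2", "2 5 7 #UTIL: 40"]
for sample in sample_lines:
    print(parse_spmf_line(sample))
```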

The processed directory holds intermediate files that are useful for debugging. Processed data location: data/processed/...

Create these paths if they don't exist!
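
Creating the directories can be scripted. The following is a minimal sketch: `ensure_dirs` is a hypothetical helper, and the directory list repeats the three locations named above. The demo writes into a temporary directory so that it has no side effects.

```python
import tempfile
from pathlib import Path

# The three locations named in this section.
OUTPUT_DIRS = ["data/output/final", "data/output/spmf", "data/processed"]


def ensure_dirs(base, relative_dirs):
    """Create each directory under `base` (including parents) if it is missing."""
    created = []
    for rel in relative_dirs:
        path = Path(base) / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for path in ensure_dirs(tmp, OUTPUT_DIRS):
        print(path.relative_to(tmp).as_posix(), path.is_dir())
```

Calling `ensure_dirs(".", OUTPUT_DIRS)` from the repository root would create the real directories.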

CLI Output Example

CLI output of the program on the pvsystem dataset:

2025-12-12 15:12:00,999 - mdtaim.utility - load_config - INFO - Config loaded successfully!
2025-12-12 15:12:01,031 - mdtaim.utility - load_data - INFO - dataset pvsystem_data.csv loaded successfully!
2025-12-12 15:12:01,069 - mdtaim.utility - load_data - INFO - ground truth pvsystem_GT.csv loaded successfully!
2025-12-12 15:12:02,838 - mdtaim.utility - load_data - INFO - anomaly rate of padded labels: 0.048494882600842865
MP calculation:   0%|          | 0/22 [00:00<?, ?it/s]
MP calculation: 100%|██████████| 22/22 [00:00<00:00, 304.60it/s]
2025-12-12 15:12:02,949 - mdtaim.utility - calculate_score - INFO - Baseline correction with quantile: 0.65 done!
2025-12-12 15:12:02,949 - mdtaim.utility - cal_anomaly_score - INFO - MP calculation done! total time to calculate Matrix Profile: 0.085 seconds
2025-12-12 15:12:09,316 - mdtaim.utility - plot_single_ts - INFO - Single time series (dimension 20) plotted successfully.
2025-12-12 15:12:10,060 - mdtaim.utility - cal_kdp - INFO -  KDP calculation done! total time to calculate KDP: 2030.09 µs
2025-12-12 15:12:12,830 - mdtaim.utility - convert_anomalies_to_transactions - INFO - Converting anomalies to transactions using beta1 method!
2025-12-12 15:12:12,837 - mdtaim.utility - convert_beta_1 - INFO - Time taken: 6173.55 µs
2025-12-12 15:12:12,837 - mdtaim.utility - convert_anomalies_to_transactions - INFO - time for conversion: 7117.59 µs
2025-12-12 15:12:12,838 - mdtaim.utility - build_transactions - INFO - 3 transactions were created successfully!
2025-12-12 15:12:12,839 - mdtaim.utility - save_transactions_to_file - INFO - current transactions saved to pickle successfully!
2025-12-12 15:12:12,841 - mdtaim.utility - cal_anomaly_detec_accuracy - INFO - number of matches: number of matches: 9 out of 9 detected anomalies for 9 labels
2025-12-12 15:12:12,841 - mdtaim.utility - cal_anomaly_detec_accuracy - INFO - Precision: 1.0, Recall: 1.0, F1 Score: 1.0
2025-12-12 15:12:13,561 - mdtaim.utility - load_transactions_from_file - INFO - transaction DB loaded successfully!
2025-12-12 15:12:13,563 - mdtaim.utility - prepare_transaction_database - INFO - Transaction DB written to file successfully!
2025-12-12 15:12:14,049 - mdtaim.utility - check_java_version - INFO - Found Java version: 23, going on...
2025-12-12 15:12:14,061 - mdtaim.utility - run_algorithm - INFO - Running SPMF: NegFIN algorithm with the command java -jar ./lib/spmfe.jar run NegFIN ./data/processed/transaction_databases/context.txt ./data/output/spmf/NegFIN_pvsystem_out.txt 0.5%...
2025-12-12 15:12:14,334 - mdtaim.utility - run_algorithm - INFO - SPMF Output:
>/Volumes/develop/MsThesis/MDTAIM/lib/spmfe.jar
========== negFIN - STATS ============
 Minsup = 1
 Number of transactions: 3
 Number of frequent  itemsets: 64
 Total time ~: 2169 μs
 Max memory:9.627677917480469 MB
=====================================

2025-12-12 15:12:14,334 - mdtaim.utility - perform_itemset_mining - INFO - MP conversion to transactions took 0.009 seconds and SPMF execution took 0.492 seconds
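
The precision, recall, and F1 values in the log follow from the reported counts. The sketch below assumes the standard definitions (precision = matches / detected anomalies, recall = matches / labels), which is consistent with the logged numbers. `detection_metrics` is a hypothetical helper, not the repository's `cal_anomaly_detec_accuracy`.

```python
def detection_metrics(n_matches, n_detected, n_labels):
    """Return (precision, recall, f1) from match, detection, and label counts."""
    precision = n_matches / n_detected if n_detected else 0.0
    recall = n_matches / n_labels if n_labels else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts from the log above: 9 matches, 9 detected anomalies, 9 labels.
print(detection_metrics(9, 9, 9))
```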

📈 Plots

Each step of the pipeline writes its plots to the plots/ directory. The following plots (shown for the toy dataset) help in understanding each step of the pipeline:

  • The input dataset (figure: Dataset Visualization)
  • The matrix profile scores with padded labels (figure: Matrix Profile Scores)
  • The KDP, from TSADIS, for comparison (figure: KDP)

πŸ“ Configuration

Project configuration file: config/config_<dataset>.yaml

  • Create the file if it doesn't exist (you can use config_toy.yaml as a template).
  • Fill in the parameters according to the documentation.

GENERAL CONFIGURATION SECTIONS:

Anomaly scoring: settings to calculate anomaly scores

anomalyscoring:
  which: matrixprofile
  matrixprofile: 
    subsequence_length: 10 
    auto_subsequence_length: False 
  iForest:
    num_trees: 100    
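
To illustrate what `subsequence_length` controls, here is a naive matrix profile in pure Python. It is a didactic sketch under stated assumptions and not the implementation used by MDTAIM: it uses z-normalized Euclidean distance, an exclusion zone of half the subsequence length (a common but not universal choice), and a quadratic-time search. Dedicated libraries compute the same quantity far more efficiently.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; constant subsequences map to all zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]


def naive_matrix_profile(series, m):
    """For each length-m subsequence, return the distance to its nearest
    non-trivial neighbour. Large values mark unusual subsequences."""
    n_sub = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    exclusion = m // 2  # assumption: skip neighbours that overlap too much
    profile = []
    for i in range(n_sub):
        best = math.inf
        for j in range(n_sub):
            if abs(i - j) <= exclusion:
                continue
            best = min(best, math.dist(subs[i], subs[j]))
        profile.append(best)
    return profile


# Synthetic example: a sine wave with period 8 and a spike at index 40.
series = [math.sin(2 * math.pi * i / 8) for i in range(64)]
series[40] += 3.0
profile = naive_matrix_profile(series, m=8)
discord = max(range(len(profile)), key=profile.__getitem__)
print("highest score at subsequence start:", discord)
```

In this synthetic series, every window that avoids the spike has an almost exact match one period away, so the highest scores land on the windows that contain index 40.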

Itemset mining preparation: settings to convert anomaly scores into transactions

itemset_mining_preparation:
  window_size: 10
  ignore_win_smaller_than: 0.5
  windowing_method: energy 
  enable_threshold_tuning: True 
  train_size: 150
  threshold_tuning_step: 0.2
  custom_threshold: 2  
  compare_to_train_for_detection: True 
  cut_baseline: False 
  quantile: 0.8
  utility_function: max  
  cons_trans_chk_for_merge: 4  
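
The general idea of turning per-dimension anomaly scores into transactions can be sketched as follows. This is a deliberately simplified illustration, not the repository's conversion (it does not implement the energy windowing, threshold tuning, or merging options listed above). It cuts the scores into fixed windows and emits, for each window, the set of dimensions whose score exceeds a threshold. The function name and the sample scores are hypothetical.

```python
def scores_to_transactions(scores_by_dim, window_size, threshold):
    """Return (window_start, items) pairs, where items are the dimensions
    whose maximum score inside the window exceeds the threshold."""
    n_points = min(len(scores) for scores in scores_by_dim.values())
    transactions = []
    for start in range(0, n_points, window_size):
        items = sorted(
            dim
            for dim, scores in scores_by_dim.items()
            if max(scores[start:start + window_size]) > threshold
        )
        if items:  # windows without any anomalous dimension are skipped
            transactions.append((start, items))
    return transactions


# Illustrative scores for three dimensions (keys are dimension numbers).
scores = {
    1: [0, 0, 0, 3, 0, 0],
    2: [0, 0, 0, 4, 0, 0],
    3: [0, 5, 0, 0, 0, 0],
}
print(scores_to_transactions(scores, window_size=3, threshold=2))
```

Each resulting item list is one transaction. Dimensions that are anomalous in the same windows repeatedly then show up as frequent itemsets in the mining step.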

SPMF Settings: settings to configure the chosen SPMF frequent itemset mining algorithm. Not all settings are used by all algorithms; refer to the documentation for more details!

spmf:
  algorithm: Apriori 
  min_support: 0.5%
  max_support: 1%
  min_support_count: 1 
  max_pattern_length: 3
  min_pattern_length: 1
  show_transaction_ids: False  
  high_utility_itemsets: False
  min_utility: 1
  empty_trans_replacement: 1000
  sort_input_items:
    enable: True 
    ascending: True
  replace_zero:  
    enable: True 
    replace_zero_with: 99
  jar_file: ./lib/spmf.jar
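
The sketch below shows how these settings could translate into an SPMF input file and command line. It rests on assumptions: the helper names are hypothetical, the zero replacement and sorting only mirror the `replace_zero` and `sort_input_items` options as their names suggest, and the relative-to-absolute support conversion assumes ceiling rounding (consistent with the logged `Minsup = 1` for 3 transactions at 0.5%). The command layout follows the one printed in the CLI output above. The output file name is made up for illustration.

```python
import math


def to_spmf_lines(transactions, replace_zero_with=99):
    """Format transactions as SPMF text lines: unique, ascending,
    space-separated integer items, one transaction per line."""
    lines = []
    for items in transactions:
        cleaned = sorted({replace_zero_with if item == 0 else item for item in items})
        lines.append(" ".join(str(item) for item in cleaned))
    return lines


def absolute_min_support(min_support, n_transactions):
    """Convert a percentage string such as '0.5%' into a transaction count."""
    fraction = float(min_support.rstrip("%")) / 100
    return math.ceil(fraction * n_transactions)


def build_spmf_command(jar_file, algorithm, input_path, output_path, min_support):
    """Build the argument list for an SPMF command-line run."""
    return ["java", "-jar", jar_file, "run", algorithm, input_path, output_path, min_support]


print(to_spmf_lines([[3, 0, 1], [2, 2]]))
print(absolute_min_support("0.5%", 3))
print(" ".join(build_spmf_command(
    "./lib/spmf.jar",
    "Apriori",
    "./data/processed/transaction_databases/context.txt",
    "./data/output/spmf/Apriori_toy_out.txt",  # hypothetical file name
    "0.5%",
)))
```

The resulting list can be passed to `subprocess.run` once Java and the SPMF jar are available.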

Data Settings: settings to prepare the data, set the dataset name here!

data:
  dataset_title: toy
  dataset_file_name: toy_data.csv
  dataset_gt_file_name: toy_data_GT.csv
  ground_truth_type: range
  dataset_path: ./data/raw/toy/
  spmf_output_path: ./data/output/spmf/
  final_output_path: ./data/output/final/
  processed_data_path: ./data/processed/
  transactions_path: ./data/processed/saved_transactions/
  scores_path: ./data/processed/saved_scores/
  transaction_db_path: ./data/processed/transaction_databases/
  label_pad_size: 10

Log Settings: settings to configure logging (log level and location).

logging:
  console_log_level: INFO
  log_dir: ./logs/
  log_file_prefix: dev
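
These fields map naturally onto Python's standard `logging` module. The sketch below is an assumption about how such a setup might look, not the repository's code. The format string is chosen to reproduce the layout of the log lines in the CLI output example, and the file naming scheme (`<prefix>_<run_tag>.log`) is hypothetical.

```python
import logging
import tempfile
from pathlib import Path

# Layout matching lines such as
# "2025-12-12 15:12:00,999 - mdtaim.utility - load_config - INFO - ..."
LOG_FORMAT = "%(asctime)s - %(name)s - %(funcName)s - %(levelname)s - %(message)s"


def build_logger(name, log_cfg, run_tag="run"):
    """Create a logger with a console handler and a file handler."""
    log_dir = Path(log_cfg["log_dir"])
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"{log_cfg['log_file_prefix']}_{run_tag}.log"  # hypothetical naming

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    formatter = logging.Formatter(LOG_FORMAT)

    console = logging.StreamHandler()
    console.setLevel(log_cfg["console_log_level"])  # level names like "INFO" are accepted
    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.DEBUG)
    for handler in (console, file_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger, log_file


def close_logger(logger):
    """Close and detach all handlers so that log files are released."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)


# Demo in a temporary directory instead of ./logs/.
with tempfile.TemporaryDirectory() as tmp:
    cfg = {"console_log_level": "INFO", "log_dir": tmp, "log_file_prefix": "dev"}
    demo_logger, demo_file = build_logger("mdtaim.demo", cfg)
    demo_logger.info("Config loaded successfully!")
    close_logger(demo_logger)
```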

Plot Settings: settings to configure the plotting output.

plot:
  output_path: ./plots/
  subplot_size: 160
  show_dataset: True
  show_matrixprofile: False
  show_kdp: False
  show_detected_anomalies_vs_gt: False
  show_final_output: True

📚 Libraries

We use MATRIX PROFILE to extract anomaly scores and SPMF for the various itemset mining algorithms. In addition, some code excerpts from the TSADIS paper implementation are used in the anomaly scoring module.

👤 Authorship & Citation

Author: Arman Behkish
Affiliation: Politecnico di Torino
Supervisor: Prof. Luca Cagliero

This repository contains the implementation of the Master's thesis "Multivariate Anomaly Detection Using Frequent Itemset Mining", Politecnico di Torino, 2025.

Thesis Document: https://webthesis.biblio.polito.it/35359/
Embargo Period: The thesis is under embargo until 11 April 2026

How to Cite

If you reference this work, please cite the thesis. GitHub will display citation formats (APA, BibTeX) via the "Cite this repository" button in the sidebar, or you can use the information from the CITATION.cff file.

Important: This code is provided under a restrictive license. Please review the LICENSE file before any use. The code may not be used in academic publications, commercial products, or derivative works without explicit permission.

📜 License

Copyright (c) 2025 Arman Behkish. All Rights Reserved.

This software is provided under a restrictive proprietary license. The code is provided for viewing and educational purposes only.

Prohibited without written permission:

  • Copying, modifying, or redistributing the code
  • Using the code or algorithms in academic publications
  • Using the code in commercial or industrial products
  • Creating derivative works

Note: The thesis document on the university portal is licensed under CC BY-NC-ND 3.0, while this code repository has a separate, more restrictive license.

See the LICENSE file for complete terms.
