Anomaly detection related books, papers, videos, and toolboxes

HIT-MSC/anomaly-detection-resources

 
 


Anomaly Detection Learning Resources


Outlier detection (also known as anomaly detection) is an exciting yet challenging field that aims to identify objects that deviate from the general data distribution. It has proven critical in many fields, such as credit card fraud analytics, network intrusion detection, and mechanical unit defect detection.

This repository collects:

  1. Books & Academic Papers
  2. Online Courses and Videos
  3. Outlier Datasets
  4. Open-source and Commercial Libraries/Toolkits
  5. Key Conferences & Journals
  6. Paper Downloader (under development): a Python script to download open-access papers listed in this repository.

More items will be added to the repository. Please feel free to suggest other key resources by opening an issue, submitting a pull request, or sending me an email (zhaoy@cmu.edu). Enjoy reading!




1. Books & Tutorials

1.1. Books

Outlier Analysis by Charu Aggarwal: Classical textbook covering most outlier analysis techniques. A must-read for people in the field of outlier detection. [Preview.pdf]

Outlier Ensembles: An Introduction by Charu Aggarwal and Saket Sathe: Great intro book for ensemble learning in outlier analysis.

Data Mining: Concepts and Techniques (3rd ed.) by Jiawei Han, Micheline Kamber, and Jian Pei: Chapter 12 discusses outlier detection with many key points. [Google Search]

1.2. Tutorials

| Tutorial Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| Outlier detection techniques | ACM SIGKDD | 2010 | [22] | [PDF] |
| Anomaly Detection: A Tutorial | ICDM | 2011 | [10] | [PDF] |
| Data mining for anomaly detection | PKDD | 2008 | [23] | [Video] |

2. Courses/Seminars/Videos

Coursera Introduction to Anomaly Detection (by IBM): [See Video]

Coursera Real-Time Cyber Threat Detection and Mitigation partly covers the topic: [See Video]

Coursera Machine Learning by Andrew Ng also partly covers the topic:

Udemy Outlier Detection Algorithms in Data Mining and Data Science: [See Video]

Stanford Data Mining for Cyber Security also covers part of anomaly detection techniques: [See Video]


3. Toolbox & Datasets

3.1. Multivariate Data

[Python] Python Outlier Detection (PyOD): PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. It contains more than 20 detection algorithms, including emerging deep learning models and outlier ensembles.

[Python] Scikit-learn Novelty and Outlier Detection. It supports some popular algorithms like LOF, Isolation Forest, and One-class SVM.

[Java] ELKI: Environment for Developing KDD-Applications Supported by Index-Structures: ELKI is open-source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection.

[Java] RapidMiner Anomaly Detection Extension: The Anomaly Detection Extension for RapidMiner comprises the most well-known unsupervised anomaly detection algorithms, assigning individual anomaly scores to data rows of example sets. It allows you to find data that is significantly different from the normal, without the need for the data to be labeled.

[R] outliers package: A collection of some tests commonly used for identifying outliers in R.

[Matlab] Anomaly Detection Toolbox - Beta: A collection of popular outlier detection algorithms in Matlab.
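
As a minimal sketch of the scikit-learn entry above, the three algorithms it names (LOF, Isolation Forest, and One-class SVM) can be run side by side through the shared `fit_predict` interface. The synthetic data, the planted outlier at (10, 10), and the hyperparameter values are assumptions chosen for illustration, not recommendations from this repository.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Assumed toy data: 200 Gaussian inliers plus one planted outlier.
rng = np.random.RandomState(42)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outlier = np.array([[10.0, 10.0]])
X = np.vstack([inliers, outlier])

# Hyperparameters below are illustrative assumptions.
detectors = {
    "LOF": LocalOutlierFactor(n_neighbors=20),
    "IsolationForest": IsolationForest(random_state=42),
    "OneClassSVM": OneClassSVM(nu=0.05, gamma="scale"),
}

# fit_predict returns +1 for inliers and -1 for outliers.
labels = {name: det.fit_predict(X) for name, det in detectors.items()}

for name, y in labels.items():
    print(name, "flagged", int((y == -1).sum()), "of", len(X), "points")
```

Each detector uses its own decision threshold, so the number of flagged points will generally differ between them; comparing raw scores rather than labels is often more informative.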

3.2. Time series outlier detection

[Python] datastream.io: An open-source framework for real-time anomaly detection using Python, Elasticsearch and Kibana.

[Python] skyline: Skyline is a near real time anomaly detection system.

[Python] banpei: Banpei is a Python package for anomaly detection.

[Python] telemanom: A framework for using LSTMs to detect anomalies in multivariate time series data.

[R] AnomalyDetection: AnomalyDetection is an open-source R package for detecting anomalies that is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend.

3.3. Datasets

ELKI Outlier Datasets: https://elki-project.github.io/datasets/outlier

Outlier Detection DataSets (ODDS): http://odds.cs.stonybrook.edu/#table1

Unsupervised Anomaly Detection Dataverse: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OPQMVF

Anomaly Detection Meta-Analysis Benchmarks: https://ir.library.oregonstate.edu/concern/datasets/47429f155


4. Papers

4.1. Overview & Survey Papers

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| A survey of outlier detection methodologies | ARTIF INTELL REV | 2004 | [20] | [PDF] |
| Anomaly detection: A survey | CSUR | 2009 | [9] | [PDF] |
| On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study | DMKD | 2016 | [7] | [HTML], [SLIDES] |
| A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data | PLOS ONE | 2016 | [18] | [PDF] |
| Research Issues in Outlier Detection | Book Chapter | 2019 | [43] | [HTML] |
| Quantitative comparison of unsupervised anomaly detection algorithms for intrusion detection | SAC | 2019 | [16] | [HTML] |

4.2. Key Algorithms

| Abbreviation | Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- | --- |
| kNN | Efficient algorithms for mining outliers from large data sets | ACM SIGMOD Record | 2000 | [38] | [PDF] |
| kNN | Fast outlier detection in high dimensional spaces | PKDD | 2002 | [5] | [PDF] |
| LOF | LOF: identifying density-based local outliers | ACM SIGMOD Record | 2000 | [6] | [PDF] |
| IForest | Isolation forest | ICDM | 2008 | [26] | [PDF] |
| OCSVM | Estimating the support of a high-dimensional distribution | Neural Computation | 2001 | [42] | [PDF] |
| AutoEncoder Ensemble | Outlier detection with autoencoder ensembles | SDM | 2017 | [11] | [PDF] |
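
To make the first entry in the table concrete, here is a brute-force sketch of the kNN-distance score from [38], where each point is scored by the distance to its k-th nearest neighbor and the n highest-scoring points are reported as outliers. The function names, the toy points, and the O(n^2) search are illustrative assumptions; the paper itself describes more efficient algorithms for large data sets.

```python
import math


def kth_nn_distance_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point (brute force, O(n^2) overall).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_outliers(points, k=3, n=1):
    """Return the indices of the n points with the largest k-th NN distance."""
    scores = kth_nn_distance_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Hypothetical toy data: five clustered points and one far-away point.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.0), (5.0, 5.0)]
print(top_n_outliers(points, k=2, n=1))  # index of the point (5.0, 5.0)
```

A larger k makes the score less sensitive to small clusters of outliers that sit close to one another, at the cost of blurring local structure.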

4.3. Graph & Network Outlier Detection

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| Graph based anomaly detection and description: a survey | DMKD | 2015 | [4] | [PDF] |
| Anomaly detection in dynamic networks: a survey | WIREs Computational Statistics | 2015 | [39] | [PDF] |

4.4. Time Series Outlier Detection

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| Outlier detection for temporal data: A survey | TKDE | 2014 | [19] | [PDF] |
| Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding | KDD | 2018 | [21] | [PDF], [Code] |

4.5. Feature Selection in Outlier Detection

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| Unsupervised feature selection for outlier detection by modelling hierarchical value-feature couplings | ICDM | 2016 | [33] | [PDF] |
| Learning homophily couplings from non-iid data for joint feature selection and noise-resilient outlier detection | IJCAI | 2017 | [34] | [PDF] |

4.6. High-dimensional & Subspace Outliers

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| A survey on unsupervised outlier detection in high-dimensional numerical data | Stat Anal Data Min | 2012 | [50] | [HTML] |
| Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection | SIGKDD | 2018 | [35] | [PDF] |
| Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection | TKDE | 2015 | [37] | [PDF], [SLIDES] |
| Outlier detection for high-dimensional data | Biometrika | 2015 | [40] | [PDF] |

4.7. Outlier Ensembles

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| Outlier ensembles: position paper | SIGKDD Explorations | 2013 | [2] | [PDF] |
| Ensembles for unsupervised outlier detection: challenges and research questions a position paper | SIGKDD Explorations | 2014 | [51] | [PDF] |
| An Unsupervised Boosting Strategy for Outlier Detection Ensembles | PAKDD | 2018 | [8] | [HTML] |
| LSCP: Locally selective combination in parallel outlier ensembles | SDM | 2019 | [49] | [PDF] |

4.8. Outlier Detection in Evolving Data

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| A Survey on Anomaly detection in Evolving Data: [with Application to Forest Fire Risk Prediction] | SIGKDD Explorations | 2018 | [41] | [PDF] |
| Outlier Detection in Feature-Evolving Data Streams | SIGKDD | 2018 | [30] | [PDF], [Github] |

4.9. Representation Learning in Outlier Detection

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection | SIGKDD | 2018 | [35] | [PDF] |
| Learning representations for outlier detection on a budget | Preprint | 2015 | [31] | [PDF] |
| XGBOD: improving supervised outlier detection with unsupervised representation learning | IJCNN | 2018 | [48] | [PDF] |

4.10. Interpretability

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| Explaining Anomalies in Groups with Characterizing Subspace Rules | DMKD | 2018 | [29] | [PDF] |
| Beyond Outlier Detection: LookOut for Pictorial Explanation | ECML-PKDD | 2018 | [32] | [PDF] |
| Contextual outlier interpretation | IJCAI | 2018 | [27] | [PDF] |
| Mining multidimensional contextual outliers from categorical relational data | IDA | 2015 | [44] | [PDF] |
| Discriminative features for identifying and interpreting outliers | ICDE | 2014 | [12] | [PDF] |

4.11. Social Media Anomaly Detection

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| A survey on social media anomaly detection | SIGKDD Explorations | 2016 | [47] | [PDF] |
| GLAD: group anomaly detection in social media analysis | TKDD | 2015 | [46] | [PDF] |

4.12. Outlier Detection in Other fields

Kannan, R., Woo, H., Aggarwal, C.C. and Park, H., 2017, June. Outlier detection for text data. In Proceedings of the 2017 SIAM International Conference on Data Mining (pp. 489-497). Society for Industrial and Applied Mathematics. [PDF]

4.13. Outlier Detection Applications

| Field | Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- | --- |
| Security | A survey of distance and similarity measures used within network intrusion anomaly detection | IEEE Commun. Surv. Tutor. | 2015 | [45] | [PDF] |
| Security | Anomaly-based network intrusion detection: Techniques, systems and challenges | Computers & Security | 2009 | [17] | [PDF] |
| Finance | A survey of anomaly detection techniques in financial domain | Future Gener Comput Syst | 2016 | [3] | [PDF] |
| Traffic | Outlier Detection in Urban Traffic Data | WIMS | 2018 | [15] | [HTML] |

4.14. Active Anomaly Detection

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| Active learning for anomaly and rare-category detection | NeurIPS | 2005 | [36] | [PDF] |
| Outlier detection by active learning | SIGKDD | 2006 | [1] | [PDF] |
| Active Anomaly Detection via Ensembles: Insights, Algorithms, and Interpretability | Preprint | 2019 | [13] | [PDF] |

4.15. Outlier Detection with Neural Networks

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding | KDD | 2018 | [21] | [PDF], [Code] |
| MAD-GAN: Multivariate Anomaly Detection for Time Series Data with Generative Adversarial Networks | Preprint | 2019 | [25] | [PDF], [Code] |
| Generative Adversarial Active Learning for Unsupervised Outlier Detection | TKDE | 2019 | [28] | [PDF], [Code] |
| Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection | ICLR | 2018 | [52] | [PDF], [Code] |

4.16. Interactive Outlier Detection

| Paper Title | Venue | Year | Ref | Materials |
| --- | --- | --- | --- | --- |
| Learning On-the-Job to Re-rank Anomalies from Top-1 Feedback | SDM | 2019 | [24] | [PDF] |
| Interactive anomaly detection on attributed networks | WSDM | 2019 | [14] | [PDF] |

5. Key Conferences/Workshops/Journals

5.1. Conferences & Workshops

Key data mining conference deadlines, historical acceptance rates, and more can be found at data-mining-conferences.

ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). Note: SIGKDD usually has an Outlier Detection Workshop (ODD), see ODD 2018.

ACM International Conference on Management of Data (SIGMOD)

The Web Conference (WWW)

IEEE International Conference on Data Mining (ICDM)

SIAM International Conference on Data Mining (SDM)

IEEE International Conference on Data Engineering (ICDE)

ACM International Conference on Information and Knowledge Management (CIKM)

ACM International Conference on Web Search and Data Mining (WSDM)

The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)

The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)

5.2. Journals

ACM Transactions on Knowledge Discovery from Data (TKDD)

IEEE Transactions on Knowledge and Data Engineering (TKDE)

ACM SIGKDD Explorations Newsletter

Data Mining and Knowledge Discovery

Knowledge and Information Systems (KAIS)


References

[1]Abe, N., Zadrozny, B. and Langford, J., 2006, August. Outlier detection by active learning. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 504-509, ACM.
[2]Aggarwal, C.C., 2013. Outlier ensembles: position paper. ACM SIGKDD Explorations Newsletter, 14(2), pp.49-58.
[3]Ahmed, M., Mahmood, A.N. and Islam, M.R., 2016. A survey of anomaly detection techniques in financial domain. Future Generation Computer Systems, 55, pp.278-288.
[4]Akoglu, L., Tong, H. and Koutra, D., 2015. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery, 29(3), pp.626-688.
[5]Angiulli, F. and Pizzuti, C., 2002, August. Fast outlier detection in high dimensional spaces. In European Conference on Principles of Data Mining and Knowledge Discovery, pp. 15-27.
[6]Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. ACM SIGMOD Record, 29(2), pp. 93-104.
[7]Campos, G.O., Zimek, A., Sander, J., Campello, R.J., Micenková, B., Schubert, E., Assent, I. and Houle, M.E., 2016. On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Mining and Knowledge Discovery, 30(4), pp.891-927.
[8]Campos, G.O., Zimek, A. and Meira, W., 2018, June. An Unsupervised Boosting Strategy for Outlier Detection Ensembles. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 564-576). Springer, Cham.
[9]Chandola, V., Banerjee, A. and Kumar, V., 2009. Anomaly detection: A survey. ACM Computing Surveys, 41(3), p.15.
[10]Chawla, S. and Chandola, V., 2011, Anomaly Detection: A Tutorial. Tutorial at ICDM 2011.
[11]Chen, J., Sathe, S., Aggarwal, C. and Turaga, D., 2017, June. Outlier detection with autoencoder ensembles. SIAM International Conference on Data Mining, pp. 90-98. Society for Industrial and Applied Mathematics.
[12]Dang, X.H., Assent, I., Ng, R.T., Zimek, A. and Schubert, E., 2014, March. Discriminative features for identifying and interpreting outliers. In International Conference on Data Engineering (ICDE). IEEE.
[13]Das, S., Islam, M.R., Jayakodi, N.K. and Doppa, J.R., 2019. Active Anomaly Detection via Ensembles: Insights, Algorithms, and Interpretability. arXiv preprint arXiv:1901.08930.
[14]Ding, K., Li, J. and Liu, H., 2019, January. Interactive anomaly detection on attributed networks. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 357-365. ACM.
[15]Djenouri, Y. and Zimek, A., 2018, June. Outlier detection in urban traffic data. In Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics. ACM.
[16]Falcão, F., Zoppi, T., Silva, C.B.V., Santos, A., Fonseca, B., Ceccarelli, A. and Bondavalli, A., 2019, April. Quantitative comparison of unsupervised anomaly detection algorithms for intrusion detection. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, (pp. 318-327). ACM.
[17]Garcia-Teodoro, P., Diaz-Verdejo, J., Maciá-Fernández, G. and Vázquez, E., 2009. Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security, 28(1-2), pp.18-28.
[18]Goldstein, M. and Uchida, S., 2016. A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PloS one, 11(4), p.e0152173.
[19]Gupta, M., Gao, J., Aggarwal, C.C. and Han, J., 2014. Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering, 26(9), pp.2250-2267.
[20]Hodge, V. and Austin, J., 2004. A survey of outlier detection methodologies. Artificial intelligence review, 22(2), pp.85-126.
[21]Hundman, K., Constantinou, V., Laporte, C., Colwell, I. and Soderstrom, T., 2018, July. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (pp. 387-395). ACM.
[22]Kriegel, H.P., Kröger, P. and Zimek, A., 2010. Outlier detection techniques. Tutorial at ACM SIGKDD 2010.
[23]Lazarevic, A., Banerjee, A., Chandola, V., Kumar, V. and Srivastava, J., 2008, September. Data mining for anomaly detection. Tutorial at ECML PKDD 2008.
[24]Lamba, H. and Akoglu, L., 2019, May. Learning On-the-Job to Re-rank Anomalies from Top-1 Feedback. In Proceedings of the 2019 SIAM International Conference on Data Mining (SDM), pp. 612-620. Society for Industrial and Applied Mathematics.
[25]Li, D., Chen, D., Shi, L., Jin, B., Goh, J. and Ng, S.K., 2019. MAD-GAN: Multivariate Anomaly Detection for Time Series Data with Generative Adversarial Networks. arXiv preprint arXiv:1901.04997.
[26]Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In International Conference on Data Mining, pp. 413-422. IEEE.
[27]Liu, N., Shin, D. and Hu, X., 2018. Contextual outlier interpretation. In International Joint Conference on Artificial Intelligence (IJCAI-18), pp.2461-2467.
[28]Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M. and He, X., 2019. Generative Adversarial Active Learning for Unsupervised Outlier Detection. IEEE transactions on knowledge and data engineering.
[29]Macha, M. and Akoglu, L., 2018. Explaining anomalies in groups with characterizing subspace rules. Data Mining and Knowledge Discovery, 32(5), pp.1444-1480.
[30]Manzoor, E., Lamba, H. and Akoglu, L. Outlier Detection in Feature-Evolving Data Streams. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data mining (KDD). 2018.
[31]Micenková, B., McWilliams, B. and Assent, I., 2015. Learning representations for outlier detection on a budget. arXiv preprint arXiv:1507.08104.
[32]Gupta, N., Eswaran, D., Shah, N., Akoglu, L. and Faloutsos, C., Beyond Outlier Detection: LookOut for Pictorial Explanation. ECML PKDD 2018.
[33]Pang, G., Cao, L., Chen, L. and Liu, H., 2016, December. Unsupervised feature selection for outlier detection by modelling hierarchical value-feature couplings. In Data Mining (ICDM), 2016 IEEE 16th International Conference on (pp. 410-419). IEEE.
[34]Pang, G., Cao, L., Chen, L. and Liu, H., 2017, August. Learning homophily couplings from non-iid data for joint feature selection and noise-resilient outlier detection. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 2585-2591). AAAI Press.
[35]Pang, G., Cao, L., Chen, L. and Liu, H., 2018. Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data mining (KDD). 2018.
[36]Pelleg, D. and Moore, A.W., 2005. Active learning for anomaly and rare-category detection. In Advances in neural information processing systems, pp. 1073-1080.
[37]Radovanović, M., Nanopoulos, A. and Ivanović, M., 2015. Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE transactions on knowledge and data engineering, 27(5), pp.1369-1382.
[38]Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. ACM SIGMOD Record, 29(2), pp. 427-438.
[39]Ranshous, S., Shen, S., Koutra, D., Harenberg, S., Faloutsos, C. and Samatova, N.F., 2015. Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics, 7(3), pp.223-247.
[40]Ro, K., Zou, C., Wang, Z. and Yin, G., 2015. Outlier detection for high-dimensional data. Biometrika, 102(3), pp.589-599.
[41]Salehi, M. and Rashidi, L., 2018. A Survey on Anomaly detection in Evolving Data: [with Application to Forest Fire Risk Prediction]. ACM SIGKDD Explorations Newsletter, 20, pp.13-23.
[42]Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J. and Williamson, R.C., 2001. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), pp.1443-1471.
[43]Suri, N.R. and Athithan, G., 2019. Research Issues in Outlier Detection. In Outlier Detection: Techniques and Applications, pp. 29-51. Springer, Cham.
[44]Tang, G., Pei, J., Bailey, J. and Dong, G., 2015. Mining multidimensional contextual outliers from categorical relational data. Intelligent Data Analysis, 19(5), pp.1171-1192.
[45]Weller-Fahy, D.J., Borghetti, B.J. and Sodemann, A.A., 2015. A survey of distance and similarity measures used within network intrusion anomaly detection. IEEE Communications Surveys & Tutorials, 17(1), pp.70-91.
[46]Yu, R., He, X. and Liu, Y., 2015. GLAD: group anomaly detection in social media analysis. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(2), p.18.
[47]Yu, R., Qiu, H., Wen, Z., Lin, C. and Liu, Y., 2016. A survey on social media anomaly detection. ACM SIGKDD Explorations Newsletter, 18(1), pp.1-14.
[48]Zhao, Y. and Hryniewicki, M.K., 2018, July. XGBOD: improving supervised outlier detection with unsupervised representation learning. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE.
[49]Zhao, Y., Nasrullah, Z., Hryniewicki, M.K. and Li, Z., 2019, May. LSCP: Locally selective combination in parallel outlier ensembles. In Proceedings of the 2019 SIAM International Conference on Data Mining (SDM), pp. 585-593. Society for Industrial and Applied Mathematics.
[50]Zimek, A., Schubert, E. and Kriegel, H.P., 2012. A survey on unsupervised outlier detection in high‐dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5), pp.363-387.
[51]Zimek, A., Campello, R.J. and Sander, J., 2014. Ensembles for unsupervised outlier detection: challenges and research questions a position paper. ACM Sigkdd Explorations Newsletter, 15(1), pp.11-22.
[52]Zong, B., Song, Q., Min, M.R., Cheng, W., Lumezanu, C., Cho, D. and Chen, H., 2018. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. International Conference on Learning Representations (ICLR).
