
Effectiveness of various methods of detecting global, conditional, and collective outliers in minute interval stock prices of Nikkei 225 constituents over two years


JamesSullivan/stock_intraday_anomaly_detection_comparison


Comparison of methods for Anomaly Detection in Intraday Individual Stock Prices

This project compares the effectiveness of anomaly detection methods on intraday prices of individual stocks. The methods compared are: statistical methods (ARIMA and Histogram); proximity, clustering, and angle-based methods (k-NN Distance, Local Outlier Factor, Clustering-Based Local Outlier Factor, and Angle-Based Outlier Detection); classification-based methods (Isolation Forest and One-Class SVM); a dimensionality-reduction method (Principal Component Analysis); and neural-network methods (Autoencoders and Deep Support Vector Data Description).
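
To make one of the proximity methods concrete, here is a minimal pure-Python sketch of k-NN distance scoring: each point is scored by the distance to its k-th nearest neighbour, and the highest-scoring fraction is flagged. The function names, the `contamination` parameter, and the toy feature vectors are assumptions for illustration; this is not the code used in the notebooks.

```python
import math

def knn_distance_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbour."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def flag_outliers(scores, contamination=0.1):
    """Flag roughly the top `contamination` fraction of scores as outliers."""
    n_flagged = max(1, round(contamination * len(scores)))
    threshold = sorted(scores, reverse=True)[n_flagged - 1]
    return [s >= threshold for s in scores]

# Hypothetical, already-scaled (price, volume) feature vectors.
# The last point is deliberately placed far from the rest.
points = [(1.00, 1.0), (1.01, 1.1), (0.99, 0.9), (1.02, 1.0),
          (0.98, 1.1), (1.00, 0.9), (1.01, 1.0), (0.99, 1.0),
          (1.00, 1.1), (1.60, 3.0)]

scores = knn_distance_scores(points, k=3)
flags = flag_outliers(scores, contamination=0.1)
print([i for i, is_outlier in enumerate(flags) if is_outlier])
```

The brute-force neighbour search is quadratic in the number of points, so a sketch like this only suits small samples; minute data over two years would need an indexed neighbour search.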

Stock price time series generally come without “normal” or “outlier” labels for training and validation, so an unsupervised learning method is required. Unsupervised anomaly detection methods implicitly assume that normal objects are somewhat “clustered.” Because no ground-truth labels exist, we first try to detect all three types of outliers (global, conditional (also called contextual), and collective) in entirely artificial series. We then inject outliers into actual minute-interval prices of Nikkei 225 constituents covering two years. The PDF file AnomalyDetectionIntraDayStockPrices.pdf compares the effectiveness of the different anomaly detection methods.
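
The three outlier types can be illustrated on a synthetic series. The sketch below is a hypothetical example, not the API of utils/outlier.py: the function names, parameters, and the sine-plus-noise base series are all assumptions chosen to show what distinguishes each type.

```python
import math
import random

def inject_global(series, index, magnitude):
    """Global outlier: a single value outside the range of the whole series."""
    out = list(series)
    out[index] = max(series) + magnitude
    return out

def inject_contextual(series, index, donor_index):
    """Conditional (contextual) outlier: a value that is ordinary for the
    series overall but abnormal at this position. Here the value is copied
    from a different phase of the cycle."""
    out = list(series)
    out[index] = series[donor_index]
    return out

def inject_collective(series, start, length):
    """Collective outlier: a run of individually plausible values that is
    anomalous as a group. Here the series is frozen at one value."""
    out = list(series)
    length = min(length, len(series) - start)  # never extend the series
    out[start:start + length] = [series[start]] * length
    return out

# Synthetic "price": a 60-step cycle around 100 with small Gaussian noise
rng = random.Random(0)
base = [100 + 5 * math.sin(2 * math.pi * t / 60) + rng.gauss(0, 0.2)
        for t in range(240)]

with_global = inject_global(base, index=50, magnitude=10.0)
with_contextual = inject_contextual(base, index=45, donor_index=15)  # trough gets a peak value
with_collective = inject_collective(base, start=100, length=20)
```

Because the injection positions are chosen by the caller, they double as ground-truth labels when scoring a detector.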

 

folder structure

data/prices/ - raw minute price data in csv format (not included due to licensing)

| Date | HIGH | LOW | OPEN | CLOSE | COUNT | VOLUME |
|---|---|---|---|---|---|---|
| 2020-11-24 00:04:00 | 9430 | 9392 | 9415 | 9410 | 397 | 729900 |
| 2020-11-24 00:05:00 | 9453 | 9409 | 9409 | 9446 | 320 | 169200 |
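
A quick way to sanity-check bars with these columns is to parse them and confirm that OPEN and CLOSE fall between LOW and HIGH. The sketch below uses only the standard library and the two sample rows shown above. The comma-separated layout with a header row is an assumption, since the raw files are not included, and `read_bars` and `is_consistent` are hypothetical helper names.

```python
import csv
import io
from datetime import datetime

# Assumed layout: comma-separated with a header row (not confirmed by the repo)
SAMPLE_CSV = """Date,HIGH,LOW,OPEN,CLOSE,COUNT,VOLUME
2020-11-24 00:04:00,9430,9392,9415,9410,397,729900
2020-11-24 00:05:00,9453,9409,9409,9446,320,169200
"""

def read_bars(text):
    """Parse minute bars into dicts with typed fields."""
    bars = []
    for row in csv.DictReader(io.StringIO(text)):
        bar = {"Date": datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S")}
        for field in ("HIGH", "LOW", "OPEN", "CLOSE"):
            bar[field] = float(row[field])
        for field in ("COUNT", "VOLUME"):
            bar[field] = int(row[field])
        bars.append(bar)
    return bars

def is_consistent(bar):
    """A bar is consistent when OPEN and CLOSE both lie within [LOW, HIGH]."""
    lowest = min(bar["OPEN"], bar["CLOSE"])
    highest = max(bar["OPEN"], bar["CLOSE"])
    return bar["LOW"] <= lowest and highest <= bar["HIGH"]

bars = read_bars(SAMPLE_CSV)
bad = [b["Date"] for b in bars if not is_consistent(b)]
print(len(bars), "bars,", len(bad), "inconsistent")
```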

data/results/ - anomaly detection results in csv format

| Anomaly | Model | Stock | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|
| Collective | cluster | 9432.T | 0.9993 | 0.8477 | 1 | 0.9176 |
| Collective | DeepSVDD | 6098.T | 0.9992 | 0.8249 | 1 | 0.9040 |
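
The four metric columns follow from comparing predicted flags with the positions of the injected outliers. The sketch below shows the standard definitions in pure Python; the `evaluate` and `f1_from` names and the toy labels are assumptions, and the notebooks may compute these metrics differently.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def evaluate(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = outlier)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }

# Toy labels: 5 injected outliers at the end; the detector flags 7 points,
# catching all 5 outliers plus 2 false alarms.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 93 + [1] * 7
metrics = evaluate(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

As a consistency check, `f1_from(0.8477, 1.0)` rounds to 0.9176, matching the first sample row. The rows also show why accuracy alone says little here: with so few outliers, accuracy stays near 1 even when precision is noticeably lower.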

data/Nikkei225.csv - Nikkei 225 Constituents & TSI

| | Instrument | Organization Name | TSE33 Subsector name |
|---|---|---|---|
| 0 | 2802.T | Ajinomoto Co Inc | Foods |
| 1 | 9202.T | ANA Holdings Inc | Air Transportation |

data/TSE33.csv - TOPIX Sector Indices (TSI) and Market Data Codes

| TSI | QUICK_PR | QUICK_TR | BBG_TR | BBG_PR | REFINITIV_PR | REFINITIV_TR |
|---|---|---|---|---|---|---|
| Fishery, Agriculture & Forestry | 321 | S321/TSX | TPFISH | TPXDFISH | .IFISH.T | .IFISHDV.T |
| Foods | 322 | S322/TSX | TPFOOD | TPXDFOOD | .IFOOD.T | .IFOODDV.T |

data/eikon_api.py - class for downloading data from the Refinitiv Eikon API (requires a license) and storing it in the data/prices/ folder

data/jpx.py - class for preprocessing and exposing Japan Exchange Group price data (the data must already be in the data folder; see data/eikon_api.py)

utils/outlier.py - Code for injecting outliers into lists, series, and dataframes

   

data_examination.ipynb - investigating stock and TSI price data

main_test.ipynb - main notebook for generating results from 10 anomaly detection methods

arima.ipynb - investigating and running ARIMA anomaly detection method

results.txt - results generated from main_test.ipynb and arima.ipynb. These should be appended manually to data/results/results.txt.

visualization_2d_and_3dTSNE.ipynb - 2-D visualizations of model results, plus a t-SNE plot for viewing high-dimensional data

visualization_3d_interactive.ipynb - interactive 3-D Visualization of Anomaly Detection of Global, Contextual, and Collective Outliers
