Application for preprocessing, analysis and anomaly detection of US stock market data on multiple clusters using Apache Spark

anjanatiha/Distributed-Big-Data-Application-for-Large-Scale-US-Stock-Market-Data-Analysis

Big Data Processing and Analysis of US Stock Market Data

Domain : Big Data, Data Analytics, Distributed Computing, Financial Data Analytics, Finance.
Sub-Domain : Distributed Big Data Analysis, Distributed Big Data Analytics, Multi-market Analysis, Anomaly Detection.
Techniques : Distributed Computing, Anomaly Detection.
Task : Preprocessing Financial Data, Multi-market Analysis, Anomaly Detection.

Description

  1. Developed an architecture for preprocessing and analyzing 7 years of historical US stock market data (50 TB).
  2. Extracted fields from raw data files using the field-length specifications available from the US stock exchange, covering multiple years and file formats, with timestamps at nanosecond granularity.
  3. Preprocessed and analyzed the data on multiple clusters with Apache Spark to reduce processing time.
  4. Conducted stock market data analysis (multi-market analysis of market dominance) and anomaly detection (the flash crash days of May 6, 2010 and August 24, 2015), and generated visualizations and a report.
  5. Proposed unsupervised learning (clustering) on large-scale, unlabeled stock market data for anomaly detection and general market analysis.
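
The field extraction in item 2 can be illustrated with a minimal Java sketch. It is not the repository's code: the class name `FixedWidthParser`, the field names and widths, and the `HHMMSS` plus nine-digit fraction timestamp format are all assumptions chosen for illustration, not the actual exchange specification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: splits a fixed-width record into named fields.
class FixedWidthParser {
    // Hypothetical layout (field name -> width in characters), kept in insertion order.
    private final LinkedHashMap<String, Integer> layout = new LinkedHashMap<>();

    // Appends a field to the layout; returns this so calls can be chained.
    FixedWidthParser field(String name, int width) {
        layout.put(name, width);
        return this;
    }

    // Cuts the line at the configured widths and trims padding from each field.
    Map<String, String> parse(String line) {
        Map<String, String> record = new LinkedHashMap<>();
        int offset = 0;
        for (Map.Entry<String, Integer> f : layout.entrySet()) {
            int end = offset + f.getValue();
            if (end > line.length()) {
                throw new IllegalArgumentException(
                    "Line too short: need " + end + " chars, got " + line.length());
            }
            record.put(f.getKey(), line.substring(offset, end).trim());
            offset = end;
        }
        return record;
    }

    // Converts an assumed HHMMSS + 9-digit fraction timestamp to nanoseconds since midnight.
    static long nanosSinceMidnight(String ts) {
        if (ts.length() != 15) {
            throw new IllegalArgumentException("Expected 15 digits, got " + ts.length());
        }
        long hours = Long.parseLong(ts.substring(0, 2));
        long minutes = Long.parseLong(ts.substring(2, 4));
        long seconds = Long.parseLong(ts.substring(4, 6));
        long fraction = Long.parseLong(ts.substring(6, 15));
        return ((hours * 60 + minutes) * 60 + seconds) * 1_000_000_000L + fraction;
    }

    public static void main(String[] args) {
        // Made-up record and layout, for demonstration only.
        FixedWidthParser parser = new FixedWidthParser()
            .field("time", 15).field("symbol", 6).field("price", 8);
        Map<String, String> record = parser.parse("093000123456789AAPL  00123450");
        System.out.println(record);
        System.out.println(nanosSinceMidnight(record.get("time")));
    }
}
```

Because the layout is plain data, a different year or file format would only need a different sequence of `field(...)` calls. A `long` is used for the timestamp because nanoseconds since midnight do not fit in an `int`.
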
Languages : Java, Apache Spark
Tools/IDE : IntelliJ IDEA, Java SDK/JDK, Java JRE, Git, Maven, Linux
Libraries : Apache Spark, PySpark
Duration : May - July 2017
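
The idea of detecting anomalies without labels, as proposed in the Description, can be sketched in plain Java. This example swaps in a simpler unsupervised technique, the median/MAD modified z-score, rather than clustering, and it is not the method used in this repository. The class name `MadOutliers`, the 3.5 threshold, and the sample values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: flags outliers with the median/MAD modified z-score.
class MadOutliers {
    // Median of a copy of the input, so the caller's array order is preserved.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Returns the indices whose modified z-score exceeds the threshold.
    static List<Integer> flag(double[] values, double threshold) {
        List<Integer> flagged = new ArrayList<>();
        if (values.length == 0) {
            return flagged;
        }
        double med = median(values);
        double[] deviations = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            deviations[i] = Math.abs(values[i] - med);
        }
        double mad = median(deviations);
        if (mad == 0.0) {
            return flagged; // no spread around the median, so nothing can be scored
        }
        for (int i = 0; i < values.length; i++) {
            // 0.6745 rescales the MAD so scores are comparable to standard z-scores.
            double score = 0.6745 * deviations[i] / mad;
            if (score > threshold) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up per-interval percentage price changes, not real market data.
        double[] changes = {0.1, -0.2, 0.15, -0.1, 0.05, -9.0, 0.2};
        System.out.println(flag(changes, 3.5));
    }
}
```

The median and MAD are used instead of the mean and standard deviation because a single extreme interval, such as a flash crash, would inflate the standard deviation and mask itself. On data at the scale described here, the same scoring would have to run as a distributed job rather than on an in-memory array.
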

Current Version : v1.0.0.0

Last Update : 07.31.2017
