Blog: anshgandhi.wordpress.com
We concluded the project using the Random Forest model from PySpark's DataFrame-based ML library. We also built a second implementation that uses an Isolation Forest instead of a Random Forest. Isolation Forest is an unsupervised learning technique, so it does not need labeled training data. PySpark does not have a built-in Isolation Forest, so we implemented that version in R.
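To give a feel for what an Isolation Forest does, here is a minimal sketch in plain Python. It is not our R implementation, and every function name in it (`build_tree`, `fit_forest`, `anomaly_score`, and so on) is made up for illustration. The toy data is synthetic, not the data set from the project. The idea is that points which can be separated from the rest with only a few random splits get a score closer to 1, and points buried inside a dense cluster get a lower score.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # every value is identical on this feature, so stop here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "split" not in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees; assumes data holds at least two points."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(max(sample_size, 2)))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(point, trees, sample_size):
    """Score in (0, 1); higher means the point is easier to isolate."""
    mean_depth = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(sample_size))


# Synthetic demo: one dense cluster plus a single far-away point.
demo_rng = random.Random(42)
data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
data.append((8.0, 8.0))

trees, sample_size = fit_forest(data)
outlier_score = anomaly_score((8.0, 8.0), trees, sample_size)
inlier_score = anomaly_score((0.0, 0.0), trees, sample_size)
print("outlier:", outlier_score, "inlier:", inlier_score)
```

Note that no labels are passed anywhere, which is what makes the method unsupervised. For real work you would reach for an existing library rather than this sketch.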
The data set that we used is available at:
Webpage
Dataset
Isolation Forest Link