Skip to content

Kyziridis/Distributed-Data-Mining-on-Amazon-Data-

Repository files navigation

Distributed-Data-Mining

Amazon Data

Estimate and predict the rating according to the comments of each item in distributed fashion.
All scripts are under Amazon Project where is the report as well (Predicting Ratings of Amazon Products Using Review Texts.pdf).
It is basically a benchmark between some classification and regression algorithms.
The benchmark was tested in a cluster of 20 old-machines of LIACS. There is also a file Dask_Demo.ipynb which is a mini tutorial about DASK, an open source python library for Distributed Data Analysis.

Methods

  • TF-IDF for feature extraction in distributed fahsion using 'pdsh'
  • Logistic Regression
  • Linear Regression
  • Multilayer Perceptron
  • Naive Bayes Classifier

Download and Enjoy
Support GNU/Linux >_