Guide to Modifying the Code for Various Modeling Methods

Regardless of which attributes are used for modeling, the overall flow stays the same: read in the data, process it into a modeling format, split it into training and testing sets, convert each record to a LabeledPoint object, and pass the training data to the SVM.
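The flow above can be sketched in PySpark roughly as follows. This is a minimal sketch, not the project's own code: it assumes every CSV column is numeric, that the label is in the first column and is 0 or 1 (the label values MLlib's `SVMWithSGD` expects), and that the HDFS directory is passed on the command line. The names `parse_line` and `main`, the 70/30 split, the seed, and the iteration count are all illustrative choices.

```python
import sys


def parse_line(line, label_index=0):
    """Convert one CSV line into a (label, features) tuple.

    Assumes every column is numeric and the 0/1 label sits in column
    `label_index`; adjust both for the attributes you model on.
    """
    values = [float(v) for v in line.strip().split(",")]
    label = values[label_index]
    features = values[:label_index] + values[label_index + 1:]
    return label, features


def main(csv_dir):
    # Spark imports live inside main so parse_line can be reused without Spark.
    from pyspark import SparkContext
    from pyspark.mllib.classification import SVMWithSGD
    from pyspark.mllib.regression import LabeledPoint

    sc = SparkContext(appName="svm-sketch")

    # 1. Read in the data: textFile on a directory reads the files inside it.
    raw = sc.textFile(csv_dir)

    # 2. Process into modeling format: (label, feature list) tuples.
    rows = raw.map(parse_line)

    # 3. Split into training and testing sets (70/30 is an arbitrary choice).
    train, test = rows.randomSplit([0.7, 0.3], seed=42)

    # 4. Convert each tuple to a LabeledPoint, the input type MLlib's SVM expects.
    train_lp = train.map(lambda r: LabeledPoint(r[0], r[1]))
    test_lp = test.map(lambda r: LabeledPoint(r[0], r[1]))

    # 5. Pass the training set to the SVM (trained with stochastic gradient descent).
    model = SVMWithSGD.train(train_lp, iterations=100)

    # Evaluate: fraction of test points whose predicted label matches the true one.
    pairs = test_lp.map(lambda p: (p.label, model.predict(p.features)))
    total = pairs.count()
    correct = pairs.filter(lambda lp: lp[0] == lp[1]).count()
    if total:
        print("Test accuracy: %.3f" % (correct / float(total)))
    sc.stop()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: spark-submit svm_sketch.py <hdfs-dir-with-csv-files>
    main(sys.argv[1])
```

It could be launched with something like `spark-submit svm_sketch.py hdfs:///path/to/csv_dir`, adding `--master yarn` when running on YARN; the script name and path are placeholders. Keeping `parse_line` separate means switching to a different attribute set should only require changing how each line is turned into a label and a feature list.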

Dependencies

  • Apache Spark
  • NumPy (required for Spark's SVM to run)
  • YARN (optional)

Other Notes

  • Make sure all the CSV files are in the same directory on HDFS.
  • Change the code that reads the CSV files so that it points to the correct directory on HDFS.
  • Each code chunk works at a different level of data granularity. Edit the code to match the granularity or data set you consider appropriate.
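Because Spark's `textFile` reads every file in a directory into a single RDD, pointing the code at one HDFS directory is enough to pick up all the CSV files. If those files each start with a header row, though, every one of those headers ends up in the data. A minimal sketch of filtering them out, assuming data rows are purely numeric; `is_data_line` and `read_csv_dir` are hypothetical helpers, not part of the original code:

```python
def is_data_line(line):
    """Return True when every comma-separated field parses as a number.

    Assumes data rows are all-numeric, so a header row such as
    "label,f1,f2" or an empty line fails the check and is dropped.
    """
    try:
        for field in line.strip().split(","):
            float(field)
    except ValueError:
        return False
    return True


def read_csv_dir(sc, csv_dir):
    """Read all files in one HDFS directory as a single RDD of data lines.

    `sc` is an existing SparkContext; `csv_dir` is the HDFS directory that
    holds every CSV file (e.g. "hdfs:///path/to/csv_dir", a placeholder).
    """
    # If the files carry header rows, each file contributes one; drop them all.
    return sc.textFile(csv_dir).filter(is_data_line)
```

With this in place, changing the data location means editing only the `csv_dir` argument.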