- Process data into Weka ARFF format
- Train a classifier using Weka
- Test your classifier against test data
- Download training data tarball from http://dojo.v.wc1.atti.com/data/train
- Weka accepts ARFF files: http://www.cs.waikato.ac.nz/ml/weka/arff.html
- Each training or test file becomes a vector of features
- You can experiment with the values, or ranges of values, of your features directly in the ARFF file
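As a rough illustration of turning feature vectors into ARFF, here is a minimal sketch. The feature names (`word_count`, `exclamations`), the class labels (`spam`, `ham`) and the helper `to_arff` are hypothetical and are not part of the dojo data. It also assumes attribute names contain no spaces, so no quoting is needed.

```python
def to_arff(relation, numeric_features, class_labels, rows):
    """Render feature vectors as ARFF text.

    rows is a list of (feature_dict, label) pairs, one per input file.
    """
    lines = [f"@relation {relation}", ""]
    for name in numeric_features:
        lines.append(f"@attribute {name} numeric")
    # The class attribute is nominal: its allowed values are listed in braces.
    lines.append("@attribute class {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    for features, label in rows:
        # Features missing from a row default to 0 (an assumption of this sketch).
        values = [str(features.get(name, 0)) for name in numeric_features]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"


# Hypothetical feature vectors for two training files.
example = to_arff(
    "dojo",
    ["word_count", "exclamations"],
    ["spam", "ham"],
    [({"word_count": 120, "exclamations": 4}, "spam"),
     ({"word_count": 80}, "ham")],
)
print(example)
```

Each row in the `@data` section lists the feature values in the same order as the `@attribute` declarations, with the class label last.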
- POST to http://dojo.v.wc1.atti.com/classify/train with the following params:
- name (your name)
- classifier (Weka classifier class name, see list below)
- featuresTrain (your ARFF file)
- See https://github.com/a34729t/coding-dojo-ml/blob/master/client.rb for a sample client
- You get back a JSON response with statistics on how well you performed
- Check your performance vs others at http://dojo.v.wc1.atti.com/rank
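The training submission described above might be sketched as follows, using only the Python standard library. This is not the sample client: it assumes the server accepts plain form-encoded fields with the ARFF text sent inline, which may not match what `client.rb` does, so treat that client as the authority. The helper `build_training_request` and the name `alice` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

TRAIN_URL = "http://dojo.v.wc1.atti.com/classify/train"


def build_training_request(name, classifier, arff_text):
    """Build (but do not send) the POST request for a training submission."""
    # Assumption: fields are form-encoded and the ARFF content is sent as text.
    fields = {
        "name": name,
        "classifier": classifier,
        "featuresTrain": arff_text,
    }
    body = urlencode(fields).encode("utf-8")
    return Request(TRAIN_URL, data=body, method="POST")


request = build_training_request(
    "alice", "weka.classifiers.bayes.NaiveBayes", "@relation dojo\n@data\n"
)
```

To actually submit, pass `request` to `urllib.request.urlopen` and decode the response body with `json.loads`.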
- After a certain period of time we will allow you to access the test data
- Download test data tarball from http://dojo.v.wc1.atti.com/data/test
- Process the test data with the same features you use for training data
- POST to http://dojo.v.wc1.atti.com/classify/train with params:
- name
- classifier
- featuresTrain
- featuresTest
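Because the test data must be processed with the same features as the training data, a quick check before submitting is to compare the `@attribute` declarations of the two ARFF files. The sketch below is one way to do that; the helper names and the sample ARFF strings are hypothetical. It compares declarations after collapsing whitespace, so it is deliberately strict about everything else, including attribute order.

```python
def attribute_declarations(arff_text):
    """Return the @attribute lines of an ARFF document, in order."""
    declarations = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        # ARFF keywords are case-insensitive; stop at the data section.
        if stripped.lower().startswith("@data"):
            break
        if stripped.lower().startswith("@attribute"):
            # Collapse runs of whitespace so spacing differences do not matter.
            declarations.append(" ".join(stripped.split()))
    return declarations


def same_schema(train_arff, test_arff):
    """True if both files declare the same attributes in the same order."""
    return attribute_declarations(train_arff) == attribute_declarations(test_arff)


# Hypothetical ARFF contents.
train = "@relation train\n@attribute word_count numeric\n@attribute class {spam,ham}\n@data\n120,spam\n"
test_ok = "@relation test\n@attribute  word_count   numeric\n@attribute class {spam,ham}\n@data\n95,ham\n"
test_bad = "@relation test\n@attribute class {spam,ham}\n@data\nham\n"

matches = same_schema(train, test_ok)
mismatch = same_schema(train, test_bad)
```

Generating both files from a single shared list of feature names, rather than checking afterwards, avoids the mismatch in the first place.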
- Start with Naive Bayes and the decision tree (J48), as they give you good feedback about which features are useful
- Machine learning is mostly about quality data and features; which model you use is much less important
- weka.classifiers.bayes.NaiveBayes
- weka.classifiers.trees.J48
- weka.classifiers.functions.Logistic
- weka.classifiers.lazy.KStar
- weka.classifiers.rules.JRip
- weka.classifiers.functions.SMO