New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Issue #144 - Re-implement classifiers using SMILE library #146
Conversation
}); | ||
String modelFilePath = outputPath + "/pageclassifier.model"; | ||
ArffParser arffParser = new ArffParser(); | ||
arffParser.setResponseIndex(7486); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this magic number 7486?
// instanceWeka.setDataset(instances); | ||
// double[] prob = classifier.distributionForInstance(instanceWeka); | ||
double[] prob = new double[2]; | ||
int predictedValue = ((SVM) classifier).predict(values, prob); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be using SoftClassifier interface instead of Classifier, then you wouldn't need to do a cast, and the code would work with any classifier algorithm implementation. Down casting is usually a code smell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change that and test again.
vectorAtt.addElement(new weka.core.Attribute("class", classAtt)); | ||
Instances insts = new Instances("link_classification", vectorAtt, 1); | ||
insts.setClassIndex(attributes.length); | ||
// weka.core.FastVector vectorAtt = new weka.core.FastVector(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove all old, comment code when the PR is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A left a few inline comments that need to be resolved. Besides these, there are only two more things left:
- The docs need to be updated (http://ache.readthedocs.io/en/latest/page-classifiers.html#pageclassifier-weka)
- The code changes are using the formatting is using tabs instead of 4 spaces. Need to use 4 spaces for tabs and format according the google style guide (https://github.com/ViDA-NYU/ache#contributing).
Classifier classifier = new weka.classifiers.functions.SMO(); | ||
classifier.setOptions(weka.core.Utils.splitOptions("-C 1.0 -L 0.0010 -P 1.0E-12 -N 0 -M -V -1 -W 1 -K \"weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0\" -no-cv")); | ||
classifier.buildClassifier(data); | ||
classifier.learn(smileInput.x(), y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Platt's scalling needs to be trained here too, otherwise scores will not be valid!
weka.core.Instance instanceWeka = new weka.core.Instance(1, values); | ||
instanceWeka.setDataset(instances); | ||
result = classifier.distributionForInstance(instanceWeka); | ||
((SVM<double[]>) classifier).predict(values, result); //predict returns int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cast to SVM needs to be removed too!
@@ -39,7 +39,8 @@ dependencies { | |||
compile 'org.apache.tika:tika-parsers:1.14' | |||
compile 'com.syncthemall:boilerpipe:1.2.2' | |||
compile 'net.sourceforge.nekohtml:nekohtml:1.9.22' | |||
compile 'nz.ac.waikato.cms.weka:weka-stable:3.6.13' | |||
//compile 'nz.ac.waikato.cms.weka:weka-stable:3.6.13' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember to remove this commented line.
Replaces all classifiers from Weka by SMILE. Fixes #144 .