peterwilliams97/weka_tools

A set of Jython tools for performing data mining tasks with Weka.

http://bit.ly/weka_tools

Requires Jython and Weka.

Uses the UCI soybean data set of Michalski and Chilausky.

Originally developed for a class assignment.

Summary

  1. **setup.bat** Shows how to set up the classpath to use Weka from Jython.
  2. **preprocess_soybeans.py** Pre-processes the soybean data set.
  3. **find_best_attributes.py** Finds the subset of attributes that gives the best classification accuracy for a given algorithm and data set.
  4. **arff.py** Weka .arff file reader and writer.
  5. **split_data.py** Splits a Weka .arff file so that the class distribution is preserved and the aggregate accuracy of a set of classifiers is maximized or minimized. Outputs two Weka .arff files.
  6. **find_soybean_split.bat / find_soybean_split.sh** Shows how to run split_data.py on a pre-processed soybean .arff file.
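
The idea of splitting a data set while preserving class distribution can be sketched in plain Python. This is an illustrative stratified split only, not the code in split_data.py, which additionally searches for a split that maximizes or minimizes classifier accuracy. The function name `stratified_split`, its parameters, and the toy rows are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(rows, class_index=-1, test_fraction=0.2, seed=0):
    """Return (train, test) so that each class keeps roughly the same
    share of instances in both parts. Hypothetical helper, not from the repo."""
    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_index]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        # Take the same fraction of every class for the test set.
        n_test = int(round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy data (not the soybean set): 10 rows of one class, 5 of another.
rows = [["a%d" % i, "healthy"] for i in range(10)]
rows += [["b%d" % i, "diseased"] for i in range(5)]
train, test = stratified_split(rows, test_fraction=0.2)
```

With this toy input, the 2:1 ratio between the two classes is the same in the training and test parts.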

Results are in the data directory.

Example use of split_data.py

The batch/shell file find_soybean_split.bat / find_soybean_split.sh runs split_data.py on soybean-large.data.missing.values.replaced.arff to create the training file soybean-large.data.missing.values.replaced.best.train.arff and the test file soybean-large.data.missing.values.replaced.best.test.arff. The classification results for this split are in soybean.split.results.txt and are summarized here:

| Classifier | Correct (out of 60) | Percentage Correct |
| --- | --- | --- |
| NaiveBayes | 57 | 95 % |
| J48 | 58 | 96.67 % |
| BayesNet | 59 | 98.33 % |
| RandomForest | 59 | 98.33 % |
| JRip | 60 | 100 % |
| KStar | 60 | 100 % |
| SMO | 60 | 100 % |
| MLP | 60 | 100 % |
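
The Percentage Correct column is the Correct count divided by the 60 test instances. A small check of that arithmetic, using the counts from the summary; the helper name `percent_correct` is made up for this example:

```python
TOTAL = 60  # test instances, from the "out of 60" column header

# Correct counts copied from the summary.
correct_counts = [
    ("NaiveBayes", 57), ("J48", 58), ("BayesNet", 59), ("RandomForest", 59),
    ("JRip", 60), ("KStar", 60), ("SMO", 60), ("MLP", 60),
]


def percent_correct(correct, total=TOTAL):
    """Percentage of correctly classified instances."""
    return 100.0 * correct / total


for name, correct in correct_counts:
    print("%-12s %2d  %.2f %%" % (name, correct, percent_correct(correct)))
```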
