A set of Jython tools for performing data mining tasks with Weka.

Needs Jython and Weka.

Uses the Michalski and Chilausky soybean data set from the UCI Machine Learning Repository.

Originally developed for a class assignment.

Summary

* **setup.bat** Shows how to set up the classpath to use Weka from Jython
* **preprocess_soybeans.py** Pre-processes the soybean data set
* **find_best_attributes.py** Finds the subset of attributes that gives the best classification accuracy for a given algorithm and data set
* **arff.py** Weka .arff file reader and writer
* **split_data.py** Splits a Weka .arff file into two .arff files, preserving the class distribution while maximizing or minimizing the aggregate accuracy of a set of classifiers
* **find_soybean_split.bat / find_soybean_split.sh** Shows how to run split_data.py on a pre-processed soybean .arff file
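
To illustrate what "preserve class distribution" means for split_data.py, here is a minimal sketch of a stratified split in plain Python. It is not the code from split_data.py, and it leaves out the classifier-accuracy optimization entirely. The function name `stratified_split` and its parameters are hypothetical, chosen only for this example.

```python
import random
from collections import defaultdict


def stratified_split(rows, class_index=-1, fraction=0.5, seed=0):
    """Split rows into two lists that keep roughly the same class proportions.

    Hypothetical helper for illustration only; not taken from split_data.py.
    rows        -- list of instances, each a list of attribute values
    class_index -- position of the class attribute in each row
    fraction    -- share of each class that goes into the first list
    """
    # Group instances by class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_index]].append(row)

    rng = random.Random(seed)
    first, second = [], []
    # Visit classes in sorted order so the result is reproducible.
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        cut = int(round(len(group) * fraction))
        first.extend(group[:cut])
        second.extend(group[cut:])
    return first, second
```

Splitting each class separately, rather than shuffling the whole data set once, is what keeps rare classes represented in both output files.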

Results are in the data directory.
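
For a quick look at a .arff file without starting Weka, a minimal reader along the following lines may be enough. This is a sketch, not the implementation in arff.py: the function name `read_arff` is hypothetical, and it assumes dense (non-sparse) data with no quoted values that contain commas or spaces.

```python
def read_arff(text):
    """Parse a dense ARFF string into (relation, attributes, rows).

    Hypothetical helper for illustration only; not taken from arff.py.
    Values are returned as strings, and missing values stay as '?'.
    """
    relation = None
    attributes = []  # list of (name, type_string) pairs
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '%' comment lines.
        if not line or line.startswith('%'):
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(',')])
            continue
        lower = line.lower()
        if lower.startswith('@relation'):
            relation = line.split(None, 1)[1]
        elif lower.startswith('@attribute'):
            # Assumes the attribute name contains no spaces.
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith('@data'):
            in_data = True
    return relation, attributes, rows
```

Usage, assuming a file path of your own choosing: `relation, attributes, rows = read_arff(open(path).read())`.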
