Python version: 3.7
- numpy 1.16.4 (for general (vectorized) computations)
- matplotlib 3.0.3 (for the plots)
- tkinterx 0.0.9 (for the GUI)
- scikit-learn 0.23.1 (for comparison with an existing implementation - not used for our own computations)
- Execute main.py - a GUI should open
- Select a data directory containing the datasets in the same format as the provided ones (./data is assumed by default)
- Now you can adjust various parameters of the classification algorithm: the maximum value of k to be tried, the partition count l, and the search algorithm
- Then either select a single dataset and run the classification algorithm for it (a dialog opens once the algorithm has finished), or run the classification for all datasets. In the latter case the results are printed to the console; in the former you get multiple plots and some data in the GUI.
- Available nearest-neighbor search algorithms are brute sorting and a k-d tree; the latter, however, is not sufficiently optimized.
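The brute-sorting search mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code; the function name and signature are our own assumptions:

```python
import numpy as np

def knn_predict_brute(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    Brute-sort variant: compute all distances to the training points,
    sort them, and take the first k.
    """
    dists = np.linalg.norm(train_X - query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # indices of the k closest points
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority label
```

A k-d tree replaces the full sort with a spatial search, which in principle avoids touching every training point per query; as noted above, the tree variant here is not yet tuned to realize that advantage.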
We implemented a special feature for 2D datasets with the brute-sort algorithm: it evaluates the classifier produced by the algorithm not only on the test data, but also on a 100x100 grid over [0, 1] x [0, 1]. This allows the user to better understand what the decision function generated by the program looks like. Note that if you enable the grid, it is displayed instead of the test-data plot. When running the classification for all datasets, the grid setting is ignored.
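The grid evaluation can be sketched as below, assuming only that the trained classifier maps a 2D point to a class label (the helper name and `resolution` parameter are illustrative, not the project's API):

```python
import numpy as np

def evaluate_on_grid(classify, resolution=100):
    """Evaluate a trained classifier on a resolution x resolution grid over [0, 1]^2.

    `classify` is assumed to map a 2D point to a class label. The resulting
    label matrix can be shown with matplotlib (e.g. imshow or pcolormesh)
    to visualize the decision regions.
    """
    xs = np.linspace(0.0, 1.0, resolution)
    ys = np.linspace(0.0, 1.0, resolution)
    # labels[i, j] is the predicted class at (xs[j], ys[i])
    labels = np.array([[classify(np.array([x, y])) for x in xs] for y in ys])
    return xs, ys, labels
```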