KNN With Spark

A PySpark implementation of k-nearest neighbors (KNN). The classifier was evaluated on two separate UCI datasets: Iris (https://archive.ics.uci.edu/ml/datasets/iris) and Fertility (https://archive.ics.uci.edu/ml/datasets/Fertility). The data was first normalized, also using PySpark, and Euclidean distance was used as the distance measure between points. The best-performing k for both datasets was 5, which gave a test accuracy of 97% on the Iris dataset and 88% on the Fertility dataset.
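The pipeline described above (normalize, measure Euclidean distance, vote among the k nearest neighbors, pick k by accuracy) can be sketched in plain Python. This is a minimal illustration and not the contents of `knn_pyspark.py`. It assumes min-max scaling, since the README does not say which normalization was used. All function names and the toy data are hypothetical, and it avoids a Spark dependency so it runs anywhere. In PySpark the per-row steps would typically become map operations over an RDD or DataFrame.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Rescale every feature column to [0, 1] (assumed normalization scheme)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        # Constant columns carry no information, so map them to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k training points closest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: euclidean(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, q, k) == y
               for q, y in zip(test_x, test_y))
    return hits / len(test_y)


def best_k(train_x, train_y, val_x, val_y, candidates):
    """Pick the candidate k with the highest accuracy (first one wins ties)."""
    return max(candidates,
               key=lambda k: accuracy(train_x, train_y, val_x, val_y, k))


# Toy data, made up for illustration only (not the Iris or Fertility data).
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
            [8.0, 80.0], [9.0, 90.0], [10.0, 100.0]]
labels = ["a", "a", "a", "b", "b", "b"]

scaled = min_max_normalize(features)
prediction = knn_predict(scaled, labels, [0.1, 0.1], k=3)
chosen_k = best_k(scaled, labels, [[0.0, 0.0], [1.0, 1.0]], ["a", "b"], [1, 3, 5])
```

Normalizing before computing distances keeps a feature with a large numeric range from dominating the Euclidean distance. Note that the query points here are already expressed on the scaled range. With real data, the same scaling parameters would need to be applied to the test rows.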

Files: README.md, knn_pyspark.py

ZachPetroff/KNN-With-Spark
