bugracoskun/K-NN

K-NN

Investigating the k-NN performance in Postgres and MongoDB on New York City's taxi dataset.

About Project

This project applies k-nearest-neighbour (k-NN) queries in two database management systems, PostgreSQL and MongoDB. A Python class was created to facilitate the performance comparison between them. The New York taxi data set is used; it is publicly available. The accuracy of the query results is compared against the Haversine and Vincenty formulas, which calculate the distance between two points on the Earth.
A sample data set has also been uploaded as a GeoJSON file. It can be imported directly into MongoDB and queried through the Python class.
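
To make the idea of a k-NN query concrete, here is a minimal sketch of how such a query might be phrased for each database. It is not taken from the project's Python class: the table name `trips`, the columns `id` and `geom`, and the document field `location` are hypothetical, and the sketch assumes a PostGIS-enabled PostgreSQL table and a MongoDB collection with a `2dsphere` index. Only the query objects are built here; no database connection is opened.

```python
def mongo_knn_filter(lon, lat, field="location"):
    """Build a MongoDB filter that sorts documents by distance to (lon, lat).

    Assumes `field` (hypothetical name) holds GeoJSON points and has a
    2dsphere index. The number of neighbours k is applied separately,
    for example with cursor.limit(k).
    """
    return {
        field: {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]}
            }
        }
    }


def postgis_knn_sql(table="trips", geom_col="geom"):
    """Build a parameterised PostGIS k-NN query string.

    `table`, `geom_col` and the `id` column are hypothetical names and must
    be trusted constants, since they are interpolated into the SQL text.
    The three %s placeholders are longitude, latitude and k.
    """
    return (
        f"SELECT id FROM {table} "
        f"ORDER BY {geom_col} <-> ST_SetSRID(ST_MakePoint(%s, %s), 4326) "
        f"LIMIT %s"
    )


if __name__ == "__main__":
    print(mongo_knn_filter(-73.98, 40.75))
    print(postgis_knn_sql())
```

With a driver such as `pymongo` or `psycopg2`, the filter would be passed to `find(...)` followed by `limit(k)`, and the SQL string would be executed with the parameters `(lon, lat, k)`.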

The implementation and results of this project were submitted to the "International Workshop on Collaborative Crowdsourced Cloud Mapping and Geospatial Big Data".

Publication: Coşkun, İ. B., Sertok, S., and Anbaroğlu, B.: K-NEAREST NEIGHBOUR QUERY PERFORMANCE ANALYSES ON A LARGE SCALE TAXI DATASET: POSTGRESQL VS. MONGODB, Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLII-2/W13, 1531-1538, https://doi.org/10.5194/isprs-archives-XLII-2-W13-1531-2019, 2019.

Haversine and Vincenty

The Haversine formula models the Earth as a sphere and calculates the great-circle distance between two points on it. The Vincenty formula models the Earth as an ellipsoid, so its parameters depend on the chosen reference ellipsoid. In this project, the WGS84 ellipsoid parameters are used.
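
As an illustration, the Haversine distance can be sketched in a few lines of Python and used as a brute-force reference for checking k-NN results. This is an assumption-laden sketch rather than the project's own code: the function names, the `(lat, lon)` tuple layout and the mean Earth radius of 6371008.8 m are choices made here. The Vincenty formula is iterative and is not shown.

```python
import heapq
import math

# Mean Earth radius in metres (an assumption of this sketch; the spherical
# model has no single "correct" radius).
EARTH_RADIUS_M = 6371008.8


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def k_nearest(query, points, k):
    """Return the k points closest to `query`; all points are (lat, lon)."""
    return heapq.nsmallest(
        k, points,
        key=lambda p: haversine(query[0], query[1], p[0], p[1]),
    )


if __name__ == "__main__":
    pickups = [(40.0, -74.0), (41.0, -74.0), (40.1, -74.0)]
    print(k_nearest((40.0, -74.0), pickups, 2))
```

A brute-force scan like this visits every point, so it only suits small samples; its value is as an independent check on the neighbours returned by a database query.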
