# Comparison of Four Geocoders: Nominatim, GoogleV3, ArcGIS and AzureMaps
----

## Why I set up this comparison:
In another application, I was using the bounding boxes of the New York City boroughs to impute missing borough names for records with geolocation information. 
The most expert GIS users among you would certainly predict scattershot results from such a "corner-cutting" approach, but at first I thought mine was a great way to avoid over 85,000 requests... Of course, once I discovered the official territorial boundaries (shapefiles), I trashed the box solution!  
In the meantime, however, I had checked several services for speed and usage limits, and I found that some geocoders returned different responses to an identical query, so I investigated!
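Why do bounding boxes give scattershot results? A box drawn around a borough also covers parts of neighboring boroughs (and water), so boxes overlap, whereas the official boundary polygons do not. The sketch below illustrates this with a toy L-shaped polygon (hypothetical coordinates, not real borough data) and the classic ray-casting point-in-polygon test; `in_bbox` and `in_polygon` are illustrative names, not functions from this repo.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)

def in_bbox(pt: Point, poly: List[Point]) -> bool:
    """True if pt lies inside the axis-aligned bounding box of poly."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs) <= pt[0] <= max(xs) and min(ys) <= pt[1] <= max(ys)

def in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Ray casting: for a point not on the boundary, a ray going right from pt
    crosses the polygon's edges an odd number of times iff pt is inside."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical L-shaped "borough" (toy coordinates).
L_SHAPE = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]

print(in_bbox((3, 3), L_SHAPE), in_polygon((3, 3), L_SHAPE))  # True False
```

The point (3, 3) sits in the empty corner of the L: inside the box, outside the shape. That is exactly the kind of record a bounding-box lookup would mislabel.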

The [**Procedures notebook** in the ./notebooks folder](./notebooks/Procedures.ipynb) shows how to call the functions that retrieve the data tables and maps.

## Implementation details:

### Geocoding services (via Geopy):  
The geolocation coordinates of a location specified by a query string can be obtained by calling a geocoding API directly (even from a browser's address bar) or through a wrapper library such as [geopy](https://geopy.readthedocs.io/en/stable/).
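For comparison, here is roughly what a direct call looks like: a minimal sketch that builds a Nominatim search URL you could paste into a browser's address bar. It assumes the public endpoint `https://nominatim.openstreetmap.org/search` with its `q`, `format` and `limit` parameters; `nominatim_url` is a hypothetical helper, not part of this repo.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"

def nominatim_url(query: str, limit: int = 1) -> str:
    """Build a Nominatim search URL (query string is percent-encoded)."""
    params = {"q": query, "format": "json", "limit": limit}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"

print(nominatim_url("Boston, MA, USA"))
# https://nominatim.openstreetmap.org/search?q=Boston%2C+MA%2C+USA&format=json&limit=1

# The geopy equivalent (needs network access):
# from geopy.geocoders import Nominatim
# Nominatim(user_agent="this_app").geocode("Boston, MA, USA")
```

The wrapper hides this URL building, the HTTP request and the JSON parsing, and exposes the same `geocode()` call for every provider.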

#### Links to each geocoder's geopy documentation and to its service provider:
*  [Nominatim](https://geopy.readthedocs.io/en/stable/#nominatim): [OpenStreetMap](https://wiki.openstreetmap.org/wiki/Nominatim)
*  [GoogleV3](https://geopy.readthedocs.io/en/stable/#googlev3): [Google Maps Geocoding API](https://developers.google.com/maps/documentation/geocoding/start)
*  [ArcGIS](https://geopy.readthedocs.io/en/stable/#ArcGis): [Esri ArcGIS World Geocoding Service](https://developers.arcgis.com/rest/geocode/api-reference/overview-world-geocoding-service.htm)
*  [AzureMaps](https://geopy.readthedocs.io/en/stable/#azuremaps): [Microsoft Azure Maps API](https://docs.microsoft.com/en-us/azure/azure-maps/index)

### Geocoder class setup (as in the `GeocoderComparison/comparison.get_geo_data()` function; `tout` is a 5-second timeout):
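```python
from geopy.geocoders import Nominatim, GoogleV3, ArcGIS, AzureMaps
tout = 5  # request timeout, in seconds
# GOOGLE_KEY and AZURE_KEY are your own API keys (not shown here).
# country_bias is a geopy 1.x constructor argument; geopy 2.x dropped it
# (pass country_codes=... to Nominatim.geocode() instead).
g = Nominatim(user_agent='this_app', country_bias='USA', timeout=tout)
g = GoogleV3(api_key=GOOGLE_KEY, timeout=tout)
g = ArcGIS(user_agent='this_app', timeout=tout)
g = AzureMaps(user_agent='this_app', subscription_key=AZURE_KEY, timeout=tout)
```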
To be fair, I should have removed the `country_bias` parameter for Nominatim, but my latest report is a temporal comparison between the data I obtained in September 2018 and the data I 'refreshed' in April 2019, so I kept the original setup. In any case, all query strings contain "USA", so every geocoder should parse the country information identically (and, presumably, use it to filter the search).

#### The strings passed to each geocoder:
```
0. 'New York City, NY, USA'
1. "Cleopatra's needle, Central Park, New York, NY, USA"
2. 'Bronx county, NY, USA'
3. 'Kings county, NY, USA'
4. 'New York county, NY, USA'
5. 'Queens county, NY, USA'
6. 'Richmond county, NY, USA'
7. 'Boston, MA, USA'
```
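A minimal sketch of how every query could be sent to every geocoder and the coordinates collected. It relies only on geopy's `geocode()` returning a `Location` (with `.latitude` and `.longitude`) or `None`; `geocode_all` is a hypothetical helper, not the repo's `get_geo_data()`, and the stub geocoder exists only so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple

Coords = Optional[Tuple[float, float]]  # (latitude, longitude), or None

QUERIES = [
    'New York City, NY, USA',
    "Cleopatra's needle, Central Park, New York, NY, USA",
    'Bronx county, NY, USA',
    'Kings county, NY, USA',
    'New York county, NY, USA',
    'Queens county, NY, USA',
    'Richmond county, NY, USA',
    'Boston, MA, USA',
]

def geocode_all(geocoders: Dict[str, Any], queries: List[str]) -> Dict[str, Dict[str, Coords]]:
    """Return {geocoder name: {query: (lat, lon) or None}}."""
    results: Dict[str, Dict[str, Coords]] = {}
    for name, g in geocoders.items():
        results[name] = {}
        for q in queries:
            loc = g.geocode(q)  # geopy: a Location, or None when nothing matches
            results[name][q] = None if loc is None else (loc.latitude, loc.longitude)
    return results

# Offline check with a hypothetical stub that "finds" every query at (0.0, 0.0):
stub = SimpleNamespace(geocode=lambda q: SimpleNamespace(latitude=0.0, longitude=0.0))
print(geocode_all({'stub': stub}, QUERIES[:1]))
# {'stub': {'New York City, NY, USA': (0.0, 0.0)}}

# Real use (network and API keys required), e.g.:
# from geopy.geocoders import Nominatim
# geocode_all({'Nominatim': Nominatim(user_agent='this_app', timeout=5)}, QUERIES)
```

In real runs, mind each provider's rate limits (Nominatim's usage policy allows at most one request per second); wrapping a geocoder as `SimpleNamespace(geocode=RateLimiter(g.geocode, min_delay_seconds=1))`, with `RateLimiter` from `geopy.extra.rate_limiter`, is one way to throttle it.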
I have to admit, the choice of queries was driven by curiosity, i.e. by the following questions: "How is NYC geographically defined?", "Is a (relatively, or locally) well-known monument present in every geocoder's database?", "Why would geocoders disagree on territorial boundaries when the official, legal definitions (shapefiles) are available?"

#### Shapefile sources:
* [New York City](https://data.cityofnewyork.us/City-Government/Borough-Boundaries-Water-Areas-Included-/tv64-9x69)
* [Boston](https://data.boston.gov/dataset/city-of-boston-boundary2)  


## The main conclusions from this comparison:

* Who is the best of all four?
  1. **Nominatim**: ⭐️ Star of the glorious open-source community (see the data on Cleopatra's Needle in Central Park);
  2. **GoogleV3**: not open source, but its results are similar to Nominatim's;
  3. ArcGIS: the least wrong of the laggards;
  4. AzureMaps: Oh come on! &#128534; "Extreme fuzziness is just extreme", methinks.  
  
* No hedging!  
  Depending on the geolocation service used AND the location queried, the coordinates can be WRONG; better not switch services!  
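To put a number on how far apart two services' answers are for the same query, one could compute the great-circle distance between the returned coordinates. A minimal haversine sketch, assuming a spherical Earth with mean radius 6371.0088 km; `haversine_km` is a hypothetical helper, not part of GeocoderComparison (geopy's own `geopy.distance.geodesic` gives the more accurate ellipsoidal distance).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)

def haversine_km(p1, p2):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, p1)
    lat2, lon2 = map(radians, p2)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# One degree of longitude along the equator is about 111.2 km.
print(round(haversine_km((0.0, 0.0), (0.0, 1.0)), 1))  # 111.2
```

Applied to the coordinates two geocoders return for the same query string, this yields a single figure, in kilometres, for how much they disagree.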


### *End of story?*
Out of curiosity, I wonder how AzureMaps would fare against all the other geocoders... Speaking of which:  
At the time of this report, **April 2019**, there are 21 geocoding services available in geopy (excluding [What3Words](https://geopy.readthedocs.io/en/stable/#what3words)):  
* The number of pairwise comparisons needed is 210.
* This would require **at least 34 more reports like this one**, which uses the Python code in GeocoderComparison to compare only four geocoders: each report covers C(4, 2) = 6 pairs, and 210 / 6 = 35.
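The pair counts are binomial coefficients, C(n, 2) = n(n - 1)/2. A quick check with Python's `math.comb` (Python 3.8+), taking the 21 geocoders counted above as given:

```python
from math import comb

n_geocoders = 21                      # geopy geocoders in April 2019, excluding What3Words
total_pairs = comb(n_geocoders, 2)    # every geocoder paired with every other one
pairs_per_report = comb(4, 2)         # distinct pairs among the four geocoders of one report
print(total_pairs, pairs_per_report)  # 210 6
```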


## Following is the complete report identifying the differences between the old and new data:

```python
from IPython.display import HTML
HTML(filename="./embedReport.html")
```