## NER and Nominatim

In [18]:
import os
from app.core.graphhopper import GraphHopper

## Problem


The query "buffer of 20 meters around daly city" fails because the NER model does not recognize "daly city" as a named place, so the chunk is never sent to Nominatim.

## Current system:

- The spaCy NER model tags each chunk as either a named place (an entity of type `LOC`, `GPE`, `ORG`, or `FAC`) or an unnamed place.
- If the chunk is an unnamed place:

  - Search OSM for matching values and return all of them.

- If the chunk is a named place:

  - Search Nominatim and return the top result, if one exists.
  - If Nominatim returns nothing, search OSM directly for matching values using some form of text similarity and return the top match.
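The routing described above can be sketched as follows. This is a minimal illustration, not the application's actual code: the function name `resolve_chunk` and the three search callables are hypothetical stand-ins for the real Nominatim and OSM lookups, and the NER label is passed in directly rather than computed by spaCy.

```python
from typing import Callable, List, Optional

# Entity labels treated as a "named place" in the list above.
NAMED_PLACE_LABELS = {"LOC", "GPE", "ORG", "FAC"}

# Hypothetical signature shared by all search backends: query -> list of hits.
Search = Callable[[str], List[dict]]


def resolve_chunk(
    text: str,
    ner_label: Optional[str],
    search_nominatim: Search,
    search_osm_all: Search,
    search_osm_similar: Search,
) -> List[dict]:
    """Route one chunk through the named / unnamed branches."""
    if ner_label not in NAMED_PLACE_LABELS:
        # Unnamed place: return every matching OSM value.
        return search_osm_all(text)

    # Named place: prefer the top Nominatim result.
    hits = search_nominatim(text)
    if hits:
        return hits[:1]

    # No Nominatim result: fall back to the best text-similarity match in OSM.
    return search_osm_similar(text)[:1]
```

Under this sketch, a chunk such as "daly city" that the NER model leaves untagged arrives with `ner_label=None` and takes the unnamed branch, which is the failure described in the Problem section.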

## Alternative:

- Search Nominatim and return the top result, if one exists.
- If Nominatim returns no results, fall back to searching OSM.
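A minimal sketch of this alternative, skipping the NER step entirely. The name `resolve_nominatim_first` and both search callables are hypothetical, and returning only the top Nominatim hit is an assumption made for illustration.

```python
from typing import Callable, List

# Hypothetical signature for a search backend: query -> list of hits.
Search = Callable[[str], List[dict]]


def resolve_nominatim_first(
    text: str,
    search_nominatim: Search,
    search_osm: Search,
) -> List[dict]:
    """Try Nominatim for every chunk; use OSM only when it returns nothing."""
    hits = search_nominatim(text)
    if hits:
        # Assumption: keep only the top-ranked Nominatim hit.
        return hits[:1]
    # Nominatim had no results: fall back to the OSM search.
    return search_osm(text)
```

Because the OSM search only runs when Nominatim returns an empty list, any query for which the geocoder finds some hit never reaches the OSM branch.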


## Pros and Cons

- Current system:

  - If the NER model mislabels a chunk, the system may return incorrect results because it misses the intended match.
  - However, the named place is still likely to be returned as one of the set of results.

- Alternative fallback:

  - Nominatim almost always returns a result, and it provides no uncertainty score that could be used to filter the top results.

## Proposal

- Keep the current system.
- Consider another type of fallback.


## Evaluation

In [13]:
client = GraphHopper(os.environ["APP_GRAPH_HOPPER__API_KEY"])

In [16]:
client.geocode("daly city")

{'hits': [{'point': {'lat': 37.6904826, 'lng': -122.47267},
   'extent': [-122.5008215, 37.6485198, -122.4051391, 37.708269],
   'name': 'Daly City, California, United States of America',
   'country': 'United States',
   'city': 'Daly City',
   'state': 'California',
   'county': 'San Mateo County',
   'osm_id': 112271,
   'osm_type': 'R',
   'osm_value': 'city'}],
 'locale': 'en'}

In [15]:
client.geocode("benches")

{'hits': [{'point': {'lat': 42.4416101, 'lng': -76.4985196},
   'extent': [-76.4986547, 42.4415577, -76.4983846, 42.4416948],
   'name': 'benches, City of Ithaca, NY, United States of America',
   'country': 'United States',
   'city': 'City of Ithaca',
   'state': 'New York',
   'county': 'Tompkins County',
   'street': 'benches',
   'osm_id': 995640278,
   'osm_type': 'W',
   'osm_value': 'road'}],
 'locale': 'en'}

In [17]:
client.geocode("park")

{'hits': [{'point': {'lat': 39.1089299, 'lng': -105.7561639},
   'extent': [-106.210206, 38.690659, -105.3288612, 39.5681797],
   'name': 'Park County, Colorado, United States of America',
   'country': 'United States',
   'state': 'Colorado',
   'county': 'Park County',
   'osm_id': 439376,
   'osm_type': 'R',
   'osm_value': 'county'}],
 'locale': 'en'}
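To compare the three responses side by side, the top hit of each can be reduced to its name and OSM value. The helper `top_hit_summary` below is hypothetical, and the `responses` dictionary is copied by hand from the outputs above, trimmed to the fields used.

```python
from typing import Optional, Tuple

# Responses copied from the cells above, trimmed to the fields used here.
responses = {
    "daly city": {"hits": [{"name": "Daly City, California, United States of America",
                            "osm_type": "R", "osm_value": "city"}]},
    "benches": {"hits": [{"name": "benches, City of Ithaca, NY, United States of America",
                          "osm_type": "W", "osm_value": "road"}]},
    "park": {"hits": [{"name": "Park County, Colorado, United States of America",
                       "osm_type": "R", "osm_value": "county"}]},
}


def top_hit_summary(response: dict) -> Optional[Tuple[str, str]]:
    """Return (name, osm_value) for the first hit, or None if there are no hits."""
    hits = response.get("hits", [])
    if not hits:
        return None
    top = hits[0]
    return top["name"], top["osm_value"]


summary = {query: top_hit_summary(resp) for query, resp in responses.items()}
for query, result in summary.items():
    print(f"{query!r}: {result}")
```

All three queries produce a hit: "daly city" resolves to the intended city, while the generic terms "benches" and "park" resolve to a road in Ithaca and a county in Colorado. This is the behavior noted under the alternative fallback: the geocoder returns something for unnamed places too, with nothing in the response to flag the match as unintended.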