New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated batch_geocode for dictionaries #42
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your taking a stab at this!
Looking it over, there are some untested cases with the new data structure which led to some bugs. Other than those, the approach of adding a new data structure is fine to me.
What might be more work is that the client entry-point function geocode
needs to also support a batch dict query vs just a list. It can get kinda complicated, as not only can a user query with a dict of address strings (that structurally resembles a singular components dict), but also a dict of dicts of components.
Maybe the change to geocode
could look something like this:
@protect_fields
def geocode(self, address_data=None, components_data=None, **kwargs):
"""
Returns geocoding data for <NEEDS_FIXING>
Provides a single point of access for end users.
"""
if address_data is None and components_data is None:
return None
use_components = components_data is not None and address_data is None
param_data = components_data if use_components else address_data
if isinstance(param_data, list) or (use_components and isinstance(param_data, dict) and all(
isinstance(c, dict) for c in param_data.values())):
return self.batch_geocode(param_data, **kwargs)
else:
param_key = 'components' if use_components else 'address'
kwargs.update({param_key: param_data})
return self.geocode_address(**kwargs)
All in all, there's an argument to be made that using a dict for your request with arbitrary key strings doesn't add any extra value to the lookup functionality of the original query address string or components dict that's already present.
Thanks for the feedback! Added the suggested changes above. To note - reverse geocoding functionality hasn't been touched although it should be updated accordingly if you want to include these changes. |
I originally mentioned changing it to work similarly for reverse geocoding, but a rough scan of the Geocod.io API docs shows that batch reverse geocoding doesn't support a dict of reference keys to coordinate pairs. So it should be alright to leave it unchanged, unless I missed something there... |
Fair point |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few minor requested changes + 1 medium-sized change (supporting default
in the get
function).
In addition to those, make sure to run flake8 + tox and fix any formatting + backwards compatibility issues. Ideally this can be setup as a precommit hook at some point...
geocodio/client.py
Outdated
param_key = 'address' if address_data is not None else 'components' | ||
kwargs.update({param_key: param_data}) | ||
return self.geocode_address(**kwargs) | ||
if (address_data is not None) == (components_data is not None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be and
vs ==
. If this was straight from my comment - whoops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of the behavior is that we want to return None
when either both of these inputs are None
or both of these inputs are not None
. If this is and
then we proceed even when both address_data
and components_data
is None
geocodio/data.py
Outdated
key = json.dumps(key) | ||
elif key in self.keys(): | ||
return super().get(key) | ||
|
||
return self[self.lookups[key]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we're still supporting supporting Python 2.7 as of right now. The expression super()
isn't compatible with Python 2.7 as mentioned here, so for backwards compatibility, it'd need to be super(LocationCollectionDict, self)
instead. You can test backwards compatibility and any breaking changes using tox.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python 2.7 support is now dropped
geocodio/data.py
Outdated
@@ -171,32 +171,35 @@ def get(self, key): | |||
Returns an individual Location by query lookup, e.g. address, components dict, or point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized looking it over, but as mentioned below, this get
function "overrides" the default dict get
function, so it should support the default
parameter of the default dict get as function overloading isn't pythonic and not really supported. To support the default
parameter, you might want to reorder the logic of lines 188-191 so that you first check if key
is in self.lookups
, and if so, set key = self.lookups[key]
and then perform the super.get(key, default)
with a propagated default
parameter.
This will require additional get
tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with a default value of KeyError
, not sure if it's better to default to None
as the parent dict get
method does
3a337a0
to
8f8d81e
Compare
… get behavior when passed a dictionary key
… method by allowing the default parameter
8f8d81e
to
cc18252
Compare
) | ||
|
||
# Case sensitive on the specific query | ||
self.assertEqual(KeyError, locations.get("3101 Patterson Ave, richmond, va")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to update the existing get
tests to assertEquals
on KeyError
from assertRaises
, as the python dict get
method avoids raising a KeyError
by default.
super(LocationCollectionDict, self).__init__(results) | ||
self.order = order | ||
|
||
def get(self, key, default=KeyError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LocationCollection
extends list
and has no "expected behavior" for its get
function, while LocationCollectionDict
has the expected behavior of returning default
if the key isn't there (and the default for default
is None
). Now, for parity and consistency, the __getitem__
magic method can be overridden for both LocationCollection
and LocationCollectionDict
in order to have that be the use-case of throwing a KeyError if a key isn't present, otherwise returning the value (note that for LocationCollection
, a numerical index would be a valid key in __getitem__
).
The usage would be:
x = LocationCollectionDict(...)
x[some_key] # -> KeyError if missing or value if not
# AND
y = LocationCollection(...)
y[some_key] # -> KeyError if missing or value if not
# OR
x.get(some_key, some_default) # -> some_default if key is missing, otherwise value
# AND
y.get(some_key, some_default) # -> some_default if key is missing, otherwise value
This way LocationCollectionDict
maintains the expected behavior of a dict, and LocationCollection has some parity with LocationCollectionDict
in how you can access elements using a key. I think this is the more intuitive approach for handling this. Other opinions are welcome.
This addresses #27
This is admittedly a very blunt approach to the problem: LocationCollectionDict is a version of LocationCollection that inherits from dictionary instead of list.
The batch geocoder endpoint for geocod.io accepts a JSON object.
Currently
batch_geocode
only accepts lists of addresses. When a python dictionary of keys and addresses is passed tobatch_geocode
, the geocod.io interprets it as a JSON object and returns a JSON object in its response. Theresults
field is now a JSON object rather than an array, which python interprets as a dictionary.LocationCollection expects a list from
response.json()["results"]
and understandably breaks. If the response is instead a dictionary, it gets rerouted to LocationCollectionDict to create a comparable dictionary object. LocationCollectionDict changes the list methods in LocationCollection to dictionary methods