Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated batch_geocode for dictionaries #42

Closed
wants to merge 6 commits into from

Conversation

liufran1
Copy link

This addresses #27

This is admittedly a very blunt approach to the problem: LocationCollectionDict is a version of LocationCollection that inherits from dictionary instead of list.

The batch geocoder endpoint for geocod.io accepts a JSON object.
Currently batch_geocode only accepts lists of addresses. When a python dictionary of keys and addresses is passed to batch_geocode, the geocod.io interprets it as a JSON object and returns a JSON object in its response. The results field is now a JSON object rather than an array, which python interprets as a dictionary.

LocationCollection expects a list from response.json()["results"] and understandably breaks. If the response is instead a dictionary, it gets rerouted to LocationCollectionDict to create a comparable dictionary object. LocationCollectionDict changes the list methods in LocationCollection to dictionary methods

Copy link
Collaborator

@Unix-Code Unix-Code left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your taking a stab at this!

Looking it over, there are some untested cases with the new data structure which led to some bugs. Other than those, the approach of adding a new data structure is fine to me.

What might be more work is that the client entry-point function geocode needs to also support a batch dict query vs just a list. It can get kinda complicated, as not only can a user query with a dict of address strings (that structurally resembles a singular components dict), but also a dict of dicts of components.

Maybe the change to geocode could look something like this:

@protect_fields
def geocode(self, address_data=None, components_data=None, **kwargs):
    """
    Returns geocoding data for <NEEDS_FIXING>

    Provides a single point of access for end users.
    """
    if address_data is None and components_data is None:
        return None
    
    use_components = components_data is not None and address_data is None
    param_data = components_data if use_components else address_data
    if isinstance(param_data, list) or (use_components and isinstance(param_data, dict) and all(
            isinstance(c, dict) for c in param_data.values())):
        return self.batch_geocode(param_data, **kwargs)
    else:
        param_key = 'components' if use_components else 'address'
        kwargs.update({param_key: param_data})
        return self.geocode_address(**kwargs)

All in all, there's an argument to be made that using a dict for your request with arbitrary key strings doesn't add any extra value to the lookup functionality of the original query address string or components dict that's already present.

geocodio/data.py Outdated Show resolved Hide resolved
tests/test_data.py Show resolved Hide resolved
tests/response/batch_components_dict.json Outdated Show resolved Hide resolved
geocodio/data.py Outdated Show resolved Hide resolved
geocodio/data.py Outdated Show resolved Hide resolved
@liufran1
Copy link
Author

Thanks for the feedback! Added the suggested changes above. To note - reverse geocoding functionality hasn't been touched although it should be updated accordingly if you want to include these changes.

@Unix-Code
Copy link
Collaborator

Thanks for the feedback! Added the suggested changes above. To note - reverse geocoding functionality hasn't been touched although it should be updated accordingly if you want to include these changes.

I originally mentioned changing it to work similarly for reverse geocoding, but a rough scan of the Geocod.io API docs shows that batch reverse geocoding doesn't support a dict of reference keys to coordinate pairs. So it should be alright to leave it unchanged, unless I missed something there...

@liufran1
Copy link
Author

I originally mentioned changing it to work similarly for reverse geocoding, but a rough scan of the Geocod.io API docs shows that batch reverse geocoding doesn't support a dict of reference keys to coordinate pairs. So it should be alright to leave it unchanged, unless I missed something there...

Fair point

Copy link
Collaborator

@Unix-Code Unix-Code left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few minor requested changes + 1 medium-sized change (supporting default in the get function).

In addition to those, make sure to run flake8 + tox and fix any formatting + backwards compatibility issues. Ideally this can be setup as a precommit hook at some point...

param_key = 'address' if address_data is not None else 'components'
kwargs.update({param_key: param_data})
return self.geocode_address(**kwargs)
if (address_data is not None) == (components_data is not None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be and vs ==. If this was straight from my comment - whoops.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the behavior is that we want to return None when either both of these inputs are None or both of these inputs are not None. If this is and then we proceed even when both address_data and components_data is None

geocodio/data.py Outdated
key = json.dumps(key)
elif key in self.keys():
return super().get(key)

return self[self.lookups[key]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we're still supporting supporting Python 2.7 as of right now. The expression super() isn't compatible with Python 2.7 as mentioned here, so for backwards compatibility, it'd need to be super(LocationCollectionDict, self) instead. You can test backwards compatibility and any breaking changes using tox.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 2.7 support is now dropped

geocodio/data.py Outdated
@@ -171,32 +171,35 @@ def get(self, key):
Returns an individual Location by query lookup, e.g. address, components dict, or point.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized looking it over, but as mentioned below, this get function "overrides" the default dict get function, so it should support the default parameter of the default dict get as function overloading isn't pythonic and not really supported. To support the default parameter, you might want to reorder the logic of lines 188-191 so that you first check if key is in self.lookups, and if so, set key = self.lookups[key] and then perform the super.get(key, default) with a propagated default parameter.

This will require additional get tests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with a default value of KeyError, not sure if it's better to default to None as the parent dict get method does

)

# Case sensitive on the specific query
self.assertEqual(KeyError, locations.get("3101 Patterson Ave, richmond, va"))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to update the existing get tests to assertEquals on KeyError from assertRaises, as the python dict get method avoids raising a KeyError by default.

super(LocationCollectionDict, self).__init__(results)
self.order = order

def get(self, key, default=KeyError):
Copy link
Collaborator

@Unix-Code Unix-Code Jun 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LocationCollection extends list and has no "expected behavior" for its get function, while LocationCollectionDict has the expected behavior of returning default if the key isn't there (and the default for default is None). Now, for parity and consistency, the __getitem__ magic method can be overridden for both LocationCollection and LocationCollectionDict in order to have that be the use-case of throwing a KeyError if a key isn't present, otherwise returning the value (note that for LocationCollection, a numerical index would be a valid key in __getitem__).

The usage would be:

x = LocationCollectionDict(...)
x[some_key] # -> KeyError if missing or value if not
# AND
y = LocationCollection(...)
y[some_key] # -> KeyError if missing or value if not

# OR

x.get(some_key, some_default) # -> some_default if key is missing, otherwise value
# AND
y.get(some_key, some_default) # -> some_default if key is missing, otherwise value

This way LocationCollectionDict maintains the expected behavior of a dict, and LocationCollection has some parity with LocationCollectionDict in how you can access elements using a key. I think this is the more intuitive approach for handling this. Other opinions are welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants