-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Make get_multiple_*
methods pass tests by not halting for one bad input
#128
Conversation
Instead of halting/raising exceptions, the get_multiple_* methods now continue to work when encountering bad inputs. This works by catching exceptions raised by its respective get_*_air methods. Why? Because get_multiple_* (I think) is supposed to be a convenient wrapper for multiple inputs at once. It's not really convenient to break the entire execution for one bad input. For consistency, the output dataframe will have exactly one row per one input, where bad inputs get all-nan rows apart from its 'city' or 'lat'-'lon' columns.
@lahdjirayhan Absolutely! This is something that definitely occurred to me before, but I didn't give it enough thought. The use case that you mentioned of dozens of cities (or more) is certainly one that would be annoying without this. I think it's a badly needed improvement. Excited that the tests are helping so much! Thanks again for making such a robust set of tests. |
self.get_coordinate_air(loc[0], loc[1], df=df, params=params) | ||
) | ||
except Exception: | ||
# NOTE: If we have custom exception we can catch it instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm overall I do think we should have custom exceptions.
However, it's a good idea to add custom exceptions for the entirety of Ozone in one go (I'm sure there are other sections that need them). This way we can come up with a consistent set of exceptions for the package.
# The third row should have the "nonexistent city name" intact ... | ||
assert result.at[2, "city"] == "a definitely nonexistent city" | ||
|
||
# ... and nothing else. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to go!
As laid out by the tests,
get_multiple_*_air
methods shouldn't halt when there is one bad input. One bad input (which in turn will make backend return error/bad response, which will make an exception raised in our part) shouldn't halt the execution of other, likely good, inputs. This is not convenient. For what it's worth, I've always treatedget_multiple_*
methods as convenience wrappers for their singleget_*_air
counterparts (otherwise I could just loop them myself).The example in my mind is this: you are mass-collecting data for a hundred cities. One city turns out to not have AQI station and subsequently raise exception (you likely won't know this in advance). It will be annoying to have the other 99 cities halted.
I realize that this will make an odd difference between e.g.
get_city_air
(raises exception on bad response) andget_multiple_cities_air
(returns nan row on bad response). I don't have really much to say either supporting or against.Is this a good or common enough use case? Should I implement things differently? I'm curious to hear more.