-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
catch empty return values and error on them #157
Conversation
Takin' a look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK. My main small comment is just that verify_data
could be more generic.
From a bigger perspective, two thoughts:
- I'm not a fan of us returning
NULL
when something goes wrong. We really should raise errors that terminate flow, so that functions down the line don't all have to handleNULL
inputs gracefully. However, I think this is a broader design decision? - I'm seeing a lot of duplicate wrapper code, essentially of the format:
def get_analysis(...):
try:
data = <get some data>
verify_data(data)
return data
except plpy.SPIError, err:
plpy.error('Analysis failed: %s', %err)
I think we should make a method decorator that we apply to each of these methods, so that they can be reduced to:
def get_analysis(...):
return <get some data>
That should remove a couple dozen lines of boilerplate, plus make our code way easier to update if we decide to change our approach to (1) above.
"FROM ({subquery}) As a " | ||
"WHERE {geom_col} IS NOT NULL").format(**params) | ||
"WHERE \"{geom_col}\" IS NOT NULL").format(**params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've found that these statements can be written more readably by using triple quotes:
'''
SELECT "{your_fancy_column}"
FROM "{your_fancy_table}"
WHERE "{other_column}" = '{val}'
'''.format(**params)
'for null values and fill in appropriately.') | ||
|
||
|
||
def verify_data(n_rows): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the goal is to have a more generic error checking capability, I think that verify_data
should receive the actual rows, and perform a len
on them itself.
That way, you could easily adjust the function to handle cases where a non-zero number of rows are returned, but something is wrong with the rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 agree. Since there is only one check now I was thinking it'd save a bit of time on passing around all the data that could be in data
, but that timing is probably pretty minimal compared to other things in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the data exchange is 100% in Python, it shouldn't make a difference -- you're just passing around a reference to the array, none of this could should result in a copy of the data ever being made.
Thanks for the review @talos ! |
@talos, I added basic usage of decorators here that reduces the redundancy and increases the maintainability of the code :) Please take a look and let me know what you think. Do you still think that I should take a look at |
It looks like the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for the ambiguous return value (commented) looks 👍
plpy.error('Analysis failed: %s' % err) | ||
return data | ||
except Exception, err: | ||
plpy.error('Analysis failed: {}'.format(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a fallback outside the try/except block that's something like return pu.empty_zipped_array(2)
? Or is the implicit return None
of Python OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I guess returning []
would be a good option because the number of values in the tuples is variable depending on the get_
function run.
@talos, anything else before I hand this PR over? |
👍 |
This fixes a problem where empty lists of data were passed to algorithms. Before the data analysis provider was written, the default was to return
[(None, None, ...)]
(an empty tuple with the correct number of columns). With this PR, I'm choosing a noisy error if the input data has so many null that the data cannot be analyzed.@talos, could you do a CR, please? Specifically, I'm interested in your opinion of the
verify_data
data function, and whether I should be raising an exception and handling it in thetry, except
block instead.closes #143, closes #156