
catch empty return values and error on them #157

Merged 16 commits into develop from add-errors-on-null-only, Jan 9, 2018

Conversation

@andy-esch (Contributor) commented Jan 10, 2017

This fixes a problem where empty lists of data were passed to algorithms. Before the data analysis provider was written, the default was to return [(None, None, ...)] (a tuple of nulls with the correct number of columns). With this PR, I'm choosing to raise a noisy error if the input data has so many nulls that it cannot be analyzed.

@talos, could you do a CR, please? Specifically, I'm interested in your opinion of the verify_data function, and whether I should instead be raising an exception and handling it in the try/except block.

closes #143, closes #156

@andy-esch added the WIP label Jan 10, 2017
@talos commented Jan 10, 2017

Takin' a look!

@talos left a comment

Looks OK. My main small comment is just that verify_data could be more generic.

From a bigger perspective, two thoughts:

  1. I'm not a fan of us returning NULL when something goes wrong. We really should raise errors that terminate flow, so that functions down the line don't all have to handle NULL inputs gracefully. However, I think this is a broader design decision?
  2. I'm seeing a lot of duplicate wrapper code, essentially of the format:
def get_analysis(...):
  try:
    data = <get some data>
    verify_data(data)
    return data
  except plpy.SPIError, err:
    plpy.error('Analysis failed: %s' % err)

I think we should make a method decorator that we apply to each of these methods, so that they can be reduced to:

def get_analysis(...):
  return <get some data>

That should remove a couple dozen lines of boilerplate, plus make our code way easier to update if we decide to change our approach to (1) above.
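A minimal sketch of the decorator shape being proposed (illustrative only, not the code merged in this PR; verify_data is stubbed so the example is self-contained, and plpy is only importable inside a PL/Python function):

# Illustrative sketch; names here are assumptions, not the merged code.
import plpy  # only available when running inside a PL/Python function


def verify_data(data):
    """Placeholder check; a more generic version is discussed below."""
    if not data:
        raise ValueError('No rows returned from query')


def analysis(fn):
    """Wrap a data-fetching function with the shared try/except boilerplate."""
    def wrapped(*args, **kwargs):
        try:
            data = fn(*args, **kwargs)
            verify_data(data)
            return data
        except Exception as err:
            plpy.error('Analysis failed: {}'.format(err))
    return wrapped


@analysis
def get_analysis(query):
    # the body reduces to just fetching the data
    return plpy.execute(query)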

"FROM ({subquery}) As a "
"WHERE {geom_col} IS NOT NULL").format(**params)
"WHERE \"{geom_col}\" IS NOT NULL").format(**params)

I've found that these statements can be written more readably by using triple quotes:

'''
SELECT "{your_fancy_column}"
FROM "{your_fancy_table}"
WHERE "{other_column}" = '{val}'
'''.format(**params)

'for null values and fill in appropriately.')


def verify_data(n_rows):

If the goal is to have a more generic error checking capability, I think that verify_data should receive the actual rows, and perform a len on them itself.

That way, you could easily adjust the function to handle cases where a non-zero number of rows are returned, but something is wrong with the rows.

@andy-esch (Contributor, Author)

👍 Agreed. Since there is only one check right now, I was thinking it would save a bit of time on passing around all of the data, but that cost is probably pretty minimal compared to other things in the code.

@talos

Since the data exchange is 100% in Python, it shouldn't make a difference -- you're just passing around a reference to the array; none of this code should result in a copy of the data ever being made.
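A sketch of a more generic verify_data along these lines (assuming rows arrive as tuples, as in the [(None, None, ...)] example from the PR description; the all-NULL check is an illustration, not necessarily the merged behavior):

def verify_data(rows):
    """Raise if the query result is empty or contains nothing analyzable.

    Receives the rows themselves rather than a row count, so further
    checks can be added here without changing the call sites.
    """
    if not rows:
        raise ValueError('No rows returned from source query')
    # Illustrative extra check: rows made up entirely of NULLs cannot
    # be analyzed either.
    if all(all(value is None for value in row) for row in rows):
        raise ValueError('Source query returned only NULL values')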

@andy-esch (Contributor, Author)

Thanks for the review, @talos!
For (1) above: I may not have said it very clearly, but this PR's goal is to get rid of returning nulls in favor of terminating the flow, so all good here.

@andy-esch mentioned this pull request Jan 12, 2017
@andy-esch (Contributor, Author) commented Jan 12, 2017

@talos, I added basic usage of decorators here that reduces the redundancy and increases the maintainability of the code :) Please take a look and let me know what you think. Do you still think that I should take a look at functools?

@andy-esch (Contributor, Author)

It looks like functools.wraps just carries over the function metadata? In this case, that doesn't seem like an important property.
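For reference, a small illustration of what functools.wraps preserves (the get_moran name here is just an example):

import functools


def analysis(fn):
    @functools.wraps(fn)  # copies __name__, __doc__, etc. from fn onto wrapped
    def wrapped(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapped


@analysis
def get_moran(subquery):
    """Fetch the columns needed for Moran's I."""


print(get_moran.__name__)  # 'get_moran' with wraps; 'wrapped' without it
print(get_moran.__doc__)   # the docstring with wraps; None without it

So it mainly matters if anything downstream relies on the wrapped functions' names or docstrings, e.g. for logging or introspection.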

@talos left a comment

Except for the ambiguous return value (commented below), this looks 👍

plpy.error('Analysis failed: %s' % err)
return data
except Exception, err:
plpy.error('Analysis failed: {}'.format(err))
@talos

Should there be a fallback outside the try/except block that's something like return pu.empty_zipped_array(2)? Or is the implicit return None of Python OK?

@andy-esch (Contributor, Author)

Good point. I guess returning [] would be a good option, because the number of values in the tuples varies depending on which get_ function is run.
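In other words, something like this variation on the decorator sketched earlier (still illustrative; verify_data and plpy as in that sketch):

def analysis(fn):
    def wrapped(*args, **kwargs):
        try:
            data = fn(*args, **kwargs)
            verify_data(data)
            return data
        except Exception as err:
            plpy.error('Analysis failed: {}'.format(err))
        # Explicit fallback after the try/except: an empty list fits every
        # get_ function regardless of how many columns its tuples carry,
        # instead of Python's implicit `return None`.
        return []
    return wrapped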

@andy-esch (Contributor, Author)

@talos, anything else before I hand this PR over?

@talos commented Jan 17, 2017

👍

@andy-esch removed the WIP label Jan 17, 2017
@andy-esch mentioned this pull request Jan 8, 2018
@andy-esch merged commit e5a03fc into develop Jan 9, 2018
@andy-esch deleted the add-errors-on-null-only branch January 9, 2018 16:46
Successfully merging this pull request may close these issues.

- spatial markov not failing for invalid table or column names
- Execute Moran without geometries should fail?