
catch empty return values and error on them #157

Merged 16 commits into develop from add-errors-on-null-only, Jan 9, 2018

Conversation

@andy-esch (Contributor) commented Jan 10, 2017

This fixes a problem where empty lists of data were passed to algorithms. Before the data analysis provider was written, the default was to return [(None, None, ...)] (a tuple of nulls with the correct number of columns). With this PR, I'm choosing to raise a noisy error if the input data has so many nulls that it cannot be analyzed.

@talos, could you do a CR, please? Specifically, I'm interested in your opinion of the verify_data function, and whether I should instead be raising an exception and handling it in the try/except block.

closes #143, closes #156

@andy-esch added the WIP label Jan 10, 2017
@talos commented Jan 10, 2017

Takin' a look!

@talos left a comment

Looks OK. My main small comment is just that verify_data could be more generic.

From a bigger perspective, two thoughts:

  1. I'm not a fan of us returning NULL when something goes wrong. We really should raise errors that terminate flow, so that functions down the line don't all have to handle NULL inputs gracefully. However, I think this is a broader design decision?
  2. I'm seeing a lot of duplicate wrapper code, essentially of the format:
def get_analysis(...):
  try:
    data = <get some data>
    verify_data(data)
    return data
  except plpy.SPIError, err:
    plpy.error('Analysis failed: %s' % err)

I think we should make a method decorator that we apply to each of these methods, so that they can be reduced to:

def get_analysis(...):
  return <get some data>

That should remove a couple dozen lines of boilerplate, plus make our code way easier to update if we decide to change our approach to (1) above.
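A minimal sketch of the decorator shape being proposed (illustrative only, not the code merged in this PR; verify_data is stubbed so the example is self-contained, and plpy is only importable inside a PL/Python function):

# Illustrative sketch; names here are assumptions, not the merged code.
import plpy  # only available when running inside a PL/Python function


def verify_data(data):
    """Placeholder check; a more generic version is discussed below."""
    if not data:
        raise ValueError('No rows returned from query')


def analysis(fn):
    """Wrap a data-fetching function with the shared try/except boilerplate."""
    def wrapped(*args, **kwargs):
        try:
            data = fn(*args, **kwargs)
            verify_data(data)
            return data
        except Exception as err:
            plpy.error('Analysis failed: {}'.format(err))
    return wrapped


@analysis
def get_analysis(query):
    # the body reduces to just fetching the data
    return plpy.execute(query)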

"FROM ({subquery}) As a "
"WHERE {geom_col} IS NOT NULL").format(**params)
"WHERE \"{geom_col}\" IS NOT NULL").format(**params)

I've found that these statements can be written more readably by using triple quotes:

'''
SELECT "{your_fancy_column}"
FROM "{your_fancy_table}"
WHERE "{other_column}" = '{val}'
'''.format(**params)

'for null values and fill in appropriately.')


def verify_data(n_rows):

If the goal is to have a more generic error checking capability, I think that verify_data should receive the actual rows, and perform a len on them itself.

That way, you could easily adjust the function to handle cases where a non-zero number of rows are returned, but something is wrong with the rows.

@andy-esch (Contributor, Author)

👍 Agreed. Since there is only one check right now, I was thinking it would save a bit of time on passing around all of the data, but that cost is probably pretty minimal compared to other things in the code.

@talos

Since the data exchange is 100% in Python, it shouldn't make a difference -- you're just passing around a reference to the array; none of this code should result in a copy of the data ever being made.
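A sketch of a more generic verify_data along these lines (assuming rows arrive as tuples, as in the [(None, None, ...)] example from the PR description; the all-NULL check is an illustration, not necessarily the merged behavior):

def verify_data(rows):
    """Raise if the query result is empty or contains nothing analyzable.

    Receives the rows themselves rather than a row count, so further
    checks can be added here without changing the call sites.
    """
    if not rows:
        raise ValueError('No rows returned from source query')
    # Illustrative extra check: rows made up entirely of NULLs cannot
    # be analyzed either.
    if all(all(value is None for value in row) for row in rows):
        raise ValueError('Source query returned only NULL values')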

@andy-esch (Contributor, Author)

Thanks for the review, @talos!
For (1) above: I may not have said it very clearly, but this PR's goal is to get rid of returning nulls in favor of terminating the flow, so all good here.

@andy-esch mentioned this pull request Jan 12, 2017
@andy-esch (Contributor, Author) commented Jan 12, 2017

@talos, I added basic usage of decorators here that reduces the redundancy and increases the maintainability of the code :) Please take a look and let me know what you think. Do you still think that I should take a look at functools?

@andy-esch (Contributor, Author)

It looks like functools.wraps just carries over the function metadata? In this case, that doesn't seem like an important property.
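For reference, a small illustration of what functools.wraps preserves (the get_moran name here is just an example):

import functools


def analysis(fn):
    @functools.wraps(fn)  # copies __name__, __doc__, etc. from fn onto wrapped
    def wrapped(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapped


@analysis
def get_moran(subquery):
    """Fetch the columns needed for Moran's I."""


print(get_moran.__name__)  # 'get_moran' with wraps; 'wrapped' without it
print(get_moran.__doc__)   # the docstring with wraps; None without it

So it mainly matters if anything downstream relies on the wrapped functions' names or docstrings, e.g. for logging or introspection.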

@talos left a comment

Except for the ambiguous return value (commented below), this looks 👍

plpy.error('Analysis failed: %s' % err)
return data
except Exception, err:
plpy.error('Analysis failed: {}'.format(err))
@talos

Should there be a fallback outside the try/except block that's something like return pu.empty_zipped_array(2)? Or is the implicit return None of Python OK?

@andy-esch (Contributor, Author)

Good point. I guess returning [] would be a good option, because the number of values in the tuples varies depending on which get_ function is run.
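In other words, something like this variation on the decorator sketched earlier (still illustrative; verify_data and plpy as in that sketch):

def analysis(fn):
    def wrapped(*args, **kwargs):
        try:
            data = fn(*args, **kwargs)
            verify_data(data)
            return data
        except Exception as err:
            plpy.error('Analysis failed: {}'.format(err))
        # Explicit fallback after the try/except: an empty list fits every
        # get_ function regardless of how many columns its tuples carry,
        # instead of Python's implicit `return None`.
        return []
    return wrapped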

@andy-esch (Contributor, Author)

@talos, anything else before I hand this PR over?

@talos commented Jan 17, 2017

👍

@andy-esch removed the WIP label Jan 17, 2017
@andy-esch mentioned this pull request Jan 8, 2018
@andy-esch merged commit e5a03fc into develop Jan 9, 2018
@andy-esch deleted the add-errors-on-null-only branch January 9, 2018 16:46
Successfully merging this pull request may close these issues.

- spatial markov not failing for invalid table or column names
- Execute Moran without geometries should fail?