New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort input validation #1517

Merged
merged 5 commits into from Jun 7, 2016

Conversation

Projects
None yet
3 participants
@kwmsmith
Member

kwmsmith commented Jun 2, 2016

No description provided.

@kwmsmith kwmsmith added this to the 0.11 milestone Jun 2, 2016

@coveralls

This comment has been minimized.

coveralls commented Jun 3, 2016

Coverage Status

Coverage increased (+0.002%) to 89.248% when pulling e9d66c4 on kwmsmith:bugfix/sort-input-validation into 0cca30a on blaze:master.

@@ -100,10 +100,32 @@ def sort(child, key=None, ascending=True):
ascending : bool, optional
Determines order of the sort
"""
if ascending not in (True, False):

This comment has been minimized.

@llllllllll

llllllllll Jun 3, 2016

Member

why are we not just calling bool on the object?

This comment has been minimized.

@kwmsmith

kwmsmith Jun 6, 2016

Member

To guard against calls like:

t.sort('a', 'b')

When the user should have done

t.sort(['a', 'b'])

This comment has been minimized.

@llllllllll

llllllllll Jun 6, 2016

Member

Ah, this is really helpful, I have tried to do that in the past. Can you just put a small comment in the source here that explains why we are doing this so I remember in the future?

key = None
if key is None or isinstance(key, _strtypes):
# Handle this case separately.
return Sort(child, None, ascending)

This comment has been minimized.

@llllllllll

llllllllll Jun 3, 2016

Member

why is the key None here?

This comment has been minimized.

@kwmsmith

kwmsmith Jun 6, 2016

Member

When the datashape is array-like and .sort() is called with a string value for key, I've found no straightforward way to validate that the string is the name of the column being sorted. We can pass in key here instead of None, but Sort() doesn't do any validation here either.

For example:

In [5]: t
Out[5]: <`t` symbol; dshape='var * {name: string}'>

In [6]: t.name.dshape
Out[6]: dshape("var * string")  # not a record dshape, no `name` column.

In [7]: t.name.sort('name')
Out[7]: t.name.sort(None, ascending=True)

In [8]: t.name.sort()
Out[8]: t.name.sort(None, ascending=True)

In [9]: t.name.sort('foobar')
Out[9]: t.name.sort(None, ascending=True)

This comment has been minimized.

@llllllllll

llllllllll Jun 6, 2016

Member

if it is not a record should we just raise if we sort with a key, like, TypeError('Cannot sort a non-record by a column name')?

This comment has been minimized.

@kwmsmith

kwmsmith Jun 6, 2016

Member

I thought of doing that and I see your case. What made me go this route was the many spots in the testsuite where we sort like:

t.name.sort('name')

And if that now gives an exception like the above, I can imagine it would be surprising to users. I'm undecided here. Ideally we'd be able to extract the column name from child and compare to key and validate, but that seems like it would require a lot of work with expression dshapes, and would probably break other code that depends on t.name being non-record dshape. Thoughts?

This comment has been minimized.

@llllllllll

llllllllll Jun 6, 2016

Member

The code you showed is sort of ambigious with a dshape like: var * {name: var * {name: A}}. Do you know if there is any code in the wild that does this with sort?

Side note, it looks like pd.Series.sort just ignores the first argument (though it is named 'axis'.

This comment has been minimized.

@kwmsmith

kwmsmith Jun 7, 2016

Member

Do you know if there is any code in the wild that does this with sort?

I've not come across any, no.

it looks like pd.Series.sort just ignores the first argument

It's raising ValueError for me, pandas version 0.18.0.

I updated the docstring to describe the single-column semantics.

@kwmsmith kwmsmith merged commit b9e8f98 into blaze:master Jun 7, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.06%) to 89.301%
Details

@kwmsmith kwmsmith deleted the kwmsmith:bugfix/sort-input-validation branch Jun 7, 2016

@coveralls

This comment has been minimized.

coveralls commented Jun 7, 2016

Coverage Status

Coverage increased (+0.06%) to 89.301% when pulling 75e2488 on kwmsmith:bugfix/sort-input-validation into 0cca30a on blaze:master.

@coveralls

This comment has been minimized.

coveralls commented Jun 7, 2016

Coverage Status

Coverage increased (+0.06%) to 89.301% when pulling 0f04ae8 on kwmsmith:bugfix/sort-input-validation into 0cca30a on blaze:master.

@kwmsmith kwmsmith modified the milestones: 0.11, 0.10.2 Jun 8, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment