Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug in _sorteddict.KeysView #41

Closed
pkch opened this issue Mar 31, 2012 · 4 comments
Closed

bug in _sorteddict.KeysView #41

pkch opened this issue Mar 31, 2012 · 4 comments
Labels
Milestone

Comments

@pkch
Copy link

pkch commented Mar 31, 2012

Example:

import blist
d = blist.sorteddict((1,1),(2,2))
e = blist.sorteddict((1,1),(3,3))
d.keys() & e.keys()

fails with AttributeError. The culprit is the following method in _sorteddict.KeysView:

def _from_iterable(cls, it):
    return sortedset(key=self._mapping._sortedkeys.key)

First, this method is probably meant to be @classmethod. Second, the call to sortedset needs it as the first argument. Third, _mapping is only available from an instance, which we don't have here.

After the (trivial) fix to address the first two issues, the code I showed above works fine.

However, if sorteddict instance uses a custom sort key, it is lost. I don't know how to fix this. Possibly it shouldn't be fixed; what sort key should be used if the two sorteddict have different sort keys? Using the one from the LHS is quite arbitrary, even if we could retrieve it. I really don't know what the right approach is here.

@DanielStutzbach
Copy link
Owner

Thanks for finding this problem and digging into it. Python allows a classmethod to be overridden with an instance method. Would the following work?

def _from_iterable(self, it):
  return sortedset(it, key=self._mapping.sortedkeys.key)

I agree that it's not clear what the caller expects when mixing sort keys. Probably it should raise an exception? In that case, it would be necessary to override __or__ etc. and then _from_iterable could be removed entirely.

@pkch
Copy link
Author

pkch commented Apr 1, 2012

I would not like an override for 2 reasons.

  1. _from_iterable used frequently by collections.ABC, and so changing its
    semantics from needing just a class to needing an instance feels
    inappropriate.
  2. you wouldn't know when the LHS and RHS sortedsets don't have the same
    key. In that case, using LHS key is really bad since it breaks the expected
    symmetry of and / or .

An alternative that doesn't suffer from these problems is to make
_from_iterable @classmethod, and use default key. Unfortunately, it results
in another unpleasant outcome: two sorted sets with the same key result in
a new set with the default key. Very unexpected and annoying to users.

So I like your second proposal the most: remove _from_iterable entirely,
and override and/or/etc. directly. While doing that, raise
exception if the keys for LHS and RHS are not the same. Apart from a bit
extra code, I see one major issue: how can you tell if key1 and key2 are
the same?

x = sortedset((1,2,3), key = lambda x : -x)
y = sortedset((3,4,5), key = lambda x : -x)
print(x & y) # raises exception since lambda x: -x != lambda x: -x

If no good answer to that is available, perhaps just let this be a
limitation of blist.sortedset - not a dealbreaker, IMO.

I'm asking a question about it on SO:
http://stackoverflow.com/questions/9963155/comparing-functions

If this can be done, ideally it should be done at construction (i.e.,
merging identical key functions), to avoid wasting time every time someone
needs to do a set operation. But unfortunately, unlike lambdas, regular
functions may change (I believe) if someone does crazy stuff and modifies
the function's source code. So I suppose I'd recommend assuming that
different named functions are certainly different, regardless of the code
in them; and lambdas could actually be checked for equality by reviewing
the source code at construction.

Max

On Sat, Mar 31, 2012 at 11:29 AM, Daniel Stutzbach <
reply@reply.github.com

wrote:

Thanks for finding this problem and digging into it. Python allows a
classmethod to be overridden with an instance method. Would the following
work?

def _from_iterable(self, it):
return sortedset(it, key=self._mapping.sortedkeys.key)

I agree that it's not clear what the caller expects when mixing sort keys.
Probably it should raise an exception? In that case, it would be
necessary to override or etc. and then _from_iterable could be removed
entirely.


Reply to this email directly or view it on GitHub:
#41 (comment)

@DanielStutzbach
Copy link
Owner

Checking if two functions have identical functionality is undecidable. It'd be safest and easiest to just check if the two key functions are the same function using the "is" operator.

@pkch
Copy link
Author

pkch commented Apr 2, 2012

Well, it's undecidable in general, but it can be easily decided for the
cases that you would care about. I think if the key argument is given by a
named function, the standard "is" operator is sufficient; even if two
different named functions happen to define the same order, the programmer
thought of them as separate functions, and it's fair to raise an exception
if the programmer decides to calculate a union of two sortedsets with such
keys:

def f(x):
  return x['id']

def g(x):
  return x['id']

s1 = blist.sortedset(key = f)
# ...
s2 = blist.sortedset(key = g)
# ...
s = s1 & s2
# probably a bug: if s1 and s2 were intended to be compatible, 
# the programmer would likely have used function f as the key for both

The only time I might recommend doing anything other than "is" operator is
for lambda functions:

s1 = blist.sortedset(lambda x : x['id'])
#...
s2 = blist.sortedset(lambda x : x['id'])
# ...
s = s1 & s2
# probably not a bug; the programmer might well have meant
# the two sorted sets to be compatible
# note that Python would report them as not equal

Finally, the sort key function could be a callable function, rather than a function; e.g., itemgetter('id'). I would hope that for a callable object, the author defined a meaningful equality operator, if appropriate (otherwise the operator is would be fine). (For some reason, itemgetter's equality defaults to is; that might be just a minor oversight.)

For the time being, I wouldn't worry about that, and just like you suggest, use == for everything (which would default to is for all functions).

Max

On Apr 1, 2012 8:38 AM, "Daniel Stutzbach" <
reply@reply.github.com>
wrote:

Checking if two functions have identical functionality is undecidable.
It'd be safest and easiest to just check if the two key functions are the
same function using the "is" operator.


Reply to this email directly or view it on GitHub:
#41 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants