# Lexicographical comparisons

Suppose we don't need to support arbitrary weak orderings, but only total orderings. For many operations, including sorting, merging, and binary search, the most efficient way to implement them is still by doing only one kind of comparison: either by using only `<` and/or `>`, or by using only `<=` and/or `>=`, and in no case using `==` or `!=`.

This does not usually affect *asymptotic* efficiency. But by having an algorithm use only one kind of comparison all or most of the times it compares objects, we can often achieve a factor of ~2 speedup compared to a more obvious yet more complicated design, provided we take full advantage of the information obtained each time objects are compared. Put another way, when it is tempting to use more than one kind of comparison, or even to do something that is most naturally done with more than one kind of comparison, it is sometimes a sign that we are not making the most of our comparisons.

**For example, consider `recursion.merge_two_alt`.** There is no need to check for equality specifically, and doing so would only add another operation that doesn't need to be done. Instead, if the right-side element needs to come before the left-side element, then it is merged in first, and otherwise the left-side element is merged in first. Stability is achieved not by checking for equal elements and making sure to do the right thing, but instead by writing the code in such a way that it never asks for the information it would need to do the wrong thing. (This is also how `recursion.merge_two` works, though it is a bit harder to see.)
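The idea can be illustrated with a minimal sketch. This is not the code of `recursion.merge_two_alt`; the name `merge_sorted` is hypothetical, and the function assumes both inputs are already sorted lists. It only ever asks whether the right-side element is strictly less than the left-side element.

```python
def merge_sorted(left, right):
    """Stably merge two sorted lists, comparing elements only with <.

    Hypothetical helper for illustration; assumes both inputs are sorted.
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the right only if it must come strictly first.
        # Otherwise (less or equivalent), the left element goes first,
        # which is what makes the merge stable.
        if right[j] < left[i]:
            result.append(right[j])
            j += 1
        else:
            result.append(left[i])
            i += 1
    # At most one of these slices is nonempty.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


# 1 and 1.0 compare equal but are distinguishable, so stability is visible.
print(merge_sorted([1, 2], [1.0, 3]))  # [1, 1.0, 2, 3]
```

Because the code never tests for equality, it has no way to tell "equal" apart from "left is smaller", and so it cannot reorder equivalent elements by mistake.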

**As another example, consider `recursion.binary_search`.** As implemented, it is no faster than if it used both `<` and `==`, because for each pair of objects compared, those same objects are compared twice. This performs two comparisons to divide the search space into three parts: elements to the left of the element being examined, elements to the right of the element being examined, and the one element being examined. That is, one of the comparisons excludes half, or almost half, of the remaining elements from the search space, and then the other comparison excludes one, or all but one, element from the search space. This achieves a best-case time complexity of O(1), but at the cost of making the average and worst-case running times slower by a factor of ~2.

This is okay for the binary search implementations in `recursion.py`, since we are only concerned with optimal *asymptotic* performance. But production-quality binary search implementations avoid checking if the current element is a match until the very end, when the search space consists of just that element. Alternatively, they may not bother checking this at all, but just return an index to the beginning or end of where matching elements, if any, *would have to be*. Hence `bisect.bisect_left` and `bisect.bisect_right` in Python, and `std::lower_bound` and `std::upper_bound` in C++. (Reimplementing `bisect_left` and `bisect_right` will be the topic of some future exercises.)
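As a brief illustration, here is how the standard library functions behave. The `contains` helper is hypothetical and not part of `bisect`; it shows one way a caller might do the single equality check at the very end.

```python
import bisect

values = [10, 20, 20, 20, 30]

lo = bisect.bisect_left(values, 20)   # Index of the first 20.
hi = bisect.bisect_right(values, 20)  # Index just past the last 20.
print(lo, hi)  # 1 4

# An absent value gives an empty range at its insertion point.
print(bisect.bisect_left(values, 25), bisect.bisect_right(values, 25))  # 4 4


def contains(sorted_values, x):
    """Hypothetical helper: search with bisect_left, then check == once."""
    index = bisect.bisect_left(sorted_values, x)
    return index < len(sorted_values) and sorted_values[index] == x


print(contains(values, 20), contains(values, 25))  # True False
```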

**In the Python standard library, supporting arbitrary weak orderings is not actually a design goal.** Instead, efficiency is a goal, so most comparison-based algorithms use only `<`, and documenting the minimum necessary interface to support comparison is a goal, so we can rely on `sorted`, `list.sort`, and algorithms in the `bisect` module using only `<`. Furthermore, `sorted` and `list.sort` guarantee stability, which they automatically achieve by using only `<`, including when a key selector function is used. Supporting arbitrary weak orderings is not a design goal, but it follows from other design goals and their documented solutions, and we can rely on it.
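A small sketch shows what relying on this looks like in practice. The `OnlyLess` class is hypothetical: it defines `__lt__` and no other order comparison, yet `sorted` and `bisect.bisect_left` accept it.

```python
import bisect


class OnlyLess:
    """Hypothetical wrapper whose only order comparison is <."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f'OnlyLess({self.value!r})'

    def __lt__(self, other):
        return self.value < other.value


items = [OnlyLess(3), OnlyLess(1), OnlyLess(2)]
ordered = sorted(items)
print(ordered)  # [OnlyLess(1), OnlyLess(2), OnlyLess(3)]

# bisect also needs nothing beyond < on the elements.
print(bisect.bisect_left(ordered, OnlyLess(2)))  # 1
```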

**But some kinds of operations *slow down* by a factor of ~2, instead of speeding up, if only one kind of comparison is used.** Lexicographic comparison is such an operation. Suppose `a` and `b` are both lists, or tuples, or some other sequence supporting lexicographic comparison. To compute `a < b` using only `<` would require that each pair of corresponding elements be compared twice before moving on to the next pair of elements. (Make sure you understand why.)
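To make the cost concrete, here is a sketch of lexicographic `<` written with only `<` on elements. This is not how Python implements it; `lex_less_only_lt` and the `Counted` wrapper are hypothetical names used to count element comparisons.

```python
class Counted:
    """Hypothetical wrapper that counts calls to <."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value


def lex_less_only_lt(a, b):
    """Compute lexicographic a < b using only < on elements (a sketch)."""
    for x, y in zip(a, b):
        if x < y:
            return True
        if y < x:
            return False
        # Neither is less, so x and y are equivalent. It took two
        # comparisons to learn that. Move on to the next pair.
    # One sequence is a prefix of the other; the shorter one is less.
    return len(a) < len(b)


a = [Counted(i) for i in (1, 2, 3, 8)]
b = [Counted(i) for i in (1, 2, 3, 9)]
print(lex_less_only_lt(a, b))  # True
print(Counted.calls)  # 7
```

Each of the three matching pairs costs two calls to `<`, and the final mismatched pair costs one, for seven in total.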

In lexicographic comparison, the number of comparisons therefore decreases by about half if `==` is used until a mismatch is detected, and then `<` is used to determine which element is greater. This is actually still a situation where, to achieve best performance, almost all comparison work should be done by one kind of comparison, but here it is the opposite of the situation in binary search: in lexicographic comparison, all but one of the comparisons of sequence elements should be `==` instead of `<`. **So Python does it this way, and lexicographic comparison does not work properly with arbitrary weak orderings on elements of the objects being compared.** To use lexicographic comparison, your elements should be totally ordered. This is explored and demonstrated more concretely below.
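The approach just described can be sketched as follows. This is a simplified model, not CPython's actual implementation, and `lex_less` is a hypothetical name. In particular, the built-in sequence types also treat identical objects as equal without calling `__eq__`, which this sketch omits.

```python
def lex_less(a, b):
    """Lexicographic a < b: == finds the first mismatch, then one <."""
    for x, y in zip(a, b):
        if not (x == y):
            # First differing position: a single order comparison decides.
            return x < y
    # No mismatch in the common prefix; the shorter sequence is less.
    return len(a) < len(b)


print(lex_less([1, 2, 3, 8], [1, 2, 3, 9]))  # True
print(lex_less([1, 2], [1, 2, 0]))           # True
print(lex_less([1, 2, 3], [1, 2, 3]))        # False
```

For sequences that share a prefix of length n, this does about n equality comparisons and at most one order comparison.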

## Lexicographical `<` uses `==`

So lexicographical comparisons work when elements are totally ordered but not for arbitrary weak orderings.

In [1]:
import logging

In [2]:
class Noisy:
    """A wrapper that logs rich comparisons and hashing."""
    
    __slots__ = ('_value',)
    
    def __init__(self, value):
        """Create a new Noisy wrapper for a given object."""
        self._value = value
    
    def __repr__(self):
        """Representation of this object for debugging."""
        return f'{type(self).__name__}({self._value!r})'
    
    def __str__(self):
        """Informal string representation, same as the wrapped object's."""
        return str(self._value)
    
    def __eq__(self, other):
        """Delegate to equality comparison for the wrapped object."""
        logging.info(f'{self!r}.__eq__({other!r})')
        
        if isinstance(other, type(self)):
            if hasattr(type(self._value), '__eq__'):
                return self._value.__eq__(other._value)
        
        return NotImplemented
    
    def __ne__(self, other):
        """Delegate to not-equal comparison for the wrapped object."""
        logging.info(f'{self!r}.__ne__({other!r})')
        
        if isinstance(other, type(self)):
            if hasattr(type(self._value), '__ne__'):
                return self._value.__ne__(other._value)
        
        return NotImplemented
    
    def __lt__(self, other):
        """Delegate to less-than comparison for the wrapped object."""
        logging.info(f'{self!r}.__lt__({other!r})')
        
        if isinstance(other, type(self)):
            if hasattr(type(self._value), '__lt__'):
                return self._value.__lt__(other._value)
        
        return NotImplemented
    
    def __gt__(self, other):
        """Delegate to greater-than comparison for the wrapped object."""
        logging.info(f'{self!r}.__gt__({other!r})')
        
        if isinstance(other, type(self)):
            if hasattr(type(self._value), '__gt__'):
                return self._value.__gt__(other._value)
        
        return NotImplemented

    def __le__(self, other):
        """Delegate to less-or-equal comparison for the wrapped object."""
        logging.info(f'{self!r}.__le__({other!r})')
        
        if isinstance(other, type(self)):
            if hasattr(type(self._value), '__le__'):
                return self._value.__le__(other._value)
        
        return NotImplemented
    
    def __ge__(self, other):
        """Delegate to greater-or-equal comparison for the wrapped object."""
        logging.info(f'{self!r}.__ge__({other!r})')
        
        if isinstance(other, type(self)):
            if hasattr(type(self._value), '__ge__'):
                return self._value.__ge__(other._value)
        
        return NotImplemented
    
    def __hash__(self):
        """Delegate to the wrapped object's __hash__ and log the call."""
        logging.info(f'{self!r}.__hash__()')
        return hash(self._value)
    
    @property
    def value(self):
        """The value this Noisy wrapper holds."""
        return self._value

In [3]:
logging.getLogger().setLevel(logging.INFO)

In [4]:
a = [*(Noisy(i) for i in range(1, 6)), Noisy(8)]
b = [*(Noisy(i) for i in range(1, 6)), Noisy(9)]

In [5]:
a

[Noisy(1), Noisy(2), Noisy(3), Noisy(4), Noisy(5), Noisy(8)]

In [6]:
b

[Noisy(1), Noisy(2), Noisy(3), Noisy(4), Noisy(5), Noisy(9)]

### Lists&rsquo; `__eq__` and `__ne__` use elements&rsquo; `__eq__`.

No surprise here.

In [7]:
a == b  # Uses __eq__. Anything different would be astonishing.

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))


False

In [8]:
a != b  # Also uses __eq__, but that's still fine and unsurprising.

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))


True

In [9]:
a.__eq__(b)  # Same result as with == when we call __eq__ directly.

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))


False

In [10]:
a.__ne__(b)  # Same result as with != when we call __ne__ directly.

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))


True

### Lists&rsquo; `__lt__`, `__gt__`, `__le__`, `__ge__` use elements&rsquo; `__eq__`!

Lexicographical order comparisons use `==` until they find differing elements, or until at least one sequence is exhausted. If they find differing elements, they then apply the order-comparison operator you used to those two elements, to find out which direction the disparity is in.

This is the case both for the strict (`<` and `>`) and non-strict (`<=` and `>=`) order comparison operators.

In [11]:
a < b

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))
INFO:root:Noisy(8).__lt__(Noisy(9))


True

In [12]:
a > b

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))
INFO:root:Noisy(8).__gt__(Noisy(9))


False

In [13]:
a <= b

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))
INFO:root:Noisy(8).__le__(Noisy(9))


True

In [14]:
a >= b

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))
INFO:root:Noisy(8).__ge__(Noisy(9))


False

In [15]:
a.__lt__(b)  # Same result as with < when we call __lt__ directly.

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))
INFO:root:Noisy(8).__lt__(Noisy(9))


True

In [16]:
a.__gt__(b)  # Same result as with > when we call __gt__ directly.

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))
INFO:root:Noisy(8).__gt__(Noisy(9))


False

In [17]:
a.__le__(b)  # Same result as with <= when we call __le__ directly.

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))
INFO:root:Noisy(8).__le__(Noisy(9))


True

In [18]:
a.__ge__(b)  # Same result as with >= when we call __ge__ directly.

INFO:root:Noisy(1).__eq__(Noisy(1))
INFO:root:Noisy(2).__eq__(Noisy(2))
INFO:root:Noisy(3).__eq__(Noisy(3))
INFO:root:Noisy(4).__eq__(Noisy(4))
INFO:root:Noisy(5).__eq__(Noisy(5))
INFO:root:Noisy(8).__eq__(Noisy(9))
INFO:root:Noisy(8).__ge__(Noisy(9))


False

### So you can&rsquo;t use it with arbitrary weak orderings.

Lexicographical order comparison in Python uses equality comparison to find the first position (if any) where objects differ, only then doing the order comparison. This always works with total orderings, but it would not usually work with non-total weak orderings.
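Here is a self-contained sketch of the problem. The `CaseBlind` class is hypothetical and made up for illustration: its `<` ignores case, which is a non-total weak ordering, while its `==` is case-sensitive.

```python
class CaseBlind:
    """Hypothetical string wrapper: < ignores case, but == does not."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if isinstance(other, CaseBlind):
            return self.text == other.text
        return NotImplemented

    def __lt__(self, other):
        if isinstance(other, CaseBlind):
            return self.text.lower() < other.text.lower()
        return NotImplemented


left = [CaseBlind('apple'), 1]
right = [CaseBlind('APPLE'), 2]

# The first elements are equivalent in the weak ordering, so we might hope
# that 1 < 2 decides the result. But == reports a mismatch at position 0,
# so the lists' comparison is decided there and never looks at position 1.
print(left < right)  # False
print(left > right)  # False
```

Neither list is less than the other, even though their second elements differ and are totally ordered.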

In [19]:
from compare import OrderIndistinct

In [20]:
c = [OrderIndistinct('C'), 10]
d = [OrderIndistinct('D'), 20]

In [21]:
c < d  # We would want this to be True, since 10 < 20.

False

In [22]:
c > d  # False, but not for the reason we want it to be.

False

The second element never got compared, as this reveals:

In [23]:
cc = [OrderIndistinct('CC'), Noisy(10)]
dd = [OrderIndistinct('DD'), Noisy(20)]

In [24]:
cc < dd

False

In [25]:
cc > dd

False

`Noisy` didn&rsquo;t log anything, so in neither case were the second elements compared.