# Comparisons and Ordering

## Partial Ordering

If $a$ comes before $b$, and $b$ comes before $c$, then $a$ comes before $c$.

If $x$ comes before $y$, then $y$ does not come before $x$.
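The two rules above are transitivity and asymmetry, the axioms of a *strict* partial order. As a minimal sketch, both can be checked by brute force over a small finite collection, given a less-than function `lt`. The helper name `is_strict_partial_order` is hypothetical, not a library function.

```python
from itertools import product

def is_strict_partial_order(elements, lt):
    """Brute-force check of transitivity and asymmetry for `lt` on `elements`."""
    elements = list(elements)
    # Transitivity: a < b and b < c imply a < c.
    for a, b, c in product(elements, repeat=3):
        if lt(a, b) and lt(b, c) and not lt(a, c):
            return False
    # Asymmetry: x < y implies not y < x.
    for x, y in product(elements, repeat=2):
        if lt(x, y) and lt(y, x):
            return False
    return True

# Proper-subset comparison on frozensets passes both checks.
sets = [frozenset(s) for s in ({1}, {1, 2}, {2, 3}, {1, 2, 3})]
print(is_strict_partial_order(sets, lambda s, t: s < t))  # True
```

Using `<=` instead of `<` fails the asymmetry check, since `x <= x` holds in both directions.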

## Weak Ordering

A partial ordering where "is neither less than nor greater than" is transitive.

Take $x \sim y$ to mean that neither $x < y$ nor $y < x$.

Then a weak ordering is a partial ordering where $\sim$ is transitive:

If $a \sim b$ and $b \sim c$, then $a \sim c$.
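The extra requirement can be checked the same way. This sketch assumes `lt` is already a strict partial order on the given elements and only tests whether $\sim$ is transitive; `is_weak_ordering` and `properly_divides` are hypothetical helpers chosen for illustration.

```python
from itertools import product

def is_weak_ordering(elements, lt):
    """Brute-force check that incomparability (~) is transitive under `lt`.

    Assumes `lt` is already a strict partial order on `elements`.
    """
    elements = list(elements)

    def incomparable(x, y):
        # x ~ y: neither x < y nor y < x.
        return not lt(x, y) and not lt(y, x)

    return all(
        incomparable(a, c)
        for a, b, c in product(elements, repeat=3)
        if incomparable(a, b) and incomparable(b, c)
    )

# Comparing strings by length: ~ means "same length", which is transitive.
print(is_weak_ordering(["fig", "yam", "pear", "kiwi"],
                       lambda s, t: len(s) < len(t)))  # True

def properly_divides(a, b):
    """a < b means a divides b and a != b (a strict partial order on ints > 0)."""
    return a != b and b % a == 0

# 2 ~ 3 and 3 ~ 4, yet 2 properly divides 4, so ~ is not transitive.
print(is_weak_ordering([2, 3, 4], properly_divides))  # False
```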

## Some partial orderings aren't weak orderings.

### Sets are not even weakly ordered.

#### Example 1:

Take $A = \{1, 2\}$, $B = \{1, 3\}$, and $C = \{1, 2, 5\}$.

$A \sim B$ and $B \sim C$, but $A \subset C$ (thus $A \not\sim C$).

#### Example 2:

Take $A = \{2\}$, $B = \{3\}$, and $C = \{1, 2\}$.

$A \sim B$ and $B \sim C$, but $A \subset C$ (thus $A \not\sim C$).
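Python's built-in `set` type uses `<` for "is a proper subset of", so both examples can be checked directly. In this minimal sketch, `incomparable` is a hypothetical helper implementing $\sim$.

```python
def incomparable(x, y):
    """x ~ y: neither x < y nor y < x."""
    return not x < y and not y < x

# Example 1
A, B, C = {1, 2}, {1, 3}, {1, 2, 5}
print(incomparable(A, B), incomparable(B, C), A < C)  # True True True

# Example 2
A, B, C = {2}, {3}, {1, 2}
print(incomparable(A, B), incomparable(B, C), A < C)  # True True True
```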

Suppose we want to build a sorted list from $C$, $B$, and $A$, received in that order.

Starting with $C$, we have:

$[C]$

Then we receive $B$. We can simply append it, since it can appear in any order with respect to $C$:

$[C, B]$

Then we receive $A$. Comparing it to $B$, we find that they can appear in either order, so we put it at the end:

$[C, B, A]$

But this is not sorted, because $A < C$.
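The procedure just described can be sketched as a textbook insertion sort that walks left from the end of the list only while the new element must come before its left neighbor. The name `insertion_sort` is illustrative; this is not CPython's own sorting algorithm.

```python
def insertion_sort(items):
    """Textbook insertion sort that only ever uses `<`."""
    result = []
    for x in items:
        i = len(result)
        # Step left only while x must come before the element to its left.
        while i > 0 and x < result[i - 1]:
            i -= 1
        result.insert(i, x)
    return result

A, B, C = {2}, {3}, {1, 2}
print(insertion_sort([C, B, A]))  # [{1, 2}, {3}, {2}]
```

The result ends with $A$ even though $A < C$, exactly as in the walk-through.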

What was the wrong assumption we made in applying this *insertion sort* algorithm to input that was not even weakly ordered?

We assumed that, because the list we were building, $[C, B]$, was sorted and $A$ was permitted to appear after its last element, $A$ would be permitted to appear after *all* of its elements.

This assumption is what makes most sorting algorithms work. It is guaranteed for any weak ordering. It is not guaranteed for arbitrary partial orderings, and "is a proper subset of" on sets is an example of a partial ordering for which it does not hold.
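For contrast, here is the same kind of insertion sort, now taking a less-than function `lt`, applied to a weak ordering that is not total: comparing sets by size. Sets of equal size are $\sim$ without being equal, yet the result is sorted. The helpers `insertion_sort`, `is_sorted`, `by_size`, and `subset` are hypothetical; `is_sorted` simply checks by brute force that no element is less than an element before it.

```python
def insertion_sort(items, lt):
    """Insertion sort that compares only via the supplied less-than function."""
    result = []
    for x in items:
        i = len(result)
        while i > 0 and lt(x, result[i - 1]):
            i -= 1
        result.insert(i, x)
    return result

def is_sorted(seq, lt):
    """True if no element is less than an element that precedes it."""
    return not any(lt(seq[j], seq[i])
                   for i in range(len(seq))
                   for j in range(i + 1, len(seq)))

def by_size(s, t):
    """A weak, but not total, ordering on sets."""
    return len(s) < len(t)

def subset(s, t):
    """Proper subset: a partial ordering that is not weak."""
    return s < t

A, B, C = {2}, {3}, {1, 2}
print(insertion_sort([C, B, A], by_size))                       # [{3}, {2}, {1, 2}]
print(is_sorted(insertion_sort([C, B, A], by_size), by_size))  # True
print(is_sorted(insertion_sort([C, B, A], subset), subset))    # False
```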

## Total Ordering (a.k.a. Strong Ordering)

A partial ordering where, for any $x$ and $y$, $x < y$ or $x = y$ or $x > y$.

In other words, either $x$ comes before $y$, or $y$ comes before $x$, or $x$ and $y$ are equal.
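Trichotomy can also be checked by brute force. In this sketch, `is_total` is a hypothetical helper and `==` stands in for "are equal".

```python
from itertools import product

def is_total(elements, lt):
    """Brute-force trichotomy check: x < y, x == y, or y < x for every pair."""
    return all(lt(x, y) or x == y or lt(y, x)
               for x, y in product(elements, repeat=2))

print(is_total([3, 1, 2], lambda a, b: a < b))  # True

# Comparing strings by length is weak but not total: "ab" and "cd" are
# neither equal nor ordered with respect to each other.
print(is_total(["ab", "cd"], lambda s, t: len(s) < len(t)))  # False
```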

## Relationship between weak and total (strong) ordering

In a weak ordering, $\sim$ is transitive: if $x \sim y$ and $y \sim z$, then $x \sim z$.

Note that $\sim$ is also symmetric (if $x \sim y$, then $y \sim x$) and reflexive ($x \sim x$, since asymmetry rules out $x < x$).

Because of this, elements clump together with those they are neither less nor greater than.

Across separate clumps, we always have $<$ or $>$.

*This is to say that the clumps are totally ordered.*

In math lingo, a relation that is symmetric, reflexive, and transitive is said to be an *equivalence relation*. The clumps are called *equivalence classes*.

What we mean when we say an equivalence class $S$ comes before an equivalence class $T$ in the induced total ordering of equivalence classes is that, given any $x \in S$ and $y \in T$, $x$ comes before $y$ in the weak ordering.
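To make the clumps concrete: comparing words by length is a weak ordering whose equivalence classes are "all words of one length". This minimal sketch collects them with `sorted` and `itertools.groupby`; the word list is made up for illustration.

```python
from itertools import groupby

words = ["pear", "fig", "kiwi", "apple", "yam", "plum"]

# Sort by length, then collect each run of equal-length words into one class.
classes = [list(group) for _, group in groupby(sorted(words, key=len), key=len)]
print(classes)  # [['fig', 'yam'], ['pear', 'kiwi', 'plum'], ['apple']]

# Every word in an earlier class comes before every word in a later class.
print(all(len(x) < len(y)
          for i, earlier in enumerate(classes)
          for later in classes[i + 1:]
          for x in earlier
          for y in later))  # True
```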

## Why define non-total weak orderings?

`sorted` (and `list.sort`) supports an optional keyword-only argument `key=` for passing a custom key-selector function. The keys this function returns must themselves obey weak ordering.
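A quick illustration of `key=` with `sorted` and `list.sort`. Because `key` is keyword-only, passing the function positionally raises `TypeError` (the exact message varies between Python versions). The word list is illustrative.

```python
words = ["pear", "Fig", "kiwi", "apple"]

print(sorted(words, key=len))           # ['Fig', 'pear', 'kiwi', 'apple']
print(sorted(words, key=str.casefold))  # ['apple', 'Fig', 'kiwi', 'pear']

words.sort(key=len)  # list.sort accepts the same keyword-only argument
print(words)         # ['Fig', 'pear', 'kiwi', 'apple']

try:
    sorted(words, len)  # key passed positionally
except TypeError as exc:
    print("TypeError:", exc)
```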

Suppose we know that any objects that may appear in some iterable, `items`, are mutually comparable, and that this comparison obeys weak ordering. Then `sorted(items)` is a sorted list of the objects in `items`. But since a weak ordering induces a total ordering on its equivalence classes (under the $\sim$ relation defined above), in principle a key selector function with totally ordered keys could always be used instead: one that, when passed an object $x$, returns a key that behaves, under order comparisons, like the equivalence class containing $x$ under the induced comparison of equivalence classes.
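As a hypothetical example, the `Task` class below orders instances by `priority` alone, so two distinct tasks with the same priority are $\sim$ without being equal. The key `attrgetter("priority")` maps each task to an `int` that stands in for its equivalence class, and integers are totally ordered. Because `sorted` is stable and both comparisons give the same `<` outcomes, the two calls should produce the same list. `Task` and its attributes are assumptions made for illustration.

```python
from operator import attrgetter

class Task:
    """Hypothetical type: weakly (not totally) ordered by priority."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

    def __lt__(self, other):
        if not isinstance(other, Task):
            return NotImplemented
        return self.priority < other.priority

    def __repr__(self):
        return f"Task({self.name!r}, {self.priority})"

tasks = [Task("deploy", 2), Task("test", 1), Task("review", 2), Task("plan", 0)]

by_operator = sorted(tasks)                        # uses Task.__lt__
by_key = sorted(tasks, key=attrgetter("priority"))  # uses int keys instead
print(by_operator)  # [Task('plan', 0), Task('test', 1), Task('deploy', 2), Task('review', 2)]
print(by_operator == by_key)  # True
```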

Given the above&mdash;and given that non-total weak orderings can be nonintuitive&mdash;the question arises of when, if ever, we should design a type so that its instances are weakly but not totally ordered.

Often this is subjective. Defining order comparison operators can make the code that uses them smaller, faster to read, and less likely to contain a bug. But even when the operators make code more compact, the effect of those comparisons may be less clear than if an explicit key selector function were used, and if the relationships are not intuitive, the whole system may become harder to reason about. If the operators are implemented only for convenience, and are not part of what it *means* to be an instance of the type that provides them, then the single responsibility principle has been violated, and the system may be harder to maintain and test.

In making these design decisions, two closely related questions to ask are:

1. Does defining order comparison operators make it so that the code that is easy and natural to write is correct code (so that correctness is easier to achieve and maintain)? Or are there foreseeable situations where the presence of those operators would lead to confusion and subtly wrong code?

2. Does defining order comparison operators make it so that other functionality just works, using facilities of the language? Or does it, instead, cause things that used to be errors to *look* reasonable while still, really, being errors?

Really, these are the questions to ask when deciding whether to supply custom behavior for *any* operator.
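One foreseeable trap of the kind question 1 warns about, sketched with a hypothetical `Task` class: `functools.total_ordering` fills in the missing comparison methods from `__lt__` and `__eq__`. If `__lt__` compares only `priority` while `__eq__` is left as the default identity check, the derived `__gt__` (computed as "not less than, and not equal") reports that each of two same-priority tasks is greater than the other. The code reads naturally and runs without error, yet its answers are wrong.

```python
from functools import total_ordering

@total_ordering
class Task:
    """Hypothetical type: __lt__ by priority, __eq__ inherited (identity)."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

    def __lt__(self, other):
        if not isinstance(other, Task):
            return NotImplemented
        return self.priority < other.priority

a, b = Task("deploy", 2), Task("review", 2)
print(a < b, b < a)    # False False (a ~ b, as intended)
print(a > b, b > a)    # True True   (each is "greater" than the other)
print(a <= b, b <= a)  # False False
```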

### Example: Nested lexicographical comparisons

Most sequences, including lists, have lexicographic order comparisons. This is structural (i.e., recursive): it works with lists of arbitrarily deep nesting, eventually recursing down to comparison of the lowermost (i.e., leaf) elements.
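To make the recursion concrete, here is a hypothetical pure-Python re-implementation (`lex_less` is not a library function) that mirrors how list `<` behaves: find the first position where the elements differ and compare that pair with `<`, which recurses when both are lists; if one list runs out first, the shorter list is smaller.

```python
def lex_less(a, b):
    """Sketch of lexicographic `<` for lists, recursing into nested lists."""
    for x, y in zip(a, b):
        if x == y:
            continue  # equal so far; look at the next position
        if isinstance(x, list) and isinstance(y, list):
            return lex_less(x, y)  # recurse into nested lists
        return x < y  # leaves are compared with their own `<`
    # One list is a prefix of the other: the shorter one comes first.
    return len(a) < len(b)

print(lex_less([[1, 2, 1], [9]], [[1, 3]]))  # True, because [1, 2, 1] < [1, 3]
print(lex_less([1, 3], [1, 3, 2]))           # True: a proper prefix is smaller
print(lex_less([1, 3], [1, 3]))              # False
```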

```python
from pprint import pprint

nested = [[[1, 3], [2, 3, 2]],
          [[1, 3, 2], [1, 1, 2]],
          [[1, 2, 1], [2, 1, 2], [2, 2]]]

pprint(sorted(nested), width=40)
```

Output:

```text
[[[1, 2, 1], [2, 1, 2], [2, 2]],
 [[1, 3], [2, 3, 2]],
 [[1, 3, 2], [1, 1, 2]]]
```

This works because of how `<` works on lists. List comparison in Python does not accept a key selector; it just recurses, using `<` at every level of nesting. If you want to sort lists lexicographically with respect to some custom ordering, but you don't implement `<` for the types of the elements you store in the lists, then this is much more cumbersome to express, though of course you still *can* express it. If you find that instances of a type you are designing should always work this way when they appear as leaves in nested sequences, that's a strong sign (though not a decisive one) that you should override `__lt__` (and related methods) accordingly.
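As a hedged illustration of the difference, suppose the leaves are strings that should compare case-insensitively. Without custom comparisons on the leaves, one option is a recursive key function (the hypothetical `deep_key`) that rebuilds the whole nested structure with each leaf replaced by its key. With a leaf type (the hypothetical `CIStr`) whose comparisons do the case-folding, plain `sorted` needs no key at all. Note that `__eq__` matters here as well as `__lt__`: list comparison uses `==` to find the first pair of unequal elements and only then applies `<` to that pair.

```python
from functools import total_ordering

def deep_key(value):
    """Hypothetical key: rebuild nested lists with every string leaf casefolded."""
    if isinstance(value, list):
        return [deep_key(item) for item in value]
    return value.casefold()

@total_ordering
class CIStr:
    """Hypothetical leaf type whose equality and ordering ignore case."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, CIStr):
            return NotImplemented
        return self.text.casefold() == other.text.casefold()

    def __lt__(self, other):
        if not isinstance(other, CIStr):
            return NotImplemented
        return self.text.casefold() < other.text.casefold()

    def __hash__(self):
        # Keep hashing consistent with the case-insensitive __eq__.
        return hash(self.text.casefold())

raw = [["b", "A"], ["a", "c"], ["B", "a"], ["A", "b"]]

# Without custom leaf comparisons: a recursive key function.
by_key = sorted(raw, key=deep_key)
print(by_key)  # [['A', 'b'], ['a', 'c'], ['b', 'A'], ['B', 'a']]

# With CIStr leaves: plain sorted() recurses through the lists on its own.
wrapped = [[CIStr(s) for s in row] for row in raw]
by_operator = [[leaf.text for leaf in row] for row in sorted(wrapped)]
print(by_operator == by_key)  # True
```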