# Advanced Sorting Patterns

This notebook extends basic `sorted()` with:
- key functions (`lambda`, `operator.itemgetter/attrgetter`)
- multi-key sorts (tuples), and per-key ascending/descending
- None/NaN-safe keys
- custom orders (e.g., weekday order, severity)
- locale/case-folded comparisons
- DSU (decorate-sort-undecorate) for performance/clarity
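- top-k selection with `heapq` (cheaper than a full sort when k is small)
- sorting custom classes
- multi-pass sorts that rely on stability
- natural sort for strings with embedded numbers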

Remember: Python’s sort is **stable** (TimSort), so items with equal keys keep their original relative order.
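A small check of that guarantee, using nothing beyond the built-in `sorted()`: records that tie on the key come out in the order they went in (the `records` data is illustrative).

```python
records = [('b', 1), ('a', 2), ('b', 0), ('a', 1)]

# Sort only by the letter; ties ('a' vs 'a', 'b' vs 'b') keep their input order
by_letter = sorted(records, key=lambda r: r[0])
print(by_letter)  # [('a', 2), ('a', 1), ('b', 1), ('b', 0)]
```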

In [2]:
from operator import itemgetter, attrgetter
from functools import total_ordering
from math import isnan
import heapq
import locale

locale.setlocale(locale.LC_COLLATE, '')  # best effort; may vary by system

'Bulgarian_Bulgaria.1251'

## 1) Multi-key sorts with tuples
Return a tuple from the key function. Tuples compare lexicographically: the first element decides, and later elements only break ties.

In [3]:
people = [('Ana', 30), ('Bo', 19), ('Ana', 31)]
sorted(people, key=lambda p: (p[0], p[1]))  # by name, then age

[('Ana', 30), ('Ana', 31), ('Bo', 19)]

For mixed directions, transform individual components of the key. Example: name ascending, age **descending**. Negation only works for numeric keys.

In [4]:
sorted(people, key=lambda p: (p[0], -p[1]))  # minus flips numeric order

[('Ana', 31), ('Ana', 30), ('Bo', 19)]

## 2) `itemgetter` / `attrgetter` for clarity & speed
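`itemgetter` fetches items by key or index (dicts, tuples); `attrgetter` fetches attributes (objects). Given several names, both return a tuple, which gives a multi-key sort.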

In [5]:
rows = [
  {'name': 'Bo', 'age': 19},
  {'name': 'Ana', 'age': 40},
  {'name': 'Ana', 'age': 30},
]
sorted(rows, key=itemgetter('name','age'))

[{'name': 'Ana', 'age': 30},
 {'name': 'Ana', 'age': 40},
 {'name': 'Bo', 'age': 19}]
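The cell above only shows `itemgetter`. A sketch of the `attrgetter` counterpart, using `types.SimpleNamespace` as a stand-in for any object with attributes (the `rows_obj` data is illustrative):

```python
from operator import attrgetter
from types import SimpleNamespace

rows_obj = [
    SimpleNamespace(name='Bo', age=19),
    SimpleNamespace(name='Ana', age=40),
    SimpleNamespace(name='Ana', age=30),
]

# attrgetter('name', 'age') builds a function returning (obj.name, obj.age)
ordered = sorted(rows_obj, key=attrgetter('name', 'age'))
print([(r.name, r.age) for r in ordered])  # [('Ana', 30), ('Ana', 40), ('Bo', 19)]
```

`attrgetter` also accepts dotted paths such as `'address.city'` for nested attributes.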

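Locale-aware collation with `locale.strxfrm` (the order depends on the locale set in the first cell):

In [6]: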
data = ['Z','z','X','x','á']
sorted(data, key=locale.strxfrm)  # locale-aware collation (system dependent)

['á', 'x', 'X', 'z', 'Z']
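`locale.strxfrm` transforms each string into a key once. An alternative, shown as a sketch, wraps the pairwise comparison `locale.strcoll` with `functools.cmp_to_key`. The `locale` docs state that `strxfrm(s1) < strxfrm(s2)` is equivalent to `strcoll(s1, s2) < 0`, so both should agree; the `strcoll` route simply calls the comparison many more times.

```python
import locale
from functools import cmp_to_key

try:
    locale.setlocale(locale.LC_COLLATE, '')  # user's default locale, if available
except locale.Error:
    pass  # keep the current (often "C") collation

data = ['Z', 'z', 'X', 'x', 'á']
by_transform = sorted(data, key=locale.strxfrm)            # one transformed key per string
by_compare = sorted(data, key=cmp_to_key(locale.strcoll))  # pairwise locale comparisons
print(by_transform, by_compare)
```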

Case-insensitive (and unicode-aware) with `str.casefold()`:

In [7]:
letters = ['Z','a','A','z','x','X']
sorted(letters, key=str.casefold)

['a', 'A', 'x', 'X', 'Z', 'z']
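Why `casefold` rather than `lower`: case folding is more aggressive for some characters. German `ß`, for example, folds to `ss`, so the two spellings below tie only under `casefold`.

```python
words = ['Strasse', 'straße']

# lower() leaves 'ß' unchanged; casefold() maps it to 'ss'
print([w.lower() for w in words])     # ['strasse', 'straße']
print([w.casefold() for w in words])  # ['strasse', 'strasse']

# With casefold as the key the words tie, so the stable sort keeps input order
print(sorted(words, key=str.casefold))  # ['Strasse', 'straße']
```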

## 3) None/NaN-safe keys
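In Python 3, comparing `None` with a number raises `TypeError`, and NaN compares false against everything, which can leave a list silently mis-sorted. Place missing values explicitly (e.g., last) with a key.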

In [8]:
vals = [None, 2, 1, None, 3]
none_last = lambda x: (x is None, x)
sorted(vals, key=none_last)  # (False/True, value) -> Nones at end

[1, 2, 3, None, None]

In [9]:
nums = [float('nan'), 2.0, 1.0]
nan_last = lambda x: (isnan(x), x)
sorted(nums, key=nan_last)

[1.0, 2.0, nan]
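The two keys above handle `None` and NaN separately, and `nan_last` raises `TypeError` on `None` because `isnan(None)` fails. A combined sketch, where `missing_last` is a hypothetical helper name: it substitutes a placeholder `0` for missing values, so `None` and NaN are never compared at all.

```python
from math import isnan

def missing_last(x):
    """Hypothetical key: real numbers first (ascending), then None/NaN in input order."""
    missing = x is None or (isinstance(x, float) and isnan(x))
    # Every missing value maps to (True, 0), so they tie and keep their original order
    return (True, 0) if missing else (False, x)

mixed = [3.0, None, float('nan'), 1, 2.5, None]
result = sorted(mixed, key=missing_last)
print(result)  # [1, 2.5, 3.0, None, nan, None]
```

To put missing values first instead, swap the flags: `(False, 0) if missing else (True, x)`.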

## 4) Custom orders via key mapping
Map each value to its rank in a bespoke order.

In [10]:
severity = ['DEBUG','INFO','WARN','ERROR','CRITICAL']
order = {lvl:i for i,lvl in enumerate(severity)}
vals = ['INFO','ERROR','DEBUG','CRITICAL','WARN']
sorted(vals, key=order.get)

['DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
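`order.get` returns `None` for a level that is not in the table, and comparing `None` with an `int` raises `TypeError`. A sketch that gives unknown levels a default rank past the end so they sort last; `'TRACE'` is just an example of an unlisted level.

```python
severity = ['DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
order = {lvl: i for i, lvl in enumerate(severity)}

def severity_rank(level):
    # Unknown levels get rank len(order), one past the highest known rank
    return order.get(level, len(order))

logs = ['ERROR', 'TRACE', 'DEBUG', 'WARN']
print(sorted(logs, key=severity_rank))  # ['DEBUG', 'WARN', 'ERROR', 'TRACE']
```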

Weekday order (Monday first), independent of alphabetical or locale order:

In [11]:
wk = ['Sun','Mon','Tue','Wed','Thu','Fri','Sat']
rank = {d:i for i,d in enumerate(['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])}
sorted(wk, key=rank.get)

['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

## 5) DSU (Decorate-Sort-Undecorate)
Precompute expensive keys, sort once on the precomputed value, then strip the decoration.

In [12]:
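words = ['Sorting', 'python', 'is', 'fun!']
# Decorate: compute each key once; the index i keeps the sort stable and
# means the original items never need to be compared with each other
decorated = [(w.casefold(), i, w) for i, w in enumerate(words)]
decorated.sort()                              # sort in place on the decorations
undecorated = [w for _, _, w in decorated]    # strip the decoration
undecorated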

['fun!', 'is', 'python', 'Sorting']
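For comparison, `sorted(..., key=...)` already applies the decorate-sort-undecorate idea internally: the Python docs state that the key function is called exactly once per element. A small sketch that counts calls to a stand-in "expensive" key (`expensive_key` is a hypothetical name):

```python
calls = {'n': 0}

def expensive_key(w: str) -> str:
    calls['n'] += 1       # count how often the key is computed
    return w.casefold()   # stand-in for a costly computation

words = ['Sorting', 'python', 'is', 'fun!']
result = sorted(words, key=expensive_key)
print(result)      # ['fun!', 'is', 'python', 'Sorting']
print(calls['n'])  # 4, one call per element
```

Explicit DSU remains useful when you want to keep the decorated tuples, for example to reuse the computed keys after sorting.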

## 6) Mixed ascending/descending across several keys
Name ascending, price **descending**, rating ascending: negate only the numeric key that should run descending.

In [13]:
items = [('B',10,5), ('A',10,4), ('B',30,3), ('A',20,5)]  # (name, price, rating)
sorted(items, key=lambda t: (t[0], -t[1], t[2]))

[('A', 20, 5), ('A', 10, 4), ('B', 30, 3), ('B', 10, 5)]

## 7) `heapq` for top-k (avoid full sort)
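When you only need the largest/smallest *k* elements. `heapq.nlargest`/`nsmallest` work best when *k* is small relative to the input; the `heapq` docs recommend `sorted()` for larger *k* and `min()`/`max()` when *k* is 1.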

In [14]:
scores = [12,87,45,98,23,67,90]
heapq.nlargest(3, scores)

[98, 90, 87]

In [15]:
players = [{'name':'a','score':90},{'name':'b','score':12},{'name':'c','score':98}]
heapq.nlargest(2, players, key=itemgetter('score'))

[{'name': 'c', 'score': 98}, {'name': 'a', 'score': 90}]
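When the data arrives as a stream, a common pattern is a bounded min-heap of size *k*: its root is the smallest of the current top *k*, so each new item only needs to beat the root. A sketch, where `top_k` is a hypothetical helper and not a `heapq` function:

```python
import heapq
from typing import Iterable, List

def top_k(stream: Iterable[int], k: int) -> List[int]:
    """Return the k largest items, largest first, keeping at most k in memory."""
    if k <= 0:
        return []
    heap: List[int] = []  # min-heap; heap[0] is the smallest kept item
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the root and push x in one step
    return sorted(heap, reverse=True)

print(top_k(iter([12, 87, 45, 98, 23, 67, 90]), 3))  # [98, 90, 87]
```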

## 8) Sorting custom classes
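Prefer passing a `key` function. If instances need a natural order of their own, define `__eq__` and `__lt__` (the sort itself only uses `<`) and let `functools.total_ordering` derive the remaining comparisons. Both approaches are shown.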

In [16]:
class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age
    def __repr__(self):
        return f"{self.name}({self.age})"

ps = [Person('Ana',31), Person('Bo',19), Person('Ana',30)]
sorted(ps, key=lambda p: (p.name, p.age))  # attrgetter('name','age') also works

[Ana(30), Ana(31), Bo(19)]

In [17]:
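@total_ordering  # derives <=, >, >= from __lt__ and __eq__
class PersonOrd:
    def __init__(self, name, age):
        self.name, self.age = name, age
    def _key(self):
        return (self.name, self.age)
    def __lt__(self, other):
        if not isinstance(other, PersonOrd):
            return NotImplemented  # let Python try the reflected operation
        return self._key() < other._key()
    def __eq__(self, other):
        if not isinstance(other, PersonOrd):
            return NotImplemented
        return self._key() == other._key()
    def __repr__(self):
        return f"{self.name}({self.age})"

ps2 = [PersonOrd('Ana',31), PersonOrd('Bo',19), PersonOrd('Ana',30)]
sorted(ps2)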

[Ana(30), Ana(31), Bo(19)]
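A third option when you control the class: `dataclasses.dataclass(order=True)` generates `__eq__`, `__lt__` and the other comparisons, comparing the fields as a tuple in declaration order. `field(compare=False)` excludes a field; `PersonDC` and its `nickname` field are illustrative.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class PersonDC:
    name: str
    age: int
    nickname: str = field(default='', compare=False)  # ignored by ==, <, etc.

ps3 = [PersonDC('Ana', 31), PersonDC('Bo', 19), PersonDC('Ana', 30, nickname='A')]
ordered = sorted(ps3)  # compares (name, age) tuples
print([(p.name, p.age) for p in ordered])  # [('Ana', 30), ('Ana', 31), ('Bo', 19)]
```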

## 9) Stable sort tricks: secondary then primary
Because sorting is stable, you can sort multiple times from **least** significant key to most significant.

In [18]:
people = [('Ana',31), ('Bo',19), ('Ana',30)]
people_sorted = sorted(people, key=itemgetter(1))      # age asc (secondary)
sorted(people_sorted, key=itemgetter(0))               # name asc (primary)

[('Ana', 30), ('Ana', 31), ('Bo', 19)]
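The same trick covers mixed directions where negation does not work, such as a descending *string* key. `reverse=True` still preserves stability, so a sketch for name **descending**, age ascending looks like this (the extra `('Bo', 25)` row is illustrative):

```python
from operator import itemgetter

people = [('Ana', 31), ('Bo', 19), ('Ana', 30), ('Bo', 25)]

# Pass 1: least significant key, age ascending
by_age = sorted(people, key=itemgetter(1))
# Pass 2: most significant key, name descending; ties keep the age order from pass 1
result = sorted(by_age, key=itemgetter(0), reverse=True)
print(result)  # [('Bo', 19), ('Bo', 25), ('Ana', 30), ('Ana', 31)]
```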

## 10) Sorting strings naturally (simple approach)
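Plain string comparison puts `file10` before `file2`, because it compares character by character and `'1'` < `'2'`. Splitting each string into digit and non-digit runs, and comparing the digit runs as integers, fixes this. This is a lightweight key; for full natural sorting across locales, consider a specialized library.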

In [19]:
import re
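def natural_key(s: str):
    # re.split with a capturing group alternates text and digit runs,
    # e.g. 'file10' -> ['file', '10', ''], so odd indexes hold the digits
    parts = re.split(r'(\d+)', s)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]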

files = ['file10','file2','file1']
sorted(files, key=natural_key)

['file1', 'file2', 'file10']
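A side-by-side check against plain `sorted()` on names with several numeric runs and mixed case. The key is repeated so the block runs on its own, in a variant that picks digit runs by position; the file names are made up.

```python
import re

def natural_key(s: str):
    # Odd indexes of re.split with a capturing group are always digit runs
    parts = re.split(r'(\d+)', s)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

names = ['img12-part2', 'IMG2-part10', 'img2-part9']
print(sorted(names))                   # ['IMG2-part10', 'img12-part2', 'img2-part9']
print(sorted(names, key=natural_key))  # ['img2-part9', 'IMG2-part10', 'img12-part2']
```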