# Advanced Filtering Patterns

This notebook extends basic filtering with powerful, Pythonic patterns:
- truthy filtering (`filter(None, ...)`)
- `itertools.filterfalse`, `compress`, `dropwhile`, `takewhile`
- predicate **combinators** and partials
- filtering dicts and lists of dicts
- stateful predicates (closures/classes)
- deduplicating while filtering (`unique_by`)
- resilient filtering that **won't crash** on bad rows
- SQL-style `where` helper

*The helpers return lazy iterators; the examples wrap them in `list(...)` only to display the results.*

In [1]:
from __future__ import annotations
from itertools import filterfalse, compress, dropwhile, takewhile
from operator import itemgetter, gt, ge, lt, le
from functools import partial
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple, TypeVar, Optional

T = TypeVar('T')
Row = Dict[str, Any]


## 1) Truthy/Falsy filtering with `filter(None, iterable)`
Keep only the truthy elements. This is useful for dropping `None`, empty strings, and unwanted zeros in a single pass.

In [2]:
dirty = ['a', '', None, 'b', 0, 3]
list(filter(None, dirty))  # removes '', None, 0

['a', 'b', 3]

If zeros (or empty strings) are meaningful, write a tiny predicate that drops only `None`:

In [3]:
def not_none(x):
    return x is not None
list(filter(not_none, dirty))

['a', '', 'b', 0, 3]
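
When only some falsy values should go, a slightly richer predicate can help. The sketch below assumes a hypothetical helper, `has_value` (not part of this notebook), that drops `None` and blank strings but keeps zeros:

```python
def has_value(x):
    """Hypothetical helper: False for None and blank strings, True otherwise."""
    if x is None:
        return False
    if isinstance(x, str) and not x.strip():
        return False
    return True

dirty = ['a', '', None, 'b', 0, 3, '   ']
cleaned = list(filter(has_value, dirty))
print(cleaned)  # ['a', 'b', 0, 3]
```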

## 2) `filterfalse` and `compress`
- `filterfalse(pred, it)`: keep items where `pred(item)` is **False**.
- `compress(data, selectors)`: keep `data[i]` where `selectors[i]` is truthy.


In [4]:
nums = [1,2,3,4,5]
is_even = lambda n: n % 2 == 0
print('filterfalse odds ->', list(filterfalse(is_even, nums)))

data = ['keep', 'drop', 'also keep']
selectors = [1, 0, 1]
print('compress ->', list(compress(data, selectors)))

filterfalse odds -> [1, 3, 5]
compress -> ['keep', 'also keep']
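
Selectors for `compress` are often computed from a parallel sequence rather than written by hand. A minimal sketch with made-up `names` and `scores` lists:

```python
from itertools import compress

names = ['ana', 'bo', 'chad', 'dee']
scores = [72, 41, 90, 55]

# Build the selectors lazily from the parallel list
passed = list(compress(names, (s >= 50 for s in scores)))
print(passed)  # ['ana', 'chad', 'dee']

# compress stops at the shorter input, so unmatched trailing data is dropped
truncated = list(compress('abcd', [1, 1]))
print(truncated)  # ['a', 'b']
```

Because of that truncation, it is worth checking that both inputs have the same length when a mismatch would indicate a bug.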


## 3) `dropwhile` / `takewhile`
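Useful for ordered streams. Both act only on the leading run of items: `dropwhile` skips items until the predicate first fails and then yields everything that follows, while `takewhile` yields items until the predicate first fails and then stops.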

In [5]:
ascending = [0,1,2,3,2,1]
lt3 = lambda x: x < 3
print('dropwhile (<3) ->', list(dropwhile(lt3, ascending)))
print('takewhile (<3) ->', list(takewhile(lt3, ascending)))

dropwhile (<3) -> [3, 2, 1]
takewhile (<3) -> [0, 1, 2]
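
A common use is splitting a leading header from the body of a file. The sketch below uses invented `lines`; note that a later line matching the predicate is not dropped, because `dropwhile` stops testing after the first failure:

```python
from itertools import dropwhile, takewhile

lines = ['# title', '# author', 'row 1', '# not a header', 'row 2']
is_comment = lambda s: s.startswith('#')

header = list(takewhile(is_comment, lines))
body = list(dropwhile(is_comment, lines))
print(header)  # ['# title', '# author']
print(body)    # ['row 1', '# not a header', 'row 2']
```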


## 4) Predicate **combinators**
Build complex filters by composing tiny predicates.

In [6]:
def p_and(*preds: Callable[[T], bool]) -> Callable[[T], bool]:
    def combined(x: T) -> bool:
        return all(p(x) for p in preds)
    return combined

def p_or(*preds: Callable[[T], bool]) -> Callable[[T], bool]:
    def combined(x: T) -> bool:
        return any(p(x) for p in preds)
    return combined

def p_not(pred: Callable[[T], bool]) -> Callable[[T], bool]:
    def neg(x: T) -> bool:
        return not pred(x)
    return neg


In [7]:
gt5 = lambda x: x > 5
even = lambda x: x % 2 == 0
list(filter(p_and(gt5, even), range(1, 11)))  # >5 AND even

[6, 8, 10]
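
Because `all` short-circuits, the order of predicates in `p_and` matters: a cheap guard placed first can protect a later predicate from inputs it cannot handle. A small sketch, with `p_and` repeated so the cell runs on its own and `mixed` as made-up data:

```python
def p_and(*preds):
    # Same shape as the notebook's p_and, repeated so this cell runs alone
    def combined(x):
        return all(p(x) for p in preds)
    return combined

is_int = lambda x: isinstance(x, int)
gt5 = lambda x: x > 5

mixed = [3, '7', 8, None, 10]
# is_int runs first, so gt5 never sees '7' or None (which would raise TypeError)
guarded = list(filter(p_and(is_int, gt5), mixed))
print(guarded)  # [8, 10]
```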

`itemgetter` helps build clear predicates for structured data, and `partial` can fix arguments such as a field name or threshold.

In [8]:
people: List[Row] = [
    {"name": "Ana", "age": 30, "city": "Sofia"},
    {"name": "Bo", "age": 19, "city": "Varna"},
    {"name": "Chad", "age": 17, "city": "Plovdiv"},
]
age = itemgetter('age')
is_adult = lambda r: age(r) >= 18
list(filter(is_adult, people))

[{'name': 'Ana', 'age': 30, 'city': 'Sofia'},
 {'name': 'Bo', 'age': 19, 'city': 'Varna'}]
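
The same predicate can be built with `partial` instead of a lambda. This is only a sketch: `field_at_least` is a hypothetical helper introduced for illustration, not something defined elsewhere in the notebook.

```python
from functools import partial
from operator import itemgetter

def field_at_least(field, minimum, row):
    """Hypothetical helper: True when row[field] >= minimum."""
    return itemgetter(field)(row) >= minimum

# partial fixes the first two arguments, leaving a one-argument predicate
is_adult = partial(field_at_least, 'age', 18)

people = [
    {'name': 'Ana', 'age': 30},
    {'name': 'Bo', 'age': 19},
    {'name': 'Chad', 'age': 17},
]
adults = [p['name'] for p in filter(is_adult, people)]
print(adults)  # ['Ana', 'Bo']
```

One possible advantage over a lambda is that the resulting object exposes `is_adult.func` and `is_adult.args`, which can make debugging easier.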

## 5) Filtering tuples with `itemgetter` / stocks example
Revisit the `(symbol, open, high, low, close, volume)` tuples, using named index constants and the combinators from section 4.

In [9]:
SYMBOL, OPEN, HIGH, LOW, CLOSE, VOL = range(6)
closed_up = lambda rec: rec[CLOSE] > rec[OPEN]
big_volume = lambda rec: rec[VOL] >= 250_000
up_and_big = p_and(closed_up, big_volume)

quotes = [
    ('AAPL', 317.99, 319.57, 316.75, 317.13, 12901800),
    ('ACOM', 25.2, 26.6, 24.9, 26.56, 265300),
    ('ACWX', 44.49, 44.66, 44.36, 44.6, 55500),
    ('ABAX', 25.26, 25.49, 25.04, 25.42, 73700),
    ('ABIO', 3.96, 4, 3.88, 4, 38500),
    ('AAWW', 60.89, 61.44, 60.5, 61.19, 272800),
]


In [10]:
list(filter(up_and_big, quotes))

[('ACOM', 25.2, 26.6, 24.9, 26.56, 265300),
 ('AAWW', 60.89, 61.44, 60.5, 61.19, 272800)]

## 6) Stateful predicates
Sometimes the predicate needs *memory*. Use a closure or a small class.

In [11]:
def every_nth(n: int) -> Callable[[Any], bool]:
    count = -1
    def pred(_):
        nonlocal count
        count += 1
        return count % n == 0
    return pred

data = list(range(10))
list(filter(every_nth(3), data))  # keep 0th, 3rd, 6th, 9th

[0, 3, 6, 9]
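
One caveat with stateful predicates is that they are effectively single-use: the state carries over if the same predicate object is passed to `filter` twice. The sketch below repeats `every_nth` so it runs on its own and shows the difference between a shared and a fresh predicate:

```python
def every_nth(n):
    # Same closure as above, repeated so this cell runs on its own
    count = -1
    def pred(_):
        nonlocal count
        count += 1
        return count % n == 0
    return pred

data = list(range(10))

shared = every_nth(3)
first = list(filter(shared, data))
second = list(filter(shared, data))  # counter continues from 9, not from -1
print(first)   # [0, 3, 6, 9]
print(second)  # [2, 5, 8]

# A fresh predicate per pass restores the expected result
fresh = list(filter(every_nth(3), data))
print(fresh)   # [0, 3, 6, 9]
```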

Class variant (useful when you want a reusable object with `__call__`).

In [12]:
from collections import deque

class WindowReject:
    """Reject any value that repeats within the last k items."""
    def __init__(self, k: int):
        self.k = k
        # deque(maxlen=k) drops the oldest entry automatically on append
        self.buf: deque = deque(maxlen=k)
    def __call__(self, x: Any) -> bool:
        accepted = x not in self.buf
        self.buf.append(x)  # record every item, accepted or rejected
        return accepted

vals = [1,2,1,2,3,4,5,6,7,8]
list(filter(WindowReject(k=2), vals))

[1, 2, 3, 4, 5, 6, 7, 8]

## 7) Deduplicate while filtering: `unique_by`
Keep only the first occurrence per key (lazy, stable). Keys must be hashable because they are stored in a set.

In [13]:
def unique_by(iterable: Iterable[T], key: Callable[[T], Any]) -> Iterator[T]:
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

rows = [
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 1, "name": "A1 (dup id)"},
]
list(unique_by(rows, key=itemgetter('id')))

[{'id': 1, 'name': 'A'}, {'id': 2, 'name': 'B'}]
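
The `key` function can also normalize values before comparing them. A sketch with made-up `contacts`, deduplicating email addresses case-insensitively (`unique_by` is repeated so the cell runs on its own):

```python
def unique_by(iterable, key):
    # Same generator as above, repeated so this cell runs on its own
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

contacts = [
    {'email': 'ana@example.com'},
    {'email': 'ANA@example.com'},
    {'email': 'bo@example.com'},
]
deduped = list(unique_by(contacts, key=lambda r: r['email'].lower()))
print([r['email'] for r in deduped])  # ['ana@example.com', 'bo@example.com']
```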

## 8) Resilient filtering: ignore bad rows
When a predicate may raise (e.g., bad types, missing keys), wrap it.

`safe_filter(pred, iterable, *, on_error=False)` yields the items for which `pred` is truthy. If `pred` raises, the item is yielded only when `on_error` is `True`; by default it is dropped.

In [14]:
def safe_filter(pred: Callable[[T], bool], iterable: Iterable[T], *, on_error: bool = False) -> Iterator[T]:
    for item in iterable:
        try:
            if pred(item):
                yield item
        except Exception:
            if on_error:  # choose whether to keep or drop on error
                yield item

rows = [{"age": 21}, {"age": "N/A"}, {"age": 18}, {}]
is_18p = lambda r: r["age"] >= 18
list(safe_filter(is_18p, rows))  # drops error rows

[{'age': 21}, {'age': 18}]

Keep rows that error (for later inspection) by setting `on_error=True`.

In [15]:
list(safe_filter(is_18p, rows, on_error=True))

[{'age': 21}, {'age': 'N/A'}, {'age': 18}, {}]
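
With `on_error=True` the failing rows are interleaved with the passing ones, so they still have to be told apart afterwards. One possible alternative, sketched here with a hypothetical `split_errors` helper, collects them separately along with the exception. Unlike `safe_filter` it is eager (it returns lists), and it assumes only `KeyError` and `TypeError` should be treated as bad rows:

```python
def split_errors(pred, iterable, exceptions=(KeyError, TypeError)):
    """Hypothetical helper: return (kept, errored) where errored holds (item, exc) pairs."""
    kept, errored = [], []
    for item in iterable:
        try:
            if pred(item):
                kept.append(item)
        except exceptions as exc:  # anything else propagates to the caller
            errored.append((item, exc))
    return kept, errored

rows = [{'age': 21}, {'age': 'N/A'}, {'age': 18}, {}]
kept, errored = split_errors(lambda r: r['age'] >= 18, rows)
print(kept)  # [{'age': 21}, {'age': 18}]
for row, exc in errored:
    print(row, type(exc).__name__)
```

Narrowing the caught exceptions is a design choice: a bare `except Exception` would also hide genuine bugs inside the predicate.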

## 9) SQL-style `where`
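A tiny helper that builds a predicate from simple comparisons on dict keys; all clauses must hold (logical AND).

Supported ops: `==`, `!=`, `>`, `>=`, `<`, `<=`. A missing key is read as `None` via `row.get`, so an ordering comparison against it raises `TypeError`.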

In [16]:
Op = Callable[[Any, Any], bool]
OPS: Dict[str, Op] = {">": gt, ">=": ge, "<": lt, "<=": le, "==": lambda a,b: a==b, "!=": lambda a,b: a!=b}

def where(**spec: Tuple[str, Any]) -> Callable[[Row], bool]:
    """
    Build a predicate from clauses like:
        where(age=('>=', 18), city=('==', 'Sofia'))
    """
    checks: List[Callable[[Row], bool]] = []
    for field, (op_sym, value) in spec.items():
        op = OPS[op_sym]
        checks.append(lambda row, f=field, o=op, v=value: o(row.get(f), v))
    return p_and(*checks)


In [17]:
rows = [
    {"name": "Ana", "age": 30, "city": "Sofia"},
    {"name": "Bo", "age": 19, "city": "Varna"},
    {"name": "Chad", "age": 17, "city": "Plovdiv"},
]
pred = where(age=(">=", 18), city=("==", "Sofia"))
list(filter(pred, rows))

[{'name': 'Ana', 'age': 30, 'city': 'Sofia'}]

## 10) Filter pipelines: map → filter → map
Compose transformations lazily for large data.

In [18]:
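def pct_change(rec):
    """Fractional open-to-close change, e.g. 0.05 for +5%."""
    return (rec[CLOSE] - rec[OPEN]) / rec[OPEN]

# map: attach the change; filter: gainers on big volume; map: project the symbol
scored = map(lambda rec: (rec, pct_change(rec)), quotes)
movers = filter(lambda pair: pair[1] > 0 and big_volume(pair[0]), scored)
symbols = map(lambda pair: pair[0][SYMBOL], movers)
list(symbols)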

['ACOM', 'AAWW']
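
Laziness means such a pipeline does no work until it is consumed, so it can even sit on top of an unbounded source. A minimal sketch using `itertools.count` as a stand-in for a large stream:

```python
from itertools import count, islice

squares = map(lambda n: n * n, count(1))               # map over an infinite stream
even_squares = filter(lambda s: s % 2 == 0, squares)   # filter
labels = map(lambda s: f'sq={s}', even_squares)        # map

# Nothing has been computed yet; islice pulls just three results through
first_three = list(islice(labels, 3))
print(first_three)  # ['sq=4', 'sq=16', 'sq=36']
```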

## 11) Quick tests
Tiny regression tests for the key helpers defined above.

In [19]:
# Predicate combinators
assert list(filter(p_or(lambda x: x<2, lambda x: x>8), range(0,11))) == [0,1,9,10]
assert list(filter(p_not(lambda x: x%2==0), range(6))) == [1,3,5]

# unique_by
u = list(unique_by([{"k":1},{"k":2},{"k":1}], key=itemgetter('k')))
assert [r['k'] for r in u] == [1,2]

# where
pred = where(a=('>=',2), b=('==','x'))
rows = [{"a":1,"b":"x"},{"a":2,"b":"y"},{"a":2,"b":"x"}]
assert list(filter(pred, rows)) == [{"a":2,"b":"x"}]

# safe_filter keeps rows whose predicate raises when on_error=True
bad = [{"v":1},{"v":"NaN"}]
is_pos = lambda r: r['v']>0
assert list(safe_filter(is_pos, bad, on_error=True)) == bad

print('✅ All tests passed.')

✅ All tests passed.
