# Custom Functions — Advanced Practice (Just Beyond the Basics)

Tackle these focused, real-world-style problems to sharpen your function-design skills.

## Guidelines / Best Practices
- Prefer **pure functions**: return values instead of printing or mutating inputs.
- Use **type hints** and **docstrings** that clearly state behavior and edge cases.
- Validate inputs and **raise** `ValueError`/`TypeError` with actionable messages.
- Avoid mutable default arguments; prefer factories or sentinels.
- Keep helpers private (prefix with `_`) when appropriate.
- Check your work with small `assert` tests; a starter set is provided below each task.
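
The guideline on mutable default arguments is easiest to see side by side. The sketch below uses two hypothetical helpers, `append_buggy` and `append_safe` (names invented for illustration), to show how a default list is shared across calls and how a `None` sentinel avoids that.

```python
from typing import List, Optional

def append_buggy(item: int, bucket: List[int] = []) -> List[int]:
    # The default list is created once, when the function is defined,
    # and the same object is reused on every call that omits `bucket`.
    bucket.append(item)
    return bucket

def append_safe(item: int, bucket: Optional[List[int]] = None) -> List[int]:
    # None acts as a sentinel; a fresh list is built on each call that omits `bucket`.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first_buggy = append_buggy(1)
second_buggy = append_buggy(2)   # same list object as first_buggy
first_safe = append_safe(1)
second_safe = append_safe(2)     # independent list
```

After these calls, `first_buggy` and `second_buggy` refer to one list holding both items, while the two safe results stay separate.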

**How to use**: Complete or refine the code at each `# TODO` marker (tasks without one ship with a working implementation to study), then run the test cell below each task.

---
## Task 1 — Normalize Whitespace
Write `normalize_whitespace(text: str) -> str` that:
- Collapses all runs of any whitespace (spaces, tabs, newlines) into a single space.
- Strips leading/trailing whitespace.
- Leaves non-whitespace characters untouched.

Edge cases: empty string, only whitespace, mixed tabs/newlines.
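
One regex-free way to meet these rules, shown here as a sketch under the hypothetical name `normalize_whitespace_split`, relies on `str.split()` with no arguments, which splits on runs of whitespace and discards empty pieces.

```python
def normalize_whitespace_split(text: str) -> str:
    """Collapse whitespace runs to single spaces without using `re`."""
    if not isinstance(text, str):
        raise TypeError("text must be a str")
    # split() with no separator splits on whitespace runs and drops
    # leading/trailing empties, so joining with " " does the rest.
    return " ".join(text.split())

cleaned = normalize_whitespace_split("  a\t b\n c  ")
```

For the edge cases listed above, an empty or all-whitespace string splits into an empty list, so the result is `''`.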

In [1]:
import re
from typing import Iterable, Callable, TypeVar, List, Tuple, Dict, Any, Optional, Iterator

def normalize_whitespace(text: str) -> str:
    """Return `text` with all consecutive whitespace collapsed to a single space.

    Examples
    --------
    >>> normalize_whitespace('  a\t b\n c  ')
    'a b c'
    """
    if not isinstance(text, str):
        raise TypeError("text must be a str")
    # TODO: implement using regex or manual scan
    return re.sub(r"\s+", " ", text).strip()  # <- starter; refine if desired

In [2]:
# Tests
assert normalize_whitespace('') == ''
assert normalize_whitespace('   ') == ''
assert normalize_whitespace('a   b') == 'a b'
assert normalize_whitespace('a\tb\n c') == 'a b c'
try:
    normalize_whitespace(None)  # type: ignore
except TypeError:
    pass
else:
    raise AssertionError('Expected TypeError for non-str input')

---
## Task 2 — Chunk an Iterable
Write `chunk(iterable, size)` that lazily yields **tuples** of `size` items from any iterable:
- `size` must be an `int` greater than 0: raise `TypeError` for a non-`int` and `ValueError` for `size <= 0`.
- Last chunk may be shorter.
- Do **not** materialize the entire iterable.

Bonus: Support iterables with unknown length efficiently.
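
For the bonus, one possible approach is to pull `size` items at a time with `itertools.islice`, which never needs to know the length of the input. The name `chunk_islice` is an assumption made for this sketch.

```python
from itertools import count, islice
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

def chunk_islice(iterable: Iterable[T], size: int) -> Iterator[Tuple[T, ...]]:
    """Yield tuples of at most `size` items, reading the input lazily."""
    if not isinstance(size, int):
        raise TypeError("size must be an int")
    if size <= 0:
        raise ValueError("size must be > 0")
    it = iter(iterable)
    while True:
        piece = tuple(islice(it, size))  # takes up to `size` items from `it`
        if not piece:                    # input exhausted
            return
        yield piece

# Works on an endless iterator because only one chunk is read at a time.
first_chunk = next(chunk_islice(count(), 3))
```

Because this is a generator function, the validation runs when the first chunk is requested, not when `chunk_islice(...)` is called.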

In [3]:
T = TypeVar('T')

def chunk(iterable: Iterable[T], size: int) -> Iterator[Tuple[T, ...]]:
    """Yield tuples of at most `size` from `iterable` lazily.

    Raises
    ------
    TypeError: if size is not an int
    ValueError: if size <= 0
    """
    if not isinstance(size, int):
        raise TypeError("size must be an int")
    if size <= 0:
        raise ValueError("size must be > 0")
    it = iter(iterable)
    # TODO: implement lazily (no full list)
    buf: List[T] = []
    for item in it:
        buf.append(item)
        if len(buf) == size:
            yield tuple(buf)
            buf.clear()
    if buf:
        yield tuple(buf)

In [4]:
# Tests
assert list(chunk([1,2,3,4,5], 2)) == [(1,2),(3,4),(5,)]
assert list(chunk(iter(range(5)), 3)) == [(0,1,2),(3,4)]
try:
    list(chunk([1,2], 0))
except ValueError:
    pass
else:
    raise AssertionError('Expected ValueError for size <= 0')

---
## Task 3 — Safe Nested Lookup
Write `safe_get(mapping, path, default=None, sep='.')`:
- `path` is a string of keys for nested dicts, separated by `sep` (a dot by default).
- If any key is missing, return `default`.
- If `default` is a **callable**, call it with no args to compute the fallback (useful to avoid eager work).
- Do not modify the input.

Example: `safe_get({'a': {'b': 2}}, 'a.b') -> 2`
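
The callable-default rule exists because an ordinary default argument is evaluated before the lookup even starts. The sketch below contrasts the two behaviours on a flat dict; `expensive_default` and `get_lazy` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List

calls: List[str] = []

def expensive_default() -> str:
    # Stand-in for slow work (I/O, heavy computation); records each run.
    calls.append("ran")
    return "fallback"

settings: Dict[str, Any] = {"host": "localhost"}

# Eager: the argument is evaluated before dict.get runs, even though the key exists.
eager = settings.get("host", expensive_default())

def get_lazy(d: Dict[str, Any], key: str, default: Callable[[], Any]) -> Any:
    # Lazy: the callable is invoked only when the key is missing.
    return d[key] if key in d else default()

lazy_hit = get_lazy(settings, "host", expensive_default)    # no call made
lazy_miss = get_lazy(settings, "port", expensive_default)   # one call made
```

Here `expensive_default` runs twice in total: once for the eager lookup that did not need it, and once for the genuine miss.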

In [5]:
def safe_get(mapping: Dict[str, Any], path: str, default: Any = None, sep: str = '.') -> Any:
    """Safely traverse a nested dict by a dotted path.

    If `default` is callable, it will be invoked (no args) when needed.
    """
    if not isinstance(mapping, dict):
        raise TypeError("mapping must be a dict")
    if not isinstance(path, str):
        raise TypeError("path must be a str")
    current: Any = mapping
    # TODO: implement traversal; handle empty path (return mapping)
    if path == "":
        return current
    for key in path.split(sep):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default() if callable(default) else default
    return current

In [6]:
# Tests
cfg = {"a": {"b": {"c": 42}}}
assert safe_get(cfg, "a.b.c") == 42
assert safe_get(cfg, "a.x.c", default=0) == 0
assert safe_get(cfg, "", default=1) == cfg
flag = {"called": False}
def _compute():
    flag["called"] = True
    return "fallback"
assert safe_get(cfg, "a.missing", default=_compute) == "fallback"
assert flag["called"] is True

---
## Task 4 — Frequency Count with Options
Write `frequency_count(seq, *, key=None, normalize=False)`:
- Counts occurrences in `seq`.
- If `key` is provided, map each item through `key(item)` before counting.
- If `normalize=True`, return probabilities that sum to 1.0 (float).
- Return the counts as an **ordered mapping** in first-seen order. The starter uses `OrderedDict`; a regular `dict` also preserves insertion order in Python 3.7+.
- Avoid external libs; you may use `collections`.

Edge cases: empty input, non-hashable after key mapping (raise `TypeError`).
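
Since `collections` is allowed, a shorter variant can lean on `Counter`, which is a `dict` subclass and so keeps first-seen order on Python 3.7+. This is a sketch under the hypothetical name `frequency_count_counter`, returning a plain `dict`.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, Optional

def frequency_count_counter(seq: Iterable[Any], *,
                            key: Optional[Callable[[Any], Any]] = None,
                            normalize: bool = False) -> Dict[Any, float]:
    """Count items (after optional key mapping) in first-seen order."""
    mapped = seq if key is None else map(key, seq)
    counts = Counter(mapped)  # raises TypeError for unhashable items
    if not normalize:
        return dict(counts)
    total = sum(counts.values())
    # For empty input `counts` is empty, so no division takes place.
    return {k: v / total for k, v in counts.items()}

letters = frequency_count_counter("ababc")
shares = frequency_count_counter("aab", normalize=True)
```

The unhashable edge case is covered without an explicit `hash()` probe, because `Counter` raises `TypeError` itself when it meets such an item.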

In [7]:
from collections import OrderedDict

U = TypeVar('U')

def frequency_count(seq: Iterable[T], *, key: Optional[Callable[[T], U]] = None,
                    normalize: bool = False) -> "OrderedDict[Any, float] | OrderedDict[Any, int]":
    """Count items in `seq` with optional key mapping and normalization.

    Returns an OrderedDict in order of first appearance.
    """
    counts: "OrderedDict[Any, int]" = OrderedDict()
    for item in seq:
        k: Any = key(item) if key is not None else item
        try:
            hash(k)
        except Exception as e:
            raise TypeError("items must be hashable after key mapping") from e
        if k not in counts:
            counts[k] = 0
        counts[k] += 1
    if not normalize:
        return counts
    total = sum(counts.values())
    out: "OrderedDict[Any, float]" = OrderedDict((k, v/total if total else 0.0) for k, v in counts.items())
    return out

In [8]:
# Tests
from math import isclose
assert list(frequency_count('ababc').items()) == [('a',2),('b',2),('c',1)]
assert list(frequency_count(["A","a","B"], key=str.lower).items()) == [('a',2),('b',1)]
probs = frequency_count('aab', normalize=True)
assert isclose(sum(probs.values()), 1.0)
assert list(probs.items())[0][0] == 'a'

---
## Task 5 — Flatten Nested Sequences
Write `flatten(nested)` that takes arbitrarily nested lists/tuples and returns a **generator** of leaf values in left-to-right order.
- Treat only `list` and `tuple` as nestable; leave other iterables (like `str`, `bytes`) as atomic.
- Implement **iteratively** with an explicit stack (avoid recursion depth issues).
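
For contrast with the explicit-stack requirement, here is a sketch of the recursive version that the task asks you to avoid (the name `flatten_recursive` is hypothetical). It is shorter, but every level of nesting adds a call frame, so very deep inputs can hit the interpreter's recursion limit.

```python
from typing import Any, Iterator

def flatten_recursive(nested: Any) -> Iterator[Any]:
    """Yield leaves left-to-right; only list and tuple count as containers."""
    if isinstance(nested, (list, tuple)):
        for el in nested:
            # Each nested container starts another generator, one frame deeper.
            yield from flatten_recursive(el)
    else:
        yield nested  # str, bytes, numbers, etc. are treated as atomic

leaves = list(flatten_recursive([1, (2, 3), [4, [5]]]))
```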

In [9]:
def flatten(nested: Any) -> Iterator[Any]:
    """Yield leaves from arbitrarily nested lists/tuples, left-to-right.
    Strings/bytes are treated as atomic values.
    """
    stack: List[Any] = [nested]
    while stack:
        curr = stack.pop()
        if isinstance(curr, (list, tuple)):
            # push in reverse to process left-to-right
            for el in reversed(curr):
                stack.append(el)
        else:
            yield curr

In [10]:
# Tests
assert list(flatten([1,(2,3),[4,[5]]])) == [1,2,3,4,5]
assert list(flatten('abc')) == ['abc']  # atomic
assert list(flatten([])) == []
assert list(flatten([[],[[]]])) == []

---
## Task 6 — Function Composition
Write `compose(*funcs)` that returns a function applying `funcs` **right-to-left**:
`compose(f, g, h)(x) == f(g(h(x)))`.
- Validate that all inputs are callables.
- If no functions are provided, return identity `lambda x: x`.
- Preserve `__name__` meaningfully (e.g., `'compose(f,g,h)'`).
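
Right-to-left application can also be written as a fold. The sketch below, under the hypothetical name `compose_reduce`, uses `functools.reduce` over the reversed functions with the input as the starting value; it leaves out the `__name__` handling to stay short.

```python
from functools import reduce
from typing import Any, Callable

def compose_reduce(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose right-to-left; with no functions, behaves as identity."""
    for fn in funcs:
        if not callable(fn):
            raise TypeError("all arguments must be callables")

    def _composed(arg: Any) -> Any:
        # Walk the functions last-to-first, feeding each result to the next.
        # With an empty tuple, reduce simply returns the initial value `arg`.
        return reduce(lambda acc, fn: fn(acc), reversed(funcs), arg)

    return _composed

def inc(x: int) -> int:
    return x + 1

def dbl(x: int) -> int:
    return 2 * x

result = compose_reduce(inc, dbl, abs)(-3)   # inc(dbl(abs(-3)))
```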

In [11]:
def compose(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose functions right-to-left. If empty, return identity.
    Example: compose(f, g, h)(x) == f(g(h(x)))
    """
    for fn in funcs:
        if not callable(fn):
            raise TypeError("all arguments must be callables")
    if not funcs:
        return lambda x: x

    def _composed(arg: Any) -> Any:
        val = arg
        for fn in reversed(funcs):
            val = fn(val)
        return val

    # Callables without __name__ (e.g. functools.partial objects) fall back to their type name.
    names = ",".join(getattr(f, "__name__", type(f).__name__) for f in funcs)
    _composed.__name__ = f"compose({names})"
    return _composed

In [12]:
# Tests
def inc(x): return x + 1
def dbl(x): return 2 * x
h = compose(inc, dbl, abs)
assert h(-3) == inc(dbl(abs(-3))) == 7
id_fn = compose()
assert id_fn('x') == 'x'
try:
    compose(1)  # type: ignore
except TypeError:
    pass
else:
    raise AssertionError('Expected TypeError for non-callable inputs')

---
## Task 7 — Filtering with Multiple Predicates
Write `select(data, *predicates, mode='all')` that returns a **list** of items filtered by predicates:
- If `mode='all'`, include item only if **all** predicates return truthy.
- If `mode='any'`, include item if **any** predicate returns truthy.
- If no predicates are provided, return `list(data)`.
- Validate `mode` and that predicates are callables.

Tip: Use short-circuit logic and avoid calling predicates more than necessary per item.
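
The tip about short-circuiting can be met with the built-ins `all` and `any`, provided they receive a generator expression and not a list. A minimal sketch, assuming the hypothetical name `select_builtin`:

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

def select_builtin(data: Iterable[T], *predicates: Callable[[T], bool],
                   mode: str = "all") -> List[T]:
    """Filter `data`, keeping items that pass all (or any) predicates."""
    if mode not in {"all", "any"}:
        raise ValueError("mode must be 'all' or 'any'")
    for p in predicates:
        if not callable(p):
            raise TypeError("predicates must be callables")
    if not predicates:
        # Handled explicitly: any() over no predicates would reject every item.
        return list(data)
    combine = all if mode == "all" else any
    # The generator is consumed lazily, so all()/any() stop calling
    # predicates as soon as the outcome for an item is decided.
    return [item for item in data if combine(p(item) for p in predicates)]

evens_over_five = select_builtin(range(10), lambda x: x % 2 == 0, lambda x: x > 5)
```

Writing `combine([p(item) for p in predicates])` with a list would call every predicate for every item and lose the short-circuit.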

In [13]:
def select(data: Iterable[T], *predicates: Callable[[T], bool], mode: str = 'all') -> List[T]:
    """Filter `data` by multiple predicates.
    mode: 'all' or 'any'. If no predicates, returns list(data).
    """
    if mode not in {'all', 'any'}:
        raise ValueError("mode must be 'all' or 'any'")
    for p in predicates:
        if not callable(p):
            raise TypeError("predicates must be callables")
    if not predicates:
        return list(data)
    out: List[T] = []
    for item in data:
        if mode == 'all':
            ok = True
            for p in predicates:
                if not p(item):
                    ok = False
                    break
        else:  # any
            ok = False
            for p in predicates:
                if p(item):
                    ok = True
                    break
        if ok:
            out.append(item)
    return out

In [14]:
# Tests
nums = list(range(10))
is_even = lambda x: x % 2 == 0
gt5 = lambda x: x > 5
assert select(nums, is_even, gt5, mode='all') == [6,8]
assert select(nums, is_even, gt5, mode='any') == [0,2,4,6,7,8,9]
assert select(nums) == nums
try:
    select(nums, mode='bad')
except ValueError:
    pass
else:
    raise AssertionError('Expected ValueError for bad mode')

---
## Task 8 — Matrix Generator with Safe Defaults
Write `gen_matrix(rows, cols, fill=None, *, factory=None)` that returns a **new** `rows × cols` matrix.
- If `factory` is provided, call it to produce each element.
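- Otherwise place `fill` in every cell. The same object is reused for each cell (it is not copied), so `fill` suits immutable values such as numbers or strings.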
- Ensure there is **no shared mutable state** between rows (classic pitfall: `[[[]]*n]*m`).
- Validate arguments and raise helpful errors.

Examples:
- `gen_matrix(2, 3, fill=0)` → `[[0,0,0],[0,0,0]]`
- `gen_matrix(2, 2, factory=list)` → `[ [[], []], [[], []] ]` with **distinct** lists.
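
The shared-state pitfall mentioned above is easy to reproduce. This short sketch builds a 2 × 3 grid both ways and writes to one cell; the variable names are illustrative only.

```python
rows, cols = 2, 3

# Pitfall: the outer * repeats a reference, so every row is the same list object.
shared = [[0] * cols] * rows
shared[0][0] = 99          # shows up in every row

# Safe: the comprehension evaluates [0] * cols again for each row.
independent = [[0] * cols for _ in range(rows)]
independent[0][0] = 99     # affects the first row only

rows_are_same_object = shared[0] is shared[1]
```

The inner `[0] * cols` is harmless here because integers are immutable; the problem comes from repeating the row list itself.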

In [15]:
def gen_matrix(rows: int, cols: int, fill: Any = None, *, factory: Optional[Callable[[], Any]] = None) -> List[List[Any]]:
    """Create a rows×cols matrix with either scalar `fill` or `factory()` per cell.
    Avoids shared row references.
    """
    if not isinstance(rows, int) or not isinstance(cols, int):
        raise TypeError("rows and cols must be ints")
    if rows < 0 or cols < 0:
        raise ValueError("rows and cols must be >= 0")
    if factory is not None and not callable(factory):
        raise TypeError("factory must be callable or None")
    out: List[List[Any]] = []
    for _ in range(rows):
        row: List[Any] = []
        for __ in range(cols):
            row.append(factory() if factory is not None else fill)
        out.append(row)
    return out

In [16]:
# Tests
m = gen_matrix(2,3, fill=0)
assert m == [[0,0,0],[0,0,0]]
m2 = gen_matrix(2,2, factory=list)
assert m2 == [[[],[]],[[],[]]]
m2[0][0].append(1)
assert m2[1][0] == []  # distinct cells
try:
    gen_matrix(-1, 2)
except ValueError:
    pass
else:
    raise AssertionError('Expected ValueError for negative size')

---
## Task 9 — Sortedness Check (Streaming)
Write `is_sorted(iterable, *, key=None, reverse=False)` that returns `True` if the iterable is sorted according to `key` and `reverse`.
- Do not materialize the iterable; compare each adjacent pair exactly once.
- `key` works like the `key` argument of `sorted`.
- Empty or 1-element iterables are sorted by definition.
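
Another streaming formulation pairs each element with its successor and checks every pair with `all`. The sketch below builds the pairs with `itertools.tee` so it runs on older Python versions (Python 3.10+ also offers `itertools.pairwise`); the name `is_sorted_pairs` is hypothetical.

```python
from itertools import tee
from operator import ge, le
from typing import Any, Callable, Iterable, Optional

def is_sorted_pairs(iterable: Iterable[Any], *,
                    key: Optional[Callable[[Any], Any]] = None,
                    reverse: bool = False) -> bool:
    """Return True if every adjacent pair is in order."""
    keys = iterable if key is None else map(key, iterable)  # key applied once per item
    left, right = tee(keys)
    next(right, None)  # shift the second iterator so zip yields neighbours
    in_order = ge if reverse else le
    # all() stops at the first out-of-order pair; empty and 1-element
    # inputs produce no pairs, so the result is True.
    return all(in_order(a, b) for a, b in zip(left, right))

ascending_ok = is_sorted_pairs([1, 2, 2, 3])
```

Because the two `tee` iterators stay one step apart, only a single element is buffered at any time.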

In [17]:
def is_sorted(iterable: Iterable[T], *, key: Optional[Callable[[T], Any]] = None, reverse: bool = False) -> bool:
    """Return True if `iterable` is sorted ascending (default) or descending.
    Uses streaming comparison without materializing the full iterable.
    """
    it = iter(iterable)
    try:
        prev = next(it)
    except StopIteration:
        return True
    k = (lambda x: x) if key is None else key
    prev_k = k(prev)
    for curr in it:
        curr_k = k(curr)
        if reverse:
            if curr_k > prev_k:
                return False
        else:
            if curr_k < prev_k:
                return False
        prev_k = curr_k
    return True

In [18]:
# Tests
assert is_sorted([1,2,2,3])
assert not is_sorted([3,1,2])
assert is_sorted([3,2,2,1], reverse=True)
assert is_sorted([], reverse=True)
words = ["a","aa","aaa"]
assert is_sorted(words, key=len)
assert not is_sorted(["aa","a"], key=len)

---
## Task 10 — Simple Timing Decorator
Write `timed(fn=None, *, unit='s')` decorator that measures call duration and returns `(result, elapsed)`.
- Supports usage as `@timed()` and `@timed`.
- `unit` must be `'s'` (seconds) or `'ms'` (milliseconds); raise `ValueError` otherwise.
- Do not print; return the elapsed time.
- Preserve `__name__`/`__doc__` with `functools.wraps`.

Example:
```python
@timed(unit='ms')
def work():
    ...
res, ms = work()
```
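
Supporting both `@timed` and `@timed()` comes down to how Python invokes the decorator in each form. The sketch below isolates that pattern in a hypothetical `tagged` decorator, unrelated to timing, so the dispatch on `fn` is easier to see.

```python
import functools
from typing import Any, Callable, Optional

def tagged(fn: Optional[Callable[..., Any]] = None, *, tag: str = "default"):
    """Hypothetical decorator returning (tag, result); usable with or without parentheses."""
    def _decorate(func: Callable[..., Any]):
        @functools.wraps(func)  # copies __name__, __doc__, etc. onto wrapper
        def wrapper(*args: Any, **kwargs: Any):
            return tag, func(*args, **kwargs)
        return wrapper
    # @tagged       -> Python calls tagged(func), so fn is the function itself.
    # @tagged(...)  -> Python calls tagged(...) first (fn is None),
    #                  then applies the returned _decorate to the function.
    return _decorate if fn is None else _decorate(fn)

@tagged
def one() -> int:
    """Return one."""
    return 1

@tagged(tag="custom")
def two() -> int:
    return 2
```

Making `tag` keyword-only is what keeps the two forms unambiguous: the only positional argument `tagged` can receive is the function being decorated.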

In [19]:
import time
import functools

def timed(fn: Optional[Callable[..., Any]] = None, *, unit: str = 's'):
    """Decorator measuring duration; returns (result, elapsed).
    unit: 's' seconds or 'ms' milliseconds.
    Supports @timed and @timed().
    """
    if unit not in {'s','ms'}:
        raise ValueError("unit must be 's' or 'ms'")

    def _decorate(func: Callable[..., Any]):
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if unit == 'ms':
                elapsed *= 1_000
            return result, elapsed
        return wrapper
    return _decorate if fn is None else _decorate(fn)

In [20]:
# Tests
@timed
def _slow_sum(n: int = 30_000):
    s = 0
    for i in range(n):
        s += i
    return s

res, elapsed = _slow_sum(10_000)
assert isinstance(res, int) and elapsed >= 0

@timed(unit='ms')
def _noop():
    return 123

res2, ms = _noop()
assert res2 == 123 and ms >= 0
try:
    timed(unit='minutes')  # type: ignore
except ValueError:
    pass
else:
    raise AssertionError('Expected ValueError for bad unit')

---
## (Optional) Stretch — Lightweight Memoization
Write `memoize(fn)` decorator that caches results by args/kwargs **only if all are hashable**.
- Provide `cache_clear()` on the wrapped function.
- Raise `TypeError` if called with unhashable arguments (to keep it simple).
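
For comparison, the standard library's `functools.lru_cache` offers similar behaviour: it exposes `cache_clear()` and raises `TypeError` for unhashable arguments. A short usage sketch follows (the function name `fib_cached` is illustrative).

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None means the cache is unbounded
def fib_cached(n: int) -> int:
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)

value = fib_cached(30)
info = fib_cached.cache_info()   # named tuple with hits, misses, maxsize, currsize
fib_cached.cache_clear()         # empties the cache and resets the statistics
```

One difference worth knowing when writing your own version: the `functools` documentation notes that `lru_cache` may store calls that differ only in keyword-argument order as separate entries, whereas a key built from sorted keyword items treats them as the same call.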

In [21]:
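def memoize(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Cache results of `fn`, keyed by its positional and keyword arguments.

    The wrapper raises TypeError when called with unhashable arguments
    and exposes `cache_clear()` to empty the cache.
    """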
    cache: Dict[Tuple[Any, Tuple[Tuple[str, Any], ...]], Any] = {}
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any):
        try:
            key = (args, tuple(sorted(kwargs.items())))
            hash(key)
        except Exception as e:
            raise TypeError("arguments must be hashable for memoize") from e
        if key not in cache:
            cache[key] = fn(*args, **kwargs)
        return cache[key]
    def cache_clear() -> None:
        cache.clear()
    wrapper.cache_clear = cache_clear  # type: ignore[attr-defined]
    return wrapper

# Quick test
@memoize
def fib(n: int) -> int:
    return n if n < 2 else fib(n-1) + fib(n-2)
assert fib(10) == 55
fib.cache_clear()  # type: ignore[attr-defined]