# `query_df` — A Practical Tutorial

This notebook shows how to use the **generic** `query_df` utility from `rosamllib.utils` to filter arbitrary pandas DataFrames using:
- Exact matches, wildcards (`*`, `?`) with escaping
- Ranges (`gte`, `lte`, `gt`, `lt`, `eq`, `neq`)
- Regex / inverse regex
- IN / NOT IN (`in`, `nin`)
- Existence checks (`exists`, `missing`)
- Container-aware queries (e.g., lists inside cells)
- Dot-paths into nested dicts/lists inside a cell (e.g., `Meta.series.desc`, `Meta.angles[0]`)
- Approximate numeric equality (`approx`)
- Callable, per-cell predicates
- Custom operators via `register_op`

> The function is **domain-agnostic**. Any DICOM-specific helpers should live in higher-level code (e.g., your `DICOMLoader.query`).


## 0) Setup

Import the function and (optionally) the registry helpers so you can add custom operators later.

In [None]:
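import sys
import pandas as pd
from IPython.display import display  # built into Jupyter; imported explicitly so the cells also run as a script
from rosamllib.utils import query_df, register_op, _OPS  # _OPS: the (private) operator registry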
print('Python:', sys.version)
print('pandas:', pd.__version__)
print('query_df available ops:', sorted(_OPS.keys()))

## 1) Create a toy dataset
We'll create a small DataFrame that mixes strings, numbers, lists, and nested dicts/lists.

In [None]:
df = pd.DataFrame([
    {
        'PatientID': 'P001',
        'Modality': 'CT',
        'Age': 42,
        'Score': 1.000001,
        'StudyDate': '2023-01-01',
        'Tags': ['head', 'contrast'],
        'Meta': {'series': {'desc': 'Head CT w/ contrast'}, 'angles': [0.0, 90.0]},
    },
    {
        'PatientID': 'P002',
        'Modality': 'MR',
        'Age': 35,
        'Score': 0.999999,
        'StudyDate': '2023-02-15',
        'Tags': ['knee'],
        'Meta': {'series': {'desc': 'Knee MR T1'}, 'angles': [15.0]},
    },
    {
        'PatientID': 'PX03',
        'Modality': 'CT',
        'Age': 60,
        'Score': 1.25,
        'StudyDate': '2023-03-01',
        'Tags': [],
        'Meta': {'series': {'desc': 'Chest CT'}, 'angles': []},
    },
    {
        'PatientID': 'P004',
        'Modality': None,
        'Age': 29,
        'Score': 1.0,
        'StudyDate': '2023-04-20',
        'Tags': ['head', 'noncontrast'],
        'Meta': {'series': {'desc': 'Head CT plain'}, 'angles': [0.0]},
    },
])

df

## 2) Exact and Wildcard Matching
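- Exact: `column='value'`, or `**{'column': 'value'}` for names that are not valid Python identifiers (such as the dot-paths in section 9)
- Wildcards: `*` matches any number of characters, `?` matches exactly one. Escape a literal `*`/`?` as `\*`/`\?` (in Python source, use a raw string such as `r'5\*'`).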

In [None]:
query_df(df, PatientID='P00*')[['PatientID']]
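
The wildcard rules above can be sketched as a translation to a regular expression. `wildcard_to_regex` and `wildcard_match` are hypothetical helpers written for this illustration, not part of `rosamllib`, and whole-string (anchored) matching is an assumption about how `query_df` applies the pattern.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate `*`/`?` wildcards to a regex; a backslash-escaped `*` or `?` stays literal."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\' and i + 1 < len(pattern) and pattern[i + 1] in '*?':
            parts.append(re.escape(pattern[i + 1]))  # escaped wildcard -> literal character
            i += 2
            continue
        if ch == '*':
            parts.append('.*')  # any run of characters, including none
        elif ch == '?':
            parts.append('.')  # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
        i += 1
    return ''.join(parts)


def wildcard_match(pattern: str, text: str) -> bool:
    # fullmatch anchors both ends; DOTALL lets `*` span newlines as well.
    return re.fullmatch(wildcard_to_regex(pattern), text, flags=re.DOTALL) is not None


print([pid for pid in ['P001', 'P002', 'PX03', 'P004'] if wildcard_match('P00*', pid)])
# ['P001', 'P002', 'P004']
```

Writing the escape as `r'5\*'` keeps Python from interpreting the backslash itself; `'5\\*'` is the equivalent non-raw spelling.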

## 3) Regex & Case-Insensitive Matching
- Use `{'RegEx': pattern}` or `{'NotRegEx': pattern}`.
- Set `case_insensitive=True` to fold string comparisons and regex matches.

In [None]:
query_df(df, case_insensitive=True, Modality={'RegEx': '^(ct|computed tomography)$'})[['PatientID','Modality']]
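
For comparison, here is the same case-insensitive filter in plain pandas. This assumes `RegEx` behaves like `Series.str.contains` (search semantics); the `^...$` anchors make search and full match equivalent for this pattern. The non-capturing group `(?:...)` avoids the pandas warning about match groups in `str.contains`.

```python
import pandas as pd

modality = pd.Series(['CT', 'MR', 'CT', None])
pattern = r'^(?:ct|computed tomography)$'

# case=False folds case; na=False treats the missing (None) cell as a non-match.
is_ct = modality.str.contains(pattern, case=False, regex=True, na=False)
print(is_ct.tolist())  # [True, False, True, False]

# A plain negation as an "inverse regex": the None row now counts as a match.
not_ct = ~is_ct
print(not_ct.tolist())  # [False, True, False, True]
```

Whether `NotRegEx` in `query_df` also keeps missing cells is not stated here; running both forms on your data is the quickest check.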

## 4) Ranges / Comparison Operators
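Supported: `gte`, `lte`, `gt`, `lt`, `eq`, `neq`. Put several operators in one dict to combine them: `{'gte': 40, 'lte': 60}` selects ages from 40 to 60 inclusive.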

In [None]:
query_df(df, Age={'gte': 40, 'lte': 60})[['PatientID','Age']]
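
One way such a comparison dict could be evaluated is to map each operator name to a Python comparison and AND the resulting masks. `COMPARISONS` and `compare_mask` are hypothetical names for this sketch, not `rosamllib` internals.

```python
import operator

import pandas as pd

# Hypothetical mapping from query_df's comparison names to Python operators.
COMPARISONS = {
    'gte': operator.ge,
    'lte': operator.le,
    'gt': operator.gt,
    'lt': operator.lt,
    'eq': operator.eq,
    'neq': operator.ne,
}


def compare_mask(series: pd.Series, spec: dict) -> pd.Series:
    """AND together every comparison in `spec`, e.g. {'gte': 40, 'lte': 60}."""
    mask = pd.Series(True, index=series.index)
    for name, value in spec.items():
        mask &= COMPARISONS[name](series, value)  # each comparison yields a boolean Series
    return mask


ages = pd.Series([42, 35, 60, 29])
print(compare_mask(ages, {'gte': 40, 'lte': 60}).tolist())  # [True, False, True, False]
```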

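## 5) IN / NOT IN, Exists / Missing
`in`/`nin` test whether a cell's value appears in a list of candidates; `exists`/`missing` test whether the cell holds a value at all (in the toy data, P004's `Modality` is `None`).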

In [None]:
display(query_df(df, Modality={'in': ['CT','PT']})[['PatientID','Modality']])
display(query_df(df, Modality={'nin': ['CT','MR']})[['PatientID','Modality']])
display(query_df(df, Modality={'exists': True})[['PatientID','Modality']])
display(query_df(df, Modality={'missing': True})[['PatientID','Modality']])
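
A plain-pandas reading of these four operators, assuming `exists`/`missing` mean non-null/null. Negating `isin` counts the `None` row as "not in"; whether `query_df`'s `nin` does the same is not stated here, so compare against the output of the cell above.

```python
import pandas as pd

modality = pd.Series(['CT', 'MR', 'CT', None])

in_mask = modality.isin(['CT', 'PT'])    # in
nin_mask = ~modality.isin(['CT', 'MR'])  # nin (plain negation keeps the None row)
exists_mask = modality.notna()           # exists
missing_mask = modality.isna()           # missing

print(in_mask.tolist())       # [True, False, True, False]
print(nin_mask.tolist())      # [False, False, False, True]
print(exists_mask.tolist())   # [True, True, True, False]
print(missing_mask.tolist())  # [False, False, False, True]
```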

## 6) Container-Aware Queries
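For list-like cells (e.g., `Tags`), `contains` checks membership. A plain scalar value (an exact match) also falls back to membership when the cell is a container. Because this is membership rather than substring matching, `'contrast'` does not match the item `'noncontrast'`.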

In [None]:
display(query_df(df, Tags={'contains': 'head'})[['PatientID','Tags']])
display(query_df(df, Tags='contrast')[['PatientID','Tags']])  # scalar membership fallback
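
The scalar fallback can be sketched per cell: test membership when the cell is a container, plain equality otherwise. `membership_mask` is a hypothetical helper, not the library's implementation.

```python
import pandas as pd


def membership_mask(series: pd.Series, value) -> pd.Series:
    """Membership for list-like cells, equality for everything else."""
    def test(cell):
        if isinstance(cell, (list, tuple, set, frozenset)):
            return value in cell  # whole-item membership, not substring search
        return cell == value
    return series.map(test).astype(bool)


tags = pd.Series([['head', 'contrast'], ['knee'], [], ['head', 'noncontrast']])
print(membership_mask(tags, 'contrast').tolist())  # [True, False, False, False]
print(membership_mask(tags, 'head').tolist())      # [True, False, False, True]
```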

## 7) Approximate Numeric Equality
Use `approx` with a dict holding the target `value` plus absolute (`atol`) and relative (`rtol`) tolerances.

In [None]:
query_df(df, Score={'approx': {'value': 1.0, 'atol': 1e-6, 'rtol': 1e-6}})[['PatientID','Score']]
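
To see why `approx` helps, here is the same check with `numpy.isclose`, assuming `approx` uses comparable semantics: `isclose` accepts `|a - b| <= atol + rtol * |b|`. Python's `math.isclose` uses the symmetric rule `|a - b| <= max(rtol * max(|a|, |b|), atol)`, so borderline values can differ between the two.

```python
import numpy as np

scores = np.array([1.000001, 0.999999, 1.25, 1.0])

# Tolerance here is atol + rtol * |1.0| = 2e-6.
close = np.isclose(scores, 1.0, rtol=1e-6, atol=1e-6)
exact = scores == 1.0

print(close.tolist())  # [True, True, False, True]
print(exact.tolist())  # [False, False, False, True]
```

Exact equality keeps only P004, while the tolerance-based check also keeps the two scores that differ from 1.0 by about 1e-6.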

## 8) Callable Predicates
Pass a function that takes a single cell and returns `True`/`False`.

In [None]:
query_df(df, Age=lambda a: a is not None and a % 5 == 0)[['PatientID','Age']]
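
The lambda works because `Age` is a clean integer column. For columns that may also hold `None`, `NaN`, or strings (where `%` would raise or misbehave), a named predicate that returns `False` for anything non-numeric is sturdier and easier to test. `is_multiple_of_five` is an illustrative name.

```python
import math
import numbers


def is_multiple_of_five(cell) -> bool:
    """Per-cell predicate: True only for real numbers divisible by 5.

    None, NaN, strings, and booleans return False instead of raising.
    """
    if isinstance(cell, bool) or not isinstance(cell, numbers.Real):
        return False
    if math.isnan(cell):
        return False
    return cell % 5 == 0


print([is_multiple_of_five(x) for x in [42, 35, 60, 29, None, float('nan'), 'P001']])
# [False, True, True, False, False, False, False]
```

Passing it as `query_df(df, Age=is_multiple_of_five)` should select the same rows as the lambda (P002 and PX03).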

## 9) Dot-Paths into Nested Dicts / Lists in a Cell
You can reference nested content like `Meta.series.desc` or `Meta.angles[0]`. The dot-path walker supports sequence indices and `[*]` to iterate all list items.

In [None]:
display(query_df(df, **{'Meta.series.desc': {'RegEx': 'Head.*'}})[['PatientID','Meta']])
display(query_df(df, **{'Meta.angles[0]': {'gte': 0, 'lte': 1}})[['PatientID','Meta']])
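
A minimal sketch of how a dot-path walker could resolve these paths inside a single cell. `tokenize`, `resolve`, and the `ALL` sentinel are hypothetical names, not `rosamllib` internals. The sketch returns every value a path reaches: several for `[*]`, none when a key or index is absent (as for PX03's empty `angles`). How `query_df` reduces several values to one True/False (for example, any-match) is left as an assumption.

```python
import re
from typing import Any, List

ALL = object()  # sentinel standing in for the `[*]` wildcard index

# Matches either a bare key (`series`) or a bracketed index (`[0]` or `[*]`).
_TOKEN = re.compile(r'([^.\[\]]+)|\[(\d+|\*)\]')


def tokenize(path: str) -> List[Any]:
    """Split 'Meta.angles[0]' into ['Meta', 'angles', 0]."""
    tokens: List[Any] = []
    for key, index in _TOKEN.findall(path):
        if key:
            tokens.append(key)
        elif index == '*':
            tokens.append(ALL)
        else:
            tokens.append(int(index))
    return tokens


def resolve(cell: Any, tokens: List[Any]) -> List[Any]:
    """Follow `tokens` from `cell`; return every value reached (empty if the path is absent)."""
    values = [cell]
    for tok in tokens:
        reached = []
        for v in values:
            if tok is ALL and isinstance(v, (list, tuple)):
                reached.extend(v)  # [*] fans out over every item
            elif isinstance(tok, int) and isinstance(v, (list, tuple)) and tok < len(v):
                reached.append(v[tok])  # out-of-range index reaches nothing
            elif isinstance(tok, str) and isinstance(v, dict) and tok in v:
                reached.append(v[tok])
        values = reached
    return values


meta = {'series': {'desc': 'Head CT w/ contrast'}, 'angles': [0.0, 90.0]}
# The first token names the column; the rest is walked inside the cell.
print(resolve(meta, tokenize('Meta.series.desc')[1:]))  # ['Head CT w/ contrast']
print(resolve(meta, tokenize('Meta.angles[*]')[1:]))    # [0.0, 90.0]
```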

## 10) OR Across Multiple Conditions for the Same Column
Pass a list of conditions for one column: a row matches if **any** of them holds (OR within the column). Conditions on different columns are still combined with AND.

In [None]:
query_df(df, Modality=['CT', {'RegEx': '^M'}])[['PatientID','Modality']]
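
The combination rule can be sketched with boolean masks: OR the masks listed for each column, then AND the per-column results. `combine_masks` is a hypothetical helper, and the `Age` condition is added only to show the AND step.

```python
import operator
from functools import reduce

import pandas as pd


def combine_masks(per_column: dict) -> pd.Series:
    """OR the masks listed for each column, then AND the per-column results."""
    column_masks = [reduce(operator.or_, masks) for masks in per_column.values()]
    return reduce(operator.and_, column_masks)


modality = pd.Series(['CT', 'MR', 'CT', None])
age = pd.Series([42, 35, 60, 29])

mask = combine_masks({
    'Modality': [modality.eq('CT'), modality.str.match('M', na=False)],  # 'CT' OR ^M
    'Age': [age.ge(40)],
})
print(mask.tolist())  # [True, False, True, False]
```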

## 11) Combine Multiple Filters (AND semantics across columns)

In [None]:
query_df(
    df,
    Modality='CT',
    Tags={'contains': 'head'},
    **{'Meta.series.desc': {'RegEx': '.*contrast.*'}},
)[['PatientID','Modality','Tags','Meta']]

## 12) Custom Operators via `register_op`
You can add domain-specific operators without modifying `query_df`. An operator receives the column *Series*, the user-supplied value, and a context dict, and must return a boolean mask aligned with the Series' index.

In [None]:
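def op_near_int(series, target, ctx):
    """True where the value is within 1 of the integer `target`; NaN/non-numeric cells are False."""
    try:
        tgt = int(target)
    except (TypeError, ValueError):
        # Unusable target: match nothing rather than raising.
        return pd.Series(False, index=series.index)
    # Coerce non-numeric cells (e.g. None, strings) to NaN; NaN comparisons yield False.
    values = pd.to_numeric(series, errors='coerce')
    return (values - tgt).abs().le(1)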

register_op('near_int', op_near_int)
print('Registered ops:', sorted(_OPS.keys()))
query_df(df, Age={'near_int': 35})[['PatientID','Age']]
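
A second custom operator following the same `(series, value, ctx)` contract, this time for list-valued cells. The name `n_items` and the function `op_n_items` are hypothetical examples, not built-in operators.

```python
import pandas as pd


def op_n_items(series, target, ctx):
    """True where a list/tuple cell has exactly `target` items; other cells are False."""
    def check(cell):
        return isinstance(cell, (list, tuple)) and len(cell) == target
    # Boolean mask aligned with series.index, as register_op expects.
    return series.map(check).astype(bool)


tags = pd.Series([['head', 'contrast'], ['knee'], [], ['head', 'noncontrast']])
print(op_n_items(tags, 0, {}).tolist())  # [False, False, True, False]
```

After `register_op('n_items', op_n_items)`, a query such as `query_df(df, Tags={'n_items': 0})` would be expected to return PX03, the row with an empty tag list.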

## 13) Notes & Tips
- **Dot-paths** only traverse within a single cell (e.g., a `dict`/`list` stored in a column). They do not join across rows/tables.
- For **date**-like strings, convert the column to `datetime64[ns]` first (e.g., with `pd.to_datetime`) so range queries compare real dates efficiently rather than relying on text ordering.
- Use `case_insensitive=True` to normalize string comparisons and regex.
- For **float** comparisons, prefer `approx` over `eq` to avoid surprise misses due to precision.
- You can add any number of custom operators with `register_op(name, fn)`.
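
As an illustration of the date tip, here is the conversion and an inclusive date-range filter in plain pandas on the toy `StudyDate` values. Whether `query_df`'s `gte`/`lte` accept `pd.Timestamp` bounds on such a column is not covered above, so treat that as something to verify.

```python
import pandas as pd

studies = pd.DataFrame({
    'PatientID': ['P001', 'P002', 'PX03', 'P004'],
    'StudyDate': ['2023-01-01', '2023-02-15', '2023-03-01', '2023-04-20'],
})

# Parse once with an explicit format; unparseable strings become NaT.
studies['StudyDate'] = pd.to_datetime(studies['StudyDate'], format='%Y-%m-%d', errors='coerce')

# between() is inclusive on both ends by default.
in_window = studies['StudyDate'].between(pd.Timestamp('2023-02-01'), pd.Timestamp('2023-03-31'))
print(studies.loc[in_window, 'PatientID'].tolist())  # ['P002', 'PX03']
```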

That’s it — happy querying!