
Add .explain() DSL predicate for node-level explanations #968

Merged
SkBlaz merged 4 commits into master from copilot/add-explain-dsl-predicate on Dec 31, 2025

Conversation

Contributor

Copilot AI commented Dec 30, 2025

Adds .explain() to the DSL query API for attaching explanations (community membership, top neighbors, layer footprint) to result nodes. Explanations are computed post-filtering for efficiency and expand via to_pandas(expand_explanations=True).

Changes

DSL API

  • QueryBuilder.explain(): Dual-mode method
    • No args: returns execution plan (backward compatible)
    • With args: attaches explanations to results (new feature)
  • ExplainSpec dataclass in AST for configuration
  • Supports include/exclude lists and per-block configuration
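
The dual-mode behavior described above can be illustrated with a minimal sketch. The names `BuilderSketch`, `PlanSketch`, and `ExplainSpecSketch` are hypothetical stand-ins, not the actual py3plex classes, and the argument handling is an assumption about how such a method could tell the two modes apart.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical stand-ins; the real classes live in py3plex.dsl and may differ.
SUPPORTED_BLOCKS = ("community", "top_neighbors", "layer_footprint")


@dataclass
class ExplainSpecSketch:
    """Configuration for the explanation step (illustrative only)."""
    include: List[str] = field(default_factory=lambda: list(SUPPORTED_BLOCKS))
    neighbors_top: int = 10


@dataclass
class PlanSketch:
    """Placeholder for the execution plan returned by the no-argument form."""
    steps: List[str]


class BuilderSketch:
    def __init__(self) -> None:
        self.steps: List[str] = ["select nodes"]
        self.explain_spec: Optional[ExplainSpecSketch] = None

    def explain(self, neighbors_top=None, include=None) -> Union["BuilderSketch", PlanSketch]:
        # No arguments: keep the older behavior and return the plan.
        if neighbors_top is None and include is None:
            return PlanSketch(steps=list(self.steps))
        # Any argument: record a spec and stay chainable.
        self.explain_spec = ExplainSpecSketch(
            include=list(include) if include is not None else list(SUPPORTED_BLOCKS),
            neighbors_top=neighbors_top if neighbors_top is not None else 10,
        )
        return self


plan = BuilderSketch().explain()
builder = BuilderSketch().explain(neighbors_top=5, include=["community"])
```

A method whose return type depends on its arguments keeps old callers working, at the cost of a less predictable signature.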

Explanation Engine (py3plex/dsl/explain.py)

  • Community block: community_id, community_size from network partition
  • Top neighbors block: Ranked by weight or degree, layer-aware filtering
  • Layer footprint block: layers_present, n_layers_present for multilayer nodes
  • Configurable neighbor selection: metric (weight/degree), scope (layer/global), direction
  • Built-in caching for neighbor lookups
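
As a rough illustration of the community and layer footprint blocks, the sketch below derives the four listed fields from plain Python data. The `partition` mapping and `node_layers` list are assumed inputs chosen for the example; the engine in py3plex/dsl/explain.py reads these from the network and may structure them differently.

```python
from collections import Counter

# Assumed inputs: a partition mapping node id -> community id, and a list of
# (node id, layer) pairs describing where each node appears.
partition = {"a": 0, "b": 0, "c": 1}
node_layers = [("a", "social"), ("a", "work"), ("b", "social"), ("c", "work")]

# Community sizes are counted once and reused for every node.
community_sizes = Counter(partition.values())


def community_block(node):
    """Return community_id and community_size for one node."""
    cid = partition.get(node)
    return {
        "community_id": cid,
        "community_size": community_sizes[cid] if cid is not None else 0,
    }


def layer_footprint_block(node):
    """Return the sorted layers a node appears in, plus their count."""
    layers = sorted({layer for n, layer in node_layers if n == node})
    return {"layers_present": layers, "n_layers_present": len(layers)}


explanation = {**community_block("a"), **layer_footprint_block("a")}
```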

Result Integration

  • Explanations stored as regular attributes in QueryResult
  • to_pandas(expand_explanations=True) expands complex structures to JSON strings
  • Executor runs explanation step after LIMIT/SORT (only on final rows)
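
To make the JSON expansion concrete, here is a small sketch that works on plain row dictionaries rather than a DataFrame. The helper `expand_explanations` is a hypothetical name for illustration; how QueryResult.to_pandas performs the conversion internally is not shown in this PR description.

```python
import json


def expand_explanations(rows):
    """Return copies of rows with list/dict values serialized as JSON strings."""
    expanded = []
    for row in rows:
        flat = {}
        for key, value in row.items():
            if isinstance(value, (list, dict)):
                # Nested structures become strings so they fit in a flat column.
                flat[key] = json.dumps(value, sort_keys=True)
            else:
                flat[key] = value
        expanded.append(flat)
    return expanded


rows = [{"id": "a", "community_id": 0, "top_neighbors": [{"id": "b", "weight": 2.0}]}]
flat_rows = expand_explanations(rows)
```

Scalar fields such as community_id pass through unchanged; only nested values like top_neighbors are stringified, and `json.loads` recovers the original structure.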

Usage

from py3plex.dsl import Q, L

result = (
    Q.nodes()
    .from_layers(L["social"])
    .compute("degree", "betweenness")
    .limit(20)
    .explain(
        neighbors_top=10,
        include=["community", "top_neighbors", "layer_footprint"]
    )
    .execute(network)
)

df = result.to_pandas(expand_explanations=True)
# df columns: id, layer, degree, betweenness, community_id, 
#             community_size, top_neighbors, layers_present, n_layers_present

Testing

  • 26 new tests covering all explanation blocks, per-layer grouping, and configuration options
  • All 67 DSL v2 tests pass (backward compatibility maintained)
  • Example demonstrating 5 usage patterns

Original prompt

This section details the original issue you should resolve.

<issue_title>explain()</issue_title>
<issue_description>
Goal

  • Add a new DSL predicate/operator: .explain(...) that can be chained before .execute(network)

  • It attaches “explanations” to each resulting row/entity (typically nodes), enabling:
    result.to_pandas(expand_explanations=True) # or expand_explanations="columns"

  • Must support the flagship usage:

    Q.nodes()...limit(20).explain(
        neighbors_top=10,
        include=["community", "top_neighbors", "layer_footprint"]
    ).execute(network)

and then:

df = result.to_pandas(expand_uncertainty=True, expand_explanations=True)
df[["id","layer",...,"top_neighbors"]]

High-level behavior

  • .explain() does NOT change which rows are returned; it adds metadata per row.
  • Explanations should be computed efficiently post-selection (only for returned rows).
  • Works with per-layer grouping output: if the result rows contain “layer”, neighbors should be computed within that layer when possible.
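
One way to read the behavior above is sketched below: explanations are added to the rows that were already selected, and a row's "layer" field, when present, restricts the neighbor search to that layer. The edge list format and the helper names `top_neighbors` and `explain_rows_sketch` are assumptions made for this example only.

```python
# Assumed edge list: (source, target, layer, weight). Treated as undirected.
edges = [
    ("a", "b", "social", 3.0),
    ("a", "c", "social", 1.0),
    ("a", "d", "work", 5.0),
]


def top_neighbors(node, layer=None, k=10):
    """Rank neighbors of `node` by weight, restricted to `layer` when given."""
    scored = []
    for u, v, edge_layer, weight in edges:
        if layer is not None and edge_layer != layer:
            continue
        if u == node:
            scored.append((v, weight))
        elif v == node:
            scored.append((u, weight))
    # Highest weight first; ties broken by neighbor id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [{"id": n, "weight": w} for n, w in scored[:k]]


def explain_rows_sketch(rows, k=10):
    """Add top_neighbors to each row without adding or dropping rows."""
    return [
        {**row, "top_neighbors": top_neighbors(row["id"], row.get("layer"), k)}
        for row in rows
    ]


final_rows = [{"id": "a", "layer": "social"}, {"id": "a"}]
explained = explain_rows_sketch(final_rows, k=2)
```

The first row only sees neighbors in the "social" layer, while the second row, which has no layer field, ranks neighbors across all layers.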

Deliverables

  1. DSL API: Query.explain(...) method
  2. Execution plan support for an “ExplainStep” (like ComputeStep/MutateStep)
  3. Explanation engine functions for nodes (phase 1) with extensibility for edges/communities later
  4. Result object storage + to_pandas(expand_explanations=True) support
  5. Tests + docs + one example snippet in README/gallery

TODO 0 — Locate architecture touchpoints

  • TODO: Identify the DSL query class (e.g., py3plex/dsl/query.py) that implements chainable ops like:
    • .where(), .compute(), .mutate(), .sort(), .limit(), .execute()
  • TODO: Identify how a query is represented internally (list of “steps”? AST nodes? pipeline ops?)
  • TODO: Identify the result wrapper returned from .execute(network):
    • likely has .to_pandas(expand_uncertainty=...) already
  • TODO: Identify how node ids + layer info are stored in results (columns? internal schema?).

TODO 1 — Define .explain() public API
Implement a method on the Query object:

def explain(
    self,
    neighbors_top: int = 10,
    include: list[str] | None = None,
    exclude: list[str] | None = None,
    neighbors: dict | None = None,
    community: dict | None = None,
    layer_footprint: dict | None = None,
    cache: bool = True,
    as_columns: bool = True,
    prefix: str = "",
) -> "Query":

Semantics

  • include: which explanation blocks to compute.
    Default: ["community", "top_neighbors", "layer_footprint"]
  • exclude: remove any from include
  • neighbors_top: max neighbors returned in top_neighbors explanation
  • neighbors: optional config dict (metric, weight handling, direction, layer behavior)
  • as_columns: if True, store structured objects in a dedicated explanations field and also
    expose them as top-level columns when expand_explanations=True.
  • prefix: optionally prefix explanation columns (e.g., "explain__")
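
The interaction between as_columns and prefix could look roughly like the sketch below. The helper `expose_columns` and the `"explanations"` key are hypothetical; the issue text leaves the exact storage layout open.

```python
def expose_columns(row, explanations, prefix=""):
    """Copy explanation entries onto the row as top-level, optionally prefixed columns."""
    out = dict(row)
    for name, value in explanations.items():
        out[f"{prefix}{name}"] = value
    # Keep the structured object too, mirroring the as_columns=True description.
    out["explanations"] = explanations
    return out


row = expose_columns(
    {"id": "a", "degree": 3},
    {"community_id": 0, "n_layers_present": 2},
    prefix="explain__",
)
```

A non-empty prefix avoids collisions with metric columns that happen to share a name with an explanation field.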

Validation rules

  • TODO: validate include keys against supported set: {"community","top_neighbors","layer_footprint"}.
  • TODO: if both include and exclude provided, apply exclude after include resolution.
  • TODO: neighbors_top must be >= 1
  • TODO: ensure explain() can be called only once OR allow multiple explain() steps to merge config:
    • choose one:
      A) multiple calls merge includes and override config
      B) raise if explain already present
      Prefer A for ergonomics, but implement carefully.
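
The validation rules and option A can be sketched as follows. `resolve_blocks` and `merge_specs` are hypothetical helper names, and representing a spec as a plain dict is an assumption made to keep the example short.

```python
SUPPORTED = ("community", "top_neighbors", "layer_footprint")


def resolve_blocks(include=None, exclude=None, neighbors_top=10):
    """Validate arguments and return the ordered list of blocks to compute."""
    if neighbors_top < 1:
        raise ValueError("neighbors_top must be >= 1")
    requested = list(SUPPORTED) if include is None else list(include)
    unknown = [name for name in requested if name not in SUPPORTED]
    if unknown:
        raise ValueError(f"unsupported explanation blocks: {unknown}")
    excluded = set(exclude or [])
    resolved = []
    for name in requested:
        # Exclusion is applied after the include list has been resolved.
        if name not in excluded and name not in resolved:
            resolved.append(name)
    return resolved


def merge_specs(first, second):
    """Option A: a later call adds its includes and overrides other settings."""
    include = list(first["include"])
    for name in second["include"]:
        if name not in include:
            include.append(name)
    return {**first, **second, "include": include}


merged = merge_specs(
    {"include": ["community"], "neighbors_top": 10},
    {"include": ["top_neighbors"], "neighbors_top": 5},
)
```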

Return value

  • Returns a new Query instance (immutable) OR mutates self depending on your DSL style.
    TODO: follow existing patterns for other predicates (e.g., .mutate returns new query?).

TODO 2 — Add pipeline “ExplainStep” to DSL execution plan

  • TODO: Introduce a new step type/class similar to ComputeStep/MutateStep:
    • name: ExplainStep
    • stores: resolved include list + options (neighbors_top, configs, prefix, cache, etc.)
  • TODO: Ensure query serialization / repr includes explain step for debugging.
  • TODO: Ensure .execute(network) recognizes ExplainStep in the step pipeline:
    • Recommended: run ExplainStep after LIMIT/SORT and after all row-reducing steps
      so only final rows are explained.
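
The recommended ordering can be illustrated with a toy pipeline in which the explain step is deferred until all row-reducing steps have run, regardless of where it appears in the chain. `StepSketch` and `run_pipeline` are hypothetical and do not reflect the actual executor.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepSketch:
    name: str
    apply: Callable[[List[dict]], List[dict]]
    is_explain: bool = False


def run_pipeline(rows, steps):
    """Run row-reducing steps first, then any explain step on what is left."""
    ordered = [s for s in steps if not s.is_explain] + [s for s in steps if s.is_explain]
    for step in ordered:
        rows = step.apply(rows)
    return rows


explained_ids = []


def explain(rows):
    # Record which rows were explained so the ordering is observable.
    explained_ids.extend(row["id"] for row in rows)
    return [{**row, "explained": True} for row in rows]


steps = [
    StepSketch("explain", explain, is_explain=True),
    StepSketch("sort", lambda rows: sorted(rows, key=lambda r: -r["degree"])),
    StepSketch("limit", lambda rows: rows[:2]),
]
rows = [{"id": "a", "degree": 1}, {"id": "b", "degree": 5}, {"id": "c", "degree": 3}]
result = run_pipeline(rows, steps)
```

Only the two rows that survive the sort and limit are passed to the explain step, even though it was listed first.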

Execution ordering

  • TODO: Decide where ExplainStep runs relative to .per_layer grouping:
    • After end_grouping() results exist (rows with layer field)
    • Must be able to compute explanations per row, potentially using row["layer"].

TODO 3 — Explanation engine (Phase 1: node explanations)
Implement in a new module, e.g. py3plex/dsl/explain.py or py3plex/analysis/explain.py.

Public entrypoint
def explain_rows(
network,
rows: list[dict] | pandas.DataFrame,
*...



Copilot AI and others added 3 commits on December 30, 2025 21:19

- Added ExplainSpec dataclass to AST for explanation configuration
- Implemented explain() method in QueryBuilder with validation
- Created py3plex/dsl/explain.py with explanation engine for nodes
  - Community explanation (community_id, community_size)
  - Top neighbors explanation (ranked by weight or degree)
  - Layer footprint explanation (layers_present, n_layers_present)
- Integrated explanation step into executor (runs after LIMIT)
- Added expand_explanations parameter to QueryResult.to_pandas()
- Explanations support per-layer grouping context

Co-authored-by: SkBlaz <10035780+SkBlaz@users.noreply.github.com>

- Updated explain() to handle two modes:
  1. No args: returns ExplainQuery (execution plan, backward compat)
  2. With args: attaches explanations to results (new feature)
- Removed duplicate explain() method
- Added comprehensive tests covering all features
- All 26 explain tests pass
- Backward compatibility maintained (67 DSL v2 tests pass)

Co-authored-by: SkBlaz <10035780+SkBlaz@users.noreply.github.com>

- Created example_dsl_explain.py demonstrating all features
- Shows basic usage, custom explanation blocks, flagship pattern
- Includes neighbor ranking options and exclusion patterns
- All examples run successfully with correct output

Co-authored-by: SkBlaz <10035780+SkBlaz@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Add new DSL predicate operator .explain(...)" to "Add .explain() DSL predicate for node-level explanations" on Dec 30, 2025
Copilot AI requested a review from SkBlaz December 30, 2025 21:29
@SkBlaz SkBlaz marked this pull request as ready for review December 31, 2025 08:45
@SkBlaz SkBlaz merged commit 6c904a4 into master Dec 31, 2025
31 checks passed

Development

Successfully merging this pull request may close these issues.

explain()

2 participants