
feat: @bv.expr symbolic Python frontend (v0.1 capstone) #56

@petrpan26

Description

Replaces eight namespace-shaped scalar-op issues (#36-#43, all closed) with one architectural direction: ship the symbolic Python frontend as the user-facing surface for transformations and filtering.

Users write plain Python in @bv.expr functions. An AST rewriter + operator-overload tracer in the SDK lowers it to the existing JSON expression IR. The Phase 4 evaluator in crates/beava-core/src/expr.rs runs the IR per event on the apply path. No subprocess fallback; strict register-time errors on unsupported constructs.
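As a rough illustration of the operator-overload half of that pipeline, the sketch below records operations on a symbolic column as nested dicts instead of computing them. The class name `SymbolicCol`, the helper `trace`, and the node names (`col`, `lit`, `add`, `lt`) are assumptions made for this example; the real tracer and the existing JSON wire format may look different.

```python
import json


class SymbolicCol:
    """Hypothetical tracer value: records operations instead of computing them."""

    def __init__(self, node):
        self.node = node  # one IR node, kept as a plain dict

    @staticmethod
    def _lift(value):
        # Symbolic values contribute their node; Python literals become "lit" nodes.
        if isinstance(value, SymbolicCol):
            return value.node
        return {"op": "lit", "value": value}

    def _binary(self, op, other):
        return SymbolicCol({"op": op, "args": [self.node, self._lift(other)]})

    def __add__(self, other):
        return self._binary("add", other)

    def __lt__(self, other):
        return self._binary("lt", other)


def col(name):
    return SymbolicCol({"op": "col", "name": name})


def trace(fn, *param_names):
    """Call fn once with symbolic columns and return the recorded IR tree."""
    result = fn(*(col(name) for name in param_names))
    return SymbolicCol._lift(result)


def is_slow(dwell_ms):
    return dwell_ms + 500 < 10_000


ir = trace(is_slow, "dwell_ms")
print(json.dumps(ir, indent=2))
```

Tracing alone cannot see through `if`, `and`, `or`, or `is None`, because Python evaluates those on the traced object itself rather than dispatching to an overloadable operator. That is the gap the AST rewriter covers.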

Canonical example

Demonstrates: type inference (omit return annotation), Optional schema fields, null-aware control flow (if x is None), composition into the existing e.with_columns(...) / e.group_by(...).agg(...) shape.

@bv.event
class Click:
    user_id: str
    email: str | None         # Optional schema field — nullable in wire format
    referrer: str
    dwell_ms: int
    ts: int

# Return type inferred from the IR tree — annotation optional
@bv.expr
def email_domain(email: str | None):
    if email is None:
        return None
    parts = email.split("@")
    return parts[1].lower() if len(parts) == 2 else None

@bv.expr
def host(url: str):
    if not url.startswith(("http://", "https://")):
        return ""
    parts = url.split("/")
    return parts[2].lower() if len(parts) >= 3 else ""

@bv.expr
def dwell_bucket(dwell_ms: int) -> int:
    if   dwell_ms < 1_000:   return 0
    elif dwell_ms < 10_000:  return 1
    elif dwell_ms < 60_000:  return 2
    else:                    return 3

def ClickFeatures(e: Click):
    e = e.with_columns(
        domain        = email_domain(e.email),       # str | None propagates
        referrer_host = host(e.referrer),
        dwell_bkt     = dwell_bucket(e.dwell_ms),
    )
    return e.group_by("user_id").agg(
        clicks_24h           = bv.count(window="24h"),
        # null-aware aggregations skip rows where the field resolves to None:
        distinct_domains_24h = bv.distinct_count("domain", window="24h"),
        unique_hosts_24h     = bv.n_unique("referrer_host", window="24h"),
        deep_count_1h        = bv.count(where=bv.col("dwell_bkt") == 3, window="1h"),
    )

Scope

SDK (~1000 LOC Python). @bv.expr decorator. AST rewriter for if/else / ternary / and/or / comparison chains / in / is None. Operator-overload tracer with _SymbolicCol. Type checker at call sites against event schema (parameter types required; return type inferred from IR). JSON IR emitter using the existing wire format. Reference: torch.fx _symbolic_trace.py.
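To make the AST-rewriting step concrete, here is a minimal sketch that handles only the ternary form, turning `a if cond else b` into a call that records both branches. The names `TernaryRewriter`, `if_else`, and `Sym`, and the dict-shaped nodes, are hypothetical. Statement-level `if`/`elif` with early returns, `and`/`or`, comparison chains, `in`, and `is None` would each need their own rewrite rules, which this sketch does not attempt.

```python
import ast

SOURCE = '''
def bucket(x):
    return 0 if x < 1000 else 1
'''


class TernaryRewriter(ast.NodeTransformer):
    """Rewrite `a if cond else b` into `if_else(cond, a, b)`."""

    def visit_IfExp(self, node):
        self.generic_visit(node)  # rewrite nested ternaries first
        call = ast.Call(
            func=ast.Name(id="if_else", ctx=ast.Load()),
            args=[node.test, node.body, node.orelse],
            keywords=[],
        )
        return ast.copy_location(call, node)


def if_else(cond, then, otherwise):
    # Hypothetical IR constructor: records both branches instead of taking one.
    return {"op": "if_else", "args": [cond, then, otherwise]}


class Sym:
    """Tiny symbolic column, just enough to make `x < 1000` traceable."""

    def __init__(self, name):
        self.name = name

    def __lt__(self, other):
        return {
            "op": "lt",
            "args": [{"op": "col", "name": self.name}, {"op": "lit", "value": other}],
        }


tree = TernaryRewriter().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)
rewritten = ast.unparse(tree)  # requires Python 3.9+

namespace = {"if_else": if_else}
exec(compile(tree, "<bv.expr>", "exec"), namespace)
ir = namespace["bucket"](Sym("x"))
print(rewritten)
print(ir)
```

Without the rewrite, Python would call `bool()` on the traced comparison and silently commit to one branch. After it, both branches survive into the IR.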

Rust (~500 LOC). Extend crates/beava-core/src/expr.rs with IfElse, LetBinding, Compare, In nodes plus null-aware semantics for existing ops (three-valued logic). One representative op per family lands as a template contributors copy for additional ops.
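The issue does not spell out which three-valued truth tables the evaluator uses. The sketch below assumes Kleene (SQL-style) logic and models it in Python with `None` as "unknown". The function names are hypothetical, and the Rust implementation in expr.rs may make different choices.

```python
from typing import Optional


def and3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # False dominates: False AND unknown is still False.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # True dominates: True OR unknown is still True.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a: Optional[bool]) -> Optional[bool]:
    return None if a is None else (not a)


def lt3(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    # Comparing against a missing value yields unknown rather than raising.
    if a is None or b is None:
        return None
    return a < b


# A null dwell_ms makes the comparison unknown, but a definite False
# on the other side still decides the conjunction.
print(and3(lt3(None, 1_000), False))
print(or3(lt3(None, 1_000), False))
```

Under these assumptions the first call prints `False` and the second prints `None`: an unknown operand only propagates when the other operand cannot decide the result on its own.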

Contribution template (~50 LOC docs). CONTRIBUTING-OPS.md walking one full op contribution (math.log1p is a good worked example) so first-time contributors have a templated PR pattern.

Cohort positioning

Once the framework lands, adding a new op takes ~30-50 LOC across four files (ExprOp variant, eval arm, tracer table, golden test). Cohort contributors can ship a merged PR to a real-time database in a weekend, and the framework keeps an indefinite supply of such per-op issues available.
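A hedged sketch of what the Python-side half of such a contribution could look like, using math.log1p as the op. `TRACER_TABLE`, `lower_call`, `eval_node`, and the node shapes are all hypothetical. `eval_node` is only a Python stand-in for the Rust eval arm so the golden test can run here, and its null pass-through is an assumption, not confirmed behavior.

```python
import math

# Hypothetical tracer table: maps a Python callable to the IR op name it lowers to.
TRACER_TABLE = {
    math.log1p: "log1p",
}


def lower_call(fn, *arg_nodes):
    """Lower a supported call to an IR node, or fail at register time."""
    try:
        op = TRACER_TABLE[fn]
    except KeyError:
        raise TypeError(f"{fn.__name__} is not supported in @bv.expr") from None
    return {"op": op, "args": list(arg_nodes)}


def eval_node(node, row):
    """Reference evaluator standing in for the Rust eval arm (assumption)."""
    if node["op"] == "col":
        return row[node["name"]]
    if node["op"] == "log1p":
        (arg,) = node["args"]
        value = eval_node(arg, row)
        # Assumed null-aware behavior: a missing input yields a missing output.
        return None if value is None else math.log1p(value)
    raise ValueError(f"unknown op {node['op']}")


# Golden test: the lowered IR must match a checked-in expected tree.
GOLDEN_IR = {"op": "log1p", "args": [{"op": "col", "name": "dwell_ms"}]}

ir = lower_call(math.log1p, {"op": "col", "name": "dwell_ms"})
assert ir == GOLDEN_IR
print(eval_node(ir, {"dwell_ms": 0}))
```

The Rust-side ExprOp variant and eval arm are not shown. The point is only that each new op touches a table entry, one evaluation branch, and one golden fixture.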

Done when

  • The canonical ClickFeatures example above works end-to-end with Optional fields, return-type inference, and null-aware aggregations.
  • Integration tests under python/tests/v0/test_symbolic_frontend.py exercise: AST-rewritten if/else, is None patterns, function composition, closure capture, return-type inference, where= with derived predicates.
  • CONTRIBUTING-OPS.md lands. The first per-op "good first issue" ticket merges through it as proof.

Out of scope (deferred)

  • Nested types (struct, list, vector) — Tier 2; separate v0.1 phase
  • Subprocess Python fallback — explicitly rejected
  • Variadic args, recursion, for loops with dynamic bounds — register-time errors
  • Cross-event joins, event-time, session windows — locked or tracked separately (design: event-time semantics for v0.1+ #51)
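The register-time errors listed above could be enforced with a static pass over the function's AST before any tracing happens. The sketch below is one assumed shape for that check. `UnsupportedConstruct` and `check_supported` are hypothetical names, and for simplicity it rejects every loop rather than only loops with dynamic bounds.

```python
import ast


class UnsupportedConstruct(Exception):
    """Raised at decoration time, before any event is processed."""


def check_supported(source: str) -> None:
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise UnsupportedConstruct("expected a single function definition")
    if func.args.vararg or func.args.kwarg:
        raise UnsupportedConstruct(f"{func.name}: variadic args are not supported")
    for node in ast.walk(func):
        if isinstance(node, (ast.For, ast.While)):
            # Conservative: this sketch does not try to prove loop bounds static.
            raise UnsupportedConstruct(
                f"{func.name}: loops are not supported (line {node.lineno})"
            )
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func.name
        ):
            raise UnsupportedConstruct(
                f"{func.name}: recursion is not supported (line {node.lineno})"
            )


OK_SOURCE = "def host_len(url):\n    return len(url)\n"
BAD_SOURCE = "def total(*values):\n    return sum(values)\n"

check_supported(OK_SOURCE)  # passes silently
try:
    check_supported(BAD_SOURCE)
except UnsupportedConstruct as err:
    print(err)
```

Failing here, rather than on the apply path, means an unsupported function never produces IR in the first place.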

Cohort Track-1 capstone, ~2 weeks human / ~1-2 days CC.


Sub-issues

The native GitHub sub-issue links appear in the dedicated sidebar / sub-issue panel. Organizational grouping below is for readability:

Infrastructure (pure architecture, no domain: ml): #58 / #59 / #57 / #60 / #67

Representative op batches (templates for the ongoing per-op "good first issue" pipeline): #66 / #63 / #61 / #64

Integration + onboarding: #62 / #65

Metadata

Labels

area: sdk-python (Python SDK under python/beava/)
area: server (Rust server / core / runtime-core crates)
