Replaces eight namespace-shaped scalar-op issues (#36-#43, all closed) with one architectural direction: ship the symbolic Python frontend as the user-facing surface for transformations and filtering.
Users write plain Python in @bv.expr functions. An AST rewriter + operator-overload tracer in the SDK lower it to the existing JSON expression IR. The Phase 4 evaluator in crates/beava-core/src/expr.rs runs the IR per event on the apply path. No subprocess fallback; unsupported constructs raise strict errors at register time.
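To make the tracing half of that pipeline concrete, here is a minimal sketch of an operator-overload tracer. It is illustrative only: the class name `SymbolicCol`, the helper names `col` and `trace`, and the `{"op": ..., "args": ...}` node layout are assumptions made for this example, not the SDK's actual `_SymbolicCol` or the real wire format.

```python
import json


class SymbolicCol:
    """Hypothetical stand-in for a symbolic column: records ops instead of computing."""

    def __init__(self, node):
        self.node = node  # plain dict, JSON-serializable

    @staticmethod
    def _lift(value):
        # Turn Python literals into IR nodes so every operand is a node.
        if isinstance(value, SymbolicCol):
            return value.node
        return {"op": "lit", "value": value}

    def _binary(self, op, other):
        return SymbolicCol({"op": op, "args": [self.node, self._lift(other)]})

    # Each overloaded operator appends one node to the tree.
    def __add__(self, other):
        return self._binary("add", other)

    def __mul__(self, other):
        return self._binary("mul", other)

    def __lt__(self, other):
        return self._binary("lt", other)

    def __eq__(self, other):
        return self._binary("eq", other)


def col(name):
    """Leaf node referring to an event field (assumed node shape)."""
    return SymbolicCol({"op": "col", "name": name})


def trace(fn, *param_names):
    """Call fn once with symbolic arguments and return the recorded IR tree."""
    result = fn(*(col(name) for name in param_names))
    return SymbolicCol._lift(result)


def is_short_dwell(dwell_ms):
    return dwell_ms * 2 < 60_000


ir = trace(is_short_dwell, "dwell_ms")
ir_json = json.dumps(ir, sort_keys=True)
```

The function body runs exactly once, at trace time, and only the recorded tree would be shipped to the evaluator. Constructs that Python does not route through special methods (`if`, `is`, `and`/`or`) are invisible to this approach, which is why the design pairs the tracer with an AST rewriter.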
Canonical example
Demonstrates: type inference (omit return annotation), Optional schema fields, null-aware control flow (if x is None), composition into the existing e.with_columns(...) / e.group_by(...).agg(...) shape.
@bv.event
class Click:
    user_id: str
    email: str | None  # Optional schema field, nullable in wire format
    referrer: str
    dwell_ms: int
    ts: int

# Return type inferred from the IR tree; annotation optional
@bv.expr
def email_domain(email: str | None):
    if email is None:
        return None
    parts = email.split("@")
    return parts[1].lower() if len(parts) == 2 else None

@bv.expr
def host(url: str):
    if not url.startswith(("http://", "https://")):
        return ""
    parts = url.split("/")
    return parts[2].lower() if len(parts) >= 3 else ""

@bv.expr
def dwell_bucket(dwell_ms: int) -> int:
    if dwell_ms < 1_000: return 0
    elif dwell_ms < 10_000: return 1
    elif dwell_ms < 60_000: return 2
    else: return 3

def ClickFeatures(e: Click):
    e = e.with_columns(
        domain=email_domain(e.email),  # str | None propagates
        referrer_host=host(e.referrer),
        dwell_bkt=dwell_bucket(e.dwell_ms),
    )
    return e.group_by("user_id").agg(
        clicks_24h=bv.count(window="24h"),
        # null-aware aggregations skip rows where the field resolves to None:
        distinct_domains_24h=bv.distinct_count("domain", window="24h"),
        unique_hosts_24h=bv.n_unique("referrer_host", window="24h"),
        deep_count_1h=bv.count(where=bv.col("dwell_bkt") == 3, window="1h"),
    )
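The `is None` checks and ternaries in the example above cannot be captured by operator overloading, because Python offers no special method for them. The sketch below shows one way an AST pass could turn them into ordinary calls that a tracer can see. The helper names `if_else` and `is_null` are hypothetical, only ternaries and `is None` are handled (statement-level `if` is left out), and `ast.unparse` requires Python 3.9 or newer.

```python
import ast


class ControlFlowRewriter(ast.NodeTransformer):
    """Rewrite constructs that operator overloading cannot intercept."""

    def visit_IfExp(self, node):
        # `a if cond else b`  ->  `if_else(cond, a, b)`
        self.generic_visit(node)  # rewrite nested expressions first
        return ast.Call(
            func=ast.Name(id="if_else", ctx=ast.Load()),
            args=[node.test, node.body, node.orelse],
            keywords=[],
        )

    def visit_Compare(self, node):
        # `x is None`  ->  `is_null(x)`; other comparisons are left for the tracer.
        self.generic_visit(node)
        if (
            len(node.ops) == 1
            and isinstance(node.ops[0], ast.Is)
            and isinstance(node.comparators[0], ast.Constant)
            and node.comparators[0].value is None
        ):
            return ast.Call(
                func=ast.Name(id="is_null", ctx=ast.Load()),
                args=[node.left],
                keywords=[],
            )
        return node


def rewrite(source):
    """Parse source, apply the rewriter, and return the new source text."""
    tree = ControlFlowRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


rewritten = rewrite("def f(x):\n    return 0 if x is None else x + 1\n")
```

After this pass the function body contains only calls and operators, so it can be executed once against symbolic arguments to record an expression tree.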
Scope
SDK (~1000 LOC Python). @bv.expr decorator. AST rewriter for if/else / ternary / and/or / comparison chains / in / is None. Operator-overload tracer with _SymbolicCol. Type checker at call sites against event schema (parameter types required; return type inferred from IR). JSON IR emitter using the existing wire format. Reference: torch.fx _symbolic_trace.py.
Rust (~500 LOC). Extend crates/beava-core/src/expr.rs with IfElse, LetBinding, Compare, In nodes plus null-aware semantics for existing ops (three-valued logic). One representative op per family lands as a template contributors copy for additional ops.
Contribution template (~50 LOC docs). CONTRIBUTING-OPS.md walks through one full op contribution (math.log1p is a good worked example) so first-time contributors have a templated PR pattern.
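The Rust item above calls for null-aware semantics with three-valued logic. The issue does not say which flavour is intended; the sketch below assumes Kleene (SQL-style) semantics, with Python's `None` standing in for a null IR value. The function names are made up for this example, and the filter rule in `keep_row` is likewise an assumption.

```python
from typing import Optional


def and3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Kleene AND: a definite False wins even if the other side is unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Kleene OR: a definite True wins even if the other side is unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a: Optional[bool]) -> Optional[bool]:
    """Negation of unknown stays unknown."""
    return None if a is None else (not a)


def lt3(a, b) -> Optional[bool]:
    """Comparison with a null operand yields null rather than raising."""
    if a is None or b is None:
        return None
    return a < b


def keep_row(predicate: Optional[bool]) -> bool:
    """Assumed filter rule: only a definite True keeps the row."""
    return predicate is True
```

Under these rules a predicate such as `dwell_ms < 1000 and email is not None` never raises on a missing field; it evaluates to null, and a `where=` filter built on `keep_row` would skip that row.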
Cohort positioning
Once the framework lands, adding a new op takes ~30-50 LOC across four files (ExprOp variant + eval arm + tracer table + golden test). Cohort contributors can ship merged PRs to a real-time database in a weekend, and the framework keeps an open-ended supply of such per-op tasks available.
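As a rough picture of the per-op pattern described above, the following sketch collapses three of the four pieces (tracer table, eval arm, golden test) into one Python file, using math.log1p as the op. Everything here is hypothetical: the real eval arm lives in Rust, and the table names, node layout, and null handling are assumptions made for illustration.

```python
import math

# Tracer table: maps the Python callable a user writes to an IR op name.
TRACER_TABLE = {math.log1p: "log1p"}


def _eval_log1p(x):
    # Eval arm (stand-in for a Rust match arm); null in, null out.
    return None if x is None else math.log1p(x)


EVAL_ARMS = {"log1p": _eval_log1p}


def emit(fn, arg_node):
    """Build the IR node for a call to a registered function."""
    return {"op": TRACER_TABLE[fn], "args": [arg_node]}


def evaluate(node, event):
    """Tiny interpreter over the assumed node layout."""
    if node["op"] == "col":
        return event.get(node["name"])
    if node["op"] == "lit":
        return node["value"]
    args = [evaluate(arg, event) for arg in node["args"]]
    return EVAL_ARMS[node["op"]](*args)


# Golden test data: the emitted IR must match this tree exactly.
GOLDEN = {"op": "log1p", "args": [{"op": "col", "name": "dwell_ms"}]}
node = emit(math.log1p, {"op": "col", "name": "dwell_ms"})
```

A golden test in this style pins the emitted tree, so a change to the tracer table that alters the IR shows up as a failing comparison rather than as a silent behaviour change.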
Done when
The canonical ClickFeatures example above works end-to-end with Optional fields, return-type inference, and null-aware aggregations.
Integration tests under python/tests/v0/test_symbolic_frontend.py exercise: AST-rewritten if/else, is None patterns, function composition, closure capture, return-type inference, where= with derived predicates.
CONTRIBUTING-OPS.md lands. First per-op good first issue ticket merges through it as proof.
Out of scope (deferred)
for loops with dynamic bounds: register-time errors
Cohort Track-1 capstone, ~2 weeks human / ~1-2 days CC.
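A register-time rejection of dynamically bounded loops could look like the sketch below. It assumes that "dynamic bounds" means any `for` loop whose iterable is not `range(...)` with integer literals, which is an interpretation rather than something the issue specifies. The exception and function names are hypothetical, and the check takes source text directly to keep the example self-contained.

```python
import ast


class UnsupportedConstructError(Exception):
    """Raised at register time, before any event is processed."""


def _has_static_bounds(loop):
    # Accept only `range(<int literal>, ...)`; anything else counts as dynamic.
    it = loop.iter
    return (
        isinstance(it, ast.Call)
        and isinstance(it.func, ast.Name)
        and it.func.id == "range"
        and not it.keywords
        and len(it.args) > 0
        and all(
            isinstance(arg, ast.Constant) and isinstance(arg.value, int)
            for arg in it.args
        )
    )


def check_registerable(source):
    """Walk the AST and fail fast on loops the sketch cannot unroll."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.For) and not _has_static_bounds(node):
            raise UnsupportedConstructError(
                f"line {node.lineno}: for loop with dynamic bounds"
            )


STATIC = "def f(x):\n    for i in range(3):\n        x = x + i\n    return x\n"
DYNAMIC = "def g(x, n):\n    for i in range(n):\n        x = x + i\n    return x\n"

check_registerable(STATIC)  # passes silently

try:
    check_registerable(DYNAMIC)
    rejected = False
    message = ""
except UnsupportedConstructError as exc:
    rejected = True
    message = str(exc)
```

Failing here, with a line number, matches the stated preference for strict register-time errors over a runtime fallback.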
Sub-issues
The native GitHub sub-issue links appear in the dedicated sidebar / sub-issue panel. Organizational grouping below is for readability:
Infrastructure (pure architecture, no domain: ml): #58 / #59 / #57 / #60 / #67
Representative op batches (template for indefinite per-op good first issue pipeline): #66 / #63 / #61 / #64
Integration + onboarding: #62 / #65