Summary
We have been refactoring Jawk's eval/interpret runtime to reduce overhead for precompiled AWK expressions and to make structured input first-class for embedders such as MetricsHub.
The main target is repeated execution of small precompiled expressions through Awk.eval(...) and Awk.interpret(...), especially when a host application executes thousands of small AWK snippets in tight loops.
Motivation
MetricsHub compiles connectors ahead of time and stores the resulting AwkTuples in a binary connector artifact. At runtime, the product should be able to deserialize thousands of precompiled AWK expressions and execute them immediately, without paying avoidable initialization costs.
Important use cases:
- evaluate precompiled expressions repeatedly
- evaluate against either plain text input or structured input sources
- avoid serializing/deserializing `$0` when the host already has fields
- keep the runtime boundary clean between AVM internals and JRT external interactions
Scope
This work stream covers the following runtime changes.
Runtime split and API cleanup
- keep `AVM` focused on opcode execution, variables, stacks, and pure runtime helpers
- keep `JRT` focused on input/output interactions and AWK record state
- simplify `Awk.eval(...)` so callers configure separators through the `AwkSettings` attached to the `Awk` instance instead of per-call overrides
- make `Awk.eval(...)` and `Awk.interpret(...)` work naturally with precompiled `AwkTuples`
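A minimal sketch of the intended boundary, assuming nothing about Jawk's real classes: `EngineSettings`, `RecordRuntime`, `CompiledCode`, `SplittingRecordRuntime`, and `Engine` are hypothetical stand-ins for the roles this issue assigns to `AwkSettings`, `JRT`, `AVM`, and `Awk`. It shows the separator being fixed once on the engine instance, while the execution side reaches record state only through a narrow interface. The field separator is treated as a literal string, which simplifies AWK's actual `FS` rules.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical engine-level settings (the role AwkSettings plays in this issue):
// separators are fixed when the engine is built, not passed on every eval call.
final class EngineSettings {
    private final String fieldSeparator;

    EngineSettings(String fieldSeparator) {
        this.fieldSeparator = Objects.requireNonNull(fieldSeparator);
    }

    String fieldSeparator() { return fieldSeparator; }
}

// Record / I-O side (the JRT role): owns the current record and how it is split.
interface RecordRuntime {
    void setRecord(String text);
    String field(int index); // 0 = whole record, 1..NF = fields
    int nf();
}

// Execution side (the AVM role): sees records only through RecordRuntime.
@FunctionalInterface
interface CompiledCode {
    String execute(RecordRuntime runtime);
}

final class SplittingRecordRuntime implements RecordRuntime {
    private final String fs;
    private String record = "";
    private String[] fields = new String[0];

    SplittingRecordRuntime(String fs) { this.fs = fs; }

    @Override public void setRecord(String text) {
        record = text;
        // Simplification: FS is a literal string here, not an AWK regex.
        fields = text.isEmpty() ? new String[0] : text.split(Pattern.quote(fs), -1);
    }

    @Override public String field(int index) {
        if (index == 0) return record;
        return index >= 1 && index <= fields.length ? fields[index - 1] : "";
    }

    @Override public int nf() { return fields.length; }
}

final class Engine {
    private final EngineSettings settings;

    Engine(EngineSettings settings) { this.settings = settings; }

    // No separator parameter: every call uses the engine's own settings.
    String eval(CompiledCode code, String input) {
        RecordRuntime runtime = new SplittingRecordRuntime(settings.fieldSeparator());
        runtime.setRecord(input);
        return code.execute(runtime);
    }
}
```

Because `eval` takes no separator argument, two calls on the same engine cannot disagree about `FS`; a caller that needs a different separator builds a second engine with different settings.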
Structured input and lazy record state
- evolve `InputSource` so a source can provide:
  - record text only
  - fields only
  - both text and fields
- move JRT record handling behind a lazy record-state model
- keep `$0`, `$1`..`$NF`, and `NF` synchronized only on demand
- support hosts that provide fields directly without forcing eager `$0` materialization
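The lazy record-state idea can be sketched as follows. `RecordSource`, `SimpleSource`, and `LazyRecord` are hypothetical names, not Jawk's `InputSource` or JRT API; the sketch only illustrates the invariant that `$0` and `$1`..`$NF` are derived from each other on demand. It needs Java 16+ for the record and, as a simplification, treats `FS` as a literal string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical structured input: text, fields, or both (null = not supplied).
interface RecordSource {
    String text();
    List<String> fields();
}

// Convenience implementation for this sketch.
record SimpleSource(String text, List<String> fields) implements RecordSource {}

// Hypothetical lazy record state: $0 and $1..$NF are derived from one another
// only when read, so a fields-only source never pays for building $0.
final class LazyRecord {
    private final String fs;      // literal separator used to split $0
    private final String ofs;     // separator used to rebuild $0 from fields
    private String text;          // null = not built yet (or stale)
    private List<String> fields;  // null = not split yet

    LazyRecord(String fs, String ofs) { this.fs = fs; this.ofs = ofs; }

    void load(RecordSource source) {
        if (source.text() == null && source.fields() == null)
            throw new IllegalArgumentException("source must supply text or fields");
        text = source.text();
        fields = source.fields() == null ? null : new ArrayList<>(source.fields());
    }

    // Reads $index.
    String get(int index) {
        if (index == 0) {
            if (text == null) text = String.join(ofs, fields); // build $0 on demand
            return text;
        }
        ensureFields();
        return index >= 1 && index <= fields.size() ? fields.get(index - 1) : "";
    }

    int nf() { ensureFields(); return fields.size(); }

    // Assigns $index.
    void set(int index, String value) {
        if (index == 0) { text = value; fields = null; return; } // re-split lazily
        ensureFields();
        while (fields.size() < index) fields.add("");  // assigning past NF extends it
        fields.set(index - 1, value);
        text = null;                                   // $0 is stale until next read
    }

    boolean isTextMaterialized() { return text != null; }

    private void ensureFields() {
        if (fields == null) {
            fields = text.isEmpty() ? new ArrayList<>()
                    : new ArrayList<>(Arrays.asList(text.split(Pattern.quote(fs), -1)));
        }
    }
}
```

A fields-only source never builds `$0` unless a program reads it, and assigning a field only marks `$0` stale instead of rebuilding it immediately.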
Eval-path performance improvements
- support tuple-driven execution profiling in `AwkTuples`
- serialize execution metadata with precompiled tuples so deserialized tuples are execution-ready
- let `Awk.eval(...)` choose the right execution path from tuple metadata instead of rescanning every time
- make stateful eval thread-safe by using a fresh `AVM` per call
- reduce per-call setup for precompiled eval where possible
- omit redundant `SET_NUM_GLOBALS` initialization for eval tuples that never touch runtime-stack-backed variables or global metadata
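One way to read the eval-path items: scan the program once at compile time, store the result next to the code, and let each call branch on that stored profile. The sketch assumes a hypothetical toy VM (`Op`, `Instr`, `ExecutionProfile`, `CompiledProgram`, `Runner`, `Artifact`), not Jawk's opcodes or `AwkTuples`. Java serialization is only a stand-in, since this issue does not specify the artifact format, and skipping the `globals` map loosely mirrors omitting `SET_NUM_GLOBALS` for tuples that never use globals. Requires Java 16+.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical opcodes for a toy stack VM (not Jawk's real opcode set).
enum Op { PUSH_FIELD, PUSH_CONST, CONCAT, SET_GLOBAL, GET_GLOBAL }

record Instr(Op op, String arg) implements Serializable {}

// Hypothetical execution metadata, computed once at compile time.
record ExecutionProfile(boolean readsFields, boolean touchesGlobals) implements Serializable {
    static ExecutionProfile scan(List<Instr> code) {
        boolean fields = false, globals = false;
        for (Instr i : code) {
            fields |= i.op() == Op.PUSH_FIELD;
            globals |= i.op() == Op.SET_GLOBAL || i.op() == Op.GET_GLOBAL;
        }
        return new ExecutionProfile(fields, globals);
    }
}

// Code and metadata travel together, so a deserialized program is execution-ready.
record CompiledProgram(List<Instr> code, ExecutionProfile profile) implements Serializable {
    static CompiledProgram compile(List<Instr> code) {
        return new CompiledProgram(List.copyOf(code), ExecutionProfile.scan(code));
    }
}

final class Runner {
    // All mutable state is local to the call, so concurrent calls are safe.
    static String eval(CompiledProgram p, String record) {
        ExecutionProfile prof = p.profile(); // read from metadata, never rescanned
        // Split the record only if the program reads fields.
        String[] fields = prof.readsFields() && !record.isBlank()
                ? record.trim().split("\\s+") : new String[0];
        // Allocate global storage only if the program uses it.
        Map<String, String> globals = prof.touchesGlobals() ? new HashMap<>() : null;
        Deque<String> stack = new ArrayDeque<>();
        for (Instr i : p.code()) {
            switch (i.op()) {
                case PUSH_FIELD -> { // 1-based; $0 omitted for brevity
                    int n = Integer.parseInt(i.arg());
                    stack.push(n >= 1 && n <= fields.length ? fields[n - 1] : "");
                }
                case PUSH_CONST -> stack.push(i.arg());
                case CONCAT -> { String b = stack.pop(); String a = stack.pop(); stack.push(a + b); }
                case SET_GLOBAL -> globals.put(i.arg(), stack.pop());
                case GET_GLOBAL -> stack.push(globals.getOrDefault(i.arg(), ""));
            }
        }
        return stack.isEmpty() ? "" : stack.peek();
    }
}

// Stand-in for the binary artifact: round-trips a program with its profile.
final class Artifact {
    static byte[] write(CompiledProgram p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { out.writeObject(p); }
        return bytes.toByteArray();
    }

    static CompiledProgram read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (CompiledProgram) in.readObject();
        }
    }
}
```

Each `eval` call allocates its own stack and globals, so concurrent calls share only the immutable program and its profile.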
Execution semantics and maintainability
- remove stale or redundant eval-specific APIs when the general eval path already handles read-only tuples
- simplify AVM preparation so eval and interpret follow the same preparation flow
- document supported reuse patterns:
  - normal `Awk.eval(...)`
  - repeated eval on precompiled tuples
  - structured `InputSource` implementations
  - advanced sequential `AVM` reuse
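The reuse patterns above differ mainly in who owns mutable VM state. A minimal sketch, assuming a hypothetical `ToyVm` with a mutable stack (not Jawk's `AVM`): concurrent callers each create a fresh instance, while a single caller may run many inputs through one instance, provided it resets leftover state and never shares that instance across threads. `ReusePatterns` and its methods are also hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical VM with mutable per-run state; instances are NOT thread-safe.
final class ToyVm {
    private final Deque<Long> stack = new ArrayDeque<>();
    private int runs;

    // Toy "program": add each field to a running total kept on the stack.
    long sum(long[] fields) {
        stack.clear();            // sequential reuse: drop leftovers from the previous run
        stack.push(0L);
        for (long f : fields) stack.push(stack.pop() + f);
        runs++;
        return stack.pop();
    }

    int runs() { return runs; }
}

final class ReusePatterns {
    // Fresh VM per call: safe to call from any thread.
    static long evalFresh(long[] fields) {
        return new ToyVm().sum(fields);
    }

    // Sequential reuse: one caller owns one VM and runs many inputs through it.
    static List<Long> evalSequential(ToyVm vm, List<long[]> inputs) {
        List<Long> out = new ArrayList<>();
        for (long[] in : inputs) out.add(vm.sum(in));
        return out;
    }

    // Concurrent callers use the fresh-per-call pattern and never share a ToyVm.
    static List<Long> evalParallel(List<long[]> inputs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long[] in : inputs) {
                Callable<Long> task = () -> evalFresh(in);
                futures.add(pool.submit(task));
            }
            List<Long> out = new ArrayList<>();
            for (Future<Long> f : futures) out.add(f.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

Sequential reuse saves an allocation per call, but it is only correct when the reset at the start of a run really clears everything a previous run could have left behind.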
Expected outcome
After this work, Jawk should provide a cleaner and faster embedding story for products such as MetricsHub:
- precompiled tuples can be deserialized and executed immediately
- field-based input sources can avoid unnecessary `$0` rebuilds
- repeated small eval calls pay much less setup overhead
- runtime responsibilities are clearer between `AVM` and `JRT`
Notes
This is an optimization and architecture issue, not a request for a new public DSL/API. The goal is to improve the existing embedding APIs and runtime internals so high-throughput products can use Jawk efficiently.