
Optimize eval/interpret runtime for precompiled tuples and structured input #398

@bertysentry

Summary

We have been refactoring Jawk's eval/interpret runtime to reduce overhead for precompiled AWK expressions and to make structured input first-class for embedders such as MetricsHub.

The main target is repeated execution of small precompiled expressions through Awk.eval(...) and Awk.interpret(...), especially when a host application executes thousands of small AWK snippets in tight loops.

Motivation

MetricsHub compiles connectors ahead of time and stores the resulting AwkTuples in a binary connector artifact. At runtime, the product should be able to deserialize thousands of precompiled AWK expressions and execute them immediately, without paying avoidable initialization costs.

Important use cases:

  • evaluate precompiled expressions repeatedly
  • evaluate against either plain text input or structured input sources
  • avoid serializing/deserializing $0 when the host already has fields
  • keep the runtime boundary clean between AVM internals and JRT external interactions

Scope

This work stream covers the following runtime changes.

Runtime split and API cleanup

  • keep AVM focused on opcode execution, variables, stacks, and pure runtime helpers
  • keep JRT focused on input/output interactions and AWK record state
  • simplify Awk.eval(...) so callers configure separators through the AwkSettings attached to the Awk instance instead of per-call overrides
  • make Awk.eval(...) and Awk.interpret(...) work naturally with precompiled AwkTuples
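The intended split can be illustrated with a minimal sketch. Every type here (`EngineSettings`, `RecordRuntime`, `CompiledExpr`, `Engine`) is hypothetical and stands in only loosely for AwkSettings, JRT, AwkTuples and Awk; it is not Jawk's actual API. It assumes a literal field separator rather than AWK's full FS rules, and shows the separator coming from instance-level settings instead of from each eval call, with the executing program seeing the record only through a narrow runtime boundary.

```java
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical stand-in for AwkSettings: configured once per engine instance.
final class EngineSettings {
    final String fieldSeparator; // treated as a literal string, not an AWK FS regex

    EngineSettings(String fieldSeparator) { this.fieldSeparator = fieldSeparator; }
}

// Hypothetical JRT-side boundary: the only way the executing program sees the record.
interface RecordRuntime {
    String field(int index); // index 0 is the whole record ($0)
    int fieldCount();        // NF
}

// Hypothetical precompiled program: pure execution against the record boundary.
interface CompiledExpr extends Function<RecordRuntime, String> {}

final class Engine {
    private final EngineSettings settings;

    Engine(EngineSettings settings) { this.settings = settings; }

    // No per-call separator argument: splitting always uses the instance settings.
    String eval(CompiledExpr expr, String line) {
        String[] parts = line.isEmpty()
                ? new String[0]
                : line.split(Pattern.quote(settings.fieldSeparator), -1);
        RecordRuntime runtime = new RecordRuntime() {
            @Override
            public String field(int index) {
                if (index < 0) throw new IllegalArgumentException("negative field index");
                if (index == 0) return line;
                return index <= parts.length ? parts[index - 1] : "";
            }

            @Override
            public int fieldCount() { return parts.length; }
        };
        return expr.apply(runtime);
    }
}
```

Because `Engine` owns its configuration, a host loop that calls `eval` thousands of times never passes or re-applies separator options per call.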

Structured input and lazy record state

  • evolve InputSource so a source can provide:
    • record text only
    • fields only
    • both text and fields
  • move JRT record handling behind a lazy record-state model
  • keep $0, $1..$NF, and NF synchronized only on demand
  • support hosts that provide fields directly without forcing eager $0 materialization
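A minimal sketch of lazy record state, assuming hypothetical `StructuredSource` and `LazyRecord` types (not Jawk's actual InputSource or JRT classes) and a literal field separator. A source may supply text, fields, or both. `$0` is rebuilt from the fields with OFS only when it is read, the fields are split from `$0` only when a field or NF is read, and assigning a field marks `$0` stale, mirroring AWK's rule that modifying a field rebuilds `$0`. The counters exist only to make the laziness observable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical structured-input contract (not Jawk's InputSource API):
// a source may supply record text, fields, or both; a missing part is null.
interface StructuredSource {
    String recordText();
    List<String> fields();
}

// $0 and $1..$NF are reconciled only when something actually reads them.
final class LazyRecord {
    private final String fs;     // literal separator used to split $0 into fields
    private final String ofs;    // OFS used to rebuild $0 from fields
    private String text;         // null => $0 must be rebuilt from fields
    private List<String> fields; // null => fields must be split from $0
    int splits;                  // times $0 was split (observability only)
    int rebuilds;                // times $0 was rebuilt (observability only)

    LazyRecord(String fs, String ofs) { this.fs = fs; this.ofs = ofs; }

    void load(StructuredSource src) {
        text = src.recordText();
        List<String> f = src.fields();
        fields = (f == null) ? null : new ArrayList<>(f);
        if (text == null && fields == null) {
            throw new IllegalArgumentException("source provided neither text nor fields");
        }
    }

    String field(int i) {
        if (i < 0) throw new IllegalArgumentException("negative field index");
        if (i == 0) return wholeRecord();
        ensureFields();
        return i <= fields.size() ? fields.get(i - 1) : "";
    }

    int nf() {
        ensureFields();
        return fields.size();
    }

    void setField(int i, String value) {
        if (i < 0) throw new IllegalArgumentException("negative field index");
        if (i == 0) {             // assigning $0: fields are re-split lazily
            text = value;
            fields = null;
            return;
        }
        ensureFields();
        while (fields.size() < i) fields.add(""); // assigning past NF extends the record
        fields.set(i - 1, value);
        text = null;              // $0 is now stale and rebuilt on next read
    }

    private String wholeRecord() {
        if (text == null) {
            text = String.join(ofs, fields);
            rebuilds++;
        }
        return text;
    }

    private void ensureFields() {
        if (fields == null) {
            fields = text.isEmpty()
                    ? new ArrayList<>()
                    : new ArrayList<>(Arrays.asList(text.split(Pattern.quote(fs), -1)));
            splits++;
        }
    }
}
```

A host that already has fields never pays for building `$0` unless the expression actually references it.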

Eval-path performance improvements

  • support tuple-driven execution profiling in AwkTuples
  • serialize execution metadata with precompiled tuples so deserialized tuples are execution-ready
  • let Awk.eval(...) choose the right execution path from tuple metadata instead of rescanning every time
  • make stateful eval thread-safe by using a fresh AVM per call
  • reduce per-call setup for precompiled eval where possible
  • omit redundant SET_NUM_GLOBALS initialization for eval tuples that never touch runtime-stack-backed variables or global metadata
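The metadata idea can be sketched with a toy stack machine. The opcodes and the `Precompiled`, `MiniVm` and `Evaluator` types are hypothetical and far simpler than AwkTuples and the AVM; the `needsGlobals` flag only loosely plays the role of deciding whether SET_NUM_GLOBALS-style setup is required. The flag is computed once when the program is built and travels with the serialized program, so the deserialized copy is execution-ready without a rescan, and each eval call gets a fresh VM so concurrent calls share no mutable state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy instruction set; real AWK tuples are far richer.
enum MiniOp { PUSH_CONST, PUSH_FIELD, ADD, LOAD_GLOBAL, STORE_GLOBAL }

final class Instr implements Serializable {
    private static final long serialVersionUID = 1L;
    final MiniOp op;
    final double arg;

    Instr(MiniOp op, double arg) { this.op = op; this.arg = arg; }
}

// Program plus execution metadata, computed once when the program is built.
final class Precompiled implements Serializable {
    private static final long serialVersionUID = 1L;
    final ArrayList<Instr> code;
    final int numGlobals;
    final boolean needsGlobals; // false => skip global-slot setup entirely

    Precompiled(List<Instr> code, int numGlobals) {
        this.code = new ArrayList<>(code);
        this.numGlobals = numGlobals;
        this.needsGlobals = code.stream()
                .anyMatch(i -> i.op == MiniOp.LOAD_GLOBAL || i.op == MiniOp.STORE_GLOBAL);
    }
}

// Created fresh per call, so concurrent eval calls never share a stack or globals.
final class MiniVm {
    private final Deque<Double> stack = new ArrayDeque<>();
    private final double[] globals;

    MiniVm(Precompiled p) {
        globals = p.needsGlobals ? new double[p.numGlobals] : null;
    }

    double run(Precompiled p, double[] fields) {
        for (Instr ins : p.code) {
            switch (ins.op) {
                case PUSH_CONST: stack.push(ins.arg); break;
                case PUSH_FIELD: stack.push(fields[(int) ins.arg - 1]); break; // $1 is fields[0]
                case ADD: stack.push(stack.pop() + stack.pop()); break;
                case LOAD_GLOBAL: stack.push(globals[(int) ins.arg]); break;
                case STORE_GLOBAL: globals[(int) ins.arg] = stack.peek(); break; // value stays on stack
            }
        }
        return stack.pop();
    }
}

final class Evaluator {
    static double eval(Precompiled p, double[] fields) {
        return new MiniVm(p).run(p, fields);
    }

    static byte[] serialize(Precompiled p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Java deserialization restores fields without running the constructor,
    // so needsGlobals comes back from the stream instead of being recomputed.
    static Precompiled deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Precompiled) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```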

Execution semantics and maintainability

  • remove stale or redundant eval-specific APIs when the general eval path already handles read-only tuples
  • simplify AVM preparation so eval and interpret follow the same preparation flow
  • document supported reuse patterns:
    • normal Awk.eval(...)
    • repeated eval on precompiled tuples
    • structured InputSource implementations
    • advanced sequential AVM reuse
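The difference between the default pattern and advanced sequential reuse can be sketched as follows. `ReusableMachine` and `ReusePatterns` are hypothetical; the assumption is that a reused machine carries mutable per-run state, so it must be reset before each run and confined to a single thread, while concurrent callers simply get a fresh machine per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical machine with mutable per-run state (one accumulator).
// It is deliberately not thread-safe, like a reused VM would be.
final class ReusableMachine {
    private double accumulator;

    void reset() { accumulator = 0; }

    // Toy "program": adds every field into the machine's state.
    double sumFields(List<Double> fields) {
        for (double f : fields) accumulator += f;
        return accumulator;
    }
}

final class ReusePatterns {
    // Advanced pattern: one machine, one thread, reset before every run
    // so nothing leaks from the previous record.
    static List<Double> sequential(List<List<Double>> records) {
        ReusableMachine machine = new ReusableMachine();
        List<Double> results = new ArrayList<>();
        for (List<Double> rec : records) {
            machine.reset();
            results.add(machine.sumFields(rec));
        }
        return results;
    }

    // Default pattern for concurrent callers: a fresh machine per call,
    // so threads never share mutable state and no reset is needed.
    static List<Double> concurrent(List<List<Double>> records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (List<Double> rec : records) {
                futures.add(pool.submit(() -> new ReusableMachine().sumFields(rec)));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                try {
                    results.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Sequential reuse saves an allocation per call but only pays off when the caller can guarantee single-threaded access and a correct reset; the fresh-per-call pattern is the safer default.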

Expected outcome

After this work, Jawk should provide a cleaner and faster embedding story for products such as MetricsHub:

  • precompiled tuples can be deserialized and executed immediately
  • field-based input sources can avoid unnecessary $0 rebuilds
  • repeated small eval calls pay much less setup overhead
  • runtime responsibilities are clearer between AVM and JRT

Notes

This is an optimization and architecture issue, not a request for a new public DSL/API. The goal is to improve the existing embedding APIs and runtime internals so high-throughput products can use Jawk efficiently.
