baml_language: implement baml run and baml pack (BEP-027) #3529
codeshaunted wants to merge 1 commit into
Adds standalone execution per BEP-027. `baml run` dispatches positional
namespace mains, `--function <name>`, `-e <expression>`, and `baml.toml`
`[scripts]` aliases through a shared dispatcher. `baml pack` bakes any
non-expression target into a self-contained executable via libsui.
Major pieces:
- baml_exec: shared dispatcher, auto-CLI flag derivation from function
signatures, JSON I/O routed through user-overridable
`baml.json.serialize` / `baml.json.deserialize` (so `to_json` /
`from_json` overrides are honored on both input and output)
- baml_pack_host: runtime host binary that extracts the bitcode-
serialized PackEnvelope embedded by `baml pack` and invokes the
baked-in target with the same dispatcher
- baml_cli/run_command.rs: rewrite delegating to baml_exec
- baml_cli/pack_command.rs: new `baml pack` entry point with
did-you-mean and target-triple cross-compilation
- baml.sys.exit(code) builtin + EngineError::Exit { code }
- BexEngine::set_argv / argv() and type-args threading
(FunctionCallContextBuilder::with_type_args) for native entry points
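The auto-CLI flag derivation in `baml_exec` maps each parameter of the target's signature to a `--<param>` flag. Below is a minimal, std-only sketch of that idea under a simplified type model. `ParamType`, `ArgValue`, `coerce`, and `bind_flags` are hypothetical names; the real crate works from engine type metadata and returns `anyhow` errors. The sketch also reflects the spec-conformance rule below: types the CLI cannot faithfully represent are rejected with a `--json-args` pointer instead of being coerced to a string. Tokenizing raw argv into `(flag, value)` pairs is assumed to happen earlier.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified parameter type model (not the engine's real types).
#[derive(Debug, Clone, PartialEq)]
enum ParamType {
    Int,
    Float,
    Bool,
    String,
    /// Stand-in for class / list / map / union / media / engine-internal types.
    Complex(&'static str),
}

/// A flag value after coercion to the parameter's declared type.
#[derive(Debug, Clone, PartialEq)]
enum ArgValue {
    Int(i64),
    Float(f64),
    Bool(bool),
    Str(String),
}

/// Coerce one raw flag value according to the declared parameter type.
fn coerce(name: &str, ty: &ParamType, raw: &str) -> Result<ArgValue, String> {
    match ty {
        ParamType::Int => raw.parse::<i64>().map(ArgValue::Int).map_err(|e| format!("--{name}: {e}")),
        ParamType::Float => raw.parse::<f64>().map(ArgValue::Float).map_err(|e| format!("--{name}: {e}")),
        ParamType::Bool => raw.parse::<bool>().map(ArgValue::Bool).map_err(|e| format!("--{name}: {e}")),
        ParamType::String => Ok(ArgValue::Str(raw.to_string())),
        // No silent string fallback: point the user at --json-args instead.
        ParamType::Complex(kind) => Err(format!(
            "parameter `{name}` has type `{kind}`, which cannot be passed as a flag; use --json-args"
        )),
    }
}

/// Bind already-split `(flag, value)` pairs against the derived flag set.
fn bind_flags(
    params: &[(&str, ParamType)],
    flags: &[(&str, &str)],
) -> Result<HashMap<String, ArgValue>, String> {
    let mut bound = HashMap::new();
    for (flag, raw) in flags {
        // Every parameter name becomes a flag; anything else is unknown.
        let (_, ty) = params
            .iter()
            .find(|(name, _)| name == flag)
            .ok_or_else(|| format!("unknown flag --{flag}"))?;
        bound.insert(flag.to_string(), coerce(flag, ty, raw)?);
    }
    Ok(bound)
}
```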
Spec-conformance fixes from the audit rounds:
- Mutual exclusion of the --function / -e / positional / --json-args dispatch modes
- Reserved `help` param rejection (validate_help_param)
- Malformed `baml.toml` continues with empty script set + warning
- Auto-CLI rejects all types it can't faithfully represent (class /
list / map / union / media / engine-internal) with a `--json-args`
pointer; the previous catchall silently String-coerced them
- ExitCode::TargetError = 1, aligned across `baml run` and packed
binaries (BEP-027 §"Exit codes" only mandates non-zero; 1 is the
Unix convention and the packed runtime already used it)
- `--list` empty-targets case honors `--output-format json`, emitting
`{"scripts":[], "namespace_mains":[], "functions":[]}` instead of
the human-readable "No runnable targets found." text
- `baml pack -e '<expr>'` rejected with a clear "expression mode is
not packageable" message instead of a confusing clap parse error
- `--list` reports "Namespace mains" (namespaces with a `main`) as a
distinct section from "Functions"; both debug and JSON outputs
- Did-you-mean filters to function display names only
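The mutual-exclusion rule in the first spec-conformance bullet can be sketched for the three target selectors (`--function`, `-e`, positional). How `--json-args` participates is not spelled out in this description, so the sketch leaves it out. `DispatchMode` and `select_mode` are hypothetical names, not the `baml_exec` API.

```rust
/// Hypothetical dispatch modes for `baml run`; not the real baml_exec types.
#[derive(Debug, PartialEq)]
enum DispatchMode<'a> {
    Function(&'a str),   // --function <name>
    Expression(&'a str), // -e <expression>
    Positional(&'a str), // positional target
}

/// Pick exactly one dispatch mode, rejecting conflicting selectors.
fn select_mode<'a>(
    function: Option<&'a str>,
    expression: Option<&'a str>,
    positional: Option<&'a str>,
) -> Result<DispatchMode<'a>, String> {
    // Keep only the selectors the user actually supplied.
    let mut chosen: Vec<DispatchMode<'a>> = vec![
        function.map(DispatchMode::Function),
        expression.map(DispatchMode::Expression),
        positional.map(DispatchMode::Positional),
    ]
    .into_iter()
    .flatten()
    .collect();

    match chosen.len() {
        0 => Err("no target: pass a positional target, --function <name>, or -e <expression>".to_string()),
        1 => Ok(chosen.remove(0)),
        _ => Err(format!("dispatch modes are mutually exclusive, got {chosen:?}")),
    }
}
```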
stow.toml: `baml_exec` and `baml_pack_host` join `baml_cli` as surface-
area crates that may use `anyhow` and depend directly on `bex_*`.
Tests: 139 across baml_cli + baml_exec, plus engine-level argv coverage.
📝 Walkthrough
This PR implements standalone executable packaging for BAML programs (BEP-027) by adding Serde serialization across the entire VM type system, introducing a
Changes: BEP-027: Standalone Executable Packaging
🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
baml_language/crates/bex_vm_types/src/types.rs (1)
619-625: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Do not serialize `Value::OmittedArg`.
Line 624 says this sentinel is only valid during argument binding and "must not be serialized or exposed to host code". Deriving serde for the whole enum turns it into a normal wire value. Please switch `Value` to the same proxy pattern used for `Object`/`FunctionKind` and return a serde error for `OmittedArg`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@baml_language/crates/bex_vm_types/src/types.rs` around lines 619 - 625, The Value enum derives Serialize/Deserialize but must not allow serializing the sentinel Value::OmittedArg; replace the derive with custom impls: implement Serialize and Deserialize for Value following the proxy-pattern used for Object/FunctionKind (create an internal serializable proxy representation for the valid variants and map to/from Value), and in both impls return a serde error if encountering Value::OmittedArg (during Serialize) or if the deserialized proxy would map to OmittedArg (during Deserialize); update/remove the #[derive(...)] on Value and ensure the impls reference the Value enum and its OmittedArg variant explicitly.
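A std-only sketch of the proxy pattern this comment asks for, assuming a heavily simplified `Value` (the real enum has many more variants; `ValueProxy` and `OmittedArgError` are hypothetical names). In the real crate, `ValueProxy` would derive `Serialize`/`Deserialize`; `impl Serialize for Value` would call `ValueProxy::try_from(self)` and map the error through `serde::ser::Error::custom`, while `impl Deserialize for Value` would deserialize a `ValueProxy` and convert it with `Value::from`.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical, simplified runtime value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Str(String),
    /// Sentinel used only while binding arguments; must never hit the wire.
    OmittedArg,
}

/// Wire-safe proxy: mirrors every `Value` variant except `OmittedArg`.
#[derive(Debug, Clone, PartialEq)]
enum ValueProxy {
    Null,
    Int(i64),
    Str(String),
}

#[derive(Debug, PartialEq)]
struct OmittedArgError;

impl fmt::Display for OmittedArgError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Value::OmittedArg is a binding-time sentinel and cannot be serialized")
    }
}

/// Serialize direction: fails only for the sentinel.
impl TryFrom<&Value> for ValueProxy {
    type Error = OmittedArgError;
    fn try_from(v: &Value) -> Result<Self, Self::Error> {
        match v {
            Value::Null => Ok(ValueProxy::Null),
            Value::Int(i) => Ok(ValueProxy::Int(*i)),
            Value::Str(s) => Ok(ValueProxy::Str(s.clone())),
            Value::OmittedArg => Err(OmittedArgError),
        }
    }
}

/// Deserialize direction: infallible, and can never yield `OmittedArg`.
impl From<ValueProxy> for Value {
    fn from(p: ValueProxy) -> Self {
        match p {
            ValueProxy::Null => Value::Null,
            ValueProxy::Int(i) => Value::Int(i),
            ValueProxy::Str(s) => Value::Str(s),
        }
    }
}
```

Because `ValueProxy` has no `OmittedArg` variant, the deserialize side cannot produce the sentinel at all, so only the serialize side needs an explicit error.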
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@baml_language/Cargo.toml`:
- Line 118: The dependency entry bitcode = { version = "0.6", features = [
"serde" ] } is enabling serde-based serialization which may conflict with the
crate's native Encode/Decode and hurt performance for VM type serialization;
decide whether to use bitcode's native encoding instead of serde, then update
the Cargo.toml entry for the bitcode dependency accordingly (either remove the
"serde" feature and enable the native encoding feature if provided by bitcode,
or keep "serde" if interoperability is required), and run unit tests and a
simple benchmark of VM type serialization paths (areas using Encode/Decode) to
verify behavior and performance; refer to the bitcode dependency line and the
crate's Encode/Decode usage to locate and change the configuration.
In `@baml_language/crates/baml_exec/src/auto_cli.rs`:
- Around line 65-73: In the option-value parsing branch inside auto_cli.rs where
(key, val_str) is assigned from raw or the next token, detect and reject a
following token that looks like a flag (starts with '-') instead of accepting it
as the value; specifically, after incrementing i and before using tokens[i],
check tokens[i].starts_with('-') and return an error (e.g.,
anyhow::bail!("Missing value for `--{raw}`")) to treat `--name --other=...` as a
missing-value error; update the logic around variables raw, i, tokens, key, and
val_str so the existing code path still uses tokens[i] when valid but fails fast
when the next token is a flag.
In `@baml_language/crates/baml_exec/src/dispatch.rs`:
- Around line 31-41: validate_help_param currently swallows lookup failures (if
let Ok(...)) and is called with the raw target_name in dispatch_target which can
be non-canonical; update callsite and function to use the resolved canonical
function name and propagate errors: in dispatch_target, resolve func_info first
(the existing func_info lookup) and then call validate_help_param(&engine,
func_info.name()) instead of validate_help_param(&engine, target_name); inside
validate_help_param use engine.function_params(function_name)? (or handle the
Err by returning it) rather than if let Ok(...) so missing/invalid targets don’t
skip validation, then check params.iter().any(|(name, _, _)| *name == "help")
and bail as before if found.
In `@baml_language/crates/bex_vm_types/src/heap_ptr.rs`:
- Around line 153-164: The current Serialize and Deserialize impls for HeapPtr
silently round-trip to a null pointer; instead, make both fail at the serde
boundary: in impl Serialize for HeapPtr, return
Err(serde::ser::Error::custom("HeapPtr is a runtime-only pointer and must not be
serialized")) rather than serializer.serialize_unit(); in impl<'de>
Deserialize<'de> for HeapPtr, return Err(D::Error::custom("HeapPtr cannot be
deserialized: runtime pointer leaked into serialized data")) rather than
producing HeapPtr::null(); reference the impl blocks for Serialize/Deserialize
and the HeapPtr::null symbol when making the changes.
In `@baml_language/crates/bex_vm/src/vm.rs`:
- Around line 360-364: pending_native_entry (Option<PendingNativeEntry>) holds
Vec<Value> outside stack/frames so GC can move/reclaim heap-backed args before
dispatch_native_entry; update BexVm::collect_roots and BexVm::forward_roots to
also walk self.pending_native_entry when Some(..), treating each Value as a
root/forwardable slot (mirror how stack/frames are handled), ensuring you
mark/visit and update any HeapPtr/heap-backed Values in that Vec; keep handling
of Option and ensure set_entry_point_with_type_args stores Values in
PendingNativeEntry consistently so exec/dispatch_native_entry reads forwarded
pointers.
- Around line 2207-2214: The early return when handling pending_native_entry
bypasses the standard VmError::InternalError → VmError::TracedInternalError
conversion; instead of doing `if let Some(entry) =
self.pending_native_entry.take() { return self.dispatch_native_entry(&entry);
}`, call `dispatch_native_entry` without returning immediately (e.g., take the
entry, invoke `self.dispatch_native_entry(&entry)` and assign its Result to a
local variable) and let the normal error-wrapping logic that follows run so
`$rust_function` entry points (including the unsupported `YieldToCall` case from
`dispatch_native_entry`) produce the same traced/internal error variant as
bytecode-dispatched entries.
- Around line 1019-1027: dispatch_native_entry currently treats
NativeCallResult::YieldToCall as an error, which makes yielding builtins invoked
via set_entry_point_with_type_args (e.g., baml.json.to_string<T>) fail at first
exec; fix by either (A) preventing yielding natives from being installed as
entry points in set_entry_point_with_type_args: detect native callees that can
yield (use the callee's may_yield/attribute or check if invoking the native can
return NativeCallResult::YieldToCall) and return an error/Refuse to set the
entry, or (B) extend dispatch_native_entry to handle YieldToCall for native
entry stubs by synthesizing an initial frame or scheduling the yielded call
chain so the VM can continue execution (i.e., convert YieldToCall into creating
the next call frame(s) instead of erroring). Update
set_entry_point_with_type_args and dispatch_native_entry to consistently enforce
the chosen approach and reference NativeCallResult::YieldToCall when
guarding/handling the case.
---
Outside diff comments:
In `@baml_language/crates/bex_vm_types/src/types.rs`:
- Around line 619-625: The Value enum derives Serialize/Deserialize but must not
allow serializing the sentinel Value::OmittedArg; replace the derive with custom
impls: implement Serialize and Deserialize for Value following the proxy-pattern
used for Object/FunctionKind (create an internal serializable proxy
representation for the valid variants and map to/from Value), and in both impls
return a serde error if encountering Value::OmittedArg (during Serialize) or if
the deserialized proxy would map to OmittedArg (during Deserialize);
update/remove the #[derive(...)] on Value and ensure the impls reference the
Value enum and its OmittedArg variant explicitly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3f1aab63-cb82-449a-9568-ce13fcb06e18
⛔ Files ignored due to path filters (2)
- baml_language/Cargo.lock is excluded by !**/*.lock
- baml_language/crates/baml_cli/src/snapshots/baml_cli__describe_command_tests__render_builtin_package_listing.snap is excluded by !**/*.snap
📒 Files selected for processing (39)
- baml_language/Cargo.toml
- baml_language/crates/baml_base/Cargo.toml
- baml_language/crates/baml_base/src/attr.rs
- baml_language/crates/baml_base/src/core_types.rs
- baml_language/crates/baml_builtins2/baml_std/baml/ns_json/json.baml
- baml_language/crates/baml_builtins2/baml_std/baml/ns_panics/panics.baml
- baml_language/crates/baml_builtins2/baml_std/baml/ns_sys/sys.baml
- baml_language/crates/baml_builtins2_codegen/src/codegen.rs
- baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
- baml_language/crates/baml_cli/Cargo.toml
- baml_language/crates/baml_cli/src/commands.rs
- baml_language/crates/baml_cli/src/lib.rs
- baml_language/crates/baml_cli/src/pack_command.rs
- baml_language/crates/baml_cli/src/run_command.rs
- baml_language/crates/baml_exec/Cargo.toml
- baml_language/crates/baml_exec/src/auto_cli.rs
- baml_language/crates/baml_exec/src/dispatch.rs
- baml_language/crates/baml_exec/src/envelope.rs
- baml_language/crates/baml_exec/src/json_coerce.rs
- baml_language/crates/baml_exec/src/lib.rs
- baml_language/crates/baml_exec/src/output.rs
- baml_language/crates/baml_pack_host/Cargo.toml
- baml_language/crates/baml_pack_host/build.rs
- baml_language/crates/baml_pack_host/src/main.rs
- baml_language/crates/baml_type/Cargo.toml
- baml_language/crates/baml_type/src/lib.rs
- baml_language/crates/baml_type/src/template.rs
- baml_language/crates/bex_engine/src/function_call_context.rs
- baml_language/crates/bex_engine/src/lib.rs
- baml_language/crates/bex_engine/tests/host_argv.rs
- baml_language/crates/bex_vm/src/errors.rs
- baml_language/crates/bex_vm/src/package_baml/sys.rs
- baml_language/crates/bex_vm/src/vm.rs
- baml_language/crates/bex_vm_types/Cargo.toml
- baml_language/crates/bex_vm_types/src/bytecode.rs
- baml_language/crates/bex_vm_types/src/heap_ptr.rs
- baml_language/crates/bex_vm_types/src/indexable.rs
- baml_language/crates/bex_vm_types/src/types.rs
- baml_language/stow.toml
```toml
aws-smithy-types = { git = "https://github.com/boundaryml/aws-sdk-rust.git", rev = "28d4f67bac1214320320905c1f6908ea32b6b0ac" }
aws-types = { git = "https://github.com/boundaryml/aws-sdk-rust.git", rev = "28d4f67bac1214320320905c1f6908ea32b6b0ac" }
axum = { version = "0.8.4", features = [ "ws" ] }
bitcode = { version = "0.6", features = [ "serde" ] }
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Verify bitcode crate version and features
echo "=== Checking bitcode latest version ==="
cargo search bitcode --limit 1
echo -e "\n=== Checking bitcode 0.6 features ==="
cargo metadata --format-version=1 2>/dev/null | jq -r '.packages[] | select(.name == "bitcode" and (.version | startswith("0.6"))) | .features | keys[]' || echo "Package not in current Cargo.lock"
```
Repository: BoundaryML/baml
Length of output: 533
🌐 Web query:
bitcode rust crate version 0.6 serde feature documentation
💡 Result:
For the Rust crate bitcode version 0.6.x, serde support is controlled by the crate feature named "serde"; the serde integration is gated behind that feature flag [1][2]. Version 0.6.9 feature set (docs.rs / crates.io):
- Features present include default, derive, std, arrayvec, glam, rust_decimal, serde, time, and uuid [3].
- The docs for the serde feature explain that it is gated behind "serde" [2].
- The crate-level feature table shows that default enables "derive" and "std" [4], while "std" conditionally enables std features for several optional dependencies, including serde via serde?/std [4].
What this means when using bitcode 0.6:
- Enable serde integration by turning on the "serde" Cargo feature for bitcode [1][2].
- The serde integration is not the same as bitcode's native Encode/Decode derive system; the cited discussion [5] states that it is not compatible with the native bitcode::{Encode, Decode} format and is slower, with slightly larger output.
For the serde feature documentation of bitcode 0.6.9 specifically, start from the crate docs (which contain the serde section and the gating statement) and the feature flags page (which lists the serde feature among others) [2][3].
Citations:
- 1: https://docs.rs/bitcode/latest/bitcode
- 2: https://crates.io/crates/bitcode/0.6.0
- 3: https://crates.io/crates/bitcode
- 4: https://docs.rs/crate/bitcode/0.4.0
- 5: https://www.reddit.com/r/rust/comments/1bfxqvh/bitcode_smallest_and_fastest_binary_serializer/
Verify bitcode's serde feature is the right serialization approach.
Version 0.6.9 is current, and the serde feature is documented and available in bitcode 0.6.x. However, bitcode's serde mode uses a different wire format from the crate's native Encode/Decode derive and, per the sources above, is slower and produces slightly larger output. For VM type serialization, confirm whether native bitcode encoding should be used instead of serde mode, or whether serde is intentional for interoperability reasons.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/Cargo.toml` at line 118, The dependency entry bitcode = {
version = "0.6", features = [ "serde" ] } is enabling serde-based serialization
which may conflict with the crate's native Encode/Decode and hurt performance
for VM type serialization; decide whether to use bitcode's native encoding
instead of serde, then update the Cargo.toml entry for the bitcode dependency
accordingly (either remove the "serde" feature and enable the native encoding
feature if provided by bitcode, or keep "serde" if interoperability is
required), and run unit tests and a simple benchmark of VM type serialization
paths (areas using Encode/Decode) to verify behavior and performance; refer to
the bitcode dependency line and the crate's Encode/Decode usage to locate and
change the configuration.
```rust
let (key, val_str) = if let Some(eq_pos) = raw.find('=') {
    (&raw[..eq_pos], &raw[eq_pos + 1..])
} else {
    i += 1;
    if i >= tokens.len() {
        anyhow::bail!("Missing value for `--{raw}`");
    }
    (raw, tokens[i].as_str())
};
```
Treat a following flag token as a missing value for the `--name value` form.
`--name --other=...` is currently parsed as `name="--other=..."` instead of erroring. That silently misbinds args (especially for `string` parameters) and hides the real CLI mistake.
Proposed fix
```diff
 let (key, val_str) = if let Some(eq_pos) = raw.find('=') {
     (&raw[..eq_pos], &raw[eq_pos + 1..])
 } else {
     i += 1;
-    if i >= tokens.len() {
+    if i >= tokens.len() || tokens[i].starts_with("--") {
         anyhow::bail!("Missing value for `--{raw}`");
     }
     (raw, tokens[i].as_str())
 };
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let (key, val_str) = if let Some(eq_pos) = raw.find('=') {
    (&raw[..eq_pos], &raw[eq_pos + 1..])
} else {
    i += 1;
    if i >= tokens.len() || tokens[i].starts_with("--") {
        anyhow::bail!("Missing value for `--{raw}`");
    }
    (raw, tokens[i].as_str())
};
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/baml_exec/src/auto_cli.rs` around lines 65 - 73, In the
option-value parsing branch inside auto_cli.rs where (key, val_str) is assigned
from raw or the next token, detect and reject a following token that looks like
a flag (starts with '-') instead of accepting it as the value; specifically,
after incrementing i and before using tokens[i], check
tokens[i].starts_with('-') and return an error (e.g., anyhow::bail!("Missing
value for `--{raw}`")) to treat `--name --other=...` as a missing-value error;
update the logic around variables raw, i, tokens, key, and val_str so the
existing code path still uses tokens[i] when valid but fails fast when the next
token is a flag.
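A std-only sketch of the `--key=value` / `--key value` tokenizer with the proposed guard, assuming tokens arrive as a `&[String]` and errors are plain `String`s instead of `anyhow` errors (`parse_flag_tokens` is a hypothetical name, and rejecting stray non-flag tokens is an assumption of the sketch). The prompt above suggests `starts_with('-')`, while the committed suggestion checks `starts_with("--")`; the sketch follows the latter, which keeps negative numeric values such as `--count -5` bindable.

```rust
/// Parse `--key=value` and `--key value` tokens into (key, value) pairs.
fn parse_flag_tokens(tokens: &[String]) -> Result<Vec<(String, String)>, String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        // Assumption for this sketch: every token here must be a `--flag`.
        let raw = tokens[i]
            .strip_prefix("--")
            .ok_or_else(|| format!("Unexpected argument `{}`", tokens[i]))?;
        let (key, val_str) = if let Some(eq_pos) = raw.find('=') {
            (&raw[..eq_pos], &raw[eq_pos + 1..])
        } else {
            i += 1;
            // The proposed guard: end of input or another `--` flag means no value.
            if i >= tokens.len() || tokens[i].starts_with("--") {
                return Err(format!("Missing value for `--{raw}`"));
            }
            (raw, tokens[i].as_str())
        };
        out.push((key.to_string(), val_str.to_string()));
        i += 1;
    }
    Ok(out)
}
```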
```rust
pub fn validate_help_param(engine: &BexEngine, function_name: &str) -> Result<()> {
    if let Ok(params) = engine.function_params(function_name) {
        if params.iter().any(|(name, _, _)| *name == "help") {
            anyhow::bail!(
                "Target `{function_name}` declares a parameter named `help`, \
                which collides with the auto-derived `--help` flag. \
                Rename this parameter to be used as an entry point."
            );
        }
    }
    Ok(())
```
`help`-parameter validation can be bypassed by non-canonical target names.
`dispatch_target` resolves `func_info` first, but then validates with the raw `target_name`. Combined with the `if let Ok(...)` in `validate_help_param`, lookup failures skip validation entirely.
Proposed fix
```diff
-    validate_help_param(&engine, target_name)?;
+    validate_help_param(&engine, &func_info.qualified_name)?;
```

```diff
 pub fn validate_help_param(engine: &BexEngine, function_name: &str) -> Result<()> {
-    if let Ok(params) = engine.function_params(function_name) {
-        if params.iter().any(|(name, _, _)| *name == "help") {
-            anyhow::bail!(
-                "Target `{function_name}` declares a parameter named `help`, \
-                which collides with the auto-derived `--help` flag. \
-                Rename this parameter to be used as an entry point."
-            );
-        }
+    let params = engine
+        .function_params(function_name)
+        .with_context(|| format!("Failed to load params for target `{function_name}`"))?;
+    if params.iter().any(|(name, _, _)| *name == "help") {
+        anyhow::bail!(
+            "Target `{function_name}` declares a parameter named `help`, \
+            which collides with the auto-derived `--help` flag. \
+            Rename this parameter to be used as an entry point."
+        );
     }
     Ok(())
 }
```
Also applies to: 72-72
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/baml_exec/src/dispatch.rs` around lines 31 - 41,
validate_help_param currently swallows lookup failures (if let Ok(...)) and is
called with the raw target_name in dispatch_target which can be non-canonical;
update callsite and function to use the resolved canonical function name and
propagate errors: in dispatch_target, resolve func_info first (the existing
func_info lookup) and then call validate_help_param(&engine, func_info.name())
instead of validate_help_param(&engine, target_name); inside validate_help_param
use engine.function_params(function_name)? (or handle the Err by returning it)
rather than if let Ok(...) so missing/invalid targets don’t skip validation,
then check params.iter().any(|(name, _, _)| *name == "help") and bail as before
if found.
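A std-only sketch of the fix, assuming a toy `Engine` that maps canonical function names to parameter names and non-canonical target names to canonical ones. The struct, its fields, and `resolve` are hypothetical; the real engine returns `(name, _, _)` tuples and `anyhow` errors. The two points it shows are that the check runs on the resolved canonical name and that a failed parameter lookup is an error instead of a silent skip.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the engine's function table.
struct Engine {
    /// Canonical function name -> parameter names.
    functions: HashMap<String, Vec<String>>,
    /// Non-canonical target name (e.g. a short alias) -> canonical name.
    aliases: HashMap<String, String>,
}

impl Engine {
    /// Resolve a user-supplied target to its canonical function name.
    fn resolve(&self, target: &str) -> Result<&str, String> {
        if let Some((canonical, _)) = self.functions.get_key_value(target) {
            return Ok(canonical.as_str());
        }
        self.aliases
            .get(target)
            .map(String::as_str)
            .ok_or_else(|| format!("Unknown target `{target}`"))
    }

    fn function_params(&self, name: &str) -> Result<&[String], String> {
        self.functions
            .get(name)
            .map(Vec::as_slice)
            .ok_or_else(|| format!("Failed to load params for target `{name}`"))
    }
}

fn validate_help_param(engine: &Engine, function_name: &str) -> Result<(), String> {
    // `?` propagates lookup failures instead of silently skipping the check.
    let params = engine.function_params(function_name)?;
    if params.iter().any(|name| name == "help") {
        return Err(format!(
            "Target `{function_name}` declares a parameter named `help`, \
             which collides with the auto-derived `--help` flag."
        ));
    }
    Ok(())
}

fn dispatch_target(engine: &Engine, target_name: &str) -> Result<(), String> {
    // Validate the resolved canonical name, not the raw user input.
    let canonical = engine.resolve(target_name)?;
    validate_help_param(engine, canonical)?;
    // ...argument binding and execution would follow here.
    Ok(())
}
```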
```rust
impl Serialize for HeapPtr {
    fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_unit()
    }
}

impl<'de> Deserialize<'de> for HeapPtr {
    fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        <()>::deserialize(deserializer)?;
        Ok(HeapPtr::null())
    }
}
```
Reject `HeapPtr` at the serde boundary instead of nulling it out.
This round-trip silently converts every serialized heap reference into `HeapPtr::null()`. If a runtime-only object ever leaks into a serialized `Program`, deserialization succeeds with bogus pointers and the first later deref becomes UB instead of a clean load-time failure. Failing fast here is much safer than manufacturing a placeholder pointer.
Suggested fix
```diff
 impl Serialize for HeapPtr {
-    fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
-        serializer.serialize_unit()
+    fn serialize<S: serde::Serializer>(&self, _serializer: S) -> Result<S::Ok, S::Error> {
+        Err(serde::ser::Error::custom("HeapPtr cannot be serialized"))
     }
 }
 impl<'de> Deserialize<'de> for HeapPtr {
-    fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
-        <()>::deserialize(deserializer)?;
-        Ok(HeapPtr::null())
+    fn deserialize<D: serde::Deserializer<'de>>(_deserializer: D) -> Result<Self, D::Error> {
+        Err(serde::de::Error::custom("HeapPtr cannot be deserialized"))
     }
 }
```
📝 Committable suggestion
```rust
impl Serialize for HeapPtr {
    fn serialize<S: serde::Serializer>(&self, _serializer: S) -> Result<S::Ok, S::Error> {
        Err(serde::ser::Error::custom("HeapPtr cannot be serialized"))
    }
}
impl<'de> Deserialize<'de> for HeapPtr {
    fn deserialize<D: serde::Deserializer<'de>>(_deserializer: D) -> Result<Self, D::Error> {
        Err(serde::de::Error::custom("HeapPtr cannot be deserialized"))
    }
}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/bex_vm_types/src/heap_ptr.rs` around lines 153 - 164,
The current Serialize and Deserialize impls for HeapPtr silently round-trip to a
null pointer; instead, make both fail at the serde boundary: in impl Serialize
for HeapPtr, return Err(serde::ser::Error::custom("HeapPtr is a runtime-only
pointer and must not be serialized")) rather than serializer.serialize_unit();
in impl<'de> Deserialize<'de> for HeapPtr, return Err(D::Error::custom("HeapPtr
cannot be deserialized: runtime pointer leaked into serialized data")) rather
than producing HeapPtr::null(); reference the impl blocks for
Serialize/Deserialize and the HeapPtr::null symbol when making the changes.
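The fail-fast principle can be illustrated without serde using a hypothetical tagged byte encoding of a constant pool. `HeapPtr`, `Const`, `WireError`, `encode`, and `decode` are all invented for this sketch: encoding refuses to write a runtime pointer, and decoding only produces variants it knows, so it can never manufacture a placeholder pointer.

```rust
/// Hypothetical runtime-only pointer; only meaningful inside a live heap.
#[derive(Debug, Clone, Copy, PartialEq)]
struct HeapPtr(usize);

/// Hypothetical constant-pool entry that could, by mistake, hold a pointer.
#[derive(Debug, Clone, PartialEq)]
enum Const {
    Int(i64),
    Ptr(HeapPtr),
}

#[derive(Debug, PartialEq)]
enum WireError {
    /// A runtime pointer reached the serialization boundary.
    RuntimePointer,
    Truncated,
    UnknownTag(u8),
}

const TAG_INT: u8 = 0;

fn encode(consts: &[Const]) -> Result<Vec<u8>, WireError> {
    let mut out = Vec::new();
    for c in consts {
        match c {
            Const::Int(i) => {
                out.push(TAG_INT);
                out.extend_from_slice(&i.to_le_bytes());
            }
            // Fail fast rather than writing a placeholder that decodes to null.
            Const::Ptr(_) => return Err(WireError::RuntimePointer),
        }
    }
    Ok(out)
}

fn decode(mut bytes: &[u8]) -> Result<Vec<Const>, WireError> {
    let mut out = Vec::new();
    while let Some((&tag, rest)) = bytes.split_first() {
        match tag {
            TAG_INT => {
                if rest.len() < 8 {
                    return Err(WireError::Truncated);
                }
                let (num, rest) = rest.split_at(8);
                let mut buf = [0u8; 8];
                buf.copy_from_slice(num);
                out.push(Const::Int(i64::from_le_bytes(buf)));
                bytes = rest;
            }
            // Deliberately, no tag decodes to a `HeapPtr`.
            other => return Err(WireError::UnknownTag(other)),
        }
    }
    Ok(out)
}
```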
```rust
/// Set when the host invokes a `$rust_function` callee as the entry
/// point. Such callees have no bytecode body, so the exec loop's first
/// step dispatches this native call directly and produces a `Complete`
/// state with its return value instead of reading bytecode.
pending_native_entry: Option<PendingNativeEntry>,
```
Root deferred native-entry args through GC.
`pending_native_entry` stores `Vec<Value>` outside stack/frames, but `BexVm::collect_roots()` and `BexVm::forward_roots()` only walk the stack, watch state, and frames. If GC runs after `set_entry_point_with_type_args()` but before the first `exec()`, any heap-backed argument here can be moved or reclaimed, and `dispatch_native_entry()` will read stale `HeapPtr`s.
Suggested fix
```diff
+impl RootHaver for PendingNativeEntry {
+    fn collect_roots(&self, roots: &mut Vec<HeapPtr>) {
+        for value in &self.args {
+            if let Value::Object(ptr) = value {
+                roots.push(*ptr);
+            }
+        }
+    }
+
+    fn forward_roots(&mut self, roots: &HashMap<HeapPtr, HeapPtr>) {
+        for value in &mut self.args {
+            if let Value::Object(ptr) = value {
+                if let Some(&new_ptr) = roots.get(ptr) {
+                    *ptr = new_ptr;
+                }
+            }
+        }
+    }
+}
```
Also thread `self.pending_native_entry` through `BexVm::collect_roots()` and `BexVm::forward_roots()`.
Also applies to: 367-379
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/bex_vm/src/vm.rs` around lines 360 - 364,
pending_native_entry (Option<PendingNativeEntry>) holds Vec<Value> outside
stack/frames so GC can move/reclaim heap-backed args before
dispatch_native_entry; update BexVm::collect_roots and BexVm::forward_roots to
also walk self.pending_native_entry when Some(..), treating each Value as a
root/forwardable slot (mirror how stack/frames are handled), ensuring you
mark/visit and update any HeapPtr/heap-backed Values in that Vec; keep handling
of Option and ensure set_entry_point_with_type_args stores Values in
PendingNativeEntry consistently so exec/dispatch_native_entry reads forwarded
pointers.
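A std-only sketch of threading the deferred entry through root collection and forwarding, using invented and heavily simplified `Vm`, `Value`, and `HeapPtr` types. The real `BexVm` also walks frames and watch state, and the suggestion above expresses the same per-entry logic as a `RootHaver` impl.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct HeapPtr(usize);

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Object(HeapPtr),
}

/// Deferred native call: args live here, outside the stack, until first exec().
struct PendingNativeEntry {
    args: Vec<Value>,
}

/// Hypothetical VM with only the fields relevant to this fix.
struct Vm {
    stack: Vec<Value>,
    pending_native_entry: Option<PendingNativeEntry>,
}

fn push_roots<'a>(values: impl IntoIterator<Item = &'a Value>, roots: &mut Vec<HeapPtr>) {
    for v in values {
        if let Value::Object(ptr) = v {
            roots.push(*ptr);
        }
    }
}

fn forward<'a>(values: impl IntoIterator<Item = &'a mut Value>, moved: &HashMap<HeapPtr, HeapPtr>) {
    for v in values {
        if let Value::Object(ptr) = v {
            let old = *ptr;
            if let Some(&new_ptr) = moved.get(&old) {
                *ptr = new_ptr;
            }
        }
    }
}

impl Vm {
    fn collect_roots(&self) -> Vec<HeapPtr> {
        let mut roots = Vec::new();
        push_roots(&self.stack, &mut roots);
        // The fix: deferred entry args are GC roots too.
        if let Some(entry) = &self.pending_native_entry {
            push_roots(&entry.args, &mut roots);
        }
        roots
    }

    fn forward_roots(&mut self, moved: &HashMap<HeapPtr, HeapPtr>) {
        forward(&mut self.stack, moved);
        // ...and they must be rewritten when objects move.
        if let Some(entry) = &mut self.pending_native_entry {
            forward(&mut entry.args, moved);
        }
    }
}
```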
```rust
/// Like [`Self::set_entry_point`], but seeds the entry frame's
/// `type_args` slot. Use when the host invokes a generic function
/// (e.g. `baml.json.to_string<T>`) and needs to thread `T` through.
///
/// Native (`$rust_function`) callees have no bytecode body, so for
/// those we synthesize a single-frame stub that produces the native
/// call's return value on the first `exec()` step instead of pushing
/// a bytecode frame that would read an empty instruction stream.
pub fn set_entry_point_with_type_args(
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "Native helpers that can yield into BAML:"
rg -n -C2 'NativeCallResult::YieldToCall' \
  baml_language/crates/bex_vm/src/package_baml \
  baml_language/crates/baml_builtins2_codegen/src \
  baml_language/crates/baml_exec/src
echo
echo "JSON/native entrypoints referenced by this change:"
rg -n -C2 'baml\.json\.(to_string|from_string|serialize|deserialize)|to_json|from_json' \
  baml_language/crates/baml_builtins2 \
  baml_language/crates/baml_builtins2_codegen/src
```
Repository: BoundaryML/baml
Length of output: 27722
🏁 Script executed:
```bash
cd baml_language/crates/bex_vm/src && sed -n '2164,2192p' vm.rs
```
Repository: BoundaryML/baml
Length of output: 1530
`baml.json.to_string<T>` and other yielding builtins fail when invoked as entry points.
`set_entry_point_with_type_args()` supports host-invoked generic builtins like `baml.json.to_string<T>`, but `dispatch_native_entry()` rejects any `NativeCallResult::YieldToCall` as an error. The JSON serialization path (`baml.json.to_json<T>`, which `to_string<T>` calls) is marked `//baml:may_yield` and returns `YieldToCall` for nested type dispatch. Similarly, `Array<T>.to_json` and `Map.to_json` yield internally. Any of these used as an entry point will fail on the first `exec()` with "native entry-point YieldToCall is not supported".
Restrict entry-point support to non-yielding builtins, or refactor native entry points to handle yielding dispatches before the first frame.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/bex_vm/src/vm.rs` around lines 1019 - 1027,
dispatch_native_entry currently treats NativeCallResult::YieldToCall as an
error, which makes yielding builtins invoked via set_entry_point_with_type_args
(e.g., baml.json.to_string<T>) fail at first exec; fix by either (A) preventing
yielding natives from being installed as entry points in
set_entry_point_with_type_args: detect native callees that can yield (use the
callee's may_yield/attribute or check if invoking the native can return
NativeCallResult::YieldToCall) and return an error/Refuse to set the entry, or
(B) extend dispatch_native_entry to handle YieldToCall for native entry stubs by
synthesizing an initial frame or scheduling the yielded call chain so the VM can
continue execution (i.e., convert YieldToCall into creating the next call
frame(s) instead of erroring). Update set_entry_point_with_type_args and
dispatch_native_entry to consistently enforce the chosen approach and reference
NativeCallResult::YieldToCall when guarding/handling the case.
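A std-only sketch of option (A), refusing yielding natives when the entry point is installed, assuming a hypothetical `NativeFn` record whose `may_yield` flag mirrors the `//baml:may_yield` annotation and is assumed to cover transitive yields (so `baml.json.to_string` is flagged because, per the comment, it calls `baml.json.to_json<T>`). `PendingNativeEntry` here is a stand-in, and `example.non_yielding` is an invented name. Option (B) would avoid rejecting these builtins but requires `dispatch_native_entry` to turn `YieldToCall` into real call frames.

```rust
/// Hypothetical metadata for a `$rust_function` callee.
#[derive(Debug, Clone)]
struct NativeFn {
    name: &'static str,
    /// Assumed to mirror `//baml:may_yield`, including yields via callees.
    may_yield: bool,
}

/// Hypothetical deferred entry recorded for the first `exec()` step.
#[derive(Debug)]
struct PendingNativeEntry {
    callee: NativeFn,
    type_args: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum EntryError {
    YieldingNativeEntry(&'static str),
}

/// Option (A): reject yielding natives up front instead of failing
/// later with an unsupported `YieldToCall` on the first exec().
fn set_native_entry_point(
    callee: NativeFn,
    type_args: Vec<String>,
) -> Result<PendingNativeEntry, EntryError> {
    if callee.may_yield {
        return Err(EntryError::YieldingNativeEntry(callee.name));
    }
    Ok(PendingNativeEntry { callee, type_args })
}
```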
```rust
// Native (`$rust_function`) entry point set by
// `set_entry_point_with_type_args` — no bytecode to interpret, so
// dispatch the native call here and surface its result as
// `Complete`. Errors flow through the panic/throw machinery just
// like a bytecode-dispatched native call would.
if let Some(entry) = self.pending_native_entry.take() {
    return self.dispatch_native_entry(&entry);
}
```
Keep deferred native entries on the normal error-wrapping path.
The early return on line 2212 bypasses the `VmError::InternalError` → `VmError::TracedInternalError` conversion immediately below it. As a result, `$rust_function` entry points report a different error variant than bytecode entry points, including for the unsupported-`YieldToCall` case from `dispatch_native_entry()`.
Suggested fix
```diff
-    if let Some(entry) = self.pending_native_entry.take() {
-        return self.dispatch_native_entry(&entry);
-    }
-
-    match self.exec_inner() {
+    let result = if let Some(entry) = self.pending_native_entry.take() {
+        self.dispatch_native_entry(&entry)
+    } else {
+        self.exec_inner()
+    };
+
+    match result {
         Err(VmError::InternalError(err)) => {
             let trace = self.capture_stack_trace();
             Err(VmError::TracedInternalError { source: err, trace })
         }
         other => other,
```
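Here is a self-contained model showing why routing both entry kinds through one `result` makes them report the same variant. It is a sketch under assumptions: `VmError` is reduced to two variants with `String` payloads, and `dispatch_native_entry`, `exec_inner`, and `capture_stack_trace` are hypothetical stubs that always fail. Only the shape of `exec` follows the suggested fix.

```rust
/// Simplified stand-in for the VM error type; payloads are plain strings here.
#[derive(Debug, PartialEq)]
pub enum VmError {
    InternalError(String),
    TracedInternalError { source: String, trace: Vec<String> },
}

pub struct Vm {
    /// Name of a deferred native entry point, if one was installed.
    pending_native_entry: Option<String>,
}

impl Vm {
    /// Stub: a native entry that hits the unsupported `YieldToCall` case.
    fn dispatch_native_entry(&mut self, name: &str) -> Result<i64, VmError> {
        Err(VmError::InternalError(format!(
            "native entry-point YieldToCall is not supported ({name})"
        )))
    }

    /// Stub: a bytecode run that fails with an untraced internal error.
    fn exec_inner(&mut self) -> Result<i64, VmError> {
        Err(VmError::InternalError("bytecode failure".to_string()))
    }

    /// Stub: a fixed one-frame trace.
    fn capture_stack_trace(&self) -> Vec<String> {
        vec!["<entry>".to_string()]
    }

    /// Both paths first produce `result`, then share a single wrapping step,
    /// so an untraced `InternalError` is always upgraded to the traced form.
    pub fn exec(&mut self) -> Result<i64, VmError> {
        let result = if let Some(entry) = self.pending_native_entry.take() {
            self.dispatch_native_entry(&entry)
        } else {
            self.exec_inner()
        };

        match result {
            Err(VmError::InternalError(err)) => {
                let trace = self.capture_stack_trace();
                Err(VmError::TracedInternalError { source: err, trace })
            }
            other => other,
        }
    }
}
```

With the early `return`, the native branch would skip the `match` and surface `InternalError` directly. That is exactly the divergence the comment describes.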
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@baml_language/crates/bex_vm/src/vm.rs` around lines 2207 - 2214, The early
return when handling pending_native_entry bypasses the standard
VmError::InternalError → VmError::TracedInternalError conversion; instead of
doing `if let Some(entry) = self.pending_native_entry.take() { return
self.dispatch_native_entry(&entry); }`, call `dispatch_native_entry` without
returning immediately (e.g., take the entry, invoke
`self.dispatch_native_entry(&entry)` and assign its Result to a local variable)
and let the normal error-wrapping logic that follows run so `$rust_function`
entry points (including the unsupported `YieldToCall` case from
`dispatch_native_entry`) produce the same traced/internal error variant as
bytecode-dispatched entries.
Binary size checks failed: ❌ 1 violation · ✅ 6 passed

Add/update baselines:

```toml
[artifacts.bridge_cffi]
file_bytes = 17671608
stripped_bytes = 17671600
gzip_bytes = 6594191
```
Summary
- Adds `baml run` and `baml pack` per BEP-027 (standalone execution).
- `baml run` dispatches positional namespace mains, `--function <name>`, `-e <expression>`, and `baml.toml` `[scripts]` aliases through a shared `baml_exec` dispatcher.
- `baml pack` bakes any non-expression target into a self-contained executable via libsui (host binary `baml-pack-host` with a bitcode-serialized `PackEnvelope` embedded in an OS-native section).
- JSON input (`--json-args`) and output are routed through user-overridable `baml.json.serialize` / `baml.json.deserialize`, so `to_json` / `from_json` overrides on user classes are honored at the CLI boundary.

What's new
- `baml_exec` crate: shared executor (auto-CLI flag parsing, JSON coercion, dispatch).
- `baml_pack_host` crate: runtime binary that decodes the embedded `PackEnvelope` and reuses the same dispatcher.
- `baml.sys.exit(code)` builtin + `EngineError::Exit { code }`.
- `BexEngine::set_argv` / `argv()` getter and type-args threading (`FunctionCallContextBuilder::with_type_args`, `PendingNativeEntry`) so native entry points work without bytecode.
- `baml-cli pack` with did-you-mean suggestions and cross-compilation via target triple (downloads the matching host from the BAML GitHub release, sha256-verified).
- `--list` distinguishes "Namespace mains" (namespaces that have a `main`) from "Functions" in both debug and JSON output.

Spec-conformance fixes from the audit rounds
- Mutual exclusion of the positional `<target>` / `--function` / `-e` / `--json-args` dispatch modes.
- Reserved `help` parameter rejection (`validate_help_param` runs on both `baml run` and `baml pack`).
- Malformed `baml.toml` continues with an empty script set + warning rather than erroring.
- Auto-CLI rejects every type it can't faithfully represent (class / list / map / union / media / engine-internal) with a `--json-args` pointer; the previous catchall silently String-coerced these.
- `ExitCode::TargetError = 1`, aligned across `baml run` and the packed runtime (BEP-027 §"Exit codes" mandates non-zero; 1 is the Unix convention).
- The `--list` empty-targets case honors `--output-format json`, emitting `{"scripts":[], "namespace_mains":[], "functions":[]}` instead of the human-readable text.
- `baml pack -e '<expr>'` is rejected with a clear "expression mode is not packageable" message instead of a confusing clap parse error.

Test plan
- `cargo nextest r --no-fail-fast --no-default-features --features ring-crypto -p baml_cli -p baml_exec`: 139 tests pass.
- `cargo clippy --workspace --all-targets -- -D warnings` clean.
- `cargo stow --check` clean (added `baml_exec` and `baml_pack_host` to the same surface-area allowlist as `baml_cli`).
- `baml run --list --output-format json` on an empty project (returns parseable JSON, not text).
- `baml pack -e '2+2'` (exits 1 with the spec-cited rejection message).
- `baml run --list` showing Namespace mains and Functions sections on a project with namespaces.
- `argv[1]` matches the spec.

Summary by CodeRabbit
- `baml pack` command to create standalone executables from BAML targets
- `baml.sys.exit(code)` function for process termination control
- `serialize<T>` and `deserialize<T>` JSON utility functions
- `--help` for typed function entry points with example invocations
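The CLI-boundary rules in this PR (reserved `help` parameter, auto-CLI rejection of types it cannot represent, and `ExitCode::TargetError = 1`) can be modeled in a few lines. This is a hypothetical sketch, not the `baml_exec` code: `ParamType`, `check_auto_cli_param`, `RunOutcome`, and `exit_code` are illustrative names. The assumption that exactly the four scalar types below are flag-representable is also the sketch's own.

```rust
/// Hypothetical, simplified view of a BAML parameter type.
#[allow(dead_code)] // not every variant is exercised in this sketch
#[derive(Debug)]
pub enum ParamType {
    String,
    Int,
    Float,
    Bool,
    Class(String),
    List(Box<ParamType>),
    Map(Box<ParamType>, Box<ParamType>),
    Union(Vec<ParamType>),
    Media,
}

/// Decide whether a parameter can be exposed as an auto-derived flag.
/// Non-scalar types are rejected with a `--json-args` pointer instead of
/// being silently coerced to a string.
pub fn check_auto_cli_param(name: &str, ty: &ParamType) -> Result<(), String> {
    if name == "help" {
        // `--help` is reserved for the generated usage text.
        return Err("parameter name `help` is reserved".to_string());
    }
    match ty {
        ParamType::String | ParamType::Int | ParamType::Float | ParamType::Bool => Ok(()),
        other => Err(format!(
            "parameter `{name}` has type {other:?}, which cannot be passed as a flag; use --json-args"
        )),
    }
}

/// Hypothetical outcome of running a target.
#[allow(dead_code)] // the TargetError message is not read in this sketch
pub enum RunOutcome {
    Success,
    /// `baml.sys.exit(code)`, surfaced as `EngineError::Exit { code }`.
    Exit { code: i32 },
    /// Any other failure inside the target.
    TargetError(String),
}

/// Map an outcome to a process exit code; ordinary target failures use 1.
pub fn exit_code(outcome: &RunOutcome) -> i32 {
    match outcome {
        RunOutcome::Success => 0,
        RunOutcome::Exit { code } => *code,
        RunOutcome::TargetError(_) => 1,
    }
}
```

Rejecting non-scalar types outright, rather than stringifying them, means a caller always gets either a faithful flag or an explicit instruction to switch to JSON input.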