Port recent changes from zk-prove-any/sp1-runner to master#4521

Merged
pmikolajczyk41 merged 21 commits into master from pmikolajczyk/nit-4671-port-changes-to-master
Mar 19, 2026

Conversation

@pmikolajczyk41
Member

@codecov

codecov bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 32.98%. Comparing base (b9d1452) to head (4a8530b).
⚠️ Report is 22 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4521      +/-   ##
==========================================
+ Coverage   32.67%   32.98%   +0.31%     
==========================================
  Files         497      497              
  Lines       58849    58849              
==========================================
+ Hits        19227    19414     +187     
+ Misses      36245    36036     -209     
- Partials     3377     3399      +22     

@github-actions
Contributor

github-actions bot commented Mar 18, 2026

❌ 7 Tests Failed:

Tests completed: 4560 | Failed: 7 | Passed: 4553 | Skipped: 0
View the top 3 failed tests by shortest run time
TestPruningDBSizeReduction
Stack Traces | 0.000s run time
=== RUN   TestPruningDBSizeReduction
--- FAIL: TestPruningDBSizeReduction (0.00s)
TestRedisProduceComplex/one_producer,_all_consumers_are_active
Stack Traces | 1.400s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
DEBUG[03-19|13:01:07.490] Redis stream consuming                   consumer_id=cc2cea91-276d-4ffe-a8b2-e3779109b7bf message_id=1773925266344-3
DEBUG[03-19|13:01:07.490] consumer: setting result                 cid=cc2cea91-276d-4ffe-a8b2-e3779109b7bf msgIdInStream=1773925266344-3 resultKeyInRedis=result-key:stream:3114b94a-f1da-441b-b258-72e8bdc68fb1.1773925266344-3
DEBUG[03-19|13:01:07.490] consumer: xack                           cid=9da7e6f1-7582-410e-abe5-29d7573e1f39 messageId=1773925266344-5
DEBUG[03-19|13:01:07.490] consumer: xdel                           cid=161df4cb-80c4-4e4a-8596-508ce0e00b0d messageId=1773925266344-1
DEBUG[03-19|13:01:07.490] consumer: xack                           cid=50e1b99f-9760-46ff-98b1-3c20270f2e55 messageId=1773925266344-4
DEBUG[03-19|13:01:07.490] consumer: xack                           cid=cc2cea91-276d-4ffe-a8b2-e3779109b7bf messageId=1773925266344-3
DEBUG[03-19|13:01:07.490] consumer: xdel                           cid=c2fb0146-83b5-4e5b-bac6-7aae0aefb833 messageId=1773925266344-2
DEBUG[03-19|13:01:07.490] consumer: xdel                           cid=9da7e6f1-7582-410e-abe5-29d7573e1f39 messageId=1773925266344-5
DEBUG[03-19|13:01:07.490] consumer: xdel                           cid=50e1b99f-9760-46ff-98b1-3c20270f2e55 messageId=1773925266344-4
DEBUG[03-19|13:01:07.490] consumer: xdel                           cid=cc2cea91-276d-4ffe-a8b2-e3779109b7bf messageId=1773925266344-3
DEBUG[03-19|13:01:07.493] consumer: xdel                           cid=349b4d3b-1418-4bc5-afb2-9204382a77d8 messageId=1773925266317-3
DEBUG[03-19|13:01:07.562] checkResponses                           responded=92 errored=0 checked=97
DEBUG[03-19|13:01:07.567] redis producer: check responses starting
ERROR[03-19|13:01:07.576] Error from XpendingExt in getting PEL for auto claim err="context canceled" pendingLen=0
DEBUG[03-19|13:01:07.576] checkResponses                           responded=5 errored=0 checked=5
ERROR[03-19|13:01:07.576] Error from XpendingExt in getting PEL for auto claim err="context canceled" pendingLen=0
ERROR[03-19|13:01:07.576] Error from XpendingExt in getting PEL for auto claim err="context canceled" pendingLen=0
ERROR[03-19|13:01:07.576] Error from XpendingExt in getting PEL for auto claim err="context canceled" pendingLen=0
DEBUG[03-19|13:01:07.693] Error destroying a stream group          error="dial tcp 127.0.0.1:44085: connect: connection refused"
--- FAIL: TestRedisProduceComplex/one_producer,_all_consumers_are_active (1.40s)
TestRedisProduceComplex
Stack Traces | 21.030s run time
=== RUN   TestRedisProduceComplex
=== PAUSE TestRedisProduceComplex
=== CONT  TestRedisProduceComplex
--- FAIL: TestRedisProduceComplex (21.03s)


@pmikolajczyk41 pmikolajczyk41 marked this pull request as ready for review March 18, 2026 17:00
Contributor

@bragaigor bragaigor left a comment


LGTM overall, just a few minor comments

Comment on lines +42 to +44
if req.delayed_msg_nr != 0 && !req.delayed_msg.is_empty() {
delayed_messages.insert(req.delayed_msg_nr, req.delayed_msg.clone());
}
Contributor


should we also be checking req.has_delayed_msg? Or would that be incorrect if delayed_msg_nr == 0?

Member Author


good point, I think that we should actually inspect has_delayed_msg instead of delayed_msg_nr - fixed in 7187218
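The agreed-upon check can be sketched as follows; the `Req` struct here is a simplified stand-in for the real `ValidationRequest` (only the fields discussed in this thread are modeled), so treat it as an illustration of the condition, not the actual code in 7187218:

```rust
use std::collections::BTreeMap;

// Simplified stand-in for ValidationRequest; only the fields relevant to the
// delayed-message check are modeled here.
pub struct Req {
    pub has_delayed_msg: bool,
    pub delayed_msg_nr: u64,
    pub delayed_msg: Vec<u8>,
}

// Trust the explicit flag rather than inferring presence from a non-zero
// message number, so a delayed message numbered 0 is not dropped.
pub fn collect_delayed(req: &Req, delayed: &mut BTreeMap<u64, Vec<u8>>) {
    if req.has_delayed_msg {
        delayed.insert(req.delayed_msg_nr, req.delayed_msg.clone());
    }
}
```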


impl ValidationInput {
/// Extract runtime data from a request for the given target architecture.
pub fn from_request(req: &ValidationRequest, target: &str) -> Self {
Contributor


should we add a few unit tests for this from_request(...)? At least one that tests the case from my comment below?

Member Author


added in 626bed6

Comment on lines +167 to +170
last_block_hash: Bytes32(env.input.large_globals[0]),
last_send_root: Bytes32(env.input.large_globals[1]),
inbox_position: env.input.small_globals[0],
position_within_message: env.input.small_globals[1],
Contributor


nitpick: wondering if it would improve readability if we introduced const's for these magic indexes?

impl ValidationInput {
    pub const INBOX_POSITION: usize = 0;
    pub const POSITION_WITHIN_MESSAGE: usize = 1;
    pub const LAST_BLOCK_HASH: usize = 0;
    pub const LAST_SEND_ROOT: usize = 1;
}

Member Author


hmm, hard to say, as these indices are used everywhere (nitro, JIT, prover, SP1, validation data structures conversion, transfer protocol); I'm open to doing this, but I'd prefer to do it in another PR

Contributor


sounds good to me!

bragaigor
bragaigor previously approved these changes Mar 19, 2026
Contributor

@bragaigor bragaigor left a comment


LGTM

@bragaigor bragaigor removed their assignment Mar 19, 2026
Member

@eljobe eljobe left a comment


Before I do my own manual review, I wanted to get your opinion on the 8 ideas that Claude's pr-review feature came up with:

PR #4521 Review Summary

"Port recent changes from zk-prove-any/sp1-runner to master" — introduces ValidationInput intermediate struct, renames old ValidationInput to ValidationRequest, refactors
JIT/prover/validator crates.

Critical Issues (2 found)

  1. Delayed message semantics changed silently

crates/validation/src/lib.rs:47 | from_request

The new from_request checks req.has_delayed_msg && !req.delayed_msg.is_empty(), but the old continuous-mode path (wavmio.rs) only checked has_delayed_msg. If Go sends has_delayed_msg=true
with an empty body (plausible edge case), the new code silently drops it. The old prepare.rs also used a different condition (delayed_msg_nr != 0). This is now the unified path — it should
match the most permissive old behavior or log a warning when discarding.

Recommendation: Either trust has_delayed_msg alone, or add a warning log when has_delayed_msg == true && delayed_msg.is_empty().

  2. Arch validation missing in native mode

crates/validator/src/engine/execution.rs:45 | validate_native

The stylus arch mismatch check was moved to machine.rs feed_machine (continuous mode only). In native mode, validate_native calls ValidationInput::from_request(&input, local_target())
directly — if user_wasms contains programs for the wrong arch, from_request silently returns empty module_asms. The old native path would panic (user_wasms[local_target()].clone()), which is
bad but at least visible. Now it's silent.

Recommendation: Move the arch validation into from_request itself, or add it to validate_native too.
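One possible shape for moving the check into `from_request`, sketched with hypothetical, heavily simplified types (the real `ValidationRequest`/`ValidationInput` carry many more fields, and `user_wasms` is keyed differently):

```rust
use std::collections::HashMap;

// Hypothetical, simplified types standing in for the real structs.
pub struct Request {
    pub user_wasms: HashMap<String, Vec<u8>>, // target -> compiled program
}

pub struct Input {
    pub module_asms: Vec<Vec<u8>>,
}

// Returning Result turns an arch mismatch into a hard error, in both native
// and continuous mode, instead of silently producing empty module_asms.
pub fn from_request(req: &Request, target: &str) -> Result<Input, String> {
    if !req.user_wasms.is_empty() && !req.user_wasms.contains_key(target) {
        return Err(format!("no programs compiled for target {target}"));
    }
    Ok(Input {
        module_asms: req.user_wasms.get(target).into_iter().cloned().collect(),
    })
}
```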

Important Issues (3 found)

  3. Removed "multiple delayed batches" wire protocol validation

crates/validation/src/transfer/receiver.rs | receive_inbox

The old receive_delayed_message enforced at-most-one delayed message at the protocol level. The new generic receive_inbox accepts any number silently. If a buggy sender sends multiple
delayed messages, they'll be silently accepted.

Recommendation: Add a post-receive check delayed_messages.len() <= 1 in receive_validation_input.
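Such a guard is a one-line check after deserialization; this sketch assumes the received delayed messages land in a map keyed by message number:

```rust
use std::collections::BTreeMap;

// Reject inputs carrying more than one delayed message at the protocol
// boundary, so a buggy sender is caught instead of silently accepted.
pub fn check_delayed(delayed: &BTreeMap<u64, Vec<u8>>) -> Result<(), String> {
    if delayed.len() > 1 {
        return Err(format!(
            "expected at most 1 delayed message, got {}",
            delayed.len()
        ));
    }
    Ok(())
}
```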

  1. Preimage type no longer validated on receive

crates/validation/src/transfer/receiver.rs | receive_preimages

Old code: PreimageType::try_from(read_u8(reader)?) — rejected unknown types. New code: accepts any u8. Invalid preimage types are carried through until lookup, where they produce eprintln! +
Ok(0) (a silent failure disguised as success).

Recommendation: Keep PreimageType::try_from validation in the receiver, or improve the lookup-site error handling.

  5. No test for relocated arch validation

crates/validator/src/engine/machine.rs

The test local_stylus_target_must_be_present_if_some_target_is_present was removed from transfer/tests.rs but no replacement was added for the arch check now in feed_machine. The
load_validation_input function is also untested.

Suggestions (3 found)

  6. load_validation_input clone-then-drain is wasteful

crates/jit/src/machine.rs:337 — Clones all WASM binaries then immediately drains them. Consider taking ValidationInput by value where possible to avoid the clone.

  7. Positional globals are fragile

small_globals[0] = batch, [1] = pos_in_batch — purely by convention. A named struct (still rkyv-compatible with primitive fields) would be safer and more readable.

  8. HashMap vs BTreeMap inconsistency

module_asms uses HashMap while all other maps use BTreeMap. Non-deterministic iteration could matter if rkyv-serialized before draining. Document the choice or unify.
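The named-struct idea from the positional-globals suggestion could look roughly like this; the field and method names are illustrative, and keeping only primitive fields preserves compatibility with rkyv's derive-based serialization:

```rust
// Illustrative replacement for the positional small_globals/large_globals
// arrays; primitive fields keep the layout rkyv-derive friendly.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GlobalState {
    pub inbox_position: u64,          // was small_globals[0]
    pub position_within_message: u64, // was small_globals[1]
    pub last_block_hash: [u8; 32],    // was large_globals[0]
    pub last_send_root: [u8; 32],     // was large_globals[1]
}

impl GlobalState {
    // Conversion from the legacy positional form, so call sites can migrate
    // incrementally.
    pub fn from_arrays(small: [u64; 2], large: [[u8; 32]; 2]) -> Self {
        Self {
            inbox_position: small[0],
            position_within_message: small[1],
            last_block_hash: large[0],
            last_send_root: large[1],
        }
    }
}
```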

Strengths

  • Consolidation of scattered state into ValidationInput eliminates 3-4 duplicate conversion paths
  • Fixed old serialization bug where send_user_wasms returned early without writing a count byte
  • Good test coverage for from_request (8 focused unit tests)
  • Clean module extractions (InternalFunc, MemoryType) — pure moves with no logic changes
  • The wasip1_stub macro simplification is a nice cleanup
  • Wire format sender/receiver stay in lockstep

Recommended Action

  1. Address critical issues #1 and #2 (delayed message semantics + native arch validation)
  2. Consider adding the protocol-level guard for #3
  3. Add tests for arch validation in its new location
  4. Items #6-8 are optional improvements

@eljobe eljobe assigned pmikolajczyk41 and unassigned eljobe Mar 19, 2026
@pmikolajczyk41
Member Author

@eljobe

  1. Delayed message semantics changed silently

Yes, this is a bit messy, especially since this part of our codebase supports multiple delayed messages while nitro will never send more than one. The checks were not consistent either. For now, I simplified it even more, in d9a3ce6, to checking just has_delayed_msg, which is the proper way for the nitro client to communicate whether any delayed data is attached. I intend to unify this (supporting 0/1 vs any number of delayed messages) even further in the future, but that will require changes to valnode etc., so it is out of scope for now.

  2. Arch validation missing in native mode

Good point, in 4e7cde2 I made from_request return Result and moved the check there.

  3. Removed "multiple delayed batches" wire protocol validation

Same answer as to 1.

  4. Preimage type no longer validated on receive

This is not a protocol responsibility. I moved checks to other, more relevant places.

  5. No test for relocated arch validation

This is tested in from_request tests.

  6. load_validation_input clone-then-drain is wasteful

Good point. Applied in 3345bec and 7ced81b.

  7. Positional globals are fragile

See #4521 (comment)

  8. HashMap vs BTreeMap inconsistency

This inconsistency is very old... anyway, unified in 4a8530b.

@eljobe eljobe enabled auto-merge March 19, 2026 12:58
@eljobe eljobe added this pull request to the merge queue Mar 19, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 19, 2026
@pmikolajczyk41 pmikolajczyk41 added this pull request to the merge queue Mar 19, 2026
@pmikolajczyk41 pmikolajczyk41 removed this pull request from the merge queue due to a manual request Mar 19, 2026
@pmikolajczyk41 pmikolajczyk41 added this pull request to the merge queue Mar 19, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 19, 2026
@pmikolajczyk41 pmikolajczyk41 added this pull request to the merge queue Mar 19, 2026
Merged via the queue into master with commit 2bde565 Mar 19, 2026
27 checks passed
@pmikolajczyk41 pmikolajczyk41 deleted the pmikolajczyk/nit-4671-port-changes-to-master branch March 19, 2026 18:27