
fix: remove excess vec capacity in caches#5935

Merged
hanabi1224 merged 1 commit into main from hm/no-excess-vec-capacity-in-cache
Aug 18, 2025

Conversation

Contributor

@hanabi1224 hanabi1224 commented Aug 14, 2025

Summary of changes

This PR reduces potential excess Vec capacity in caches to lower memory usage.


Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • Performance
    • Reduced memory usage when building tipsets and caching messages, receipts, and events, leading to lower overhead during sync and querying.
  • Reliability
    • Added stricter validation of block headers in tipset creation to ensure consistent parents, state roots, epochs, and unique miners, improving chain integrity.

Contributor

coderabbitai bot commented Aug 14, 2025

Walkthrough

Adds shrink_to_fit calls before caching or storing vectors in multiple components and introduces verify_block_headers in Tipset::new to validate header invariants before constructing a Tipset. Minor method parameter mutability updates to accommodate shrinking. One cache insert method’s signature now takes a mutable Vec.
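
The effect of shrink_to_fit that the PR relies on can be sketched in isolation (a standalone example, not code from this PR):

```rust
fn main() {
    // A Vec grown with a generous capacity keeps that memory
    // even after it stops growing.
    let mut v: Vec<u64> = Vec::with_capacity(1024);
    v.extend(0..10);
    assert!(v.capacity() >= 1024);

    // shrink_to_fit asks the allocator to release the excess;
    // capacity drops to (or near) len.
    v.shrink_to_fit();
    assert!(v.capacity() >= v.len() && v.capacity() < 1024);
    println!("len={}, capacity={}", v.len(), v.capacity());
}
```

Long-lived caches that hold such vectors retain the full capacity, not just the length, which is why shrinking before insertion reduces resident memory.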

Changes

  • Tipset construction (src/blocks/tipset.rs): Tipset::new now shrinks header capacity and calls verify_block_headers(&headers)? before creating the Tipset; no public API signature changes.
  • Message cache insertion (src/chain/store/chain_store.rs): MsgsInTipsetCache::insert now takes mut value: Vec<ChainMessage> and calls value.shrink_to_fit() before caching; the method signature is updated accordingly.
  • Receipt and event caches (src/state_manager/cache.rs): in impl TipsetReceiptEventCacheHandler for EnabledTipsetDataCache, insert_receipt and insert_events bind their inputs as mutable; shrink_to_fit is applied to non-empty receipts, events, and roots before caching; trait signatures are unchanged.
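
The shared insertion pattern across these files can be sketched with hypothetical stand-in types (a plain HashMap wrapper here, not Forest's actual LRU cache or ChainMessage type):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real cache; the actual implementation
// is an LRU cache keyed by TipsetKey.
struct MsgsInTipsetCache {
    inner: HashMap<String, Vec<String>>,
}

impl MsgsInTipsetCache {
    // The PR's pattern: take the Vec by value, bind it mutably, shrink
    // it, then store it, so the cache never retains the excess capacity
    // the builder left behind.
    fn insert(&mut self, key: String, mut value: Vec<String>) {
        value.shrink_to_fit();
        self.inner.insert(key, value);
    }
}

fn main() {
    let mut cache = MsgsInTipsetCache { inner: HashMap::new() };
    let mut msgs = Vec::with_capacity(64);
    msgs.push("message".to_string());
    cache.insert("tipset-key".to_string(), msgs);
    // The stored Vec now holds only what it needs.
    assert!(cache.inner["tipset-key"].capacity() < 64);
}
```

Taking the Vec by value (rather than by reference) is what allows the callee to shrink it without changing what callers observe.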

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant Tipset
  participant Verifier as verify_block_headers

  Caller->>Tipset: new(headers_in)
  Tipset->>Tipset: build NonEmpty<CachingBlockHeader>
  Tipset->>Tipset: headers.shrink_to_fit()
  Tipset->>Verifier: verify_block_headers(&headers)
  Verifier-->>Tipset: Ok() or Error
  Tipset-->>Caller: Ok(Tipset { headers, key })
sequenceDiagram
  participant Code as Caller
  participant Cache as MsgsInTipsetCache
  participant Store as Internal Cache

  Code->>Cache: insert(key, value: Vec<ChainMessage>)
  Cache->>Cache: value.shrink_to_fit()
  Cache->>Store: push(key, value)
  Store-->>Cache: ack

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • elmattic
  • LesnyRumcajs
  • akaladarshi


@hanabi1224 hanabi1224 marked this pull request as ready for review August 14, 2025 09:56
@hanabi1224 hanabi1224 requested a review from a team as a code owner August 14, 2025 09:56
@hanabi1224 hanabi1224 requested review from elmattic and sudo-shashank and removed request for a team August 14, 2025 09:56
Contributor

@sudo-shashank sudo-shashank left a comment


LGTM.. shrink it

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/blocks/tipset.rs (2)

244-253: Verify before shrinking to avoid wasted work on invalid input

Call verify_block_headers before shrink_to_fit() so we fail fast and avoid an unnecessary reallocation if the tipset is invalid.

-        headers.shrink_to_fit();
-        verify_block_headers(&headers)?;
+        verify_block_headers(&headers)?;
+        headers.shrink_to_fit();

551-573: Avoid allocation in verify_block_headers by iterating without collecting

verify_block_headers currently collects into a NonEmpty<&CachingBlockHeader> (which allocates) just to perform simple "all equal to first" checks. You can iterate the headers directly, capture the first element, and check each invariant without the intermediate allocation, provided the iterator is cloneable for re-traversal; non-emptiness is still enforced by inspecting the first element.

For example (illustrative, not a diff):

fn verify_block_headers<'a, I>(headers: I) -> Result<(), CreateTipsetError>
where
    I: IntoIterator<Item = &'a CachingBlockHeader>,
    I::IntoIter: Clone, // cloned for each re-traversal instead of collecting
{
    let mut iter = headers.into_iter();
    let first = iter.next().ok_or(CreateTipsetError::Empty)?;
    let rest = iter;

    if !rest.clone().all(|it| it.parents == first.parents) {
        return Err(CreateTipsetError::BadParents);
    }
    if !rest.clone().all(|it| it.state_root == first.state_root) {
        return Err(CreateTipsetError::BadStateRoot);
    }
    if !rest.clone().all(|it| it.epoch == first.epoch) {
        return Err(CreateTipsetError::BadEpoch);
    }

    // Miner uniqueness: a pairwise scan keeps this allocation-free;
    // tipsets hold only a handful of headers, so O(n^2) is fine.
    let mut remaining = std::iter::once(first).chain(rest);
    while let Some(header) = remaining.next() {
        if remaining
            .clone()
            .any(|other| other.miner_address == header.miner_address)
        {
            return Err(CreateTipsetError::DuplicateMiner);
        }
    }
    Ok(())
}
📥 Commits

Reviewing files that changed from the base of the PR and between d641ead and 532e259.

📒 Files selected for processing (3)
  • src/blocks/tipset.rs (1 hunks)
  • src/chain/store/chain_store.rs (1 hunks)
  • src/state_manager/cache.rs (1 hunks)
🔇 Additional comments (1)
src/chain/store/chain_store.rs (1)

586-589: LGTM: shrink cached Vec to reduce retained capacity

Shrinking before pushing into the LRU is consistent with the PR goal and helps the size-tracking be more accurate.

Comment on lines +206 to 211
fn insert_receipt(&self, key: &TipsetKey, mut receipts: Vec<Receipt>) {
    if !receipts.is_empty() {
        receipts.shrink_to_fit();
        self.receipt_cache.insert(key.clone(), receipts);
    }
}
Contributor

@coderabbitai coderabbitai bot Aug 14, 2025


💡 Verification agent

🧩 Analysis chain

Don’t skip caching empty receipts; it causes repeated recomputation and changes semantics

Gating insertion on non-empty vectors changes behavior: callers that legitimately produce empty receipts will no longer be cached and may recompute on subsequent lookups. Empty results are still results and should be cached (negative caching).

Apply this change to always cache, while still shrinking:

-    fn insert_receipt(&self, key: &TipsetKey, mut receipts: Vec<Receipt>) {
-        if !receipts.is_empty() {
-            receipts.shrink_to_fit();
-            self.receipt_cache.insert(key.clone(), receipts);
-        }
-    }
+    fn insert_receipt(&self, key: &TipsetKey, mut receipts: Vec<Receipt>) {
+        receipts.shrink_to_fit();
+        self.receipt_cache.insert(key.clone(), receipts);
+    }

To check current usage patterns and potential reliance on empty caching, run:


🏁 Script executed:

#!/bin/bash
rg -n "insert_receipt\(|get_receipt_or_else\(" -A 3 -B 3



Always cache empty receipts for proper negative caching

Currently both the handler and its caller short-circuit empty vectors, so empty receipt results are never stored—causing repeated recomputation. We need to:

• In src/state_manager/cache.rs, remove the if !receipts.is_empty() guard in insert_receipt so it always calls insert(...) after shrink_to_fit().
• In src/state_manager/mod.rs, remove the surrounding if !receipts.is_empty() check before calling insert_receipt so empty vectors get cached.

Proposed diffs:

--- a/src/state_manager/cache.rs
+++ b/src/state_manager/cache.rs
@@ -206,7 +206,6 @@ impl TipsetReceiptEventCacheHandler for EnabledTipsetDataCache {
-    fn insert_receipt(&self, key: &TipsetKey, mut receipts: Vec<Receipt>) {
-        if !receipts.is_empty() {
-            receipts.shrink_to_fit();
-            self.receipt_cache.insert(key.clone(), receipts);
-        }
-    }
+    fn insert_receipt(&self, key: &TipsetKey, mut receipts: Vec<Receipt>) {
+        receipts.shrink_to_fit();
+        self.receipt_cache.insert(key.clone(), receipts);
+    }
--- a/src/state_manager/mod.rs
+++ b/src/state_manager/mod.rs
@@ -462,5 +462,4 @@ impl StateManager {
-        && !receipts.is_empty()
-    {
-        self.receipt_event_cache_handler
-            .insert_receipt(key, receipts);
-    }
+        // always cache receipts, even if empty
+    {
+        self.receipt_event_cache_handler.insert_receipt(key, receipts);
+    }

This ensures that even an empty Vec<Receipt> is cached once and won’t trigger redundant recomputations in get_receipt_or_else.
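
The trade-off being debated here (negative caching versus treating an empty result as "not computed") can be illustrated with a toy cache; the types, the computation counter, and the cache_empty flag are all hypothetical:

```rust
use std::collections::HashMap;

// Toy cache to contrast the two policies discussed in this thread.
struct Cache {
    inner: HashMap<u64, Vec<u32>>,
    computations: u32, // counts how often the fallback runs
}

impl Cache {
    fn get_or_compute(&mut self, key: u64, cache_empty: bool) -> Vec<u32> {
        if let Some(v) = self.inner.get(&key) {
            return v.clone(); // cache hit, no recomputation
        }
        self.computations += 1;
        let v: Vec<u32> = Vec::new(); // pretend the computed result is empty
        if cache_empty || !v.is_empty() {
            self.inner.insert(key, v.clone());
        }
        v
    }
}

fn main() {
    // Policy A: skip caching empty results (the PR's current behavior).
    let mut skip = Cache { inner: HashMap::new(), computations: 0 };
    skip.get_or_compute(1, false);
    skip.get_or_compute(1, false);
    assert_eq!(skip.computations, 2); // empty result recomputed every call

    // Policy B: negative caching (the bot's suggestion).
    let mut neg = Cache { inner: HashMap::new(), computations: 0 };
    neg.get_or_compute(1, true);
    neg.get_or_compute(1, true);
    assert_eq!(neg.computations, 1); // empty result cached once
}
```

Which policy is right depends on whether an empty Vec means "computed and genuinely empty" or "computation was skipped", which is exactly the question raised in the replies below.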

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/state_manager/cache.rs around lines 206-211, remove the `if
!receipts.is_empty()` guard so that after `receipts.shrink_to_fit()` the
function always calls `self.receipt_cache.insert(key.clone(), receipts);`; also
update src/state_manager/mod.rs at the call site (remove the surrounding `if
!receipts.is_empty()` check) so `insert_receipt` is invoked even for empty
Vec<Receipt> values, ensuring empty receipts are cached for negative caching.

Contributor Author

@hanabi1224 hanabi1224 Aug 14, 2025


I believe an empty vec indicates that value computation was skipped, so it should not be cached. cc @akaladarshi to confirm

Contributor


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +213 to 219
fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
    if !events_data.events.is_empty() {
        events_data.events.shrink_to_fit();
        events_data.roots.shrink_to_fit();
        self.events_cache.insert(key.clone(), events_data);
    }
}
Contributor

@coderabbitai coderabbitai bot Aug 14, 2025


💡 Verification agent

🧩 Analysis chain

Same here: cache empty event sets to avoid unnecessary re-computation

As with receipts, events can legitimately be empty. Skipping insertion changes cache semantics and may increase CPU churn.

-    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
-        if !events_data.events.is_empty() {
-            events_data.events.shrink_to_fit();
-            events_data.roots.shrink_to_fit();
-            self.events_cache.insert(key.clone(), events_data);
-        }
-    }
+    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
+        events_data.events.shrink_to_fit();
+        events_data.roots.shrink_to_fit();
+        self.events_cache.insert(key.clone(), events_data);
+    }

You can also scan for call sites:


🏁 Script executed:

#!/bin/bash
rg -n "insert_events\(|get_events_or_else\(" -A 3 -B 3



🏁 Script executed:

#!/bin/bash
rg -n "events_cache" -A5 -B5 src/state_manager/cache.rs



🏁 Script executed:

#!/bin/bash
rg -n "fn insert_receipt" -A5 -B5 src/state_manager/cache.rs



🏁 Script executed:

#!/bin/bash
rg -n "struct TipsetStateCache" -A5 -B5 .



Always cache empty StateEvents to avoid redundant recomputation

Skipping insertion when events_data.events is empty means callers of get_events_or_else will recompute identical (empty) results on every miss. Align with insert_receipt’s behavior by always caching—even empty—event sets.

Points of attention:

  • File: src/state_manager/cache.rs
  • Method: EnabledTipsetDataCache::insert_events (lines 213–219)

Suggested diff:

-    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
-        if !events_data.events.is_empty() {
-            events_data.events.shrink_to_fit();
-            events_data.roots.shrink_to_fit();
-            self.events_cache.insert(key.clone(), events_data);
-        }
-    }
+    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
+        events_data.events.shrink_to_fit();
+        events_data.roots.shrink_to_fit();
+        self.events_cache.insert(key.clone(), events_data);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
-        if !events_data.events.is_empty() {
-            events_data.events.shrink_to_fit();
-            events_data.roots.shrink_to_fit();
-            self.events_cache.insert(key.clone(), events_data);
-        }
-    }
+    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
+        events_data.events.shrink_to_fit();
+        events_data.roots.shrink_to_fit();
+        self.events_cache.insert(key.clone(), events_data);
+    }
🤖 Prompt for AI Agents
In src/state_manager/cache.rs around lines 213 to 219, change
EnabledTipsetDataCache::insert_events to always cache the provided StateEvents
(including when events is empty) instead of skipping insertion; remove the if
!events_data.events.is_empty() guard, keep the shrink_to_fit() calls on events
and roots, then always call self.events_cache.insert(key.clone(), events_data)
so empty results are cached like insert_receipt does.

Contributor Author


I believe an empty vec indicates that value computation was skipped, so it should not be cached. cc @aatifsyed to confirm

Contributor


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Contributor


I don't think I have enough context to confirm or deny! Is this a Rust question, a protocol question, or a library question? :))

Contributor Author


Hey @aatifsyed, thanks for your reply! I actually meant to tag @akaladarshi on this, sorry about that, and have a great day!

@elmattic elmattic added this pull request to the merge queue Aug 18, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 18, 2025
@hanabi1224 hanabi1224 added this pull request to the merge queue Aug 18, 2025
Merged via the queue into main with commit b9ed919 Aug 18, 2025
48 checks passed
@hanabi1224 hanabi1224 deleted the hm/no-excess-vec-capacity-in-cache branch August 18, 2025 09:05