
fix: remove excess vec capacity in caches#5935

Merged
hanabi1224 merged 1 commit into main from hm/no-excess-vec-capacity-in-cache
Aug 18, 2025

Conversation

Contributor

@hanabi1224 hanabi1224 commented Aug 14, 2025

Summary of changes

This PR reduces potential excess Vec capacity in caches to lower memory usage.


Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Summary by CodeRabbit

  • Performance
    • Reduced memory usage when building tipsets and caching messages, receipts, and events, leading to lower overhead during sync and querying.
  • Reliability
    • Added stricter validation of block headers in tipset creation to ensure consistent parents, state roots, epochs, and unique miners, improving chain integrity.

Contributor

coderabbitai bot commented Aug 14, 2025

Walkthrough

Adds shrink_to_fit calls before caching or storing vectors in multiple components and introduces verify_block_headers in Tipset::new to validate header invariants before constructing a Tipset. Minor method parameter mutability updates to accommodate shrinking. One cache insert method’s signature now takes a mutable Vec.
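
The effect of shrink_to_fit that the PR relies on can be sketched in isolation (a standalone example, not code from this PR):

```rust
fn main() {
    // A Vec grown with a generous capacity keeps that memory
    // even after it stops growing.
    let mut v: Vec<u64> = Vec::with_capacity(1024);
    v.extend(0..10);
    assert!(v.capacity() >= 1024);

    // shrink_to_fit asks the allocator to release the excess;
    // capacity drops to (or near) len.
    v.shrink_to_fit();
    assert!(v.capacity() >= v.len() && v.capacity() < 1024);
    println!("len={}, capacity={}", v.len(), v.capacity());
}
```

Long-lived caches that hold such vectors retain the full capacity, not just the length, which is why shrinking before insertion reduces resident memory.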

Changes

  • Tipset construction (src/blocks/tipset.rs): Tipset::new now shrinks header capacity and calls verify_block_headers(&headers)? before creating the Tipset; no public API signature changes.
  • Message cache insertion (src/chain/store/chain_store.rs): MsgsInTipsetCache::insert now takes mut value: Vec<ChainMessage> and calls value.shrink_to_fit() before caching; the method signature is updated accordingly.
  • Receipt and event caches (src/state_manager/cache.rs): in impl TipsetReceiptEventCacheHandler for EnabledTipsetDataCache, insert_receipt and insert_events bind their inputs as mutable; shrink_to_fit is applied to non-empty receipts, events, and roots before caching; trait signatures are unchanged.
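
The shared insertion pattern across these files can be sketched with hypothetical stand-in types (a plain HashMap wrapper here, not Forest's actual LRU cache or ChainMessage type):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real cache; the actual implementation
// is an LRU cache keyed by TipsetKey.
struct MsgsInTipsetCache {
    inner: HashMap<String, Vec<String>>,
}

impl MsgsInTipsetCache {
    // The PR's pattern: take the Vec by value, bind it mutably, shrink
    // it, then store it, so the cache never retains the excess capacity
    // the builder left behind.
    fn insert(&mut self, key: String, mut value: Vec<String>) {
        value.shrink_to_fit();
        self.inner.insert(key, value);
    }
}

fn main() {
    let mut cache = MsgsInTipsetCache { inner: HashMap::new() };
    let mut msgs = Vec::with_capacity(64);
    msgs.push("message".to_string());
    cache.insert("tipset-key".to_string(), msgs);
    // The stored Vec now holds only what it needs.
    assert!(cache.inner["tipset-key"].capacity() < 64);
}
```

Taking the Vec by value (rather than by reference) is what allows the callee to shrink it without changing what callers observe.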

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant Tipset
  participant Verifier as verify_block_headers

  Caller->>Tipset: new(headers_in)
  Tipset->>Tipset: build NonEmpty<CachingBlockHeader>
  Tipset->>Tipset: headers.shrink_to_fit()
  Tipset->>Verifier: verify_block_headers(&headers)
  Verifier-->>Tipset: Ok() or Error
  Tipset-->>Caller: Ok(Tipset { headers, key })
sequenceDiagram
  participant Code as Caller
  participant Cache as MsgsInTipsetCache
  participant Store as Internal Cache

  Code->>Cache: insert(key, value: Vec<ChainMessage>)
  Cache->>Cache: value.shrink_to_fit()
  Cache->>Store: push(key, value)
  Store-->>Cache: ack

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • elmattic
  • LesnyRumcajs
  • akaladarshi


@hanabi1224 hanabi1224 marked this pull request as ready for review August 14, 2025 09:56
@hanabi1224 hanabi1224 requested a review from a team as a code owner August 14, 2025 09:56
@hanabi1224 hanabi1224 requested review from elmattic and sudo-shashank and removed request for a team August 14, 2025 09:56
Contributor

@sudo-shashank sudo-shashank left a comment


LGTM.. shrink it

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/blocks/tipset.rs (2)

244-253: Verify before shrinking to avoid wasted work on invalid input

Call verify_block_headers before shrink_to_fit() so we fail fast and avoid an unnecessary reallocation if the tipset is invalid.

-        headers.shrink_to_fit();
-        verify_block_headers(&headers)?;
+        verify_block_headers(&headers)?;
+        headers.shrink_to_fit();

551-573: Avoid allocation in verify_block_headers by iterating without collecting

verify_block_headers currently collects into a NonEmpty<&CachingBlockHeader> (which allocates) just to perform simple "all equal to first" checks. You can iterate the headers directly, capture the first element, and check each invariant without the intermediate allocation, provided the iterator is cloneable for re-traversal; non-emptiness is still enforced by inspecting the first element.

For example (illustrative, not a diff):

fn verify_block_headers<'a, I>(headers: I) -> Result<(), CreateTipsetError>
where
    I: IntoIterator<Item = &'a CachingBlockHeader>,
    I::IntoIter: Clone, // cloned for each re-traversal instead of collecting
{
    let mut iter = headers.into_iter();
    let first = iter.next().ok_or(CreateTipsetError::Empty)?;
    let rest = iter;

    if !rest.clone().all(|it| it.parents == first.parents) {
        return Err(CreateTipsetError::BadParents);
    }
    if !rest.clone().all(|it| it.state_root == first.state_root) {
        return Err(CreateTipsetError::BadStateRoot);
    }
    if !rest.clone().all(|it| it.epoch == first.epoch) {
        return Err(CreateTipsetError::BadEpoch);
    }

    // Miner uniqueness: a pairwise scan keeps this allocation-free;
    // tipsets hold only a handful of headers, so O(n^2) is fine.
    let mut remaining = std::iter::once(first).chain(rest);
    while let Some(header) = remaining.next() {
        if remaining
            .clone()
            .any(|other| other.miner_address == header.miner_address)
        {
            return Err(CreateTipsetError::DuplicateMiner);
        }
    }
    Ok(())
}
📥 Commits

Reviewing files that changed from the base of the PR and between d641ead and 532e259.

📒 Files selected for processing (3)
  • src/blocks/tipset.rs (1 hunks)
  • src/chain/store/chain_store.rs (1 hunks)
  • src/state_manager/cache.rs (1 hunks)
🔇 Additional comments (1)
src/chain/store/chain_store.rs (1)

586-589: LGTM: shrink cached Vec to reduce retained capacity

Shrinking before pushing into the LRU is consistent with the PR goal and helps the size-tracking be more accurate.

Comment on lines +206 to 211
fn insert_receipt(&self, key: &TipsetKey, mut receipts: Vec<Receipt>) {
    if !receipts.is_empty() {
        receipts.shrink_to_fit();
        self.receipt_cache.insert(key.clone(), receipts);
    }
}
Contributor

@coderabbitai coderabbitai bot Aug 14, 2025


💡 Verification agent

🧩 Analysis chain

Don’t skip caching empty receipts; it causes repeated recomputation and changes semantics

Gating insertion on non-empty vectors changes behavior: callers that legitimately produce empty receipts will no longer be cached and may recompute on subsequent lookups. Empty results are still results and should be cached (negative caching).

Apply this change to always cache, while still shrinking:

-    fn insert_receipt(&self, key: &TipsetKey, mut receipts: Vec<Receipt>) {
-        if !receipts.is_empty() {
-            receipts.shrink_to_fit();
-            self.receipt_cache.insert(key.clone(), receipts);
-        }
-    }
+    fn insert_receipt(&self, key: &TipsetKey, mut receipts: Vec<Receipt>) {
+        receipts.shrink_to_fit();
+        self.receipt_cache.insert(key.clone(), receipts);
+    }

To check current usage patterns and potential reliance on empty caching, run:


🏁 Script executed:

#!/bin/bash
rg -n "insert_receipt\(|get_receipt_or_else\(" -A 3 -B 3



Always cache empty receipts for proper negative caching

Currently both the handler and its caller short-circuit empty vectors, so empty receipt results are never stored—causing repeated recomputation. We need to:

• In src/state_manager/cache.rs, remove the if !receipts.is_empty() guard in insert_receipt so it always calls insert(...) after shrink_to_fit().
• In src/state_manager/mod.rs, remove the surrounding if !receipts.is_empty() check before calling insert_receipt so empty vectors get cached.

Proposed diffs:

--- a/src/state_manager/cache.rs
+++ b/src/state_manager/cache.rs
@@ -206,7 +206,6 @@ impl TipsetReceiptEventCacheHandler for EnabledTipsetDataCache {
-    fn insert_receipt(&self, key: &TipsetKey, mut receipts: Vec<Receipt>) {
-        if !receipts.is_empty() {
-            receipts.shrink_to_fit();
-            self.receipt_cache.insert(key.clone(), receipts);
-        }
-    }
+    fn insert_receipt(&self, key: &TipsetKey, mut receipts: Vec<Receipt>) {
+        receipts.shrink_to_fit();
+        self.receipt_cache.insert(key.clone(), receipts);
+    }
--- a/src/state_manager/mod.rs
+++ b/src/state_manager/mod.rs
@@ -462,5 +462,4 @@ impl StateManager {
-        && !receipts.is_empty()
-    {
-        self.receipt_event_cache_handler
-            .insert_receipt(key, receipts);
-    }
+        // always cache receipts, even if empty
+    {
+        self.receipt_event_cache_handler.insert_receipt(key, receipts);
+    }

This ensures that even an empty Vec<Receipt> is cached once and won’t trigger redundant recomputations in get_receipt_or_else.
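
The trade-off being debated here (negative caching versus treating an empty result as "not computed") can be illustrated with a toy cache; the types, the computation counter, and the cache_empty flag are all hypothetical:

```rust
use std::collections::HashMap;

// Toy cache to contrast the two policies discussed in this thread.
struct Cache {
    inner: HashMap<u64, Vec<u32>>,
    computations: u32, // counts how often the fallback runs
}

impl Cache {
    fn get_or_compute(&mut self, key: u64, cache_empty: bool) -> Vec<u32> {
        if let Some(v) = self.inner.get(&key) {
            return v.clone(); // cache hit, no recomputation
        }
        self.computations += 1;
        let v: Vec<u32> = Vec::new(); // pretend the computed result is empty
        if cache_empty || !v.is_empty() {
            self.inner.insert(key, v.clone());
        }
        v
    }
}

fn main() {
    // Policy A: skip caching empty results (the PR's current behavior).
    let mut skip = Cache { inner: HashMap::new(), computations: 0 };
    skip.get_or_compute(1, false);
    skip.get_or_compute(1, false);
    assert_eq!(skip.computations, 2); // empty result recomputed every call

    // Policy B: negative caching (the bot's suggestion).
    let mut neg = Cache { inner: HashMap::new(), computations: 0 };
    neg.get_or_compute(1, true);
    neg.get_or_compute(1, true);
    assert_eq!(neg.computations, 1); // empty result cached once
}
```

Which policy is right depends on whether an empty Vec means "computed and genuinely empty" or "computation was skipped", which is exactly the question raised in the replies below.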

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/state_manager/cache.rs around lines 206-211, remove the `if
!receipts.is_empty()` guard so that after `receipts.shrink_to_fit()` the
function always calls `self.receipt_cache.insert(key.clone(), receipts);`; also
update src/state_manager/mod.rs at the call site (remove the surrounding `if
!receipts.is_empty()` check) so `insert_receipt` is invoked even for empty
Vec<Receipt> values, ensuring empty receipts are cached for negative caching.

Contributor Author

@hanabi1224 hanabi1224 Aug 14, 2025


I believe an empty vec indicates that value computation was skipped, so it should not be cached. cc @akaladarshi to confirm

Contributor


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +213 to 219
fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
    if !events_data.events.is_empty() {
        events_data.events.shrink_to_fit();
        events_data.roots.shrink_to_fit();
        self.events_cache.insert(key.clone(), events_data);
    }
}
Contributor

@coderabbitai coderabbitai bot Aug 14, 2025


💡 Verification agent

🧩 Analysis chain

Same here: cache empty event sets to avoid unnecessary re-computation

As with receipts, events can legitimately be empty. Skipping insertion changes cache semantics and may increase CPU churn.

-    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
-        if !events_data.events.is_empty() {
-            events_data.events.shrink_to_fit();
-            events_data.roots.shrink_to_fit();
-            self.events_cache.insert(key.clone(), events_data);
-        }
-    }
+    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
+        events_data.events.shrink_to_fit();
+        events_data.roots.shrink_to_fit();
+        self.events_cache.insert(key.clone(), events_data);
+    }

You can also scan for call sites:


🏁 Script executed:

#!/bin/bash
rg -n "insert_events\(|get_events_or_else\(" -A 3 -B 3



🏁 Script executed:

#!/bin/bash
rg -n "events_cache" -A5 -B5 src/state_manager/cache.rs



🏁 Script executed:

#!/bin/bash
rg -n "fn insert_receipt" -A5 -B5 src/state_manager/cache.rs



🏁 Script executed:

#!/bin/bash
rg -n "struct TipsetStateCache" -A5 -B5 .



Always cache empty StateEvents to avoid redundant recomputation

Skipping insertion when events_data.events is empty means callers of get_events_or_else will recompute identical (empty) results on every miss. Align with insert_receipt’s behavior by always caching—even empty—event sets.

Points of attention:

  • File: src/state_manager/cache.rs
  • Method: EnabledTipsetDataCache::insert_events (lines 213–219)

Suggested diff:

-    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
-        if !events_data.events.is_empty() {
-            events_data.events.shrink_to_fit();
-            events_data.roots.shrink_to_fit();
-            self.events_cache.insert(key.clone(), events_data);
-        }
-    }
+    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
+        events_data.events.shrink_to_fit();
+        events_data.roots.shrink_to_fit();
+        self.events_cache.insert(key.clone(), events_data);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
-        if !events_data.events.is_empty() {
-            events_data.events.shrink_to_fit();
-            events_data.roots.shrink_to_fit();
-            self.events_cache.insert(key.clone(), events_data);
-        }
-    }
+    fn insert_events(&self, key: &TipsetKey, mut events_data: StateEvents) {
+        events_data.events.shrink_to_fit();
+        events_data.roots.shrink_to_fit();
+        self.events_cache.insert(key.clone(), events_data);
+    }
🤖 Prompt for AI Agents
In src/state_manager/cache.rs around lines 213 to 219, change
EnabledTipsetDataCache::insert_events to always cache the provided StateEvents
(including when events is empty) instead of skipping insertion; remove the if
!events_data.events.is_empty() guard, keep the shrink_to_fit() calls on events
and roots, then always call self.events_cache.insert(key.clone(), events_data)
so empty results are cached like insert_receipt does.

Contributor Author


I believe an empty vec indicates that value computation was skipped, so it should not be cached. cc @aatifsyed to confirm

Contributor


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Contributor


I don't think I have enough context to confirm or deny! Is this a Rust question, a protocol question, or a library question? :))

Contributor Author


Hey @aatifsyed, thanks for your reply! I actually meant to tag @akaladarshi on this, sorry about that, and have a great day!

@elmattic elmattic added this pull request to the merge queue Aug 18, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 18, 2025
@hanabi1224 hanabi1224 added this pull request to the merge queue Aug 18, 2025
Merged via the queue into main with commit b9ed919 Aug 18, 2025
48 checks passed
@hanabi1224 hanabi1224 deleted the hm/no-excess-vec-capacity-in-cache branch August 18, 2025 09:05