Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-3710: Introduce commit_purge API #3811

Merged

Conversation

mpapierski
Copy link
Collaborator

Closes #3710
This PR implements the "commit_purge" API in EE and additional machinery in the ContractRuntime component to execute it after each block.

There are also improvements to the global state to ensure that deleting an entry from the tip of the trie does not involve deserialization, and additionally, scratch trie was modified to cache the bytes of a trie rather than the trie object itself to allow lazy deserialization. This should reduce the memory footprint of a large delete operation by about 6x.

Currently, commit_purge API is called repeatedly after each block is executed with a slice of Key::EraInfo whose length is configured by chainspec's purge_batch_size. After several blocks, when there are no more Key::EraInfo entries, the process will be no-op, and at the next protocol version can be disabled by setting purge_batch_size = 0

The purge of Key::EraInfo consists of two steps:

  • At the upgrade point last Key::EraInfo(activation_point_era - 1) will be copied into a Key::EraSummary.
  • At the end of each executed block there will be a call to commit_purge to delete N's first Key::EraInfo entries starting from 0.

Michał Papierski and others added 17 commits March 16, 2023 17:20
The failure comes from the ScratchTrie which caches the Trie objects,
rather than bytes. To ensure that no V is deserialized while deleting,
we're bypassing a cache, and that leads to a failure after first key is
deleted.
With a cache that operates at byte level we can still perform
optimizations such as avoiding deserialization.
If an operation only uses get() instead of get_raw() it may fail while
writing root to db.
Fixture was regenerated on top of commit 8038340.
This also ensures no new EraInfo objects are produced, and passes all
the integration tests.
The remote assets are only compiled for x86-64 which prevents running
the tests on other platforms (like ARM aarch64).
Add ability to override the base URL so that users can run the tests
using assets from a different location, or locally cached assets by
specifying a `file://<local_directory>` URL.

Signed-off-by: Alexandru Sardan <alexandru@casperlabs.io>
Copy link
Collaborator

@Fraser999 Fraser999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review.

types/src/key.rs Outdated Show resolved Hide resolved
types/src/key.rs Outdated Show resolved Hide resolved
types/src/key.rs Outdated Show resolved Hide resolved
types/src/key.rs Show resolved Hide resolved
types/src/key.rs Show resolved Hide resolved
node/src/components/contract_runtime/operations.rs Outdated Show resolved Hide resolved
node/src/components/contract_runtime/operations.rs Outdated Show resolved Hide resolved
node/src/components/contract_runtime.rs Outdated Show resolved Hide resolved
node/src/components/contract_runtime.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@Fraser999 Fraser999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further comments, review still incomplete.

execution_engine/Cargo.toml Outdated Show resolved Hide resolved
execution_engine/src/core/engine_state/mod.rs Outdated Show resolved Hide resolved
execution_engine/src/core/engine_state/mod.rs Outdated Show resolved Hide resolved
execution_engine/src/core/engine_state/mod.rs Show resolved Hide resolved
execution_engine/src/core/engine_state/mod.rs Outdated Show resolved Hide resolved
execution_engine/src/core/engine_state/mod.rs Outdated Show resolved Hide resolved
execution_engine/src/core/engine_state/upgrade.rs Outdated Show resolved Hide resolved
/// Outcomes:
/// * Ok(Some(range)) -- these keys should be purged
/// * Ok(None) -- nothing to do, either done, or there is not enough eras to purge
fn calculate_purge_eras(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn calculate_purge_eras(
fn calculate_prune_eras(

It seems like we prefer this process to be called "pruning", which can be seen in other places already.

And if not, then change "prune" to "purge" in other places (for consistency).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fraser999 Thoughts on renaming? There's also chainspec entry purge_batch_size to change which we've already communicated to test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Ed's preference was also "prune", so maybe worth changing now before the PR merges.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you renamed it to prune everywhere except this file! 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course I forgot 🤦 Fixed

node/src/effect.rs Outdated Show resolved Hide resolved
node/src/effect.rs Outdated Show resolved Hide resolved
node/src/effect.rs Outdated Show resolved Hide resolved
node/src/effect.rs Outdated Show resolved Hide resolved
node/src/effect/requests.rs Outdated Show resolved Hide resolved
execution_engine/src/storage/trie_store/operations/mod.rs Outdated Show resolved Hide resolved
execution_engine_testing/tests/src/lmdb_fixture.rs Outdated Show resolved Hide resolved
execution_engine_testing/tests/Cargo.toml Outdated Show resolved Hide resolved
This regression was introduced 04de5f0 and resulted in purge starting
from next switch block after activation, instead of starting within
first block activation.
types/src/key.rs Outdated Show resolved Hide resolved
node/src/components/storage.rs Outdated Show resolved Hide resolved
node/src/components/storage.rs Outdated Show resolved Hide resolved
node/src/components/storage.rs Outdated Show resolved Hide resolved
node/src/components/rpc_server/rpcs/docs.rs Show resolved Hide resolved
node/src/components/linear_chain_sync/blocks.rs Outdated Show resolved Hide resolved
execution_engine/src/storage/trie_store/operations/mod.rs Outdated Show resolved Hide resolved
ci/nctl_upgrade.sh Outdated Show resolved Hide resolved
utils/nctl/sh/scenarios-upgrades/upgrade_scenario_11.sh Outdated Show resolved Hide resolved
execution_engine/src/storage/store/mod.rs Show resolved Hide resolved
node/src/types/chainspec/core_config.rs Outdated Show resolved Hide resolved
/// Outcomes:
/// * Ok(Some(range)) -- these keys should be purged
/// * Ok(None) -- nothing to do, either done, or there is not enough eras to purge
fn calculate_purge_eras(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you renamed it to prune everywhere except this file! 😄

execution_engine/src/storage/trie_store/operations/mod.rs Outdated Show resolved Hide resolved
execution_engine/src/storage/trie_store/lmdb.rs Outdated Show resolved Hide resolved
execution_engine/src/core/engine_state/prune.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@Fraser999 Fraser999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimistic approval, given the one final comment on upgrade_scenario_11.sh is addressed.

@rafal-ch
Copy link
Contributor

I'll be doing a second review today, but nevertheless, we need a stamp from @dwerner and/or @EdHastingsCasperLabs on this.

Comment on lines +475 to +477
// Can't find key
// Most likely this chain did not yet ran an auction, or recently completed a
// prune
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this comment is accurate, and while it is basically harmless to do so I don't think it is necessary to log this circumstance as a warn.

As far as I recall, a new network w/ genesis at this binary or higher would never have such an entry (correct behavior). An existing network upgraded into this binary must have such an entry. In a future binary we will either remove this handling altogether, or if it is left in place it would essentially translate to noop on networks that had already previously gone thru a migration.

Am i forgetting something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment is correct, the warn! here was useful for testing early in the development to get the math right. I removed the warning.

@@ -2084,4 +2123,39 @@ where
let max_height = self.config.max_runtime_call_stack_height() as usize;
RuntimeStack::new_system_call_stack(max_height)
}

/// Commit a prune of leaf nodes from the tip of the merkle trie.
pub fn commit_prune(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, but could you put this new func after commit_genesis and commit_upgrade for purposes of logical grouping?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 573 to 574
pub fn write_era_summary(&mut self, key: Key, value: EraInfo) {
if let Key::EraSummary = key {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we can get away w/ changing the name of this public function in 1.x

instead, consider making a new function with the new behavior and forward the old function to it, mark the old fn as deprecated, and todo! remove it in 2.0

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I renamed it back (although I think we would be fine, as we don't reimport RuntimeContext as pub)

Comment on lines +234 to +237
error!(
previous_block_height,
%state_root_hash,
"commit_prune: root not found");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we kill the node in this case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do this and do a panic. It's impossible to hit this case. CC @Fraser999

types/src/key.rs Outdated
Comment on lines 622 to 627
Key::SystemContractRegistry => {
result.append(&mut SYSTEM_CONTRACT_REGISTRY_KEY.to_bytes()?)
result.append(&mut PADDING_BYTES.to_bytes()?)
}
Key::EraSummary => {
result.append(&mut PADDING_BYTES.to_bytes()?);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just | these arms together?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a monster. See nitpicks but LGTM, congrats to all involved. Lets test the heck out it.

@mpapierski
Copy link
Collaborator Author

bors r+

casperlabs-bors-ng bot added a commit that referenced this pull request Apr 14, 2023
3811: GH-3710: Introduce commit_purge API r=mpapierski a=mpapierski

Closes #3710 
This PR implements the "commit_purge" API in EE and additional machinery in the ContractRuntime component to execute it after each block.

There are also improvements to the global state to ensure that deleting an entry from the tip of the trie does not involve deserialization, and additionally, scratch trie was modified to cache the bytes of a trie rather than the trie object itself to allow lazy deserialization. This should reduce the memory footprint of a large delete operation by about 6x.

Currently, commit_purge API is called repeatedly after each block is executed with a slice of Key::EraInfo whose length is configured by chainspec's `purge_batch_size`. After several blocks, when there are no more Key::EraInfo entries, the process will be no-op, and at the next protocol version can be disabled by setting `purge_batch_size = 0`

The purge of Key::EraInfo consists of two steps:
- At the upgrade point last Key::EraInfo(activation_point_era - 1) will be copied into a Key::EraSummary.
- At the end of each executed block there will be a call to commit_purge to delete N's first Key::EraInfo entries starting from 0.

Co-authored-by: Michał Papierski <michal@casperlabs.io>
Co-authored-by: Michał Papierski <michal@papierski.net>
Co-authored-by: Alexandru Sardan <alexandru@casperlabs.io>
Co-authored-by: Karan Dhareshwar <karan@casperlabs.io>
@casperlabs-bors-ng
Copy link
Contributor

This PR was included in a batch that successfully built, but then failed to merge into release-1.4.14. It will not be retried.

Additional information:

{"message":"You're not authorized to push to this branch. Visit https://docs.github.com/articles/about-protected-branches/ for more information.","documentation_url":"https://docs.github.com/articles/about-protected-branches"}

@sacherjj sacherjj merged commit 4f32bc8 into casper-network:release-1.4.14 Apr 17, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants