Experiment n historical states #5968

Open
1 of 4 tasks
twoeths opened this issue Sep 19, 2023 · 3 comments · Fixed by #6699

@twoeths
Contributor

twoeths commented Sep 19, 2023

Problem description

Today Lodestar stores up to 96 states in the state cache and up to 10 epochs of checkpoint states in the checkpoint state cache, along with the justified and finalized states. This consumes a lot of memory, and the node has to be restarted multiple times due to OOM during long periods of non-finality, as in the early days of Holesky.

Solution description

  • Find a way to load state SSZ bytes into an existing tree so that we have a single seed/base state tree across the application
  • Implement Shuffling cache:
    • Stores based on an epoch and dependent root, something like: Map<Epoch, Map<RootHex, Shuffling>>
    • Consumed by attestation verification
    • Consumed when creating CachedBeaconState given a TreeBackedDU BeaconState to save time
    • Prune when finalize()
  • State regen: don't change its logic
  • State cache:
    • Right now we store up to 96 states; we could reduce this number a lot, for example to 32 states (the number could change based on experiment results)
  • Checkpoint cache:
    • Right now we store up to 10 epochs; I think storing up to 2 epochs is enough, with no need to keep the justified/finalized checkpoint states in memory
    • Prune checkpoint states older than 2 epochs from the cache and persist them to file/db
      • normally this should happen during PrecomputeEpochTransition, in the last 1/3 of the slot
    • when calling getLatest() or get(), load from db/file if needed
  • The number of states to keep in the state/checkpoint caches is configurable, so it can be tested on different networks
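
As a rough illustration of the Shuffling cache item above, here is a minimal sketch of a two-level map keyed by epoch and dependent root, pruned on finalization. The `Shuffling` shape, the class name, and the method names are assumptions made for this example and are not taken from Lodestar's code.

```typescript
// Hypothetical simplified types; Lodestar's real types carry more data.
type Epoch = number;
type RootHex = string;

interface Shuffling {
  epoch: Epoch;
  // Validator indices per committee (simplified for the sketch).
  committees: number[][];
}

class ShufflingCache {
  // Map<Epoch, Map<RootHex, Shuffling>> as suggested in the issue.
  private readonly byEpoch = new Map<Epoch, Map<RootHex, Shuffling>>();

  set(epoch: Epoch, dependentRoot: RootHex, shuffling: Shuffling): void {
    let byRoot = this.byEpoch.get(epoch);
    if (byRoot === undefined) {
      byRoot = new Map<RootHex, Shuffling>();
      this.byEpoch.set(epoch, byRoot);
    }
    byRoot.set(dependentRoot, shuffling);
  }

  get(epoch: Epoch, dependentRoot: RootHex): Shuffling | undefined {
    return this.byEpoch.get(epoch)?.get(dependentRoot);
  }

  // Drop every epoch strictly older than the finalized epoch.
  pruneOnFinalized(finalizedEpoch: Epoch): void {
    this.byEpoch.forEach((_byRoot, epoch) => {
      if (epoch < finalizedEpoch) {
        this.byEpoch.delete(epoch);
      }
    });
  }

  // Number of epochs currently tracked.
  epochCount(): number {
    return this.byEpoch.size;
  }
}
```

Both consumers named above (attestation verification and CachedBeaconState creation) would call `get()` with the epoch and dependent root they need, falling back to computing the shuffling only on a miss.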

Additional context

Storing temporary states in LevelDB and removing them later may cause the db to grow; we may consider persisting them to the file system instead and removing them when the chain is finalized.
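
The checkpoint cache behaviour described above (keep the latest epochs in memory, persist older ones, reload on demand, delete on finalization) could be sketched roughly as follows. This is only an illustration under simplifying assumptions: states are plain `Uint8Array` bytes, the `Datastore` interface and all class and method names are hypothetical, and the storage is synchronous. A real implementation would hold CachedBeaconState objects and reload persisted bytes against a seed state.

```typescript
type Epoch = number;
type RootHex = string;

// Hypothetical persistence interface; could be backed by files or a db.
interface Datastore {
  write(key: string, bytes: Uint8Array): void;
  read(key: string): Uint8Array | undefined;
  remove(key: string): void;
}

// In-memory stand-in for the file system or db, for demonstration only.
class InMemoryDatastore implements Datastore {
  private readonly store = new Map<string, Uint8Array>();
  write(key: string, bytes: Uint8Array): void {
    this.store.set(key, bytes);
  }
  read(key: string): Uint8Array | undefined {
    return this.store.get(key);
  }
  remove(key: string): void {
    this.store.delete(key);
  }
}

function toKey(epoch: Epoch, root: RootHex): string {
  return `${epoch}:${root}`;
}

class PersistentCheckpointCache {
  private readonly db: Datastore;
  private readonly maxEpochsInMemory: number;
  private readonly inMemory = new Map<string, Uint8Array>();
  // Tracks every known checkpoint, whether in memory or persisted.
  private readonly epochByKey = new Map<string, Epoch>();

  constructor(db: Datastore, maxEpochsInMemory = 2) {
    this.db = db;
    this.maxEpochsInMemory = maxEpochsInMemory;
  }

  add(epoch: Epoch, root: RootHex, state: Uint8Array): void {
    const key = toKey(epoch, root);
    this.inMemory.set(key, state);
    this.epochByKey.set(key, epoch);
  }

  // Returns the in-memory state, or loads it from the datastore if persisted.
  get(epoch: Epoch, root: RootHex): Uint8Array | undefined {
    const key = toKey(epoch, root);
    const cached = this.inMemory.get(key);
    if (cached !== undefined) return cached;
    if (!this.epochByKey.has(key)) return undefined;
    return this.db.read(key);
  }

  isInMemory(epoch: Epoch, root: RootHex): boolean {
    return this.inMemory.has(toKey(epoch, root));
  }

  // Persist states outside the latest maxEpochsInMemory epochs; returns how many moved.
  persistOld(currentEpoch: Epoch): number {
    let persisted = 0;
    this.inMemory.forEach((state, key) => {
      const epoch = this.epochByKey.get(key);
      if (epoch !== undefined && epoch <= currentEpoch - this.maxEpochsInMemory) {
        this.db.write(key, state);
        this.inMemory.delete(key);
        persisted++;
      }
    });
    return persisted;
  }

  // Remove everything older than the finalized epoch, in memory and on disk.
  pruneFinalized(finalizedEpoch: Epoch): void {
    this.epochByKey.forEach((epoch, key) => {
      if (epoch < finalizedEpoch) {
        this.inMemory.delete(key);
        this.db.remove(key);
        this.epochByKey.delete(key);
      }
    });
  }
}
```

In this sketch `persistOld()` plays the role of the pruning step that would run around the epoch transition, and `pruneFinalized()` is where temporary files or db entries get cleaned up so the storage does not keep growing.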

Progress

  • implementation
  • audit APIs related to finalized states
  • RSS issue: it spiked to 30GB on a node, then got back to normal after 9h
  • test v1.18.0: in progress
@twoeths twoeths added the meta-feature-request Issues to track feature requests. label Sep 19, 2023
@twoeths
Contributor Author

twoeths commented Sep 19, 2023

Regarding merging ssz state bytes to an existing TreeViewDU state, I have some statistics from this branch https://github.com/tuyennhv/lodestar/blob/tuyen/state_perf_test/packages/state-transition/test/unit/util/migrateState.test.ts#L71C9-L71C9

  • load state 7335296: const seedState = stateType.deserializeToViewDU(data_7335296); => it takes 1.3s
  • seedState.hashTreeRoot(); => this takes 36s
  • const migratedState = migrateState(seedState, newStateBytes); => this takes 0.5s-0.7s for a 64-slot difference. At this step migratedState and seedState share a lot of data, mainly state.validators
  • migratedState.hashTreeRoot() => this takes 1.5s
  • The total time to create CachedBeaconState is ~2.3s given a 64-slot difference (base state vs new state)
  • The same test for a 1-day difference (base state vs new state) takes about 2.9s
  • memory increased by about 100MB - 110MB

Note this assumes we have a Shuffling cache to save time when creating CachedBeaconState; otherwise it takes 0.8s - 1s more
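
To illustrate why the migrated state can share so much data with the seed state, here is a toy sketch of the reuse idea. It is not the `migrateState` function from the branch linked above: the `ValidatorNode` type, the `migrateValidators` name, and the fixed-size chunk layout are all hypothetical. Each validator is modelled as one node that carries its serialized bytes plus a cached root, and a seed node is reused whenever the new bytes for that index are identical.

```typescript
// Toy model: one node per validator, holding serialized bytes and a memoized root.
interface ValidatorNode {
  bytes: Uint8Array;
  cachedRoot: string | null; // stands in for a cached hash; null means "must rehash"
}

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Build the new validator list from serialized bytes, reusing unchanged seed nodes.
// `size` is the fixed serialized length of one validator (hypothetical in this sketch).
function migrateValidators(
  seed: ValidatorNode[],
  newBytes: Uint8Array,
  size: number
): {nodes: ValidatorNode[]; reused: number} {
  const count = Math.floor(newBytes.length / size);
  const nodes: ValidatorNode[] = [];
  let reused = 0;
  for (let i = 0; i < count; i++) {
    const chunk = newBytes.subarray(i * size, (i + 1) * size);
    const seedNode = i < seed.length ? seed[i] : undefined;
    if (seedNode !== undefined && bytesEqual(seedNode.bytes, chunk)) {
      // Unchanged validator: share the seed node and its cached root.
      nodes.push(seedNode);
      reused++;
    } else {
      // Changed or new validator: fresh node, root has to be recomputed later.
      nodes.push({bytes: chunk.slice(), cachedRoot: null});
    }
  }
  return {nodes, reused};
}
```

Under this model, only the nodes with `cachedRoot === null` need hashing afterwards, which matches the observation that hashing the migrated state is much cheaper than hashing a freshly deserialized one.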

@twoeths twoeths self-assigned this Sep 19, 2023
@dapplion
Contributor

Really cool to see you exploring this solution direction!

> load state 7335296: const seedState = stateType.deserializeToViewDU(data_7335296); => it takes 1.3s

Do you propose to store the hashing cache in the DB, or load it from a similar state available in memory?

@twoeths
Contributor Author

twoeths commented Sep 20, 2023

> Do you propose to store the hashing cache in the DB, or load it from a similar state available in memory?

@dapplion I'd load it from a similar state available in memory. If I migrate from a mainnet seedState from 1 day ago to the current mainnet state, it takes ~2.9s to create a CachedBeaconState, which is not too different from the 64-slot-difference load, which takes ~2.3s. I think that's because many validators do not change over time. I noted the benchmark result here: https://github.com/tuyennhv/lodestar/blob/4a69ce59929ea3065fdf65e75c1d4a88f1922c45/packages/state-transition/test/unit/util/migrateState.test.ts#L66

@twoeths twoeths reopened this Apr 25, 2024