Investigate RocksDB snapshot copying on Partition start #5682
Labels
area/performance
Marks an issue as performance related
area/reliability
Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected)
component/db
kind/research
Marks an issue as part of a research or investigation
kind/toil
Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.
Milestone
Description
Currently, when a partition restarts and becomes leader, we delete the runtime folder and copy the latest snapshot into it, see #1812. If the database contains a lot of data or is not well compacted, as in #5137, this can take a while, especially if not enough resources such as CPUs are assigned or when we are running on a hard disk. This has caused some incidents in Camunda Cloud in the past.
We should investigate whether it still makes sense to copy the last snapshot, or whether we can create hard links instead. If we are a single node, we could also start immediately with the existing runtime folder.
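To make the hard-link idea concrete, here is a rough sketch using only the JDK. It assumes the snapshot is a flat directory of RocksDB files, that the `*.sst` files in it are immutable and therefore safe to share via hard links, and that the remaining small files (for example `CURRENT` or the manifest) should still be copied. The class name `SnapshotLinker` and the method `restore` are hypothetical and do not refer to existing code in this repository. Hard links only work when both folders are on the same file system, so the sketch falls back to a plain copy otherwise.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the existing code base.
public final class SnapshotLinker {

  private SnapshotLinker() {}

  /**
   * Populates runtimeDir from snapshotDir. Files ending in ".sst" are hard-linked
   * (assumption: they are immutable); everything else is copied.
   */
  public static void restore(final Path snapshotDir, final Path runtimeDir) throws IOException {
    Files.createDirectories(runtimeDir);

    // Collect the regular files first so the directory stream is closed early.
    final List<Path> files;
    try (Stream<Path> stream = Files.list(snapshotDir)) {
      files = stream.filter(Files::isRegularFile).collect(Collectors.toList());
    }

    for (final Path source : files) {
      final Path target = runtimeDir.resolve(source.getFileName());

      if (source.getFileName().toString().endsWith(".sst")) {
        try {
          // createLink(link, existing): target becomes a second name for the same data.
          Files.createLink(target, source);
          continue;
        } catch (final UnsupportedOperationException | IOException e) {
          // Linking failed (e.g. different file system); fall through to a copy.
        }
      }

      Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```

With this approach the time spent on restart would depend mostly on the number of files rather than on their total size. RocksDB's own checkpoint feature works in a similar way, so using it directly might be an alternative worth evaluating as part of this investigation.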