Investigate RocksDB snapshot copying on Partition start #5682

Closed
Zelldon opened this issue Oct 26, 2020 · 5 comments
Labels
area/performance Marks an issue as performance related area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) component/db kind/research Marks an issue as part of a research or investigation kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.

Comments

@Zelldon
Member

Zelldon commented Oct 26, 2020

Description

Currently, when a partition restarts and becomes leader, we delete the runtime folder and copy the latest snapshot into it, see #1812. If the DB contains a lot of data or is not well compacted, as in #5137, this can take a while, especially if few resources (such as CPUs) are assigned or if we are running on a hard disk. This has caused some incidents in Camunda Cloud in the past.

We should investigate whether it still makes sense to copy the last snapshot, or whether we can create hard links instead. If we are a single node, we could also start immediately with the existing runtime folder.

@Zelldon Zelldon added kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. area/performance Marks an issue as performance related Impact: Availability kind/research Marks an issue as part of a research or investigation labels Oct 26, 2020
@npepinpe
Member

The only issue I can imagine with creating a checkpoint of a snapshot is that you have to open it, maybe? Hopefully you don't have to open the DB before checkpointing it 🤞

@Zelldon
Member Author

Zelldon commented Oct 26, 2020

Or we just create hard links with Java? https://docs.oracle.com/javase/tutorial/essential/io/links.html#hardLink
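For illustration, a minimal sketch of what the hard-link idea could look like with `java.nio.file.Files.createLink`. The class name `SnapshotLinker`, the method `linkSnapshot`, and the flat directory layout are assumptions made for this example, not code from the project. It also assumes the snapshot and runtime folders live on the same file system, since hard links cannot cross file systems.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of the project: links instead of copying.
final class SnapshotLinker {
  private SnapshotLinker() {}

  /** Hard-links every regular file in snapshotDir into runtimeDir. */
  static void linkSnapshot(Path snapshotDir, Path runtimeDir) throws IOException {
    Files.createDirectories(runtimeDir);
    try (Stream<Path> files = Files.list(snapshotDir)) {
      for (Path source : (Iterable<Path>) files::iterator) {
        if (Files.isRegularFile(source)) {
          // createLink(link, existing): the new entry shares the file's data,
          // so no bytes are copied regardless of file size.
          Files.createLink(runtimeDir.resolve(source.getFileName()), source);
        }
      }
    }
  }
}
```

Because both directory entries point at the same data, any in-place write through the runtime folder would also change the snapshot. Whether that is acceptable depends on which files RocksDB modifies after opening the DB.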

@npepinpe
Member

afaik it's only safe to hardlink the SST files - not sure about the others, which is why I'd delegate that to RocksDB to figure out, but we could also try and compare both.
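A rough sketch of the "hardlink only the SST files" variant, for comparison: SST files are immutable once written, so sharing them is harmless, while everything else (for example `MANIFEST-*` or `CURRENT`) is copied so that later writes in the runtime folder cannot touch the snapshot. The class `HybridSnapshotRestore`, the method `restore`, and the `.sst` suffix check are assumptions for this example. It is not how RocksDB's own checkpointing is implemented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper, not part of the project.
final class HybridSnapshotRestore {
  private HybridSnapshotRestore() {}

  /** Links *.sst files and copies all other files; returns the number of links created. */
  static int restore(Path snapshotDir, Path runtimeDir) throws IOException {
    Files.createDirectories(runtimeDir);
    int linked = 0;
    try (Stream<Path> files = Files.list(snapshotDir)) {
      for (Path source : (Iterable<Path>) files::iterator) {
        if (!Files.isRegularFile(source)) {
          continue; // assumption: the snapshot folder is flat
        }
        Path target = runtimeDir.resolve(source.getFileName());
        if (source.getFileName().toString().endsWith(".sst")) {
          Files.createLink(target, source); // immutable file: safe to share
          linked++;
        } else {
          // Mutable metadata gets its own copy so the snapshot stays untouched.
          Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
    return linked;
  }
}
```

Since the SST files usually make up most of the data, this would keep most of the savings while leaving the decision about the remaining files conservative.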

@npepinpe npepinpe added this to the RocksDB milestone Jan 3, 2021
@npepinpe npepinpe added area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) and removed Impact: Availability labels Apr 11, 2022
@Zelldon
Member Author

Zelldon commented Oct 10, 2023

@npepinpe can we close this?

@npepinpe
Member

Yes 👍

github-merge-queue bot pushed a commit that referenced this issue Mar 14, 2024
…#5682)

Bumps [org.springframework.boot:spring-boot-starter-parent](https://github.com/spring-projects/spring-boot) from 3.1.4 to 3.1.5.
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](spring-projects/spring-boot@v3.1.4...v3.1.5)

---
updated-dependencies:
- dependency-name: org.springframework.boot:spring-boot-starter-parent
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3 participants