Add in-process compaction support to databases #49

Closed
2 of 3 tasks
Tracked by #44
lasarojc opened this issue Dec 23, 2022 · 4 comments
Labels: backlog (A prioritized task in the team's backlog), P:storage-optimization (Priority: Give operators greater control over storage and storage optimization), storage


@lasarojc

lasarojc commented Dec 23, 2022

Was tendermint/tendermint#9743

Summary

Experiment with adding in-process compaction, so that nodes don't need to be stopped to perform compaction. This issue originally targeted LevelDB, but we added support to all CometBFT database backends that support this feature: RocksDB, PebbleDB and LevelDB.

Problem Definition

Background

One of the most common problems operators report is that storage growth is unbounded and compaction doesn't work. Some operators stop their node, trigger experimental-compact-goleveldb (#8564), which deletes old data, and then restart their node.

Why do we need this feature?

The experimental-compact-goleveldb command has the disadvantage that the node is stopped while it runs and therefore misses blocks. Compacting a node on a production network typically takes on the order of tens of minutes, so the number of missed blocks can be significant.

Proposal

We'll go about this incrementally:

  • The Tendermint team does initial de-risking and sanity checks to see that in-process compaction can be implemented safely
    • Add a new database type that does compaction
  • We ask an operator to deploy an early experiment, replacing one of their full nodes with the patched Tendermint version that has in-process compaction
    • The relayer team tests relaying against that node and monitors its general health
  • We collect advanced metrics, on latency in particular, as well as on storage growth evolution
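
A simple way to follow storage growth evolution is to sample the on-disk size of the node's data directory at regular intervals. The sketch below is only an illustration under stated assumptions: `dirSize` and the way the path is passed on the command line are hypothetical and not part of Tendermint or CometBFT.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// dirSize returns the total size in bytes of all regular files under root.
// Hypothetical helper, used here only to illustrate storage tracking.
func dirSize(root string) (int64, error) {
	var total int64
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			info, err := d.Info()
			if err != nil {
				return err
			}
			total += info.Size()
		}
		return nil
	})
	return total, err
}

func main() {
	// Assumption: the data directory is passed as the first argument;
	// the current directory is used otherwise.
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	size, err := dirSize(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("%s: %d bytes\n", root, size)
}
```

On a running node, files can disappear while the directory is being walked (for example during a compaction), in which case the walk may return an error and the sample would need to be retried.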
@lasarojc lasarojc added the backlog A prioritized task in the team's backlog label Dec 23, 2022
@lasarojc lasarojc mentioned this issue Dec 23, 2022
21 tasks
@adizere adizere removed their assignment Feb 7, 2023
@thanethomson thanethomson added P:storage-optimization Priority: Give operators greater control over storage and storage optimization storage labels Jun 20, 2023
@jmalicevic

@jmalicevic jmalicevic added this to the 2023-Q4 milestone Nov 23, 2023
@melekes

melekes commented Dec 11, 2023

@jmalicevic jmalicevic self-assigned this Jan 3, 2024
@adizere adizere modified the milestones: 2023-Q4, 2024-Q1 Jan 9, 2024
github-merge-queue bot pushed a commit that referenced this issue Feb 16, 2024
Blocked on cometbft/cometbft-db#111 and
benchmarking


Addresses #49 

Upon pruning we explicitly call the compaction function of the DB
backend. This has been shown to immediately shrink the storage footprint.

We need to evaluate the duration of this compaction depending on the
storage size to be able to reason about the impact on Comet's regular
operations.
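
The prune-then-compact flow described above, together with a measurement of how long the compaction takes, could look roughly like the following. This is a sketch, not the code from the PR: the `compactor` interface, the in-memory `memDB` stand-in and `pruneAndCompact` are all hypothetical names introduced for illustration, and passing nil bounds to mean "the whole key range" is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// compactor is a hypothetical, minimal view of a backend that supports
// explicit compaction. Nil bounds are assumed to mean the whole key range.
type compactor interface {
	Delete(key []byte) error
	Compact(start, end []byte) error
}

// memDB is an in-memory stand-in that only counts compaction calls,
// so that the sketch runs without a real database backend.
type memDB struct {
	data        map[string][]byte
	compactions int
}

func (m *memDB) Delete(key []byte) error {
	delete(m.data, string(key))
	return nil
}

func (m *memDB) Compact(start, end []byte) error {
	m.compactions++
	return nil
}

// pruneAndCompact deletes the given keys, then forces a compaction and
// returns how long the compaction call took.
func pruneAndCompact(db compactor, keys [][]byte) (time.Duration, error) {
	for _, k := range keys {
		if err := db.Delete(k); err != nil {
			return 0, fmt.Errorf("prune %q: %w", k, err)
		}
	}
	start := time.Now()
	if err := db.Compact(nil, nil); err != nil {
		return 0, fmt.Errorf("compact: %w", err)
	}
	return time.Since(start), nil
}

func main() {
	db := &memDB{data: map[string][]byte{"h1": nil, "h2": nil, "h3": nil}}
	took, err := pruneAndCompact(db, [][]byte{[]byte("h1"), []byte("h2")})
	if err != nil {
		panic(err)
	}
	fmt.Printf("compaction took %s, %d keys left\n", took, len(db.data))
}
```

Recording the returned duration against the store size is one way to gather the data needed to reason about the impact on regular operations.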

ToDo:
- Extend the `storage` config section with the following parameters:
- [ ] `in-process-compaction = false # Enable or disable in-process compaction. False by default`
- [ ] `in-process-compaction-interval = 10 # Interval in number of blocks to trigger explicit compaction; 10 by default`
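
How the two proposed parameters might gate compaction can be sketched as follows. The Go type and field names are hypothetical, and counting the interval as "blocks since the last compaction" is an assumption about the intended semantics rather than the behaviour of the merged code.

```go
package main

import "fmt"

// compactionConfig mirrors the two proposed settings.
// The Go names are hypothetical.
type compactionConfig struct {
	Enabled  bool  // in-process-compaction
	Interval int64 // in-process-compaction-interval, in blocks
}

// shouldCompact reports whether an explicit compaction is due, given the
// number of blocks processed since the last compaction.
func shouldCompact(cfg compactionConfig, blocksSinceLast int64) bool {
	if !cfg.Enabled || cfg.Interval <= 0 {
		return false
	}
	return blocksSinceLast >= cfg.Interval
}

func main() {
	cfg := compactionConfig{Enabled: true, Interval: 10}
	var sinceLast int64
	for height := int64(1); height <= 25; height++ {
		sinceLast++
		if shouldCompact(cfg, sinceLast) {
			fmt.Println("compact at height", height)
			sinceLast = 0
		}
	}
}
```

With the defaults from the list above but compaction enabled, this loop would trigger at heights 10 and 20. Treating a non-positive interval as "never compact" is a defensive choice made for this sketch.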




---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments

---------

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
mergify bot pushed a commit that referenced this issue Feb 16, 2024
(cherry picked from commit cfe8b88)

# Conflicts:
#	internal/state/store.go
#	internal/store/store.go
#	node/node.go
@jmalicevic

jmalicevic commented Feb 19, 2024

Closed with #1972 and cometbft/cometbft-db#111. Followed up with a fix resolving #2476.

greg-szabo pushed a commit that referenced this issue Feb 22, 2024
@jmalicevic

Local experiments confirmed that the storage footprint with goleveldb shrinks only when compaction is forced. Informal Staking confirmed this by running the code on a real chain.

@jmalicevic jmalicevic changed the title Add in-process compaction support to LevelDB Add in-process compaction support to databases Mar 4, 2024
github-merge-queue bot pushed a commit that referenced this issue Mar 13, 2024
Addresses #49 


---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
mergify bot pushed a commit that referenced this issue Mar 13, 2024
(cherry picked from commit 63f2629)
jmalicevic added a commit that referenced this issue Mar 13, 2024
…) (#2604)

Addresses #49 



---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #2600 done by
[Mergify](https://mergify.com).

Co-authored-by: Jasmina Malicevic <jasmina.dustinac@gmail.com>
cometcrafter pushed a commit to graphprotocol/cometbft that referenced this issue May 13, 2024
cometbft#2969) (cometbft#40) (cometbft#49)

Speeds up 5% of the non-IO time overhead from
`channel.WritePacketMsgTo`. The CPU time overhead in this function is
quite significant, CPU time is more than 3 times the syscall time for
writing to the net buffer. Working on a github issue for more
substantial refactor / time eliminations, but this 3s is easy enough.

We don't even use this codepath, so this make slice is entirely wasted.
However we should do things that reduce this overhead further.

![image](https://github.com/cometbft/cometbft/assets/6440154/e02e45bf-d6ff-4e11-9983-e81ca1102dc8)

---

#### PR checklist

- [x] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#2949 done by
[Mergify](https://mergify.com).

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
(cherry picked from commit 3d1b9dc)

Co-authored-by: Adam Tucker <adam@osmosis.team>
Projects
Status: Done
Status: Todo

5 participants