Store each ticket in its own DB bucket. #243
Conversation
```go
// TODO is this dodgy?
if bkt.Get(ConfirmedK)[0] == byte(1) {
	ticket.Confirmed = true
}
```
Highlighting this particularly icky piece of code for reviewers - any better ideas for how to encode/decode `bool` & `[]byte`?
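One common approach, shown here as a minimal sketch with hypothetical helper names `boolToBytes`/`bytesToBool` (not from the vspd codebase): store a single byte, and decode defensively so a missing or malformed value reads as false instead of panicking on the unchecked `[0]` index above.

```go
package main

import "fmt"

// boolToBytes encodes a bool as a single byte: 1 for true, 0 for false.
func boolToBytes(b bool) []byte {
	if b {
		return []byte{1}
	}
	return []byte{0}
}

// bytesToBool decodes defensively: only a one-byte slice containing 1
// is true, so a nil value from a missing key never panics.
func bytesToBool(v []byte) bool {
	return len(v) == 1 && v[0] == 1
}

func main() {
	fmt.Println(bytesToBool(boolToBytes(true)))  // true
	fmt.Println(bytesToBool(nil))                // false
}
```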
Ready for review
Still uneasy about the db size growth here. One thing I think we haven't looked at yet is using a custom encoder/decoder in the serialization/deserialization. We can borrow some ideas from how the block header is serialized/deserialized in dcrd.
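For illustration only, a sketch of what a dcrd-style fixed-width binary encoding could look like for a ticket record, using just the standard library. The `ticketRecord` struct and its field layout are hypothetical, not vspd's actual schema:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ticketRecord is a hypothetical fixed-layout record: 32-byte hash,
// 8-byte fee amount, 1-byte confirmed flag (41 bytes total).
type ticketRecord struct {
	Hash      [32]byte
	FeeAmount uint64
	Confirmed bool
}

// encode writes the record into a fixed 41-byte slice, little-endian,
// in the spirit of dcrd's block header serialization.
func (t *ticketRecord) encode() []byte {
	buf := make([]byte, 41)
	copy(buf[0:32], t.Hash[:])
	binary.LittleEndian.PutUint64(buf[32:40], t.FeeAmount)
	if t.Confirmed {
		buf[40] = 1
	}
	return buf
}

// decode is the inverse of encode.
func decode(buf []byte) ticketRecord {
	var t ticketRecord
	copy(t.Hash[:], buf[0:32])
	t.FeeAmount = binary.LittleEndian.Uint64(buf[32:40])
	t.Confirmed = buf[40] == 1
	return t
}

func main() {
	in := ticketRecord{FeeAmount: 30000, Confirmed: true}
	in.Hash[0] = 0xab
	out := decode(in.encode())
	fmt.Println(out == in) // true
}
```

A fixed layout like this avoids all the JSON field-name overhead, at the cost of a hand-maintained byte layout and explicit migrations whenever a field changes.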
While this kind of solution is useful for dcrd, where messages need to be kept as tiny as possible for fast propagation through the network, I feel like that is a bit overkill for this case. It would add too much new complexity, whereas both the old solution and the new solution are relatively simple.

Interested to hear thoughts from @dhill as an experienced sysadmin. Would you mind reviewing comments on #223, particularly this one?

To add some more info, the database size with the new scheme is approximately: 200,000 tickets = 864 MB. The legacy VSP with the most voted tickets is dcr.stakeminer.com with 292,000 tickets, and it launched over 5 years ago.
I've spent a little more time on this. I've tried tweaking the bolt DB `FillPercent` parameters, I've tried reducing the number of buckets by encoding the vote choices as a JSON string rather than a bucket, and I've tried using the internal compacting code from the bolt lib to try to make the db smaller. No progress. In order to reduce file size and keep the performance gains introduced by this PR, we either need to investigate a new encoding, as suggested by dnldd, or migrate to a new database.
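For reference, the "vote choices as a JSON string rather than a bucket" experiment boils down to flattening the per-agenda keys into a single value, roughly like the sketch below (the agenda ID here is made up, not one of vspd's real keys):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Instead of a nested bolt bucket with one key per agenda, the vote
// choices map can be stored as one JSON value under a single key.
func encodeVoteChoices(choices map[string]string) ([]byte, error) {
	return json.Marshal(choices)
}

func decodeVoteChoices(b []byte) (map[string]string, error) {
	var m map[string]string
	err := json.Unmarshal(b, &m)
	return m, err
}

func main() {
	b, _ := encodeVoteChoices(map[string]string{"exampleagenda": "yes"})
	m, _ := decodeVoteChoices(b)
	fmt.Println(m["exampleagenda"]) // yes
}
```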
One option that I think could be viable is to delete the raw fee transaction hex from the database once it has been broadcast and confirmed. That should reduce the space needed by each ticket quite significantly. I will run some tests.
I think removing the transaction hex is a good first step; that should get the db size down. I still think it'd be worthwhile getting a custom encoder/decoder done in the future, borrowing from what's working in dcrd currently. That can come later if the confirmed tx hex is getting pruned from the db.
Marking as a draft, as this is now rebased and depends on #260.
I have done some extra benchmarking to ensure the changes from this PR and #260 are working as intended, testing a database with 200k tickets. The growth of all of these metrics is still linear. Rebased onto master and ready for review.
**NOTE: This contains a backwards-incompatible database migration, so if you plan to test it, please make a copy of your database first.**

Moves tickets from a single database bucket containing JSON-encoded strings to a bucket for each ticket. This change preemptively deals with scaling issues seen with databases containing tens of thousands of tickets. (Closes #223)
Rebased onto master to pick up the latest linter.
Would this code be useful in the repo? Also curious why the times differ between runs.
Adding the benchmark to the repo would likely be more of a burden than a help.

As for the time differences, I carried out the testing on different machines, albeit with similar-ish specs. It's also likely that I was running other tasks at the same time as the benchmark, which would have had an impact. Tbh, the important point of the testing is not the exact timing for any given test but the growth of the times: with this kind of database access we want to ensure that any growth is linear (or better) and not exponential.
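That growth check can be made concrete with a tiny helper (hypothetical, not part of the repo): if an operation costs roughly c·n^k at database size n, then doubling n multiplies the time by 2^k, so k can be estimated as log2 of the timing ratio.

```go
package main

import (
	"fmt"
	"math"
)

// estimateExponent fits t ≈ c·n^k from two timings taken at sizes
// n and 2n: k = log2(t2n/tn). k ≈ 1 means linear, k ≈ 2 quadratic.
func estimateExponent(tn, t2n float64) float64 {
	return math.Log2(t2n / tn)
}

func main() {
	fmt.Println(estimateExponent(100, 200)) // 1 (linear growth)
	fmt.Println(estimateExponent(100, 400)) // 2 (quadratic growth)
}
```

Exact timings wobble with machine load, so in practice the estimate only needs to stay near 1 as the ticket count doubles.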