
Store each ticket in its own DB bucket. #243

Merged
3 commits merged into decred:master from benching2 on May 24, 2021

Conversation

@jholdstock (Member) commented May 5, 2021

NOTE: This contains a backwards incompatible database migration, so if you plan to test it, please make a copy of your database first.

Moves tickets from a single database bucket containing JSON-encoded strings to a separate bucket for each ticket.

This change is to preemptively deal with scaling issues seen with databases containing tens of thousands of tickets. (Closes #223)

```go
// TODO is this dodgy?
if bkt.Get(ConfirmedK)[0] == byte(1) {
	ticket.Confirmed = true
}
```
@jholdstock (Member, Author)

Highlighting this particularly icky piece of code for reviewers: any better ideas for how to encode/decode bool and []byte?
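One common answer to the bool/[]byte question is a pair of tiny helpers that also guard against the missing-key case (bbolt's `Get` returns nil when a key is absent, so indexing `bkt.Get(ConfirmedK)[0]` directly can panic). A minimal sketch; the helper names `boolToBytes`/`bytesToBool` are hypothetical, not vspd's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// boolToBytes encodes a bool as a single byte, suitable for
// storing as a key/value-store value.
func boolToBytes(b bool) []byte {
	if b {
		return []byte{1}
	}
	return []byte{0}
}

// bytesToBool decodes a value written by boolToBytes. Returning an
// error on nil or wrong-length input avoids the panic risk of
// indexing a Get result when the key is missing.
func bytesToBool(v []byte) (bool, error) {
	if len(v) != 1 {
		return false, errors.New("invalid bool encoding")
	}
	return v[0] == 1, nil
}

func main() {
	got, err := bytesToBool(boolToBytes(true))
	fmt.Println(got, err) // true <nil>

	_, err = bytesToBool(nil) // simulates a missing key
	fmt.Println(err != nil)   // true
}
```

The error return makes the "key never written" case explicit instead of crashing the upgrade path.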

@jholdstock jholdstock force-pushed the benching2 branch 3 times, most recently from d2b2748 to 219aee0 Compare May 6, 2021 10:06
@jholdstock (Member, Author)

  • Validated that key length makes no difference to database disk usage. Previously we used short keys because they were JSON encoded and contributed to increasing disk usage. Now we can use long, descriptive keys.
  • Running on https://testnet-vsp.jholdstock.uk (database contains ~15k tickets). Memory usage and upgrade duration negligible.
  • Used benchmark code from Improve ticket iterations & lookups #223 to test a database containing ~200k tickets:
    • Old code
      • InsertNewTicket 2.39s
      • CountAllTickets 2.41s
    • New code
      • InsertNewTicket 0.30s
      • CountAllTickets 0.27s

Ready for review

@jholdstock jholdstock marked this pull request as ready for review May 6, 2021 11:04
@dnldd (Member) commented May 12, 2021

Still uneasy about the db size growth here. One thing I think we haven't looked at yet is using a custom encoder/decoder for the serialization/deserialization. We can borrow some ideas from how the block header is serialized/deserialized in dcrd. readElement and writeElement there should be good references to put something together. Thoughts?

@jholdstock (Member, Author) commented May 13, 2021

While this kind of solution is useful for dcrd where messages need to be kept as tiny as possible for fast propagation through the network, I feel like that is a bit overkill for this case. It would add too much new complexity, whereas both the old solution and the new solution are relatively simple.

Interested to hear thoughts from @dhill as an experienced sysadmin. Would you mind reviewing comments on #223, particularly this one?

To add some more info, the database size with the new scheme is approximately:

200,000 tickets = 864 MB
400,000 tickets = 1,704 MB

The legacy VSP with the most voted tickets is dcr.stakeminer.com with 292,000 tickets, and it launched over 5 years ago.

@jholdstock (Member, Author)

I've spent a little more time on this. I've tried tweaking the bolt DB FillPercent parameters, I've tried reducing the number of buckets by encoding the vote choices as a JSON string rather than a bucket, and I've tried using the internal compaction code from the bolt library to make the db smaller. No progress.

In order to reduce file size while keeping the performance gains introduced by this PR, we either need to investigate a new encoding, as suggested by dnldd, or migrate to a new database.

@jholdstock (Member, Author)

One option which I think could be viable is to delete the raw fee transaction hex from the database once it has been broadcast and confirmed. That should reduce the space needed by each ticket quite significantly. I will run some tests.
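The pruning idea can be sketched as a small pure function: once the fee tx is confirmed, drop the raw hex but keep the hash so the transaction is still identifiable on chain. Field names here are illustrative, not vspd's actual schema:

```go
package main

import "fmt"

// feeTxRecord is a hypothetical slice of the per-ticket data.
type feeTxRecord struct {
	FeeTxHex  string // raw broadcast transaction, hex encoded (large)
	FeeTxHash string // kept so the tx can still be looked up on chain
	Confirmed bool
}

// pruneFeeTx clears the raw hex once the fee tx is confirmed and
// returns the number of bytes reclaimed for this ticket. Unconfirmed
// records are left untouched so the tx can still be rebroadcast.
func pruneFeeTx(r *feeTxRecord) int {
	if !r.Confirmed || r.FeeTxHex == "" {
		return 0
	}
	saved := len(r.FeeTxHex)
	r.FeeTxHex = ""
	return saved
}

func main() {
	rec := feeTxRecord{FeeTxHex: "0100...", FeeTxHash: "3f2a...", Confirmed: true}
	fmt.Println(pruneFeeTx(&rec) > 0)  // true: raw hex reclaimed
	fmt.Println(rec.FeeTxHex == "")    // true: hash alone remains
}
```

Since a raw transaction is typically a few hundred bytes of hex per ticket, reclaiming it across hundreds of thousands of tickets plausibly accounts for the size reduction reported later in the thread.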

@dnldd (Member) commented May 18, 2021

I think removing the transaction hex is a good first step, that should get the db size down. Still think it'd be worthwhile getting a custom encoder/decoder done in the future borrowing from what's working in dcrd currently. That can come later if the confirmed tx hex are getting pruned from the db.

@jholdstock (Member, Author)

Marking as a draft, as this is now rebased and depends on #260.

@jholdstock jholdstock force-pushed the benching2 branch 2 times, most recently from 46ebfd4 to c4e3b30 Compare May 20, 2021 08:05
@jholdstock (Member, Author)

I have done some extra benchmarking to ensure the changes from this PR and #260 are working as intended.

Testing a database with 200k tickets:

|                 | v1           | v2 (#260)    | v3 (this PR) |
|-----------------|--------------|--------------|--------------|
| DB size         | 366 MB       | 181 MB       | 253 MB       |
| InsertNewTicket | 2.37 seconds | 1.69 seconds | 0.86 seconds |
| CountTickets    | 2.37 seconds | 1.74 seconds | 0.27 seconds |

The growth of all of these metrics is still linear.

Rebased onto master and ready for review.

@jholdstock jholdstock marked this pull request as ready for review May 20, 2021 10:08
@jholdstock (Member, Author) commented May 20, 2021
@jholdstock jholdstock force-pushed the benching2 branch 2 times, most recently from 6b7a6d0 to 06cd0f3 Compare May 21, 2021 04:27
@jholdstock (Member, Author)

Rebased onto master to pick up the latest linter.

@jholdstock jholdstock merged commit 20cb546 into decred:master May 24, 2021
@xaur commented May 31, 2021

Benchmarking code: bench_test.txt

Would this code be useful in the repo?

Also curious why InsertNewTicket took 0.30 s in the May 6 comment but 0.86 s in the May 20 benchmark, while both are on a 200K ticket database. I would expect the latter would be even faster after #260.

@jholdstock (Member, Author)

Adding the benchmark to the repo would likely be more of a burden than helpful.

  • It's not useful to run in CI: it takes too long, and the results cannot easily be interpreted automatically.
  • If the code is not being run regularly, it will be liable to fall out of date with the rest of the codebase and need maintenance effort. Maintaining code which isn't being used is not fun.

As for the time differences, I carried out the testing on different machines, albeit with similar-ish specs. It's also likely that I was running other tasks at the same time as the benchmark, which would have had an impact. To be honest, the important point of the testing is not the exact timing for any given test, but the growth of the times. With this kind of database access we want to ensure that any growth is linear (or better), not exponential.
