
Store each ticket in its own DB bucket. #243

Merged
3 commits merged into decred:master from benching2 on May 24, 2021

Conversation

@jholdstock (Member) commented May 5, 2021

NOTE: This contains a backwards incompatible database migration, so if you plan to test it, please make a copy of your database first.

Moves tickets from a single database bucket containing JSON-encoded strings to a separate bucket for each ticket.

This change is to preemptively deal with scaling issues seen with databases containing tens of thousands of tickets. (Closes #223)

```go
// TODO is this dodgy?
if bkt.Get(ConfirmedK)[0] == byte(1) {
	ticket.Confirmed = true
}
```
@jholdstock (Member, Author)

Highlighting this particularly icky piece of code for reviewers: any better ideas for how to encode/decode bool and []byte?
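One common answer to the bool/[]byte question is a pair of tiny helpers that also guard against the missing-key case (bbolt's `Get` returns nil when a key is absent, so indexing `bkt.Get(ConfirmedK)[0]` directly can panic). A minimal sketch; the helper names `boolToBytes`/`bytesToBool` are hypothetical, not vspd's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// boolToBytes encodes a bool as a single byte, suitable for
// storing as a key/value-store value.
func boolToBytes(b bool) []byte {
	if b {
		return []byte{1}
	}
	return []byte{0}
}

// bytesToBool decodes a value written by boolToBytes. Returning an
// error on nil or wrong-length input avoids the panic risk of
// indexing a Get result when the key is missing.
func bytesToBool(v []byte) (bool, error) {
	if len(v) != 1 {
		return false, errors.New("invalid bool encoding")
	}
	return v[0] == 1, nil
}

func main() {
	got, err := bytesToBool(boolToBytes(true))
	fmt.Println(got, err) // true <nil>

	_, err = bytesToBool(nil) // simulates a missing key
	fmt.Println(err != nil)   // true
}
```

The error return makes the "key never written" case explicit instead of crashing the upgrade path.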

@jholdstock jholdstock force-pushed the benching2 branch 3 times, most recently from d2b2748 to 219aee0 Compare May 6, 2021 10:06
@jholdstock (Member, Author)

  • Validated that key length makes no difference to database disk usage. Previously we used short keys because they were JSON encoded and contributed to increasing disk usage. Now we can use long, descriptive keys.
  • Running on https://testnet-vsp.jholdstock.uk (database contains ~15k tickets). Memory usage and upgrade duration negligible.
  • Used benchmark code from Improve ticket iterations & lookups #223 to test a database containing ~200k tickets:
    • Old code
      • InsertNewTicket 2.39s
      • CountAllTickets 2.41s
    • New code
      • InsertNewTicket 0.30s
      • CountAllTickets 0.27s

Ready for review

@jholdstock jholdstock marked this pull request as ready for review May 6, 2021 11:04
@dnldd (Member) commented May 12, 2021

Still uneasy about the db size growth here. One thing I think we haven't looked at yet is using a custom encoder/decoder for the serialization/deserialization. We can borrow some ideas from how the block header is serialized/deserialized in dcrd. readElement and writeElement there should be good references to put something together. Thoughts?

@jholdstock (Member, Author) commented May 13, 2021

While this kind of solution is useful for dcrd where messages need to be kept as tiny as possible for fast propagation through the network, I feel like that is a bit overkill for this case. It would add too much new complexity, whereas both the old solution and the new solution are relatively simple.

Interested to hear thoughts from @dhill as an experienced sysadmin. Would you mind reviewing comments on #223, particularly this one?

To add some more info, the database size with the new scheme is approximately:

200,000 tickets = 864 MB
400,000 tickets = 1,704 MB

The legacy VSP with the most voted tickets is dcr.stakeminer.com with 292,000 tickets, and it launched over 5 years ago.

@jholdstock (Member, Author)

I've spent a little more time on this. I've tried tweaking the bolt DB FillPercent parameters, I've tried reducing the number of buckets by encoding the vote choices as a JSON string rather than a bucket, and I've tried using the internal compaction code from the bolt library to make the db smaller. No progress.

In order to reduce file size while keeping the performance gains introduced by this PR, we either need to investigate a new encoding, as suggested by dnldd, or migrate to a new database.

@jholdstock (Member, Author)

One option which I think could be viable is to delete the raw fee transaction hex from the database once it has been broadcast and confirmed. That should reduce the space needed by each ticket quite significantly. I will run some tests.
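The pruning idea can be sketched as a small pure function: once the fee tx is confirmed, drop the raw hex but keep the hash so the transaction is still identifiable on chain. Field names here are illustrative, not vspd's actual schema:

```go
package main

import "fmt"

// feeTxRecord is a hypothetical slice of the per-ticket data.
type feeTxRecord struct {
	FeeTxHex  string // raw broadcast transaction, hex encoded (large)
	FeeTxHash string // kept so the tx can still be looked up on chain
	Confirmed bool
}

// pruneFeeTx clears the raw hex once the fee tx is confirmed and
// returns the number of bytes reclaimed for this ticket. Unconfirmed
// records are left untouched so the tx can still be rebroadcast.
func pruneFeeTx(r *feeTxRecord) int {
	if !r.Confirmed || r.FeeTxHex == "" {
		return 0
	}
	saved := len(r.FeeTxHex)
	r.FeeTxHex = ""
	return saved
}

func main() {
	rec := feeTxRecord{FeeTxHex: "0100...", FeeTxHash: "3f2a...", Confirmed: true}
	fmt.Println(pruneFeeTx(&rec) > 0)  // true: raw hex reclaimed
	fmt.Println(rec.FeeTxHex == "")    // true: hash alone remains
}
```

Since a raw transaction is typically a few hundred bytes of hex per ticket, reclaiming it across hundreds of thousands of tickets plausibly accounts for the size reduction reported later in the thread.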

@dnldd (Member) commented May 18, 2021

I think removing the transaction hex is a good first step, that should get the db size down. Still think it'd be worthwhile getting a custom encoder/decoder done in the future borrowing from what's working in dcrd currently. That can come later if the confirmed tx hex are getting pruned from the db.

@jholdstock (Member, Author)

Marking as a draft, as this is now rebased and depends on #260.

@jholdstock jholdstock force-pushed the benching2 branch 2 times, most recently from 46ebfd4 to c4e3b30 Compare May 20, 2021 08:05
@jholdstock (Member, Author)

I have done some extra benchmarking to ensure the changes from this PR and #260 are working as intended.

Testing a database with 200k tickets:

|                 | v1           | v2 (#260)    | v3 (this PR) |
|-----------------|--------------|--------------|--------------|
| DB size         | 366 MB       | 181 MB       | 253 MB       |
| InsertNewTicket | 2.37 seconds | 1.69 seconds | 0.86 seconds |
| CountTickets    | 2.37 seconds | 1.74 seconds | 0.27 seconds |

The growth of all of these metrics is still linear.

Rebased onto master and ready for review.

@jholdstock jholdstock marked this pull request as ready for review May 20, 2021 10:08
@jholdstock (Member, Author) commented May 20, 2021
@jholdstock jholdstock force-pushed the benching2 branch 2 times, most recently from 6b7a6d0 to 06cd0f3 Compare May 21, 2021 04:27
@jholdstock (Member, Author)

Rebased onto master to pick up the latest linter.

@jholdstock jholdstock merged commit 20cb546 into decred:master May 24, 2021
@xaur commented May 31, 2021

Benchmarking code: bench_test.txt

Would this code be useful in the repo?

Also curious why InsertNewTicket took 0.30 s in the May 6 comment but 0.86 s in the May 20 benchmark, while both are on a 200K ticket database. I would expect the latter would be even faster after #260.

@jholdstock (Member, Author)

Adding the benchmark to the repo would likely be more of a burden than helpful.

  • It's not useful to run in CI: it takes too long, and the results cannot easily be interpreted automatically.
  • If the code is not being run regularly, it will be liable to fall out of date with the rest of the codebase and need maintenance effort. Maintaining code which isn't being used is not fun.

As for the time differences, I carried out the testing on different machines, albeit with similar-ish specs. It's also likely that I was running other tasks at the same time as the benchmark, which would have had an impact. To be honest, the important point of the testing is not the exact timing for any given test, but the growth of the times. With this kind of database access we want to ensure that any growth is linear (or better), not exponential.
