[WIP] Use keyset delta algorithm to speed up GetData requests #3896

Closed
wants to merge 1 commit

Conversation

stejbac
Contributor

@stejbac stejbac commented Jan 13, 2020

When Bisq starts, it sends simultaneous PreliminaryGetDataRequest packets to two seed nodes to sync the latest trade statistics, offer book, mailbox and other data, followed by a later GetUpdatedDataRequest packet. The requests contain a list of keys of all the data to exclude (as we already have it) and are consequently about 2MB each. The responses to the first two GetData requests are typically around 3MB, it seems, usually consisting mostly of offer and mailbox items (as those are not persisted and have high turnover anyway).

Thus there is typically about 10MB (in total) being sent/received in the preliminary GetData requests/responses to/from the seed nodes. On slow networks, this appears to frequently result in timeouts and difficulty starting up the application, as the 3MB response arrives after the 90s timeout window and gets rejected.

In an attempt to alleviate the problem, this PR replaces the excluded_keys list in the GetData requests with a delta structure (that is a sort of binary checksum/linear code syndrome), which is XORed with a version computed equivalently at the receiving node, to obtain a decodable list of keys. This is the symmetric difference of the two sets of keys at each node, which tells the responding node which keys are missing from the peer.
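
For readers unfamiliar with this kind of structure, the following is a minimal, hypothetical sketch of the general idea in Java (an invertible-Bloom-lookup-table-style delta, which is one way to realise such a checksum/syndrome). It is not the structure implemented in this PR; all class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch only; not the structure implemented in this PR.
// Each 64-bit hashed key is XOR-folded into a few buckets. XORing two such tables
// cancels every key both sides have, leaving only the symmetric difference, which
// is recovered by repeatedly "peeling" buckets that hold exactly one key.
final class KeySetDeltaSketch {
    private static final int BUCKETS_PER_KEY = 3;
    private final long[] keyXor;    // XOR of the hashed keys folded into each bucket
    private final long[] checkXor;  // XOR of a secondary checksum, to detect "pure" buckets
    private final int[] count;      // signed number of keys folded into each bucket

    KeySetDeltaSketch(int bucketCount) {
        keyXor = new long[bucketCount];
        checkXor = new long[bucketCount];
        count = new int[bucketCount];
    }

    void add(long hashedKey) {
        fold(hashedKey, 1);
    }

    // XOR with a peer's table of the same size; keys present on both sides cancel out.
    void subtract(KeySetDeltaSketch other) {
        for (int i = 0; i < count.length; i++) {
            keyXor[i] ^= other.keyXor[i];
            checkXor[i] ^= other.checkXor[i];
            count[i] -= other.count[i];
        }
    }

    // Attempt to list the symmetric difference; empty if the delta is too dense to decode.
    // (The sign of count[i] tells which peer holds a given key; ignored here for brevity.)
    Optional<List<Long>> decode() {
        List<Long> difference = new ArrayList<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < count.length; i++) {
                if (Math.abs(count[i]) == 1 && checkXor[i] == checksum(keyXor[i])) {
                    long key = keyXor[i];        // bucket is "pure": it holds exactly this key
                    difference.add(key);
                    fold(key, -count[i]);        // peel the key out of all of its buckets
                    progress = true;
                }
            }
        }
        for (int c : count)
            if (c != 0)
                return Optional.empty();         // undecodable: difference too large for this size
        return Optional.of(difference);
    }

    private void fold(long hashedKey, int sign) {
        // Note: a real implementation would pick BUCKETS_PER_KEY *distinct* buckets.
        for (int j = 0; j < BUCKETS_PER_KEY; j++) {
            int i = Math.floorMod((int) mix(hashedKey + j), count.length);
            keyXor[i] ^= hashedKey;
            checkXor[i] ^= checksum(hashedKey);
            count[i] += sign;
        }
    }

    private static long checksum(long hashedKey) {
        return mix(hashedKey * 0x9E3779B97F4A7C15L);
    }

    private static long mix(long x) {            // splitmix64 finaliser as a cheap mixer
        x ^= x >>> 30; x *= 0xBF58476D1CE4E5B9L;
        x ^= x >>> 27; x *= 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }
}
```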

The delta is decodable provided the symmetric difference isn't too large compared to the physical delta size (powers of two from 8KB to 128KB). With the data structure I used, an 8KB delta can reliably decode up to 550 keys and a 128KB delta can reliably decode up to 10450 keys. Since this holds true only when the keys are randomised, they are all hashed to a fixed 64-bit width using a randomly seeded secure PRF (in this case SipHash) - 64 bits is enough to ensure a very low probability of collision per request, and deliberately causing collisions is infeasible since the random seed/hash-salt is an ephemeral shared secret.
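
As an illustration of the hashing step, the sketch below reduces arbitrary storage keys to 64-bit values with a randomly seeded SipHash, here using Guava's sipHash24 for convenience; the class and method names are hypothetical, not the PR's actual helpers.

```java
import java.security.SecureRandom;

import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;

// Hypothetical sketch of the key-hashing step: every storage key is reduced to a
// fixed 64-bit value with SipHash, keyed by a random per-request seed so that an
// attacker cannot precompute colliding keys. (Guava's sipHash24 is used here for
// illustration; the PR's actual helper classes may differ.)
final class KeyHasher {
    private final HashFunction sipHash;

    KeyHasher(long seedLo, long seedHi) {
        this.sipHash = Hashing.sipHash24(seedLo, seedHi);
    }

    // The random seed would be shared with the peer as part of the request.
    static KeyHasher withRandomSeed() {
        SecureRandom rnd = new SecureRandom();
        return new KeyHasher(rnd.nextLong(), rnd.nextLong());
    }

    long hashKey(byte[] storageKey) {
        return sipHash.hashBytes(storageKey).asLong();
    }
}
```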

In case decoding fails, the responding peer estimates the size of the undecodable delta and sends that information back, so that the requesting peer can retry with a delta structure big enough to accommodate the expected symmetric difference. The keys can also be restricted to an arbitrary range of 64-bit values, which effectively filters out items at random, ensuring that the symmetric difference will be reliably decoded in the next request. The code will keep retrying until it gets everything.

The above requires new fields to be added to the PreliminaryGetDataRequest, GetUpdatedDataRequest and GetDataResponse proto definitions. I've tried to do this in a way which is backwards compatible by adding a provisional GET_DATA_KEY_SET_DELTA capability and only using the new delta structure in requests to peers which flag the capability.

This PR is a draft and still needs some work (e.g. to the unit tests and commit messages). I'm also considering making the following possible further enhancements:

  • In the initial requests to the pair of seed nodes, use the range filter to split the keyset in two and send a different half to each seed node, so each one only sends back approximately 1.5 MB, cutting the network load in half again (see the sketch after this list). After getting one response, wait a little while (rather than the usual 90s) to get the other response before retrying.

  • Now that the requests are cheap, it might make sense to limit the responses to 1MB or less and make repeated requests to sync everything, rather than getting it all in one go. This should make startup more reliable on very slow networks. Also possibly add a corresponding progress bar to the splash screen.

  • Optimistically persist/cache the offer book and mailbox entries for faster startup next time, in case the application was only shut down for a few hours or days. Since the response now contains a list of unrecognised keys, any items that don't match can be purged from the cache and the rest promoted to the protected store (as the seed node definitely still has them).
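
A minimal sketch of the range split mentioned in the first bullet above: each of the two simultaneous requests would be restricted to one half of the 64-bit hashed-key space. The names are illustrative, not actual Bisq APIs.

```java
// Hypothetical sketch: restrict each of the two initial requests to one half of the
// 64-bit hashed-key space, so each seed node only returns the missing items whose
// hashed keys fall into "its" half.
final class HashedKeyRange {
    final long fromInclusive;
    final long toInclusive;

    HashedKeyRange(long fromInclusive, long toInclusive) {
        this.fromInclusive = fromInclusive;
        this.toInclusive = toInclusive;
    }

    boolean contains(long hashedKey) {
        return hashedKey >= fromInclusive && hashedKey <= toInclusive;
    }

    // One half-range per seed node for the two simultaneous preliminary requests.
    static HashedKeyRange lowerHalf() { return new HashedKeyRange(Long.MIN_VALUE, -1L); }
    static HashedKeyRange upperHalf() { return new HashedKeyRange(0L, Long.MAX_VALUE); }
}
```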

I'm not sure whether it would be best to include the above in this PR or consider them for later PRs.

@julianknutsen
Contributor

julianknutsen commented Jan 13, 2020

My 2c since I was the last one in here. I can't commit to a full review of this even out of draft, but want to help avoid the pitfalls I fell into during my time in this layer and steer you towards more beneficial work.

How much testing have you done with this?

My understanding of the startup path from @chimp1984 is that capabilities aren't set on these initial requests, so switching on them isn't going to work. Here is the relevant discussion on my proposed changes to these requests a few cycles back: #3610 (comment)

The GetDataRequest would have additional fields, which does not break backward compatibility. We just need to deal with the valid case that such fields can be null if the peer is not updated. We cannot use the capability feature here, as the GetDataRequest is the first message and the capabilities are not known at that moment. Though if it turns out that it's absolutely needed, we could work on a capability exchange protocol, which is needed at some point anyway (exchanging capabilities would then be the very first step). Currently the capability feature is a bit unreliable, as it depends on the context whether the peers have exchanged it.

Some other minor notes after a quick review:

  1. There seem to be quite a few intermingled changes in this one commit. Some performance, some API changes, the actual deltas... For ease of review it would be great to split out the smaller changes so it is easier to see what is actually changing.
  2. This code had enough random booleans in it to begin with and it would be a shame to move it back towards an unmaintainable mess. Consider separate APIs for V1 and V2 that can be independently unit and integration tested without polluting a single API signature.
  3. Since this is the first feature that needs backwards compatibility, it is probably worth designing tests that specifically run the V2 APIs against a V1 store.
  4. Is there a release test plan for this? It is non-trivial work to handle the backwards compatibility and seed node roll-out, especially since full-stack integration tests don't exist.

This is a risky enough area and a non-trivial amount of work. It would have been good to see a proposal so the work involved to review, merge, and test it could be analyzed before seeing the code. Everything right now needs a value-add, and I'd like to see data (known bugs, user reports, etc.) about whether the affected users for this performance patch will bring in the revenue to offset the cost. It might be better to target the known cost sinks for Bisq until the DAO is more self-sufficient.

@stejbac
Contributor Author

stejbac commented Jan 13, 2020

Yes, I was planning to tidy it up a bit and separate it into multiple commits. I didn't want to sink too much more effort into it before getting some feedback. Most of the testing has been of the delta structure, although I did do some manual testing of the startup of the application on mainnet against a single seed node running the custom code on a separate machine (over Tor).

@chimp1984
Contributor

@stejbac
I did not have time to look into it yet. But as far as I remember, we also add the keys for trade statistics and account age witness data, which are by far the biggest chunk (I think it's about 80k objects in total).
There is a rough idea to split the historical trade statistics and account age witness data from the more recent data and only handle the recent data as we do now. With each release we would update the historical data, so that the recent portion only covers the last 1-2 months. As said, it's a very rough idea and maybe there are better approaches, but if it were implemented I assume it would make the request data radically smaller. I still appreciate your approach of a more efficient data structure for the request; without looking closer, it reminds me of Bloom filters...

@julianknutsen
Thanks for your input and for taking the time to comment!

@sqrrm
Member

sqrrm commented Jan 17, 2020

I like the idea, and it would be good to add the capability exchange as well, as it will help in other areas.

I agree with @julianknutsen that this will require a fair bit of testing and it's not a trivial change. The question is if it's worth the effort to do it now or wait.

A lot of users are complaining about slow or failed startups, so it's surely an issue that, when solved or mitigated, would increase usability and with that attract more users. As we don't have a good measure of the number of users, and an even worse measure of the number of users having trouble with this, it's hard to make a well-founded decision. It would be good to at least have an estimate of the amount of work needed to get this to production.

@stejbac
Contributor Author

stejbac commented Jan 18, 2020

I was largely motivated to look into this due to problems I've experienced from time to time starting up the application on my home network (and also some other poorer networks like a shared hotel WiFi and a phone WiFi hotspot in a rural area with a weak signal). If the local network is particularly bad then I've found it's basically impossible to start the application, due to repeated timeouts (without, say, manually raising them in the code from 90s to 180s).

I think it will be necessary to address the issue fairly soon, for scaling purposes, since the ~2MB request will grow over time as the trade statistics and witness data mount up. Also the ~3MB response (consisting mainly of offers and mailbox entries) will presumably increase in proportion to the trading volume, which will hopefully increase a lot during the next rally (whenever that is). So a sharply increasing proportion of users is likely to be affected over time.

@chimp1984
Contributor

chimp1984 commented Jan 18, 2020

I think we have to separate different types of messages.

Trade statistics and account age witness objects are append-only and ever-growing. Together they are about 80,000 objects, as far as I remember.

Offers and mailbox messages do not grow linearly with the number of users or the trade volume, IMO, as the conversion time from an open offer to a trade will be faster with more traders. Offers and mailbox messages stay in the same range most of the time, as far as I remember. Currently it's 383 offers and 455 mailbox messages [1].

Trade statistics and account age witness use a 20-byte hash (other data uses 32 bytes).
The total size for 80k objects would be 1.6 MB, so that's the big chunk of the request. If we treated those objects differently (as proposed above, using some splitting of historical data, or some other idea), we would reduce the issue with the large request quite a bit (and reduce the load on seed nodes by processing fewer keys).

I think the optimisation of the keys (e.g. a Bloom-filter-like structure) is still great but might get lower priority if the trade statistics and account age witness data are treated differently.

So I would suggest treating both problems separately and developing a roadmap for how to implement and deploy both.

Regarding deployment:
I think we cannot use the capability feature here, as at the time of the first request the requesting node does not know the seed node's capabilities (though I'm not 100% sure about that).
But seed nodes are under our control, so we can ensure they are all deployed with the feature before rolling it out to users. The only remaining problem would be that, in case no seed node is available, a normal node would be chosen for the GetData requests; if that node were not updated, a disconnection would be triggered as it receives an unknown message. But I think we can ignore that edge case, as it would only become an issue if all seed nodes go offline (which has happened only once in 3.5 years).

I also suggest using a new API (e.g. new message types) instead of overloading the existing one.

It would still require a lot of care and testing, but it might not be as difficult to deploy as other new features which require matching versions between regular nodes.

[1] From http://104.248.88.175

BlindVotePayload=181
RefundAgent=1
Filter=2
TempProposalPayload=87
MailboxStoragePayload=455
ProposalPayload=258
Alert=1
Mediator=2
OfferPayload=383
SignedWitness=1768
BSQ Blocks=613414

@stale

stale bot commented Feb 17, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@freimair
Member

freimair commented Feb 27, 2020

I have had this issue in mind for months now (and obviously missed this suggestion). Here is what I would suggest. Please note that this is the result of previous discussions with @chimp1984 and others.

Proposal

  • bin up the data
  • create a "special" key for addressing bins
  • tie those bins to the application version

Example

By adding the info "I am Bisq v1.2.1" to the list of known objects, we know what objects the client has - namely, objects shipped with the data stores of v1.2.1.
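
A minimal sketch (from the responding node's point of view) of how such a "special" key might be expanded, assuming per-release bins of historical keys; all names are illustrative, not an actual Bisq API. A responder that doesn't recognise the marker just treats it as an ordinary unmatched key, which is the robustness point listed below.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "special key" idea: the requester sends a single version
// marker (e.g. "bisq-v1.2.1") instead of every historical key, and the responder
// expands it to the keys bundled with that release's data stores.
final class VersionBinnedExclusion {
    private final Map<String, Set<String>> keysShippedWithRelease;

    VersionBinnedExclusion(Map<String, Set<String>> keysShippedWithRelease) {
        this.keysShippedWithRelease = keysShippedWithRelease;
    }

    Set<String> expandExcludedKeys(Set<String> keysFromRequest) {
        Set<String> excluded = new HashSet<>();
        for (String key : keysFromRequest) {
            Set<String> bin = keysShippedWithRelease.get(key);
            if (bin != null)
                excluded.addAll(bin);   // "special" key: stands for a whole historical bin
            else
                excluded.add(key);      // ordinary key (or unknown marker): excluded as before
        }
        return excluded;
    }
}
```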

Discussion

Advantages

  • reduce the number of historical keys to be sent to O(1) (right now, it is O(n); @stejbac's solution is probably O(ln(n))(?))
  • only touches trade statistics and account age witness objects (which are > 75% of all objects and growing)
  • no new fields needed in the protocol, thus, easier upgrade procedure
  • more robust? If the responding node does not know the "special" key, it just sends all the data
  • much simpler and therefore, easier to maintain compared to @stejbac's solution

Disadvantages

  • introduces a dependency on "time", i.e. the Bisq version

Aftermath

Please keep in mind that this is by no means complete. There are some hidden efforts for sure. But as a general direction, it would line up with the discussions and results of previous months.

@stale

stale bot commented Mar 28, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stejbac
Contributor Author

stejbac commented Apr 4, 2020

The bandwidth requirements are not really O(ln(N)) in the keyset size - it's more or less proportional to the size of the symmetric difference of the keysets, that is, the number of keys the seed node has that the peer doesn't (in the case that the peer doesn't somehow have any keys unknown to the seed node).

So as the datasets grow, the request shouldn't really grow in size. But if the peer has been offline for a while, the initial request will fail due to the difference in the keysets being too large for the initial default delta data structure to handle. In that case, it will retry with a larger request data structure that's highly likely to be able to accommodate the difference. So I think that's the main cost of this method - there's no guarantee the initial request to the seed nodes will succeed, which adds some extra delay to the startup of nodes that have been offline for a while (say for months, or maybe days/weeks in the future if Bisq does 10x).
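
To make the retry behaviour concrete, here is a hedged sketch of the loop described above, assuming the seed node returns an estimate of the symmetric difference when it cannot decode the delta; the types and method names are illustrative, not the PR's actual API.

```java
import java.util.OptionalInt;

// Hypothetical sketch of the retry behaviour: if the seed node reports that it could
// not decode the delta, retry with the next power-of-two delta size expected to cover
// the estimated symmetric difference.
final class DeltaRequestLoop {
    private static final int MIN_DELTA_BYTES = 8 * 1024;
    private static final int MAX_DELTA_BYTES = 128 * 1024;

    interface SeedNode {
        // Empty: the delta decoded and the missing data came back in the response.
        // Present: the seed node's estimate of the (undecodable) symmetric difference.
        OptionalInt requestData(int deltaSizeBytes);
    }

    void syncWith(SeedNode seedNode) {
        int deltaSize = MIN_DELTA_BYTES;
        while (true) {
            OptionalInt estimate = seedNode.requestData(deltaSize);
            if (estimate.isEmpty())
                return;                                   // synced
            while (deltaSize < MAX_DELTA_BYTES && !decodable(estimate.getAsInt(), deltaSize))
                deltaSize *= 2;
            // A real implementation would also restrict the hashed-key range once even
            // the largest delta size cannot cover the estimated difference.
        }
    }

    private static boolean decodable(int differingKeys, int deltaSizeBytes) {
        // Conservative linear scaling of the ~550-keys-at-8-KB figure quoted in the PR text.
        return differingKeys <= 550L * deltaSizeBytes / MIN_DELTA_BYTES;
    }
}
```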

Perhaps it's better to think of the algorithm as being like that of rsync rather than Bloom filters. (It doesn't use a rolling hash, but like rsync it uses linear coding to detect and locate differences between two datasets.)

Is it worth picking this up again soon to complement the approach taken in bisq-network/projects#25, or leaving it a while perhaps?

@stale

stale bot commented May 4, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the was:dropped label May 4, 2020
@cd2357
Contributor

cd2357 commented May 4, 2020

Is this still being worked on? I wanted to have a look at it and possibly try it out, but the latest commit is from January.

@stale stale bot removed the was:dropped label May 4, 2020
@freimair
Member

freimair commented May 5, 2020

There is a project proposal in the making for possibly integrating this into Bisq.

The current status is that this is not exactly what we need, since our current number of objects would already overburden the solution by some factor. However, with bisq-network/projects#25 in place, this becomes interesting again. At least it is a strong candidate.

@stale

stale bot commented Jun 4, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the was:dropped label Jun 4, 2020
@stale stale bot removed the was:dropped label Jun 7, 2020
@stejbac
Contributor Author

stejbac commented Jun 7, 2020

I've rebased onto master, as these changes were made a while ago.

@stale

stale bot commented Jul 7, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the was:dropped label Jul 7, 2020
@stale

stale bot commented Jul 14, 2020

This issue has been automatically closed because of inactivity. Feel free to reopen it if you think it is still relevant.

@stale stale bot closed this Jul 14, 2020