This repository has been archived by the owner. It is now read-only.

reduce synchronize timeouts and increase block batch size #2922

Merged
merged 2 commits into from Apr 6, 2018

@tbenz9
Collaborator

tbenz9 commented Apr 5, 2018

The sendBlkTimeout and sendBlocksTimeout are causing a huge amount of wasted time during the initial blockchain sync. I've synchronized the blockchain 3 times with default settings, and Sia spent an average of 10701 seconds downloading the blocks. When I dropped the timeouts down to 10 seconds, I was able to download the blocks in just 815 seconds.

This PR tries to find a happy middle between accommodating slower peers and having a performant IBD. I believe every peer should be able to respond within the timeout limits specified, and we won't spend so much time waiting for a peer to time out if it's not responding.

Changing MaxCatchUpBlocks to 25 is meant to reduce the overhead of requesting and sending so many blocks. This is harder to test, since even if I ask for 100 blocks, peers only send back 10 at a time right now. This change would have to propagate out to peers before we'll be able to measure its performance implications.

This PR is meant to improve the sync time while we wait for an overhaul of the consensus module.

@tbenz9 tbenz9 requested a review from DavidVorick Apr 5, 2018

@DavidVorick

Member

DavidVorick commented Apr 5, 2018

The speedup is impressive, and agreed that it's worth pursuing, but as discussed in Discord, we need to make sure that we aren't going to be causing problems for people using e.g. Tor or who are behind restrictive internets (North Korea, Iran, China). We need to make sure that Sia works out of the box for everyone.

If we had some sort of parallel downloads, or perhaps a smart timeout that would double in length each time that the timeout was hit, then we could get the speedup without breaking Sia for certain disadvantaged users.

Doing 25 blocks at a time is also potentially a problem, because on the Sia network that could be up to 50 MB of data. From what I understand, users need to be able to fetch that much data before the timeout expires.

I like the direction of this pull request but we need to make sure we're supporting disadvantaged users at the same time that we introduce speedups for our typical users.

@DavidVorick

Member

DavidVorick commented Apr 5, 2018

I think we can probably get most of the speedup by starting with a low timeout and steadily increasing it if we are unable to get any nodes to work at all.

For the consecutive blocks, 25 just seems like a really high number to me. It works well earlier in the chain when blocks are small, but in the long term that is a full 50MB per pass, which makes me really uncomfortable.

The true solution here is to implement headers-first block retrieval, and parallel block downloads from nodes. I know that's a lot more work than some simple tweaks to these constants though.

@tbenz9

Collaborator

tbenz9 commented Apr 5, 2018

Parallel downloads, smart timeouts, and header-first block retrieval are all excellent ideas that should be considered and implemented in the consensus overhaul. This PR is meant to improve our existing code, not rewrite any of it. Those features are way outside the scope of this PR.

I believe the timeouts I propose in this PR are a good balance of performance improvements without causing problems for the disadvantaged users you mentioned. According to https://metrics.torproject.org the average Tor user can download 5MB in about 12 seconds, achieving 3.33Mb/s throughput. Sia's worst case scenario of a 50MB batch size would take almost exactly 120 seconds to download at 3.33Mb/s. Tor users will still be able to download batches of 25 blocks even if every block is completely full.

According to http://www.speedtest.net/global-index/iraq Iraq has an average download speed of 7.22 Mb/s, so users there should have no problem downloading a worst case 50MB batch of blocks in 120 seconds. Iran, China, and every other country I've looked at are faster than Iraq.
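The arithmetic behind these two figures can be checked with a small Go sketch. `transferSeconds` is a hypothetical helper written for this example, and it assumes an ideal link: no latency, no protocol overhead, and megabytes converted to megabits by a plain factor of 8.

```go
package main

import "fmt"

// transferSeconds returns how long it takes to move sizeMB megabytes over a
// link sustaining mbps megabits per second, ignoring latency and overhead.
func transferSeconds(sizeMB, mbps float64) float64 {
	return sizeMB * 8 / mbps
}

func main() {
	const batchMB = 50.0 // 25 blocks at roughly 2 MB each (worst case)
	fmt.Printf("Tor  (3.33 Mb/s): %.0f s\n", transferSeconds(batchMB, 3.33))
	fmt.Printf("Iraq (7.22 Mb/s): %.0f s\n", transferSeconds(batchMB, 7.22))
}
```

Under these assumptions the Tor case comes out at about 120 seconds and the Iraq case at about 55 seconds. The Tor figure sits right at a 2 minute timeout, so any handshake latency or overhead would have to fit into essentially zero headroom.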

I stand by my numbers because I believe they allow us to continue to serve the disadvantaged users while offering a noticeable reduction in the time it takes to download the blocks.

@DavidVorick

Member

DavidVorick commented Apr 6, 2018

For some reason, I was thinking you changed the timeout to 10 seconds, not 2 minutes. I'm not sure why I was thinking that, but my original comment was based on a change to 10 seconds.

We care about medians more than averages, but more specifically we care about the 95th percentile, not the 50th percentile. This is the best resource I could find for that for now: https://www.fastmetrics.com/internet-connection-speed-by-country.php

According to those metrics, even the top 10 countries all have >5% of users at under 4mbps speeds. Some countries have their average speeds (on this particular graph, anyway) under 2mbps (including countries outside of Africa).

In the early days of Sia, a lot of Chinese users were complaining continuously that it was very difficult to sync a node in China. These complaints subsided substantially when we started bumping the timeouts to 2 minutes+. That's the biggest reason I'm being stubborn about this - it'd be really bad from my point of view to change some constants and then suddenly have a huge user segment start having trouble syncing. We largely stopped receiving these complaints after we rolled out network changes to substantially boost the timeout and keepalive constants that we were using.


I am definitely not comfortable doing 25 blocks at a time. Especially because we will have trouble testing this until it's fully rolled out anyway.

For the timeouts, they aren't so bad. RelayHeader and SendBlk are both single-round-trip RPCs. RelayHeader has a tiny payload, and SendBlk only ever goes up to about 2 MB for payload. sendBlocks is a little heavier: the caller both writes a small payload and reads a large payload. Assuming 1 Mbps and a high-latency handshake, which I think is fair given the statistics I linked above, you'd need at least 3 minutes to complete a download of 10 full size blocks.

In this matter, I am strongly inclined to be highly conservative. I want Sia to be a very robust platform, and I want it to be known for being robust. That's a reputation we don't currently have, and I'm cautious about adjusting constants like this, especially when it's been a user-reported problem in the past.

@DavidVorick

Member

DavidVorick commented Apr 6, 2018

If you want something that can be merged quickly, I would be happy if you dropped the change to 25 blocks, and you increased the sendBlocksTimeout to 3 minutes.

@tbenz9

Collaborator

tbenz9 commented Apr 6, 2018

I've updated the constants to numbers you are comfortable with.

I did want to make one last petition for you to reconsider a lower timeout. The fastmetrics.com study you refer to is using data from 2015 (except where "Update" is specified). Internet speeds have increased in the last 3 years.

Also, it's important to consider the Sia target market. I would argue that users looking for cloud storage are aware of their bandwidth limitations, and businesses are likely to have more reliable and higher bandwidth connections than residential users. In other words, I think going after the 95th percentile is the wrong target audience for your product. Furthermore, those Internet speed metrics are dragged down by slow mobile connections (which are not Sia's current target market).

In summary, I think you're being overly conservative in these numbers and causing unnecessarily slow sync times to be able to serve a very small population of users who are not your target market.

edit: changed "audience" to "market" as it's a better term for what I was trying to describe.

@DavidVorick

Member

DavidVorick commented Apr 6, 2018

I appreciate your comments and understand where you are coming from. You are correct that Sia is not as useful to people with slow internet speeds, and if your internet is in the bottom 5% of the country, you are probably not the Sia target market.

But I think the vision for decentralized infrastructure goes quite a bit deeper than this, and I also think that we can implement some coding solutions incrementally which will speed up Sia substantially for the high-end users without barricading low-end users.

@DavidVorick DavidVorick merged commit b428755 into NebulousLabs:master Apr 6, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

@lukechampine

Member

lukechampine commented Apr 7, 2018

@tbenz9, let's make an issue for "Speed up IBD using smarter timeouts" that describes the problem and suggests solving it by starting with a short timeout and extending it as needed.

@tbenz9 tbenz9 deleted the tbenz9:sync-speedup branch Apr 10, 2018