
[GOAL2-777] Catchup performance : avoid retrying unless previous block was retrieved #31

Merged · 1 commit · Jun 17, 2019

Conversation

@tsachiherman (Contributor)

Existing code would retry retrieving a block regardless of whether the previous block was retrieved successfully. While the extra attempt is functionally harmless, it generates excessive network traffic when the first block's retrieval is delayed.

We want to keep the happy path retrieving blocks as fast as possible, while slowing down once we run into network failures.
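As a sketch of the idea (hypothetical helper names; the real go-algorand fetchAndWrite differs), the retry can be gated on the previous round having landed, so the first attempt stays fast while retries are paced by actual progress:

```go
package main

import (
	"errors"
	"fmt"
)

// waitForRound returns a channel that is closed once round r is in the
// ledger; stubbed here so the example runs standalone.
func waitForRound(r uint64) <-chan struct{} {
	ch := make(chan struct{})
	close(ch) // pretend round r is already committed
	return ch
}

// fetchAndWrite tries to fetch round r. The first attempt goes out
// immediately (fast happy path); each retry first waits for round r-1,
// so a stalled network slows us down instead of multiplying traffic.
func fetchAndWrite(r uint64, fetch func(uint64) error, maxRetries int) error {
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if attempt > 0 {
			<-waitForRound(r - 1)
		}
		if err := fetch(r); err == nil {
			return nil
		}
	}
	return errors.New("exhausted retries")
}

func main() {
	attempts := 0
	fetch := func(r uint64) error {
		attempts++
		if attempts < 2 {
			return errors.New("transient failure")
		}
		return nil
	}
	err := fetchAndWrite(100, fetch, 3)
	fmt.Println("err:", err, "after", attempts, "attempts")
}
```

The key property: a healthy network never pays the gate (the first attempt fires immediately), while a failing fetch for round r cannot spin retries faster than blocks are actually being committed.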

@tsachiherman (Contributor, Author)

Just a small note: these changes are only an optimization. I have no reason to believe that the existing network would not be able to sustain the expected load (by an order of magnitude).

@Vervious (Contributor) commented Jun 14, 2019

I am worried that this might adversely affect parallelism: consider the following scenario:

We call fetchAndWrite on the next 50 blocks; all the fetchers (in parallel) hit the same peer X for blocks r + 1, r + 2, ... r + 50; peer X happens to be misconfigured, so all of these requests fail.

But now all fifty requests are serialized, waiting for seed lookbacks: round r + 3 is waiting for r + 1, r + 5 is waiting for r + 3, ..., r + 50 is waiting for r + 48. Because of one faulty peer, catchup is no longer parallelized until all 50 blocks are fetched.

Is this a problem? Let me think about it some more.
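A toy model of this scenario (all names illustrative, not the actual go-algorand code) shows the head-of-line blocking: each retry gates on the previous round, so the fifty retries complete strictly in order rather than in parallel:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 50
	base := uint64(1000)

	// committed[r] is closed once round r has been written to the ledger.
	committed := make(map[uint64]chan struct{}, n+1)
	for r := base; r <= base+n; r++ {
		committed[r] = make(chan struct{})
	}
	close(committed[base]) // round r itself is already in the ledger

	var wg sync.WaitGroup
	order := make(chan uint64, n)
	for i := 1; i <= n; i++ {
		r := base + uint64(i)
		wg.Add(1)
		go func() {
			defer wg.Done()
			// First attempt: the misconfigured peer fails all 50 requests.
			// Retry: gated on the previous round, so the chain serializes.
			<-committed[r-1]
			order <- r // this retry succeeds
			close(committed[r])
		}()
	}
	wg.Wait()
	close(order)
	for r := range order {
		fmt.Println("fetched round", r) // prints strictly in ascending order
	}
}
```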

@zeldovich (Contributor)

> I am worried that this might adversely affect parallelism: consider the following scenario:
>
> We call fetchAndWrite on the next 50 blocks; all the fetchers (in parallel) hit the same peer X for blocks r + 1, r + 2, ..., r + 50; X happens to be misconfigured, so all of these requests fail.
>
> But now all fifty requests are serialized, waiting for seed lookbacks, because of one faulty peer, and catchup is no longer parallelized until all 50 blocks are fetched.
>
> Is this a problem? Let me think about it some more.

In this scenario, peer X will be taken out of the candidate peer list, and subsequent block fetches will be pipelined properly.
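A minimal sketch of that mitigation, assuming a failed fetch simply drops the peer from the candidate list (hypothetical types; not the actual go-algorand peer-selection code):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type peerList struct {
	mu    sync.Mutex
	peers []string
}

// drop removes a misbehaving peer from the candidates.
func (pl *peerList) drop(bad string) {
	pl.mu.Lock()
	defer pl.mu.Unlock()
	for i, p := range pl.peers {
		if p == bad {
			pl.peers = append(pl.peers[:i], pl.peers[i+1:]...)
			return
		}
	}
}

// fetchFrom tries peers until one succeeds, dropping each failure.
func (pl *peerList) fetchFrom(fetch func(peer string) error) error {
	for {
		pl.mu.Lock()
		if len(pl.peers) == 0 {
			pl.mu.Unlock()
			return errors.New("no candidate peers left")
		}
		p := pl.peers[0]
		pl.mu.Unlock()

		if err := fetch(p); err != nil {
			pl.drop(p) // peer X never gets asked again
			continue
		}
		return nil
	}
}

func main() {
	pl := &peerList{peers: []string{"peerX", "peerY"}}
	err := pl.fetchFrom(func(p string) error {
		if p == "peerX" {
			return errors.New("misconfigured")
		}
		fmt.Println("fetched block from", p)
		return nil
	})
	fmt.Println("err:", err)
}
```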

@Vervious (Contributor) commented Jun 14, 2019

> In this scenario, peer X will be taken out of the candidate peer list, and subsequent block fetches will be pipelined properly.

The peer list is checked before we call client.GetBlockBytes, and it seems possible that we could hit the same peer in parallel until a request errors.

That being said, I looked at the code again: we select peers randomly and take into account how many active fetches each has. So it's probably okay.
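A rough sketch of that selection behavior, assuming peers are picked at random from those with the fewest fetches in flight (hypothetical; the real heuristic may differ):

```go
package main

import (
	"fmt"
	"math/rand"
)

type peer struct {
	addr     string
	inFlight int
}

// pick chooses randomly among the peers with the fewest active fetches,
// so parallel fetchers spread across peers instead of piling onto one.
func pick(peers []peer) *peer {
	min := peers[0].inFlight
	for _, p := range peers[1:] {
		if p.inFlight < min {
			min = p.inFlight
		}
	}
	var least []*peer
	for i := range peers {
		if peers[i].inFlight == min {
			least = append(least, &peers[i])
		}
	}
	return least[rand.Intn(len(least))]
}

func main() {
	peers := []peer{{"peerX", 0}, {"peerY", 0}, {"peerZ", 0}}
	for i := 0; i < 6; i++ {
		p := pick(peers)
		p.inFlight++
		fmt.Println("assign fetch", i, "to", p.addr)
	}
}
```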

derbear merged commit 7b2547b into algorand:master on Jun 17, 2019.
pzbitskiy pushed a commit to pzbitskiy/go-algorand that referenced this pull request on Apr 6, 2020.