
prb-wm – too many concurrent requests with many workers #1285

Open
ngrewe opened this issue May 26, 2023 · 9 comments
@ngrewe

ngrewe commented May 26, 2023

When the new prb-wm service handles more than ~250 workers with default settings, we start seeing the following kind of error:

stopped due to error: Rpc error: RPC error: Configured max number of request slots exceeded

This error comes from jsonrpsee and occurs because some client is built with a low limit on concurrent requests (the default is 256) and prb uses up all of them. I suspect it's this client here, but I'm not familiar enough with the architecture to be sure:

https://github.com/Phala-Network/phala-blockchain/blob/8fe05eb72b76f4939d0c03e62a3fc7e58b260a5c/standalone/prb/src/datasource.rs#LL555C1-L555C85

We seem to have worked around this (somewhat) by increasing CACHE_SIZE, but it would be nice to have the number of concurrent connections configurable.
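For context, the error string comes from jsonrpsee's bookkeeping of in-flight request slots: the client is created with a fixed slot budget (256 by default, adjustable on the client builder via `max_concurrent_requests`), and a request fails immediately once all slots are taken. Below is a minimal std-only sketch of that behaviour; it is an illustration of the slot-exhaustion mechanism, not jsonrpsee's actual code, and the numbers mirror the ~250-worker scenario above:

```rust
// Sketch (not jsonrpsee itself) of the request-slot bookkeeping behind
// "Configured max number of request slots exceeded": the client hands
// out at most `max` slots, and a request is rejected when none are free.
use std::sync::atomic::{AtomicUsize, Ordering};

struct RequestSlots {
    in_flight: AtomicUsize,
    max: usize, // jsonrpsee's default budget is 256
}

impl RequestSlots {
    fn new(max: usize) -> Self {
        Self { in_flight: AtomicUsize::new(0), max }
    }

    /// Try to reserve a slot for one in-flight RPC request.
    fn acquire(&self) -> Result<(), &'static str> {
        let prev = self.in_flight.fetch_add(1, Ordering::SeqCst);
        if prev >= self.max {
            // Roll back the optimistic increment and fail fast.
            self.in_flight.fetch_sub(1, Ordering::SeqCst);
            return Err("Configured max number of request slots exceeded");
        }
        Ok(())
    }

    /// Free a slot when the request completes.
    fn release(&self) {
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let slots = RequestSlots::new(256);
    // ~250 workers each holding two in-flight requests at once
    // blows through the default budget:
    let (mut ok, mut rejected) = (0, 0);
    for _ in 0..(250 * 2) {
        match slots.acquire() {
            Ok(()) => ok += 1,
            Err(_) => rejected += 1,
        }
    }
    println!("ok={} rejected={}", ok, rejected); // prints "ok=256 rejected=244"
}
```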

@PHA-SYSOPS

I can confirm the problem exists in several farms, and it worsens as more workers are connected. I was looking for the specific code that triggers this, so good find @ngrewe... I now have more to finally get the devs moving on this.

@PHA-SYSOPS

I also noticed that using headers-cache works around it, BUT you must deploy HC correctly (fully synced, a reachable bind address rather than 127.0.0.1, etc.); it then limits the load towards the node and avoids hitting this client-slots issue. So to reproduce: remove HC from the config, use only one node as backend, with a WM managing ~150 workers.

@ngrewe
Author

ngrewe commented Jun 20, 2023

The thing about headers-cache is that the version in the latest docker image is a bit unreliable because it tends to also import unfinalized blocks, which causes sync issues. There's a fix for this in-tree, though... just not released.

@jasl
Collaborator

jasl commented Jun 21, 2023

> The thing about headers-cache is that the version in the latest docker image is a bit unreliable because it tends to also import unfinalized blocks, which causes sync issues. There's a fix for this in-tree, though... just not released.

I've made a release; could you help test it?

Just change the Docker image to:

jasl123/phala-headers-cache:23062301
DIGEST:sha256:c0479365396092bf066095fb6ce606c693617d7f3fe585aa70b95692b675f82f

If everything looks good, I'll move it to the phalanetwork org.

@ngrewe
Author

ngrewe commented Jun 24, 2023

> I've made a release; could you help test it?

We'll deploy it in a test environment. I'll get back to you after it has soak-tested for a few days.

@Nexus2k

Nexus2k commented Aug 12, 2023

Got the same issue: no headers-cache, just a single archive node serving 178 and 116 workers on two different PRBv3 WMs.
@krhougs can you refactor prb to reuse node connections across all managed workers instead of opening a new connection for each?

@Nexus2k

Nexus2k commented Aug 20, 2023

Can someone maybe refactor the prb code so that it uses a small number of RPC clients instead of several per worker?

@jasl
Collaborator

jasl commented Sep 19, 2023

Sorry for the late response, but we're testing a workaround in #1388:

jasl123/phala-prb:23091801
DIGEST:sha256:9330afe8e474b1c709e9d4def0a04edfc94349d44726e15e4ec7490d15e770fd

@Nexus2k

Nexus2k commented Sep 19, 2023

I've already patched my prb version to allow a higher connection count, but that just overloads the node even more. Please implement connection pooling, or some other mechanism so that not every worker establishes its own node connection. Other PRBv3 users have also mentioned that it's more stable when using a headers-cache, which is not mentioned on the public mining wiki.
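The pooling idea being asked for can be sketched with plain std types: build a small fixed pool of clients once and hand each worker a shared `Arc` handle, rather than opening a fresh connection per worker. All names here (`RpcClient`, `Worker`, the endpoint) are illustrative, not prb's actual types:

```rust
// Std-only sketch of connection pooling: hundreds of workers share a
// handful of clients via Arc instead of one connection each.
use std::sync::Arc;

#[allow(dead_code)]
struct RpcClient {
    endpoint: String,
}

impl RpcClient {
    fn connect(endpoint: &str) -> Arc<Self> {
        // In real code this would open a websocket connection to the node.
        Arc::new(Self { endpoint: endpoint.to_string() })
    }
}

#[allow(dead_code)]
struct Worker {
    id: usize,
    client: Arc<RpcClient>, // shared handle, not a per-worker connection
}

fn main() {
    // One small pool serves every worker.
    let pool: Vec<Arc<RpcClient>> =
        (0..4).map(|_| RpcClient::connect("ws://node:9944")).collect();

    // 300 workers round-robin over the pool: still only 4 connections.
    let workers: Vec<Worker> = (0..300)
        .map(|id| Worker { id, client: Arc::clone(&pool[id % pool.len()]) })
        .collect();

    println!("workers={} connections={}", workers.len(), pool.len()); // prints "workers=300 connections=4"
}
```

With this shape, the jsonrpsee slot budget is consumed per pooled client rather than per worker, so the node sees a bounded number of connections regardless of the worker count.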
