
Having large numbers of networks can cause node allocation to fail #2655

Open
dperny opened this issue Jun 4, 2018 · 2 comments
dperny (Collaborator) commented Jun 4, 2018

Since 9fa9ce1, nodes are now allocated with a network attachment for every network.

Unfortunately, this can cause the size of a Node object to exceed the maximum raft message size, which prevents it from being committed to the object store.

From moby/moby#36792

Forgot to mention: the same error was triggered by a different action than the one initially reported in this issue. In my case it appeared when creating ~1700 networks and deploying one service on each network.

Jun 01 00:39:52 ip-172-16-0-128 dockerd[1284]: time="2018-06-01T00:39:52.220386603Z" level=error msg="Failed to commit allocation of network resources for node rul9pnxcc2hpj3o7eya1redpk" error="raft: raft message is too large and can't be sent" module=node node.id=281crvu

There are many possible fixes, including disallowing too many attachments, but one way or another we must avoid producing raft messages this large.
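To make the failure mode concrete, here is a minimal Go sketch of why per-network attachments make the serialized Node object grow linearly until it crosses a raft message size limit. All names, sizes, and the limit itself are assumptions for illustration, not swarmkit's actual values or code:

```go
package main

import "fmt"

// maxRaftMsgSize is a hypothetical raft message size limit (the real
// limit depends on the raft configuration in use).
const maxRaftMsgSize = 128 * 1024

// approxAttachmentSize is an assumed rough serialized size per network
// attachment (network ID, addresses, driver state).
const approxAttachmentSize = 256

// nodeObjectSize estimates the serialized size of a Node object that
// carries one attachment per network: a fixed base plus a linear term.
func nodeObjectSize(baseSize, attachments int) int {
	return baseSize + attachments*approxAttachmentSize
}

func main() {
	base := 4 * 1024 // assumed base Node object size
	for _, n := range []int{10, 500, 1700} {
		size := nodeObjectSize(base, n)
		fmt.Printf("%d attachments -> ~%d bytes (over limit: %v)\n",
			n, size, size > maxRaftMsgSize)
	}
}
```

Under these assumed numbers, a node in a cluster with ~1700 networks serializes to several hundred kilobytes, well past the hypothetical limit, which matches the "raft message is too large" error in the log above.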

/cc @ctelfer

@ddtmachado

Hey @dperny, commit 9fa9ce1 also degraded the performance of Docker managers and of service creation in general, in the situation described by @eduardolundgren.

I can share more precise values, steps, and the details of the stress test on the next run, but right now I see roughly a 300% increase in CPU usage on the swarm leader (average use was 15% CPU while running 17.09, versus 50% on versions after that commit, such as 17.12, 18.03, and 18.05).

The basic scenario was a swarm cluster of 3 managers and 12 workers, with a script that created 2000 networks and 2000 services, each service using one of those networks.

@eduardolundgren

@dperny any update on this issue?
