
Integration tests are sending requests to 0.0.0.0 #1051

Closed
Geod24 opened this issue Jul 22, 2020 · 22 comments · Fixed by #1085

@Geod24
Collaborator

Geod24 commented Jul 22, 2020

Just do a docker-compose up and see the logs:

node-3_1  | 2020-07-22 08:03:04,541 Trace [agora.network.NetworkClient] - Request 'getPublicKey' to http://0.0.0.0:3826 failed: Failed to connect to 0.0.0.0:3826: refused
node-3_1  | 2020-07-22 08:03:04,541 Trace [agora.network.NetworkClient] - Request 'getPublicKey' to http://0.0.0.0:2826 failed: Failed to connect to 0.0.0.0:2826: refused
node-3_1  | 2020-07-22 08:03:06,43 Trace [agora.network.NetworkClient] - Request 'getPublicKey' to http://0.0.0.0:4826 failed: Failed to connect to 0.0.0.0:4826: refused
node-3_1  | 2020-07-22 08:03:06,43 Trace [agora.network.NetworkClient] - Request 'getPublicKey' to http://0.0.0.0:3826 failed: Failed to connect to 0.0.0.0:3826: refused
node-3_1  | 2020-07-22 08:03:06,44 Trace [agora.network.NetworkClient] - Request 'getPublicKey' to http://0.0.0.0:2826 failed: Failed to connect to 0.0.0.0:2826: refused

I don't see why we would ever send a message to 0.0.0.0?

@Geod24 Geod24 added the type-bug and C. Network Communication labels on Jul 22, 2020
@Geod24 Geod24 added this to the 2. Validator milestone on Jul 22, 2020
@Geod24
Collaborator Author

Geod24 commented Jul 26, 2020

Here is a stack trace:

632(7): object.Exception@source/agora/network/NetworkManager.d(706): Wrong URL instantiated
----------------
??:? agora.api.Validator.API agora.network.NetworkManager.NetworkManager.getClient(immutable(char)[], core.time.Duration) [0x10d68ce81]
??:? void agora.network.NetworkManager.NetworkManager.ConnectionTask.connect() [0x10d68c765]
??:? void vibe.core.task.TaskFuncInfo.set!(void delegate()).set(ref void delegate()).callDelegate(ref vibe.core.task.TaskFuncInfo) [0x10d78461b]
??:? void vibe.core.task.TaskFuncInfo.call() [0x10dbec2f1]
??:? nothrow void vibe.core.task.TaskFiber.run() [0x10dbebb33]
??:? fiber_entryPoint [0x10ddef54b]

@Geod24
Collaborator Author

Geod24 commented Jul 27, 2020

Better trace:

core.exception.AssertError@source/agora/network/NetworkManager.d(506): Assertion failure
----------------
??:? _d_assert [0x103de5097]
??:? void agora.network.NetworkManager.NetworkManager.addAddress(immutable(char)[]) [0x10368cad7]
??:? int agora.network.NetworkManager.NetworkManager.addAddresses(agora.common.Set.Set!(immutable(char)[]).Set).__foreachbody2(immutable(char)[]) [0x10368cc3f]
??:? int agora.common.Set.Set!(immutable(char)[]).Set.opApply(scope int delegate(immutable(char)[])) [0x10368cbea]
??:? void agora.network.NetworkManager.NetworkManager.addAddresses(agora.common.Set.Set!(immutable(char)[]).Set) [0x10368bdcb]
??:? agora.network.NetworkManager.NetworkManager agora.network.NetworkManager.NetworkManager.__ctor(const(agora.common.Config.NodeConfig), const(agora.common.BanManager.BanManager.Config), const(immutable(char)[][]), const(immutable(char)[][]), agora.common.Metadata.Metadata, agora.common.Task.TaskManager) [0x103685037]
??:? agora.network.NetworkManager.NetworkManager agora.node.FullNode.FullNode.getNetworkManager(const(agora.common.Config.NodeConfig), const(agora.common.BanManager.BanManager.Config), const(immutable(char)[][]), const(immutable(char)[][]), agora.common.Metadata.Metadata, agora.common.Task.TaskManager) [0x10367b7a1]
??:? agora.node.FullNode.FullNode agora.node.FullNode.FullNode.__ctor(const(agora.common.Config.Config), void delegate() nothrow @trusted) [0x103678425]
??:? agora.node.Validator.Validator agora.node.Validator.Validator.__ctor(const(agora.common.Config.Config)) [0x103677edb]
??:? agora.node.FullNode.FullNode agora.node.Runner.runNode(agora.common.Config.Config) [0x103677b5a]
??:? agora.node.FullNode.FullNode agora.node.main.main(immutable(char)[][]).__lambda2() [0x1036778d8]
??:? void vibe.core.task.TaskFuncInfo.set!(agora.node.FullNode.FullNode delegate()).set(ref agora.node.FullNode.FullNode delegate()).callDelegate(ref vibe.core.task.TaskFuncInfo) [0x1037fc00b]
??:? void vibe.core.task.TaskFuncInfo.call() [0x103bea421]
??:? nothrow void vibe.core.task.TaskFiber.run() [0x103be9c63]
??:? fiber_entryPoint [0x103ded74b]

@Geod24
Collaborator Author

Geod24 commented Jul 27, 2020

Urgh, this is because of registerListener.
We have this function: https://github.com/bpfkorea/agora/blob/a108551600f3991726364cd90cb303a27cabc324/source/agora/network/NetworkManager.d#L819-L831
Notice that it's using NodeConfig.address, which is not the node's public address: it is the listen (bind) address, most of the time 0.0.0.0.
This getAddress function is used when a handshake is completed: https://github.com/bpfkorea/agora/blob/a108551600f3991726364cd90cb303a27cabc324/source/agora/network/NetworkManager.d#L403-L404

This means that we register 0.0.0.0 as a listener, which the node can never reach. Not to mention that registerListener is an obvious DoS vector, because one can register the address of an arbitrary server and make the network DoS that target. We need to rethink our strategy here.
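
To make the failure mode concrete, here is a hypothetical, simplified mirror of what that getAddress ends up doing (parameter names are invented; this is not the actual implementation):

import std.format : format;

// Hypothetical, simplified mirror of getAddress: the URL is built from the
// configured *bind* address, so a node listening on the wildcard address
// ends up advertising an endpoint that no peer can ever reach.
string getAddress(string bindAddress, ushort port)
{
    return format("http://%s:%s", bindAddress, port);
}

unittest
{
    // A node bound to the wildcard address advertises an unreachable URL
    assert(getAddress("0.0.0.0", 2826) == "http://0.0.0.0:2826");
}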

@AndrejMitrovic
Contributor

Copying my comments from Slack:

It was a workaround, not a permanent solution.

Actually it was only a workaround because the quorum sets were not known. We could remove it entirely now that we know which nodes listen to each other.

@AndrejMitrovic
Contributor

Actually, is there a way to get the connecting address in vibe.d? Then it could just become register() without the parameter.

@AndrejMitrovic
Contributor

Although for localrest it's a different story.

@Geod24
Collaborator Author

Geod24 commented Jul 27, 2020

Yeah I think we should move to "establish connection" instead of "register listener".

Although for localrest it's a different story.

Yeah it just works.

@AndrejMitrovic
Contributor

How does it "just work"? There's no way of getting the "address" of the connecting node.

@Geod24
Collaborator Author

Geod24 commented Jul 27, 2020

Command.sender

@AndrejMitrovic
Contributor

I mean we will still need some way of feeding that into Agora.

@bpalaggi bpalaggi added the Story-Points:13 label on Jul 28, 2020
@AndrejMitrovic
Contributor

This might actually be simpler than I originally thought. Shouldn't it be possible to:

  • Remove registerListener from the interface
  • Keep registerListener as a public method in FullNode
  • Register a path manually in vibe.d, something like:
router.route("/register_listener")
    .post((scope HTTPServerRequest req, scope HTTPServerResponse res)
    {
        import std.format;
        // clientAddress is the remote end of the incoming connection,
        // as seen by the server
        string addr = format("%s:%s", req.clientAddress.toAddressString(),
            req.clientAddress.port());
        node.registerListener(addr);
        res.statusCode = 200;
        res.writeVoidBody();
    });

This would be called before registerRestInterface. Would this work?
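
To make the ordering concrete, a rough, untested sketch of the wiring (assuming registerListener is kept as a public method on FullNode as proposed above, and reusing the route registration call from the snippet):

import vibe.http.router : URLRouter;
import vibe.http.server : HTTPServerRequest, HTTPServerResponse;
import vibe.web.rest : registerRestInterface;
import agora.node.FullNode : FullNode;

void setupRoutes(URLRouter router, FullNode node)
{
    // 1. The manual route, registered before the generated REST routes
    router.route("/register_listener")
        .post((scope HTTPServerRequest req, scope HTTPServerResponse res)
        {
            import std.format : format;
            node.registerListener(format("%s:%s",
                req.clientAddress.toAddressString(), req.clientAddress.port()));
            res.statusCode = 200;
            res.writeVoidBody();
        });

    // 2. The generated REST interface for the rest of the node's API
    registerRestInterface(router, node);
}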

@AndrejMitrovic
Contributor

But it was just an experiment; it's likely missing a lot of stuff.

@linked0
Contributor

linked0 commented Jul 29, 2020

I can also see the following logs when running docker-compose up.

node-1_1  | [main(s17F) INF] Listening for requests on http://0.0.0.0:3826/
node-2_1  | Task terminated with uncaught exception: Request failure to http://0.0.0.0:3826 after 5 attempts
node-0_1  | Task terminated with uncaught exception: Request failure to http://0.0.0.0:4826 after 5 attempts
node-0_1  | Task terminated with uncaught exception: Request failure to http://0.0.0.0:3826 after 5 attempts
node-3_1  | Task terminated with uncaught exception: Request failure to http://0.0.0.0:3826 after 5 attempts

The goal of this issue is to make sure the NetworkManager never holds invalid addresses such as 0.0.0.0.

I think @AndrejMitrovic's suggestion of using vibe.d would be a good start. @AndrejMitrovic, thank you for your help.
After writing the code as suggested, I will open a PR and keep looking for other solutions.

@AndrejMitrovic
Contributor

It would be good to get feedback from @Geod24 because he's used and worked on vibe.d for much longer than I have. He probably knows a better way to do this.

@Geod24
Collaborator Author

Geod24 commented Jul 29, 2020

My preference would be to have a connection established between nodes that is kept alive, and have all requests going through it. That would completely remove the need for registerListener, but it needs quite some work on the vibe.d side.

The first step is to establish a connection, keep it alive, and have it registered on both sides. That should be enough for a node to "establish" a listener, and whenever it gets something it needs to gossip, it can do so over the connection directly. No need to use the same connection for queries at the beginning.

The second step is to separate the validator and full node connections into two pools. We always need to communicate with (all) validators, but full nodes are common, and we might want a smarter strategy for talking to them (e.g. round robin). In order to do this separation, we need to add an optional handshake phase where nodes send their public key, if any. Remember that the two ends could be 2 validators, 2 full nodes, or 1 full node and 1 validator.

To take this further, we might want the ability to promote / demote a connection from one pool to the other, e.g. when a validator's enrollment starts or expires. We certainly need to remove expired validators from our set, or at least demote them.

The last step would be to change things so that we can use a connection with a RestInterfaceClient.

At the moment we can just get away with step 1 (for this very issue).
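
Purely as an illustration of steps 2 and 3, here is a hypothetical sketch of the two-pool bookkeeping (every type and name below is invented for the example; this is not Agora code):

struct PublicKey { ubyte[32] data; }   // stand-in for the real key type

/// A kept-alive connection to a peer
final class PeerConnection
{
    string address;      // remote address learned when the connection was made
    PublicKey key;       // filled in by the optional handshake (validators only)
    bool isValidator;    // outcome of the handshake
}

final class ConnectionPools
{
    private PeerConnection[string] validators; // we always talk to all of these
    private PeerConnection[string] fullNodes;  // e.g. round-robin over these

    /// After the handshake, file the connection in the right pool
    void register (PeerConnection conn)
    {
        if (conn.isValidator)
            validators[conn.address] = conn;
        else
            fullNodes[conn.address] = conn;
    }

    /// Demote a validator whose enrollment expired
    void demote (string address)
    {
        if (auto conn = address in validators)
        {
            (*conn).isValidator = false;
            fullNodes[address] = *conn;
            validators.remove(address);
        }
    }

    /// Promote a full node that just enrolled as a validator
    void promote (string address, PublicKey key)
    {
        if (auto conn = address in fullNodes)
        {
            (*conn).isValidator = true;
            (*conn).key = key;
            validators[address] = *conn;
            fullNodes.remove(address);
        }
    }
}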

@AndrejMitrovic
Contributor

My preference would be to have a connection established between nodes that is kept alive, and have all requests going through it.

I think that's what vibe.d already does under the hood? At least that's my impression from reading through RestInterfaceClient / requestHTTP / connectHTTP.

We probably need to change defaultKeepAliveTimeout to a sensible default (it's set to 10 seconds).
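
If tuning that value turns out to be enough, a minimal sketch (the two-minute figure is an arbitrary example, not a recommendation):

import core.time : minutes;
import vibe.http.client : HTTPClientSettings;

// Client settings with a longer HTTP keep-alive so that idle connections
// to peers are not torn down between requests (the default is 10 seconds).
HTTPClientSettings makePeerClientSettings()
{
    auto settings = new HTTPClientSettings;
    settings.defaultKeepAliveTimeout = 2.minutes;
    return settings;
}

These settings would then have to be passed explicitly wherever the client is constructed (for RestInterfaceClient, presumably via RestInterfaceSettings.httpClientSettings); that plumbing is an assumption worth double-checking.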

@AndrejMitrovic
Contributor

However, vibe.d keeps its own internal pool for all connections, and I'm not sure how easy it would be to override it with our own connection pools. There are lots of protected members in the client that appear to make it customizable, but it's anyone's guess how well that would work.

@Geod24
Collaborator Author

Geod24 commented Jul 29, 2020

I think that's what vibe.d already does under the hood?

Yes, but we have no way to tap into that. And we can't rely on it, because the pool has an eviction strategy. You don't want to lose a connection to a validator just because an attacker suddenly sends you a burst of connections / messages and that other, less active connection gets evicted.

@AndrejMitrovic
Contributor

Yes indeed.

@linked0
Contributor

linked0 commented Jul 29, 2020

From reading your comments, we need to manage connections in two types of pools, e.g. validator_conns and fullnode_conns. Before that, I guess we also need to get correct connection information, not addresses like 0.0.0.0. Which is the right way to get that information: overriding vibe.d's internal pool, or registering a path manually in vibe.d as @AndrejMitrovic suggested?

@linked0
Contributor

linked0 commented Aug 4, 2020

I found out that the defaultKeepAliveTimeout of HTTPClientSettings and the keepAliveTimeout of HTTPServerSettings do not use TCP keepalive; they only control HTTP keep-alive. Please refer to the vibe.d code below:

  • Calculating the m_keepAliveLimit from the m_keepAliveTimeout in HTTPClientSettings
  • Using the m_keepAliveLimit in HTTPClientSettings
  • Using the m_keepAliveTimeout in HTTPServerSettings

I think we should consider other approaches.

@Geod24
Collaborator Author

Geod24 commented Aug 4, 2020

Note: This is a pretty good resource: https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/index.html
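
If we do end up needing OS-level TCP keepalive (as opposed to vibe.d's HTTP keep-alive), it has to be enabled on the socket itself. A standalone Phobos sketch, with example timer values in the spirit of that HOWTO rather than a recommendation for Agora:

import std.socket : Socket;

// Enable SO_KEEPALIVE and, where the platform supports it, set the idle time
// before the first probe and the interval between probes (both in seconds).
void enableTcpKeepalive(Socket sock, int idleSecs = 60, int intervalSecs = 10)
{
    sock.setKeepAlive(idleSecs, intervalSecs);
}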

@Geod24 Geod24 linked a pull request on Aug 7, 2020 that will close this issue