KEP-987: Refactor node to re-use sessions. remove raft. #228

isabelsavannah · 2019-01-23T19:41:27Z

The interesting files are:

swarm.cpp
node.cpp
session.cpp
node_test_common.hpp
node_test.cpp
session_test.cpp

Pretty much everything else is just collateral damage from the interface changes. I also want to make send_message operate by uuid instead of endpoint, but that requires some design thinking, and this PR was already too large.

ebruck

First quick pass.

I don't see how stale sessions are cleaned up.
Holding a strong shared pointer will not allow asio to destroy the session and clean up.

swarm/main.cpp

node/node.hpp

node/node.cpp

ebruck · 2019-01-28T19:41:30Z

node/node.cpp

+    this->weak_priv_protobuf_handler =
+            [weak_self = weak_from_this()](auto msg, auto session)
+            {
+                auto strong_self = weak_self.lock();


Why a weak ptr?

To break the cyclical reference between session and node. It could be a shared pointer if node were changed to hold weak pointers instead.

Unless you track the session in node, you will always create a new connection when pbft could have used an existing one.

I don't understand this? We do track the session in node.
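A minimal C++ sketch of the weak-capture pattern discussed in this thread, under stated assumptions: `node`, `session`, `make_session` and `on_message` are simplified hypothetical stand-ins, not the real bzn classes. The node holds its session strongly and the session's handler reaches the node only through `weak_from_this()`, so the session cannot keep the node alive; capturing `shared_from_this()` instead would form a cycle and leak both.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a session: it stores a handler supplied by node.
class session
{
public:
    using handler_t = std::function<void(const std::string&)>;

    explicit session(handler_t h) : handler(std::move(h)) {}

    // Delivers an incoming message to the installed handler.
    void deliver(const std::string& msg) { this->handler(msg); }

private:
    handler_t handler;
};

// Hypothetical stand-in for node. It owns the session strongly; the
// session's handler refers back to node only through a weak_ptr.
class node : public std::enable_shared_from_this<node>
{
public:
    std::shared_ptr<session> make_session()
    {
        // Must be called on a node already owned by a shared_ptr, otherwise
        // weak_from_this() returns an empty weak_ptr.
        auto handler = [weak_self = weak_from_this()](const std::string& msg)
        {
            if (auto strong_self = weak_self.lock())
            {
                strong_self->on_message(msg);
            }
            // Otherwise the node is gone and the message is dropped.
        };
        this->sess = std::make_shared<session>(std::move(handler));
        return this->sess;
    }

    int received = 0;

private:
    void on_message(const std::string&) { ++this->received; }

    std::shared_ptr<session> sess;
};
```

Because the only strong edge runs from node to session, releasing the last external reference to the node destroys it even while the session object is still alive elsewhere.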

ebruck · 2019-01-28T19:49:33Z

node/node.cpp

-        this->chaos->reschedule_message(std::bind(&node::send_message_str, shared_from_this(), std::move(ep_copy), std::move(msg), close_session));
-        return;
-    }
+        std::lock_guard<std::mutex> lock(this->session_map_mutex);


I'd consider a shared lock, which allows reader/writer access.

My thinking is that the lock is never held for long (only while looking up a session, or at worst while initiating an async operation), so I didn't want to optimize prematurely.
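For comparison, a sketch of the reader/writer variant the reviewer suggests, assuming C++17's `std::shared_mutex`; `session_map`, `find` and `find_or_create` are hypothetical names, not code from this PR. Lookups take a shared lock and only insertion takes the exclusive lock, with a re-check after acquiring it because another thread may have inserted in the meantime.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical minimal session keyed by endpoint.
struct session { std::string endpoint; };

class session_map
{
public:
    // Many concurrent readers may hold the shared lock at once.
    std::shared_ptr<session> find(const std::string& ep) const
    {
        std::shared_lock<std::shared_mutex> lock(this->mutex);
        auto it = this->sessions.find(ep);
        return it == this->sessions.end() ? nullptr : it->second;
    }

    std::shared_ptr<session> find_or_create(const std::string& ep)
    {
        if (auto existing = this->find(ep))
        {
            return existing;  // fast path: shared lock only
        }
        std::unique_lock<std::shared_mutex> lock(this->mutex);  // single writer
        // Re-check: another writer may have inserted while we waited.
        auto& slot = this->sessions[ep];
        if (!slot)
        {
            slot = std::make_shared<session>(session{ep});
        }
        return slot;
    }

private:
    mutable std::shared_mutex mutex;
    std::map<std::string, std::shared_ptr<session>> sessions;
};
```

Whether this beats a plain `std::mutex` depends on contention; with critical sections as short as the ones described above it may not, so a profiler is the better judge.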

node/node.cpp

node/session.hpp

isabelsavannah · 2019-01-28T23:10:46Z

The answer to "how do sessions get cleaned up" is lazily: when node tries to send a message on a session and discovers that it has closed, it replaces the session, which removes the last shared pointer.

The case where it doesn't get cleaned up is where no message is ever sent to that endpoint (say it was a one-off client). This still doesn't leak a file descriptor (the socket times out and closes), but it does leak some memory. Changing it to a weak pointer makes it a bit cleaner, but it doesn't solve the issue: the weak pointer itself won't be removed from node's session map in this case, and it adds overhead on every message send. I think the solution with either kind of pointer is to regularly sweep the map and remove dead sessions; my intent was to delay implementing that because this PR is too large already.

(the other comments I agree with and I'll put a commit for them in later; I just wanted to unblock this conversation promptly)
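The lazy cleanup described in this comment, plus the deferred periodic sweep, can be sketched roughly as below. This is a simplified single-threaded-send model under assumptions: `session::open`, `send_message` returning the session, and `sweep` are hypothetical, and the real node sends asynchronously over websockets. A send that finds its session closed replaces the map entry; `sweep` drops closed sessions that nothing sends to anymore, such as a one-off client.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical minimal session: only tracks whether it is open.
struct session
{
    bool open = true;
    int sent = 0;
    void send(const std::string&) { ++this->sent; }
};

class node
{
public:
    // Reuses the endpoint's session unless it has closed, in which case the
    // entry is replaced. Replacing it drops the map's strong reference, so
    // the dead session is destroyed once nothing else holds it.
    std::shared_ptr<session> send_message(const std::string& ep, const std::string& msg)
    {
        std::lock_guard<std::mutex> lock(this->session_map_mutex);
        auto& slot = this->sessions[ep];
        if (!slot || !slot->open)
        {
            slot = std::make_shared<session>();
        }
        slot->send(msg);
        return slot;
    }

    // Periodic sweep: erases closed sessions that were never replaced
    // because nothing sent to their endpoint again. Returns the count removed.
    std::size_t sweep()
    {
        std::lock_guard<std::mutex> lock(this->session_map_mutex);
        std::size_t removed = 0;
        for (auto it = this->sessions.begin(); it != this->sessions.end();)
        {
            if (!it->second->open)
            {
                it = this->sessions.erase(it);
                ++removed;
            }
            else
            {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(this->session_map_mutex);
        return this->sessions.size();
    }

private:
    mutable std::mutex session_map_mutex;
    std::map<std::string, std::shared_ptr<session>> sessions;
};
```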

ebruck · 2019-01-28T23:34:34Z

The answer to "how do sessions get cleaned up" is lazily: when node tries to send a message on a session and discovers that it has closed, it replaces the session, which removes the last shared pointer.

The case where it doesn't get cleaned up is where no message is ever sent to that endpoint (say it was a one-off client). This still doesn't leak a file descriptor (the socket times out and closes), but it does leak

How? The member variable holding the websocket will never be destroyed, because the session's destructor won't be called until you replace the entry. As far as I understand networking, a timeout does not close the FD. I could be wrong. Have you tested this? Does Beast do this for you?

Edit: OK, I see you are closing on error in the completion handlers.

some memory. Changing it to a weak pointer makes it a bit cleaner, but it doesn't solve the issue: the weak pointer itself won't be removed from node's session map in this case, and it adds overhead on

I imagine any overhead would be negligible next to what it takes to sign the data or acquire any of the locks we are using. It's premature to worry about this, and I'd rather let a profiler guide us.

every message send. I think the solution with either kind of pointer is to regularly sweep the map and remove dead sessions; my intent was to delay implementing that because this PR is too large already.

This is how the subscription manager deals with it.

(the other comments I agree with and I'll put a commit for them in later; I just wanted to unblock this conversation promptly)

Yeah, I'll try to finish my review ASAP.
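A sketch of the weak-pointer alternative with a periodic sweep, in the spirit of the reviewer's pointer to the subscription manager; the class and method names are hypothetical and the subscription manager's actual code is not reproduced here. Locking is omitted for brevity, so this version is not thread-safe as written. Every send pays for a `lock()`, and expired entries linger in the map until swept, which is exactly the trade-off debated above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal session keyed by endpoint.
struct session { std::string endpoint; };

// The map only observes sessions; whoever holds the shared_ptr (in the real
// code, in-flight asio handlers) decides their lifetime.
class weak_session_map
{
public:
    void track(const std::string& ep, const std::shared_ptr<session>& s)
    {
        this->sessions[ep] = s;
    }

    // Returns an empty pointer if the session has already been destroyed.
    std::shared_ptr<session> get(const std::string& ep) const
    {
        auto it = this->sessions.find(ep);
        return it == this->sessions.end() ? nullptr : it->second.lock();
    }

    // Erases entries whose session is gone; returns the count removed.
    std::size_t sweep()
    {
        std::size_t removed = 0;
        for (auto it = this->sessions.begin(); it != this->sessions.end();)
        {
            if (it->second.expired())
            {
                it = this->sessions.erase(it);
                ++removed;
            }
            else
            {
                ++it;
            }
        }
        return removed;
    }

    std::size_t size() const { return this->sessions.size(); }

private:
    std::map<std::string, std::weak_ptr<session>> sessions;
};
```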

node/session.cpp

node/session.hpp

ebruck

There are quite a few indentation style errors where lambdas are passed as function arguments.

node/session_base.hpp

paularchard

I see that you've removed the ability to auto-close the session after a message is sent. I was wondering what the rationale for that is?

Also, in the case of an error on session::write, you re-queue the message then close the socket. Is there any intention to try and restart communication? If not, what's the point in re-pushing the data (apart from logging a warning when the session is destroyed)?

isabelsavannah · 2019-01-29T22:48:31Z

I see that you've removed the ability to auto-close the session after a message is sent. I was wondering what the rationale for that is?

The sender of a message won't (and shouldn't) know whether it is sent over an existing session or a new one, and in the former case closing the session could interfere with other messages. Ideally, sessions would be contained within node, so that other components don't have to reason about them.

Also, in the case of an error on session::write, you re-queue the message then close the socket. Is there any intention to try and restart communication? If not, what's the point in re-pushing the data (apart from logging a warning when the session is destroyed)?

Yes, my intent is for the session to be re-used with a new connection automatically.
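A synchronous sketch of the write-error behavior described in this exchange: on a failed write the message goes back to the front of the queue and the connection is marked closed, and the next send on the same session object reconnects and flushes the queue in order. The transport callback, the `connections` counter and the synchronous flow are assumptions for illustration; the real session writes asynchronously over a websocket and its reconnect logic may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class session
{
public:
    // Hypothetical transport: returns false to simulate a write error.
    using transport_t = std::function<bool(const std::string&)>;

    explicit session(transport_t t) : transport(std::move(t)) {}

    void send(std::string msg)
    {
        this->queue.push_back(std::move(msg));
        if (!this->connected)
        {
            this->connected = true;  // stand-in for opening a new connection
            ++this->connections;
        }
        this->flush();
    }

    bool is_open() const { return this->connected; }
    std::size_t pending() const { return this->queue.size(); }
    int connections = 0;

private:
    void flush()
    {
        while (this->connected && !this->queue.empty())
        {
            std::string msg = std::move(this->queue.front());
            this->queue.pop_front();
            if (!this->transport(msg))
            {
                this->queue.push_front(std::move(msg));  // re-queue, keep order
                this->connected = false;                  // "close the socket"
            }
        }
    }

    transport_t transport;
    std::deque<std::string> queue;
    bool connected = false;
};
```

Re-queuing at the front rather than the back preserves message order across the reconnect.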

ebruck · 2019-01-30T21:07:00Z

node/session.hpp

@@ -71,7 +71,7 @@ namespace bzn
        std::list<std::shared_ptr<bzn::encoded_message>> write_queue;

        bzn::protobuf_handler proto_handler;
-        bzn::session_death_handler death_handler;
+        bzn::session_shutdown_handler death_handler;


Sorry, I should have also mentioned renaming the member variable.

ebruck

You may want to let Monty know that subscriptions will close after 5 minutes of inactivity. He will need to reconnect, or periodically call status (or send something else) to keep the connection alive.
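A sketch of the inactivity rule mentioned here, with time passed in explicitly so the logic is testable. The 5-minute timeout comes from the comment; `idle_tracker` and the 4-minute `keepalive_interval` are hypothetical choices, the only point being that a client's keepalive must fire more often than the server's timeout.

```cpp
#include <cassert>
#include <chrono>

class idle_tracker
{
public:
    using clock = std::chrono::steady_clock;

    idle_tracker(clock::duration timeout, clock::time_point now)
        : timeout(timeout), last_activity(now) {}

    // Call on every message sent or received on the connection.
    void touch(clock::time_point now) { this->last_activity = now; }

    // True once the connection has been idle longer than the timeout.
    bool expired(clock::time_point now) const
    {
        return now - this->last_activity > this->timeout;
    }

private:
    clock::duration timeout;
    clock::time_point last_activity;
};

// Hypothetical client keepalive interval, chosen below the 5-minute timeout.
constexpr std::chrono::minutes keepalive_interval{4};
```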

isabelsavannah force-pushed the task/iscroggin/KEP-987 branch 3 times, most recently from 9ab3ab0 to 841fb79 on January 26, 2019 00:28

isabelsavannah changed the title from "just checking coverage don't mind me" to "KEP-987: Refactor node to re-use sessions. remove raft." Jan 26, 2019

isabelsavannah requested review from ebruck and paularchard January 26, 2019 01:08

ebruck suggested changes Jan 28, 2019

ebruck reviewed Jan 29, 2019

node/session.cpp (resolved)

ebruck reviewed Jan 29, 2019

node/session.hpp (outdated, resolved)

ebruck reviewed Jan 29, 2019

node/session_base.hpp (outdated, resolved)

ebruck reviewed Jan 29, 2019

node/session_base.hpp (outdated, resolved)

paularchard reviewed Jan 29, 2019

isabelsavannah closed this Jan 29, 2019

isabelsavannah reopened this Jan 29, 2019

ebruck reviewed Jan 30, 2019

isabelsavannah force-pushed the task/iscroggin/KEP-987 branch 2 times, most recently from c0643b7 to c9ade56 on January 31, 2019 20:50

isabelsavannah added 5 commits January 31, 2019 14:39

KEP-987: Refactor node to re-use sessions. remove raft.

7978eb7

review comments

40219bb

rename 'death handler's

498724c

more rename

8e2eb53

cleaning after rebase

951a3b9

isabelsavannah force-pushed the task/iscroggin/KEP-987 branch from c9ade56 to 951a3b9 on January 31, 2019 22:40

ebruck approved these changes Feb 1, 2019

isabelsavannah merged commit 4c5d2a6 into devel Feb 1, 2019

ebruck pushed a commit that referenced this pull request Mar 17, 2019

KEP-987: Refactor node to re-use sessions. remove raft. (#228)

a3520ee

isabelsavannah deleted the task/iscroggin/KEP-987 branch March 18, 2019 23:22
