Skip to content
This repository has been archived by the owner on Mar 3, 2020. It is now read-only.

KEP-987: Refactor node to re-use sessions. remove raft. #228

Merged
merged 5 commits into from Feb 1, 2019

Conversation

isabelsavannah
Copy link
Contributor

@isabelsavannah isabelsavannah commented Jan 23, 2019

The interesting files are

swarm.cpp
node.cpp
session.cpp
node_test_common.hpp
node_test.cpp
session_test.cpp

pretty much everything else is just collateral damage of the interface changes. I also want to make send_message operate by uuid instead of endpoint, but that requires some design thinking and this was too large already.

@isabelsavannah isabelsavannah force-pushed the task/iscroggin/KEP-987 branch 3 times, most recently from 9ab3ab0 to 841fb79 Compare January 26, 2019 00:28
@isabelsavannah isabelsavannah changed the title just checking coverage don't mind me KEP-987: Refactor node to re-use sessions. remove raft. Jan 26, 2019
Copy link
Contributor

@ebruck ebruck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First quick pass.

I don't see how stale sessions are cleaned up.
Holding a strong shared pointer will not allow asio to destroy the session and cleanup.

swarm/main.cpp Show resolved Hide resolved
node/node.hpp Outdated Show resolved Hide resolved
node/node.cpp Outdated Show resolved Hide resolved
node/node.cpp Outdated
this->weak_priv_protobuf_handler =
[weak_self = weak_from_this()](auto msg, auto session)
{
auto strong_self = weak_self.lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a weak ptr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to break the cyclical reference between session and node. It could be a shared pointer if node were changed to have weak ptrs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you track the session in node, you will always create a new connection when pbft could of used an existing one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this? We do track the session in node.

this->chaos->reschedule_message(std::bind(&node::send_message_str, shared_from_this(), std::move(ep_copy), std::move(msg), close_session));
return;
}
std::lock_guard<std::mutex> lock(this->session_map_mutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd look at a shared lock which allows for reader/writer access.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that the lock is never held for long (only looking up a session, or at worst initiating an async operation), so I didn't want to prematurely optimize

node/node.cpp Outdated Show resolved Hide resolved
node/node.cpp Outdated Show resolved Hide resolved
node/node.cpp Outdated Show resolved Hide resolved
node/session.hpp Outdated Show resolved Hide resolved
@isabelsavannah
Copy link
Contributor Author

The answer to "how do sessions get cleaned up" is lazily: when node tries to send a message on a session and discovers that it has closed, it replaces the session, which removes the last shared pointer.

The case where it doesn't get cleaned up is where no message is ever sent to that endpoint (say it was a one-off client). This still doesn't leak a file descriptor (the socket times out and closes), but it does leak some memory. Changing it to a weak pointer makes it a bit cleaner, but it doesn't solve the issue: the weak pointer itself won't be removed from node's session map in this case, and it adds overhead on every message send. I think the solution with either kind of pointer is to regularly sweep the map and remove dead sessions; my intent was to delay implementing that because this PR is too large already.

(the other comments I agree with and I'll put a commit for them in later; I just wanted to unblock this conversation promptly)

@ebruck
Copy link
Contributor

ebruck commented Jan 28, 2019

The answer to "how do sessions get cleaned up" is lazily: when node tries to send a message on a session and discovers that it has closed, it replaces the session, which removes the last shared pointer.

The case where it doesn't get cleaned up is where no message is ever sent to that endpoint (say it was a one-off client). This still doesn't leak a file descriptor (the socket times out and closes), but it does leak

How? The member variable containing the websocket will never be destroyed, because session's destructor will never be called or until you replace the entry. A timeout does not close the FD as far as I understand networking. I could be wrong. Have you tested this? Does Beast do this for you?

Edit: OK I see you are closing on error in the completion handlers.

some memory. Changing it to a weak pointer makes it a bit cleaner, but it doesn't solve the issue: the weak pointer itself won't be removed from node's session map in this case, and it adds overhead on

I imagine any overhead would be nothing next to what it takes to sign the data or invoke any of the locks we are using. It's premature to worry about this and I'd rather let a profiler guide us instead.

every message send. I think the solution with either kind of pointer is to regularly sweep the map and remove dead sessions; my intent was to delay implementing that because this PR is too large already.

This is how subscription manager deals with it.

(the other comments I agree with and I'll put a commit for them in later; I just wanted to unblock this conversation promptly)

Yeah, I'll try to finish my review ASAP.

node/session.hpp Outdated Show resolved Hide resolved
Copy link
Contributor

@ebruck ebruck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are quite a few indentation code style errors when lambdas are being used as function args.

node/session_base.hpp Outdated Show resolved Hide resolved
node/session_base.hpp Outdated Show resolved Hide resolved
Copy link
Contributor

@paularchard paularchard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you've removed the ability to auto-close the session after a message is sent. I was wondering what the rationale for that is?

Also, in the case of an error on session::write, you re-queue the message then close the socket. Is there any intention to try and restart communication? If not, what's the point in re-pushing the data (apart from logging a warning when the session is destroyed)?

@isabelsavannah
Copy link
Contributor Author

I see that you've removed the ability to auto-close the session after a message is sent. I was wondering what the rationale for that is?

The sender of the message won't (and shouldn't) know if the message is sent over an existing session or a new one, and in the former case closing the session would potentially interfere with other messages. Ideally I'd like to have sessions be contained within node, so other stuff doesn't have to reason about them.

Also, in the case of an error on session::write, you re-queue the message then close the socket. Is there any intention to try and restart communication? If not, what's the point in re-pushing the data (apart from logging a warning when the session is destroyed)?

Yes, my intent is for the session to be re-used with a new connection automatically.

node/session.hpp Outdated
@@ -71,7 +71,7 @@ namespace bzn
std::list<std::shared_ptr<bzn::encoded_message>> write_queue;

bzn::protobuf_handler proto_handler;
bzn::session_death_handler death_handler;
bzn::session_shutdown_handler death_handler;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I should of also mentioned renaming the member variable as well.

@isabelsavannah isabelsavannah force-pushed the task/iscroggin/KEP-987 branch 2 times, most recently from c0643b7 to c9ade56 Compare January 31, 2019 20:50
Copy link
Contributor

@ebruck ebruck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to let Monty know that subscriptions will close after 5 min of inactivity. He will need to reconnect or periodically call status or something else to keep the connection alive.

@isabelsavannah isabelsavannah deleted the task/iscroggin/KEP-987 branch March 18, 2019 23:22
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants