Active socket, implementation with state machine #68

Merged — 48 commits, Jul 18, 2023

Conversation

@bokner (Contributor) commented Jul 10, 2023

Switch to use active mode for the socket. Addresses #66.

@mfos239 commented Jul 16, 2023

Confirmed the latest changes fix this issue #65 (comment)
Everything seems to be working great on this end. Thanks so much @bokner and @starbelly!

```elixir
defp reply_to_caller(reply, %{caller: caller, context: context} = data) do
  caller && :gen_statem.reply(caller, format_reply(reply, context))
end
```
Contributor:
I think we can avoid the boolean logic here. In fact, if we're ending up here when caller is nil, we have a bug somewhere further up, and it would be best to crash.

Contributor (author):

This is to avoid situations where the receiver sends more packets after the full MLLP message has been received. It would not be a bug on our side, but rather a misbehaving receiver. Do you think we should crash in this case?
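For illustration, the two approaches under discussion might look like this (a sketch based on the snippet above; `format_reply/2` is the existing helper, everything else mirrors the PR's function shape but is not its literal code):

```elixir
# Option A (current): silently drop the reply when there is no caller,
# e.g. when the receiver sends extra packets after a full MLLP frame.
defp reply_to_caller(reply, %{caller: caller, context: context} = _data) do
  caller && :gen_statem.reply(caller, format_reply(reply, context))
end

# Option B (crash on nil caller): pattern-match so that reaching this
# function without a caller raises, surfacing the bug instead of hiding it.
defp reply_to_caller(reply, %{caller: caller, context: context} = _data)
     when not is_nil(caller) do
  :gen_statem.reply(caller, format_reply(reply, context))
end
```

The trade-off is exactly the one debated below: Option A tolerates a misbehaving remote, Option B turns an unexpected state into a crash.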

Contributor:

Seems like we'd hit the clause at line 577, no? This raises an interesting question nonetheless. Crashing aside, one place an RST is appropriate is when the remote sends data that is unexpected. I wonder if we should shut down in this case? Feels heavy-handed, but yeah 🤔

Contributor (author):

> Seems like we'd hit the clause at line 577, no?

You are correct, we are probably covered for the "more packets" case. Although I'm not entirely sure, as there could be a race condition with the timer in the "receiving" state. There is actually a more serious reason for checking whether we have a caller: the receiver can disconnect at any given time. If that happens while a `send` call is in progress, we notify the caller; otherwise, we don't (there is no caller).

If we want to shut down on the "more packets" case, the natural place is indeed around line 577.

Contributor:

> Although I'm not entirely sure, as there could be a race condition with the timer in the "receiving" state.

Not sure, I'll have to stare a bit more, and we'll both have to put it through the wringer :)

I think shutting down makes sense. I think we can just use maybe_close, though we might need to make a modification (i.e., sometimes we want a graceful close and sometimes an abortive close).

Right now maybe_close attempts a graceful close, which in this case is not what we want. We want :gen_tcp.shutdown(socket, :read_write). This is the same behavior as send_timeout_close: true, which ends up resulting in a sock_select() call that does an abortive close (RST). This should be double-checked, though.

Speaking of send_timeout_close: true, if we don't have this set as the default in the socket opts, we should; the only reason it defaults to false in OTP is backwards compatibility (though it should be configurable by the user).
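A sketch of the modification being discussed — a maybe_close that can do either kind of close. The two-argument shape and mode names are assumptions for illustration; the real maybe_close in this codebase looks different:

```elixir
# Illustrative only. :graceful does a normal close; :abortive shuts down
# both directions first, which (per the discussion above, to be double-
# checked) is the same behavior send_timeout_close: true produces.
defp maybe_close(_mode, %{socket: nil} = data), do: data

defp maybe_close(:graceful, %{socket: socket} = data) do
  :gen_tcp.close(socket)
  %{data | socket: nil}
end

defp maybe_close(:abortive, %{socket: socket} = data) do
  :gen_tcp.shutdown(socket, :read_write)
  :gen_tcp.close(socket)
  %{data | socket: nil}
end
```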

Contributor:

The send_timeout_close bit was resolved by your last commit @bokner. We still need to hang up the phone on receive_timeout as well. Basically, given the nature of MLLP and the implicit behaviour when shipping HL7 around, there may be no recourse in a situation where the receive timed out.

Specifically, the client can either hang up the phone or make assumptions about how the server is implemented. In this regard, both the client and server may end up in a bad state and form an error loop. One example where a receive timeout can result in bad things happening: you peel off part of the response and time out waiting for the rest. The caller of the client decides to send again, yet there is still data in transit (i.e., not in the local buffers and not bubbled up to our app yet); the send happens, and now the rest of the ack from the previous response arrives. Meanwhile, the remote might get what you just sent and either accept it, send back a nack, or hang up the phone.

Tricky business 😁

Thoughts?
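To make "hang up the phone" concrete, here is one way a receive timeout could be handled in the state machine. The state name, event name, and data fields are assumptions for illustration, not the PR's actual code:

```elixir
# Illustrative gen_statem clause: on receive timeout, reply with an error,
# close the socket so a late partial ack can't be mistaken for the reply
# to a later send, and drop any partially received buffer.
def receiving(:state_timeout, :receive_timeout, %{caller: caller, socket: socket} = data) do
  caller && :gen_statem.reply(caller, {:error, :receive_timeout})
  socket && :gen_tcp.close(socket)
  {:next_state, :disconnected, %{data | socket: nil, caller: nil, buffer: <<>>}}
end
```

Closing here forces a reconnect on the next send, which is exactly the trade-off discussed below.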

Contributor (author):

The downside of dropping the connection on the client side for whatever reason (say, receive_timeout) is that it would then have to reconnect. Maybe it is the better recourse after all.

I guess a sane client would try to check the connection on any error anyway.

Contributor:

Yeah, and to add to that, there need to be options for all of it, which really bothers me. I think having too many options is never great, but sadly, since MLLP isn't much of a protocol, the behaviour is going to differ from system to system. All we can do is try to provide a mostly sane set of defaults that covers most cases. If I go by the "protocol" and how TCP works, it leads me to these decisions.
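As a rough sketch of "sane defaults with user overrides": the socket options below are standard `:inet`/`:gen_tcp` options, but the attribute name, values, and merge helper are illustrative, not this library's actual API:

```elixir
# Hypothetical defaults, merged with user-supplied overrides.
@default_socket_opts [
  mode: :binary,
  active: :once,
  send_timeout: 60_000,
  # Abortive close when a send times out; OTP defaults this to false
  # only for backwards compatibility, so we opt in here.
  send_timeout_close: true
]

defp socket_opts(user_opts), do: Keyword.merge(@default_socket_opts, user_opts)
```

Users who need different behavior per remote system can override any entry, which keeps the option surface small while still covering the "differs from system to system" reality.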

Contributor:

I tell ya what, let's not agonize over this right now. We're not at 1.0, code is easy to change, etc. Let's get the tests passing, merge, and take it from there.

bokner and others added 5 commits July 16, 2023 15:19
Co-authored-by: Bryan Paxton <39971740+starbelly@users.noreply.github.com>
@starbelly left a comment:

Beautiful! ❤️🧡💛💚💙💜

@starbelly starbelly merged commit 0727b47 into HCA-Healthcare:main Jul 18, 2023
13 checks passed
bokner added a commit to bokner/elixir-mllp that referenced this pull request Jul 18, 2023
* hca/main:
  Active socket, implementation with state machine (HCA-Healthcare#68)