Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround to allow setting TCP_NODELAY #19

Merged
merged 7 commits into from Dec 17, 2020
Merged

Conversation

japhb
Copy link
Contributor

@japhb japhb commented Oct 28, 2020

This is the first part (and also the most likely to need iteration) of
the fixes for croservices/cro#129 .

Until MoarVM/MoarVM#1229 is fixed and the
functionality is available directly in rakudo-moar, we need to use a
workaround to allow setting TCP_NODELAY using NativeCall instead.

This commit introduces Cro::TCP::NoDelay, which exports a sub that
enables setting this on any socket that supports .native-descriptor
(which both TCP and TLS async sockets do) on operating systems that
support the setsockopt call.

There are two portability concerns with this commit:

  1. This commit assumes TCP_NODELAY == 1 and int is 32 bits. I'm not
    sure how much of a problem this actually is.
  2. There is no fallback for operating systems (or Raku VMs) that don't
    support calling setsockopt via NativeCall.

Help hereby requested to resolve issue 2, and determine whether issue 1
is a real problem for us.

This is the first part (and also the most likely to need iteration) of
the fixes for croservices/cro#129 .

Until MoarVM/MoarVM#1229 is fixed and the
functionality is available directly in rakudo-moar, we need to use a
workaround to allow setting TCP_NODELAY using NativeCall instead.

This commit introduces Cro::TCP::NoDelay, which exports a sub that
enables setting this on any socket that supports .native-descriptor
(which both TCP and TLS async sockets do) on operating systems that
support the `setsockopt` call.

There are two portability concerns with this commit:

1. This commit assumes TCP_NODELAY == 1 and `int` is 32 bits.  I'm not
   sure how much of a problem this actually is.
2. There is no fallback for operating systems (or Raku VMs) that don't
   support calling `setsockopt` via NativeCall.

Help hereby requested to resolve issue 2, and determine whether issue 1
is a real problem for us.
Since the NativeCall code has not been tested on Windows (I don't have a
Windows system to test with), just silently skip the TCP_NODELAY code on
those distros.  Since the cost of this check could be paid on every socket
setup, cache the slow $*DISTRO.is-win lookup (it's not going to change in
the middle of runtime, afterall).
my $on = CArray[int32].new(1);
my $size = nativesizeof(int32) * $on.elems;

if setsockopt($nd, PROTO_TCP, TCP_NODELAY, $on, $size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does PROTO_TCP come from?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, from CORE.setting. Alright. Wonder how portable those constants are...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Of course, putting it in here won't make it more portable, so it's the right place to take it from. :-))

Copy link
Member

@jnthn jnthn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks right enough overall. I do think it could be good if we could add tests that use the nodelay flag, so that if there are problems no specific platforms, we'll get a failing test, not a problem much later when this is used.

@japhb
Copy link
Contributor Author

japhb commented Nov 20, 2020

Looks right enough overall.

OK, good.

I do think it could be good if we could add tests that use the nodelay flag, so that if there are problems no specific platforms, we'll get a failing test, not a problem much later when this is used.

Fair enough, I'll work on that during my next hack window. :-)

The original tests in this section can easily be reused, so just run them
three times: without the nodelay option, with it off, and with it on.
Run a simple 3-message send/reply sequence for every combination of client
Cro::TCP::Connector and server Cro::TCP::Listener nodelay options (without
the option, with it off, and with it on).

NOTE: This commit includes a (very short) sleep that doesn't seem like it
should be needed -- but if it isn't there, the test code will often deadlock
as soon as the first client message has been emitted.  My best guess as to
what causes this is the way that Cro::Connector.establish() waits for the
connection Promise before setting up the `whenever` that actually relays
messages, and thus that the Cro::TCP::Connector isn't actually ready for
a very short period.  If true however, this seems like a bug; I don't think
user code should be able to deadlock the connector in this way.
... instead of .raku for test messages.
As per jnthn, the current list is:

* Oldest supported version (2018.12, due to .native-descriptor on async sockets)
* All versions used for current Rakudo Star (2020.01, 2020.05.1, 2020.10)
* Latest available version ('latest')
@japhb
Copy link
Contributor Author

japhb commented Dec 14, 2020

OK, looks like I finally have a fully clean Travis with the new tests (and the new list of Rakudo versions to test).

The only thing still bothering me can be seen with the commit message on ba86b31 -- there is a very short sleep needed after Cro::TCP::Connector.establish before sending any messages through it to prevent a deadlock upon sending the first message. This might be a core bug (not caused by my changes but simply exposed by the new tests); details in that commit message.

@jnthn
Copy link
Member

jnthn commented Dec 16, 2020

The only thing still bothering me can be seen with the commit message on ba86b31 -- there is a very short sleep needed after Cro::TCP::Connector.establish before sending any messages through it to prevent a deadlock upon sending the first message. This might be a core bug (not caused by my changes but simply exposed by the new tests); details in that commit message.

There's a race in the test. Establishing the connection happens asynchronously: the process is started at the point we obtain the Channel of responses, and somewhere along the line it taps the $send.Supply. $send is a Supplier; messages are broadcast to all zero or more subscribers, and when the timing is off, there are zero. The neatest fix for this test is to switch to Supplier::Preserving (I've tried, and it helps).

@japhb
Copy link
Contributor Author

japhb commented Dec 16, 2020

For historical value:

Additional discussion regarding the API occurred in #cro. In short, establish is intended as a helper for the common case in which the user code does not need to block on (or otherwise detect) the completion of the connection. If the user code does need to detect this, it should instead use the connect method directly.

@japhb
Copy link
Contributor Author

japhb commented Dec 17, 2020

Huh, that's the first time I've come across this situation -- you've approved, but not merged. What happens next? Is there something else I need to do?

@jnthn jnthn merged commit 7d2bca7 into croservices:master Dec 17, 2020
@jnthn
Copy link
Member

jnthn commented Dec 17, 2020

@japhb At the time I approved it, the Travis CI verdict still wasn't in; I was just waiting on that, in the unlikely event it found something.

@japhb
Copy link
Contributor Author

japhb commented Dec 18, 2020

AH! OK, I understand now. Thanks, jnthn! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants