Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

windows: use SetFileCompletionNotificationModes for TCP sockets (#476) #520

Merged
merged 1 commit into from
Jan 20, 2017
Merged

Conversation

steffengy
Copy link
Contributor

@steffengy steffengy commented Jan 19, 2017

SetFileCompletionNotificationModes allows us to tell the kernel not to enqueue the completion message, if it returns SUCCESS immediately.

This removes the CPU-overhead of having to enter the event loop to handle this completion message.

Less memory overhead

You do not need as much internal (or external) buffering,
which can be quite significant, especially in a context where the event loop is only run after buffering the content in an external buffer (up to 3x less, 512 MB insteadof 1.5GB for my case).
In that regard it also behaves similar to the POSIX (epoll) implementation:
You should be able to repeatedly write to the underlying IO, which will still allow it to progress, without requiring any event loop interaction (similar as epoll, which returns eagain, miow returns true).

general issues, some of them already mentioned in the original issue:

  • The API is not supported on Windows XP (unsure it this needs additional attention?)
  • We might want to check for non-IFS LSPs , since if one of them is installed, SetFileCompletionNotificationModes will not work properly if such is installed.
    Golang does perform this check, while libuv doesn't seem to.
    (This is probably because non-IFS LSPs are deprecated for ages)

some issues or rather hurdles using it with UDP-sockets:

  • There seems to be a bug in the underlying WSARecv implementation (which according to this blogpost which is a summary of the issue and links to the original MSDN post, where Microsoft confirmed the issue in the comment section.
    Golang disables UDP because of this, while LIBUV uses a (slightly limited) workaround.
    .NET Core disables it for <= Windows 7.
  • Also the MIO api seems slightly different, which should not be any issue, but for the sake of completeness I still list it here.

This is the main reason this implements this for TCP sockets first.

@nayato
Copy link

nayato commented Jan 19, 2017

@steffengy FYI, recently merged in .NET Core: https://github.com/dotnet/corefx/pull/15141/files#diff-6c4b087518282ebcba0a74cb9bd073c3R22 (search for CheckIfPlatformHasUdpIssue)
The check is being done for Windows version if socket type is UDP.

Copy link
Contributor

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks! A few points:

  • Would it be possible to add a test for this? A windows-specific test is fine.
  • I think it's ok to not explicitly support XP just yet. I don't believe mio ever has, libstd only barely does, and it should be easy to add back in later.
  • I also think it's ok to skip the non-IFS LSP (not that I even know what it is) as it should be easy to add in later if need be and otherwise if it's deprecated let's just wait till we see it.
  • It's ok to skip UDP for now and focus on TCP.

@@ -397,6 +405,10 @@ impl StreamImp {
self.inner.socket.write_overlapped(&buf[pos..], overlapped)
};
match err {
Ok(true) if me.instant_notify => {
me.write = State::Empty;
self.add_readiness(me, Ready::writable());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I think we should check/verify the number of bytes written, right?

Copy link
Contributor Author

@steffengy steffengy Jan 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that'll need modification to MIOW though, since we'll need to pass lpNumberOfBytesSend to WSASend in write_overlapped.

A pointer to the number, in bytes, sent by this call if the I/O operation completes immediately.
Use NULL for this parameter if the lpOverlapped parameter is not NULL to avoid potentially erroneous results.

Golang and LIBUV seem to use this, which is surprising since the errorneous results doesn't sound good.

Or did you have something else in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be able to get that through WSAGetOverlappedResult, right? That is currently exposed in miow as of recently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, seems to work.
Still feels slightly slower (might be worth to investigate it that's just a placebo or if doing
it like libuv/go is advantageous performance-wise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benchmark you're using? I wouldn't mind trying to poke around it in my spare time to see what's up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately the code I'm currently using is to slow for cargo bench to output
an average. (~6 seconds/iteration which writes 1GB).
Since this is implemented now, the next step for me would be to profile it and get it hopefully much faster.

What you might do is bench the test I just added (maybe with writing more data tough) since that should be sufficient to see a difference?

Copy link
Contributor Author

@steffengy steffengy Jan 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton
did some benchmarking to simply deterine the overhead of the .result call.
code: https://gist.github.com/steffengy/598350018d4827887b287ed0ea798850

mio/tcp.rs:

-self.inner.socket.result(overlapped).map(Some)
+Ok(Some((buf.len() - pos, 3u32)))

results (with .result call):

1,943,652 ns/iter (+/- 525,906)
2,009,193 ns/iter (+/- 1,153,769)
1,943,501 ns/iter (+/- 565,776)

without the .result call:

1,753,306 ns/iter (+/- 1,052,692)
1,724,071 ns/iter (+/- 589,609)
1,792,201 ns/iter (+/- 537,950)

EDIT: miow patch steffengy/miow@a37c31d

@steffengy
Copy link
Contributor Author

I'm also not quite sure how to add an explicit test for this, since by default
it should simply be used for everything in test_tcp?
(Maybe it would make sense to run with and without, but I don't quite see how you'd feasily implement that in those tests)

@alexcrichton
Copy link
Contributor

Oh yeah not so much for correctness (as you're right existing tcp tests take care of that) but more a test of right after a write you can write again.

@steffengy
Copy link
Contributor Author

@alexcrichton
Test case is added and verified to only work with these changes.

self.add_readiness(me, Ready::writable());
State::Empty
} else {
State::Pending((buf, pos + transfered_bytes))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory if this happens (which I'm still not 100% sure that it actually can) I think we need to loop here, right? Otherwise nothing's scheduled to move this from the pending to the writing state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have my doubts that this is actually a valid case but in theory yes.
I'll add the loop if we have some conclusive benchmark results for the wsagetoverlappedresult stuff (reducing the commit spam a bit by combining those changes, if the benchmarks show they make sense ).

@alexcrichton
Copy link
Contributor

Test case is added and verified to only work with these changes.

Woo thanks!

@alexcrichton
Copy link
Contributor

Figured we could continue the perf discussion out here rather than on the now-hidden thread.

So it looks like you passed a pointer to WSASend, but according to that documentation:

Use NULL for this parameter if the lpOverlapped parameter is not NULL to avoid potentially erroneous results. This parameter can be NULL only if the lpOverlapped parameter is not NULL.

So that makes me think that we've gotta call WSAGetOverlappedResult maybe? I'm not sure that that means "potentially erroneous results" though...

@steffengy
Copy link
Contributor Author

steffengy commented Jan 20, 2017

@alexcrichton
That's what startled me too (I posted it in some comment before).
LibUv
Golang
.NET
.NET Core

So if even Microsoft uses it in it's .NET implementations.
The bottom part of the documentation is the important one:

If an overlapped operation completes immediately, WSASend returns a value of zero and the
lpNumberOfBytesSent parameter is updated with the number of bytes sent.  
If the overlapped operation is successfully initiated and will complete later, WSASend returns 
SOCKET_ERROR and indicates error code WSA_IO_PENDING. 
In this case, lpNumberOfBytesSent is not   updated.  

so it's totally safe for that case. (It only means the value is invalid for WSA_IO_PENDING/not success)

@steffengy
Copy link
Contributor Author

MIOW is refactored to return the size directly, when valid.
So we just saved some more overhead.

Related PR: yoshuawuyts/miow#8

the test case executes 16 writes (16KB), which return WouldBlock with the old
case. so it basically shows the saved roundtrip through the event loop.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants