Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use libuv for I/O #69

Merged
merged 51 commits into from
Jul 1, 2016
Merged

Use libuv for I/O #69

merged 51 commits into from
Jul 1, 2016

Conversation

djs55
Copy link
Collaborator

@djs55 djs55 commented Jun 30, 2016

The current Lwt_unix implementation has problems on both Mac and Windows, specifically:

  • on Mac, both the built-in default Lwt_engine and the libev one will use select. Unfortunately even with DARWIN_UNLIMITIED_SELECT we still see EINVAL errors (Lots of connections causes Unix.select to fail with EINVAL #66). Even if this worked it would not perform well with lots of connections.
  • on Windows, the select FD_SETSIZE is 64 and the OCaml Unix.select falls back to a much more complicated emulated code path. The API we should be using for scalable networking on Windows is IO Completion Ports

This PR changes the socket side of the stack to use libuv, the abstraction layer used by node.js which uses IOCP on Windows and kqueue on Mac.

This PR changes all the direct uses of Lwt_unix.file_descr into abstract types Socket.Stream.Tcp.t,Socket.Stream.Udp.tetc. This higher-level interface can then be mapped ontolibuvvia [fdopen/uwt] as well as traditionalLwt_unix.

A side-effect is that duplicated code was coalesced-- in particular there were several places where read and write functions had slightly different error handling.

Another side-effect is adding a Channel on top of the FLOW to avoid reading packet-by-packet. This has increased performance -- some benchmarks to follow.

Note that we must avoid using libraries which use Lwt_unix directly, as these calls will block. Happily most Mirage libraries are already pure in this sense.

The unit tests have been extended to run both the Lwt_unix and Uwt versions. I was able to run nmap tests in containers and observe a stable number of fds (~2000) during the test runs with lsof.

Fixes #66

djs55 added 30 commits June 21, 2016 21:08
Signed-off-by: David Scott <dave.scott@docker.com>
Instead of calling `FLOW.read` and `FLOW.write` directly which map onto
`read(2)` and `write(2)`, layer a buffered Channel on top. This will
allow us to reduce the number of times we make syscalls, for example
we could read several packets in one go if the underlying `FLOW` has
a big enough read buffer.

Signed-off-by: David Scott <dave.scott@docker.com>
Now that we are using a buffered Channel, we can read more bytes at
once from Unix file descriptors. This should reduce the rate at which
we make expensive syscalls.

With a simple iperf test sending from a container and receiving on
the host, this boosts throughput from 1.01 Gbits/sec to 1.33 Gbits/sec
on the Mac.

Signed-off-by: David Scott <dave.scott@docker.com>
Both the Mac and Windows frontends use the same protocol to
request a connection to a given container IP and port, so move
this code to the shared library.

Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave@recoil.org>
Rather than use bare `Lwt_unix.send`, `Lwt_unix.recvfrom` this patch
shares the existing code in `Socket.Datagram`. This will make a future
patch to use libuv smaller.

The main difference between the two `recvfrom` uses is that the previous
DNS one would always timeout after 5s, whereas the general UDP NAT
(activated for example if the VM sends DNS to 8.8.8.8) will be listening
for 60s. So if a response is delayed > 5s we might receive it now.
Presumably the same thing happens on the Internet anyway as there is
no `Lwt.cancel` for transmitted UDP packets.

Signed-off-by: David Scott <dave.scott@docker.com>
The generic Unix `SOCK_STREAM` as-a `FLOW` was duplicated: this
patch merges them back together.

Note the error handling in `Socket.Stream` was slightly better: it
cached whethe the fd is closed and returned `Eof` if so.

Signed-off-by: David Scott <dave.scott@docker.com>
Since libuv requires us to use a typed API where Unix domain
sockets, TCP sockets etc are all treated differently, this patch
creates such an API on top of the existing Lwt_unix implementation.

Signed-off-by: David Scott <dave.scott@docker.com>
The current Lwt_unix implementation is the same, but we should
be able to make a libuv compatible version too.

Note this moves the function from vmnet-client to hostnet in order
to hide it behind the abstract interface in hostnet.

Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
Since we can't use Lwt_unix directly, this patch hides it behind a
simple `FILES.read_file` interface.

Signed-off-by: David Scott <dave.scott@docker.com>
This removes the remaining hardcoded use of Lwt_unix.sleep.

Signed-off-by: David Scott <dave.scott@docker.com>
This means we should be able to swap out the Lwt_unix implementation
completely for another one.

Signed-off-by: David Scott <dave.scott@docker.com>
This uses
- Uwt.Timer
- Uwt.Pipe (Unix domain sockets and Windows named pipes)
- Uwt.Tcp
- Uwt.Udp

Signed-off-by: David Scott <dave.scott@docker.com>
When using `Lwt_unix` we should use `Lwt_main.run`, when using `Uwt`
we should use `Uwt.Main.run`

Signed-off-by: David Scott <dave.scott@docker.com>
We still default to a select-based event loop. If the --libuv argument
is provided then we switch to libuv instead.

Signed-off-by: David Scott <dave.scott@docker.com>
This is already the case for the Windows frontend. This makes testing
easier.

Signed-off-by: David Scott <dave.scott@docker.com>
We're not using the mirage-unix "platform" when in Uwt mode. Therefore
we shouldn't use the OS.Time module.

Signed-off-by: David Scott <dave.scott@docker.com>
Previously the module was hardcoded to use Lwt_unix via Flow_lwt_unix.
This API change allows the client to choose the FLOW and TIME
implementations.

Signed-off-by: David Scott <dave.scott@docker.com>
Also known as 879b0fc68d2d44f5c6317f033ac50ffcc3d4277f

The backport is here:

djs55/mirage-tcpip@0da78c9

In particular this avoids `Console.log_s` which won't work if we're using
`libuv` (or anything other than `Lwt_main.run` and `Lwt_unix`)

Signed-off-by: David Scott <dave.scott@docker.com>
Note that Host_uwt already supports these via libuv.

Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
This is mostly straightforward, but it did require extending the
`Host.Main` interface with `run_in_main` so the hyper-V sockets
implementation can use the correct function.

Signed-off-by: David Scott <dave.scott@docker.com>
We run the unit tests with both Lwt_unix and Uwt.

Signed-off-by: David Scott <dave.scott@docker.com>
This is needed because libuv is not in cygwin.

Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
mirage-console.1.1.0 has a half-close fix, and uwt.999 has a modified
dependency: rather than depending on conf-libuv, it depends on libuv
which actually builds libuv.

Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
Although these are referenced in the hostnet library they aren't used
and so probably aren't really included, but they are needed.

Signed-off-by: David Scott <dave.scott@docker.com>
Since libuv seems to work, let's use it to work around the problems
with select (notably: EINVAL with large fd sets)

Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
@@ -21,149 +21,76 @@ let default d = function None -> d | Some x -> x
let ethernet_serviceid = "30D48B34-7D27-4B0B-AAAF-BBBED334DD59"
let ports_serviceid = "0B95756A-9985-48AD-9470-78E060895BE7"

(* These could be shared with datakit. Perhaps they should live in mirage/conduit? *)
let hvsock_addr_of_uri ~default_serviceid uri =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this can indeed live in Conduit as a custom registration

Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave@recoil.org>
We were already in an Lwt.t context, so Lwt.fail is more appropriate.

Signed-off-by: David Scott <dave.scott@docker.com>
This reverts commit 6d36ff1.

Signed-off-by: David Scott <dave.scott@docker.com>
This reverts commit f19ad8d.

Signed-off-by: David Scott <dave.scott@docker.com>
Now that we depend on libuv, we must include it in our package.

Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
This is needed to respect the MACOS_DEPLOYMENT_TARGET

Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
Signed-off-by: David Scott <dave.scott@docker.com>
@djs55
Copy link
Collaborator Author

djs55 commented Jul 1, 2016

Regression tests are looking good -- let's try to include this in Docker for Mac/Windows soon.

@djs55 djs55 merged commit 4cf14ab into moby:master Jul 1, 2016
@djs55 djs55 deleted the use-buffered-channels branch July 1, 2016 10:15
@yomimono
Copy link
Contributor

yomimono commented Jul 1, 2016

Belated LGTM :)

@djs55
Copy link
Collaborator Author

djs55 commented Jul 1, 2016

Thanks :) (sorry for the rush)

@avsm
Copy link
Collaborator

avsm commented Jul 1, 2016

lgtm

avsm pushed a commit to avsm/vpnkit that referenced this pull request May 5, 2017
Metadata support (executable files and symlinks)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants