Unix: Why not use Unix Domain Sockets for Named Pipes? #14633

Closed
bpschoch opened this issue May 27, 2015 · 15 comments · Fixed by dotnet/corefx#6833
Comments

@bpschoch
Contributor

System.IO.Pipes maps to the native Windows implementations of anonymous and named pipes.

Windows anonymous pipes are close in implementation to Unix pipes in that they are one-way and byte oriented (not datagram or message oriented).

Named pipes, on the other hand, can be either full duplex (supporting I/O in both directions) or half duplex (I/O in one direction), and can be either byte oriented or message/datagram oriented. Pipe connections can also be made across systems through the network (I believe using SMB).

Unix has named pipes called FIFOs, but they are one-way only and not message oriented. The current Pipes port for corefx on Unix uses FIFOs and is therefore a subset implementation.
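To make the FIFO limitations concrete, here is a minimal Python sketch (not the corefx code; the function name and file name are made up for illustration). It assumes a Unix system where `os.mkfifo` is available. It shows that two writes are not kept as separate messages, and that the read end cannot be used to send data back.

```python
import os
import tempfile


def fifo_limitations():
    """Return (bytes read after two writes, whether writing on the read end worked)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical name
        os.mkfifo(path)
        # Open the read end non-blocking so open() does not wait for a writer.
        rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        wfd = os.open(path, os.O_WRONLY)
        try:
            os.write(wfd, b"ab")
            os.write(wfd, b"cd")
            # Byte oriented: both writes come back as one undivided run of bytes.
            data = os.read(rfd, 1024)
            try:
                os.write(rfd, b"x")  # try to send in the other direction
                reverse_ok = True
            except OSError:
                reverse_ok = False   # the read end is not writable
        finally:
            os.close(rfd)
            os.close(wfd)
    return data, reverse_ok
```

Two-way traffic over FIFOs therefore needs a second FIFO, and message framing has to be layered on top by the application.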

Instead of using FIFOs, why don't we use Unix Domain Sockets? They support most of the functionality of Windows named pipes, including full/half duplex and byte or message orientation. They don't, however, support cross-system connections. See the overview of Unix Domain Sockets here: http://www.thomasstover.com/uds.html
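As a rough illustration of those two properties, the following Python sketch uses `socket.socketpair` (standard library) rather than anything from corefx. The function names are made up for the example. A stream socket pair carries data in both directions over one connection, and a datagram socket pair keeps each send as a separate message.

```python
import socket


def duplex_roundtrip():
    """Send data both ways over a single connected pair of stream sockets."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        a.sendall(b"ping")
        got_b = b.recv(4, socket.MSG_WAITALL)
        b.sendall(b"pong")
        got_a = a.recv(4, socket.MSG_WAITALL)
    return got_b, got_a


def message_boundaries():
    """Datagram sockets preserve boundaries: one send equals one receive."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        a.send(b"first")
        a.send(b"second message")
        return [b.recv(1024), b.recv(1024)]
```

Linux also offers `SOCK_SEQPACKET` for connection-oriented message semantics; the sketch uses `SOCK_DGRAM` only because it is more widely available.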

Ultimately, what is the goal of the corefx library on a different platform? To bind to existing similar functionality, or to provide strict portability of the APIs across platforms (what I'm calling emulation)?

If we strictly bind, then there will always be limitations and differences in the functionality provided, thus decreasing portability. However, by binding we hook into existing OS capabilities and can in theory connect to other non-.NET apps on the host platform.

If we strictly emulate (e.g. grab a chunk of shared memory and use semaphores to emulate named pipe functionality under Unix), then we can be extremely portable, but we can't communicate with anything else.

A possible solution is to extend what can be used as a 'pipe name' in the System.IO.Pipes implementation: extended syntax would give hints as to what to bind to in the underlying implementation, while a name without any extended syntax would default to an emulated functional solution.

Comments?

@stephentoub
Member

@bpschoch, thanks for the suggestion. For the initial implementation, I went with named pipes / FIFOs as the named pipes implementation because a) it would allow for integration with other tools (as you mentioned), and b) it provided a good-enough mapping of most of the features (as you say, not all) and seemed to make logical sense to use the OS' implementation of the mechanism. That said, if you believe that Unix Domain Sockets would yield a better implementation, we'd certainly be open to exploring that; if you're interested, please feel free to submit a PR with such an implementation, and we'll happily take a look.

@bpschoch
Contributor Author

I had multiple purposes in making that post if you read between the lines:

  1. Can we do better in implementing pipes functionality on Unix? Yes, using Unix Domain Sockets.
  2. Do we then have a completely portable implementation on Unix? No (i.e. side effects, number of servers, read/write semantics, etc.).
  3. Start a discussion on portability and what it means (may only be reasonable for pipes, but perhaps more).

What do I mean by 'portable'? My purest definition is that I can take a working .NET app running on Windows, move it to a non-Windows environment, and have it work (limited by what is provided in the library; e.g. there is no forms support because the library doesn't support forms).

That being the case, I should be able to take an app that uses pipes as documented in MSDN for Windows, move it to a non-Windows platform, and it should work.

The tension we have is that other platforms don't support the same underlying functionality, so if you want pure portability then the only option is emulation of the functionality. However, as I brought up, emulation prevents interoperability, which I would argue is no longer a portability issue but an adaptation issue (i.e. things I have to tweak to make it work on another platform).

So it seems that there are potentially two competing goals: one, to provide a portable implementation that works without changing the app; two, to provide a binding to the underlying platform's similar but different implementation.

First is this even a valid concern?
If yes, then what are possible approaches?

  1. 'Decorate' the pipeName field in the constructor to indicate what is desired. If taking that approach, I would argue that a plain name gets pure emulation, and decorations signal the appropriate underlying binding, e.g. pipe:pipe name or socket:pipe name
  2. Add new pipe options enum to indicate what is desired
  3. Add static property somewhere to indicate what is desired.
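Option 1 can be sketched in a few lines. This Python example assumes a hypothetical `prefix:name` syntax based on the `pipe:` and `socket:` examples above. The backend labels and the function name are illustrative only and are not part of System.IO.Pipes.

```python
# Hypothetical mapping from name decoration to underlying binding.
KNOWN_PREFIXES = {
    "pipe": "fifo",
    "socket": "unix-domain-socket",
}


def parse_pipe_name(raw):
    """Split a decorated pipe name into (backend, name).

    A plain name, or one with an unrecognized prefix, falls back to the
    emulated implementation and is kept unchanged.
    """
    prefix, sep, rest = raw.partition(":")
    if sep and rest and prefix in KNOWN_PREFIXES:
        return KNOWN_PREFIXES[prefix], rest
    return "emulated", raw
```

For example, `parse_pipe_name("socket:my pipe")` selects the socket binding, while `parse_pipe_name("my pipe")` stays on the emulated path. One consequence of this design is that a plain name containing a colon stays valid as long as its prefix is not a reserved one.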

and yes I can do an implementation but wanted to get through this discussion first as it may affect the implementation.

Bernie

@stephentoub
Member

If we can achieve better portability, we should do so; I believe that should be the goal first and foremost, getting the best possible behavior/perf/etc. that matches the existing .NET APIs. If we have a choice between two mechanisms, both of which have equivalent portability but one of which uses the "more appropriate" underlying mechanism, then we should use that one of course, but if we can emulate the functionality better on top of a different underlying mechanism, we should do so. It's then just a question of whether we can actually do it better; that's not just a question of behavior, but also of performance, reliability, etc. I don't think we should have two completely separate underlying implementations and switch between them based on some user request; we could consider something as a separate feature in the distant future, but for now we should pick one implementation and go with it.

@bpschoch
Contributor Author

bpschoch commented Jun 5, 2015

I'm researching the possibility of using shared memory to fully implement named pipes (except for cross-machine). My research includes trying to understand the exact semantics of how named pipes work (e.g. when things get blocked), and I'm a bit confused about input and output buffers. I found information describing how output buffers work here (see the Remarks section): https://msdn.microsoft.com/en-us/library/windows/desktop/aa365150%28v=vs.85%29.aspx, but I'm not sure of the relevance of input buffers. It seems that when you read and there is nothing in the output buffer, you wait until there is. Perhaps one of you on the 'inside' could look at the Win32 implementation of named pipes to find the semantics around the input buffers.

Thanks
Bernie

@stephentoub
Member

@bpschoch, checking in on this. Is this something you're still investigating?

@bpschoch
Contributor Author

Well, it got put on hold. I was waiting on some feedback on how the buffers worked. I was also finishing up the refactoring of the pipes tasks (which I got sidetracked on), but I'm doing the PR now.

@bpschoch
Contributor Author

bpschoch commented Aug 3, 2015

Named Pipes Overview (part 1)

Core Classes

Named pipes in System.IO.Pipes are separated into two classes: one for client use and one for server use.
An app that will be the server instantiates a NamedPipeServerStream class and can specify the following information:

  • Pipe name as a string
  • enum Pipe direction e.g. In or Out (one-way) or InOut (two way)
  • int maximum number of server instances (or a special constant for no limit)
  • enum Transmission mode e.g. byte oriented or message oriented (1 write equals 1 read)
  • enum PipeOptions: async or not; write-through (bypass cache)
  • pair of ints to specify recommended sizes for the input and output buffers

There is also a special case where you can pass in an existing SafePipeHandle.

An app that will be a client, instantiates a NamedPipeClientStream class and can specify the following information:

  • Pipe name as a string
  • optional server name for cross system pipes
  • enum Pipe direction e.g. In or Out (one-way) or InOut (two way)
  • enum PipeOptions: async or not; write through bypass cache
  • Impersonation level (what to allow the server to find out/impersonate about the client)

As is the case with the server, the client also has a special case where you can pass in an existing SafePipeHandle.

Operations

Basically, a server is the first to start: it creates a NamedPipeServerStream, specifying all the options that describe the operation of the pipe. The server then calls one of the WaitForConnection() method variants to pause until a client connects. A client connects to the server by creating a NamedPipeClientStream and calling one of the Connect() variants. When both of these calls are complete, a connection is made and the two sides can communicate using the Read() and Write() variants.

Both client and server versions of named pipes have various properties to examine and, in some cases, reset pipe properties. One example is that a client can fetch the number of server instances on the pipe.

From what I understand, you need to create an additional NamedPipeServerStream instance for each simultaneous client that needs to be connected. The additional NamedPipeServerStream instances specify the same pipe name and options.
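For comparison, the same wait/connect/exchange sequence over a Unix domain socket might look like the Python sketch below. It is only an analogy and not how corefx implements anything: `accept()` stands in for WaitForConnection(), `connect()` for Connect(), and the socket file name and function names are made up for the example.

```python
import os
import socket
import tempfile
import threading


def serve_once(listener, received):
    """Server side: wait for one client, then echo its request in upper case."""
    conn, _ = listener.accept()          # plays the role of WaitForConnection()
    with conn:
        request = conn.recv(1024)
        received.append(request)
        conn.sendall(request.upper())


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # the 'pipe name' maps to a path
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(path)
            listener.listen(1)
            received = []
            server = threading.Thread(target=serve_once, args=(listener, received))
            server.start()
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)     # plays the role of Connect()
                client.sendall(b"hello")
                reply = client.recv(1024)
            server.join()
    return reply, received
```

Each accepted connection gets its own socket object, which is loosely comparable to having one NamedPipeServerStream instance per connected client.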

Summary

Named Pipes support either simplex (one-way) or duplex (two-way) communication between client and server, with either byte-oriented or message-oriented transmission. Client and server operate differently in that clients connect to servers while servers wait for connections from clients. Named Pipes also support reaching out across the network to servers on other systems using SMB protocols.

@bpschoch bpschoch closed this as completed Aug 3, 2015
@bpschoch
Contributor Author

bpschoch commented Aug 5, 2015

Named Pipes Implementation Matrix (part 2)

| Named Pipe Feature | Win32 | *nix FIFO | *nix Domain Sockets | Shared Memory Emulation |
|---|---|---|---|---|
| One-way (In or Out) | Y | Y | Y | |
| Two-way (InOut) | Y | N | Y | |
| Byte-oriented | Y | Y | Y | |
| Msg-oriented | Y | N | Y | |
| True Async support | Y | N | Y | |
| Specify Buffer Sizes | Y | Y | Y | |
| Fetch Buffer Sizes | Y | Y*2 | Y | |
| Support # of Servers | Y | N | Y | |
| Fetch # of Servers | Y | N | Y | |
| WaitForPipeToDrain | Y | N | Y | |
| Impersonation Support | Y | N | N | |
| Network Support | Y | N*1 | N*1 | N*1 |

*1 can be implemented via proxy process plus client code to access proxy
*2 Only out buffer size and not on OSX

(NOTE: in progress)

@bpschoch bpschoch reopened this Aug 5, 2015
@bpschoch
Contributor Author

bpschoch commented Aug 5, 2015

Named Pipes; Shared Memory Implementation

Conceptual Idea

A shared memory segment would be obtained to transact sending information back and forth on the 'pipe'. A *nix semaphore would be used for inter-process synchronization.
The base of the memory segment would contain the core configuration information for the 'pipe' e.g. number of servers, buffer sizes, pipe properties. Following this base portion would be an array of memory for each potential server. The base of memory for each server would contain semaphore information and the logical state of the pipe e.g. what's queued etc.
Because the raw elements of communications are all in the shared memory, most of the capabilities of Named Pipes can be emulated properly.
Now because pipes have names, something needs to map a name to the shared memory which could be a file name under a special directory.
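To make the layout idea more tangible, here is a rough Python sketch of one possible segment layout. Every field, size and name in it is an assumption invented for illustration, since the description above does not fix any of them. It also leaves out the semaphores and uses an anonymous mapping instead of a named segment.

```python
import mmap
import struct

# Hypothetical base header: max_servers, in_buf_size, out_buf_size, direction, mode
HEADER = struct.Struct("<5I")
# Hypothetical per-server state: connected flag, number of bytes queued
SLOT_STATE = struct.Struct("<2I")


def slot_size(in_buf, out_buf):
    """Bytes used by one server slot: its state followed by its two buffers."""
    return SLOT_STATE.size + in_buf + out_buf


def slot_offset(index, in_buf, out_buf):
    """Byte offset of server slot `index` within the segment."""
    return HEADER.size + index * slot_size(in_buf, out_buf)


def create_segment(max_servers, in_buf, out_buf, direction=3, mode=0):
    """Allocate an anonymous mapping and write the base header into it."""
    size = HEADER.size + max_servers * slot_size(in_buf, out_buf)
    mem = mmap.mmap(-1, size)
    HEADER.pack_into(mem, 0, max_servers, in_buf, out_buf, direction, mode)
    return mem


def read_header(mem):
    """Read the base header back as a tuple."""
    return HEADER.unpack_from(mem, 0)
```

A real implementation would map a named segment found through the pipe name and guard each slot with a semaphore, which is where the lifetime and cleanup questions come in.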

@bpschoch
Contributor Author

bpschoch commented Aug 5, 2015

@stephentoub comments?

@stephentoub
Member

Interesting idea. I have concerns around lifetime management and how that would work, since many of the concepts you're describing are not cleaned up when all open descriptors go away, when processes die, etc. With the current implementation, worst case is we're left with a temporary FIFO file on disk that doesn't contain any interesting data. But I believe with the approach you describe, we could end up in a situation where state from one run ends up persisting unintentionally to subsequent runs. I think we'd really need to see a strong proof-of-concept before switching to such an approach.

@bpschoch
Contributor Author

My first goal was to see if the functionality can be implemented (assuming portability is important), which I believe it can through shared memory.

I agree the lifetime issues are important. Worst case, a caretaker Unix process can run that supervises what's going on and can do any cleanup. The obvious question that then comes to mind is what happens if the caretaker process goes away. Yes, that is a problem, but it could be mitigated by having the users of supervised resources (e.g. shared memory and/or semaphores in this case) make sure that a copy of the caretaker is always running. BTW, such a caretaker process could also deal with other such issues the library may need.

OK, then on to a proof of concept:
But there are some questions that need more research, as the *nix versions have multiple shared memory and semaphore implementations, each with pros and cons. It's been a few years since I was a hard-hitting *nix programmer, so I need to catch up some and examine these trade-offs.

It would also be nice to ultimately implement the solution on the .NET library side vs. the C side, but then the appropriate 'binding' libraries need to be implemented, say for semaphores etc. Do you think that this would be a good approach vs. doing all the emulation on the C side of things?

Also what do you think of a caretaker proc in general? It would only start up on demand when something in the library required supervision.

@stephentoub
Member

Also what do you think of a caretaker proc in general?

I personally think we very much want to avoid needing something like that.

@Mart-Bogdan

@bpschoch there is one BIG problem with the shared memory approach:

Anyone can connect to that shared region of RAM and mess with its structure, which isn't possible with kernel-based pipes/whatever. That's a security hole :-( .

One change to your matrix:
regarding Impersonation and sockets there should be Y*3

Domain sockets support getting the caller process's user ID/group ID, though only root can "impersonate" another user's UID/GID.

So full support isn't possible, but I guess the only way to do impersonation in .NET is by using P/Invoke and the handle, or does the server pipe have such a method?
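As a small illustration of the credential lookup, this Python sketch reads the peer's PID, UID and GID from a connected Unix domain socket. It assumes Linux, where the `SO_PEERCRED` socket option exists (other systems expose this differently), and the function names are made up for the example.

```python
import os
import socket
import struct

# Layout of Linux's struct ucred: pid_t, uid_t, gid_t
UCRED = struct.Struct("iII")


def peer_credentials(sock):
    """Return (pid, uid, gid) of the process at the other end (Linux only)."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    return UCRED.unpack(raw)


def own_pair_credentials():
    """Create a socket pair inside this process and ask one end about the other."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        return peer_credentials(a)
```

This only identifies the caller; it does not let the server act as that user, which matches the point above that full impersonation is not available this way.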

@danmoseley
Member

For those interested: general domain sockets support was added in dotnet/corefx#25246

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 1.0.0-rc2 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 6, 2021
5 participants