Client/server networking abstractions #10308

Closed · ReubenBond opened this issue May 16, 2019 · 54 comments · Fixed by #10321
Labels: area-networking, enhancement

@ReubenBond (Member) commented May 16, 2019

Epic #10869

The goal of this issue is to specify abstractions which the .NET community can use to build robust, correct, high-performance client & server applications. This is a follow-up to #4772 (Project Bedrock) and is largely a summary of a discussion with @davidfowl, @anurse, @halter73, @shirhatti, & @jkotalik.

Initiating & accepting connections

Clients initiate connections to an endpoint using IConnectionFactory. Servers bind to an endpoint using IConnectionListenerFactory and then accept incoming connections using the IConnectionListener returned at binding time.

public abstract class ConnectionContext : IAsyncDisposable
{
  // unchanged except for IAsyncDisposable base
}

public interface IConnectionListenerFactory
{
    ValueTask<IConnectionListener> BindAsync(System.Net.EndPoint endpoint, CancellationToken cancellationToken = default);
}

public interface IConnectionListener : IAsyncDisposable
{
    EndPoint EndPoint { get; }
    ValueTask<ConnectionContext> AcceptAsync(CancellationToken cancellationToken = default);
    ValueTask UnbindAsync(CancellationToken cancellationToken = default);
}

public interface IConnectionFactory
{
    ValueTask<ConnectionContext> ConnectAsync(System.Net.EndPoint endpoint, CancellationToken cancellationToken = default);
}

A framework might want to support multiple different transports, such as file + socket. In that case, it can use multiple IConnectionListenerFactory implementations.

The shared abstractions should not dictate how these factories are specified or configured: frameworks should continue to follow a framework-specific builder pattern. Eg, KestrelServerOptions has List<ListenOptions> where each entry is a different endpoint to listen on + middleware to handle connections to that endpoint.

Both IConnectionListenerFactory.BindAsync & IConnectionFactory.ConnectAsync accept a System.Net.EndPoint to bind/connect to. The BCL has EndPoint implementations for IPEndPoint, DnsEndPoint, and UnixDomainSocketEndPoint. Users can add new transports which accept custom sub-classes of EndPoint, eg MyOverlayNetworkEndPoint.
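
For illustration, a custom transport's endpoint could be a small EndPoint subclass like the sketch below. MyOverlayNetworkEndPoint is just the hypothetical name from the paragraph above, and the NodeId property is an invented example of transport-specific addressing.

using System.Net;

// Hypothetical endpoint type for a custom overlay-network transport.
// The transport's IConnectionListenerFactory/IConnectionFactory would
// recognize this type in BindAsync/ConnectAsync.
public sealed class MyOverlayNetworkEndPoint : EndPoint
{
    public MyOverlayNetworkEndPoint(string nodeId) => NodeId = nodeId;

    // Overlay-specific addressing information (invented for this sketch).
    public string NodeId { get; }

    public override string ToString() => $"overlay://{NodeId}";
}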

Servers bind to endpoints using IConnectionListenerFactory, resulting in an IConnectionListener. The listener's AcceptAsync() is then called in a loop, each time resulting in a ConnectionContext instance which represents a newly accepted connection. When the server wants to stop accepting new connections, the listener is disposed by calling listener.DisposeAsync(). Connections which were previously accepted continue to operate until each is individually terminated.
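
A minimal accept loop over these interfaces might look like the sketch below. It assumes the interfaces above are in scope, that a concrete IConnectionListenerFactory is supplied by some transport, and that AcceptAsync returns null once the listener stops accepting (the convention discussed later in this thread).

using System.Net;
using System.Threading.Tasks;

public static class AcceptLoop
{
    public static async Task RunAsync(IConnectionListenerFactory factory, EndPoint endPoint)
    {
        // Bind once, then accept until the listener stops producing connections.
        IConnectionListener listener = await factory.BindAsync(endPoint);
        try
        {
            while (true)
            {
                ConnectionContext connection = await listener.AcceptAsync();
                if (connection == null)
                {
                    break; // listener has been unbound/disposed
                }

                // Hand the connection off; the handler owns its lifetime.
                _ = HandleConnectionAsync(connection);
            }
        }
        finally
        {
            await listener.DisposeAsync();
        }
    }

    private static async Task HandleConnectionAsync(ConnectionContext connection)
    {
        await using (connection)
        {
            // Pump connection.Transport.Input/Output here until completion.
        }
    }
}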

For implementations of IConnectionListener which are backed by a callback system (eg, libuv), decoupling the lifetime of IConnectionListener and the (many) resulting ConnectionContexts poses a challenge. We need to consider whether or not this minimal interface allows graceful shutdown (including draining connections) to still be cleanly implemented in those cases.

Handling connections

The client/server decides how it handles the ConnectionContext. Typically it will be decorated (eg, via ConnectionDelegate) and pumped until the connection is terminated. Connections can be terminated by calling DisposeAsync() (from IAsyncDisposable) but can also be terminated out-of-band (eg, because the remote endpoint terminated it). In either case, clients/servers can subscribe to the ConnectionContext's IConnectionLifetimeFeature feature to learn about termination and take necessary action (eg, cleanup).
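
As a sketch of that subscription, assuming the existing IConnectionLifetimeFeature shape (a ConnectionClosed token plus an Abort method) and the usual Microsoft.AspNetCore.Connections namespaces:

using System;
using Microsoft.AspNetCore.Connections;
using Microsoft.AspNetCore.Connections.Features;

public static class ConnectionLifetimeWatcher
{
    public static void OnTerminated(ConnectionContext connection, Action cleanup)
    {
        // The lifetime feature signals termination whether it was initiated
        // locally (DisposeAsync/Abort) or out-of-band by the remote endpoint.
        var lifetime = connection.Features.Get<IConnectionLifetimeFeature>();
        lifetime?.ConnectionClosed.Register(cleanup);
    }
}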

Connection metadata

Currently, IHttpConnectionFeature is used to expose local/remote endpoint information. We will add a new feature, IConnectionEndPointFeature as a uniform way to access this information.

public interface IConnectionEndPointFeature
{
  EndPoint RemoteEndPoint { get; set; }
  EndPoint LocalEndPoint { get; set; }
}
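
Consumers would then read the endpoints off the feature collection; a minimal sketch, assuming the transport populates the feature and that it lives alongside the existing connection feature interfaces:

using System;
using Microsoft.AspNetCore.Connections;

public static class EndPointLogger
{
    public static void Log(ConnectionContext connection)
    {
        // Falls back gracefully when a transport does not expose the feature.
        var feature = connection.Features.Get<IConnectionEndPointFeature>();
        if (feature != null)
        {
            Console.WriteLine($"{feature.RemoteEndPoint} -> {feature.LocalEndPoint}");
        }
    }
}
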
@halter73 (Member) commented:

Thanks!

@davidfowl and I discussed this. We think IConnectionListener.DisposeAsync() should either wait for all active connections to close naturally, or abort all the connections.

If we go the abort route, we'll probably need something similar to IConnectionListener.StopListening() to tell transports to close the listen socket. If we let the connections close naturally, we can probably just close the listen socket at the start of DisposeAsync().

davidfowl added the enhancement label May 17, 2019
@davidfowl (Member) commented May 17, 2019

I'm going to spike a prototype using these new APIs. I already have some design feedback from a hack session @halter73 and I did today, but I'll try to make Kestrel use it.

cc @tmds @mgravell

@tmds (Member) commented May 17, 2019

Great stuff!

ValueTask<ConnectionContext> ConnectAsync(System.Net.EndPoint endpoint);
ValueTask<ConnectionContext> AcceptAsync();

Perhaps it makes sense to add a CancellationToken to these methods.
ConnectAsync can take a long time.
AcceptAsync, maybe not needed, IAsyncDisposable kind of acts as the CancellationToken.

@davidfowl and I discussed this. We think IConnectionListener.DisposeAsync() should either wait for all active connections to close naturally, or abort all the connections.

Maybe this can be moved to the server implementation. It simplifies the IConnectionListener implementation because it no longer needs to track the ConnectionContexts.

@mgravell (Member) commented May 17, 2019

Sounds promising (the goal; I haven't studied the specific API in depth), and it'll be a pleasure to be able to mark large chunks of Pipelines.Sockets.Unofficial as [Obsolete]. One big question I would raise early is: what is the assembly/namespace/package story here? since this issue is under aspnet not corefx, I have a sinking feeling... Basically, it is just really really weird if a client library or a desktop console needs to reference aspnetcore to talk to a socket... (whether as a client or a server)

@benaadams (Member) commented May 17, 2019

it is just really really weird if a client library or a desktop console needs to reference aspnetcore to talk to a socket...

AsyncSocketPower .NET ?

@davidfowl (Member) commented:

Sounds promising (the goal; I haven't studied the specific API in depth), and it'll be a pleasure to be able to mark large chunks of Pipelines.Sockets.Unofficial as [Obsolete]. One big question I would raise early is: what is the assembly/namespace/package story here? since this issue is under aspnet not corefx, I have a sinking feeling... Basically, it is just really really weird if a client library or a desktop console needs to reference aspnetcore to talk to a socket... (whether as a client or a server)

TL;DR, this design basically introduces a transport abstraction and makes it a bit more decoupled so it can be used standalone (without kestrel or the pipeline). Kestrel and SignalR become a consumer of this new API (like a pluggable TcpListener and TcpClient).

I see a couple of options around naming:

  • We keep the ASP.NET Core naming for both client and server and keep the connection abstractions package that existed in ASP.NET Core 2.1.
  • We make a big breaking change in 3.0 and introduce new naming (Microsoft.Extensions.Server* or System.Net.*) and maybe we even make a bigger breaking change to move the feature collection to a new assembly (or we make a new one).

Depends on how bad people feel about the current naming (I know how important it is), but the big picture here is that we want to create an ecosystem around these things: middleware, servers, parsers, etc. I don't think the abstractions belong in corefx because this is a tiny layer above that with more opinions. If I had to draw a comparison, I would say this is like ASP.NET Core but for the communication layer below, not tied to a specific transport.

cc @DamianEdwards if he has any ideas

@mgravell (Member) commented:

maybe it is neither aspnet nor corefx?

corefx <=== (some comms naming thing) <=== aspnet

but: I appreciate that this may be impossible at this point. It is just... a bit messy. But coupling too tightly to aspnet also has impact re transitive dependency graph explosion, etc.

@davidfowl (Member) commented:

Updated the proposal based on the prototype:

  • Added CancellationToken to ConnectAsync (we can discuss AcceptAsync as well)
  • Added EndPoint to IConnectionListener

@ReubenBond (Member, Author) commented May 17, 2019

@halter73

@davidfowl and I discussed this. We think IConnectionListener.DisposeAsync() should either wait for all active connections to close naturally, or abort all the connections.

Either way, this suggests that IConnectionListener owns the connections returned from AcceptAsync. There's nothing to stop a framework from tracking connections or even associating them with the IConnectionListener which they came from (if any). The requirement seems onerous, though.

Currently LibuvTransport owns the LibuvThread. Each LibuvConnection gets a reference to one of those LibuvThreads. Would reference counting be insufficient to ensure that each of those LibuvThreads are stopped when they're no longer needed either by the listener or one of the connections?
I might still be misunderstanding.

EDIT: Thinking from Orleans perspective, we would need some internal tracking for both incoming & outgoing connections anyway, for when a node is evicted (declared dead) and we must forcibly terminate any connections with it.

@tmds (Member) commented May 17, 2019

Would reference counting be insufficient to ensure that each of those LibuvThreads are stopped when they're no longer needed either by the listener or one of the connections?

It's possible. It is probably simpler to keep them around, and then use them again when needed.

@halter73 (Member) commented May 17, 2019

I do want the LibuvThreads to get cleaned up before IConnectionListener.DisposeAsync/StopAsync completes. I think we can do this through a combination of reference counting the active connections and closing the listen socket.

@davidfowl (Member) commented May 19, 2019

I played with shutdown for a little and it seems like the simplest thing to do would be to implement graceful shutdown in the transports.

public interface IConnectionListener : IAsyncDisposable
{
    EndPoint EndPoint { get; }
    ValueTask<ConnectionContext> AcceptAsync();
    ValueTask StopAsync(CancellationToken cancellationToken = default);
}

The token is the signal for how long StopAsync waits before tearing all of the connections and state down.

This makes it so that it's easy to do graceful shutdown without a server like Kestrel in the middle, but you need to call StopAsync with a token to do graceful shutdown with a specified timeout. DisposeAsync could pick a default timeout as well (based on transport specifics).
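
Under that shape, caller-driven graceful shutdown might look roughly like this sketch (the timeout and names are illustrative):

using System;
using System.Threading;
using System.Threading.Tasks;

public static class ListenerShutdown
{
    // Sketch against the StopAsync-based listener shape above: the token
    // bounds how long the transport waits before tearing down remaining
    // connections and state.
    public static async Task ShutdownAsync(IConnectionListener listener, TimeSpan drainTimeout)
    {
        using var cts = new CancellationTokenSource(drainTimeout);
        await listener.StopAsync(cts.Token);
        await listener.DisposeAsync();
    }
}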

@davidfowl (Member) commented:

An alternative design is to pass a CancellationToken to AcceptAsync, this would be the signal to stop accepting new connections and graceful shutdown would be up to the caller. I think we should try to push that logic into the transports since it's so common.

@tmds (Member) commented May 20, 2019

I do want the LibuvThreads to get cleaned up before IConnectionListener.DisposeAsync/StopAsync completes. I think we can do this through a combination of reference counting the active connections and closing the listen socket.

Multiple listeners should be able to use the same LibuvThreads.
Those LibuvThreads could get cleaned up when there are no more listeners and connections, but such a cleanup isn't functionally needed.

it seems like the simplest thing to do would be to implement graceful shutdown in the transports.

The complexity is the same whether it is implemented in the transport or the server: track the connections and dispose them.

I think we should try to push that logic into the transports since it's so common.

My feeling is this is making the transport more complex. It needs to track connections per listener. If this is in the server, it can be common for different transports.
At the network level there is no relation between lifetime of the accepted sockets and their listener socket, so it seems weird the ConnectionListener does that.

@davidfowl (Member) commented:

The complexity is the same whether it is implemented in the transport or the server: track the connections and dispose them

There’ll be an order of magnitude fewer transport implementers than consumers of the API. Ideally it should be possible to use the transport APIs directly and still implement graceful shutdown somewhat trivially. That means gracefully waiting for connections to close (for some threshold), then aborting them if that fails. Do we want that to be in everyone’s logic instead of providing helpers that transports can use to implement it?
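
For comparison, the per-consumer bookkeeping being debated here would look roughly like the sketch below (names and the drain timeout are illustrative): track every accepted connection, wait for them to drain on shutdown, then abort the stragglers via the lifetime feature.

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Connections;
using Microsoft.AspNetCore.Connections.Features;

public sealed class TrackingAcceptLoop
{
    private readonly ConcurrentDictionary<ConnectionContext, Task> _connections =
        new ConcurrentDictionary<ConnectionContext, Task>();

    public async Task RunAsync(IConnectionListener listener)
    {
        // Accept until the listener stops producing connections.
        while (await listener.AcceptAsync() is ConnectionContext connection)
        {
            _connections[connection] = HandleAsync(connection);
        }
    }

    public async Task ShutdownAsync(TimeSpan drainTimeout)
    {
        Task drain = Task.WhenAll(_connections.Values.ToArray());
        if (await Task.WhenAny(drain, Task.Delay(drainTimeout)) != drain)
        {
            // The graceful wait timed out: abort whatever is still open.
            foreach (var connection in _connections.Keys)
            {
                connection.Features.Get<IConnectionLifetimeFeature>()?.Abort();
            }
            await drain;
        }
    }

    private async Task HandleAsync(ConnectionContext connection)
    {
        await using (connection)
        {
            // Application logic pumps the connection here.
        }
        _connections.TryRemove(connection, out _);
    }
}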

@tmds (Member) commented May 20, 2019

Do we want that to be in everyone’s logic instead of providing helpers that transports can use to implement it?

Wanting to provide this as a re-usable piece of logic doesn't mean it has to be part of the listener.

It seems like better separation of responsibility. The listener stays close to a listen socket. Tracking and disposing connections is the responsibility of the component calling Accept. There is no functional reason the listener needs to take this responsibility.

@davidfowl (Member) commented:

It seems like better separation of responsibility. The listener stays close to a listen socket. Tracking and disposing connections is the responsibility of the component calling Accept. There is no functional reason the listener needs to take this responsibility.

While that's completely reasonable, it makes the listener extremely hard to use standalone. In order to shut down properly, the owner of the accept loop has to know when all of the accepted connections have completed before calling DisposeAsync on the listener. Remember our transports currently manage their own memory, and it would be nice if the caller didn't have to do that tracking to avoid ugly unexpected errors (or worse, use-after-free bugs!).

I'd like this to be possible:

https://github.com/aspnet/AspNetCore/blob/113f97bb2eff3ad57779c3ff893e2851202dda3c/src/Servers/Kestrel/samples/PlaintextApp/Startup.cs#L55-L90

The alternative is to expose a connection manager API that lives somewhere and needs to be used in conjunction with a listener in order to use it properly.

@ReubenBond (Member, Author) commented:

In systems containing both clients and servers (or nodes which are both, eg Orleans silos), there's going to be some kind of connection tracking anyway.

Eg, Orleans actively closes connections to nodes which are declared dead (voted out by the cluster because they are not functioning). In that case the listener continues operating but some subset of connections are terminated. Clients and servers both do this, so both need some form of connection tracking.

The differences between incoming & outgoing connections largely disappear once the connection is established, so they should be treated similarly (if not identically.)

Generally it's better for the process to terminate abruptly, ungracefully killing open connections, than to hang indefinitely because some connection remains open after the developer messed up their connection tracking. This is one of the reasons I prefer that the listener does not block until all connections close.

Maybe this potential pitfall is minor in comparison to the potential simplicity gained by having additional connection tracking in the listener's contract (and presumably the connection factory's contract?)

@halter73 (Member) commented May 20, 2019

There should be a convenient way to close the listener in a way that tells it to stop accepting new connections and at least tries to wait for all the already-accepted connections to be closed gracefully (perhaps with some timeout). This wouldn't only mean waiting for the sockets to close, but it would also mean waiting for DisposeAsync to be called on all the accepted ConnectionContexts as a means to ensure the app is done with all the pipes and therefore also the memory pool. Otherwise, like @davidfowl mentions, we're opening app developers up to potentially very subtle use-after-free bugs.

I think it would also be nice to have a way to tell the listener to immediately abort all of its open connections/sockets. I think even this API should attempt to wait a small period of time for all the app code to react to the aborted connections and dispose all the accepted ConnectionContexts, once again in order to try to avoid use-after-free issues. This API should throw if the timeout is exceeded to indicate that the memory pool isn't safe to use. Technically, if the app developer tracks all the ConnectionContexts themselves, they could individually abort each connection. So this second API isn't strictly necessary, but nice to have.

@tmds (Member) commented May 21, 2019

This wouldn't only mean waiting for the sockets to close, but it would also mean waiting for DisposeAsync to be called on all the accepted ConnectionContexts as a means to ensure the app is done with all the pipes and therefore also the memory pool.

Do you want to make the lifetime of the memorypool match the lifetime of the listener?
Even without listeners you need a memorypool to make out-bound connections.

@davidfowl (Member) commented:

Do you want to make the lifetime of the memorypool match the lifetime of the listener?

Yes.

Even without listeners you need a memorypool to make out-bound connections.

Not sure what you mean here. Listeners don't have outbound connections. You might be referring to the PR where I currently implement the IClientFactory on the SocketTransportFactory as well?

In theory, you can ref count connections to avoid disposing the memory pool too early (but then why not implement the entire connection tracking to simplify the logic).

I think we want to be in a place where this logic doesn't need to be done by consumers of the API since that makes it really hard to use.

@tmds (Member) commented May 21, 2019

Not sure what you mean here.

IConnectionFactory connections are not tied to a listener lifetime.

In theory, you can ref count connections to avoid disposing the memory pool too early

Another possibility is that the memory pool stays around (and maybe reduces size based on memory pressure).

@davidfowl (Member) commented May 21, 2019

IConnectionFactory connections are not tied to a listener lifetime.

See what I posted, the listener and IConnectionFactory are decoupled. They happened to be implemented on the same PR but it's a draft so that's completely unrelated. Pretend they are different objects, the IConnectionFactory could

Another possibility is that the memory pool stays around (and maybe reduces size based on memory pressure).

Yeah, but that's complicated and we don't need to complicate things here. Reference counting would be a much easier solution to implement (track connections in ConnectAsync/DisposeAsync). Though, we didn't make the IConnectionFactory implement IAsyncDisposable (maybe it should).
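
A rough sketch of that reference-counting idea, with MemoryPool<byte>.Shared standing in for the transport's pinned pool (the real pool type is transport-specific):

using System.Buffers;
using System.Threading;

// Rough sketch of the reference-counting idea: each ConnectAsync/BindAsync
// takes a reference, each DisposeAsync releases one, and the pool is torn
// down when the last user goes away. MemoryPool<byte>.Shared is only a
// stand-in for the transport's real pinned pool here.
internal sealed class RefCountedMemoryPool
{
    private readonly MemoryPool<byte> _pool = MemoryPool<byte>.Shared;
    private int _refCount;

    public MemoryPool<byte> AddReference()
    {
        Interlocked.Increment(ref _refCount);
        return _pool;
    }

    public void Release()
    {
        if (Interlocked.Decrement(ref _refCount) == 0)
        {
            _pool.Dispose();
        }
    }
}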

@halter73 (Member) commented:

After considering this more, I do not think making the listener's dispose method wait for all its accepted connections to be disposed first is really necessary to avoid use-after-free bugs. After all, whoever rents memory from the pool has to explicitly return or dereference the memory for it to go back to the pool. However, I do think this would help developers avoid ODEs (ObjectDisposedExceptions) that could arise from disposing the pool immediately after disposing the listener.

If developers see that the listener's dispose isn't completing, that should be a sign that they haven't cleaned everything up. Having the listener's dispose throw an exception after a certain timeout could make it even more clear to the developers what's going wrong, but I'm concerned that such a timeout wouldn't work well if you're trying to implement IServer.StopAsync using an IConnectionListener. The problem is the CancellationToken passed into StopAsync can stay uncanceled for arbitrarily long periods of time, and until that token is canceled we don't want to start "ungraceful" shutdown and dispose accepted connections. Having IConnectionListener.DisposeAsync() throw before the token is canceled wouldn't be good.

Ultimately, I think we might need two different methods for gracefully vs. ungracefully closing the listener. DisposeAsync() could be the ungraceful method, and we could add a new graceful method we'll call StopAcceptingNewConnections() for now. Developers who want to support graceful shutdown could call StopAcceptingNewConnections() and wait until they close all of their already-accepted connections before calling DisposeAsync().

Why do we want a StopAcceptingNewConnections() method when you can simply stop calling AcceptAsync()? The main reason for this is that we want to close the listen socket ASAP so other servers can start listening on the endpoint and the listen backlog doesn't fill up with connections the server will never accept.

Alternatively, instead of adding a StopAcceptingNewConnections() method, we might choose to add a StopAsync(CancellationToken) method like the one on IServer.
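
In caller code, the two-method split would read roughly like the sketch below. StopAcceptingNewConnections is just the placeholder name from this comment, not a shipped API, and the drain task stands for whatever connection tracking the app already does.

using System.Threading.Tasks;

public static class TwoPhaseShutdown
{
    // Sketch of the proposed split: close the listen socket first so the
    // endpoint frees up, let the app drain its accepted connections, then
    // dispose the listener last. StopAcceptingNewConnections is the
    // placeholder API proposed above.
    public static async Task RunAsync(IConnectionListener listener, Task appConnectionsClosed)
    {
        listener.StopAcceptingNewConnections();
        await appConnectionsClosed;
        await listener.DisposeAsync();
    }
}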

Shrinking the memory pool is an idea we've been considering for years now, but it's tough to come up with a shrinking strategy that's ideal for all use cases. At a certain point, you might as well use unpooled arrays and let the GC shrink the heap for you.

analogrelay added this to the 3.0.0-preview7 milestone May 21, 2019
@tmds (Member) commented May 29, 2019

Thank you for clarifying. The main thing I realized now is how much the example is influenced by the scoping of resources (memory pool/Libuv threads) to the IConnectionListener.
That scope makes it not possible to share the libuv threads between multiple IConnectionListeners and IConnectionFactory.
It also means that any valid use of Memory instances must be within that scope. So Memory can't be passed safely between Pipes without copying (because their scope may be different).
Is that correct?

@davidfowl (Member) commented:

That scope makes it not possible to share the libuv threads between multiple IConnectionListeners and IConnectionFactory.

It would require reference counting. We could look at making the factory disposable as well in order to give more flexibility to implementations.

So Memory can't be passed safely between Pipes without copying (because their scope may be different).
Is that correct?

There’s no feature to do that, with or without listeners.

@tmds (Member) commented May 29, 2019

There’s no feature to do that, with or without listeners.

There is a feature request: https://github.com/dotnet/corefx/issues/29022 😄

@clairernovotny (Member) commented:

Is Multicast being considered in the design for these abstractions? If not, is there a plan for improving multicast support?

@davidfowl (Member) commented:

No, this is purely for point-to-point, connection-oriented protocols. Nothing is being done to address connectionless scenarios as yet.

davidfowl reopened this May 31, 2019
davidfowl added a commit that referenced this issue May 31, 2019 (#10321):

This is a massive set of changes to Kestrel to remove the existing pubternal transport layer and implement a public facing API for listeners and clients, see the details here #10308.

This change only has the server side pieces of the story as I don't want to add the client APIs without having ported SignalR to use them. Here are the highlights:

- Transport.Abstractions is empty (will be removed in a separate PR as it requires removing it from a ton of places)
- TransportConnection has been moved to Connection.Abstractions (we can decide if we need to consolidate with DefaultConnectionContext in a later PR)
- Added FileHandleEndPoint which allows binding to a file handle (could be a pipe or tcp handle)
- ListenOptions has been gutted of most pubternal API and returns various types of binding information. The source of truth is the EndPoint instance.
- Cleaned up a bunch of libuv tests decoupling them from Kestrel.Core

## Breaking changes

- Removing pubternal API is itself a breaking change but one that we already planned to do.
- We've removed the ability to set the scheduling mode on Kestrel
- DisposeAsync was added to ConnectionContext 
- NoDelay was removed from ListenOptions. This has been moved to each of the transports. One major difference though is that it's no longer localized per endpoint but is global. We'd need a derived EndPoint type (or maybe extend IPEndPoint) to store both the socket options and the binding information.
@davidfowl (Member) commented:

Keeping this open because only the server side has been implemented.

@ReubenBond (Member, Author) commented:

How should we handle the client side implementations? SignalR & Orleans are both good candidates.

I will update Orleans to use the latest version of the server bits & make similar changes to the client bits.

A common-sounding namespace may help adoption (eg Microsoft.Extensions.Connections or System.Net)

@davidfowl (Member) commented:

How should we handle the client side implementations? SignalR & Orleans are both good candidates.

I planned to do SignalR next. You can totally do Orleans with the current APIs.

I will update Orleans to use the latest version of the server bits & make similar changes to the client bits.

That would be good!

A common-sounding namespace may help adoption (eg Microsoft.Extensions.Connections or System.Net)

Agreed but that would be a mega breaking change. Not impossible but we need to discuss it as it's very late in the cycle.

@mgravell (Member) commented:

Pretty sure I've brought that up every cycle :)

But: if you need all-hands, let me know and I can play with things like SE-Redis (which has mostly client, but also a toy server).

@davidfowl (Member) commented:

The naming is a super hard break, we can’t just do the client, the server needs to be broken as well. This means we need to break IFeatureCollection as well (or just deal with that dependency?)

@analogrelay (Contributor) commented:

@davidfowl Is the server work for this done? Is something separate tracking the SignalR work? If so, I'd suggest closing this since there's no more actionable work for servers here.

@davidfowl (Member) commented:

@davidfowl Is the server work for this done? Is something separate tracking the SignalR work? If so, I'd suggest closing this since there's no more actionable work for servers here.

Yes the server work is done. Can we instead turn this into an EPIC issue and link to other bugs?

@analogrelay (Contributor) commented:

I guess? Seems overkill.

@analogrelay (Contributor) commented:

I'll create a new epic. The GitHub events will look weird if I edit @ReubenBond 's text to start linking in other issues.

@ReubenBond (Member, Author) commented:

Fine to edit my issue as you see fit - either way 🙂

@davidfowl (Member) commented:

Closing this as it's tracking the server work. The only remaining work here might be a rename.

@tmds (Member) commented Sep 3, 2019

Very late feedback:

BindAsync -> ListenAsync would have been a better name. Binding is using a local socket address, while listening is starting to accept sockets.

AcceptAsync(CancellationToken cancellationToken = default); -> in retrospect, I don't think we should have added a CancellationToken here, and instead relied on the user calling UnbindAsync, which leads to AcceptAsync returning null at some point.

UnbindAsync -> CloseAsync would have been a better name.

ghost locked as resolved and limited conversation to collaborators Dec 3, 2019
amcasey added the area-networking label and removed the area-runtime label Aug 24, 2023