Adding optional TLS/encryption to transport channel #828

Closed
veikkoeeva opened this issue Sep 22, 2015 · 55 comments · Fixed by #6035

@veikkoeeva
Contributor

Adding a TLS option to client-silo and inter-silo communications would make for a strong end-to-end encryption promise when combined with encryption between silo and storage. It looks like using TLS on socket connections involves adding a configuration option to the settings and then applying it to the socket connections. Encryption between silos and storage is less clear to me, other than that it may require more than simply changing the connection string (for instance, here's background material for Azure Table Storage).

To get this going, what would be needed to add encryption?

  1. Where and what kind of parameters should be added to the configuration to support TLS on client-silo and inter-silo communications? It seems appropriate to allow different settings for inter-silo and client-silo communications.
  2. Where in the code are the relevant places that handle socket communications and would apply the encryption settings?
  3. Some consideration should be given to adding encryption on a per-provider basis, which could be split into separate tickets, but I'll add it here to get things going.
@gabikliot
Contributor

How does one add TLS to C# Sockets? Is it the StreamSocket?

@attilah
Contributor

attilah commented Sep 23, 2015

You can wrap the connected Socket in a NetworkStream object, and then use and configure an SslStream, which will have the NetworkStream object as its inner stream.

There is an OSS project, http://www.supersocket.net/, which has every piece that we need. I'm not saying we should take a dependency on it, but take a look: it has TLS/SSL support implemented based on the above classes.

I hope it helps.

@veikkoeeva
Contributor Author

@attilah That's a point to consider over some time span, especially thinking of xplat and CoreCLR. This touches upon #307, so I'll cross-link. I don't know if the right solution would be to add TLS as a lightweight, one-off option and refactor later, as configuration and storage communications likely remain the same.

@ReubenBond
Member

This would be great for the Azure Web Apps case, where Web Apps cannot currently communicate with Cloud Services without creating a point-to-site VPN. If we adopt this approach, I'd like the ability to customize client certificate validation

@galvesribeiro
Member

I would stay away from SuperSocket :)

I added the TLS support there several months ago, and when SSL got that bleed breach, I added support for TLS 1.2 and 1.3, but they never updated the code. At first the overall project looks nice, but it has a lot of problems and the project owner wasn't very willing to fix them in a timely manner, so I decided to move away.

TLS support is pretty simple: you just use the SslStream class instead of a regular Stream, and optionally provide a callback on the server if you want to validate the client certificate, or a callback on the client if you want to validate the server certificate. The server certificate is validated by .NET internal code automagically, so if the machine connecting to the server doesn't trust any part of the certificate chain, the connection is aborted. If you add the validation callback on the client, you can bypass that check by returning true for development/self-signed scenarios.

Look at this other project that I contributed to: https://github.com/rdavisau/sockets-for-pcl
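
For readers following along, here is a minimal sketch of the SslStream approach described in this comment. It is not code from Orleans or from this thread: the class name TlsSketch, the bypassValidation flag, and the method names are made up for illustration, and only the standard System.Net.Security APIs are used.

```csharp
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Cryptography.X509Certificates;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class TlsSketch
{
    // Accept the peer when the platform found no problems, or when the caller
    // explicitly opted out of validation (development / self-signed scenarios).
    public static bool IsAcceptable(SslPolicyErrors errors, bool bypassValidation) =>
        errors == SslPolicyErrors.None || bypassValidation;

    // Client side: wrap an already connected socket and run the TLS handshake.
    public static async Task<SslStream> WrapClientAsync(
        Socket connected, string targetHost, bool bypassValidation)
    {
        var network = new NetworkStream(connected, ownsSocket: true);
        var ssl = new SslStream(network, leaveInnerStreamOpen: false,
            (sender, certificate, chain, errors) => IsAcceptable(errors, bypassValidation));
        await ssl.AuthenticateAsClientAsync(targetHost);
        return ssl;
    }

    // Server side: present the server certificate and optionally ask the client for one.
    public static async Task<SslStream> WrapServerAsync(
        Socket accepted, X509Certificate2 serverCertificate,
        bool requireClientCertificate, bool bypassValidation)
    {
        var network = new NetworkStream(accepted, ownsSocket: true);
        var ssl = new SslStream(network, leaveInnerStreamOpen: false,
            (sender, certificate, chain, errors) => IsAcceptable(errors, bypassValidation));
        // Arguments: server certificate, client certificate required, check revocation.
        await ssl.AuthenticateAsServerAsync(serverCertificate, requireClientCertificate, false);
        return ssl;
    }
}
```

After the handshake, the returned SslStream is read from and written to like any other Stream, so the rest of the messaging code would not need to know whether TLS is in use.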

This has a simple "bite before bait" approach for sockets in multiple platforms and I added PCL support as well. That would be helpful for client/mobile applications.

@veikkoeeva
Contributor Author

@galvesribeiro You mean teh bait-and-switch trick? :)

@galvesribeiro
Member

Yeah yeah thats it hehehe :)


@ReubenBond
Member

I want this feature, so I'm willing to put some work in to make it happen.
So if anyone wants input on the design, this is a good place for it.

My personal preference is for self-signed certificates with support for mutual, certificate-based authentication, since everyone can use those.

I'm considering adding the following properties to NodeConfiguration:

/// <summary>
/// Gets or sets the thumbprint of the certificate which the gateway should use to identify itself.
/// </summary>
public string ProxyGatewayServerCertificateThumbprint { get; set; }

/// <summary>
/// Gets or sets the comma-separated list of accepted client certificates.
/// </summary>
public string ProxyGatewayClientCertificateThumbprints { get; set; }

/// <summary>
/// Gets or sets a value indicating whether or not client certificates are required to be valid.
/// </summary>
public bool ProxyGatewayValidateClientCertificate { get; set; }

In the XML, I intend this to look something like:

<ProxyingGateway
  Address="localhost"
  Port="40000"
  ServerCertificateThumbprint="xxx"
  ClientCertificateThumbprints="aaa,bbb,ccc"
  ValidateClientCertificate="false"/>

I will add inverse properties to ClientConfiguration.
Accepting multiple server certificates based on thumbprint allows for certificate roll-over while still constraining the certificate within a set of known-good values. I prefer this over simply trusting that the certificate is valid according to the current machine's cert list and checking the CN.

ValidateClientCertificate and the corresponding ValidateServerCertificate allow us to add support for CNs later without breaking users who are using self-signed certificates. They will be forced to set that value to false or otherwise install & trust the client certificate on the silo host.

Feedback is appreciated. I can rename "Server" to "Silo" if that's preferable.
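
To make the thumbprint idea above concrete, here is a rough sketch of a validation callback that accepts a certificate only if its thumbprint is in a comma-separated allow-list. This is an illustration under assumptions, not the proposed Orleans implementation: the ThumbprintValidator class and its members are hypothetical, and it assumes that setting the validate flag to false skips chain validation while still enforcing the allow-list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

// Hypothetical type, for illustration only.
public sealed class ThumbprintValidator
{
    private readonly HashSet<string> accepted;
    private readonly bool validateCertificate;

    public ThumbprintValidator(string commaSeparatedThumbprints, bool validateCertificate)
    {
        // Parse "aaa,bbb,ccc" into a set of normalized thumbprints.
        accepted = new HashSet<string>(
            (commaSeparatedThumbprints ?? string.Empty)
                .Split(',')
                .Select(Normalize)
                .Where(t => t.Length > 0));
        this.validateCertificate = validateCertificate;
    }

    // Keep hex digits only and upper-case them, so "aa bb" and "AABB" compare equal.
    public static string Normalize(string thumbprint) =>
        new string(thumbprint.Where(Uri.IsHexDigit).ToArray()).ToUpperInvariant();

    public bool IsAccepted(string thumbprint, SslPolicyErrors errors)
    {
        if (thumbprint == null || !accepted.Contains(Normalize(thumbprint)))
        {
            return false; // Not one of the known-good certificates.
        }

        // Assumption: when validation is off, chain and name errors are ignored,
        // which is what lets self-signed certificates through.
        return !validateCertificate || errors == SslPolicyErrors.None;
    }

    // Same shape as RemoteCertificateValidationCallback, so it can be passed to SslStream.
    public bool Validate(object sender, X509Certificate certificate,
        X509Chain chain, SslPolicyErrors errors)
    {
        // GetCertHashString returns the certificate's SHA-1 hash as a hex string.
        return certificate != null && IsAccepted(certificate.GetCertHashString(), errors);
    }
}
```

Listing two thumbprints during a roll-over window would let both the old and the new certificate pass until the old one is removed from the list.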

@kkatsma

kkatsma commented Jan 5, 2016

Just beginning to look at implementing an Actor-based system, and being able to encrypt inter-process communication is a huge deal for us. Was wondering if this work was in progress. Based on the last message it seemed possible.

@veikkoeeva
Contributor Author

@kkatsma I believe there is no progress other than what you see here. There was some discussion in Gitter. @ReubenBond and plenty of people hang out there too, so you might want to pop in and ask.

I think I personally need this for future endeavours I'd like to materialize. Likely @ReubenBond too, but he's a busy startup guy. If you have the inclination, i.e. time and skills or willingness to learn as you go (we are here to help, naturally), it'd be great to get the tyres rolling. :)

@ReubenBond
Member

I will be bringing this up at the roadmap meeting, which is the next Orleans Virtual Meetup

@kkatsma

kkatsma commented Jan 6, 2016

@veikkoeeva I'm still needing to complete POC work, but assuming that goes well, I would be willing to try. Or, as Yoda says - "do or do not, there is no try."

@veikkoeeva
Contributor Author

@kkatsma Sounds good. What I wanted to say is that any work is usable, and if you get other priorities, we are still better off. So no pressure on it. :)

@alukyan

alukyan commented Feb 10, 2017

Are there any plans to make progress in this area? We are seriously considering using Orleans for one of our projects, but lack of security in Orleans transport layer (both silo-to-silo and client-to-silo) is a show stopper considering our security requirements.

@galvesribeiro
Member

@alukyan what is your scenario? Also, what are your security requirements? I used to work with Orleans in the banking and financial transaction industry, which requires PCI, so maybe I can help you work around this and meet the required security requirements while this issue is dealt with. If you want, jump into Gitter and we can chat about it.

@alukyan

alukyan commented Jun 14, 2017

Is there any progress regarding this issue?

@sergeybykov
Contributor

No progress. Gigya open-sourced their Microdot framework (https://github.com/gigya/microdot) that I believe can be used to address some of the concerns by securing the http endpoints.

@sergeybykov sergeybykov added this to the Backlog milestone Jun 14, 2017
@alukyan

alukyan commented Jun 14, 2017

@sergeybykov Not sure how it can address transport layer security for internal Orleans RPC calls, which I understand use plain sockets, not HTTP. If Microdot addresses security of HTTP frontends, then it is not related to this issue.

Have you considered using gRPC as the transport layer, which supports TLS out of the box? According to its authors, thanks to efficient binary serialization, streaming support, and TLS channel multiplexing, the protocol overhead is comparable to raw sockets.

@berdon
Contributor

berdon commented Dec 6, 2017

Has there been any movement on this? I've been tasked with implementing this in Orleans, as we need it for our project. Has anyone on the team thought about this, do they have an intended approach, and are they available for a brain dump?

@galvesribeiro
Member

@berdon, @jason-bragg made a prototype of this change using DotNetty as our transport layer and I believe this work is here

However, Project Bedrock and its abstractions would be useful here once it gets released somewhere.

@berdon
Contributor

berdon commented Dec 6, 2017

Thanks @galvesribeiro!

@jason-bragg Where did you leave off on this? Was there any discussion of pulling these changes in?

@jason-bragg
Contributor

jason-bragg commented Dec 6, 2017

@berdon, The DotNetty prototype was an exploration of replacing our networking layer, not just the socket layer, so it's a bit more involved than adding TLS. Replacing our networking layer with an existing and well maintained networking layer, in theory, would have allowed for secure communications, based on the assumption that a well maintained networking layer would have such support.

This initial prototype was not followed up on, as other work took priority.
Max Gortman from DotNetty assisted, and was a great help.
Findings - From time of prototype, code is quite out of date now
TL;DR
Prototype went well, but was insufficient to make an informed call on replacing networking with DotNetty. Despite the inconclusive results of the prototyping, I remain optimistic of the potential, and am of the opinion that a second pass at the prototype is warranted.

  • Spent ~4 days on prototype starting with no experience with DotNetty or Netty.
  • Successfully replaced silo to silo message sending code path, but failed to get the silo to silo receive code path working.
  • Prototype not complete enough for meaningful performance testing.
  • Primary outstanding work to complete evaluation is to get prototype in a state where performance can be tested, which includes getting the Silo to Silo receive path working, and replacing the Client to silo networking.
  • Primary outstanding work that will be needed after prototype, other than hardening/cleanup of prototype, will be the integration of Orleans serialization with DotNetty buffering. Alternatively, replacement of Orleans’ serialization with an existing serialization system (should one exist) which is already integrated with DotNetty.

Suggested next phase

  • Allot ~3 days to finish prototype. ~0.5 days of Max time.
  • Work with Max to help finalize the silo to silo receive path.
  • Use duplicate of silo to silo code paths with requisite modifications to replace client to silo communication.
  • Work with Max to optimize as much as is reasonable for a prototype.
  • Load test.

Detailed Findings
Error handling – As our current networking stack is mostly monolithic and synchronous while DotNetty’s is modular and asynchronous (a good thing), we will need to take care to ensure that our error handling is preserved during the port. This is especially true in regards to preserving message order. There are two difficulties here: one is identifying the expected behaviors, as they are not all documented; the other is testing that they are in fact preserved, as we have no test scenarios built around the low level networking behaviors. This is an Orleans issue, not a DotNetty one, as we would encounter this problem moving to almost any other networking technology, but it needs to be called out. So far, the only difficulty I’ve encountered (other than divining from the code which behaviors are expected) was handling message serialization errors. For the prototype I ignored serialization errors.
Serialization into networking buffers – For performance reasons, Orleans’ existing networking layer utilizes a buffering layer that is integrated with Orleans serialization. DotNetty utilizes a buffering layer for performance reasons as well. For the prototype, I used Orleans’ serialization into Orleans’ buffers then copied those buffers into DotNetty’s buffers. This is inefficient. If we choose to go forward with DotNetty, we will either need to refactor Orleans’ serialization to serialize objects into DotNetty buffers, or replace Orleans’ serialization with an existing serialization technology which is already integrated with DotNetty. In any case, this inefficiency should be considered when the prototype performance results are evaluated.
Handshake – Upon connection from silo or client, Orleans performs a minimal handshake. Using the existing DotNetty architecture, the approach I took for handling this is for connections to be initialized with a handshake handler and for the handler to be swapped out once the handshake is complete. There were issues with this functionality in the version of DotNetty that I used, so I worked around this with a temporary kludge. The DotNetty handshake issues I encountered have been fixed, so this capability should now be supported. While the need for this functionality should be noted, I’m not convinced this capability needs to be vetted as part of the prototype.
Bind vs. Listen – We use the local address of the silo-to-silo listener's socket as part of the siloID. As the siloID is needed very early in the silo initialization, this necessitates binding to the listener socket early in the silo initialization. DotNetty performs the bind and listen at the same time, meaning we either start listening before the silo is ready (bad) or find another way to identify the silo. In general, this is not a problem, because the silo socket listener address is already well defined enough for the listen address and the bound socket address to be the same, but this is, currently, not always the case.

@berdon
Contributor

berdon commented Dec 13, 2017

Thus far, I’ve spent some time abstracting out the Socket* pieces from Orleans into a Transport abstraction. It’s loosely based on the work you did some time ago, with less effort put into adding “protocol” abstractions. The default implementation should, ideally, do exactly what is currently being done so as to not have any impact with the change. I've been swamped elsewhere but am hoping to get to testing the abstraction work soon. TLS, at that point, should hopefully come easily with a hidden layer that sits on top of the underlying socket implementation – likely just in some OrleansContrib.TlsTransport package or something.

To dig in a little:

Abstraction

ITransport (ie. Socket)
ITransportFactory

Refactoring

Socket -> SocketTransport

IncomingMessageAcceptor ->

  • IncomingMessageAcceptor
  • IncomingMessageSocketAcceptor (default transport implementation; to be renamed)

SocketManager ->

  • TransportManager (probably where some protocol layer would go)
  • SocketManager (default transport implementation; likely removed)
  • SocketTransportFactory (default transport implementation; likely where SocketManager goes)

For the most part it's been a lot of code shuffling and replacing Socket with ITransport.
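
As a rough picture of what such an abstraction could look like, here is a small sketch. Only the names ITransport and ITransportFactory come from the comment above; every member, the StreamTransport class, and the choice to build on Stream are assumptions made for illustration, not the code from the branch being discussed.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shape: the transport only knows how to move bytes.
public interface ITransport : IDisposable
{
    EndPoint RemoteEndPoint { get; }
    Task<int> ReceiveAsync(ArraySegment<byte> buffer, CancellationToken cancellation);
    Task SendAsync(ArraySegment<byte> buffer, CancellationToken cancellation);
}

// Hypothetical factory: creates outgoing transports for a given endpoint.
public interface ITransportFactory
{
    Task<ITransport> ConnectAsync(EndPoint endpoint, CancellationToken cancellation);
}

// A Stream-backed transport. Because NetworkStream and SslStream are both Streams,
// a TLS variant would only differ in which stream the factory hands in.
public sealed class StreamTransport : ITransport
{
    private readonly Stream stream;

    public StreamTransport(Stream stream, EndPoint remoteEndPoint)
    {
        this.stream = stream ?? throw new ArgumentNullException(nameof(stream));
        RemoteEndPoint = remoteEndPoint;
    }

    public EndPoint RemoteEndPoint { get; }

    public Task<int> ReceiveAsync(ArraySegment<byte> buffer, CancellationToken cancellation) =>
        stream.ReadAsync(buffer.Array, buffer.Offset, buffer.Count, cancellation);

    public Task SendAsync(ArraySegment<byte> buffer, CancellationToken cancellation) =>
        stream.WriteAsync(buffer.Array, buffer.Offset, buffer.Count, cancellation);

    public void Dispose() => stream.Dispose();
}
```

With this shape, the existing plain-socket behavior and a TLS layer could share the same calling code, which is the "hidden layer on top of the socket implementation" idea in sketch form.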

@galvesribeiro
Member

The default implementation should, ideally, do exactly what is currently being done so as to not have any impact with the change.

@berdon do you think we will be able to use other non-socket transports if we use these abstractions as they are today? I think this abstraction should only care about reading from and writing to buffers, and the implementation would take care of the details of transporting them.

Also, are you thinking of adding TLS (through those abstractions) only silo-to-silo, or do you have an idea of how to do it for the client as well?

Thanks! Good work!

@jbakholt

jbakholt commented Jul 18, 2018

Looking forward to this feature being implemented. Now that GDPR has arrived it's become more relevant than ever. The lack of TLS between clients and silos and inter silo communication has pushed us toward implementing simple message property encryption in the latest project I worked on. It's inefficient and not perfect in any way but for the time being it's the best alternative.

Good to see that this issue has been tagged with P2.

@benjaminpetit benjaminpetit removed their assignment Aug 22, 2018
@rdlaitila

I hope this receives more attention soon. I've been prototyping with Orleans on various projects, and the lack of client -> silo encryption means Orleans becomes a weak spot in the security posture of any project's stack.

@siennathesane
Contributor

I wanted to chime in here and provide some extraneous use cases that might help prioritise this work. I might be able to help with some of the work, depending on the specifics. Currently these are targeted use cases that I'm actively working on leveraging Orleans for, but wouldn't potentially see production for about a year.

Passthrough

This use case is where an Orleans cluster will be responsible for pipelining highly regulated data from source X to destination Y. The data would range from PII/PCI/HIPAA data to highly classified data for US Agencies. In this case, Orleans would need a comprehensive transport security interface so you could secure silos through TLS or other protocols. This helps keep silo and grain communications in line with various data regulations and allows the cluster to be certified for those data workloads.

Blue/Green Networks

This use case is a little more complicated, where you would have silos as part of the same cluster, but sit on differently regulated networks. In this case, a Green network would be a secure, internal network where all traffic is considered trusted as the network these silos reside in. A Blue network would be a mostly secure but external network, and grain calls into this network are guaranteed to be up for inspection through either firewalls or generalised packet captures from external malicious actors. The silos deployed in the Blue network would be able to make grain calls into the Green network, but as such would still be subject to all the same security concerns as Blue -> Green traffic is still external -> internal, same with Green -> Blue, as it would be internal -> external network transitions. Comprehensive TLS security would be very important to make sure traffic is secured behind encryption.

Those are the two that come to mind, but I can likely think of others where Orleans would need more comprehensive transport security.

@berdon
Contributor

berdon commented Mar 5, 2019

I'm getting back to this issue.

General question for interested parties (@ReubenBond, @davidfowl, @jason-bragg, @sergeybykov) for socket refactoring:

Do we want to see Pipes rolled in and passed around instead of sockets? Or a more generic abstraction, as originally done above on that POC branch, where there's an ITransport thing and implementers can do whatever?

I've got around 64 hours of "time" slated for this work, so I'm hoping I can work with you guys to come up with something everyone is happy with.

@ReubenBond
Member

I'm actively working on this at the moment. The current implementation uses pipes and is based on Bedrock APIs for the most part, so ConnectionContext is the core type.

My branch is here: https://github.com/ReubenBond/orleans/tree/feature/networking-replat

It will change significantly in the next week while I stabilize it & your input is appreciated. Currently I've worked only on the core networking and have not put thought into an actual TLS implementation other than it would be based upon an SslStream adapted to work with those Pipes via a middleware. If you'd like to flesh it out more on top of the existing code in the branch then that would be very helpful.

@berdon
Contributor

berdon commented Mar 5, 2019

@ReubenBond How best can I help aside from digging into a TLS middleware impl?

@siennathesane
Contributor

@ReubenBond not sure I can help with the implementation, but I'm happy to be a guinea pig (as time permits) for end-user experience.

@tedvanderveen

tedvanderveen commented Mar 6, 2019

@ReubenBond quite a few implementations in the code base currently have hard dependencies on the System.Net or even System.Net.Sockets assemblies. For example, the SiloAddress class exposes a property of type IPEndPoint. But what if I want to deploy an alternative messaging transport that does not use TCP sockets? It would be great if we could use named endpoints, for example, where a unique string can be used to address a silo. This goes all the way up the chain to EndpointOptions, which also has hard ties to the System.Net.Sockets assembly.
Are you planning to tackle this as well in this branch?

@veikkoeeva
Contributor Author

@tedvanderveen This would be a nice refactoring. Maybe worth entertaining together with #1121 and #3049, as both consider changes to addressing, but of grains and their state instead of silos.

As a tangent to perhaps stimulate minds, I wrote in those about making use of the IPFS addressing scheme, so maybe in this case one could see a silo as an IPFS directory and build systems such as https://www.sandbox.game/ in a somewhat more integrated way. I.e. run something fast in Orleans, but make it easy to conceptually talk about addressing state in other clusters hosted by others too, and reconcile things over another consensus. I think @sergeybykov has just appropriately declared going to GDC too. :)

@dzmitry-lahoda

dzmitry-lahoda commented Jul 23, 2019

It sounds interesting to register some grains on public secured ports, while others are not secured and are exposed only in the internal network. That would allow a natural way to grow our game over the globe until we get the resources to plug HTTPS over each such kind of endpoint.

Another alternative would be to have one cluster, but part of the grains could listen on another, non-secured port so that these could be exposed by k8s/istio/envoy as secured to the public. Again, I would like to avoid having 2 clusters.

Is there any existing mechanism I can hook into to serve different ports for different grains?

@galvesribeiro
Member

The major reason why you shouldn't put Orleans on a public network is mostly that it doesn't have any protection against usual attacks like buffer overflows etc. That is different from public-facing servers like Kestrel, for example. Also, the messaging between silos and with clients has no security validation whatsoever, so an untrusted source can just connect to your silo gateway port and send packets that may seem valid protocol-wise for Orleans, but can have malicious code in them. It doesn't matter how you add transport security to it. Yes, you can put a firewall/proxy in front to avoid DDoS and other attacks, but you are still vulnerable to poison messages.

Now that #5436 is merged, you will soon be able to have TLS encryption on it.

The only way I see public gateway could be open, is with the following:

  1. Establish a public RPC protocol for external/untrusted network clients;
  2. Provide extensibility points so we can validate message signatures and/or message encryption, i.e. the client signs the messages and the gateway validates the signature using a PK infrastructure, async key, pre-shared key or any other mechanism to encrypt/protect at message level
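
As an illustration of the second point, message-level signing with a pre-shared key could be sketched with HMAC-SHA256 as below. This is a sketch under assumptions, not an Orleans extensibility point: the MessageSigning class, the framing (payload followed by a 32-byte tag), and the method names are hypothetical. It detects tampering and unknown senders, but on its own does not prevent replay of a previously valid message.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, for illustration only.
public static class MessageSigning
{
    public const int TagLength = 32; // HMAC-SHA256 output size in bytes.

    // Client side: append an authentication tag to the payload.
    public static byte[] Sign(byte[] key, byte[] payload)
    {
        using (var hmac = new HMACSHA256(key))
        {
            byte[] tag = hmac.ComputeHash(payload);
            var framed = new byte[payload.Length + tag.Length];
            Buffer.BlockCopy(payload, 0, framed, 0, payload.Length);
            Buffer.BlockCopy(tag, 0, framed, payload.Length, tag.Length);
            return framed;
        }
    }

    // Gateway side: recompute the tag and compare before handing the payload on.
    public static bool TryVerify(byte[] key, byte[] framed, out byte[] payload)
    {
        payload = Array.Empty<byte>();
        if (framed == null || framed.Length < TagLength)
        {
            return false;
        }

        var body = new byte[framed.Length - TagLength];
        var tag = new byte[TagLength];
        Buffer.BlockCopy(framed, 0, body, 0, body.Length);
        Buffer.BlockCopy(framed, body.Length, tag, 0, TagLength);

        using (var hmac = new HMACSHA256(key))
        {
            byte[] expected = hmac.ComputeHash(body);
            // Constant-time comparison avoids leaking how many bytes matched.
            if (!CryptographicOperations.FixedTimeEquals(expected, tag))
            {
                return false;
            }
        }

        payload = body;
        return true;
    }
}
```

A certificate-based variant would replace the shared key with a signature check against the client's public key, at a higher per-message cost.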

If that were possible, then we would win a lot, because Orleans would have a whole new breed of interoperability with other technologies onboarding...

Perhaps it is a discussion for another issue...

cc: @ReubenBond @sergeybykov @jason-bragg

@veikkoeeva
Contributor Author

veikkoeeva commented Jul 24, 2019

@galvesribeiro Maybe the Kestrel bits cover the options you mention, thus opening a Kestrel in cluster. I remember @blowdart mentioning it'll be replacing IIS (maybe at https://coder-coacher.github.io/NDC-Conferences/Security-in-ASPNET-Core-20-Barry-Dorrans-TfmBOzuPFLQ.html or somewhere else).

Makes me personally nervous, though. One should note the increased attack surface on data in the cluster and in memory that easily enjoys legal protection, in addition to plainly overloading a cluster. Personally I would be happier to have network isolation, proxies etc. to offload (and absorb) some attacks and keep breathing room if something happens. Also streaming/sending data without transformations to the cluster so it doesn't need to be decoded and encoded (this also adds an attack surface that could, or should, be mitigated).

Edit: To be clear, something like this could perhaps get some internal LOB apps off the ground; some interesting cases to evaluate could be gateways in sensor fields and maybe Cloudflare Workers.

A further thought is whether one should use TLS when running Orleans in Azure. Is it about trusting one's cloud provider? Encrypting all traffic is, though, mandated in some enterprise data centers.

@dzmitry-lahoda

As for now seems we will try to open our Orleans for streaming-observer clients via https://shadowsocks.org/en/spec/Protocol.html .

@MV10

MV10 commented Sep 15, 2019

The major reason why you shouldn't put Orleans to a public network is mostly because it doesn't have any protection against usual attacks like buffer overflow etc. Which is different if you look to public-facing servers like Kestrel for example.

Just a quick note to point out that the ASP.NET Core team frequently states that Kestrel is not a public-facing server implementation.

This isn't just a "public-facing" question -- lack of TLS just brought my fledgling enterprise-usage scenarios to a screeching halt. Even on our internal networks it's mandated for absolutely everything, no exceptions. FinTech is like that. :)

Does the 3.0 milestone mean this is actually planned for 3.0? (Is there a roadmap somewhere?)

@Drawaes
Contributor

Drawaes commented Sep 15, 2019

It's not just fintech but banking in general, also healthcare, and with GDPR almost everything. As a side note, Kestrel is now allowed to be public facing (aka an edge server); it was only early versions that had no hardening.

https://docs.microsoft.com/en-us/aspnet/core/fundamentals/servers/kestrel?view=aspnetcore-2.2

@veikkoeeva
Contributor Author

veikkoeeva commented Sep 15, 2019

This has been discussed elsewhere already (e.g. at https://gitter.im/dotnet/orleans?at=56fed826d9b73e635f68704a), but to add more weight to this consideration, it can also be manufacturing execution systems (MES), which are often air-gapped from the Internet and could make use of Orleans-like systems. I mean those systems will not be run in the cloud but are distributed. Various infrastructure systems are also such systems; they don't even need to be critical infrastructure. It's not possible to monitor all places where there is traffic against eavesdropping, tampering etc., so all sorts of mitigations are added and even mandated, TLS being one of them.

If I may, I'll link to #1524 (at #1524 (comment); I added something to consider towards the end), since @rrector makes a very good point in that thread about timing related problems that can be catastrophic in the cases just mentioned. Another one is "hidden buffers", whatever they may be, such as just queuing something up uncontrollably to a threading context.

But then also, when running TLS one may run out of entropy. That is a real problem too, discussed at https://gitter.im/dotnet/orleans?at=55de1300b0c2ec8705e72a6e. If tracking Gitter discussions is troublesome, https://dev.solita.fi/2015/11/11/raiders-of-the-lost-entropy.html is a good post on the issue too.

@MV10

MV10 commented Sep 15, 2019

@Drawaes Tim thanks for pointing that out about Kestrel, I missed that change. Very interesting.

@ReubenBond
Member

Does the 3.0 milestone mean this is actually planned for 3.0?

Yes it is planned for 3.0

@ghost ghost locked as resolved and limited conversation to collaborators Sep 30, 2021