
SPARK-3883: SSL support for HttpServer and Akka #3571

Closed

Conversation

jacek-lewandowski
Contributor

SPARK-3883: SSL support for Akka connections and Jetty based file servers.

This story introduces the following changes:

  • Introduced the SSLOptions object, which holds the SSL configuration and can build the appropriate configuration for Akka or Jetty. SSLOptions can be created by parsing SparkConf entries under a specified namespace.
  • SSLOptions is created and kept by SecurityManager.
  • All Akka actor address creation snippets based on interpolated strings were replaced by dedicated methods in AkkaUtils. Those methods select the proper Akka protocol - akka.tcp or akka.ssl.tcp.
  • Added test cases for AkkaUtils, FileServer, SSLOptions and SecurityManager.
  • Added a way for executors and the driver to use node-local SSL configuration in standalone mode. It can be enabled by specifying spark.ssl.useNodeLocalConf in SparkConf.
  • Made CoarseGrainedExecutorBackend not overwrite the executor startup configuration settings - they are passed from the Worker anyway.

Refer to #3571 for discussion and details
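The namespace-based parsing and protocol selection described above can be sketched roughly as follows. This is a minimal illustration, not the actual Spark implementation: the `parse` helper, the reduced field set, and the plain `Map` standing in for `SparkConf` are all assumptions.

```scala
// Minimal sketch of namespace-scoped SSL option parsing (illustrative only).
case class SSLOptions(
    enabled: Boolean = false,
    keyStore: Option[String] = None,
    trustStore: Option[String] = None)

object SSLOptions {
  // Reads entries under a namespace such as "spark.ssl"; a plain Map
  // stands in for SparkConf in this sketch.
  def parse(conf: Map[String, String], ns: String): SSLOptions =
    SSLOptions(
      enabled = conf.get(s"$ns.enabled").exists(_.toBoolean),
      keyStore = conf.get(s"$ns.keyStore"),
      trustStore = conf.get(s"$ns.trustStore"))
}

val opts = SSLOptions.parse(
  Map("spark.ssl.enabled" -> "true", "spark.ssl.keyStore" -> "/path/keystore.jks"),
  "spark.ssl")

// Protocol selection as described for AkkaUtils: akka.ssl.tcp when SSL is on.
val protocol = if (opts.enabled) "akka.ssl.tcp" else "akka.tcp"
println(protocol)
```

Calling the same `parse` with a different namespace would yield a per-subsystem configuration, matching the idea of building SSLOptions from entries under a specified namespace.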

@AmplabJenkins

Can one of the admins verify this patch?

@jacek-lewandowski
Contributor Author

@pwendell could you please request this PR to be tested?

@pwendell
Contributor

pwendell commented Dec 4, 2014

Jenkins, test this please. Jenkins, add to whitelist.

@SparkQA

SparkQA commented Dec 4, 2014

Test build #24120 has started for PR 3571 at commit 54aad3d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 4, 2014

Test build #24120 has finished for PR 3571 at commit 54aad3d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SSLOptions(enabled: Boolean = false,
    • val classServer = new HttpServer(outputDir, new SecurityManager(conf), classServerPort, "HTTP class server", conf)

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24120/

@jacek-lewandowski
Contributor Author

@pwendell is it possible to access the log file somehow? I don't know how to replicate the problems - what operating system is Jenkins running on?

@vanzin
Contributor

vanzin commented Dec 4, 2014

@jacek-lewandowski from a quick look at the diff, it seems you didn't change anything w.r.t. the configuration. In master, there's no need to add a new config file nor all the different ways of loading it - all daemons should be loading spark-defaults.conf and so you could just use SparkConf for everything like I suggested in the old PR.

Did you have a chance to look at that?

@jacek-lewandowski
Contributor Author

@vanzin I just rebased it and fixed it so that the tests pass. I'd like to figure out why the Jenkins tests fail and then continue the changes. I did see, though, that the Spark code now uses SparkConf in the master and worker.

@jacek-lewandowski
Contributor Author

@vanzin I'm going to finish it very soon; will you be available to help me with those failing tests?

@vanzin
Contributor

vanzin commented Dec 12, 2014

Test logs are gone, so an admin would have to re-trigger them. I'm a little short on time to be able to test this myself at the moment...

@jacek-lewandowski
Contributor Author

@vanzin I made some changes, but I'm not sure about using the Spark configuration in this case - at the very least it's not so clear for cases like running executors. CoarseGrainedExecutorBackend needs the SSL configuration to connect to the driver and fetch the real application configuration. In other words, it doesn't have any information about the configuration, and it doesn't load the properties file. I suppose the same problem would occur with DriverWrapper, which is used when the driver is run by the worker.

@jacek-lewandowski
Contributor Author

It could work if DriverWrapper and CoarseGrainedExecutorBackend loaded the daemon's configuration file.

@vanzin
Contributor

vanzin commented Dec 18, 2014

Hi @jacek-lewandowski ,

Yes, that's an issue. That's why I suggested in your original PR to use Yarn's approach here:
#2739 (comment)

Code has moved, here's the code in master now:
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala#L74

@jacek-lewandowski
Contributor Author

Is there any reason why CoarseGrainedExecutorBackend and DriverWrapper do not load the properties file? The loaded properties would be overridden by the application configuration anyway.

@vanzin
Contributor

vanzin commented Dec 18, 2014

Which property file would they load? You want them to load the property file defined when launching the application, not the default file installed on the node since (i) it may not exist or (ii) it may differ from the one used to launch the job.

So while it would be great to be able to make them load a property file, it feels like a separate feature.

@SparkQA

SparkQA commented Dec 19, 2014

Test build #24640 has started for PR 3571 at commit 71599b5.

  • This patch does not merge cleanly.

@SparkQA

SparkQA commented Dec 19, 2014

Test build #24640 has finished for PR 3571 at commit 71599b5.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class SSLOptions(enabled: Boolean = false,
    • val classServer = new HttpServer(outputDir, new SecurityManager(conf), classServerPort, "HTTP class server", conf)

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24640/

@jacek-lewandowski
Contributor Author

OK, I think I've found a solution, which is somewhat based on what you've shown me. I've pushed some changes for you to take a look at before I continue.

The idea for CoarseGrainedExecutorBackend is to use the SSL settings from the "executor" namespace defined in the worker's properties file - that is, these settings are inherited from the Spark worker configuration. They are used by CoarseGrainedExecutorBackend to connect to the driver and fetch the application configuration. Once the application configuration is fetched, the settings inherited from the worker are used as defaults and can be overridden by the application settings.

A similar approach applies to DriverWrapper (which is used when you run the application in cluster mode). There are two differences - we use the "driver" namespace for the SSL configuration, and the configuration options are applied differently. The final result is similar, however.

Initially, I wanted to use the "daemon" settings as defaults (the "daemon" namespace is used to configure SSL for all daemon processes, like Master and Worker). However, that could introduce a vulnerability, because the application could access the keystores of those daemon processes.

So, in short - the administrator will have to configure SSL in at least two namespaces on each node: "daemon" and "executor". It is also recommended to configure the "driver" namespace to provide good defaults. In environments where authentication is not required, all the namespaces can be configured with the same settings.
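The inheritance scheme described here - worker-provided settings acting as defaults, overridden by the fetched application configuration - can be sketched as below. The `withDefaults` helper and the plain `Map`s standing in for `SparkConf` are assumptions for illustration, not Spark's actual API.

```scala
// Sketch: settings inherited from the worker's "executor" namespace act as
// defaults; any key present in the application configuration overrides them.
def withDefaults(appConf: Map[String, String],
                 inherited: Map[String, String]): Map[String, String] =
  inherited ++ appConf // the right-hand operand wins on key collisions

val fromWorker = Map(
  "spark.ssl.executor.enabled"  -> "true",
  "spark.ssl.executor.keyStore" -> "/etc/spark/executor.jks")
val fromApp = Map(
  "spark.ssl.executor.keyStore" -> "/app/custom.jks")

val merged = withDefaults(fromApp, fromWorker)
assert(merged("spark.ssl.executor.enabled") == "true")             // inherited default kept
assert(merged("spark.ssl.executor.keyStore") == "/app/custom.jks") // app override wins
```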

What do you think @vanzin ?

@vanzin
Contributor

vanzin commented Dec 20, 2014

Hi @jacek-lewandowski ,

I think that's an improvement (assuming that you'd remove all the code dealing with the separate ssl.conf, which is still there), but there's still something I'm not seeing. How are the SSL properties propagated from the launcher to the executors?

If I set different options in my launcher that do not match those in the worker's configuration, it seems that things will fail with your new code.

The reason I pointed out the Yarn code is that it does exactly that. The properties you set in your launch process are used by the executors; you don't need to configure anything on the edge nodes. In standalone mode you of course need to configure the worker daemons themselves, but you don't need to manage client configuration on every worker node.

Anyway, it seems you're not planning to handle Yarn in this change at all (I didn't notice any changes there?), so we can always treat that as a future enhancement. I'll let other reviewers chime in on whether they think this limitation is acceptable in standalone mode. If I misunderstood something about your code, apologies, and please correct me!

Looking at just the last commit, the only comment I had was about the somewhat confounding use of SparkModule and namespace. It doesn't seem like the SparkModule type is adding much functionality here... also, I'd rather keep the config for the master/worker and the config for the history server in separate namespaces, although that's not a big deal (since you can use different config files to achieve the same thing).

@jacek-lewandowski
Contributor Author

@vanzin the executor configuration set on the edge nodes is considered a default, and it is the one used to connect to the driver to fetch the configuration set in the launcher. If the launcher configuration overrides the settings in the "executor" namespace, the new settings will be used.

However, I still have some doubts about custom SSL settings on executors. Imagine the following situation: say you want to have separate keys for the executors, and the executors communicate with the Master. It is not a problem to add the Master's public key to the executor's trust store. However, how would you add the executor's public key to the Master's trust store? You'd need to reconfigure it and restart it - or am I missing something?

Regarding YARN mode, I'll provide support for it. This is now my primary task, so I'll work on this ticket full time :)

@SparkQA

SparkQA commented Dec 23, 2014

Test build #24741 has started for PR 3571 at commit ff5a776.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 23, 2014

Test build #24741 has finished for PR 3571 at commit ff5a776.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SSLOptions(enabled: Boolean = false,
    • val classServer = new HttpServer(outputDir, new SecurityManager(conf, SparkModule.Client), classServerPort, "HTTP class server", conf)

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24741/

@vanzin
Contributor

vanzin commented Dec 23, 2014

@jacek-lewandowski my understanding is that you don't need the master to know the public key of every executor. The master can either disable cert validation, or have a CA cert in its trust store that an admin can then use to sign the user's certificate, which the user can then use in their app.

If you do something like that, a default configuration for apps wouldn't work, because each app can potentially have a different certificate. I can see this being particularly desirable for long-running apps such as Spark Streaming.

Also, it seems there are legitimate style check failures in your patch - that's why tests aren't running.

@SparkQA

SparkQA commented Feb 2, 2015

Test build #26544 has started for PR 3571 at commit 9ef4ed1.

  • This patch merges cleanly.

@jacek-lewandowski
Contributor Author

@JoshRosen I've just added the documentation. I hope there is enough of it :)

@JoshRosen
Contributor

@jacek-lewandowski Thanks for adding docs; I'll look over them now.

import com.typesafe.config.{Config, ConfigFactory, ConfigValueFactory}
import org.eclipse.jetty.util.ssl.SslContextFactory

/** SSLOptions class is a common container for SSL configuration options. It offers methods to

Minor style nit, but do you mind placing the /** on its own line? All of the Java/Scaladoc comments should generally look like

/**
 * comments comments comments [...]
 */

@SparkQA

SparkQA commented Feb 2, 2015

Test build #26544 has finished for PR 3571 at commit 9ef4ed1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26544/

@JoshRosen
Contributor

Jenkins, retest this please.

@JoshRosen
Contributor

Overall, this looks good to me, so I'm going to just wait for Jenkins to pass then pull this in for 1.3. We can fix any documentation touch-ups or other things during the QA period.

@SparkQA

SparkQA commented Feb 2, 2015

Test build #26548 has started for PR 3571 at commit 9ef4ed1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 2, 2015

Test build #26548 has finished for PR 3571 at commit 9ef4ed1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26548/

@jacek-lewandowski
Contributor Author

I don't get why the tests failed. I only added documentation!

@pwendell
Contributor

pwendell commented Feb 2, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented Feb 2, 2015

Test build #26556 has started for PR 3571 at commit 9ef4ed1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 3, 2015

Test build #26556 has finished for PR 3571 at commit 9ef4ed1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26556/

@JoshRosen
Contributor

I've fixed the documentation formatting nits myself, so I'm going to merge this into master (1.3.0). Thanks @jacek-lewandowski for all of your hard work on this feature and to @vanzin for your help reviewing this patch!
