[SPARK-5158] [core] [security] Spark standalone mode can authenticate against a Kerberos-secured Hadoop cluster #4106

…s-secured Hadoop cluster

Previously, Kerberos-secured Hadoop clusters could only be accessed by Spark running on top of YARN. In other words, Spark standalone clusters had no way to read from secure Hadoop clusters.

Other solutions were proposed previously, but all of them attempted to authenticate by obtaining a token on a single node and passing that token to all of the other Spark worker nodes. Shipping the token is risky, and all previous iterations left the token open to man-in-the-middle attacks.

This patch introduces an alternative approach. It assumes that the keytab file has already been distributed to every node in the cluster. When Spark starts in standalone mode, each worker logs in via Kerberos individually, using the configuration specified in the driver's SparkConf. On basic Hadoop cluster setups the keytab file is often already deployed manually on all of the cluster's nodes, so it is not a huge stretch to expect it to be deployed to the Spark worker nodes as well, if it is not already there.

This approach assumes that Spark always authenticates with Kerberos using the same principal and keytab, and that the login is done at the very start of the job. Strictly speaking, we should reduce the surface area of the code that runs in a logged-in state. Put another way, authentication should be performed only when files are read from or written to HDFS, and the subject should be logged out once the read or write completes. However, this is difficult to write and prone to errors, so it is left for a future refactor.
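To make the per-worker login concrete, here is a minimal sketch of what each worker might do at startup. It is not the patch's actual code. The configuration key names (`example.kerberos.principal`, `example.kerberos.keytab`), the class name, and the `KeytabLogin` interface are all hypothetical, introduced only so the sketch runs without Hadoop on the classpath. In a real worker, the login callback would presumably be Hadoop's `UserGroupInformation.loginUserFromKeytab(principal, keytabPath)`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

final class WorkerKerberosLogin {

    /** Hypothetical stand-in for the real Kerberos login call. */
    @FunctionalInterface
    public interface KeytabLogin {
        void login(String principal, String keytabPath) throws IOException;
    }

    // Hypothetical configuration keys; the patch's real key names are not
    // stated in this description.
    public static final String PRINCIPAL_KEY = "example.kerberos.principal";
    public static final String KEYTAB_KEY = "example.kerberos.keytab";

    /**
     * Logs in from the node-local keytab if both settings are present.
     * Returns false when Kerberos is not configured, and fails fast when
     * the keytab has not been deployed to this node.
     */
    public static boolean loginIfConfigured(Map<String, String> conf, KeytabLogin login)
            throws IOException {
        String principal = conf.get(PRINCIPAL_KEY);
        String keytab = conf.get(KEYTAB_KEY);
        if (principal == null || keytab == null) {
            return false; // Kerberos not configured: nothing to do
        }
        Path keytabPath = Paths.get(keytab);
        if (!Files.isReadable(keytabPath)) {
            throw new IOException("Keytab not readable on this node: " + keytab);
        }
        // No token is shipped between nodes: each worker authenticates
        // itself using its own local copy of the keytab.
        login.login(principal, keytab);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Demo with an empty temp file standing in for a real keytab.
        Path fakeKeytab = Files.createTempFile("demo", ".keytab");
        Map<String, String> conf = new HashMap<>();
        conf.put(PRINCIPAL_KEY, "spark@EXAMPLE.COM");
        conf.put(KEYTAB_KEY, fakeKeytab.toString());
        loginIfConfigured(conf, (principal, path) ->
                System.out.println("would log in " + principal + " from " + path));
        Files.deleteIfExists(fakeKeytab);
    }
}
```

Checking that the keytab is readable before attempting the login surfaces the "keytab not yet deployed to this worker" case as a clear error on the affected node, which matters because the approach depends on the file being present everywhere.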

Commits on Feb 25, 2015