@xuzikun2003

@xuzikun2003 xuzikun2003 commented Apr 20, 2019

What changes were proposed in this pull request?

Use the provided principal and keytab to do Kerberos login before setting up the JDBC connection.

How was this patch tested?

Step 1: Configure Microsoft SQL Server

Step 2: Start spark shell
/spark/bin/spark-shell --master yarn \
  --files hdfs:///spark/admin.keytab

Step 3: Run the following Scala code in spark shell
import java.util.Properties

val dataSrc = "jdbc"
val hostname = "master-0.azdata.local"
val port = 1433
val database = "spark"
val url = s"jdbc:sqlserver://${hostname}:${port};database=${database};integratedSecurity=true;authenticationScheme=JavaKerberos"

val df = Seq(
  (8, "bat"),
  (64, "mouse"),
  (-27, "horse")
).toDF("number", "word")

import org.apache.spark.sql.SaveMode

df.write.mode(SaveMode.Overwrite).format("jdbc").option("principal", "admin@AZDATA.LOCAL").option("keytab", "hdfs:///spark/admin.keytab").option("url", url).option("dbtable", "spark").save()

val outDf = spark.read.format("jdbc").option("principal", "admin@AZDATA.LOCAL").option("keytab", "admin.keytab").option("url", url).option("dbtable", "spark").load()

outDf.show()

@xuzikun2003 changed the title from "Support Kerberos login in JDBC connector" to "[SPARK-12312][JDBC]Support Kerberos login in JDBC connector" on Apr 20, 2019
@xuzikun2003 changed the title from "[SPARK-12312][JDBC]Support Kerberos login in JDBC connector" to "[SPARK-12312][SQL]Support Kerberos login in JDBC connector" on Apr 20, 2019
@xuzikun2003
Author

This is an attempt to fix the following JIRA:
https://issues.apache.org/jira/browse/SPARK-12312

@shanyu
Contributor

shanyu commented Apr 29, 2019

+1

@sundarclear

+1

Contributor

@misutoth misutoth left a comment

I tried to follow the instructions, but I would rather not deploy a commercial database for this purpose. It would be nice to have an open source database to use for reproduction and validation.

I was also wondering how the keytab gets into the hdfs:///spark/admin.keytab path.

}
})
}
else {
Contributor

It seems there is no option for a TGT or a delegation token. The latter may be useful for Hadoop components, though. Distributing the keytab is not the most secure approach, and the KDC may be overloaded.

Hi @misutoth ,

  1. Distributing the keytab relies on the assumption that HDFS is secure. Each executor requires a keytab to do kinit, so we have to distribute the keytab to all executors. If HDFS is secured, then the keytab should be safe.

  2. The user is required to upload the keytab to /spark/admin.keytab. This file is pre-uploaded by the user.

  3. Since SQL Server does not support delegation tokens, we cannot use one. If SQL Server supports delegation tokens in the future, we can use a more efficient way to do the kinit. Right now we need to use a keytab (or password) to do the kinit, and this limitation comes from the SQL Server side, not the HDFS side.
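
To illustrate point 2: with `--files hdfs:///spark/admin.keytab`, Spark on YARN places a copy of the file in each container's working directory under its bare file name. The sketch below shows one way the `keytab` option could be mapped to that local name. `KeytabPaths` and `localKeytabName` are hypothetical names used only for illustration; this is an assumption about the mapping, not the code in this PR.

```scala
import java.net.URI
import java.nio.file.Paths

object KeytabPaths {
  // Hypothetical helper (not taken from the PR): derive the file name under
  // which a keytab shipped via --files would appear in the executor's
  // working directory, given the value of the "keytab" option.
  def localKeytabName(keytabOption: String): String = {
    // For "hdfs:///spark/admin.keytab" the URI path is "/spark/admin.keytab";
    // for a bare "admin.keytab" the URI path is the name itself.
    val path = new URI(keytabOption).getPath
    Paths.get(path).getFileName.toString
  }
}
```

Under this assumption, both forms used in the test steps, `hdfs:///spark/admin.keytab` and `admin.keytab`, resolve to the same local file name.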

I see an issue with UserGroupInformation.loginUserFromKeytabAndReturnUGI(options.principal, keytabFileName)

This would replace the static fields on UGI for the keytab and principal, so further calls to UGI for Hadoop may not work.

I think a better approach would be not to rely on UGI and instead log in using LoginContext and wrap the call with Subject.doAs(subject, ...)
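
A minimal sketch of that suggestion, using only JDK classes, could look like the following. The object name `KerberosJdbc`, the helper names, and the JAAS entry name `jdbc-kerberos` are hypothetical. Whether a given JDBC driver picks up the `Subject` from the calling context is driver-specific and is assumed here, not verified. `Subject.doAs` is deprecated on recent JDKs in favor of `Subject.callAs`, but it matches the call named above.

```scala
import java.security.PrivilegedExceptionAction
import java.sql.{Connection, DriverManager}
import java.util.{HashMap => JHashMap}
import javax.security.auth.Subject
import javax.security.auth.login.{AppConfigurationEntry, Configuration, LoginContext}

object KerberosJdbc {
  // In-memory JAAS configuration for a keytab login, so no jaas.conf file
  // is needed. The same entry is returned for any application name.
  def keytabConfiguration(principal: String, keytabPath: String): Configuration =
    new Configuration {
      override def getAppConfigurationEntry(name: String): Array[AppConfigurationEntry] = {
        val options = new JHashMap[String, String]()
        options.put("useKeyTab", "true")
        options.put("keyTab", keytabPath)
        options.put("principal", principal)
        options.put("storeKey", "true")
        options.put("doNotPrompt", "true")
        Array(new AppConfigurationEntry(
          "com.sun.security.auth.module.Krb5LoginModule",
          AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
          options))
      }
    }

  // Log in with a fresh Subject (no static UGI state is touched) and open
  // the JDBC connection inside Subject.doAs.
  def connect(url: String, principal: String, keytabPath: String): Connection = {
    val loginContext = new LoginContext(
      "jdbc-kerberos", new Subject(), null, keytabConfiguration(principal, keytabPath))
    loginContext.login()
    Subject.doAs(loginContext.getSubject, new PrivilegedExceptionAction[Connection] {
      override def run(): Connection = DriverManager.getConnection(url)
    })
  }
}
```

Because the login state lives in the `Subject` returned by the `LoginContext`, it stays separate from whatever login Hadoop's UGI holds for the rest of the application.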

Author

> I see an issue with UserGroupInformation.loginUserFromKeytabAndReturnUGI(options.principal, keytabFileName)
>
> This would replace the static fields on UGI for the keytab and principal, so further calls to UGI for Hadoop may not work.
>
> I think a better approach would be not to rely on UGI and instead log in using LoginContext and wrap the call with Subject.doAs(subject, ...)

Good suggestion. I will give it a try, and if it works I will update this PR.

Contributor

@misutoth @dasbh, the keytab is only for the JDBC connection and, for now, is not intended for use by other Hadoop calls, since these databases do not support delegation tokens yet. Usually customers should use their own principal, with the keytab file stored in secure HDFS, to log in to the SQL database; this is the same credential they use to submit the Spark application. This is really a workaround for databases that do not support delegation tokens.

@misutoth
Contributor

misutoth commented Aug 9, 2019

@xuzikun2003, are you going to resume work on this? It seems this change has been left behind, so I am going to start working on it. Please let me know as soon as possible if you plan to move this forward, so that we do not duplicate our efforts.

@AmplabJenkins

Can one of the admins verify this patch?

@gaborgsomogyi
Contributor

@vanzin, may I ask you to close this? I don't think it will continue.
I'm planning to pick this up.

@z47xu

z47xu commented Jan 16, 2020

> @vanzin, may I ask you to close this? I don't think it will continue.
> I'm planning to pick this up.

What is your plan?

@gaborgsomogyi
Contributor

File a PR and solve it.

@maropu
Member

maropu commented Apr 30, 2020

I'll close this based on the current status of the Jira.

@maropu maropu closed this Apr 30, 2020
10 participants