[FAQ] seatunnel (waterdrop): how to access HDFS, YARN, and Hive secured with Kerberos authentication #590

Closed
garyelephant opened this issue Dec 1, 2020 · 0 comments
garyelephant commented Dec 1, 2020

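Kerberos is an authentication service commonly used across the big data stack and is widely deployed at internet companies.
If your HDFS, YARN, Hive, HBase, or similar services require Kerberos authentication, there are two ways to access resources on them with Waterdrop: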

Option 1: run kinit before starting Waterdrop

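On the machine where Waterdrop is deployed, first log in with the kinit command, then run the Waterdrop start command, for example start-waterdrop.sh. Waterdrop then carries the Principal of the currently logged-in user and accesses HDFS, YARN, Hive, and HBase on behalf of that Principal, so make sure the Principal has the required permissions.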
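
The kinit-first approach can be sketched roughly as follows. This is only an illustration: the function name `kinit_then_start`, the keytab path, the principal, and the Waterdrop install directory are hypothetical placeholders. `kinit -kt` (log in from a keytab without a password prompt) and `klist` are standard Kerberos client commands; an interactive `kinit <principal>` with a password serves the same purpose.

```shell
# Sketch: log in with kinit, then start Waterdrop under that principal.
# Usage: kinit_then_start <keytab> <principal> <waterdrop_home> <config> [other arguments]
kinit_then_start() {
    if [ "$#" -lt 4 ]; then
        echo "usage: kinit_then_start <keytab> <principal> <waterdrop_home> <config> [args...]" >&2
        return 2
    fi
    local keytab="$1" principal="$2" waterdrop_home="$3" config="$4"
    shift 4

    # Obtain a Kerberos ticket for the principal from its keytab.
    kinit -kt "$keytab" "$principal" || return 1

    # Print the ticket cache so the expiry time is visible before starting.
    klist || return 1

    # Waterdrop now runs on behalf of the logged-in principal.
    "$waterdrop_home/bin/start-waterdrop.sh" -c "$config" "$@"
}

# Example call (all values are assumed, not taken from a real deployment):
# kinit_then_start /path/to/etl.keytab etl@EXAMPLE.COM /opt/waterdrop app.conf
```

Wrapping the steps in one function keeps the login and the launch together, so the job never starts without a fresh ticket.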

Option 2: specify the keytab and principal in the configuration file

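For long-running programs, however, whether they are submitted to YARN with deploy-mode=client or deploy-mode=cluster, the first approach has a problem: the Kerberos ticket obtained by kinit, and the delegation tokens derived from it, expire within 24 hours (the exact lifetime is configurable in Kerberos; ask your company's Kerberos administrator), and once they expire the program loses its permissions. For long-running programs, therefore, specify the relevant parameters directly in the configuration file, as follows (applies to Waterdrop v1.x):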

// app.conf
spark {
    ...
    # The two most important parameters:
    spark.yarn.keytab = ...
    spark.yarn.principal = ...
}

input {
    hive { ... }
}

filter {
    sql { ... }
}

output {
    clickhouse { ... }
}

Then start Waterdrop directly:

./bin/start-waterdrop.sh -c app.conf [other arguments]
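
Before launching a long-running job this way, it may help to confirm that the keytab is readable and actually contains the principal named in the configuration. The sketch below is one possible pre-flight check, not part of Waterdrop: the function name `keytab_has_principal` and the example values are hypothetical, while `klist -k <keytab>` is the standard Kerberos client command for listing the entries in a keytab.

```shell
# Sketch: check that a keytab file lists the expected principal.
# Usage: keytab_has_principal <keytab> <principal>
keytab_has_principal() {
    local keytab="$1" principal="$2"

    # Fail early if the keytab file is missing or unreadable.
    if [ ! -r "$keytab" ]; then
        echo "keytab not readable: $keytab" >&2
        return 1
    fi

    # klist -k prints the principals stored in the keytab;
    # grep -F matches the principal as a literal string.
    klist -k "$keytab" | grep -F -q -- "$principal"
}

# Example call (assumed values):
# keytab_has_principal /path/to/etl.keytab etl@EXAMPLE.COM && ./bin/start-waterdrop.sh -c app.conf
```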

As for what a principal and a keytab are: these are basic Kerberos concepts unrelated to Waterdrop, so they are not covered here; see the reference documents below.


References:

  1. https://spark.apache.org/docs/2.4.3/security.html#kerberos
  2. https://spark.apache.org/docs/2.4.3/running-on-yarn.html#yarn-specific-kerberos-configuration
  3. https://blog.stratio.com/spark-kerberos-safe-story/
  4. https://andriymz.github.io/spark/how-spark-uses-kerberos-authentication/#
  5. https://andriymz.github.io/kerberos/authentication-using-kerberos/#authenticate-by-providing-jaas-configuration-at-application-start