
DataX - Exceptions Encountered #91

Open
AronChung opened this issue Nov 30, 2020 · 0 comments
Labels
DataX datax深入研究
AronChung commented Nov 30, 2020

kerberosPrincipal exception

2020-11-30 13:46:55.521 [job-0] ERROR HdfsWriter$Job - Kerberos authentication failed; please verify that kerberosKeytabFilePath[/tmp/zhongchuming/hive.keytab] and kerberosPrincipal[hive/cluster-t04.zcm.com@ZCM.COM] are filled in correctly
2020-11-30 13:46:55.535 [job-0] ERROR JobContainer - Exception when job run
com.alibaba.datax.common.exception.DataXException: Code:[HdfsWriter-09], Description:[KERBEROS authentication failed]. - java.io.IOException: Login failure for hive/cluster-t04.zcm.com@ZCM.COM from keytab /tmp/zcm/hive.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user

After confirming that the keytab account and password were correct, the problem was narrowed down to kerberosPrincipal.
kerberosPrincipal is governed by the hadoop.security.auth_to_local setting; see https://www.jianshu.com/p/2ad4be7ecf39
[screenshot: hadoop.security.auth_to_local mapping rules]
After some experimenting, the acceptance filter turned out to be the problem, so it was dropped, leaving hive@ZCM.COM
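The effect of an auth_to_local acceptance filter can be sketched in a few lines. This is a simplified illustration of how Hadoop-style `RULE:[n:format](filter)` mapping behaves, not the real Hadoop implementation, and the rule strings below are hypothetical examples:

```python
import re

def short_name(principal, rules):
    """Map a Kerberos principal to a short name using simplified
    auth_to_local semantics (illustrative sketch only)."""
    name, _, realm = principal.partition("@")
    components = name.split("/")
    for rule in rules:
        m = re.match(r"RULE:\[(\d+):([^\]]+)\](?:\(([^)]*)\))?", rule)
        if not m:
            continue
        ncomp, fmt, accept = int(m.group(1)), m.group(2), m.group(3)
        if len(components) != ncomp:
            continue  # rule only applies to principals with this many components
        candidate = fmt.replace("$0", realm)
        for i, c in enumerate(components, 1):
            candidate = candidate.replace(f"${i}", c)
        # acceptance filter: if the formatted string does not match, skip the rule
        if accept is not None and not re.fullmatch(accept, candidate):
            continue
        return candidate.split("@")[0]
    return None  # no rule matched the principal

# Hypothetical rule whose acceptance filter only accepts hdfs@ZCM.COM:
print(short_name("hive@ZCM.COM", [r"RULE:[1:$1@$0](hdfs@ZCM\.COM)"]))  # None: rejected
print(short_name("hive@ZCM.COM", [r"RULE:[1:$1@$0]"]))                 # hive: filter removed
# The original two-component principal never matches a one-component rule at all:
print(short_name("hive/cluster-t04.zcm.com@ZCM.COM", [r"RULE:[1:$1@$0]"]))  # None
```

This mirrors the fix above: with the filter removed, `hive@ZCM.COM` maps cleanly to a short name.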

Another exception then appeared:

WARNING: Exception encountered while connecting to the server : java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name
2020-11-30 11:27:42.905 [job-0] ERROR HdfsWriter$Job - Network IO exception while checking whether the file path [message:filePath =/user/hive/warehouse/zcm_test.db/t_time_sync_test] exists; please check that your network is working!
2020-11-30 11:27:42.914 [job-0] ERROR JobContainer - Exception when job run
com.alibaba.datax.common.exception.DataXException: Code:[HdfsWriter-06], Description:[IO exception while establishing a connection to HDFS.]. - java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name; Host Details : local host is: "cluster-t01.zcm.com/192.168.1.207"; destination host is: "cluster-t02.zcm.com":8020;

This exception again pointed to kerberosPrincipal.
Checking the HDFS default_realm showed that two realms were configured, so the guess was that the realm could not be resolved.
Adding the parameter dfs.namenode.kerberos.principal.pattern so that it matches any principal:
"hadoopConfig": {"dfs.namenode.kerberos.principal.pattern": "*" }
After that the job ran through.
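For context, a minimal sketch of where these keys sit in an hdfswriter job config. The keytab path, principal, path, and host are taken from the logs above; `defaultFS` and the surrounding structure are assumptions based on typical hdfswriter jobs:

```json
"writer": {
  "name": "hdfswriter",
  "parameter": {
    "defaultFS": "hdfs://cluster-t02.zcm.com:8020",
    "path": "/user/hive/warehouse/zcm_test.db/t_time_sync_test",
    "haveKerberos": true,
    "kerberosKeytabFilePath": "/tmp/zcm/hive.keytab",
    "kerberosPrincipal": "hive@ZCM.COM",
    "hadoopConfig": {
      "dfs.namenode.kerberos.principal.pattern": "*"
    }
  }
}
```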

textfile sync: the row counts in MySQL and Hive differ, yet the sync log reports no dirty data and shows the job completing successfully

  1. The MySQL data contains the field delimiter, so Hive splits the rows incorrectly
    Change the table's file format to ORC and re-import; alternatively change the delimiter, but that cannot be made universal across all jobs
  2. The Hive table's row delimiter is \n, so newline characters in the source data need to be escaped
  3. The Hive client returns more rows, but Presto returns the correct count (cause unknown)
AronChung added the "DataX datax深入研究" label on Nov 30, 2020
AronChung added this to DataX in My Blog on Jun 29, 2022