Hive: DDL statements take about 200s to finish #106

AronChung · 2021-09-29T07:30:30Z

Symptom

DDL statements such as create and drop all take 200 seconds to execute.

Root cause analysis

A DDL statement passes through these components: hs2 -> hms -> sentry -> hdfs acl
What was checked, in order:

slow queries
show locks
hs2 log
hms log: Failed to sync requested HMS notifications up to the event ID xxxx
namenode log
sentry log: timed out wait request for id xxxx

Looking up the event named in "timed out wait request for id xxxx":
use hive;
select * from NOTIFICATION_LOG where event_id=xxxx;

Check whether the event ID that sentry has processed matches the one in hms:
sentry: 
select * from  sentry.SENTRY_HMS_NOTIFICATION_ID order by NOTIFICATION_ID desc limit 10;

hms:
select * from hive.NOTIFICATION_SEQUENCE Order by NEXT_EVENT_ID desc limit 10;

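The investigation narrowed things down to the two log lines from hms and sentry, which confirmed that HMS notifications were the problem. I downloaded the master branch of the sentry source and located the code that raises the exception.

That in turn led to the parameter behind the 200-second timeout: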

// Should match the value for RPC timeout in HMS client config
    public static final String SENTRY_NOTIFICATION_SYNC_TIMEOUT_MS = "sentry.notification.sync.timeout.ms";
    public static final int SENTRY_NOTIFICATION_SYNC_TIMEOUT_DEFAULT = 200000;
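
To picture why every DDL lands on almost exactly the timeout value, here is a minimal sketch of a "wait until the requested event ID has been processed, or give up after a timeout" loop. It is not the Sentry implementation: the class NotificationSyncSketch and the methods markProcessed and waitForId are hypothetical names made up for illustration, and the demo uses 200 ms instead of 200 s so it finishes quickly.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not Sentry code: a caller blocks until the consumer
// has processed a given event ID, or until the timeout runs out.
class NotificationSyncSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition advanced = lock.newCondition();
    private long lastProcessedId = 0;

    // Called by the (simulated) background consumer after it handles an event.
    void markProcessed(long eventId) {
        lock.lock();
        try {
            if (eventId > lastProcessedId) {
                lastProcessedId = eventId;
                advanced.signalAll(); // wake up every waiting caller
            }
        } finally {
            lock.unlock();
        }
    }

    // Returns true if requestedId was processed in time, false on timeout.
    boolean waitForId(long requestedId, long timeoutMs) throws InterruptedException {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (lastProcessedId < requestedId) {
                if (remainingNanos <= 0) {
                    return false; // the consumer is lagging: the caller paid the full timeout
                }
                remainingNanos = advanced.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        NotificationSyncSketch sentry = new NotificationSyncSketch();
        sentry.markProcessed(100);

        // Consumer is caught up: the wait returns at once.
        System.out.println("id 100 synced: " + sentry.waitForId(100, 200));

        // Consumer is behind: the wait lasts the whole timeout, then fails.
        long start = System.nanoTime();
        boolean synced = sentry.waitForId(101, 200);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("id 101 synced: " + synced + ", waited about " + elapsedMs + " ms");
    }
}
```

When the consumer is behind, every caller waits for the full timeout, which matches the uniform 200-second DDL duration described in this issue.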

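Reading this part of the source: once hdfs-sentry acl synchronization is enabled, this code handles the permission-sync messages exchanged among hdfs, sentry and hive metastore server. When a sudden burst of directory permission messages arrives, the background thread cannot keep up, the messages back up, and this exception appears.

The exception does not affect cluster availability. It only makes create and drop table slow, with each statement waiting 200s; the wait exists so that sentry can catch up to the latest id. Lowering the sentry parameter sentry.notification.sync.timeout.ms (default 200s) shortens the wait, and if the backlog is small it can be left to drain on its own.

In our case, the hive metastore server thread that takes part in sync message processing had also exited abnormally, so sentry's sentry_hms_notification_id table stopped being updated and hive metastore server had to be restarted.

If too many messages have piled up, draining them takes too long and may never catch up. In that case the messages can be discarded: insert a row into sentry's sentry_hms_notification_id table holding the maximum value (equal to the current message id, taken from the notification_sequence table), then restart the sentry service. The notification_log table stores the message log.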
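
The three outcomes described above (let the backlog drain, restart hive metastore server, or discard the backlog) can be sketched as a small decision helper. This is only an illustration under stated assumptions: the class BacklogTriageSketch, the method triage and the maxCatchUpSec threshold are hypothetical, and the inputs are two samples of the IDs read from hive.NOTIFICATION_SEQUENCE and sentry.SENTRY_HMS_NOTIFICATION_ID taken intervalSec seconds apart.

```java
// Hypothetical helper, not part of Hive or Sentry: suggests an action from
// two samples of the HMS event ID and the ID sentry has processed.
class BacklogTriageSketch {
    enum Action { NONE, WAIT, RESTART_HMS, SKIP_BACKLOG }

    static Action triage(long hmsId1, long sentryId1,
                         long hmsId2, long sentryId2,
                         long intervalSec, long maxCatchUpSec) {
        long lag = hmsId2 - sentryId2;
        if (lag <= 0) {
            return Action.NONE; // sentry is caught up
        }
        long consumed = sentryId2 - sentryId1;
        if (consumed <= 0) {
            // The processed ID is not moving at all. In this incident that
            // pointed to a dead sync thread in hive metastore server.
            return Action.RESTART_HMS;
        }
        long produced = hmsId2 - hmsId1;
        long netGain = consumed - produced;
        if (netGain <= 0) {
            return Action.SKIP_BACKLOG; // the backlog grows faster than it drains
        }
        // Estimated seconds until sentry reaches the HMS event ID.
        double etaSec = (double) lag * intervalSec / netGain;
        return etaSec > maxCatchUpSec ? Action.SKIP_BACKLOG : Action.WAIT;
    }

    public static void main(String[] args) {
        // Made-up sample values, 60 seconds apart, with a 10-minute budget.
        System.out.println(triage(1000, 500, 1010, 700, 60, 600));
        System.out.println(triage(1000, 500, 1200, 500, 60, 600));
    }
}
```

The threshold is a judgment call for each cluster; the sketch only makes explicit that a stalled ID and a slowly draining backlog call for different remedies.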

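Summary:

Starting at 10:35:08 yesterday morning, user DDL operations began to slow down.
At that time there was a large volume of DDL partition drop operations (more than 20,000 in 20 minutes).
The flood of DDL commands meant the sentry to hdfs acl link could not keep up with the rate of DDL requests.
The timeout defaults to 200s, which is why everyone's drop/create table statements took almost exactly 200s.
This lasted until 00:50 this morning, when the DDL volume returned to normal and the problem eased.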

AronChung added the Hive hive label Sep 29, 2021

AronChung changed the title ~~Hive: DDL statements take about 200s to finish (to be continued)~~ Hive: DDL statements take about 200s to finish Oct 20, 2021

AronChung added this to Hive in My Blog Jun 29, 2022
