HTTP health checks fail for persistent nodes #12068

Open
Joey777210 opened this issue May 8, 2024 · 5 comments
Labels
kind/discussion Category issues related to discussion

Comments

@Joey777210
Contributor

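We have roughly 80,000 persistent nodes registered, all using HTTP health checks. After running for a while, a large number of them are marked unhealthy, and naming.log is flooded with messages like these: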

http:500 milliseconds timeout on connection http-outgoing-52823038824

http:Connection lease request time out

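Looking at the code, Nacos's default HTTP health check timeout is 500 ms, the connection pool size equals the number of CPU cores, and checks run once every 5 s.

It looks like the server's checking capacity cannot keep up with the number of nodes it has to check. Is there a way to mitigate this? The only idea I have so far is to lengthen the health check time, but I could not find anywhere in the code to configure it.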
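
For a rough sense of scale, the figures quoted above can be turned into a back-of-envelope estimate. The sketch below is illustrative only: `HealthCheckLoad` and its methods are hypothetical names, not Nacos code. The core count is an assumption because the thread does not state it. Treating the pool size as the maximum number of concurrent checks is a simplification, as is assuming a single server performs every check.

```java
/** Back-of-envelope load estimate for periodic health checks (illustrative, not Nacos code). */
final class HealthCheckLoad {

    /** Checks per second needed to probe every instance once per interval. */
    static double requiredChecksPerSecond(int instances, double intervalSeconds) {
        return instances / intervalSeconds;
    }

    /** Little's law: average in-flight checks = arrival rate * time spent per check. */
    static double inFlightChecks(double checksPerSecond, double secondsPerCheck) {
        return checksPerSecond * secondsPerCheck;
    }

    /** Upper bound on throughput when at most maxConcurrent checks can run at once. */
    static double sustainableChecksPerSecond(int maxConcurrent, double secondsPerCheck) {
        return maxConcurrent / secondsPerCheck;
    }

    public static void main(String[] args) {
        int instances = 80_000;        // from the report above
        double intervalSeconds = 5.0;  // from the report above
        double timeoutSeconds = 0.5;   // from the report above
        int assumedCores = 16;         // ASSUMPTION: the core count is not given in the thread

        double required = requiredChecksPerSecond(instances, intervalSeconds);
        System.out.printf("required: %.0f checks/s%n", required);
        System.out.printf("in flight if every check runs into the timeout: %.0f%n",
                inFlightChecks(required, timeoutSeconds));
        System.out.printf("sustainable with %d concurrent checks at the timeout: %.0f checks/s%n",
                assumedCores, sustainableChecksPerSecond(assumedCores, timeoutSeconds));
    }
}
```

Under these assumptions the required rate is 16,000 checks per second. Healthy targets normally answer far faster than the timeout, so the timeout-based figures are a worst case, but they show how quickly slow or unreachable targets can exhaust a small pool.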

@KomachiSion
Collaborator

As far as I know there is currently no way to adjust this, but there are plans to redesign the health check components in 3.0.

@KomachiSion KomachiSion added the kind/discussion Category issues related to discussion label May 10, 2024
@Joey777210
Contributor Author

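@KomachiSion One more thing: when using HTTP health checks on persistent nodes, naming-server.log prints a large number of "Client change for service .......http check started before last one finished" entries, and many nodes become unhealthy. What could be causing this?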

@KomachiSion
Collaborator

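"Client change for service ......." is logged when an instance of a service changes: for example, its health check status changed, or an instance was newly registered or updated.
"http check started before last one finished" is most likely the problem you described above. Because of connection timeouts or an undersized thread pool, tasks pile up, so the next check has already started while the previous one has not finished.

If you are already running into this, I suggest scaling out the Nacos cluster so that the health check load is spread across more nodes.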
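
The "started before last one finished" condition described above can be illustrated with a small guard that records whether the previous round of a periodic task is still running. This is a minimal sketch of the general technique, not the Nacos implementation: `OverlapGuard`, `tryStart`, and `finish` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Detects when a periodic task is triggered while its previous round is still running. */
final class OverlapGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /** Returns true if the caller may start a round, false if one is still in flight. */
    boolean tryStart() {
        return running.compareAndSet(false, true);
    }

    /** Marks the current round as finished; call this from a finally block in real code. */
    void finish() {
        running.set(false);
    }

    public static void main(String[] args) {
        OverlapGuard guard = new OverlapGuard();
        System.out.println(guard.tryStart()); // true: the first round starts
        System.out.println(guard.tryStart()); // false: the next round fires before the first finished
        guard.finish();
        System.out.println(guard.tryStart()); // true: the previous round has completed
    }
}
```

A scheduler that sees `false` here could skip the round or log a warning. When this happens often, the work per interval exceeds what the node can complete, which fits the advice to spread checks across more nodes.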

@Joey777210
Contributor Author

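"Client change for service" is occurring at a very high rate; the log is being written at millisecond intervals.
As for the probe frequency problem, I have worked around it for now by modifying the code.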

@xuechaos
Member

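@Joey777210 How many services are those 80,000 nodes registered under, and how many machines are currently serving them?
Also, regarding the business scenario: what exactly are you using persistent services for?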
