Connection stuck in CLOSE_WAIT state causes health check failure #2662

Open
icexin opened this issue Jun 12, 2024 · 7 comments

@icexin

icexin commented Jun 12, 2024

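Describe the bug
After a single server crash, RPCs on the client kept failing because the health check kept failing, with the error [E112]Fail to select server from xxx. On the affected machine, the connection can be seen in the CLOSE_WAIT state.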

Versions
OS: ubuntu 20.04
Compiler: clang-8
brpc: 1.8.0
protobuf: 3.15.8

Additional context/screenshots

[Two screenshots in the original issue]

@chenBright
Contributor

Do you see this log line?

brpc/src/brpc/socket.cpp

Lines 2564 to 2567 in 2e18318

int Socket::CheckHealth() {
    if (_hc_count == 0) {
        LOG(INFO) << "Checking " << *this;
    }

@chenBright
Contributor

Does the CLOSE_WAIT state persist for a long time?

@icexin
Author

icexin commented Jun 18, 2024

Yes, it stayed in CLOSE_WAIT the whole time until we restarted manually.

@chenBright
Contributor

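It looks like some RPC has not finished yet. Internally, the framework waits until every RPC on the connection has finished before it closes the fd, and only then starts the health check.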
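
A minimal sketch of that ordering, assuming reference counting is what delays the close. FakeConnection and SimulateFailure are hypothetical names made up for illustration and are not brpc's Socket API; std::shared_ptr stands in for whatever reference counting the framework actually uses.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a connection object; NOT brpc's Socket class.
class FakeConnection {
public:
    explicit FakeConnection(std::vector<std::string>* events) : events_(events) {}

    // Runs only when the last shared_ptr owner releases its reference.
    ~FakeConnection() {
        events_->push_back("close fd");
        events_->push_back("start health check");
    }

    // Marks the connection as broken; cleanup is still deferred to the destructor.
    void MarkFailed() { events_->push_back("marked failed"); }

private:
    std::vector<std::string>* events_;  // event log owned by the caller
};

// Simulates a connection failure while `inflight_rpcs` calls still hold a
// reference, and returns the order in which things happened.
std::vector<std::string> SimulateFailure(std::size_t inflight_rpcs) {
    std::vector<std::string> events;
    auto conn = std::make_shared<FakeConnection>(&events);

    // Each in-flight RPC keeps its own reference to the connection.
    std::vector<std::shared_ptr<FakeConnection>> rpcs(inflight_rpcs, conn);

    conn->MarkFailed();
    conn.reset();  // the owner drops its reference; the RPCs may still hold theirs

    for (auto& rpc : rpcs) {
        events.push_back("rpc finished");
        rpc.reset();  // the last release triggers ~FakeConnection
    }
    return events;
}
```

Calling SimulateFailure(2) records "marked failed", then two "rpc finished" entries, and only then "close fd" and "start health check". If one of the references were never released, the last two events would never appear, which would match the symptom reported here.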

@chenBright
Contributor

Also, if the server has not come back up, the health check would not succeed anyway, right?

@icexin
Copy link
Author

icexin commented Jun 18, 2024

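Here it is the client side that is in the CLOSE_WAIT state. We use synchronous RPC, so the RPCs should have finished quickly. After the server came back up later, the client's health check still kept failing because of this state.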

@chenBright
Contributor

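If the connection stays in CLOSE_WAIT for a long time, something is probably holding a reference to the socket, so the fd is never closed and the health check never starts.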
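
For reference, a small Linux-only sketch of why an fd that is never closed shows up as CLOSE_WAIT: the peer has sent its FIN, and the local socket stays in that state until the process calls close(). This uses plain POSIX sockets on loopback and has nothing brpc-specific in it; TcpState and MakeCloseWaitSocket are hypothetical helper names.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns the kernel's TCP state for `fd` (e.g. TCP_CLOSE_WAIT), or -1 on error.
int TcpState(int fd) {
    tcp_info info{};
    socklen_t len = sizeof(info);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0) return -1;
    return info.tcpi_state;
}

// Opens a loopback TCP connection, closes the accepting ("server") side, and
// returns the client fd, which is left open. Returns -1 on any failure.
// The caller is responsible for closing the returned fd.
int MakeCloseWaitSocket() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof(addr);
    if (bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        listen(listener, 1) != 0 ||
        getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        close(listener);
        return -1;
    }

    int client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 ||
        connect(client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        if (client >= 0) close(client);
        close(listener);
        return -1;
    }

    int peer = accept(listener, nullptr, nullptr);
    close(listener);
    if (peer < 0) {
        close(client);
        return -1;
    }
    close(peer);  // the "server" goes away and sends FIN

    // read() returning 0 means the FIN has arrived on the client side.
    char buf;
    if (read(client, &buf, 1) != 0) {
        close(client);
        return -1;
    }
    return client;  // still open, so the kernel keeps it in CLOSE_WAIT
}
```

On Linux, TcpState(MakeCloseWaitSocket()) should report TCP_CLOSE_WAIT for as long as the returned fd is left open. That is consistent with the explanation above: whatever still references the socket is what keeps the fd from being closed.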
