Connection problem between runtime and gateway #255
Comments
For 5bnZHzXqdRwun6NkzMgksUirAdspUnUwLBFYG91QC55p, the final keepalive ping is:
And 5bnZHzZXgN4JsbKaFsqZ75HczHHh5dsrcAh9Ake4wKwA connects to 5bnZHzXqdRwun6NkzMgksUirAdspUnUwLBFYG91QC55p:
The sn-miner uses client_ping_timeout (default 5 min) to purge its client cache. Because 5bn..55p has not sent a keepalive ping for too long, the SN considers it dropped, so the SN returns NotFound when 5bn..KwA calls, and the connection fails.
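The purge behaviour described above can be modelled with a small cache keyed by device ID. This is a minimal sketch, assuming only what the comment states (a `client_ping_timeout`, default 5 min, after which a silent client is dropped and lookups fail with NotFound); `ClientCache`, `on_ping`, `purge`, and `lookup` are hypothetical names, not the sn-miner source. Time is passed in explicitly so the example is deterministic.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical model of the SN client cache (not the sn-miner implementation).
/// Times are `Duration`s since an arbitrary epoch.
struct ClientCache {
    ping_timeout: Duration,
    last_ping: HashMap<String, Duration>,
}

#[derive(Debug, PartialEq)]
enum LookupError {
    NotFound,
}

impl ClientCache {
    fn new(ping_timeout: Duration) -> Self {
        ClientCache { ping_timeout, last_ping: HashMap::new() }
    }

    /// Record a keepalive ping from `device` at time `now`.
    fn on_ping(&mut self, device: &str, now: Duration) {
        self.last_ping.insert(device.to_string(), now);
    }

    /// Drop every client whose last ping is older than `ping_timeout`.
    fn purge(&mut self, now: Duration) {
        let timeout = self.ping_timeout;
        self.last_ping
            .retain(|_, last| now.saturating_sub(*last) <= timeout);
    }

    /// A caller's lookup: purged (or never-seen) devices yield NotFound.
    fn lookup(&self, device: &str) -> Result<Duration, LookupError> {
        self.last_ping.get(device).copied().ok_or(LookupError::NotFound)
    }
}

fn main() {
    // Assumed default from the comment: client_ping_timeout = 5 min.
    let mut cache = ClientCache::new(Duration::from_secs(5 * 60));
    let ood = "5bn..55p";

    // The OOD pings once at t = 0 and then stops pinging.
    cache.on_ping(ood, Duration::from_secs(0));

    cache.purge(Duration::from_secs(4 * 60));
    assert!(cache.lookup(ood).is_ok());

    // Past the timeout the entry is purged, so the runtime's lookup fails.
    cache.purge(Duration::from_secs(5 * 60 + 30));
    assert_eq!(cache.lookup(ood), Err(LookupError::NotFound));
    println!("lookup after purge: {:?}", cache.lookup(ood));
}
```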
So is this problem also caused by bdt's SN ping stopping unexpectedly, so that the SN server considers the device offline? It should be the same as, or similar to, the problem below. So we need to review the SN ping logic inside the bdt stack to see what could cause the ping loop to abort.
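When reviewing the ping loop, one pattern worth checking for is an error path that returns from the loop (for example via `?`) instead of logging and retrying. The sketch below shows a loop that survives individual failures. It is illustrative only: `run_ping_loop`, `PingStats`, and the `ping_once` closure are hypothetical stand-ins for the real bdt SN ping, and `rounds` bounds a loop that in practice would run forever.

```rust
use std::thread;
use std::time::Duration;

/// Counters for the hypothetical keepalive loop below.
#[derive(Debug, Default, PartialEq)]
struct PingStats {
    sent: u32,
    failed: u32,
}

/// Keepalive loop that logs and continues on error instead of returning.
/// `ping_once` is a hypothetical stand-in for the real SN ping; `rounds`
/// only exists so the example terminates.
fn run_ping_loop<F>(rounds: u32, interval: Duration, mut ping_once: F) -> PingStats
where
    F: FnMut(u32) -> Result<(), String>,
{
    let mut stats = PingStats::default();
    for round in 0..rounds {
        stats.sent += 1;
        // Writing `ping_once(round)?` here would end the whole loop on the
        // first error, which is one way a ping loop can stop silently.
        if let Err(e) = ping_once(round) {
            stats.failed += 1;
            eprintln!("sn ping round {} failed: {}; will retry", round, e);
        }
        thread::sleep(interval);
    }
    stats
}

fn main() {
    // Rounds 1 and 2 fail (e.g. SN unreachable); the loop keeps pinging.
    let stats = run_ping_loop(5, Duration::ZERO, |round| {
        if round == 1 || round == 2 {
            Err("timeout".to_string())
        } else {
            Ok(())
        }
    });
    assert_eq!(stats, PingStats { sent: 5, failed: 2 });
    println!("{:?}", stats);
}
```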
All tasks are blocked, so the bdt-stack cannot communicate properly:
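One way to confirm from outside that "all tasks are blocked" is a liveness probe: send a message to a worker and wait a bounded time for its reply. This sketch uses std threads and channels; `Probe`, `probe_worker`, and `spawn_worker` are hypothetical names, and since the bdt stack is async this only illustrates the idea rather than serving as a drop-in diagnostic.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// A probe message; the worker answers on `reply` once it gets to it.
struct Probe {
    reply: mpsc::Sender<()>,
}

/// Returns true if the worker answered within `timeout`, false if it looks blocked.
fn probe_worker(worker: &mpsc::Sender<Probe>, timeout: Duration) -> bool {
    let (reply_tx, reply_rx) = mpsc::channel();
    if worker.send(Probe { reply: reply_tx }).is_err() {
        return false; // the worker has exited
    }
    reply_rx.recv_timeout(timeout).is_ok()
}

/// Hypothetical worker that stalls for `stall` before serving probes,
/// simulating a task that is stuck.
fn spawn_worker(stall: Duration) -> mpsc::Sender<Probe> {
    let (tx, rx) = mpsc::channel::<Probe>();
    thread::spawn(move || {
        thread::sleep(stall);
        for probe in rx {
            // The prober may have given up already; ignore a closed reply channel.
            let _ = probe.reply.send(());
        }
    });
    tx
}

fn main() {
    let healthy = spawn_worker(Duration::ZERO);
    println!("healthy answers: {}", probe_worker(&healthy, Duration::from_millis(500)));

    let blocked = spawn_worker(Duration::from_secs(2));
    println!("blocked answers: {}", probe_worker(&blocked, Duration::from_millis(100)));
}
```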
After 5 minutes and 30 seconds, querying 5bnZHzXqdRwun6NkzMgksUirAdspUnUwLBFYG91QC55p through the SN will definitely fail to find it, and during this period there is no communication with it:
When check\dead.rs detects this situation, will the gateway be restarted? If so, the device can be connected again after 5 min.
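The question above is about a watchdog that restarts the stack when keepalive stops. The following is a generic sketch of such a check, assuming a threshold and a restart hook; `Watchdog`, `Health`, `on_ping_ok`, `check`, and the 300 s threshold are hypothetical and do not describe the actual logic of check\dead.rs. Time is passed in explicitly to keep it deterministic.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Health {
    Alive,
    Dead,
}

/// Hypothetical watchdog, not the logic of check\dead.rs.
/// Times are `Duration`s since an arbitrary epoch.
struct Watchdog<F: FnMut()> {
    threshold: Duration,
    last_ok: Duration,
    restart: F,
    restarts: u32,
}

impl<F: FnMut()> Watchdog<F> {
    fn new(threshold: Duration, restart: F) -> Self {
        Watchdog { threshold, last_ok: Duration::ZERO, restart, restarts: 0 }
    }

    /// Call whenever an SN keepalive ping succeeds.
    fn on_ping_ok(&mut self, now: Duration) {
        self.last_ok = now;
    }

    /// Periodic check: restart if no ping has succeeded for longer than `threshold`.
    fn check(&mut self, now: Duration) -> Health {
        if now.saturating_sub(self.last_ok) > self.threshold {
            (self.restart)();
            self.restarts += 1;
            // Give the restarted stack a fresh grace period so the check
            // does not fire again on the very next tick.
            self.last_ok = now;
            Health::Dead
        } else {
            Health::Alive
        }
    }
}

fn main() {
    let mut wd = Watchdog::new(Duration::from_secs(300), || println!("restarting stack"));
    wd.on_ping_ok(Duration::from_secs(60));
    assert_eq!(wd.check(Duration::from_secs(300)), Health::Alive);
    assert_eq!(wd.check(Duration::from_secs(400)), Health::Dead);
    assert_eq!(wd.restarts, 1);
}
```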
After switching the active OOD, the runtime sometimes cannot connect to the OOD, and pings over bdt fail.
OOD: 5bnZHzXqdRwun6NkzMgksUirAdspUnUwLBFYG91QC55p
runtime: 5bnZHzZXgN4JsbKaFsqZ75HczHHh5dsrcAh9Ake4wKwA
Runtime side
Last successful connection on the runtime side:
4904: [2023-05-06 20:47:15.658443 +08:00] INFO [ThreadId(4)] [component\cyfs-lib\src\requestor\bdt.rs:181] http-bdt request to 5bnZHzXqdRwun6NkzMgksUirAdspUnUwLBFYG91QC55p:84 success! during=208ms, seq=TempSeq(4211563353)
All subsequent connections to the gateway fail with timeout errors:
4933: [2023-05-06 20:47:26.396892 +08:00] WARN [ThreadId(8)] [component\cyfs-lib\src\requestor\bdt.rs:110] connect to 5bnZHzXqdRwun6NkzMgksUirAdspUnUwLBFYG91QC55p:84 failed! with_desc=true, during=5009ms, err: (Timeout, future has timed out, None)
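The failing log line reports `during=5009ms` with `future has timed out`, which suggests (an inference, not confirmed by the source) a connect attempt bounded by a deadline of about 5 s. The sketch below shows that pattern with std only; `connect_with_timeout` and `ConnectError` are hypothetical, and the `connect` closure stands in for the real bdt connect, which in cyfs-lib is async rather than thread-based.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Error kinds for the hypothetical connect wrapper below.
#[derive(Debug, PartialEq)]
enum ConnectError {
    /// No answer before the deadline (compare `err: (Timeout, ...)` in the log).
    Timeout,
    /// The connect attempt itself reported an error.
    Failed(String),
}

/// Runs `connect` on a helper thread and waits at most `timeout` for it.
/// Returns the outcome plus the elapsed time, like the `during=` log field.
/// On timeout the helper thread is abandoned, not cancelled: it keeps running
/// until `connect` returns, and its result is then discarded.
fn connect_with_timeout<T, F>(timeout: Duration, connect: F) -> (Result<T, ConnectError>, Duration)
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, String> + Send + 'static,
{
    let start = Instant::now();
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(connect());
    });
    let result = match rx.recv_timeout(timeout) {
        Ok(Ok(conn)) => Ok(conn),
        Ok(Err(e)) => Err(ConnectError::Failed(e)),
        Err(RecvTimeoutError::Timeout) => Err(ConnectError::Timeout),
        Err(RecvTimeoutError::Disconnected) => {
            Err(ConnectError::Failed("connect task exited without a result".to_string()))
        }
    };
    (result, start.elapsed())
}

fn main() {
    // Stand-in for a peer that never answers in time.
    let (result, during) = connect_with_timeout(Duration::from_millis(100), || {
        thread::sleep(Duration::from_millis(500));
        Ok::<(), String>(())
    });
    println!("result={:?}, during={}ms", result, during.as_millis());
}
```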
Even after restarting the runtime, it still cannot connect to the gateway, so the problem appears to be on the gateway side.
OOD side
Last request received from the runtime:
[2023-05-06 20:47:18.905150 +08:00] INFO [ThreadId(7)] [component\cyfs-stack\src\interface\http_bdt_listener.rs:167] recv bdt http request. source=5bnZHzZXgN4JsbKaFsqZ75HczHHh5dsrcAh9Ake4wKwA, seq=TempSeq(4211563355), method=POST, url=http:// 5bnzhzxqdrwun6nkzmgksuiradspunuwlbfyg91qc55p:84/non/, len=Some(26)
The OOD appears to be running, but for some reason other devices cannot connect to it.
logs.zip