TiKV should delay request snapshot if it exceeds receiving snapshot limit #15972

Open
overvenus opened this issue Nov 13, 2023 · 3 comments · May be fixed by #17019
Labels
severity/minor type/bug Type: Issue - Confirmed a bug

Comments

@overvenus
Member

overvenus commented Nov 13, 2023

Bug Report

The logs below show leaders failing to send snapshots because the receiving stores have reached their snapshot task limit.
Snapshots generated by leaders can consume a lot of disk space, and may be wasted if the leaders compact their logs.

It would be better for followers to delay requesting snapshots when they find that the snapshot limit has been reached.

TiKV Logs

tikv-20 logged many snapshot send failures.

[2023/11/13 13:40:00.097 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:00.854 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:00.889 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=158552] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:02.100 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:02.852 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:04.102 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:04.854 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:05.602 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:06.103 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:06.342 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:06.858 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:08.106 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:08.857 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1561753] [to>
[2023/11/13 13:40:10.108 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:10.857 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:12.107 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:12.354 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:12.862 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:13.125 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:14.231 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:14.364 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=339339] [to_>
[2023/11/13 13:40:14.921 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:15.364 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=339339] [to_>
[2023/11/13 13:40:16.113 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:16.865 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1561753] [to>
[2023/11/13 13:40:17.366 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=339339] [to_>
[2023/11/13 13:40:18.117 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:18.869 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:19.106 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:19.849 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:20.119 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:20.871 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1561753] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:22.121 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:24.122 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:26.124 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:26.602 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:28.132 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:30.127 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:32.131 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:33.355 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:34.133 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:35.708 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1392211] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:36.132 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:36.756 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1392211] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:38.137 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:38.757 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1392211] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:40.109 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>
[2023/11/13 13:40:40.760 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1392211] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:41.507 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1392211] [to>
[2023/11/13 13:40:41.783 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1889027] [to>
[2023/11/13 13:40:42.144 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:42.784 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1392211] [to>
[2023/11/13 13:40:42.787 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1889027] [to>
[2023/11/13 13:40:44.143 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1567474] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:44.386 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1016225] [to>
[2023/11/13 13:40:44.780 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1392211] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:44.782 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err=Grpc(RemoteStopped)] [region_id=1889027] [to_addr=tc-tikv-21.tc-tikv-peer.titan-open-zccxk.svc:20160] [thread_id=0x5]
[2023/11/13 13:40:45.770 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1016225] [to>
[2023/11/13 13:40:46.145 +08:00] [ERROR] [snap.rs:545] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 8-RESOURCE_EXHAUSTED, message: \"the number of received snapshot tasks 32 exceeded the limitation 32\", details: [] })))"] [region_id=1567474] [to>

tikv-21 had reached its receiving snapshot limit.

[2023/11/13 13:40:44.741 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:44.748 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:44.767 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:44.784 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:44.800 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:44.948 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:44.948 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:44.999 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.071 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.074 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.115 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.131 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.166 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.174 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.181 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.243 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.289 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.299 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.306 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.329 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.341 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.380 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.492 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.524 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.674 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.704 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.772 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.782 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.789 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.794 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.805 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.810 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.830 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:45.842 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.002 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.056 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.058 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.059 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.062 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.147 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.171 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.339 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.394 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.442 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.484 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.565 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.692 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.750 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.753 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.768 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.772 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.787 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.795 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.817 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.835 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:46.865 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:47.001 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:47.038 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]
[2023/11/13 13:40:47.056 +08:00] [WARN] [snap.rs:425] ["too many recving snapshot tasks, ignore"] [thread_id=0x5]

What version of TiKV are you using?

v7.4.0

Steps to reproduce

Scale in, scale out, or any other scenario that involves a lot of region balancing.

What did you expect?

No "too many recving snapshot tasks" logs.

What happened?

Lots of "too many recving snapshot tasks" and wasted snapshots.

@overvenus overvenus added type/bug Type: Issue - Confirmed a bug severity/minor labels Nov 13, 2023
@zzl-7

zzl-7 commented Dec 23, 2023

Hi @overvenus, is this open to contribution?

@tonyxuqqi
Contributor

@zzl-7 Yes, TiKV is always open to contribution if it's not assigned yet.

@tonyxuqqi
Contributor

/cc @hbisheng

ti-chi-bot bot added a commit that referenced this issue May 20, 2024
ref #15972

This commit starts to introduce a snapshot precheck mechanism to reduce
unnecessary snapshot drops and generations. The mechanism functions as follows: 

Before a leader sends a snapshot to a follower, it first sends a precheck
message. This message serves as a preliminary inquiry to the follower, seeking
confirmation of its readiness to receive a snapshot. Upon receiving the message,
the follower consults its concurrency limiter and returns a response to the
leader. The leader will only proceed to generate the snapshot after the precheck
is passed. 

A passed precheck means the leader has reserved a spot on the follower so the
subsequent snapshot send should succeed. The reservation has a TTL and the
leader is supposed to complete the snapshot generation and transmission within
the TTL timeframe.

Note that this commit implements the concurrency limiter without actually using
it. A follow-up commit will update the snapshot sending process and make use of
this concurrency limiter.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: Neil Shen <overvenus@gmail.com>
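
For readers following along, the reservation-with-TTL idea from the commit message above could look roughly like the sketch below. This is not the code from the referenced commit: the `PrecheckLimiter` type, its `precheck` method, and the way time is passed in are all hypothetical names chosen for illustration. It only shows a receiver that grants a slot when it is under its limit and lets unused slots expire after the TTL.

```rust
use std::time::{Duration, Instant};

/// Hypothetical receiver-side limiter (illustrative only, not TiKV's type).
/// Each passed precheck reserves one slot that expires after `ttl`.
pub struct PrecheckLimiter {
    limit: usize,
    ttl: Duration,
    /// Expiry time of every live reservation.
    reservations: Vec<Instant>,
}

impl PrecheckLimiter {
    pub fn new(limit: usize, ttl: Duration) -> Self {
        PrecheckLimiter {
            limit,
            ttl,
            reservations: Vec::new(),
        }
    }

    /// Handles a precheck arriving at `now`.
    /// Returns true if a slot was reserved for the sender.
    pub fn precheck(&mut self, now: Instant) -> bool {
        // Drop reservations whose TTL has elapsed so their slots free up.
        self.reservations.retain(|&expires_at| expires_at > now);
        if self.reservations.len() >= self.limit {
            // At the limit: the leader should not generate a snapshot yet.
            return false;
        }
        self.reservations.push(now + self.ttl);
        true
    }
}
```

Taking `now` as a parameter instead of calling `Instant::now()` inside the method is a choice made here so that expiry can be exercised without sleeping. Note that this version hands out a new slot on every call, which is the non-idempotent behavior that the follow-up commit below describes.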
ti-chi-bot bot added a commit that referenced this issue May 23, 2024
ref #15972

An issue with the previous implementation of the concurrency limiter is that 
its APIs are not idempotent. If a single region leader keeps sending precheck 
requests to the same receiver (this may happen if the precheck response is 
somehow lost on the network), it would consume all reservations, thereby 
blocking all other snapshots. This commit addresses this problem by updating 
the concurrency limiter to deduplicate requests based on `region_id`. If the 
limiter already has a valid reservation for a region, calling try_recv() again 
will not allocate a new one.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
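
The deduplication described in the commit message above can be sketched as follows. Again, this is an illustration under assumptions rather than the actual change: `try_recv` is the name mentioned in the commit message, but its signature here, the `DedupLimiter` type, and the `reserved` helper are hypothetical. The point is that keying reservations by `region_id` makes repeated prechecks from the same region reuse one slot.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical limiter that keeps at most one reservation per region.
pub struct DedupLimiter {
    limit: usize,
    ttl: Duration,
    /// Maps region_id to the expiry time of its reservation.
    reservations: HashMap<u64, Instant>,
}

impl DedupLimiter {
    pub fn new(limit: usize, ttl: Duration) -> Self {
        DedupLimiter {
            limit,
            ttl,
            reservations: HashMap::new(),
        }
    }

    /// Assumed signature. Returns true if `region_id` holds a valid
    /// reservation after this call.
    pub fn try_recv(&mut self, region_id: u64, now: Instant) -> bool {
        // Expire stale reservations first.
        self.reservations.retain(|_, expires_at| *expires_at > now);
        if self.reservations.contains_key(&region_id) {
            // Idempotent path: a retried precheck does not take a new slot.
            return true;
        }
        if self.reservations.len() >= self.limit {
            return false;
        }
        self.reservations.insert(region_id, now + self.ttl);
        true
    }

    /// Number of reservations currently stored (expired ones are only
    /// removed on the next `try_recv` call).
    pub fn reserved(&self) -> usize {
        self.reservations.len()
    }
}
```

With this shape, a leader that retries its precheck because a response was lost keeps getting a positive answer without consuming further slots, so other regions are not starved. Whether a retry should also refresh the TTL is a design question this sketch leaves untouched; here the original expiry is kept.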