Problem
During SH testing, a Storage Manager (SM) process exhausted its thread resources, reaching 541 threads against a normal baseline of roughly 80. Thread dump analysis shows the following distribution (a small sketch for sampling the live count follows the list):
Thread Distribution:
- 209 threads: grpc_core::Executor::ThreadMain (gRPC's global executor pool, all idle)
- 246 threads: Empty stack (0x0000000000000000, likely leaked/zombie threads)
- 32 threads: grpc_threadpool (gRPC internal thread pool)
- 54 threads: Application threads (folly, nuraft, iomanager, etc.) - normal
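For reference, the live thread count can be sampled without taking a full dump. A minimal Linux-only sketch; the helper name is illustrative and not part of the SM codebase:
#include <fstream>
#include <iostream>
#include <string>

// Reads the "Threads:" field from /proc/self/status for the calling process.
static int current_thread_count() {
    std::ifstream status{"/proc/self/status"};
    for (std::string line; std::getline(status, line);) {
        if (line.rfind("Threads:", 0) == 0) { return std::stoi(line.substr(8)); }
    }
    return -1; // field not found
}

int main() {
    std::cout << "live thread count: " << current_thread_count() << std::endl;
    return 0;
}
Sampling this periodically (or through an existing metrics hook) would show whether the count grows steadily or spikes under connection churn.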
AI Analysis - Possible Causes
- gRPC executor unbounded growth: gRPC's internal thread pool auto-scales with load but never shrinks. Without a ResourceQuota limit, a high volume of concurrent RPC calls or connection churn causes threads to accumulate.
- Thread lifecycle issue: the 246 empty-stack threads suggest a thread-cleanup problem, possibly in supporting libraries (folly/nuraft/boost.asio) or at the OS level.
Current State
The sisl gRPC wrapper (sisl/src/grpc/rpc_server.cpp:44-79) does not set resource quotas:
m_builder.SetMaxReceiveMessageSize(max_receive_msg_size);
m_builder.SetMaxSendMessageSize(max_send_msg_size);
// Missing: ResourceQuota to limit executor threads
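If a cap turns out to be appropriate, a quota can be attached to the builder. A minimal sketch, assuming a placeholder limit of 64 threads and a placeholder quota name (the right MaxThreads value is still to be determined, as noted below):
#include <grpcpp/grpcpp.h>
#include <grpcpp/resource_quota.h>

// Illustrative only: bounds the number of threads gRPC may spawn for this
// server. The quota name "sm_grpc_quota" and the value 64 are placeholders.
void set_thread_quota(grpc::ServerBuilder& builder) {
    grpc::ResourceQuota quota{"sm_grpc_quota"};
    quota.SetMaxThreads(64);
    builder.SetResourceQuota(quota);
}
Note that this bounds only gRPC-owned threads; it would not by itself address the 246 empty-stack threads if those originate elsewhere.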
Why This Is Not Being Fixed Now
- Root cause unclear: needs further confirmation by a human
- Impact minimal: the pod auto-restarts when it hits the limit, so there is no persistent service degradation
- Optimal limit unknown: an appropriate MaxThreads value still needs to be determined