Excessive gRPC Executor Threads Without Resource Quota Limits #291

@Besroy

Problem

During SH testing, a Storage Manager (SM) process exhausted its thread resources, reaching 541 threads (normally a little over 80). Analysis of the thread dump shows:

Thread Distribution:

  • 209 threads: grpc_core::Executor::ThreadMain (gRPC's global executor pool, all idle)
  • 246 threads: Empty stack (0x0000000000000000, likely leaked/zombie threads)
  • 32 threads: grpc_threadpool (gRPC internal thread pool)
  • 54 threads: Application threads (folly, nuraft, iomanager, etc.) - normal
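
To watch for this kind of growth without taking a full thread dump, a rough tally can be read from /proc on Linux. The sketch below is not part of SM or sisl, and `tally_threads_by_name` and `total_threads` are hypothetical helper names. It groups threads by kernel thread name, which is coarser than the stack-based classification above, and it cannot tell an empty-stack thread from a healthy one.

```cpp
#include <dirent.h>

#include <fstream>
#include <map>
#include <string>

// Tally the threads of a Linux process by thread name, as reported in
// /proc/<pid>/task/<tid>/comm. Pass "self" to inspect the calling process.
// Returns an empty map if the task directory cannot be opened.
std::map<std::string, int> tally_threads_by_name(const std::string& pid) {
    std::map<std::string, int> counts;
    const std::string task_dir = "/proc/" + pid + "/task";
    DIR* dir = opendir(task_dir.c_str());
    if (dir == nullptr) return counts;  // process gone or /proc unavailable
    while (dirent* entry = readdir(dir)) {
        const std::string tid = entry->d_name;
        if (tid == "." || tid == "..") continue;
        // A thread may exit between readdir() and this open; such entries
        // are simply skipped.
        std::ifstream comm(task_dir + "/" + tid + "/comm");
        std::string name;
        if (std::getline(comm, name)) ++counts[name];
    }
    closedir(dir);
    return counts;
}

// Sum of all per-name counts, i.e. the total number of threads seen.
int total_threads(const std::map<std::string, int>& counts) {
    int total = 0;
    for (const auto& kv : counts) total += kv.second;
    return total;
}
```

Sampling `total_threads(tally_threads_by_name("<SM pid>"))` periodically and logging the per-name counts would show which group grows over time, assuming the threads of interest carry distinct names.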

AI Analysis - Possible Causes

  1. gRPC executor unbounded growth: gRPC's internal thread pool scales up with load but never shrinks. Without ResourceQuota limits, a high number of concurrent RPC calls or heavy connection churn causes threads to accumulate.
  2. Thread lifecycle issue: the 246 empty-stack threads suggest a cleanup problem, possibly in the libraries in use (folly/nuraft/boost.asio) or at the OS level.

Current State

The sisl gRPC wrapper (sisl/src/grpc/rpc_server.cpp:44-79) does not set resource quotas:

  m_builder.SetMaxReceiveMessageSize(max_receive_msg_size);
  m_builder.SetMaxSendMessageSize(max_send_msg_size);
  // Missing: ResourceQuota to limit executor threads
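
A minimal sketch of what setting a quota could look like, using gRPC's public `grpc::ResourceQuota` and `grpc::ServerBuilder::SetResourceQuota`. The helper name `apply_thread_quota`, the quota name, and the `max_threads` parameter are illustrative assumptions, not existing sisl code. gRPC documents `SetMaxThreads` as capping the threads allocated from the quota object; whether that cap also covers the `grpc_core::Executor` threads seen in the dump is unverified and would need to be checked before relying on it.

```cpp
#include <grpcpp/grpcpp.h>
#include <grpcpp/resource_quota.h>

// Hypothetical helper (not part of sisl): attach a thread cap to a builder
// before BuildAndStart() is called.
void apply_thread_quota(grpc::ServerBuilder& builder, int max_threads) {
    grpc::ResourceQuota quota("sm_grpc_quota");  // name is illustrative only
    quota.SetMaxThreads(max_threads);            // value still to be determined
    // The builder takes its own reference to the underlying quota, so the
    // local object may go out of scope afterwards.
    builder.SetResourceQuota(quota);
}
```

In the wrapper this would sit next to the existing `SetMaxReceiveMessageSize` and `SetMaxSendMessageSize` calls, for example `apply_thread_quota(m_builder, max_threads)`, with `max_threads` coming from configuration.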

Why Not Fixing Now

  • Root cause unclear: the AI analysis above still needs to be confirmed by a human
  • Impact minimal: the pod auto-restarts when it hits the limit, with no persistent service degradation
  • Optimal limit unknown: an appropriate MaxThreads value still needs to be determined
