[Bug][Manager] No proper error handling when MQ message query thread pool is exhausted #12073

@healchow

What happened

Currently, when querying MQ messages from multiple Pulsar clusters, there are several issues:

  1. No thread pool management: Message queries are executed without proper thread pool management, which may lead to resource exhaustion under high concurrency.

  2. No graceful handling for task rejection: When too many concurrent requests come in, the system doesn't properly handle the RejectedExecutionException and doesn't provide a user-friendly error response.

  3. No task cancellation mechanism: When task submission fails, previously submitted tasks continue to run unnecessarily.

  4. No interruption support: Long-running IO operations cannot be cancelled when the request is aborted.
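To illustrate point 2, here is a minimal sketch of how a saturated, bounded pool rejects work. It uses the plain JDK `ThreadPoolExecutor` with its default `AbortPolicy`; the class name `RejectionDemo`, the method `countRejections`, and the latch that stands in for a slow Pulsar query are assumptions for illustration, not code from InLong Manager.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionDemo {

    /**
     * Submits taskCount blocking tasks to a bounded pool and returns how many
     * submissions were rejected. All names here are hypothetical.
     */
    static int countRejections(int poolSize, int queueCapacity, int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity)); // default handler: AbortPolicy
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < taskCount; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // stand-in for a slow message query
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // Thrown once all threads are busy and the queue is full.
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + countRejections(2, 2, 10));
    }
}
```

With 2 threads and 2 queue slots, 10 submissions leave 6 rejected. If the caller does not catch the exception, it surfaces as a generic server error instead of a clear "too busy" response.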

What you expected to happen

  1. Add a dedicated thread pool: Use ThreadPoolTaskExecutor with configurable core/max pool size and queue capacity for message query tasks.

  2. Implement task cancellation: When RejectedExecutionException occurs, cancel all previously submitted tasks to free up resources.

  3. Add interruption checks: Check Thread.currentThread().isInterrupted() before and after IO operations to support task cancellation.
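The three points above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the JDK `ThreadPoolExecutor` as a stand-in for Spring's `ThreadPoolTaskExecutor` so that it runs without Spring, `Thread.sleep` stands in for the blocking Pulsar read, and `MessageQuerySketch`, `queryTask`, `queryAll`, `newPool`, and the `IllegalStateException` error type are hypothetical names, not the actual InLong Manager code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class MessageQuerySketch {

    /** Dedicated bounded pool; sizes would come from configuration (assumed). */
    static ThreadPoolExecutor newPool(int core, int max, int queueCapacity) {
        return new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    /** Hypothetical per-cluster query that checks interruption around the IO. */
    static Callable<String> queryTask(String cluster, long ioMillis) {
        return () -> {
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("cancelled before IO: " + cluster);
            }
            Thread.sleep(ioMillis); // placeholder for the blocking read
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("cancelled after IO: " + cluster);
            }
            return "messages-from-" + cluster;
        };
    }

    static void cancelAll(List<Future<String>> futures) {
        for (Future<String> f : futures) {
            f.cancel(true); // true: interrupt the worker if the task is running
        }
    }

    /** Queries every cluster; on rejection, cancels what was already submitted. */
    static List<String> queryAll(ThreadPoolExecutor pool, List<String> clusters, long ioMillis) {
        List<Future<String>> futures = new ArrayList<>();
        try {
            for (String cluster : clusters) {
                futures.add(pool.submit(queryTask(cluster, ioMillis)));
            }
        } catch (RejectedExecutionException e) {
            cancelAll(futures);
            throw new IllegalStateException("Message query pool is busy, please retry later", e);
        }
        List<String> results = new ArrayList<>();
        try {
            for (Future<String> f : futures) {
                results.add(f.get());
            }
        } catch (InterruptedException e) {
            cancelAll(futures);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new IllegalStateException("Message query was interrupted", e);
        } catch (ExecutionException e) {
            cancelAll(futures);
            throw new IllegalStateException("Message query failed", e.getCause());
        }
        return results;
    }
}
```

Note that `cancel(true)` does not remove a queued task from the queue; the cancelled task is simply skipped when a worker picks it up. The interruption checks only help if the underlying IO returns or throws on interrupt, which is an assumption about the Pulsar client calls involved.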

How to reproduce

Submit message queries at a high rate against a multi-cluster Pulsar setup. Later requests time out during submission, but their tasks stay queued and are still processed, even though the results are no longer needed and the tasks should be discarded.

Environment

No response

InLong version

master

InLong Component

InLong Manager

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Labels

type/bug (Something is wrong)
