Skip to content

DragonFly instance unresponsive: Error: Stalled heartbeat() fiber for 62453444040472 seconds because of big value serialization #4836

@wernermorgenstern

Description

@wernermorgenstern

Describe the bug
A DragonFly instance wasn't responding to any queues, and all we could see was a lot of
Stalled heartbeat() fiber for 62453444040472 seconds because of big value serialization messages in the logs. The seconds changed, every second

To Reproduce
Steps to reproduce the behavior:

  1. Run a DragonFLy instance for a period of time (not specific time)
  2. For BullMQ Pro, unable to add new jobs to the queue, and no workers are able to access it. But no BullMQ Errors
  3. Check the DragonFly Logs

Expected behavior
DragonFly instance should be able to run for a long time

Environment (please complete the following information):

  • OS: AWS Linux
  • Kernel: Linux dragonfly-bullmq-admin-response-0-primary-0 5.10.233-224.894.amzn2.aarch64 #1 SMP Mon Jan 27 16:53:26 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux
  • Containerized?: EKS Kubernetes, instance type: c7gn.xlarge
  • Dragonfly Version: 1.27.2

Additional context
We run BullMQ Pro with this specific DragonFly Instance

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions