[Bug] Standalone broker does not become responsive after start.  #20875

@zbentley

Search before asking

  • I searched in the issues and found nothing similar.

Version

Pulsar 2.10.3.

Docker Desktop v4.21.1
Pulsar is the only container running. Docker has 10 CPUs, 24GB of RAM, and 4GB of swap. The host machine is not under memory or CPU pressure (running a terminal and Docker, no other programs, activity monitor reports idle).

Docker engine: 24.0.2

Config:

{
  "builder": {
    "gc": {
      "defaultKeepStorage": "180GB",
      "enabled": true
    }
  },
  "experimental": true,
  "features": {
    "buildkit": true
  }
}

MacOS 12.6.7

ARM/M1 processor.

Minimal reproduce step

  • Run this Docker command:
docker run \
		--rm \
		--name chariot_local_pulsar \
		-it \
		-p 6650:6650 \
		-p 8080:8080 \
		--cap-add=SYS_PTRACE \
		--platform linux/x86_64 \
		apachepulsar/pulsar:2.10.3 \
		bin/pulsar standalone -nss -nfw

What did you expect to see?

Within a few minutes, a broker that I can use for admin API and messaging.

What did you see instead?

This bug is intermittent. Usually, things work fine.

But sometimes (roughly 1 or 2 out of every 5 starts), the broker never finishes starting: it writes the attached logs and then stops logging entirely (no activity for 30 minutes). I cannot connect to the management API or the messaging port, and cannot perform any operations. Connection attempts on either port produce no new log output.
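
A scripted readiness check can make the hang easier to detect and count than watching the logs. The sketch below assumes curl is available on the host and probes the broker's admin REST endpoint on the mapped port 8080. The function name wait_for_broker and the timeout values are illustrative and not part of Pulsar.

```shell
#!/usr/bin/env bash

# wait_for_broker URL TIMEOUT_SECONDS
# Polls URL until it answers with an HTTP success status or the timeout expires.
# Returns 0 if the broker answered, 1 if it never became responsive.
wait_for_broker() {
  local url="$1" timeout="$2" deadline
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # --fail makes curl exit non-zero on HTTP errors; --max-time bounds each
    # attempt so a port that accepts connections but never replies cannot
    # block the loop.
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      return 0
    fi
    sleep 2
  done
  return 1
}
```

For example, running `wait_for_broker http://localhost:8080/admin/v2/clusters 300 || echo "broker did not become responsive"` in a second terminal right after the docker run command separates a hung start from a slow one. An HTTP probe is used rather than a bare port check because Docker's port mapping may accept TCP connections even when the process inside the container is not answering.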

Anything else?

I have observed similar problems on Pulsar 3.0, with different logs.

Neither restarting the computer nor resetting Docker (docker system prune --all --force) seems to change the likelihood of this occurring. The very first broker start seems about as likely to fail as any subsequent restart, whether or not the Docker daemon was restarted or the image was re-pulled.

If I exec into the container and attempt to capture a heap dump, I get the following error:

I have no name!@2f8835389778:/pulsar$ jmap -dump:live,format=b,file=dump.hprof 1
Exception in thread "main" com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file /proc/1/root/tmp/.java_pid1: target process 1 doesn't respond within 10500ms or HotSpot VM not loaded
	at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:100)
	at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
	at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
	at jdk.jcmd/sun.tools.jmap.JMap.executeCommandForPid(JMap.java:128)
	at jdk.jcmd/sun.tools.jmap.JMap.dump(JMap.java:208)
	at jdk.jcmd/sun.tools.jmap.JMap.main(JMap.java:114)
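
Since the attach mechanism used by jmap does not respond, a signal-based thread dump may be worth trying instead: HotSpot normally prints all thread stacks to its standard output when it receives SIGQUIT. The sketch below is untested against this hang and rests on assumptions: that the signal reaches the emulated JVM through qemu, that the JVM is still able to handle it, and that the container's output is retrievable with docker logs. The function name request_thread_dump is hypothetical.

```shell
#!/usr/bin/env bash

# request_thread_dump CONTAINER [TAIL_LINES]
# Sends SIGQUIT to PID 1 in the container, then prints the tail of its output,
# where a HotSpot thread dump would appear if the JVM handled the signal.
request_thread_dump() {
  local container="$1" tail_lines="${2:-400}"
  # Run kill through sh so the shell builtin is used inside the container.
  docker exec "$container" sh -c 'kill -QUIT 1' || return 1
  # Give the JVM a moment to write the dump before reading the log.
  sleep 2
  docker logs --tail "$tail_lines" "$container"
}
```

With the container name from the reproduce step this would be invoked as `request_thread_dump chariot_local_pulsar`. If no thread stacks appear in the output, that in itself suggests the JVM is not servicing signals.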

top doesn't report memory or overall CPU exhaustion in the container, although the java process sits at about 100% CPU (one core):

top - 16:33:00 up  1:02,  0 users,  load average: 1.00, 1.01, 1.00
Tasks:   3 total,   0 running,   3 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.1 us,  0.0 sy,  0.0 ni, 89.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  24010.6 total,  19671.0 free,   1454.2 used,   2885.4 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  21960.3 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    1 10000     20   0 9907.3m 883056  35864 S 101.0   3.6  19:22.82 java
  423 10000     20   0  149248  13184   4992 S   0.0   0.1   0:00.07 bash
  457 10000      0   0  150872  11024   4876 0   0.0   0.0   0:00.00 top

ps output:

UID        PID  PPID  C STIME TTY          TIME CMD
10000        1     0 99 16:14 pts/0    00:21:11 /usr/bin/qemu-x86_64 /usr/lib/jvm/java-11-openjdk-amd64/bin/java /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Dlog4j.shutdo
10000      423     0  0 16:24 pts/1    00:00:00 /usr/bin/qemu-x86_64 /bin/bash /bin/bash
10000      529   423  0 15:30 ?        00:00:00 ps -ef

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Labels: Stale, type/bug
