Health check fails when running Docker images #6710

@cdbartholomew

Describe the bug
Requesting the health check endpoint of a broker always fails when the broker runs inside a Docker container:


 --- An unexpected error occurred in the server ---

Message: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 5 lookup request timedout after ms 30000

Stacktrace:

java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 5 lookup request timedout after ms 30000
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
	at java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1284)
	at java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1270)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at org.apache.pulsar.client.impl.ProducerImpl.lambda$connectionOpened$14(ProducerImpl.java:1205)
	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
	at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at org.apache.pulsar.client.impl.ClientCnx.checkRequestTimeout(ClientCnx.java:1026)
	at org.apache.pulsar.client.impl.ClientCnx.lambda$channelActive$0(ClientCnx.java:187)
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:176)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 5 lookup request timedout after ms 30000
	at org.apache.pulsar.client.impl.ClientCnx.checkRequestTimeout(ClientCnx.java:1025)
	... 10 more

This problem does not happen when running the broker in standalone mode.

This issue is present in master and in the v2.5.1-candidate-2 tag. In an identical setup running 2.5.0, this does not happen.

To Reproduce

  1. Build the Docker images from master:

     git checkout master
     mvn install -DskipTests
     cd docker
     ./build.sh

  2. Run the image in Kubernetes on minikube.

  3. From inside the broker container, call the health endpoint:

     curl localhost:8080/admin/v2/brokers/health
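
For repeated checks, the curl call in the last step can be wrapped in a small script. The following is only a sketch using the Python standard library. The helper name `probe_health` and the 35 second client timeout are assumptions; the timeout is chosen to exceed the 30000 ms lookup timeout in the error message, so that the server's error response is captured rather than cut off.

```python
import urllib.error
import urllib.request

# Endpoint taken from the reproduction steps; adjust host and port as needed.
HEALTH_URL = "http://localhost:8080/admin/v2/brokers/health"


def probe_health(url: str = HEALTH_URL, timeout: float = 35.0):
    """Return (status, body). status is None if no HTTP response arrived.

    Hypothetical helper, not part of Pulsar.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code and a body
        # (for example the "unexpected error" page shown above).
        return err.code, err.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, or client-side timeout.
        return None, str(err)


# Example (run inside the broker container):
#   status, body = probe_health()
#   print(status, body[:200])
```

A status of 200 matches the expected behavior. Any other status, or `None`, indicates the failure described in this report.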

Expected behavior
Return HTTP 200.

Additional context

Every time the endpoint is called, it leaves a hanging producer on the topic:

bin/pulsar-admin topics stats persistent://pulsar/pulsar/10.32.0.6:8080/healthcheck
{
  "msgRateIn" : 0.0,
  "msgThroughputIn" : 0.0,
  "msgRateOut" : 0.0,
  "msgThroughputOut" : 0.0,
  "averageMsgSize" : 0.0,
  "storageSize" : 0,
  "backlogSize" : 0,
  "publishers" : [ {
    "msgRateIn" : 0.0,
    "msgThroughputIn" : 0.0,
    "averageMsgSize" : 0.0,
    "producerId" : 0,
    "metadata" : { },
    "producerName" : "pulsar-0-2",
    "connectedSince" : "2020-04-10T11:20:20.001Z",
    "clientVersion" : "2.5.1",
    "address" : "/10.32.0.6:56852"
  }, {
    "msgRateIn" : 0.0,
    "msgThroughputIn" : 0.0,
    "averageMsgSize" : 0.0,
    "producerId" : 2,
    "metadata" : { },
    "producerName" : "pulsar-0-4",
    "connectedSince" : "2020-04-10T11:35:42.63Z",
    "clientVersion" : "2.5.1",
    "address" : "/10.32.0.6:56852"
  }, {
    "msgRateIn" : 0.0,
    "msgThroughputIn" : 0.0,
    "averageMsgSize" : 0.0,
    "producerId" : 1,
    "metadata" : { },
    "producerName" : "pulsar-0-3",
    "connectedSince" : "2020-04-10T11:34:27.901Z",
    "clientVersion" : "2.5.1",
    "address" : "/10.32.0.6:56852"
  } ],
  "subscriptions" : { },
  "replication" : { },
  "deduplicationStatus" : "Disabled",
  "bytesInCounter" : 0,
  "msgInCounter" : 0
}
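
To track how many producers accumulate, the stats JSON can be summarized with a few lines of Python. This is a minimal sketch: `summarize_publishers` is a hypothetical helper, and `SAMPLE_STATS` is a trimmed copy of the output above that keeps only the fields the sketch reads.

```python
import json
from collections import Counter

# Trimmed copy of the stats output above: only the fields this sketch reads.
SAMPLE_STATS = """
{
  "publishers": [
    {"producerId": 0, "producerName": "pulsar-0-2", "address": "/10.32.0.6:56852"},
    {"producerId": 2, "producerName": "pulsar-0-4", "address": "/10.32.0.6:56852"},
    {"producerId": 1, "producerName": "pulsar-0-3", "address": "/10.32.0.6:56852"}
  ]
}
"""


def summarize_publishers(stats_json: str) -> dict:
    """Count publishers in a `topics stats` JSON document (hypothetical helper)."""
    stats = json.loads(stats_json)
    publishers = stats.get("publishers", [])
    return {
        "count": len(publishers),
        "names": sorted(p.get("producerName", "") for p in publishers),
        # Group by client address to see whether one connection owns them all.
        "by_address": dict(Counter(p.get("address", "unknown") for p in publishers)),
    }


summary = summarize_publishers(SAMPLE_STATS)
print(summary["count"], summary["by_address"])
# prints: 3 {'/10.32.0.6:56852': 3}
```

Running this against the stats before and after each health check call would show whether the publisher count grows by one per call, which is what the output above suggests.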

Labels: type/bug
