Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kafka taking 10's of seconds to initialize #1597

Closed
jcalcote opened this issue Dec 19, 2017 · 5 comments
Closed

kafka taking 10's of seconds to initialize #1597

jcalcote opened this issue Dec 19, 2017 · 5 comments

Comments

@jcalcote
Copy link

jcalcote commented Dec 19, 2017

Description

I have a simple shared library that requires kafka to be up and running before a consumer may make requests to that library. The broker is running on the local host. To be sure the kafka bus is up on the local host, we run rd_kafka_metadata first with our localhost broker bootstrap URL. Getting the broker metadata can take many seconds - 10-15 - just to establish the connection with the broker (before it even attempts to obtain the metadata - I stepped into rd_kafka_metadata to see what it was doing and it returns RD_KAFKA_RESP_ERR__TRANSPORT many times before finally getting past that point and retrieving metadata).

  • Why is this?
  • Is there anything we can do to speed up the process?
  • Is there a client or broker setting we can use to allow very quick connections?

Since it's localhost-only infrastructure, it would seem possible to establish such a connection in less than a second normally.

I'm using librdkafka 0.11.1 on CentOS 7 against a 2.10-0.9.0.1 kafka broker all on one host.

@edenhill
Copy link
Contributor

edenhill commented Jan 2, 2018

The initial connection should be instantaneous, I suggest you enable debugging to figure out what is going on by setting the debug config property to broker,protocol

@jcalcote
Copy link
Author

jcalcote commented Jan 4, 2018

Hi Magnus,

I've done some debugging - first, as you suggested, by enabling the broker,protocol debug options. During the startup process, where I check to see if the broker is running, I see the following kafka log messages output:

$ (start test program here)
22964.173540: _seed_random - start
22964.173592: _seed_random - end
22964.173695: _init_entry - start
22964.173727: _init_entry - after rcu_init()
22964.173742: _init_entry - after rcu_register_thread()
Kafka [rdkafka#consumer-1][BROKER-7]: [thrd:app]: localhost:9092/bootstrap: Added new broker with NodeId -1
22964.178179: _init_entry - after kafka_util_create_broker_check_handle
Kafka [rdkafka#consumer-1][BRKMAIN-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Enter main broker thread
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: broker in state INIT connecting
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connecting to ipv4#127.0.0.1:9092 (plaintext) with socket 44
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Broker changed state INIT -> CONNECT
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connected to ipv4#127.0.0.1:9092
Kafka [rdkafka#consumer-1][CONNECTED-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connected (#1)
Kafka [rdkafka#consumer-1][FEATURE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Updated enabled protocol features +ApiVersion to ApiVersion
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Broker changed state CONNECT -> APIVERSION_QUERY
Kafka [rdkafka#consumer-1][SEND-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Sent ApiVersionRequest (v0, 25 bytes @ 0, CorrId 1)
Kafka [rdkafka#consumer-1][BRKMAIN-7]: [thrd::0/internal]: :0/internal: Enter main broker thread
Kafka [rdkafka#consumer-1][STATE-7]: [thrd::0/internal]: :0/internal: Broker changed state INIT -> UP
22964.278172: _init_entry - after kafka_util_is_broker_running
22964.379600: _init_entry - after kafka_util_is_broker_running
... 10 seconds later - the log timestamps are in monotonic clock seconds ...
22974.092666: _init_entry - after kafka_util_is_broker_running
22974.193281: _init_entry - after kafka_util_is_broker_running
Kafka [rdkafka#consumer-1][REQERR-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: ApiVersionRequest failed: Local: Timed out: explicit actions 0x0
Kafka [rdkafka#consumer-1][BROKERFAIL-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: failed: err: Local: Not implemented: (errno: Operation now in progress)
Kafka [rdkafka#consumer-1][FAIL-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: ApiVersionRequest failed: Local: Timed out: probably due to old broker version
Kafka [rdkafka#consumer-1][FEATURE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Updated enabled protocol features -ApiVersion to 
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Broker changed state APIVERSION_QUERY -> DOWN
Kafka [rdkafka#consumer-1][REQTMOUT-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Timed out 1+0+0 requests
22974.293526: _init_entry - after kafka_util_is_broker_running
22974.394082: _init_entry - after kafka_util_is_broker_running
22974.495320: _init_entry - after kafka_util_is_broker_running
22974.597638: _init_entry - after kafka_util_is_broker_running
22974.701137: _init_entry - after kafka_util_is_broker_running
22974.804368: _init_entry - after kafka_util_is_broker_running
22974.905314: _init_entry - after kafka_util_is_broker_running
22975.007678: _init_entry - after kafka_util_is_broker_running
22975.108951: _init_entry - after kafka_util_is_broker_running
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: broker in state DOWN connecting
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connecting to ipv6#[::1]:9092 (plaintext) with socket 44
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Broker changed state DOWN -> CONNECT
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connected to ipv6#[::1]:9092
Kafka [rdkafka#consumer-1][CONNECTED-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connected (#2)
Kafka [rdkafka#consumer-1][APIVERSION-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Using (configuration fallback) 0.9.0 protocol features
Kafka [rdkafka#consumer-1][FEATURE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Updated enabled protocol features to BrokerBalancedConsumer,ThrottleTime,Sasl,BrokerGroupCoordinator,LZ4
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Broker changed state CONNECT -> UP
Kafka [rdkafka#consumer-1][SEND-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Sent MetadataRequest (v0, 25 bytes @ 0, CorrId 2)
Kafka [rdkafka#consumer-1][SEND-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Sent MetadataRequest (v0, 25 bytes @ 0, CorrId 3)
Kafka [rdkafka#consumer-1][RECV-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Received MetadataResponse (v0, 1452 bytes, CorrId 2, rtt 3.95ms)
Kafka [rdkafka#consumer-1][BROKER-7]: [thrd:main]: jmc-c7dev2:9092/0: Added new broker with NodeId 0
Kafka [rdkafka#consumer-1][BRKMAIN-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: Enter main broker thread
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: broker in state INIT connecting
Kafka [rdkafka#consumer-1][RECV-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Received MetadataResponse (v0, 1452 bytes, CorrId 3, rtt 5.64ms)
22975.208616: _init_entry - after kafka_util_is_broker_running
Kafka [rdkafka#consumer-1][DESTROY-7]: [thrd:app]: Terminating instance
Kafka [rdkafka#consumer-1][DESTROY-7]: [thrd:main]: Destroy internal
Kafka [rdkafka#consumer-1][DESTROY-7]: [thrd:main]: Removing all topics
Kafka [rdkafka#consumer-1][TERM-7]: [thrd::0/internal]: :0/internal: Received TERMINATE op in state UP: Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: Connecting to ipv4#172.17.0.1:9092 (plaintext) with socket 47
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: Broker changed state INIT -> CONNECT
Kafka [rdkafka#consumer-1][TERMINATE-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: Handle is terminating: failed 0 request(s) in retry+outbuf
Kafka [rdkafka#consumer-1][BROKERFAIL-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: failed: err: Local: Broker handle destroyed: (errno: Operation now in progress)
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: Broker changed state CONNECT -> DOWN
Kafka [rdkafka#consumer-1][TERMINATE-7]: [thrd::0/internal]: :0/internal: Handle is terminating: failed 0 request(s) in retry+outbuf
Kafka [rdkafka#consumer-1][BROKERFAIL-7]: [thrd::0/internal]: :0/internal: failed: err: Local: Broker handle destroyed: (errno: Success)
Kafka [rdkafka#consumer-1][STATE-7]: [thrd::0/internal]: :0/internal: Broker changed state UP -> DOWN
Kafka [rdkafka#consumer-1][TERMINATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Handle is terminating: failed 0 request(s) in retry+outbuf
Kafka [rdkafka#consumer-1][BROKERFAIL-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: failed: err: Local: Broker handle destroyed: (errno: Resource temporarily unavailable)
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Broker changed state UP -> DOWN
22975.311799: _init_entry - after kafka_util_destroy_broker_check_handle
22975.339612: _init_entry - after outer check while loop
22975.339632: _init_entry - after rcu_unregister_thread
$ (end of test program)

Not sure what this log output is telling me - I'd appreciate any more tips on how I can determine what's going on here...

Thanks!
John

@jcalcote
Copy link
Author

jcalcote commented Jan 5, 2018

Hi Magnus,

The APIVERSION_QUERY log message gave me a clue - brought to mind something you wrote in one of your articles about how the API version query functionality was broken below a certain version of the broker (I think). I tried my test program with version 2.12-0.11.0.0 and the 12-second delay disappeared.

Is this a bug in librdkafka related to how it talks to broker version 2.10-0.9.0.1 or is it just that that version of the broker can't handle a version query properly?

Here's the log output with broker version 2.12-0.11.0.0:

$ (start test program)
25073.172916: _seed_random - start
25073.173004: _seed_random - end
25073.173115: _init_entry - start
25073.173125: _init_entry - after rcu_init()
25073.173139: _init_entry - after rcu_register_thread()
Kafka [rdkafka#consumer-1][BROKER-7]: [thrd:app]: localhost:9092/bootstrap: Added new broker with NodeId -1
25073.179222: _init_entry - after kafka_util_create_broker_check_handle
Kafka [rdkafka#consumer-1][BRKMAIN-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Enter main broker thread
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: broker in state INIT connecting
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connecting to ipv4#127.0.0.1:9092 (plaintext) with socket 44
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Broker changed state INIT -> CONNECT
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connected to ipv4#127.0.0.1:9092
Kafka [rdkafka#consumer-1][CONNECTED-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connected (#1)
Kafka [rdkafka#consumer-1][FEATURE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Updated enabled protocol features +ApiVersion to ApiVersion
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Broker changed state CONNECT -> APIVERSION_QUERY
Kafka [rdkafka#consumer-1][SEND-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Sent ApiVersionRequest (v0, 25 bytes @ 0, CorrId 1)
Kafka [rdkafka#consumer-1][BRKMAIN-7]: [thrd::0/internal]: :0/internal: Enter main broker thread
Kafka [rdkafka#consumer-1][STATE-7]: [thrd::0/internal]: :0/internal: Broker changed state INIT -> UP
Kafka [rdkafka#consumer-1][RECV-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Received ApiVersionResponse (v0, 210 bytes, CorrId 1, rtt 3.22ms)
Kafka [rdkafka#consumer-1][FEATURE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Updated enabled protocol features to MsgVer1,ApiVersion,BrokerBalancedConsumer,ThrottleTime,Sasl,SaslHandshake,BrokerGroupCoordinator,LZ4,OffsetTime,MsgVer2
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Broker changed state APIVERSION_QUERY -> UP
Kafka [rdkafka#consumer-1][SEND-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Sent MetadataRequest (v2, 25 bytes @ 0, CorrId 2)
Kafka [rdkafka#consumer-1][SEND-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Sent MetadataRequest (v2, 25 bytes @ 0, CorrId 3)
Kafka [rdkafka#consumer-1][RECV-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Received MetadataResponse (v2, 58 bytes, CorrId 2, rtt 3.72ms)
Kafka [rdkafka#consumer-1][BROKER-7]: [thrd:main]: jmc-c7dev2:9092/0: Added new broker with NodeId 0
Kafka [rdkafka#consumer-1][CLUSTERID-7]: [thrd:main]: localhost:9092/bootstrap: ClusterId update "" -> "4L6dRvehQ-ygQTCpldjJKA"
Kafka [rdkafka#consumer-1][BRKMAIN-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: Enter main broker thread
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: broker in state INIT connecting
Kafka [rdkafka#consumer-1][RECV-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Received MetadataResponse (v2, 1485 bytes, CorrId 3, rtt 14.38ms)
25073.198186: _init_entry - after kafka_util_is_broker_running
Kafka [rdkafka#consumer-1][DESTROY-7]: [thrd:app]: Terminating instance
Kafka [rdkafka#consumer-1][DESTROY-7]: [thrd:main]: Destroy internal
Kafka [rdkafka#consumer-1][DESTROY-7]: [thrd:main]: Removing all topics
Kafka [rdkafka#consumer-1][TERM-7]: [thrd::0/internal]: :0/internal: Received TERMINATE op in state UP: 1 refcnts, 0 toppar(s), 0 fetch toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs
Kafka [rdkafka#consumer-1][CONNECT-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: Connecting to ipv6#[fe80::a4b6:4c6b:5a54:f6c4%enp0s3]:9092 (plaintext) with socket 47
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: Broker changed state INIT -> CONNECT
Kafka [rdkafka#consumer-1][TERMINATE-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: Handle is terminating: failed 0 request(s) in retry+outbuf
Kafka [rdkafka#consumer-1][BROKERFAIL-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: failed: err: Local: Broker handle destroyed: (errno: Operation now in progress)
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:jmc-c7dev2:9092/0]: jmc-c7dev2:9092/0: Broker changed state CONNECT -> DOWN
Kafka [rdkafka#consumer-1][TERMINATE-7]: [thrd::0/internal]: :0/internal: Handle is terminating: failed 0 request(s) in retry+outbuf
Kafka [rdkafka#consumer-1][BROKERFAIL-7]: [thrd::0/internal]: :0/internal: failed: err: Local: Broker handle destroyed: (errno: Success)
Kafka [rdkafka#consumer-1][STATE-7]: [thrd::0/internal]: :0/internal: Broker changed state UP -> DOWN
Kafka [rdkafka#consumer-1][TERMINATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Handle is terminating: failed 0 request(s) in retry+outbuf
Kafka [rdkafka#consumer-1][BROKERFAIL-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: failed: err: Local: Broker handle destroyed: (errno: Resource temporarily unavailable)
Kafka [rdkafka#consumer-1][STATE-7]: [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Broker changed state UP -> DOWN
25073.299866: _init_entry - after kafka_util_destroy_broker_check_handle
25073.302983: _init_entry - after outer check while loop
25073.325244: _init_entry - after rcu_unregister_thread
$ (end test program) 

We are updating our broker sometime in the near future, but this code has to work with the 0.9.0.1 broker in the meantime. Disable API version check, perhaps?

Thanks!
John

@jcalcote
Copy link
Author

jcalcote commented Jan 5, 2018

Hi Magnus,

Don't you just love it when your customer figures out his own problem? :)

I followed the instructions at: https://github.com/edenhill/librdkafka/wiki/Broker-version-compatibility and my 12 second delay went away.

You can close this one.

Thanks for the great feedback!
John

@edenhill
Copy link
Contributor

edenhill commented Jan 5, 2018

Glad to be of service ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants