
Difference between isr info from librdkafka vs kafka-list-topic.sh #147

Closed
winbatch opened this issue Sep 10, 2014 · 15 comments

@winbatch

Hi, we've been noticing that the two tools give different ISR results. The difference always seems to be that rdkafka shows fewer in-sync replicas than the Kafka-provided tool (never the reverse). One difference is that rdkafka asks the brokers for this information, while the Kafka-provided tool reads it from ZooKeeper. Could this indicate some sort of mismatch between the two?

kafka-list-topic.sh --zookeeper ${ZOOKEEPER_CLUSTER}:${ZOOKEEPER_PORT}

topic: ESM_METRICS_NETWORK partition: 0 leader: 1052189105 replicas: 1052189116,1052189105,1052189106 isr: 1052189105,1052189106,1052189116

topic: ESM_METRICS_NETWORK partition: 1 leader: 1052189106 replicas: 1052189117,1052189106,1052189116 isr: 1052189106,1052189117,1052189116

topic: ESM_METRICS_NETWORK partition: 2 leader: 1052189117 replicas: 1052189104,1052189116,1052189117 isr: 1052189117,1052189104,1052189116

topic: ESM_METRICS_NETWORK partition: 3 leader: 1052189105 replicas: 1052189105,1052189117,1052189104 isr: 1052189105,1052189104,1052189117

topic: ESM_METRICS_NETWORK partition: 4 leader: 1052189106 replicas: 1052189106,1052189104,1052189105 isr: 1052189106,1052189104,1052189105
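For context on where the ZooKeeper-side numbers come from: kafka-list-topic.sh should be reading each partition's leader and ISR from the partition state znode, /brokers/topics/<topic>/partitions/<partition>/state, which holds a small JSON document (it can be inspected with zkCli.sh `get` on that path). A minimal sketch of pulling the ISR out of that payload, assuming the version-1 JSON layout with `leader` and `isr` keys; the function name is hypothetical and the sample payload, including its epoch values, is made up to resemble partition 0 above:

```python
import json


def leader_and_isr(state_json: bytes) -> tuple[int, list[int]]:
    """Extract (leader, isr) from a partition-state znode payload.

    Assumes the layout stored at
    /brokers/topics/<topic>/partitions/<partition>/state, e.g.
    {"controller_epoch":1,"leader":1,"version":1,"leader_epoch":0,"isr":[1,2]}
    """
    state = json.loads(state_json.decode("utf-8"))
    return state["leader"], list(state["isr"])


# Hypothetical payload shaped like partition 0 in the listing above;
# the controller_epoch and leader_epoch values are invented.
sample = (b'{"controller_epoch":1,"leader":1052189105,"version":1,'
          b'"leader_epoch":0,"isr":[1052189105,1052189106,1052189116]}')
leader, isr = leader_and_isr(sample)
print(leader, isr)
```

Fetching the znode itself (with a ZooKeeper client) is left out so the sketch stays self-contained.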

Magnus's tool:

rdkafka_example -L -b kafkalog1.messaging.nimbus.masked.com:5757 -t ESM_METRICS_NETWORK

Metadata for ESM_METRICS_NETWORK (from broker -1: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap):

5 brokers:

broker 1052189104 at d146537-023.dc.masked.com:5757

broker 1052189117 at d146537-025.dc.masked.com:5757

broker 1052189105 at d146537-021.dc.masked.com:5757

broker 1052189116 at d146537-024.dc.masked.com:5757

broker 1052189106 at d146537-022.dc.masked.com:5757

1 topics:

topic "ESM_METRICS_NETWORK" with 5 partitions:

partition 0, leader 1052189105, replicas: 1052189116,1052189105,1052189106, isrs: 1052189105,1052189106

partition 1, leader 1052189106, replicas: 1052189117,1052189106,1052189116, isrs: 1052189106

partition 2, leader 1052189117, replicas: 1052189104,1052189116,1052189117, isrs: 1052189117,1052189104

partition 3, leader 1052189105, replicas: 1052189105,1052189117,1052189104, isrs: 1052189105,1052189117

partition 4, leader 1052189106, replicas: 1052189106,1052189104,1052189105, isrs: 1052189106,1052189105
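To see exactly which replicas disappear, the two listings above can be diffed per partition. A minimal sketch, assuming the line formats shown in the kafka-list-topic.sh and rdkafka_example output stay as they are here; the regexes and function names are hypothetical helpers, not part of either tool:

```python
import re

# Line formats of the two listings above.
ZK_LINE = re.compile(
    r"partition: (\d+) leader: (\d+) replicas: ([\d,]+) isr: ([\d,]+)")
RDKAFKA_LINE = re.compile(
    r"partition (\d+), leader (\d+), replicas: ([\d,]+), isrs: ([\d,]+)")


def parse_isrs(text: str, pattern: re.Pattern) -> dict[int, set[int]]:
    """Map partition id -> set of in-sync replica broker ids."""
    return {int(m.group(1)): {int(b) for b in m.group(4).split(",")}
            for m in pattern.finditer(text)}


def missing_from_brokers(zk: dict[int, set[int]],
                         brokers: dict[int, set[int]]) -> dict[int, set[int]]:
    """Replicas ZooKeeper lists as in sync but the broker metadata omits."""
    diff = {}
    for p in sorted(zk):
        missing = zk[p] - brokers.get(p, set())
        if missing:
            diff[p] = missing
    return diff


zk_text = """
topic: ESM_METRICS_NETWORK partition: 0 leader: 1052189105 replicas: 1052189116,1052189105,1052189106 isr: 1052189105,1052189106,1052189116
topic: ESM_METRICS_NETWORK partition: 1 leader: 1052189106 replicas: 1052189117,1052189106,1052189116 isr: 1052189106,1052189117,1052189116
topic: ESM_METRICS_NETWORK partition: 2 leader: 1052189117 replicas: 1052189104,1052189116,1052189117 isr: 1052189117,1052189104,1052189116
topic: ESM_METRICS_NETWORK partition: 3 leader: 1052189105 replicas: 1052189105,1052189117,1052189104 isr: 1052189105,1052189104,1052189117
topic: ESM_METRICS_NETWORK partition: 4 leader: 1052189106 replicas: 1052189106,1052189104,1052189105 isr: 1052189106,1052189104,1052189105
"""
rdkafka_text = """
partition 0, leader 1052189105, replicas: 1052189116,1052189105,1052189106, isrs: 1052189105,1052189106
partition 1, leader 1052189106, replicas: 1052189117,1052189106,1052189116, isrs: 1052189106
partition 2, leader 1052189117, replicas: 1052189104,1052189116,1052189117, isrs: 1052189117,1052189104
partition 3, leader 1052189105, replicas: 1052189105,1052189117,1052189104, isrs: 1052189105,1052189117
partition 4, leader 1052189106, replicas: 1052189106,1052189104,1052189105, isrs: 1052189106,1052189105
"""

zk = parse_isrs(zk_text, ZK_LINE)
brokers = parse_isrs(rdkafka_text, RDKAFKA_LINE)
for partition, missing in missing_from_brokers(zk, brokers).items():
    print(partition, sorted(missing))
```

On this data every broker-reported ISR is a subset of the ZooKeeper one (matching the "never the reverse" observation), and the partition leader is always still present; only followers go missing.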
@edenhill
Contributor

Could you try my tool with -d metadata as well, just to make sure it's not a printout problem?

@winbatch
Author

./rdkafka_example -L -b kafkalog1.messaging.nimbus.masked.com:5757  -t ESM_METRICS_NETWORK -d metadata

1410364708.554 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap: Request metadata for ESM_METRICS_NETWORK: application requested

1410364708.554 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap: Request metadata: scheduled: not in broker thread

1410364708.588 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap: Request metadata for all topics: connected

1410364708.588 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap: Requesting metadata for all topics

1410364708.588 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap: Request metadata for ESM_METRICS_NETWORK: application requested

1410364708.588 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap: Requesting metadata for known topics

1410364709.529 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap: ===== Received metadata from kafkalog1.messaging.nimbus.masked.com:5757/bootstrap =====

1410364709.529 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap: 5 brokers, 1 topics

1410364709.529 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Broker #0/5: d146537-023.dc.masked.com:5757 NodeId 1052189104

1410364709.531 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Broker #1/5: d146537-025.dc.masked.com:5757 NodeId 1052189117

1410364709.531 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Broker #2/5: d146537-021.dc.masked.com:5757 NodeId 1052189105

1410364709.531 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Broker #3/5: d146537-024.dc.masked.com:5757 NodeId 1052189116

1410364709.531 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Broker #4/5: d146537-022.dc.masked.com:5757 NodeId 1052189106

1410364709.533 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Topic #0/1: ESM_METRICS_NETWORK with 5 partitions

1410364709.535 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Topic ESM_METRICS_NETWORK partition 0 Leader 1052189105

1410364709.535 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Topic ESM_METRICS_NETWORK partition 1 Leader 1052189106

1410364709.535 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Topic ESM_METRICS_NETWORK partition 2 Leader 1052189117

1410364709.535 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Topic ESM_METRICS_NETWORK partition 3 Leader 1052189105

1410364709.535 RDKAFKA-7-METADATA: rdkafka#producer-0: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap:   Topic ESM_METRICS_NETWORK partition 4 Leader 1052189106

Metadata for ESM_METRICS_NETWORK (from broker -1: kafkalog1.messaging.nimbus.masked.com:5757/bootstrap):

5 brokers:

  broker 1052189104 at d146537-023.dc.masked.com:5757

  broker 1052189117 at d146537-025.dc.masked.com:5757

  broker 1052189105 at d146537-021.dc.masked.com:5757

  broker 1052189116 at d146537-024.dc.masked.com:5757

  broker 1052189106 at d146537-022.dc.masked.com:5757

1 topics:

  topic "ESM_METRICS_NETWORK" with 5 partitions:

    partition 0, leader 1052189105, replicas: 1052189116,1052189105,1052189106, isrs: 1052189105,1052189106

    partition 1, leader 1052189106, replicas: 1052189117,1052189106,1052189116, isrs: 1052189106

    partition 2, leader 1052189117, replicas: 1052189104,1052189116,1052189117, isrs: 1052189117,1052189104

    partition 3, leader 1052189105, replicas: 1052189105,1052189117,1052189104, isrs: 1052189105,1052189117

    partition 4, leader 1052189106, replicas: 1052189106,1052189104,1052189105, isrs: 1052189106,1052189105

@edenhill
Contributor

Oh, the debug output doesn't include ISRs, never mind.

So either the cluster <-> ZK state is desynced, or there is a handling error in rdkafka.
It would be good to verify with another tool that queries metadata directly from the brokers; I don't know which one, though.
An alternative would be to tcpdump the packet and decode the broker reply by hand.
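Decoding the reply by hand mostly means walking the MetadataResponse structure field by field. A minimal sketch, assuming the v0 wire format from the Kafka protocol guide (big-endian integers, int16-length-prefixed strings, int32-count-prefixed arrays) and a buffer that starts at the Kafka frame's 4-byte size field, i.e. with the Ethernet/IP/TCP headers of the capture already stripped and the reply reassembled if it spans several segments. The class and function names are hypothetical, and the sample frame is made up to mirror partition 1 as rdkafka reported it:

```python
import struct


class Reader:
    """Big-endian cursor over a Kafka protocol buffer."""

    def __init__(self, buf: bytes):
        self.buf, self.pos = buf, 0

    def _unpack(self, fmt: str) -> int:
        (value,) = struct.unpack_from(fmt, self.buf, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def int16(self) -> int:
        return self._unpack(">h")

    def int32(self) -> int:
        return self._unpack(">i")

    def string(self) -> str:
        n = self.int16()  # length prefix; -1 (null) is not handled here
        s = self.buf[self.pos:self.pos + n].decode("utf-8")
        self.pos += n
        return s

    def array(self, read_item) -> list:
        return [read_item() for _ in range(self.int32())]


def decode_metadata_v0(frame: bytes) -> dict:
    """Decode a v0 MetadataResponse frame, starting at its int32 size field."""
    r = Reader(frame)
    r.int32()  # frame size (not validated in this sketch)
    correlation_id = r.int32()

    def broker():
        node_id, host, port = r.int32(), r.string(), r.int32()
        return {"node_id": node_id, "host": host, "port": port}

    def partition():
        err, pid, leader = r.int16(), r.int32(), r.int32()
        replicas = r.array(r.int32)
        isrs = r.array(r.int32)
        return {"id": pid, "error": err, "leader": leader,
                "replicas": replicas, "isrs": isrs}

    def topic():
        err, name = r.int16(), r.string()
        return {"name": name, "error": err, "partitions": r.array(partition)}

    brokers = r.array(broker)
    topics = r.array(topic)
    return {"correlation_id": correlation_id, "brokers": brokers, "topics": topics}


def _s(text: str) -> bytes:
    """Encode a protocol string: int16 length followed by UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">h", len(raw)) + raw


# Hypothetical frame mirroring partition 1 of ESM_METRICS_NETWORK above.
body = (struct.pack(">i", 42)                        # correlation id
        + struct.pack(">i", 1)                       # 1 broker
        + struct.pack(">i", 1052189106) + _s("d146537-022.dc.masked.com")
        + struct.pack(">i", 5757)
        + struct.pack(">i", 1)                       # 1 topic
        + struct.pack(">h", 0) + _s("ESM_METRICS_NETWORK")
        + struct.pack(">i", 1)                       # 1 partition
        + struct.pack(">hii", 0, 1, 1052189106)      # error, id, leader
        + struct.pack(">iiii", 3, 1052189117, 1052189106, 1052189116)  # replicas
        + struct.pack(">ii", 1, 1052189106))         # isrs
frame = struct.pack(">i", len(body)) + body

meta = decode_metadata_v0(frame)
print(meta["topics"][0]["partitions"][0]["isrs"])
```

With a real capture, the payload bytes would go through bytes.fromhex() and then decode_metadata_v0(). If the decoded isrs match what rdkafka prints, the shorter ISR list is coming from the broker itself rather than from rdkafka's handling.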

@winbatch
Author

Do you know if the broker asks ZK for the info or has its own cache?

How do you resync? Bounce the brokers so they re-cache from ZK?

@edenhill
Contributor

1: Don't know, but probably cached.
2: Don't know.

@winbatch
Copy link
Author

Oh no! I've stumped the master! Now what am I to do?? :(

@edenhill
Copy link
Contributor

Do the tcpdump dance!

tshark -x -i eth0 -f 'tcp port 9092'

@edenhill
Copy link
Contributor

This was reported by another user (who provided a tcpdump!).
The broker is indeed reporting a different ISR set than ZK does, so this is a problem with broker <-> ZK sync.

@winbatch
Copy link
Author

Was that user able to figure out how to resync?


@edenhill
Copy link
Contributor

Not yet; I'm working on root-causing it.
It seems we're not the first ones, either:
http://grokbase.com/t/kafka/users/13a3vxdte9/isr-differs-between-kafka-metadata-and-zookeeper

@ottomata
Contributor

I noticed that in my case, only 1 of the 4 brokers was ever missing from the ISRs reported by the Kafka brokers (via librdkafka). That JIRA indicated that a preferred-replica-election should fix the problem. I did this:

Controlled shutdown of offending broker 21, then an actual shutdown of broker 21. Once this was done, librdkafka metadata showed the correct ISRs, since this offending broker really was not in any ISR. I then restarted broker 21 and let its replicas catch back up. Once it caught up, ZooKeeper reported that all ISRs were in sync. I then checked librdkafka's metadata, and broker 21 was not listed in any ISR. I then ran a preferred-replica-election, and broker 21 was promoted to leader of some partitions. After that, librdkafka showed broker 21 in the ISR only for the partitions where it was also the leader: for any partition with a replica on broker 21, broker 21 does not show up in the ISR unless it is the leader.

$ kafkacat -L -b analytics1022.eqiad.wmnet  -t webrequest_upload
Metadata for webrequest_upload (from broker -1: analytics1022.eqiad.wmnet:9092/bootstrap):
 4 brokers:
  broker 12 at analytics1012.eqiad.wmnet:9092
  broker 21 at analytics1021.eqiad.wmnet:9092
  broker 22 at analytics1022.eqiad.wmnet:9092
  broker 18 at analytics1018.eqiad.wmnet:9092
 1 topics:
  topic "webrequest_upload" with 12 partitions:
    partition 11, leader 12, replicas: 12,21,22, isrs: 12,22
    partition 5, leader 21, replicas: 21,22,12, isrs: 22,12,21
    partition 10, leader 22, replicas: 22,18,21, isrs: 18,22
    partition 7, leader 12, replicas: 12,18,21, isrs: 12,18
    partition 8, leader 18, replicas: 18,22,12, isrs: 12,18,22
    partition 3, leader 12, replicas: 12,22,18, isrs: 12,18,22
    partition 4, leader 18, replicas: 18,21,22, isrs: 18,22
    partition 1, leader 21, replicas: 21,18,22, isrs: 18,22,21
    partition 6, leader 22, replicas: 22,12,18, isrs: 12,18,22
    partition 2, leader 22, replicas: 22,21,12, isrs: 12,22
    partition 9, leader 21, replicas: 21,12,18, isrs: 12,18,21
    partition 0, leader 18, replicas: 18,12,21, isrs: 12,18
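The "in the ISR only when it is the leader" pattern can be checked mechanically against the kafkacat listing above. A minimal sketch, assuming kafkacat's partition line format as printed here; the regex and function names are hypothetical helpers:

```python
import re

# Partition line format of the kafkacat listing above.
PARTITION_LINE = re.compile(
    r"partition (\d+), leader (\d+), replicas: ([\d,]+), isrs: ([\d,]+)")


def ids(csv: str) -> list[int]:
    """Parse a comma-separated list of broker ids."""
    return [int(x) for x in csv.split(",")]


def isr_membership(listing: str, broker: int) -> dict[int, tuple[bool, bool]]:
    """For each partition replicated on `broker`: (is_leader, in_isr)."""
    result = {}
    for m in PARTITION_LINE.finditer(listing):
        pid, leader = int(m.group(1)), int(m.group(2))
        replicas, isrs = ids(m.group(3)), ids(m.group(4))
        if broker in replicas:
            result[pid] = (leader == broker, broker in isrs)
    return result


listing = """
    partition 11, leader 12, replicas: 12,21,22, isrs: 12,22
    partition 5, leader 21, replicas: 21,22,12, isrs: 22,12,21
    partition 10, leader 22, replicas: 22,18,21, isrs: 18,22
    partition 7, leader 12, replicas: 12,18,21, isrs: 12,18
    partition 8, leader 18, replicas: 18,22,12, isrs: 12,18,22
    partition 3, leader 12, replicas: 12,22,18, isrs: 12,18,22
    partition 4, leader 18, replicas: 18,21,22, isrs: 18,22
    partition 1, leader 21, replicas: 21,18,22, isrs: 18,22,21
    partition 6, leader 22, replicas: 22,12,18, isrs: 12,18,22
    partition 2, leader 22, replicas: 22,21,12, isrs: 12,22
    partition 9, leader 21, replicas: 21,12,18, isrs: 12,18,21
    partition 0, leader 18, replicas: 18,12,21, isrs: 12,18
"""

membership = isr_membership(listing, 21)
in_isr_only_as_leader = all(is_leader == in_isr
                            for is_leader, in_isr in membership.values())
print(sorted(membership), in_isr_only_as_leader)
# → [0, 1, 2, 4, 5, 7, 9, 10, 11] True
```

Broker 21 replicates 9 of the 12 partitions and appears in the broker-reported ISR for exactly the three it leads (1, 5 and 9).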

@ottomata
Copy link
Contributor

@edenhill or @winbatch, want to comment on the most recent post here?
https://issues.apache.org/jira/browse/KAFKA-1367

Specifically:
"Currently, the ISR part in a metadata response is not really used by the clients. Do you have a usage for this?"

@edenhill
Copy link
Contributor

Done!
