
Conversation

@bpot (Owner) commented Jun 4, 2014

I added some debug logging to this fork, https://github.com/sclasen/poseidon, which also contains logic to actually read the ProduceResponse and determine whether the produce succeeded. I am sending to a topic with 32 partitions and a replication factor of 3.

poseidon logs:
KAFKADEBUG LEADER FOR 23 is 3867 
KAFKADEBUG api.3865.1 sending messages for broker 3867
KAFKADEBUG PRODUCE CORRELATION 1 from CLIENT api.3865.1 TO <ip:port of 3867>
kafka logs:
Produce request with correlation id 1 from client api.3865.1 
on partition [<topic>,23] failed 
due to Leader not local for partition [<topic>,23] on broker 3867 (kafka.server.KafkaApis)

So poseidon is somehow getting confused about which broker is the leader for the partition.

@jorgeortiz85

I also ran into this issue.

The bug is at https://github.com/bpot/poseidon/blob/master/lib/poseidon/message_conductor.rb#L34, which assumes that partitions come over the wire in partition_id order. In fact this is not the case.

E.g.,

[58] pry(#<Poseidon::SyncProducer>):3> @cluster_metadata.topic_metadata["test.jorge"].partitions
=> [#<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=5,
  leader=3,
  replicas=[3, 2, 1],
  isr=[3, 2, 1]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=7,
  leader=2,
  replicas=[2, 3, 1],
  isr=[2, 3, 1]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=3,
  leader=1,
  replicas=[1, 3, 2],
  isr=[1, 3, 2]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=4,
  leader=2,
  replicas=[2, 1, 3],
  isr=[2, 1, 3]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=1,
  leader=2,
  replicas=[2, 3, 1],
  isr=[2, 3, 1]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=6,
  leader=1,
  replicas=[1, 2, 3],
  isr=[1, 2, 3]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=2,
  leader=3,
  replicas=[3, 1, 2],
  isr=[3, 1, 2]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=0,
  leader=1,
  replicas=[1, 2, 3],
  isr=[1, 2, 3]>]

Notice the id fields are [5,7,3,4,1,6,2,0].

PartitionMetadata should be kept in a Hash by id, rather than an array that assumes id order.
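A minimal sketch of that approach, assuming a topic_metadata object shaped like the struct dump above (the variable names below are illustrative, not Poseidon's actual internals):

# Index the partition metadata by id instead of relying on array order.
partitions_by_id = {}
topic_metadata.partitions.each do |pm|
  partitions_by_id[pm.id] = pm
end

# Finding the leader for partition 5 is then an explicit keyed lookup,
# not an index into a possibly shuffled array.
leader_broker_id = partitions_by_id.fetch(5).leader   # => 3 in the dump above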

@bpot (Owner) commented Jun 4, 2014

Nice catch, guys! Working on a fix.

When finding the leader for a partition, we were using the partition_id as the index into the metadata array instead of finding the entry in the array with the matching id.
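A simplified illustration of the difference described above (not the exact message_conductor.rb code):

# Partitions come back in arbitrary order, e.g. ids [5, 7, 3, 4, 1, 6, 2, 0].
partitions = @cluster_metadata.topic_metadata["test.jorge"].partitions

# Buggy: treats the array position as the partition id; in the dump above,
# partitions[5] is the entry with id=6 (leader 1), not partition 5 (leader 3).
leader = partitions[5].leader                          # => 1, the wrong broker

# Fixed: find the entry whose id actually matches.
leader = partitions.find { |pm| pm.id == 5 }.leader    # => 3, the real leader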
@coveralls

Coverage increased (+0.12%) when pulling 02a8613 on bp/wrong_broker into b4f1a69 on master.

@coveralls

Coverage increased (+15.08%) when pulling 8d93ea5 on bp/wrong_broker into b4f1a69 on master.

bpot added a commit that referenced this pull request on Jun 4, 2014:
poseidon frequently sends to wrong broker

@bpot merged commit a6ad44e into master on Jun 4, 2014
@bpot deleted the bp/wrong_broker branch on June 4, 2014 at 02:16
@bpot (Owner) commented Jun 4, 2014

@sclasen @jorgeortiz85 this should be fixed in master now. I will try and get a new release out soon.

@sclasen (Contributor, Author) commented Jun 4, 2014

@bpot @jorgeortiz85 FWIW, I don't think poseidon is actually reading the produce response at any point (as far as I can tell), so how would one know if there were other issues with sending messages?

It needs something along the lines of sclasen@cd891dd, so that when you send with acks != 0 you can actually be sure the message was delivered.
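A rough sketch of what that checking could look like, assuming a decoded produce response with per-topic, per-partition error codes (the field names here are assumptions for illustration, not necessarily Poseidon's actual response structs):

# After sending with required_acks != 0, walk the response and fail loudly
# (or mark for retry) on any non-zero partition error code.
def ensure_produce_succeeded!(response)
  response.topic_responses.each do |topic_resp|
    topic_resp.partition_responses.each do |part_resp|
      next if part_resp.error == 0   # 0 means NoError in the Kafka protocol
      raise "Produce to #{topic_resp.topic}/partition #{part_resp.partition} " \
            "failed with error code #{part_resp.error}"
    end
  end
end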
