
Conversation

@bpot (Owner) commented Jun 4, 2014

I added some debug logging to this fork, https://github.com/sclasen/poseidon, which also contains logic to actually read the ProduceResponse and determine whether the produce succeeded. I am sending to a topic with 32 partitions and a replication factor of 3.

poseidon logs:
KAFKADEBUG LEADER FOR 23 is 3867 
KAFKADEBUG api.3865.1 sending messages for broker 3867
KAFKADEBUG PRODUCE CORRELATION 1 from CLIENT api.3865.1 TO <ip:port of 3867>
kafka logs:
Produce request with correlation id 1 from client api.3865.1 
on partition [<topic>,23] failed 
due to Leader not local for partition [<topic>,23] on broker 3867 (kafka.server.KafkaApis)

So poseidon is somehow getting confused about which broker is the leader for the partition.

@jorgeortiz85

I also ran into this issue.

The bug is at https://github.com/bpot/poseidon/blob/master/lib/poseidon/message_conductor.rb#L34, which assumes that partitions come over the wire in partition_id order. In fact this is not the case.

E.g.,

[58] pry(#<Poseidon::SyncProducer>):3> @cluster_metadata.topic_metadata["test.jorge"].partitions
=> [#<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=5,
  leader=3,
  replicas=[3, 2, 1],
  isr=[3, 2, 1]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=7,
  leader=2,
  replicas=[2, 3, 1],
  isr=[2, 3, 1]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=3,
  leader=1,
  replicas=[1, 3, 2],
  isr=[1, 3, 2]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=4,
  leader=2,
  replicas=[2, 1, 3],
  isr=[2, 1, 3]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=1,
  leader=2,
  replicas=[2, 3, 1],
  isr=[2, 3, 1]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=6,
  leader=1,
  replicas=[1, 2, 3],
  isr=[1, 2, 3]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=2,
  leader=3,
  replicas=[3, 1, 2],
  isr=[3, 1, 2]>,
 #<struct Poseidon::Protocol::PartitionMetadata
  error=0,
  id=0,
  leader=1,
  replicas=[1, 2, 3],
  isr=[1, 2, 3]>]

Notice the id fields are [5,7,3,4,1,6,2,0].

PartitionMetadata should be kept in a Hash by id, rather than an array that assumes id order.
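A minimal sketch of that approach, assuming a topic_metadata object shaped like the struct dump above (the variable names below are illustrative, not Poseidon's actual internals):

# Index the partition metadata by id instead of relying on array order.
partitions_by_id = {}
topic_metadata.partitions.each do |pm|
  partitions_by_id[pm.id] = pm
end

# Finding the leader for partition 5 is then an explicit keyed lookup,
# not an index into a possibly shuffled array.
leader_broker_id = partitions_by_id.fetch(5).leader   # => 3 in the dump above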

@bpot (Owner) commented Jun 4, 2014

Nice catch, guys! Working on a fix.

When finding the leader for a partition, we were using the partition_id as the index into the metadata array instead of finding the entry in the array with the matching id.
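A simplified illustration of the difference described above (not the exact message_conductor.rb code):

# Partitions come back in arbitrary order, e.g. ids [5, 7, 3, 4, 1, 6, 2, 0].
partitions = @cluster_metadata.topic_metadata["test.jorge"].partitions

# Buggy: treats the array position as the partition id; in the dump above,
# partitions[5] is the entry with id=6 (leader 1), not partition 5 (leader 3).
leader = partitions[5].leader                          # => 1, the wrong broker

# Fixed: find the entry whose id actually matches.
leader = partitions.find { |pm| pm.id == 5 }.leader    # => 3, the real leader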
@coveralls

Coverage increased (+0.12%) when pulling 02a8613 on bp/wrong_broker into b4f1a69 on master.

@coveralls

Coverage increased (+15.08%) when pulling 8d93ea5 on bp/wrong_broker into b4f1a69 on master.

bpot added a commit that referenced this pull request on Jun 4, 2014:
poseidon frequently sends to wrong broker

@bpot merged commit a6ad44e into master on Jun 4, 2014
@bpot deleted the bp/wrong_broker branch on June 4, 2014 at 02:16
@bpot (Owner) commented Jun 4, 2014

@sclasen @jorgeortiz85 this should be fixed in master now. I will try and get a new release out soon.

@sclasen (Contributor, Author) commented Jun 4, 2014

@bpot @jorgeortiz85 FWIW, I don't think poseidon is actually reading the produce response at any point (as far as I can tell), so how would one know if there were other issues with sending messages?

It needs something along the lines of sclasen@cd891dd, so that when you send with acks != 0 you can actually be sure the message was delivered.
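A rough sketch of what that checking could look like, assuming a decoded produce response with per-topic, per-partition error codes (the field names here are assumptions for illustration, not necessarily Poseidon's actual response structs):

# After sending with required_acks != 0, walk the response and fail loudly
# (or mark for retry) on any non-zero partition error code.
def ensure_produce_succeeded!(response)
  response.topic_responses.each do |topic_resp|
    topic_resp.partition_responses.each do |part_resp|
      next if part_resp.error == 0   # 0 means NoError in the Kafka protocol
      raise "Produce to #{topic_resp.topic}/partition #{part_resp.partition} " \
            "failed with error code #{part_resp.error}"
    end
  end
end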
