Problem with producing messages to topic with unavailable partitions #213

wkuranowski · 2014-11-27T14:26:23Z

I have a cluster with 5 brokers and topic with replication factor 2 and 10 partitions.

When I shutdown 2 or 3 brokers, producer has problem to use available partitions only (I am using Round Robin partitioner). I am unable to continuously send new messages to Input() channel. There are only short windows when this is possible (in attached log look for producing string).

[Sarama] 2014/11/27 14:48:43 producer/aggregator hit maximum request size, forcing blocking flush
[Sarama] 2014/11/27 14:48:44 producer/aggregator hit maximum request size, forcing blocking flush
[Sarama] 2014/11/27 14:48:44 producer/leader state change to [retrying] on adstream2/9
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:44 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:44 producer/aggregator hit maximum request size, forcing blocking flush
[Sarama] 2014/11/27 14:48:44 producer/leader state change to [retrying] on adstream2/5
[Sarama] 2014/11/27 14:48:44 producer/leader state change to [retrying] on adstream2/0
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20873cb40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc2119f3c80 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:44 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20aebe540 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20bee8400 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc20ff11ec0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20b25de40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:44 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:44 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:45 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:45 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:45 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:45 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:45 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:45 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20bee8b80 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:45 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20bee8b40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:45 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:45 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:45 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:45 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:45 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:45 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (8 retries remaining)
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:45 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:45 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:45 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:45 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20bee9480 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:45 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:45 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:45 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:45 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc20bee8bc0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:45 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:45 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (8 retries remaining)
[Sarama] 2014/11/27 14:48:45 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:45 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:45 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:45 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:45 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (8 retries remaining)
[Sarama] 2014/11/27 14:48:46 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:46 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:46 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:46 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:46 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:46 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:46 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc20bee9b00 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:46 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:46 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (7 retries remaining)
[Sarama] 2014/11/27 14:48:46 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:46 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:46 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:46 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:46 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:46 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:46 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:46 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:46 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20bee9b40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:46 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:46 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:46 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (7 retries remaining)
[Sarama] 2014/11/27 14:48:46 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:46 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:46 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (7 retries remaining)
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (6 retries remaining)
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:47 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:47 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:47 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:47 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc209194800 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:47 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (6 retries remaining)
[Sarama] 2014/11/27 14:48:47 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:47 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (6 retries remaining)
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (5 retries remaining)
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:47 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:47 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:47 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:47 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:47 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:47 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:47 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:47 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc209194300 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:47 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20bee94c0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:47 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc209195340 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:47 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:47 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:47 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:47 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:47 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (5 retries remaining)
[Sarama] 2014/11/27 14:48:47 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:47 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:47 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:47 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (5 retries remaining)
[Sarama] 2014/11/27 14:48:48 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:48 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (4 retries remaining)
[Sarama] 2014/11/27 14:48:48 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:48 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:48 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:48 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:48 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:48 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:48 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:48 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:48 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:48 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20919c080 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:48 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:48 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:48 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20919c100 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:48 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:48 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:48 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:48 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:48 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:48 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:48 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:48 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (4 retries remaining)
[Sarama] 2014/11/27 14:48:48 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (4 retries remaining)
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:49 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:49 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:49 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:49 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:49 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:49 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20919cd40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:49 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20919cd80 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:49 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:49 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:49 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:49 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (3 retries remaining)
[Sarama] 2014/11/27 14:48:49 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:49 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:49 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:49 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:49 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:49 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:49 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:49 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20919d640 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:49 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20919d680 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:49 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:49 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:49 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:49 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:49 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:49 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:49 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (3 retries remaining)
[Sarama] 2014/11/27 14:48:49 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (3 retries remaining)
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20919dc80 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (2 retries remaining)
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc2091a4480 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (2 retries remaining)
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (2 retries remaining)
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc2091a4980 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (1 retries remaining)
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc20919c0c0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20919dcc0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc2091a5140 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:50 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:50 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (1 retries remaining)
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (1 retries remaining)
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9092
[Agent] 2014/11/27 14:48:51 Errors() output: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes.
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc2091a5840 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:51 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6ea400 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc2091a58c0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc2091a5880 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:51 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:51 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:51 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:51 Registered new broker #3 at localhost:9094
[Agent] 2014/11/27 14:48:51 Errors() output: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes.
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20d6eae40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6eae00 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9096
[Agent] 2014/11/27 14:48:51 Errors() output: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes.
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20d6eb180 {0 0} } : kafka: broker: not connected
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9096: connection refused
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9092
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6eb600 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:51 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:51 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:52 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:52 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:52 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:52 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:52 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:52 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:52 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6f8180 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:52 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:52 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:52 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:52 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:52 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:52 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:52 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:52 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:52 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:52 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:52 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:52 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6f8640 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:52 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:52 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:52 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:52 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:52 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:53 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:53 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:53 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:53 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:53 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:53 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:53 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6f8b40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:53 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:53 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (8 retries remaining)
[Sarama] 2014/11/27 14:48:53 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:53 dial tcp 127.0.0.1:9096: connection refused

The text was updated successfully, but these errors were encountered:

eapache · 2014-11-28T18:09:32Z

It should recover eventually and start sending messages only on the available partitions. It would recover faster if you reduce your config.MetadataRetries and config.WaitForElection somewhat in the client - 10x750ms seems very high, and will force the producer to wait at least 7.5 seconds per failed message before it can start producing again. The defaults of 3x250ms are fine for us normally.

What happens if you reduce the timeouts, or just let it run for longer?

wkuranowski · 2014-12-01T16:58:13Z

I have about 100k - 200k messages per second to my cluster. Even setting 3x250ms for retries doesn't seem to be good. I hope that you can just mark partitions as unavailable and fail all messages assigned to them or just reassign messages to available partitions.

Now I see that you are probably doing this in #215. Tell me when it's ready to test :)

eapache · 2014-12-01T22:00:39Z

You can try the leader-breaker branch now: https://github.com/Shopify/sarama/tree/leader-breaker

#215 is the first part, which tweaks behaviour so we stop sending new messages to unavailable partitions.

The leader-breaker branch is additional work on top to fail messages quickly when a partition goes down. Once #215 is merged it will have to be rebased on master and then properly vetted. Right now it hardcodes failing-fast for 10 seconds in between retries though I'm open to making that configurable if necessary.

P.S. You can also play with ChannelBufferSize in the producer config, setting it to a large value (perhaps 2048) may or may not help, I'm not sure. In a benchmark scenario (generating messages as fast as possible) it will probably just fill up and not make a difference, but if you're pushing real messages then it might give you some wiggle room.

wkuranowski · 2014-12-09T13:43:01Z

I test current leader-breaker branch. It looks much better. But I think that there are still some issues. I get a lot of circuit breaker is open errors. It looks like messages are still sent to unavailable partitions. Is this expected behaviour for this branch? I am using round robin partitioner.

eapache · 2014-12-09T14:30:25Z

It should stop producing to a partition once Kafka returns LeaderNotAvailable for that partition. If your cluster isn't picking up that the leader is gone for some reason then it would keep trying. I'm not sure why that would be.

wkuranowski · 2014-12-09T14:44:29Z

I get 'kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes.' so it looks like Kafka returns LeaderNotAvailable for that partition. And I see retries every 10 seconds.

I look at RoundRobinPartitioner code and I don't know if I understand it correctly. You pick partition number from 0 to numPartitions-1. What if partition number 2 (of 10) is not available?

wkuranowski · 2014-12-09T15:13:53Z

Ok, I see that you select partition from WritablePartitions for round robin partitioner, so it seems good... I will try to do some more testing...

wkuranowski · 2014-12-09T16:25:02Z

It's strange... I have 10 partitions. When I shut down 2 brokers, partition 4 and 8 become unavailable. But in error channel I see circuit breaker is open for messages with assigned partition = 9. But partition 9 is available...

@eapache do you have any idea?

eapache · 2014-12-09T16:38:45Z

I pushed a potential fix (f1a6b99) to the leader-breaker branch. I suspect what was happening was that partition 9, while still available, was changing leader (i.e. one of the two brokers you shut down was the old leader of 9). When this happened it was refreshing its metadata, and because of a case I missed the error for the other partitions was trickling up, causing its circuit to open.

wkuranowski · 2014-12-09T16:52:12Z

Some more info. Here is my topic description:

Topic: adstream2        Partition: 0    Leader: 3       Replicas: 3,2   Isr: 3,2
Topic: adstream2        Partition: 1    Leader: 4       Replicas: 4,3   Isr: 3,4
Topic: adstream2        Partition: 2    Leader: 5       Replicas: 5,4   Isr: 4,5
Topic: adstream2        Partition: 3    Leader: 1       Replicas: 1,5   Isr: 1,5
Topic: adstream2        Partition: 4    Leader: 2       Replicas: 2,1   Isr: 2,1
Topic: adstream2        Partition: 5    Leader: 3       Replicas: 3,4   Isr: 3,4
Topic: adstream2        Partition: 6    Leader: 4       Replicas: 4,5   Isr: 4,5
Topic: adstream2        Partition: 7    Leader: 5       Replicas: 5,1   Isr: 1,5
Topic: adstream2        Partition: 8    Leader: 1       Replicas: 1,2   Isr: 2,1
Topic: adstream2        Partition: 9    Leader: 2       Replicas: 2,3   Isr: 3,2

When I kill broker 1 and 2, I lose partition 4 and 8 and get circuit breaker is open for messages with assigned partition 9.
When I kill broker 1 and 5, I lose partition 3 and 7 and get circuit breaker is open for messages with assigned partition 2.

Lost partitions (4, 8 or 3, 7) are correctly recognized and are not on the WriteablePartitions list. Kafka topic description also correctly marks unavailable partitions - Leader: -1.

wkuranowski · 2014-12-09T17:15:31Z

Ok, I can confirm that f1a6b99 fixes the problem. Almost... because when I kill a broker which is also a controller in a cluster, there are still some problems. If controller was our last replica for a partition, then this partition is not recognized as unavailable. This could be a Kafka bug, because description for that partition still contains that broker as available broker, even after several minutes.

I am going to test this with Kafka 0.8.2-beta...

wkuranowski · 2014-12-09T17:37:14Z

Kafka 0.8.2-beta looks good. Killing a controller is handled properly there. So I must say that everything now works really nice. Thanks! :)

I have a question. I want to resend failed messages from error channel. I have a separate goroutine to read from that channel. Is it safe to just construct a new MessageToSend and send to Input() channel? What about deadlocks?

- As I discovered while investigating additional issues from #213, I made a stupid in 1b465e7 and kept returning an error in the case where we ran out of leader-election retries, despite *explicitly stating in the commit message* that the whole point was to stop doing that. - Simplify and reorganize flow control in several places. - Clean up and improve log messages.

eapache · 2014-12-09T17:45:01Z

Is it safe to just construct a new MessageToSend and send to Input() channel? What about deadlocks?

you should be able to send the very same MessageToSend back to the input channel, no need to construct a new one
you will have to maintain a local buffer and use a select to avoid deadlocks, something like:

select {
case err := <-Errors():
    buffer = append(buffer, err.Msg)
case Input() <- buffer[0]:
    buffer = buffer[1:] // pop message from buffer
}

(edit: be sure to handle the case when the buffer is empty)

eapache added bug producer labels Nov 27, 2014

eapache mentioned this issue Nov 28, 2014

Consistent partitioning #215

Merged

eapache mentioned this issue Dec 8, 2014

Leader breaker #218

Merged

eapache mentioned this issue Dec 9, 2014

Clean up client's refreshMetadata function #223

Merged

eapache closed this as completed in #218 Dec 10, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem with producing messages to topic with unavailable partitions #213

Problem with producing messages to topic with unavailable partitions #213

wkuranowski commented Nov 27, 2014

eapache commented Nov 28, 2014

wkuranowski commented Dec 1, 2014

eapache commented Dec 1, 2014

wkuranowski commented Dec 9, 2014

eapache commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

eapache commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

eapache commented Dec 9, 2014

Problem with producing messages to topic with unavailable partitions #213

Problem with producing messages to topic with unavailable partitions #213

Comments

wkuranowski commented Nov 27, 2014

eapache commented Nov 28, 2014

wkuranowski commented Dec 1, 2014

eapache commented Dec 1, 2014

wkuranowski commented Dec 9, 2014

eapache commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

eapache commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

wkuranowski commented Dec 9, 2014

eapache commented Dec 9, 2014