You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The issue is seen when a node is brought down in a 3-node cluster wherein one of the topic partitions had an ISR count of 1 against a min-ISR setting of 2. The Kafka producer client is configured to report errors using error_cb callback setting. It is observed that the error code is reported as _MSG_TIMED_OUT instead of a more intuitive _ISR_INSUFF leading me to believe that a message was being sent to the node that was down.
How to reproduce
Bring down a node in a minimal 3 node cluster and to a state where the ISR count for one/more topic partitions is less than the minimum configured ISR setting for the topic/broker.
Create a producer with acks=all setting
Try to produce messages to a topic-partition with the min-ISR insufficiency
Checklist
Please provide the following information:
confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):
The error _ISR_INSUFF is internally retried in the producer to produce the messages till message.timeout.ms is reached after which _MSG_TIMED_OUT error is thrown. This is the reason, _MSG_TIMED_OUT is thrown and not _ISR_INSUFF. This is by design like this.
@pranavrth Thanks for your explanation. Is there any way the _ISR_INSUFF code can take precedence over timeout? What's happening is that the real issue then escapes attention.
Description
The issue is seen when a node is brought down in a 3-node cluster wherein one of the topic partitions had an ISR count of 1 against a min-ISR setting of 2. The Kafka producer client is configured to report errors using
error_cb
callback setting. It is observed that the error code is reported as_MSG_TIMED_OUT
instead of a more intuitive_ISR_INSUFF
leading me to believe that a message was being sent to the node that was down.How to reproduce
acks=all
settingChecklist
Please provide the following information:
confluent_kafka.version()
andconfluent_kafka.libversion()
):Apache Kafka broker version:
3.6.0
Client configuration:
{ # Error reporting 'error_cb': self._error_notification, #'debug': 'broker,topic,msg', 'debug': 'broker,topic,msg', # Use compression 'compression.codec': 'lz4', # Set acks for reliability 'acks': 'all', 'max.in.flight.requests.per.connection': 1, # Configuration settings for HA 'topic.metadata.refresh.interval.ms': 120000 }
Operating system:
Debian GNU/Linux 11 (bullseye)
Provide client logs (with
'debug': '..'
as necessary)2024-02-18 09:14:18,997 [PID=1:TID=Thread-6482:cfx.rda_messaging.kafka_client:_error_notification:788] ERROR - KafkaError{code=_TRANSPORT,val=-195,str="sasl_ssl://rda-kafka-controller-2.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/2: Connection setup timed out in state CONNECT (after 30025ms in state CONNECT, 1 identical error(s) suppressed)"}
2024-02-18 09:14:18,998 [PID=1:TID=Thread-6482:cfx.rda_messaging.kafka_client:_delivery_notification:759] ERROR - Message delivery failed: KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}! Message=b'{"type": "SOURCE-EVENT", "sourceEventId": "f2b4d3de-46a6-4c7c-9243-f57c2489e18b", "id": "f2b4d3de-46a6-4c7c-9243-f57c2489e18b", "sourceSystemId": "56795ec7-f19b-429b-913b-b69bdc4a6baf", "sourceSystemName": "Kafka", "projectId": "337ddf08-cc96-11ee-8d32-0a26cc0b6dec", "eventCategory": "alerts", "customerId": "ac0c16424518460c8c0ef0f632515731", "createdat": 1708244619015.706, "sourceReceivedAt": 1708244619015.6702, "parentEventId": null, "payload": "{\"customerId\": \"58c5fe78-ca58-46a8-8db9-a172fec7d2bc\", \"assetId\": \"021456a7-b56a-4e0b-93a0-017f60a5d75f\", \"componentId\": \"f21056f7-cd02-43f6-b44c-c2fe2bcd12d1\", \"alertCategory\": \"Disk Partition\", \"alertType\": \"Disk Partition\", \"sourceSystemId\": \"CFX_PULSE\", \"projectId\": \"814ab092-83c3-4d0c-8805-063936ba3666\", \"environmentId\": \"a74b7e03-5d48-45fc-95fc-8df9ec47f3b2\", \"severity\": \"MAJOR\", \"assetType\": \"infra-account\", \"assetIpAddress\": \"10.199.0.118\", \"assetName\": \"SIM-m0.0-1-250\", \"componentName\": \"1.1\", \"sourceMechanism\": \"SNMP\", \"message\": \"Disk Partition \\\"flash[Disk:flash]\\\" Utilization [80.95%] exceeded configured value 25.0%\", \"raisedAt\": 1708241018415.1145, \"minimumOccurrence\": 0, \"sourceSystemName\": \"Syslog_udp_Notifications\"}", "status": "Completed"}', Key=b'f2b4d3de-46a6-4c7c-9243-f57c2489e18b'
.................
2024-02-19 05:26:45,329 [PID=1:TID=Thread-70:cfx.rda_messaging.kafka_client:_error_notification:789] ERROR - KafkaError{code=_TRANSPORT,val=-195,str="sasl_ssl://rda-kafka-broker-2.rda-kafka-broker-headless.rda-fabric.svc.cluster.local:9093/bootstrap: Connection setup timed out in state CONNECT (after 30028ms in state CONNECT)"} {"sourceevent": {"id": "06497124-e2fb-4cd3-a348-2cf14b3d849a", "received": 1708320405328.9392}}
%7|1708320417.127|PRODUCE|rdkafka#producer-3| [thrd:sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless]: sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/0: ac0c16424518460c8c0ef0f632515731.ingestion-tracker [12]: Produce MessageSet with 1 message(s) (1025 bytes, ApiVersion 7, MsgVersion 2, MsgId 0, BaseSeq -1, PID{Invalid}, lz4)
%7|1708320417.170|REQERR|rdkafka#producer-3| [thrd:sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless]: sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/0: ProduceRequest failed: Broker: Not enough in-sync replicas: explicit actions Retry,MsgNotPersisted
%7|1708320417.170|MSGSET|rdkafka#producer-3| [thrd:sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless]: sasl_ssl://rda-kafka-controller-0.rda-kafka-controller-headless.rda-fabric.svc.cluster.local:9093/0: ac0c16424518460c8c0ef0f632515731.ingestion-tracker [12]: MessageSet with 1 message(s) (MsgId 0, BaseSeq -1) encountered error: Broker: Not enough in-sync replicas (actions Retry,MsgNotPersisted)
Provide broker log excerpts
Critical issue
The text was updated successfully, but these errors were encountered: