CrashLoopBackOff caused by Exiting because log truncation is not allowed #98

Closed
solsson opened this issue Nov 10, 2017 · 3 comments
solsson commented Nov 10, 2017

After testing #95 on minikube I shut down the VM and then started it again to test #97 ...

Now kafka-2 is crash-looping with:

[2017-11-10 13:17:57,563] INFO Truncating test-kafkacat-0 to 2334 has no effect as the largest offset in the log is 2333. (kafka.log.Log)
[2017-11-10 13:17:57,565] INFO Truncating test-produce-consume-0 to 2391 has no effect as the largest offset in the log is 2390. (kafka.log.Log)
[2017-11-10 13:17:57,571] INFO [Partition kafka-monitor-topic-0 broker=2] kafka-monitor-topic-0 starts at Leader Epoch 4 from offset 1. Previous Leader Epoch was: -1 (kafka.cluster.Partition)
[2017-11-10 13:17:57,590] FATAL [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Exiting because log truncation is not allowed for partition test-produce-consume-0, current leader's latest offset 2051 is less than replica's latest offset 2391 (kafka.server.ReplicaFetcherThread)
[2017-11-10 13:17:57,590] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Stopped (kafka.server.ReplicaFetcherThread)
[2017-11-10 13:17:57,606] INFO [KafkaServer id=2] shutting down (kafka.server.KafkaServer)
[2017-11-10 13:17:57,607] INFO [KafkaServer id=2] Starting controlled shutdown (kafka.server.KafkaServer)
[2017-11-10 13:17:57,631] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:15, offset:2334}, Current: {epoch:18, offset2304} for Partition: test-kafkacat-0 (kafka.server.epoch.LeaderEpochFileCache)
[2017-11-10 13:17:57,632] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:15, offset:2335}, Current: {epoch:18, offset2304} for Partition: test-kafkacat-0 (kafka.server.epoch.LeaderEpochFileCache)
[2017-11-10 13:17:57,632] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:15, offset:2336}, Current: {epoch:18, offset2304} for Partition: test-kafkacat-0 (kafka.server.epoch.LeaderEpochFileCache)

Further restarts fail immediately with FATAL [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Exiting because log truncation is not allowed for partition test-produce-consume-0, current leader's latest offset 2117 is less than replica's latest offset 2391 (kafka.server.ReplicaFetcherThread).
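
For context, a hedged note (mine, not from the report): this FATAL is the replica fetcher refusing to truncate its log below the leader's end offset while unclean leader election is disabled, so the broker halts rather than silently drop the extra messages. The governing settings can be checked from a shell inside any healthy broker pod; the working directory, config path and ZooKeeper address below are assumptions for this image:

# Broker-level default (prints nothing if unset, i.e. false):
grep unclean.leader.election /etc/kafka/server.properties

# Per-topic override, if any:
./bin/kafka-configs.sh --zookeeper zookeeper:2181 \
  --entity-type topics --entity-name test-produce-consume --describe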

solsson commented Nov 10, 2017

Re-assigned partitions of the two topics to the two working brokers, using #95 with BROKERS env set to 0,1. The job pod logged:

# reassign-topics.json
{"topics":[
 {"topic":"test-produce-consume"},
 {"topic":"test-kafkacat"}
]}
# proposed-reassignment.json
{"version":1,"partitions":[{"topic":"test-kafkacat","partition":0,"replicas":[1,0],"log_dirs":["any","any"]},{"topic":"test-produce-consume","partition":0,"replicas":[0,1],"log_dirs":["any","any"]}]}
Current partition replica assignment

{"version":1,"partitions":[{"topic":"test-kafkacat","partition":0,"replicas":[2,0],"log_dirs":["any","any"]},{"topic":"test-produce-consume","partition":0,"replicas":[1,2],"log_dirs":["any","any"]}]}

Save this to use as the --reassignment-json-file option during rollback
Successfully started reassignment of partitions.
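
For reference, a sketch of the reassignment steps the #95 job presumably wraps (the working directory and ZooKeeper address are assumptions; the input JSON matches the job output above):

# reassign-topics.json lists the topics to move:
cat > reassign-topics.json <<'EOF'
{"topics":[
 {"topic":"test-produce-consume"},
 {"topic":"test-kafkacat"}
]}
EOF

# Generate a proposal restricted to the healthy brokers (BROKERS=0,1):
./bin/kafka-reassign-partitions.sh --zookeeper zookeeper:2181 \
  --topics-to-move-json-file reassign-topics.json \
  --broker-list 0,1 --generate

# Save the "Proposed partition reassignment configuration" JSON as
# proposed-reassignment.json, then execute and verify it:
./bin/kafka-reassign-partitions.sh --zookeeper zookeeper:2181 \
  --reassignment-json-file proposed-reassignment.json --execute
./bin/kafka-reassign-partitions.sh --zookeeper zookeeper:2181 \
  --reassignment-json-file proposed-reassignment.json --verify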

Now kafka-2 starts but load is unbalanced. Did another reassign, with BROKERS set to 0,1,2 and got proposed reassignment {"version":1,"partitions":[{"topic":"test-kafkacat","partition":0,"replicas":[1,2],"log_dirs":["any","any"]},{"topic":"test-produce-consume","partition":0,"replicas":[2,0],"log_dirs":["any","any"]}]}.

Still seeing this log message though: WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:13, offset:2404}, Current: {epoch:14, offset1852} for Partition: test-produce-consume-0 (kafka.server.epoch.LeaderEpochFileCache).

solsson commented Nov 10, 2017

Pod restart got rid of the warning, but showed ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition test-produce-consume-0 to broker 0:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread).

Used maintenance/preferred-replica-election-job.yml.
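
That job presumably boils down to the stock election tool; run without --path-to-json-file it triggers a preferred replica election for all partitions (working directory and ZooKeeper address are assumptions):

./bin/kafka-preferred-replica-election.sh --zookeeper zookeeper:2181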

solsson closed this as completed Nov 10, 2017
@ramazansakin

Hi @solsson, were you able to figure out exactly why the WARN message above happens?
"WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order."
