CrashLoopBackOff caused by Exiting because log truncation is not allowed #98

Closed
solsson opened this issue Nov 10, 2017 · 3 comments
solsson commented Nov 10, 2017

After testing #95 on minikube I shut down the VM and then started it again to test #97 ...

Now kafka-2 is crash-looping with:

[2017-11-10 13:17:57,563] INFO Truncating test-kafkacat-0 to 2334 has no effect as the largest offset in the log is 2333. (kafka.log.Log)
[2017-11-10 13:17:57,565] INFO Truncating test-produce-consume-0 to 2391 has no effect as the largest offset in the log is 2390. (kafka.log.Log)
[2017-11-10 13:17:57,571] INFO [Partition kafka-monitor-topic-0 broker=2] kafka-monitor-topic-0 starts at Leader Epoch 4 from offset 1. Previous Leader Epoch was: -1 (kafka.cluster.Partition)
[2017-11-10 13:17:57,590] FATAL [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Exiting because log truncation is not allowed for partition test-produce-consume-0, current leader's latest offset 2051 is less than replica's latest offset 2391 (kafka.server.ReplicaFetcherThread)
[2017-11-10 13:17:57,590] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Stopped (kafka.server.ReplicaFetcherThread)
[2017-11-10 13:17:57,606] INFO [KafkaServer id=2] shutting down (kafka.server.KafkaServer)
[2017-11-10 13:17:57,607] INFO [KafkaServer id=2] Starting controlled shutdown (kafka.server.KafkaServer)
[2017-11-10 13:17:57,631] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:15, offset:2334}, Current: {epoch:18, offset2304} for Partition: test-kafkacat-0 (kafka.server.epoch.LeaderEpochFileCache)
[2017-11-10 13:17:57,632] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:15, offset:2335}, Current: {epoch:18, offset2304} for Partition: test-kafkacat-0 (kafka.server.epoch.LeaderEpochFileCache)
[2017-11-10 13:17:57,632] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:15, offset:2336}, Current: {epoch:18, offset2304} for Partition: test-kafkacat-0 (kafka.server.epoch.LeaderEpochFileCache)

Further restarts fail immediately with FATAL [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Exiting because log truncation is not allowed for partition test-produce-consume-0, current leader's latest offset 2117 is less than replica's latest offset 2391 (kafka.server.ReplicaFetcherThread).
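
For context, a hedged note (mine, not from the report): this FATAL is the replica fetcher refusing to truncate its log below the leader's end offset while unclean leader election is disabled, so the broker halts rather than silently drop the extra messages. The governing settings can be checked from a shell inside any healthy broker pod; the working directory, config path and ZooKeeper address below are assumptions for this image:

# Broker-level default (prints nothing if unset, i.e. false):
grep unclean.leader.election /etc/kafka/server.properties

# Per-topic override, if any:
./bin/kafka-configs.sh --zookeeper zookeeper:2181 \
  --entity-type topics --entity-name test-produce-consume --describe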

solsson commented Nov 10, 2017

Re-assigned partitions of the two topics to the two working brokers, using #95 with BROKERS env set to 0,1. The job pod logged:

# reassign-topics.json
{"topics":[
 {"topic":"test-produce-consume"},
 {"topic":"test-kafkacat"}
]}
# proposed-reassignment.json
{"version":1,"partitions":[{"topic":"test-kafkacat","partition":0,"replicas":[1,0],"log_dirs":["any","any"]},{"topic":"test-produce-consume","partition":0,"replicas":[0,1],"log_dirs":["any","any"]}]}
Current partition replica assignment

{"version":1,"partitions":[{"topic":"test-kafkacat","partition":0,"replicas":[2,0],"log_dirs":["any","any"]},{"topic":"test-produce-consume","partition":0,"replicas":[1,2],"log_dirs":["any","any"]}]}

Save this to use as the --reassignment-json-file option during rollback
Successfully started reassignment of partitions.
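
For reference, a sketch of the reassignment steps the #95 job presumably wraps (the working directory and ZooKeeper address are assumptions; the input JSON matches the job output above):

# reassign-topics.json lists the topics to move:
cat > reassign-topics.json <<'EOF'
{"topics":[
 {"topic":"test-produce-consume"},
 {"topic":"test-kafkacat"}
]}
EOF

# Generate a proposal restricted to the healthy brokers (BROKERS=0,1):
./bin/kafka-reassign-partitions.sh --zookeeper zookeeper:2181 \
  --topics-to-move-json-file reassign-topics.json \
  --broker-list 0,1 --generate

# Save the "Proposed partition reassignment configuration" JSON as
# proposed-reassignment.json, then execute and verify it:
./bin/kafka-reassign-partitions.sh --zookeeper zookeeper:2181 \
  --reassignment-json-file proposed-reassignment.json --execute
./bin/kafka-reassign-partitions.sh --zookeeper zookeeper:2181 \
  --reassignment-json-file proposed-reassignment.json --verify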

Now kafka-2 starts but load is unbalanced. Did another reassign, with BROKERS set to 0,1,2 and got proposed reassignment {"version":1,"partitions":[{"topic":"test-kafkacat","partition":0,"replicas":[1,2],"log_dirs":["any","any"]},{"topic":"test-produce-consume","partition":0,"replicas":[2,0],"log_dirs":["any","any"]}]}.

Still seeing this log message though: WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:13, offset:2404}, Current: {epoch:14, offset1852} for Partition: test-produce-consume-0 (kafka.server.epoch.LeaderEpochFileCache).

solsson commented Nov 10, 2017

Pod restart got rid of the warning, but showed ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition test-produce-consume-0 to broker 0:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread).

Used maintenance/preferred-replica-election-job.yml.
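
That job presumably boils down to the stock election tool; run without --path-to-json-file it triggers a preferred replica election for all partitions (working directory and ZooKeeper address are assumptions):

./bin/kafka-preferred-replica-election.sh --zookeeper zookeeper:2181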

solsson closed this as completed Nov 10, 2017
@ramazansakin

Hi @solsson, were you able to figure out exactly why the WARN message above happens?
"WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order."
