IllegalStateException: Not leader of partition 2 #9586

Closed
Zelldon opened this issue Jun 23, 2022 · 1 comment · Fixed by #9722
Labels
area/observability Marks an issue as observability related kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. version:8.1.0-alpha4 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@Zelldon
Member

Zelldon commented Jun 23, 2022

Related to #9040

We see errors like: IllegalStateException: Not leader of partition 2

Which seems to be more or less expected if I take a look at the log.

Log:

2022-05-04 01:18:12.979 CEST zeebe Transition to LEADER on term 36 starting
2022-05-04 01:18:12.979 CEST zeebe Transition to LEADER on term 36 - transitioning LogStorage
2022-05-04 01:18:12.979 CEST zeebe RaftServer{raft-partition-partition-2}{role=LEADER} - Accepted VoteRequest{term=37, candidate=1, lastLogIndex=970255264, lastLogTerm=36}: candidate's log is up-to-date
2022-05-04 01:18:12.981 CEST zeebe Transition to FOLLOWER on term 37 requested.
2022-05-04 01:18:12.982 CEST zeebe Partition role transitioning from LEADER to FOLLOWER in term 37
2022-05-04 01:18:12.982 CEST zeebe Received cancel signal for transition to LEADER on term 36
2022-05-04 01:18:12.982 CEST zeebe Failed to install partition 2
2022-05-04 01:18:12.983 CEST zeebe RaftServer{raft-partition-partition-2} - Found leader 1
2022-05-04 01:18:12.994 CEST zeebe ZeebePartition-2 failed, marking it as unhealthy: Partition-2{status=UNHEALTHY, issue=ZeebePartition-2{status=UNHEALTHY, issue='Initial state'}}
2022-05-04 01:18:12.994 CEST zeebe Detected 'UNHEALTHY' components. The current health status of components: [ZeebePartition-2{status=UNHEALTHY, issue='Services not installed'}, raft-partition-partition-2{status=HEALTHY}]

Logs are stored as well under https://drive.google.com/drive/u/0/folders/1vYJilfkLRlrF9CqTAUwRT9dLeDqTZULb

Exception:

java.lang.IllegalStateException: Not leader of partition 2
	at io.camunda.zeebe.broker.system.partitions.impl.steps.LogStoragePartitionTransitionStep.lambda$createWritableLogStorage$1(LogStoragePartitionTransitionStep.java:111) ~[zeebe-broker-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
	at java.util.Optional.orElseGet(Optional.java:364) ~[?:?]
	at io.camunda.zeebe.broker.system.partitions.impl.steps.LogStoragePartitionTransitionStep.createWritableLogStorage(LogStoragePartitionTransitionStep.java:107) ~[zeebe-broker-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
	at io.camunda.zeebe.broker.system.partitions.impl.steps.LogStoragePartitionTransitionStep.buildAtomixLogStorage(LogStoragePartitionTransitionStep.java:83) ~[zeebe-broker-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
	at io.camunda.zeebe.broker.system.partitions.impl.steps.LogStoragePartitionTransitionStep.transitionTo(LogStoragePartitionTransitionStep.java:50) ~[zeebe-broker-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
	at io.camunda.zeebe.broker.system.partitions.impl.PartitionTransitionProcess.lambda$proceedWithTransition$1(PartitionTransitionProcess.java:80) ~[zeebe-broker-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:79) ~[zeebe-util-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:44) ~[zeebe-util-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) ~[zeebe-util-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:97) ~[zeebe-util-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:80) ~[zeebe-util-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189) ~[zeebe-util-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT]

Originally posted by @Zelldon in #9040 (comment)

@Zelldon Zelldon added team/distributed kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. area/observability Marks an issue as observability related labels Jun 23, 2022
@npepinpe
Member

Let's do it, as it's low-hanging fruit, and we don't want to confuse users with errors they cannot act on. Please lower this to a warning.

@deepthidevaki deepthidevaki self-assigned this Jul 6, 2022
zeebe-bors-camunda bot added a commit that referenced this issue Jul 7, 2022
9722: fix(broker): throw recoverable exception when LogStorageSteps fails due to newer transitions r=deepthidevaki a=deepthidevaki

## Description

Since the role transitions in Raft and Zeebe partitions are asynchronous, it is possible that while the Zeebe partition is installing leader services, Raft has already transitioned to the follower role. In this case, LogStorageStep fails because it cannot get the appender, as Raft is no longer in the leader role. Previously, this failed the installation and the partition went into an unhealthy state. But since this is an expected case, we can ignore it and continue with the installation of the new role. To enable this, we now throw a recoverable exception, which ZeebePartition does not treat as a fatal error.
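
A minimal sketch of the idea described above, not the actual change in this PR. All names here (`TransitionSketch`, `RecoverableException`, `Health`, `runStep`) are hypothetical stand-ins, and the appender is modeled as a plain `Optional<String>`. It only illustrates how a dedicated exception type lets the caller tell an expected, superseded transition apart from a real failure:

```java
import java.util.Optional;

public class TransitionSketch {

  /** Hypothetical marker type: the failure is expected and a newer transition will resolve it. */
  static class RecoverableException extends RuntimeException {
    RecoverableException(String message) {
      super(message);
    }
  }

  enum Health { HEALTHY, UNHEALTHY }

  /**
   * Stand-in for the log storage step. The appender is only present while
   * Raft is still leader; if it is missing, signal a recoverable failure.
   */
  static String createWritableLogStorage(Optional<String> appender, int partitionId) {
    return appender.orElseThrow(
        () -> new RecoverableException("Not leader of partition " + partitionId));
  }

  /** Stand-in for the partition's failure handling around a single transition step. */
  static Health runStep(Runnable step) {
    try {
      step.run();
      return Health.HEALTHY;
    } catch (RecoverableException e) {
      // Expected race: report at warning level and wait for the next transition.
      System.out.println("WARN " + e.getMessage() + ", waiting for the next transition");
      return Health.HEALTHY;
    } catch (RuntimeException e) {
      // Anything else is still treated as a fatal installation error.
      System.out.println("ERROR " + e.getMessage());
      return Health.UNHEALTHY;
    }
  }

  public static void main(String[] args) {
    // Raft already stepped down, so no appender is available.
    System.out.println(runStep(() -> createWritableLogStorage(Optional.empty(), 2)));
    // An unrelated failure still marks the partition unhealthy.
    System.out.println(runStep(() -> { throw new IllegalStateException("disk failure"); }));
  }
}
```

The design choice in this sketch is that the step itself decides which failures are recoverable, so the caller needs no knowledge of Raft roles to handle them.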

## Related issues

closes #9586 



Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022