Failing to write to logstream during stepdown is logged as error #12780

Closed
deepthidevaki opened this issue May 16, 2023 · 5 comments · Fixed by #12910
Labels
area/resilience component/gateway kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/low Marks a bug as having little to no noticeable impact for the user version:8.2.6 Marks an issue as being completely or in parts released in 8.2.6 version:8.3.0-alpha2 Marks an issue as being completely or in parts released in 8.3.0-alpha2 version:8.3.0 Marks an issue as being completely or in parts released in 8.3.0

Comments

@deepthidevaki
Contributor

Describe the bug

ERROR 2023-05-16T00:30:04.104402170Z [resource.labels.containerName: zeebe] Unexpected error on writing CREATE command Failed to write request to logstream

The error comes from CommandAPIHandler when it tries to write a user request to the leader's logstream. This happened while the leader was transitioning to follower and the logstream had already been closed. Before this error, we see that the Sequencer rejects the record because it is closed.

This error message was introduced in #12676. Previously the failure was ignored, so nothing was logged.

logs

Expected behavior

  • Reduce the log level to warn/debug
  • logstream#tryWrite should return a specific error code instead of -1, which can then be used to log a more meaningful message.
  • If we can recognize that the failure happened during a leader transition, we can choose not to log the error. Instead, return a PARTITION_LEADER_MISMATCH code to the gateway so that it can retry the command with the new leader before sending an error to the client.
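
The last two points could be sketched roughly as follows. This is only an illustration, not the actual Zeebe code or the change made in the fix: `WriteFailureSketch`, `WriteFailure`, `WriteResult`, `LogLevel`, and the `tryWrite` signature are hypothetical names invented here, and the `ErrorCode` enum only mirrors the two codes mentioned in this issue (PARTITION_LEADER_MISMATCH and INTERNAL_ERROR). It assumes that a closed logstream is the signal for a leader transition.

```java
public final class WriteFailureSketch {

  /** Hypothetical failure reasons that replace a bare -1 return value. */
  enum WriteFailure { CLOSED, OTHER }

  /** Stand-in for the error codes sent back to the gateway (only the two named in this issue). */
  enum ErrorCode { PARTITION_LEADER_MISMATCH, INTERNAL_ERROR }

  /** Hypothetical log levels used to decide how loudly a failure is reported. */
  enum LogLevel { DEBUG, ERROR }

  /** Result of a write attempt: either a written position or a failure reason. */
  static final class WriteResult {
    final long position;
    final WriteFailure failure; // null when the write succeeded

    private WriteResult(long position, WriteFailure failure) {
      this.position = position;
      this.failure = failure;
    }

    static WriteResult written(long position) {
      return new WriteResult(position, null);
    }

    static WriteResult failed(WriteFailure failure) {
      return new WriteResult(-1L, failure);
    }

    boolean isWritten() {
      return failure == null;
    }
  }

  /** Hypothetical write attempt: a closed logstream yields CLOSED instead of -1. */
  static WriteResult tryWrite(boolean logStreamClosed, long nextPosition) {
    if (logStreamClosed) {
      return WriteResult.failed(WriteFailure.CLOSED);
    }
    return WriteResult.written(nextPosition);
  }

  /** A closed logstream is treated as a leader change, so the gateway can retry elsewhere. */
  static ErrorCode toErrorCode(WriteFailure failure) {
    return failure == WriteFailure.CLOSED
        ? ErrorCode.PARTITION_LEADER_MISMATCH
        : ErrorCode.INTERNAL_ERROR;
  }

  /** Expected failures during step-down are logged quietly; everything else stays an error. */
  static LogLevel toLogLevel(WriteFailure failure) {
    return failure == WriteFailure.CLOSED ? LogLevel.DEBUG : LogLevel.ERROR;
  }

  public static void main(String[] args) {
    WriteResult result = tryWrite(true, 42L);
    if (!result.isWritten()) {
      System.out.println(
          toLogLevel(result.failure)
              + ": failed to write request to logstream ("
              + result.failure
              + "), responding with "
              + toErrorCode(result.failure));
    }
  }
}
```

With a result type like this, the handler can tell an expected step-down apart from a genuine write failure without guessing from a numeric return value.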
@deepthidevaki deepthidevaki added kind/bug Categorizes an issue or PR as a bug severity/low Marks a bug as having little to no noticeable impact for the user area/resilience labels May 16, 2023
@megglos
Contributor

megglos commented May 25, 2023

ZDP-Triage:

  • mostly noise
  • it's expected and shouldn't be logged as error in the particular scenario
  • as it's new (last or next patch) it can be considered a regression => could be confusing after update

@megglos megglos added the planning/discuss To be discussed at the next planning. label May 25, 2023
@megglos
Contributor

megglos commented May 26, 2023

ZDP-Planning:

  • we will look into it before the next release
  • also affects 8.2, 8.1, 8.0 due to a backport

@megglos megglos removed the planning/discuss To be discussed at the next planning. label May 26, 2023
zeebe-bors-camunda bot added a commit that referenced this issue Jun 1, 2023
12928: [Backport stable/8.2] Add specific error codes for logstream write failure r=oleschoenburg a=deepthidevaki

Backport #12910 

closes #12780 

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
@lenaschoenburg lenaschoenburg added the version:8.3.0-alpha2 Marks an issue as being completely or in parts released in 8.3.0-alpha2 label Jun 7, 2023
@Zelldon
Member

Zelldon commented Jun 7, 2023

I feel this is not 100% resolved. We also see a lot of error messages in the gateway, which in this case is also a lot of noise.

Example of a current medic benchmark

io.camunda.zeebe.gateway.cmd.BrokerErrorException: Received error from broker (INTERNAL_ERROR): Failed writing request: Failed to write request to logstream
	at io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager.handleResponse(BrokerRequestManager.java:194) ~[zeebe-gateway-8.3.0-SNAPSHOT.jar:8.3.0-SNAPSHOT]
	at io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequestInternal$2(BrokerRequestManager.java:143) ~[zeebe-gateway-8.3.0-SNAPSHOT.jar:8.3.0-SNAPSHOT]
	at io.camunda.zeebe.scheduler.future.FutureContinuationRunnable.run(FutureContinuationRunnable.java:28) [zeebe-scheduler-8.3.0-SNAPSHOT.jar:8.3.0-SNAPSHOT]
	at io.camunda.zeebe.scheduler.ActorJob.invoke(ActorJob.java:94) [zeebe-scheduler-8.3.0-SNAPSHOT.jar:8.3.0-SNAPSHOT]
	at io.camunda.zeebe.scheduler.ActorJob.execute(ActorJob.java:45) [zeebe-scheduler-8.3.0-SNAPSHOT.jar:8.3.0-SNAPSHOT]
	at io.camunda.zeebe.scheduler.ActorTask.execute(ActorTask.java:119) [zeebe-scheduler-8.3.0-SNAPSHOT.jar:8.3.0-SNAPSHOT]
	at io.camunda.zeebe.scheduler.ActorThread.executeCurrentTask(ActorThread.java:106) [zeebe-scheduler-8.3.0-SNAPSHOT.jar:8.3.0-SNAPSHOT]
	at io.camunda.zeebe.scheduler.ActorThread.doWork(ActorThread.java:87) [zeebe-scheduler-8.3.0-SNAPSHOT.jar:8.3.0-SNAPSHOT]
	at io.camunda.zeebe.scheduler.ActorThread.run(ActorThread.java:198) [zeebe-scheduler-8.3.0-SNAPSHOT.jar:8.3.0-SNAPSHOT]

A role change happens at the same time:
role

@Zelldon Zelldon reopened this Jun 7, 2023
@Zelldon Zelldon added component/gateway scope/broker Marks an issue or PR to appear in the broker section of the changelog labels Jun 7, 2023
@deepthidevaki
Contributor Author

@Zelldon That is an old benchmark before the bug fix.

@Zelldon
Member

Zelldon commented Jun 7, 2023

Ups thanks @deepthidevaki you're right 👍

@lenaschoenburg lenaschoenburg added the version:8.2.6 Marks an issue as being completely or in parts released in 8.2.6 label Jun 7, 2023
@megglos megglos added the version:8.3.0 Marks an issue as being completely or in parts released in 8.3.0 label Oct 5, 2023

4 participants