
8.2.3 Degradation: Creating an oversized BPMN causes unrecoverable failure #12591

Closed
sergeylebed opened this issue Apr 27, 2023 · 9 comments · Fixed by #12676
Labels
area/ux Marks an issue as related to improving the user experience component/stream-platform kind/bug Categorizes an issue or PR as a bug version:8.2.6 Marks an issue as being completely or in parts released in 8.2.6 version:8.3.0-alpha2 Marks an issue as being completely or in parts released in 8.3.0-alpha2 version:8.3.0 Marks an issue as being completely or in parts released in 8.3.0

@sergeylebed

Describe the bug

An attempt to deploy an oversized BPMN resource (one exceeding the segment limit) causes an unrecoverable failure of Zeebe:

  • further BPMN deployments of valid size are no longer possible
  • eventually, partitions become unhealthy and do not recover

This is a regression from version 8.1.6, which simply rejected the oversized BPMN without further problems.

To Reproduce

Upload a BPMN file larger than the configured maxMessageSize: 64KB
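For reference, the limit involved is the broker's network message size setting; a minimal broker configuration sketch following the standard Zeebe YAML layout (values here are illustrative, matching the 64KB used in this report):

```yaml
zeebe:
  broker:
    network:
      # Maximum size of a single message the broker accepts.
      # Deployments larger than this trigger the failure described in this issue.
      maxMessageSize: 64KB
```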

Expected behavior

The BPMN deployment is rejected with a clear error, and the broker remains healthy.

Log/Stacktrace

2023-04-27 17:03:37.989 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-1] ERROR
      io.camunda.zeebe.broker.process - Unexpected error while processing resource 'f6c0b39d-8357-40ea-8d79-7e2611a89677.bpmn'
io.camunda.zeebe.stream.api.records.ExceededBatchRecordSizeException: Can't append entry: 'RecordBatchEntry[recordMetadata=RecordMetadata{recordType=EVENT, valueType=PROCESS, intent=CREATED}, key=2251799852104669, sourceIndex=-1, unifiedRecordValue={"bpmnProcessId":"id_f6c0b39d-8357-40ea-8d79-7e2611a89677","version":1,"processDefinitionKey":2251799852104669,"resourceName":"f6c0b39d-8357-40ea-8d79-7e2611a89677.bpmn","checksum":"uG1QH8XcklrgFGYZI3HhPg==","resource":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4NCjxicG1uOmRlZmluaXRpb25zIHhtbG5zOmJwbW5kaT0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvQlBNTi8yMDEwMDUyNC9ESSIgeG1sbnM6ZGM9Imh0dHA6Ly93d3cub21nLm9yZy9zcGVjL0RELzIwMTAwNTI0L0RDIiB4bWxuczp6ZWViZT0iaHR0cDovL2NhbXVuZGEub3JnL3NjaGVtYS96ZWViZS8xLjAiIHhtbG5zOmRpPSJodHRwOi8vd3d3Lm9tZy5vcmcvc3BlYy9ERC8yMDEwMDUyNC9ESSIgeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgaWQ9ImlkXzZmNTk2NTlmLWE3OGMtNDMyMy04M2VmLTdlODMwYmJlNTUwNSIgdGFyZ2V0TmFtZXNwYWNlPSJodHRwOi8vYnBtbi5pby9zY2hlbWEvYnBtbiIgZXhwb3J0ZXI9IkNvbmZpcm1pdCBCUE1OIEJ1aWxkZXIiIGV4cG9ydGVyVmVyc2lvbj0iMS4wLjAuMCIgeG1sbnM6YnBtbj0iaHR0cD...' with size: 1010867 this would exceed the maximum batch size. [ currentBatchEntryCount: 0, currentBatchSize: 0]
	at io.camunda.zeebe.stream.impl.records.RecordBatch.appendRecord(RecordBatch.java:67) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.BufferedProcessingResultBuilder.appendRecordReturnEither(BufferedProcessingResultBuilder.java:62) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.api.ProcessingResultBuilder.appendRecord(ProcessingResultBuilder.java:38) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.streamprocessor.writers.ResultBuilderBackedEventApplyingStateWriter.appendFollowUpEvent(ResultBuilderBackedEventApplyingStateWriter.java:40) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.BpmnResourceTransformer.transformProcessResource(BpmnResourceTransformer.java:162) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.BpmnResourceTransformer.lambda$transformResource$0(BpmnResourceTransformer.java:77) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.util.Either$Right.map(Either.java:355) ~[zeebe-util-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.BpmnResourceTransformer.lambda$transformResource$1(BpmnResourceTransformer.java:75) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.util.Either$Right.flatMap(Either.java:366) ~[zeebe-util-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.BpmnResourceTransformer.transformResource(BpmnResourceTransformer.java:65) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.DeploymentTransformer.transformResource(DeploymentTransformer.java:122) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.DeploymentTransformer.transform(DeploymentTransformer.java:98) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.DeploymentCreateProcessor.processRecord(DeploymentCreateProcessor.java:87) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.Engine.process(Engine.java:142) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.ProcessingStateMachine.batchProcessing(ProcessingStateMachine.java:346) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.ProcessingStateMachine.lambda$processCommand$2(ProcessingStateMachine.java:268) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.db.impl.rocksdb.transaction.ZeebeTransaction.run(ZeebeTransaction.java:84) ~[zeebe-db-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.ProcessingStateMachine.processCommand(ProcessingStateMachine.java:268) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.ProcessingStateMachine.tryToReadNextRecord(ProcessingStateMachine.java:227) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.ProcessingStateMachine.readNextRecord(ProcessingStateMachine.java:203) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorJob.invoke(ActorJob.java:92) [zeebe-scheduler-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorJob.execute(ActorJob.java:45) [zeebe-scheduler-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorTask.execute(ActorTask.java:119) [zeebe-scheduler-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorThread.executeCurrentTask(ActorThread.java:106) [zeebe-scheduler-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorThread.doWork(ActorThread.java:87) [zeebe-scheduler-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorThread.run(ActorThread.java:198) [zeebe-scheduler-8.2.3.jar:8.2.3]


Environment:

  • OS: Windows
  • Zeebe Version: 8.2.3

In version 8.1.6 the behavior was correct. Here are the stack traces from that version:

2023-04-27 17:52:05.506 [io.camunda.zeebe.broker.transport.commandapi.CommandApiRequestHandler] [Broker-0-zb-actors-1] ERROR
      io.camunda.zeebe.broker.transport - Unexpected error on writing CREATE command
java.lang.IllegalArgumentException: Expected to claim segment of size 1010866, but can't claim more than 65536 bytes.
	at io.camunda.zeebe.dispatcher.Dispatcher.offer(Dispatcher.java:207) ~[zeebe-dispatcher-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.dispatcher.Dispatcher.claimSingleFragment(Dispatcher.java:143) ~[zeebe-dispatcher-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.logstreams.impl.log.LogStreamWriterImpl.claimLogEntry(LogStreamWriterImpl.java:165) ~[zeebe-logstreams-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.logstreams.impl.log.LogStreamWriterImpl.tryWrite(LogStreamWriterImpl.java:124) ~[zeebe-logstreams-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.broker.transport.commandapi.CommandApiRequestHandler.writeCommand(CommandApiRequestHandler.java:141) ~[zeebe-broker-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.broker.transport.commandapi.CommandApiRequestHandler.handleExecuteCommandRequest(CommandApiRequestHandler.java:114) ~[zeebe-broker-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.broker.transport.commandapi.CommandApiRequestHandler.handle(CommandApiRequestHandler.java:58) ~[zeebe-broker-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.broker.transport.commandapi.CommandApiRequestHandler.handleAsync(CommandApiRequestHandler.java:49) ~[zeebe-broker-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.broker.transport.commandapi.CommandApiRequestHandler.handleAsync(CommandApiRequestHandler.java:27) ~[zeebe-broker-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.broker.transport.AsyncApiRequestHandler.handleRequest(AsyncApiRequestHandler.java:110) ~[zeebe-broker-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.broker.transport.AsyncApiRequestHandler.lambda$onRequest$0(AsyncApiRequestHandler.java:75) ~[zeebe-broker-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorJob.invoke(ActorJob.java:92) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorJob.execute(ActorJob.java:45) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorTask.execute(ActorTask.java:119) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorThread.executeCurrentTask(ActorThread.java:106) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorThread.doWork(ActorThread.java:87) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorThread.run(ActorThread.java:198) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
2023-04-27 17:52:05.508 [io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager] [Broker-0-zb-actors-0] ERROR
      io.camunda.zeebe.gateway - Expected to handle gRPC request, but received an internal error from broker: BrokerError{code=INTERNAL_ERROR, message='Failed writing response: java.lang.IllegalArgumentException: Expected to claim segment of size 1010866, but can't claim more than 65536 bytes.'}
io.camunda.zeebe.gateway.cmd.BrokerErrorException: Received error from broker (INTERNAL_ERROR): Failed writing response: java.lang.IllegalArgumentException: Expected to claim segment of size 1010866, but can't claim more than 65536 bytes.
	at io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager.handleResponse(BrokerRequestManager.java:194) ~[zeebe-gateway-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequestInternal$2(BrokerRequestManager.java:143) ~[zeebe-gateway-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.future.FutureContinuationRunnable.run(FutureContinuationRunnable.java:28) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorJob.invoke(ActorJob.java:94) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorJob.execute(ActorJob.java:45) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorTask.execute(ActorTask.java:119) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorThread.executeCurrentTask(ActorThread.java:106) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorThread.doWork(ActorThread.java:87) ~[zeebe-scheduler-8.1.6.jar:8.1.6]
	at io.camunda.zeebe.scheduler.ActorThread.run(ActorThread.java:198) ~[zeebe-scheduler-8.1.6.jar:8.1.6]

@sergeylebed sergeylebed added the kind/bug Categorizes an issue or PR as a bug label Apr 27, 2023
@Zelldon (Member) commented Apr 28, 2023

Hey @sergeylebed thanks for reporting this!

Looks like a regression @megglos

@Zelldon (Member) commented Apr 28, 2023

@sergeylebed can you confirm that you didn't get an error response in the client? 🤔

Based on the stack trace, it looks like it simply failed in a different place before.

at io.camunda.zeebe.broker.transport.commandapi.CommandApiRequestHandler.writeCommand(CommandApiRequestHandler.java:141) ~[zeebe-broker-8.1.6.jar:8.1.6]

That is in the Command API, when receiving the command and writing it to the dispatcher (before replication and processing). We replaced the dispatcher in 8.2, so it is now possible to write larger entries, but as you can see, processing them is still not possible. Still, I would at least expect you to get an error response. Did you?

@sergeylebed (Author)

I got an error on the client, but:
a) it is a generic timeout error
b) the system becomes inoperable

Grpc: 'DeadlineExceeded' 'Status(StatusCode=\"DeadlineExceeded\", Detail=\"Time out between gateway and broker: Request timed out after PT15S\", DebugException=\"Grpc.Core.Internal.CoreErrorDetailException: {\"created\":\"@1682543232.485000000\",\"description\":\"Error received from peer ipv6:[::1]:26500\",\"file\":\"..\\..\\..\\src\\core\\lib\\surface\\call.cc\",\"file_line\":953,\"grpc_message\":\"Time out between gateway and broker: Request timed out after PT15S\",\"grpc_status\":4}\")'","Exception":"Grpc.Core.RpcException: Status(StatusCode=\"DeadlineExceeded\", Detail=\"Time out between gateway and broker: Request timed out after PT15S\", DebugException=\"Grpc.Core.Internal.CoreErrorDetailException: {\"created\":\"@1682543232.485000000\",\"description\":\"Error received from peer ipv6:[::1]:26500\",\"file\":\"..\\..\\..\\src\\core\\lib\\surface\\call.cc\",\"file_line\":953,\"grpc_message\":\"Time out between gateway and broker: Request timed out after PT15S\",\"grpc_status\":4}\")\r\n   at Zeebe.Client.Impl.Commands.DeployProcessCommand.Send(Nullable`1 timeout, CancellationToken token)\r\n   at  
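Until a broker-side fix lands, a client-side guard can avoid triggering the failure at all by refusing to send resources that cannot fit the broker's message size limit. The class, helper name, and headroom value below are illustrative, not part of the Zeebe client API:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical client-side guard: refuse to deploy resources that would exceed
// the broker's configured maxMessageSize, instead of bricking the partition.
public class DeploymentSizeGuard {
    static final int MAX_MESSAGE_SIZE_BYTES = 64 * 1024; // must match the broker config

    static boolean fitsMessageSize(byte[] resource) {
        // Leave headroom for the record metadata the broker wraps around the payload.
        int headroomBytes = 4 * 1024;
        return resource.length <= MAX_MESSAGE_SIZE_BYTES - headroomBytes;
    }

    public static void main(String[] args) {
        byte[] small = "<bpmn:definitions/>".getBytes(StandardCharsets.UTF_8);
        byte[] huge = new byte[1_010_867]; // the size reported in the stack trace above
        System.out.println(fitsMessageSize(small)); // true
        System.out.println(fitsMessageSize(huge));  // false
    }
}
```

A real client would call such a check before sending the deploy command, turning the DeadlineExceeded timeout into an immediate, descriptive local error.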

@sergeylebed (Author)

The very first error in the log:

2023-04-26 12:49:17.323 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-1] ERROR
      io.camunda.zeebe.broker.process - Unexpected error while processing resource 'f6c0b39d-8357-40ea-8d79-7e2611a89677.bpmn'
io.camunda.zeebe.stream.api.records.ExceededBatchRecordSizeException: Can't append entry: 'RecordBatchEntry[recordMetadata=RecordMetadata{recordType=EVENT, valueType=PROCESS, intent=CREATED}, key=2251799852104669, sourceIndex=-1, unifiedRecordValue={"bpmnProcessId":"id_f6c0b39d-8357-40ea-8d79-7e2611a89677","version":1,"processDefinitionKey":2251799852104669,"resourceName":"f6c0b39d-8357-40ea-8d79-7e2611a89677.bpmn","checksum":"uG1QH8XcklrgFGYZI3HhPg==","resource":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4NCjxicG1uOmRlZmluaXRpb25zIHhtbG5zOmJwbW5kaT0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvQlBNTi8yMDEwMDUyNC9ESSIgeG1sbnM6ZGM9Imh0dHA6Ly93d3cub21nLm9yZy9zcGVjL0RELzIwMTAwNTI0L0RDIiB4bWxuczp6ZWViZT0iaHR0cDovL2NhbXVuZGEub3JnL3NjaGVtYS96ZWViZS8xLjAiIHhtbG5zOmRpPSJodHRwOi8vd3d3Lm9tZy5vcmcvc3BlYy9ERC8yMDEwMDUyNC9ESSIgeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgaWQ9ImlkXzZmNTk2NTlmLWE3OGMtNDMyMy04M2VmLTdlODMwYmJlNTUwNSIgdGFyZ2V0TmFtZXNwYWNlPSJodHRwOi8vYnBtbi5pby9zY2hlbWEvYnBtbiIgZXhwb3J0ZXI9IkNvbmZpcm1pdCBCUE1OIEJ1aWxkZXIiIGV4cG9ydGVyVmVyc2lvbj0iMS4wLjAuMCIgeG1sbnM6YnBtbj0iaHR0cD...' with size: 1010867 this would exceed the maximum batch size. [ currentBatchEntryCount: 0, currentBatchSize: 0]
	at io.camunda.zeebe.stream.impl.records.RecordBatch.appendRecord(RecordBatch.java:67) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.BufferedProcessingResultBuilder.appendRecordReturnEither(BufferedProcessingResultBuilder.java:62) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.api.ProcessingResultBuilder.appendRecord(ProcessingResultBuilder.java:38) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.streamprocessor.writers.ResultBuilderBackedEventApplyingStateWriter.appendFollowUpEvent(ResultBuilderBackedEventApplyingStateWriter.java:40) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.BpmnResourceTransformer.transformProcessResource(BpmnResourceTransformer.java:162) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.BpmnResourceTransformer.lambda$transformResource$0(BpmnResourceTransformer.java:77) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.util.Either$Right.map(Either.java:355) ~[zeebe-util-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.BpmnResourceTransformer.lambda$transformResource$1(BpmnResourceTransformer.java:75) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.util.Either$Right.flatMap(Either.java:366) ~[zeebe-util-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.BpmnResourceTransformer.transformResource(BpmnResourceTransformer.java:65) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.DeploymentTransformer.transformResource(DeploymentTransformer.java:122) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.transform.DeploymentTransformer.transform(DeploymentTransformer.java:98) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.processing.deployment.DeploymentCreateProcessor.processRecord(DeploymentCreateProcessor.java:87) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.engine.Engine.process(Engine.java:142) ~[zeebe-workflow-engine-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.ProcessingStateMachine.batchProcessing(ProcessingStateMachine.java:346) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.ProcessingStateMachine.lambda$processCommand$2(ProcessingStateMachine.java:268) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.db.impl.rocksdb.transaction.ZeebeTransaction.run(ZeebeTransaction.java:84) ~[zeebe-db-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.ProcessingStateMachine.processCommand(ProcessingStateMachine.java:268) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.ProcessingStateMachine.tryToReadNextRecord(ProcessingStateMachine.java:227) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.stream.impl.ProcessingStateMachine.readNextRecord(ProcessingStateMachine.java:203) ~[zeebe-stream-platform-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorJob.invoke(ActorJob.java:92) [zeebe-scheduler-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorJob.execute(ActorJob.java:45) [zeebe-scheduler-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorTask.execute(ActorTask.java:119) [zeebe-scheduler-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorThread.executeCurrentTask(ActorThread.java:106) [zeebe-scheduler-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorThread.doWork(ActorThread.java:87) [zeebe-scheduler-8.2.3.jar:8.2.3]
	at io.camunda.zeebe.scheduler.ActorThread.run(ActorThread.java:198) [zeebe-scheduler-8.2.3.jar:8.2.3]

@sergeylebed (Author)

In version 8.1.6, the client error is different:

Status(StatusCode=\"Internal\", Detail=\"Unexpected error occurred between gateway and broker (code: INTERNAL_ERROR)\", DebugException=\"Grpc.Core.Internal.CoreErrorDetailException: {\"created\":\"@1682632476.120000000\",\"description\":\"Error received from peer ipv6:[::1]:26500\",\"file\":\"..\\..\\..\\src\\core\\lib\\surface\\call.cc\",\"file_line\":953,\"grpc_message\":\"Unexpected error occurred between gateway and broker (code: INTERNAL_ERROR)\",\"grpc_status\":13}\")","Exception":"Grpc.Core.RpcException: Status(StatusCode=\"Internal\", Detail=\"Unexpected error occurred between gateway and broker (code: INTERNAL_ERROR)\", DebugException=\"Grpc.Core.Internal.CoreErrorDetailException: {\"created\":\"@1682632476.120000000\",\"description\":\"Error received from peer ipv6:[::1]:26500\",\"file\":\"..\\..\\..\\src\\core\\lib\\surface\\call.cc\",\"file_line\":953,\"grpc_message\":\"Unexpected error occurred between gateway and broker (code: INTERNAL_ERROR)\",\"grpc_status\":13}\")\r\n   at Zeebe.Client.Impl.Commands.DeployProcessCommand.Send(Nullable`1 timeout, CancellationToken token)

@npepinpe npepinpe added the area/ux Marks an issue as related to improving the user experience label May 4, 2023
@npepinpe (Member) commented May 4, 2023

Under the assumption that doing so bricks your partition unrecoverably, we'll prioritize it as a blocker/critical issue.

@deepthidevaki (Contributor)

I was able to reproduce this by setting maxMessageSize to 1MB in the test CreateDeploymentTest::shouldRejectDeployIfResourceIsTooLarge() https://github.com/camunda/zeebe/blob/main/qa/integration-tests/src/test/java/io/camunda/zeebe/it/client/command/CreateDeploymentTest.java#L28

I see multiple issues here:

  1. The Command API does not reject requests that exceed maxMessageSize. As a result, oversized commands are written to the log stream.
  2. When the engine cannot write the follow-up event because it exceeds the batch size limit, it attempts to write a rejection record containing the whole command. Since the command is already above the batch size limit, the rejection cannot be written to the log stream either. This results in an endless loop in the processing state machine as it tries to handle the error.

To fix this, I propose rejecting the request in the Command API, so that an oversized command is never written to the log stream. I would also suggest revisiting whether we really need to write the whole command into the rejection record.
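The proposed fix can be sketched as an early size check in the request handler, answering the client immediately instead of writing the command to the log stream. The class and method names here are illustrative, not Zeebe's actual internals:

```java
// Illustrative sketch of the proposed Command API guard: reject a command whose
// serialized size exceeds maxMessageSize before it ever reaches the log stream.
// CommandSizeCheck, Status, and Result are hypothetical names, not real Zeebe classes.
public class CommandSizeCheck {
    enum Status { ACCEPTED, REJECTED }

    record Result(Status status, String reason) {}

    private final int maxMessageSize;

    CommandSizeCheck(int maxMessageSize) {
        this.maxMessageSize = maxMessageSize;
    }

    Result check(byte[] serializedCommand) {
        if (serializedCommand.length > maxMessageSize) {
            // Respond to the client right away rather than writing to the log stream,
            // so the processor never sees a command too large to append events for.
            return new Result(Status.REJECTED,
                "Command size " + serializedCommand.length
                    + " exceeds maxMessageSize " + maxMessageSize);
        }
        return new Result(Status.ACCEPTED, null);
    }
}
```

Rejecting at the edge also sidesteps the second issue: since no oversized command reaches the stream, the engine never needs to write a rejection record that itself exceeds the batch limit.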

zeebe-bors-camunda bot added a commit that referenced this issue May 22, 2023
12700: [Backport stable/8.2] fix(broker): reject requests larger than max message size r=deepthidevaki a=backport-action

# Description
Backport of #12676 to `stable/8.2`.

relates to #12591

Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
@sergeylebed (Author)

It does not seem to be fixed in 8.2.5. Do you think it can be merged into the latest versions?

@deepthidevaki (Contributor)

@sergeylebed The fix will be included in 8.2.6.

@lenaschoenburg lenaschoenburg added version:8.3.0-alpha2 Marks an issue as being completely or in parts released in 8.3.0-alpha2 version:8.2.6 Marks an issue as being completely or in parts released in 8.2.6 labels Jun 7, 2023
zeebe-bors-camunda bot added a commit that referenced this issue Jun 30, 2023
13239: Deployment payload size regression tests r=korthout a=korthout

## Description

Adds a regression test for the maximum deployment payload size currently possible on a 3-partition cluster.

This also cleans up the other tests related to the size of the deployment's payload in an effort to clarify what they truly verify:
- `CreateLargeDeploymentTest.shouldRejectDeployIfResourceIsTooLarge`
  - verifies that the incoming request is checked against maxMessageSize
  - is a regression test against #12591
- `DeploymentRejectionTest.shouldRejectDeploymentIfResourceIsTooLarge`
  - verifies that the deployment command is rejected when its resources fit the max message size but still exceed the batch record size (due to follow-up records)
- `CreateDeploymentTest.shouldRejectDeployIfResourceIsTooLarge`
  - verifies that the client's request is rejected when its resources fit the max message size but still exceed the batch record size (due to follow-up records)

> **Note** This pull request explicitly targets `stable/8.2` as all these tests succeed there, but some will not succeed on `main` due to #13233.

## Related issues


relates to #13233 



Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
@megglos megglos added the version:8.3.0 Marks an issue as being completely or in parts released in 8.3.0 label Oct 5, 2023