Multi-instance parallel subprocess times out and never completes for large input collections #8687

lfourky · 2022-01-30T11:27:16Z

Describe the bug
When running a multi-instance parallel subprocess, I've experienced issues such as stalling - for example, with an input count of 5000 elements, Zeebe seems to process a few, then never process the rest.

This can sometime stall all the future processes for the same process ID, as well, I think.

Hint: when run as a sequential subprocess, it runs fine.

To Reproduce
https://github.com/lfourky/zeebe-multiinstance-issue
I've created a repository to demonstrate this issue. The BPMN is quite a simple one and it's using a Go client.
Assuming you've got zbctl installed, you can run:

   make deploy
   make run

to reproduce this locally.

Expected behavior
Expected the process to complete in near future.

Log/Stacktrace

Full Stacktrace

12:11:38.887 [] [grpc-default-worker-ELG-5-6] WARN  io.grpc.netty.NettyServerHandler - Stream Error
io.netty.handler.codec.http2.Http2Exception$StreamException: Stream closed before write could take place
        at io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:172) ~[netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:481) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:357) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:1007) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:963) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:515) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:521) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:613) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:444) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:509) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:259) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) [netty-transport-classes-epoll-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) [netty-transport-classes-epoll-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [netty-transport-classes-epoll-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.73.Final.jar:4.1.73.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.73.Final.jar:4.1.73.Final]
        at java.lang.Thread.run(Unknown Source) [?:?]



12:11:49.564 [io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager] [Broker-0-zb-actors-0] WARN  io.camunda.zeebe.gateway - Failed to activate jobs for type second_data_processor from partition 1
java.util.concurrent.TimeoutException: Request ProtocolRequest{id=5022, subject=command-api-1, sender=0.0.0.0:26502, payload=byte[]{length=136, hash=-1458081928}} to 0.0.0.0:26501 timed out in PT15S
        at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:230) ~[zeebe-atomix-cluster-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.73.Final.jar:4.1.73.Final]
        at java.lang.Thread.run(Unknown Source) ~[?:?]

Environment:

OS: Ubuntu 20 (WSL2)
Zeebe Version: latest develop branch 5932521
Configuration: Standalone broker with default settings, but when deployed in Kubernetes (a bit more custom deployment) it behaves the same way.

The text was updated successfully, but these errors were encountered:

Zelldon · 2022-01-30T11:35:59Z

Hey @lfourky thanks for reporting.

It is very likely that your process instances are blacklisted, due to big input collection. Could you verify that in your broker log? Do you see any error logs? This is related to #2890 and #5221

lfourky · 2022-01-31T17:24:40Z

When calling the process with 5.000 input collection variables, I do not see the blacklist log. I keep seeing these timeout issues which keep repeating over and over again:

Stacktrace

18:18:56.805 [io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager] [Broker-0-zb-actors-0] WARN  io.camunda.zeebe.gateway - Failed to activate jobs for type second_data_processor from partition 1
java.util.concurrent.TimeoutException: Request ProtocolRequest{id=5015, subject=command-api-1, sender=0.0.0.0:26502, payload=byte[]{length=136, hash=-1458081928}} to 0.0.0.0:26501 timed out in PT14.962S
	at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:230) ~[zeebe-atomix-cluster-1.4.0-20220128.182414-45.jar:1.4.0-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.73.Final.jar:4.1.73.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
18:18:56.943 [io.camunda.zeebe.gateway.impl.broker.BrokerRequestManager] [Broker-0-zb-actors-1] WARN  io.camunda.zeebe.gateway - Failed to activate jobs for type second_data_processor from partition 1
java.util.concurrent.TimeoutException: Request ProtocolRequest{id=5016, subject=command-api-1, sender=0.0.0.0:26502, payload=byte[]{length=136, hash=-677107123}} to 0.0.0.0:26501 timed out in PT15S
	at io.atomix.cluster.messaging.impl.NettyMessagingService.lambda$sendAndReceive$4(NettyMessagingService.java:230) ~[zeebe-atomix-cluster-1.4.0-20220128.182414-45.jar:1.4.0-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.73.Final.jar:4.1.73.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
18:18:57.055 [] [grpc-nio-worker-ELG-5-3] WARN  io.grpc.netty.NettyServerHandler - Stream Error
io.netty.handler.codec.http2.Http2Exception$StreamException: Stream closed before write could take place
	at io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:172) ~[netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:481) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:357) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:1007) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:963) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:515) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:521) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:613) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:444) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:509) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:259) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.73.Final.jar:4.1.73.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]
18:18:57.261 [] [grpc-nio-worker-ELG-5-3] WARN  io.grpc.netty.NettyServerHandler - Stream Error

When calling the process with 20.000 input collection variables, I do see the blacklist log:

Stacktrace

23:08:15.440 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] ERROR io.camunda.zeebe.processor - Expected to write one or more follow up events for event 'LoggedEvent [type=0, version=16384, streamId=1, position=6615, key=2251799813687535, timestamp=1643666895256, sourceEventPosition=6610] RecordMetadata{recordType=COMMAND, intentValue=255, intent=ACTIVATE_ELEMENT, requestStreamId=-2147483648, requestId=-1, protocolVersion=3, valueType=PROCESS_INSTANCE, rejectionType=NULL_VAL, rejectionReason=, brokerVersion=1.4.0}' without errors, but exception was thrown.
java.lang.IllegalArgumentException: Expected to claim segment of size 6500888, but can't claim more than 4194304 bytes.
	at io.camunda.zeebe.dispatcher.Dispatcher.offer(Dispatcher.java:194) ~[zeebe-dispatcher-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.dispatcher.Dispatcher.claimFragmentBatch(Dispatcher.java:164) ~[zeebe-dispatcher-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.logstreams.impl.log.LogStreamBatchWriterImpl.claimBatchForEvents(LogStreamBatchWriterImpl.java:222) ~[zeebe-logstreams-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.logstreams.impl.log.LogStreamBatchWriterImpl.tryWrite(LogStreamBatchWriterImpl.java:199) ~[zeebe-logstreams-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.engine.processing.streamprocessor.writers.TypedStreamWriterImpl.flush(TypedStreamWriterImpl.java:106) ~[zeebe-workflow-engine-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.TypedStreamWriterProxy.flush(TypedStreamWriterProxy.java:59) ~[zeebe-workflow-engine-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.lambda$writeEvent$7(ProcessingStateMachine.java:374) ~[zeebe-workflow-engine-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.retry.ActorRetryMechanism.run(ActorRetryMechanism.java:36) ~[zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.retry.AbortableRetryStrategy.run(AbortableRetryStrategy.java:44) ~[zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:79) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:44) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:97) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:80) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
23:08:15.461 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] ERROR io.camunda.zeebe.processor - Expected to process event 'TypedEventImpl{metadata=RecordMetadata{recordType=COMMAND, intentValue=255, intent=ACTIVATE_ELEMENT, requestStreamId=-2147483648, requestId=-1, protocolVersion=3, valueType=PROCESS_INSTANCE, rejectionType=NULL_VAL, rejectionReason=, brokerVersion=1.4.0}, value={"bpmnProcessId":"my_process","version":1,"processDefinitionKey":2251799813685251,"processInstanceKey":2251799813687523,"elementId":"Activity_0kpdzv0","flowScopeKey":2251799813687523,"bpmnElementType":"MULTI_INSTANCE_BODY","parentProcessInstanceKey":-1,"parentElementInstanceKey":-1}}' without errors, but exception occurred with message 'Expected to claim segment of size 6500888, but can't claim more than 4194304 bytes.'.
java.lang.IllegalArgumentException: Expected to claim segment of size 6500888, but can't claim more than 4194304 bytes.
	at io.camunda.zeebe.dispatcher.Dispatcher.offer(Dispatcher.java:194) ~[zeebe-dispatcher-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.dispatcher.Dispatcher.claimFragmentBatch(Dispatcher.java:164) ~[zeebe-dispatcher-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.logstreams.impl.log.LogStreamBatchWriterImpl.claimBatchForEvents(LogStreamBatchWriterImpl.java:222) ~[zeebe-logstreams-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.logstreams.impl.log.LogStreamBatchWriterImpl.tryWrite(LogStreamBatchWriterImpl.java:199) ~[zeebe-logstreams-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.engine.processing.streamprocessor.writers.TypedStreamWriterImpl.flush(TypedStreamWriterImpl.java:106) ~[zeebe-workflow-engine-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.TypedStreamWriterProxy.flush(TypedStreamWriterProxy.java:59) ~[zeebe-workflow-engine-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.lambda$writeEvent$7(ProcessingStateMachine.java:374) ~[zeebe-workflow-engine-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.retry.ActorRetryMechanism.run(ActorRetryMechanism.java:36) ~[zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.retry.AbortableRetryStrategy.run(AbortableRetryStrategy.java:44) ~[zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:79) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:44) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:97) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:80) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189) [zeebe-util-1.4.0-20220131.134754-49.jar:1.4.0-SNAPSHOT]
23:08:15.462 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] WARN  io.camunda.zeebe.broker.logstreams - Blacklist process instance 2251799813687523, due to previous errors.

npepinpe · 2022-02-01T08:59:00Z

Let's try to reproduce it and timebox root causing it as we would normally in our medic process.

oleschoenburg · 2022-02-01T10:27:28Z

@lfourky Thanks to https://github.com/lfourky/zeebe-multiinstance-issue I was able to reproduce the issue locally. It looks like feel is taking a long time to evaluate the input collection expression.

oleschoenburg · 2022-02-01T11:36:28Z

I've looked into this a bit more:
MutliInstanceBodyProcessor.onChildActivating is called for each child instance (so in this example 5000 times). Each time we call ExpressionProcessor.evaluateArrayExpression which makes up the majority of the time spent.

I tried to improve the performance of evaluateArrayExpression (e.g. giving size hints while building up the collections), but couldn't make enough of an impact to solve this issue.

To me this looks like a limitation of how MultiInstanceBodyProcessor is implemented: For each child instance that is created, it has to evaluate the entire input collection.

/cc @npepinpe

lfourky · 2022-02-01T14:33:24Z

Could this be a related issue? #8622

npepinpe · 2022-02-02T09:41:00Z

That one is a bit different, it had to do with the jobs being activated but not making it to the worker, then timing out, as far as I remember.

/cc @saig0 / @korthout - are there any low hanging fruits to help improve the evaluation of the expression? If not, then I classify this as a current limitation related to multi instances (for now anyway). Is there any workaround we can suggest for users running into this?

korthout · 2022-02-02T10:26:41Z

I shortly discussed it with @saig0. We wonder what the actual numbers are on the performance of the evaluateArrayExpression and whether this is really the reason that the process can't make progress.

If it would just be slow, then it would just take some additional time for the processing to finish, but this shouldn't block anything. However, the stacktrace shows that the activate job batch request times out after 15 seconds. @oleschoenburg Did you already try increasing the request timeout or decreasing the job timeout?

Finally, if slowness is the reason, then we have some ideas on improving the performance of this specific expression evaluation. But again there we should first measure before making changes.

An idea is that currently, it evaluates the expression and then transforms each element in the array to message pack. However, on activation of the child, only 1 specific element is needed from this list (the one at the loop counter's index). Perhaps, if that part is taking too much time, we could change how we retrieve this 1 item from the expression result.

oleschoenburg · 2022-02-02T20:18:22Z

I've tried playing around with the timeouts in the go worker:

- client.NewJobWorker().JobType("second_data_processor").Handler(secondParallelMultiInstance).Open()
+ client.NewJobWorker().JobType("second_data_processor").Handler(secondParallelMultiInstance).RequestTimeout(60 * time.Second).Timeout(5 * time.Second).PollInterval(1 * time.Second).Concurrency(1).Open()

but that did not improve the situation at all: no jobs of the multi-instance were activated (or at least they didn't reach the worker). I waited for ~10 minutes.

So as a sanity check I implemented the worker with the java client:

import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.ActivatedJob;
import io.camunda.zeebe.client.api.worker.JobClient;

import java.time.Duration;

public class App {
  public static void main(String[] args) throws InterruptedException {
    try (final var client =
        ZeebeClient.newClientBuilder().gatewayAddress("localhost:26500").usePlaintext().build()) {
      try (final var worker =
          client
              .newWorker()
              .jobType("second_data_processor")
              .handler(
                  (JobClient jobClient, ActivatedJob job) -> {
                    System.out.println("Handling job: " + job.getKey());
                    client.newCompleteCommand(job.getKey()).send().join();
                  })
              .timeout(Duration.ofSeconds(5))
              .requestTimeout(Duration.ofSeconds(60))
              .maxJobsActive(1)
              .pollInterval(Duration.ofSeconds(1))
              .open()) {
        System.out.println("Job worker started");
        Thread.currentThread().join();
      }
    }
  }
}

I'm using the same settings for timeout, requestTimeout, concurrency and pollInterval. Now we are finally activating jobs!
It still takes a good while for the first job to appear (which might be due to #8622) but after the first one we are making steady progress and finish all jobs.

So this leads me to believe that there is either a bug in the go worker implementation, a bug in the go client, or a subtle difference in the (default) configuration used for the java and go clients.

saig0 · 2022-02-03T05:25:34Z

@oleschoenburg can you share some numbers? It would be interesting to see how long it takes to complete the process instance. Also, for different configurations. This could guide users in the right direction.

We could tweak the job worker a bit by increasing the maxJobsActive (e.g. 100 jobs - we expect a lot of them).

npepinpe · 2022-02-03T08:27:03Z

We had some issues in the C# clients with job activation, which we only discovered by using Wireshark - we could then confirm the gateway was sending a response with the jobs, but somehow the job handlers on the C# side never observed them. Could be one lead here to figure out if the bug is on the client or gateway side.

oleschoenburg · 2022-02-03T08:30:19Z

@saig0 Increasing maxJobsActive breaks the java worker - no jobs are activated, at least from the workers point of view. I've only tried 1 and 100 so far, so there is probably a middle ground that works.
Activating the first of the 5000 jobs happens after 2 minutes.

oleschoenburg · 2022-02-03T08:38:29Z

When using maxJobsActive=100 with the java worker, I'm seeing lot's of grpc debug logs that look like the gateway wants to send over data to the worker but can't because the stream is closed.

Worker logs:

Feb 03, 2022 9:35:52 AM io.grpc.internal.AbstractClientStream$TransportState inboundDataReceived
INFO: Received data on closed stream

Zeebe logs:

2022-02-03 09:37:13.181 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] INBOUND HEADERS: streamId=17 headers=GrpcHttp2RequestHeaders[:path: /gateway_protocol.Gateway/ActivateJobs, :authority: localhost:26500, :method: POST, :scheme: http, te: trailers, content-type: application/grpc, user-agent: zeebe-client-java/1.3.2 grpc-java-netty/1.43.0, grpc-accept-encoding: gzip, grpc-timeout: 69999686u] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
2022-02-03 09:37:13.182 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] INBOUND DATA: streamId=17 padding=0 endStream=true length=46 bytes=00000000290a157365636f6e645f646174615f70726f636573736f72120764656661756c74188827206430e0d403
2022-02-03 09:37:13.182 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND PING: ack=false bytes=1234
2022-02-03 09:37:13.184 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] INBOUND PING: ack=true bytes=1234
2022-02-03 09:37:13.246 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND HEADERS: streamId=17 headers=GrpcHttp2OutboundHeaders[:status: 200, content-type: application/grpc, grpc-encoding: identity, grpc-accept-encoding: gzip] padding=0 endStream=false
2022-02-03 09:37:13.256 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=65536 bytes=00004a44ee0aef801b08989382808080800412157365636f6e645f646174615f70726f636573736f72188680808080808004220a6d795f70726f636573732801...
2022-02-03 09:37:13.256 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=3a226e616d655f343934222c22696e646578223a3439342c22657870697265734174223a22323032322d30322d30335430393a34333a32322e30303037353437...
2022-02-03 09:37:13.257 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=34353332372b30313a3030227d2c7b226e616d65223a226e616d655f31303930222c22696e646578223a313039302c22657870697265734174223a2232303232...
2022-02-03 09:37:13.257 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=2d30335431303a30333a30332e3030353532353935322b30313a3030227d2c7b226e616d65223a226e616d655f31363736222c22696e646578223a313637362c...
2022-02-03 09:37:13.258 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=3235392c22657870697265734174223a22323032322d30322d30335431303a31323a34372e3030373031333136332b30313a3030227d2c7b226e616d65223a22...
2022-02-03 09:37:13.258 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=22657870697265734174223a22323032322d30322d30335431303a32323a33332e3030383337303533382b30313a3030227d2c7b226e616d65223a226e616d65...
2022-02-03 09:37:13.259 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=362e3030393734343637352b30313a3030227d2c7b226e616d65223a226e616d655f33343239222c22696e646578223a333432392c2265787069726573417422...
2022-02-03 09:37:13.259 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=30313a3030227d2c7b226e616d65223a226e616d655f34303135222c22696e646578223a343031352c22657870697265734174223a22323032322d30322d3033...
2022-02-03 09:37:13.260 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=70697265734174223a22323032322d30322d30335431303a35313a34362e3031323630353034362b30313a3030227d2c7b226e616d65223a226e616d655f3435...
2022-02-03 09:37:13.261 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=6c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e...
2022-02-03 09:37:13.261 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=70697265734174223a22323032322d30322d30335430393a34333a30382e3030303731353137392b30313a3030227d2c7b226e616d65223a226e616d655f3438...
2022-02-03 09:37:13.262 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=3331372b30313a3030227d2c7b226e616d65223a226e616d655f31303739222c22696e646578223a313037392c22657870697265734174223a22323032322d30...
2022-02-03 09:37:13.263 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=6578223a313636322c22657870697265734174223a22323032322d30322d30335431303a30323a35302e3030353439333936342b30313a3030227d2c7b226e61...
2022-02-03 09:37:13.263 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=22657870697265734174223a22323032322d30322d30335431303a31323a33362e3030363938353932352b30313a3030227d2c7b226e616d65223a226e616d65...
2022-02-03 09:37:13.264 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=226e616d65223a226e616d655f32383332222c22696e646578223a323833322c22657870697265734174223a22323032322d30322d30335431303a32323a3230...
2022-02-03 09:37:13.264 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=393732313833372b30313a3030227d2c7b226e616d65223a226e616d655f33343138222c22696e646578223a333431382c22657870697265734174223a223230...
2022-02-03 09:37:13.265 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=3032322d30322d30335431303a34313a34392e3031313231333330392b30313a3030227d2c7b226e616d65223a226e616d655f34303032222c22696e64657822...
2022-02-03 09:37:13.266 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=734174223a22323032322d30322d30335431303a35313a33352e3031323538313738392b30313a3030227d2c7b226e616d65223a226e616d655f34353838222c...
2022-02-03 09:37:13.267 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=6c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e...
2022-02-03 09:37:13.268 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=3639222c22696e646578223a3436392c22657870697265734174223a22323032322d30322d30335430393a34323a35372e3030303638353633362b30313a3030...
2022-02-03 09:37:13.269 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=4174223a22323032322d30322d30335430393a35323a35332e3030323537383133392b30313a3030227d2c7b226e616d65223a226e616d655f31303636222c22...
2022-02-03 09:37:13.269 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=3a313635312c22657870697265734174223a22323032322d30322d30335431303a30323a33392e3030353436383638322b30313a3030227d2c7b226e616d6522...
2022-02-03 09:37:13.271 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=616d65223a226e616d655f32323335222c22696e646578223a323233352c22657870697265734174223a22323032322d30322d30335431303a31323a32332e30...
2022-02-03 09:37:13.272 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=65223a226e616d655f32383231222c22696e646578223a323832312c22657870697265734174223a22323032322d30322d30335431303a32323a30392e303038...
2022-02-03 09:37:13.273 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=65734174223a22323032322d30322d30335431303a33313a35322e3030393639333230322b30313a3030227d2c7b226e616d65223a226e616d655f3334303522...
2022-02-03 09:37:13.273 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=322d30335431303a34313a33382e3031313138393335332b30313a3030227d2c7b226e616d65223a226e616d655f33393931222c22696e646578223a33393931...
2022-02-03 09:37:13.274 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=655f34353734222c22696e646578223a343537342c22657870697265734174223a22323032322d30322d30335431303a35313a32322e3031323535343036322b...
2022-02-03 09:37:13.275 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e756c6c2c6e75...
2022-02-03 09:37:13.275 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49152 bytes=223a22323032322d30322d30335430393a34323a34332e3030303634323638342b30313a3030227d2c7b226e616d65223a226e616d655f343536222c22696e64...
2022-02-03 09:37:13.276 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=48951 bytes=323032322d30322d30335430393a35323a34322e3030323534303938342b30313a3030227d2c7b226e616d65223a226e616d655f31303535222c22696e646578...
2022-02-03 09:37:13.277 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=47142 bytes=7d2c7b226e616d65223a226e616d655f31363338222c22696e646578223a313633382c22657870697265734174223a22323032322d30322d30335431303a3032...
2022-02-03 09:37:13.289 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] INBOUND PING: ack=false bytes=1234
2022-02-03 09:37:13.289 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND PING: ack=true bytes=1234
2022-02-03 09:37:13.290 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] INBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=1015808
2022-02-03 09:37:13.290 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=65423 bytes=65223a226e616d655f32323030222c22696e646578223a323230302c22657870697265734174223a22323032322d30322d30335431303a31313a34382e303036...
2022-02-03 09:37:13.290 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49064 bytes=65223a226e616d655f32393830222c22696e646578223a323938302c22657870697265734174223a22323032322d30322d30335431303a32343a34382e303038...
2022-02-03 09:37:13.290 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49039 bytes=2b30313a3030227d2c7b226e616d65223a226e616d655f33353635222c22696e646578223a333536352c22657870697265734174223a22323032322d30322d30...
2022-02-03 09:37:13.290 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49064 bytes=30335431303a34343a31372e3031313539303539342b30313a3030227d2c7b226e616d65223a226e616d655f34313530222c22696e646578223a343135302c22...
2022-02-03 09:37:13.290 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49039 bytes=70697265734174223a22323032322d30322d30335431303a35343a30322e3031323930363632332b30313a3030227d2c7b226e616d65223a226e616d655f3437...
2022-02-03 09:37:13.291 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49064 bytes=322d30335430393a33353a32352e3939393039323132362b30313a3030227d2c7b226e616d65223a226e616d655f3139222c22696e646578223a31392c226578...
2022-02-03 09:37:13.291 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49039 bytes=30313138303734352b30313a3030227d2c7b226e616d65223a226e616d655f363230222c22696e646578223a3632302c22657870697265734174223a22323032...
2022-02-03 09:37:13.291 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49064 bytes=30322d30335430393a35353a32312e3030333030303832322b30313a3030227d2c7b226e616d65223a226e616d655f31323134222c22696e646578223a313231...
2022-02-03 09:37:13.291 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=49039 bytes=382c22657870697265734174223a22323032322d30322d30335431303a30353a30362e30303538323638332b30313a3030227d2c7b226e616d65223a226e616d...
2022-02-03 09:37:13.291 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] OUTBOUND DATA: streamId=17 padding=0 endStream=false length=34690 bytes=226e616d655f32333833222c22696e646578223a323338332c22657870697265734174223a22323032322d30322d30335431303a31343a35312e303037323939...
2022-02-03 09:37:13.292 [] [grpc-default-worker-ELG-5-5] DEBUG
      io.grpc.netty.NettyServerHandler - [id: 0xb53c85b6, L:/127.0.0.1:26500 - R:/127.0.0.1:56856] INBOUND RST_STREAM: streamId=17 errorCode=8
2022-02-03 09:37:13.292 [] [grpc-default-worker-ELG-5-5] WARN 
      io.grpc.netty.NettyServerHandler - Stream Error
io.netty.handler.codec.http2.Http2Exception$StreamException: Stream closed before write could take place
	at io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:172) ~[netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:481) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:357) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:1007) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:963) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:515) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:521) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:613) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:444) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:509) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:259) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) [netty-codec-http2-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) [netty-transport-classes-epoll-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) [netty-transport-classes-epoll-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [netty-transport-classes-epoll-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.73.Final.jar:4.1.73.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.73.Final.jar:4.1.73.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]

npepinpe · 2022-02-03T08:47:31Z

Yes, that's a known issue, but unless we know of a quick fix I'd rather try to convince our product team to allocate resources to redesign the job worker end-to-end (i.e. including gateway/broker) for a smoother experience. I guess my point here is, don't dig too deep into that unless it's immediately relevant to the issue at hand, if that makes sense 😄

npepinpe · 2022-02-08T08:45:14Z

I'll leave it in the backlog for now. It seems to be issues mostly with job activation, and I would like that we work on the job activation pipeline in general, as it has more than one such issue, and no real quick fixes.

npepinpe · 2022-04-14T07:44:55Z

Would this be fixed by #8879 ? What was the end consensus, the issue was with the expression or the job activation?

oleschoenburg · 2022-04-14T07:54:26Z

I think it was mostly the job activation that was the problem here. Thanks for the reminder, I actually wanted to test this again 👍

oleschoenburg · 2022-04-14T09:04:39Z

For an input collection of 1000 elements everything looks as expected but for 2000 I'm still getting pretty much the same behavior.

Zeebe logs a bunch of StreamExceptions and the worker is only getting a handful of jobs.

2022-04-14 09:00:27.903 [] [grpc-default-worker-ELG-5-3] WARN 
      io.grpc.netty.NettyServerHandler - Stream Error
io.netty.handler.codec.http2.Http2Exception$StreamException: Stream closed before write could take place

2022/04/14 11:00:12 started process: 2251799813696325
2022/04/14 11:00:12 first job handler called
2022/04/14 11:00:39 second job handler called
2022/04/14 11:00:39 second job handler called
2022/04/14 11:00:39 second job handler called
2022/04/14 11:00:39 second job handler called
2022/04/14 11:00:39 second job handler called
2022/04/14 11:00:39 second job handler called
2022/04/14 11:00:39 second job handler called
2022/04/14 11:00:39 second job handler called
2022/04/14 11:00:40 second job handler called
2022/04/14 11:00:40 second job handler called
2022/04/14 11:00:40 second job handler called
2022/04/14 11:00:40 second job handler called

npepinpe · 2022-04-14T09:10:48Z

Yes, but now the jobs should be returned to the broker and be available for activation again, so overall it the hope is perceived performance is better now. Is that the case?

oleschoenburg · 2022-04-14T09:24:24Z

I don't think so. There's just nothing happening. I'd expect that the same worker gets job once they are re-activated, right?

npepinpe · 2022-04-14T12:50:07Z

Yeah, that's the expectation 😞

korthout · 2022-07-21T14:30:28Z

This issue came up in a support case. As a workaround users can try to reduce the maxActiveJobs configuration setting (default in java client, config in java client) to a low value (e.g. 1, 10, or 20).

npepinpe · 2022-07-22T07:22:00Z

Can you refresh my memory here - the issue is that we spend too long busy-looping over the same multi-instance activity because the collection is too big? Correct?

korthout · 2022-07-25T12:42:51Z

I'm not sure, I just tried the workaround and thought it makes sense to highlight it here. @oleschoenburg do you remember the cause?

oleschoenburg · 2022-07-25T13:01:10Z

If I remember correctly, the cause was unclear to me. I had initially suspected that evaluating the input collection (which involves FEEL) was too slow but I later discarded that theory, probably based on what I saw when I experimented with smaller input collections.

korthout · 2022-08-30T09:20:39Z

Marking priority as later because improving 'job activation' and 'multi-instance for large collections' are not the main concerns for the process automation team right now. However, we should work on this when we work on

I can spawn inner instances for a large input collection #2890

Note that we're currently already improving multi-instance for large collections, with:

Create incident on multi instance creation to avoid banned instances #5221

So we could consider increasing the priority to upcoming. @saig0 @remcowesterhoud What do you think?

Please comment if you think this should have a higher priority.

korthout · 2023-08-04T17:24:19Z

@lfourky I've had another look at this bug. Using your example, I'm able to reproduce this on zeebe version 8.2.11.

I had to make some small adjustments:

go.mod:

-require github.com/camunda-cloud/zeebe/clients/go v1.3.2
+require github.com/camunda/zeebe/clients/go/v8 v8.2.11

main.go:

-       "github.com/camunda-cloud/zeebe/clients/go/pkg/entities"
-       "github.com/camunda-cloud/zeebe/clients/go/pkg/worker"
-       "github.com/camunda-cloud/zeebe/clients/go/pkg/zbc"
+       "github.com/camunda/zeebe/clients/go/v8/pkg/entities"
+       "github.com/camunda/zeebe/clients/go/v8/pkg/worker"
+       "github.com/camunda/zeebe/clients/go/v8/pkg/zbc"

Running it, I noticed that the problem is not the large input collection but the large number of variables passed along with each job. The case works as follows:

a large variable called inputCollection is created at the process scope by the Data Loader's job worker
this variable is used as input collection for the parallel multi-instance, so far so good
this variable is sent along to each activated job in an activate jobs batch response
the gateway fails to hand over the jobs to the client

Solution 1:

So, then I got to the first idea. Just don't send the inputCollection along with each job. You can achieve this by setting the FetchVariables for the worker so that only the inputElement variable is fetched:

client.NewJobWorker().JobType("second_data_processor").Handler(secondParallelMultiInstance).FetchVariables("inputElement").Open()

This solves the problem by reducing the amount of data that needs to be transmitted to the worker for each batch of activated jobs.

Solution 2:

But I wondered, how can it be that the gateway cannot send the jobs to the client when there are large variables sent along with the activated jobs?

I was able to complete the process by specifying the client's inbound MaxMsgSize to 8 MB:

client, err := zbc.NewClient(&zbc.ClientConfig{
	GatewayAddress:         gatewayAddress,
	UsePlaintextConnection: true,
	DialOpts:               []grpc.DialOption{grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(8 * 1024 * 1024))},
})

So, it seems that the problem is that the broker can produce responses larger than the expected maximum of 4MB (default). I'm not yet sure whether this is a bug or expected behavior.

@npepinpe What do you think?

@lfourky You can find a working version of your example using solution 2 here.

korthout · 2023-08-04T18:27:12Z

So, we have two workarounds available:

don't fetch so many variables
increase the inbound max message size

🤔 However, I'm still unsure whether it is expected that the gateway can send the client an ActivateJobBatch response that is larger than the default 4MB.

npepinpe · 2023-08-07T08:33:00Z

So, it seems that the problem is that the broker can produce responses larger than the expected maximum of 4MB (default). I'm not yet sure whether this is a bug or expected behavior.

Since the response goes through different encodings, it's possible that it was very close to 4MB on the broker/Raft side, but once transformed to gRPC is larger than 4MB. I can't really think of another reason (other than misconfiguration with different max message sizes, but I assume that's not it here).

I would expect that if we fail to forward the activated job, we would yield it back to the engine.

korthout · 2023-08-08T16:30:17Z

I would expect that if we fail to forward the activated job, we would yield it back to the engine.

That happens as expected, yielding this way is still implemented using FailJob. In the situation above, this simply leads to loads of traffic:

collecting and activating jobs with large variables
failing them when unable to forward
collecting again
etc
sometimes some jobs make it through (somehow the size of available jobs at that moment in time or the size of the variables together fits in a response that the client can receive)

I'm not yet sure whether this is a bug or expected behavior.

it's possible that it was very close to 4MB on the broker/Raft side, but once transformed to gRPC is larger than 4MB.

@npepinpe Do you think that's expected behavior? Or should we avoid this from happening? For example, we could further limit the max size of jobs collected in a job batch record. That would be at the cost of payload size (users might expect to activate a single job with a single large variable near the configured max message size)

EDIT: If we believe this is expected behavior. Then I'll close this issue

remcowesterhoud · 2023-08-08T16:48:48Z

For example, we could further limit the max size of jobs collected in a job batch record.

This sounds a lot like what I did recently for a different bug 😄

#13624

lfourky added the kind/bug Categorizes an issue or PR as a bug label Jan 30, 2022

npepinpe added the blocker/info Marks an issue as blocked, awaiting more information from the author label Jan 31, 2022

oleschoenburg self-assigned this Feb 1, 2022

lfourky changed the title ~~Multi-instance parallel subprocess times out and never completes when for large inputs~~ Multi-instance parallel subprocess times out and never completes for large inputs Feb 1, 2022

lfourky changed the title ~~Multi-instance parallel subprocess times out and never completes for large inputs~~ Multi-instance parallel subprocess times out and never completes for large input collections Feb 1, 2022

oleschoenburg removed their assignment Feb 1, 2022

pihme added the team/process-automation label Jun 3, 2022

menski removed the team/process-automation label Jul 11, 2022

korthout mentioned this issue Aug 30, 2022

I can spawn inner instances for a large input collection #2890

Closed

Zelldon added the component/engine label Jan 5, 2023

romansmirnov added the component/zeebe Related to the Zeebe component/team label Mar 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi-instance parallel subprocess times out and never completes for large input collections #8687

Multi-instance parallel subprocess times out and never completes for large input collections #8687

lfourky commented Jan 30, 2022 •

edited

Zelldon commented Jan 30, 2022

lfourky commented Jan 31, 2022 •

edited

npepinpe commented Feb 1, 2022

oleschoenburg commented Feb 1, 2022

oleschoenburg commented Feb 1, 2022

lfourky commented Feb 1, 2022

npepinpe commented Feb 2, 2022 •

edited

korthout commented Feb 2, 2022

oleschoenburg commented Feb 2, 2022 •

edited

saig0 commented Feb 3, 2022

npepinpe commented Feb 3, 2022

oleschoenburg commented Feb 3, 2022

oleschoenburg commented Feb 3, 2022

npepinpe commented Feb 3, 2022

npepinpe commented Feb 8, 2022

npepinpe commented Apr 14, 2022 •

edited

oleschoenburg commented Apr 14, 2022

oleschoenburg commented Apr 14, 2022

npepinpe commented Apr 14, 2022

oleschoenburg commented Apr 14, 2022

npepinpe commented Apr 14, 2022

korthout commented Jul 21, 2022

npepinpe commented Jul 22, 2022

korthout commented Jul 25, 2022

oleschoenburg commented Jul 25, 2022

korthout commented Aug 30, 2022

korthout commented Aug 4, 2023 •

edited

korthout commented Aug 4, 2023

npepinpe commented Aug 7, 2023

korthout commented Aug 8, 2023 •

edited

remcowesterhoud commented Aug 8, 2023

Multi-instance parallel subprocess times out and never completes for large input collections #8687

Multi-instance parallel subprocess times out and never completes for large input collections #8687

Comments

lfourky commented Jan 30, 2022 • edited

Zelldon commented Jan 30, 2022

lfourky commented Jan 31, 2022 • edited

npepinpe commented Feb 1, 2022

oleschoenburg commented Feb 1, 2022

oleschoenburg commented Feb 1, 2022

lfourky commented Feb 1, 2022

npepinpe commented Feb 2, 2022 • edited

korthout commented Feb 2, 2022

oleschoenburg commented Feb 2, 2022 • edited

saig0 commented Feb 3, 2022

npepinpe commented Feb 3, 2022

oleschoenburg commented Feb 3, 2022

oleschoenburg commented Feb 3, 2022

npepinpe commented Feb 3, 2022

npepinpe commented Feb 8, 2022

npepinpe commented Apr 14, 2022 • edited

oleschoenburg commented Apr 14, 2022

oleschoenburg commented Apr 14, 2022

npepinpe commented Apr 14, 2022

oleschoenburg commented Apr 14, 2022

npepinpe commented Apr 14, 2022

korthout commented Jul 21, 2022

npepinpe commented Jul 22, 2022

korthout commented Jul 25, 2022

oleschoenburg commented Jul 25, 2022

korthout commented Aug 30, 2022

korthout commented Aug 4, 2023 • edited

Solution 1:

Solution 2:

korthout commented Aug 4, 2023

npepinpe commented Aug 7, 2023

korthout commented Aug 8, 2023 • edited

remcowesterhoud commented Aug 8, 2023

lfourky commented Jan 30, 2022 •

edited

lfourky commented Jan 31, 2022 •

edited

npepinpe commented Feb 2, 2022 •

edited

oleschoenburg commented Feb 2, 2022 •

edited

npepinpe commented Apr 14, 2022 •

edited

korthout commented Aug 4, 2023 •

edited

korthout commented Aug 8, 2023 •

edited