Too big deployment can cause problems on distribution #5776

Closed
Zelldon opened this issue Nov 6, 2020 · 3 comments
Labels
area/reliability: Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected)
kind/bug: Categorizes an issue or PR as a bug
scope/broker: Marks an issue or PR to appear in the broker section of the changelog
severity/mid: Marks a bug as having a noticeable impact but with a known workaround

Comments

@Zelldon
Member

Zelldon commented Nov 6, 2020

Describe the bug

If a deployment comes close to the maxMessageSize, distribution can fail: the CREATE command that is distributed to the other partitions contains more content than the initial CREATE command. The result is a deployed workflow that can be started on partition one, but not on the other partitions.
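To make the failure mode concrete, here is a minimal arithmetic sketch. All byte counts below are made up for illustration; only the 4 MiB limit matches the "can't claim more than 4194304 bytes" figure in the rejection message quoted later in the thread.

```java
// Illustrative only: the concrete byte counts are assumptions, not measured
// values. The point is that the distributed CREATE carries more content than
// the initial CREATE, so a deployment that fits on partition 1 can exceed the
// limit when it is forwarded to the other partitions.
class DeploymentDistributionSize {
  // mirrors the "can't claim more than 4194304 bytes" from the error below
  static final int MAX_CLAIM = 4 * 1024 * 1024;

  public static void main(String[] args) {
    int initialCreate = 4_000_000;      // hypothetical: raw resources, just fits
    int distributionOverhead = 400_000; // hypothetical: extra content added for distribution
    int distributedCreate = initialCreate + distributionOverhead;

    System.out.println(initialCreate <= MAX_CLAIM);     // true  -> accepted on partition 1
    System.out.println(distributedCreate <= MAX_CLAIM); // false -> fails on the other partitions
  }
}
```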

To Reproduce

```java
// Imports reconstructed for readability; the Zeebe test-harness packages
// (ClusteringRule, GrpcClientRule) are assumptions based on the qa module layout.
import static org.assertj.core.api.Assertions.assertThat;

import io.zeebe.broker.it.clustering.ClusteringRule;
import io.zeebe.broker.it.util.GrpcClientRule;
import io.zeebe.model.bpmn.Bpmn;
import io.zeebe.model.bpmn.BpmnModelInstance;
import io.zeebe.protocol.record.Record;
import io.zeebe.protocol.record.intent.WorkflowInstanceIntent;
import io.zeebe.test.util.record.RecordingExporter;
import java.util.stream.Collectors;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.RuleChain;
import org.junit.rules.Timeout;

public final class DeploymentClusteredTest {

  private static final BpmnModelInstance WORKFLOW =
      Bpmn.createExecutableProcess("process").startEvent().endEvent().done();

  public final Timeout testTimeout = Timeout.seconds(120);
  public final ClusteringRule clusteringRule =
      new ClusteringRule(3, 3, 3, cfg -> cfg.getData().setUseMmap(false));
  public final GrpcClientRule clientRule = new GrpcClientRule(clusteringRule);

  @Rule
  public RuleChain ruleChain =
      RuleChain.outerRule(testTimeout).around(clusteringRule).around(clientRule);

  @Test
  public void shouldDeployWorkflowAndCreateInstances() {
    // when
    final var workflowKey =
        clientRule.deployWorkflow(
            Bpmn.readModelFromStream(
                this.getClass().getResourceAsStream("/workflows/bigone-task-process.bpmn")));

    final var workflowInstanceKeys =
        clusteringRule.getPartitionIds().stream()
            .map(
                partitionId ->
                    clusteringRule.createWorkflowInstanceOnPartition(partitionId, "process"))
            .collect(Collectors.toList());

    // then
    assertThat(
            RecordingExporter.workflowInstanceRecords(WorkflowInstanceIntent.ELEMENT_COMPLETED)
                .filterRootScope()
                .withWorkflowKey(workflowKey)
                .limit(clusteringRule.getPartitionCount()))
        .extracting(Record::getKey)
        .containsExactlyInAnyOrderElementsOf(workflowInstanceKeys);
  }
}
```

model.zip

Expected behavior

That the deployment is rejected, maybe?

Environment:

  • OS: arch
  • Zeebe Version: snapshot
  • Configuration: [e.g. exporters etc.]
@Zelldon Zelldon added kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog Impact: Availability severity/mid Marks a bug as having a noticeable impact but with a known workaround labels Nov 6, 2020
@npepinpe
Member

npepinpe commented Nov 9, 2020

What does the user/client see in this case? Is this something users can "fix"?

@npepinpe
Member

It looks like this is the error the user gets:

Command 'CREATE' rejected with code 'PROCESSING_ERROR': Expected to process event 'TypedEventImpl{metadata=RecordMetadata{recordType=COMMAND, intentValue=255, intent=CREATE, requestStreamId=1, requestId=0, protocolVersion=3, valueType=DEPLOYMENT, rejectionType=NULL_VAL, rejectionReason=, brokerVersion=1.2.0}, value={"resources":[{"resourceName":"process.bpmn","resource":"PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8+CjxicG1uOmRlZmluaXRpb25zIHhtbG5zOmJwbW49Imh0dHA6Ly93d3cub21nLm9yZy9zcGVjL0JQTU4vMjAxMDA1MjQvTU9ERUwiIHhtbG5zOmJwbW5kaT0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvQlBNTi8yMDEwMDUyNC9ESSIgeG1sbnM6ZGM9Imh0dHA6Ly93d3cub21nLm9yZy9zcGVjL0RELzIwMTAwNTI0L0RDIiB4bWxuczpkaT0iaHR0cDovL3d3dy5vbWcub3JnL3NwZWMvREQvMjAxMDA1MjQvREkiIHhtbG5zOnhzaT0iaHR0cDovL3d3dy53My5vcmcvMjAwMS9YTUxTY2hlbWEtaW5zdGFuY2UiIHhtbG5zOnplZWJlPSJodHRwOi8vY2FtdW5kYS5vcmcvc2NoZW1hL3plZWJlLzEuMCIgZXhwb3J0ZXI9IkNhbXVuZGEgTW9kZWxlciIgZXhwb3J0ZXJWZXJzaW9uPSIxLjguMiIgZXhwcmVzc2lvbkxhbmd1YWdlPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L1hQYXRoIiBpZD0iRGVmaW5pdGlvbnNfMSIgdGFyZ2V0TmFtZXNwYWNlPSJodHRwOi8vYnBtbi5pby9zY2hlbWEvYnBtbiIgdHlwZUxhbmd1YWdlPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYSI+CiAgICAKICA8YnBtbjpwcm9jZXNzIGlkPSJwcm9jZXNzIiBpc0Nsb3NlZD0iZmFsc2UiIGlzRXhlY3V0YWJsZT0idHJ1ZSIgcHJvY2Vzc1R5cGU9Ik5vbmUiPgogICAgICAgIAogICAgPGJwbW46c3RhcnRFdmVudCB...}' without errors, but exception occurred with message 'Expected to claim segment of size 8358688, but can't claim more than 4194304 bytes.'.

Which somewhat hints that it's a size issue, but I don't think it's very clear to the user what they have to do. I would argue this isn't deployment-specific: any time we fail to grab a segment on the dispatcher due to size, this error will be returned. I imagine this can happen while processing other commands as well (even if it's less likely).

In this specific case, we could solve it by checking the size before even writing to the dispatcher; since the record is much bigger than a segment, it's obvious the claim will fail. If it instead happens later, during enrichment, the failure is internal, which is more of a problem, and I'm not sure how it would end up being reported. IMO this falls under the broader topic of dealing with maximum message sizes, and I'd like to tackle that at a higher level. For this particular issue, I'm not sure how we could improve the report to the user, since this is generic error handling.
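A pre-check along the lines suggested above could look roughly like this. This is a hypothetical sketch, not the actual broker code: the class name, method, and rejection message are invented, and only the 4 MiB segment size and the 8,358,688-byte record length come from the error quoted earlier.

```java
// Hypothetical sketch of the suggested pre-check: reject a record that can
// never fit in a dispatcher segment before trying to claim one, so the user
// gets a targeted rejection instead of a generic PROCESSING_ERROR.
class SegmentSizeGuard {
  private final int maxFragmentLength;

  SegmentSizeGuard(int maxFragmentLength) {
    this.maxFragmentLength = maxFragmentLength;
  }

  // returns a rejection reason, or null if the record fits
  String check(int recordLength) {
    if (recordLength > maxFragmentLength) {
      return "Expected to write a record of size " + recordLength
          + " bytes, but the maximum is " + maxFragmentLength
          + " bytes; deploy a smaller resource or raise the message size limit.";
    }
    return null;
  }

  public static void main(String[] args) {
    var guard = new SegmentSizeGuard(4 * 1024 * 1024);
    System.out.println(guard.check(1_024) == null);     // true: small record passes
    System.out.println(guard.check(8_358_688) == null); // false: the failing size from the error above
  }
}
```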

@npepinpe npepinpe added area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) and removed Impact: Availability labels Apr 11, 2022
@Zelldon
Member Author

Zelldon commented Aug 1, 2022

This no longer seems to be the case: @korthout tried to reproduce the issue while preparing a game day. I will close this for now.

@Zelldon Zelldon closed this as completed Aug 1, 2022
github-merge-queue bot pushed a commit that referenced this issue Mar 14, 2024