Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IllegalStateException when writing decision evaluation event #9272

Closed
remcowesterhoud opened this issue May 2, 2022 · 6 comments · Fixed by #9466
Closed

IllegalStateException when writing decision evaluation event #9272

remcowesterhoud opened this issue May 2, 2022 · 6 comments · Fixed by #9466
Assignees
Labels
area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) kind/bug Categorizes an issue or PR as a bug severity/mid Marks a bug as having a noticeable impact but with a known workaround version:8.1.0-alpha2 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Milestone

Comments

@remcowesterhoud
Copy link
Contributor

remcowesterhoud commented May 2, 2022

Describe the bug

When trying to write the decision evaluation event an IllegalArgumentException is thrown. This is because when searching for decision by decision requirements key multiple results with the same decision id are returned:

    final var decisionKeysByDecisionId =
        decisionState
            .findDecisionsByDecisionRequirementsKey(decision.getDecisionRequirementsKey())
            .stream()
            .collect(
                Collectors.toMap(
                    persistedDecision -> bufferAsString(persistedDecision.getDecisionId()),
                    DecisionInfo::new));

These duplicate decision id cause the toMap function to fail, as no merge function is provided.

The found decisions do all have a different version.

To Reproduce

It was a challenge to reproduce this issue but I found a way to do this. It requires 2 DRD's that both contain a decision with the same id and a process which contains a business rule task referencing this decision id.

Repro files.zip

Next follow these steps:

  1. Deploy translateDay.dmn
  2. Deploy translateMonth.dmn
  3. Without making any changes redeploy translateDay.dmn
  4. Deploy translateProcess.dmn
  5. Start a PI: zbctl create instance translateProcess --insecure --variables '{"day":"monday","month":"april"}'

At this point an exception should be thrown.

Expected behavior

No exception should occur.

Log/Stacktrace

Full Stacktrace

java.lang.IllegalStateException: Duplicate key translate (attempted merging values DecisionInfo[key=2251799813685250, version=1] and DecisionInfo[key=2251799813685255, version=3])
	at java.util.stream.Collectors.duplicateKeyException(Collectors.java:135) ~[?:?]
	at java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:182) ~[?:?]
	at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[?:?]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[?:?]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.BpmnDecisionBehavior.writeDecisionEvaluationEvent(BpmnDecisionBehavior.java:233) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.BpmnDecisionBehavior.lambda$evaluateDecision$3(BpmnDecisionBehavior.java:114) ~[classes/:?]
	at io.camunda.zeebe.util.Either$Right.flatMap(Either.java:366) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.behavior.BpmnDecisionBehavior.evaluateDecision(BpmnDecisionBehavior.java:109) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.task.BusinessRuleTaskProcessor$CalledDecisionBehavior.lambda$onActivate$0(BusinessRuleTaskProcessor.java:89) ~[classes/:?]
	at io.camunda.zeebe.util.Either$Right.flatMap(Either.java:366) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.task.BusinessRuleTaskProcessor$CalledDecisionBehavior.onActivate(BusinessRuleTaskProcessor.java:89) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.task.BusinessRuleTaskProcessor.onActivate(BusinessRuleTaskProcessor.java:40) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.task.BusinessRuleTaskProcessor.onActivate(BusinessRuleTaskProcessor.java:21) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.lambda$processEvent$2(BpmnStreamProcessor.java:128) ~[classes/:?]
	at io.camunda.zeebe.util.Either$Right.ifRightOrLeft(Either.java:381) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.processEvent(BpmnStreamProcessor.java:127) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.lambda$processRecord$0(BpmnStreamProcessor.java:110) ~[classes/:?]
	at io.camunda.zeebe.util.Either$Right.ifRightOrLeft(Either.java:381) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.bpmn.BpmnStreamProcessor.processRecord(BpmnStreamProcessor.java:107) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.streamprocessor.TypedRecordProcessor.processRecord(TypedRecordProcessor.java:58) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.lambda$processInTransaction$3(ProcessingStateMachine.java:300) ~[classes/:?]
	at io.camunda.zeebe.db.impl.rocksdb.transaction.ZeebeTransaction.run(ZeebeTransaction.java:84) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.processInTransaction(ProcessingStateMachine.java:290) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.processCommand(ProcessingStateMachine.java:253) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.tryToReadNextRecord(ProcessingStateMachine.java:213) ~[classes/:?]
	at io.camunda.zeebe.engine.processing.streamprocessor.ProcessingStateMachine.readNextRecord(ProcessingStateMachine.java:189) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorJob.invoke(ActorJob.java:79) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorJob.execute(ActorJob.java:44) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:97) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorThread.doWork(ActorThread.java:80) ~[classes/:?]
	at io.camunda.zeebe.util.sched.ActorThread.run(ActorThread.java:189) ~[classes/:?]

Environment:

  • OS: Cloud
  • Zeebe Version: Seen on 8.0.0, also happens on latest main
@remcowesterhoud remcowesterhoud added kind/bug Categorizes an issue or PR as a bug area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) team/process-automation labels May 2, 2022
@remcowesterhoud
Copy link
Contributor Author

remcowesterhoud commented May 2, 2022

My hypotheses on what happens:

  1. translateDay is deployed. A new key is generated for the DRG. The translate decision is deployed with version 1.
  2. translateMonth is deployed. A new key is generated for the DRG. The translate decision already exists, so the version gets incremented to version 2.
  3. translateDay gets redeployed. There were no changed so the DRG keeps the same key. The translate decision is different from the one in version 2. The version gets incremented to version 3.
  4. When we search for the decisions by the DRG key we will find 2 as the key didn't change on the deploy. We will find both versions 1 and 3.

It is a bit of a strange situation which occurs when there are multiple decisions containing the same id. I see a couple of solutions here:

  • We don't allow a DRG to be deployed if it contains decisions with an id that's already being used. This causes unpredictable behaviour anyway as it's unclear which of the decision gets called. It seems like it is always the latest deployed (highest version).
  • We generate a new key for the DRG when a change in one of the decision has been detected.
  • We add a merge function which grabs the latest version available.

Thoughts @korthout & @saig0?

@remcowesterhoud remcowesterhoud added the severity/mid Marks a bug as having a noticeable impact but with a known workaround label May 2, 2022
@remcowesterhoud
Copy link
Contributor Author

Marked as mid severity as it is an edge-case that's easily resolvable if it does occur.

@npepinpe npepinpe added this to the 8.1 milestone May 3, 2022
@korthout
Copy link
Member

korthout commented May 3, 2022

Nice work figuring out what happened @remcowesterhoud 🎉 I really like it

We don't allow a DRG to be deployed if it contains decisions with an id that's already being used. This causes unpredictable behaviour anyway as it's unclear which of the decision gets called. It seems like it is always the latest deployed (highest version).

I don't think it's unpredictable behavior. We wanted it to evaluate the latest deployed version of the decision with the decision id. Similar to call activity that starts an instance of the latest deployed process with the process id. I understand that these are somewhat different because the decision is inside another layer (the DRG), but I think that keeping the behavior aligned between processes and decisions keeps things the easiest to understand for users.

In addition, we should not reject DRGs with decision ids that are already in use, because it allows a user to replace the existing decision which might be useful while we don't have undeploy decision api.

We generate a new key for the DRG when a change in one of the decisions has been detected.

Again, I would align with Process deployments. Please correct me if I'm wrong, but I think deploying a process a second time with some differences leads to the same key with a different version.

We add a merge function which grabs the latest version available.

I think we should do this as it fixes the problem without changing the key and version logic 👍

@saig0
Copy link
Member

saig0 commented May 4, 2022

Please correct me if I'm wrong, but I think deploying a process a second time with some differences leads to the same key with a different version.

No. The key of a process is unique. A new version of the process gets a new key.

@remcowesterhoud
Copy link
Contributor Author

@saig0 Would you prefer option 2 over option 3 in this case?

@saig0
Copy link
Member

saig0 commented May 4, 2022

Would you prefer option 2 over option 3 in this case?

Yes. I prefer to generate a new key and increase the version of the DRG if the version of one of the containing decisions is increased.

zeebe-bors-camunda bot added a commit that referenced this issue Jun 1, 2022
9475: [Backport stable/8.0] Increase DRG version if another DRG was deployed with the same decisions r=saig0 a=backport-action

# Description
Backport of #9466 to `stable/8.0`.

relates to #9272

Co-authored-by: Philipp Ossler <philipp.ossler@gmail.com>
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) kind/bug Categorizes an issue or PR as a bug severity/mid Marks a bug as having a noticeable impact but with a known workaround version:8.1.0-alpha2 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants