Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A workflow with start event message will deploy but fail to start if it has an event sub process #4099

Closed
allanbarklie opened this issue Mar 20, 2020 · 12 comments · Fixed by #9047
Assignees
Labels
kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/mid Marks a bug as having a noticeable impact but with a known workaround support Marks an issue as related to a customer support request version:1.3.8 version:8.1.0-alpha1 Marks an issue as being completely or in parts released in 8.1.0-alpha1 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@allanbarklie
Copy link

allanbarklie commented Mar 20, 2020

Describe the bug
A workflow that is started with a start event message will deploy but fail to start if it has an event sub process (this is not the same as the deploy time issue caused by the modeller leaving out correlationkey ).

Variables added to the process start message don't seem to visible when the broker tries to subscribe the event sub process to its start message so it it complains it can't see the correlation key (it looks for the right variable but can't find it).

This also sometimes causes the operate UI to get in bad state where it always reports the incident but can't always show the instance details.

Workaround: Placing the event subprocess within a wrapping subprocess fixes the issue.

To Reproduce
Steps to reproduce the behavior (using attached files)
zbctl --insecure deploy correlationIssue.bpmn
zbctl --insecure deploy correlationIssueWorkaround.bpmn

(escape as needed for your shell)
zbctl --insecure publish message myStartMessage --variables '{"orderID":123}' --correlationKey 123

observe behaviour in operate UI
Process_CorrelationIssue fails with an incident (operate may or may not let you inspect it)
Process_CorrelationIssueWorkaround successfully starts

Expected behavior
both workflows start without error

Log/Stacktrace
Process_CorrelationIssue Failed to extract the correlation-key by 'orderID': no value found

Environment:

  • OS: Linux

Related Support Case: SUPPORT-12762

correlationIssue.bpmn.txt
issue

correlationIssueWorkaround.bpmn.txt
workaround

  • Zeebe Version: 0.22.1
  • Configuration: simple setup based on standard helm chart with reduced resources
@allanbarklie allanbarklie added the kind/bug Categorizes an issue or PR as a bug label Mar 20, 2020
@saig0
Copy link
Member

saig0 commented Mar 23, 2020

@allanbarklie, thank you for reporting 👍

I can reproduce the behavior with the given workflow. The variables are not created (i.e. no variable records are present). So, an incident is raising when try to open the message subscription.

@saig0 saig0 added scope/broker Marks an issue or PR to appear in the broker section of the changelog Status: Ready labels Mar 23, 2020
@pihme pihme self-assigned this Apr 2, 2020
@pihme
Copy link
Contributor

pihme commented Apr 2, 2020

@saig0 Should we clone this issue so that the Operate team can look at what is happening on their side?

If this issue it fixed, it might hide the behavior of Operate again.

@saig0
Copy link
Member

saig0 commented Apr 2, 2020

Good point 👍

@pihme
Copy link
Contributor

pihme commented Apr 15, 2020

Summary of analysis results:

The error occurs, because the variables are transferred into the state after the subscriptions are created. The bug is valid, but there is no immediately obvious fix to this.

The sequence of events is as follows:

  • StartEventEventOccurredHandler is called for the start event
    • This appends a follow up event ELEMENT_ACTVATING for the process
    • It also adds a deferred event ELEMENT_ACTIVATING for the start event itself
  • ActivityElementActivatingHandler is called for the process (processing the ELEMENT_ACTIVATING intent for the process)
    • This tries to create the event subscriptions
    • This fails, because no variables have been copied yet

If it wouldn't fail, it would proceed as follows:

  • ELEMENT_ACTIVATED for the process
  • ELEMENT_ACTIVATING for the start event
  • ELEMENT_ACTIVATED for the start event
  • ELEMENT_COMPLETING for the start event
  • ELEMENT_COMPLETED for the start event

The variables would be copied during the completion of the start event (either in COMPLETING or COMPLETED stage). This happens at this time, because there might be input/output mappings for the start event. The workflow scope should only be filled with variables, after input/output mapping has been applied to the message.

The correlation key expression in the message on the event sub process should work in a scope after input/output mapping of the start event has been applied.

There is no obvious fix, because several constraints are in conflict here:

  • Process activation must fail if event subscriptions fail
  • correlation key must be evaluated against scope after input/output mapping of start event has been applied
  • Output mapping is applied when the start event is completed
  • Process activation shall happen while start event is being activated

We decided to not proceed with the bugfix at this moment.

@pihme
Copy link
Contributor

pihme commented Apr 15, 2020

Regarding the related issue for the Operate team:

  • It was clarified that incidents can also be raised on the process level (not just in this scenario, but in general)
  • an issue was added to the Operate backlog to visualize such incidents

@pihme pihme removed their assignment Apr 15, 2020
@allanbarklie
Copy link
Author

@pihme Thanks for looking into this. From your analysis can you confirm if my above workaround (using an extra sub process) is safe/guaranteed to work?

Would there be anything about how that works that might help inspire a fix?

@pihme
Copy link
Contributor

pihme commented Apr 15, 2020

tldr; I can confirm your workaround is safe.

Long version:

First a little caveat, I joined the Zeebe team two months ago. So I am by no means an expert of BPMN, or Zeebe.

To the best of my knowledge, your work around is safe to work. It displays a good separation of concerns in the sense, that there is one message to start the process and another message to trigger the event sub process. This makes intuitive sense to me, and I can see how this can be used in practical projects. Also, by moving the event sub process into a dedicated sub process, you avoid any ambiguities about the precise sequencing of steps when the process is instantiated.

I yet have to understand the practical application of the process for which you filed this bug report. What is the use case where the very same message should create a process and be processed by one of the newly created processes' event sub process?

The fact, that I cannot imagine such a use case is not meant to be derogatory. Instead, it is my lack of experience and also my curiosity. In particular, if there is a concrete use case, it would be great to know about it. Not least, because I have a sense that this specific constellation is perhaps not sufficiently detailed spec-wise - so knowing about user's expectation could weigh in when looking for solutions.

Whether your work around will inspire a more general solution, I can only speculate about this. It is conceivable that we implicitly add a sub process on the fly. However, the behavior would change in subtle ways if we do so. And I still need to learn more about what the BPMN spec mandates, and where we have wiggle room.

My only caution about adopting this strategy today, is that I don't know if the work around can be generalized. Workflows can have several start events, and different types of start events. I am not sure whether this work around works for the general case.

On the flip side, the problem you found is a general one. Like, you could also draw a process with a message start event and a timer based event sub process, with FEEL expressions to configure the timer. I haven't tested it, but it should run into the same issue you reported.

So nice catch!

@npepinpe npepinpe added Impact: Usability severity/mid Marks a bug as having a noticeable impact but with a known workaround labels Apr 28, 2020
@allanbarklie
Copy link
Author

@pihme
Thanks for the update.
(I am also new to this set of technologies)

I'd like to explain my use case to help you understand/judge the validity/severity of the defect.

One clarification to start with - the detail in the file correlationIssue.bpmn.txt where both the main and sub process start messages have the same name and ID was not intended. Sorry for that. The defect can also be observed where the main and sub process have different names and IDs. My use case expects different ones.
The important details to reproduce the defect are that both the main process and sub process are started by events and that the sub process is not shielded by a further sub process.
Here is a corrected file that also reproduces the problem.
correlationIssue2.bpmn.txt

The use case is one where the sub process actually starts a call activity that defines some other workflow.
Thee main process is a workflow designed to start the call activity and to also (in parallel) monitor the call activity and perform actions based on events from the call activity.
There is a desire to start the call activity as quickly as possible.

Looking at this again I now realise that the same pattern can be achieved by simply using two parallel gateways one at the beginning and one at the end.

correlationIssueWorkaround2
correlationIssueWorkaround2.bpmn.txt

I think the only difference is clutter on the diagram (not obvious in the simplified versions here).
In which case the defect remains but is probably even more of a corner case and probably no longer an issue for me.

@pihme
Copy link
Contributor

pihme commented Apr 28, 2020

@allanbarklie
Thanks for detailing your use case.

I am just going to quickly react to some keywords I read in your description:

Anyway, I will shut up now. As much fun as it is to talk to a user of our software, there are people better qualified than me to accompany you on your learning journey.

In particular, I can recommend

Both are free and we have moderators/DevRel specialists for each. Feel free to check it out!

@npepinpe
Copy link
Member

npepinpe commented Jul 5, 2021

@saig0 - in #4400, you mentioned that both this issue and 4400 had the same root cause. 4400 is marked as fixed, therefore can I assume this issue is fixed as well?

@saig0
Copy link
Member

saig0 commented Jul 5, 2021

@npepinpe no, this is a different problem. Here, we're not able to handle a workflow with a message start event and a message event subprocess.

@npepinpe npepinpe moved this from Planned to Ready in Zeebe Feb 11, 2022
@npepinpe npepinpe added the support Marks an issue as related to a customer support request label Feb 21, 2022
@npepinpe
Copy link
Member

Added support label for: https://jira.camunda.com/browse/SUPPORT-12762

@pihme pihme self-assigned this Mar 15, 2022
@KerstinHebel KerstinHebel removed this from Ready in Zeebe Mar 23, 2022
@pihme pihme mentioned this issue Mar 31, 2022
10 tasks
pihme added a commit that referenced this issue Apr 4, 2022
author: Philipp Ossler

The subscriptions will now be opened after the output mappings of the start event have been applied. This is in line with user expectations and fixes issue #4099
zeebe-bors-camunda bot added a commit that referenced this issue Apr 6, 2022
9047: Fix 4099 by opening event subscriptions after start event's output mappings were applied r=pihme a=pihme

## Description

This moves the step when event subscriptions are opened from `ProcessProcessor` and `SubprocessProcessor` to `StartEventProcessor`. This way, subscriptions and the values used for them will only be evaluated after output mappings of the start event have been applied.

This fits better to user expectations and fixes #4099 

## Related issues

closes #4099



Co-authored-by: pihme <pihme@users.noreply.github.com>
github-actions bot pushed a commit that referenced this issue Apr 6, 2022
author: Philipp Ossler

The subscriptions will now be opened after the output mappings of the start event have been applied. This is in line with user expectations and fixes issue #4099

(cherry picked from commit f09c61e)
github-actions bot pushed a commit that referenced this issue Apr 6, 2022
author: Philipp Ossler

The subscriptions will now be opened after the output mappings of the start event have been applied. This is in line with user expectations and fixes issue #4099

(cherry picked from commit f09c61e)
zeebe-bors-camunda bot added a commit that referenced this issue Apr 11, 2022
9064: [Backport stable/1.3] Fix 4099 by opening event subscriptions after start event's output mappings were applied r=korthout a=github-actions[bot]

# Description
Backport of #9047 to `stable/1.3`.

relates to #4099 #4099

Co-authored-by: pihme <pihme@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Apr 11, 2022
9065: [Backport stable/8.0] Fix 4099 by opening event subscriptions after start event's output mappings were applied r=korthout a=github-actions[bot]

# Description
Backport of #9047 to `stable/8.0`.

relates to #4099 #4099

Co-authored-by: pihme <pihme@users.noreply.github.com>
@deepthidevaki deepthidevaki added the version:8.1.0-alpha1 Marks an issue as being completely or in parts released in 8.1.0-alpha1 label May 3, 2022
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
j-lindner added a commit to camunda-community-hub/message-correlator that referenced this issue Feb 13, 2023
Migrating to 8.1.X required to have this bugfix which is required: camunda/camunda#4099
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/mid Marks a bug as having a noticeable impact but with a known workaround support Marks an issue as related to a customer support request version:1.3.8 version:8.1.0-alpha1 Marks an issue as being completely or in parts released in 8.1.0-alpha1 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
7 participants