Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include bpmnProcessId as part of exported variable records #8181

Closed
RomanJRW opened this issue Nov 9, 2021 · 6 comments · Fixed by #8803
Closed

Include bpmnProcessId as part of exported variable records #8181

RomanJRW opened this issue Nov 9, 2021 · 6 comments · Fixed by #8803
Assignees
Labels
kind/feature Categorizes an issue or PR as a feature, i.e. new behavior

Comments

@RomanJRW
Copy link
Contributor

RomanJRW commented Nov 9, 2021

Is your feature request related to a problem? Please describe.
When importing variable records and converting them to Optimize variables, we have to fetch the definition from ES to extract the bpmnProcessId in order to route it to the correct Optimize index. This can be expensive, and results in retries if the process hasn't been been exported by zeebe/imported by Optimize. Furthermore, if the process has been exported but the record is no longer available (cleaned up), then the import gets stuck, as seen in a recent support issue.

Describe the solution you'd like
If the bpmnProcessId (like how it is included on the incident records for example) was included as part of the exported variable record, we could improve the performance of import by removing the fetch, and also guard against the possibility of the import getting stuck. If we have the key, Optimize is resilient enough to continue the import even if that process never can be imported.

Describe alternatives you've considered
The current implementation sort of is our alternative, but it is not ideal. The performance hit is somewhat acceptable, but the import blocking issue is more awkward for us to work around. With some refactoring, skipping this data import after x retries could potentially avoid the skip but also exposes the risk of missing data that the user would expect in Optimize

Additional context
Raised as a result of investigation from https://jira.camunda.com/browse/SUPPORT-12111

@RomanJRW RomanJRW added the kind/feature Categorizes an issue or PR as a feature, i.e. new behavior label Nov 9, 2021
@npepinpe
Copy link
Member

Just to make sure I understand, does this mean you correlate variables to their processes via the bpmnProcessId? I'm just asking, because the record already contains the processDefinitionKey, which is what I would recommend you index process definitions on, since each version of the same definition will have a different key. Or is versioning not suitable for your use case, and you want to just aggregate per definition regardless of the version?

@RomanJRW
Copy link
Contributor Author

Exactly. We store all instance data for a given definition across all versions (so by the bpmnProcessId) to a single index. We still store the key and other properties to the document, but it is this bpmnProcessId that is used to create the index or route documents to the correct index.

One of the reasons for this is that in Optimize, it is quite typical for a report to evaluate instances for a given process from across all versions of that process, so it makes sense for these to exist in one index. Another is that splitting this up by version would/could result in too many indices. We have runtime engine customers that have thousands of definitions, with even hundreds of versions for each definition.

@menski menski added this to Planned in Zeebe Nov 15, 2021
@menski
Copy link
Contributor

menski commented Nov 15, 2021

Thanks Josh, we try to solve this during the quarter, but it doesn't have the highest priority. Please let us know if this is an blocking issue so that we have to increase the priority.

@RomanJRW
Copy link
Contributor Author

Thanks @menski. So far we only know of it affecting one customer - if it becomes more frequent we will let you know that it needs bumping 👍

@RomanJRW
Copy link
Contributor Author

Hi @menski, do you think this is likely to be picked up in Q1? We have a focus on import performance this quarter and resolving this would be a nice improvement for Optimize. Thanks 🙂

@npepinpe
Copy link
Member

I can make sure it's there for alpha2 at the latest. Of course we're also happy to review a PR from someone on your team 😉

@npepinpe npepinpe moved this from Planned to Ready in Zeebe Feb 11, 2022
@npepinpe npepinpe self-assigned this Feb 16, 2022
@npepinpe npepinpe moved this from Ready to In progress in Zeebe Feb 16, 2022
@npepinpe npepinpe moved this from In progress to Review in progress in Zeebe Feb 16, 2022
ghost pushed a commit that referenced this issue Feb 17, 2022
8803: Include bpmnProcessId as part of exported variable records r=npepinpe a=npepinpe

## Description

This PR adds a new `bpmnProcessId` property to all `VariableRecordValue` instances. This property is set during processing before the record is written in all cases. However, it is not stored in the state, but only persisted in the log, much like the `VariableRecordValue#getProcessInstanceKey` property.

Doing this allows consumers of the exported record stream to index variables over a process' ID. This use case  is mostly for users who want to aggregate variables over a specific process without having to care about its version (and thus the unique keys for each process/version pair). See the linked issue for more.

NOTE: a small thing jumped out, which is that variables now have a few properties which are really only used for the consumers of the exported stream. I don't expect we add more of these, but if we do, at some point we might want to revise our approach instead of just tacking on the property over each record.

NOTE: the `bpmnProcessId` has an empty string as default value. While we should ensure it's always set in newer versions, I assumed for backwards compatibility this is necessary. Let me know if that's not the case, maybe I forgot how the message pack decoding works.

## Related issues

closes #8181 



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
ghost pushed a commit that referenced this issue Feb 17, 2022
8803: Include bpmnProcessId as part of exported variable records r=npepinpe a=npepinpe

## Description

This PR adds a new `bpmnProcessId` property to all `VariableRecordValue` instances. This property is set during processing before the record is written in all cases. However, it is not stored in the state, but only persisted in the log, much like the `VariableRecordValue#getProcessInstanceKey` property.

Doing this allows consumers of the exported record stream to index variables over a process' ID. This use case  is mostly for users who want to aggregate variables over a specific process without having to care about its version (and thus the unique keys for each process/version pair). See the linked issue for more.

NOTE: a small thing jumped out, which is that variables now have a few properties which are really only used for the consumers of the exported stream. I don't expect we add more of these, but if we do, at some point we might want to revise our approach instead of just tacking on the property over each record.

NOTE: the `bpmnProcessId` has an empty string as default value. While we should ensure it's always set in newer versions, I assumed for backwards compatibility this is necessary. Let me know if that's not the case, maybe I forgot how the message pack decoding works.

## Related issues

closes #8181 



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
@ghost ghost closed this as completed in 650a3cb Feb 17, 2022
Zeebe automation moved this from Review in progress to Done Feb 17, 2022
@KerstinHebel KerstinHebel removed this from Done in Zeebe Mar 23, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes an issue or PR as a feature, i.e. new behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants