Weird throughput pattern #8244

Closed
Zelldon opened this issue Nov 19, 2021 · 7 comments
Labels
area/performance: Marks an issue as performance related
blocker/info: Marks an issue as blocked, awaiting more information from the author
kind/bug: Categorizes an issue or PR as a bug
severity/low: Marks a bug as having little to no noticeable impact for the user

Comments

@Zelldon (Member) commented Nov 19, 2021

Describe the bug

Due to recent discussions I ran a benchmark with different process models; in fact, I just started a benchmark with make all. This runs a process model with only a start and end event, a process model with an intermediate timer catch event (PT1S), and a process model with a single task.

Models

one_task
simpleProcess
timerProcess

Observation

We can observe a weird throughput pattern, which recurs every ~20 minutes.

[Screenshots: general, proc-queue]

If we take a look at what happens during the drops/spikes, we can see that a lot of jobs are completed at once.
[Screenshots: throughput-rate, grpc]

Interestingly, the state seems to grow and then shrink again after all jobs are completed.

[Screenshot: snapshot]

If we take a look at the created vs. completed instances, we can see that we accumulate instances/jobs, which are then at some point released/completed again.

[Screenshot: instances]

This sounds like it is related to issues such as #7955 and #8132.

To Reproduce

Just run a benchmark with make all, but make sure to configure the starters to a rate of 100.
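
For illustration, here is a minimal sketch of what such a starter does, assuming the Zeebe Java client; the gateway address and the process id one_task are placeholders, and the real benchmark starters are configured through the benchmark project, so this is only an approximation of the setup:

```java
import io.camunda.zeebe.client.ZeebeClient;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Starter {
  public static void main(String[] args) {
    // Assumption: gateway address and process id are placeholders for the benchmark setup.
    final ZeebeClient client =
        ZeebeClient.newClientBuilder()
            .gatewayAddress("localhost:26500")
            .usePlaintext()
            .build();

    final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    // Create one instance every 10 ms, i.e. roughly 100 instances per second per starter.
    executor.scheduleAtFixedRate(
        () ->
            client
                .newCreateInstanceCommand()
                .bpmnProcessId("one_task")
                .latestVersion()
                .send(),
        0,
        10,
        TimeUnit.MILLISECONDS);
  }
}
```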

Expected behavior

We expect to complete 300 process instances per second and the throughput graph to be stable.

Log/Stacktrace

Nothing

Environment:

  • OS:
  • Zeebe Version: develop
  • Configuration: benchmark
@Zelldon added the kind/bug, area/performance, and severity/low labels on Nov 19, 2021
@Zelldon (Member Author) commented Nov 22, 2021

After scaling the workers down, it looks much better.

[Screenshot: general]

It seems that a large number of workers can disrupt the throughput. The question for me is whether we want to investigate this further or just document it better for users who will stumble over it.
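
As a rough sketch of the knobs involved (assuming the Zeebe Java client; the job type and the values are placeholders, not the benchmark's actual configuration): the number of open workers and the maxJobsActive per worker together determine how many jobs can be activated, and later completed, at once.

```java
import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.worker.JobWorker;
import java.time.Duration;

public class Worker {
  public static void main(String[] args) {
    final ZeebeClient client =
        ZeebeClient.newClientBuilder().gatewayAddress("localhost:26500").usePlaintext().build();

    // Each open worker polls for jobs independently; with many replicas the activations
    // (and the later completions) pile up into the bursts visible in the graphs above.
    // Lowering the replica count and/or maxJobsActive smooths the activation pattern.
    final JobWorker worker =
        client
            .newWorker()
            .jobType("benchmark-task") // assumption: placeholder job type
            .handler((jobClient, job) -> jobClient.newCompleteCommand(job.getKey()).send().join())
            .maxJobsActive(32)
            .timeout(Duration.ofSeconds(10))
            .pollInterval(Duration.ofMillis(100))
            .open();
  }
}
```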

@menski added the blocker/info label on Nov 22, 2021
@menski (Contributor) commented Nov 22, 2021

I think we need more investigation to better understand the client behavior in this case, i.e. what the client metrics show, why there are bursts of activation and completion, and whether the client's back-off has an impact.
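
To illustrate the back-off concern (this is not the Zeebe client's actual implementation, just a sketch of the suspected mechanism): if many workers back off after empty polls and then retry on roughly the same schedule, their activations and completions arrive in bursts rather than as a steady stream.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Illustrative only: exponential back-off between ActivateJobs polls. Without jitter,
 * a fleet of workers that went idle together wakes up in lock-step and polls at once,
 * producing bursts of activation and completion.
 */
public class PollBackoff {
  private final long minDelayMs = 50;
  private final long maxDelayMs = 5_000;
  private long currentDelayMs = minDelayMs;

  /** Called after an empty or failed poll. */
  long nextDelay() {
    // Exponential growth capped at maxDelayMs, with a little jitter to de-synchronize workers.
    currentDelayMs = Math.min(maxDelayMs, currentDelayMs * 2);
    final long jitter = ThreadLocalRandom.current().nextLong(currentDelayMs / 10 + 1);
    return currentDelayMs - jitter;
  }

  /** Called after a successful poll. */
  void reset() {
    currentDelayMs = minDelayMs;
  }
}
```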

@menski (Contributor) commented Nov 22, 2021

Maybe a potential topic for chaos days, i.e. the number of workers should not impact the throughput of the cluster. /cc @Zelldon

@deepthidevaki (Contributor) commented

I see similar behaviours in our weekly benchmark.

[Screenshot]

It is not consistent. Sometimes it occurs frequently, other times the throughput is flat.

@pihme (Contributor) commented May 31, 2022

Is the timer set to 20 minutes by any chance? Then it might be related to the blocking loop over all timers.
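
As an illustration of that concern (a simplified sketch, not the broker's actual due-date checker): if the checker walks over every stored timer in one pass on the processing thread, a large timer backlog blocks processing until the loop finishes, which could show up as the periodic dips followed by completion bursts seen above.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative only: a naive due-date check that triggers all due timers in one blocking pass. */
public class DueDateChecker {
  // due date (epoch millis) -> timer key; a simplification of the broker's timer state
  private final NavigableMap<Long, Long> timersByDueDate = new TreeMap<>();

  void trigger(long now) {
    // Blocking loop: triggers every due timer before returning control to the processor.
    for (final var entry : timersByDueDate.headMap(now, true).entrySet()) {
      triggerTimer(entry.getValue());
    }
    timersByDueDate.headMap(now, true).clear();
  }

  private void triggerTimer(long timerKey) {
    // write the timer-triggered command for this key (omitted)
  }
}
```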

@Zelldon (Member Author) commented Jun 10, 2022

Let's try to reproduce this again. If it still happens, let's try with our new shiny due-date feature flag and document the result here. If it doesn't fail, let's just close it.

@Zelldon (Member Author) commented Jun 21, 2022

I'm not able to reproduce this. I ran make all, which deploys the timer, simple, and normal starters.

[Screenshot: load]

Throughput looks stable. I will close this.

@Zelldon closed this as completed on Jun 21, 2022