
Does Artillery prevent coordinated omission? #721

Closed
danielcompton opened this issue Jul 17, 2019 · 13 comments

@danielcompton

danielcompton commented Jul 17, 2019

Coordinated omission is a term coined by Gil Tene to describe the phenomenon in which the measuring system inadvertently coordinates with the system being measured in a way that avoids measuring outliers.

One example of how this can happen would be if a load tester waits to send a request until the previous one has completed. If the load tester is testing at 10 req/s and a request normally takes 50 ms, each request will return before the next one is due to be sent. However, if the whole system occasionally pauses for 5 seconds, the load tester would not send any requests during this 5-second period. The load test would record only a single bad outlier that took 5 seconds.

If the load tester had been firing requests at a consistent rate, it would have made 50 requests during the 5-second pause; those requests were omitted. If those requests had been made during the pause, the latency percentiles would look very different and would more accurately capture the system's behaviour under load.

There are a number of good videos and blog posts that discuss this in more depth. I was evaluating Artillery and wanted to see if it accounted for coordinated omission, but couldn't see any discussion of it in the issues or code. Is this something that Artillery tries to prevent?

@hassy
Member

hassy commented Sep 7, 2019

In a word, 'no', it doesn't suffer from coordinated omission. Artillery has a concept of virtual users which arrive and make requests independently of other virtual users. A given user will indeed wait for request 1 to complete before sending request 2, but another virtual user will be trying to send its own requests at the same time too.

@bbros-dev

@hassy I stand to be corrected, but I don't believe that is enough to avoid CO.

Specifically, how is the response time for user 1 measured when they make their next request? Or does the response time get measured elsewhere?

@bbros-dev

@hassy, on reflection you might sidestep CO if each virtual user only ever makes one request - but then how realistic are the test results as a representation of a user experience?

Acknowledged, Artillery seems to focus on generating load for testing. But at least this use case is currently ruled out:

Run smoke tests continuously against production to catch issues (also known as production scripted testing or synthetic monitoring)

And this listed Feature would not actually be one:

Performance metrics: get detailed performance metrics (latency, requests per second, concurrency, throughput). Track custom metrics with high precision (histograms, counters and rates)

@hassy
Member

hassy commented Jun 14, 2021

@bbros-dev - a single virtual user sends requests in a sequence, just like most real-world clients would, so a single VU is a "closed loop" and would indeed "coordinate" with an overloaded/slow server; however, other VUs are completely independent - so new VUs will continue to arrive, and existing VUs will continue trying to send their next request regardless.

Re smoke testing - this official plugin enables smoke testing with Artillery:
https://github.com/artilleryio/artillery-plugin-expect

This doc describes how you can track custom metrics:
https://artillery.io/docs/guides/guides/extending.html#Tracking-custom-metrics

@bbros-dev

bbros-dev commented Jun 15, 2021

@hassy, appreciate the clarification.

however other VUs are completely independent - so new VUs will continue to arrive, and existing VUs will continue trying to send their next request regardless.

Again, happy to be corrected: while VU arrivals may be independent of each other, I don't see that eliminating the effects of CO. Correcting myself here: independent single-request VUs don't mean that:

In a word, 'no', it doesn't suffer from coordinated omission.

Your infrastructure is knotted/slow/stressed, and VU:A's response data is subject to CO. VU:B arrives independently of VU:A... say it arrives after the cause of VU:A's CO has passed... surely this is a counterexample to your claim that multiple VUs protect against CO?

however other VUs are completely independent

I don't believe this has the effect you hope - the counterexample above shows that - but it should make clear that the opposite would need to be true:
In fact, for the VU setup/feature to eliminate CO, VUs would have to arrive dependent on the previous arrival.
Specifically, simplify by considering a single-request-per-VU configuration: if you're testing 1,000 req/s, then each VU would have to arrive spaced 1/1000 s apart in order to mitigate the effects of CO.

Now the data you report from those requests is likely less subject to CO - but as soon as you relax the one-request-per-VU constraint you'll be back to reporting noise.

Dependent VU arrival suggests a possible quick fix for the one-request-per-VU scenario.

Anyway, this issue is still open, so it seems to be acknowledged as an open problem.

@hassy
Member

hassy commented Jun 15, 2021

Artillery's arrival rates are an open system. CO does not affect open-loop systems, because new work arrives regardless of what's being processed. CO can happen only when new work is not submitted until work already in the queue has been completed. This is why it affects tools like ab and wrk, which use a fixed number of threads on which to send requests; Gil Tene's wrk2 gets around the problem by sending requests at a constant rate.

Consider what would happen if you ran Artillery with arrival rate = 50 on a service which locks up completely for 5s at some point, but responds in 1ms at all other times. You'd have up to 50 * 5 = 250 VUs recording and reporting outsized response times in that time period.
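To make that concrete, a minimal script for that kind of test might look like the sketch below (the target and endpoint are placeholders); the point is that arrivalRate controls how often new VUs start, independently of any requests already in flight:

```yaml
# Minimal sketch (placeholder target/endpoint): an open-loop phase where
# 50 new virtual users arrive every second for 60 seconds, regardless of
# how long earlier requests are taking.
config:
  target: "https://service.example.com"
  phases:
    - duration: 60
      arrivalRate: 50
scenarios:
  - flow:
      - get:
          url: "/health"
```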

Anyway, this issue is still open, so it seems to be acknowledged as an open problem.

Not really - the issue status does not necessarily mean acknowledgement (especially before GitHub had support for a separate Discussions area). Just for the sake of clarity for anyone else reading this discussion: Artillery does not suffer from coordinated omission.

I will write this up for Artillery's docs, and we can then close the issue.

@hassy
Member

hassy commented Jun 15, 2021

Something else worth clarifying - if you need to test a single endpoint with constant RPS, don't use Artillery, use wrk2 or Vegeta or autocannon instead. Artillery is designed for testing transactional scenarios with dependent steps, e.g. mimicking a large number of clients using an e-commerce API where each client:

  1. Searches for a product
  2. Loads product details for one or more search results returned in (1)
  3. Adds some products from (2) to cart
  4. Starts a checkout process

Such real-world scenarios by definition cannot be tested at a pre-set constant RPS, because requests depend on each other. But just like in the real world, Artillery lets you model a large number of those clients arriving to use the service independently of each other.
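As a rough sketch of how such a scenario might be written (the endpoints, JSON paths and variable names are made up for illustration), each step here depends on data captured from the previous response, so the per-VU request rate can never be constant - only the arrival rate of new VUs is:

```yaml
config:
  target: "https://shop.example.com"   # placeholder
  phases:
    - duration: 300
      arrivalRate: 20                  # open loop: 20 new shoppers per second
scenarios:
  - flow:
      - get:
          url: "/search?q=widgets"
          capture:
            json: "$.results[0].id"    # carry a search result into the next step
            as: "productId"
      - get:
          url: "/products/{{ productId }}"
      - post:
          url: "/cart"
          json:
            productId: "{{ productId }}"
            quantity: 1
      - post:
          url: "/checkout"
```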


And on another note, for anyone looking to understand the difference between open and closed models, and why Artillery's hybrid open-closed model is the best model for most real-world scenarios, this paper is a good read:

https://www.usenix.org/legacy/event/nsdi06/tech/full_papers/schroeder/schroeder.pdf

@bbros-dev

bbros-dev commented Jun 15, 2021

@hassy thanks for taking the time to clarify. I believe we differ by degree: I think that if you carefully configure and operate Artillery, you can likely mitigate CO. If I understand correctly, you suggest that regardless of configuration and use, Artillery is immune to CO?

Restating the problem: we're trying to uncover the unknown distribution of the latency of some system, possibly made up of many sub-systems, steps or processes that run sequentially and/or in parallel - but from the PoV of an end user.
Gil Tene noticed, implicitly, that you actually have a mixture-of-(conditional)-distributions problem. You're recovering a mixed distribution as the distribution of your system's latency. What you observe for your latency distribution is conditional on the request arrival distribution - his CO point was that this is fatal if you make the next request further conditional on the preceding request completing. Now, by making assumptions and doing a whole lot of estimation and analytics, you probably could recover a 'pure' latency distribution (whether this would be the true distribution is an age-old discussion, not for here). But that takes time, is imprecise, and is difficult.

A better solution is to strip out the randomness of the arrival process. You can do that by ensuring requests are wholly predictable. Any deterministic process would be fine, but the easiest is to use a fixed interval at which requests are started. With this you recover the empirical distribution of the system latency, and leave aside the question of whether this is the true distribution - this is why we use RPS in the 100s or 1,000s: tail estimation is hard, and you need very, very large samples before detecting changes in your tail behaviour becomes reliable.

That solution has nothing to do with thread counts. It has nothing to do with open or closed anything.

Higher thread counts, conditional on how many requests they queue, can mitigate the effects of CO. Can that mitigation be enough to consider the effect eliminated? It depends on what is happening on each thread in terms of the sample sizes being generated and the distribution of the CO event(s).

Perhaps to help users who may be unaware of the problem, you'd consider adding to the docs Gil's succinct and practical description of why just throwing more threads at the problem doesn't work (notice his premise, in the first quote, is that you've accepted the solution is a particular deterministic arrival process):

Gil Tene:
A single client system is the most likely to exhibit this problem, but most multi-threaded testers exhibit it as well. It's actually a per-thread (local to the testing thread) problem, which occurs whenever an observed response time is longer than the interval that the testing thread is trying to maintain between requests. ....
.... The expectation is that if a request was sent with no warning, and at a random point in time, it will still have a roughly 99.99% chance of seeing the touted latency. But what each tester thread is actually doing as a result of waiting for a long response before sending the next one is the equivalent of asking the system "is it ok for me to test your latency right now? Because if it's not, I'm fine waiting and testing it only when you are ready..."

Yes, systems have many components, some sequential, some parallel.
Yes, users have to design their tests so that they are measuring the latency they care about.

@hassy:
Something else worth clarifying - if you need to test a single endpoint with constant RPS, don't use Artillery, use wrk2 or Vegeta or autocannon instead.

That clarification would help, but in my experience not all testers have internalized Gil's insights, or similar ones, and some don't understand that "constant RPS" means stripping out the randomness of the arrival process (especially the effect of queuing arrivals).
Maybe the docs could guide them a little more. Perhaps even add Gil's scenario to the docs:

"is it ok for me to test your latency right now? Because if it's not, I'm fine waiting and testing it only when you are ready..."

If that scenario makes business sense in your testing case, using Artillery is (should mostly be?) fine for your purposes.
If that scenario does not make business sense in your testing case, you likely want to test with a constant RPS, and you should instead use wrk2, Vegeta, autocannon, or any tool that is immune to, or corrects for, CO.

@bbros-dev

bbros-dev commented Jun 16, 2021

Artillery lets you model a large number of those clients arriving to use the service independently of each other.

What I struggled to find an answer for is: does Artillery allow me to guarantee that those "clients" are one-shot clients?
That is, to guarantee that once they have completed 1), 2), 3), 4), they don't try to run that sequence again?

@hassy
Member

hassy commented Jun 16, 2021

@bbros-dev Thank you for the discussion! It's an interesting subject, and the thread will help us make our docs better!

"is it ok for me to test your latency right now? Because if it's not, I'm fine waiting and testing it only when you are ready..."

That's not how Artillery works - new VUs continue arriving and sending requests regardless of whether other VUs are waiting for some request to complete or not. That characterisation of workload generation describes a closed-loop system - Artillery uses a hybrid model (as described in the paper I linked above).

What I struggled to find and answer for is does artillery allow me to guarantee those "clients" are one shot clients?

No, because that does not make sense in the context of systems that Artillery is designed to test.

Take Github for example:

  • your browser loads the homepage
  • then navigates to one of the repos in the sidebar
  • then goes to the Issues tab from the repo page

Say you want to see the effects of 1,000 users arriving every minute for an hour. That's where you reach for Artillery.
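Sketched as an Artillery load phase (URLs are placeholders; arrivalRate is per second, so roughly 17/s ≈ 1,000 arrivals per minute):

```yaml
config:
  target: "https://github.example.com"    # placeholder
  phases:
    - duration: 3600    # one hour
      arrivalRate: 17   # ~1,000 new virtual users per minute
scenarios:
  - flow:
      - get:
          url: "/"                            # homepage
      - get:
          url: "/some-org/some-repo"          # a repo from the sidebar
      - get:
          url: "/some-org/some-repo/issues"   # the Issues tab
```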

Constant RPS makes no sense in this scenario, as there's implicit back-pressure from the server (local to each user) because requests are dependent on each other. It's impossible to impose a constant rate of requests in a scenario like this (but you can set a constant rate of arrivals). This describes every real-world system which supports anything resembling transactions (i.e. do thing A, then do thing B depending on the result of A).

Does this mean that latency outliers due to a temporary server stall will get hidden? No - because whilst VU A is waiting for a response, a number of other VUs will arrive, send their initial requests, and record outsized response times. Artillery also outputs latency metrics at a configurable interval (10s by default) rather than a single aggregate report at the very end, so stalls like that are visible immediately and don't get smoothed over by smaller measurements from the rest of the duration of the test run.

There is of course another class of systems, where all requests are idempotent and commutative - an nginx instance serving up static files, for example. For those, one is typically interested in finding out max RPS. If that describes the service you're testing, by all means use wrk2, Vegeta or autocannon with a constant RPS setting to avoid the effects of CO skewing your results.

@hassy
Member

hassy commented Jun 16, 2021

That solution has nothing to do with thread counts. It has nothing to do with open or closed any thing.

I'd argue the opposite - load generation model is everything, in general and in the context of CO. To restate my points in a slightly different way:

Any closed-loop load generator will suffer from CO.

A fully open-loop load generator will not. An example of open-loop load generation is sending requests at a fixed rate. Gil Tene pioneered that in wrk2 and it has since been adopted by other tools, such as Vegeta and autocannon.

The problem you run into with fully open-loop load generators is that the type of systems you can test with those is extremely narrow. Your system must satisfy two requirements to be testable at constant RPS:

  1. All requests have to be idempotent - sending the same request over and over again yields the same result
  2. All requests have to be commutative - order of requests is not important

With that in mind, what do you do when you want to test a system which does not satisfy those requirements?

Well, you end up with something like this:

  • New clients can arrive at any time, according to some probability distribution (uniform, Poisson, etc) - that's your open-loop
  • Individual clients will be subject to backpressure from the service by definition of the usage model of such a system - that's a number of closed loops within an open loop.

This is Artillery's hybrid model, which also maps exactly onto how such a system would be used in the real world.

@bbros-dev

@hassy thanks for the clarifications and additional insights. I agree much of this can be distilled into something generally useful for a novice tester/test-team.

Perhaps it is worth framing the guidance in two categories:

  • Testing one endpoint/system.
  • Testing a sequence of endpoints/systems.

Like most situations/configurations, here one is faced with trade-offs, and the difficulty is knowing what the impact of those trade-offs is.
I also think a picture is worth a thousand words, and since it is trivial to generate data showing the presence/absence of CO, it might be worth illustrating what users can expect from any given set-up or recommendation. Of course this is how you train SREs in what certain events look like, but people from multi-disciplinary backgrounds might also find it helpful/comforting.

Something like this illustration would allow users to assess how they want to set up and configure their testing infrastructure. In these scenarios, absent CO, you have a continuous curve starting when the system was frozen. For CO-vulnerable configurations or tools you get anything but a smooth curve: usually disjoint segments of lines and curves, and in worse cases some sort of jump behaviour.

Of course, the premise of this thread is extremely narrow - someone wants to reliably recover the latency distribution of some system(s).

There are many use cases where that is not the primary interest, and I'm not suggesting those be treated as less important, and there are situations where it makes sense to trade off some CO for other data/insights you get.

@rapiz1

rapiz1 commented Dec 28, 2021

Any closed-loop load generator will suffer from CO.

I think the term CO is not particularly useful if no model of the traffic and of the interaction between the server and the client is given. CO is really the gap between the traffic/interaction model you use to measure and what happens in the real world. So it's not quite fair to say a test tool will suffer from CO, as if it were inherently flawed and omitting things.

Take a second to think about it: what has really been omitted, for a fully closed-loop load generator? The answer is, surprisingly, nothing. The tool faithfully records the latency it observed, without modification or omission. But why do people feel they're vulnerable to so-called CO when the tool doesn't omit any data? Because the fully closed-loop model is far away from what happens in the real world, where individual visitors don't wait or block on other visitors. So when considering a load generator, you need to think about both the model used to test and the model of the real world. There could still be a niche for a fully closed-loop load generator, where the real-world model does have back pressure.

With that in mind, what do you do when you want to test a system which does not satisfy those requirements?

Well, you end up with something like this:
New clients can arrive at any time, according to some probability distribution (uniform, Poisson, etc) - that's your open-loop
Individual clients will be subject to backpressure from the service by definition of usage model of such a system - that's a number of closed loops within an open loop.

This is Artillery's hybrid model, which also maps exactly onto how such a system would be used in the real world.

This is a brilliant model that I think applies to a majority of real-world systems, and I don't think it has the CO issue in most cases. Note that there's a closed-loop generator for each visitor. But it's fine to have that back pressure for them, if the interaction model is that they need the response to decide what to do next, rather than visitors sending requests at a fixed rate. But again, it really depends on what you expect of the real world.

In conclusion, it's better to ask "Does the load model that Artillery uses fit my system?" instead of "Does Artillery prevent coordinated omission?".

And both answers are "Probably, depending on what you expect for the real world".

@artilleryio locked and limited conversation to collaborators Jun 4, 2022
@hassy converted this issue into discussion #1472 Jun 4, 2022

