Make intermittently failing concurrencyTests more stable. #4281
cbickel wants to merge 1 commit into apache:master
Codecov Report
@@ Coverage Diff @@
## master #4281 +/- ##
===========================================
+ Coverage 59.81% 80.97% +21.16%
===========================================
Files 163 163
Lines 7594 7594
Branches 502 502
===========================================
+ Hits 4542 6149 +1607
+ Misses 3052 1445 -1607
I guess this is OK, but I'm not sure what the test failure looks like.
But the test is for explicitly reusing the container. Are you saying a different test is unexpectedly using the same container? If so, I think there is still a risk that the other test uses the container before the interval that resets the counter is up. I don't see other tests sharing the same action that uses
Or is this a case where, of the 4 activations, one (or more) takes > 30s to arrive, which results in rejection? If one arrives after another is rejected, I expect the container to be destroyed and not reused.
// Licensed to the Apache Software Foundation (ASF) under one or more contributor
// license agreements; and to You under the Apache License, Version 2.0.

const Promise = require('bluebird');
I wanted to use `finally` to decrement the counter again, but native `finally` is only available from Node 10 onwards. The alternative for handling the counter would be to duplicate the decrement code (once in the error case and once in the success case).
So the choice was between bluebird Promises on the default Node version and native Promises on Node 10. I decided on bluebird, as it already seems to be part of our action container and I didn't want to hardcode the version for these tests.
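To make the trade-off above concrete, here is a minimal sketch of the two options. It uses native Promises (so it assumes Node 10 or later), and `work`, `withDuplicatedDecrement` and `withFinally` are hypothetical names for illustration, not code from this PR.

```javascript
// Hypothetical stand-in for the action's real work; not the actual test action.
function work(shouldFail) {
    return shouldFail ? Promise.reject(new Error('boom')) : Promise.resolve('ok');
}

let counter = 0;

// Option A: no finally, so the decrement is duplicated in both paths.
function withDuplicatedDecrement(shouldFail) {
    counter++;
    return work(shouldFail).then(
        result => { counter--; return result; },
        error => { counter--; throw error; }
    );
}

// Option B: a single decrement in finally (native in Node 10+, or via bluebird).
function withFinally(shouldFail) {
    counter++;
    return work(shouldFail).finally(() => { counter--; });
}

// Run both options in the success and the error case.
const demo = Promise.all([
    withDuplicatedDecrement(false),
    withDuplicatedDecrement(true).catch(e => e.message),
    withFinally(false),
    withFinally(true).catch(e => e.message)
]).then(results => ({ results, counter }));
```

Both variants leave the counter where it started and pass the original result or error through; the `finally` variant only avoids writing the decrement twice.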
}).finally(() => {
    // Before leaving the container, decrement the counter again. But wait twice the interval, before decrementing it. Otherwise, the
    // other requests might not realise, that all requests were inside the container.
    setTimeout(() => counter--, 2 * interval);
When is `finally` executed, i.e. is there a guarantee the container isn't paused first?
`finally` is executed after the promise settles, regardless of the promise's fate. It is the equivalent of Scala's Future.andThen { ... }.
I think @rabbah's comment is towards: Is it guaranteed to be executed before the next .then block? (Scala's andThen forces that ordering).
Is there any reason not to:
- remove the `finally` (and the `setTimeout` within)
- add `counter=0;` immediately before `resolve(result)` (within `checkRequests`)
This way there is no timing issue (potentially racing pause grace timeout), and the counter always gets reset unless a reject occurs.
@markusthoemmes @rabbah
If I understand the documentation at http://bluebirdjs.com/docs/api/finally.html correctly, I would say that the finally handler will always be executed before the next then:
If the handler function passed to .finally returns a promise, the promise returned by .finally will not be settled until the promise returned by the handler is settled. If the handler fulfills its promise, the returned promise will be fulfilled or rejected with the original value. If the handler rejects its promise, the returned promise will be rejected with the handler's value.
As finally returns a new promise, which the next .then block depends on, it should be executed first.
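The ordering question can be checked with a small experiment. The sketch below uses native `Promise.prototype.finally` (Node 10+) and assumes bluebird behaves the same way in this respect; the `order` array and `sleep` helper are illustrative only. It also shows that a timer started inside the handler, but not returned from it, is not awaited by the chain.

```javascript
const order = [];
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

const chain = Promise.resolve('result')
    .finally(() => {
        order.push('finally handler');
        // The timer is not returned, so the chain does not wait for it.
        setTimeout(() => order.push('timer inside finally'), 20);
    })
    .then(value => {
        // Runs only after the finally handler has returned.
        order.push('then after finally: ' + value);
    })
    .then(() => sleep(50)) // give the timer time to fire before inspecting
    .then(() => order);
```

So the handler itself runs before the next `.then`, while the delayed decrement inside it happens later, independently of when the action's promise resolves.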
@tysonnorris
I don't know if I fully understand your proposal, but this action is also used by other tests, which expect concurrent requests within the same container.
Without the setTimeout before decrementing, only one of these requests has the chance to realise that all expected requests are inside the container now. Setting the counter to 0 would not be the correct behaviour for tests that expect several requests inside this container. And adapting the counter only in success case would also be the wrong behaviour.
@cbickel How about:
- add a new variable to track pending requests count
- decrement that before resolve
- reset counter when pending reaches 0
let pending = 0;
...
// when request begins:
counter++;
pending++;
...
// in checkRequests:
pending--;
if (pending === 0) {
    counter = 0; // counter will be reset exactly once, when pending reaches 0
}
Just trying to make it even less sensitive to timing configs.
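A runnable sketch of the pending-counter idea above, under stated assumptions: `main`, `expected`, `interval` and `maxTries` are hypothetical names and parameters chosen for illustration, and the polling loop only approximates what `checkRequests` might do in the real action.

```javascript
let counter = 0; // requests that have entered the container
let pending = 0; // requests that have not yet resolved

// Hypothetical action entry point; the parameters are illustrative.
function main({ expected, interval = 5, maxTries = 100 }) {
    counter++;
    pending++;
    return new Promise((resolve, reject) => {
        let tries = 0;
        function checkRequests() {
            if (counter === expected) {
                pending--;
                if (pending === 0) {
                    counter = 0; // reset exactly once, by the last request to leave
                }
                resolve({ seen: expected });
            } else if (++tries >= maxTries) {
                reject(new Error('expected ' + expected + ' requests, saw ' + counter));
            } else {
                setTimeout(checkRequests, interval);
            }
        }
        checkRequests();
    });
}

// Two rounds against the same module state, as with a reused warm container.
const demo = Promise.all([main({ expected: 2 }), main({ expected: 2 })])
    .then(() => main({ expected: 1 }))
    .then(result => ({ result, counter, pending }));
```

In this sketch the reject path leaves both `counter` and `pending` untouched, so a rejected request would still leave stale state behind in a reused container.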
> And adapting the counter only in success case would also be the wrong behaviour.
Is it? In case of rejections, won't the container not be used as warm anymore (should be destroyed instead of reused)?
I see the following problems with this implementation:
- the counter doesn't represent the amount of requests in the container anymore.
- as the action is only failed with an application-error, the container will not be removed and will potentially be reused. So we need to duplicate the code of decrementing the counter.
Not sure I follow, this seems to work testing locally.
> the counter doesn't represent the amount of requests in the container anymore.
How so? we only reset counter at time of resolve, and resolve only if the counter is expected value (n in case of n concurrent supported activations)
> as the action is only failed with an application-error, the container will not be removed
I see - I get the error behavior wrong all the time :(... so I guess regardless of any reject happening, the test where reject occurs will fail, and within that test, at least the rejected activation will fail, but additionally another may fail (due to container reuse), which seems ok within the same action+test? (the tests all use uniquely named actions, so there should not be reuse between tests).
WDYT?
@tysonnorris
ahhhh. sorry I misunderstood. I see the issue now, just have a question about the impl.
The test `Action concurrency limits should execute activations sequentially when concurrency = 1` fails intermittently. The reason is that 4 activations are started in parallel. If one of these activations is delayed, the last activation reuses an existing container whose counter has already been incremented. This PR changes the action to decrement the counter before leaving the action. This also improves local debugging, as the warm container can be reused.
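The failure mode described above can be reproduced in miniature. This is a simplified sketch, not the real test action: `makeAction` and `runSequentially` are hypothetical helpers, each `makeAction` call stands in for one container's module state, and the decrement happens immediately rather than after the delay used in the PR.

```javascript
// Hypothetical factory: each call models the module-level state of one container.
function makeAction(decrementOnExit) {
    let counter = 0;
    return function main(expected) {
        counter++;
        const run = counter === expected
            ? Promise.resolve({ counter })
            : Promise.reject(new Error('expected ' + expected + ' requests, saw ' + counter));
        // With decrementOnExit the counter is restored before the caller continues.
        return decrementOnExit ? run.finally(() => { counter--; }) : run;
    };
}

// Invoke the same action several times in a row, as with concurrency = 1.
async function runSequentially(action, times) {
    const outcomes = [];
    for (let i = 0; i < times; i++) {
        try {
            await action(1);
            outcomes.push('ok');
        } catch (e) {
            outcomes.push('failed');
        }
    }
    return outcomes;
}

const demo = Promise.all([
    runSequentially(makeAction(false), 3), // counter is never decremented
    runSequentially(makeAction(true), 3)   // counter is decremented on exit
]);
```

Without the decrement, only the first invocation in a reused container sees the expected count; with it, every sequential invocation does.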