Silent (from user perspective) scenario application failure #514

abyrd · 2019-05-04T14:22:20Z

In a recent Analysis run there are stripe artifacts.

This might be the same thing we saw in #483: errors happen when finding/applying scenarios. In the log in #483 we see: "TransportNetworkCache - No scenario provided or loaded. Replacing with empty scenario."

This means that when an error happens, the worker semi-silently applies no scenario and continues to calculate and return results. What it should do instead is fail to submit results for those origin points, so that the backend redelivers and retries them. We need to change that behavior.
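To make the difference concrete, here is a minimal sketch of the two behaviors. It is not the R5 code: `ScenarioSketch`, `ScenarioLookup`, `resolveLenient` and `resolveStrict` are hypothetical names invented for illustration, and only the log message text is taken from #483.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a scenario; not an R5 class.
final class ScenarioSketch {
    final String id;
    final int modificationCount;

    ScenarioSketch(String id, int modificationCount) {
        this.id = id;
        this.modificationCount = modificationCount;
    }
}

// Hypothetical lookup, assumed to hold the scenarios a worker managed to load.
final class ScenarioLookup {
    private final Map<String, ScenarioSketch> loaded = new HashMap<>();

    void register(ScenarioSketch scenario) {
        loaded.put(scenario.id, scenario);
    }

    // Lenient behavior: a missing scenario is replaced with an empty one,
    // so the caller goes on to compute baseline results without noticing.
    ScenarioSketch resolveLenient(String scenarioId) {
        ScenarioSketch scenario = loaded.get(scenarioId);
        if (scenario == null) {
            System.err.println("No scenario provided or loaded. Replacing with empty scenario.");
            return new ScenarioSketch(scenarioId, 0);
        }
        return scenario;
    }

    // Strict behavior: a missing scenario is an error, so no results
    // can be computed or submitted for the affected origin points.
    ScenarioSketch resolveStrict(String scenarioId) {
        ScenarioSketch scenario = loaded.get(scenarioId);
        if (scenario == null) {
            throw new IllegalStateException("Scenario " + scenarioId + " could not be found or loaded.");
        }
        return scenario;
    }
}
```

With the lenient variant the only trace of the problem is a line in the worker log. With the strict variant the exception has to be handled by whatever called the lookup.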

abyrd · 2019-05-04T14:49:48Z

Although I've patched R5 to avoid returning bogus regional results with no scenario applied, it would still be interesting to identify what was causing the scenario application failures in the first place. It might be something to do with fetching scenarios from S3 or saving them locally on the workers. We will need to dig through the staging worker logs to see.

abyrd · 2019-05-05T04:32:46Z

It turns out that the striping was not due to a limited number of workers failing; it was due to only one of 100 workers actually applying the scenario. All other workers were failing to apply the scenario and falling back on the baseline. This was not happening silently in the sense that the problem was logged on the worker, but from an end user perspective and from the backend's perspective it was invisible.

The patch ensures that workers fail hard and refuse to submit work results instead of just failing to apply the scenario. As a result, an affected analysis now stalls forever rather than appearing to complete normally with no scenario applied.
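The interaction between a hard-failing worker and backend redelivery can be sketched roughly as below. This is a toy model, not the actual worker or backend code: `RedeliverySketch`, `computeOrigin`, `run` and the `maxDeliveries` cap are all assumptions made for illustration. The cap exists only so the sketch terminates; in the situation described above, the analysis simply stalls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of a work queue with redelivery; names are not from R5.
final class RedeliverySketch {

    // Stand-in for the per-origin computation: it refuses to produce a result
    // when the scenario was not applied.
    static void computeOrigin(int origin, boolean scenarioApplied) {
        if (!scenarioApplied) {
            throw new IllegalStateException("Scenario not applied for origin " + origin);
        }
        // The real travel time computation would happen here.
    }

    // Returns the origins for which a result was submitted.
    // scenarioAppliedOnAttempt says whether the worker handling a given
    // delivery attempt (0-based) managed to apply the scenario.
    static List<Integer> run(int originCount, int maxDeliveries, IntPredicate scenarioAppliedOnAttempt) {
        Deque<int[]> queue = new ArrayDeque<>(); // each entry is {origin, attempt}
        for (int origin = 0; origin < originCount; origin++) {
            queue.add(new int[] {origin, 0});
        }
        List<Integer> completed = new ArrayList<>();
        while (!queue.isEmpty()) {
            int[] task = queue.poll();
            int origin = task[0];
            int attempt = task[1];
            try {
                computeOrigin(origin, scenarioAppliedOnAttempt.test(attempt));
                completed.add(origin); // result submitted
            } catch (IllegalStateException e) {
                // Nothing is submitted, so the task is redelivered later.
                if (attempt + 1 < maxDeliveries) {
                    queue.add(new int[] {origin, attempt + 1});
                }
            }
        }
        return completed;
    }
}
```

For example, `RedeliverySketch.run(4, 3, attempt -> attempt >= 1)` completes all four origins on the second delivery, while `RedeliverySketch.run(4, 3, attempt -> false)` completes none. In neither case is a baseline result reported as if the scenario had been applied.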

Generally we want things to fail fast and loudly whenever something is amiss. Also throughout R5 I think we should avoid falling back on / initializing with defaults - I recently also encountered a problem with a field that was being initialized to a default value and never overwritten.
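One way to avoid silent defaults in Java is to leave fields uninitialized and `final`, so the compiler forces every constructor to assign them, and to validate the values on the way in. The class below is a made-up illustration; `TaskSettings`, `scenarioId` and `maxRides` are hypothetical names, not the R5 field mentioned above.

```java
import java.util.Objects;

// Hypothetical settings object; not an R5 class.
final class TaskSettings {
    // No initializers: final fields must be assigned in the constructor,
    // so there is no default value that could survive unnoticed.
    private final String scenarioId;
    private final int maxRides;

    TaskSettings(String scenarioId, int maxRides) {
        // Fail fast and loudly if the caller did not supply a real value.
        this.scenarioId = Objects.requireNonNull(scenarioId, "scenarioId must be set explicitly");
        if (maxRides <= 0) {
            throw new IllegalArgumentException("maxRides must be positive, got " + maxRides);
        }
        this.maxRides = maxRides;
    }

    String scenarioId() {
        return scenarioId;
    }

    int maxRides() {
        return maxRides;
    }
}
```

Here a forgotten value surfaces as an exception at construction time rather than as a plausible-looking result computed from a default.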

trevorgerhardt · 2019-08-23T04:08:00Z

We believe this issue is solved: the only known cause has been fixed. Re-open this issue if the striping is seen again.

abyrd added a commit that referenced this issue May 4, 2019

fix(TransportNetworkCache): fail hard on missing scenarios

e69c84a

addresses #514 other related tickets are #301 #424 #483

abyrd self-assigned this May 4, 2019

abyrd added the bug label May 4, 2019

abyrd mentioned this issue May 5, 2019

Modification deserialization for regional analyses fails on type field #518

Closed

abyrd changed the title ~~Stripe artifacts from silent scenario application failure~~ Silent (from user perspective) scenario application failure May 5, 2019

abyrd mentioned this issue May 5, 2019

Auto-scaled workers repeatedly perform car linking #503

Closed

abyrd mentioned this issue Jun 12, 2019

Basic support for pick-up delay for autonomous or hailed vehicles #525

Merged

trevorgerhardt closed this as completed Aug 23, 2019
