Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve tile error handling #3104

Merged
merged 11 commits into from Oct 10, 2018
Merged

Improve tile error handling #3104

merged 11 commits into from Oct 10, 2018

Conversation

kring
Copy link
Member

@kring kring commented Oct 4, 2018

  • Fixed a bug that could cause endless tile requests with certain types incorrect server responses.
  • Fixed a bug that could cause endless region tile requests when loading a CSV with a time column where none of the column values could actually be interpreted as a time.
  • Added automatic retry with jittered, exponential backoff for tile requests that result in a 5xx HTTP status code. This is especially useful for servers that return 503 or 504 under load. Previously, TerriaJS would frequently disable the layer and hit the user with an error message when accessing such servers.

Fixes #2928 (both the WMTS problem and the unrelated CSV problem)
Closes #3083
Is probably a reasonable workaround for #3092

Sadly I didn't really make this code any more sane. It's still totally crazy.

@meh9
Copy link
Contributor

meh9 commented Oct 4, 2018

Can confirm WMTS problem is fixed.

@meh9
Copy link
Contributor

meh9 commented Oct 5, 2018

Can confirm unrelated CSV problem fixed.

@meh9
Copy link
Contributor

meh9 commented Oct 5, 2018

Can confirm that we are now very much more resistant to 503's at least, tested on AREMI with the following layers plus Australian Topography basemap, and it took a very large number of 503's before the layers started to actually fail. These show the ones that did and did not fail + Australian Topography:

screen shot 2018-10-05 at 10 57 28

I probably got 1000 or so 503's before things started failing, so I'd say we should go ahead and merge.

@@ -451,8 +451,10 @@ function calculateFinishDatesFromStartDates(startJulianDates, localDefaultFinalD
var n = sortedUniqueJulianDates.length;
if (n > 1) {
finalDurationSeconds = JulianDate.secondsDifference(sortedUniqueJulianDates[n - 1], sortedUniqueJulianDates[0]) / (n - 1);
endDates.push(JulianDate.addSeconds(sortedUniqueJulianDates[n - 1], finalDurationSeconds, new JulianDate()));
} else {
endDates.push(undefined);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kring Is this the right thing to do for n === 1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😬 I think so. The way endDates is used below, it looks like there has to be something added, or the indices will be wrong.

tileFailures = 0;
tileRetriesByMap = {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retriesPerTile would be a better name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is it definitely correct to reset it whenever there is any "intervening success"? If tile X has failed 3 times, then some other random tile succeeds, will this (incorrectly) get reset?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely not correct, but the effect should be that TerriaJS is more forgiving of errors, which is closer to a :feature: than a 🐛.

});
} else if (!defined(e.statusCode)) {
// On a modern-ish browser, this is almost certainly a CORS problem.
// Note that ignoreUnknownTileErrors is only for genuinely unknown (no status code) errors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This situation also arises on DNS failures (eg, non existent domain)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

@kring
Copy link
Member Author

kring commented Oct 8, 2018

@stevage reports that https://tiles.openaerialmap.org/5abae68e65bd8f00110f3e42/0/5abae68e65bd8f00110f3e43/wmts doesn't work on http://ci.terria.io/fail-better/
It does the GetCapabilities successfully, but then doesn't request any tiles for some reason.

@stevage
Copy link
Contributor

stevage commented Oct 8, 2018

Actually the last time I looked at the CI, it seemed to be fetching tiles, but then failing silently. I really don't know what's going on. Possibly totally spurious.

@kring
Copy link
Member Author

kring commented Oct 8, 2018

Ok here's what's happening with that openaerialmap server...

The extent of the one layer on that server is a tiny little square near Cape Code, MA, USA. So when you're looking at Australia, Cesium only downloads a couple of tiles. But those tiles all return XML, probably because of #2927, which mentions this exact server. Because we only download a couple of tiles, though, it doesn't fail and show the user a message immediately. But if you try to zoom to the layer's extent, you'll get an error like this pretty quickly:

image

So I think everything is working as expected, but let me know if anything still seems off to you @stevage.

@kring
Copy link
Member Author

kring commented Oct 8, 2018

I think I've addressed all the review comments @steve9164 and @stevage.

@stevage
Copy link
Contributor

stevage commented Oct 9, 2018

The server denied Cross-Origin Resource Sharing (CORS) access to this URL

More often than not, it will be the browser denying access on the basis that the server didn't provide any CORS instructions at all, right?

* stevage/fail-better:
  Comments and other minor improvements.

# Conflicts:
#	lib/Models/ImageryLayerCatalogItem.js
@stevage
Copy link
Contributor

stevage commented Oct 9, 2018

(I thought I had already pushed these commits, but I pushed them to the wrong branch...)

@kring
Copy link
Member Author

kring commented Oct 9, 2018

More often than not, it will be the browser denying access on the basis that the server didn't provide any CORS instructions at all, right?

Very true, but I think explaining that just muddies the water. The browser can't be changed, but the server potentially can be, so it's reasonable to explain it as something that has to change on the server. From the browser's perspective, a lack of explicit approval is the same as denial.

@kring kring merged commit 65c90fc into master Oct 10, 2018
@kring kring deleted the fail-better branch October 10, 2018 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants