Empty elements in S3 JSON for large bbox changesets #650
Comments
@nrenner thank you so much for digging into this and flagging! It's possible the server running Overpass for OSMCha has gotten a bit rusty and needs a bit of a kick. But yeah, this would require logging things inside the AWS infrastructure that runs osm-adiff-parser, etc. to figure out where these elements are getting dropped. Thanks really for the detailed report - we should hopefully be able to follow up on this and debug in a proper way soon.
@batpad thanks for the quick answer!
Before making any bigger changes, it might be worth considering alternatives to the current setup. I'm planning to open a separate issue for that.
As an example for checking minutely augmented diffs (see my comment in #651), we can use the empty 134177840.json (OSMCha, OSM), which was open for one second:
The corresponding sequence id for that minute is
Querying and parsing the augmented diff for that sequence returns the expected

curl "https://overpass.osmcha.org/api/augmented_diff?id=5541507" \
  | zx -e "import parser from 'osm-adiff-parser'; let xml = await stdin(); parser(xml, null, (e, json) => { console.log(JSON.stringify(json['134177840'], null, 2)); })" \
  | grep 134177840 | wc -l

The query only takes five seconds, so all good and no bbox involved whatsoever. The geohacker diary says
It might be interesting to check the contents of that cached sequence (maybe some 5541507.xml or so?). Are they public?
@nrenner from comments from @geohacker in the diary post:
Not sure if this gives you what you're looking for exactly.
Oh, thanks! I hadn't looked in the comments. Unfortunately the latest call gives me the sequence 2554267 and that is from 2017 (https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/2554267.osc). Later sequences seem not to be available there.
@nrenner @batpad this is the current S3 URL https://s3-eu-west-1.amazonaws.com/overpass-db-eu-west-1/augmented-diffs/
Thanks! All changes there:

curl -s https://s3-eu-west-1.amazonaws.com/overpass-db-eu-west-1/augmented-diffs/5541507.osc \
  | grep 134177840 | wc -l
54

So, if this was used, the query part isn't the problem. Maybe writing/updating the JSON fails for some reason or it gets overwritten later, but as there are no further changes in later minutely diffs for this changeset, I can't see a reason why.
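The line count above can also be reproduced without shell tools, which may help when checking several sequence files in a row. This is only a sketch: `count_matching_lines` is a hypothetical helper name, and the file content is assumed to have been downloaded already and passed in as a string.

```python
def count_matching_lines(text, needle):
    """Count lines that contain `needle`, like `grep needle | wc -l`."""
    return sum(1 for line in text.splitlines() if needle in line)
```

Note that a bare id such as `134177840` is a substring match, exactly as with `grep`, so it would also count longer ids that contain it. Searching for `changeset="134177840"` instead is a stricter check, assuming the file uses the usual OSM XML `changeset` attribute.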
I'm submitting a bug report
Brief Description
Changesets with a large bounding box have an empty elements property in the real-changesets JSON file on S3, e.g. https://s3.amazonaws.com/mapbox/real-changesets/production/133792960.json (OSMCha, OSM)
What is the current behaviour ?
When opening a large bbox changeset, a spinning wheel appears for three minutes, then the map and changes tabs are empty.
Technically, the client requests the cached real-changesets JSON from S3 to get the diffs for the changed features. As the elements property contains no features, the client sends a fallback adiff query directly to the Overpass API, which times out after 180 seconds.
What is the expected behaviour ?
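To check a changeset for this symptom outside the browser, something like the following could be used. It is a minimal sketch, not the OSMCha client code: the helper names `has_elements` and `fetch_real_changeset` are hypothetical, the URL pattern is taken from the example in this report, and it assumes `elements` is a top-level list in the JSON.

```python
import json
import urllib.request

# URL pattern taken from the example link in this report.
S3_URL = "https://s3.amazonaws.com/mapbox/real-changesets/production/{id}.json"


def has_elements(doc):
    """Return True if the parsed JSON has a non-empty `elements` list.

    Assumption: `elements` is a top-level list in the document.
    """
    elements = doc.get("elements") if isinstance(doc, dict) else None
    return isinstance(elements, list) and len(elements) > 0


def fetch_real_changeset(changeset_id, timeout=30):
    """Download and parse the cached JSON for one changeset (needs network)."""
    url = S3_URL.format(id=changeset_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `has_elements(fetch_real_changeset(133792960))` would then report whether the cached file for the example changeset contains any features.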
OSMCha used to support large bbox changesets by processing world-wide minutely augmented diffs from Overpass API, as described in these posts:
So I wonder why this has not been the case for some time now. I suspect augmented diffs were replaced by individual adiff queries - like in the client - at some point? If so, what was the reason?
When does this occur ?
Seemingly for bounding boxes larger than about 5 "square degrees" (simple width * height from bbox coordinates). Probably also depends on other factors like how long the changeset was open for (created_at - closed_at time span).
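The two figures used here can be computed as sketched below. The function names are hypothetical, the timestamp format is assumed to be the ISO 8601 UTC form that OSM changeset metadata normally uses, and the bbox calculation is the simple width * height mentioned above, not a true surface area (it also does not handle boxes that cross the antimeridian).

```python
from datetime import datetime


def bbox_area_deg2(min_lon, min_lat, max_lon, max_lat):
    """Width * height of a bounding box in degrees ("square degrees")."""
    return (max_lon - min_lon) * (max_lat - min_lat)


def open_seconds(created_at, closed_at):
    """Seconds a changeset was open, from timestamps like '2023-03-25T10:00:00Z'."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    delta = datetime.strptime(closed_at, fmt) - datetime.strptime(created_at, fmt)
    return delta.total_seconds()
```

For example, a made-up bbox from (10.0, 50.0) to (12.5, 52.0) is 2.5 degrees wide and 2.0 degrees high, which gives 5.0 deg².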
How do we replicate the issue ?
elements property in Response tab when clicking the "133792960.json" request for details (filter requests by "s3")
elements property
Some recent examples:
actual | expected | seconds | deg²
Largest working cases I found in my samples:
actual | expected | seconds | deg²
Other Information / context:
I'm collecting issues related to Overpass and found three existing issues for failing large bbox changesets. These discuss the obvious client-side adiff query that runs into a timeout, but that is only a fallback.
Instead, I wanted to focus on the missing features in the S3 JSON, which is really an issue of the server-side processing. That processing seems not to be public (?), apart from the parsing part (osm-adiff-parser), so I'm opening the issue here.