Empty elements in S3 JSON for large bbox changesets #650

Open
nrenner opened this issue Mar 29, 2023 · 7 comments

nrenner commented Mar 29, 2023

I'm submitting a bug report

Brief Description

Changesets with a large bounding box have an empty elements property in the real-changesets JSON file on S3, e.g.
https://s3.amazonaws.com/mapbox/real-changesets/production/133792960.json (OSMCha, OSM)

What is the current behaviour ?

When opening a large bbox changeset, a spinning wheel appears for three minutes, then the map and changes tabs are empty.

Technically, the client requests the cached real-changesets JSON from S3 to get the diffs for the changed features. As the elements property contains no features, the client sends a fallback adiff query directly to the Overpass API, which times out after 180 seconds.
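The decision the client makes here can be sketched roughly as follows. This is a minimal Python illustration, not OSMCha's actual client code: `needs_overpass_fallback` and `fetch_real_changeset` are hypothetical names, and the only assumption about the JSON is that it carries a top-level `elements` array, as described in this report.

```python
import json
from urllib.request import urlopen

# URL pattern taken from the example link in this report.
S3_URL = "https://s3.amazonaws.com/mapbox/real-changesets/production/{id}.json"


def needs_overpass_fallback(doc: dict) -> bool:
    """Return True when the cached JSON has no features in `elements`."""
    return not doc.get("elements")


def fetch_real_changeset(changeset_id: int) -> dict:
    """Download the cached real-changesets JSON (hypothetical helper, needs network)."""
    with urlopen(S3_URL.format(id=changeset_id), timeout=30) as resp:
        return json.load(resp)
```

Calling `needs_overpass_fallback(fetch_real_changeset(133792960))` would then tell whether the client has to fall back to the slow adiff query for that changeset.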

screenshot

What is the expected behaviour ?

OSMCha used to support large bbox changesets by processing world-wide minutely augmented diffs from Overpass API, as described in these posts:

So I wonder why this has not been the case for some time now. I suspect that augmented diffs were at some point replaced by individual adiff queries, like the one the client sends? If so, what was the reason?

When does this occur ?

Seemingly for bounding boxes larger than about 5 "square degrees" (simple width * height from bbox coordinates). Probably also depends on other factors like how long the changeset was open for (created_at - closed_at time span).
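For reference, the "square degrees" figure used in this report is just the plain product of the bbox width and height in degrees. A small Python sketch (the function name and argument order are my own choice, not taken from any OSMCha code):

```python
def bbox_square_degrees(min_lon: float, min_lat: float,
                        max_lon: float, max_lat: float) -> float:
    """Simple width * height of a bbox in degrees (no projection, no area correction)."""
    width = max_lon - min_lon
    height = max_lat - min_lat
    return width * height
```

This deliberately ignores that a degree of longitude shrinks towards the poles; it is only meant as a rough size indicator for comparing changesets.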

How do we replicate the issue ?

  1. open Network tab in browser dev tools (F12)
  2. paste and submit a large bbox changeset link in the browser address bar, like https://osmcha.org/changesets/133792960
  3. observe empty elements property in Response tab when clicking the "133792960.json" request for details (filter requests by "s3")
  4. observe timeout result after 180s in Response tab when clicking the "interpreter?..." Overpass request for details (filter requests by "adiff")
  5. for more examples, click on the "JSON" link in the table below to see the empty elements property

Some recent examples:

| changeset | links | changes (actual) | changes (expected) | open for (seconds) | bbox size (deg²) | editor |
|---|---|---|---|---|---|---|
| 133928860 | OSM JSON | 0 | 366 | 2 | 5 | iD 2.12.1 |
| 134140877 | OSM JSON | 0 | 8674 | 215 | 10 | JOSM/1.5 (18678 de) |
| 133768685 | OSM JSON | 0 | 3 | 3749 | 288 | rosemary v0.4.4 |
| 134177458 | OSM JSON | 0 | 51 | 2 | 575 | JOSM/1.5 (18678 ru) |
| 134177840 | OSM JSON | 0 | 54 | 1 | 3945 | iD 2.25.1 |
| 133792960 | OSM JSON | 0 | 11 | 4031 | 4722 | rosemary v0.4.4 |

Largest working cases I found in my samples:

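| changeset | links | changes (actual) | changes (expected) | open for (seconds) | bbox size (deg²) | editor |
|---|---|---|---|---|---|---|
| 133926419 | OSM JSON | 5 | 5 | 1 | 25 | JOSM/1.5 (18678 de) |
| 134178093 | OSM JSON | 3 | 3 | 1 | 33 | iD 2.25.1 |
| 133926888 | OSM JSON | 27 | 27 | 1 | 54 | RapiD 1.1.9 |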

Other Information / context:

I'm collecting issues related to Overpass and found three existing issues for failing large bbox changesets. These discuss the obvious client-side adiff query that runs into a timeout, but that is only a fallback.

Instead, I want to focus on the features missing from the S3 JSON, which is really an issue with the server-side processing. That processing does not seem to be public (?), apart from the parsing part (osm-adiff-parser), so I'm opening the issue here.

Contributor

batpad commented Mar 29, 2023

@nrenner thank you so much for digging into this and flagging!

It's possible the server running Overpass for OSMCha has gotten a bit rusty and needs a bit of a kick. But yea, this would require logging things inside the AWS infrastructure that runs osm-adiff-parser, etc to figure out where these elements are getting dropped.

Thanks really for the detailed report - we should hopefully be able to follow up on this and debug in a proper way soon.

nrenner commented Mar 29, 2023

@batpad thanks for the quick answer!

> It's possible the server running Overpass for OSMCha has gotten a bit rusty

Before making any bigger changes, it might be worth considering alternatives to the current setup. I'm planning to open a separate issue for that.

nrenner commented Mar 31, 2023

As an example for checking minutely augmented diffs (see my comment in #651), we can use the empty 134177840.json (OSMCha, OSM), which was open for one second: "created_at":"2023-03-27T13:22:17Z","closed_at":"2023-03-27T13:22:18Z".

The corresponding sequence id for that minute is 5541507.
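One way to arrive at that sequence id, as a hedged Python sketch: it assumes the commonly cited convention that Overpass augmented diff ids count minutes since 2012-09-12T06:55:00Z (Unix time 1347432900). That offset is my assumption and is not stated in this thread; it does reproduce the id quoted above. `adiff_sequence_for` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed epoch of Overpass augmented diff ids: 2012-09-12T06:55:00Z.
ADIFF_EPOCH = 1347432900


def adiff_sequence_for(timestamp: str) -> int:
    """Map an ISO 8601 UTC timestamp to the minutely augmented diff id covering it."""
    # Replace the trailing "Z" so fromisoformat() also works on older Python versions.
    dt = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return (int(dt.timestamp()) - ADIFF_EPOCH) // 60
```

With the `created_at` value from above, `adiff_sequence_for("2023-03-27T13:22:17Z")` evaluates to 5541507.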

Querying and parsing the augmented diff for that sequence returns the expected 54 changes:

curl "https://overpass.osmcha.org/api/augmented_diff?id=5541507" \
  | zx -e "import parser from 'osm-adiff-parser'; let xml = await stdin(); parser(xml, null, (e, json) => { console.log(JSON.stringify(json['134177840'], null, 2)); })" \
  | grep 134177840 | wc -l

The query only takes five seconds, so all good and no bbox involved whatsoever.
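For a check that does not depend on `grep` line counts, the same idea can be sketched in Python by counting the `<action>` elements whose new version belongs to the changeset. This assumes the usual Overpass adiff layout (`create` actions hold the element directly, other actions wrap it in `<old>`/`<new>`); `count_changes` and the inline sample are purely illustrative and not part of osm-adiff-parser.

```python
import xml.etree.ElementTree as ET


def count_changes(adiff_xml: str, changeset_id: str) -> int:
    """Count adiff actions whose resulting element carries the given changeset id."""
    root = ET.fromstring(adiff_xml)
    count = 0
    for action in root.iter("action"):
        new = action.find("new")
        # "create" actions have no <new> wrapper; the element sits directly under <action>.
        container = new if new is not None else action
        if any(el.get("changeset") == changeset_id for el in container):
            count += 1
    return count


# Tiny hand-written sample, not real data.
SAMPLE = """<osm>
  <action type="create"><node id="1" changeset="134177840"/></action>
  <action type="modify">
    <old><node id="2" changeset="100"/></old>
    <new><node id="2" changeset="134177840"/></new>
  </action>
  <action type="modify">
    <old><node id="3" changeset="134177840"/></old>
    <new><node id="3" changeset="200"/></new>
  </action>
</osm>"""
```

On the sample, `count_changes(SAMPLE, "134177840")` counts the first two actions only, since the third was last touched by a different changeset.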

The geohacker diary says

> The augmented diffs are also cached on S3.

It might be interesting to check the contents of that cached sequence (maybe some 5541507.xml or so?). Are they public?

Contributor

batpad commented Mar 31, 2023

@nrenner from comments from @geohacker in the diary post:

> The state of the latest augmented diff is in a file called latest, like https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/latest.
>
> You can request for an augmented diff this way: https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/2409184.osc

Not sure if this gives you what you're looking for exactly.

nrenner commented Mar 31, 2023

Oh, thanks! I hadn't looked in the comments.

Unfortunately the latest call gives me the sequence 2554267 and that is from 2017 (https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/2554267.osc). Later sequences seem not to be available there.

@willemarcel
Collaborator

nrenner commented Mar 31, 2023

Thanks!

All changes there:

curl -s https://s3-eu-west-1.amazonaws.com/overpass-db-eu-west-1/augmented-diffs/5541507.osc \
   | grep 134177840 | wc -l
54

So, if this was used, the query part isn't the problem. Maybe writing/updating the JSON fails for some reason or it gets overwritten later, but as there are no further changes in later minutely diffs for this changeset, I can't see a reason why.
