Areas and Osm bases must be the same #258

Closed
Vanuan opened this issue Mar 14, 2016 · 20 comments

Comments


Vanuan commented Mar 14, 2016

Look here:
http://overpass-turbo.eu/s/eZr

Corresponding relation is http://www.openstreetmap.org/relation/72634

What is it? A Turbo error? Or are Overpass areas broken?

Contributor

mmd-osm commented Mar 14, 2016

This issue is caused by outdated areas on overpass-api.de. Please report this issue on the Status page instead of creating a Github ticket (which is only for software issues rather than operational issues): https://wiki.openstreetmap.org/wiki/Overpass_API/status

For the time being, you can also switch to the French server in overpass turbo, which has up-to-date areas.

Author

Vanuan commented Mar 14, 2016

How can I verify the date the areas were last updated?

Author

Vanuan commented Mar 14, 2016

Ok, got it.

Author

Vanuan commented Mar 14, 2016

Still seeing wrong boundaries even on the French server. Wrong area algorithm?

Author

Vanuan commented Mar 14, 2016

Author

Vanuan commented Mar 14, 2016

I don't see any changes in relation history in the past 6 months. So it must be a software issue.

Author

Vanuan commented Mar 14, 2016

Looks like if timestamps are the same, it's correct. If one of them is different, it's borked:

http://overpass-api.de/api/ (incorrect)

  "osm3s": {
    "timestamp_osm_base": "2016-03-14T22:52:01Z",
    "timestamp_areas_base": "2016-02-11T09:13:02Z",
  },

http://api.openstreetmap.fr/oapi/ (incorrect)

  "osm3s": {
    "timestamp_osm_base": "2016-03-14T22:29:02Z",
    "timestamp_areas_base": "2016-03-14T18:11:02Z",
  },

http://overpass.osm.rambler.ru/cgi/ (correct)

  "osm3s": {
    "timestamp_osm_base": "2016-03-10T21:02:02Z",
    "timestamp_areas_base": "2016-03-10T21:02:02Z",
  },

Author

Vanuan commented Mar 14, 2016

So both areas and osm must have the same timestamp. Otherwise its data is unreliable.
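The comparison described here can be automated. The following is a minimal Python sketch, assuming the `osm3s` header of a JSON response has already been parsed into a dict; the helper names `parse_ts` and `areas_lag_seconds` are hypothetical and not part of Overpass API.

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse an Overpass timestamp such as '2016-03-14T22:52:01Z' as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def areas_lag_seconds(osm3s: dict) -> float:
    """Return how many seconds timestamp_areas_base trails timestamp_osm_base."""
    osm = parse_ts(osm3s["timestamp_osm_base"])
    areas = parse_ts(osm3s["timestamp_areas_base"])
    return (osm - areas).total_seconds()


# Header values copied from the responses quoted above.
french = {
    "timestamp_osm_base": "2016-03-14T22:29:02Z",
    "timestamp_areas_base": "2016-03-14T18:11:02Z",
}
rambler = {
    "timestamp_osm_base": "2016-03-10T21:02:02Z",
    "timestamp_areas_base": "2016-03-10T21:02:02Z",
}

print(areas_lag_seconds(french))   # 15480.0 (4 h 18 min behind)
print(areas_lag_seconds(rambler))  # 0.0 (in sync)
```

A scheduled job could skip or postpone a download whenever the lag is non-zero, or above a threshold it is willing to tolerate.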

Contributor

mmd-osm commented Mar 15, 2016

> I don't see any changes in relation history in the past 6 months. So it must be a software issue.

That conclusion is not quite accurate. You can move a node around, and it still doesn't affect a way's or even a relation's history at all. Likewise, if you change a way (add new nodes, remove nodes, change tags), this has exactly zero impact on the history of any relation the way is a member of.
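As an illustration of this point, here is a toy Python model. It is only a sketch of the versioning behaviour described above, not how OSM or Overpass actually store data; the class names and coordinates are made up.

```python
from dataclasses import dataclass


@dataclass
class Node:
    lat: float
    lon: float
    version: int = 1

    def move(self, lat: float, lon: float) -> None:
        # Only the node itself receives a new version.
        self.lat, self.lon = lat, lon
        self.version += 1


@dataclass
class Way:
    nodes: list  # references to Node objects
    version: int = 1


@dataclass
class Relation:
    members: list  # references to Way objects
    version: int = 1

    def outline(self) -> list:
        """Resolve the current geometry through the member ways."""
        return [(n.lat, n.lon) for w in self.members for n in w.nodes]


corner = Node(46.0, 30.0)
border = Way([corner, Node(46.5, 30.5), Node(46.0, 31.0)])
region = Relation([border])

before = region.outline()
corner.move(46.1, 30.0)

print(corner.version, border.version, region.version)  # 2 1 1
print(region.outline() != before)                      # True
```

The relation's geometry changes while its version, and therefore its history, stays untouched.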

If you look at the result from the French server right now, you see that relation 1702215 is not included in the result.

[Screenshot: 15-03-2016 09-38-55]

By design, there might be some slight differences. Usually, it is best to wait some time until areas have been recreated.

I suggest closing this ticket, as there's nothing else to do here.

Author

Vanuan commented Mar 15, 2016

> That conclusion is not quite accurate. You can move a node around, still it doesn't affect a way or even relation's history at all.

So yeah, looks like my assumption is correct. If area and osm timestamps don't match, wrong boundaries are used, even though the relation was never included in another relation.

The French server's timestamps still don't match, so potentially there could be problems with other relations:

    "timestamp_osm_base": "2016-03-15T15:40:02Z",
    "timestamp_areas_base": "2016-03-15T11:28:02Z",

> By design, there might be some slight differences. Usually, it is best to wait some time until areas have been recreated.

It is unacceptable. It makes a use case of scheduled automatic downloads impossible. osm and areas must be consistent.

I'm fine with 5 days old data, but I can't tolerate "slight differences" leading to completely wrong regions selected. OSM data is fine. It's Overpass's problem.

Vanuan changed the title from "Relation 72634 has wrong area boundaries" to "Areas and Osm bases must be the same" on Mar 15, 2016
Author

Vanuan commented Mar 15, 2016

Now, the problem is with http://overpass.osm.rambler.ru/cgi/ server (which worked well yesterday):

    "timestamp_osm_base": "2016-03-15T15:54:01Z",
    "timestamp_areas_base": "2016-03-10T21:02:02Z",

So all servers are broken currently.


Zverik commented Mar 15, 2016

Did you update osmconvert? In the recently released version, the author broke relation updating.

Contributor

mmd-osm commented Mar 15, 2016

@Vanuan : In reality the delay between osm base and areas is usually not an issue, as borders in most cases do not change every 5 minutes. There are many projects that use this mechanism without problems, and I wonder why this is really an issue for you. See it this way: the delay is even a feature. While mappers often screw up boundary relations, the areas are still around for some time and keep working. For most users, this is the preferred behavior, and it has always been this way in Overpass API.

If you don't want to use areas or any of the official Overpass instances because of the reasons you stated, you have a few options:

  • Switch to the (poly: ... ) filter and provide your own multipolygon via a list of lat/lon coordinates.
  • Set up your own Overpass API instance and control when updates of osm_base and areas occur. This means that you will have a delay of several hours anyway, as area creation just takes some time.
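The first option can be sketched in Python as follows. The `(poly:"lat lon lat lon ...")` filter is Overpass QL; the helper names, the admin_level default and the triangle coordinates are hypothetical and only for illustration. A real boundary polygon would have to come from your own source.

```python
def poly_filter(coords):
    """Build an Overpass QL (poly:"lat lon ...") filter from (lat, lon) pairs."""
    if len(coords) < 3:
        raise ValueError("a polygon needs at least three points")
    flat = " ".join(f"{lat} {lon}" for lat, lon in coords)
    return f'(poly:"{flat}")'


def boundary_query(coords, admin_level="6"):
    """Query boundary relations inside the polygon without using areas."""
    return (
        "[out:json][timeout:25];\n"
        f'relation["admin_level"="{admin_level}"]{poly_filter(coords)};\n'
        "out body;\n"
        ">;\n"
        "out skel qt;"
    )


# Made-up triangle, not a real administrative boundary.
triangle = [(46.0, 30.0), (46.5, 30.5), (46.0, 31.0)]
print(poly_filter(triangle))  # (poly:"46.0 30.0 46.5 30.5 46.0 31.0")
print(boundary_query(triangle))
```

Because the polygon is supplied by the client, the result no longer depends on timestamp_areas_base at all.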

Again, this all highly depends on your particular use case, which you didn't state in detail so far. Even with scheduled automatic downloads, there's always the risk that even the OSM main database has broken or inconsistent boundaries at the time. If you follow the daily analysis done by User:wambacher closely, you probably know what I'm talking about.

BTW: the rambler.ru instance is recovering from hard disk failure, that's why there's some lag at the moment.

@Zverik : osmconvert is typically not used for Overpass API, in particular not for the update process, where diff files are fetched from planet.openstreetmap.org, uncompressed and applied via an Overpass binary called update_database or update_from_dir respectively.

Author

Vanuan commented Mar 15, 2016

> Again, this all highly depends on your particular use case, which you didn't state in detail so far.

My use case is to download new administrative boundaries of a particular country/region at least once a week.

You see, it might break any moment. Move a boundary node just a few meters in any direction and boom - "area" query is selecting wrong regions. So I can only be sure that it's correct if area timestamp is the same as osm timestamp.

I'd probably be better off downloading a bounding box and filtering relations manually. But it makes me wonder whether this "area" feature is even useful to anyone, as it can't be trusted.

"Area" would be much more useful/reliable if it were updated at the same time as the OSM updates are applied, even if that means the data isn't real-time.

> In reality the delay between osm base and areas is usually not an issue, as borders are in most cases not changing every 5 minutes

I'm sure that on global scale, borders are actually changing every 5 minutes. It's just that not everyone is downloading all the borders. Moreover, difference between osm and area bases could be from 12 hours to 5 days or even months.

Author

Vanuan commented Mar 15, 2016

Look, I've just broken it again:

[Screenshot from 2016-03-16 01 01 24]

Minor change here: http://www.openstreetmap.org/changeset/37858980#map=11/47.8995/30.3868
Huge difference in overall picture.


TheFive commented Mar 16, 2016

Hi Vanuan,

I am not an Overpass developer, but I have learned that the team is doing its best to support the community.
I expect the decision to update areas only every 6 months is based on the effort that rebuilding this "database index" requires, so it is a compromise between cost (material and/or time) and up-to-dateness.

A shorter cycle won't help in your case (only a cycle of zero would, as you correctly remarked), because you cannot guarantee that no point defining the border moves between area generation and the data timestamp of the relation.

So you have to take a completely different approach to selecting the borders you would like to download.

In Germany we have the concept of special border IDs. Such an ID is a "talking number" that carries information about the structure of the border (part of the string <=> part of the region).

Checking the one relation you mentioned above, the "koatuu" field looks similar to what I used to download German borders from Overpass; maybe this can be an approach for downloading your data.

You asked what Overpass areas can be used for.

As there are many more use cases than downloading border relations from Overpass, the answer is simple. I have set up a small counting app that just counts pharmacies (and missing pharmacy tagging) structured by areas. I have no problem with the areas not being updated every 5 days, as I do not expect German borders to change that often. Of course they will become better and better over time, as we receive much more open data to correct our information, but the case of a pharmacy jumping from one area to the neighbouring area is much rarer.

Christoph
P.S. As you are working on boundaries: do you know Walter's service https://osm.wno-edv-service.de/boundaries/? It is not for downloading borders on a weekly basis, but it offers boundaries in different formats (e.g. for use in QGIS).

Contributor

mmd-osm commented Mar 16, 2016

There are 2 more options:

  • use [date: ...] to retrieve data exactly with the areas_base timestamp. While areas always refer to a certain areas_base timestamp, [date: ...] will retrieve osm data at the given point in time. This would only work on instances with attic data (French instance doesn't have attic at this time).
  • More general: something I called Ad-hoc area creation, which would create the area on the fly as part of the query. By definition this would also use matching osm data and areas timestamp as both of them are then in the same database transaction. That's all deep in concept state at the moment, see: nodes inside a closed way #77 (comment)

> decision in updating areas only every 6 month

Areas should normally be updated every 6-12 hours by an automated job.

Author

Vanuan commented Mar 16, 2016

> use [date: ...] to retrieve data exactly with the areas_base timestamp. While areas always refer to a certain areas_base timestamp, [date: ...] will retrieve osm data at the given point in time. This would only work on instances with attic data (French instance doesn't have attic at this time).

Looks promising. Do you have an example?
So I'd need 2 HTTP requests?

@TheFive
Yeah, using koatuu is a more stable approach. I'd be fetching more data than needed, though, and I'd have to add some code to filter out features myself.

Ok, so if the areas feature is not precise and isn't guaranteed to work, that's fine. I just felt it should be described in the documentation.

Contributor

mmd-osm commented Mar 16, 2016

> Looks promising. Do you have an example?
> So I'd need 2 HTTP requests?

Yes, exactly.

Step 1: Get current areas_base timestamp

I use the following simple query. Ideally, this should be something not too complex.

[out:json][timeout:25];
{{geocodeArea:Одесская область}};
out ids;

Result:

{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {
    "timestamp_osm_base": "2016-03-16T18:38:02Z",
    "timestamp_areas_base": "2016-03-15T18:21:02Z",
    "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
  },
  "elements": [
    {
      "type": "area",
      "id": 3600072634
    }
  ]
}

We're looking for the timestamp_areas_base value "2016-03-15T18:21:02Z" and use it for the next query:

Step 2: Actual query with [date: ...] to return data at time of timestamp_areas_base

[date:"2016-03-15T18:21:02Z"][out:json][timeout:25];
{{geocodeArea:Одесская область}}->.searchArea;

(
  node["admin_level"="6"](area.searchArea);
  way["admin_level"="6"](area.searchArea);
  relation["admin_level"="6"](area.searchArea);
);
out body;
>;
out skel qt;

Areas and returned OSM data should be pretty much in sync now. Note that "timestamp_osm_base" and "timestamp_areas_base" always refer to the last timestamp, they're not affected by the [date: ..] setting!

As querying attic data might take more time, you might want to increase the [timeout:25] setting as well.

If you're really paranoid, you can also compare "timestamp_areas_base" returned by the first query vs. the second query. If both values differ, then you hit a time frame where the area calculation just finished on the server. If that's the case, you need to run query 2 again with the latest "timestamp_areas_base".
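For a scheduled download, the two steps plus the re-check could be scripted roughly as below. This is a sketch under several assumptions: `run_query` is a hypothetical callable that sends Overpass QL to an instance with attic data and returns the parsed JSON response; the `{{geocodeArea:...}}` macro only exists in overpass turbo, so the script uses the area id 3600072634 returned in Step 1 instead; and the timeout of 180 seconds and the retry count are arbitrary choices.

```python
PROBE_QUERY = "[out:json][timeout:25];\narea(3600072634);\nout ids;"


def areas_timestamp(response):
    """Extract timestamp_areas_base from a parsed Overpass JSON response."""
    return response["osm3s"]["timestamp_areas_base"]


def dated_query(areas_ts, area_id=3600072634, timeout=180):
    """Build the Step 2 query pinned to the given areas timestamp."""
    return (
        f'[date:"{areas_ts}"][out:json][timeout:{timeout}];\n'
        f"area({area_id})->.searchArea;\n"
        "(\n"
        '  node["admin_level"="6"](area.searchArea);\n'
        '  way["admin_level"="6"](area.searchArea);\n'
        '  relation["admin_level"="6"](area.searchArea);\n'
        ");\n"
        "out body;\n"
        ">;\n"
        "out skel qt;"
    )


def fetch_in_sync(run_query, max_attempts=3):
    """Run Step 1, then repeat Step 2 until the areas timestamp is stable."""
    ts = areas_timestamp(run_query(PROBE_QUERY))
    for _ in range(max_attempts):
        result = run_query(dated_query(ts))
        latest = areas_timestamp(result)
        if latest == ts:
            return result  # areas did not change while we were querying
        ts = latest        # area calculation just finished: retry with new value
    raise RuntimeError("timestamp_areas_base kept changing between requests")
```

Passing `run_query` in as a parameter keeps the logic independent of the HTTP client and makes it easy to exercise with a fake server in tests.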

Author

Vanuan commented Mar 17, 2016

Not paranoid, maybe pedantic a bit :)
Thanks!

4 participants