Query Optimisation Question #224

Open
TheFive opened this issue Jul 15, 2015 · 4 comments
@TheFive

TheFive commented Jul 15, 2015

Hi,

For the current "weekly task" GuidePost I would like to count how often destination:* is tagged on ways of type path, track, cycleway and footway.

The query is simple:

[out:json][date:":timestamp:"];
area["key"="value"]->.a;
(way(area.a)[~"destination.*"~".*"][highway=path];
way(area.a)[~"destination.*"~".*"][highway=footway];
way(area.a)[~"destination.*"~".*"][highway=track];
way(area.a)[~"destination.*"~".*"][highway=cycleway];);
out tags;

(Key and value are changed for every Overpass run to something like de:amtlicher_gemeindeschluessel and 16. The current results can be found here: http://thefive.sabic.uberspace.de/table/GuidePost_Path.html)
But it takes very long (for no results).

What is the way to optimize it?
Throw out the regex and do an "or" over all possibilities? As mueschel has suggested 8 destination tags (see http://blog.openstreetmap.de/blog/2015/07/ferienaufgabe-wanderwegweiser/, or in English http://www.weeklyosm.eu/archives/4504), there are 8*4 combinations I would have to put in the "or".
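For illustration, the "or" over all combinations does not have to be written by hand. The following is only a sketch in Python: the key list is a placeholder (the thread does not name the 8 suggested destination tags), and `build_union_query` is a hypothetical helper, not part of any Overpass tooling. It replaces the key regex with one exact-key statement per combination; whether that actually runs faster would have to be measured.

```python
from itertools import product

# Placeholder keys: the thread mentions 8 suggested destination tags but does
# not list them, so the real list has to be substituted here.
DESTINATION_KEYS = ["destination", "destination:ref"]
HIGHWAY_VALUES = ["path", "footway", "track", "cycleway"]


def build_union_query(area_key, area_value, timestamp,
                      keys=DESTINATION_KEYS, highways=HIGHWAY_VALUES):
    """Return an Overpass QL query with one exact-key statement per combination."""
    statements = [
        f'  way(area.a)["{key}"][highway={highway}];'
        for key, highway in product(keys, highways)
    ]
    lines = [
        f'[out:json][date:"{timestamp}"];',
        f'area["{area_key}"="{area_value}"]->.a;',
        "(",
        *statements,
        ");",
        "out tags;",
    ]
    return "\n".join(lines)
```

Calling `build_union_query("de:amtlicher_gemeindeschluessel", "16", "2015-07-15T00:00:00Z")` yields one `way(area.a)[...]` line per key/highway pair inside a single union, so 8 keys and 4 highway values give 32 statements.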

The "destination" restriction is very hard (0 results yet); is there something like an index in Overpass that is not used here?

Christoph

@mmd-osm
Contributor

mmd-osm commented Jul 18, 2015

Hi Christoph,

Let's take a look at the dev instance first: here your query takes just 12 seconds if the data is already in the buffer cache. The response time will be somewhat higher if additional hard disk accesses are needed.

overpass turbo link for testing: http://overpass-turbo.eu/s/aus

So, from a query point of view, I would say you're fine.

Unfortunately, the situation is very different on both production machines:

  • overpass-api.de returns an HTTP 504 error after almost 7 minutes with no result
  • rambler instance also returns with an HTTP 504 error after 5.5 minutes with no result
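Since different instances behave so differently here, a client can try several endpoints in turn and stop at the first one that answers. The sketch below is only an illustration in Python: the second endpoint URL is a made-up placeholder, and `fetch` and `first_success` are hypothetical helper names. It assumes the instance accepts the query as a form-encoded `data` parameter via POST.

```python
import urllib.error
import urllib.parse
import urllib.request

ENDPOINTS = [
    "https://overpass-api.de/api/interpreter",
    "https://dev.example.org/api/interpreter",  # placeholder, not a real instance
]


def fetch(endpoint, query, timeout=180):
    """POST the query to one endpoint and return the response body as text."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(endpoint, data=data, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def first_success(endpoints, query, fetch_fn=fetch):
    """Try each endpoint in order; return (endpoint, body) of the first success."""
    errors = {}
    for endpoint in endpoints:
        try:
            return endpoint, fetch_fn(endpoint, query)
        except OSError as exc:  # covers URLError, HTTPError (e.g. 504), timeouts
            errors[endpoint] = exc
    raise RuntimeError(f"all endpoints failed: {errors}")
```

The `fetch_fn` parameter exists only so the fallback logic can be exercised without network access.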

My recommendation is to discuss this topic with Roland via email.

@TheFive
Author

TheFive commented Jul 19, 2015

Thanks, I just switched my "production" osmcount for this case to your development instance; please let me know if this causes problems.
I just have to introduce an automatic check whether the data is up to date or not.
It is impressively faster.
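One way such an up-to-date check could look, as a rough sketch only: Overpass JSON output normally carries an `osm3s.timestamp_osm_base` field giving the state of the database, and the check compares it with the current time. The function name `is_fresh` and the 6-hour threshold are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(response, max_age=timedelta(hours=6), now=None):
    """Return True if the response's OSM base timestamp is at most max_age old.

    `response` is the parsed JSON result; the 6-hour default is an arbitrary
    example value, not a recommendation.
    """
    now = now or datetime.now(timezone.utc)
    stamp = response["osm3s"]["timestamp_osm_base"]
    base = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return now - base <= max_age
```

If the check fails, the counting run could be skipped or repeated against another instance instead of storing a stale number.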

@drolbr
Owner

drolbr commented Jul 30, 2015

The "destination" restriction is very hard (0 results yet); is there
something like an index in Overpass that is not used here?

The fact that the "destination" restriction is hard unfortunately
doesn't help here. The processing of the regular expression goes to an
external library, and hence we don't know beforehand how many keys would
match it.

Basically, there are three things that could be checked for first:

  1. the exact tag condition
  2. all tags that have a key matching the regular expression
  3. all ways within the area

Apparently, all three things could be large or small. As the original
use cases for 2. were "name:XX" or "name" (of which very many
elements exist, hence a bad first filter) and areas like cities or
suburbs, it is very likely that the engine will filter for areas first.

I think Daniel has turned that around which would fit much better to
this use case. Daniel, does this explain the observation?

@mmd-osm
Contributor

mmd-osm commented Aug 1, 2015

@drolbr: well, the branch running on the api_mmd endpoint doesn't include any changes to the evaluation of filters, so this part should really be identical to the main instance. Also, I used Christoph's query as is.

I think the difference in response time only depends on two changes:

  • There's an additional patch for attic scenarios in place, which avoids collecting unnecessary elements (#174). It primarily targets large memory consumption, but could also have some runtime implications (I didn't measure this part).
  • My branch also runs on PCRE instead of the POSIX regex library, but I don't think this has any relevance for the speedup we see here.

@drolbr drolbr added the question label Jun 4, 2016