Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid query results with regexp filter conditions #192

Closed
mmd-osm opened this issue Jan 25, 2015 · 11 comments
Closed

Invalid query results with regexp filter conditions #192

mmd-osm opened this issue Jan 25, 2015 · 11 comments
Labels

Comments

@mmd-osm
Copy link
Contributor

mmd-osm commented Jan 25, 2015

According to the Wiki:

"The result set is the set of all elements that match the conditions of all the filters. "

Let's take a look at the following examples:

Statement Way 271221475 is ... by out geom;
way(271221475); shown
way(271221475) ["addr:city"]; shown
way(271221475) ["addr:city"] ["addr:hamlet"!~"."]; not shown
way(271221475) ["addr:city"] ["addr:hamlet"!~"."] [~"^addr:hamlet:.*$"~"."]; shown

In example 3, the condition ["addr:hamlet"!~"."] is not met. To my surprise, adding another key-regexp filter in example 4 overrides this. Is this a bug, undocumented behavior or did I miss something here?

Reference data:

  <way id="271221475">
    <bounds minlat="46.6877029" minlon="11.8496157" maxlat="46.6878615" maxlon="11.8498448"/>
    <nd ref="2761802524" lat="46.6877984" lon="11.8496157"/>
    <nd ref="2761802876" lat="46.6877710" lon="11.8496460"/>
    <nd ref="2761802583" lat="46.6877600" lon="11.8496249"/>
    <nd ref="2761802032" lat="46.6877178" lon="11.8496716"/>
    <nd ref="2761802205" lat="46.6877310" lon="11.8496969"/>
    <nd ref="2761802372" lat="46.6877029" lon="11.8497280"/>
    <nd ref="2761802069" lat="46.6877638" lon="11.8498448"/>
    <nd ref="2761802026" lat="46.6878615" lon="11.8497366"/>
    <nd ref="2761802524" lat="46.6877984" lon="11.8496157"/>
    <tag k="addr:city" v="San Martin de Tor - St.Martin in Thurn - S.Martino in Badia"/>
    <tag k="addr:city:de" v="St.Martin in Thurn"/>
    <tag k="addr:city:it" v="S.Martino in Badia"/>
    <tag k="addr:city:lld" v="San Martin de Tor"/>
    <tag k="addr:hamlet" v="Antermoia"/>
    <tag k="addr:hamlet:de" v="Untermoj"/>
    <tag k="addr:hamlet:it" v="Antermoia"/>
    <tag k="addr:hamlet:lld" v="Antermoia"/>
    <tag k="addr:housenumber" v="40"/>
    <tag k="addr:postcode" v="39030"/>
    <tag k="addr:street" v="Strada Plan Murin"/>
    <tag k="addr:street:de" v="Straße Plan Murin"/>
    <tag k="addr:street:it" v="Strada Plan Murin"/>
    <tag k="addr:street:lld" v="Strada Plan Murin"/>
    <tag k="building" v="yes"/>
  </way>
@drolbr
Copy link
Owner

drolbr commented Jan 25, 2015

Thank you for finding the issue. That's a bug.

@mmd-osm
Copy link
Contributor Author

mmd-osm commented Aug 17, 2015

Test query: node(1530957491)["name"!~"."][~"^name:.*$"~"."];out;

Bad behavior is caused by coding in void filter_ids_by_ntags in query.cc in Line 608:

  sort(removed_ids.begin(), removed_ids.end());
  removed_ids.erase(unique(removed_ids.begin(), removed_ids.end()), removed_ids.end());
  new_ids.erase(set_difference(new_ids.begin(), new_ids.end(),
                removed_ids.begin(), removed_ids.end(), new_ids.begin()), new_ids.end());

Situation before new_ids.erase:
new_ids: (6 entries): 1530957491 1530957491 1530957491 1530957491 1530957491 1530957491

Situation after new_ids.erase:
new_ids: (5 entries): 1530957491 1530957491 1530957491 1530957491 1530957491

Although the object ids are all identical, the erase statement fails to delete all relevant ones - it just deletes a single entry. Effectively, way 1530957491 still survives in the result, although it has been identified to be removed by the negative key/value_regex.

The following pull request fixes this by first removing duplicate entries in new_ids:

 sort(new_ids.begin(), new_ids.end());
 new_ids.erase(unique(new_ids.begin(), new_ids.end()), new_ids.end());

This fix is also deployed for testing on the dev instance: http://overpass-turbo.eu/s/b03

@mmd-osm
Copy link
Contributor Author

mmd-osm commented Mar 20, 2016

Issue was fixed in 3935810

@mmd-osm
Copy link
Contributor Author

mmd-osm commented Jul 23, 2016

I'm reopening this issue, as the patch in 3935810 causes performance regressions. The following very simple query used to take 50ms, and now takes 500ms.

Test case:

[out:json];node["name"~"[33][22][22][55]"]["phone"](3.2299951,-76.6815258,3.6299951000000004,-76.2815258);out body;

With an additional instrumentation in method generate_ids_by_coarse, we can check the index used for ids_by_coarse, as well as the vector size before / after removing duplicates. Results are shown below.

It's pretty evident, that we're sorting and removing the same elements (plus a few new ones) over and over again for the same coarse index.

I'm proposing to change this in a way that sorting / removing duplicates is only done once per it->first.val() & 0x7fffff00. See referenced pull request.

ids_by_coarse - index: 461641984
Before de-dup: 38
After de-dup: 38
ids_by_coarse - index: 461641984
Before de-dup: 63
After de-dup: 63
ids_by_coarse - index: 461641984
Before de-dup: 100
After de-dup: 100
ids_by_coarse - index: 461641984
Before de-dup: 103
After de-dup: 103
ids_by_coarse - index: 461641984
Before de-dup: 155
After de-dup: 155
ids_by_coarse - index: 461641984
Before de-dup: 163
After de-dup: 163
ids_by_coarse - index: 461641984
Before de-dup: 202
After de-dup: 202
ids_by_coarse - index: 461641984
Before de-dup: 250
After de-dup: 250
ids_by_coarse - index: 461641984
Before de-dup: 281
After de-dup: 281
ids_by_coarse - index: 461641984
Before de-dup: 319
After de-dup: 319
ids_by_coarse - index: 461641984
Before de-dup: 450
After de-dup: 450
ids_by_coarse - index: 461641984
Before de-dup: 458
After de-dup: 458
ids_by_coarse - index: 461641984
Before de-dup: 531
After de-dup: 531
ids_by_coarse - index: 461641984
Before de-dup: 554
After de-dup: 554
ids_by_coarse - index: 461641984
Before de-dup: 604
After de-dup: 604
ids_by_coarse - index: 461641984
Before de-dup: 629
After de-dup: 629
ids_by_coarse - index: 461641984
Before de-dup: 674
After de-dup: 674
ids_by_coarse - index: 461641984
Before de-dup: 744
After de-dup: 744
ids_by_coarse - index: 461641984
Before de-dup: 781
After de-dup: 781
ids_by_coarse - index: 461641984
Before de-dup: 866
After de-dup: 866
ids_by_coarse - index: 461641984
Before de-dup: 890
After de-dup: 890
ids_by_coarse - index: 461641984
Before de-dup: 934
After de-dup: 934
ids_by_coarse - index: 461641984
Before de-dup: 1005
After de-dup: 1005
ids_by_coarse - index: 461641984
Before de-dup: 1027
After de-dup: 1027
ids_by_coarse - index: 461641984
Before de-dup: 1129
After de-dup: 1129
ids_by_coarse - index: 461641984
Before de-dup: 1266
After de-dup: 1266
ids_by_coarse - index: 461641984
Before de-dup: 1391
After de-dup: 1391
ids_by_coarse - index: 461641984
Before de-dup: 1494
After de-dup: 1494
ids_by_coarse - index: 461641984
Before de-dup: 1586
After de-dup: 1586
ids_by_coarse - index: 461641984
Before de-dup: 1698
After de-dup: 1698
ids_by_coarse - index: 461641984
Before de-dup: 1713
After de-dup: 1713
ids_by_coarse - index: 461641984
Before de-dup: 1759
After de-dup: 1759
ids_by_coarse - index: 461641984
Before de-dup: 1799
After de-dup: 1799
ids_by_coarse - index: 461641984
Before de-dup: 1835
After de-dup: 1835
ids_by_coarse - index: 461641984
Before de-dup: 1904
After de-dup: 1904
ids_by_coarse - index: 461641984
Before de-dup: 1909
After de-dup: 1909
ids_by_coarse - index: 461641984
Before de-dup: 1950
After de-dup: 1950
ids_by_coarse - index: 461641984
Before de-dup: 1968
After de-dup: 1968
ids_by_coarse - index: 461641984
Before de-dup: 1972
After de-dup: 1972
ids_by_coarse - index: 461641984
Before de-dup: 2055
After de-dup: 2055
ids_by_coarse - index: 461641984
Before de-dup: 2078
After de-dup: 2078
ids_by_coarse - index: 461641984
Before de-dup: 2087
After de-dup: 2087
ids_by_coarse - index: 461641984
Before de-dup: 2094
After de-dup: 2094
ids_by_coarse - index: 461641984
Before de-dup: 2098
After de-dup: 2098
ids_by_coarse - index: 461641984
Before de-dup: 2108
After de-dup: 2108
ids_by_coarse - index: 461641984
Before de-dup: 2109
After de-dup: 2109
ids_by_coarse - index: 461641984
Before de-dup: 2110
After de-dup: 2110
ids_by_coarse - index: 461641984
Before de-dup: 2116
After de-dup: 2116
ids_by_coarse - index: 461641984
Before de-dup: 2122
After de-dup: 2122
ids_by_coarse - index: 461641984
Before de-dup: 2272
After de-dup: 2272
ids_by_coarse - index: 461641984
Before de-dup: 2320
After de-dup: 2320
ids_by_coarse - index: 461641984
Before de-dup: 2358
After de-dup: 2358
ids_by_coarse - index: 461641984
Before de-dup: 2390
After de-dup: 2390
ids_by_coarse - index: 461641984
Before de-dup: 2448
After de-dup: 2448
ids_by_coarse - index: 461641984
Before de-dup: 2499
After de-dup: 2499
ids_by_coarse - index: 461641984
Before de-dup: 2542
After de-dup: 2542
ids_by_coarse - index: 461641984
Before de-dup: 2556
After de-dup: 2556
ids_by_coarse - index: 461641984
Before de-dup: 2565
After de-dup: 2565
ids_by_coarse - index: 461641984
Before de-dup: 2571
After de-dup: 2571
ids_by_coarse - index: 461641984
Before de-dup: 2608
After de-dup: 2608
ids_by_coarse - index: 461641984
Before de-dup: 2615
After de-dup: 2615
ids_by_coarse - index: 461641984
Before de-dup: 2623
After de-dup: 2623
ids_by_coarse - index: 461641984
Before de-dup: 2626
After de-dup: 2626
ids_by_coarse - index: 461641984
Before de-dup: 2640
After de-dup: 2640
ids_by_coarse - index: 461641984
Before de-dup: 2673
After de-dup: 2673
ids_by_coarse - index: 461641984
Before de-dup: 2690
After de-dup: 2690
ids_by_coarse - index: 461641984
Before de-dup: 2705
After de-dup: 2705
ids_by_coarse - index: 461641984
Before de-dup: 2712
After de-dup: 2712
ids_by_coarse - index: 461641984
Before de-dup: 2743
After de-dup: 2743
ids_by_coarse - index: 461641984
Before de-dup: 2800
After de-dup: 2800
ids_by_coarse - index: 461641984
Before de-dup: 2812
After de-dup: 2812
ids_by_coarse - index: 461641984
Before de-dup: 2817
After de-dup: 2817
ids_by_coarse - index: 461641984
Before de-dup: 2821
After de-dup: 2821
ids_by_coarse - index: 461641984
Before de-dup: 2888
After de-dup: 2888
ids_by_coarse - index: 461641984
Before de-dup: 3085
After de-dup: 3085
ids_by_coarse - index: 461641984
Before de-dup: 3093
After de-dup: 3093
ids_by_coarse - index: 461641984
Before de-dup: 3140
After de-dup: 3140
ids_by_coarse - index: 461642240
Before de-dup: 4
After de-dup: 4

@drolbr
Copy link
Owner

drolbr commented Aug 28, 2016

It is now an enhancement because it is a merely question of performance.

@mmd-osm
Copy link
Contributor Author

mmd-osm commented Oct 11, 2016

Closing, as issue was fixed in c30c154..e6394bf

Thanks!

@mmd-osm mmd-osm closed this as completed Oct 11, 2016
@mmd-osm
Copy link
Contributor Author

mmd-osm commented Jun 4, 2017

I'm reopening this issue, as there's a strange effect with the Karlsruhe node.

I'm testing with keyregexp and a Cyrillic character 'л':

node(49.01342968289611,8.403076529502867,49.015037577534585,8.406440019607544)[!name][~"."~"л"];
out;

/*
Test:
  nodes: 21487097  -> ok        (  [!name] removes node  )
         240120582 -> not ok    (  [!name] has no impact )
*/   

http://overpass-turbo.eu/s/pw1

My expectation is that node 240120582 (Karlsruhe) is not shown in the result, because is has a name tag. This works pretty well for another nearby node 21487097 (Mühlburg), which also has a name tag and the same Cyrillic character, i.e. the Regexp itself filters the correct nodes.

This one is even worse, returning lots of nodes with a name tag:

node({{bbox}})[!name][~"."~"a"];
node._[name];
out;

http://overpass-turbo.eu/s/pwz

Also tested on //dev.overpass-api.de/api_new_feat/ with the same effect.

@mmd-osm mmd-osm reopened this Jun 4, 2017
@ghost
Copy link

ghost commented Jun 9, 2017

I would like to obtain all nodes in a region that have no name tag but do have name:lang. The following query returns nodes that do have a name tag, even though it shouldn't:

node[!"name"][~"^name:(.*)$"~".*"];

@mmd-osm mmd-osm changed the title Filter conditions Invalid query results with regexp filter conditions Apr 12, 2018
@drolbr drolbr added bug and removed enhancement labels May 10, 2018
@mmd-osm
Copy link
Contributor Author

mmd-osm commented Nov 9, 2018

This query returns invalid results depending on zoom level:

http://overpass-turbo.eu/s/Dww

[bbox:{{bbox}}];
node[~"^fhrs.*"~".*"][!"fhrs:id"];
out center;

@mmd-osm
Copy link
Contributor Author

mmd-osm commented Nov 9, 2018

Always the same reason: filter_ids_by_ntags is getting called with duplicates ids in std::vector< Id_Type >& new_ids

@mmd-osm
Copy link
Contributor Author

mmd-osm commented Nov 15, 2018

This issue was fixed in 2ca3f38

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants