Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

custom_filter get_pois not working with value True #224

Open
mattijsdp opened this issue Dec 12, 2023 · 6 comments
Open

custom_filter get_pois not working with value True #224

mattijsdp opened this issue Dec 12, 2023 · 6 comments

Comments

@mattijsdp
Copy link

Describe the bug
The custom_filter does not work as one would expect when using the catch-all 'True' as a value in combination with another tag key. If I understand the documentation correctly, then this should return all osm records with that tag key and this return more entities compared to when one specifies a tag key and value.

To Reproduce

from pyrosm import OSM, get_data
import shapely
import h3

# Some h3 cell in New York
bb = shapely.Polygon(h3.h3_to_geo_boundary("882a107749fffff", geo_json=True))

state = "New York"
fp = get_data("New York")
osm = OSM(fp, bounding_box=bb)

# Get all buildings and "railway": "station"
pois = osm.get_pois(custom_filter={"building": True, "railway": ["station"]})
print(len(pois)) # 4

# Get all "building": "yes" and "railway": "station"
pois = osm.get_pois(custom_filter={"building": ["yes"], "railway": ["station"]})
print(len(pois)) # 2152

Expected behavior
Using True should always lead to a geq number of records than specifying some tag value.

Environment:

  • OS: Linux
  • Python package source (PyPi, conda, ...): pyrosm, 0.6.2, py311hb755f60_1, conda-forge
  • Versions of Python, Java Development Kit, Python modules: 3.11.6

Additional context
I realise there is a function called get_buildings() which would be more suited if you are only interested in buildings. I am interested in POIs but some buildings I would also like to include. In general, I am using a somewhat involved custom_filter with True and this example shows that I either don't understand the implementation or there is a bug.

Great work on the package though, very useful!!

@mattijsdp
Copy link
Author

I believe it is because in the line below, the tag values of True aren't added to filter_keys nor data_filter. If all tag values are True then this is fine as data_filter is None is handled below that but not when it is a combination.
https://github.com/HTenkanen/pyrosm/blob/66de74bd0496d1148618842cac58923bf22d97ea/pyrosm/data_filter.pyx#L112C13-L112C13

@mattijsdp
Copy link
Author

I see that overlapping_filter also plays a part. I think this should be set to True if there are any tags that are set to value True (catch-all) so that the for loop here is never broken early in these cases.

@Chwiggy
Copy link

Chwiggy commented Dec 15, 2023

I'm getting odd results with using the custom_filter for the get_boundaries method too.

Filtering with an osm id, yields more results than the same request without a custom filter.

my_filter = {'id':['639343']}
admin_boundaries = osm_data_obj.get_boundaries(custom_filter = my_filter)

yields 378 results that do not have the id '639343'. Without a custom filter there are only 245 results.

Similar things are true with filters:
{'name':['Regierungsbezirk Karlsruhe']} yields 245 results, includes only 1 with that exact name
{'admin_level':['6']} yields 245 results as well. includes only 10 with that admin_level

using a custom filter for the get_data_by_custom_criteria method seems to work as expected however.

@mattijsdp
Copy link
Author

@Chwiggy could you share the bounding box or area you're using?

@Chwiggy
Copy link

Chwiggy commented Dec 21, 2023

This is the geometry i was using:

{ "type": "Feature",
    "properties": { "name": "2023_rnv_gtfs.zip", "path": "filepath" },
    "geometry": { 
         "type": "Polygon",
         "coordinates": [ [ [ 8.1704019, 49.3432631 ], [ 8.811395, 49.3432631 ], [ 8.811395, 49.5811911 ], [ 8.1704019, 49.5811911 ], [ 8.1704019, 49.3432631 ] ] ]
    } 
}

the osm.pbf file was cropped to this bounding box from a larger file with osmosis with completeWays=yes

@mattijsdp
Copy link
Author

@Chwiggy please see my MR for a fix.

Your use case works as expected though. It is just because get_boundaries adds additional tag key:value combinations as can be seen here. If the boundary tag key isn't specified then it is added as True thereby including all osm entities with the tag boundary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants