problem extracting admin level 3 - 4 within a country from country-specific pbf #9

Open
mashallAryan opened this issue Dec 6, 2020 · 2 comments

@mashallAryan

I would like to extract the admin levels within a specific country (e.g. the USA), given its pbf file. The problem is that the output of osm_extract_polygon also includes admin boundaries from neighboring countries (e.g. Canada). Is this a bug, or am I using the tool incorrectly?
Example:

osm_extract_polygon -m 3 -x 6 us-latest.osm.pbf
The output includes "quebec" but does not include Alaska.

@mashallAryan mashallAryan changed the title from "problem extracting admin level 3 - 4 within a country from" to "problem extracting admin level 3 - 4 within a country from country-specific pbf" Dec 7, 2020
@AndGem
Owner

AndGem commented Dec 7, 2020

Hi @mashallAryan ,

thanks for filing the issue!

I'm not entirely surprised that admin boundaries of other countries "leak" into the data source. Unfortunately, filtering by country is not trivial, because the country information does not seem to be consistently present in the admin boundary tags.

I can see a few (more or less hacky) ways to resolve the situation:
i) Add a country flag and put it on the allow/deny list used for filtering, so that admin boundaries are kept or dropped depending on whether their country is allowed or forbidden.
ii) Extend this filtering to be hierarchical: it would be sufficient for the country information to be present in any one of the enclosing regions, up to the highest admin boundary level. This might improve quality and might even make the filter completely safe (I assume the highest admin level reliably carries country information).
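
A minimal sketch of idea ii) in Python, to make the hierarchical lookup concrete. Everything here is an assumption for illustration: the `Boundary` record, its `parent_id` and `country` fields, and both function names are hypothetical and are not part of osm_extract_polygon. The sketch walks from a boundary up through its enclosing regions and uses the first country code it finds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Boundary:
    # Hypothetical record; the real tool's data model may look different.
    id: int
    name: str
    admin_level: int
    parent_id: Optional[int] = None  # enclosing boundary, if known
    country: Optional[str] = None    # country code, if the tags carry one


def resolve_country(boundary_id: int, boundaries: Dict[int, Boundary]) -> Optional[str]:
    """Return the first country code found on the boundary or any ancestor."""
    seen: Set[int] = set()
    current = boundaries.get(boundary_id)
    while current is not None and current.id not in seen:
        if current.country is not None:
            return current.country
        seen.add(current.id)  # guards against cycles in the parent links
        if current.parent_id is None:
            return None
        current = boundaries.get(current.parent_id)
    return None


def filter_by_country(boundaries: Dict[int, Boundary], allowed: Set[str]) -> List[Boundary]:
    """Keep only boundaries whose resolved country is on the allow list."""
    return [b for b in boundaries.values()
            if resolve_country(b.id, boundaries) in allowed]


if __name__ == "__main__":
    data = {
        1: Boundary(1, "United States", 2, country="US"),
        2: Boundary(2, "Alaska", 4, parent_id=1),
        3: Boundary(3, "Canada", 2, country="CA"),
        4: Boundary(4, "Quebec", 4, parent_id=3),
    }
    print([b.name for b in filter_by_country(data, {"US"})])
```

Boundaries whose country cannot be resolved at all are dropped in this sketch; whether to keep or drop them would be a design decision for a real implementation.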

There might also be a workaround for your use case. What you could try (not tested; it may not work):

  1. use osm_extract_polygon to extract the highest boundary level from us-latest.osm.pbf, which should produce a 'united states of america.poly' file (or something like this)
  2. use osmosis to cut us-latest.osm.pbf with this poly file, restricting it strictly to United States territory
  3. run your original command (i.e., osm_extract_polygon -m 3 -x 6 ...) on this modified file.
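
The three steps above could be scripted roughly as follows. This is an untested sketch under several assumptions: the osm_extract_polygon invocations mirror the form used in this issue (`-m`, `-x`, then the pbf path), and the installed version may expect the input file to be passed differently; using `-m 2 -x 2` to get only the country outline assumes that admin level 2 is the country boundary; the poly file name and the clipped output name are hypothetical placeholders. The osmosis tasks used are `--read-pbf`, `--bounding-polygon`, and `--write-pbf`. The function only builds the command lines, it does not run them.

```python
import shlex
from typing import List


def build_workaround_commands(source_pbf: str, country_poly: str,
                              clipped_pbf: str) -> List[List[str]]:
    """Build the three command lines of the workaround (nothing is executed)."""
    return [
        # Step 1: extract only the country-level boundary (assumed: admin level 2).
        ["osm_extract_polygon", "-m", "2", "-x", "2", source_pbf],
        # Step 2: clip the pbf to the country polygon with osmosis.
        ["osmosis",
         "--read-pbf", f"file={source_pbf}",
         "--bounding-polygon", f"file={country_poly}",
         "--write-pbf", f"file={clipped_pbf}"],
        # Step 3: rerun the original extraction on the clipped file.
        ["osm_extract_polygon", "-m", "3", "-x", "6", clipped_pbf],
    ]


if __name__ == "__main__":
    commands = build_workaround_commands(
        "us-latest.osm.pbf",
        "united states of america.poly",  # hypothetical name, see step 1
        "us-clipped.osm.pbf",             # hypothetical output name
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths, in particular where step 1 actually writes its poly file, before running anything on a large extract.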

I assume this might work and solve your issue. Otherwise, I'm tempted to actually implement solution ii), but it would take me a while to do it properly.

What do you think? And if you have tried out the workaround, please let me know if it works. I would add it to the README until the issue is properly addressed (if possible). I'm also open to PRs if you want to add the information yourself.

@morandd

morandd commented Mar 16, 2021

Hi,
This issue affects many admin levels, not only 3-6, and it is pervasive across European countries.

Following AndGem's workaround, a useful solution could be to accept a polygon mask file as an input to this routine and output only those polygons whose centroids fall inside the mask. That essentially internalizes the extra "cut" operation with osmosis. Pseudocode for finding a centroid, and for testing whether that centroid lies inside another polygon, is easy to come by.
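
For reference, a minimal Python sketch of that centroid test. It makes some simplifying assumptions: polygons are single rings of (x, y) pairs treated as planar coordinates, holes and multipolygons are ignored, and all function names are hypothetical rather than anything osm_extract_polygon provides. The centroid uses the standard signed-area (shoelace) formula and the containment check uses ray casting.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal line through y
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def keep_inside_mask(polygons: List[List[Point]], mask: List[Point]) -> List[List[Point]]:
    """Keep only polygons whose centroid lies inside the mask ring."""
    return [p for p in polygons if point_in_polygon(polygon_centroid(p), mask)]


if __name__ == "__main__":
    mask = [(0, 0), (10, 0), (10, 10), (0, 10)]
    inner = [(1, 1), (3, 1), (3, 3), (1, 3)]
    outer = [(12, 1), (14, 1), (14, 3), (12, 3)]
    print(len(keep_inside_mask([inner, outer], mask)))
```

One caveat: the centroid of a strongly concave region can fall outside the region itself, so a centroid test is a heuristic rather than an exact clip.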

But overall, thank you for this contribution! It's super useful.
