Vector tiles: Replace boundary polygons with lines, add centroids #701

Closed
1ec5 opened this issue Feb 20, 2024 · 31 comments

Comments

@1ec5
Member

1ec5 commented Feb 20, 2024

A boundary that changes often results in many overlapping features in the vector tiles that are mostly redundant to each other. For example, a tile containing San José, California, weighs in the tens of megabytes, even though it contains little besides boundaries: #698 (comment). San José might be an outlier in terms of the sheer number of annexations,[1] but we can expect more problem spots in the future, particularly as mappers begin recording land swaps along river boundaries, which are complex to begin with.

The overlapping boundaries inherently come with exorbitant size overhead, even if we simplify the geometries aggressively: #702. For a sense of scale, the boundary polygons representing San José over time currently contain a total of 2,444,653 separately encoded coordinates, whereas a corresponding set of boundary polylines would contain only 69,674 separately encoded coordinates, since concurrent boundary edges would be encoded only once. Boundary edge polylines also require much less effort to render on the client side than full polygons.
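To make the size argument concrete, here is a minimal Python sketch (not the tiler's code) that counts how many edge segments are encoded when every historical boundary is stored as its own polygon ring, versus when each shared edge is stored once. The three toy rings are made up for illustration, and the sketch assumes that concurrent boundaries share identical vertices, as they do when relations reuse the same member ways.

```python
def ring_segments(ring):
    """Yield each edge of a closed ring as an undirected pair of points."""
    for a, b in zip(ring, ring[1:]):
        # Sort the endpoints so the same edge compares equal regardless of winding order.
        yield tuple(sorted((a, b)))


def count_segments(rings):
    """Return (segments encoded as polygons, segments encoded once each)."""
    as_polygons = sum(len(ring) - 1 for ring in rings)
    as_lines = len({seg for ring in rings for seg in ring_segments(ring)})
    return as_polygons, as_lines


# Hypothetical city limits: two identical snapshots, then an annexation to the east.
city_v1 = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
city_v2 = list(city_v1)
city_v3 = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 0)]

print(count_segments([city_v1, city_v2, city_v3]))  # prints (14, 7)
```

Sorting the endpoints throws away each edge's direction, which is the same side information that boundary edge labels would need to keep.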

Some of this redundancy is necessary, since a given edge might run in opposite directions after applying winding order to each boundary. In lay terms, along a given north–south boundary edge, San José might lie to the west of the boundary in one year but to the east in another year. A rigorous solution like #603 would probably have to address this problem comprehensively, but for now we could either defer boundary edge labels or find a way to keep two distinct linestring features running in opposite directions.

A polygon representation of boundaries has some utility, especially for rendering boundary halos of different colors, for example when a national park abuts an administrative boundary. It’s also essential for making a choropleth map out of administrative areas: #700. However, overlapping polygons make it impossible to reliably style boundaries as dashes: openmaptiles/openmaptiles#1604 (comment). For example, if we were to reintroduce dashed lines of different lengths based on admin_level=*, the dashes would clump up together forming a solid line whenever a municipal boundary runs concurrently with a county or state boundary.
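The clumping effect can be illustrated with a small Python sketch. It models a boundary as a row of unit positions and asks what fraction is painted when several dashed lines are drawn on top of each other. The dash and gap lengths are hypothetical and are not taken from any OHM stylesheet.

```python
def painted(x, dash, gap, offset=0):
    """Return True if unit position x along the line falls on a dash."""
    return (x - offset) % (dash + gap) < dash


def coverage(patterns, length):
    """Fraction of unit positions painted by at least one (dash, gap, offset) pattern."""
    hits = sum(
        any(painted(x, dash, gap, offset) for dash, gap, offset in patterns)
        for x in range(length)
    )
    return hits / length


municipal = (2, 2, 0)  # hypothetical short dashes
county = (3, 3, 0)     # hypothetical longer dashes

print(coverage([municipal], 12))             # 0.5
print(coverage([municipal, county], 12))     # 0.75
print(coverage([municipal, (2, 2, 2)], 12))  # 1.0
```

The last line shows the worst case: two copies of the same pattern that start at different points along the shared edge fill each other's gaps completely.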

A naïve approach would be to replace existing usage of ST_AsMVTGeom with whatever we did in OpenHistoricalMap/ohm-deploy#89 to copy the details of an associatedStreet relation onto the member ways. We’ll also need to generate centroid point features for these boundaries at the same time: #543 (comment). But we won’t be able to encode any side-dependent attributes, such as names, until we copy each polyline feature once per winding order.

Footnotes

  1. 1,174 distinct iterations for San José, compared to only 891 for Columbus, Ohio, another major city that experienced sprawl.

@danrademacher
Member

danrademacher commented Mar 6, 2024

Interested in thoughts from @Rub21 and @batpad on this, especially paired with #702. To me, on-the-fly replacement of polygons with lines and the generation of centroids for labels seems like it might become overly complex, but admittedly I'm beyond the limits of my understanding of what sorts of Postgres queries we can include in our TOML file without hitting some sort of query limit.

@Rub21

Rub21 commented Mar 8, 2024

The idea of converting polygons into lines sounds promising; it will significantly reduce the size of many tiles, and for long-term efficiency, that would be beneficial to me as well. However, I believe this adjustment needs to be made during the data loading process into PostGIS, rather than within the tiler. Therefore, it should be modified in imposm by changing a parameter to load lines instead of polygons for certain types of tags.

To achieve this, we can initiate the process by copying the tile-imposm directory into ohm-deploy/images. This approach allows for better customization without altering the existing functionality in osm-seed.

For clarification, is the central point intended to be added specifically for administrative boundaries? There are two cases: administrative boundaries mapped as polygons and those mapped as relations. Relations may already contain points within them. For polygons, however, we need to add a center point. Adding a central point could be challenging using imposm, but we could explore other alternatives, such as a PostGIS function to create, delete, or update center points for polygons.

@1ec5
Member Author

1ec5 commented Mar 8, 2024

For clarification, is the central point intended to be added specifically for administrative boundaries? There are two cases: administrative boundaries mapped as polygons and those mapped as relations. Relations may already contain points within them. For polygons, however, we need to add a center point. Adding a central point could be challenging using imposm, but we could explore other alternatives, such as a PostGIS function to create, delete, or update center points for polygons.

Yes, automatically generating a centroid point is important, for more than just boundaries: #543 (comment). For boundaries, we’re already working around the issue by manually mapping and styling explicit points near the centroids. So I think we can pursue that optimization in parallel.

@jeffreyameyer
Member

@Rub21 - any update on this?

@Rub21

Rub21 commented May 1, 2024

@jeffreyameyer, per a voice call with Sanjay and Dan, we agree that generating the center-point for each polygon or relation that comes from OSM may be challenging. The process of adding, updating, and removing a center-point is not currently supported by IMPOSM. Apart from that, converting those boundaries into linestrings may affect name styling. We can chat a bit more about it.

@1ec5
Member Author

1ec5 commented May 2, 2024

Per voice call with Sanjay and Dan, we agree that generating the center-point for each polygon or relation that comes from OSM may be challenging. The process of adding, updating, and removing a center-point is not currently supported by IMPOSM.

Are you sure the centroid point needs to be generated by imposm3? Can we instead generate it on the fly in one of the PostgreSQL queries in the tiler’s providers? For example, the landuse_points provider would query the osm_landuse_areas table, calling a PostGIS function such as ST_Centroid instead of just regurgitating the geometry verbatim.

Apart from that, converting those boundaries into linestrings may affect name styling.

Yes, generating the centroid points is a prerequisite for representing boundaries as lines.

To get the performance improvements, another important step would be to either join the boundary relations to deduplicate their LineString geometries or copy the relations’ details onto the member ways at import time. imposm3 might be more of an obstacle for doing this. I think we should take a look at how OpenMapTiles implements boundaries using imposm3.

/ref #543 (comment)

@Rub21

Rub21 commented May 6, 2024

Are you sure the centroid point needs to be generated by imposm3?

Yes, you are right. Those can be generated in the Postgres queries. I was not seeing it that way; I was looking more into the imposm side. But adding the centroid in the Postgres query totally makes sense. Let me add those centroid points.

@Rub21

Rub21 commented May 6, 2024

I have evaluated displaying the boundaries as lines along with the centroids, but there are some cases where the centroid points are outside of the polygons/lines. For example, there are 96 such cases for admin levels 1 and 2 alone. 👇

For the following cases, we can convert the multipolygons to simple polygons and then create centroids from these simple polygons, so that the centroids fall within the polygons/lines.

Screenshot 2024-05-06 at 10 42 36 AM Screenshot 2024-05-06 at 10 46 17 AM

For the following cases, it will be a bit more difficult because these are already simple polygons, and the centroids still fall outside. I am still investigating how to make the centroids fall within these polygons.

Screenshot 2024-05-06 at 10 43 33 AM Screenshot 2024-05-06 at 10 47 39 AM

@1ec5
Member Author

1ec5 commented May 6, 2024

there are some cases where the centroid points are outside of the polygons/lines

You’re right, this is one of the shortcomings of ST_Centroid. There are some alternative centroid algorithms for better results at some (hopefully negligible) performance cost. openstreetmap-carto uses ST_PointOnSurface. OpenMapTiles switches between ST_Centroid and ST_PointOnSurface depending on the geometry type. You may also need to account for degenerate cases by making the geometry valid first: openmaptiles/openmaptiles#487.
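As a rough illustration of why an area-weighted centroid can land outside a concave shape, here is a plain-Python sketch that uses a made-up U-shaped polygon. It uses the standard shoelace centroid formula and a ray-casting containment test. It is not how PostGIS implements ST_Centroid, only the same geometric idea.

```python
def area_centroid(ring):
    """Area-weighted centroid of a simple polygon given as a list of (x, y) vertices."""
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # The usual 1 / (6 * area) factor, written in terms of twice the signed area.
    return cx / (3 * twice_area), cy / (3 * twice_area)


def contains(ring, point):
    """Ray-casting point-in-polygon test (points exactly on an edge are not handled)."""
    px, py = point
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > py) != (y1 > py):
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > px:
                inside = not inside
    return inside


# A hypothetical U-shaped area: a 3x3 square with a notch cut out of the top.
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]

center = area_centroid(u_shape)
print(center, contains(u_shape, center))  # the second value is False
```

The centroid sits on the axis of symmetry at x = 1.5, which runs straight through the notch, so a label placed there would float outside the area it names.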

Sometimes the results of ST_PointOnSurface can be a bit unintuitive because it only considers the maximum horizontal space:

The Blue Ash Golf Course has an irregular shape with several pinch points and a hole in the middle that pushes the centroid way off to the corner of the polygon.

Another option is ST_MaximumInscribedCircle, which implements the same algorithm that MapLibre uses to label polygons natively. Here’s what a GeoJSON of the same multipolygon looks like in Overpass Ultra, which is powered by MapLibre GL JS, when zoomed out far enough that no tile boundary cuts across the feature:

The centroid is located in the southeast corner, far from any edge.
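For intuition about what a maximum inscribed circle gives you, the following Python sketch brute-forces the idea on a grid: among the sampled interior points, pick the one farthest from every edge. This is only an approximation for illustration, using a made-up U-shaped polygon and a hypothetical `cells` parameter. It is not the algorithm that PostGIS or MapLibre implements.

```python
import math


def contains(ring, point):
    """Ray-casting point-in-polygon test (points exactly on an edge are not handled)."""
    px, py = point
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > py) != (y1 > py):
            if x0 + (py - y0) * (x1 - x0) / (y1 - y0) > px:
                inside = not inside
    return inside


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def approx_inscribed_center(ring, cells=60):
    """Grid-search the interior point farthest from every edge; returns (point, radius)."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    edges = list(zip(ring, ring[1:] + ring[:1]))
    best_point, best_radius = None, 0.0
    for i in range(1, cells):
        for j in range(1, cells):
            p = (min_x + width * i / cells, min_y + height * j / cells)
            if not contains(ring, p):
                continue
            radius = min(distance_to_segment(p, a, b) for a, b in edges)
            if radius > best_radius:
                best_point, best_radius = p, radius
    return best_point, best_radius


# Same hypothetical U shape: a 3x3 square with a notch cut out of the top.
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
point, radius = approx_inscribed_center(u_shape)
print(point, radius)
```

Unlike an area-weighted centroid, the chosen point is inside the polygon by construction, and it favors the roomiest part of the shape, here one of the lower corners where an arm meets the base.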

@jeffreyameyer
Member

jeffreyameyer commented May 6, 2024

Seems like center of ST_MaximumInscribedCircle is the most appealing, which is a pretty cool function.

@1ec5 - what are your thoughts on supporting manually-specified labels?

Also - are these points just for boundary polygons or for all area polygons? (Ref #543 and #579 )

Asking for a friend...
Monosnap OpenHistoricalMap 2024-05-06 12-43-10

@Rub21

Rub21 commented May 6, 2024

@1ec5 @jeffreyameyer, ST_MaximumInscribedCircle works very well. Here are the results.
Screenshot 2024-05-06 at 6 27 06 PM

Screenshot 2024-05-06 at 6 28 33 PM Screenshot 2024-05-06 at 6 27 41 PM

I am going to deploy it in staging

@Rub21

Rub21 commented May 7, 2024

The centroid and lines for boundaries have been deployed in staging: https://vtiles.staging.openhistoricalmap.org/. The ohm_areas will continue to show until the styles are updated.

image

cc. @erictheise

@1ec5
Member Author

1ec5 commented May 7, 2024

The centroid and lines for boundaries have been deployed in staging

Awesome, thanks for the quick turnaround! Am I correct in understanding that we’re synthesizing the centroid points prior to tiling up the geometries? So we don’t need to worry about a bazillion United States points at z16, all in the same place?

Also - are these points just for boundary polygons or for all area polygons? (Ref #543 and #579 )

The idea is to eventually synthesize these centroid points for any polygon or multipolygon that we intend for stylesheets to label – boundaries, land use areas, buildings, you name it. We started talking about it in #543, but the discussion migrated over here because it’s blocking the conversion of boundaries from polygons to linestrings. If we’re only synthesizing the points on boundaries for now, that’s fine, but then we should continue to track something similar for other layers in #543.

what are your thoughts on supporting manually-specified labels?

Yes, we already support manually specified labels, but we’ll need to implement something to avoid duplication. Some kinds of boundaries, such as cities, will always have a manually specified label because they should be labeled at a point other than the centroid. Ideally, we’d only synthesize a centroid point if the relation doesn’t already have a label member. (Naturally, this condition is only relevant to boundary relations, not other areas or multipolygons.)

There will be many opportunities to polish this feature, but the most pressing need is to get something functional out the door so that a) mappers no longer feel pressure to manually map labels at centroids, and b) we can start converting boundaries to linestrings. That step will probably require some more thought around how to merge features while preserving relevant dates. Once this initial iteration is deployed, we can get to work deleting the Newberry import’s centroid points, many of which lie completely outside of the boundaries they label.

@Rub21

Rub21 commented May 7, 2024

Awesome, thanks for the quick turnaround! Am I correct in understanding that we’re synthesizing the centroid points prior to tiling up the geometries? So we don’t need to worry about a bazillion United States points at z16, all in the same place?

We are synthesizing the points and lines in the tiler server, not in imposm. Here is the configuration: https://github.com/OpenHistoricalMap/ohm-deploy/blob/staging/images/tiler-server/config/providers/admin_boundaries_centroids.toml

Currently, the vector tiles in staging contain the boundaries for polygons, lines, and points. Once the styles are implemented for them, I am going to remove the boundaries for areas, so the vector tiles should be lighter.

@Rub21

Rub21 commented May 7, 2024

Something I have been noticing is that the 'place_points' and 'ohm_land_centroids' layers are showing the same information in the points. We need to apply a filter to 'place_points' to avoid displaying the admin points. e.g 👇

Screenshot 2024-05-07 at 11 30 13 AM

@1ec5
Member Author

1ec5 commented May 7, 2024

Something I have been noticing is that the 'place_points' and 'ohm_land_centroids' layers are showing the same information in the points. We need to apply a filter to 'place_points' to avoid displaying the admin points.

Yes, this is what I meant above about deduplicating centroids: some of these manually mapped nodes can be deleted once we’ve deployed this feature. However, some place points need to remain because their locations carry more significance (like a city center). A style would typically make one of these place points look different than a centroid, if it labels the centroid at all, so the tiler would need to either:

  • Avoid synthesizing a centroid point if the relation has a label member; or
  • Include a property that helps the stylesheet distinguish a synthesized centroid point from a label member

Once the styles are implemented for them, I am going to remove the boundaries for areas, so the vector tiles should be lighter.

Will you attempt to deduplicate overlapping boundary lines at all? That would have a significant performance benefit, but it might be tricky to get right. Sometimes a boundary starts, ends, starts, ends, and so on, changing admin_levels at various times.

@Rub21

Rub21 commented May 7, 2024

Avoid synthesizing a centroid point if the relation has a label member; or
Include a property that helps the stylesheet distinguish a synthesized centroid point from a label member

I am figuring out how we can include more attributes for the place_points layer. I am making some adjustments to the imposm.

Will you attempt to deduplicate overlapping boundary lines at all? That would have a significant performance benefit, but it might be tricky to get right. Sometimes a boundary starts, ends, starts, ends, and so on, changing admin_levels at various times.

The idea is to convert all administrative boundaries into lines and centroids. I don't understand what you meant by 'starts, ends'.

@Rub21

Rub21 commented May 8, 2024

Avoid synthesizing a centroid point if the relation has a label member; or

I have implemented this functionality: the centroids are created only if the polygons/relations do not have a label member.

I ran this query 👇 directly in the staging database. Later, if this works, I can make it run automatically in the database using a trigger or a cron job.

-- Add the has_label flag once; the guard makes the script safe to re-run.
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM information_schema.columns
                   WHERE table_name='osm_admin_areas'
                   AND column_name='has_label') THEN
        ALTER TABLE osm_admin_areas ADD COLUMN has_label BOOLEAN DEFAULT FALSE;
    END IF;
END $$;

-- Speed up the membership lookup below.
CREATE INDEX IF NOT EXISTS osm_relation_members_osm_id ON osm_relation_members (osm_id);

-- Flag every admin area whose relation has a member with the 'label' role.
UPDATE osm_admin_areas
SET has_label = TRUE
WHERE osm_id IN (
    SELECT osm_id
    FROM osm_relation_members
    WHERE role = 'label'
);

The results, e.g.:

This relation does not have a label among its members, so it shows a centroid in the vtiles.
https://vtiles.staging.openhistoricalmap.org/#10.18/42.1345/-5.75
image

This relation has a label member, so its centroid won't show up in the vtiles.
image

@1ec5
Member Author

1ec5 commented May 8, 2024

Will you attempt to deduplicate overlapping boundary lines at all? That would have a significant performance benefit, but it might be tricky to get right. Sometimes a boundary starts, ends, starts, ends, and so on, changing admin_levels at various times.

The idea is to convert all administrative boundaries into lines and centroids. I don't understand what you meant by 'starts, ends'.

For example, this way at the edge of San José, California, is a member of many boundary relations. Since each boundary is currently a polygon feature, the same coordinates are repeated many times over in the same tile. Just converting these boundary relations to lines probably wouldn’t affect the tile’s size, but I was wondering whether you were planning to go a step further and dissolve the boundaries back onto this shared way so that it only appears once in the tile. If so, this might be an easy case, since all the boundary relations have the same boundary=* and admin_level=* tags. But sometimes it’s more complicated. This way at the edge of Texas has mostly run along admin_level=4 boundaries, but on two separate occasions, a total of four admin_level=2 boundaries have also run along it.

I think it would be fine to deploy a first iteration of this feature that doesn’t merge the boundary ways yet. At least that would give us an opportunity to clean up the data and the stylesheets. But sooner or later we’ll want to consolidate the ways in order to reduce tile size. This would also enable stylesheets to apply dashed lines to the boundaries; overlapping ways or polygons prevent dashing because one line’s dashes can fill in the other line’s gaps.

@Rub21

Rub21 commented May 8, 2024

For example, this way at the edge of San José, California, is a member of many boundary relations. Since each boundary is currently a polygon feature, the same coordinates are repeated many times over in the same tile.

Yes, you are right: the polygons are created according to the number of relations/closed ways. In the case of the San José edge way, the way is part of 164 polygons in the tiler DB.

If we want to keep only one geometry as a line in this case, we may need to add the information from every relation it has been part of. That means the line would have to retain information from all 164 relations, including all the details necessary to display on the map, such as name, start date, and end date. The same goes for the adjoining lines, which also adds up to a lot of attribute information for the tile size.

Currently, the lines (geometries) are being repeated 164 times, but the polygon/linestring information such as name, start_date, and end_date appears only once in the tile.

I think it would be fine to deploy a first iteration of this feature that doesn’t merge the boundary ways yet. At least that would give us an opportunity to clean up the data and the stylesheets.

Totally makes sense.

Before deploying those latest changes, I am going to make the script run automatically. What I've done so far is run many scripts manually in the database.

@Rub21

Rub21 commented May 8, 2024

This task has been completed; the process of creating the centroids, and skipping them when a relation has a label member, is working fine.
https://vtiles.staging.openhistoricalmap.org/#4.26/-3.9/-88.94

Screenshot 2024-05-08 at 5 51 54 PM

@1ec5
Member Author

1ec5 commented May 9, 2024

If we want to keep only one geometry as a line in this case, we may need to add the information from every relation it has been part of. That means the line would have to retain information from all 164 relations, including all the details necessary to display on the map, such as name, start date, and end date. The same goes for the adjoining lines, which also adds up to a lot of attribute information for the tile size.

Now that we’re generating separate point features for the labels, name properties on the lines wouldn’t be useful for labeling centroids anymore. They could be useful for boundary edge labels, but only if we indicate whether a name applies to the left or right side of the linestring. That would be unnecessary for now, since our styles don’t have boundary edge labels yet.

I was thinking that we could also detect that the way is part of relations that completely cover a certain time period. My example way from San José was continuously part of a boundary relation for many years, so all we need is a single linestring with the earliest relation’s start_date and the latest relation’s end_date. I don’t know how feasible this is with the current architecture. We can track this idea in a separate issue to keep things clear.
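The merging idea can be sketched in a few lines of Python. This is a hypothetical illustration rather than anything in the tiler: it assumes each membership has been reduced to an (admin_level, start_year, end_year) tuple with integer years, whereas real start_date and end_date values are date strings and may be missing. Spans are merged only within the same admin_level, so a way that briefly carried a higher-level boundary keeps that as a separate span.

```python
from collections import defaultdict


def merge_spans(memberships):
    """Merge touching or overlapping (start, end) spans per admin_level.

    memberships is an iterable of (admin_level, start_year, end_year) tuples,
    one per boundary relation that the way belongs to.
    """
    by_level = defaultdict(list)
    for level, start, end in memberships:
        by_level[level].append((start, end))

    merged = {}
    for level, spans in by_level.items():
        result = []
        for start, end in sorted(spans):
            if result and start <= result[-1][1]:
                # Continuous with the previous span: extend it.
                result[-1] = (result[-1][0], max(result[-1][1], end))
            else:
                result.append((start, end))
        merged[level] = result
    return merged


# Made-up memberships for a single shared way.
way_memberships = [
    (4, 1850, 1900),
    (4, 1900, 1950),
    (2, 1860, 1865),
    (4, 1950, 2000),
    (2, 1990, 1995),
]

print(merge_spans(way_memberships))
# {4: [(1850, 2000)], 2: [(1860, 1865), (1990, 1995)]}
```

Each merged span would become one linestring feature carrying the earliest start and latest end, in place of one feature per relation.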

@vknoppkewetzel changed the title from “Replace boundary polygons with lines in vector tiles” to “Vector tiles: Replace boundary polygons with lines, add centroids” May 15, 2024
@vknoppkewetzel
Collaborator

I've updated this ticket with a new name to capture it as vector tile work, and created a separate design-related ticket for the stylesheet updates (#787).

@danrademacher
Member

This task has been completed; the process of creating the centroids, and skipping them when a relation has a label member, is working fine. https://vtiles.staging.openhistoricalmap.org/#4.26/-3.9/-88.94

Screenshot 2024-05-08 at 5 51 54 PM

@Rub21 can you clarify here -- in order for @vknoppkewetzel and @tsinn to style all labels for polygons, it sounds like sometimes they need to target place_points and sometimes they need to target land_ohm_centroids. Is that correct? The two will never be part of a single layer?

@vknoppkewetzel
Collaborator

I can style with them as separate but it seems like it would make sense to fold into the place_points data in the future I think?

I do know some of the attributes are slightly different.
image
image

In place_points, there is no reference to admin_level like there is in land_ohm_centroids. In place_points, however, type has a value that is useful to know (possibly?) - the naming convention may be country-dependent, but the example screenshots above show type=county.

This means:

  1. country name data layer styling for place_points AND country name data layer styling for land_ohm_centroids
  2. etc, 2 layers for all admin levels

@Rub21

Rub21 commented May 21, 2024

can you clarify here -- in order for @vknoppkewetzel and @tsinn to style all labels for polygons, it sounds like sometimes they need to target place_points and sometimes they need to target land_ohm_centroids. Is that correct? The two will never be part of a single layer?

How is the current flow and distribution of objects in the layers?

The place_points layer consists of all objects that have place=* according to the OpenStreetMap wiki. This includes place=city, place=town, place=village, and place=hamlet. Examples of these points are:

When importing to the database using tiler-imposm, a conversion from place to type in the attributes is performed. These were established at the beginning of the project.

So, if there is a relation like this one, this relation object will be added to two layers:

  1. land_ohm (relation/polygons), which is the area of the relation.
  2. place_points, because the relation has a point member with place=state.

Screenshot

What we are currently developing:

For the land_ohm_lines and land_ohm_centroids layers:

  • land_ohm (relations/polygons) that are areas will be converted to linestring, and all attributes will be copied to the linestring and it will be represented as the land_ohm_lines layer.
  • land_ohm (polygons) that are areas will have their centroids calculated, and all attributes will be copied to the centroids, and it will be represented as the land_ohm_centroids layer.

Therefore, for the layers, the same object will have the same attributes in the land_ohm_lines and land_ohm_centroids layers.

As an example:

  • A Relation 2800841 has been created with a group of ways and a label as a member. It has a label as a point member with the attribute place=state (Node 2113249224). This point will be shown in the place_points layer.
image image

Therefore, according to our new conversion to linestrings and centroids, we would have the same object in place_points, land_ohm_lines, and land_ohm_centroids, as described in this comment: #701 (comment)

image

To resolve the issue of duplication in the place_points and land_ohm_centroids layers, we have implemented some functions in the PostgreSQL database. If there is a point member in the relation with the attribute place=*, it will be shown in the place_points layer as it currently is. Otherwise, if there is no point member, it will be shown in the land_ohm_centroids layer. This avoids having two points displaying the same attributes.

For example, the above relation and way do not have a member point with place=*. For this reason, we are creating the centroids. Note, we need to create centroids for these objects because once we convert them into linestrings, there will be no way to show the representative names without using the centroids.

https://vtiles.staging.openhistoricalmap.org/#7.14/-10.298/-78.884
image

@vknoppkewetzel
Collaborator

vknoppkewetzel commented May 21, 2024

Thanks @Rub21. Currently country and state labels are housed in place_points, and will in the future be housed in land_ohm_centroids?

OR just "the relevant country and state created centroids will be brought into place labels"??? So land_ohm_centroids is everything else?

This task has been completed; the process of creating the centroids, and skipping them when a relation has a label member, is working fine.
I had understood that the task was completed here, but perhaps that was just referencing the wrapping up of land_ohm_lines?

In #787 I've updated a test style that shows the updated land_ohm_lines and includes a TEST data layer with the land_ohm_centroids highlighted in red text. I can circle back to refine those further when the centroid work is finished. Does that sound like the right next steps for me on my end?

I am just trying to figure out how I am meant to style land_ohm_centroids. :) I initially was confused to not see as many points as I expected, and then to see them referencing a variety of admin levels - but not consistently, geographically.

However, if the answer is just "duplicate all admin related labels and expect to pull from both place_points and land_ohm_centroids", I can do that haha

@1ec5
Member Author

1ec5 commented May 21, 2024

If there is a point member in the relation with the attribute place=*, it will be shown in the place_points layer as it currently is.

Does this distinguish between the admin_centre and label roles? A member with either role would typically be a place=* node, but we would still want to generate a centroid for the United States if it lacks a label member, even if the node for Washington, D.C., is an admin_centre member. These days, I view admin_centre as somewhat off-topic for OHM, because mappers can indicate a “seat of government of” relationship in Wikidata; however, we have over 6,000 occurrences of this role, and I’d expect the number to increase over time due to influence from OSM.
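The distinction being asked about can be stated as a small Python predicate. This is a hypothetical sketch of the rule and not the function deployed in the database: it assumes a relation's members are available as (member_type, role) pairs and treats only the label role as a reason to skip the synthesized centroid.

```python
def needs_synthesized_centroid(members):
    """Return True unless the relation has a member with the 'label' role.

    members is a list of (member_type, role) pairs. admin_centre members are
    deliberately ignored because they mark a capital, not a label position.
    """
    return not any(role == "label" for _member_type, role in members)


# Made-up member lists for illustration.
country_with_capital_only = [("way", "outer"), ("node", "admin_centre")]
state_with_label = [("way", "outer"), ("node", "label")]

print(needs_synthesized_centroid(country_with_capital_only))  # True
print(needs_synthesized_centroid(state_with_label))           # False
```

A check keyed on any place=* point member would return False for the first relation and leave the country without a centroid label.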

@jeffreyameyer
Member

Barring the discussion above related to admin_centre and label roles, given that this is now operating in production, is this ticket done? @danrademacher w00t!

@Rub21

Rub21 commented Jun 3, 2024

This ticket is done, closing here!
