Slow AsTopoJSON when running on big tables #13
This is also discussed in this issue. Length of output:
|
If I test with GeoJSON, it's about 50 times faster.
|
Here are some stats about this input polygon:
The MBR: |
Is there a chance to get a small-enough dump of the database? |
I checked this some more, and the problem in this case was all the holes, which would not show up anyway in the area selected by the client. In this PostGIS branch I added a bounding box parameter, and with it the query is about 80 times faster.
This also reduces the payload that needs to be sent to the client. (I have not tested TopoJSON yet.)
|
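The bounding-box filtering can be illustrated outside the database. The Python sketch below is not the code in the PostGIS branch; it assumes holes are available as plain coordinate rings and that the client's visible extent arrives as an `(xmin, ymin, xmax, ymax)` tuple. The names `drop_invisible_holes`, `ring_bbox` and `bbox_intersects` are hypothetical.

```python
from typing import List, Tuple

# (xmin, ymin, xmax, ymax), the same shape as the client's visible extent.
BBox = Tuple[float, float, float, float]
# A ring is a closed list of (x, y) coordinates.
Ring = List[Tuple[float, float]]


def ring_bbox(ring: Ring) -> BBox:
    """Axis-aligned bounding box of a ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """Two boxes overlap (or touch) unless one lies entirely to one side of the other."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


def drop_invisible_holes(holes: List[Ring], view: BBox) -> List[Ring]:
    """Keep only holes whose bounding box touches the client's view box.

    The test is conservative: a hole whose box misses the view cannot be
    visible, while a kept hole might still fall just outside the view.
    """
    return [h for h in holes if bbox_intersects(ring_bbox(h), view)]


# Usage: one hole inside a 10x10 view, one far outside it.
view = (0.0, 0.0, 10.0, 10.0)
near = [(1.0, 1.0), (2.0, 1.0), (2.0, 2.0), (1.0, 1.0)]
far = [(20.0, 20.0), (21.0, 20.0), (21.0, 21.0), (20.0, 20.0)]
print(len(drop_invisible_holes([near, far], view)))  # prints 1
```

A box test is cheap and never drops a hole that could be visible, which is why it suits a per-request filter; exact clipping would cost more per hole.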
Added a dump of the topology schema topo_ar5_sysdata_webclient_t2 that I am testing on. |
Looking at that TopoJSON output, is it normal to have all those dangling edges? Many arc_id values are present as both positive and negative. In the very first output I even see internal rings formed by the SAME edge?? See Are you sure the topology is valid? I'd go check for those edges (those are arc_ids; actual edge identifiers would be in the edgemap table). Either the topology is invalid or the code of AsGeoJSON is badly broken. |
Oops, forget the last comment: it was the one's complement encoding of negative arc indices that puzzled me! |
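For anyone puzzled by the same thing: in TopoJSON a negative arc index does not mean "the same arc, reversed" but the arc at the one's complement of the index (`~i`, i.e. `-i - 1`), reversed. A small Python sketch of the decoding; the function names are illustrative and not part of PostGIS.

```python
from typing import List, Tuple


def decode_arc_index(i: int) -> Tuple[int, bool]:
    """Map a TopoJSON arc index to (position in "arcs", reversed?).

    Non-negative indices refer to the arc as stored. A negative index refers
    to the arc at ~i (the one's complement, i.e. -i - 1), traversed backwards:
    -1 is arc 0 reversed, -2 is arc 1 reversed, and so on.
    """
    if i >= 0:
        return i, False
    return ~i, True


def decode_ring(indices: List[int]) -> List[Tuple[int, bool]]:
    """Decode every arc reference of one ring."""
    return [decode_arc_index(i) for i in indices]


# 3 and -3 look like "the same edge twice" but are two different arcs.
print(decode_ring([3, -3]))  # [(3, False), (2, True)]
```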
I received a dump of the edges for TopoGeometry 742280 (referenced in #13 (comment)), loaded them in a new topology and created a TopoGeometry object representing the corresponding polygon. Creating the TopoGeometry using toTopoGeom took ~13 seconds on its own. The polygon has 1137 holes, so the topology ends up having 1138 faces. The TopoGeometry is composed of a single element, the outer face; all 2415 edges of the topology will be part of the output TopoJSON. Calling AsTopoJSON against the so-created TopoGeometry on this system takes ~18 seconds, regardless of whether there is an index on the edgemap and whether the edgemap starts empty or already populated. The TopoJSON output size is 15429 bytes. I'll profile and report back what I find. |
I have a couple of ideas to speed the query up. The first idea brings the time on my machine down from 18 seconds to 2 seconds by using a temporary table to store all edges bounding the TopoGeometry, thus not hitting the edges table more than once per TopoGeometry. The speed improvement may be higher when the topology contains more than ONLY the subject TopoGeometry. The change currently results in some differences in output for the regression testsuite, so I need to check those more carefully. The second idea is just a minor refactoring that uses NO temporary table (so it still hits the edge table) but brings the time down from 18 seconds to 5 seconds for me. This second idea is less intrusive and quicker to test; if you want to test it, @larsop, here's the patch:
|
Upstream ticket: https://trac.osgeo.org/postgis/ticket/4789 |
The temporary-table implementation is in this git branch: https://git.osgeo.org/gitea/strk/postgis/src/branch/topojson-temptable. I have yet to verify whether the ordering differences are significant. |
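The gain from the temporary-table approach comes from how many times the large edge table is visited. The Python sketch below models this with a toy in-memory table that counts scans. It illustrates the caching pattern only, under a deliberately simplified, hypothetical schema (edges reduced to `(edge_id, left_face, right_face)`, each ring identified by the face on its other side); it is not the SQL in the branch.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, heavily simplified edge row: (edge_id, left_face, right_face).
# The real PostGIS edge_data table has more columns (geom, next_left_edge, ...).
Edge = Tuple[int, int, int]


class EdgeTable:
    """Toy stand-in for edge_data that counts full scans."""

    def __init__(self, rows: List[Edge]) -> None:
        self.rows = rows
        self.scans = 0

    def scan(self) -> List[Edge]:
        self.scans += 1
        return self.rows


def edges_by_ring_naive(table: EdgeTable, faces: Set[int],
                        neighbours: List[int]) -> Dict[int, List[int]]:
    """One pass over the edge table per ring (ring keyed by the face beyond it)."""
    out: Dict[int, List[int]] = {}
    for other in neighbours:
        out[other] = [e for e, left, right in table.scan()
                      if (left in faces and right == other)
                      or (right in faces and left == other)]
    return out


def edges_by_ring_cached(table: EdgeTable, faces: Set[int],
                         neighbours: List[int]) -> Dict[int, List[int]]:
    """Single pass: keep edges with a TopoGeometry face on exactly one side
    (the role of the temporary table), then group them from that small list."""
    cache = [(e, left, right) for e, left, right in table.scan()
             if (left in faces) != (right in faces)]
    out: Dict[int, List[int]] = {other: [] for other in neighbours}
    for e, left, right in cache:
        other = right if left in faces else left
        if other in out:
            out[other].append(e)
    return out


# Usage: face 1 with an outer edge (face 0 outside) and 100 single-edge holes.
rows = [(1, 1, 0)] + [(k, k, 1) for k in range(2, 102)]
neighbours = [0] + list(range(2, 102))
naive, cached = EdgeTable(rows), EdgeTable(rows)
same = edges_by_ring_naive(naive, {1}, neighbours) == edges_by_ring_cached(cached, {1}, neighbours)
print(same, naive.scans, cached.scans)  # True 101 1
```

Both functions return the same grouping; only the number of passes over the big table differs, which is the quantity that grows with millions of rows in edge_data.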
Thanks, I will test your patches today, but since I had already started on the patch where I pass a bounding box as an optional parameter to AsTopoJSON, I tested that first. In this case the client moves around in small areas: from the client I get the extent of what is visible to the user, I pass it as a parameter to the backend, and I removed all holes that are outside the area visible to the user. For a single bb area and the test topology, the time was reduced by a factor of about 80 (from 9707.204 ms to 117.616 ms). Here is a complete test case where it goes 70 times faster (from 59620.133 ms to 834.981 ms).
And here is the fast one, with the bounding box as a parameter, done in less than a second:
Could you also consider including this patch in PostGIS? |
I tested this now, and the case without a bb was reduced from 59620.133 ms to 18128.421 ms.
The test where I use the bb as a parameter went from 834.981 ms down to 606.804 ms.
Thanks, this is great: with your patch and mine combined, TopoJSON as seen from the client is now almost 100 times faster in our test case. The data we test here are original and we cannot split them into smaller pieces, and I am pretty sure we have polygons more complicated than this. |
I also tested this one together with my bb patch. Here the time is reduced to 7344.997 ms, so this is your best patch so far.
With the bb we get this time:
This is about 100 ms slower than using your first patch together with the bb. So to get below 1 second, it seems we need to use the bounding box patch in some way. |
From ~70 seconds to ~7 seconds isn't bad, thanks for testing! As for the bbox patch, I find it too wild. |
It's not a problem, because when we do a query based on a bb today, we return the polygons that intersect this bb. If the client moves a meter west, it has to request new data anyway, because there might be a new polygon that now intersects the new bb. The only case where the client can do some caching is when zooming in. Another problem with caching here is that once the user starts updating the database, it's important that the data the user works on is the data in the database. |
I was also thinking about adding the bb border between lines that share the same inside face and cross the bb. Yes, this is a much more complicated operation with a lot of different cases to think about, so it would probably need a lot more work, but if we got it to work it would be something "like" vector tiles in a way. |
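The zoom-in rule can be stated as a containment test on the client side. A Python sketch, assuming the client remembers the box it last requested and that the server returned every polygon intersecting that box; `can_reuse_cached` is a hypothetical name. It deliberately ignores the staleness problem (edits made in the database after the fetch), which no geometric test can detect.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def can_reuse_cached(fetched: BBox, view: BBox) -> bool:
    """True if the new view lies entirely inside the box the data was fetched for.

    Every polygon (and hole) intersecting the view also intersects the fetched
    box, so nothing can be missing. Any pan that leaves the fetched box needs
    a new request.
    """
    return (fetched[0] <= view[0] and fetched[1] <= view[1]
            and view[2] <= fetched[2] and view[3] <= fetched[3])


fetched = (0.0, 0.0, 100.0, 100.0)
print(can_reuse_cached(fetched, (25.0, 25.0, 75.0, 75.0)))  # True: zoomed in
print(can_reuse_cached(fetched, (-1.0, 0.0, 99.0, 100.0)))  # False: moved 1 unit west
```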
I merged my temporary table patch in master branch of PostGIS, so we can talk about further improvements. |
I shall note that your patch will only remove holes composed of a single edge. In the topology I constructed from the dump there are only 135 holes composed of a single edge vs. 1003 holes composed of more than one edge. It might have been an import issue on my side, but it's worth checking in your real database whether this is also the case (i.e. how many holes are you dropping?). It would also be useful to restart the profiling from scratch: is |
Yes, that's correct. I was unsure about how to do this, so I just made a simple test that cleaned out the simple holes. So we should probably extend this test to include more than simple holes.
Great work, this is also very fast now, down to 1/4 of a second.
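Answering "how many holes are you dropping?" amounts to grouping hole edges by ring and counting. A Python sketch, assuming a mapping from each hole's face id to the edge ids of its ring has already been extracted from the topology; how that mapping is built is left open, and `classify_holes` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def classify_holes(hole_edges: Dict[int, List[int]]) -> Counter:
    """Count holes by whether their ring is made of one edge or several."""
    return Counter("single-edge" if len(edges) == 1 else "multi-edge"
                   for edges in hole_edges.values())


# Hypothetical input: hole face id -> ids of the edges forming that hole's ring.
holes = {2: [10], 3: [11, 12], 4: [13]}
print(classify_holes(holes))  # Counter({'single-edge': 2, 'multi-edge': 1})
```

A filter that only handles single-edge rings would drop at most the `single-edge` count; the `multi-edge` holes stay in the output.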
|
I'm marking this as closed/solved and will create a new task to reduce the TopoJSON size based on an input bounding box. |
In this test case I have 4107624 rows in edge_data; with the full test case it will be around 30 million edges.
When I run the SQL below, it completes in less than 4 ms and returns 71 rows.
But when I use this as input to the SQL below, it takes more than a minute.
Most of the time is spent here: