-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
polygonize: using the two-arm chains edgetracing algorithm #7344
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work ! I've issue a pull request against your fork in kikitte#1 with various non substantial improvements.
Are you connected/affiliate with the authors of the paper, or did you do the implementation just from it (I've had a look at it and although I see the relationship between your code and the paper, I also see that there are various gaps you had to fill in)
@@ -664,9 +172,6 @@ static CPLErr GDALPolygonizeT(GDALRasterBandH hSrcBand, | |||
eErr = GDALRasterIO(hSrcBand, GF_Read, 0, iY, nXSize, 1, panThisLineVal, | |||
nXSize, 1, eDT, 0, 0); | |||
|
|||
if (eErr == CE_None && hMaskBand != nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the removal of the masking appropriate here ? Isn't there a risk of setting a different ID in the first and second pass if both don't apply the masking ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed a problem, it need to mask the raster data while reading.
Since the polygonizer needs a full labeling raster as its input(via Polygonizer::processLine), it means that any zero area defined by the mask should be assigned to a unique value, but this looks like a time-consuming job to find that value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the most used case is to use nodata area as mask, so the nodata value serve as the unique value, and all looks ok in that case.
And I think the logic of the polygonizer can be update to take into account the "invalid polygon"(its polygon id is -1 if masking while reading) concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that nodata is probably the main use case, but we should make sure that it works with an arbitrary mask band, not necessarily tied to nodata. I would suggest you restore the use of the mask band in the first pass, per this initial pull request, and potentially come with improvements/optimizations in a follow-up pull request. I don't think the main perf improvements of the new algorithm are related to that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to fix this problem, it should work well but with little memory usage increasing because the polygonizer may hole the invalid polygon for a long time. UPDATE: I found a problem(the number of the output polygon changed) when running a 8 connectedness conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need an extra test case in autotest to catch those cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, no problem. It is caused by my carelessness, actually the result is correct, I don't remember why I written the wrong feature count in the benchmark table, now I have updated it.
Yes, we'd better add a none nodata mask as a test case, it would help us to avoid this situation in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we'd better add a none nodata mask as a test case
do you want to add that in this pull request ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just now I've add a test case.
…id (harmless) unsigned integer overflow
No, I have no any connections to the authors of this paper, and I do the implementation just from it. Yes, actually this paper is less clarity in many details(but it shows us a nice idea of r2v alg), so I filled these gaps during implementation on my own. |
Merged. Thanks again @kikitte ! |
What does this PR do?
The Two-Arm Chains EdgeTracing Algorithm does a faster, memory saving, and robust polygonize job. It is described in Junhua Teng, Fahui Wang, Yu Liu, An Efficient Algorithm for Raster-to-Vector Data Conversion.
It is:
The Two-Arm Chains EdgeTracing Algorithm is as fast as the original algorithm when processing small dataset, however it is much faster than the original algorithm when processing large dataset. Please check the benchmarks below for detail.
This algorithm can handle many special cases gracefully, produce 'correct' polygon geometry without topology error. And the polygons produced by this algorithm follow the right-hand rule (counterclockwise external rings, clockwise internal rings).
It is easy to extend this algorithm to do another jobs by deriving the
PolygonReceiver
base class. For eaxmple, Finding the boundary cells of a raster.This algorithm use less memory compare to the original in all tests.
Benchmarks
Test on AMD 4800h, 16GB RAM, 1T nvme ssd machine, with ArchLInux installed. GDAL 3.6.2 is used as the origin algorithm.
GDEM Colorized Map(4320x2160, 26.7 MiB)
4connected:
8connected:
random_grid.tif (5000x5000, 23.9 MiB)
This file is created with the following python script:
4connected:
8connected:
OR_NLCD_2011(converted to geotiff)(21959x16118, 337.6 MiB)
4connected:
8connected: