Add new algorithm - densify() #437
Conversation
|
I think that a more appropriate name would be "densify". In terms of semantics, what happens if you have two consecutive points on a segment whose distance is exactly equal to the threshold? What happens when, due to numerical errors, you compute a number of sample points that produce distances slightly larger than the threshold (again due to numerical errors)? Also, again in terms of semantics, what guarantee is given, if any? The way you describe it, there is no reason not to make the segments arbitrarily small. How are the generated segments produced? From the code it seems that you are trying to distribute them evenly. Maybe this should be part of the semantics for the generated output. I have to admit that I have not really looked at the code in detail, so please point me to the parts of the code that address my questions.
|
This is a good idea, indeed necessary, and I believe it is normally called Densify. At least in Esri, but I have seen it elsewhere too.
|
+1, PostGIS also uses that term, here: https://postgis.net/docs/ST_FrechetDistance.html (used for more accurate Frechet distance computation).
|
As an introduction: the algorithm traverses the geometry and for each segment calls the strategy. So how the segment is handled depends entirely on the strategy, which fills the other geometry. I've implemented it like that because in a geographic CS inverse and direct solutions have to be calculated during the generation of points in some way or the other, and this is CS-specific. Currently the algorithm is linear/iterative, so the inverse formula is called to get azimuth and distance and then the direct formula is called N times to generate N missing points; this is done entirely in the strategy. Let's call it algorithm [1].

A different approach comes to mind where e.g. a segment is divided in a recursive manner by half until the length is shorter than the threshold. This would allow to design a more elegant strategy, since the part filling the other geometry could be moved to the algorithm. This could also generate a more accurate geographic segment representation, because the closer to the second point, the more accurate the azimuth calculated by the inverse formula would be. However, in this case the strategy would be required to call both the inverse and the direct formula for each segment division, and more segments would be generated. So my estimate is that the algorithm would be 2x slower. Let's call this algorithm [2].
The segment is left as it is (this is done in strategy).
The segments that are slightly too long are in the output and nothing is done with them. Such a case is in the test file with a comment. And by "slightly too long" I mean a few meters for around 10km-long segments, generated for an initial segment 2x longer, with the andoyer inverse formula and thomas direct 1st order. So these may be inaccuracies of the formulas, not even numerical errors.
Currently there is no guarantee regarding the lengths of output segments as the behavior is defined in strategy and as you noticed due to the iterative nature of the algorithm [1]. If e.g. algorithm [2] was used then we could guarantee that a corresponding (using the same formulas) call of
Yes, and as described above this is defined in strategy (algorithm [1]). What do you mean by "part of the semantics for the generated output"?
The relevant parts are in the strategies, so
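For illustration, the two schemes described above can be sketched in plain C++ for the cartesian case. This is a hypothetical sketch with made-up helper names and double coordinates, not the actual strategy code, which uses geodesic inverse/direct formulas for geographic CS:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct pt { double x, y; };

static double dist(pt a, pt b)
{
    return std::hypot(b.x - a.x, b.y - a.y);
}

// Algorithm [1]: one "inverse" step (direction + length), then N "direct"
// steps that evenly distribute the intermediate points along the segment.
void densify_iterative(pt p0, pt p1, double threshold, std::vector<pt> & out)
{
    double const len = dist(p0, p1);
    int const n = static_cast<int>(len / threshold); // intermediate points
    for (int i = 1; i <= n; ++i)
    {
        double const f = double(i) / (n + 1);
        out.push_back(pt{p0.x + (p1.x - p0.x) * f,
                         p0.y + (p1.y - p0.y) * f});
    }
}

// Algorithm [2]: halve the segment recursively until it is shorter than the
// threshold; each division needs both an "inverse" and a "direct" step.
void densify_recursive(pt p0, pt p1, double threshold, std::vector<pt> & out)
{
    if (dist(p0, p1) <= threshold)
        return;
    pt const mid{(p0.x + p1.x) / 2, (p0.y + p1.y) / 2};
    densify_recursive(p0, mid, threshold, out);
    out.push_back(mid);
    densify_recursive(mid, p1, threshold, out);
}
```

Note that [2] always produces a power-of-two number of sub-segments, so it generally generates more points than [1] for the same threshold.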
|
@mkaravel @barendgehrels @vissarion Thanks. I'll rename it to densify().
…gies. This allows moving range handling, e.g. push_back() calls, from strategies to the algorithm.
|
Thanks, I made some comments.
|
@barendgehrels I cannot see the comments.
barendgehrels left a comment:
Ah, sorry, had to submit the review. Here it is
```cpp
    // other strategies dimensions > 2 should be taken into account.
    signed_size_type n = signed_size_type(len2d / length_threshold);
    if (n <= 0)
        return;
```
Please add curly braces for return here and in some other places, as we usually do.
```cpp
    // TODO: For consistency with spherical and geographic 2d length is
    // taken into account. This probably should be changed. Also in the
    // other strategies dimensions > 2 should be taken into account.
    signed_size_type n = signed_size_type(len2d / length_threshold);
```
Maybe we should assert that length_threshold > 0, or is that done elsewhere?
No, it should probably be an assertion in the strategy and an exception in the algorithm. I'll add it.
```cpp
        CalculationType
    >::type calc_t;

typedef model::point<calc_t, 2, cs::cartesian> point2d_t;
```
Is this necessary? Can we not use Point for this?
EDIT: It will then maybe also work for 3D points.
It is necessary in order to use calc_t in calculations. The rationale behind dimension 2 is explained in the comment below:
// TODO: For consistency with spherical and geographic 2d length is
// taken into account. This probably should be changed. Also in the
// other strategies dimensions > 2 should be taken into account.
I have no simple answer regarding this. But in order to be consistent with other strategies we should know how to calculate length in e.g. geographic with height.
I implemented an N-dimensional version, which means that in the cartesian strategy the length threshold is w.r.t. the N-dimensional segment. So it's different from spherical and geographic.
```cpp
private:
    Range & m_rng;
};
```
I think this duplicates <boost/geometry/core/mutable_range.hpp>
Can you verify if that can be used instead?
mutable_range.hpp implements default MutableRange traits for pushing back, resizing and clearing a range, traits which users can specialize if needed. That's all. For convenience we then have functions like bg::range::push_back() using these traits.
On the other hand AssignPolicy works like a function object or callback function allowing to define how the intermediate points created by the strategy are handled. It's here to have mutable Range operations only in the algorithm, not in strategies. This way the strategies don't depend on any concept more complex than a Point.
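As a sketch of that separation (simplified, with hypothetical names, not the actual Boost.Geometry code): the algorithm owns the output range and wraps it in a policy object, and the strategy only ever calls back through the policy with points:

```cpp
#include <cassert>
#include <vector>

struct pt { double x, y; };

// Lives in the algorithm: the only code that needs the Range concept.
template <typename Range>
struct push_back_policy
{
    explicit push_back_policy(Range & rng) : m_rng(rng) {}

    template <typename Point>
    void apply(Point const& p)
    {
        m_rng.push_back(p); // range mutation confined to the policy
    }

private:
    Range & m_rng;
};

// Lives in the strategy: depends only on Point and the policy callback,
// never on the output range type. Here it just emits the midpoint.
struct midpoint_strategy
{
    template <typename Point, typename Policy>
    static void apply(Point const& p0, Point const& p1, Policy & policy)
    {
        policy.apply(Point{(p0.x + p1.x) / 2, (p0.y + p1.y) / 2});
    }
};
```

With this shape the strategy compiles against any output container the algorithm chooses, which is the decoupling described above.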
```cpp
convert_and_push_back(rng_out, p0);

strategy.apply(p0, p1, policy, len);
```
If the combination of segment length and densify_length results in a fraction, is then p1 missing?
EDIT: Ah, AppendLastPoint might take care of that.
Any calculation related to lengths is done in the strategies. The strategy only generates intermediate points, which means that it may generate no points, and the responsibility for returning the segment's endpoints is on the algorithm's side, not the strategy's.
AppendLastPoint is here to distinguish between open and closed geometries. And yes, if the strategy doesn't generate any points then the condition below appends the last endpoint.
```cpp
static inline void apply(Geometry const& rng, GeometryOut & rng_out,
                         T const& len, Strategy const& strategy)
{
    typedef typename boost::range_value<Geometry>::type point_t;
```
Alternatively you can state here:
```cpp
if (boost::empty(rng))
{
    return;
}
```
Anyway, otherwise please make these count variables const, and use curly braces.
I'll use boost::empty() and rewrite the algorithm to use range iterator instead of an index i and range::at() calls.
```cpp
// NOTE: Normalization will not work for integral coordinates
// normalize
//geometry::divide_value(dir01, len2d);
```
Indeed. We can create a compile time error for integer coordinates, or is that already done?
No, there is no need for that. Below, an equivalent formula is used. It requires more operations since the vector multiplication and division are done each iteration. An alternative would be to use it only for integral coordinates and use the normalized vector for FP or user-defined coordinates. But I'm not sure if it's that important. What do you think?
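The equivalent formula being referred to can be sketched like this (illustrative, not the library code): multiply by the step index first and divide last, so integral coordinates never go through a fractional normalized direction vector:

```cpp
#include <cassert>

struct pti { int x, y; };

// i-th of n intermediate points on segment p0-p1, 1 <= i <= n.
// p0 + (p1 - p0) * i / (n + 1): multiplying before dividing keeps the
// arithmetic meaningful for integers; a pre-normalized direction vector
// (p1 - p0) / len would truncate to zero in integral coordinates.
pti interpolate(pti p0, pti p1, int i, int n)
{
    return pti{p0.x + (p1.x - p0.x) * i / (n + 1),
               p0.y + (p1.y - p0.y) * i / (n + 1)};
}
```

The trade-off is exactly as described: one multiplication and one division per generated point, instead of a single normalization up front.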
No, I thought integer was not supported at all. Now I understand, thanks. So you can ignore my comment.
test/algorithms/densify.cpp
Outdated
```cpp
G o;
bg::densify(g, o, max_distance, d.compl_s);

// geometry was indeed complexified
```
densified
```cpp
             Distance const& max_distance)
{
    densify(geometry, out, max_distance, default_strategy());
}
```
Calling densify looks good, as expected. Thanks.
The Geometry type input/output is the same, that is most probably OK, in simplify it is the same, for example. But the code in strategy assumes that it can be different too. Anyway, that is not a problem.
Yes, originally I intended to allow passing different types. E.g. if the input is Segment the output has to be Linestring or MultiLinestring. But in order to make it simpler for now I decided to mimic simplify's interface.
```cpp
struct cartesian
{
    template <typename Point, typename AssignPolicy, typename T>
    static inline void apply(Point const& p0, Point const& p1,
                             AssignPolicy & policy, T const& length_threshold)
```
This interface looks OK, with still the comment that AssignPolicy can maybe be combined with our mutable_range concept (which is adaptable).
Are you suggesting that instead of AssignPolicy a MutableRange could simply be passed into the strategy? If that's the case then this was the original design but I changed it to get rid of Range-related code from strategies and have it only in the algorithm.
It is indeed a suggestion, but I did not verify if it is really possible or more convenient. So a "maybe", just to verify if what we have is OK or if this is better.
|
@awulkiew thanks for your answers!
…ssert max_distance > 0 (in strategies).
|
Thanks Adam! Some comments: Should you add the headers of the strategies and algorithms to strategies.hpp and geometry.hpp respectively? Testing the algorithm a bit: for LINESTRING(0 0,1 1) and max distance=1 the lengths are as follows for geographic: 156897.7995 (original), while for spherical with max distance=0.01: 157249.5978 (original). This is expected, since the algorithm for spherical uses a more direct computation and not this navigation-like construction of new points that highly depends on the computed azimuth. Interestingly, the different strategies for direct and inverse computation do not affect the result much (compared to the length error w.r.t. the original segment). Moreover, we observe that less accurate methods (i.e. andoyer) have a smaller error than more accurate ones (i.e. vincenty).
Yes, it should. I just added it.
Yes, the spherical strategy works with 3d cartesian vectors and the geographic strategy uses inverse and direct geodesic formulas to calculate intermediate points. So the geographic strategy strongly depends on the accuracy of the azimuth calculation. The segment from your example is 150km long and max_distance is 1m, so the densified geometry will have around 150k points. I'm not sure which densify strategies were used in your example, default ones or ones corresponding to distance strategies, and which distance strategy was used to calculate the length of the original. From the values I've got below (though with a presumably different compiler) you probably densified the geometry using 3 different formulas but in all cases calculated the distance with the default strategy using the andoyer formula. Out of curiosity I checked all combinations of strategies. Again for LINESTRING(0 0,1 1) and max_distance 1m this would be (with msvc-14):

and for a longer segment LINESTRING(0 0,40 40) and max_distance 40m, to keep the number of points similar:

So the results look as expected, i.e. less accurate formulas give less accurate results. Note that the loss in accuracy occurs both at the densify and at the distance calculation stages.

In this case, and if you indeed used the strategies I think you used (andoyer for distance in every case), the least precise part is the length calculation of a linestring containing ~150k points, because the inaccuracies are accumulated in the result, and the accumulated error of the result in all three cases may be greater than the error of the andoyer formula for the original segment. So indeed in this case the observation is correct. However, I think the reason is as described above and I doubt that this is a general rule (that the theoretically least precise formula is the most precise in practice).
Change strategies from structs to classes as this is the requirement of the docs generating tool.
a1b7f91 to f35a4f9
|
Indeed as you said, I used the default strategy for length (i.e. andoyer) and varied the densify strategies. Thanks for the clarification. My main question is still: why, when using the andoyer formula for length, is the result on the vincenty-densified segment less accurate than the one on the andoyer-densified segment? But I see that it is of little interest and it only happens in the first table above, thus I am OK with it. Finally, it seems that even with very densely densified segments (as the ones above) the results have good accuracy (when vincenty is used). Do you think the more accurate/iterative method you described above is worth implementing? It should give even more accurate results.
|
Regarding the other algorithm. Currently algorithm [1] is iterative, i.e. first the distance and azimuth are calculated (1x inverse) and then N points are generated on the segment (Nx direct). The alternative algorithm [2] would divide the segment recursively, so for each division it calculates inverse and direct (Nx inverse and Nx direct). So my guess is that algorithm [2] would be more accurate. Now, the accuracy of the result is heavily influenced by the accuracy of the azimuth calculation. Theoretically the least accurate point generated by algorithm [1] would be the point furthest from the first point (passed into the direct solution).

So I've done an experiment comparing both. In this experiment the last intermediate point is calculated as if it was generated by the recursive algorithm [2]: by generating a midpoint, then performing the same for the second part of the divided segment, and doing that until the length of the segment is smaller than the threshold (below it's called recursive). And for comparison, an algorithm which calculates the length of the original segment once, then only calculates the distance recursively to get the distance to the intermediate point as if the segment was divided in half recursively, and then performs the direct calculation once using this distance (it's called direct), as if it was used by algorithm [1]. In all cases I was using the formulas consistently in the length, distance and densify algorithms. The results are: LINESTRING(0 0, 1 1), max_distance = 1000
LINESTRING(0 0, 1 1), max_distance = 1
LINESTRING(0 0, 40 40), max_distance = 40
So looking at azimuths, the recursive algorithm seems to be more accurate for andoyer and thomas but less accurate for vincenty. However, the difference between the algorithms is similar to the accuracy of the formulas. E.g. the difference between points for the shorter linestring and andoyer is around 2m, so similar to the difference in length of the whole segment for andoyer vs. vincenty (156897.7995 vs. 156899.5683). EDIT: Similar for original azimuths (0.7886744754 vs. 0.7886800845, so a difference of 5.6e-6). So it looks like the implementation of the recursive algorithm is not needed. However, it looks like the geometry densified with the recursive algorithm is a more accurate representation, at least for less precise formulas. So I guess it could be used as a simple way of "improving" the geometry before passing it into some other algorithm. What do you think? Btw, it probably makes no sense to pass a threshold smaller than the error of the formula, e.g. a 1m threshold for a formula with 20m error. So it's probable that for andoyer the only reasonable results are the ones for max_distance = 1000.
|
It seems that it is not needed. Thanks for the details.
|
Btw, distance calculation using thomas formula seems to be wrong in all 3 cases. |
This algorithm adds points on each existing segment in order to make the segments shorter than the passed max_distance. I called it `complexify` but if someone has a better idea I'm open to suggestions. EDIT: It's called `densify()`. The parameters are input geometry, output geometry, max distance and an optional strategy. For now the default strategy for geographic CS is geographic, so it's different than e.g. for relops and setops where the default strategy is spherical. We should change one or the other I guess, probably the default intersection strategy, because geographic distance and area strategies are used by default for geographic geometries.
Example (in- red, out- green):
Note that this is a simple visualization in coordinate space. This is why the red polygon above has straight edges (same as the green one); the library may treat them as geodesic segments depending on the passed strategy. Because more points were added, the edges of the green polygon look like curved lines.
This function may be useful for