Update Mercator projection #2397

Merged Nov 1, 2020 (10 commits)

Conversation

@cffk (Contributor) commented Oct 26, 2020

Introduction

The existing formulation for the Mercator projection is
"satisfactory"; it is reasonably accurate. However for a core
projection like Mercator, I think we should strive for full double
precision accuracy.

This commit uses cleaner, more accurate, and faster methods for
computing the forward and inverse projections. These use the
formulation in terms of hyperbolic functions that are manifestly odd
in latitude

psi = asinh(tan(phi)) - e * atanh(e * sin(phi))

(phi = latitude; psi = isometric latitude = Mercator y coordinate).
Contrast this with the existing formulation

psi = log(tan(pi/4 + phi/2))
      - e/2 * log((1 + e * sin(phi)) / (1 - e * sin(phi)))

where psi(-phi) isn't exactly equal to -psi(phi) and psi(0) isn't
guaranteed to be 0.
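
A minimal sketch of the new form (psif is an illustrative name, not
the one used in the code):

#include <cmath>

// Isometric latitude via the hyperbolic form.  Since tan, sin, asinh,
// and atanh are all odd functions, psif(-phi) == -psif(phi) exactly
// and psif(0) == 0.
double psif(double phi, double e) {
    return asinh(tan(phi)) - e * atanh(e * sin(phi));
}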

Implementation

There's no particular issue implementing the forward projection: just
apply the formulas above. The inverse projection is tricky because
there's no closed form solution for the inverse. The existing code
for the inverse uses an iterative method from Snyder. This is the
usual hokey function iteration, and, as usual, the convergence rate is
linear (error reduced by a constant factor on each iteration). This
is OK (just) for low accuracy work. But nowadays, something with
quadratic convergence (e.g., Newton's method, number of correct digits
doubles on each iteration) is preferred (and used here). More on this
later.

The solution for phi(psi) I use is described in my TM paper and I
lifted the specific formulation from GeographicLib's Math::tauf, which
uses the same underlying machinery for all conformal projections. It
solves for tan(phi) in terms of sinh(psi), which, being a near-identity
mapping, is ideal for Newton's method.
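
A sketch of that Newton solve, modeled on GeographicLib's Math::tauf
(the PR's pj_sinhpsi2tanphi may differ in detail):

#include <cfloat>
#include <cmath>

// Solve taup = sinh(psi(phi)) for tau = tan(phi).  The mapping is
// nearly the identity, and the initial guess tau = taup / (1 - e^2)
// is good enough that a couple of Newton iterations give full double
// precision accuracy.
double sinhpsi2tanphi(double taup, double e) {
    const double e2m = 1 - e * e;
    const double tol = sqrt(DBL_EPSILON) / 10;
    double tau = taup / e2m;
    for (int i = 0; i < 5; ++i) {
        double tau1 = hypot(1.0, tau);
        double sig = sinh(e * atanh(e * tau / tau1));
        double taupa = hypot(1.0, sig) * tau - sig * tau1; // sinh(psi(tau))
        double dtau = (taup - taupa) * (1 + e2m * tau * tau) /
                      (e2m * tau1 * hypot(1.0, taupa));    // Newton step
        tau += dtau;
        if (fabs(dtau) < tol * (1 + fabs(taup)))
            break;
    }
    return tau;
}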

For comparison I also look at the approach adopted by Poder + Engsager
in their TM paper and implemented in etmerc. This uses trigonometric
series (accurate to order n^6, n = the third flattening) to convert
phi <-> chi. psi is then given by

psi = asinh(tan(chi))

Accuracy

I tested just the routines for transforming phi <-> psi from merc.cpp
and measured the errors (converted to true nm = nanometers) for the
forward and inverse mapping. I also included in my analysis the
method used by etmerc. This uses a trigonometric series to convert
phi <-> chi = atan(sinh(psi)), the conformal latitude.

            forward (nm)     inverse (nm)
            max     rms      max      rms
old merc    3.60    0.85     2189.47  264.81
  etmerc    1.82    0.38     1.42     0.37
new merc    1.83    0.30     2.12     0.31

1 nm is pretty much the absolute limit for accuracy in double
precision (10^7 m, roughly the pole-to-equator distance, divided by
2^53 ≈ 9.0e15 gives about 1.1 nm), and 5 nm is probably the limit on
what you should routinely expect. So the old merc inverse is
considerably less accurate than it could be. The old merc forward is
OK on accuracy -- except that it does not preserve the parity of the
projection.

The accuracy of etmerc is fine (the truncation error of the 6th-order
series is small compared with the round-off error). However, the
situation reverses as the flattening is increased. E.g., at f =
1/150, the max error for the inverse projection is 8 nm. etmerc is OK
for terrestrial applications, but couldn't be used for Mars.

Timing

Here's what I get with g++ -O3 on various Linux machines with recent
versions of g++. As always, you should take these with a grain of
salt. You might expect the relative timings to vary by 20% or so when
switching between compilers/machines. Times per call in ns =
nanoseconds.

           forward inverse
old merc     121      360
  etmerc    4e-6      1.4
new merc      20      346

The new merc method is 6 times faster at the forward projection and
modestly faster at the inverse projection (despite being more
accurate). The latter result is because it only takes 2 iterations of
Newton's method to get full accuracy, compared with an average of 5
iterations for the old method to get only micrometer accuracy.

A shocking aspect of these timings is how fast etmerc is. Another is
that forward etmerc is streaks faster than inverse etmerc (it made me
doubt my timing code). Evidently, asinh(tan(chi)) is a lot faster to
compute than atan(sinh(psi)). The hesitation about adopting etmerc
then comes down to:

  • the likelihood that Mercator may be used for non-terrestrial
    bodies;

  • the question of whether the timing benefits for the etmerc method
    would be noticeable in a realistic application;

  • need to duplicate the machinery for evaluating the coefficients
    for the series and for Clenshaw summation in the current code
    layout (a sketch of the Clenshaw step follows this list).
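
For reference, the Clenshaw part of that machinery is compact. A
minimal sketch (illustrative names; the series coefficients, which
depend on the third flattening n, are omitted):

#include <cmath>

// Clenshaw summation of S = sum_{k=1..nc} c[k-1] * sin(2*k*chi), the
// form of trigonometric series used for auxiliary-latitude
// conversions such as phi <-> chi.
double clenshaw_sin(const double c[], int nc, double chi) {
    const double X = 2 * cos(2 * chi);  // multiplier in the recurrence
    double u0 = 0, u1 = 0;              // u_{k+1} and u_{k+2}
    for (int k = nc; k > 0; --k) {
        double t = X * u0 - u1 + c[k - 1];
        u1 = u0;
        u0 = t;
    }
    return u0 * sin(2 * chi);
}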

More discussion follows in the comments below.

  • Tests added
  • Added clear title that can be used to generate release notes
  • Fully documented, including updating docs/source/*.rst for new API

@cffk (Contributor Author) commented Oct 26, 2020

Ripple effects

The Mercator routines embodying the Snyder method, pj_tsfn and
pj_phi2, are also used in other projections. These relate phi to t =
exp(-psi) (a rather bizarre choice in my book). I've retrofitted them
to use the more accurate methods. They do the "right thing" for phi
in [-pi/2, pi/2], t in [0, inf], and e in [0, 1). NANs are properly
handled.
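
A hedged sketch of the underlying identity (not the exact code in
tsfn.cpp): since exp(-asinh(tan(phi))) reduces to cos(phi)/(1 +
sin(phi)), t = exp(-psi) can be computed without exponentiating psi:

#include <cmath>

// t = exp(-psi) without cancellation.  For sin(phi) < 0 the
// algebraically equivalent form (1 - sin(phi))/cos(phi) avoids the
// small denominator 1 + sin(phi).  Illustrative only.
double tsfn(double phi, double e) {
    const double sinphi = sin(phi), cosphi = cos(phi);
    const double t0 = sinphi >= 0 ? cosphi / (1 + sinphi)
                                  : (1 - sinphi) / cosphi;
    return t0 * exp(e * atanh(e * sinphi));
}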

Of course, phi = pi/2 in double precision is actually less than pi/2,
so cos(pi/2) > 0. So no special handling is needed for pi/2. Even if
angles were handled in such a way that 90deg were exactly represented,
these routines would still "work", with, e.g., tan(pi/2) -> inf.

(A caution: with long doubles = a 64-bit fraction, we have cos(pi/2) <
0; and now we would need to be careful.)
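
A quick illustration of the point:

#include <cmath>
#include <cstdio>

int main() {
    double phi = M_PI / 2;  // the nearest double lies slightly below pi/2
    printf("cos(phi)        = %.3e\n", cos(phi));        // ~6.1e-17, > 0
    printf("tan(phi)        = %.3e\n", tan(phi));        // ~1.6e16, finite
    printf("asinh(tan(phi)) = %.6f\n", asinh(tan(phi))); // ~38.025
    return 0;
}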

As a consequence, there's no need for error handling in pj_tsfn; the
HUGE_VAL return has gone and, of course, HUGE_VAL is a perfectly legal
input to tsfn's inverse, phi2, which would return -pi/2. This "error
handling" was only needed for e = 1, a case which is filtered out
upstream. I will note that bad argument handling is much more natural
using NAN instead of HUGE_VAL. See issue #2376.

I've renamed the error condition for non-convergence of the inverse
projection from "non-convergent inverse phi2" to "non-convergent
sinh(psi) to tan(phi)".

Now that pj_tsfn and pj_phi2 return "better" results, there were
some malfunctions in the projections that call them, specifically
gstmerc, lcc, and tobmerc.

  • gstmerc invoked pj_tsfn(phi, sinphi, e) with a value of sinphi
    that wasn't equal to sin(phi). Disaster followed. I fixed this.
    I also replaced numerous occurrences of "-1.0 * x" by "-x".
    (Defining a function with arguments phi and sinphi is asking for
    trouble.)

  • lcc incorrectly thinks that the projection isn't defined for
    standard latitude = +/- 90d. This happens to be false (it reduces
    to polar stereographic in this limit). The check was whether
    tsfn(phi) = 0 (which only tested for the north pole, not the south
    pole). However, since tsfn(pi/2) now (correctly) returns a nonzero
    result, this test fails. I now just test for |phi| = pi/2. This
    is clearer and catches both poles (I'm assuming that the current
    implementation will probably fail in these cases).

  • tobmerc similarly thinks that phi close to +/- pi/2 can't be
    transformed even though psi(pi/2) is only 38. I'm disinclined to
    fight this. However, I did tighten up the failure condition
    (strict equality of |phi| == pi/2).

@cffk (Contributor Author) commented Oct 26, 2020

OTHER STUFF

Testing

builtins.gie: I tightened up the tests for merc (and, while I was
about it, etmerc and tmerc) to reflect full double precision accuracy.
My test values are generated with MPFR-enabled code and so should be
accurate to all digits given. For the record, for GRS80 I use f =
1/298.2572221008827112431628366 in these calculations.

pj_phi2_test: many of the tests were bogus, testing irrelevant input
parameters (like negative values of exp(-psi)) and freezing in the
arbitrary behavior of phi2. I've reworked most of the tests to be
semi-useful. @schwehr, can you review?
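
An example of the sort of semi-useful test I have in mind (a sketch
only; the real test file has the proper includes and a real context
in place of nullptr):

#include "gtest/gtest.h"
#include <cmath>
// assumes PROJ's internal header declaring pj_tsfn and pj_phi2

// phi2 is the inverse of tsfn, so a round trip should reproduce phi
// to double precision (1e-15 rad is about 6 nm on the Earth).
TEST(Phi2Test, TsfnPhi2RoundTrip) {
    const double e = 0.0818191908426215;  // approximately WGS84
    for (double phi = -1.5; phi <= 1.5; phi += 0.25) {
        const double ts = pj_tsfn(phi, sin(phi), e);
        EXPECT_NEAR(pj_phi2(nullptr, ts, e), phi, 1e-15);
    }
}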

Documentation

I've updated merc.rst to outline the calculation of the inverse
projection.

phi2.cpp includes detailed notes about applying Newton's method to
find tan(phi) in terms of sinh(psi).

Future work

lcc needs some tender loving care. It can easily (and should) be
modified to allow stdlat = +/- 90 (reduces to polar stereographic),
stdlat = 0 and stdlat_1 + stdlat_2 = 0 (reduces to Mercator). A
little more elbow grease will allow the treatment of stdlat_1 close to
stdlat_2 using divided differences. (See my implementation of the
LambertConformalConic class in GeographicLib.)

All the places where pj_tsfn and pj_phi2 are called need to be
reworked to cut out the use of Snyder's t = exp(-psi) variable and
instead use sinh(psi).

Maybe include the machinery for series conversions between all
auxiliary latitudes as "support functions". Then etmerc could use
this (as could mlfn for computing meridional distance). merc could
offer the etmerc-style projection via chi as an option when the
flattening is sufficiently small.

@schwehr (Member) commented Oct 26, 2020

For pj_get_default_ctx, the tests seem fine. The tests I wrote long ago were done without a deep understanding of the code. Just some style comments. I'm not a fan of some things, like converting 1.0 to 1, which causes a hidden cast. These aren't critical by any means. I just always try to keep conversions to a minimum and use const / constexpr as much as possible. It makes debugging easier and every once in a while it finds a sneaky bug. I'm unusual in that I'm usually debugging through massive apps that I often don't know, so these things probably matter to me far more than to most.

src/phi2.cpp:
// if (e2m < 0) return std::numeric_limits<double>::quiet_NaN();
int i = numit;
for (; i; --i) {
double tau1 = sqrt(1 + tau * tau),
Member:

All these are const, yes? I've seen too many sneaky things over the years.

Contributor Author:

We went around the block a few times on this particular question a while back. I know that I can make these variables const. But that's not how I think of them, which is "a bunch of variables I need in this do loop". const elevates them in importance, so now I have to ask: "gee, is there something I need to watch out for?" So in my mind, I reserve the const declaration for long-lived immutable quantities; maybe I would agree with you if the body of the for loop were 50 lines or more. This is all a matter of taste, of course. The compiler will produce the same code in either case.

Member:

It's fine to not make them const, but I take issue with "const declaration for long-lived immutable quantities" and "elevates them in importance". E.g., this says I'm iterating over something and I'm not mutating it. It's definitely not important:

for (const auto& item : container) {
  // short bit of code.
}

@@ -106,21 +106,21 @@ PJ *PROJECTION(lcc) {
double ml1, m1;

m1 = pj_msfn(sinphi, cosphi, P->es);
ml1 = pj_tsfn(Q->phi1, sinphi, P->e);
if( ml1 == 0 ) {
if( abs(Q->phi1) == M_HALFPI ) {
Member:

I believe those checks were added because ossfuzz came up with odd PROJ strings (mostly degenerate ellipsoids) where ml1 would evaluate to 0 but abs(phi1) != M_HALFPI. I don't recall if phi1 was close to half pi. Same for ml2 below.

I'm wondering how exact equality testing against half pi is useful? It is generally considered not a good idea to use exact equality for floating point numbers.

Contributor Author:

The "normal" way ml is zero is with phi = pi/2. phi = -pi/2 which from
the perspective of projections should be the equivalent situation gives
ml = inf (or maybe just a large number). This is why the test is best
done on phi instead of ml.

It's likely that many things come apart the very degenerate ellipsoids.
I can see the need for an up front check on the flattening (e.g., f <
0.99) and that would probably be preferable to junking up the code with

The mantra that "comparing floating point numbers for equality is bad"
is now somewhat outdated. Since IEEE floating point has become the
norm, floating point arithmetic is much more predictable (and accurate)
and equality tests are often OK. That's the case here, I would say.
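
For instance (illustrative only), the pole check compares a value that
was assigned, not computed:

#include <cmath>
#include <cstdio>

int main() {
    const double halfpi = 1.5707963267948966;  // stand-in for M_HALFPI
    double phi = halfpi;         // parameter set directly to the constant
    // Exact equality is dependable here: no arithmetic perturbed phi.
    printf("%d\n", fabs(phi) == halfpi);       // prints 1
    return 0;
}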

Member:

I can see the need for an up-front check on the flattening (e.g., f <
0.99) and that would probably be preferable to junking up the code with
special-case checks like this one.

Sounds good

equality tests are often OK. That's the case here, I would say.

definitely trusting you on this

static PJ_XY tobmerc_s_forward (PJ_LP lp, PJ *P) { /* Spheroidal, forward */
PJ_XY xy = {0.0, 0.0};
double cosphi;

if (fabs(fabs(lp.phi) - M_HALFPI) <= EPS10) {
if (fabs(lp.phi) >= M_HALFPI) {
// builtins.gie tests "Test expected failure at the poles:". However
Member:

I guess the test could be amended/removed if we have better accuracy now

Contributor Author:

Yes, this is possible. Best would be to have someone review the code for tobmerc and make a decision on how the near-pole situation should be handled (and document this). An implementation of the ellipsoidal version would be easy (however, it's clear that the projection is there "to make a point" and not for serious use).

// given that M_HALFPI is strictly less than pi/2 in double precision,
// it's not clear why we shouldn't just return a large result for xy.y (and
// it's not even that large, merely 38.025...). Even if the logic was
// such that phi was strictly equal to pi/2, allowing xy.y = inf would be
Member:

I'm not sure returning inf as a valid value is a good idea (especially since it is our HUGE_VAL error marker).

Contributor:

I think that makes #2376 more relevant. It would help differentiate between HUGE_VALs that are valid versus invalid.

Contributor Author:

I agree with @snowman2. However, the issue is moot. An inf return isn't on the cards in the present code base, because there's no way that phi can be set to exactly 90deg. The closest is pi/2 rounded to a double, which then gives asinh(tan(phi)) = 38, a distinctly non-large number. Things would change if the representation of angles were changed (e.g., after we get sinpi and cospi functions) so that the cardinal points were exactly represented. I don't see this happening in the short term.


return (tan (.5 * (M_HALFPI - phi)) /
pow((1. - sinphi) / (denominator), .5 * e));
double cosphi = cos(phi);
@rouault (Member), Oct 26, 2020:

if we need cos(phi) now, then we should probably alter the signature of pj_tsfn to provide this value. This will enable callers to implicitly use sincos() to compute both sin(phi) and cos(phi), which is faster than computing them in an isolated way.

Contributor Author:

I wonder about this. I recommend putting a higher priority on removing the calls to tsfn and phi2 by reformulating the algorithms not to use Snyder's flaky t = exp(-psi) variable (which leads to oddities like the N and S poles being treated differently).

src/phi2.cpp:
// if (e2m < 0) return std::numeric_limits<double>::quiet_NaN();
int i = numit;
for (; i; --i) {
double tau1 = sqrt(1 + tau * tau),
Member:

Same here: splitting onto several lines would make it more readable.

*
* returns atan(sinhpsi2tanphi(tau'))
***************************************************************************/
return atan(pj_sinhpsi2tanphi(ctx, (1/ts0 - ts0) / 2, e));
Member:

I remember I recently optimized pj_phi2() to use fewer transcendental functions to get speedups (#2052). At first sight, it looks like the new implementation will use more. Any idea of the perf impact?

Member:

Ah sorry, I didn't read your cover letter, where you explain that the faster convergence makes the computation faster globally.

Contributor Author:

Yes, cutting down on the number of iterations is the key. Also, I wonder... should pow count as 2 transcendental calls, log then exp? That then matches (sort of) the new code: atanh (log-ish) then sinh (exp-ish).

@rouault (Member) commented Oct 26, 2020

A shocking aspect of these timings is how fast etmerc is.

Are you really sure about your timings? 4e-6 ns per call for etmerc looks completely bogus to me.

@cffk (Contributor Author) commented Oct 26, 2020

A shocking aspect of these timings is how fast etmerc is.

Are you really sure about your timings? 4e-6 ns per call for etmerc looks completely bogus to me.

I agree. I looked long and hard at the timing code and can't fathom what's happening. Perhaps something is screwing up somehow. Anyway, this quirk of timing (affecting the etmerc method) is a side issue for now.

@rouault (Member) commented Oct 26, 2020

I looked long and hard at the timing code and can't fathom what's happening

Can you show your timing code? Isn't it possible, if you put it too close to the etmerc function and repeat the measurement on the same input value, that the compiler is smart enough to realize that the output is the same without side effects and just completely removes the loop?

@cffk (Contributor Author) commented Oct 26, 2020

I looked long and hard at the timing code and can't fathom what's happening

Can you show your timing code? Isn't it possible, if you put it too close to the etmerc function and repeat the measurement on the same input value, that the compiler is smart enough to realize that the output is the same without side effects and just completely removes the loop?

Even, you're right. The compiler outsmarted me. I know most of its
tricks that can give you bogus timings. I had

initialize big arrays of starting values for phi/psi

double xx = 0;

start timer1
loop over phi
  xx += psif(phi[i]);
stop timer1

start timer2
loop over psi
  xx += phif(psi[i]);
stop timer2

The "xx +=" is there to make the compiler actually call the function.
HOWEVER!!!, I neglected to include at the end

cout << "IGNORE " << xx << "\n";

So the compiler was skipping some of the function calls. (Not all of
them, because some have mutable variables for counting the number of
iterations.) I will post updated timing data...
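
For the record, a minimal harness that defeats this optimization
(psif/phif stand for the forward/inverse transforms; illustrative
only):

#include <chrono>
#include <cstdio>
#include <vector>

double psif(double phi);  // phi -> psi, assumed defined elsewhere
double phif(double psi);  // psi -> phi, assumed defined elsewhere

// Time f over the inputs; the accumulator *xx must be printed by the
// caller so the compiler cannot discard the calls as dead code.
static double time_ns(const std::vector<double>& in,
                      double (*f)(double), double* xx) {
    auto t0 = std::chrono::steady_clock::now();
    for (double v : in) *xx += f(v);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count()
           / in.size();
}

// Usage:
//   double xx = 0;
//   printf("fwd %g ns/call\n", time_ns(phis, psif, &xx));
//   printf("inv %g ns/call\n", time_ns(psis, phif, &xx));
//   printf("IGNORE %g\n", xx);  // the line I forgot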

@rouault (Member) commented Oct 26, 2020

So the compiler was skipping some of the function calls.

Ok, that's reassuring.

Side thought (no action needed for this PR): perhaps we should have a reference benchmark program? It could use proj_trans() so as to be realistic about the timings users will get when using the API.

@cffk (Contributor Author) commented Oct 26, 2020

Timing UPDATED

Sorry, the previous timing data was wrong. Here are corrected values.

Here's what I get with g++ -O3 on two Linux machines with recent
versions of g++. As always, you should take these with a grain of
salt. Times per call in ns = nanoseconds.

            Fedora 31    Ubuntu 18 
            g++-9.3.1    g++-7.5.0
            fwd   inv    fwd   inv
old merc    207   461    217   522
new merc    228   457    168   410
  etmerc    212   196    174   147

The new forward method is 10% slower (resp. 20% faster) on Fedora
31 (resp. Ubuntu 18). The new inverse method is the same speed (resp.
20% faster) on Fedora 31 (resp. Ubuntu 18).

Roughly speaking, the speed comparison is a wash. Maybe we should pay
more attention to the Fedora 31 results since these are with a newer
version of the compiler. I would still argue that a 20% time penalty
(which in a full PROJ pipeline would probably be no more than a 5%
penalty) would be a worthwhile price to pay for a more robust
implementation of the projection.

@cffk (Contributor Author) commented Oct 26, 2020

I'm not a fan of some things, like converting 1.0 to 1, which causes a hidden cast. These aren't critical by any means.

I come from the world of scientific writing where 1.0 is hardly ever used to represent 1 and, likewise, 0.5 is not typically used to represent 1/2. So I normally like to see sqrt(1 + x*x) because it's closer to the expression given in the paper. (Of course, I then have to remember to write 1/2.0. Life is full of little compromises.)

xy.x = P->k0 * lp.lam;
xy.y = - P->k0 * log(pj_tsfn(lp.phi, sin(lp.phi), P->e));
xy.y = P->k0 * (asinh(tan(lp.phi)) - P->e * atanh(P->e * sin(lp.phi)));
Member:

I'm wondering if the following wouldn't recover some speed:

Suggested change
xy.y = P->k0 * (asinh(tan(lp.phi)) - P->e * atanh(P->e * sin(lp.phi)));
const double cosphi = cos(lp.phi);
const double sinphi = sin(lp.phi);
xy.y = P->k0 * (asinh(sinphi/cosphi) - P->e * atanh(P->e * sinphi));

This should be optimized as a sincos() call instead of tan() + sin(). But perhaps the numerical accuracy of computing tan() this way is questionable?

Contributor Author:

Yup, this speeds the forward projection up by 20-30%. The accuracy is hardly affected: rms error increases from 0.30 nm to 0.33 nm, max error increases from 1.83 nm to 1.84 nm (a barely noticeable degradation). So now we have a modest speedup overall for the new method. Thanks.

BTW, I know gcc has this sincos optimization. Does Visual Studio have it too?

Member:

Does Visual Studio have it too?

Just found https://github.com/MicrosoftDocs/cpp-docs/blob/master/docs/overview/what-s-new-for-visual-cpp-in-visual-studio.md which mentions "Some of the compiler optimizations are brand new, such as [...] the combining of calls sin(x) and cos(x) into a new sincos(x)" since Visual Studio 2017 version 15.5

Contributor Author:

Times per call in ns = nanoseconds.

                Fedora 31    Ubuntu 18
                g++-9.3.1    g++-7.5.0
                fwd   inv    fwd   inv
    old merc    207   461    217   522
    new merc    159   457    137   410
      etmerc    212   196    174   147

The new forward method is now 25% faster (resp. 35% faster) on Fedora
31 (resp. Ubuntu 18). The new inverse method is the same speed (resp.
20% faster) on Fedora 31 (resp. Ubuntu 18).

The accuracy is hardly affected: rms error increases from 0.30 nm to
0.33 nm, max error increases from 1.83 nm to 1.84 nm (a barely
noticeable degradation).
@kbevers (Member) left a comment

Charles, this PR is very impressive. I appreciate the effort that has gone into this.

I'm a bit late to the game so most everything has been mentioned and dealt with already. Good job by everyone. Here are my few comments, which mostly just reiterate what's been said already:

  • I agree with Kurt and Even that multiple variable initialization in one statement is not a good practice.
  • I am mostly indifferent regarding 1.0 vs 1. The "scientific writing" approach is simpler to read and I trust that compilers are clever enough to handle the implicit conversions correctly, so I have a slight preference for that.
  • I am slightly worried that changing the wording of an error message might cause problems downstream. This is only a concern, though, if this PR is sneaked into 7.2.0RC2 tomorrow.

Regarding that last point, do you feel this has been tested enough to be included in 7.2.0? I am happy to include it so that these improvements will be available right away (next chance is in six months).

@cffk (Contributor Author) commented Oct 27, 2020

Regarding that last point, do you feel this has been tested enough to be included in 7.2.0? I am happy to include it so that these improvements will be available right away (next chance is in six months).

Let's not rush this into 7.2.0. No bugs are being fixed. It's just part of an ongoing effort to improve PROJ.
