Approximate tmerc (Snyder): speed optimizations #2039
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
fwd: 7% faster on Core-i7@2.6GHz (with FMA triggered), 22% faster on GCE Xeon@2GHz (with FMA)
inv: 31% faster on Core-i7@2.6GHz (with FMA triggered), 60% faster on GCE Xeon@2GHz (with FMA)
The optimizations consists in different things:
Binaries are generated with the standard instruction set (SSE/SSE2),
and with one variant with FMA, and the appropriate version is selected automatically
at runtime. This gives a modest speedup, but measurable. The speedup is more
obvious on lower clocked CPU.
by observing that the argument changes in modest way at each iteration,
and using approximation of sin()/cos(). The differences due to that approximation
are way below the 1e-11 tolerance threshold.
Different in results are neglectable (only found in areas where the approximations
of the Snyder formulas are already no longer valid)
Before:
$ echo 8e5 9e6 | src/proj -d 12 +proj=utm +zone=31 -I +approx | src/proj -d 12 +proj=utm +zone=31 +approx
799997.896522093331 8999999.520601103082
$ echo 8e5 5e6 | src/proj -d 12 +proj=utm +zone=31 -I +approx | src/proj -d 12 +proj=utm +zone=31 +approx
800000.000007762224 4999999.999971268699
$ echo 18e5 9e6 | src/proj -d 12 +proj=utm +zone=31 -I +approx | src/proj -d 12 +proj=utm +zone=31 +approx
1079182.990696100984 8661150.574729491025
$ echo 18e5 5e6 | src/proj -d 12 +proj=utm +zone=31 -I +approx | src/proj -d 12 +proj=utm +zone=31 +approx
1799997.510861013783 4999999.567328464240
After:
$ echo 8e5 9e6 | src/proj -d 12 +proj=utm +zone=31 -I +approx | src/proj -d 12 +proj=utm +zone=31 +approx
799997.896522093331 8999999.520601103082
$ echo 8e5 5e6 | src/proj -d 12 +proj=utm +zone=31 -I +approx | src/proj -d 12 +proj=utm +zone=31 +approx
800000.000007762224 4999999.999971268699
$ echo 18e5 9e6 | src/proj -d 12 +proj=utm +zone=31 -I +approx | src/proj -d 12 +proj=utm +zone=31 +approx
1079182.990696124267 8661150.574729502201
$ echo 18e5 5e6 | src/proj -d 12 +proj=utm +zone=31 -I +approx | src/proj -d 12 +proj=utm +zone=31 +approx
1799997.510861013783 4999999.567328464240