Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use faster method for calculating distances. #5041

Merged
merged 9 commits into from
Apr 27, 2018
Merged

Conversation

danpat
Copy link
Member

@danpat danpat commented Apr 19, 2018

Our use of the haversineDistance function is a bit trigonometry heavy. Trig functions are relatively expensive, especially when used a lot.

With work on the distances branch, we'll be performing a lot of distance calculations within matrix requests. A 100x100 table with distances without caching will spend over 50% of it's time in haversineDistnace().

This PR switches from using the haversineDistance function, to a new fccApproximateDistance function, supplied by the https://github.com/mapbox/cheap-ruler-cpp library.

Don't let the name fool you - the fccApproximateDistance method is actually more precise than Haversine, and significantly faster. For short distances (up to a handful of km), which covers almost all the geometry we care about, it has almost 0 difference to the Vincenty method.

In simple tests, this change reduced a 100x100 table request with annotations=distances from 8900ms down to about 4500ms. There's a big speedup here because there are a lot of distance calculations made. Regular /route requests are unlikely to see such a boost, as the distance calcs they do form a much smaller part of the overall request workload.

Before the change, this was the CPU profile in a table request with distances:
screen shot 2018-04-19 at 4 26 11 pm

after this change:
screen shot 2018-04-19 at 4 21 58 pm

This PR is not ready to merge, see checklist below.

Tasklist

  • Because this call is in such a hot loop for tables, it's a strong candidate for inlining - some quick test indicated that it could get the 100x100 test request down to about 4050ms, so it could be worth the refactoring it would take.
  • See if we can achieve the same cheap_ruler_cache bucket lookup without the use of lround(), which is using 2.5% of the total time in calculateEBGNAnnotations. If we have coords in integer format, simple digit truncation and multiplication should do the job.
  • Check that we have enough buckets for sufficient ruler precision. I pulled 10-degree buckets out of the air, but I don't know how much error that will introduce.
  • Deprecate haversineDistance everywhere - there is no need to use it anywhere fccApproximateDistance is faster and more accurate. greatCircleDistance may also be removable for the same reason.
  • Verify test case distances. I suspect some might change by a tiny fraction. I haven't looked at any yet.
  • Consider porting the code to our codebase, to avoid the overhead of mapbox::geometry conversion (note: this may not be necessary - the type conversion steps probably get optimized away under -O3 compilation in Release mode).
  • Do strict bounds checking on the cheap_ruler_cache - make sure we can't underflow/overflow the static array (add an assert, and double-check the math).
  • Figure out how to safely declare the cheap_ruler_cache - should it be static? How does the current declaration work when this code is linked in multiple times? Does this need to change if we try to get inlining working?
  • Add fallback calculation for long lines - the fccApprixmateDistance method has increasingly bad error as lines get longer, so we should only use it for short geometry. TODO: figure out a threshold to fall back to haversine or maybe vincenty.
  • Handle the case where (a->b) != (b->a) because a and b are in different buckets, and we use the first coordinate for bucket lookup. Maybe always use the smallest latitude value?
  • Check precision when longitude > +/- 179.9 when approximation falls apart, maybe fallback to haversine or vincenty. There are very few geometries at these longitudes, so it's probably OK to have a slow-but-accurate method for calculations here.
  • Make sure we handle crossing the dateline (+/- 180) properly.
  • CHANGELOG.md entry (How to write a changelog entry)
  • add tests (see testing documentation
  • review
  • adjust for comments
  • cherry pick to release branch

const auto lat1 = static_cast<double>(util::toFloating(coordinate_1.lat));
const auto lon2 = static_cast<double>(util::toFloating(coordinate_2.lon));
const auto lat2 = static_cast<double>(util::toFloating(coordinate_2.lat));
return cheap_ruler_cache[std::lround(lat1 / 10) + 9].distance({lon1, lat1}, {lon2, lat2});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably add assertion here that bucket number is within range for our cache.

@chaupow
Copy link
Member

chaupow commented Apr 20, 2018

awesome @danpat
let me rerun the measurements with this!

@emiltin
Copy link
Contributor

emiltin commented Apr 20, 2018

sweet

@chaupow
Copy link
Member

chaupow commented Apr 20, 2018

Aaaaaand numbers are out!

What distances PR faster-distances PR
100x100 in us-west 12s 5.6s
100x100 in la max dist 20km 3.7s 800ms

@chaupow chaupow mentioned this pull request Apr 20, 2018
11 tasks
@emiltin
Copy link
Contributor

emiltin commented Apr 20, 2018

i like what i see

@chaupow chaupow self-assigned this Apr 20, 2018
@chaupow
Copy link
Member

chaupow commented Apr 20, 2018

@danpat if you're ok with it, I will take it on from here working on your tasklist? 😄

@mourner
Copy link

mourner commented Apr 24, 2018

Wow, that's a massive speedup! Happy how this is turning out :)

@cffk
Copy link

cffk commented Apr 24, 2018

This method of calculating the distance has the following problems:

  • The distance between points with longitudes ± 179.9° is (badly!) off.
  • d(p1, p2) ≠ d(p2, p1) if p1 and p2 fall in different latitude buckets; this might throw off some calculations.
  • The error in the distance is about 10% for points on latitude 55°. (It's off by a factor of 2 for points on latitude 85°.)

(For comparison, the error in the great-circle distance is about 0.5%.)

@danpat
Copy link
Member Author

danpat commented Apr 24, 2018

The error in the distance is about 10% for points on latitude 55°. (It's off by a factor of 2 for points on latitude 85°.)

@cffk Are you implying a problem with the method itself, or a bug in the implementation? Those error values are unexpectedly large.

Assuming it's correct, https://github.com/mapbox/cheap-ruler#precision shows the expected precision at various latitudes. It never comes anywhere near 10% for up to 1000km lines.

Thanks for the other notes though - I've added some items to the checklist. We're fortunate that we don't care much about being a general purpose distance calculator, and we also only care about fairly short geometry (most "segments" from OSM maps are quite short). A lot of the geometry we care about should fall in ranges where this method (supposedly) works, and we can detect and fallback to other methods for the outliers.

@cffk
Copy link

cffk commented Apr 24, 2018

@danpat The problem is the latitude binning. The distance between two points on latitude 55° uses the latitude scale factor for 60°. The main term in the latitude scale factor is cos(latitude) and cos(60d) differs from cos(55d) by about 10%.

I noticed you saying that the approximation "falls apart" for longitudes ±179.9°. This isn't the case! You just need to make sure to reduce the longitude difference to [−180°, 180°].

@mourner
Copy link

mourner commented Apr 24, 2018

@danpat yeah, rounding difference of 5 degrees is definitely too much. I'd suggest to 1) calculate the cheap ruler cache lazily; 2) bin much more granularly, e.g. 0.01 deg.

@chaupow
Copy link
Member

chaupow commented Apr 24, 2018

@mourner @cffk

yup, I'm working on this. I think @danpat just used few buckets here to see whether it works at all and see the speed gain.

I am adding more buckets (currently at 180, with .5 diff of latitude but still doing Q&A) and the results seem nice. Thanks for flagging this 😄

Copy link

@cffk cffk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend

  • centering the bins (now the N and S polar regions are treated the same)
  • halving the number of bins using symmetry

Thus, in constructor

step(90.0 / number_of_rulers)
cheap_ruler_cache[n] = mapbox::cheap_ruler::CheapRuler(step * (n + 0.5), mapbox::cheap_ruler::CheapRuler::Meters);

and in getRuler

return cheap_ruler_cache[std::min((int)std::floor(abs(lat)/step), (int)cheap_ruler_cache.size() - 1)];

and instantiate with

CheapRulerContainer cheap_ruler_container(1800);

Finally, in the distance function invoke with the mean latitude

return cheap_ruler_container.getRuler(0.5 * (lat1 + lat2)).distance({lon1, lat1}, {lon2, lat2});

@cffk
Copy link

cffk commented Apr 25, 2018

FYI, I submitted a pull request to cheap-ruler-cpp to fix the longitude wrapping bug.

@chaupow
Copy link
Member

chaupow commented Apr 25, 2018

@cffk thanks so much! I adjusted the changes you suggested in the last commit.

@cffk
Copy link

cffk commented Apr 25, 2018

A couple of additional comments:

  • I submitted another pull request for cheap-ruler-cpp to improve the accuracy and to let you switch to the WGS84 ellipsoid (from Clarke 1866).
  • To minimize the errors for a given number of buckets, you would distribute the buckets according to cos(lat). Since this would slow down selecting the bucket, I don't think there's any way to take advantage of this.

@oxidase
Copy link
Contributor

oxidase commented Apr 26, 2018

@chaupow std::experimental::optional is not supported in MSVC2015. I changed it to mapbox::optional in feature.hpp and pushed a commit to make a CI check.

@TheMarex
Copy link
Member

Pushed some optimizations that removed the rounding and replaced it with integer math, which yielded a nice 5% speedup.

Measurements for 10x10 on Berlin:

image

Copy link
Member

@TheMarex TheMarex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Did some performance measurements and they are looking great!

std::vector<mapbox::cheap_ruler::CheapRuler> cheap_ruler_cache;
const double step;
};
CheapRulerContainer cheap_ruler_container(1800);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be marked as static.

mapbox::cheap_ruler::CheapRuler &getRuler(const FixedLatitude lat)
{
BOOST_ASSERT(step > 1);
auto bin = (std::abs(static_cast<int>(lat)) - 1) / step;
Copy link
Member Author

@danpat danpat Apr 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the request is for lat = 0 here? This'll could lead to bin = -1 if step is only 2. Or will integer division guarantee this is always 0 or bigger? Either way, an assert that bin >= 0 && bin < cheap_ruler_cache.size() should probably be here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Didn't realize rounding direction was not actually defined for 1/2.

| a | 0 | 100 |
| b | 100 | 0 |
| | a | b |
| a | 0 | 100+-1 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere (like in the "When I route I should get " tests) the syntax used is "100m +-3", rather than "100+-3". For consistency it might be good to follow this syntax.

@TheMarex TheMarex merged commit ae805f9 into master Apr 27, 2018
@TheMarex TheMarex deleted the faster_distances branch April 27, 2018 03:21
@chaupow
Copy link
Member

chaupow commented Apr 27, 2018

Awesome, thanks a ton @oxidase !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants