Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use 64bit floats for geoDistance calculation #58476

Closed
dadebue opened this issue Jan 3, 2024 · 14 comments · Fixed by #61848
Closed

Use 64bit floats for geoDistance calculation #58476

dadebue opened this issue Jan 3, 2024 · 14 comments · Fixed by #61848
Labels
easy task Good for first contributors unfinished code

Comments

@dadebue
Copy link

dadebue commented Jan 3, 2024

Describe the unexpected behaviour

  • The geoDistance function (and the other geographical distance functions) only uses Float32 values for the calculation
  • This doesn't provide a high enough precision for high precision positioning systems (like RTK GPS) which are able to determine the position up to ±1cm precision
  • When using the geoDistance function with close enough positions the result is always 0, which is incorrect

How to reproduce

  • most recent Clickhouse version
  • SELECT geoDistance(8.623000, 49.355000, 8.6230001, 49.3550001); returns 0
  • The exact distance is 0.01327m (link to online calculation)
    • Latitude 1: 49.3550001
    • Latitude 2: 49.3550000
    • Longitude 1: 8.6230001
    • Longitude 1: 8.6230000
  • The exact distance can be calculated with Clickhouse too when using the full haversine formula manually (returns 0.01327037431348616):
SELECT (2*atan2(sqrt((sin((49.3550001-49.3550000)*pi()/360)*sin((49.3550001-49.3550000)*pi()/360))+(cos(49.3550001*pi()/180)*cos(49.3550000*pi()/180)*sin((8.6230001-8.6230000)*pi()/360)*sin((8.6230001-8.6230000)*pi()/360))),sqrt(1-(sin((49.3550001-49.3550000)*pi()/360)*sin((49.3550001-49.3550000)*pi()/360))+(cos(49.3550001*pi()/180)*cos(49.3550000*pi()/180)*sin((8.6230001-8.6230000)*pi()/360)*sin((8.6230001-8.6230000)*pi()/360)))))*6371000 distance;

Expected behavior

  • The geoDistance function should use Float64 values for calculation to support high precision position values
  • SELECT geoDistance(8.623000, 49.355000, 8.6230001, 49.3550001); should return 0.01327

Additional context

@alexey-milovidov
Copy link
Member

Let's switch to Float64 and check if there will be any noticeable difference in performance.
I thought there would be no difference.

@alexey-milovidov alexey-milovidov added the easy task Good for first contributors label Jan 3, 2024
@dadebue
Copy link
Author

dadebue commented Jan 4, 2024

BTW this chart clearly shows the difference between the geoDistance and manual haversine formula and the lack of precision...

`Screenshot_2024-01-04_11-24-16

@geetptl
Copy link

geetptl commented Jan 14, 2024

I'd like to work on this.

@alexey-milovidov
Copy link
Member

@geetptl, thank you! This will be very helpful.

@Uzair-90
Copy link

I'm interested in working on this issue. It aligns well with my skills and interests. I've reviewed the problem, and I believe I can contribute a solution.

@alexey-milovidov
Copy link
Member

@Uzair-90, @geetptl, I don't know who will propose a working solution first, but I'd appreciate your work!

@geetptl
Copy link

geetptl commented Jan 16, 2024

Unsure how to link this PR here, but here's the link to it:

#58847

@Uzair-90
Copy link

@alexey-milovidov Thank you for the opportunity I will give it a try and highly appreciate it.

@dadebue
Copy link
Author

dadebue commented Jan 29, 2024

@alexey-milovidov What do you think about the PR from geetptl #58847?

@alexey-milovidov
Copy link
Member

@dadebue, it is unfinished - there is performance degradation.

@alexey-milovidov
Copy link
Member

#61848

@alexey-milovidov
Copy link
Member

alexey-milovidov commented Mar 26, 2024

@dadebue, @Uzair-90, @geetptl, I've reimplemented it. As a result, the performance increased instead of being degraded as expected before. Also, I've managed to make it 100% compatible and even allow the user to control the behavior.

https://s3.amazonaws.com/clickhouse-test-reports/61848/c2209c997ca08466a595b3ff9fec6db78e244123/performance_comparison_[2_4]/report.html

@geetptl
Copy link

geetptl commented Mar 26, 2024

@alexey-milovidov It was vacation when I was working on this issue, but student life caught up to me, along with my inexperience in C++. But thank you for your guidance in the now closed PR!

@alexey-milovidov
Copy link
Member

Thank you, and it is actually your contribution :) I will add you as a co-author.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
easy task Good for first contributors unfinished code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants