COUNT DISTINCT takes much longer than DISTINCT and crashes OSM Planet #1082

Open
hannahbast opened this issue Sep 4, 2023 · 0 comments
hannahbast commented Sep 4, 2023

The query linked below materializes all distinct OSM geometries. Its query execution tree is an INDEX SCAN followed by a DISTINCT. It takes around 2 seconds on OSM Germany and around 50 seconds on OSM Planet.

https://qlever.cs.uni-freiburg.de/osm-germany/jO4LF9

The next query returns only the size of that result, using `COUNT(DISTINCT ...)`. Its query execution tree is an INDEX SCAN followed by a GROUP BY. It takes around 20 seconds on OSM Germany and crashes the server on OSM Planet.

https://qlever.cs.uni-freiburg.de/osm-germany/r4ICBq

Here is a workaround, which uses the first query as a subquery and simply counts its result.

https://qlever.cs.uni-freiburg.de/osm-germany/7H9cJp
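
For readers who cannot open the short links, here is a rough sketch of the three query shapes discussed above, written as Python strings. The actual queries are only available behind the links, so the predicate `geo:hasGeometry`, the variable names, and the helper `count_via_subquery` are assumptions made for illustration, not the exact text that was run.

```python
# Sketch only: predicate and variable names are assumed, not taken from the linked queries.
PREFIX = "PREFIX geo: <http://www.opengis.net/ont/geosparql#>"

# Shape 1: all distinct geometries (INDEX SCAN followed by DISTINCT).
DISTINCT_QUERY = (
    "SELECT DISTINCT ?geometry WHERE { ?osm_id geo:hasGeometry ?geometry }"
)

# Shape 2: size of that result via COUNT(DISTINCT ...), which the issue
# reports is executed as INDEX SCAN followed by GROUP BY.
COUNT_DISTINCT_QUERY = (
    "SELECT (COUNT(DISTINCT ?geometry) AS ?count) "
    "WHERE { ?osm_id geo:hasGeometry ?geometry }"
)


def count_via_subquery(distinct_select: str, var: str = "?geometry") -> str:
    """Shape 3 (workaround): wrap a SELECT DISTINCT as a subquery and count its rows.

    Hypothetical helper; it only builds the query string.
    """
    return f"SELECT (COUNT({var}) AS ?count) WHERE {{ {{ {distinct_select} }} }}"


WORKAROUND_QUERY = count_via_subquery(DISTINCT_QUERY)

if __name__ == "__main__":
    for query in (DISTINCT_QUERY, COUNT_DISTINCT_QUERY, WORKAROUND_QUERY):
        print(PREFIX)
        print(query)
        print()
```

Shapes 2 and 3 should return the same count, because the inner `SELECT DISTINCT` already removes duplicates before the outer `COUNT` runs. The difference reported in this issue lies in how the two are executed.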

hannahbast pushed a commit to ad-freiburg/qlever-petrimaps that referenced this issue Sep 4, 2023
The previous query used `COUNT(DISTINCT ...)`, which amounts to a `GROUP
BY`, which is rather slow and crashed the latest version of OSM Planet.
Now just use the "all distinct geometries" query (which we have to
compute anyway), as subquery of a simple `COUNT`.

See also ad-freiburg/qlever#1082