Add SQL query planning time metric #12923

Merged
1 commit merged into apache:master on Aug 22, 2022

Conversation

rohangarg
Member

Adds a sqlQuery/planningTimeMs metric per SQL query, which measures the time taken to build a native query from the SQL query.
When comparing the performance of the native and SQL APIs for the same query, users tend to check the extra time taken by the SQL planning stage. Currently, that time is estimated as the difference between running the SQL query and then running the native query obtained via EXPLAIN; to find the exact difference, the query/time metrics of the SQL and native queries are compared.
Further, for SQL query workloads there is no easy visibility into the overall planning time.
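
To make the idea concrete, below is a minimal, self-contained Java sketch (illustrative only, not the actual patch; the class and helper names are made up): the planning step is timed in nanoseconds and reported, alongside the total query time, in milliseconds.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class PlanningTimeSketch
{
  public static void main(String[] args)
  {
    // Stand-in for the real SQL-to-native planning work.
    final long planningStartNs = System.nanoTime();
    doPlanning();
    final long planningTimeNanos = System.nanoTime() - planningStartNs;

    // Pretend the rest of the query took another 5 ms.
    final long queryTimeNs = planningTimeNanos + TimeUnit.MILLISECONDS.toNanos(5);

    // Track in nanoseconds internally; convert to milliseconds only when emitting.
    final Map<String, Object> statsMap = new LinkedHashMap<>();
    statsMap.put("sqlQuery/time", TimeUnit.NANOSECONDS.toMillis(queryTimeNs));
    statsMap.put("sqlQuery/planningTimeMs", TimeUnit.NANOSECONDS.toMillis(planningTimeNanos));

    System.out.println(statsMap);
  }

  private static void doPlanning()
  {
    // Placeholder for converting the SQL query into a native query.
  }
}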

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.


final Map<String, Object> statsMap = new LinkedHashMap<>();
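// Convert the internally-tracked nanosecond timings to milliseconds before exposing them in the stats map.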
statsMap.put("sqlQuery/time", TimeUnit.NANOSECONDS.toMillis(queryTimeNs));
statsMap.put("sqlQuery/planningTimeMs", TimeUnit.NANOSECONDS.toMillis(planningTimeNanos));
Member

I'm curious: if this metric is emitted in milliseconds, why do we need to hold it in nanoseconds in the code?

Contributor

I guess it's because of the better accuracy.

2089 / 1000 - 1989 / 1000 = 1 (truncate each value first, then subtract)
(2089 - 1989) / 1000 = 0 (subtract first, then truncate)

0 seems better here given the minuscule difference between the two values.
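
As a small illustration of the same point (hypothetical values; nanoseconds and TimeUnit used only for concreteness):

import java.util.concurrent.TimeUnit;

public class TruncationExample
{
  public static void main(String[] args)
  {
    final long startNs = 1_989L;
    final long endNs = 2_089L;

    // Truncate each value to the coarser unit first, then subtract: 2 - 1 = 1.
    final long truncateFirst =
        TimeUnit.NANOSECONDS.toMicros(endNs) - TimeUnit.NANOSECONDS.toMicros(startNs);

    // Subtract at full resolution, then truncate the difference: 100 / 1000 = 0.
    final long subtractFirst = TimeUnit.NANOSECONDS.toMicros(endNs - startNs);

    System.out.println(truncateFirst + " vs " + subtractFirst);
  }
}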

Member Author

Also, my thinking was that since it doesn't need any extra space, it is better to store the time at the highest resolution. We can then emit it in whatever resolution callers need, and also use the highest-accuracy values for any arithmetic that may be needed in the future.

Member

But I don't see any chance that we would change this metric from ms to ns in the future, since that would cause an incompatibility.

I think the SQL planning phase should be very short. Does this phase always take milliseconds to complete? I'd guess it usually finishes in microseconds, and if that's true, the emitted metric would be zero in most cases.

Member Author

Yes, I didn't mean to say that we'd update the current metric. Rather, in case anyone tries to use it for a new metric or derive a metric from the planning time, they would have the value at the highest accuracy.

In my local tests with a SELECT 1 query, even after multiple quick invocations the lowest planning time I saw was 4 ms. The lowest query/time observed was 6 ms. Have you seen sub-millisecond planning in production clusters, which my local test might be missing?

Contributor

FWIW, I have seen many queries take significant planning time, for example queries with a large IN clause or with many UNION operators. When debugging a slow query, we often don't have direct information about the time spent in planning; we tend to infer it by looking at other metrics and comparing them with the overall execution time. Having this metric available firsthand would help with that troubleshooting. If the planning time is too high, we would either fix it through a code change or adjust the query in some way.

So IMO there is definitely utility in having this metric.

Member

@rohangarg I was just guessing based on my experience; I don't have evidence to prove it.

@abhishekagarwal87 abhishekagarwal87 merged commit 3c129f6 into apache:master Aug 22, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022