Add query/time metric for SQL queries from router by rohangarg · Pull Request #12867 · apache/druid

rohangarg · 2022-08-05T07:37:55Z

This change adds the query/time metric for SQL queries from the router. Currently, native queries report that metric but SQL queries don't. The biggest problem in supporting SQL query metrics is that, in the router, a SQL query doesn't have a native query plan that can be used to send metrics.
So, instead, we extract sqlQueryId from the query response header and set only that dimension in the query/time metric for SQL queries. Because there is no translated native query, we use a dummy native query to interact with the QueryMetrics interface but ensure that no dummy dimensions are set in the metric.
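A minimal sketch of the header-based approach, assuming the broker returns the SQL query id in an `X-Druid-SQL-Query-Id` response header and modelling the metric dimensions as a plain map instead of Druid's `QueryMetrics` interface. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Druid's actual code): the router has no native query
 * for a SQL request, so it reads the SQL query id from the broker's response
 * headers and records only that dimension next to the elapsed time.
 */
class RouterSqlQueryTimeSketch
{
  // Assumed header name; check it against the Druid version you run.
  static final String SQL_QUERY_ID_HEADER = "X-Druid-SQL-Query-Id";

  /** Returns the dimensions for a query/time event; nothing else is filled in. */
  static Map<String, String> dimensions(Map<String, String> responseHeaders)
  {
    Map<String, String> dims = new HashMap<>();
    String sqlQueryId = responseHeaders.get(SQL_QUERY_ID_HEADER);
    if (sqlQueryId != null) {
      // Only the SQL query id is known at the router; no dummy dataSource/type dimensions.
      dims.put("sqlQueryId", sqlQueryId);
    }
    return dims;
  }

  /** Elapsed wall-clock time in milliseconds between two System.nanoTime() readings. */
  static long queryTimeMillis(long startNanos, long endNanos)
  {
    return (endNanos - startNanos) / 1_000_000L;
  }
}
```

A real servlet would read the header from the proxied response object, which matches header names case-insensitively; the plain `Map` used here does not.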

The reasons for not deserializing the SQL query requests are:

- It can take time on the router, which can add to query latency.
- It is not possible to deserialize JDBC SQL queries easily (doing so would require the same structure as the Avatica handlers).

This PR has:

- been self-reviewed.
  - using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
- added documentation for new or modified features or behaviors.
- added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- added or updated version, license, or notice information in licenses.yaml
- added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
- added integration tests.
- been tested in a test Druid cluster.

abhishekagarwal87

Thanks @rohangarg for fixing this. I have a few comments on this PR.

services/src/main/java/org/apache/druid/server/AsyncQueryForwardingServlet.java

services/src/test/java/org/apache/druid/server/AsyncQueryForwardingServletTest.java

abhishekagarwal87 · 2022-08-05T11:57:51Z

processing/src/main/java/org/apache/druid/query/GenericQueryMetricsFactory.java

-  {
-    return null;
-  }
+  QueryMetrics<Query<?>> makeMetrics();


Do we not need a default implementation?

I removed the default implementation since, per the PublicApi documentation, it is OK to add methods to interfaces in a major release. Further, the semantics of the existing method are kept the same.

Even if it is "okay" to add a thing, forcing anybody who happens to implement an interface to implement a new method adds friction to moving forward. It's "okay" in order to provide some way to make changes and move things forward; that doesn't mean it's okay to force people to make simple, random changes just to keep up with versions. Always make it as simple as possible for someone to bring their extension forward a version: if it's relatively easy to make it so that nobody has to change anything and they still get good behavior, then that should be done.

Now, if the default implementation doesn't provide good behavior and instead causes bad things to happen, then people should be forced to implement it. In this case, it seems like a default is good?

I'm not sure a default would be a good thing to have here. Having a default method in the interface that returns null would mean that all developers who have written a custom GenericQueryMetricsFactory would need to implement this method anyway to get router SQL query metrics, and that requirement would itself need to be communicated to them.
Further, since the method's return value would be nullable, every caller of the method would need to handle null explicitly from now on.
The benefit of a default implementation is that developers who don't care about router metrics for SQL queries wouldn't need to make any code changes to their custom metrics factory.
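To make the trade-off concrete, here is a sketch with simplified, hypothetical types (`Query`, `Metrics`, and two factory variants) standing in for Druid's real `GenericQueryMetricsFactory`. With a null-returning default, old implementations keep compiling but every caller needs a null check; without a default, the compiler forces each implementor to decide.

```java
/** Hypothetical stand-ins for Druid's Query and QueryMetrics types. */
interface Query {}

interface Metrics
{
  void reportQueryTime(long millis);
}

/** Variant 1: a default that returns null keeps old implementations compiling. */
interface FactoryWithNullDefault
{
  Metrics makeMetrics(Query query);

  default Metrics makeMetrics()
  {
    return null; // old implementations silently produce no router SQL metrics
  }
}

/** Variant 2 (the shape in this PR's diff): no default, so implementors must choose. */
interface FactoryWithoutDefault
{
  Metrics makeMetrics(Query query);

  Metrics makeMetrics();
}

class MetricsFactoryDemo
{
  /** Caller of variant 1: must handle null from now on. */
  static boolean reportIfSupported(FactoryWithNullDefault factory, long millis)
  {
    Metrics metrics = factory.makeMetrics();
    if (metrics == null) {
      return false; // the metric is silently dropped
    }
    metrics.reportQueryTime(millis);
    return true;
  }

  /** Caller of variant 2: relies on the non-null contract, no null check. */
  static void report(FactoryWithoutDefault factory, long millis)
  {
    factory.makeMetrics().reportQueryTime(millis);
  }
}
```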

Then the answer isn't "I don't have a default implementation because it's okay to add one" it's "I don't have one because there isn't a good implementation for them and anybody who implements this interface really does need to think about what the correct implementation is". In which case, that's the reason to add a new method, so great :).

Anyway, sorry to be pedantic, but for anything that impacts compatibility, it's important to show our work: that we've thought about the plight of the developer who is updating their cluster.

Yes, I totally agree that SPI compatibility is more of a judgement call than a technicality. I started out with a default implementation and then removed it for the above reasons, but I missed adding the full explanation in my previous comment.
After this discussion, I was also wondering whether we could change makeMetrics(Query) to makeMetrics(@Nullable Query) and then make the default implementation of makeMetrics() call makeMetrics(null). But that would change the semantics of makeMetrics(Query), whose implementations currently expect a non-null Query.
I'll think more about whether there's another way to avoid the incompatibility and will update the PR if I find one.
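The alternative floated above, a default `makeMetrics()` that delegates to `makeMetrics(null)`, can be sketched with hypothetical simplified types (metrics are represented as a descriptive string for brevity). It keeps old implementations compiling, but any existing implementation that dereferences its argument fails at runtime, which is the semantic concern raised.

```java
/** Hypothetical simplified query type. */
class SimpleQuery
{
  final String type;

  SimpleQuery(String type)
  {
    this.type = type;
  }
}

interface DelegatingFactory
{
  /** Existing contract: implementations assume a non-null query. */
  String makeMetrics(SimpleQuery query);

  /** Rejected alternative: reuse the existing method with a null query. */
  default String makeMetrics()
  {
    return makeMetrics(null);
  }
}

/** An implementation written against the old, non-null contract. */
class LegacyFactory implements DelegatingFactory
{
  @Override
  public String makeMetrics(SimpleQuery query)
  {
    return "metrics for " + query.type; // NullPointerException when query is null
  }
}
```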

I believe this is resolved. It is the way it is in this PR because that's the way it needs to be.


rohangarg · 2022-08-18T17:55:56Z

@cheddar and I discussed alternative ways to get the queryId and sqlQueryId for user queries. The current approach of fetching the ids from the response header is a last resort rather than a principled solution. It also doesn't scale well if we want to add more metrics from the router, since all of them would have to wait for the full request to complete and would then have to be emitted from the response listeners.

The challenges with other approaches to fetching the ids are:

- We need to generate query ids in case the user hasn't provided them. This keeps the metrics from routers and other services consistent for the same user query. To check for and generate the ids, we need to deserialize JDBC and normal SQL queries to a state where we can see their context map.
- JDBC is a stateful protocol that relies on session state managed by the broker, so not all JDBC requests carry the ids (config params are only allowed in the connection-open request). Thus, if a user sets an id via JDBC, we won't know about it when the actual SQL queries are executed.
- The native query id for SQL can't be set from the router, since the SQL might be broken into multiple native queries (union queries), and all of them would then have the same native id.

As a result, we've decided to do the following:

- For normal SQL queries, we use a deserialized version of the request and inject a SQL query-id into it if the user hasn't overridden it.
- We don't emit the native query-id for any SQL queries, to avoid problems with union queries.
- For JDBC queries, we extract the SQL query-id from the response header, since that is the best we can do for now.

The above solution gives us the query/time metric for all SQL queries, while leaving some room to add more metrics, at least for normal SQL queries.
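The decision above can be summarized as a small routing function. This is a sketch, not the servlet's code: the `RequestKind` enum, the `sqlQueryId` context key, and the `X-Druid-SQL-Query-Id` header name are assumptions for illustration.

```java
import java.util.Map;
import java.util.UUID;

/** Hypothetical summary of where the router gets the SQL query id for each request type. */
class SqlQueryIdSource
{
  enum RequestKind { NATIVE, SQL, JDBC }

  static final String SQL_QUERY_ID_KEY = "sqlQueryId";               // assumed context key
  static final String SQL_QUERY_ID_HEADER = "X-Druid-SQL-Query-Id";  // assumed header name

  /**
   * For SQL over HTTP, injects an id into the given context unless the user set one.
   * Mutates the map, so callers should pass a copy of the user's context.
   * Returns the id to report, or null if it is only known after the response.
   */
  static String idBeforeForwarding(RequestKind kind, Map<String, Object> context)
  {
    if (kind != RequestKind.SQL) {
      // JDBC: the id lives in broker-side session state. NATIVE: not a SQL query.
      return null;
    }
    context.putIfAbsent(SQL_QUERY_ID_KEY, UUID.randomUUID().toString());
    return String.valueOf(context.get(SQL_QUERY_ID_KEY));
  }

  /** JDBC fallback: read the id from the broker's response headers. */
  static String idAfterResponse(Map<String, String> responseHeaders)
  {
    return responseHeaders.get(SQL_QUERY_ID_HEADER);
  }
}
```

Only the SQL path gets an id before forwarding; for JDBC the router has to wait for the response, which is why the header approach remains the fallback there.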


imply-cheddar

A few questions to work out; once they are resolved, I'll approve.

imply-cheddar · 2022-08-26T05:52:38Z

processing/src/test/java/org/apache/druid/query/DefaultQueryMetricsTest.java

        .build();
    queryMetrics.query(query);
    queryMetrics.reportQueryTime(0).emit(serviceEmitter);
+    queryMetrics.sqlQueryId("dummy"); // done just to pacify the code coverage tool


What's the point of this comment? 6 months from now, when someone reads this code and sees that comment, how did the comment enrich their life? Fwiw, I'm not asking this sarcastically; I'm asking because I want whatever your answer is to be bundled into the comment :).

That, or maybe the test can validate that something is done with the sqlQueryId, so that it is actually testing it?

The code was added just to make the code coverage tool pass; the actual verification can't be done because the default metrics have a no-op implementation of sqlQueryId(String). I've updated the comment to be clearer.
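Following the reviewer's suggestion to actually test the value, one option is a recording test double that overrides the no-op method, so the test asserts what the caller passed in. The classes below are hypothetical simplified stand-ins, not Druid's `DefaultQueryMetrics`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a metrics class whose sqlQueryId(...) is a no-op. */
class NoOpSqlIdMetrics
{
  public void sqlQueryId(String sqlQueryId)
  {
    // no-op, like the default metrics discussed above
  }
}

/** Test double that records what the code under test passed in. */
class RecordingMetrics extends NoOpSqlIdMetrics
{
  final List<String> recordedSqlQueryIds = new ArrayList<>();

  @Override
  public void sqlQueryId(String sqlQueryId)
  {
    recordedSqlQueryIds.add(sqlQueryId);
  }
}

class RecordingMetricsDemo
{
  /** Hypothetical code under test that should set the SQL query id dimension. */
  static void emitForSqlQuery(NoOpSqlIdMetrics metrics, String sqlQueryId)
  {
    metrics.sqlQueryId(sqlQueryId);
  }
}
```

This verifies the caller's behavior rather than the no-op default itself, which has nothing observable to assert.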

imply-cheddar · 2022-08-26T05:57:13Z

services/src/test/java/org/apache/druid/server/AsyncQueryForwardingServletTest.java

+          public HttpFields getHeaders()
+          {
+            HttpFields httpFields = new HttpFields();
+            httpFields.add(new HttpField(QueryResource.QUERY_ID_RESPONSE_HEADER, "dummy"));


I tend to frown on reusing constants like this in a test. The test is validating the consistency of the API. If you use a constant here, someone could come along and change the header that the queryId is returned on, and the tests would still pass because they pick up the change through the same constant, but the production deployment could fail because you've broken the API: anything that depended on the old header name will break.

It's better for tests to actually be brittle in these cases: hard-code the header name so that if anything accidentally changes it in the future, it will be caught by the tests.

Yes, that makes sense to me! 👍 I have updated the test to use hardcoded values to protect against silent failures.
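A small illustration of the point, assuming the native query id is returned in an `X-Druid-Query-Id` header (verify against `QueryResource.QUERY_ID_RESPONSE_HEADER` in your version). The class names are hypothetical. The check compares the production constant against a literal, so changing the constant's value breaks the test instead of silently changing the API.

```java
/** Hypothetical production constant, standing in for QueryResource.QUERY_ID_RESPONSE_HEADER. */
class ResponseHeaders
{
  static final String QUERY_ID_RESPONSE_HEADER = "X-Druid-Query-Id";
}

class HeaderContractCheck
{
  // Hard-coded on purpose: this literal is the public API that clients depend on.
  static final String EXPECTED_QUERY_ID_HEADER = "X-Druid-Query-Id";

  /** Returns true only if production code still uses the published header name. */
  static boolean headerNameUnchanged()
  {
    return EXPECTED_QUERY_ID_HEADER.equals(ResponseHeaders.QUERY_ID_RESPONSE_HEADER);
  }
}
```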

imply-cheddar · 2022-08-26T05:59:52Z

services/src/main/java/org/apache/druid/server/AsyncQueryForwardingServlet.java

+  private SqlQuery buildSqlQueryWithId(SqlQuery sqlQuery)
+  {
+    Map<String, Object> context = new HashMap<>(sqlQuery.getContext());
+    context.putIfAbsent(BaseQuery.SQL_QUERY_ID, UUID.randomUUID().toString());


Just double checking, but this will end up setting the native queryId as well if that was null, right? I.e. when I'm comparing my query/time metrics filtering on a single native queryId, I'll also get the query/time from the router, right?

Discussed offline; I have added BaseQuery.QUERY_ID to the context, so the id dimension will also be filled in the router's query/time metric event.
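A self-contained sketch of `buildSqlQueryWithId` with this follow-up change, working on a plain context map instead of a `SqlQuery`. It assumes the context keys are the strings `sqlQueryId` and `queryId` (standing in for `BaseQuery.SQL_QUERY_ID` and `BaseQuery.QUERY_ID`) and that each missing id gets its own random UUID; the PR may derive the ids differently.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of filling in missing ids on a copy of the SQL request context. */
class SqlContextIds
{
  static final String SQL_QUERY_ID = "sqlQueryId"; // assumed value of BaseQuery.SQL_QUERY_ID
  static final String QUERY_ID = "queryId";        // assumed value of BaseQuery.QUERY_ID

  /** Returns a new context; user-supplied ids are kept, missing ones are generated. */
  static Map<String, Object> withIds(Map<String, Object> userContext)
  {
    // Copy so the caller's (possibly immutable) map is never mutated.
    Map<String, Object> context = new HashMap<>(userContext);
    context.putIfAbsent(SQL_QUERY_ID, UUID.randomUUID().toString());
    context.putIfAbsent(QUERY_ID, UUID.randomUUID().toString());
    return Collections.unmodifiableMap(context);
  }
}
```

Because `putIfAbsent` never overwrites, a user who sets either id explicitly sees the same value in the router's metric as in the broker's.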


Add query/time metric for SQL queries from router

71a21cc

abhishekagarwal87 added the Area - SQL and Area - Metrics/Event Emitting labels Aug 5, 2022

abhishekagarwal87 reviewed Aug 5, 2022


services/src/main/java/org/apache/druid/server/AsyncQueryForwardingServlet.java (outdated)

services/src/test/java/org/apache/druid/server/AsyncQueryForwardingServletTest.java

fixup! Add query/time metric for SQL queries from router

174b9c5

abhishekagarwal87 reviewed Aug 5, 2022


rohangarg added 3 commits August 18, 2022 19:26

1. Use SqlQuery object for native queries to construct metrics

2441d95

2. Add router metric tests for JDBC SQL query using avactica JSON 3. Add request log line for native sql queries

Merge remote-tracking branch 'upstream/master' into router_sql_query_time

54b9beb

Fix CI problems

f695d1d

Remove Nullable annotation from DefaultGenericQueryMetricsFactory#makeMetrics

690d279

abhishekagarwal87 added the Release Notes label Aug 22, 2022

imply-cheddar reviewed Aug 26, 2022


rohangarg added 5 commits August 31, 2022 13:25

Add queryId for SQL queries + address review

6aea1de

Merge remote-tracking branch 'upstream/master' into router_sql_query_time

b456bed

Empty commit to trigger CI

dcc7d1d

fixup! Add queryId for SQL queries + address review

5419b88

Fix query cancel bug when user has overriden native query-id in a SQL query

74da72c

cheddar approved these changes Sep 7, 2022


rohangarg merged commit 7aa8d7f into apache:master Sep 7, 2022

kfaraz added this to the 25.0 milestone Nov 22, 2022

rohangarg Aug 18, 2022 • edited
5 participants