Add query/time metric for SQL queries from router#12867
Add query/time metric for SQL queries from router#12867rohangarg merged 11 commits intoapache:masterfrom
Conversation
abhishekagarwal87
left a comment
There was a problem hiding this comment.
Thanks @rohangarg for fixing this. I have few comments on this PR.
services/src/main/java/org/apache/druid/server/AsyncQueryForwardingServlet.java
Outdated
Show resolved
Hide resolved
services/src/test/java/org/apache/druid/server/AsyncQueryForwardingServletTest.java
Show resolved
Hide resolved
| { | ||
| return null; | ||
| } | ||
| QueryMetrics<Query<?>> makeMetrics(); |
There was a problem hiding this comment.
do we not need a default implementation?
There was a problem hiding this comment.
I removed the default implementation since it is ok as per the PublicApi documentation to add method to interfaces in a major release. Further, the existing method's semantics are kept same.
There was a problem hiding this comment.
Just because it is "okay" to add a thing, forcing anybody that happens to implement an interface to implement a new method adds friction to moving forward. It's "okay" to provide for some way to make changes and move things forward, not to say that it's okay to force people to make simple random changes to keep up with versions. Always make it as simple as possible for someone to bring their extension forward a version, if it's relatively easy to make it so that nobody has to change anything and they get good behavior, then that should be done.
Now, if the default implementation doesn't provide good behavior and instead causes bad things to happen, then people should be forced to implement it. In this case, it seems like a default is good?
There was a problem hiding this comment.
I'm not sure if the default would be a good thing to have here. having a default method in the interface which returns null would mean that all the developers who have written custom GenericQueryMetricsFactory would need to implement this method anyways to get router SQL query metrics. Also this information to extend the factory would need communication.
Further since the method return would be a nullable, all users of the method would need to handle null explicitly in the code from now on.
The good thing that happens with default implementation is that developers who don't care about router metrics for SQL queries don't need to make any code changes to their custom implementation of metrics factory.
There was a problem hiding this comment.
Then the answer isn't "I don't have a default implementation because it's okay to add one" it's "I don't have one because there isn't a good implementation for them and anybody who implements this interface really does need to think about what the correct implementation is". In which case, that's the reason to add a new method, so great :).
Anyway, sorry to be pedantic, but for anything that impacts compatibility, it's important to show the work that we've thought about the plight of the developer who is updating their cluster.
There was a problem hiding this comment.
Yes, I totally agree with your sentiment regarding SPI compat being more of a judgement call rather than a technicality. I started out with a default implementation and then removed it due to the above rationale. But I missed adding the full explanation in the previous comment.
After this discussion I was also thinking if we could change makeMetrics(Query) to makeMetrics(@Nullable Query) and then make the makeMetrics default implementation as makeMetrics(null). But again, with that I think that might impact the semantics of makeMetrics(Query) to expect a non-null Query.
I'll again think if any other way is possible to avoid incompatbility and update if I find one.
There was a problem hiding this comment.
I believe this is resolved. It is the way it is in this PR because that's the way it needs to be.
2. Add router metric tests for JDBC SQL query using avactica JSON 3. Add request log line for native sql queries
|
@cheddar and I had a discussion where we talked about alternative ways to get the The challenges with other approaches to fetch ids are :
As a result, we've decided to do the following :
The above mentioned solution will allow to get the |
imply-cheddar
left a comment
There was a problem hiding this comment.
A few questions to work out, once they are worked out I'll be approved.
| .build(); | ||
| queryMetrics.query(query); | ||
| queryMetrics.reportQueryTime(0).emit(serviceEmitter); | ||
| queryMetrics.sqlQueryId("dummy"); // done just to pacify the code coverage tool |
There was a problem hiding this comment.
What's the point of this comment? 6 months from now, when someone reads this code and sees that comment, how did the comment enrich their life? Fwiw, I'm not asking this sarcastically, I'm askign because I want whatever your answer is to be bundled into the comment :).
That or maybe the test can validate that something is done with the sqlQueryId and then it can actually be testing it or something?
There was a problem hiding this comment.
The code was added just to make the code coverage tool pass - the actual verification can't be done because we have a no-op implementation for sqlQuery(String) in the default metrics. I've updated the comment to be more clear.
| public HttpFields getHeaders() | ||
| { | ||
| HttpFields httpFields = new HttpFields(); | ||
| httpFields.add(new HttpField(QueryResource.QUERY_ID_RESPONSE_HEADER, "dummy")); |
There was a problem hiding this comment.
I tend to frown on re-using constants like this in a test. The test is validating the consistency of the API. If you use a constant like this for this part, then someone could come along and change the header that the queryId is returned on, the tests would pass because they are also being changed because they are using the same object, but the production deployment could fail as you've broken the API: anything that depended on the older header name will be broken.
It's better for tests to actually be brittle in these cases: hard-code the header name so that if anything accidentally changes it in the future, it will be caught by the tests.
There was a problem hiding this comment.
Yes, that makes sense to me! 👍 have updated to use hardcoded values to protect against silent failures
| private SqlQuery buildSqlQueryWithId(SqlQuery sqlQuery) | ||
| { | ||
| Map<String, Object> context = new HashMap<>(sqlQuery.getContext()); | ||
| context.putIfAbsent(BaseQuery.SQL_QUERY_ID, UUID.randomUUID().toString()); |
There was a problem hiding this comment.
Just double checking, but this will end up setting the native queryId as well if that was null, right? I.e. when I'm comparing my query/time metrics filtering on a single native queryId, I'll also get the query/time from the router, right?
There was a problem hiding this comment.
discussed offiline, have added BaseQuery.QUERY_ID to the context, so the event will have id dimension filled in router's query/time metric event as well
This change adds the query/time metric for SQL queries from router. Currently, the native queries do report that metric whereas the SQL queries don't. The biggest problem in support SQL query metrics is that in router the SQL query doesn't have a native query plan which can be used to send metrics.
So, instead we extract
sqlQueryIdfrom the query response header and only set that dimension for query/time metric for SQL queries. Due to the lack of a native translated query, we use a dummy native query to interact withQueryMetricsinterface but ensure that no dummy dimensions are set in the metric.The reasons for not de-serializing the SQL query requests are :
This PR has: