Add SQL id, request logs, and metrics #6302

gaodayue · 2018-09-05T10:37:07Z

This PR adds a new sqlQueryId method to the QueryMetrics interface, which should be noted in the release notes.

jon-wei · 2018-09-14T02:06:18Z

@gaodayue Is this still WIP? I see "TODO" in the description

gaodayue · 2018-09-14T02:48:15Z

Hi @jon-wei, I'm waiting for review comments on proposal #6301. If the proposal sounds good, I will go ahead and finish the PR.

gaodayue · 2018-09-20T13:20:08Z

Added unit tests and docs; this PR is ready for review. Hi @jon-wei and @gianm, could either of you help review?

jon-wei · 2018-09-20T21:40:53Z

@gaodayue Sure, I can review, though I probably won't have time until next week

gianm · 2018-09-27T03:15:20Z

Hi @gaodayue, in the meantime could you please check the tests and inspections? In the tests, it looks like lots of CalciteQueryTests are failing due to the new sqlQueryId. Maybe the verifier that compares the queries for correctness should strip sqlQueryId first. In the inspections, it is reporting that the Query.getSqlQueryId() method is never used. If it's not needed, you can remove it.
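One way to read the suggestion above is to drop the generated ID from the query context before comparing expected and actual queries. The following is a minimal sketch, assuming the context is a plain `Map` and the key is `sqlQueryId` as named in this PR. The class and method names are hypothetical and are not the actual CalciteQueryTest verifier.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the actual CalciteQueryTest code.
class SqlQueryIdStripper
{
  // Context key named in this PR; its value is generated per query, so it differs between runs.
  static final String SQL_QUERY_ID_KEY = "sqlQueryId";

  // Returns a copy of the context without the sqlQueryId entry; the input map is not modified.
  static Map<String, Object> stripSqlQueryId(Map<String, Object> context)
  {
    Map<String, Object> copy = new HashMap<>(context);
    copy.remove(SQL_QUERY_ID_KEY);
    return copy;
  }

  // Compares two contexts while ignoring the sqlQueryId entry on both sides.
  static boolean sameIgnoringSqlQueryId(Map<String, Object> expected, Map<String, Object> actual)
  {
    return stripSqlQueryId(expected).equals(stripSqlQueryId(actual));
  }

  public static void main(String[] args)
  {
    Map<String, Object> expected = new HashMap<>();
    expected.put("timeout", 1000);

    Map<String, Object> actual = new HashMap<>(expected);
    actual.put(SQL_QUERY_ID_KEY, "generated-id");

    System.out.println(sameIgnoringSqlQueryId(expected, actual));
  }
}
```

Copying the map before removing the key keeps the query under test untouched, so later assertions still see the original context.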

gaodayue · 2018-09-27T16:37:26Z

Thanks @gianm . I will fix all failed test cases ASAP.

> In the inspections, it is reporting that the Query.getSqlQueryId() method is never used. If it's not needed, you can remove it.

That method exists so that custom QueryMetrics implementations can add a "sqlQueryId" dimension to the metrics. I added @SuppressWarnings("unused") to the method; hopefully the inspection tool will understand it.
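To illustrate the extension point described above, here is a rough sketch of a custom metrics implementation that records the SQL query ID as a dimension. `SimpleQuery` and `SimpleQueryMetrics` are hypothetical, heavily simplified stand-ins; the real Druid `Query` and `QueryMetrics` interfaces have many more methods, and their exact signatures are not shown in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Druid's Query type.
interface SimpleQuery
{
  String getSqlQueryId();
}

// Hypothetical, simplified stand-in for Druid's QueryMetrics interface.
interface SimpleQueryMetrics
{
  void sqlQueryId(SimpleQuery query);
}

// A custom implementation that records the SQL query ID as a metric dimension.
class DimensionRecordingMetrics implements SimpleQueryMetrics
{
  final Map<String, String> dimensions = new LinkedHashMap<>();

  @Override
  public void sqlQueryId(SimpleQuery query)
  {
    String id = query.getSqlQueryId();
    // Assumption: queries that did not come from SQL carry no ID, so the dimension is skipped.
    if (id != null) {
      dimensions.put("sqlQueryId", id);
    }
  }
}
```

Because only custom implementations would call the getter, a static analyzer that sees just the core code base can report it as unused.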

jon-wei · 2018-09-28T00:16:12Z

docs/content/configuration/index.md

@@ -347,6 +347,37 @@ Composite Request Logger emits request logs to multiple request loggers.
 |--------|-----------|-------|
 |`druid.request.logging.loggerProviders`|List of request loggers for emitting request logs.|none|

+### SQL Request Logging
+
+Brokers can be configured to log the sql request (both from HTTP and JDBC) they see.


Let's capitalize "sql" everywhere in the non-property parts of the docs

jon-wei · 2018-09-28T00:22:20Z

server/src/main/java/org/apache/druid/server/log/AbstractFileRequestLogger.java

+import java.util.concurrent.Callable;
+import java.util.concurrent.ScheduledExecutorService;
+
+public abstract class AbstractFileRequestLogger


I don't think it's necessary to split the RequestLogger implementations into native/SQL versions, it's enough that there's a new SqlRequestLogLine implementation.

If you want to have two different request loggers for SQL and native queries, I think it would be better to bind the SQL provider to a different configuration parameter and inject the desired provider using annotations (maybe like how you can get a @Coordinator DruidLeaderClient and a separate @IndexingService instance).

@gianm Do you have a strong preference on single request logger vs. separate loggers for SQL and native queries? I saw there was some discussion around that in the proposal, I'm personally fine with either approach

Hmm, after thinking about it more, I think it's better to have everything go through one RequestLogger interface. Will write up the reasons why in a separate comment.

gaodayue · 2018-09-28T01:33:23Z

Seems like the inspection tool still complains about the unused method, could you give me some suggestions on how to handle it?

QiuMM · 2018-09-28T12:35:30Z

@gaodayue maybe you can merge the master branch, it works for me.

gianm · 2018-09-28T19:19:07Z

@gaodayue @jon-wei re: #6302 (comment),

I am thinking it will be best to have a single RequestLogger interface with methods like logNativeQuery and logSqlQuery. There are three main reasons:

1. It is just as flexible as having multiple interfaces. Imagine a "MultiRequestLogger" that can route different types of logs to different underlying loggers (SQL to Kafka, native to file; or different files, etc). Sort of like the "composing" emitter we have.
2. It is more powerful, in that it can do one thing that the multiple interfaces cannot: it can write different types of logs to the same file.
3. Last but not least: it is simpler for users that are just getting started. They set up request logging one time, and all request logs go to the same place (of course, there should be a logType field that allows users to understand what is being logged). We could even extend this in the future to add exception logging and other sorts of structured logging (as opposed to log4j, which we use for unstructured logging).
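The routing idea in the reasons above can be sketched roughly as follows. This is an illustration only: the method names `logNativeQuery` and `logSqlQuery` come from the comment, but the `String` parameter, `InMemoryRequestLogger`, and the constructor shape of `MultiRequestLogger` are assumptions made for the example, not the interface that was eventually merged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single interface covering both log types (log lines simplified to String).
interface RequestLogger
{
  void logNativeQuery(String line);

  void logSqlQuery(String line);
}

// Collects lines in memory and tags each with a log type; stands in for a file or Kafka logger.
class InMemoryRequestLogger implements RequestLogger
{
  final List<String> lines = new ArrayList<>();

  @Override
  public void logNativeQuery(String line)
  {
    lines.add("native\t" + line);
  }

  @Override
  public void logSqlQuery(String line)
  {
    lines.add("sql\t" + line);
  }
}

// Routes each log type to its own delegate. Passing the same delegate twice
// sends both types to one destination.
class MultiRequestLogger implements RequestLogger
{
  private final RequestLogger nativeLogger;
  private final RequestLogger sqlLogger;

  MultiRequestLogger(RequestLogger nativeLogger, RequestLogger sqlLogger)
  {
    this.nativeLogger = nativeLogger;
    this.sqlLogger = sqlLogger;
  }

  @Override
  public void logNativeQuery(String line)
  {
    nativeLogger.logNativeQuery(line);
  }

  @Override
  public void logSqlQuery(String line)
  {
    sqlLogger.logSqlQuery(line);
  }
}
```

With one interface, the choice between separate destinations and a shared one becomes a wiring decision rather than a type-level one.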

gianm · 2018-09-28T19:19:17Z

What do you think?

gaodayue · 2018-10-01T17:16:24Z

Thanks @gianm. I think your arguments for a single RequestLogger interface are reasonable, and I would like to give that approach a try.

Sorry for the late reply. I'm currently on vacation, so I'm not sure when I can finish refactoring this, but I'll do my best.

gaodayue · 2018-10-09T12:07:05Z

Combined RequestLogger and SqlRequestLogger into one interface according to Gian's advice; the code looks a lot more concise now. @gianm could you review again?

gaodayue · 2018-10-23T04:19:36Z

Sync with master again

gianm

@gaodayue, sorry for the delay on reviewing. But thank you very much for the contribution. I left some comments just now, a couple of which are substantive (to do with the request log format and the X-Druid-Native-Query-Ids header). Let us know what you think.

And /cc @jon-wei, could you please take a look at the security checks in SqlLifecycle?

docs/content/configuration/index.md

processing/src/main/java/org/apache/druid/query/BaseQuery.java

gianm · 2019-01-08T21:23:58Z

server/src/main/java/org/apache/druid/server/RequestLogLine.java

@@ -58,10 +78,12 @@ public String getLine(ObjectMapper objectMapper) throws JsonProcessingException
    );
  }

-  @JsonProperty("timestamp")
-  public DateTime getTimestamp()
+  public String getSqlQueryLine(ObjectMapper objectMapper) throws JsonProcessingException


This format should be documented, but also, IMO, adjusted. Firstly, it should be on one line (this makes it easier to parse); secondly, it should be the same rough format as getNativeQueryLine (so the same parser can handle both). That means a TSV. How about doing:

    return JOINER.join(
        Arrays.asList(
            timestamp,
            remoteAddr,
            objectMapper.writeValueAsString(ImmutableMap.of("queryType", "sql", "sql", sql)),
            objectMapper.writeValueAsString(queryStats)
        )
    );

It's sort of using a fake queryType of 'sql', which is a little weird, but makes parsing pretty easy.

Alternatively, this:

    return JOINER.join(
        Arrays.asList(
            timestamp,
            remoteAddr,
            "",
            objectMapper.writeValueAsString(queryStats),
            objectMapper.writeValueAsString(ImmutableMap.of("sql", sql))
        )
    );

It leaves the native-query field blank, and adds a new field on the end of the TSV for sql query. It uses a JSON object rather than emitting the SQL as-is for two reasons: (1) we can extend with more info later if we want; (2) the SQL query might have newlines and such in it, and the objectMapper.writeValueAsString will get rid of those.

I agree that the format should be documented and easy to parse. Considering the two approaches, I'm in favor of the latter one because it makes it clear that SQL query is different from native query (e.g., it doesn't have queryType). But I also want to tweak it into

    return JOINER.join(
        Arrays.asList(
            timestamp,
            remoteAddr,
            "",
            objectMapper.writeValueAsString(queryStats),
            objectMapper.writeValueAsString(ImmutableMap.of("query", sql, "context", sqlQueryContext))
        )
    );

It uses the field naming of SqlQuery but only includes the query and context fields, which apply to both the HTTP and JDBC scenarios. Please let me know your opinions :)

I like that format you suggested, +1 from me.
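For readers who want to see what a line in the agreed format might look like, here is a self-contained sketch that avoids Jackson and Guava. It assumes the joiner is tab-based (the thread calls the format a TSV) and that the query stats and context arrive as already-serialized, single-line JSON. `SqlLogLineSketch`, `quoteJson`, and the sample values in `main` are hypothetical and are not the code in RequestLogLine.

```java
// Hypothetical sketch of the agreed SQL request log line; not the actual RequestLogLine code.
class SqlLogLineSketch
{
  // Encodes a string as a JSON string literal, escaping quotes, backslashes, and control characters.
  static String quoteJson(String s)
  {
    StringBuilder sb = new StringBuilder("\"");
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"': sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.append('"').toString();
  }

  // Joins the five TSV fields: timestamp, remote address, empty native-query field,
  // query stats JSON, and a JSON object holding the SQL text and its context.
  static String sqlQueryLine(
      String timestamp,
      String remoteAddr,
      String queryStatsJson,
      String sql,
      String contextJson
  )
  {
    String sqlObject = "{\"query\":" + quoteJson(sql) + ",\"context\":" + contextJson + "}";
    return String.join("\t", timestamp, remoteAddr, "", queryStatsJson, sqlObject);
  }

  public static void main(String[] args)
  {
    // Illustrative values only.
    System.out.println(
        sqlQueryLine("2019-01-11T04:54:45.000Z", "127.0.0.1", "{\"success\":true}", "SELECT COUNT(*)\nFROM t", "{}")
    );
  }
}
```

Escaping newlines and tabs inside the SQL text is what keeps each request on one line and keeps the tab-separated fields intact for a parser.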

server/src/main/java/org/apache/druid/server/log/FileRequestLogger.java

sql/src/main/java/org/apache/druid/sql/SqlLifecycle.java

sql/src/main/java/org/apache/druid/sql/calcite/view/DruidViewMacro.java

sql/src/main/java/org/apache/druid/sql/http/SqlResource.java

jon-wei

Reviewed the auth-related changes, those lgtm, had a couple of other small comments

sql/src/main/java/org/apache/druid/sql/SqlLifecycle.java

gaodayue · 2019-01-11T04:54:45Z

Thanks @gianm and @jon-wei for reviewing. I have replied to all your comments and I think the only one needing further discussion is about the request log format. Once we reach consensus on that, I'll go implement and rebase to master again.

gianm

Please merge from master and then this LGTM.

gaodayue · 2019-01-15T04:54:53Z

Merged with master. The reported inspection errors don't look like a problem to me.

gianm · 2019-01-15T15:41:22Z

@gaodayue thanks for merging with master. We'll need to fix the inspection report since otherwise it will start failing for master as well. Could you look into it? There are three I see:

1. "QueryMetrics.java:210: sqlQueryId() Parameter query is not used in either this method or any of its derived methods."

   It's an extension point, not meant to be used by Druid production code. Annotating it with @PublicApi (or @SuppressWarnings("unused")) will tell that to the static analyzer, and it should stop complaining.

2. "QueryMetrics.java:210: sqlQueryId() Method is never used as a member of this interface, but only as a member of the implementation class(es). The project will stay compilable if the method is removed from the interface."

   I think doing (1) should fix this too.

3. "ComposingRequestLoggerProvider.java:128: accept() The declared exception IOException is never thrown in method implementations"

   This looks like a bug in the inspection. It looks like RequestLogLineConsumer declares throws IOException and its implementations do too, so there is no issue with the code. If that's right, then try working around this by adding //noinspection RedundantThrows before the void accept(... line.

gianm · 2019-01-15T20:50:10Z

Hmm, the first thing is fixed, but TC is still complaining about the redundant "throws" that is not actually redundant. I'm not sure how to fix this. I tried downloading the latest IntelliJ (2018.3.2) and my IDE does not flag this line as a redundant throw. In fact, it flags the //noinspection RedundantThrows as unnecessary. It seems to be something broken with the online TeamCity analyzer.

@gaodayue, sorry, but can you try one more thing: change //noinspection RedundantThrows to @SuppressWarnings("RedundantThrows")?

If making that change does not silence the inspection, my suggestion would be to remove the RedundantThrows check from .idea/inspectionProfiles/Druid.xml.
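As a rough illustration of the annotation-based suppression suggested above, the sketch below places `@SuppressWarnings("RedundantThrows")` on an interface method that declares `IOException`. The interface name comes from the thread, but the `String` parameter and the `RedundantThrowsDemo` helper are assumptions made so the example compiles on its own; the real signature in ComposingRequestLoggerProvider is not shown here.

```java
import java.io.IOException;

// Simplified stand-in for the consumer interface discussed above; the parameter type is an assumption.
interface RequestLogLineConsumer
{
  // Tells the IDE inspection not to report the declared IOException as redundant.
  @SuppressWarnings("RedundantThrows")
  void accept(String line) throws IOException;
}

// Hypothetical helper showing that implementations may really throw the declared exception.
class RedundantThrowsDemo
{
  // Returns true if the consumer threw IOException for the given line, false otherwise.
  static boolean throwsIoException(RequestLogLineConsumer consumer, String line)
  {
    try {
      consumer.accept(line);
      return false;
    }
    catch (IOException e) {
      return true;
    }
  }
}
```

The Java compiler ignores warning names it does not recognize, so the annotation only affects tools that know the `RedundantThrows` inspection ID.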

/cc @leventov, any other ideas what might be going on?

gianm · 2019-01-16T05:50:29Z

That seemed to work!

gianm · 2019-01-16T05:52:34Z

@jon-wei any further comments?

jon-wei

LGTM

leventov · 2019-01-16T12:31:53Z

@gianm please raise an issue in IntelliJ's YouTrack. Ideally a link to the issue should be in a comment next to suppression, so that people could know when the suppression could be removed. See DruidFloatPredicate for an example.

gianm · 2019-01-16T17:07:18Z

I've raised this issue: https://youtrack.jetbrains.com/issue/IDEA-205535

gaodayue added 3 commits September 5, 2018 12:03

use SqlLifecyle to manage sql execution, add sqlId

eba4853

add sql request logger

b3585dc

fix UT

47caff1

gaodayue added 4 commits September 19, 2018 17:09

Merge remote-tracking branch 'upstream/master' into sqlid

0e26059

rename sqlId to sqlQueryId, sql/time to sqlQuery/time, etc

c3f2b2a

add docs and more sql request logger impls

ec346d7

add UT for http and jdbc

b59b96f

fix forbidden use of com.google.common.base.Charsets

bf00e4f

fix UT in QuantileSqlAggregatorTest, supressed unused warning of getS…

fe54773

…qlQueryId

do not use default method in QueryMetrics interface

36e83c8

jon-wei reviewed Sep 28, 2018

capitalize 'sql' everywhere in the non-property parts of the docs

fb881d2

gaodayue added 3 commits October 8, 2018 19:47

Merge remote-tracking branch 'apache/master' into sqlid

56068ef

use RequestLogger interface to log sql query

94419ec

minor bugfixes and add switching request logger

f0bc250

gianm self-assigned this Oct 15, 2018

Merge remote-tracking branch 'apache/master' into sqlid

3046b58

add filePattern configs for FileRequestLogger

b2f73b9

gianm reviewed Jan 8, 2019

jon-wei self-assigned this Jan 8, 2019

jon-wei reviewed Jan 9, 2019

address review comments, adjust sql request log format

fcd9220

gianm approved these changes Jan 14, 2019

Merge remote-tracking branch 'apache/master' into sqlid

4aff6a3

fix inspection error

0288440

try SuppressWarnings("RedundantThrows") to fix inspection error on Co…

dfde651

…mposingRequestLoggerProvider

jon-wei approved these changes Jan 16, 2019

jon-wei merged commit 5b8a221 into apache:master Jan 16, 2019

jon-wei added this to the 0.14.0 milestone Jan 16, 2019

jon-wei mentioned this pull request Feb 22, 2019

0.14.0-incubating release notes #7126

Closed

jon-wei added Release Notes Area - Metrics/Event Emitting labels Feb 22, 2019
