Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL id, request logs, and metrics #6302

Merged
merged 20 commits into from
Jan 16, 2019
Merged

Add SQL id, request logs, and metrics #6302

merged 20 commits into from
Jan 16, 2019

Conversation

gaodayue
Copy link
Contributor

@gaodayue gaodayue commented Sep 5, 2018

Fixed #6301

This PR adds a new sqlQueryId method to the QueryMetrics interface, which should be noted in the release notes.

@jon-wei
Copy link
Contributor

jon-wei commented Sep 14, 2018

@gaodayue Is this still WIP? I see "TODO" in the description

@gaodayue
Copy link
Contributor Author

Hi @jon-wei , I'm waiting for review comments for proposal #6301 . If the proposal sounds good, I will continue to finish the PR.

@gaodayue
Copy link
Contributor Author

Added UT and docs, this PR is ready to be reviewed. Hi @jon-wei and @gianm , could any of you guys help review?

@jon-wei
Copy link
Contributor

jon-wei commented Sep 20, 2018

@gaodayue Sure, I can review, though I probably won't have time until next week

@gianm
Copy link
Contributor

gianm commented Sep 27, 2018

Hi @gaodayue, in the meantime could you please check the tests and inspections? In the tests, it looks like lots of CalciteQueryTests are failing due to the new sqlQueryId. Maybe the verifier that compares the queries for correctness should strip sqlQueryId first. In the inspections, it is reporting that the Query.getSqlQueryId() method is never used. If it's not needed, you can remove it.

@gaodayue
Copy link
Contributor Author

Thanks @gianm . I will fix all failed test cases ASAP.

In the inspections, it is reporting that the Query.getSqlQueryId() method is never used. If it's not needed, you can remove it.

That method is for custom QueryMetrics impls to add "sqlQueryId" dimension to the metrics. I add @SuppressWarnings("unused") to the signature, hope the inspection tool can understand it.

@@ -347,6 +347,37 @@ Composite Request Logger emits request logs to multiple request loggers.
|--------|-----------|-------|
|`druid.request.logging.loggerProviders`|List of request loggers for emitting request logs.|none|

### SQL Request Logging

Brokers can be configured to log the sql request (both from HTTP and JDBC) they see.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's capitalize "sql" everywhere in the non-property parts of the docs

import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledExecutorService;

public abstract class AbstractFileRequestLogger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to split the RequestLogger implementations into native/SQL versions, it's enough that there's a new SqlRequestLogLine implementation.

If you want to have two different request loggers for SQL and native queries, I think would be better to bind the SQL provider to a different configuration parameter and inject the desired provider using annotations (maybe like how you can get a @Coordinator DruidLeaderClient and a separate @IndexingService instance)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm Do you have a strong preference on single request logger vs. separate loggers for SQL and native queries? I saw there was some discussion around that in the proposal, I'm personally fine with either approach

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, after thinking about it more, I think it's better to have everything go through one RequestLogger interface. Will write up the reasons why in a separate comment.

@gaodayue
Copy link
Contributor Author

Seems like the inspection tool still complains about the unused method, could you give me some suggestions on how to handle it?

@QiuMM
Copy link
Member

QiuMM commented Sep 28, 2018

@gaodayue maybe you can merge the master branch, it works for me.

@gianm
Copy link
Contributor

gianm commented Sep 28, 2018

@gaodayue @jon-wei re: #6302 (comment),

I am thinking it will be best to have a single RequestLogger interface with methods like logNativeQuery and logSqlQuery. There are three main reasons:

  1. It is just as flexible as having multiple interfaces. Imagine a "MultiRequestLogger" that can route different types of logs to different underlying loggers (SQL to kafka, native to file; or different files, etc). Sort of like the "composing" emitter we have.
  2. It is more powerful, in that it can do one thing that the multiple interfaces cannot: it can write different types of logs to the same file.
  3. Last but not least: it is simpler for users that are just getting started. They set up request logging one time, and all request logs go to the same place (of course, there should be a logType field that allows users to understand what is being logged). We could even extend this in the future to add exception logging and other sorts of structured logging (as opposed to log4j, which we use for unstructured logging).

@gianm
Copy link
Contributor

gianm commented Sep 28, 2018

What do you think?

@gaodayue
Copy link
Contributor Author

gaodayue commented Oct 1, 2018

Thanks @gianm. I think your arguments about single RequestLogger interface are reasonable, and I would like to give that approach a try.

Sorry for the late reply. I'm currently on a vacation, so I'm not sure when can I finish refactoring this, but I'll do my best.

@gaodayue
Copy link
Contributor Author

gaodayue commented Oct 9, 2018

Combined RequestLogger and SqlRequestLogger into one interface according to Gian's advice, the code looks a lot concise now. @gianm could your review again?

@gianm gianm self-assigned this Oct 15, 2018
@gaodayue
Copy link
Contributor Author

Sync with master again

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaodayue, sorry for the delay on reviewing. But thank you very much for the contribution. I left some comments just now, a couple of which are substantive (to do with the request log format and the X-Druid-Native-Query-Ids header). Let us know what you think.

And /cc @jon-wei, could you please take a look at the security checks in SqlLifecycle?

docs/content/configuration/index.md Show resolved Hide resolved
@@ -58,10 +78,12 @@ public String getLine(ObjectMapper objectMapper) throws JsonProcessingException
);
}

@JsonProperty("timestamp")
public DateTime getTimestamp()
public String getSqlQueryLine(ObjectMapper objectMapper) throws JsonProcessingException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This format should be documented, but also, IMO, adjusted. Firstly, it should be on one line (this makes it easier to parse); secondly, it should be the same rough format as getNativeQueryLine (so the same parser can handle both). That means a TSV. How about doing:

    return JOINER.join(
        Arrays.asList(
            timestamp,
            remoteAddr,
            objectMapper.writeValueAsString(ImmutableMap.of("queryType", "sql", "sql", sql)),
            objectMapper.writeValueAsString(queryStats)
        )
    );

It's sort of using a fake queryType of 'sql', which is a little weird, but makes parsing pretty easy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, this:

    return JOINER.join(
        Arrays.asList(
            timestamp,
            remoteAddr,
            "",
            objectMapper.writeValueAsString(queryStats),
            objectMapper.writeValueAsString(ImmutableMap.of("sql", sql))
        )
    );

It leaves the native-query field blank, and adds a new field on the end of the TSV for sql query. It uses a JSON object rather than emitting the SQL as-is for two reasons: (1) we can extend with more info later if we want; (2) the SQL query might have newlines and such in it, and the objectMapper.writeValueAsString will get rid of those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the format should be documented and easy to parse. Considering the two approaches, I'm in favor of the latter one because it makes it clear that SQL query is different from native query (e.g., it doesn't have queryType). But I also want to tweak it into

    return JOINER.join(
        Arrays.asList(
            timestamp,
            remoteAddr,
            "",
            objectMapper.writeValueAsString(queryStats),
            objectMapper.writeValueAsString(ImmutableMap.of("query", sql, "context", sqlQueryContext))
        )
    );

It uses the field naming of SqlQuery but only includes query and context fields which apply to both http and jdbc scenario. Please let me know your opinions :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that format you suggested, +1 from me.

sql/src/main/java/org/apache/druid/sql/SqlLifecycle.java Outdated Show resolved Hide resolved
sql/src/main/java/org/apache/druid/sql/SqlLifecycle.java Outdated Show resolved Hide resolved
sql/src/main/java/org/apache/druid/sql/SqlLifecycle.java Outdated Show resolved Hide resolved
@jon-wei jon-wei self-assigned this Jan 8, 2019
Copy link
Contributor

@jon-wei jon-wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the auth-related changes, those lgtm, had a couple of other small comments

sql/src/main/java/org/apache/druid/sql/SqlLifecycle.java Outdated Show resolved Hide resolved
sql/src/main/java/org/apache/druid/sql/SqlLifecycle.java Outdated Show resolved Hide resolved
@gaodayue
Copy link
Contributor Author

Thanks @gianm and @jon-wei for reviewing. I have replied to all your comments and I think the only one needing further discussion is about the request log format. Once we reach consensus on that, I'll go implement and rebase to master again.

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please merge from master and then this LGTM.

@gaodayue
Copy link
Contributor Author

Merged with master. The reported inspection errors don't look like a problem to me.

@gianm
Copy link
Contributor

gianm commented Jan 15, 2019

@gaodayue thanks for merging with master. We'll need to fix the inspection report since otherwise it will start failing for master as well. Could you look into it? There are three I see:

"QueryMetrics.java:210: sqlQueryId() Parameter query is not used in either this method or any of its derived methods."

It's an extension point, not meant to be used by Druid production code. Annotating it with @PublicApi (or @SuppressWarnings("unused")) will tell that to the static analyzer, & it should stop complaining.

"QueryMetrics.java:210: sqlQueryId() Method is never used as a member of this interface, but only as a member of the implementation class(es). The project will stay compilable if the method is removed from the interface."

I think doing (1) should fix this too.

"ComposingRequestLoggerProvider.java:128: accept() The declared exception IOException is never thrown in method implementations"

This looks like a bug in the inspection. It looks like RequestLogLineConsumer declares throws IOException and its implementations do too, so there is no issue with the code. If that's right, then try working around this by adding //noinspection RedundantThrows before the void accept(... line.

@gianm
Copy link
Contributor

gianm commented Jan 15, 2019

Hmm, the first thing is fixed, but TC is still complaining about the redundant "throws" that is not actually redundant. I'm not sure how to fix this. I tried downloading the latest IntelliJ (2018.3.2) and my IDE does not flag this line as a redundant throw. In fact, it flags the //noinspection RedundantThrows as unnecessary. It seems to be something broken with the online TeamCity analyzer.

@gaodayue, sorry, but can you try one more thing: change //noinspection RedundantThrows to @SuppressWarnings("RedundantThrows")?

If making that change does not silence the inspection, my suggestion would be to remove the RedundantThrows check from .idea/inspectionProfiles/Druid.xml.

/cc @leventov, any other ideas what might be going on?

@gianm
Copy link
Contributor

gianm commented Jan 16, 2019

That seemed to work!

@gianm
Copy link
Contributor

gianm commented Jan 16, 2019

@jon-wei any further comments?

Copy link
Contributor

@jon-wei jon-wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jon-wei jon-wei merged commit 5b8a221 into apache:master Jan 16, 2019
@leventov
Copy link
Member

@gianm please raise an issue in IntelliJ's YouTrack. Ideally a link to the issue should be in a comment next to suppression, so that people could know when the suppression could be removed. See DruidFloatPredicate for an example.

@gianm
Copy link
Contributor

gianm commented Jan 16, 2019

I've raised this issue: https://youtrack.jetbrains.com/issue/IDEA-205535

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants