Skip to content

[KYUUBI #3926] Introduce antlr4 to parse query statement#3944

Closed
yikf wants to merge 1 commit intoapache:masterfrom
yikf:parser-init
Closed

[KYUUBI #3926] Introduce antlr4 to parse query statement#3944
yikf wants to merge 1 commit intoapache:masterfrom
yikf:parser-init

Conversation

@yikf
Copy link
Copy Markdown
Contributor

@yikf yikf commented Dec 8, 2022

Why are the changes needed?

Close #3926, we intend to introduce the parser module based on antlr4 in the Apache kyuubi server. Through this module, we can achieve:

  1. Expand kyuubi capabilities, for example, implement SHOW KYUUBI_SESSIONS through SQL.
  2. Sql translation for mysql front-end protocol.
  3. Sql translation for trino front-end protocol.

This issue is the first step of the parser module, which introduces the parser module and implements SHOW KYUUBI_SESSION, and then improves the Apache kyuubi parser module based on this initial pr.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before make a pull request

@github-actions github-actions bot added kind:build kind:infra license, community building, project builds, asf infra related, etc. module:server labels Dec 8, 2022
@yikf
Copy link
Copy Markdown
Contributor Author

yikf commented Dec 8, 2022

@pan3793 @ulysses-you Please take a look if you find a moment, thanks

@pan3793
Copy link
Copy Markdown
Member

pan3793 commented Dec 8, 2022

Thanks @yikf, overall LGTM, will take a deep look this week. cc @cfmcgrady @yaooqinn @turboFei as well.

KYUUBI_SESSION would be a kind resource defined by KyuubiQL, I propose to add the following syntax in the future.

CREATE KYUUBI_SESSION ...
DROP KYUUBI_SESSION <session_id>
SHOW KYUUBI_SESSIONS [<conditions>]
ALTER KYUUBI_SESSION <session_id> SET ...

Any suggestions are welcome :)

@yaooqinn
Copy link
Copy Markdown
Member

yaooqinn commented Dec 8, 2022

will take a deep look tomorrow

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Dec 8, 2022

Codecov Report

Merging #3944 (eb98f1e) into master (eb98f1e) will not change coverage.
The diff coverage is n/a.

❗ Current head eb98f1e differs from pull request most recent head 4463b29. Consider uploading reports for the commit 4463b29 to get more accurate results

@@            Coverage Diff            @@
##             master    #3944   +/-   ##
=========================================
  Coverage     51.96%   51.96%           
  Complexity       13       13           
=========================================
  Files           522      522           
  Lines         29042    29042           
  Branches       3887     3887           
=========================================
  Hits          15093    15093           
  Misses        12575    12575           
  Partials       1374     1374           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@yaooqinn
Copy link
Copy Markdown
Member

yaooqinn commented Dec 9, 2022

do we need to design our syntaxes with the keyword KYUUBI as a prefix?

@pan3793
Copy link
Copy Markdown
Member

pan3793 commented Dec 9, 2022

do we need to design our syntaxes with the keyword KYUUBI as a prefix?

I can image the most important resources in Kyuubi are SESSION and ENGINE, considering SESSION is quite a common thing, to avoid potential risk of syntax conflict, I suggest adding KYUUBI_ prefix

* SHOW KYUUBI_SESSIONS;
* }}}
*/
case class ShowSessions() extends RunnableCommand {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use the thrift types directly like the spark SQL meta operations

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it takes a while to remove type-mapping with spark ones, so I suggest we remove these and use thrift types directly

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Kyuubi parser module has its own SQL capability, so I think it would be better to have its own schema system. In addition, the newly added kyuubi runnable node can conveniently implement its own node's output schema instead of constructing thrift types, which is somewhat troublesome. we only need to convert the output once.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but you add them to the operation manager, which based on TRowSet

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems not, I add these schema to parser module, Kyuubi operation ExecutedCommandExec call RunnableCommand.getNextRowSet, operation is not aware of the parser schema.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my idea is that the new runnableCommand can easily implement its own schema and data output. If each runnableCommand constructs thrift's schema and rowSet, I don't think it is convenient

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create a PR(better KPIP) and dev discussion for adding a new type system, it's not a small work.

I'm truly sorry for the inconvenient about this request will cost, but it's necessary as an incomplete type system will result in a nightmare. See the timestamps/dates/intervals in hive/spark SQL type system for an example

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks kent, agree with you, i will move out the type system-related changes in this pr and the thrift type is used directly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any updates

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the newly introduced type system and use thrift's TTypeId

@yaooqinn
Copy link
Copy Markdown
Member

yaooqinn commented Dec 9, 2022

do we need to design our syntaxes with the keyword KYUUBI as a prefix?

I can image the most important resources in Kyuubi are SESSION and ENGINE, considering SESSION is quite a common thing, to avoid potential risk of syntax conflict, I suggest adding KYUUBI_ prefix

yes, I agree with you. But I mean a SQL KEYWORD, like KYUUBI CMD/ KYUUBIADMIN CMD as a CMD prefix, not just a prefix of another KEYWORD, which leads us to add a lot of KEYWORDs in the form of KYUUBI_XXXX

@pan3793
Copy link
Copy Markdown
Member

pan3793 commented Dec 9, 2022

... like KYUUBI CMD/ KYUUBIADMIN CMD as a CMD prefix ...

Emm, I remember in the early offline discussion, we prefer the ANSI-like syntax. But I'm fine w/ both way.

@yaooqinn
Copy link
Copy Markdown
Member

yaooqinn commented Dec 9, 2022

... like KYUUBI CMD/ KYUUBIADMIN CMD as a CMD prefix ...

Emm, I remember in the early offline discussion, we prefer the ANSI-like syntax. But I'm fine w/ both way.

Let's list commands we may add as many as possible first?

case class ShowSessions() extends RunnableCommand {

override def run(kyuubiSession: KyuubiSession): Unit = {
val rows = kyuubiSession.sessionManager.allSessions().map { session =>
Copy link
Copy Markdown
Member

@yaooqinn yaooqinn Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this cmd lists all sessions, is it safe to do such a thing from an end user? @pan3793

Copy link
Copy Markdown
Member

@pan3793 pan3793 Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The REST api has the similar behavior

org.apache.kyuubi.server.api.v1.SessionsResource#sessions

Copy link
Copy Markdown
Member

@yaooqinn yaooqinn Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The REST API is in experimental phase, not declared GA

Copy link
Copy Markdown
Contributor Author

@yikf yikf Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List all sessions is an insecure operation, we need to have a mechanism to prevent this insecure situation from happening.

Either the person who does this operation is admin, or only list the sessions that the user has permission to.

It seems that there is no good mechanism now? This command can be set up as a experimental at this phase.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this command from this PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A describe session command is safe to add I guess

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with you, will add desc session command


object SchemaHelper {

def toTTTableSchema(schema: List[Column]): TTableSchema = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toTTableSchema?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks

assert(node4.isInstanceOf[PassThroughNode])
}

test("Parse Show Kyuubi_Session Node") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to Show Kyuubi_Sessions Node?

Seq(
Column("user", TTypeId.STRING_TYPE, Some("Kyuubi session user")),
Column("type", TTypeId.STRING_TYPE, Some("Kyuubi session type")),
Column("ip", TTypeId.STRING_TYPE, Some("Kyuubi session remote ip")))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id, user, type

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the id column indicate the session identify?

@pan3793
Copy link
Copy Markdown
Member

pan3793 commented Dec 12, 2022

Summary: anyway, we need to introduce Antlr4 framework in server module, and the framework is already in a good shape, the key points here are the grammar and security. Basically, there are two type of styles: command and ANSI-like.

Command:

  • KYUUBI xxxx
  • KYUUBIADMIN xxxx

ANSI-like:

  • CREATE KYUUBI_SESSION xxx
  • ALTER KYUUBI_SESSION xxx SET xxx

each of those are extendable, and all of them lgtm.

Let's list commands we may add as many as possible first?

It's a good idea, we can do it in another issue, but I don't think it should block this PR.

For security concern about SHOW KYUUBI_SESSIONS, it's just a simplest command I can imagine to demonstrate how does the framework work, we can replace it by another simple command.

@yikf
Copy link
Copy Markdown
Contributor Author

yikf commented Dec 12, 2022

This PR should be blocked by syntax style, Just like the two styles that @pan3793 mentioned, both are fine to me.

But have a suggestion:

  • ANSI-like style is a fine syntax style, but it has the problem of duplicate prefixes in grammar files, and there is a certain chance of future grammar conflict problems;
  • KYUUBI/KYUUBIADMIN cmd style will definitely have no problem with grammar conflicts in the future, and no duplicate prefixes, so the syntax is more concise, but it is not an ansi syntax.

Considering that Kyuubi is not an sql engine and one of the purposes of the parser module we introduced was sql enhancement, I think ANSIstyle is probably not that important, so I prefer cmd style, any thought?

* limitations under the License.
*/

lexer grammar KyuubiSqlBaseLexer;
Copy link
Copy Markdown
Member

@yaooqinn yaooqinn Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @ulysses-you, can you verify that this parser will work with trino FE?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not about trino for now, but we can support trino statement one by one by extending antlr later

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we tell from a meta operation and regular one for select * from sys.tables

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't affect other FE, that means other FE will pass though it. Only Trino session will match those special statements. I think user can not access the regular one if use Trino FE follows Trino's behavior.

@yaooqinn
Copy link
Copy Markdown
Member

I doubt these can be accomplished by the current implementation which is bound to the BackendService

Sql translation for mysql front-end protocol.
Sql translation for trino front-end protocol.

Copy link
Copy Markdown
Member

@yaooqinn yaooqinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, it does not harm anything exists, I'm OK to repair things on a running car

@pan3793
Copy link
Copy Markdown
Member

pan3793 commented Dec 15, 2022

Thanks all, merging to master

@pan3793 pan3793 added this to the v1.7.0 milestone Dec 15, 2022
@pan3793 pan3793 closed this in 7b5af4a Dec 15, 2022
@yikf
Copy link
Copy Markdown
Contributor Author

yikf commented Dec 15, 2022

Thanks you all guys

@yikf yikf deleted the parser-init branch December 15, 2022 10:13
@pan3793
Copy link
Copy Markdown
Member

pan3793 commented Jan 13, 2023

When adding new deps into binary tgz, don't forget to update LICENSE-binary.

yikf added a commit that referenced this pull request Jan 18, 2023
### _Why are the changes needed?_

This pr is followup of the #3944, it supplement LICENSE-binary

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4163 from Yikf/license.

Closes #3926

92d907d [Yikf] [KYUUBI #3926][FOLLOWUP] Supplement LICENSE-binary

Authored-by: Yikf <yikaifei1@gmail.com>
Signed-off-by: Yikf <yikaifei@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:build kind:infra license, community building, project builds, asf infra related, etc. module:server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Subtask] Introduce antlr4 to parse query statement

7 participants