[KYUUBI #3926] Introduce antlr4 to parse query statement #3944
yikf wants to merge 1 commit into apache:master from yikf:parser-init
@pan3793 @ulysses-you Please take a look if you find a moment, thanks.
Thanks @yikf, overall LGTM, will take a deep look this week. cc @cfmcgrady @yaooqinn @turboFei as well.
Any suggestions are welcome :)
Will take a deep look tomorrow.
kyuubi-server/src/main/antlr4/org/apache/kyuubi/sql/KyuubiSqlBaseParser.g4
Codecov Report
@@            Coverage Diff            @@
##             master    #3944   +/-   ##
=========================================
  Coverage     51.96%   51.96%
  Complexity       13       13
=========================================
  Files           522      522
  Lines         29042    29042
  Branches       3887     3887
=========================================
  Hits          15093    15093
  Misses        12575    12575
  Partials       1374     1374
Do we need to design our syntax with the keyword KYUUBI as a prefix?
I can imagine the most important resources in Kyuubi are
kyuubi-server/src/main/scala/org/apache/kyuubi/operation/ExecutedCommandExec.scala
 * SHOW KYUUBI_SESSIONS;
 * }}}
 */
case class ShowSessions() extends RunnableCommand {
Why not use the thrift types directly, like the Spark SQL meta operations?
It takes a while to remove the type mapping with the Spark ones, so I suggest we remove these and use thrift types directly.
The Kyuubi parser module has its own SQL capability, so I think it would be better for it to have its own schema system. In addition, a newly added Kyuubi runnable node can conveniently implement its own output schema instead of constructing thrift types, which is somewhat troublesome. We only need to convert the output once.
But you add them to the operation manager, which is based on TRowSet.
It seems not. I add these schemas to the parser module; the Kyuubi operation ExecutedCommandExec calls RunnableCommand.getNextRowSet, so the operation is not aware of the parser schema.
My idea is that a new RunnableCommand can easily implement its own schema and data output. If each RunnableCommand constructs thrift's schema and rowSet, I don't think it is convenient.
You can create a PR (better, a KPIP) and a dev discussion for adding a new type system; it's not a small piece of work.
I'm truly sorry for the inconvenience this request will cost, but it's necessary, as an incomplete type system will result in a nightmare. See the timestamps/dates/intervals in the Hive/Spark SQL type system for an example.
Thanks Kent, I agree with you. I will move the type-system-related changes out of this PR and use the thrift types directly.
Remove the newly introduced type system and use thrift's TTypeId.
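As an illustration of the outcome of this thread, the following minimal sketch shows a command declaring its output columns with type ids and converting them once at the boundary to the operation layer. It is only a sketch under stated assumptions: `TypeId`, `ColumnDesc`, and `toTableSchema` are hypothetical stand-ins written for this example (the PR itself uses thrift's `TTypeId` and `TTableSchema`), and only the `Column(name, type, comment)` shape is taken from the review snippets.

```scala
object TypeSketch {
  // Hypothetical stand-in for a small subset of thrift's TTypeId values.
  sealed trait TypeId
  case object STRING_TYPE extends TypeId
  case object BIGINT_TYPE extends TypeId

  // Column shape as seen in the review snippet: name, type id, optional comment.
  final case class Column(name: String, typeId: TypeId, comment: Option[String])

  // Hypothetical stand-in for one column entry of a thrift table schema.
  final case class ColumnDesc(name: String, typeId: TypeId, comment: String, position: Int)

  // Convert the declared columns in one place, keeping their declaration order.
  def toTableSchema(schema: Seq[Column]): Seq[ColumnDesc] =
    schema.zipWithIndex.map { case (c, i) =>
      ColumnDesc(c.name, c.typeId, c.comment.getOrElse(""), i)
    }

  // Output columns copied from the review snippet.
  val sessionColumns: Seq[Column] = Seq(
    Column("user", STRING_TYPE, Some("Kyuubi session user")),
    Column("type", STRING_TYPE, Some("Kyuubi session type")),
    Column("ip", STRING_TYPE, Some("Kyuubi session remote ip")))
}
```

Because the command reuses an existing set of type ids rather than defining its own, no second type system has to be kept in sync, which is the concern raised above about timestamps, dates, and intervals.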
Yes, I agree with you. But I mean a SQL KEYWORD, like
Emm, I remember in the early offline discussion, we preferred the ANSI-like syntax. But I'm fine w/ both ways.
Let's first list as many of the commands we may add as possible?
kyuubi-server/src/main/scala/org/apache/kyuubi/parser/KyuubiParser.scala
kyuubi-server/src/main/scala/org/apache/kyuubi/parser/plan/command/RunnableCommand.scala
case class ShowSessions() extends RunnableCommand {

  override def run(kyuubiSession: KyuubiSession): Unit = {
    val rows = kyuubiSession.sessionManager.allSessions().map { session =>
So this cmd lists all sessions. Is it safe to allow such a thing from an end user? @pan3793
The REST API has similar behavior:
org.apache.kyuubi.server.api.v1.SessionsResource#sessions
The REST API is in an experimental phase, not declared GA.
Listing all sessions is an insecure operation; we need a mechanism to prevent this insecure situation from happening.
Either the person who performs this operation is an admin, or we only list the sessions that the user has permission to see.
It seems that there is no good mechanism now? This command can be marked as experimental at this phase.
Let's remove this command from this PR.
A describe session command is safe to add, I guess.
Agree with you, will add a desc session command.
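The two options raised in this thread (admin-only access, or listing only the sessions a user is permitted to see) can be sketched as a simple filter. This is a hypothetical illustration, not code from the PR: `SessionInfo`, `visibleSessions`, and the `admins` set are assumed names, and "permitted" is simplified here to "opened by the same user".

```scala
object SessionVisibility {
  // Hypothetical, minimal view of a session; a real Kyuubi session carries far more state.
  final case class SessionInfo(id: String, user: String, ip: String)

  // Admins see every session; any other requester sees only their own sessions.
  def visibleSessions(
      all: Seq[SessionInfo],
      requester: String,
      admins: Set[String]): Seq[SessionInfo] =
    if (admins.contains(requester)) all
    else all.filter(_.user == requester)
}
```

A command that describes only the caller's current session avoids the question entirely, which is why it was considered safe to add.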
object SchemaHelper {

  def toTTTableSchema(schema: List[Column]): TTableSchema = {
    assert(node4.isInstanceOf[PassThroughNode])
  }

  test("Parse Show Kyuubi_Session Node") {
Rename to Show Kyuubi_Sessions Node?
kyuubi-server/src/main/scala/org/apache/kyuubi/sql/plan/KyuubiNode.scala
Seq(
  Column("user", TTypeId.STRING_TYPE, Some("Kyuubi session user")),
  Column("type", TTypeId.STRING_TYPE, Some("Kyuubi session type")),
  Column("ip", TTypeId.STRING_TYPE, Some("Kyuubi session remote ip")))
Does the id column indicate the session identifier?
Summary: anyway, we need to introduce the Antlr4 framework in the server module, and the framework is already in good shape. The key points here are the grammar and security. Basically, there are two types of styles: command and ANSI-like.
Command:
ANSI-like:
Each of those is extensible, and all of them LGTM.
It's a good idea; we can do it in another issue, but I don't think it should block this PR. For security concern about
This PR should be blocked by syntax style. Of the two styles that @pan3793 mentioned, both are fine to me. But I have a suggestion:
Considering that Kyuubi is not a SQL engine and one of the purposes of the parser module we introduced was SQL enhancement, I think ANSI style is probably not that important, so I prefer the cmd style. Any thoughts?
 * limitations under the License.
 */

lexer grammar KyuubiSqlBaseLexer;
cc @ulysses-you, can you verify that this parser will work with the Trino FE?
It is not about Trino for now, but we can support Trino statements one by one by extending Antlr later.
How can we tell a meta operation from a regular one for select * from sys.tables?
It won't affect other FEs; that is, other FEs will pass through it. Only a Trino session will match those special statements. I think a user cannot access the regular one when using the Trino FE, which follows Trino's behavior.
I doubt these can be accomplished by the current implementation, which is bound to the BackendService.
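The pass-through behavior discussed in this thread can be sketched as follows: a statement the server recognizes becomes a server-side command node, and everything else is wrapped unchanged and forwarded. This is a minimal sketch, assuming a plain regular expression as a stand-in for the antlr4-generated parser; `parse`, the `KyuubiNode` hierarchy, and the shape of `PassThroughNode` are defined here for illustration and are not copied from the PR.

```scala
object ParseSketch {
  sealed trait KyuubiNode
  // Statement the server does not recognize; it is forwarded unchanged.
  final case class PassThroughNode(statement: String) extends KyuubiNode
  // Server-side command recognized by the hypothetical matcher below.
  case object ShowSessions extends KyuubiNode

  // Stand-in for the grammar rule: case-insensitive, optional trailing semicolon.
  private val ShowSessionsPattern = """(?i)\s*SHOW\s+KYUUBI_SESSIONS\s*;?\s*""".r

  def parse(statement: String): KyuubiNode = statement match {
    // A Regex extractor only matches when the whole input matches the pattern.
    case ShowSessionsPattern() => ShowSessions
    case other => PassThroughNode(other)
  }
}
```

With this shape, a query such as `select * from sys.tables` is only treated specially if some rule claims it; otherwise it reaches the engine as a regular statement.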
yaooqinn left a comment
Anyway, it does not harm anything that exists. I'm OK to repair things on a running car.
Thanks all, merging to master.
Thank you all, guys.
When adding new deps into the binary tgz, don't forget to update LICENSE-binary.
### _Why are the changes needed?_
This PR is a follow-up of #3944; it supplements LICENSE-binary.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #4163 from Yikf/license.

Closes #3926

92d907d [Yikf] [KYUUBI #3926][FOLLOWUP] Supplement LICENSE-binary

Authored-by: Yikf <yikaifei1@gmail.com>
Signed-off-by: Yikf <yikaifei@apache.org>
Why are the changes needed?
Close #3926. We intend to introduce a parser module based on antlr4 in the Apache Kyuubi server. Through this module, we can achieve:
SHOW KYUUBI_SESSIONS through SQL.
This issue is the first step of the parser module: it introduces the parser module and implements SHOW KYUUBI_SESSION. Later work will improve the Apache Kyuubi parser module based on this initial PR.
How was this patch tested?
Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run test locally before making a pull request