New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-29486][sql-client] Implement ClientParser for implementing remote SQL client later #20931
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution. I left some comments.
...le/flink-sql-client/src/main/java/org/apache/flink/table/client/cli/parser/ClientParser.java
Outdated
Show resolved
Hide resolved
if (firstToken.kind == IDENTIFIER) { | ||
// unrecognized token | ||
return getPotentialCommandType(firstToken.image); | ||
} else if (firstToken.kind == EXPLAIN) { | ||
return Optional.of(StatementType.EXPLAIN); | ||
} else if (firstToken.kind == SHOW) { | ||
return getPotentialShowCreateType(tokenList); | ||
} else { | ||
return Optional.of(StatementType.OTHER); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the statement is not ends with ;, it is not a statement. But it seems we will always return OTHER?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get a completed SQL statement is the function of the SqlMultiLineParser
. So I think we can make sure that statement here should be ended with ';'. And now I think all the test data should be ended with ';'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After checking the implementation of SqlMultiLineParser
, here when the statement is incomplete, an SqlExecutionException should be thrown. I added the codes.
package org.apache.flink.table.client.cli.parser; | ||
|
||
/** Enumerates the possible types of input statements. */ | ||
public enum StatementType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about BEGIN STATEMENT SET/END?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added these two types and corresponding tests.
flink-table/flink-sql-client/pom.xml
Outdated
<dependency> | ||
<groupId>org.apache.flink</groupId> | ||
<artifactId>flink-sql-parser</artifactId> | ||
<version>${project.version}</version> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at the shade plugin below. We can do as table-planner module append the parser class into the final jar. But I think we don't need all classes in the parser jar and we can filter out if it is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added new configuration to shade plugin. I pushed a new commit to see if it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added new configuration to shade plugin. I pushed a new commit to see if it works.
It works.
@Internal | ||
interface SqlCommandParser { | ||
public interface SqlCommandParser extends FlinkSqlParserImplConstants { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this interface extends FlinkSqlParserImplConstants
? I think ClientParser extends this is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
...le/flink-sql-client/src/main/java/org/apache/flink/table/client/cli/parser/ClientParser.java
Outdated
Show resolved
Hide resolved
do { | ||
token = tokenManager.getNextToken(); | ||
tokenList.add(token); | ||
} while (token.endColumn != trimmedStatement.length()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can condition can be token != null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After checking the codes of getNextToken
, I found that when the input stream is over, it won't return null
but a EOF
token. So I think use condition token.kind != EOF
may be better?
} | ||
|
||
// --------------------------------------------------------------------------------------------- | ||
private Optional<StatementType> getStatementType(List<Token> tokenList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use itearor model here? We can reduce loop twice to once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think use iterator model here won't save time, but it will make the codes more complicated. I think an ArrayList
here is just OK.
|
||
private static List<Tuple2<String, Optional<StatementType>>> generateTestData() { | ||
return Arrays.asList( | ||
Tuple2.of("quit;", QUIT), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a TestSpec rather than Tuple2 here. TestSpec has better semantic. BTW, when user inputs quit
, the terminal should contine reading from the input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied.
Tuple2.of("SHOW CREATE TABLE(what_ever);", SHOW_CREATE), | ||
Tuple2.of("SHOW CREATE VIEW (what_ever)", SHOW_CREATE), | ||
Tuple2.of("SHOW CREATE syntax_error;", OTHER), | ||
Tuple2.of("--SHOW CREATE TABLE ignore_comment", EMPTY), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test SHOW TABLES -- comment ;
and muli-line cases:
SHOW\n
create\t TABLE `tbl`;
Take a look at presto test cases TestStatementSplitter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some tests.
4fa645b
to
933902b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your update. I left some comments.
/** A dumb implementation. TODO: remove this after unifying the SqlMultiLineParser. */ | ||
@Override | ||
public Optional<Operation> parseCommand(String command) { | ||
return Optional.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be
parseStatement(statement);
return Optional.empty();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method won't be called, just override the interface's method.
return Optional.empty(); | ||
} | ||
|
||
public Optional<StatementType> parseStatement(@Nonnull String statement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to add @Nonnull
annoation. By default, we assume the parameter is not null and we only add @Nullable
if it is nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied.
@MethodSource("generateTestData") | ||
public void testParseStatement(TestSpec testData) { | ||
Optional<StatementType> type = clientParser.parseStatement(testData.statement); | ||
assertThat(type).isEqualTo(testData.type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it's better to use
assertThat(type.orElse(null)).isEqualTo(testData.type);
We can also remove SuppressWarnings above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied.
} | ||
|
||
@ParameterizedTest | ||
@ValueSource(strings = {"", "\n", " ", "-- comment;", "SHOW TABLES -- comment;", "SHOW TABLES"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a case EXPLAIN STATEMENT SET BEGIN INSERT INTO StreamingTable SELECT * FROM (VALUES (1, 'Hello World'), (2, 'Hi'), (2, 'Hi'), (3, 'Hello'), (3, 'World'), (4, 'ADD'), (5, 'LINE'));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
|
||
private static List<TestSpec> generateTestData() { | ||
return Arrays.asList( | ||
TestSpec.of(";", null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we stilll need null
? What's the difference between OTHER
and null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ';' is very special. if we just typed a ';' , the terminal should ignore this input, and the null
return will finally become Optional.empty() and notify the terminal to ignore the return; An OTHER
means the parser cannot recognize the statement's type and will submit it to cluster. And if we want to notify the terminal that the input is incomplete we throw an exception.
|
||
if (tokens.size() == 0 || tokens.get(tokens.size() - 1).kind != SEMICOLON) { | ||
// throw this to notify the terminal to continue reading input | ||
throw new SqlExecutionException("", new SqlParserEOFException("")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add meaningful exception msg here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
|
||
/** | ||
* ClientParser use {@link FlinkSqlParserImplTokenManager} to do lexical analysis. It cannot | ||
* recognize special hive keywords yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hive has a slightly different vocabulary compared to Flink vocabulary, which causes the ClientParser
will misunderstand Hive's keywords to IDENTIFIER. But the ClientParser
is only responsible to check whether the statement is completed or not and only cares about a few statements. So it's acceptable to tolerate the inaccuracy here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more description.
|
||
/** Enumerates the possible types of input statements. */ | ||
public enum StatementType { | ||
QUIT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It's better we can add detailed msg for every types...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After rethink the design, I think the ClientParser also requires to return the trimmed sql(without semicolon) to the executor.
new FlinkSqlParserImplTokenManager( | ||
new SimpleCharStream(new StringReader(statement))); | ||
// means to switch to "BACK QUOTED IDENTIFIER" state to support '`xxx`' in Flink SQL | ||
tokenManager.SwitchTo(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment
please visit {@link CalciteParser}#createFlinkParser for more details.
} | ||
|
||
/** Used to load generated data. */ | ||
private static class TestSpec { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add toString method
continueReadInput(); | ||
} | ||
|
||
Token head = currentToken, tail = tokenManager.getNextToken(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
head && tail is a little confusing to me. I think it's better to use current
and next
tokenManager = | ||
new FlinkSqlParserImplTokenManager( | ||
new SimpleCharStream(new StringReader(statement))); | ||
// means to switch to "BACK QUOTED IDENTIFIER" state to support '`xxx`' in Flink SQL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace xxx to <IDENTIFIER>
: StatementType.OTHER); | ||
} else if (firstToken.kind == BEGIN) { | ||
return Optional.of( | ||
tokens.nextTokenMatched(STATEMENT) && tokens.nextTokenMatched(SET) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the nextTokenMatched may mislead others. It also move forward the pointer to the next token. Why we don't introduce lookAhead here like we do in the ParserImpl.ftl?
Token head = currentToken, tail = tokenManager.getNextToken(); | ||
|
||
// case 2, 3 and 4 | ||
boolean setNotEnded = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explain
also supports executing statement set with explain details. I think we need to rethink the behavior here.
The failed test is caused by the FLINK-29405 |
@flinkbot run azure |
…ote SQL client later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your update. LGTM.
What is the purpose of the change
The current
SqlCommandParserImpl
for SQL client use the paser of embeddedTableEnvironment
, which cannot be fetched in remote mode. So this PR introduce a new parser.Brief change log
SqlCommandParser
for convenience:FlinkSqlParserImplConstants
to use defined token kindsparseStatement
method.ClientParser
and add corresponding test.Verifying this change
This change added test:
ClientParserTest
.Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (yes / no)Documentation