[FLINK-29486][sql-client] Implement ClientParser for implementing remote SQL client later #20931

Merged
merged 12 commits into from Oct 12, 2022

Contributor

@yuzelin yuzelin commented Sep 30, 2022

What is the purpose of the change

The current SqlCommandParserImpl for the SQL client uses the parser of the embedded TableEnvironment, which cannot be obtained in remote mode. This PR therefore introduces a new parser.

Brief change log

  • Adjust the package structure for parsers.
  • Modify SqlCommandParser for convenience:
    • Extend FlinkSqlParserImplConstants to use the defined token kinds.
    • Add a parseStatement method.
  • Implement ClientParser and add a corresponding test.

Verifying this change

This change adds a test: ClientParserTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

Collaborator

flinkbot commented Sep 30, 2022

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

Member

@fsk119 fsk119 left a comment

Thanks for your contribution. I left some comments.

Comment on lines 66 to 155
if (firstToken.kind == IDENTIFIER) {
    // unrecognized token
    return getPotentialCommandType(firstToken.image);
} else if (firstToken.kind == EXPLAIN) {
    return Optional.of(StatementType.EXPLAIN);
} else if (firstToken.kind == SHOW) {
    return getPotentialShowCreateType(tokenList);
} else {
    return Optional.of(StatementType.OTHER);
}
Member

When the statement does not end with ;, it is not a complete statement. But it seems we always return OTHER?

Contributor Author

Producing a complete SQL statement is the job of SqlMultiLineParser, so I think we can assume that the statement here ends with ';'. I also think all the test data should end with ';'.

Contributor Author

After checking the implementation of SqlMultiLineParser, I found that a SqlExecutionException should be thrown here when the statement is incomplete. I added the code.

package org.apache.flink.table.client.cli.parser;

/** Enumerates the possible types of input statements. */
public enum StatementType {
Member

How about BEGIN STATEMENT SET/END?

Contributor Author

Added these two types and corresponding tests.

Comment on lines 514 to 518
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-sql-parser</artifactId>
    <version>${project.version}</version>
</dependency>
Member

Take a look at the shade plugin below. We can do what the table-planner module does and bundle the parser classes into the final jar. But I think we don't need all the classes in the parser jar, and we can filter out the ones that are not needed.

Contributor Author

@yuzelin yuzelin Oct 2, 2022

Added new configuration to shade plugin. I pushed a new commit to see if it works.

Contributor Author

> Added new configuration to shade plugin. I pushed a new commit to see if it works.

It works.

@Internal
interface SqlCommandParser {
public interface SqlCommandParser extends FlinkSqlParserImplConstants {
Member

Why does this interface extend FlinkSqlParserImplConstants? I think it is enough for ClientParser alone to implement it.

Contributor Author

Removed.

do {
    token = tokenManager.getNextToken();
    tokenList.add(token);
} while (token.endColumn != trimmedStatement.length());
Member

I think the condition can be token != null

Contributor Author

After checking the code of getNextToken, I found that when the input stream is exhausted, it does not return null but an EOF token. So I think the condition token.kind != EOF may be better?
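
The EOF-based loop condition discussed here can be sketched as follows. This is only an illustration under stated assumptions: ToyTokenManager, its token kinds, and the whitespace-based lexing are hypothetical stand-ins, not the generated FlinkSqlParserImplTokenManager. The one behavior it mirrors is the one described in the comment above: end of input is signalled by an EOF token rather than by null.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a JavaCC-style token manager; NOT the Flink class. */
class EofLoopSketch {
    // Hypothetical token kinds, chosen only for this sketch.
    static final int EOF = 0;
    static final int WORD = 1;
    static final int SEMICOLON = 2;

    /** Minimal token: a kind plus the matched text. */
    static final class Token {
        final int kind;
        final String image;

        Token(int kind, String image) {
            this.kind = kind;
            this.image = image;
        }
    }

    /** Hypothetical lexer that returns an EOF token (never null) at end of input. */
    static final class ToyTokenManager {
        private final String input;
        private int pos = 0;

        ToyTokenManager(String input) {
            this.input = input;
        }

        Token getNextToken() {
            // Skip whitespace, including newlines in multi-line statements.
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
                pos++;
            }
            if (pos >= input.length()) {
                return new Token(EOF, "");
            }
            if (input.charAt(pos) == ';') {
                pos++;
                return new Token(SEMICOLON, ";");
            }
            int start = pos;
            while (pos < input.length()
                    && !Character.isWhitespace(input.charAt(pos))
                    && input.charAt(pos) != ';') {
                pos++;
            }
            return new Token(WORD, input.substring(start, pos));
        }
    }

    /** Collects tokens until EOF is seen; the EOF token itself is not added. */
    static List<Token> tokenize(String statement) {
        ToyTokenManager tokenManager = new ToyTokenManager(statement);
        List<Token> tokens = new ArrayList<>();
        Token token = tokenManager.getNextToken();
        while (token.kind != EOF) {
            tokens.add(token);
            token = tokenManager.getNextToken();
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("SHOW\nTABLES ;  ")) {
            System.out.println(t.kind + " " + t.image);
        }
    }
}
```

Compared with a position-based condition, checking the token kind does not depend on where the last token ends, so trailing whitespace and multi-line input need no special handling in this sketch.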

}

// ---------------------------------------------------------------------------------------------
private Optional<StatementType> getStatementType(List<Token> tokenList) {
Member

Can we use an iterator model here? That would reduce the two loops to one.

Contributor Author

@yuzelin yuzelin Oct 3, 2022

I think using an iterator model here won't save time, but it will make the code more complicated. I think an ArrayList is fine here.


private static List<Tuple2<String, Optional<StatementType>>> generateTestData() {
return Arrays.asList(
Tuple2.of("quit;", QUIT),
Member

Use a TestSpec rather than Tuple2 here. TestSpec has better semantics. BTW, when the user inputs quit, the terminal should continue reading from the input.

Contributor Author

Applied.

Tuple2.of("SHOW CREATE TABLE(what_ever);", SHOW_CREATE),
Tuple2.of("SHOW CREATE VIEW (what_ever)", SHOW_CREATE),
Tuple2.of("SHOW CREATE syntax_error;", OTHER),
Tuple2.of("--SHOW CREATE TABLE ignore_comment", EMPTY),
Member

Test SHOW TABLES -- comment ; and multi-line cases:

SHOW\n
create\t TABLE `tbl`;

Take a look at the Presto test cases in TestStatementSplitter.

Contributor Author

Added some tests.

@yuzelin yuzelin force-pushed the ClientParser branch 2 times, most recently from 4fa645b to 933902b Compare October 3, 2022 11:18
Member

@fsk119 fsk119 left a comment

Thanks for your update. I left some comments.

/** A dumb implementation. TODO: remove this after unifying the SqlMultiLineParser. */
@Override
public Optional<Operation> parseCommand(String command) {
    return Optional.empty();
Member

I think it should be

parseStatement(statement);
return Optional.empty();

Contributor Author

I think this method won't be called; it just overrides the interface's method.

return Optional.empty();
}

public Optional<StatementType> parseStatement(@Nonnull String statement)
Member

We don't need to add the @Nonnull annotation. By default, we assume the parameter is not null, and we only add @Nullable if it is nullable.

Contributor Author

Applied.

@MethodSource("generateTestData")
public void testParseStatement(TestSpec testData) {
    Optional<StatementType> type = clientParser.parseStatement(testData.statement);
    assertThat(type).isEqualTo(testData.type);
Member

nit: I think it's better to use

assertThat(type.orElse(null)).isEqualTo(testData.type);

We can also remove SuppressWarnings above.

Contributor Author

Applied.

}

@ParameterizedTest
@ValueSource(strings = {"", "\n", " ", "-- comment;", "SHOW TABLES -- comment;", "SHOW TABLES"})
Member

Add a case EXPLAIN STATEMENT SET BEGIN INSERT INTO StreamingTable SELECT * FROM (VALUES (1, 'Hello World'), (2, 'Hi'), (2, 'Hi'), (3, 'Hello'), (3, 'World'), (4, 'ADD'), (5, 'LINE'));

Contributor Author

Added.


private static List<TestSpec> generateTestData() {
return Arrays.asList(
TestSpec.of(";", null),
Member

Why do we still need null? What's the difference between OTHER and null?

Contributor Author

The ';' is a special case. If we type only a ';', the terminal should ignore this input: the null return eventually becomes Optional.empty(), which tells the terminal to ignore it. OTHER means the parser cannot recognize the statement's type and will submit it to the cluster. And if we want to notify the terminal that the input is incomplete, we throw an exception.
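
The three outcomes described above can be sketched roughly as follows. This is a simplified stand-in, not the ClientParser code: StatementOutcomeSketch, IncompleteStatementException, the reduced StatementType enum, and the whitespace-based splitting are all hypothetical. The parser in this PR works on lexer tokens and, per the diff, throws a SqlExecutionException for incomplete input.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical illustration of the three outcomes: empty, typed statement, incomplete. */
class StatementOutcomeSketch {
    /** Reduced enum for this sketch only. */
    enum StatementType { QUIT, EXPLAIN, OTHER }

    /** Hypothetical signal telling the terminal to keep reading input. */
    static final class IncompleteStatementException extends RuntimeException {
        IncompleteStatementException(String message) {
            super(message);
        }
    }

    /** Removes "--" line comments. Simplification: "--" inside quoted strings is NOT handled. */
    static String stripLineComments(String statement) {
        StringBuilder sb = new StringBuilder();
        for (String line : statement.split("\n", -1)) {
            int idx = line.indexOf("--");
            sb.append(idx >= 0 ? line.substring(0, idx) : line).append('\n');
        }
        return sb.toString().trim();
    }

    static Optional<StatementType> classify(String statement) {
        String sql = stripLineComments(statement);
        if (!sql.endsWith(";")) {
            // Outcome 3: incomplete input, the terminal should continue reading.
            throw new IncompleteStatementException(
                    "Statement is incomplete: missing terminating ';'.");
        }
        String body = sql.substring(0, sql.length() - 1).trim();
        if (body.isEmpty()) {
            // Outcome 1: a lone ';' is ignored by the terminal.
            return Optional.empty();
        }
        // Outcome 2: a complete statement; unknown types fall back to OTHER.
        String first = body.split("\\s+")[0].toUpperCase(Locale.ROOT);
        switch (first) {
            case "QUIT":
                return Optional.of(StatementType.QUIT);
            case "EXPLAIN":
                return Optional.of(StatementType.EXPLAIN);
            default:
                return Optional.of(StatementType.OTHER);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(";"));
        System.out.println(classify("quit;"));
        System.out.println(classify("SELECT 1;"));
        try {
            classify("SHOW TABLES -- comment;");
        } catch (IncompleteStatementException e) {
            System.out.println("incomplete: " + e.getMessage());
        }
    }
}
```

Using an exception for the incomplete case keeps the return type free for the other two outcomes, at the cost of using exceptions for control flow.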


if (tokens.size() == 0 || tokens.get(tokens.size() - 1).kind != SEMICOLON) {
    // throw this to notify the terminal to continue reading input
    throw new SqlExecutionException("", new SqlParserEOFException(""));
Member

Add a meaningful exception message here.

Contributor Author

Added.


/**
* ClientParser use {@link FlinkSqlParserImplTokenManager} to do lexical analysis. It cannot
* recognize special hive keywords yet.
Member

@fsk119 fsk119 Oct 3, 2022

Hive has a slightly different vocabulary from Flink, which causes the ClientParser to misinterpret Hive's keywords as IDENTIFIER. But the ClientParser is only responsible for checking whether the statement is complete and only cares about a few statements, so it's acceptable to tolerate the inaccuracy here.

Contributor Author

Added more description.


/** Enumerates the possible types of input statements. */
public enum StatementType {
QUIT,
Member

nit: It would be better to add a detailed description for every type...

Contributor Author

Added.

Member

@fsk119 fsk119 left a comment

After rethinking the design, I think the ClientParser also needs to return the trimmed SQL (without the semicolon) to the executor.

new FlinkSqlParserImplTokenManager(
new SimpleCharStream(new StringReader(statement)));
// means to switch to "BACK QUOTED IDENTIFIER" state to support '`xxx`' in Flink SQL
tokenManager.SwitchTo(2);
Member

add comment

please visit {@link CalciteParser}#createFlinkParser for more details.

}

/** Used to load generated data. */
private static class TestSpec {
Member

add toString method
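
A minimal sketch of what such a TestSpec with a toString method might look like. The field names and the reduced StatementType enum are assumptions made for illustration, not the class from the PR. The motivation is that parameterized test runners typically render each argument with its string representation, so a readable toString makes failing cases easier to identify.

```java
/** Hypothetical TestSpec sketch; field names are assumptions, not the PR's code. */
class TestSpecSketch {
    /** Reduced enum for this sketch only. */
    enum StatementType { QUIT, SHOW_CREATE, OTHER }

    /** Pairs an input statement with the statement type the parser is expected to return. */
    static final class TestSpec {
        final String statement;
        // null means the parser is expected to return an empty Optional.
        final StatementType expectedType;

        private TestSpec(String statement, StatementType expectedType) {
            this.statement = statement;
            this.expectedType = expectedType;
        }

        static TestSpec of(String statement, StatementType expectedType) {
            return new TestSpec(statement, expectedType);
        }

        @Override
        public String toString() {
            return "TestSpec{statement='" + statement + "', expectedType=" + expectedType + "}";
        }
    }

    public static void main(String[] args) {
        System.out.println(TestSpec.of("quit;", StatementType.QUIT));
        System.out.println(TestSpec.of(";", null));
    }
}
```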

continueReadInput();
}

Token head = currentToken, tail = tokenManager.getNextToken();
Member

head && tail is a little confusing to me. I think it's better to use current and next

tokenManager =
new FlinkSqlParserImplTokenManager(
new SimpleCharStream(new StringReader(statement)));
// means to switch to "BACK QUOTED IDENTIFIER" state to support '`xxx`' in Flink SQL
Member

@fsk119 fsk119 Oct 5, 2022

Replace xxx with <IDENTIFIER>.

: StatementType.OTHER);
} else if (firstToken.kind == BEGIN) {
return Optional.of(
tokens.nextTokenMatched(STATEMENT) && tokens.nextTokenMatched(SET)
Member

The name nextTokenMatched may mislead others: it also moves the pointer forward to the next token. Why don't we introduce a lookAhead here, like we do in ParserImpl.ftl?
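
The distinction raised here, between a check that consumes tokens and one that only peeks, can be sketched like this. TokenCursor and its methods are hypothetical names for illustration; they are not the API introduced in the PR or the lookahead mechanism generated from ParserImpl.ftl.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch contrasting peeking (lookAhead) with consuming (next). */
class LookAheadSketch {
    static final class TokenCursor {
        private final List<String> tokens;
        private int pos = 0;

        TokenCursor(List<String> tokens) {
            this.tokens = tokens;
        }

        /** Returns the token k positions ahead (1 = next unread) WITHOUT consuming it; null past the end. */
        String lookAhead(int k) {
            int index = pos + k - 1;
            return index < tokens.size() ? tokens.get(index) : null;
        }

        /** Consumes and returns the next token, or null if none is left. */
        String next() {
            return pos < tokens.size() ? tokens.get(pos++) : null;
        }

        /** Checks the upcoming tokens against the expected sequence without moving the cursor. */
        boolean lookAheadMatches(String... expected) {
            for (int i = 0; i < expected.length; i++) {
                if (!expected[i].equalsIgnoreCase(lookAhead(i + 1))) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        TokenCursor cursor = new TokenCursor(Arrays.asList("BEGIN", "STATEMENT", "SET", ";"));
        System.out.println(cursor.lookAheadMatches("BEGIN", "STATEMENT", "SET"));
        // The cursor has not moved, so the first token is still available.
        System.out.println(cursor.next());
    }
}
```

Because lookAheadMatches never advances the cursor, several candidate patterns can be tried in sequence against the same position, which is harder to do safely with a method that consumes tokens as a side effect.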

Token head = currentToken, tail = tokenManager.getNextToken();

// case 2, 3 and 4
boolean setNotEnded =
Member

The EXPLAIN statement also supports executing a statement set with explain details. I think we need to rethink the behavior here.

Member

fsk119 commented Oct 10, 2022

The failed test is caused by FLINK-29405.

Member

fsk119 commented Oct 10, 2022

@flinkbot run azure

2 similar comments
Member

fsk119 commented Oct 11, 2022

@flinkbot run azure

Contributor Author

yuzelin commented Oct 11, 2022

@flinkbot run azure

Member

@fsk119 fsk119 left a comment

Thanks for your update. LGTM.

@fsk119 fsk119 merged commit 205f70c into apache:master Oct 12, 2022
@yuzelin yuzelin deleted the ClientParser branch October 13, 2022 01:53
huangxiaofeng10047 pushed a commit to huangxiaofeng10047/flink that referenced this pull request Nov 3, 2022