-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add CREATE SOURCE TABLE syntax and metadata info #7945
Conversation
public enum Type { | ||
SOURCE, | ||
NORMAL | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to create the Type
inside the CreateTable.Type
enum because that gives better meaning to the table type we want to use (source or normal). I could have added to the parent class, CreateSource.Type
, but that sounds like source is a table or stream type. I'm still not sure whether I put the type in the right place, so feedback is welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some ideas:
- Put it under
CreateSource
with a different name, likeownershipType
- Add the enum to
KsqlConstants
inksqldb-common
-CreateTableCommand
won't have to use strings anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the question is, what is exact purpose of this "type": read/write, materialized/non-materialized?
In the end, we also want to introduce a SOURCE STREAM, so putting it into the parent might make more sense?
@@ -36,7 +37,8 @@ public CreateTableCommand( | |||
@JsonProperty(value = "topicName", required = true) final String topicName, | |||
@JsonProperty(value = "formats", required = true) final Formats formats, | |||
@JsonProperty(value = "windowInfo") final Optional<WindowInfo> windowInfo, | |||
@JsonProperty(value = "orReplace", defaultValue = "false") final Optional<Boolean> orReplace | |||
@JsonProperty(value = "orReplace", defaultValue = "false") final Optional<Boolean> orReplace, | |||
@JsonProperty(value = "type", defaultValue = "NORMAL") final Optional<String> type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used a String
instead of the CreateTable.Type
because of dependencies. I would have to add ksql-parser deps to the ksql-execution, which I'm not sure about it. So I just converted the type to a string from the CreateSourceFactory
which calls this class. I then converted back to CreateTable.Type
on the DdlCommandExec
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have to add ksql-parser deps to the ksql-execution, which I'm not sure about it.
What is the issue? Do we have other ENUM types that we write into the command topic? If yes, we should follow the same strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will use a boolean value instead of a Enum. I had another idea with the types to support types like external
, temporary
, etc in the future, so for compatibility I wanted to do it now, but I don't think that's really necessary. If they're supported, then users could mix the types, so we'll end up adding more parameters. I'll switch with the boolean instead.
c8be139
to
404f3b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of question (mainly for my own education...)
public enum Type { | ||
SOURCE, | ||
NORMAL | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the question is, what is exact purpose of this "type": read/write, materialized/non-materialized?
In the end, we also want to introduce a SOURCE STREAM, so putting it into the parent might make more sense?
NORMAL | ||
} | ||
|
||
private final Type type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add type
as field here (and not in the parent class), we need to update equals() and hashcode() accordingly. (Note sure why there is only a equals()
but no hashcode()
method...)
@@ -27,15 +27,22 @@ | |||
|
|||
@Immutable | |||
public class CreateTable extends CreateSource implements ExecutableDdlStatement { | |||
public enum Type { | |||
SOURCE, | |||
NORMAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NORMAL is not very descriptive...
@@ -259,7 +259,11 @@ protected Void visitCreateStream(final CreateStream node, final Integer indent) | |||
|
|||
@Override | |||
protected Void visitCreateTable(final CreateTable node, final Integer indent) { | |||
formatCreate(node, "TABLE"); | |||
formatCreate(node, | |||
node.getType() == CreateTable.Type.SOURCE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is SqlFormatter used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used by the QueryLogger
and other injector classes that re-formats the string statement when changing something in the statement (i.e. injecting a schema fetched from Schema Registry). I don't think we should re-format string statements, but this is an old class used by some legacy code.
@@ -281,13 +281,18 @@ public Node visitCreateTable(final SqlBaseParser.CreateTableContext context) { | |||
|
|||
final Map<String, Literal> properties = processTableProperties(context.tableProperties()); | |||
|
|||
final CreateTable.Type type = context.SOURCE() != null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this be null
?
Also, why is the method called SOURCE()
is all caps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context
is what ANTlr generates based on the SqlBase.g4
. For instance:
CREATE (OR REPLACE)? (SOURCE)? TABLE (IF NOT EXISTS)? sourceName
(tableElements)?
(WITH tableProperties)? #createTable
The SOURCE()
is all caps because that's how it is defined in the .g4
file. It is not null when the SOURCE
keyword is present, null otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not null
, what does it return?
ksqlTopicItems | ||
ksqlTopicItems, | ||
false, | ||
false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a test for source tables, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This object is not even used anywhere in the test. I'll remove all the unused code. I'll add a test to parse CREATE SOURCE TABLE too.
new CreateTable(SOME_NAME, SOME_ELEMENTS, false, true, OTHER_PROPS, NORMAL) | ||
) | ||
.addEqualityGroup( | ||
new CreateTable(SOME_NAME, SOME_ELEMENTS, false, false, OTHER_PROPS, SOURCE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTHER_PROPS
-> SOME_PROPS
?
topic | ||
topic, | ||
false, | ||
false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is TemporaryEngine
and would we need to add support for source tables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll add it in a follow-up PR. This TemporaryEngine
is used by the ListSourceExecutorTest
, which tests the DESCRIBE and SHOW commands. There's a JIRA to make sure DESCRIBE correctly handles source streams & tables.
topic | ||
topic, | ||
false, | ||
false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this PR add a test for rejected INSERT INTO
for source tables? Should it? Or do we have some other work item for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR does not add it. Will be in a follow-up.
ksqlTopic | ||
ksqlTopic, | ||
false, | ||
false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above?
404f3b3
to
99871d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, with one comment
formatCreate(node, "TABLE"); | ||
formatCreate(node, | ||
node.isSource() | ||
? "SOURCE TABLE" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep this method as is, and update formatCreate to add the "SOURCE" keyword if the node is a source. Then we won't have to do the same thing for streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
2e6d0bc
to
195898f
Compare
195898f
to
140f281
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reviews. All LGTM, I'm just wondering if we want to keep the isSource
as a boolean or keep it as a enum (of course for not it would only have two possible values, source, or derived).
@@ -142,10 +143,12 @@ public CreateTableCommand createTableCommand(final KsqlStructuredDataOutputNode | |||
outputNode.getKsqlTopic().getKafkaTopicName(), | |||
Formats.from(outputNode.getKsqlTopic()), | |||
outputNode.getKsqlTopic().getKeyFormat().getWindowInfo(), | |||
Optional.of(outputNode.getOrReplace()) | |||
Optional.of(outputNode.getOrReplace()), | |||
Optional.of(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: how about define another isDerived
here, rather than using false
to indicate is not source
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter name is tricky. I initially wanted to use a Enum for future purposes like External, Temp, Source, etc types, but I found that those other types can be mixed with source too, like external source, temp source, or something. So I ended up using a boolean. IsDerived makes sense too. I think any of isDerived or isSource could work.
Description
This is part of #7474 which adds support of source tables and streams.
This PR just adds new syntax to create source tables using
CREATE SOURCE TABLE
statements. It also marks those tables internally for future reference, such asCreateTable
,CreateTableCommand
andKsqlTable
.The
KsqlTable
is the object saved in the metastore. This object contains two new variables,readOnly
andmaterialized
. ThereadOnly
just specifies the table is read-only; andmaterialized
specifies this table is materialized (or should be materialized). I split the SOURCE keyword into these two variables because they mean different things. Also, we might support source streams in the future which they are read-only objects, but not materialized.Testing done
Describe the testing strategy. Unit and integration tests are expected for any behavior changes.
Reviewer checklist