@apilloud
Member

@apilloud apilloud commented Apr 25, 2018

This PR moves our TableStore into the Calcite Schema as the only way to provide tables, which removes the need to copy tables between the two. It also moves our DDL execution into the rel node, which allows Calcite to execute the DDL directly.


@akedin
Contributor

akedin commented Apr 25, 2018

LGTM

@apilloud
Member Author

R: @xumingmin @xumingming This is a trivial change, but it does change the public interface around BeamQueryPlanner.

@apilloud apilloud force-pushed the cleanup branch 2 times, most recently from f7c251f to 2b0476e Compare May 3, 2018 16:24
@apilloud apilloud changed the title [BEAM-4044] [SQL] Cleanout unneeded sqlEnv [BEAM-4044] [SQL] Add tables via TableStore in Schema, execute DDL in Calcite model May 3, 2018
@apilloud
Member Author

apilloud commented May 3, 2018

R: @kennknowles This gets us 90% of the way to sqlline.

Contributor

@akedin akedin left a comment

Looks good, mostly nits.
Are there any new tests needed?

final boolean existed;
switch (getKind()) {
case DROP_TABLE:
case DROP_MATERIALIZED_VIEW:
Contributor

Do we support views? If we don't have concrete plans to support them, I'd rather remove all related code.

Member

Interesting question for the future. For pure SQL REPL use, we would probably want some way to name queries for reuse. Does Calcite manage these for us and only delegate materialized views?

Member Author

Deleted. Calcite does a lot of things for us when we get out of the way; views are probably one of them.

+ inputPCollections.getPipeline().apply("left", leftRelNode.toPTransform());
  PCollection<Row> rightRows =
- inputPCollections.apply("right", rightRelNode.toPTransform());
+ inputPCollections.getPipeline().apply("right", rightRelNode.toPTransform());
Contributor

Not sure if it's the right thing to access the pipeline directly. Who knows what's there? Does it have a source so that it can produce elements? With PCollections I would have at least an expectation that there should be elements in it

Member

Agree. Does this actually work? If so, is there a different data path whereby the left and right collections are passed as input? This invocation will not register them as inputs to the transform.

/**
* A {@code BeamSqlTableProvider} provides read only set of {@code BeamSqlTable}.
*/
public class BeamSqlTableProvider implements TableProvider {
Contributor

make it @AutoValue+Builder?

.type(getTableType())
.name(table.getKey())
.columns(Collections.emptyList())
.build());
Contributor

nit: I would rewrite it this way:

tables
  .entrySet()
  .stream()
  .map(sqlTable ->
        Table
            .builder()
            .type(getTableType())
            .name(sqlTable.getKey())
            .columns(Collections.emptyList())
            .build())
  .collect(toList());

Member Author

And I would write this exactly how I did. I find for loops to be far easier to read than Java streams.
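
For readers weighing the two styles in this thread, here is a minimal, self-contained sketch of the same listing written both ways. It is not the code from the PR: `TableListingSketch`, `TableInfo`, and the `Object` map values are hypothetical stand-ins for the provider, Beam's `Table` builder, and the registered tables.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TableListingSketch {
  /** Hypothetical stand-in for the table metadata built in the diff: type and name only. */
  static final class TableInfo {
    final String type;
    final String name;

    TableInfo(String type, String name) {
      this.type = type;
      this.name = name;
    }

    @Override
    public String toString() {
      return type + ":" + name;
    }
  }

  /** Loop form, the style the author prefers. */
  static List<TableInfo> listWithLoop(Map<String, Object> tables, String type) {
    List<TableInfo> result = new ArrayList<>();
    for (Map.Entry<String, Object> table : tables.entrySet()) {
      result.add(new TableInfo(type, table.getKey()));
    }
    return result;
  }

  /** Stream form, the style the reviewer suggests; getKey() requires iterating entries. */
  static List<TableInfo> listWithStream(Map<String, Object> tables, String type) {
    return tables.entrySet().stream()
        .map(table -> new TableInfo(type, table.getKey()))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, Object> tables = new LinkedHashMap<>();
    tables.put("orders", new Object());
    tables.put("users", new Object());
    System.out.println(listWithLoop(tables, "pcollection"));
    System.out.println(listWithStream(tables, "pcollection"));
  }
}
```

Both methods return the same list for the same map, so the choice between them here is one of readability rather than behavior.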

for (Map.Entry<TupleTag<?>, PValue> input : inputs.expand().entrySet()) {
tables.put(input.getKey().getId(),
new BeamPCollectionTable(toRows(input.getValue())));
}
Contributor

nit: I would avoid a stateful if/else combined with loops and generics; it hurts readability. You might consider extracting something like this:

if (inputs instanceof PCollection) {
  return
      ImmutableMap.of(
          PCOLLECTION_NAME,
          new BeamPCollectionTable(toRows(inputs)));
}

return
    inputs
        .expand()
        .entrySet()
        .stream()
        .collect(
            toMap(
                keyedPCollection -> keyedPCollection.getKey().getId(),
                keyedPCollection -> new BeamPCollectionTable(toRows(keyedPCollection.getValue()))));

and then create BeamSqlTableProvider outside

Member

Incidentally, here (or maybe it is nearby beneath a fold) seems like a good place to (possibly redundantly) explain that a PCollection makes a single magic table while any other kind of input uses expand() to make many tables using the tags as names.

Member Author

I agree this is hard to read, but I disagree that Java streams make it more readable. Otherwise, I've restructured it as you suggested and added the comment.
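
The rule discussed in this thread (a single PCollection becomes one implicitly named table, while any other input is expanded into one table per tag) can be sketched without Beam as follows. Everything here is a hypothetical stand-in: `Input`, `SingleCollection`, and `TaggedCollections` model Beam's input types, lists of strings model rows, and the value of `PCOLLECTION_NAME` is an assumption, since the thread only shows the constant's name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InputTablesSketch {
  // Assumed value; the thread only references the constant by name.
  static final String PCOLLECTION_NAME = "PCOLLECTION";

  /** Hypothetical stand-in for a pipeline input that expands into tagged collections. */
  interface Input {
    Map<String, List<String>> expand();
  }

  /** Models a lone collection, whose own tag is not a useful table name. */
  static final class SingleCollection implements Input {
    final List<String> rows;

    SingleCollection(List<String> rows) {
      this.rows = rows;
    }

    @Override
    public Map<String, List<String>> expand() {
      return Collections.singletonMap("generated-tag", rows);
    }
  }

  /** Models a tuple of collections keyed by user-chosen tags. */
  static final class TaggedCollections implements Input {
    final Map<String, List<String>> byTag;

    TaggedCollections(Map<String, List<String>> byTag) {
      this.byTag = byTag;
    }

    @Override
    public Map<String, List<String>> expand() {
      return byTag;
    }
  }

  /** A single collection yields one magic table; anything else yields one table per tag. */
  static Map<String, List<String>> toTables(Input input) {
    if (input instanceof SingleCollection) {
      return Collections.singletonMap(PCOLLECTION_NAME, ((SingleCollection) input).rows);
    }
    Map<String, List<String>> tables = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> tagged : input.expand().entrySet()) {
      tables.put(tagged.getKey(), tagged.getValue());
    }
    return tables;
  }

  public static void main(String[] args) {
    System.out.println(toTables(new SingleCollection(Arrays.asList("row1"))).keySet());
    Map<String, List<String>> byTag = new LinkedHashMap<>();
    byTag.put("orders", Arrays.asList("row1"));
    byTag.put("users", Arrays.asList("row2"));
    System.out.println(toTables(new TaggedCollections(byTag)).keySet());
  }
}
```

The sketch keeps the for loop for the multi-table branch, matching the author's stated preference; only the early return for the single-collection case follows the reviewer's restructuring.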


@Override
public RelDataType getRowType(RelDataTypeFactory typeFactory) {
return CalciteUtils.toCalciteRowType(this.beamTable.getSchema(), BeamQueryPlanner.TYPE_FACTORY);
Contributor

Create this in constructor?

Member Author

Sure.

handleDropTable((SqlDropTable) sqlNode);
if (sqlNode instanceof SqlExecutableStatement) {
((SqlExecutableStatement) sqlNode).execute(env.getContext());
} else {
Contributor

nit: add a comment explaining what is an executable statement and what is not?

Member Author

Comment added: DDL nodes are SqlExecutableStatement
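
As an illustration of the dispatch this comment describes, the sketch below routes nodes that can execute themselves (the DDL case) away from nodes that still need planning. The types are hypothetical stand-ins, not Calcite's: `ExecutableStatement` models `SqlExecutableStatement`, and a plain `Set<String>` of table names models the context passed to `execute`.

```java
import java.util.Set;
import java.util.TreeSet;

final class StatementDispatchSketch {
  /** Hypothetical stand-in for a parsed SQL node. */
  interface Node {}

  /** Hypothetical stand-in for a node that applies itself to the schema (DDL). */
  interface ExecutableStatement extends Node {
    void execute(Set<String> schema);
  }

  /** A DDL node: executing it registers a table name in the schema. */
  static final class CreateTable implements ExecutableStatement {
    final String tableName;

    CreateTable(String tableName) {
      this.tableName = tableName;
    }

    @Override
    public void execute(Set<String> schema) {
      schema.add(tableName);
    }
  }

  /** A query node: it has no execute method and must go through the planner. */
  static final class Query implements Node {
    final String sql;

    Query(String sql) {
      this.sql = sql;
    }
  }

  static String run(Node node, Set<String> schema) {
    if (node instanceof ExecutableStatement) {
      // DDL nodes are executable statements: they mutate the schema directly.
      ((ExecutableStatement) node).execute(schema);
      return "executed";
    }
    // Anything else is treated as a query that still has to be planned.
    return "planned";
  }

  public static void main(String[] args) {
    Set<String> schema = new TreeSet<>();
    System.out.println(run(new CreateTable("orders"), schema));
    System.out.println(run(new Query("SELECT * FROM orders"), schema));
    System.out.println(schema);
  }
}
```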

if (table.getName().equals(name)) {
return new BeamCalciteTable(tableProvider.buildBeamSqlTable(table));
}
}
Contributor

nit: it looks like it would be better to convert this to a Map<String, BeamCalciteTable> once in the constructor; this way you wouldn't need to implement map.keySet() or map.get()

Member Author

I'm assuming this comment is on the line with tableProvider.listTables(), not on }? If so, the output of that function cannot be cached. I do, however, agree that it makes sense to change the return value of that API to Map<String, Table>. That simplifies a lot of code all over, so I'll do that.
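
To make the proposed API change concrete, here is a small sketch contrasting a lookup against a list-returning listing with a lookup against a map keyed by table name. `TableLookupSketch` and its `Table` class are hypothetical stand-ins; the sketch does not cache anything, in line with the point above that the listing's output cannot be cached.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TableLookupSketch {
  /** Hypothetical stand-in for table metadata; only the name matters here. */
  static final class Table {
    final String name;

    Table(String name) {
      this.name = name;
    }
  }

  /** List-shaped listing: every lookup has to scan and compare names. */
  static Table findInList(List<Table> tables, String name) {
    for (Table table : tables) {
      if (table.name.equals(name)) {
        return table;
      }
    }
    return null;
  }

  /** Map-shaped listing: the scan collapses into a single get(). */
  static Table findInMap(Map<String, Table> tables, String name) {
    return tables.get(name);
  }

  public static void main(String[] args) {
    Table orders = new Table("orders");
    Table users = new Table("users");
    List<Table> asList = Arrays.asList(orders, users);
    Map<String, Table> asMap = new LinkedHashMap<>();
    asMap.put(orders.name, orders);
    asMap.put(users.name, users);
    System.out.println(findInList(asList, "users") == findInMap(asMap, "users"));
  }
}
```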

private class ContextImpl implements CalcitePrepare.Context {
@Override
public JavaTypeFactory getTypeFactory() {
throw new UnsupportedOperationException();
Contributor

Would it be wrong to return BeamQueryPlanner.TYPE_FACTORY?

Member Author

Nope, changed.


public InMemoryMetaStore() {
@Override public String getTableType() {
return "";
Contributor

I think it should have its own table type still

Member Author

Ok, I added type of store.

Member

@kennknowles kennknowles left a comment

Nice. I have said something similar on a bunch of lines where the new code is more readable. Good cleanup along the way to this addition.

PCollectionTuple inputTuple = toPCollectionTuple(input);

BeamSqlEnv sqlEnv = new BeamSqlEnv();
BeamSqlEnv sqlEnv = new BeamSqlEnv(toTableProvider(input));
Member

nice

  options.setJobName("BeamPlanCreator");
  Pipeline pipeline = Pipeline.create(options);
- compilePipeline(sqlString, pipeline, env);
+ env.getPlanner().compileBeamPipeline(sqlString, pipeline);
Member

nice

} else if (sqlNode instanceof SqlDropTable) {
handleDropTable((SqlDropTable) sqlNode);
if (sqlNode instanceof SqlExecutableStatement) {
((SqlExecutableStatement) sqlNode).execute(env.getContext());
Member

nice

for (Table table : tables) {
env.registerTable(table.getName(), metaStore.buildBeamSqlTable(table.getName()));
}
this.env = new BeamSqlEnv(metaStore);
Member

nice

@Override
public boolean isMutable() {
return true;
Member

Curious - why? Is it that the underlying TableProvider is mutable? Or does this simply mean that the DDL is allowed to introduce new tables?

Member Author

This means the DDL is allowed to introduce new tables, but I don't know of anywhere in Calcite where it is actually checked or set to anything but true.

- PCollectionTuple inputPCollections) {
- PCollection<Row> factStream = inputPCollections.apply(leftRelNode.toPTransform());
+ PInput inputPCollections) {
+ PCollection<Row> factStream = inputPCollections.getPipeline().apply(leftRelNode.toPTransform());
Member

getPipeline().apply() is probably not what you want here, either. It is actually a bad method - it is the same as getPipeline().begin().apply() so it always starts a new initial pipeline segment.

Member Author

The current model is wrong, the new model is wrong. I've dropped this change and just added another PCollectionTuple.empty(input.getPipeline()). We can discuss the right way to do it as a followup.

public PCollection<Row> buildBeamPipeline(PInput inputPCollections) {
PCollection<Row> leftRows =
inputPCollections.apply(
inputPCollections.getPipeline().apply(
Member

and here, and throughout

@apilloud
Member Author

apilloud commented May 4, 2018

run java precommit

@apilloud
Member Author

apilloud commented May 4, 2018

Got a Beam 1 Disconnected failure.

@kennknowles
Member

run java precommit

@apilloud
Member Author

apilloud commented May 7, 2018

run java precommit

@apilloud
Member Author

apilloud commented May 7, 2018

run java precommit

@apilloud
Member Author

apilloud commented May 7, 2018

run java precommit

@kennknowles kennknowles merged commit 378ca94 into apache:master May 8, 2018