
@akedin
Contributor

@akedin akedin commented May 4, 2018

Support complex types in DDL

Supported syntax:

CREATE TABLE tableName (
  f_array2 ARRAY<INTEGER>,
  f_array3 ARRAY<ARRAY<INTEGER>>,
  f_map MAP<INTEGER, MAP<VARCHAR, VARCHAR>>,
  f_row ROW( f_int1 INTEGER, f_str1 VARCHAR),
  f_row2 ROW< f_int1 INTEGER, f_str1 VARCHAR>
)
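The type syntax above is recursive. The element type of ARRAY, the value type of MAP, and each ROW field type can themselves be any supported type, so a parser naturally calls itself for each nested type. The following is a minimal hand-written recursive-descent sketch in Java. It is not the PR's JavaCC grammar. The class name `DdlTypeParser`, the primitive type list, the canonical output format, and the restriction of MAP keys to primitive types are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: parses a DDL field type and returns it in a canonical form. */
public class DdlTypeParser {
  // Assumed primitive type names; the real grammar may accept a different set.
  private static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
      "BOOLEAN", "TINYINT", "SMALLINT", "INTEGER", "INT", "BIGINT",
      "FLOAT", "DOUBLE", "DECIMAL", "VARCHAR", "TIMESTAMP"));
  private static final Pattern TOKEN = Pattern.compile("\\s*([A-Za-z0-9_]+|[<>(),])");

  private final List<String> tokens = new ArrayList<>();
  private int pos = 0;

  private DdlTypeParser(String input) {
    // Each '<' and '>' is its own token, so ">>" closes two levels of nesting.
    Matcher m = TOKEN.matcher(input);
    int end = 0;
    while (m.find() && m.start() == end) {
      tokens.add(m.group(1));
      end = m.end();
    }
    if (!input.substring(end).trim().isEmpty()) {
      throw new IllegalArgumentException("Unexpected input at index " + end);
    }
  }

  public static String parse(String input) {
    DdlTypeParser parser = new DdlTypeParser(input);
    String result = parser.type();
    if (parser.pos != parser.tokens.size()) {
      throw new IllegalArgumentException("Trailing input at token " + parser.pos);
    }
    return result;
  }

  private String next() {
    if (pos >= tokens.size()) {
      throw new IllegalArgumentException("Unexpected end of input");
    }
    return tokens.get(pos++);
  }

  private void expect(String expected) {
    String actual = next();
    if (!actual.equals(expected)) {
      throw new IllegalArgumentException("Expected '" + expected + "' but got '" + actual + "'");
    }
  }

  // type := ARRAY<type> | MAP<primitive, type> | ROW(fields) | ROW<fields> | primitive
  private String type() {
    String name = next().toUpperCase();
    if (name.equals("ARRAY")) {
      expect("<");
      String element = type();
      expect(">");
      return "ARRAY<" + element + ">";
    }
    if (name.equals("MAP")) {
      expect("<");
      String key = primitive(); // assumption: map keys must be primitive types
      expect(",");
      String value = type();
      expect(">");
      return "MAP<" + key + ", " + value + ">";
    }
    if (name.equals("ROW")) {
      // Both ROW(...) and ROW<...> are accepted; the closer must match the opener.
      String open = next();
      if (!open.equals("(") && !open.equals("<")) {
        throw new IllegalArgumentException("Expected '(' or '<' after ROW");
      }
      String close = open.equals("(") ? ")" : ">";
      List<String> fields = new ArrayList<>();
      String separator;
      do {
        String fieldName = next();
        fields.add(fieldName + " " + type());
        separator = next();
      } while (separator.equals(","));
      if (!separator.equals(close)) {
        throw new IllegalArgumentException("Expected '" + close + "' but got '" + separator + "'");
      }
      return "ROW<" + String.join(", ", fields) + ">";
    }
    pos--; // not a complex type: re-read the same token as a primitive
    return primitive();
  }

  private String primitive() {
    String name = next().toUpperCase();
    if (!PRIMITIVES.contains(name)) {
      throw new IllegalArgumentException("Not a primitive type: " + name);
    }
    return name;
  }
}
```

For example, `DdlTypeParser.parse("MAP<INTEGER, ARRAY<ROW(a INT, b VARCHAR)>>")` returns `MAP<INTEGER, ARRAY<ROW<a INT, b VARCHAR>>>`. Because each `>` is its own token, `>>` closes two levels of nesting without any special handling.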


@akedin
Contributor Author

akedin commented May 4, 2018

Member

@apilloud apilloud left a comment

This is awesome! You made it sound like it was weeks out when we talked earlier. It saddens me how much this moves us away from calcite-server, but I can't say it is the wrong decision.

LGTM

public class SqlDdlNodes {
  private SqlDdlNodes() {}

  /** Creates a CREATE TABLE. */
  // ...
}
Member

This annoys me. I don't like this extra indirection in Calcite, but I also don't like deviating from Calcite unnecessarily. Not much you can do about that.
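The indirection presumably being referred to is that nodes are built through static factory methods on a non-instantiable holder class (here in the style of `SqlDdlNodes`) rather than by calling node constructors directly. A minimal Java sketch of that pattern follows. `DdlFactorySketch`, `CreateTable`, and `DdlNodes` are hypothetical, simplified stand-ins, not Beam's or Calcite's actual classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified stand-ins illustrating factory indirection; not Beam or Calcite classes. */
public class DdlFactorySketch {
  /** The parse-tree node itself. */
  static final class CreateTable {
    final String name;
    final List<String> columns;

    CreateTable(String name, List<String> columns) {
      this.name = name;
      this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }
  }

  /** Non-instantiable holder of static factory methods, in the style of SqlDdlNodes. */
  static final class DdlNodes {
    private DdlNodes() {}

    /** Creates a CREATE TABLE. Callers use this instead of the constructor. */
    static CreateTable createTable(String name, List<String> columns) {
      return new CreateTable(name, columns);
    }
  }
}
```

The factory adds one call hop. One argument for keeping it anyway is the one made in the comment: staying close to Calcite's layout keeps the two code bases easy to compare.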

@akedin akedin force-pushed the support-complex-types branch 2 times, most recently from 8db89ba to acf5061 May 7, 2018 18:05
@akedin
Contributor Author

akedin commented May 8, 2018

run java precommit

@akedin
Contributor Author

akedin commented May 8, 2018

thread leak test fails: #5300 BEAM-4088

Member

@kennknowles kennknowles left a comment

LGTM after some more tests

+ "name varchar COMMENT 'name', \n"
+ "age int COMMENT 'age', \n"
+ "tags MAP<VARCHAR, VARCHAR>, \n"
+ "nestedMap MAP<INTEGER, MAP<VARCHAR, INTEGER>> \n"
Member

Can you also test:

  • maps inside arrays
  • arrays inside maps
  • rows inside arrays
  • rows inside maps
  • maps inside rows
  • arrays inside rows
  • nested nested rows

... etc ...

At some point we might want to use QuickCheck/SmallCheck; that is one of the best ways to test valid syntax.
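QuickCheck-style property testing generates many random, well-formed inputs instead of a hand-picked list. This covers combinations like the ones listed above (maps in arrays, rows in maps, and so on) automatically. Below is a minimal hand-rolled Java generator in that spirit. It does not use the QuickCheck or junit-quickcheck APIs, and the class and method names, the primitive list, and the depth bound are assumptions. The output follows the shape of the CREATE TABLE statements quoted later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical QuickCheck-style generator of random, well-formed DDL field types. */
public class RandomDdlTypes {
  // Assumed primitive names, taken from the generated examples in this thread.
  private static final String[] PRIMITIVES = {
    "BOOLEAN", "TINYINT", "SMALLINT", "INTEGER", "FLOAT",
    "DOUBLE", "DECIMAL", "VARCHAR", "TIMESTAMP"
  };

  /** Returns a random type with at most maxDepth levels of ARRAY, MAP, or ROW nesting. */
  public static String randomType(Random rnd, int maxDepth) {
    // At depth 0 only primitives are allowed, which guarantees termination.
    int choice = maxDepth <= 0 ? 0 : rnd.nextInt(4);
    switch (choice) {
      case 1:
        return "ARRAY<" + randomType(rnd, maxDepth - 1) + ">";
      case 2:
        // Assumption: map keys are restricted to primitive types.
        return "MAP<" + randomPrimitive(rnd) + ", " + randomType(rnd, maxDepth - 1) + ">";
      case 3: {
        int fieldCount = 1 + rnd.nextInt(4);
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
          fields.add("field_" + i + " " + randomType(rnd, maxDepth - 1));
        }
        return "ROW<" + String.join(",", fields) + ">";
      }
      default:
        return randomPrimitive(rnd);
    }
  }

  private static String randomPrimitive(Random rnd) {
    return PRIMITIVES[rnd.nextInt(PRIMITIVES.length)];
  }

  /** Wraps a type in a statement shaped like the generated examples in this thread. */
  public static String createTable(String type) {
    return "create table tablename ( fieldName " + type
        + " ) TYPE 'text' LOCATION '/home/admin/person'";
  }
}
```

A property test would then parse each generated statement and check that the resulting schema has the expected nested field type. That check depends on Beam's parser and schema classes, so it is left out here. Fixing the `Random` seed keeps a failing case reproducible.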

Contributor Author

Will do

fieldType = SimpleType()
)
[
collectionTypeName = CollectionTypeName()
Member

Is this to support the INT ARRAY syntax? TBH I don't think we need to start with that: even though it is "standard", none of the related dialects use that syntax.

And I didn't realize JavaCC generated LL(k) parsers. That's unfortunate. So I can see why the grammar has to be factored like this. But what this doesn't support is (INT ARRAY) ARRAY or the no-parens version INT ARRAY ARRAY.

I suppose it is fine to leave as-is, but you probably want to file something about the limitation, or tell me that I am wrong. And lots of tests. Also feel free to remove it if you don't want the complication.
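On the LL(k) point: a single optional group (the `[ ... ]` around `CollectionTypeName()`) can match at most one trailing `ARRAY`, which is why `INT ARRAY ARRAY` is not accepted. One way to lift that while keeping one-token lookahead is a repetition such as `( ... )*` in place of the optional group, plus a parenthesized alternative for `(INT ARRAY) ARRAY`. The hand-written Java sketch below shows that loop-based structure. It is not JavaCC output, and `PostfixArrayParser` and its naive whitespace tokenizer are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: postfix ARRAY types parsed with a loop rather than one optional suffix. */
public class PostfixArrayParser {
  private final List<String> tokens;
  private int pos = 0;

  private PostfixArrayParser(String input) {
    // Naive tokenizer: pad parentheses with spaces, then split on whitespace.
    String spaced = input.replace("(", " ( ").replace(")", " ) ").trim();
    this.tokens = Arrays.asList(spaced.split("\\s+"));
  }

  /** Parses e.g. "INT ARRAY ARRAY" and returns the prefix form "ARRAY<ARRAY<INT>>". */
  public static String parse(String input) {
    PostfixArrayParser parser = new PostfixArrayParser(input);
    String result = parser.type();
    if (parser.pos != parser.tokens.size()) {
      throw new IllegalArgumentException("Trailing input at token " + parser.pos);
    }
    return result;
  }

  // type := atom ( "ARRAY" )*
  private String type() {
    String result = atom();
    // The loop accepts any number of ARRAY suffixes; an optional group would accept one.
    while (pos < tokens.size() && tokens.get(pos).equalsIgnoreCase("ARRAY")) {
      pos++;
      result = "ARRAY<" + result + ">";
    }
    return result;
  }

  // atom := "(" type ")" | simpleTypeName
  private String atom() {
    if (pos >= tokens.size()) {
      throw new IllegalArgumentException("Unexpected end of input");
    }
    String token = tokens.get(pos++);
    if (token.equals("(")) {
      String inner = type();
      if (pos >= tokens.size() || !tokens.get(pos).equals(")")) {
        throw new IllegalArgumentException("Expected ')' at token " + pos);
      }
      pos++;
      return inner;
    }
    // Assumption: any other token is a simple type name; no validation in this sketch.
    return token.toUpperCase();
  }
}
```

With this structure, `INT ARRAY ARRAY` and `(INT ARRAY) ARRAY` both parse to `ARRAY<ARRAY<INT>>`.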

Contributor Author

Yes, this clause is for the postfix array support. I left it mostly because it was already kinda there, just limited to MULTISET rather than ARRAY. We don't support either at the moment, and I would rather remove it for now for consistency.

"create table person (\n"
    + "id int COMMENT 'id', \n"
    // Before this change: + "name varchar(31) COMMENT 'name') \n"
    + "name varchar COMMENT 'name') \n"
Member

OK, I commented on the CLI test, but this is the better place for the parser tests.

Contributor Author

Adding QuickCheck tests

public List<SqlNode> getOperandList() {
  // Before this change:
  // return ImmutableNullableList.of(name, columnList, type, comment, location, tblProperties);
  // After this change:
  throw new UnsupportedOperationException(
      "Getting operands CREATE TABLE is unsupported at the moment");
}
Member

Just curious: where is it used? You can't really do much except copy the node if you don't know what kind of node it is, so I'm just asking, for my own education, how Calcite (or BeamSQL) uses this.

Contributor Author

It's mostly used by implementations of SqlCall to access their parameters. SqlCreate itself doesn't really need it, but we might (or might not) need something like this for JDBC integration, depending on how CREATE TABLE parsing is implemented there.
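To make the exchange above concrete: generic code that walks or copies a tree without knowing each node's concrete class can only reach child nodes through the operand list, so that is where an `UnsupportedOperationException` thrown for CREATE TABLE would surface. The Java sketch below uses hypothetical stand-in classes (`Node`, `Call`, `Plus`, `CreateTable`), not Calcite's `SqlNode`/`SqlCall` hierarchy.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins for a SQL parse tree; not Calcite's SqlNode/SqlCall classes. */
public class OperandWalkSketch {
  interface Node {}

  static final class Identifier implements Node {
    final String name;

    Identifier(String name) {
      this.name = name;
    }
  }

  /** A call exposes its children only through a generic operand list. */
  abstract static class Call implements Node {
    abstract List<Node> getOperandList();
  }

  static final class Plus extends Call {
    private final Node left;
    private final Node right;

    Plus(Node left, Node right) {
      this.left = left;
      this.right = right;
    }

    @Override
    List<Node> getOperandList() {
      return Arrays.asList(left, right);
    }
  }

  /** Mirrors the change discussed above: operands of CREATE TABLE are not exposed. */
  static final class CreateTable extends Call {
    @Override
    List<Node> getOperandList() {
      throw new UnsupportedOperationException("Getting operands of CREATE TABLE is unsupported");
    }
  }

  /** Generic traversal: it never names a concrete Call subclass, so it relies on operands. */
  static void collectIdentifiers(Node node, List<String> out) {
    if (node instanceof Identifier) {
      out.add(((Identifier) node).name);
    } else if (node instanceof Call) {
      for (Node operand : ((Call) node).getOperandList()) {
        collectIdentifiers(operand, out);
      }
    }
  }
}
```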

akedin added 11 commits May 8, 2018 09:17
…e types

Create Schema.Fields instead of SqlDataTypeSpec. This will allow us to parse any type directly into Schema types, without limiting ourselves to the primitive types supported by SqlDataTypeSpec
Add support for declaring fields of arrays of primitive types
Parse Row fields
Column is only used in Table for DDL and similar use cases. It seems better to use Schema in these places instead to avoid going back and forth between them
Support MAP<primitiveTypeKey, typeValue>
Add random arbitrary field type generation with QuickCheck to verify correctness of schema creation by CREATE TABLE
@akedin akedin force-pushed the support-complex-types branch from acf5061 to 8b349e3 May 8, 2018 23:33
@akedin
Contributor Author

akedin commented May 8, 2018

Updated:

create table tablename ( fieldName MAP<DOUBLE, ARRAY<ARRAY<ARRAY<MAP<BOOLEAN, MAP<DECIMAL, TINYINT>>>>>> ) TYPE 'text' LOCATION '/home/admin/person'

create table tablename ( fieldName MAP<BOOLEAN, MAP<FLOAT, ARRAY<BOOLEAN>>> ) TYPE 'text' LOCATION '/home/admin/person'

create table tablename ( fieldName ARRAY<ARRAY<FLOAT>> ) TYPE 'text' LOCATION '/home/admin/person'


create table tablename ( fieldName MAP<INTEGER, MAP<INTEGER, MAP<INTEGER, MAP<SMALLINT, ROW<field_0 INTEGER,field_1 TIMESTAMP,field_2 TIMESTAMP,field_3 TIMESTAMP>>>>> ) TYPE 'text' LOCATION '/home/admin/person'


create table tablename ( fieldName MAP<BOOLEAN, ROW<field_0 INTEGER,field_1 DECIMAL,field_2 VARCHAR>> ) TYPE 'text' LOCATION '/home/admin/person'



@akedin
Contributor Author

akedin commented May 9, 2018

run java precommit

@akedin
Contributor Author

akedin commented May 9, 2018

whoa it's green

@kennknowles
Member

Awesome. That's next-level test assurance.

@kennknowles kennknowles merged commit da8fa25 into apache:master May 9, 2018