[FLINK-12393][table-common] Add the user-facing classes of the new type system #8360
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community track the review progress.
Please see the Pull Request Review Guide for a full explanation of the review process. The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
 *
 * @see DataTypes for a list of supported data types
 */
public static final class ElementDataType extends DataType {
CollectionDataType? It is weird that the ElementDataType contains an element type named elementDataType.
The naming is on purpose as MapType is also a collection data type. The naming should not reflect the SQL type families but just serve as a container name.
To be honest I agree with @wuchong on this. BTW, e.g. java.util.Map is not a java.util.Collection ;)
The only concern to me is that the class name ElementDataType and the member field elementDataType are confusing. In the JDK, a Collection is a group of elements, and Map is not a collection. IMO, ARRAY and MULTISET are collections of elements with the same type.
You are right. Java is also a bit confusing in this sense, as methods such as java.util.Collections#emptyMap() are also put under collections. I will correct that, also in org.apache.flink.table.types.logical.LogicalTypeRoot#MAP. Thanks for the feedback.
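For illustration, a minimal self-contained sketch of the naming that was settled on here: a container named after the family it represents, holding exactly one element type. The class shapes below are simplified placeholders, not the actual Flink classes.

// Simplified sketch only, not the real org.apache.flink.table.types classes.
abstract class DataType {
	// logical type, conversion class and nullability methods would live here
}

final class CollectionDataType extends DataType {

	private final DataType elementDataType;

	CollectionDataType(DataType elementDataType) {
		this.elementDataType = elementDataType;
	}

	DataType getElementDataType() {
		// ARRAY and MULTISET both carry exactly one element type
		return elementDataType;
	}
}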
 *
 * @return a new, reconfigured data type instance
 */
public abstract DataType andNull();
How about renaming andNull() --> asNullable() and notNull() --> asNotNull()?
- The goal was to be close to SQL: INT NOT NULL -> INT().notNull(). notNull and andNull are shorter than asNullable and asNotNullable.

Alternatively, what about withNull() or withoutNull()?
I would leave the notNull as is. How about andNull -> nullable()? I guess this method will be rarely used anyways.
It makes sense to be close to SQL. In SQL, a nullable field is declared like name INT NULL. But INT().null() is not a good choice in the API. What about notNull() and nullable()? It's easy to understand and close to SQL.
Yes, I'm also ok with notNull() and nullable(). Any third opinion? @dawidwys
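If the thread settles on this pair, usage stays close to SQL. A small hypothetical sketch, assuming the DataTypes factory methods introduced in this PR:

// SQL: INT NOT NULL
DataType notNullInt = DataTypes.INT().notNull();

// SQL: TIMESTAMP(9), made explicitly nullable via the proposed nullable() method
DataType nullableTimestamp = DataTypes.TIMESTAMP(9).nullable();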
 * Resolution in days. The precision is the number of digits of days. It must have a value
 * between 1 and 6 (both inclusive). If no precision is specified, it is equal to 2 by default.
 */
public static Resolution DAY(int precision) {
What about providing a DAY() method which uses a precision of 2 by default? The same as YEAR and SECOND.
Sounds good.
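A minimal sketch of such a no-argument overload, assuming it delegates to the DAY(int) factory above and that DayTimeIntervalType.DEFAULT_DAY_PRECISION (used elsewhere in this PR) is the documented default of 2:

/**
 * Resolution in days with the default precision of 2.
 */
public static Resolution DAY() {
	// delegate to the explicit-precision variant shown above
	return DAY(DayTimeIntervalType.DEFAULT_DAY_PRECISION);
}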
Hi @twalthr, looks really good. I had a few comments, mostly about the test infrastructure. I think we can significantly improve the readability of the tests.
/**
 * Adds a hint that data should be represented using the given class when entering or leaving
 * the table ecosystem.
 *
How about a note that not all classes are supported? Something like:

<p>The supported conversion classes are limited to {@link LogicalType#supportsInputConversion(Class)} or {@link LogicalType#supportsOutputConversion(Class)}.
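As an illustration of that note, a hedged usage sketch; the TIMESTAMP example is taken from the PR description, while the rejection behaviour for unsupported classes is an assumption:

// java.sql.Timestamp is a supported conversion class for TIMESTAMP(9)
DataType bridged = DataTypes.TIMESTAMP(9).bridgedTo(java.sql.Timestamp.class);

// a class not accepted by LogicalType#supportsInputConversion(Class) or
// LogicalType#supportsOutputConversion(Class) would be rejected (assumed behaviour)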
 *
 * @see DataTypes for a list of supported data types
 */
public static final class AtomicDataType extends DataType {
nit: Could we have those as top-level classes? Short files are easier to read.
assertEquals(java.sql.Timestamp.class, dataType.getConversionClass());
testLogicalType(new TimestampType(false, 9), dataType);

testLogicalType(new TimestampType(true, 9), dataType.andNull());
How about we replace this method with:

@Test
public void testDataType() {
	...
	assertThat(dataType.andNull(), hasLogicalType(new TimestampType(true, 9)));
	...
}

private static Matcher<DataType> hasLogicalType(LogicalType logicalType) {
	return new FeatureMatcher<DataType, LogicalType>(
			CoreMatchers.equalTo(logicalType),
			"logical type of the data type",
			"logical type") {
		@Override
		protected LogicalType featureValueOf(DataType actual) {
			return actual.getLogicalType();
		}
	};
}

This makes it more explicit what is actually tested without looking into the method. It also produces a nicer description in case of an error.
testLogicalType(new TimestampType(true, 9), dataType.andNull());

try {
I think we can also write a simple Matcher for this, as it will be reused for all DataTypes.
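A minimal sketch of such a reusable check, assuming the try block above verifies that an unsupported conversion class is rejected; the thrown exception type here is an assumption:

private static void assertUnsupportedConversion(DataType dataType, Class<?> unsupportedClass) {
	try {
		dataType.bridgedTo(unsupportedClass);
		fail("Conversion class should have been rejected: " + unsupportedClass.getName());
	} catch (RuntimeException e) {
		// expected: the logical type does not support this conversion class
	}
}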
@Test
public void testAtomicDataTypes() {
	testLogicalType(new CharType(2), DataTypes.CHAR(2));
Could we rework those testLogicalType assertions into a single Parameterized test? This would give a clear message about which type actually failed. It would also ensure all assertions are always run; right now the test will fail on the first error.
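A rough sketch of that rework with JUnit 4's Parameterized runner. The expected values are approximated from the assertions shown earlier in this test; the exact builder calls and everything else are illustrative assumptions:

@RunWith(Parameterized.class)
public class DataTypesTest {

	@Parameterized.Parameters(name = "{1}")
	public static Object[][] testData() {
		return new Object[][] {
			{new CharType(2), DataTypes.CHAR(2)},
			{new TimestampType(false, 9), DataTypes.TIMESTAMP(9).notNull()}
		};
	}

	@Parameterized.Parameter(0)
	public LogicalType expectedLogicalType;

	@Parameterized.Parameter(1)
	public DataType dataType;

	@Test
	public void testLogicalType() {
		// one clear failure message per data type instead of stopping at the first error
		assertEquals(expectedLogicalType, dataType.getLogicalType());
	}
}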
/**
 * Tests for {@link DataType}, its subclasses and {@link DataTypes}.
 */
public class DataTypesTest {
Should we add tests that check that the default conversion classes are used if none are provided?

BTW, I think if we rework the test infrastructure into a parameterized one with a sort of TestSpec, it will be much easier to read and to test all cases. The TestSpec could look something like this:

test(dataType)
	.hasLogicalType(...)
	.hasConversionClass(...)

And then the method:

@Test
public void testDataType(TestSpec spec) {
	assertThat(spec.getDataType(), hasLogicalType(spec.getExpectedLogicalType()));
	spec.getExpectedConversionClass().ifPresent(convClass ->
		assertThat(spec.getDataType(), hasConversionClass(convClass)));
}
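A sketch of what the TestSpec builder behind that could look like; all names are illustrative and simply mirror the snippet above:

private static class TestSpec {

	private final DataType dataType;
	private LogicalType expectedLogicalType;
	private Class<?> expectedConversionClass;

	private TestSpec(DataType dataType) {
		this.dataType = dataType;
	}

	static TestSpec test(DataType dataType) {
		return new TestSpec(dataType);
	}

	TestSpec hasLogicalType(LogicalType expectedLogicalType) {
		this.expectedLogicalType = expectedLogicalType;
		return this;
	}

	TestSpec hasConversionClass(Class<?> expectedConversionClass) {
		this.expectedConversionClass = expectedConversionClass;
		return this;
	}

	DataType getDataType() {
		return dataType;
	}

	LogicalType getExpectedLogicalType() {
		return expectedLogicalType;
	}

	Optional<Class<?>> getExpectedConversionClass() {
		// the conversion class is optional; some specs only check the logical type
		return Optional.ofNullable(expectedConversionClass);
	}
}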
testLogicalType(
	new DayTimeIntervalType(DayTimeResolution.MINUTE_TO_SECOND, DayTimeIntervalType.DEFAULT_DAY_PRECISION, 2),
	DataTypes.INTERVAL(DataTypes.MINUTE(), DataTypes.SECOND(2)));
nit: Not sure about that, but maybe we could statically import the DataTypes methods? It makes the assertions easier to read.
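For example, with static imports of the DataTypes factory methods the same assertion could read (sketch only):

// assuming: import static ...DataTypes.INTERVAL; (and MINUTE, SECOND likewise)
testLogicalType(
	new DayTimeIntervalType(DayTimeResolution.MINUTE_TO_SECOND, DayTimeIntervalType.DEFAULT_DAY_PRECISION, 2),
	INTERVAL(MINUTE(), SECOND(2)));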
 *
 * @see CharType
 */
public static DataType.AtomicDataType CHAR(int n) {
Can we return just the DataType rather than the more concrete subclasses? I think we should try to expose as little internal structure as possible. This would give us more flexibility over the API, and it would still be quite easy to expose more if needed in the future.
Good point. At first I was a bit skeptical because it would require casting in util methods, but we can also provide a visitor.
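A minimal sketch of the visitor idea mentioned here; the interface name, the accept() method and the set of subclasses are all assumptions, not part of this PR:

// Illustrative only: lets util methods branch on concrete subclasses without casting.
public interface DataTypeVisitor<R> {
	R visit(AtomicDataType atomicDataType);
	R visit(CollectionDataType collectionDataType);
	R visit(FieldsDataType fieldsDataType);
}

// DataType would then expose something like:
//   public abstract <R> R accept(DataTypeVisitor<R> visitor);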
public FieldsDataType(
		LogicalType logicalType,
		@Nullable Class<?> conversionClass,
		Map<String, DataType> fieldDataTypes) {
It's a bit strange to use Map<String, DataType> to represent fields, because RowType is sequential. Would it be better to use an array to express the fields?
I agree. At least we should use a LinkedHashMap to preserve the field order. And it would be nice to provide getFieldNames() and getFieldTypes() for convenience.
The field order is defined by the logical type. The map just adds metadata.
If we allowed a LinkedHashMap, we would also need to verify its order against the logical type. The logical type should be the single source of truth.
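To illustrate the single-source-of-truth point, a hedged sketch of how a consumer could iterate fields in order while the map only supplies per-field metadata; RowType#getFieldNames() and the shape of getFieldDataTypes() are assumptions here:

// the logical RowType defines the field order ...
RowType rowType = (RowType) fieldsDataType.getLogicalType();
for (String fieldName : rowType.getFieldNames()) {
	// ... while the map only adds the per-field data type metadata,
	// e.g. the conversion class the user requested for this field
	DataType fieldDataType = fieldsDataType.getFieldDataTypes().get(fieldName);
}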
OK, I'm just not sure what the getFieldDataTypes will be used for.
This method will only be used by us to figure out which format the user requested. It will be used in the next PR around FLINK-12254.
Thank you for the update. Tests look beautiful right now!
Thanks everyone for the feedback. I will merge this now.
[FLINK-12393][table-common] Add the user-facing classes of the new type system

Introduces the DataType class and subclasses. Users are able to do a star import of DataTypes and declare types like: `MULTISET(MULTISET(INT()))`. Close to SQL. As mentioned in FLIP-37, data types allow specifying format hints to the planner using `TIMESTAMP(9).bridgedTo(java.sql.Timestamp)`. This closes apache#8360.
What is the purpose of the change

This PR introduces the DataType class and subclasses. Users are able to do a star import of DataTypes and declare types like MULTISET(MULTISET(INT())). Very close to SQL. As mentioned in FLIP-37, data types allow specifying format hints to the planner using TIMESTAMP(9).bridgedTo(java.sql.Timestamp.class).
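For reference, a small usage sketch of the style described above; the star-import target is elided and the declarations simply restate the examples from this description:

// assuming: import static ...DataTypes.*;
DataType nestedMultiset = MULTISET(MULTISET(INT()));
DataType bridgedTimestamp = TIMESTAMP(9).bridgedTo(java.sql.Timestamp.class);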
Brief change log
- DataType structure
- DataTypes enumeration of types

Verifying this change

See org.apache.flink.table.types.DataTypesTest.

Does this pull request potentially affect one of the following parts:
- @Public(Evolving): yes

Documentation