Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-12253][table-common] Setup a class hierarchy for the new logical type system #8335

Closed
wants to merge 27 commits into from

Conversation

twalthr
Copy link
Contributor

@twalthr twalthr commented May 2, 2019

What is the purpose of the change

This PR implements all logical types that Flink's type system should support in the near future. As the name indicates, these type classes are purely logical to allow declarations and intentions across modules and in the API. Planners/runtime can still decide which type and precision is supported physically.

An exact listing can be found in FLIP-37: https://docs.google.com/document/d/1a9HUb6OaBIoj9IRfbILcMFPrOL7ALeZ3rVI66dvA2_U/edit#

The most important content of this PR are the JavaDoc comments that clearly define each type and should avoid ambiguity. Catalogs/connectors/planners should adapt to those definitions.

Brief change log

  • 27 logical types added

Verifying this change

  • org.apache.flink.table.types.LogicalTypesTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? JavaDocs

twalthr added 27 commits May 2, 2019 16:00
@flinkbot
Copy link
Collaborator

flinkbot commented May 2, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ✅ 1. The [description] looks good.
  • ✅ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ✅ 4. The change fits into the overall [architecture].
  • ✅ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

Copy link
Contributor

@dawidwys dawidwys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments for: a6a8f2a & 143887c. In general +1

private static void testAll(
LogicalType nullableType,
String typeString,
Class[] input,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input -> supportedInputClasses
output -> supportedOutputClasses

or a Javadoc?

ANY(
LogicalTypeFamily.EXTENSION);

private final Set<LogicalTypeFamily> families;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use EnumSet here?


private static final int DEFAULT_LENGTH = 1;

private static final String DEFAULT_FORMAT = "CHAR(%d)";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DEFAULT_FORMAT -> FORMAT?

*
* @return a deep copy
*/
public LogicalType copy() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this method final with a comment to implement LogicalType copy(boolean isNullable) instead?

Copy link
Contributor

@dawidwys dawidwys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for d2fd35f


private static final int DEFAULT_LENGTH = 1;

private static final String DEFAULT_FORMAT = "VARCHAR(%d)";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is it a DEFAULT_format or just a FORMAT can we change the FORMAT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on the type. E.g. ROW and ARRAY could be represented in different ways. But I will correct it to FORMAT while merging.

Copy link
Contributor

@dawidwys dawidwys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment for 2bfc47d

testAll(
new BooleanType(),
"BOOLEAN",
new Class[]{Boolean.class},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be new Class[]{Boolean.class, boolean.class}?

Copy link
Contributor

@dawidwys dawidwys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for 063398d

@dawidwys
Copy link
Contributor

dawidwys commented May 3, 2019

@flinkbot approve-until quality

Copy link
Contributor

@dawidwys dawidwys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for c0652a4

Copy link
Contributor

@dawidwys dawidwys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for db0e8cb

if (scale < MIN_SCALE || scale > precision) {
throw new ValidationException(
String.format(
"Decimal scale must be between %d and %d (both inclusive).",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "Decimal scale must be between %d and %d (both inclusive)." -> "Decimal scale must be between %d and %d(precision) (both inclusive)."

Copy link
Contributor

@dawidwys dawidwys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment for 52229e1

testAll(
new TinyIntType(),
"TINYINT",
new Class[]{Byte.class},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new Class[]{Byte.class, byte.class} ?

Copy link
Contributor

@dawidwys dawidwys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for a6fe3ff

testAll(
new SmallIntType(),
"SMALLINT",
new Class[]{Short.class},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new Class[]{Short.class, short.class}?

Copy link
Contributor

@dawidwys dawidwys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for 908aa48

Copy link
Contributor

@dawidwys dawidwys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@twalthr Good job! It really looks concise. I also really enjoyed reading all the javadocs!

I put some minor comments, but generally +1 on all the changes.

import java.util.Set;

/**
* Logical type of a time WITHOUT timezone consisting of {@code hour:minute:second[.fractional]} with up
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can Time have a timezone in general? Doesn't timezone imply there has to be date component?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQL standard also defines a TIME WITH TIME ZONE. I will explicitly mention that we won't support that. Java does also not provide a time class for such a type.

* Internal timestamp kind for time attribute metadata.
*/
@Internal
public enum TimestampKind {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we extract this and use it for all Timestamp types?

* 9 (both inclusive). If no precision is specified, {@code p} is equal to 6.
*
* <p>Compared to {@link ZonedTimestampType}, the time zone offset information is not stored physically
* in ever datum. Instead, the type assumes {@link java.time.Instant} semantics in UTC time zone at
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ever -> every

* interval of months of 50 leads to {@code +04-02}).
*
* <p>The serialized string representation is {@code INTERVAL YEAR(p)}, {@code INTERVAL YEAR(p) TO MONTH},
* or {@code INTERVAL YEAR(p)} where {@code p} is the number of digits of years (=year precision).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INTERVAL YEAR(p) -> INTERVAL MONTH

* the following resolutions: interval of years, interval of years to months, or interval of months.
*
* <p>An interval of year-month consists of {@code +years-months} with values ranging from {@code -9999-11}
* to {@code +9999-11}. The value is the same for all resolutions of this group (for example, an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this example...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about?

An interval of year-month consists of {@code +years-months} with values ranging from {@code -9999-11}
 * to {@code +9999-11}. A value representation is looks the same for all resolutions. For example, an
 * interval of months of 50 is always represented in an interval of year to month format: {@code +04-02}.

@@ -285,6 +286,27 @@ public void testDayTimeIntervalType() {
);
}

@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we add a check or two for different supported array conversions?
E.g. to verify that new ArrayType(new ArrayType(new TimestampType())) does not support java.sql.Timestamp[].class.


/**
* Logical type of a user-defined distinct type. A distinct type specifies an identifier and is backed
* by a source type. A distinct types has the same internal representation as a source type, but is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

types -> type

* Logical type of an arbitrary serialized type. This type is a black box within the table ecosystem
* and is only deserialized at the edges. The any type is an extension to the SQL standard.
*
* <p>The serialized string representation is {@code ANY(c, s)} where {@code s} is the originating
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{@code s} -> {@code c}

} catch (Exception e) {
throw new TableException(String.format(
"Unable to generate a string representation of the serializer snapshot of '%s' " +
"describing the class '%s' for the any type.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for the any type -> for the ANY type

testAll(
new AnyType<>(Human.class, new KryoSerializer<>(Human.class, new ExecutionConfig())),
"ANY(org.apache.flink.table.types.LogicalTypesTest$Human, " +
"ADNvcmcuYXBhY2hlLmZsaW5rLnRhYmxlLnR5cGVzLkxvZ2ljYWxUeXBlc1Rlc3QkSHVtYW4AAATyxpo9cAA" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not hardcode the snapshot? Changes to the KryoSerializer will break this test.

Copy link
Contributor

@dawidwys dawidwys May 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget this comment. It should work. We should make sure anyway that we support old versions of internal serializer snapshots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that hardcoding is ok in this case to indicate changes in the serialization format.

@twalthr
Copy link
Contributor Author

twalthr commented May 6, 2019

Thanks for the review @dawidwys. I merged all commits separately. I will close this PR now.

@twalthr twalthr closed this May 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants