Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-16995][table-common] Add new data structure interfaces in table-common #11651

Merged
merged 5 commits into from
Apr 14, 2020

Conversation

wuchong
Copy link
Member

@wuchong wuchong commented Apr 7, 2020

What is the purpose of the change

This add the new data structure interfaces to table-common which has been discussed in FLIP-95.

The planner and connector refactoring will happen in a separate PR.

Brief change log

New interfaces and JavaDocs for all of them added.

Verifying this change

  • All of them are interfaces except DecimalData and TimestampData, the corresponing tests will be moved to table-common when refactoring planner.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Apr 7, 2020

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 2345d03 (Tue Apr 07 03:08:18 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Apr 7, 2020

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@JingsongLi
Copy link
Contributor

Thanks @wuchong, yes, we can introduce formats again, but:

  1. We will have two kind of formats in Flink repo in this time.
  2. There are lots of the preparation work is waiting for us to do.
  3. I am not sure these formats can work in planner 100%.

Another way is:

  1. Preparation for decimal, modify decimal structure.
  2. Preparation for string, modify string structure.
  3. Preparation for Array, modify Array structure.
    .....

Just a suggestion.

@wuchong
Copy link
Member Author

wuchong commented Apr 7, 2020

Hi @JingsongLi , I would prefer to refactor all of them together, not one by one. This will reduce the refactor work and review work.

We will have two kind of formats in Flink repo in this time.

We will have a follow up refactor PR soon. This situation will not last long.

I am not sure these formats can work in planner 100%.

This is on purpose to split the interface and refactoring into separete PRs. This PR just focuses on the user-facing APIs and Javadocs, to keep align with FLIP-95 proposal. So more people who care API can join in the reviewing. The following-up PR should focus on making planner 100% work on this new structures.

@wuchong
Copy link
Member Author

wuchong commented Apr 9, 2020

Rebased and added GenericRowData, GenericMapData, GenericArrayData because they are all public APIs proposed in FLIP-95.

@twalthr
Copy link
Contributor

twalthr commented Apr 10, 2020

Thanks for the PR @wuchong. I was writing a lot of JavaDocs in the last days and tried to also further improve them for the data structures. I push a commit to this branch. In particular it tries to:

  • fix typos and grammar mistakes
  • nicely format the JavaDocs
  • cross reference classes for improve the experience for new users
  • explain nullability and immutability everywhere

If you don't like certain changes, feel free to undo them again.

Copy link
Member Author

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @twalthr for helping to improve the Javadoc. All of them looks good to me.
I only removed "This data structure is immutable" from Javadoc of StringData.

If you don't have any other concens, I would like to merge this then.

* and {@link CharType} in Flink Table/SQL.
* An internal data structure representing data of {@link CharType} and {@link VarCharType}.
*
* <p>This data structure is immutable.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BinaryStringData is mutable, because it is lazy (de)serialized. The binary is lazily materialized when needed, so it's not thread-safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wuchong What I meant with immutable is that it does not provide any setters. Because GenericRowData has setters. But I'm fine with removing it here.

Copy link
Contributor

@twalthr twalthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wuchong wuchong merged commit 3933b04 into apache:master Apr 14, 2020
@wuchong wuchong deleted the rowdata branch April 14, 2020 14:10
}

// big endian; consistent with BigInteger.toByteArray()
byte[] bytes = new byte[8];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wuchong , this is a wrong implementation here. Can you take https://issues.apache.org/jira/browse/FLINK-16922 and fix that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Will do that in FLINK-16996.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants