Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-26712][table-planner] Metadata keys should not conflict with physical columns #19236

Closed
wants to merge 2 commits into from

Conversation

twalthr
Copy link
Contributor

@twalthr twalthr commented Mar 24, 2022

What is the purpose of the change

This reduces the likelihood for name collisions between metadata columns and physical columns. It might break some connector implementations that used SupportsReadable/WritingMetadata and name-based column arithmetics. Since this might only be done by very advanced connector implementations (such as our FileSystem connector) and we don't recommend using name-based column arithmetics but index-based ones. This change should not affeect many users and should be done asap as it changes the JSON plan.

Brief change log

  • Prefix metadata columns with $metadata$

Verifying this change

This change added tests and can be verified as follows: TableScanTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? JavaDocs

@flinkbot
Copy link
Collaborator

flinkbot commented Mar 24, 2022

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@@ -99,6 +99,25 @@ class TableScanTest extends TableTestBase {
util.verifyExecPlan("SELECT * FROM MetadataTable")
}

@Test
def testDDLWithMetadataThatConflictsWithPhysicalColumn(): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the same test be done for the sink table as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @twalthr . Left a few comments.

@@ -71,6 +71,9 @@
@Internal
public final class DynamicSourceUtils {

// Ensures that physical and metadata columns don't collide.
public static final String METADATA_COLUMN_PREFIX = "$metadata$";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor, but maybe we can define this constant in a common place for both source and sink?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the source and sink stack are completely separate, that's why we also have separate util classes

IntStream.range(0, metadata.size())
.mapToObj(
i -> {
// Access metadata columns from behind because the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from behind -> backwards, or in reverse ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked it up and chose from the back

@@ -99,6 +99,25 @@ class TableScanTest extends TableTestBase {
util.verifyExecPlan("SELECT * FROM MetadataTable")
}

@Test
def testDDLWithMetadataThatConflictsWithPhysicalColumn(): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thank you @twalthr !

@twalthr twalthr closed this in 0097b5a Apr 4, 2022
twalthr added a commit to twalthr/flink that referenced this pull request Apr 4, 2022
…hysical columns

This reduces the likelihood for name collisions between metadata columns and physical
columns. It might break some connector implementations that used SupportsReadable/WritingMetadata
and name-based column arithmetics. We don't recommend using name-based column arithmetics but
index-based ones.

This closes apache#19236.
twalthr added a commit that referenced this pull request Apr 4, 2022
…hysical columns

This reduces the likelihood for name collisions between metadata columns and physical
columns. It might break some connector implementations that used SupportsReadable/WritingMetadata
and name-based column arithmetics. We don't recommend using name-based column arithmetics but
index-based ones.

This closes #19236.
JasonLeeCoding pushed a commit to JasonLeeCoding/flink that referenced this pull request May 27, 2022
…hysical columns

This reduces the likelihood for name collisions between metadata columns and physical
columns. It might break some connector implementations that used SupportsReadable/WritingMetadata
and name-based column arithmetics. We don't recommend using name-based column arithmetics but
index-based ones.

This closes apache#19236.
czy006 pushed a commit to czy006/flink that referenced this pull request Jun 30, 2022
…hysical columns

This reduces the likelihood for name collisions between metadata columns and physical
columns. It might break some connector implementations that used SupportsReadable/WritingMetadata
and name-based column arithmetics. We don't recommend using name-based column arithmetics but
index-based ones.

This closes apache#19236.
czy006 pushed a commit to czy006/flink that referenced this pull request Jun 30, 2022
…hysical columns

This reduces the likelihood for name collisions between metadata columns and physical
columns. It might break some connector implementations that used SupportsReadable/WritingMetadata
and name-based column arithmetics. We don't recommend using name-based column arithmetics but
index-based ones.

This closes apache#19236.
zstraw pushed a commit to zstraw/flink that referenced this pull request Jul 4, 2022
…hysical columns

This reduces the likelihood for name collisions between metadata columns and physical
columns. It might break some connector implementations that used SupportsReadable/WritingMetadata
and name-based column arithmetics. We don't recommend using name-based column arithmetics but
index-based ones.

This closes apache#19236.
czy006 pushed a commit to czy006/flink that referenced this pull request Jul 7, 2022
…hysical columns

This reduces the likelihood for name collisions between metadata columns and physical
columns. It might break some connector implementations that used SupportsReadable/WritingMetadata
and name-based column arithmetics. We don't recommend using name-based column arithmetics but
index-based ones.

This closes apache#19236.
czy006 pushed a commit to czy006/flink that referenced this pull request Jul 11, 2022
…hysical columns

This reduces the likelihood for name collisions between metadata columns and physical
columns. It might break some connector implementations that used SupportsReadable/WritingMetadata
and name-based column arithmetics. We don't recommend using name-based column arithmetics but
index-based ones.

This closes apache#19236.
czy006 pushed a commit to czy006/flink that referenced this pull request Jul 15, 2022
…hysical columns

This reduces the likelihood for name collisions between metadata columns and physical
columns. It might break some connector implementations that used SupportsReadable/WritingMetadata
and name-based column arithmetics. We don't recommend using name-based column arithmetics but
index-based ones.

This closes apache#19236.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants