Skip to content

Conversation

@dongwoo6kim
Copy link
Contributor

What is the purpose of the change

To fix thrift dependency conflict when flink-sql-connector-hive and flink-parquet both exist in the path.

Brief change log

Reshade dependencies that are already shaded in parquet. For naming I referred to this

Verifying this change

This change is about dependency without code change, so I believe it should be covered by existing tests

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes, shade dependency
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Aug 8, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

<relocations>
<relocation>
<pattern>org.apache.parquet</pattern>
<shadedPattern>org.apache.hive.shaded.parquet</shadedPattern>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit) Shading naming convention looks inconsistent with the one in flink-sql-connector-2.3.9.
How about chaing this to org.apache.flink.hive.shaded.parquet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongwoo6kim Thanks for the pr and sorry for late. I left some comments. PTAL.

<relocations>
<relocation>
<pattern>org.apache.parquet</pattern>
<shadedPattern>org.apache.hive.shaded.parquet</shadedPattern>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

</relocation>
<relocation>
<pattern>shaded.parquet</pattern>
<shadedPattern>org.apache.flink.hive.reshaded.parquet</shadedPattern>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking whether it's a good shade pattern in there. it's same to the shade pattern in connector-hive jar. It may overwride the classes in connector-hive jar.
How about reshading to org.apache.hive.reshaded.parquet like we don in FLINK-23074?

Copy link
Contributor Author

@dongwoo6kim dongwoo6kim Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luoyuxia Thanks for the reply. I've understood that the pattern should be different from connector-hive jar.
org.apache.hive.reshaded.parquet looks good to me.

However I have one more question about this below part

<shadedPattern>org.apache.flink.hive.shaded.parquet</shadedPattern>

Don't we need to also change flink-sql-connector-hive-2.3.9, to avoid overwriding classes from connector-hive.jar?
This part seems to be changed from this pr, I'm wondering if you are aware of this change made before?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't been aware of that before. But from the comment, seems we also should change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luoyuxia I've applied your review. Thanks for the help, PTAL

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Have you ever checked the modify still work? If so, I'll merge then..

@dongwoo6kim
Copy link
Contributor Author

@luoyuxia I have checked that it still works for current master branch and also release-1.18 , release-1.17.

Query

SET table.sql-dialect = default;

CREATE TEMPORARY TABLE IF NOT EXISTS test_table
(
    id       STRING,
    status   STRING,
    click_id       STRING
) WITH (
      'connector' = 'filesystem',
      'path' = '{hdfs_path}',
      'format' = 'parquet'
      );

SELECT id
FROM test_table
GROUP BY id;

Before fixing this issue

image

After fixing this issue

Can get the result

@luoyuxia
Copy link
Contributor

@dongwoo6kim Thanks for verifying. Merging.... Also, could you please help backport to 1.16 & 1.17 & 1.18. Thanks...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants