[FLINK-10365] [FLINK-10366] [s3] Create common bases for File System implementations #6714

Closed
wants to merge 6 commits into from

StephanEwen (Contributor)

What is the purpose of the change

We currently have three bundled/shaded file system connectors that build on top of Hadoop's classes. More will probably follow as we add further bundled file system connector libraries, for example for GCS. Each of them rebuilds the shaded Hadoop module, including creating the relocated configuration, adapting native code loading, etc.

Similarly, a lot of upcoming S3 connector code will be shared between the Hadoop- and Presto-based implementations.

This PR creates common base modules for the shaded Hadoop file systems and for shared S3 functionality, so that both can be reused.

Brief change log

  • Creates the flink-fs-hadoop-shaded module and factors the shaded Hadoop FS classes out of the shaded S3 file systems into that module.
  • Bumps the Hadoop dependency to 3.1 to get access to newer connectors and better utilities. Adjusts the shading of the Hadoop configuration.
  • Creates an S3 base module, flink-s3-fs-base, as the common denominator for the Hadoop- and Presto-based implementations.
  • Adjusts the Hadoop-based S3 connector to use the common denominator module.
  • Adjusts the Presto-based S3 adapter to use the common denominator module.
  • Consolidates shared S3 classes in the flink-s3-fs-base module.
  • Updates the shading checks in the build scripts to the new patterns.
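The change log describes a common base module that both the Hadoop- and Presto-based S3 file systems build on. As a rough illustration of that kind of split (not the actual Flink code), here is a minimal, self-contained Java sketch assuming a template-method design: the base class owns configuration translation and URI validation once, and each backend only supplies its scheme, its configuration key prefix, and how to create its client. The class names, the `s3a`/`s3p` schemes, and the `fs.s3a.`/`presto.s3.` key prefixes are hypothetical choices made for this sketch.

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a "common base" for S3 file system factories.
 * Class names, schemes and key prefixes are illustrative assumptions,
 * not the actual classes in flink-s3-fs-base.
 */
public class S3BaseSketch {

    /** Stand-in for the file system object a factory would return. */
    interface FileSystemHandle {
        String describe();
    }

    /** Shared logic lives here once; subclasses only supply backend details. */
    abstract static class S3FactoryBase {
        private final Map<String, String> backendConfig = new HashMap<>();

        /** Shared: copy "s3."-prefixed options into the backend's key namespace. */
        final void configure(Map<String, String> options) {
            backendConfig.clear();
            for (Map.Entry<String, String> e : options.entrySet()) {
                if (e.getKey().startsWith("s3.")) {
                    String suffix = e.getKey().substring("s3.".length());
                    backendConfig.put(backendKeyPrefix() + suffix, e.getValue());
                }
            }
        }

        /** Shared: URI validation is written once for every implementation. */
        final FileSystemHandle create(URI uri) {
            if (!getScheme().equals(uri.getScheme())) {
                throw new IllegalArgumentException("Unexpected scheme: " + uri.getScheme());
            }
            if (uri.getAuthority() == null) {
                throw new IllegalArgumentException("Missing bucket in URI: " + uri);
            }
            // Sorted, read-only copy so each backend sees a stable view of the config.
            return createBackend(uri.getAuthority(),
                    Collections.unmodifiableMap(new TreeMap<>(backendConfig)));
        }

        abstract String getScheme();

        abstract String backendKeyPrefix();

        abstract FileSystemHandle createBackend(String bucket, Map<String, String> config);
    }

    /** Hadoop-flavoured variant (scheme and prefix are assumptions). */
    static final class HadoopS3Factory extends S3FactoryBase {
        @Override String getScheme() { return "s3a"; }
        @Override String backendKeyPrefix() { return "fs.s3a."; }
        @Override FileSystemHandle createBackend(String bucket, Map<String, String> config) {
            return () -> "hadoop bucket=" + bucket + " config=" + config;
        }
    }

    /** Presto-flavoured variant (scheme and prefix are assumptions). */
    static final class PrestoS3Factory extends S3FactoryBase {
        @Override String getScheme() { return "s3p"; }
        @Override String backendKeyPrefix() { return "presto.s3."; }
        @Override FileSystemHandle createBackend(String bucket, Map<String, String> config) {
            return () -> "presto bucket=" + bucket + " config=" + config;
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("s3.access-key", "KEY");
        options.put("state.backend", "rocksdb"); // not S3-related, so it is ignored

        S3FactoryBase hadoop = new HadoopS3Factory();
        hadoop.configure(options);
        System.out.println(hadoop.create(URI.create("s3a://my-bucket/data")).describe());

        S3FactoryBase presto = new PrestoS3Factory();
        presto.configure(options);
        System.out.println(presto.create(URI.create("s3p://my-bucket/data")).describe());
    }
}
```

Running `main` prints `hadoop bucket=my-bucket config={fs.s3a.access-key=KEY}` followed by `presto bucket=my-bucket config={presto.s3.access-key=KEY}`. Marking the shared methods `final` keeps the common behaviour in one place, so a fix in the base class reaches every backend, which is the kind of reuse a shared base module is meant to enable.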

I put each change into a separate commit to make reviewing easier.

Verifying this change

This change restructures modules and upgrades dependencies; it does not change functionality.
The existing integration tests and end-to-end tests still cover the existing functionality.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: yes

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

kl0u (Contributor) left a comment

+1
I already had a look at the branch previously, so as soon as Travis gives the green light, feel free to merge @StephanEwen

StephanEwen (Contributor, Author)

Manually merged in 9d56e69

3 participants