New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-25583] Support compacting small files for FileSink. #18680
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 2ff898d (Wed Feb 09 08:53:09 UTC 2022) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
90f5341
to
e50fb69
Compare
The branch is force pushed to add a hotfix before all commits, to fix the bug in SinkTransformationTranslator that sets parallelism of transformations added in the per-commit topology without checking if the parallelism is set. |
...mon/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/BucketWriter.java
Outdated
Show resolved
Hide resolved
...che/flink/streaming/api/functions/sink/filesystem/OutputStreamBasedCompactingFileWriter.java
Outdated
Show resolved
Hide resolved
...ctors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java
Outdated
Show resolved
Hide resolved
...ctors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java
Outdated
Show resolved
Hide resolved
...ctors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/flink/connector/file/sink/compactor/operator/CompactorOperator.java
Show resolved
Hide resolved
...ctors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java
Show resolved
Hide resolved
...src/main/java/org/apache/flink/connector/file/sink/compactor/operator/CompactorOperator.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/flink/connector/file/sink/compactor/operator/CompactorOperator.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/flink/connector/file/sink/compactor/operator/CompactorOperator.java
Outdated
Show resolved
Hide resolved
e50fb69
to
d3ecea6
Compare
@gaoyunhaii |
...src/main/java/org/apache/flink/connector/file/sink/compactor/operator/CompactorOperator.java
Outdated
Show resolved
Hide resolved
...rg/apache/flink/streaming/api/functions/sink/filesystem/OutputStreamBasedPartFileWriter.java
Outdated
Show resolved
Hide resolved
...r-files/src/main/java/org/apache/flink/connector/file/sink/compactor/DecoderBasedReader.java
Outdated
Show resolved
Hide resolved
...rg/apache/flink/streaming/api/functions/sink/filesystem/OutputStreamBasedPartFileWriter.java
Outdated
Show resolved
Hide resolved
...-files/src/main/java/org/apache/flink/connector/file/sink/compactor/ConcatFileCompactor.java
Show resolved
Hide resolved
...-files/src/main/java/org/apache/flink/connector/file/sink/compactor/FileCompactStrategy.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/flink/connector/file/sink/compactor/OutputStreamBasedFileCompactor.java
Outdated
Show resolved
Hide resolved
1327927
to
de9803e
Compare
All comments so far are resolved now. |
304fc4e
to
cb3873d
Compare
Thanks @pltbkd for the update! I try to fixed some tests. Also, could you have a look that if the two |
…rallelism is set when translating Sink.
2ceef38
to
1e885d3
Compare
Cleaning up of the state is added when the remaining state is drained in the handlers. Nothing special is necessary while snapshotting the state. I suppose we don't need to implement a customized |
1e885d3
to
2ffd741
Compare
@flinkbot run azure |
…implement in implementations of InProgressFileWriter.
…ods in PendingFileRecoverable.
…Cleanup in FileSinkCommittable, delete compactedFileToCleanup in FileCommitter.
2ffd741
to
4841282
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, will merge after ci get greeen.
@flinkbot run azure |
…ileSink. This closes apache#18680.
…ileSink. This closes apache#18680.
What is the purpose of the change
This pull request makes the FileSink to support compacting small files, based on the new extended sink API.
Brief change log
Verifying this change
This change added some unit tests.
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (yes)Documentation