[FLINK-33059] Support transparent compression for file-connector for all file input formats #23443
Hi @rmetzger, I saw you authored parts of this code. Could you please review it or point me to another reviewer?
R: @xintongsong I see your name in that code history. Would you have time to take a look?
@tzulitai you offered to help with reviewing, so don't hesitate to ping me on this when you have time.
The added logic makes sense IMO.
Added 2 comments, but in general I would consider separating the whole INFLATER_INPUT_STREAM_FACTORIES into a new class, as a couple of functions use it and it seems quite detachable from FileInputFormat.
After a quick peek I think something like the following could work:
public class InflaterInputStreamFactories {
    public static void register(String fileExt, InflaterInputStreamFactory<?> factory) { ... }
    public static InflaterInputStreamFactory<?> get(Path path) { ... }
    private static InflaterInputStreamFactory<?> get(String fileExt) { ... }
    @VisibleForTesting
    public static Set<String> getSupportedCompressionFormats() { ... }
}
Also, ConcurrentHashMap can be used instead of the synchronized block, but other than that the current logic could be moved as it is now.
This probably goes beyond the current PR, but I think it is worth noting. WDYT?
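A minimal sketch of what such a class might look like, assuming a ConcurrentHashMap-backed registry. StreamFactory is a hypothetical stand-in for Flink's InflaterInputStreamFactory so that the example compiles without Flink on the classpath. The lower-casing of extensions and the String-based lookup (instead of Path) are assumptions made for illustration, not code from this PR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPInputStream;

// Hypothetical stand-in for Flink's InflaterInputStreamFactory, reduced to one method.
interface StreamFactory {
    InputStream create(InputStream in) throws IOException;
}

final class InflaterInputStreamFactories {
    // ConcurrentHashMap makes register/get thread-safe without a synchronized block.
    private static final Map<String, StreamFactory> FACTORIES = new ConcurrentHashMap<>();

    static {
        // Only gzip is registered here, because it ships with the JDK.
        register("gz", GZIPInputStream::new);
        register("gzip", GZIPInputStream::new);
    }

    private InflaterInputStreamFactories() {}

    static void register(String fileExt, StreamFactory factory) {
        FACTORIES.put(normalize(fileExt), factory);
    }

    // Returns the factory matching the file name's extension, or null if none is registered.
    static StreamFactory get(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return FACTORIES.get(normalize(fileName.substring(dot + 1)));
    }

    // Read-only view of the registered extensions, intended for tests.
    static Set<String> getSupportedCompressionFormats() {
        return Collections.unmodifiableSet(FACTORIES.keySet());
    }

    // Assumption of this sketch: extensions are matched case-insensitively.
    private static String normalize(String ext) {
        return ext.toLowerCase(Locale.ROOT);
    }
}
```

With this shape, callers would only need InflaterInputStreamFactories.get(fileName) and a null check to decide whether to wrap the raw stream.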
@@ -157,6 +157,10 @@ protected static InflaterInputStreamFactory<?> getInflaterInputStreamFactory(
        }
    }

    public static Set<String> getSupportedCompressionFormats() {
I'd mark this @VisibleForTesting, because only tests use this function.
👍 thx for pointing out
@@ -136,6 +138,26 @@ public static String createTempFileDirExtension(
        return f.toURI().toString();
    }

    public static String createTempTextFileDirForAllCompressionFormats(File tempDir)
I think instead of bringing the specific FileInputFormat into this general utility, it would be cleaner to pass Set<String> extensions as a parameter.
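A rough sketch of the suggested signature, assuming a helper that takes the extensions from the caller instead of asking FileInputFormat for them. The class name TempFileTestUtil, the method name, and the "part." file prefix are hypothetical. For brevity the sketch writes plain bytes under each extension; a real helper would pipe the content through the matching compressor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

final class TempFileTestUtil {
    private TempFileTestUtil() {}

    // Hypothetical helper: writes one file per extension into a fresh directory
    // under tempDir and returns that directory's URI as a string.
    // Note: the content is NOT compressed in this sketch.
    static String createTempTextFileDirForExtensions(
            Path tempDir, String content, Set<String> extensions) {
        try {
            Path dir = Files.createTempDirectory(tempDir, "compressed-input");
            for (String ext : extensions) {
                Files.write(dir.resolve("part." + ext), content.getBytes(StandardCharsets.UTF_8));
            }
            return dir.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The test would then pass in the set of supported extensions itself, so the utility stays independent of any particular input format.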
Agree, better to be more generic here, thx !
I agree with the ConcurrentHashMap suggestion. Creating a class just to wrap a map that is used only in FileInputFormat seems like overkill to me, and in any case it is outside the scope of this filesize-fix PR.
@ferenc-csaky thanks for reviewing this PR! I have addressed your comments. Do I have your LGTM once the tests pass?
LGTM, if tests pass! 👍
…th based on splittability
…test to all compression formats
…on't use FileInputFormat#createSplits. If input files are compressed, ensure that the size of the split is not the compressed file size and that the compression decorator is called.
Thank you @echauchot!
My pleasure! Merging
What is the purpose of the change
Support transparent compression for file-connector for all file input formats.
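For context on why split sizing matters for compressed inputs, here is a small sketch using only java.util.zip (no Flink classes). It shows that the on-disk length of a gzip payload is not the number of bytes a reader sees after decompression, which is why a split length derived from the compressed file size would be misleading. The class name CompressedSizeDemo and the sample payload are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class CompressedSizeDemo {
    private CompressedSizeDemo() {}

    // Compresses the given bytes with gzip.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reads the whole gzip stream back, as a reader of an unsplittable file would.
    static byte[] gunzip(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Highly repetitive sample text, so it compresses well.
        byte[] raw = "flink ".repeat(1000).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);
        System.out.println("compressed bytes:   " + compressed.length);
        System.out.println("decompressed bytes: " + gunzip(compressed).length);
    }
}
```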
Brief change log
Verifying this change
FileInputFormatTest#testFileInputFormatWithCompressionFromFileSource
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation