[FLINK-14254][table] Introduce FileSystemOutputFormat for batch #9864
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. Last check on commit 7e10d61 (Wed Dec 04 15:13:39 UTC 2019). Mention the bot in a comment to re-run the automated checks.
LGTM. Thanks for your PR @JingsongLi.
@KurtYoung Can you take a look when you're free?
ping @KurtYoung ~
Why put this in
It also doesn't feel right that you introduce 10+ new classes but only test one of them.
You mean we can put it in
I have added some tests locally; I want to add them after making sure the main logic is not in dispute. I will add tests ASAP.
We can try to find more appropriate modules for these, but
I have moved them to the blink planner so they can be internal implementation.
Hi @KurtYoung , added tests.
I reviewed some of the abstractions you brought up and left some comments.
BTW, this could move to
/**
 * Path generator to generate new paths to write and prepare the task temporary directory.
 */
final class PathGenerator {
No need for it to be an inner class?
Now that we expose it to writers, it must be public.
I mean, why is it a class inside FileCommitter? The class name doesn't seem tied to FileCommitter, IMO.
I will make it an independent interface. FileCommitter can be an interface too.
Hi @KurtYoung , I refactored the code and integrated Hive into
Split responsibilities of
/**
 * Hive {@link FileSystemFactory}; Hive needs to use the job conf to create a file system.
 */
public class HiveFileSystemFactory implements FileSystemFactory {
I suggest renaming this to HadoopFileSystemFactory, since there's nothing specific to Hive here.
 * to remote, so we should not create too frequently.
 */
@Internal
public interface MetaStoreFactory extends Serializable {
Rename to TableMetaStoreFactory?
/**
 * Create a {@link TableMetaStore}.
 */
TableMetaStore createTableMetaStore() throws Exception;
A TableMetaStore should be created for a specific table, so I think it's more natural if this API accepts a table path -- DB name and table name.
TableMetaStoreFactory already specifies the DB name and table name. We don't want to force the invoker to fetch the DB and table names every time, everywhere; that would be pointless, since a factory exists for exactly one table.
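The per-table contract being argued for here can be sketched as follows. This is an illustrative reconstruction, not the PR's actual API: the interface names, the `getPartitionLocation` method, and the in-memory stand-in are all hypothetical; only the idea — bind db/table at factory construction so callers never pass them again — comes from the discussion.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Optional;

public class MetaStoreFactoryDemo {

    /** Sketch: the factory is bound to a single table at construction time. */
    interface TableMetaStoreFactory extends Serializable {
        TableMetaStore createTableMetaStore() throws Exception;
    }

    /** Sketch of a per-table metastore view. */
    interface TableMetaStore {
        Optional<String> getPartitionLocation(LinkedHashMap<String, String> partSpec);
    }

    /** A toy in-memory factory fixed to db.table; callers never repeat the names. */
    static TableMetaStoreFactory forTable(String db, String table) {
        return () -> partSpec -> Optional.of(db + "." + table + "/" + partSpec);
    }

    public static void main(String[] args) throws Exception {
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        spec.put("dt", "2019-12-04");
        // The invoker supplies only the partition spec, not db/table names.
        System.out.println(
                forTable("default", "orders").createTableMetaStore()
                        .getPartitionLocation(spec).get());
    }
}
```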
public Optional<Path> getPartition(
        LinkedHashMap<String, String> partSpec) throws Exception {
    try {
        return Optional.of(new Path(client.getPartition(
What happens if the table is not partitioned?
In that case PartitionLoader.loadNonPartition is invoked, so we never reach this code.
Partition partition = HiveTableUtil.createHivePartition(database, tableName,
        new ArrayList<>(partSpec.values()), newSd, new HashMap<>());
partition.setValues(new ArrayList<>(partSpec.values()));
client.add_partition(partition);
Don't we have to handle the cases where the table is not partitioned or the partition already exists?
These two interface methods just wrap client.getPartition and client.add_partition; no other logic should live here.
 * A factory to create file systems.
 */
@Internal
public interface FileSystemFactory extends Serializable {
Why not use org.apache.flink.core.fs.FileSystemFactory?
I'd rather not introduce getScheme into this FileSystemFactory. And org.apache.flink.core.fs.FileSystemFactory is not serializable.
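The shape of the table-side factory being defended can be sketched like this. The `FileSystem` stand-in, the `localFactory` helper, and the "check the scheme" behavior are all illustrative assumptions — the point is only that the factory is `Serializable` (so it can ship to tasks) and has no `getScheme()` method, unlike `org.apache.flink.core.fs.FileSystemFactory`:

```java
import java.io.Serializable;
import java.net.URI;

public class FsFactoryDemo {

    /** Sketch of the serializable, scheme-less factory discussed above. */
    interface FileSystemFactory extends Serializable {
        FileSystem create(URI uri) throws Exception;
    }

    /** Minimal stand-in for a file system abstraction; not Flink's class. */
    interface FileSystem {
        boolean exists(String path);
    }

    /** Toy factory: "exists" just checks the path carries the factory's scheme. */
    static FileSystemFactory localFactory() {
        return uri -> path -> path.startsWith(uri.getScheme() + ":");
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = localFactory().create(URI.create("file:///tmp"));
        System.out.println(fs.exists("file:///tmp/data")); // true
    }
}
```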
/**
 * Utils for file system.
 */
public class FileSystemUtils {
Looks more like PartitionPathUtils to me.
 * @return An escaped path name.
 */
private static String escapePathName(String path) {

Useless blank line.
 */
private static String escapePathName(String path) {

// __DEFAULT_NULL__ is the system default value for null and empty string.
What's DEFAULT_NULL?
That comment is wrong; I will remove it.
 * @param partitionSpec The partition spec.
 * @return An escaped, valid partition name.
 */
public static String generatePartName(LinkedHashMap<String, String> partitionSpec) {
generatePartName -> generatePartitionPath?
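The method under discussion follows the well-known Hive-style layout: each partition column becomes a `name=value` path segment, unsafe characters are percent-escaped, and segments are joined with `/`. The sketch below is a minimal reconstruction under that assumption — the exact escaping character set in the PR may differ:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPathDemo {

    /** Percent-escape characters that are unsafe inside one path segment.
     *  The escaped set here is illustrative, not Hive's full list. */
    static String escapePathName(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '/' || c == ':' || c == '=' || c == '%') {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Join the (ordered) spec into a Hive-style partition path: p0=1/p1=2/. */
    static String generatePartitionPath(LinkedHashMap<String, String> spec) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            sb.append(escapePathName(e.getKey()))
              .append('=')
              .append(escapePathName(e.getValue()))
              .append('/');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        spec.put("p0", "1");
        spec.put("p1", "a/b");
        System.out.println(generatePartitionPath(spec)); // p0=1/p1=a%2Fb/
    }
}
```

Note the LinkedHashMap: partition column order must be preserved, which is why the API takes an ordered map rather than a plain Map.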
 * eg: /tmp/cp-1/task-0/p0=1/p1=2/fileName.
 */
@Internal
public class TempFileManager {
Rename to PartitionTempFileManager?
/**
 * Generate a new path with directories.
 */
public Path generateTempFile(String... directories) throws Exception {
createPartitionDir would be more accurate; change the parameter name to partitions.
        TASK_DIR_PREFIX + taskNumber);
}

public Path getTaskTemporaryPath() {
No need to expose this.
This is just for the committer to clean up the task temporary dir, but I think we can move this cleanup to FileManager.
private String newFileName() {
    return String.format(
            checkpointName(checkpointId) + "-" + taskName(taskNumber) + "-file-%d",
Why duplicate this info? We already have the checkpoint id and task id in the parent path.
Eventually these files are moved to the final directory, so keeping this information in the file name reduces name conflicts.
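The rationale can be made concrete with a small sketch. Once files from `/tmp/cp-N/task-M/` are all renamed into one final partition directory, only the file name distinguishes them — so embedding checkpoint id and task id in the name avoids collisions. The exact format string below is an assumption for illustration, not the PR's literal code:

```java
public class TempFileNameDemo {

    /** Sketch of a collision-resistant temp file name: even after the
     *  cp-N/task-M parent directories are stripped by the final rename,
     *  the ids survive inside the name itself. */
    static String newFileName(long checkpointId, int taskNumber, int fileCounter) {
        return String.format("cp-%d-task-%d-file-%d", checkpointId, taskNumber, fileCounter);
    }

    public static void main(String[] args) {
        // Two tasks landing in the same final partition directory do not clash.
        System.out.println(newFileName(1, 0, 0)); // cp-1-task-0-file-0
        System.out.println(newFileName(1, 1, 0)); // cp-1-task-1-file-0
    }
}
```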
}

private static long getCheckpointId(String fileName) {
    return Long.parseLong(fileName.substring(3, fileName.length()));
The fileName.length() argument is redundant; substring(3) is enough.
private transient boolean inited;

private transient HiveShim hiveShim;
I understand this is just refactored code. But HiveShim is now Serializable, so we can simply hold a HiveShim instance and drop the hiveVersion field.
        new HiveMetaStoreFactory(jobConf, hiveVersion, dbName, tableName));
builder.setOverwrite(overwrite);
builder.setStaticPartitions(staticPartitionSpec);
builder.setTmpPath(new org.apache.flink.core.fs.Path(
Requiring the caller to specify a temp path seems strange to me. IMO the caller of the API should only care about what the final path is, not how temp paths are generated.
Note that the temp files come from the table location; FileSystemOutputFormat has no idea how to generate them. Actually, FileSystemOutputFormat only needs to know the temp dir; it doesn't need to know the final path. Right now the generation of the temp directory is still unsettled -- you can see the Hive code below has a TODO. I don't want to bring that into FileSystemOutputFormat yet.
If the temp path has to come from the table location, e.g. as a sub-dir of it, then the builder should ask for the table location and generate the temp path it needs. Otherwise, we should clearly define what kind of path the builder contract expects. It's not good practice to define an API that takes an arbitrary path and implicitly relies on callers to pass something with a specific structure.
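The proposed builder contract could look like the sketch below: the builder accepts the table location and derives the staging directory itself, so callers never construct a temp path by hand. The `.staging` sub-directory name is an assumption made up for this illustration; the PR's actual layout is not specified here:

```java
public class BuilderContractDemo {

    /** Sketch of the proposed contract: derive the staging dir from the
     *  table location instead of accepting an arbitrary caller-built path.
     *  ".staging" is a hypothetical convention, not the PR's. */
    static String deriveTempPath(String tableLocation) {
        return tableLocation + "/.staging";
    }

    public static void main(String[] args) {
        System.out.println(deriveTempPath("/warehouse/db/orders"));
        // /warehouse/db/orders/.staging
    }
}
```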
@@ -178,27 +171,6 @@ public void setStaticPartition(Map<String, String> partitionSpec) {
    }
}

private void validatePartitionSpec() {
Why don't we need this anymore?
The planner already verifies it; validation should be done by the framework.
@Override
public void finalizeGlobal(int parallelism) throws IOException {
    try {
        committer.commitUpToCheckpoint(CHECKPOINT_ID);
Just curious: will this be invoked each time a checkpoint completes?
Nope, finalizeGlobal is only called once in the JM, when a batch job finishes.
Yeah. For batch mode, it is invoked only once, in the JM. For streaming mode, it is invoked from CheckpointListener.notifyCheckpointComplete; you can take a look at StreamingFileSink.notifyCheckpointComplete.
 * 2.{@link #loadNonPartition}: just rename all files to final output path.
 */
@Internal
public class FileSystemLoader implements Closeable {
Rename to FileSystemPartitionLoader, or just PartitionLoader?
It is not just a partition loader; it also loads files for non-partitioned tables.
But I am OK with PartitionLoader.
 * @param <T> The type of the consumed records.
 */
@Internal
public class NonPartitionWriter<T> implements PartitionWriter<T> {
Please reconsider the naming. It's confusing that a NonPartitionWriter is a PartitionWriter.
I'll change the name to SingleDirectoryWriter.
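The renamed hierarchy resolves the confusion: the class is still a `PartitionWriter`, but the new name says what it does — write every record into one directory (a non-partitioned table, or a fully static partition). The interfaces below are heavily simplified stand-ins for the PR's classes, kept only to show the naming relationship:

```java
public class WriterNamingDemo {

    /** Simplified stand-in for the PR's PartitionWriter contract. */
    interface PartitionWriter<T> {
        void write(T record);
    }

    /** Formerly "NonPartitionWriter": all records land in a single directory,
     *  so the name now describes behavior rather than negating the interface. */
    static class SingleDirectoryWriter<T> implements PartitionWriter<T> {
        final String directory;
        int written;

        SingleDirectoryWriter(String directory) {
            this.directory = directory;
        }

        @Override
        public void write(T record) {
            written++; // a real writer would append to a file under `directory`
        }
    }

    public static void main(String[] args) {
        SingleDirectoryWriter<String> w = new SingleDirectoryWriter<>("/tmp/cp-1/task-0/");
        w.write("a");
        w.write("b");
        System.out.println(w.written); // 2
    }
}
```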
@KurtYoung @lirui-apache Hope you can take a look again.
Thanks @JingsongLi for the update. I suppose the purpose of introducing these abstractions is to support writing partitions to external systems other than Hive. Can we have a summary of what a user/developer needs to implement in order to achieve that?
Hi @lirui-apache , there is no plan to support other external systems at present or in the foreseeable future.
As for a summary, you actually just need to implement:
Thanks @JingsongLi for the explanation. LGTM.
I will have a final pass.
Almost LGTM; I only have some structure-related comments.
    return this;
}

public Builder<T> setPartitionComputer(PartitionComputer<T> computer) {
Unused function.
In the future we can have a BaseRowPartitionComputer.
Object field = in.getField(index);
String partitionValue = field != null ? field.toString() : null;
if (partitionValue == null || "".equals(partitionValue)) {
    partitionValue = defaultPartName;
defaultPartName looks like it's actually defaultPartValue to me.
The name comes from the Hive world, but I can change it to defaultPartValue.
if (computer == null) {
    if (conversionClass == Row.class) {
        //noinspection unchecked
        computer = (PartitionComputer<T>) new RowPartitionComputer(
Why not just let HiveTableSink set the RowPartitionComputer? It looks quite hacky here.
OK, let's use setPartitionComputer.
        PartitionTempFileManager manager,
        PartitionComputer<T> computer) throws Exception {
    this.computer = computer;
    this.format = context.createNewOutputFormat(manager.getStaticPartSpecs().size() == 0 ?
I think we can find a way to pass in the static partition specs directly, without relying on the PartitionTempFileManager.
 * Util for getting a {@link PartitionWriterFactory}.
 */
static <T> PartitionWriterFactory<T> get(
        boolean dynamicPartition, boolean grouped) {
Providing the partition columns and static partition specs would let us pass the necessary information to the corresponding writers.
Good suggestion; the static partition specs are enough.
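The two flags in the factory above drive the writer choice. The selection logic below is an illustrative reconstruction, not the PR's code: `SingleDirectoryWriter` and `GroupedPartitionWriter` appear in this conversation, while `DynamicPartitionWriter` is a hypothetical name for the ungrouped dynamic case:

```java
public class WriterFactoryDemo {

    /** Sketch: pick a writer variant from the two flags.
     *  - no dynamic partitions: one directory, one open file
     *  - dynamic + grouped input: partitions arrive contiguously, so one
     *    file can be open at a time
     *  - dynamic + ungrouped: must keep one open file per partition */
    static String chooseWriter(boolean dynamicPartition, boolean grouped) {
        if (!dynamicPartition) {
            return "SingleDirectoryWriter";
        }
        return grouped ? "GroupedPartitionWriter" : "DynamicPartitionWriter";
    }

    public static void main(String[] args) {
        System.out.println(chooseWriter(false, false)); // SingleDirectoryWriter
        System.out.println(chooseWriter(true, true));   // GroupedPartitionWriter
        System.out.println(chooseWriter(true, false));  // DynamicPartitionWriter
    }
}
```

Grouping matters for memory: a grouped writer closes each partition's file as soon as the next partition begins, while the ungrouped variant holds a writer per seen partition.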
private final int taskNumber;
private final long checkpointId;
private final Path taskTmpDir;
private final LinkedHashMap<String, String> staticParts;
This field is not necessary.
@Override
public void open(int taskNumber, int numTasks) throws IOException {
    try {
        PartitionTempFileManager fileManager = committer.createTempFileManager(
I think this class can directly create the PartitionTempFileManager; no need to go through the committer.
OK, let's completely separate task-related things from the committer.
Thanks @KurtYoung for your review, updated.
This closes apache#9864.
What is the purpose of the change
Introduce FileSystemOutputFormat to support all table file system connectors, with partition support, in batch mode.
Brief change log
FileSystemOutputFormat uses PartitionWriter to write:
FileSystemOutputFormat uses FileCommitter to commit temporary files.
PartitionWriters and the Committer support transactions; this is for streaming checkpoint support. For batch, there is just a single transaction from start to end.
Verifying this change
Add FileSystemOutputFormatTest.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation