
[FLINK-14254][table] Introduce FileSystemOutputFormat for batch #9864

Closed · wants to merge 2 commits

Conversation

@JingsongLi (Contributor) commented Oct 9, 2019

What is the purpose of the change

Introduce FileSystemOutputFormat to support all table file system connectors, with partition support, in batch mode.

Brief change log

FileSystemOutputFormat uses a PartitionWriter to write:

  • DynamicPartitionWriter: writes multiple partitions at the same time; it may consume more memory.
  • GroupedPartitionWriter: for grouped dynamic partition inserts. It creates a new format when the partition changes.
  • NonPartitionWriter: a non-partition-aware writer. It uses a single format to write within a transaction.

FileSystemOutputFormat uses a FileCommitter to commit temporary files.

PartitionWriters and the Committer support transactions; this enables streaming checkpoint support. In batch mode, a single transaction spans the whole job.
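The writer selection described above can be sketched in code. The following is a minimal, hypothetical illustration of how one of the three writers might be chosen from the (dynamicPartition, grouped) flags; names follow the PR description, but the actual factory in the PR differs in detail:

```java
// Minimal sketch of choosing a PartitionWriter implementation.
// The enum and chooser are hypothetical; the PR's real PartitionWriterFactory
// takes similar flags but returns concrete writer instances.
public class WriterChooser {
    public enum Writer { DYNAMIC, GROUPED, SINGLE_DIRECTORY }

    public static Writer choose(boolean dynamicPartition, boolean grouped) {
        if (!dynamicPartition) {
            return Writer.SINGLE_DIRECTORY; // non-partition-aware: one format per transaction
        }
        // grouped input lets us close the previous format when the partition changes,
        // instead of keeping all partition formats open at once
        return grouped ? Writer.GROUPED : Writer.DYNAMIC;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false)); // DYNAMIC
    }
}
```

The grouped variant trades generality for memory: it only works when input rows arrive clustered by partition, but it never holds more than one open format.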

Verifying this change

Add FileSystemOutputFormatTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? JavaDocs

@flinkbot (Collaborator) commented Oct 9, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 7e10d61 (Wed Dec 04 15:13:39 UTC 2019)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot (Collaborator) commented Oct 9, 2019

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build

@docete (Contributor) commented Oct 25, 2019

LGTM. Thanks for your PR @JingsongLi .

@JingsongLi (Author):

@KurtYoung Can you take a look when you're free?

@JingsongLi (Author):

ping @KurtYoung ~

@KurtYoung (Contributor):

why putting this to flink-table-common?

@KurtYoung (Contributor):

It also doesn't feel right when you introduce 10+ new classes but only test one of them.

@JingsongLi (Author):

why putting this to flink-table-common?

Do you mean we could put it in flink-connector-filesystem?
1. The main class in flink-connector-filesystem is BucketingSink, and all of its classes are deprecated.
2. If we add it to the filesystem module, filesystem formats like csv, parquet, and hive would need to add the filesystem dependency.
Those are my only slight concerns; I am OK with moving the classes to flink-connector-filesystem too.

@JingsongLi (Author):

It also doesn't feel right when you introduce 10+ new classes but only test one of them.

I have added some tests locally; I wanted to add them after making sure the main logic is not under dispute. I will add tests ASAP.

@KurtYoung (Contributor):

We can try to find more appropriate modules for these but flink-table-common is definitely not one of them.

@JingsongLi (Author):

We can try to find more appropriate modules for these but flink-table-common is definitely not one of them.

I have moved them to the blink planner so they remain an internal implementation.

@JingsongLi (Author):

It also doesn't feel right when you introduce 10+ new classes but only test one of them.

Hi @KurtYoung , Added tests.

@KurtYoung (Contributor) left a comment:

I reviewed some of the abstractions you brought up and left some comments.

@KurtYoung (Contributor):

BTW, this could move to flink-table-runtime-blink?

/**
* Path generator to generate new path to write and prepare task temporary directory.
*/
final class PathGenerator {
Contributor:

No need to be an inner class?

Author:

Now that we expose it to writers, it must be public.

Contributor:

I mean, why is it a class inside FileCommitter? The class name doesn't seem tied to FileCommitter IMO.

Author:

I will make it an independent interface. FileCommitter can be an interface too.

@JingsongLi (Author):

Hi @KurtYoung , I refactored the codes, and integrated hive to FileSystemOutputFormat. Hope you can take a look.
CC: @lirui-apache

@JingsongLi JingsongLi force-pushed the batchFile branch 2 times, most recently from 2c09900 to bc77f17 Compare November 6, 2019 09:52
@JingsongLi (Author):

Split responsibilities of FileSystemCommitter, added comments to each role.

/**
* Hive {@link FileSystemFactory}, hive need use job conf to create file system.
*/
public class HiveFileSystemFactory implements FileSystemFactory {
Contributor:

I suggest rename this to HadoopFileSystemFactory since there's nothing specific about Hive here.

* to remote, so we should not create too frequently.
*/
@Internal
public interface MetaStoreFactory extends Serializable {
Contributor:

Rename to TableMetaStoreFactory?

/**
* Create a {@link TableMetaStore}.
*/
TableMetaStore createTableMetaStore() throws Exception;
Contributor:

A TableMetaStore should be created for a specific table. So I think it's more natural if this API accepts a table path -- DB name and table name.

Author:

TableMetaStoreFactory already specifies the DB name and table name; we don't want to make the invoker provide them every time and everywhere. That would be meaningless, since each factory exists for just a single table.

public Optional<Path> getPartition(
LinkedHashMap<String, String> partSpec) throws Exception {
try {
return Optional.of(new Path(client.getPartition(
Contributor:

What happens if table is not partitioned?

Author:

In that case it will invoke PartitionLoader.loadNonPartition and never reach here.

Partition partition = HiveTableUtil.createHivePartition(database, tableName,
new ArrayList<>(partSpec.values()), newSd, new HashMap<>());
partition.setValues(new ArrayList<>(partSpec.values()));
client.add_partition(partition);
Contributor:

Don't we have to handle cases when table is not partitioned or the partition already exists?

Author:

These two interfaces just wrap client.getPartition and client.add_partition; no other logic should live here.

* A factory to create file systems.
*/
@Internal
public interface FileSystemFactory extends Serializable {
Contributor:

why not use org.apache.flink.core.fs.FileSystemFactory?

Author:

I don't want to introduce getScheme into FileSystemFactory. And it is not serializable.

/**
* Utils for file system.
*/
public class FileSystemUtils {
Contributor:

More like PartitionPathUtils to me

* @return An escaped path name.
*/
private static String escapePathName(String path) {

Contributor:

useless blank line

*/
private static String escapePathName(String path) {

// __DEFAULT_NULL__ is the system default value for null and empty string.
Contributor:

what's DEFAULT_NULL?

Author:

Wrong comment, I will remove it.

* @param partitionSpec The partition spec.
* @return An escaped, valid partition name.
*/
public static String generatePartName(LinkedHashMap<String, String> partitionSpec) {
Contributor:

generatePartName -> generatePartitionPath?
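For context, a Hive-style partition path is built by joining `key=value` pairs in partition-column order, as in the `p0=1/p1=2` example elsewhere in this PR. A minimal sketch (hypothetical helper name; the real implementation also escapes special characters, which is omitted here):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of generating a partition path like "p0=1/p1=2/" from an ordered spec.
// A LinkedHashMap preserves partition-column order, which matters for the path.
public class PartitionPathSketch {
    public static String generatePartitionPath(LinkedHashMap<String, String> spec) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('/');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        spec.put("p0", "1");
        spec.put("p1", "2");
        System.out.println(generatePartitionPath(spec)); // p0=1/p1=2/
    }
}
```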

* eg: /tmp/cp-1/task-0/p0=1/p1=2/fileName.
*/
@Internal
public class TempFileManager {
Contributor:

rename to PartitionTempFileManager?

/**
* Generate a new path with directories.
*/
public Path generateTempFile(String... directories) throws Exception {
Contributor:

createPartitionDir will be more accurate, change parameter name to partitions

TASK_DIR_PREFIX + taskNumber);
}

public Path getTaskTemporaryPath() {
Contributor:

No need to expose this

Author:

This is just for the committer to clean the task temporary dir, but I think we can move this cleanup into the FileManager.


private String newFileName() {
return String.format(
checkpointName(checkpointId) + "-" + taskName(taskNumber) + "-file-%d",
Contributor:

why duplicate this info? We already have cpId and taskId in the parent path.

Author:

Eventually these files will be moved to the final directory, so carrying this information in the file name reduces name conflicts.
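To illustrate the trade-off being discussed: although the checkpoint id and task id already appear in the parent directory (e.g. `/tmp/cp-1/task-0/...`), repeating them in the file name keeps names unique after the files are moved into the final directory. A hedged sketch; the format string is illustrative, not the exact one in the PR:

```java
// Sketch: temp files carry checkpoint id and task number in their names, so
// moving files from /tmp/cp-1/task-0/... into a shared final directory
// cannot collide with files from another checkpoint or task.
public class FileNameSketch {
    public static String newFileName(long checkpointId, int taskNumber, int counter) {
        return String.format("cp-%d-task-%d-file-%d", checkpointId, taskNumber, counter);
    }

    public static void main(String[] args) {
        System.out.println(newFileName(1, 0, 3)); // cp-1-task-0-file-3
    }
}
```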

}

private static long getCheckpointId(String fileName) {
return Long.parseLong(fileName.substring(3, fileName.length()));
Contributor:

fileName.length() is useless
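The reviewer's point, assuming directory names like `cp-5` with a three-character prefix (the `cp-1` form appears in the temp path example earlier in this PR): `substring(3)` already runs to the end of the string, so the second argument is redundant.

```java
// Sketch of the simplified parse: substring(3) is equivalent to
// substring(3, dirName.length()), so the explicit end index can be dropped.
public class CheckpointIdSketch {
    public static long getCheckpointId(String dirName) {
        // "cp-".length() == 3
        return Long.parseLong(dirName.substring(3));
    }

    public static void main(String[] args) {
        System.out.println(getCheckpointId("cp-5")); // 5
    }
}
```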


private transient boolean inited;

private transient HiveShim hiveShim;
Contributor:

I understand this is just refactoring code. But HiveShim is now a Serializable. So actually we can simply hold a HiveShim instance and don't need the hiveVersion field.

new HiveMetaStoreFactory(jobConf, hiveVersion, dbName, tableName));
builder.setOverwrite(overwrite);
builder.setStaticPartitions(staticPartitionSpec);
builder.setTmpPath(new org.apache.flink.core.fs.Path(
Contributor:

Requiring the caller to specify a temp path seems strange to me. IMO caller of the API should only care about what the final path should be and not how temp paths are generated.

Author:

Note that the temp path comes from the table location; FileSystemOutputFormat has no idea how to generate temp files. Actually, FileSystemOutputFormat only needs to know the temp dir; it doesn't need to know the final path.
Right now, how the temp directory is generated is still uncertain. You can see the Hive code below has a TODO. I don't want to bring this into FileSystemOutputFormat yet.

Contributor:

If temp path has to come from table location, e.g. as a sub-dir of table location, then the builder should ask for the table location and generate the temp path it needs. Otherwise, we should clearly define what kind of path we need in the builder contract. It's not good practice to define an API that takes an arbitrary path and implicitly rely on callers to pass something of a specific structure.

@@ -178,27 +171,6 @@ public void setStaticPartition(Map<String, String> partitionSpec) {
}
}

private void validatePartitionSpec() {
Contributor:

Why don't we need this anymore?

Author:

The planner already verifies it. It should be verified by the framework.

@Override
public void finalizeGlobal(int parallelism) throws IOException {
try {
committer.commitUpToCheckpoint(CHECKPOINT_ID);
Contributor:

Just curious, will this be invoked each time a checkpoint is done?

Reply:

Nope, finalizeGlobal is only called in JM once when a batch job finishes.

Author:

Yeah:
In batch mode, it is invoked only once, in the JM.
In streaming mode, it is invoked via CheckpointListener.notifyCheckpointComplete; you can take a look at StreamingFileSink.notifyCheckpointComplete.

* 2.{@link #loadNonPartition}: just rename all files to final output path.
*/
@Internal
public class FileSystemLoader implements Closeable {
Contributor:

Rename to FileSystemPartitionLoader or just PartitionLoader?

Author:

It is not just a partition loader; it also loads files without partitions.

Author:

But I am OK with PartitionLoader.

* @param <T> The type of the consumed records.
*/
@Internal
public class NonPartitionWriter<T> implements PartitionWriter<T> {
Contributor:

Please re-consider the naming. It's confusing that a NonPartitionWriter is a PartitionWriter.

Author:

I'll change the name to SingleDirectoryWriter.

@JingsongLi (Author):

@KurtYoung @lirui-apache Hope you take a look again.

@lirui-apache (Contributor):

Thanks @JingsongLi for the update. I suppose the purpose of introducing these abstractions is to support writing partitions to different external systems other than Hive. Can we have a summary about what a user/developer needs to implement in order to achieve that?

@JingsongLi (Author):

Thanks @JingsongLi for the update. I suppose the purpose of introducing these abstractions is to support writing partitions to different external systems other than Hive. Can we have a summary about what a user/developer needs to implement in order to achieve that?

Hi @lirui-apache, there is no plan to support other external systems at present or in the foreseeable future.
There are only two implementations:

  • Hive
  • The Flink file system connector (note: this is just one implementation; formats only need to implement input/output formats).

So I don't think we need to expose it to users, and I think code comments are better than documents.

As a summary, a developer actually just needs to implement:

  • OutputFormatFactory
  • TableMetaStoreFactory

You can take a look at FileSystemOutputFormat.Builder; only these two are necessary.
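The two extension points named above can be pictured with a minimal sketch. The interface shapes below are hypothetical simplifications invented for illustration (the real OutputFormatFactory and TableMetaStoreFactory in the PR have different signatures and return Flink types):

```java
import java.io.Serializable;

// Hypothetical, simplified shapes of the two extension points a developer
// supplies to FileSystemOutputFormat.Builder. String stands in for the real
// OutputFormat and TableMetaStore objects.
public class ExtensionPointsSketch {
    interface OutputFormatFactory<T> extends Serializable {
        String createOutputFormat(String path); // real code: creates an OutputFormat for a path
    }

    interface TableMetaStoreFactory extends Serializable {
        String createTableMetaStore(); // real code: connects to the table's metastore
    }

    public static void main(String[] args) {
        OutputFormatFactory<String> formats = path -> "format@" + path;
        TableMetaStoreFactory metaStore = () -> "metastore";
        System.out.println(formats.createOutputFormat("/tmp/p0=1"));
        System.out.println(metaStore.createTableMetaStore());
    }
}
```

Both factories extend Serializable because, as discussed earlier in this thread, they are shipped from the client to the tasks.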

@lirui-apache (Contributor) left a comment:

Thanks @JingsongLi for the explanation. LGTM.

@KurtYoung (Contributor):

I will have a final pass

@KurtYoung (Contributor) left a comment:

Almost LGTM, I only have some structure related comments.

return this;
}

public Builder<T> setPartitionComputer(PartitionComputer<T> computer) {
Contributor:

never used function

Author:

In the future we can have BaseRowPartitionComputer.

Object field = in.getField(index);
String partitionValue = field != null ? field.toString() : null;
if (partitionValue == null || "".equals(partitionValue)) {
partitionValue = defaultPartName;
Contributor:

defaultPartName actually looks like defaultPartValue to me

Author:

This comes from the Hive world, but I can change it to defaultPartValue.

if (computer == null) {
if (conversionClass == Row.class) {
//noinspection unchecked
computer = (PartitionComputer<T>) new RowPartitionComputer(
Contributor:

why not just let HiveTableSink set RowPartitionComputer? It looks quite hacky here.

Author:

OK, let's use setPartitionComputer.

PartitionTempFileManager manager,
PartitionComputer<T> computer) throws Exception {
this.computer = computer;
this.format = context.createNewOutputFormat(manager.getStaticPartSpecs().size() == 0 ?
Contributor:

I think we can find a way to pass in the static partition specs directly without relying on the PartitionTempFileManager

* Util for get a {@link PartitionWriterFactory}.
*/
static <T> PartitionWriterFactory<T> get(
boolean dynamicPartition, boolean grouped) {
Contributor:

Providing the partition columns and static partition specs would let us pass the necessary information to the corresponding writers.

Author:

Good suggestion; the static partition specs are enough.

private final int taskNumber;
private final long checkpointId;
private final Path taskTmpDir;
private final LinkedHashMap<String, String> staticParts;
Contributor:

This field is not necessary

@Override
public void open(int taskNumber, int numTasks) throws IOException {
try {
PartitionTempFileManager fileManager = committer.createTempFileManager(
Contributor:

I think this class can directly create PartitionTempFileManager, no need to go through committer.

Author:

OK, let's completely separate task related things from committer.

@JingsongLi (Author):

Thanks @KurtYoung for your review, updated.

@KurtYoung KurtYoung closed this in 035a233 Dec 3, 2019
Li-Aihua pushed a commit to Li-Aihua/flink that referenced this pull request Jan 19, 2020
@JingsongLi JingsongLi deleted the batchFile branch April 26, 2020 05:33