
[IOTDB-560] add TSRecordOutputFormat to write TsFile via Flink DataSet/DataStream API. #1084

Merged
qiaojialin merged 4 commits into apache:master from WeiZhong94:IOTDB-560-2 on May 6, 2020

Conversation

@WeiZhong94 (Contributor)

TsFile is a columnar storage file format in Apache IoTDB. It is designed for time series data, supports efficient compression and querying, and is easy to integrate into big data processing frameworks.

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams, and it is becoming increasingly popular in IoT scenarios. So it would be great to integrate IoTDB and Flink.

This pull request adds a TSRecordOutputFormat to support writing TsFiles from Flink via the DataStream/DataSet API.

More detail can be found in the discussion thread [1] or in the IoTDB wiki [2]:

[1]https://lists.apache.org/thread.html/r6dd6afe4e8e4ca42e3ddfbc80609597788f90b214e7a81788c3b51b3%40%3Cdev.iotdb.apache.org%3E

[2]https://cwiki.apache.org/confluence/display/IOTDB/%5BImprovement+Proposal%5D+Add+Flink+Connector+Support+for+TsFile

@sunjincheng121 (Member)

Thanks for the PR @WeiZhong94 !
Sorry, I'm busy with the Flink Forward SF meeting. I'll review this PR as soon as possible next week!
BTW: could you please have a look at this PR @qiaojialin @jixuan1989 :)

@sunjincheng121 (Member)

@WeiZhong94 Could you please rebase the PR? I would like to have a look at it then.

@sunjincheng121 (Member) left a review:

Thanks for the PR @WeiZhong94 !
Overall it is pretty good! I only left a few comments in the PR, and one as follows:
Though the OutputFormat can be used in the DataStream API, it would be better if we supported writing TsFile via StreamingFileSink, which is integrated with the checkpointing mechanism to provide exactly-once semantics. Of course, we could do this in a follow-up PR.

What do you think?

```java
import java.util.stream.Collectors;

/**
 * The example of writing TsFile via Flink DataSet API.
```

Member: "writing TsFile" -> "writing to TsFile"?

```java
import java.util.stream.Collectors;

/**
 * The example of writing TsFile via Flink DataStream API.
```

Member: "writing TsFile" -> "writing to TsFile"?

Context:

    }
    ```

    ### TSRecordOutputFormat Example

Member: "Example of TSRecordOutputFormat"?

```java
import java.io.IOException;
import java.io.Serializable;

public interface TSRecordConverter<T> extends Serializable {
```

Member: Would be better to add JavaDoc?


```java
void open(Schema schema) throws IOException;

void covertAndCollect(T input, Collector<TSRecord> collector) throws IOException;
```

Member: Add JavaDoc? Add a semantic description of this method.


```java
void open(Schema schema) throws IOException;

void covertAndCollect(T input, Collector<TSRecord> collector) throws IOException;
```

Member: Regarding the method name covertAndCollect, thinking about it again, its semantics are not very clear. In TSRecordConverter the main goal of covertAndCollect is to convert the T to a TSRecord, so I would like to change the name from covertAndCollect to convert, which makes the semantics clearer. What do you think?

BTW: typo covert -> convert
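The naming discussion hinges on the collector-style signature: a converter may emit zero, one, or many records per input, so it pushes results into a collector rather than returning a single record. A minimal stand-alone sketch of that pattern in plain Java (all names here are simplified stand-ins for illustration, not the actual IoTDB/Flink interfaces):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-ins for TSRecord and Collector, for illustration only.
public class ConverterSketch {

    // Simplified record: one timestamped value.
    public static class Record {
        public final long timestamp;
        public final double value;

        public Record(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // The converter collects into a Consumer instead of returning a single
    // record, because one input may expand into several records.
    public interface RecordConverter<T> {
        void convertAndCollect(T input, Consumer<Record> collector);
    }

    // Example: a CSV row "timestamp,v1,v2,..." becomes one record per value.
    public static List<Record> convertCsvRow(String row) {
        List<Record> out = new ArrayList<>();
        RecordConverter<String> converter = (input, collector) -> {
            String[] parts = input.split(",");
            long ts = Long.parseLong(parts[0]);
            for (int i = 1; i < parts.length; i++) {
                collector.accept(new Record(ts, Double.parseDouble(parts[i])));
            }
        };
        converter.convertAndCollect(row, out::add);
        return out;
    }

    public static void main(String[] args) {
        // One input row expands to two records.
        System.out.println(convertCsvRow("1000,1.5,2.5").size()); // 2
    }
}
```

Whether the method is called convert or convertAndCollect, the one-to-many expansion is the reason a plain `T -> TSRecord` return type would not fit.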

```java
public TSRecordConverter<T> getConverter() {
    return converter;
}
```
(No newline at end of file)

Member: Please add an empty line at the end of the file.

```java
import java.net.URISyntaxException;
import java.util.Optional;

public abstract class TsFileOutputFormat<T> extends FileOutputFormat<T> {
```

Member: Add JavaDoc.

Comment on lines +110 to +115:

```java
Types.FLOAT,
Types.INT,
Types.INT,
Types.FLOAT,
Types.INT,
Types.INT
```

Member: Hi, should these be LONG?

```java
DataPoint templateDataPoint = templateRecord.dataPointList.get(dataPointIndexMapping[i]);
Object o = input.getField(i);
if (o != null) {
    Class typeClass = o.getClass();
```

Member: templateDataPoint.type could be used to switch on the data type.

@WeiZhong94 (Contributor, Author)

@sunjincheng121 @qiaojialin Thanks for your review! Sorry for the late reply; I have been really busy with work recently :(. I have addressed your comments, please take a look.

> Though the OutputFormat can be used in the DataStream API, it would be better if we supported writing TsFile via StreamingFileSink, which is integrated with the checkpointing mechanism to provide exactly-once semantics. Of course, we could do this in a follow-up PR.

Yes, StreamingFileSink is much better than OutputFormat for streaming jobs. To support writing TsFile via StreamingFileSink, we need to implement the Flink interface org.apache.flink.api.common.serialization.BulkWriter. The current blocker is that BulkWriter wraps the output target as an FSDataOutputStream object. It is possible to write TsFile data via FSDataOutputStream, but this needs further discussion, as FSDataOutputStream does not support the truncate method, which is required by the TsFileOutput interface.
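For background on why truncate matters here: it lets a writer roll a file back to a known-good length, discarding a partially written tail, which an append-only output stream cannot do. A minimal stdlib-only illustration of the operation (class and method names are hypothetical sketches, not IoTDB code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TruncateSketch {

    // Roll a file back to the given length, discarding a partially written
    // tail -- the capability TsFileOutput requires and an append-only
    // stream such as FSDataOutputStream cannot offer.
    public static long truncateTo(Path file, long length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.truncate(length);
        }
        return Files.size(file);
    }

    // Writes 16 bytes, rolls back to 8, and returns the final file size.
    public static long demo() throws IOException {
        Path tmp = Files.createTempFile("tsfile-sketch", ".bin");
        try {
            Files.write(tmp, new byte[16]); // pretend a chunk was partially written
            return truncateTo(tmp, 8);      // roll back to the last good offset
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // 8
    }
}
```

An exactly-once file sink needs exactly this kind of rollback on recovery, which is why the append-only FSDataOutputStream abstraction is the sticking point.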

@sunjincheng121 (Member)

Hi @WeiZhong94, I am fine with the current OutputFormat solution for now. And please have a look at the CI issue. Thanks.

@qiaojialin (Member) left a review:

Hi, thanks. There are only two data type errors in the README. As for the truncate method in TsFileOutput, I think it is OK not to support it; it's only used when restarting the IoTDB server.

@WeiZhong94 (Contributor, Author)

@sunjincheng121 @qiaojialin Thanks for your reply! I have corrected the README.md. It seems the Travis test failure is unrelated to this PR, so I just rebased the PR to trigger the test again; hope this works.

@qiaojialin merged commit fd62d4f into apache:master on May 6, 2020.

3 participants