Flink: infer source parallelism for FLIP-27 source in batch execution mode #10832
Conversation
Force-pushed from e6c3a59 to de63e1d.
"testBasicRead", | ||
TypeInformation.of(RowData.class)) | ||
sourceBuilder | ||
.buildStream(env) |
Switched this class to test the new buildStream API.
Do we have tests remaining to check the fromSource API for bounded sources?
Yes, I will only switch this class.
Field privateField =
    MiniClusterExtension.class.getDeclaredField("internalMiniClusterExtension");
privateField.setAccessible(true);
InternalMiniClusterExtension internalExtension =
Used reflection to retrieve the InternalMiniClusterExtension and get the MiniCluster, in order to fetch the execution graph and verify the source parallelism.
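For readers following along, a sketch of what this reflection hack looks like end to end; the names extension, jobId, and expected are placeholders for whatever the test actually uses, and the Flink test-utils classes are assumed to be on the classpath:

import java.lang.reflect.Field;
import org.apache.flink.api.common.JobID;
import org.apache.flink.runtime.executiongraph.AccessExecutionGraph;
import org.apache.flink.runtime.executiongraph.AccessExecutionJobVertex;
import org.apache.flink.runtime.minicluster.MiniCluster;
import org.apache.flink.runtime.testutils.InternalMiniClusterExtension;
import org.apache.flink.test.junit5.MiniClusterExtension;
import static org.assertj.core.api.Assertions.assertThat;

private static void assertSourceParallelism(
    MiniClusterExtension extension, JobID jobId, int expected) throws Exception {
  // MiniClusterExtension does not expose the MiniCluster, so reach into its private field
  Field field = MiniClusterExtension.class.getDeclaredField("internalMiniClusterExtension");
  field.setAccessible(true);
  MiniCluster miniCluster =
      ((InternalMiniClusterExtension) field.get(extension)).getMiniCluster();
  // each job vertex in the execution graph carries its resolved (possibly inferred) parallelism
  AccessExecutionGraph graph = miniCluster.getExecutionGraph(jobId).get();
  for (AccessExecutionJobVertex vertex : graph.getAllVertices().values()) {
    if (vertex.getName().contains("IcebergSource")) {
      assertThat(vertex.getParallelism()).isEqualTo(expected);
    }
  }
}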
If you call env.getTransformations().get(0).getParallelism() before env.executeAsync(), then you could get the parallelism. Would this help?
I just tried it with a debugger. The value is the default parallelism of 4, while the expected inferred source parallelism is 1 after executeAsync().

DataStream<Row> dataStream =
    IcebergSource.forRowData()
        .tableLoader(CATALOG_EXTENSION.tableLoader())
        .table(table)
        .flinkConfig(config)
        // force one file per split
        .splitSize(1L)
        .buildStream(env)
        .map(new RowDataToRowMapper(FlinkSchemaUtil.convert(table.schema())));

int sourceParallelism = env.getTransformations().get(0).getParallelism();
 *
 * @return data stream from the Iceberg source
 */
public DataStream<T> buildStream(StreamExecutionEnvironment env) {
This is a new public API. I also thought about naming the method createStream, but decided on this name for now; open to other suggestions. I also think it is better to require the StreamExecutionEnvironment here, instead of making it a builder method, so that it is clear it is not required for the build() method.
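To make the trade-off concrete, a rough side-by-side of the two entry points, assuming a TableLoader named loader and a StreamExecutionEnvironment named env in scope:

// existing wiring: build() returns the Source and the caller owns env.fromSource(...)
DataStream<RowData> viaFromSource =
    env.fromSource(
        IcebergSource.forRowData().tableLoader(loader).build(),
        WatermarkStrategy.noWatermarks(),
        "IcebergSource",
        TypeInformation.of(RowData.class));

// new API: the environment is handed over at the end of the builder chain,
// which lets the builder attach the inferred parallelism in batch mode
DataStream<RowData> viaBuildStream =
    IcebergSource.forRowData().tableLoader(loader).buildStream(env);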
The sentence about env is also true for outputTypeInfo and watermarkStrategy. Maybe adding them as parameters would be reasonable too.
Currently, outputTypeInfo can be inferred from the ReaderFunction if using the provided RowData or Avro reader:

if (outputTypeInfo == null) {
  this.outputTypeInfo = inferOutputTypeInfo(table, context, readerFunction);
}

watermarkStrategy defaults to WatermarkStrategy.noWatermarks(), so it is not mandatory either.
outputTypeInfo is not needed anymore with the Converter interface. Removed the watermark strategy for now; we can always add it back in the future if it is needed.
Force-pushed from 72ea6fb to 3ead397.
-    if (getRuntimeContext().getTaskInfo().getAttemptNumber() <= 0) {
+    // Simulate slow subtask 0 with attempt 0
+    TaskInfo taskInfo = getRuntimeContext().getTaskInfo();
+    if (taskInfo.getIndexOfThisSubtask() == 0 && taskInfo.getAttemptNumber() <= 0) {
After this change of inferring source parallelism, this test would hang (even with the infer-parallelism flag turned off). It seems that speculative execution somehow won't kick in. This line of change seems to fix the problem, however (I tried a local run 50 times without a failure). Not sure exactly why; regardless of the reason, this seems like a good change anyway.
@pvary @venkata91 @becketqin let me know if you have any idea.
Interesting. Thanks for tagging me. Do you mean adding taskInfo.getIndexOfThisSubtask() == 0 solves the hang issue?
@venkata91 that is correct.
new Configuration()
    // disable classloader check as Avro may cache class/object in the serializers.
    .set(CoreOptions.CHECK_LEAKED_CLASSLOADER, false)
    // disable inferring source parallelism
Inferred parallelism might mess up the watermark and record ordering comparison; disable it to avoid the flakiness.
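For context, disabling the inference in a test configuration would look roughly like this; the option constant comes from Iceberg's FlinkConfigOptions, and wiring it exactly this way here is my assumption rather than a quote of the diff:

Configuration config =
    new Configuration()
        // disable classloader check as Avro may cache class/object in the serializers.
        .set(CoreOptions.CHECK_LEAKED_CLASSLOADER, false)
        // disable inferring source parallelism (assumed constant name)
        .set(FlinkConfigOptions.TABLE_EXEC_ICEBERG_INFER_SOURCE_PARALLELISM, false);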
Why is this so?
Do you have any idea? Would it be an issue in prod?
Good question. Let me dig more.
TestIcebergSourceSql assumes the parallelism is 1 for testWatermarkOptionsAscending and testWatermarkOptionsDescending. The table has 2 files. The test just checks that split assignment is ordered with a single reader:

tableEnvironment.getConfig().set("table.exec.resource.default-parallelism", "1");

I think TestIcebergSourceWithWatermarkExtractor similarly assumes a parallelism of 4. Inferring parallelism would change the source parallelism to the number of splits and could interfere with the assertion on the ordering of the read records.
Maybe explicitly setting the parallelism in those tests would be better.
WDYT?
Actually, we don't need the disabling for this particular test, as it doesn't go through the buildStream(env) path where infer-parallelism happens. Will revert the change.
Force-pushed from 1035dad to aaf3765.
 * Optional. Default is no watermark strategy. Only relevant if using the {@link
 * Builder#buildStream(StreamExecutionEnvironment)}.
 */
public Builder<T> watermarkStrategy(WatermarkStrategy<T> newStrategy) {
Is there a way to provide a useful WatermarkStrategy? I think it is possible to provide a useful TimestampAssigner, but I don't see how someone could provide a useful WatermarkGenerator. The only possible way to generate watermarks is when the watermarkColumn is provided, but even then the WatermarkGenerator should not be used. Am I missing something?
This is only for the buildStream method; it just moves the watermark strategy from env.fromSource to the builder. For the regular build method, users would still need to set the watermark strategy, which most likely would be none:

DataStream<RowData> stream =
    env.fromSource(
        sourceBuilder().build(),
        WatermarkStrategy.noWatermarks(),
        "IcebergSource",
        TypeInformation.of(RowData.class));
Will remove WatermarkStrategy from here. We can always add another buildStream overload with a new arg in the future if there is really a need.
@@ -503,28 +569,10 @@ public IcebergSource<T> build() {
          new OrderedSplitAssignerFactory(SplitComparators.watermark(watermarkExtractor));
        }

-      ScanContext context = contextBuilder.build();
+      this.context = contextBuilder.build();
This is ugly as hell, a side effect of the build method... Maybe create an init() with all the side effects, and then a build() which uses the initialized attributes?
Agree that the side effect is undesirable. Let me think of a way to refactor the code; maybe extract the ScanContext building from build() into a separate method, along the lines of the sketch below.
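A rough shape of that refactor; buildScanContext and buildWithContext are invented names for illustration, not code from this PR:

public IcebergSource<T> build() {
  // the context is built once and threaded through explicitly,
  // so build() no longer mutates builder fields as a side effect
  ScanContext context = buildScanContext();
  return buildWithContext(context);
}

private ScanContext buildScanContext() {
  return contextBuilder.build();
}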
The Converter interface will make this problem obsolete.
Force-pushed from be4ce67 to 09cb0eb.
Force-pushed from 32310d4 to 3aabbad.
Force-pushed from 3aabbad to f86449a.
Thanks @pvary for the review.
* main: (208 commits)
  Docs: Fix Flink 1.20 support versions (apache#11065)
  Flink: Fix compile warning (apache#11072)
  Docs: Initial committer guidelines and requirements for merging (apache#10780)
  Core: Refactor ZOrderByteUtils (apache#10624)
  API: implement types timestamp_ns and timestamptz_ns (apache#9008)
  Build: Bump com.google.errorprone:error_prone_annotations (apache#11055)
  Build: Bump mkdocs-material from 9.5.33 to 9.5.34 (apache#11062)
  Flink: Backport PR apache#10526 to v1.18 and v1.20 (apache#11018)
  Kafka Connect: Disable publish tasks in runtime project (apache#11032)
  Flink: add unit tests for range distribution on bucket partition column (apache#11033)
  Spark 3.5: Use FileGenerationUtil in PlanningBenchmark (apache#11027)
  Core: Add benchmark for appending files (apache#11029)
  Build: Ignore benchmark output folders across all modules (apache#11030)
  Spec: Add RemovePartitionSpecsUpdate REST update type (apache#10846)
  Docs: bump latest version to 1.6.1 (apache#11036)
  OpenAPI, Build: Apply spotless to testFixtures source code (apache#11024)
  Core: Generate realistic bounds in benchmarks (apache#11022)
  Add REST Compatibility Kit (apache#10908)
  Flink: backport PR apache#10832 of inferring parallelism in FLIP-27 source (apache#11009)
  Docs: Add Druid docs url to sidebar (apache#10997)
  ...
flinkConf,
scanContext.limit(),
() -> {
  List<IcebergSourceSplit> splits = planSplitsForBatch(planningThreadName());
I am not sure whether it is intentional to modify the split-list instance field in planSplitsForBatch.
It is intentional. See the comment for the method. We cache it as it will be reused later.
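For readers following the thread, the memoization pattern under discussion looks roughly like this; planSplits is a stand-in for the actual planning call, and the body is a paraphrased sketch rather than the code in this PR:

private List<IcebergSourceSplit> batchSplits;

private List<IcebergSourceSplit> planSplitsForBatch(String threadName) {
  if (batchSplits != null) {
    // splits were already planned while inferring the source parallelism; reuse them
    return batchSplits;
  }

  // expensive scan planning runs only once and the result is cached on the field
  this.batchSplits = planSplits(threadName);
  return batchSplits;
}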