Flink: refactor sink shuffling statistics collection #10331

stevenzwu · 2024-05-13T22:45:55Z

Refactor to support sketch statistics and automatic migration from Map stats to a reservoir sampling sketch when high cardinality is detected.

stevenzwu · 2024-05-14T00:28:22Z

Moved DataStatistics away from generics and now use a type to distinguish between Map and Sketch statistics. A couple of reasons:

The generics were getting a bit too complicated.
This supports auto migration/promotion of Map stats to Sketch if the cardinality is detected to be high. The default statistics type should be Auto, but if Auto doesn't work well in some cases, users can set the type to Map or Sketch explicitly.

Will add the sketch range partitioner in a separate PR following this one.

stevenzwu · 2024-05-14T00:39:31Z

flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java

-    this.dataStatistics = statisticsSerializer.createInstance();
-  }
+  private final StatisticsType type;
+  private final Map<SortKey, Long> keyFrequency;


Combining both Map and Sketch stats in the same aggregated statistics object allows a run-time switch from Map stats to Sketch.

stevenzwu · 2024-05-14T00:40:40Z

...link/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatisticsSerializer.java

+    if (record.type() == StatisticsType.Map) {
+      keyFrequencySerializer.serialize(record.keyFrequency(), target);
+    } else {
+      rangeBoundsSerializer.serialize(Arrays.asList(record.rangeBounds()), target);


Reused the list serializer from Flink, paying a small penalty for the array-to-list conversion.

stevenzwu · 2024-05-14T00:42:00Z

...9/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatisticsTracker.java

+  }
+
+  @SuppressWarnings("unchecked")
+  private void merge(DataStatistics taskStatistics) {


This method shows the stats type migration from Map to Sketch.

stevenzwu · 2024-05-14T00:44:18Z

.../v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java

+
+    if (localStatistics.type() == StatisticsType.Map) {
+      Map<SortKey, Long> mapStatistics = (Map<SortKey, Long>) localStatistics.result();
+      if (statisticsType == StatisticsType.Auto


This is the stats migration (Map -> Sketch) on the operator side during the collection phase.

With AUTO, if any task or the coordinator decides to move to sketch, it might be a good idea for everyone to move to sketch to save memory and transformations.
Do we want an extra message in this case, or at least to switch when global statistics arrive that have already been switched to sketch?

Good question.

Each operator makes an independent decision on switching from Map to Sketch during the local collection phase.

When operators receive the global statistics from the coordinator, they should also check whether a type switch is needed, but it looks like I missed this logic. Will add.
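The two switch triggers discussed here (a local decision, and following the coordinator's global statistics) could be combined roughly as below. This is only a sketch under assumptions: the class name `CollectTypeSketch`, the three-argument signature, and the "never demote from Sketch back to Map" rule are illustrative and are not taken from the PR's `StatisticsUtil.collectType`.

```java
// Hypothetical illustration of how an operator might pick its collection type.
final class CollectTypeSketch {
  enum Type { Auto, Map, Sketch }

  private CollectTypeSketch() {}

  /**
   * @param configured type from user configuration
   * @param globalType type of the latest global statistics, or null if none received yet
   * @param localType type the operator is currently collecting with
   */
  static Type collectType(Type configured, Type globalType, Type localType) {
    // An explicit Map or Sketch setting is always honored.
    if (configured != Type.Auto) {
      return configured;
    }
    // With Auto, once either side has moved to Sketch, stay on Sketch (assumed rule).
    if (globalType == Type.Sketch || localType == Type.Sketch) {
      return Type.Sketch;
    }
    return Type.Map;
  }
}
```

With this shape, an operator that is still on Map would move to Sketch as soon as it sees Sketch-typed global statistics, without needing an extra message.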

stevenzwu · 2024-05-14T00:44:53Z

...1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsSerializer.java

+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+
+@Internal
+class DataStatisticsSerializer extends TypeSerializer<DataStatistics> {


This single serializer can handle both the Map and Sketch stats types.

stevenzwu · 2024-05-14T00:47:04Z

flink/v1.19/build.gradle

@@ -66,6 +66,8 @@ project(":iceberg-flink:iceberg-flink-${flinkMajorVersion}") {
      exclude group: 'org.slf4j'
    }

+    implementation libs.datasketches


The jar file is about 1 MB, so it is not too big to include.

pvary · 2024-05-14T08:07:05Z

...9/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatisticsTracker.java

+        SketchUtil.convertMapToSketch(taskMapStats, taskSketch::update);
+        coordinatorSketchStatistics.update(taskSketch);


I'm wondering which is better:

1. Getting a map from the task -> converting the task map to a sketch -> merging the coordinator sketch with the task sketch

2. Updating the coordinator sketch by adding the values from the map directly

Which one performs better? Which one gives a better approximation in the resulting sketch?

If we consciously use the 1st solution, then we probably want to send a message to the tasks when we switch to sketch, so that they send just the sketch instead of the whole map (it might be a smaller message).

Yes. Once the coordinator has switched to sketch, all operators will switch too upon receiving the global statistics.

yegangy0718 · 2024-05-20T05:56:56Z

flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatistics.java

+    this.checkpointId = checkpointId;
+    this.type = type;
+    this.keyFrequency = keyFrequency;
+    this.rangeBounds = rangeBounds;


Do we want to add a check here to make sure keyFrequency and rangeBounds are never both set at the same time?
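One way such a check could look is sketched below. The class `AggregatedStatsExample`, the enum `StatsKind`, and the use of `String` in place of `SortKey` are all stand-ins invented for illustration, and plain `IllegalArgumentException` is used instead of whatever precondition helper the PR would use.

```java
import java.util.Map;

// Hypothetical stand-in for the PR's StatisticsType; only the two concrete kinds are modeled.
enum StatsKind { Map, Sketch }

final class AggregatedStatsExample {
  private final long checkpointId;
  private final StatsKind type;
  private final Map<String, Long> keyFrequency; // expected only when type is Map
  private final String[] rangeBounds; // expected only when type is Sketch

  AggregatedStatsExample(
      long checkpointId, StatsKind type, Map<String, Long> keyFrequency, String[] rangeBounds) {
    if (type == null) {
      throw new IllegalArgumentException("Invalid type: null");
    }
    // Exactly one payload may be present, and it must match the declared type.
    boolean valid =
        type == StatsKind.Map
            ? keyFrequency != null && rangeBounds == null
            : rangeBounds != null && keyFrequency == null;
    if (!valid) {
      throw new IllegalArgumentException(
          "Exactly one of keyFrequency and rangeBounds must be set, matching type " + type);
    }
    this.checkpointId = checkpointId;
    this.type = type;
    this.keyFrequency = keyFrequency;
    this.rangeBounds = rangeBounds;
  }

  long checkpointId() {
    return checkpointId;
  }

  StatsKind type() {
    return type;
  }
}
```

Failing fast in the constructor keeps the serializer's `if (type == Map) ... else ...` branch safe, since it can then assume the matching field is non-null.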

yegangy0718 · 2024-05-26T05:10:41Z

...9/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatisticsTracker.java

  private final int parallelism;
+  private final TypeSerializer<DataStatistics> statisticsSerializer;
+  private final int downstreamParallelism;
+  private final StatisticsType statisticsType;


The statisticsType field is not being used.

Good catch. The variable is needed but was missed, as you pointed out in the other comment below on the coordinator migration from Map to Sketch.

yegangy0718 · 2024-05-26T05:54:59Z

...9/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatisticsTracker.java

+      Map<SortKey, Long> taskMapStats = (Map<SortKey, Long>) taskStatistics.result();
+      if (coordinatorStatisticsType == StatisticsType.Map) {
+        taskMapStats.forEach((key, count) -> coordinatorMapStatistics.merge(key, count, Long::sum));
+        if (coordinatorMapStatistics.size() > switchToSketchThreshold) {


So for the coordinator, unlike the operator, which needs to check whether StatisticsType is Auto, we convert from map to sketch as soon as the size reaches the threshold?

Good catch. It is related to your other comment that statisticsType was not used. It should be used and checked here.
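The fix agreed on here, checking the configured type before promoting on the coordinator, might look roughly like this. It is a simplified sketch: `CoordinatorStatsSketch` is a hypothetical class, keys are `String` rather than `SortKey`, and the actual conversion of the accumulated map into a sketch is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the coordinator-side merge of task Map statistics.
final class CoordinatorStatsSketch {
  enum Type { Auto, Map, Sketch }

  private final Type configuredType;
  private final int switchToSketchThreshold;
  private final Map<String, Long> mapStatistics = new HashMap<>();
  private Type currentType;

  CoordinatorStatsSketch(Type configuredType, int switchToSketchThreshold) {
    this.configuredType = configuredType;
    this.switchToSketchThreshold = switchToSketchThreshold;
    // Auto starts out collecting with Map.
    this.currentType = configuredType == Type.Sketch ? Type.Sketch : Type.Map;
  }

  void mergeTaskMap(Map<String, Long> taskMapStats) {
    taskMapStats.forEach((key, count) -> mapStatistics.merge(key, count, Long::sum));
    // Promote only when the user left the type as Auto; an explicit Map stays Map.
    if (configuredType == Type.Auto
        && currentType == Type.Map
        && mapStatistics.size() > switchToSketchThreshold) {
      currentType = Type.Sketch; // converting mapStatistics into a sketch is omitted here
    }
  }

  Type currentType() {
    return currentType;
  }

  Map<String, Long> mapStatistics() {
    return mapStatistics;
  }
}
```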

yegangy0718 · 2024-05-26T06:37:18Z

....19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsCoordinator.java

+          StatisticsEvent statisticsEvent =
+              StatisticsEvent.createAggregatedStatisticsEvent(
+                  checkpointId, globalStatistics, aggregatedStatisticsSerializer);
+          for (int i = 0; i < context.currentParallelism(); ++i) {


We have a function #parallelism at line 187 to get the current parallelism. Do we want to remove that function?

Good catch. Let me remove the parallelism() method, as it is too trivial to keep.

yegangy0718 · 2024-05-26T07:01:10Z

.../v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java

    }
+
+    this.taskStatisticsType = StatisticsUtil.collectType(statisticsType, globalStatistics);


Does the Iceberg repo follow the convention that we always use this. to refer to class variables? If so, let's update globalStatistics to this.globalStatistics, like we do on line 113.

Same comment for lines 124 and 125.

Iceberg style only uses this. when modifying the value/state (as in a setter or constructor).

yegangy0718 · 2024-05-26T07:26:16Z

flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SketchUtil.java

+  }
+
+  /**
+   * To understand how range bounds are used in range partitioning, heere is an example for human


Typo: heere should be here.

yegangy0718 · 2024-05-27T06:05:13Z

flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SketchUtil.java

+   * <li>Target size is "coordinator reservoir size * over sampling ration (10) / operator
+   *     parallelism"
+   * <li>Min is 1K to achieve good accuracy while memory footprint is still relatively small
+   * <li>Max is 100K to cap the memory footprint on coordinator


In the current implementation, the operator reservoir size depends completely on the coordinator reservoir size. Do we check the operator reservoir min and max values?

I feel it is not needed. The coordinator reservoir size is already correlated with parallelism/partitions. The operator reservoir size can probably be tied purely to OPERATOR_OVER_SAMPLE_RATIO.
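For reference, the formula as the Javadoc above states it, including the 1K/100K bounds it mentions, can be written out as follows. This is a sketch of the Javadoc's arithmetic only; the class and method names are made up, and whether the final code keeps the clamp is exactly what this thread is questioning.

```java
// Hypothetical helper that mirrors the arithmetic described in the Javadoc.
final class ReservoirSizeSketch {
  static final int OPERATOR_OVER_SAMPLE_RATIO = 10;
  static final int MIN_RESERVOIR_SIZE = 1_000;
  static final int MAX_RESERVOIR_SIZE = 100_000;

  private ReservoirSizeSketch() {}

  static int operatorReservoirSize(int coordinatorReservoirSize, int operatorParallelism) {
    if (operatorParallelism <= 0) {
      throw new IllegalArgumentException("Invalid parallelism: " + operatorParallelism);
    }
    // Use long arithmetic so the multiplication cannot overflow before the division.
    long target =
        (long) coordinatorReservoirSize * OPERATOR_OVER_SAMPLE_RATIO / operatorParallelism;
    // Clamp to [1K, 100K] as described in the Javadoc.
    return (int) Math.max(MIN_RESERVOIR_SIZE, Math.min(MAX_RESERVOIR_SIZE, target));
  }
}
```

For example, a coordinator reservoir of 10,000 with operator parallelism 10 gives a target of 10,000, which falls inside the bounds and is returned unchanged.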

yegangy0718 · 2024-05-27T06:30:41Z

flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SketchUtil.java

+    taskMapStats.forEach(
+        (sortKey, count) -> {
+          for (int i = 0; i < count; ++i) {
+            sketchConsumer.accept(sortKey);


Should we consider executing sketchConsumer.accept in parallel to make it faster?

Thought about it, but it would require a thread pool. Let's start simple. If this turns out to be an issue later, we can improve it then.
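To make the cost being discussed concrete, here is a self-contained sketch of the sequential conversion: each key is replayed into the sketch once per counted occurrence. The `Reservoir` class is a plain Algorithm R sampler written for this example and only stands in for the DataSketches reservoir sketch used in the PR; `String` keys replace `SortKey`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Consumer;

final class MapToSketchExample {
  private MapToSketchExample() {}

  /** Simple reservoir sampler (Algorithm R); a stand-in, not the DataSketches API. */
  static final class Reservoir<T> {
    private final int capacity;
    private final Random random;
    private final List<T> samples = new ArrayList<>();
    private long seen = 0;

    Reservoir(int capacity, long seed) {
      this.capacity = capacity;
      this.random = new Random(seed);
    }

    void update(T item) {
      seen++;
      if (samples.size() < capacity) {
        samples.add(item);
      } else {
        // Pick a slot uniformly in [0, seen); replace only if it falls inside the reservoir.
        long slot = (long) (random.nextDouble() * seen);
        if (slot < capacity) {
          samples.set((int) slot, item);
        }
      }
    }

    List<T> samples() {
      return samples;
    }

    long seen() {
      return seen;
    }
  }

  /** Replays every key 'count' times, so the work is proportional to the total row count. */
  static <K> void convertMapToSketch(Map<K, Long> stats, Consumer<K> sketchConsumer) {
    stats.forEach(
        (key, count) -> {
          for (long i = 0; i < count; ++i) {
            sketchConsumer.accept(key);
          }
        });
  }
}
```

Usage follows the shape in the diff: `MapToSketchExample.convertMapToSketch(taskMapStats, reservoir::update)`. Because the loop runs once per counted row rather than once per key, a one-time conversion of a large map is where the parallelism question would matter.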

pvary · 2024-06-04T19:50:48Z

...9/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatisticsTracker.java

+  private final Comparator<StructLike> comparator;
+  private final NavigableMap<Long, Aggregation> aggregationsPerCheckpoint;
+
+  private volatile AggregatedStatistics completedStatistics;


How does the thread model work for the event handling? Do we need the volatile here?

Good question. We don't really need volatile here, as coordinator event handling is always single-threaded. Let me remove the volatile.

this.coordinatorExecutor = Executors.newSingleThreadExecutor(coordinatorThreadFactory);
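The reasoning can be illustrated with a small standalone example: when every read and write of a field happens in tasks on the same single-thread executor, those tasks run one after another on one thread, so the field needs no `volatile`. The class below is a hypothetical illustration and not the coordinator's code; only `Executors.newSingleThreadExecutor` comes from the line quoted above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example of state confined to a single-thread executor.
final class SingleThreadStateExample {
  private final ExecutorService coordinatorExecutor = Executors.newSingleThreadExecutor();

  // Not volatile: only tasks running on coordinatorExecutor touch this field.
  private long completedCheckpointId = -1;

  /** Called from any thread; the mutation itself runs on the coordinator thread. */
  void handleEvent(long checkpointId) {
    coordinatorExecutor.execute(
        () -> completedCheckpointId = Math.max(completedCheckpointId, checkpointId));
  }

  /** Reads the field on the coordinator thread and hands the value back to the caller. */
  long readOnCoordinatorThread() throws Exception {
    return coordinatorExecutor.submit(() -> completedCheckpointId).get();
  }

  void close() {
    coordinatorExecutor.shutdown();
  }
}
```

The guarantee only holds while the field stays confined to that executor. Reading it directly from another thread would bring the visibility question back.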

pvary

Just one small question from my side

stevenzwu · 2024-06-05T17:43:17Z

Thanks @pvary and @yegangy0718 for the code review.

github-actions bot added flink build labels May 13, 2024

stevenzwu force-pushed the refactor-stats branch 2 times, most recently from b97bf41 to 0882cb8 Compare May 14, 2024 00:08

stevenzwu requested a review from pvary May 14, 2024 00:28

stevenzwu force-pushed the refactor-stats branch from 6af1b2a to a4e5e4a Compare May 14, 2024 00:29

stevenzwu added this to In Progress in [Priority 2] Flink: add more shuffling support for streaming writer May 14, 2024

stevenzwu added 6 commits June 3, 2024 14:33

Flink: refactor sink shuffling statistics collection to support sketc…

825a518

…h statistics and auto migration from Map stats to reservoir sampling sketch if cardinality is detected high

add operator stats migration upon receiving global stats. also added …

0aa40c9

…unit test to cover the 2 scenarios of operator stats migrations

make MapDataStatistics and SketchDataStatistics non-public

a86917b

fix copy-and-paste error in Javadoc as Peter pointed out

62b7740

add more tests to TestSketchUtil

9727389

removed some code from SketchUtil

fa096f2

stevenzwu force-pushed the refactor-stats branch from dda7e5a to 4caf53b Compare June 3, 2024 22:42

stevenzwu added 2 commits June 3, 2024 16:31

address Gang's comments

f4fb0b5

remove unneeded Javadoc

b8edbd5

stevenzwu force-pushed the refactor-stats branch from 4caf53b to b8edbd5 Compare June 3, 2024 23:31

stevenzwu mentioned this pull request Jun 4, 2024

Flink: handle rescale (up or down) better in range partitioner #10441

Open

Address Peter's concern on dropping partial results by handling concu…

415f651

…rrent checkpoints properly.

pvary reviewed Jun 4, 2024

pvary approved these changes Jun 4, 2024

remove volatile

ad5452a

stevenzwu merged commit cbe391d into apache:main Jun 5, 2024
41 checks passed

stevenzwu deleted the refactor-stats branch June 5, 2024 17:01
