
Replacing SortBy with custom partitioner #245

Merged
1 commit merged into apache:master on Sep 9, 2017

Conversation

@ovj (Contributor) commented on Aug 16, 2017:

Replacing the bulkInsert sortBy with a custom partitioner such that:

  • A single output Spark partition doesn't have records from more than one hoodie partition.
  • We try to fit as many records with the same hoodie partition as we can from the same input Spark partition into an output Spark partition (a rough sketch of such a partitioner follows this list).

cc @vinothchandar, @prazanna, @n3nash, @jianxu
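
(For illustration only: a minimal sketch of a Spark Partitioner that satisfies the two constraints above. Class, field, and key names are hypothetical, not the PR's actual code; the idea is that each hoodie partition path gets its own contiguous range of output buckets, and the bucket within that range is picked from the input Spark partition id.)

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.spark.Partitioner;
import scala.Tuple2;

public class HoodieAwarePartitioner extends Partitioner {

  // hoodie partition path -> {start bucket, number of buckets}
  private final Map<String, int[]> bucketRanges = new HashMap<>();
  private final int totalBuckets;

  public HoodieAwarePartitioner(List<String> hoodiePartitionPaths, int bucketsPerHoodiePartition) {
    int start = 0;
    for (String path : hoodiePartitionPaths) {
      bucketRanges.put(path, new int[]{start, bucketsPerHoodiePartition});
      start += bucketsPerHoodiePartition;
    }
    this.totalBuckets = start;
  }

  @Override
  public int numPartitions() {
    return totalBuckets;
  }

  @Override
  public int getPartition(Object key) {
    // Key is assumed to be (hoodiePartitionPath, inputSparkPartitionId).
    @SuppressWarnings("unchecked")
    Tuple2<String, Integer> k = (Tuple2<String, Integer>) key;
    int[] range = bucketRanges.get(k._1());
    // Stay inside this hoodie partition's bucket range, so no output partition mixes
    // records from two hoodie partitions; using the input partition id keeps records
    // that arrived together in the same output partition where possible.
    return range[0] + Math.floorMod(k._2(), range[1]);
  }
}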

@vinothchandar (Member):
Can you make the partitioner pluggable? Also, could you share the performance analysis you did, since that's a core aspect of this PR.

@vinothchandar (Member):
And please merge both commits together.

*/
public class BulkInsertPartitioner<T extends HoodieRecordPayload> extends Partitioner {
private static Logger logger = LogManager.getLogger(BulkInsertPartitioner.class);
private static final int BUCKET_MULTIPLIER = 4;
Member:

javadoc/comments

Member:

What's special about 4 that makes it work for all workloads?

private HoodiePartitionMapper hoodiePartitionMapper;
private int numOfPartitions = -1;

private BulkInsertPartitioner(int outputSparkPartitions) {
Member:

Please keep parity between the constructor argument and member variable naming.

Contributor Author:

Removed the constructor once I added the partitioner as an argument. I will have to init it via the "repartitionRecords" method. Can you please look at it once again?


// As we are going over input records twice; we need to cache input records.
JavaRDD<HoodieRecord<T>> cachedRecords = records.persist(StorageLevel.MEMORY_AND_DISK_SER());
BulkInsertPartitioner<T> bulkInsertPartitioner = new BulkInsertPartitioner<>(outputSparkPartitions);
bulkInsertPartitioner.init(records);
Member:

Let's pass the records into the constructor? Is there a specific reason for the init method?

Contributor Author:

Once we pass the Partitioner as an argument, we no longer control object creation in all cases, so the init method is needed. Please see the updated PR.


return cachedRecords.mapPartitionsWithIndex(
(partitionNumber, recordIterator) ->
new Iterator<Tuple2<Tuple2<HoodieRecord, Integer>, Integer>>() {
Member:

Please rework this using the LazyIterableIterator

Contributor Author:

will do.
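
(Aside, for readers unfamiliar with the pattern being discussed: mapPartitionsWithIndex hands the function the input Spark partition number alongside the record iterator, which is what allows keying records by (hoodie partition, input partition). Below is a generic, self-contained sketch, not the PR's code; note that buffering into a list, as done here, materializes the whole partition, which is exactly what a lazy wrapper such as LazyIterableIterator avoids.)

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class TagWithPartitionId {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext("local[2]", "tag-example");
    JavaRDD<String> records = jsc.parallelize(Arrays.asList("a", "b", "c", "d"), 2);

    // The boolean argument preserves the parent RDD's partitioning.
    JavaRDD<Tuple2<String, Integer>> tagged = records.mapPartitionsWithIndex(
        (partitionId, recordIterator) -> {
          // Eagerly buffering the partition like this is what a lazy iterator avoids.
          List<Tuple2<String, Integer>> out = new ArrayList<>();
          while (recordIterator.hasNext()) {
            out.add(new Tuple2<>(recordIterator.next(), partitionId));
          }
          return out.iterator();
        }, true);

    tagged.collect().forEach(t -> System.out.println(t._1() + " came from input partition " + t._2()));
    jsc.stop();
  }
}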

/**
* Helper class for counting.
*/
private class Counter implements Serializable {
Member:

please replace this with a standard AtomicLong

Member:

or similar..

Contributor Author:

Will do
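
(For context, the standard replacement suggested above is the JDK's java.util.concurrent.atomic.AtomicLong, which is already Serializable; a trivial sketch, not taken from the PR:)

import java.util.concurrent.atomic.AtomicLong;

public class CounterSketch {
  public static void main(String[] args) {
    // AtomicLong is Serializable and thread-safe, so it can stand in for a hand-rolled counter.
    AtomicLong recordsSeen = new AtomicLong(0);
    recordsSeen.incrementAndGet();   // e.g. once per record emitted
    recordsSeen.incrementAndGet();
    System.out.println(recordsSeen.get());  // prints 2
  }
}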

@@ -0,0 +1,16 @@
package com.uber.hoodie.table;
Member:

License header is missing. Please ensure things build locally before pushing to the PR.

Contributor Author:

Sorry, my bad. I was running it locally with skip.rat=true.

@vinothchandar (Member) left a review comment:

Reviewing still

public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> records, int outputSparkPartitions) {
init(records, outputSparkPartitions);
// As we are going over input records twice; we need to cache input records.
JavaRDD<HoodieRecord<T>> cachedRecords = records.persist(StorageLevel.MEMORY_AND_DISK_SER());
Member:

The bulkInsert() API targets full loads with multi-TB input. Are you proposing to cache even in those cases? The spilling here (likely to happen) will cause significant extra IO. Have you tested across such workloads?

Contributor Author:

No, I have not tested it for multi-TB input. The maximum I have tested is 800 GB of input.

Member:

Spark does reservoir sampling to handle large inputs such as these
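
(Background on this point, not from the PR: sortBy builds a RangePartitioner, which estimates range boundaries by sampling each input partition rather than reading everything up front. A minimal, standalone illustration of the sortBy call shape:)

import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SortByExample {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext("local[2]", "sortby-example");
    JavaRDD<Integer> nums = jsc.parallelize(Arrays.asList(5, 3, 9, 1, 7), 2);
    // sortBy(keyFunc, ascending, numPartitions): range boundaries are estimated by sampling the input.
    JavaRDD<Integer> sorted = nums.sortBy(x -> x, true, 2);
    System.out.println(sorted.collect());  // [1, 3, 5, 7, 9]
    jsc.stop();
  }
}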

Contributor Author:

The reason we are caching (MEMORY + DISK) is that we read the data twice, so caching helps even for large datasets.

For sortBy, I see that the initial boundary computation may not need Spark to read the complete data, but to later sort the records it will have to read the entire data (per partition/executor), right? Am I missing anything here?

I have collected results for a 64 GB dataset with 32 executors and 16 hoodie writers (insertParallelism). This is roughly equivalent to 1 TB with 512 executors.

With SortBy:
[screenshot: Spark UI, 2017-08-20]

With the custom partitioner:
[screenshot: Spark UI, 2017-08-20]

Member:

If I grok this correctly, it's ~32 min (27 + 3.2 + 1.7) vs ~20 min (11 + 6.4 + 2). The puzzling thing is that the difference mostly seems to come from RDDWrapper.java. What is this piece of code?

Also, overall I think a lot of what you are trying to accomplish is already in the UpsertPartitioner, so I'm not sure we need a new partitioner for this. Happy to chat more f2f if needed. Still not fully sold.

Contributor Author:

RDDWrapper is actually just waiting for Hoodie to finish writing. This code is exactly the same for both runs.

Yes, you are right. Overall it is very close to UpsertPartitioner, but there are fundamental differences. It's better if we chat f2f. :) Will ping you.

Member:

Can you compare this with simply doing .insert() with the necessary tuning? I'd really like to understand the fundamental difference in approach here before adding a new partitioner.

Member:

Sounds good, let's f2f.

@ovj (Contributor Author) commented on Aug 18, 2017:

"What's special about 4 that makes it work for all workloads?" >> Nothing special; I just wanted to ensure that the number of buckets is some multiple of the number of output partitions.
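
(A purely illustrative reading of that multiplier, not the PR's exact mapping: the hypothetical numbers below just show how a bucket count that is a multiple of the output partition count folds cleanly back onto the partitions.)

public class BucketMultiplierExample {
  public static void main(String[] args) {
    int outputSparkPartitions = 16;
    int bucketMultiplier = 4;                                   // the constant questioned above
    int numBuckets = outputSparkPartitions * bucketMultiplier;  // 64 buckets
    int someBucket = 37;
    // Because 64 is an exact multiple of 16, every output partition receives exactly 4 buckets.
    int outputPartition = someBucket % outputSparkPartitions;   // 37 % 16 = 5
    System.out.println(numBuckets + " buckets -> partition " + outputPartition);
  }
}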

@ovj force-pushed the custom_partitioner branch 4 times, most recently from 71d229f to c93d2b8 on August 25, 2017 at 22:15.
@ovj (Contributor Author) commented on Aug 25, 2017:

Attaching the results.

With SortBy:
[screenshot: sortby]

With UpsertPartitioner:
[screenshot: upsertpartitioner]

@vinothchandar (Member) left a review comment:

Thanks for revising it. Left some comments.
I still have the high-level question: given we are fixing and re-using the existing UpsertPartitioner, can we just leave bulkInsert as it is (using sortBy) and just use insert when you need the faster performance? In that case, this entire PR would add just the fixes to UpsertPartitioner.

/**
* Packs incoming records to be upserted, into buckets (1 bucket = 1 RDD partition)
*/
public class UpsertPartitioner<T extends HoodieRecordPayload> extends Partitioner {
Member:

Please let's move this back to how the structure was originally; it's easier to see the changes. I'd like to be very prudent here, since this can potentially cause large regressions.

I would appreciate a simple, incremental diff on UpsertPartitioner.

}

/**
* Helper class for an insert bucket along with the weight [0.0, 1.0]
Member:

So you have removed the weights, and are just picking an insert bucket based on floorMod?

return updateLocationToBucket.get(location.getFileId());
} else {
final List<InsertBucket> insertBuckets = partitionPathToInsertBuckets.get(keyLocation._1().getPartitionPath());
final int insertBucketIndex = Math.floorMod(keyLocation._1().getRecordKey().hashCode(), insertBuckets.size());
Member:

If the weights are gone, how do we ensure that small files are not expanded beyond the maximum configured file size? This will have effects for a future update. We need to handle this case.
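
(To make the concern concrete: floorMod spreads inserts uniformly across buckets regardless of how full their target files already are, whereas a weight-based scheme routes records proportionally to remaining capacity, so nearly-full small files receive few new records. A hedged sketch of weighted selection, with illustrative names only, not the UpsertPartitioner's actual code:)

import java.util.Arrays;
import java.util.List;

public class WeightedBucketChooser {
  // bucketWeights are assumed normalized to sum to 1.0 within one hoodie partition;
  // r is a value in [0, 1), e.g. derived deterministically from the record key.
  static int chooseBucket(List<Double> bucketWeights, double r) {
    double cumulative = 0.0;
    for (int i = 0; i < bucketWeights.size(); i++) {
      cumulative += bucketWeights.get(i);
      if (r < cumulative) {
        return i;
      }
    }
    return bucketWeights.size() - 1;  // guard against floating-point rounding
  }

  public static void main(String[] args) {
    // A small file that should absorb few records gets weight 0.1; a new file gets 0.9.
    List<Double> weights = Arrays.asList(0.1, 0.9);
    System.out.println(chooseBucket(weights, 0.05));  // 0 (rarely chosen)
    System.out.println(chooseBucket(weights, 0.60));  // 1 (chosen most of the time)
  }
}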

@ovj (Contributor Author) commented on Aug 26, 2017:

@vinothchandar is it OK to limit the scope of this PR to just enabling users to define their own partitioner for the bulkInsert API, keeping sortBy as the default for bulkInsert? That way there is zero impact for existing users. Let me know what you think. I will open another issue to fix the UpsertPartitioner random issue.

@vinothchandar (Member):
Yeah, that plan sounds good. Will review both PRs sometime this week.

* @return JavaRDD[WriteStatus] - RDD of WriteStatus to inspect errors and counts
*/
public JavaRDD<WriteStatus> bulkInsert(JavaRDD<HoodieRecord<T>> records, final String commitTime) {
public JavaRDD<WriteStatus> bulkInsert(JavaRDD<HoodieRecord<T>> records, final String commitTime,
Option<BulkInsertPartitioner> bulkInsertPartitioner) {
Member:

Can you just pass in a plain Spark Partitioner? It seems like all that is being passed in is the input RDD and a parallelism.

* - Output spark partition will have records from only one hoodie partition.
* - Average records per output spark partitions should be almost equal to (#inputRecords / #outputSparkPartitions).
*/
public interface BulkInsertPartitioner<T extends HoodieRecordPayload> {
Member:

Let's get rid of this interface..

@@ -43,23 +48,6 @@
import com.uber.hoodie.exception.HoodieRollbackException;
import com.uber.hoodie.index.HoodieIndex;
import com.uber.hoodie.table.HoodieTable;
import org.apache.avro.generic.GenericRecord;
Member:

Please avoid any non-code changes in PRs. It just makes it hard to review.

@@ -158,7 +157,7 @@ public void testFilterExist() throws Exception {

JavaRDD<HoodieRecord> smallRecordsRDD = jsc.parallelize(records.subList(0, 75), 1);
// We create three parquet file, each having one record. (two different partitions)
List<WriteStatus> statuses = writeClient.bulkInsert(smallRecordsRDD, newCommitTime).collect();
List<WriteStatus> statuses = writeClient.bulkInsert(smallRecordsRDD, newCommitTime, Option.empty()).collect();
Member:

Let's provide an overloaded method bulkInsert(rdd, commitTime) in the HoodieWriteClient itself, so it's consistent with the other APIs.
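
(Sketch of the convenience overload being requested; the body below is assumed, not the merged code, and it would live inside HoodieWriteClient next to the three-argument variant shown above:)

// Two-argument form keeps existing callers working and simply defers to the
// partitioner-aware variant with an empty Option.
public JavaRDD<WriteStatus> bulkInsert(JavaRDD<HoodieRecord<T>> records, final String commitTime) {
  return bulkInsert(records, commitTime, Option.empty());
}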

@vinothchandar (Member) left a review comment:

The main thing is just passing in a Spark Partitioner object.

@ovj (Contributor Author) commented on Sep 9, 2017:

Spoke with @vinothchandar offline. Updated the PR. Now we are going to let users define their own partitioner logic.

@vinothchandar merged commit 5c639c0 into apache:master on Sep 9, 2017.