
Performance tuning #800 (Closed)

garyli1019 opened this issue Jul 23, 2019 · 10 comments

@garyli1019 (Member)

Hello, I am having a performance issue upserting ~100GB of data into a 700GB table already managed by Hudi on HDFS. The upserted data does contain some duplicates of records already in the table, because I set up a buffer to cover all of the delta in case my Spark job doesn't start on time.

Spark config I used (the external shuffle service is enabled by default in my cluster):

spark2-submit \
        --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
        --conf spark.network.timeout=480s \
        --conf spark.executor.memoryOverhead=3g \
        --conf spark.dynamicAllocation.maxExecutors=50 \
        --conf spark.executor.cores=1 \
        --conf spark.driver.maxResultSize=4g \
        --conf spark.task.maxFailures=10 \
        --conf spark.yarn.max.executor.failures=500 \
        --conf spark.rdd.compress=true \
        --conf spark.kryoserializer.buffer.max=1024m \
        --master yarn \
        --deploy-mode client \
        --num-executors 20 \
        --executor-memory 12g \
        --driver-memory 5g \

Key Hudi Configs:

PARQUET_SMALL_FILE_LIMIT_BYTES = 200MB
PARQUET_FILE_MAX_BYTES = 256MB
BLOOM_FILTER_NUM_ENTRIES = "2000000"
hoodie.upsert.shuffle.parallelism = "800"

I am using the DataSource Writer to append the delta data. I tried the CMS garbage collector, but it didn't change much. A 200MB parquet file holds roughly 3-6 million records in my case. Do you have any idea how to make the count at HoodieSparkSqlWriter faster?
Thank you so much!
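
For reference, a minimal sketch of what such a DataSource upsert looks like with the configs listed above. This is not the actual job: the record key, precombine field, partition path field, table name, and base path are placeholders, and the com.uber.hoodie format name is assumed from the package used by this Hudi release.

import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical sketch only; field names, table name, and path are placeholders.
def upsertDelta(deltaDF: DataFrame, basePath: String): Unit = {
  deltaDF.write
    .format("com.uber.hoodie")                                                // Hudi datasource in this (pre-rename) release
    .option("hoodie.datasource.write.operation", "upsert")
    .option("hoodie.datasource.write.recordkey.field", "record_key")          // placeholder
    .option("hoodie.datasource.write.precombine.field", "ts")                 // placeholder
    .option("hoodie.datasource.write.partitionpath.field", "dt")              // placeholder
    .option("hoodie.table.name", "my_table")                                  // placeholder
    .option("hoodie.upsert.shuffle.parallelism", "800")
    .option("hoodie.parquet.small.file.limit", (200L * 1024 * 1024).toString) // PARQUET_SMALL_FILE_LIMIT_BYTES
    .option("hoodie.parquet.max.file.size", (256L * 1024 * 1024).toString)    // PARQUET_FILE_MAX_BYTES
    .option("hoodie.index.bloom.num_entries", "2000000")                      // BLOOM_FILTER_NUM_ENTRIES
    .mode(SaveMode.Append)
    .save(basePath)
}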

[Screenshots: Spark UI, captured 2019-07-22 at 11:52, 11:53, and 11:54 AM]

@vinothchandar (Member) commented Jul 23, 2019

That job pretty much just writes out parquet (or versions parquet files if you have updates). From what I see, single tasks are GC-ing for hours even though the input is more or less 7M records or so. I have seen a similar issue (not this bad, though) on YARN, caused by interference from other jobs or by YARN not blacklisting a bad host.

As a next step, can we try configuring a larger heap (say, double it), or obtain a heapdump of such a process so we can see what's going on (i.e. whether there is a leak)?

@vinothchandar (Member)

@n3nash any ideas?

@garyli1019 (Member, Author)

Sure, I can try that.
The delta data is definitely very dirty (a lot of incoming old data that requires rewriting existing parquet files). The task duration seems to increase exponentially with the shuffle read size.

Also, this job is not releasing executors once their tasks finish. E.g. I gave this job 100 executors; two tasks ran for 20 hours while the others finished in minutes, yet the job kept all 100 executors for the full 20 hours. Is it possible to improve this?

[Screenshot: Spark UI, captured 2019-07-23 at 11:22 AM]
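
For reference, a minimal sketch of the Spark settings that govern when idle executors are handed back under dynamic allocation. The values are illustrative examples only, not a diagnosis of why the executors above were kept, and they must be in place before the SparkContext starts (spark-submit --conf works equally well).

import org.apache.spark.sql.SparkSession

// Illustrative only: knobs controlling executor release under dynamic allocation.
val spark = SparkSession.builder()
  .appName("hudi-upsert")                                              // placeholder app name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")                     // external shuffle service, as in the submit command above
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")        // release plain idle executors after this long
  .config("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s") // executors holding cached data are kept forever by default
  .getOrCreate()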

@vinothchandar (Member)

So this seems to be more about the memory per executor than the number of executors. Not releasing the executors is very weird for a Spark app to do, tbh... interesting. You already have 12GB of executor memory, which should be plenty.

All in all, it's GC-ing constantly, and if you let it run, as with any Java app, it will keep GC-ing and chewing up CPU. Can you take a heapdump using jmap and pull it into something like Eclipse MAT? We can then see what's taking up the memory.

@garyli1019 (Member, Author)

Maybe the buffer in https://github.com/apache/incubator-hudi/blob/master/hoodie-common/src/main/java/com/uber/hoodie/common/util/SerializationUtils.java#L85 is too small? It seems like it is not related to spark.kryoserializer.buffer.max.

It could also be that the fraction in https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/config/HoodieMemoryConfig.java#L119 is too small. 12G * 0.4 * 0.6 = 2.88G seems like enough for the merge, but I will try a larger executor memory to confirm...
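
For reference, a minimal sketch of how that fraction could be raised, assuming the default linked above is exposed as a hoodie.memory.merge.fraction write option; the 0.8 value is only an example, not a recommendation.

import org.apache.spark.sql.{DataFrame, SaveMode}

// Illustrative only: the merge buffer works out to roughly
//   executor memory * (1 - spark.memory.fraction) * merge fraction,
// i.e. the 12G * 0.4 * 0.6 = 2.88G arithmetic above.
def upsertWithLargerMergeBuffer(deltaDF: DataFrame, basePath: String): Unit = {
  deltaDF.write
    .format("com.uber.hoodie")
    .option("hoodie.memory.merge.fraction", "0.8") // assumed property name, example value
    // ... plus the record key / precombine / parallelism options from the earlier sketch ...
    .mode(SaveMode.Append)
    .save(basePath) // placeholder path
}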

@vinothchandar (Member)

I still think getting a heapdump is the best way, since that will tell us what's actually held in memory.

@vinothchandar (Member)

Hi, any updates?

@garyli1019 (Member, Author)

Hi, sorry, I've been a little busy this week. I will write a summary once I have enough information.

@garyli1019 (Member, Author)

A few things I did to improve performance:

  • Changing the max parquet file size from 256MB to 128MB cut the shuffle read size in half, which improved performance without increasing executor memory.
  • Increasing the memory of each executor improved performance as well.
  • Avoiding merging too much historical data helped.

I am not able to get a heapdump at the moment due to a cluster access issue, but I will try to print GC info to get some insight (one way to enable GC logging is sketched below).
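
For reference, a minimal sketch of one way to get that GC info into the executor logs, assuming Java 8 executors; the flags are illustrative, not a prescription, and the same string can be passed as spark-submit --conf spark.executor.extraJavaOptions=... instead.

import org.apache.spark.sql.SparkSession

// Illustrative only: surface GC activity in executor stdout/stderr logs
// (Java 8 style flags; names differ under Java 9+ unified logging).
val spark = SparkSession.builder()
  .appName("hudi-upsert-gc-logging") // placeholder app name
  .config("spark.executor.extraJavaOptions",
    "-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps")
  .getOrCreate()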

@vinothchandar (Member)

@garyli1019 thanks for the update!
