
Performance tuning #800 (Closed)

garyli1019 opened this issue Jul 23, 2019 · 10 comments

@garyli1019 (Member)

Hello, I am having a performance issue upserting ~100GB of data into a 700GB table already managed by Hudi on HDFS. The upserted data does contain some duplicates of records already in the table, because I set up a buffer to cover all of the delta in case my Spark job doesn't start on time.

Spark config I used (the external shuffle service is enabled by default in my cluster):

spark2-submit \
        --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
        --conf spark.network.timeout=480s \
        --conf spark.executor.memoryOverhead=3g \
        --conf spark.dynamicAllocation.maxExecutors=50 \
        --conf spark.executor.cores=1 \
        --conf spark.driver.maxResultSize=4g \
        --conf spark.task.maxFailures=10 \
        --conf spark.yarn.max.executor.failures=500 \
        --conf spark.rdd.compress=true \
        --conf spark.kryoserializer.buffer.max=1024m \
        --master yarn \
        --deploy-mode client \
        --num-executors 20 \
        --executor-memory 12g \
        --driver-memory 5g \

Key Hudi Configs:

PARQUET_SMALL_FILE_LIMIT_BYTES = 200MB
PARQUET_FILE_MAX_BYTES = 256MB
BLOOM_FILTER_NUM_ENTRIES = "2000000"
hoodie.upsert.shuffle.parallelism = "800"

I am using the DataSource Writer to append the delta data. I tried the CMS garbage collector, but it didn't change much. A 200MB parquet file holds roughly 3-6 million records in my case. Do you have any idea how to make the count at HoodieSparkSqlWriter faster?
Thank you so much!
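
For reference, a minimal sketch of what such a DataSource upsert looks like with the configs listed above. This is not the actual job: the record key, precombine field, partition path field, table name, and base path are placeholders, and the com.uber.hoodie format name is assumed from the package used by this Hudi release.

import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical sketch only; field names, table name, and path are placeholders.
def upsertDelta(deltaDF: DataFrame, basePath: String): Unit = {
  deltaDF.write
    .format("com.uber.hoodie")                                                // Hudi datasource in this (pre-rename) release
    .option("hoodie.datasource.write.operation", "upsert")
    .option("hoodie.datasource.write.recordkey.field", "record_key")          // placeholder
    .option("hoodie.datasource.write.precombine.field", "ts")                 // placeholder
    .option("hoodie.datasource.write.partitionpath.field", "dt")              // placeholder
    .option("hoodie.table.name", "my_table")                                  // placeholder
    .option("hoodie.upsert.shuffle.parallelism", "800")
    .option("hoodie.parquet.small.file.limit", (200L * 1024 * 1024).toString) // PARQUET_SMALL_FILE_LIMIT_BYTES
    .option("hoodie.parquet.max.file.size", (256L * 1024 * 1024).toString)    // PARQUET_FILE_MAX_BYTES
    .option("hoodie.index.bloom.num_entries", "2000000")                      // BLOOM_FILTER_NUM_ENTRIES
    .mode(SaveMode.Append)
    .save(basePath)
}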

[Screenshots: Spark UI, captured 2019-07-22 at 11:52, 11:53, and 11:54 AM]

@vinothchandar (Member) commented Jul 23, 2019

That job pretty much just writes out parquet (or versions parquet files if you have updates). From what I see, single tasks are GC-ing for hours even though the input is more or less 7M records or so. I have seen a similar issue (not this bad, though) on YARN, caused by interference from other jobs or by YARN not blacklisting a bad host.

As a next step, can we try configuring a larger heap (say, double it), or obtain a heapdump of such a process so we can see what's going on (i.e. whether there is a leak)?

@vinothchandar (Member)

@n3nash any ideas?

@garyli1019 (Member, Author)

Sure, I can try that.
The delta data is definitely very dirty (a lot of incoming old data that requires rewriting existing parquet files). The task duration seems to increase exponentially with the shuffle read size.

Also, this job is not releasing executors once their tasks finish. E.g. I gave this job 100 executors; two tasks ran for 20 hours while the others finished in minutes, yet the job kept all 100 executors for the full 20 hours. Is it possible to improve this?

[Screenshot: Spark UI, captured 2019-07-23 at 11:22 AM]
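
For reference, a minimal sketch of the Spark settings that govern when idle executors are handed back under dynamic allocation. The values are illustrative examples only, not a diagnosis of why the executors above were kept, and they must be in place before the SparkContext starts (spark-submit --conf works equally well).

import org.apache.spark.sql.SparkSession

// Illustrative only: knobs controlling executor release under dynamic allocation.
val spark = SparkSession.builder()
  .appName("hudi-upsert")                                              // placeholder app name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")                     // external shuffle service, as in the submit command above
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")        // release plain idle executors after this long
  .config("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s") // executors holding cached data are kept forever by default
  .getOrCreate()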

@vinothchandar (Member)

So this seems to be more about the memory per executor than the number of executors. Not releasing the executors is very weird for a Spark app to do, tbh... interesting. You already have 12GB of executor memory, which should be plenty.

All in all, it's GC-ing constantly, and if you let it run, as with any Java app, it will keep GC-ing and chewing up CPU. Can you take a heapdump using jmap and pull it into something like Eclipse MAT? We can then see what's taking up the memory.

@garyli1019 (Member, Author)

Maybe the buffer in https://github.com/apache/incubator-hudi/blob/master/hoodie-common/src/main/java/com/uber/hoodie/common/util/SerializationUtils.java#L85 is too small? It seems like it is not related to spark.kryoserializer.buffer.max.

It could also be that the fraction in https://github.com/apache/incubator-hudi/blob/master/hoodie-client/src/main/java/com/uber/hoodie/config/HoodieMemoryConfig.java#L119 is too small. 12G * 0.4 * 0.6 = 2.88G seems like enough for the merge, but I will try a larger executor memory to confirm...
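
For reference, a minimal sketch of how that fraction could be raised, assuming the default linked above is exposed as a hoodie.memory.merge.fraction write option; the 0.8 value is only an example, not a recommendation.

import org.apache.spark.sql.{DataFrame, SaveMode}

// Illustrative only: the merge buffer works out to roughly
//   executor memory * (1 - spark.memory.fraction) * merge fraction,
// i.e. the 12G * 0.4 * 0.6 = 2.88G arithmetic above.
def upsertWithLargerMergeBuffer(deltaDF: DataFrame, basePath: String): Unit = {
  deltaDF.write
    .format("com.uber.hoodie")
    .option("hoodie.memory.merge.fraction", "0.8") // assumed property name, example value
    // ... plus the record key / precombine / parallelism options from the earlier sketch ...
    .mode(SaveMode.Append)
    .save(basePath) // placeholder path
}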

@vinothchandar (Member)

I still think getting a heapdump is the best way, since that will tell us what's actually held in memory.

@vinothchandar (Member)

Hi, any updates?

@garyli1019 (Member, Author)

Hi, sorry, I've been a little busy this week. I will write a summary once I have enough information.

@garyli1019 (Member, Author)

A few things I did to improve performance:

  • Changing the max parquet file size from 256MB to 128MB cut the shuffle read size in half, which improved performance without increasing executor memory.
  • Increasing the memory of each executor improved performance as well.
  • Avoiding merging too much historical data helped.

I am not able to get a heapdump at the moment due to a cluster access issue, but I will try to print GC info to get some insight (one way to enable GC logging is sketched below).
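
For reference, a minimal sketch of one way to get that GC info into the executor logs, assuming Java 8 executors; the flags are illustrative, not a prescription, and the same string can be passed as spark-submit --conf spark.executor.extraJavaOptions=... instead.

import org.apache.spark.sql.SparkSession

// Illustrative only: surface GC activity in executor stdout/stderr logs
// (Java 8 style flags; names differ under Java 9+ unified logging).
val spark = SparkSession.builder()
  .appName("hudi-upsert-gc-logging") // placeholder app name
  .config("spark.executor.extraJavaOptions",
    "-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps")
  .getOrCreate()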

@vinothchandar (Member)

@garyli1019 thanks for the update!
