What happened?
I use several Flink SQL tasks to ingest data into a mix-hive formatted table. The Flink task managers were periodically killed for exceeding the YARN container memory limit, even though their heap memory consumption was significantly lower than what was requested at startup.
I used a gperftools-style heap profiler to trace how the TM process requests memory from the OS; the statistics look like this:
```
Total: 2297.2 MB
  1516.5  66.0%  66.0%   1516.5  66.0%  deflateInit2_
   559.2  24.3%  90.4%    559.3  24.3%  os::malloc@905260
   192.9   8.4%  98.8%    192.9   8.4%  os::malloc@905400
    11.7   0.5%  99.3%     11.7   0.5%  updatewindow
     8.3   0.4%  99.6%      8.3   0.4%  readCEN
     4.7   0.2%  99.8%      4.7   0.2%  init
     2.5   0.1%  99.9%      2.5   0.1%  inflateInit2_
     0.6   0.0% 100.0%   1517.1  66.0%  Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init
```
The Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init entry looked suspicious. After tracing the ZlibCompressor call stacks with the arthas tool, I found the root cause: the code never calls compressor.close() to release the native memory blocks, so it allocates a new block from the OS every time a file is written. I fixed this bug locally and have been running the fix in my environment for more than two months.
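The leak pattern and the fix can be sketched with the JDK's own zlib binding, which allocates native memory through the same deflateInit2_ path as Hadoop's ZlibCompressor. This is an illustrative stand-in, not Paimon's actual writer code; all names here are hypothetical:

```java
import java.util.Arrays;
import java.util.zip.Deflater;

public class DeflaterLeakSketch {

    // Leaky pattern: a fresh native zlib stream is created per file and
    // never released, so each write leaks a native allocation until GC
    // happens to finalize the object (which may be far too late for a
    // YARN-constrained container).
    static byte[] compressLeaky(byte[] input) {
        Deflater deflater = new Deflater(); // deflateInit2_ allocates native memory here
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length * 2 + 64];
        int n = deflater.deflate(buf);
        // missing deflater.end(): the native blocks stay allocated
        return Arrays.copyOf(buf, n);
    }

    // Fixed pattern: release the native stream deterministically in a
    // finally block, analogous to calling compressor.close()/end() when
    // the writer is done with a file.
    static byte[] compressFixed(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[input.length * 2 + 64];
            int n = deflater.deflate(buf);
            return Arrays.copyOf(buf, n);
        } finally {
            deflater.end(); // frees the native zlib memory immediately
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello hello hello".getBytes();
        System.out.println(compressFixed(data).length > 0);
    }
}
```

With Hadoop codecs the same idea usually means returning the compressor via CodecPool.returnCompressor() (or calling end()) once the output stream is closed, rather than relying on finalization.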
Affects Versions
master
What engines are you seeing the problem on?
Flink, Spark
How to reproduce
Use a large dataset, e.g. around 20,000,000 records per day as in my environment. In my tests the problem was hard to reproduce with small datasets.
Relevant log output
No response
Anything else
No response
Are you willing to submit a PR?
Yes I am willing to submit a PR!
Code of Conduct
I agree to follow this project's Code of Conduct