[Bug]: Memory leaks when using mix -hive format to write parquet files #2789

Closed
2 tasks done
nicochen opened this issue Apr 26, 2024 · 0 comments · Fixed by #2820
Labels
type:bug Something isn't working

@nicochen
Contributor

What happened?

I use several Flink SQL tasks to ingest data into a mix-hive formatted table. The Flink task managers were periodically killed for exceeding the YARN container memory limit, even though their heap usage was significantly lower than the amount requested at startup.
I used a gprof tool to trace and summarize how a TM process requests memory from the OS. The output looks like this:
Total: 2297.2 MB
1516.5 66.0% 66.0% 1516.5 66.0% deflateInit2_
559.2 24.3% 90.4% 559.3 24.3% os::malloc@905260
192.9 8.4% 98.8% 192.9 8.4% os::malloc@905400
11.7 0.5% 99.3% 11.7 0.5% updatewindow
8.3 0.4% 99.6% 8.3 0.4% readCEN
4.7 0.2% 99.8% 4.7 0.2% init
2.5 0.1% 99.9% 2.5 0.1% inflateInit2_
0.6 0.0% 100.0% 1517.1 66.0% Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init
This makes me suspect Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init, which accounts for 66.0% of the total cumulatively, almost all of it through deflateInit2_.

After tracing the ZlibCompressor call stacks with the Arthas tool, I found the problem: the code never calls compressor.close to release the native memory blocks, and it always requests a new block from the OS when writing a file.
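
To illustrate the pattern described above, here is a minimal sketch that uses the JDK's java.util.zip.Deflater as a stand-in. It is not the Hadoop ZlibCompressor and not the actual fix; the class name CompressorRelease and the method compressAndRelease are hypothetical. Like the Hadoop compressor, a Deflater holds zlib state outside the Java heap, so creating one per file without releasing it grows process memory while heap usage stays low.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical illustration: one compressor per "file", always released.
public class CompressorRelease {

    // Compresses the input and frees the native zlib state in all cases.
    static byte[] compressAndRelease(byte[] input) {
        Deflater deflater = new Deflater(); // allocates native (off-heap) zlib state
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int written = deflater.deflate(buffer);
                out.write(buffer, 0, written);
            }
            return out.toByteArray();
        } finally {
            // Without this call the native memory is only reclaimed when the
            // object is garbage collected, which may happen long after the
            // container has exceeded its memory limit.
            deflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[64 * 1024]; // stand-in for one file's bytes
        for (int file = 0; file < 10_000; file++) {
            compressAndRelease(data);
        }
        System.out.println("done");
    }
}
```

Putting the release in a finally block means it also runs when writing fails partway through, which is one of the cases where a compressor is easiest to leak.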

I fixed this bug locally and have been running the fix in my environment for more than 2 months.

Affects Versions

master

What engines are you seeing the problem on?

Flink, Spark

How to reproduce

Use a large dataset, like the one I used (20,000,000 records per day). In my tests, the problem was rarely reproduced with small datasets.

Relevant log output

No response

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct