What happened?
I use several Flink SQL tasks to ingest data into a mix-hive formatted table. The Flink task managers were periodically killed for exceeding the YARN container memory limit, even though their heap memory consumption was significantly lower than what was requested at startup.
I used a gperftools-style heap profiler to trace how the TM process requests memory from the OS; the statistics look like this:
```
Total: 2297.2 MB
  1516.5  66.0%  66.0%   1516.5  66.0%  deflateInit2_
   559.2  24.3%  90.4%    559.3  24.3%  os::malloc@905260
   192.9   8.4%  98.8%    192.9   8.4%  os::malloc@905400
    11.7   0.5%  99.3%     11.7   0.5%  updatewindow
     8.3   0.4%  99.6%      8.3   0.4%  readCEN
     4.7   0.2%  99.8%      4.7   0.2%  init
     2.5   0.1%  99.9%      2.5   0.1%  inflateInit2_
     0.6   0.0% 100.0%   1517.1  66.0%  Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init
```
The Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init entry looked suspicious. After tracing the ZlibCompressor call stacks with the arthas tool, I found the root cause: the code never calls compressor.close() to release the native memory blocks, so it allocates a new block from the OS every time a file is written. I fixed this bug locally and have been running the fix in my environment for more than two months.
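The leak pattern and the fix can be sketched with the JDK's own zlib binding, which allocates native memory through the same deflateInit2_ path as Hadoop's ZlibCompressor. This is an illustrative stand-in, not Paimon's actual writer code; all names here are hypothetical:

```java
import java.util.Arrays;
import java.util.zip.Deflater;

public class DeflaterLeakSketch {

    // Leaky pattern: a fresh native zlib stream is created per file and
    // never released, so each write leaks a native allocation until GC
    // happens to finalize the object (which may be far too late for a
    // YARN-constrained container).
    static byte[] compressLeaky(byte[] input) {
        Deflater deflater = new Deflater(); // deflateInit2_ allocates native memory here
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length * 2 + 64];
        int n = deflater.deflate(buf);
        // missing deflater.end(): the native blocks stay allocated
        return Arrays.copyOf(buf, n);
    }

    // Fixed pattern: release the native stream deterministically in a
    // finally block, analogous to calling compressor.close()/end() when
    // the writer is done with a file.
    static byte[] compressFixed(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[input.length * 2 + 64];
            int n = deflater.deflate(buf);
            return Arrays.copyOf(buf, n);
        } finally {
            deflater.end(); // frees the native zlib memory immediately
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello hello hello".getBytes();
        System.out.println(compressFixed(data).length > 0);
    }
}
```

With Hadoop codecs the same idea usually means returning the compressor via CodecPool.returnCompressor() (or calling end()) once the output stream is closed, rather than relying on finalization.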
Affects Versions
master
What engines are you seeing the problem on?
Flink, Spark
How to reproduce
Use a large dataset, e.g. around 20,000,000 records per day as in my environment. In my tests the problem was hard to reproduce with small datasets.
Relevant log output
No response
Anything else
No response
Are you willing to submit a PR?
Yes I am willing to submit a PR!
Code of Conduct
I agree to follow this project's Code of Conduct