[SPARK-8044][CORE] Avoid letting direct buffers grow out of control while reading or dropping a disk-level block #6586
Conversation
Not all JDKs are OpenJDK, but I'm also not sure why this would be better. We don't necessarily want to avoid a buffer, do we?
Test build #33984 has finished for PR 6586 at commit
@srowen For the reason: this patch is for blocks stored at disk level. Another thing to point out: we use the Oracle JDK, not OpenJDK; I referred to OpenJDK only because I couldn't find the JDK source code.
@srowen
@srowen Maybe we can split the buffer into slices when reading or writing through the channel.
Yes, but there are also advantages to letting it use byte buffers. They are held in a thread-local soft reference, so they are cached and reused, and are released under memory pressure. I don't think we want or need to manage this. What's the problem, that this increases off-heap memory usage? Yes, but you can increase the amount of overhead YARN allows for this. Here you're just trading for potentially less-efficient manual on-heap management. I am not clear this is a good idea.
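For context, a minimal sketch (plain NIO, not Spark's actual I/O path) of the behaviour being discussed: writing a non-direct (heap) ByteBuffer through a FileChannel makes the JDK copy it into a temporary direct buffer sized to the remaining bytes, and that buffer is then cached in a thread-local pool behind a soft reference.

```scala
import java.io.File
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.StandardOpenOption.{CREATE, WRITE}

object HeapBufferChannelWrite {
  def main(args: Array[String]): Unit = {
    // A 64 MB *heap* buffer: no direct memory is allocated by this line.
    val heapBuf = ByteBuffer.allocate(64 * 1024 * 1024)

    val file = File.createTempFile("spark-8044-demo", ".bin")
    file.deleteOnExit()
    val channel = FileChannel.open(file.toPath, CREATE, WRITE)
    try {
      // Because heapBuf is not direct, the channel implementation copies it
      // into a temporary direct buffer sized to heapBuf.remaining() (64 MB
      // here) before writing. That temporary buffer is kept in a per-thread
      // cache for reuse and is released only under memory pressure.
      while (heapBuf.hasRemaining) {
        channel.write(heapBuf)
      }
    } finally {
      channel.close()
    }
  }
}
```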
@srowen If it is just about adjusting the memoryOverhead argument, it has to be customized for every application, and if the data grows day by day it may have to be changed again. How about slicing the ByteBuffer, to keep the direct buffer pool more controllable? That would ensure the direct buffer pool never grows larger than 64MB * 3. A sketch of the idea follows.
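A minimal sketch of the slicing idea (the 64 MB cap and the helper name are illustrative, not the patch's actual code): write the buffer through the channel in bounded slices so that the temporary direct buffer the JDK caches never needs to be larger than one slice.

```scala
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel

object ChunkedChannelWriter {
  // Illustrative cap; the comment above mentions 64 MB.
  val MaxChunkBytes: Int = 64 * 1024 * 1024

  /** Write `buf` through `channel` in slices of at most MaxChunkBytes,
   *  so the JDK's per-thread temporary direct buffer stays bounded. */
  def writeChunked(channel: WritableByteChannel, buf: ByteBuffer): Unit = {
    while (buf.hasRemaining) {
      // A duplicate shares the data but has its own position/limit,
      // so we can cap the amount visible to this write.
      val slice = buf.duplicate()
      val end = math.min(buf.position() + MaxChunkBytes, buf.limit())
      slice.limit(end)
      while (slice.hasRemaining) {
        channel.write(slice)
      }
      // Advance the original buffer past the bytes just written.
      buf.position(end)
    }
  }
}
```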
@srowen If I test only 1 cycle, TestOutPutStream has a minor advantage, maybe because creating and destroying a direct buffer is a costly operation. cycle: 1, data: 10MB; cycle: 1, data: 50MB; cycle: 1, data: 100MB; cycle: 1, data: 500MB. When the cycle count is increased to 10: cycle: 10, data: 10MB; cycle: 10, data: 50MB; cycle: 10, data: 100MB; cycle: 10, data: 500MB. Also, according to the test, the time to create a direct buffer is in direct proportion to the data size.
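A sketch of the kind of micro-benchmark being described (the sizes and cycle counts mirror the comment; the harness itself is illustrative, and the measured numbers are not reproduced here): time repeated channel writes of a heap buffer for each data size and cycle count.

```scala
import java.io.File
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.StandardOpenOption.{CREATE, WRITE}

object WriteBenchmarkSketch {
  // Time `cycles` writes of a `dataBytes`-sized heap buffer through a channel.
  def timeWrites(dataBytes: Int, cycles: Int): Long = {
    val buf = ByteBuffer.allocate(dataBytes) // requires enough heap for 500 MB case
    val file = File.createTempFile("bench", ".bin")
    file.deleteOnExit()
    val channel = FileChannel.open(file.toPath, CREATE, WRITE)
    val start = System.nanoTime()
    try {
      var i = 0
      while (i < cycles) {
        buf.rewind()
        while (buf.hasRemaining) channel.write(buf)
        i += 1
      }
    } finally {
      channel.close()
    }
    (System.nanoTime() - start) / 1000000 // elapsed millis
  }

  def main(args: Array[String]): Unit = {
    for (cycles <- Seq(1, 10); mb <- Seq(10, 50, 100, 500)) {
      val elapsed = timeWrites(mb * 1024 * 1024, cycles)
      println(s"cycle: $cycles, data: ${mb}MB -> $elapsed ms")
    }
  }
}
```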
Test build #34173 has finished for PR 6586 at commit
Test build #34179 has finished for PR 6586 at commit
So, the problem here isn't heap memory, right? Because although this allocates and caches buffers quite freely, they are in soft refs, which can be removed if memory gets low. You are worried about off-heap memory, right? Because nothing understands when "off-heap memory is low". I think your change slows things down a bit and adds a little extra complexity, and I don't know that we've shown there's a real problem here. Yes, Spark jobs already use a lot of off-heap memory, but this is not a problem per se.
I haven't dug into this patch in detail yet, but to quickly address one comment: has the Netty bug been fixed in Spark by upgrading yet? If that's the case, is this patch still as urgent?
@JoshRosen I am also not sure whether this patch is needed in the common case or not.
And I also think it may be better to deallocate the direct buffer when we send ChunkFetchSuccess to the requester.
I believe this should be closed. The memory being allocated here is allocated on purpose, and not allocating it adds complexity and slows things down. This is not a tradeoff that everyone wants to make when you can simply adjust your off-heap memory overhead.
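For reference, the tuning srowen refers to would look roughly like this (a sketch; the exact value is workload-dependent, and the property name here is the YARN executor overhead setting from the Spark 1.x era):

```scala
import org.apache.spark.SparkConf

// Give each YARN executor extra headroom for off-heap allocations such as
// the JDK's temporary direct buffers. The value is in megabytes; 1024 is
// purely illustrative.
val conf = new SparkConf()
  .setAppName("example")
  .set("spark.yarn.executor.memoryOverhead", "1024")
```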
@srowen I have been on vacation for the past 10 days; sorry it took me this long to see this. According to the OpenJDK source code, the channel write path creates a thread-local direct buffer pool, and there is no guaranteed way to make sure those direct buffers are released.
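To make that concern concrete, here is a hedged sketch (standard JMX API, not Spark code) that reads the JVM's "direct" buffer pool before and after a channel write of a heap buffer; the reported usage typically stays elevated afterwards because the temporary direct buffer remains cached in the writing thread's pool until the soft reference is cleared.

```scala
import java.io.File
import java.lang.management.{BufferPoolMXBean, ManagementFactory}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.StandardOpenOption.{CREATE, WRITE}
import scala.collection.JavaConverters._

object DirectPoolProbe {
  // Bytes currently used by the JVM's "direct" buffer pool, as seen via JMX.
  private def directPoolUsed(): Long =
    ManagementFactory.getPlatformMXBeans(classOf[BufferPoolMXBean]).asScala
      .find(_.getName == "direct").map(_.getMemoryUsed).getOrElse(-1L)

  def main(args: Array[String]): Unit = {
    println(s"direct pool before: ${directPoolUsed()} bytes")

    val heapBuf = ByteBuffer.allocate(64 * 1024 * 1024) // heap, not direct
    val file = File.createTempFile("direct-pool-probe", ".bin")
    file.deleteOnExit()
    val channel = FileChannel.open(file.toPath, CREATE, WRITE)
    try {
      while (heapBuf.hasRemaining) channel.write(heapBuf)
    } finally {
      channel.close()
    }

    // The ~64 MB temporary direct buffer used for the copy is usually still
    // held in this thread's cache, so the pool's usage stays high here until
    // memory pressure clears the soft reference.
    println(s"direct pool after:  ${directPoolUsed()} bytes")
  }
}
```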