Pointer.deallocator is a performance killer #231
This needs to be synchronized across all threads because if
Sorry, I closed this when I had instead decided to revisit it and comment later.
@saudet I have been looking at this a little bit more. After building some more precise tests, I realized my earlier assessment was not quite accurate. I do agree that these objects may be allocated too often, and there may be improvements to be had there, but I think there are potential improvements here as well. It turns out that the sleep is not so much the problem as the fact that the global lock exists at all. I am running a very highly parallelized workload, which is probably bringing this out more than usual. Even though each thread holds the synchronized block for less than a millisecond, other threads can end up blocking on the lock for hundreds of milliseconds purely because of the level of lock contention. I have some prototypes that take an optimistic, lock-free path with a fallback to locking, and they are showing significant improvements so far. But I was hoping you could help me: this project is still fairly new to me, and I have not branched out into the native code yet. Conceptually, I don't understand the difference between
It was lock-free with a fallback previously, but that didn't work because we have to update the linked list atomically anyway. If you can implement a lock-free linked list that we can update atomically, then that would probably work. This looks interesting: https://en.wikipedia.org/wiki/Non-blocking_linked_list
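To make the idea concrete, here is a minimal sketch of a lock-free singly linked list where insertion prepends a node via compare-and-set, in the spirit of the non-blocking linked lists linked above. This is an illustration only; the class and its shape are invented here and are not JavaCPP's actual deallocator list.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: insertion is a CAS loop on the head reference,
// so no thread ever blocks while registering an element.
final class LockFreeList<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    // Retries until the compare-and-set succeeds.
    void add(T value) {
        Node<T> oldHead;
        do {
            oldHead = head.get();
        } while (!head.compareAndSet(oldHead, new Node<>(value, oldHead)));
    }

    int size() {
        int n = 0;
        for (Node<T> p = head.get(); p != null; p = p.next) n++;
        return n;
    }
}
```

Removal from the middle of such a list is the hard part (it needs mark bits or a two-phase delete, as the Wikipedia article describes); prepend-only insertion like this is the easy case.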
I already got this done. I actually see a couple of ways we could do this; maybe you can help me choose. I have my one use case with k-means on Deeplearning4j, but I don't know what other usage patterns are common. Does this linked list ever get very long? Is removing from it common? We could solve this simply by using a ConcurrentLinkedQueue, or we could use an AtomicReference, depending on our preference for performance versus code readability. What I am finding hard to deal with is the physical bytes. I can estimate them, but updating the count atomically without knowing how much memory is actually going to be used is really hard. Do you think some slight fudging of that value would be acceptable? It means over-provisioning might be possible, but hopefully not by much, nor would it get worse over time.
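The ConcurrentLinkedQueue option might look roughly like the following sketch: registrations and removals are lock-free, and the byte count is kept as a separate atomic estimate (the "fudged" value discussed above). All names here are invented for illustration; this is not JavaCPP's actual code.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical registry: the queue tracks live deallocators without a
// global lock, and estimatedBytes is a best-effort counter that may
// drift slightly from true physical memory usage.
final class DeallocatorRegistry {
    private final ConcurrentLinkedQueue<Runnable> deallocators = new ConcurrentLinkedQueue<>();
    private final AtomicLong estimatedBytes = new AtomicLong();

    void register(Runnable deallocator, long bytes) {
        deallocators.add(deallocator);
        estimatedBytes.addAndGet(bytes); // estimate only
    }

    void deallocate(Runnable deallocator, long bytes) {
        if (deallocators.remove(deallocator)) {
            deallocator.run();
            estimatedBytes.addAndGet(-bytes);
        }
    }

    long estimatedBytes() { return estimatedBytes.get(); }
}
```

The trade-off mentioned in the comment shows up here: the queue and the counter are each atomic on their own, but not updated together as one atomic step, so the byte count can briefly over-report.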
A long time ago, I tried to use containers like that, but the garbage collector was never able to recognize those as phantom reachable. Maybe that has changed. If you could give it a try, that would be great! Of course, in general, we can't know what other parts of the process are doing with the memory and can never be sure exactly how much memory we are using.
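For anyone wanting to test the phantom-reachability concern on a current JVM, here is a small self-contained check, independent of JavaCPP: an object whose only remaining reference is a PhantomReference should eventually be enqueued on its ReferenceQueue once it becomes unreachable. GC timing is not guaranteed, so the sketch retries.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

// Minimal phantom-reachability probe: returns true if the GC enqueues
// the reference after the last strong reference is dropped.
public class PhantomDemo {
    static boolean awaitPhantom() throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(referent, queue);

        referent = null; // drop the only strong reference
        // System.gc() is only a hint, so poke it a few times and wait.
        for (int i = 0; i < 50; i++) {
            System.gc();
            if (queue.remove(100) == ref) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("phantom enqueued: " + awaitPhantom());
    }
}
```

To reproduce the container scenario from the comment, one would store the PhantomReference in a collection (as JavaCPP's deallocator list effectively does) and check that the referent is still collected.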
I've also been working on better scaling of this implementation; I am running into the same concurrency issues on Scanner.scan. I've prototyped a second interface where a BytePointer can be passed into the scan call, managed by the caller. This helps quite a bit. Is anyone here able to examine pull requests?
For performance, we should disable all that anyway, and deallocate everything manually, perhaps with the help of a class such as PointerScope: http://bytedeco.org/news/2018/07/17/bytedeco-as-distribution/ |
In my profiling I am seeing a ton of blocking on this lock:
https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/Pointer.java#L546
I don't see why the synchronization here needs to happen across all instances. Should this not synchronize on `this`?

I am also unsure about holding the lock while doing a `Thread.sleep`. I assume this is to try to let the gc() have an effect before `trimMemory()` is invoked (which still may or may not happen). Could the gc / sleep / trim happen outside of the lock?

If none of the above are possible (I am rather unfamiliar with this code), then maybe we could do an optimistic check of volatile state before we lock?
Again, just shots in the dark, but I am seeing this in my profiles a fair bit.
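The "optimistic check before we lock" suggestion could be sketched like this: keep the byte counter in an atomic, reserve memory lock-free on the fast path, and take the lock only when close to the limit (which is where a real implementation might gc(), sleep, and trim). Every name here is invented for illustration and is not JavaCPP's actual code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical gate: threads under the budget never touch the lock,
// so contention is limited to threads near the memory limit.
final class MemoryGate {
    static final long LIMIT = 1L << 30;               // assumed 1 GB budget
    private static final AtomicLong usedBytes = new AtomicLong();
    private static final Object lock = new Object();

    static boolean tryReserve(long bytes) {
        // Fast path: lock-free CAS while the budget clearly allows it.
        long current = usedBytes.get();
        while (current + bytes <= LIMIT) {
            if (usedBytes.compareAndSet(current, current + bytes)) {
                return true;
            }
            current = usedBytes.get();
        }
        // Slow path: only threads near the limit contend on the lock;
        // a real implementation might gc / sleep / trim here.
        synchronized (lock) {
            if (usedBytes.get() + bytes <= LIMIT) {
                usedBytes.addAndGet(bytes);
                return true;
            }
            return false;
        }
    }
}
```

Under this scheme, the common case (plenty of headroom) never blocks, which directly targets the contention described in the profiles above.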