!!asBuffer is not threadsafe!! and usability issues with Pointer #155
Comments
Right, I've thought about position and limit not being part of the state of the object, but if that were the case, we would need to create new |
I duplicate the pointers all over the place in order to avoid any problems later. Actually, as far as the pointer goes I don't use position and limit always equals capacity. The nature of the code I am writing right now (neural networks) is inherently multithreaded which means even using asBuffer with locking is a bad idea because that pointer could be used in an api call concurrently. |
Right, I understand that, but do you have any other better ideas than duplicating objects? |
I do not. If there were no mutable data on the pointers, there would be zero perceivable change (from an outside perspective of measuring the characteristics of a running program), because it would amount to no measurable GC overhead. It would, however, result in a larger portion of programs working correctly under a larger number of circumstances. So I don't understand why you consider duplicating the pointers (not their respective buffers, but the pointers) a bad idea, or why you want to try for a better one. That being said, by setting the fields of the pointers myself I do the offsetting and getbuffer in a threadsafe manner, so as far as I am concerned I have worked around the issue. What I worry about is using a library that uses javacpp whose authors don't understand the implications. For instance, in your documentation for set position you state:
This is not correct because no one would expect an array offset to change the underlying array datatype. And if you are returning an object anyway; it could just be a new object. If software is built by reading the documentation then it is quite possible there are latent and difficult to find bugs throughout the software where these semantics are bad. Fundamentally I find it a much better tradeoff to force duplication rather than create a design which is prone to the type of bugs that happen in multithreaded code when doing non-threadsafe things. I do not think you feel equally about this tradeoff. |
Well, they made the same choice for NIO buffers, so I just did it that way because people are used to it. Also, forcing duplication creates unnecessary overhead when it's not needed, and given that one of JavaCPP's goals is speed... That being said, the documentation could be longer and more explicit, obviously. Pull requests are welcome. :) |
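For readers following the NIO comparison: `java.nio` buffers keep position and limit as mutable state on the object, and `duplicate()` returns a new buffer object over the same memory with its own position, limit, and mark. A minimal sketch of the "duplicate before use" pattern discussed above, using plain `java.nio` only (the class and method names `DuplicateViewSketch` and `sum` are made up for illustration and are not part of JavaCPP):

```java
import java.nio.FloatBuffer;

class DuplicateViewSketch {
    // Sums `length` floats starting at element `offset`, reading through a
    // private duplicate so the shared buffer's position and limit are never touched.
    static float sum(FloatBuffer shared, int offset, int length) {
        FloatBuffer view = shared.duplicate(); // same memory, independent position/limit/mark
        view.position(offset);
        view.limit(offset + length);
        float total = 0f;
        while (view.hasRemaining()) {
            total += view.get(); // advances only the duplicate's position
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        FloatBuffer shared = FloatBuffer.wrap(new float[] {1f, 2f, 3f, 4f, 5f, 6f});
        float[] results = new float[2];
        // Two threads read different halves of the same buffer without locking.
        Thread a = new Thread(() -> results[0] = sum(shared, 0, 3));
        Thread b = new Thread(() -> results[1] = sum(shared, 3, 3));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(results[0] + " " + results[1] + " position=" + shared.position());
    }
}
```

The cost is one small object per call, which is the overhead the two sides of this thread disagree about.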
I also think the NIO buffer interface is poor for the same reasons. A poor interface is a poor interface; it doesn't matter if it came from a standards committee. There are other places where the NIO buffers are short-sighted (only able to mmap up to 2G at a time). It is actually shocking to me that they didn't contract out to a C++ compiler expert to help with those types of interfaces, because exactly these types of issues would have been avoided. I find a lot of the Java standards simply ignorant and irresponsibly done for these reasons, and I would like to find out who exactly proposed this and who ratified it, because in my opinion neither of those two entities did their jobs. Every major OS mmap interface avoids all of the above issues, and they all predate the NIO buffer standards by quite some time. Aside from that, the above is fair :). |
Do you have some numbers to back up your claims, especially on Android (Dalvik/ART)? If not, let's do some benchmarking first, and then we can talk. :) Also, it's always possible to extend the API in some way. |
What about if we added a |
I am not certain what that would entail. Really all I need is a bulk transfer method to/from arrays (also with an offset) of the same datatype that doesn't change position. Everything else I can work around/ignore. I use pointer offsetting quite a bit but as I said I duplicate the pointer and do the offsetting manually myself. The best solution at this point probably would be to write a c++ library that does exactly what I want in threadsafe ways and use it for the binding. If the speed of your program is fundamentally based on whether you are duplicating pointers or not you have other issues. Speed is also much less of a concern than correctness and this design is fundamentally incorrect. It is possible to be both correct and fast and subtle threading errors are a much higher concern to me than speed issues caused by constructing small objects. This is the core argument and it hasn't changed. You would have to prove that the speed gained by the design used in a realistic system makes the system instability worth it and that would be a difficult proof in either dalvik or any other runtime. So benchmark all you want but the speed is a secondary characteristic and you won't find any measurable difference in any decent sized program if the javacpp pointers are immutable vs. if they are not. That to me is an irrational argument. |
Another way to look at it is why are people setting the various parts of mutable data and what do they do with the mutated object? If, for instance, they are offsetting it to move data into and out of it that is silly; the operation to move data into the object should have the offset included (and shouldn't alter position). If they are doing it in order to have an offset into their API pointers then again you could either pass the offset into the function or you could create a new pointer. What other cases are there for altering the member variables of the pointers? |
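The kind of call being asked for here (bulk transfer to/from an array, with the offset passed as an argument and no visible change to position) can be approximated on top of plain NIO. A hedged sketch: the helper names `copyToArray` and `copyFromArray` are hypothetical, and the direct `DoubleBuffer` merely stands in for native memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

class BulkCopySketch {
    // Copies `length` doubles from `src`, starting at element `srcOffset`, into `dst`.
    // The source buffer's own position is left unchanged.
    static void copyToArray(DoubleBuffer src, int srcOffset,
                            double[] dst, int dstOffset, int length) {
        DoubleBuffer view = src.duplicate();
        view.position(srcOffset);
        view.get(dst, dstOffset, length); // relative bulk get on the duplicate only
    }

    // Copies `length` doubles from `src` into `dst`, starting at element `dstOffset`.
    // The destination buffer's own position is left unchanged.
    static void copyFromArray(double[] src, int srcOffset,
                              DoubleBuffer dst, int dstOffset, int length) {
        DoubleBuffer view = dst.duplicate();
        view.position(dstOffset);
        view.put(src, srcOffset, length); // relative bulk put on the duplicate only
    }

    public static void main(String[] args) {
        DoubleBuffer nativeLike = ByteBuffer.allocateDirect(8 * Double.BYTES)
                .order(ByteOrder.nativeOrder())
                .asDoubleBuffer();
        copyFromArray(new double[] {1.5, 2.5, 3.5}, 0, nativeLike, 4, 3);
        double[] out = new double[3];
        copyToArray(nativeLike, 4, out, 0, 3);
        System.out.println(java.util.Arrays.toString(out) + " position=" + nativeLike.position());
    }
}
```

Passing the offset into the operation, as the comment suggests, means no caller ever needs to mutate a shared object just to say where a copy should start.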
If all you need is bulk transfer to/from Java arrays, we don't need to use NIO buffers. Just call |
This is more the direction I am thinking. It is easy to add detection for common memory errors (buffer overwrite) in the library and in the enclosing utility layer just above it. This efficiently allows marshalling of any JVM primitive datatype into and out of those buffers, which isn't possible with NIO buffers. Finally, the structures declaring those buffers are immutable, although I do not like them being described as these allocated types; I just didn't want to add all the arguments to the functions required to make that happen, and this shows off the concept well. It also shows that you can write very small C++ libraries and use them very efficiently from javacpp (in both directions; I just didn't set up the system to create javacpp pointers from the typed buffers). |
Sounds like you're looking for the indexer package then: |
To start with, your concept of exactly what an indexer is is not very clear; it doesn't appear to get much beyond an NIO buffer, as it can't marshal data between types (aside from to/from double), unless it is simply the ability to have multiple dimensions. Aside from that.
The library I provided does efficient marshalling between types and can be extended to arrays of any POD (plain-old-datatype) type. The allocation call has enough information to allow for things like memory maps and going through the manager for the operations allows tracking duplicate free or access after free. Furthermore it is far faster than any other system for copying data between datatypes; something that is again common when moving data onto/off of neural networks (or really any kind of gpu programming). Here are timing tests: (run-time-tests)
typed-buffer-marshal-test
"Elapsed time: 19.921377 msecs"
nio-buffer-marshal-test
"Elapsed time: 170.820526 msecs"
typed-buffer-same-type-time-test
"Elapsed time: 10.928848 msecs"
new-buffer-same-type-time-test
"Elapsed time: 5.347048 msecs"
Again, I can add conversion to javacpp datatypes (if the buffer is a Java primitive type), and thus to NIO buffers from my types, easily, but that isn't necessary at this moment for this conversation. This is in about 400 lines of C++ and 400 lines of Clojure. |
I see, but that's the kind of thing we could do on top of indexer. There
really isn't any reason to do such simple operations in C++, unless we need
to support inefficient runtimes like on Android. Scala even has
@specialized, something like C++ templates for performance with primitive
types.
|
You could do it, but you can't get the performance I do; that was my original point. Copying a javacpp pointer here and there isn't going to make a difference, which was my original point. The absolute best you can get in the case where you are marshaling between types (a Java array of one type and an NIO buffer of another type) is more than 10 times slower than what I get in this system, because you have to revert to a set-single-item-at-a-time interface instead of a bulk interface, and that is not counting if I decide to implement a more interesting type like a half float.
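To make the distinction concrete: NIO typed buffers offer bulk `put`/`get` only for arrays of the buffer's own element type, so moving a `float[]` into a `DoubleBuffer` falls back to one call per element. A rough sketch of both paths follows (the names `MarshalSketch`, `putFloatsAsDoubles`, and `putDoubles` are hypothetical; this is not the benchmark code behind the timings quoted earlier, and it makes no performance claim of its own):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

class MarshalSketch {
    // Cross-type path: no bulk float[] -> DoubleBuffer operation exists in NIO,
    // so each element is widened and stored with its own absolute put.
    static void putFloatsAsDoubles(float[] src, DoubleBuffer dst, int dstOffset) {
        for (int i = 0; i < src.length; i++) {
            dst.put(dstOffset + i, src[i]); // absolute put: position is not changed
        }
    }

    // Same-type path: a single bulk put through a duplicate.
    static void putDoubles(double[] src, DoubleBuffer dst, int dstOffset) {
        DoubleBuffer view = dst.duplicate();
        view.position(dstOffset);
        view.put(src);
    }

    public static void main(String[] args) {
        DoubleBuffer buf = ByteBuffer.allocateDirect(4 * Double.BYTES)
                .order(ByteOrder.nativeOrder())
                .asDoubleBuffer();
        putFloatsAsDoubles(new float[] {0.5f, 0.25f}, buf, 0);
        putDoubles(new double[] {2.0, 4.0}, buf, 2);
        System.out.println(buf.get(0) + " " + buf.get(1) + " " + buf.get(2) + " " + buf.get(3));
    }
}
```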
The above is a clear reason, and since you are so concerned about performance that introducing threading issues was a worthwhile tradeoff for extremely questionable gains, I thought you might be interested in scenarios where you can't, no matter what you do, get the performance story correct. Marshaling between types of POD objects to/from native buffers is exactly one place you can't get it right, and this seems to me the only purpose of the position member of the pointer class (for anything else you could just offset the address and go on with your life). The position member is broken, btw: try getting the actual address of, say, a double pointer with position set to 1 in native code. The system will offset the address by 1 byte, not 1 double. |
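The arithmetic this comment expects is that a position counts elements, not bytes. A tiny sketch of that expectation, assuming a hypothetical helper `elementAddress` (this is not JavaCPP code and does not reproduce or confirm the reported behavior):

```java
class AddressSketch {
    // Address of the element at `position`, where position is measured in elements.
    static long elementAddress(long baseAddress, long position, int elementSizeBytes) {
        return baseAddress + position * elementSizeBytes;
    }

    public static void main(String[] args) {
        long base = 0x1000L;
        // Expected for a double pointer at position 1: base + 8 bytes.
        System.out.println(Long.toHexString(elementAddress(base, 1, Double.BYTES)));
        // The behavior described in the comment would correspond to base + 1 byte.
        System.out.println(Long.toHexString(base + 1));
    }
}
```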
Would you have something concrete to propose? If you do, please send it
over and we'll work from there.
|
* Make `Pointer.asBuffer()` thread-safe (issue #155)
In the meantime, I've made |
asBuffer being threadsafe is a solid step forward, but I proposed many concrete things:
The bug fix is useful, however, and javacpp is extremely useful, and I do very much thank you for that. |
This is the definition of Pointer.position() right now:
public <P extends Pointer> P position(long position) {
    this.position = position;
    return (P)this;
}
We erase that, and what do we put instead? That kind of concrete. |
The thing is, what do we do for subclasses? In the parent class, we can put something like the following:
public Pointer position(long position) {
    return new Pointer(this).position(position);
}
But say we have some native void foo(ClassA pointer); A user that needs to call that function with a position of 42 could do
class ClassA extends Pointer {
    // ...
    public ClassA position(long position) {
        return new ClassA(this).position(position);
    }
}
Do you feel this is acceptable? Or do you have something else in mind? If you do, please provide more details. It's hard to understand what you are proposing if you cannot be more concrete. |
Totally, I was thinking along the same lines and was going to try a concrete implementation this weekend. Basically, I think you could remove the position member variable but keep the API the same. You would have to have a virtual create method or something like that... The concept of typed pointers is, I think, very appropriate for objects and not appropriate in general. Meaning, if the pointer is a tuple of address and type, as opposed to a DoublePointer wrapping an address, then some of these types of issues go away. For object pointers, as I said, the typing system works great and leads to a great tool to work with. For buffers of primitive POD-type objects (objects where memset and memcpy semantics are valid, meaning they don't have nested constructors or embedded pointers), or for the most general concept of a pointer (an int32 or int64 value), I think it falls over a bit. For example, there are significant machinations in my CUDA layer to deal with the pointer types, and these machinations would be simpler were the pointers more runtime typed and less compile-time typed. In any case, I think your response of "show me something more concrete" is very fair now that I understand it, and I was waiting until I had time to try to remove position and then get something simple to work all the way through before responding. |
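One way the "virtual create method" idea from this exchange could look is sketched below. Everything here is hypothetical: `Ptr`, `DoublePtr`, `create`, and `foo` are illustration-only names, not JavaCPP classes, and the sketch ignores deallocation and native interop entirely.

```java
class ImmutablePointerSketch {
    // Hypothetical immutable pointer: address and position are fixed at construction.
    static class Ptr {
        final long address;
        final long position;

        Ptr(long address, long position) {
            this.address = address;
            this.position = position;
        }

        // "Virtual create": subclasses override this to build their own type.
        protected Ptr create(long address, long position) {
            return new Ptr(address, position);
        }

        // Same call shape as a mutable setter, but the receiver is left untouched.
        public Ptr position(long newPosition) {
            return create(address, newPosition);
        }
    }

    static class DoublePtr extends Ptr {
        DoublePtr(long address, long position) {
            super(address, position);
        }

        @Override
        protected DoublePtr create(long address, long position) {
            return new DoublePtr(address, position);
        }

        // Covariant return type so callers keep the subclass type without casting.
        @Override
        public DoublePtr position(long newPosition) {
            return create(address, newPosition);
        }
    }

    // Stand-in for a native method that only accepts the subclass type.
    static long foo(DoublePtr p) {
        return p.position;
    }

    public static void main(String[] args) {
        DoublePtr base = new DoublePtr(0x1000L, 0);
        long seen = foo(base.position(42)); // new object; `base` is unchanged
        System.out.println(seen + " " + base.position);
    }
}
```

The design choice is that sharing `base` across threads is safe by construction, at the price of one allocation per `position` call and one small override per subclass.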
@cnuernber Any updates on this? it would be awesome if we could come up with a better solution! In any case, the original issue ( |
Recent discussions with @frankfliu got me wondering about this again, and I think I found a satisfactory solution. Regardless of thread safety, there is a usability issue with the fact that users expect to find some Now, If you can think of something more to add to this, please let me know! Thank you for all the input |
This has been released with JavaCPP 1.5.4. tl;dr |
The code for .asBuffer on any pointer is not threadsafe. In the base java class, both position and limit are mangled to call an underlying call and then reset.
This results in intermittent crashes if the expectation is that .asBuffer is threadsafe and the documentation does not state that it is not (or I missed it).
A workaround is to lock the ptr or duplicate it before calling asBuffer. At the very least, it would be extremely helpful if methods that are not threadsafe were in fact documented as such, and a list of the methods that aren't threadsafe in one place would be great.
I personally think the inclusion of the position pointer on the pointer.java object is a mistake and in fact the inclusion of any non-constant member variables that can change during the pointer's lifetime should be re-evaluated so that we can write code that does not crash intermittently in hard-to-debug and reproduce situations.
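The locking workaround mentioned above can be sketched with a toy class. `ToyPointer` is a hypothetical stand-in with mutable `position` and `limit` fields; it is not JavaCPP's `Pointer` and does not mirror its internals. The helper `viewOf` sets the fields, takes the view, and restores them, all under one lock.

```java
import java.nio.ByteBuffer;

class LockedViewSketch {
    // Hypothetical pointer-like object whose view depends on mutable fields.
    static final class ToyPointer {
        private final ByteBuffer memory;
        long position;
        long limit;

        ToyPointer(int capacity) {
            memory = ByteBuffer.allocate(capacity);
            limit = capacity;
        }

        // Reads two mutable fields, so concurrent writers can corrupt the result.
        ByteBuffer asBuffer() {
            ByteBuffer view = memory.duplicate();
            view.limit((int) limit);
            view.position((int) position);
            return view.slice();
        }
    }

    // Takes a view of [position, limit) while holding the pointer's monitor,
    // then restores the previous field values before releasing it.
    static ByteBuffer viewOf(ToyPointer p, long position, long limit) {
        synchronized (p) {
            long oldPosition = p.position;
            long oldLimit = p.limit;
            p.position = position;
            p.limit = limit;
            try {
                return p.asBuffer();
            } finally {
                p.position = oldPosition;
                p.limit = oldLimit;
            }
        }
    }

    public static void main(String[] args) {
        ToyPointer p = new ToyPointer(16);
        ByteBuffer view = viewOf(p, 4, 12);
        System.out.println(view.remaining() + " " + p.position + " " + p.limit);
    }
}
```

This only helps if every reader and writer of those fields takes the same lock; as noted elsewhere in the thread, a pointer that is concurrently handed to an API call outside the lock is still exposed.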