
Improve Indexer to allow a hyper-rectangular selection #391

Closed
matteodg opened this issue Apr 13, 2020 · 50 comments
@matteodg
Member

What about improving the Indexer classes to use the same concept of "hyper-rectangular" selection that the HDF5 library is using?
See https://portal.hdfgroup.org/display/HDF5/Reading+From+or+Writing+To+a+Subset+of+a+Dataset

@matteodg
Member Author

I'm giving this a shot myself, as it would be great to create from an existing Indexer another one that is just a different view of the same backing array/buffer/pointer (defined using a hyper-rectangular selection by offset, count, stride, and block for each coordinate).
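For anyone following along, the hyper-rectangular mapping being discussed (offset, stride, block per dimension, HDF5-style) boils down to a small piece of coordinate arithmetic. Here is a rough, self-contained sketch; all names are hypothetical, and only the math mirrors the HDF5 hyperslab definition:

```java
public class HyperslabIndexSketch {

    // Maps a logical coordinate in each dimension to a physical one using the
    // HDF5-style hyperslab parameters, then flattens with the array's own strides.
    static long index(long[] coords, long[] offsets, long[] hyperslabStrides,
                      long[] blocks, long[] arrayStrides) {
        long linear = 0;
        for (int d = 0; d < coords.length; d++) {
            long physical = offsets[d]
                    + hyperslabStrides[d] * (coords[d] / blocks[d])
                    + (coords[d] % blocks[d]);
            linear += physical * arrayStrides[d];
        }
        return linear;
    }

    public static void main(String[] args) {
        // 8x8 row-major array: select 2x2 blocks starting at (1,1), one block
        // every 4 elements in each dimension (a 4x4 logical view).
        long[] offsets = {1, 1}, hStrides = {4, 4}, blocks = {2, 2}, aStrides = {8, 1};
        System.out.println(index(new long[] {0, 0}, offsets, hStrides, blocks, aStrides)); // 9
        System.out.println(index(new long[] {2, 3}, offsets, hStrides, blocks, aStrides)); // 46
    }
}
```

A view indexer would apply this mapping on every get/put against the shared backing pointer, so no data needs to be copied.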

@saudet
Member

saudet commented Apr 14, 2020

That's obviously something we need, but it requires bringing in a lot of stuff. This is probably going to be more challenging than you imagine. Please also consider other solutions for this, such as ND4J or NdArray from TensorFlow: https://github.com/tensorflow/java/tree/master/tensorflow-tools

@matteodg
Member Author

Wow, thanks! I checked those options but:

  • NDArrays from ND4J: very flexible (even GPU memory), but more than 400 MB of libraries, up to 1 GB with CUDA
  • NdArray from tensorflow-tools: a nice Index concept (I submitted a PR adding a hyperslab Index), but it seems limited in that it cannot interoperate with JavaCPP pointers (currently only arrays, buffers, and raw memory). Is there a clean way to pass the address and size from JavaCPP to the raw-memory data buffers?
  • Indexer: to me this seems the most promising, except that the ability to create indexers from other indexers (i.e., different views of the same backing data source) is missing.

I added a WIP PR #392 about this issue.

@saudet
Member

saudet commented Apr 16, 2020

Thanks!

@karllessard So, it looks like it would be a good time to start talking about this again. What do you think?

@karllessard

@matteodg , for safety reasons, TF Tools does not publicly expose the endpoints to map a buffer from a native address. But you can still access them by extending a class from RawDataBufferFactory, which is what TensorFlow does to map its tensor buffers from JavaCPP pointers.

Would that work for you?

@matteodg
Member Author

Thanks for the hint: I was worried about how the memory area would still be accessible if the Pointer reference has already been GC'ed (and the pointer deallocated by JavaCPP?).

@karllessard

I think it depends on each case. In TensorFlow, the life of the Pointer is tied to the life of the tensor itself, so as long as the tensor is alive, its data remains accessible (via tensor.data()).

Now, if a user decides to keep a reference to that data outside the tensor scope and the tensor is freed, then yes, it might result in errors, e.g.:

IntNdArray data;
try (Tensor t = TInt32.vectorOf(1, 2, 3, 4)) {
    data = t.data();
}
data.getInt(0); // unexpected result

So unfortunately, we must rely on the "good behaviour" of our users. To preserve the data beyond the lifecycle of a tensor, they need to copy it.

@saudet
Member

saudet commented Apr 17, 2020

@karllessard We understand that it's possible to do all that manually, but the point is, we need something that is easier to use (and also safer to use). Asking users to implement that interface isn't going to make anything safer. It's just going to prevent people from getting work done. Some of them may give it a try and implement it, and do it incorrectly, making it less safe than a correct implementation that we could provide them with. If you still stand by your opinion that it should be hard to use, then there's nothing more to discuss. I just wanted to remind you that more and more users are going to request this.

@matteodg
Member Author

Our use case is mainly interoperability with the HDF5 preset from JavaCPP. Since it is the only HDF5 library that allows 64-bit indexing, we would like to use that backend and be able to create views of the datasets that the library returns.

@karllessard

@saudet : I hear your point and I'm not closed to the idea of relaxing the safety surrounding the endpoints. I also think there is a bit of redundancy between tensorflow-tools and JavaCPP indexers. Wouldn't it make sense for JavaCPP to use tensorflow-tools instead? That library is very lightweight and could also be renamed to something else to avoid any confusion, since it has absolutely no dependency on TensorFlow itself.

@matteodg : Just to understand your use case a bit more: you are working with HDF5 datasets using JavaCPP, and you are simply looking for tools to read or write the buffer data obtained by H5Dread in a 64-bit n-dimensional space? If so, tensorflow-tools can do this, assuming the dimensions are written in the right sequence in the buffer (sorry, I have never worked with the HDF5 library directly, so I'm not aware of these details...)

@matteodg
Member Author

@karllessard Yes, that's exactly the use case I'm dealing with lately, so all these offered solutions are great. Moreover, I need a layer that is also detachable from the specific library that reads the HDF5 file (we are using the old NCSA HDF 1.8 library as a fallback, because sometimes the JavaCPP HDF5 preset crashes unpredictably) or any other storage. So what tensorflow-tools does by wrapping any backend is great.

What about including directly in tensorflow-tools a JavaCppDataBufferFactory that extends RawDataBufferFactory, as you suggested? Or the other way around: JavaCPP could depend on tensorflow-tools (which declares an API) and provide another backend using the Java ServiceLoader mechanism?

Another good thing would be to have every DataBuffer returned by the JavaCppDataBufferFactory keep a reference to the underlying Pointer, just like the NioDataBufferFactory data buffers keep a reference to the NIO Buffers, or the RawDataBufferFactory's underlying UnsafeMemoryHandle keeps a reference to the original array.

I have to say I very much like the Index construct for slicing an NdArray: maybe different from the paradigm we are using in the code right now (more similar to Indexer, which is why I was leaning toward using that instead), but I see it is very powerful and extensible.

On a side note I agree on renaming tensorflow-tools library as it is really separate from TensorFlow and general purpose.

OK, that's enough: too many points ;-)

@saudet
Member

saudet commented Apr 17, 2020

@saudet : I hear your point and I'm not closed to the idea of relaxing the safety surrounding the endpoints. I also think there is a bit of redundancy between tensorflow-tools and JavaCPP indexers. Wouldn't it make sense for JavaCPP to use tensorflow-tools instead? That library is very lightweight and could also be renamed to something else to avoid any confusion, since it has absolutely no dependency on TensorFlow itself.

It wouldn't make sense as part of the main artifact of JavaCPP, but it could be made available as a separate module, sure. But if the rest of TensorFlow isn't going to use that module, it won't help in terms of usability anyway. We would then have 2 incompatible implementations that couldn't be used interchangeably.

@karllessard

karllessard commented Apr 17, 2020

Thanks @matteodg , those are all very valuable comments; tensorflow-tools is still at a stage where it can be revisited.

On a side note I agree on renaming tensorflow-tools library as it is really separate from TensorFlow and general purpose.

I'll think of a different name and propose it during TF Java team meeting (SIG JVM) at the end of the month, which you are invited to join if you want. The library can still be distributed under the TensorFlow organization (to gain some visibility) or not, we'll see what we come up with.

What about including directly in tensorflow-tools a JavaCppDataBufferFactory which extends RawDataBufferFactory as you suggested?

One interesting thing about this library is that it is lightweight and has no external dependencies. So if it cannot be done the other way around (i.e., JavaCPP depending on this library), it could be done as a separate module like @saudet was suggesting, either on the JavaCPP side or on the TF Tools side.

Another solution would be to make the dependency on JavaCPP in TF Tools optional and detect its presence in the classpath dynamically, but these kinds of solutions are always a bit cumbersome.

Another good thing would be to have every DataBuffer returned by the JavaCppDataBufferFactory keeping the reference to the underlying Pointer

For TensorFlow, in my previous example, the tensor would still be closed and the memory released even if the Pointer is referenced by the data object. There might be ways to handle this, though (maybe with reference counting?). Still, having a reference to the Pointer in the DataBuffer can help us fail gracefully if a user tries to access a memory area that is no longer valid, instead of potentially returning garbage data.
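The "fail gracefully" idea could look roughly like the following pure-Java sketch; GuardedBuffer and the liveness flag are hypothetical stand-ins for a DataBuffer that holds a reference to its Pointer (or a shared reference count):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class GuardedBuffer {
    // Hypothetical sketch: the buffer checks a shared liveness flag
    // (owned by the tensor/Pointer) before each access, so a use-after-free
    // becomes an exception instead of garbage data.
    private final float[] storage;            // stand-in for native memory
    private final AtomicBoolean alive;

    GuardedBuffer(float[] storage, AtomicBoolean alive) {
        this.storage = storage;
        this.alive = alive;
    }

    float get(int i) {
        if (!alive.get()) {
            throw new IllegalStateException("backing memory already released");
        }
        return storage[i];
    }

    public static void main(String[] args) {
        AtomicBoolean alive = new AtomicBoolean(true);
        GuardedBuffer buf = new GuardedBuffer(new float[] {1f, 2f}, alive);
        System.out.println(buf.get(0)); // 1.0
        alive.set(false);               // the tensor is closed here
        try {
            buf.get(0);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```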

But if the rest of TensorFlow isn't going to use that module, it won't help in terms of usability anyway.

@saudet : the idea was that TensorFlow would be using it as well. What would happen to the indexers currently in JavaCPP; would they be part of this module as well? In TensorFlow, we currently don't use them, but if they are mapped to the DataBuffer interface, they might be a replacement for the default layouts in TF Tools. Or the other way around (i.e., the indexers would now use TF Tools' DataLayout).

saudet added a commit that referenced this issue Apr 18, 2020
@saudet
Member

saudet commented Apr 18, 2020

For TensorFlow, in my previous example, the tensor would still be closed and the memory released even if the Pointer is referenced by the data object. There might be ways to handle this, though (maybe with reference counting?). Still, having a reference to the Pointer in the DataBuffer can help us fail gracefully if a user tries to access a memory area that is no longer valid, instead of potentially returning garbage data.

No, that's fine. The idea is just to keep a reference to the Pointer, at least when it's the one that owns the data, so we can easily track its lifetime with things like ReferenceQueue and PointerScope. If users still decide to deallocate the memory prematurely, then that's their responsibility.

@saudet : the idea was that TensorFlow would be using it as well. What would happen to the indexers currently in JavaCPP; would they be part of this module as well? In TensorFlow, we currently don't use them, but if they are mapped to the DataBuffer interface, they might be a replacement for the default layouts in TF Tools. Or the other way around (i.e., the indexers would now use TF Tools' DataLayout).

Ok, it's good to hear that we're converging in intent. Indexer is actually at a lower level of abstraction than NdArray, DataBuffer, and DataLayout. It's basically like CPython's buffer protocol, which NumPy uses. As discussed previously, I believe NdArray in Java should be based on that kind of abstraction as well. For example, the most efficient way to access an element of an NdArray is with methods like float getFloat(long... coordinates), correct? Those need to create an array and loop over it on each access, which causes overhead. Basically, NdArray isn't geared towards performance when accessing a single element at a time. In contrast, Indexer provides a couple of optimized versions of those kinds of methods for low-dimensional arrays, because they are intended to be used that way, a single element at a time, so they need to be fast. Here are a couple of quick-and-dirty benchmarks just to show you how dramatic the difference is:

long time = System.nanoTime();
FloatPointer largePointer = new FloatPointer(1024 * 1024 * 1024);
FloatIndexer largeIndexer = FloatIndexer.create(largePointer, 1024, 1024, 1024);
for (int i = 0; i < 1024; i++) {
    for (int j = 0; j < 1024; j++) {
        for (int k = 0; k < 1024; k++) {
            largeIndexer.put(new long[] {i, j, k}, 2 * largeIndexer.get(new long[] {i, j, k}));
        }
    }
}
System.out.println("Took " + (System.nanoTime() - time) / 1000000 + " ms");

Took 20474 ms

long time = System.nanoTime();
FloatPointer largePointer = new FloatPointer(1024 * 1024 * 1024);
FloatIndexer largeIndexer = FloatIndexer.create(largePointer, 1024, 1024, 1024);
for (int i = 0; i < 1024; i++) {
    for (int j = 0; j < 1024; j++) {
        for (int k = 0; k < 1024; k++) {
            largeIndexer.put(i, j, k, 2 * largeIndexer.get(i, j, k));
        }
    }
}
System.out.println("Took " + (System.nanoTime() - time) / 1000000 + " ms");

Took 3215 ms

Which is pretty close to an equivalent C/C++ program that doesn't check bounds:

int main() {
    float *large = new float[1024 * 1024 * 1024];
    for (int i = 0; i < 1024; i++) {
        for (int j = 0; j < 1024; j++) {
            for (int k = 0; k < 1024; k++) {
                large[i * 1024 * 1024 + j * 1024 + k] = 2 * large[i * 1024 * 1024 + j * 1024 + k];
            }
        }
    }
    return large[1024 * 1024 * 1024 - 1];
}

g++ -O3 test.cpp
time ./a.out
real 0m2.498s
user 0m0.839s
sys 0m1.648s

So the use cases and the levels of abstraction of Indexer vs. DataBuffer, DataLayout, and NdArray are not the same. (BTW, I still don't understand what we gain by splitting NdArray into 3 levels of abstraction like this. Could you clarify?) FWIW, the only reason I came up with Indexer in the first place is that Java doesn't provide a multidimensional 64-bit version of Buffer. We might get something like that with MemoryLayouts in Java 20 or so, but it's not there yet, since it doesn't yet let us use preallocated buffers the way Buffer or sun.misc.Unsafe do.

matteodg added 15 commits to matteodg/javacpp that referenced this issue (Apr 19 to May 2, 2020)
@saudet
Copy link
Member

saudet commented May 2, 2020

@mcimadamore This sounds great for what MemoryHandles currently provides to index with strides, but returning to the topic of this thread, what if we want to do something more complex, such as with "hyperslabs"? Are users allowed to implement something like the following for MemorySegment? https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/indexer/HyperslabIndex.java

matteodg added 9 commits to matteodg/javacpp that referenced this issue (May 3, 2020)
@matteodg
Member Author

matteodg commented May 7, 2020

maybe very particular to my use case, but the powerful Index concept of NdArrays does not make clear how to coordinate indexing involving multiple dimensions

Interesting case, indeed. Right now an Index is used to slice in a given dimension, and an NdArray slice is obtained by slicing one or more of its dimensions. But each element of a given dimension is always indexed the same way. Now in your case, if I understand correctly, you want to permute the columns differently depending on which row you are at? So the Index used for the columns must behave differently depending on the row index? Hmm, maybe all coordinates must then be provided to Index.mapCoordinate(); might be something to look at.

Forgot to reply to this: yes, that is basically my case with rows and columns inverted, but the concept is the same. The created view NdArray will allow indirect indexing through permutations, so I do not need to create a column-sorted matrix (because I need both the unsorted and sorted matrices at a certain point).

@karllessard Just to keep you up to date: I ended up using JavaCPP's Indexer, as it provides a single Index for all coordinates at the same time, allowing me to play with permutations of row elements in the same column; I do not see an easy way to do the same in https://github.com/tensorflow/java without rewriting DimensionalSpace.
Thanks anyway for the useful comments!
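The per-column permutation view described in this exchange can be sketched as a single index function over all coordinates at once (the kind of mapping a custom Indexer Index can express); the names below are hypothetical:

```java
public class PermutedColumnView {
    // Hypothetical sketch: a per-column row permutation. rowPerm[j][i] gives
    // the source row for logical coordinate (i, j), so each column can order
    // its rows differently without copying the backing data.
    static long index(long i, long j, int[][] rowPerm, long cols) {
        return rowPerm[(int) j][(int) i] * cols + j; // row-major linear offset
    }

    public static void main(String[] args) {
        // 3x2 matrix; column 0 keeps row order, column 1 reverses it.
        int[][] rowPerm = { {0, 1, 2}, {2, 1, 0} };
        System.out.println(index(0, 1, rowPerm, 2)); // source row 2, column 1 -> 5
    }
}
```

Because the mapping sees all coordinates at once, it cannot be decomposed into one independent Index per dimension, which is the limitation discussed above.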

@karllessard

I saw, guys, that you added a lot of stuff to the JavaCPP indexers to get this working.

While that's totally fine, I'm just a bit concerned about the nature of JavaCPP, which seems to do much more than just provide "a missing bridge between Java and native C++" and is starting to become a framework. Would it make sense to provide the indexers as a separate artifact, for example?

@matteodg
Member Author

matteodg commented May 8, 2020

Yes, I actually agree and I was going to suggest that too.

@saudet
Member

saudet commented May 9, 2020

It's still pretty slim; we didn't need to add that much. The main problem that Indexer solves is providing fast 64-bit multidimensional indexing, which is something that should have been in the JDK from the start, but still isn't, and that C++ libraries need, so yes, it's related to C++. To do that, it needs a way to allocate 64-bit buffers, and we have to use JNI for that because sun.misc.Unsafe isn't always available. It would still depend on JNI code generated via JavaCPP anyway, even if we moved it to some other project. When MemorySegment lands in the JDK, it should make this exercise obsolete, but the lack of reply from @mcimadamore about the usability of MemoryHandles for custom indexing schemes concerns me. So, in the end, maybe we'll need to come up with something on our own anyway.

There's also the issue of what we should do to support GPUs, among other things. That should be our main concern here. On the CPU with Java, we don't need something like NumPy. We can do loops and stuff manually and get speed that is pretty close to C++. The whole point of creating a NumPy-like interface for Java would be to allow computation on the GPU, and that means compiling native code with CUDA and what not. PyTorch actually markets itself as a replacement for NumPy that works on GPUs:

What is PyTorch?

It’s a Python-based scientific computing package targeted at two sets of audiences:

  • A replacement for NumPy to use the power of GPUs
  • a deep learning research platform that provides maximum flexibility and speed

https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html

We need to be able to do the same with something similar in Java. Maybe the default mode would be 100% pure Java or whatever, but it should be able to use TensorFlow as backend. We would need an implementation of NDArray that does not access memory directly, that uses TensorFlow ops exclusively. How do you see that happening @karllessard ?

@mcimadamore

@mcimadamore This sounds great for what MemoryHandles currently provides to index with strides, but returning back to the topic of this thread, what if we want to do something more complex such as with "hyperslabs"? Are users allowed to implement something like the following for MemorySegment? https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/indexer/HyperslabIndex.java

In general you can attach whatever index pre-processing capability you want with MemoryHandles::filterCoordinates. Once you have a function that goes from a logical index (or tuple of indices) to an index into the basic memory segment, you can insert that function as a filter on the coordinate, and you will get back a var handle that features the desired access coordinates, with the right behavior.

In your case, the filtering function could be something like this (taken from your example):

@Override
public long index(long i, long j, long k) {
    return (offsets[0] + hyperslabStrides[0] * (i / blocks[0]) + (i % blocks[0])) * strides[0]
            + (offsets[1] + hyperslabStrides[1] * (j / blocks[1]) + (j % blocks[1])) * strides[1]
            + (offsets[2] + hyperslabStrides[2] * (k / blocks[2]) + (k % blocks[2])) * strides[2];
}

So, assuming you have a plain indexed var handle whose only coordinate is a long (the offset of the element in the segment to be addressed), if you attach a method handle wrapping the above method to that var handle, you will get back a var handle that takes three longs; in other words, you will go from:

VarHandle(MemoryAddress, long)

to

VarHandle(MemoryAddress, long, long, long)

where, on each access, the above function will be computed, yielding a long index value that can then be used to access the underlying memory region.

In other words, offset and stride access are only two ready-made combinators, meant to support common use cases; but in the next iteration of the API (which we are close to finalizing) there's a rich VarHandle combinator API that lets you express pretty much whatever you want (as long as the translation can be reasonably expressed in a functional way, same deal as with MethodHandle combinators).
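For readers who want to try the composition idea without the incubator API, MethodHandles.collectArguments (standard java.lang.invoke, since Java 9) expresses the same "feed an index function into an accessor" pattern; this is only an analogue of the foreign-memory combinators, sketched over a plain Java array, with made-up layout values:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FilteredAccessSketch {
    // Maps a logical (i, j, k) coordinate to a flat offset in a small
    // 3x3x4 row-major layout (sketch values, not a real library's layout).
    static int index(int i, int j, int k) {
        return i * 12 + j * 4 + k;
    }

    public static void main(String[] args) throws Throwable {
        // (float[], int) -> float
        MethodHandle getter = MethodHandles.arrayElementGetter(float[].class);
        MethodHandle index = MethodHandles.lookup().findStatic(FilteredAccessSketch.class,
                "index", MethodType.methodType(int.class, int.class, int.class, int.class));
        // Feed index()'s result into the getter's offset argument:
        // (float[], int, int, int) -> float
        MethodHandle indexed = MethodHandles.collectArguments(getter, 1, index);

        float[] data = new float[36];
        data[index(1, 2, 3)] = 42f;
        System.out.println((float) indexed.invokeExact(data, 1, 2, 3)); // 42.0
    }
}
```

The composed handle can be inlined by C2 just like the VarHandle combinators described above, which is the performance argument being made here.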

@saudet
Member

saudet commented May 12, 2020

@mcimadamore Ok, but with MemoryHandles::filterCoordinates() used that way, we're not getting anything more than what we can already get with a simple method overload, as we're doing in JavaCPP. I suppose we'll have to wait until you are able to make more information public about the "rich VarHandle combinator API" to see if it's going to be useful for this use case. Is there any information about it already available?

@mcimadamore

mcimadamore commented May 12, 2020

Ok, but with MemoryHandles::filterCoordinates() used that way, we're not getting anything more than what we can already get with a simple method overload, as we're doing in JavaCPP. I suppose we'll have to wait until you are able to make more information public about the "rich VarHandle combinator API" to see if it's going to be useful for this use case. Is there any information about it already available?

The question was:

it should make this exercise obsolete, but the lack of reply from @mcimadamore about the usability of MemoryHandles for custom indexing schemes concerns me. So, in the end, maybe we'll need to come up with something on our own anyway.

To which I've replied. Implementing custom addressing schemes as method handle adapters gives C2 a chance to inline the entire access expression, which is something I doubt would happen with the overload strategy in the JavaCPP case (benchmark needed, of course).

The information you seek is already available out there (all JEPs, CSRs, and code reviews were public all along, really). As with all things, sometimes you also have to get your hands dirty and try things out if you want to get a better sense of how things are used in practice.

@saudet
Member

saudet commented May 13, 2020

To which I've replied. Implementing custom addressing schemes as method handle adapters gives C2 a chance to inline the entire access expression, which is something I doubt would happen with the overload strategy in the JavaCPP case (benchmark needed, of course).

Yes, of course, overloaded methods do get fully inlined and optimized. That's what I showed above:
#391 (comment)

The information you seek is already available out there (all JEPs, CSRs, code review were public all along, really). As with all things, sometimes you also have to get your hands dirty and try things out if you want to get a better sense of how things are used in practice.

I understand, but I'm still pretty sure the information I would like to find isn't available. If you feel otherwise, please help me find it. What I'm looking for should allow us to do something like this:

static long index(long i, long offset, long hyperslabStride, long block, long stride) {
    return (offset + hyperslabStride * (i / block) + (i % block)) * stride;
}
// ...
VarHandle floatHandle = MemoryHandles.varHandle(float.class, ByteOrder.nativeOrder());
MethodHandle index = MethodHandles.lookup().findStatic(thatClass, "index", MethodType.methodType(long.class, long.class, long.class, long.class, long.class, long.class));
VarHandle indexedElementHandle = MemoryHandles.withIndex(floatHandle, 4, index, offsets, hyperslabStrides, blocks, strides);

How can we do that?

@karllessard

We need to be able to do the same with something similar in Java. Maybe the default mode would be 100% pure Java or whatever, but it should be able to use TensorFlow as backend. We would need an implementation of NDArray that does not access memory directly, that uses TensorFlow ops exclusively. How do you see that happening @karllessard ?

NdArray and DataBuffer are interfaces that could be implemented in many ways. The implementations already available in TF Tools handle the generic cases, but nothing prevents us from adding more specialized versions of these objects directly in tensorflow-core; it would be interesting to see how we can do it for GPUs.

Note that TF Tools only focuses on I/O operations (by design) and not linear algebra like NumPy or other implementations of ND arrays that I've seen (ND4J, DJL, ...). So basically that would just mean reading/writing data from/to GPU memory. I never looked at this, though; do we really need TF ops, or is DMA also available?

@saudet
Member

saudet commented May 13, 2020

@karllessard Well, we do need some way to shuffle data in GPU memory for sure, see bytedeco/javacpp-presets#863

/cc @okdzhimiev

@mcimadamore

I understand, but I'm still pretty sure the information I would like to find isn't available. If you feel otherwise, please help me find it. What I'm looking for should allow us to do something like this:

static long index(long i, long offset, long hyperslabStride, long block, long stride) {
    return (offset + hyperslabStride * (i / block) + (i % block)) * stride;
}
// ...
VarHandle floatHandle = MemoryHandles.varHandle(float.class, ByteOrder.nativeOrder());
MethodHandle index = MethodHandles.lookup().findStatic(thatClass, "index", MethodType.methodType(long.class, long.class, long.class, long.class, long.class, long.class));
VarHandle indexedElementHandle = MemoryHandles.withIndex(floatHandle, 4, index, offsets, hyperslabStrides, blocks, strides);

The documentation for the combinator API is fully available in the CSR that was submitted a few weeks ago:

https://bugs.openjdk.java.net/browse/JDK-8243496

Inside you will find code changes, spec changes, and a link to the foreign package javadoc. If you look inside MemoryHandles, you should find adequate documentation on what the various combinators actually do.

As for your specific case, I think the code would look something like this:

VarHandle floatHandle = MemoryLayout.sequenceLayout(MemoryLayouts.JAVA_FLOAT).varHandle(PathElement.sequenceElement()); // (MemoryAddress, long) -> float

MethodHandle index = MethodHandles.lookup().findStatic(thatClass, "index", MethodType.methodType(long.class, long.class, long.class, long.class, long.class, long.class));

VarHandle indexedElementHandle = MemoryHandles.filterCoordinates(floatHandle, 1, index); // (MemoryAddress, long, long, long, long, long)

So, the resulting indexed handle will take 5 coordinates and will produce a long which will be forwarded to the original var handle. If you want to inject some of the coordinates to some known constant value that will not change across accesses, you can use the MemoryHandles::insertCoordinates combinator:

VarHandle fixedOffsetHandle = MemoryHandles.insertCoordinates(indexedElementHandle, 1, offset); // (MemoryAddress, long, long, long, long)

And so forth.

@saudet
Member

saudet commented May 15, 2020

The documentation for the combinator API is fully available in the CSR that was submitted a few weeks ago:

https://bugs.openjdk.java.net/browse/JDK-8243496

Inside you will find code changes, spec changes, and a link to the foreign package javadoc. If you look inside MemoryHandles, you should find adequate documentation on what the various combinators actually do.

Great, thanks! Next time though, please just copy/paste the URL instead of saying something vague like "rich VarHandle combinator API". I don't consider these changes to be "rich", so such subjective statements lead to confusion. And on my side, I will make sure to keep in mind that everything you mention is available online, somewhere, and that I just need to ask, politely. :)

As for your specific case, I think the code would look something like this:

VarHandle floatHandle = MemoryLayout.sequenceLayout(MemoryLayouts.JAVA_FLOAT).varHandle(PathElement.sequenceElement()); // (MemoryAddress, long) -> float

MethodHandle index = MethodHandles.lookup().findStatic(thatClass, "index", MethodType.methodType(long.class, long.class, long.class, long.class, long.class, long.class));

VarHandle indexedElementHandle = MemoryHandles.filterCoordinates(floatHandle, 1, index); // (MemoryAddress, long, long, long, long, long)

So, the resulting indexed handle will take 5 coordinates and will produce a long which will be forwarded to the original var handle. If you want to inject some of the coordinates to some known constant value that will not change across accesses, you can use the MemoryHandles::insertCoordinates combinator:

VarHandle fixedOffsetHandle = MemoryHandles.insertCoordinates(indexedElementHandle, 1, offset); // (MemoryAddress, long, long, long, long)

And so forth.

I see, that's rather user-unfriendly, but I guess it should be able to get the job done. Thanks!

@matteodg Any comments regarding that API?

@mcimadamore

mcimadamore commented May 15, 2020

I don't consider these changes to be "rich"

This is of course subjective; if you look at the number of combinators added to MemoryHandles in the latest iteration:

http://cr.openjdk.java.net/~mcimadamore/8243491_v3/javadoc/jdk/incubator/foreign/MemoryHandles.html

you will see quite a few combinators there. While these are not the full set of combinators available on MethodHandles, they're pretty close, and they have been hand-picked precisely to make sure that use cases such as the one you describe (along with many others) are expressible. That is what I meant by "rich".

Now, you might not like the API idiom forced down by MethodHandle/VarHandle, but that's a different (and also equally subjective!) problem.

I'm not an expert on the Indexer API but, by the looks of it, it seems like it can support up to 3 long access coordinates efficiently, after which it bails out to varargs. The memory access API doesn't have this limitation (thanks to polymorphic signature methods, a VM feature exposed by both VarHandle and MethodHandle). Also, Indexer is rather limited in the coordinate types that can be expressed (long), whereas the aforementioned combinator API allows you to twist and turn coordinate types as you like. And composition comes for free (you can construct more complex access VarHandles from simpler ones; that's the same principle that has served us very well for MethodHandles over the last 10 years).

So, is Indexer more user-friendly? Of course, but there's more to the VarHandle-based approach than meets the eye (another area where VarHandles are particularly strong is memory fencing, which is crucial to get concurrent access as fast as possible in a lock-free way).

@matteodg
Member Author

@matteodg Any comments regarding that API?

I actually checked out MethodHandle and VarHandle, which I did not really know: it seems like a low-level kind of API, but I see it is very powerful and efficient, as stated in @mcimadamore's last comment.
As a user, though, I do like the expressiveness of the Index concept, especially the one in the tensorflow-java API (see the org.tensorflow.tools.ndarray.index package, with all kinds of implementations), except for the limitation of serving each dimension separately.

@mcimadamore

mcimadamore commented May 15, 2020

@matteodg - I think you hit the nail on the head; the goal of the new memory access API is to serve as a foundation for building expressive and efficient access abstractions. Currently the only foundation frameworks can build on top of is Unsafe, with all that comes with it (e.g. hard to access, manual bit twiddling, no safety belts). The memory access API is, in most cases, a competitive replacement for Unsafe which, combined with the MemoryLayout API, allows several degrees of freedom when it comes to expressing access modes (while still retaining safety, deterministic deallocation, and efficiency).

What would be very interesting is to see whether tensorflow-java's Index could be implemented under the hood using memory-access VarHandles (e.g. by collecting all the required transformations on the VarHandle associated with some root ndarray in order to view only the elements in the slice).

@saudet
Member

saudet commented Sep 10, 2020

The changes, including HyperslabIndex, have been released with JavaCPP 1.5.4!
@matteodg Thanks again for your contribution!

@saudet saudet closed this as completed Sep 10, 2020