ND4J: Error in Nd4j.toNpyByteArray(INDArray) format? #7466

Closed
AlexDBlack opened this issue Apr 8, 2019 · 8 comments

4 participants
@AlexDBlack
Member

commented Apr 8, 2019

Reported in Gitter:

Jose A. Corbacho @mccorby 02:59
Hi,

I am working on a project that requires transferring tensors from DL4J to PyTorch. I am using the npy format to move the data from one side (Android) to the other (a server running PyTorch).
I can successfully send a byte array from the server to the client, where I build an INDArray using Nd4j.createNpyFromByteArray.
I then do some operations and return an INDArray as a byte array using Nd4j.toNpyByteArray(myINDArray).
However, when the byte array is received on the server side I get errors. I have seen that after calling toNpyByteArray(myINDArray) the byte array starts with

\x93NUMPY1\x93NUMPY\x01F'descr': '<?4', 'fortran_order': False, 'shape': (5,), } rest of data

This is what the server sends (using numpy.save()), which DL4J can parse:
\x93NUMPY\x01\x00v\x00{'descr': '<f4', 'fortran_order': False, 'shape': (5,), } rest of data
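For reference, a version 1.0 npy payload opens with the 6-byte magic `\x93NUMPY`, one byte each for the major and minor version, a little-endian 16-bit header length, and then the header dict. The following is a minimal sketch of a parser for that layout, using only the Python standard library; the function name `parse_npy_v1_header` and the hand-built test payload are illustrative assumptions, not part of ND4J or NumPy.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"  # 6-byte magic string that opens an npy file


def parse_npy_v1_header(data: bytes):
    """Return (version, header_dict, data_offset) for a version 1.0 npy payload."""
    if data[:6] != MAGIC:
        raise ValueError(f"bad magic: {data[:6]!r}")
    major, minor = data[6], data[7]
    if (major, minor) != (1, 0):
        raise ValueError(f"this sketch only handles version 1.0, got {major}.{minor}")
    # The header length is a little-endian unsigned 16-bit integer.
    (header_len,) = struct.unpack("<H", data[8:10])
    header = data[10:10 + header_len].decode("latin1")
    return (major, minor), ast.literal_eval(header), 10 + header_len


# Hand-built payload mirroring the NumPy-generated prefix quoted above.
header = "{'descr': '<f4', 'fortran_order': False, 'shape': (5,), }"
header = header.ljust(117) + "\n"  # 118 bytes, i.e. 0x76, the 'v' in the dump
good = MAGIC + b"\x01\x00" + struct.pack("<H", len(header)) + header.encode("latin1")

version, fields, offset = parse_npy_v1_header(good)
print(version, fields, offset)
```

Feeding the ND4J-generated prefix to the same function raises `ValueError` at the magic check, because the payload starts with the literal characters `\`, `x`, `9`, `3` rather than the single byte 0x93.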

I am using DL4J 1.0.0-beta3 in Android

Am I missing something?

Thanks
@raver119

Contributor

commented Apr 8, 2019

We need source code.

@mccorby


commented Apr 8, 2019

This is the code on the Android side

    public byte[] add(@NotNull byte[] tensor1, @NotNull byte[] tensor2) {
        INDArray array1 = Nd4j.createNpyFromByteArray(tensor1);
        INDArray array2 = Nd4j.createNpyFromByteArray(tensor2);
        INDArray result = array1.add(array2);

        try {
            return Nd4j.toNpyByteArray(result);
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

On the server side, this is what is used

import numpy
import torch
from tempfile import TemporaryFile


def numpy_tensor_deserializer(tensor_bin) -> torch.Tensor:
    """Strategy to deserialize a binary input in npy format into a Torch tensor"""
    # write the received bytes to a temporary file
    input_file = TemporaryFile()
    input_file.write(tensor_bin)
    # read data from file
    input_file.seek(0)
    return torch.from_numpy(numpy.load(input_file))

numpy.load(input_file) fails as pickle cannot load the input

Numpy version is 1.16.2
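The pickle error is consistent with how `numpy.load` picks a reader: it inspects the first bytes and, when they match neither the zip signature (npz) nor the npy magic, it falls back to unpickling. The sketch below mimics that decision with the standard library only, so a payload can be checked before it reaches `numpy.load`; the helper name `classify_for_numpy_load` is hypothetical.

```python
NPY_MAGIC = b"\x93NUMPY"
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06")  # regular and empty zip archives


def classify_for_numpy_load(data: bytes) -> str:
    """Guess which reader numpy.load would choose for this payload."""
    head = data[:len(NPY_MAGIC)]
    if head.startswith(ZIP_MAGICS):
        return "npz"
    if head == NPY_MAGIC:
        return "npy"
    # Anything else is handed to pickle, which is where the reported error comes from.
    return "pickle"


# Prefixes copied from the report above.
from_nd4j = b"\\x93NUMPY1\x93NUMPY\x01F'descr': '<?4'"
from_numpy = b"\x93NUMPY\x01\x00v\x00{'descr': '<f4'"

print(classify_for_numpy_load(from_nd4j))
print(classify_for_numpy_load(from_numpy))
```

A server could use a check like this to reject a malformed payload with a clear message instead of surfacing a pickle failure.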

I will try the snapshot with npz and let you know if I find something else

Thanks

@mccorby


commented Apr 8, 2019

Note that the byte[] obtained after doing add in Android already starts with the bytes Python cannot parse

@mccorby


commented Apr 8, 2019

It looks as if numpyHeaderForNd4j is also adding the magic header for numpy. I've made some progress (still not working, though) by removing the magic header in convertToNumpy.

I'll keep you posted if I make more progress.

@roessland


commented Apr 10, 2019

I ran into this issue too. Here is some extra info:

INDArray mat = Nd4j.zeros(3, 3);
byte[] buf = Nd4j.toNpyByteArray(mat);
System.out.printf("First 6 bytes are %c %c %c %c %c %c\n", buf[0], buf[1], buf[2], buf[3], buf[4], buf[5]);
// First 6 bytes are \ x 9 3 N U
// But should be 0x93 N U M P Y

Hex dump of files produced by Nd4j:

$ hexyl signalmatrix.npy  | head -n 10
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 5c 78 39 33 4e 55 4d 50 ┊ 59 31 93 4e 55 4d 50 59 │\x93NUMP┊Y1×NUMPY│
│00000010│ 01 46 27 64 65 73 63 72 ┊ 27 3a 20 27 3c 3f 34 27 │•F'descr┊': '<?4'│
│00000020│ 2c 20 27 66 6f 72 74 72 ┊ 61 6e 5f 6f 72 64 65 72 │, 'fortr┊an_order│
│00000030│ 27 3a 20 46 61 6c 73 65 ┊ 2c 20 27 73 68 61 70 65 │': False┊, 'shape│

Hex dump of a different file (3x3 identity matrix) saved in Numpy (not from Java):

$ hexyl m.npy 
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 93 4e 55 4d 50 59 01 00 ┊ 76 00 7b 27 64 65 73 63 │×NUMPY•0┊v0{'desc│
│00000010│ 72 27 3a 20 27 3c 69 38 ┊ 27 2c 20 27 66 6f 72 74 │r': '<i8┊', 'fort│
│00000020│ 72 61 6e 5f 6f 72 64 65 ┊ 72 27 3a 20 46 61 6c 73 │ran_orde┊r': Fals│
│00000030│ 65 2c 20 27 73 68 61 70 ┊ 65 27 3a 20 28 33 2c 29 │e, 'shap┊e': (3,)│
│00000040│ 2c 20 7d 20 20 20 20 20 ┊ 20 20 20 20 20 20 20 20 │, }     ┊        │

Looks like the header is added twice in Nd4j.
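The same inspection can be done without hexyl. Below is a small sketch of a hex dump helper in plain Python, applied to the first 16 bytes of the ND4J-generated file shown above; the `hexdump` function is an illustrative helper, not an existing tool.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns, and printable ASCII."""
    pad = width * 3 - 1  # width of a full hex column block
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text}")
    return "\n".join(lines)


# First row of the ND4J-generated dump above.
nd4j_prefix = bytes.fromhex("5c 78 39 33 4e 55 4d 50 59 31 93 4e 55 4d 50 59")
print(hexdump(nd4j_prefix))

# "NUMPY" appearing twice in the first 16 bytes is the duplicated header.
print(nd4j_prefix.count(b"NUMPY"))
```

Only the second occurrence is preceded by the real 0x93 byte; the first is preceded by the four literal characters `\x93`.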

@AlexDBlack AlexDBlack self-assigned this Apr 10, 2019

@mccorby


commented Apr 10, 2019

Yes, that is what I saw. It also seems to affect the space padding.

@AlexDBlack

Member Author

commented Apr 10, 2019

OK, some progress, but not a solution yet: https://github.com/deeplearning4j/deeplearning4j/pull/7518/files
deeplearning4j/dl4j-test-resources#190

The header was definitely added incorrectly in BaseNativeNDArrayFactory.java (see the removed magicPointer in the PR).

However, that still leaves us with a difference between numpy-generated and ND4J-generated arrays:
(Image not preserved: the first is ND4J-generated, the second is NumPy-generated.)

The header and content ultimately come from here:

template<typename T>
std::vector<char> cnpy::createNpyHeader(const T *data,
                                        const unsigned int *shape,
                                        const unsigned int ndims,
                                        unsigned int wordSize) {
    std::vector<char> dict;
    dict += "{'descr': '";
    dict += BigEndianTest();
    dict += mapType(typeid(T));
    dict += tostring(wordSize);
    dict += "', 'fortran_order': False, 'shape': (";
    dict += tostring(shape[0]);
    for(int i = 1; i < ndims;i++) {
        dict += ", ";
        dict += tostring(shape[i]);
    }
    if(ndims == 1)
        dict += ",";
    dict += "), }";
    //pad with spaces so that preamble+dict is modulo 16 bytes. preamble is 10 bytes. dict needs to end with \n
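The padding rule in that last comment can be sketched in Python as follows. This is an illustration of the version 1.0 layout under the stated rule (10-byte preamble, dict padded with spaces and terminated by a newline so the total is a multiple of 16), not a port of the C++ code; the function name `build_npy_v1_header` is hypothetical.

```python
import struct

MAGIC = b"\x93NUMPY"


def build_npy_v1_header(descr: str, shape: tuple, fortran_order: bool = False) -> bytes:
    """Build the magic, version, length field, and padded dict of a v1.0 npy file."""
    shape_text = ", ".join(str(n) for n in shape)
    if len(shape) == 1:
        shape_text += ","  # one-element tuples need the trailing comma
    body = (
        f"{{'descr': '{descr}', 'fortran_order': {fortran_order}, "
        f"'shape': ({shape_text}), }}"
    )
    # Preamble (10 bytes) + dict + newline must be a multiple of 16 bytes.
    unpadded = 10 + len(body) + 1
    body += " " * (-unpadded % 16) + "\n"
    # Magic, major version 1, minor version 0, little-endian 16-bit dict length.
    return MAGIC + b"\x01\x00" + struct.pack("<H", len(body)) + body.encode("latin1")


header = build_npy_v1_header("<f4", (5,))
print(len(header), header[:10])
```

The NumPy-generated prefix quoted earlier in the thread follows the same pattern: its length field is 0x76 (118), so preamble plus dict is 128 bytes.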

And, the wrong type ("<?4" instead of "<f4") from here:

cnpy.cpp comes from a third-party library. It might be outdated, or simply wrong?
https://github.com/rogersce/cnpy
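To make the type problem concrete: a descr string such as `<f4` is a byte-order character, a one-letter kind code, and an item size in bytes. The sketch below splits a descr and rejects kind characters outside the array-interface type codes, which is one way a receiver could flag `<?4` early. The helper `split_descr` and the exact validation policy are assumptions for illustration, and NumPy's own dtype parsing accepts more forms than this.

```python
# Kind codes from the NumPy array interface "typestr" convention.
VALID_KINDS = set("tbiufcmMOSUV")


def split_descr(descr: str):
    """Split a simple descr like '<f4' into (byteorder, kind, itemsize)."""
    if len(descr) < 3 or descr[0] not in "<>|":
        raise ValueError(f"unexpected descr: {descr!r}")
    byteorder, kind, size = descr[0], descr[1], descr[2:]
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown kind {kind!r} in descr {descr!r}")
    if not size.isdigit():
        raise ValueError(f"non-numeric item size in descr {descr!r}")
    return byteorder, kind, int(size)


print(split_descr("<f4"))  # little-endian 4-byte float, as NumPy wrote it
try:
    split_descr("<?4")     # the descr ND4J produced
except ValueError as err:
    print(err)
```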

@AlexDBlack AlexDBlack added the C++ label Apr 11, 2019

@raver119 raver119 self-assigned this Apr 11, 2019

@AlexDBlack AlexDBlack added this to the Release Milestone milestone Apr 11, 2019

@raver119

Contributor

commented Apr 25, 2019

Fixed. Thanks for highlighting this problem.

@raver119 raver119 closed this Apr 25, 2019
