ND4J: Error in Nd4j.toNpyByteArray(INDArray) format? #7466

AlexDBlack · 2019-04-08T01:48:47Z

Reported in Gitter:

Jose A. Corbacho @mccorby 02:59
Hi,

I am working on a project that requires transferring tensors from DL4J to PyTorch. I am using the npy format to move the data from one side (Android) to the other (a server running PyTorch).
I can successfully send a byte array from the server to the client, where I build an INDArray using Nd4j.createNpyFromByteArray.
I then do some operations and return an INDArray as a byte array using Nd4j.toNpyByteArray(myINDArray).
However, when the byte array is received on the server side I get errors. I have seen that after calling toNpyByteArray(myINDArray) the byte array starts with

\x93NUMPY1\x93NUMPY\x01F'descr': '<?4', 'fortran_order': False, 'shape': (5,), } rest of data

This is what the server sends (using numpy.save()) and this can be parsed by DL4J
\x93NUMPY\x01\x00v\x00{'descr': '<f4', 'fortran_order': False, 'shape': (5,), } rest of data

I am using DL4J 1.0.0-beta3 in Android

Am I missing something?

Thanks


raver119 · 2019-04-08T04:08:58Z

We need source code.

mccorby · 2019-04-08T06:04:00Z

This is the code on the Android side

    public byte[] add(@NotNull byte[] tensor1, @NotNull byte[] tensor2) {
        INDArray array1 = Nd4j.createNpyFromByteArray(tensor1);
        INDArray array2 = Nd4j.createNpyFromByteArray(tensor2);
        INDArray result = array1.add(array2);

        try {
            return Nd4j.toNpyByteArray(result);
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

On the server side, this is what is used

from tempfile import TemporaryFile

import numpy
import torch

def numpy_tensor_deserializer(tensor_bin) -> torch.Tensor:
    """Strategy to deserialize a binary input in npy format into a Torch tensor"""
    input_file = TemporaryFile()
    input_file.write(tensor_bin)
    # rewind so numpy.load reads from the start of the file
    input_file.seek(0)
    return torch.from_numpy(numpy.load(input_file))

numpy.load(input_file) fails as pickle cannot load the input
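
A possible explanation for the pickle error, as a rough sketch: numpy.load inspects the first bytes of the stream, and a payload that starts with neither a zip signature (npz) nor the npy magic string is handed to pickle when pickling is allowed. The function name `classify_payload` is hypothetical, and the logic only approximates NumPy's dispatch. The two sample prefixes are the ones quoted earlier in this thread.

```python
NPY_MAGIC = b"\x93NUMPY"                        # 6-byte magic string that opens an npy file
ZIP_PREFIXES = (b"PK\x03\x04", b"PK\x05\x06")   # zip signatures, as used by npz archives

def classify_payload(buf: bytes) -> str:
    """Guess which loader path a payload would take (hypothetical helper)."""
    if buf.startswith(ZIP_PREFIXES):
        return "npz"
    if buf.startswith(NPY_MAGIC):
        return "npy"
    # Neither signature matched, so the bytes would be treated as a pickle.
    return "pickle-fallback"

# Prefix produced by numpy.save() on the server
good = b"\x93NUMPY\x01\x00v\x00{'descr': '<f4'"
# Prefix produced by Nd4j.toNpyByteArray: the first four bytes are the literal text "\x93"
bad = b"\\x93NUMPY1\x93NUMPY\x01F'descr': '<?4'"

print(classify_payload(good))
print(classify_payload(bad))
```

Under this reading, the ND4J payload is never recognised as npy at all, which would be consistent with the failure being reported by pickle rather than by the npy header parser.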

Numpy version is 1.16.2

I will try the snapshot with npz and let you know if I find something else

Thanks

mccorby · 2019-04-08T06:05:35Z

Note that the byte[] obtained after doing add in Android already starts with the bytes Python cannot parse

mccorby · 2019-04-08T11:16:39Z

It looks as if numpyHeaderForNd4j is also adding the magic header for numpy. I've made some progress (still not working though) by removing the magic header in convertToNumpy.

I'll keep you posted if I make more progress.

roessland · 2019-04-10T10:59:46Z

I ran into this issue too. Here is some extra info:

    INDArray mat = Nd4j.zeros(3, 3);
    byte[] buf = Nd4j.toNpyByteArray(mat);
    System.out.printf("First 6 bytes are %c %c %c %c %c %c\n", buf[0], buf[1], buf[2], buf[3], buf[4], buf[5]);
    // First 6 bytes are \ x 9 3 N U
    // But should be 0x93 N U M P Y

Hex dump of files produced by Nd4j:

$ hexyl signalmatrix.npy  | head -n 10
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 5c 78 39 33 4e 55 4d 50 ┊ 59 31 93 4e 55 4d 50 59 │\x93NUMP┊Y1×NUMPY│
│00000010│ 01 46 27 64 65 73 63 72 ┊ 27 3a 20 27 3c 3f 34 27 │•F'descr┊': '<?4'│
│00000020│ 2c 20 27 66 6f 72 74 72 ┊ 61 6e 5f 6f 72 64 65 72 │, 'fortr┊an_order│
│00000030│ 27 3a 20 46 61 6c 73 65 ┊ 2c 20 27 73 68 61 70 65 │': False┊, 'shape│

Hex dump of a different file (3x3 identity matrix) saved in Numpy (not from Java):

$ hexyl m.npy 
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 93 4e 55 4d 50 59 01 00 ┊ 76 00 7b 27 64 65 73 63 │×NUMPY•0┊v0{'desc│
│00000010│ 72 27 3a 20 27 3c 69 38 ┊ 27 2c 20 27 66 6f 72 74 │r': '<i8┊', 'fort│
│00000020│ 72 61 6e 5f 6f 72 64 65 ┊ 72 27 3a 20 46 61 6c 73 │ran_orde┊r': Fals│
│00000030│ 65 2c 20 27 73 68 61 70 ┊ 65 27 3a 20 28 33 2c 29 │e, 'shap┊e': (3,)│
│00000040│ 2c 20 7d 20 20 20 20 20 ┊ 20 20 20 20 20 20 20 20 │, }     ┊        │

Looks like the header is added twice in Nd4j.
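
The two hex dumps above can be compared programmatically. This is a minimal sketch, assuming the npy version 1.0 layout (6-byte magic, two version bytes, a little-endian uint16 header length, then the header dict). The helper name `describe_npy_prefix` is hypothetical, and the byte strings are copied from the first rows of the dumps.

```python
import struct

NPY_MAGIC = b"\x93NUMPY"

def describe_npy_prefix(buf: bytes) -> dict:
    """Locate the npy magic string and, if it is at offset 0, decode the preamble."""
    info = {"magic_offset": buf.find(NPY_MAGIC)}
    if info["magic_offset"] == 0 and len(buf) >= 10:
        info["version"] = (buf[6], buf[7])                      # (major, minor)
        info["header_len"] = struct.unpack("<H", buf[8:10])[0]  # little-endian uint16
    return info

# First bytes of the ND4J-generated file
nd4j_bytes = bytes.fromhex("5c 78 39 33 4e 55 4d 50 59 31 93 4e 55 4d 50 59 01 46")
# First bytes of the NumPy-generated file
numpy_bytes = bytes.fromhex("93 4e 55 4d 50 59 01 00 76 00 7b 27 64 65 73 63")

print(describe_npy_prefix(nd4j_bytes))
print(describe_npy_prefix(numpy_bytes))
```

For the ND4J bytes the real magic string only appears at offset 10, after ten bytes of escaped text (`\x93NUMPY1`). For the NumPy bytes it is at offset 0, followed by version (1, 0) and a header length of 118.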

mccorby · 2019-04-10T11:25:16Z

Yes, that is what I saw. Also it seems that is affecting the padding of blanks.

AlexDBlack · 2019-04-10T12:20:05Z

OK, some progress, but not a solution yet: https://github.com/deeplearning4j/deeplearning4j/pull/7518/files
https://github.com/deeplearning4j/dl4j-test-resources/pull/190

Header was definitely incorrectly added in BaseNativeNDArrayFactory.java (see removed magicPointer in PR).

However, that still leaves us with a difference between numpy-generated and ND4J-generated arrays:

First is ND4J generated, second is Numpy generated.

Header and content ultimately come from here:

deeplearning4j/libnd4j/include/cnpy/cnpy.cpp

Lines 582 to 603 in e5125b5

    
    template<typename T>
    std::vector<char> cnpy::createNpyHeader(const T *data,
                                            const unsigned int *shape,
                                            const unsigned int ndims,
                                            unsigned int wordSize) {
        std::vector<char> dict;
        dict += "{'descr': '";
        dict += BigEndianTest();
        dict += mapType(typeid(T));
        dict += tostring(wordSize);
        dict += "', 'fortran_order': False, 'shape': (";
        dict += tostring(shape[0]);
        for(int i = 1; i < ndims;i++) {
            dict += ", ";
            dict += tostring(shape[i]);
        }
        if(ndims == 1)
            dict += ",";
        dict += "), }";
        //pad with spaces so that preamble+dict is modulo 16 bytes. preamble is 10 bytes. dict needs to end with \n
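
To make the padding rule in that last comment concrete, here is a rough Python sketch of building a version 1.0 header. It is not the cnpy or NumPy implementation. The function name `build_npy_v1_header` and the `alignment` parameter are assumptions for illustration, and the dict text simply mirrors the strings concatenated in the C++ above.

```python
import struct

NPY_MAGIC = b"\x93NUMPY"

def build_npy_v1_header(descr: str, shape: tuple, alignment: int = 16) -> bytes:
    """Return magic + version + length + padded dict for an npy v1.0 file (sketch)."""
    # A 1-d shape needs a trailing comma, e.g. (5,), as in the C++ code.
    shape_txt = "(" + ", ".join(str(n) for n in shape)
    shape_txt += ",)" if len(shape) == 1 else ")"
    dict_txt = "{'descr': '%s', 'fortran_order': False, 'shape': %s, }" % (descr, shape_txt)

    preamble_len = 6 + 2 + 2                      # magic + version bytes + uint16 length
    unpadded = preamble_len + len(dict_txt) + 1   # +1 for the terminating newline
    padding = (-unpadded) % alignment             # spaces needed to reach the alignment
    header = dict_txt.encode("latin1") + b" " * padding + b"\n"

    return NPY_MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)) + header

h16 = build_npy_v1_header("<f4", (5,))
h64 = build_npy_v1_header("<f4", (5,), alignment=64)
print(len(h16), len(h64))
```

With a 64-byte alignment the header length for this example comes out as 118, which encodes as the `v\x00` bytes seen in the NumPy-generated output. With a 16-byte alignment it is 70, which is the byte `F`. That may be the `F` visible in the ND4J output, though this is only a guess.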

And, the wrong type ("<?4" instead of "<f4") from here:

deeplearning4j/libnd4j/include/cnpy/cnpy.cpp

Line 65 in e5125b5

else return '?';

cnpy.cpp is a 3rd party library. Might be old, outdated, or simply wrong?
https://github.com/rogersce/cnpy
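
On the `<?4` versus `<f4` difference noted above: the descr string is a byte-order character, a type character and an item size in bytes. The sketch below splits a descr into those parts. The helper `parse_descr` is hypothetical, and its table covers only a few common type characters rather than everything NumPy accepts, so it simply reports anything else (including the `?` fallback) as unrecognised.

```python
import re

# Partial tables, for illustration only
BYTE_ORDERS = {"<": "little-endian", ">": "big-endian", "|": "not applicable", "=": "native"}
KIND_NAMES = {"b": "bool", "i": "signed int", "u": "unsigned int", "f": "float", "c": "complex"}

def parse_descr(descr: str) -> tuple:
    """Split a descr such as '<f4' into (byte order, kind, item size in bytes)."""
    match = re.fullmatch(r"([<>|=])(.)(\d+)", descr)
    if match is None:
        raise ValueError("malformed descr: %r" % descr)
    order, kind, size = match.groups()
    if kind not in KIND_NAMES:
        raise ValueError("unrecognised type character %r in %r" % (kind, descr))
    return BYTE_ORDERS[order], KIND_NAMES[kind], int(size)

print(parse_descr("<f4"))   # descr written by numpy.save() for float32 data
print(parse_descr("<i8"))   # descr from the NumPy-generated hex dump

try:
    parse_descr("<?4")      # descr written by ND4J in this report
except ValueError as err:
    print(err)
```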

raver119 · 2019-04-25T14:21:22Z

Fixed. Thanks for highlighting this problem.

lock · 2019-05-25T15:02:18Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

AlexDBlack added Bug Bugs and problems ND4J ND4J Issues labels Apr 8, 2019

AlexDBlack self-assigned this Apr 10, 2019

AlexDBlack mentioned this issue Apr 10, 2019

[WIP] Numpy format fixes #7518

Merged

AlexDBlack added the C++ label Apr 11, 2019

raver119 self-assigned this Apr 11, 2019

AlexDBlack added this to the Release Milestone milestone Apr 11, 2019

raver119 closed this as completed Apr 25, 2019

lock bot locked and limited conversation to collaborators May 25, 2019

eclipsewebmaster unassigned raver119 Jun 12, 2019
