ND4J: Error in Nd4j.toNpyByteArray(INDArray) format? #7466

Closed
AlexDBlack opened this issue Apr 8, 2019 · 9 comments

@AlexDBlack
Contributor

Reported in Gitter:

Jose A. Corbacho @mccorby 02:59
Hi,

I am working on a project that requires transferring tensors from DL4J to PyTorch. I am using the npy format to move the data from one side (Android) to the other (a server running PyTorch).
I can successfully send a byte array from the server to the client, where I build an INDArray using Nd4j.createNpyFromByteArray.
I then do some operations and return an INDArray as a byte array using Nd4j.toNpyByteArray(myINDArray).
However, when the byte array is received on the server side I get errors. I have seen that when doing toNpyByteArray(myINDArray) the byte array starts with

\x93NUMPY1\x93NUMPY\x01F'descr': '<?4', 'fortran_order': False, 'shape': (5,), } rest of data

This is what the server sends (using numpy.save()), which DL4J can parse:
\x93NUMPY\x01\x00v\x00{'descr': '<f4', 'fortran_order': False, 'shape': (5,), } rest of data
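
For readers comparing the two byte strings above, the following is a minimal sketch of how the start of an npy version 1.0 file is laid out: six magic bytes, one major and one minor version byte, a two-byte little-endian header length, then the header dict as text. The function name `parse_npy_v1_preamble` is made up for this illustration and is not part of NumPy or ND4J. In the NumPy-generated sample, the `v` after `\x01\x00` is byte 0x76, so the header length is 118.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"  # six raw bytes; the first one is 0x93, not the text "\x93"


def parse_npy_v1_preamble(data: bytes):
    """Return ((major, minor), header_len, header_dict) for an npy v1.0 payload.

    Hypothetical helper for illustration only.
    """
    if data[:6] != MAGIC:
        raise ValueError(f"bad magic: {data[:6]!r}")
    major, minor = data[6], data[7]
    if (major, minor) != (1, 0):
        raise ValueError(f"unsupported version {major}.{minor}")
    # header_len is an unsigned 16-bit little-endian integer
    (header_len,) = struct.unpack("<H", data[8:10])
    header_text = data[10:10 + header_len].decode("latin1").strip()
    return (major, minor), header_len, ast.literal_eval(header_text)


if __name__ == "__main__":
    # Start of the ND4J output quoted above: literal backslash text, then a second magic
    nd4j_like = b"\\x93NUMPY1\x93NUMPY\x01F'descr': '<?4'"
    try:
        parse_npy_v1_preamble(nd4j_like)
    except ValueError as err:
        print("rejected:", err)
```

Fed the ND4J-style bytes, the sketch stops at the magic check, because the payload begins with the backslash character rather than byte 0x93.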

I am using DL4J 1.0.0-beta3 in Android

Am I missing something?

Thanks
@AlexDBlack added the Bug and ND4J labels Apr 8, 2019
@raver119
Contributor

raver119 commented Apr 8, 2019

We need source code.

@mccorby

mccorby commented Apr 8, 2019

This is the code on the Android side

    public byte[] add(@NotNull byte[] tensor1, @NotNull byte[] tensor2) {
        INDArray array1 = Nd4j.createNpyFromByteArray(tensor1);
        INDArray array2 = Nd4j.createNpyFromByteArray(tensor2);
        INDArray result = array1.add(array2);

        try {
            return Nd4j.toNpyByteArray(result);
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

On the server side, this is what is used

from tempfile import TemporaryFile

import numpy
import torch

def numpy_tensor_deserializer(tensor_bin) -> torch.Tensor:
    """Strategy to deserialize a binary input in npy format into a Torch tensor"""
    input_file = TemporaryFile()
    input_file.write(tensor_bin)
    # rewind, then read the data back from the file
    input_file.seek(0)
    return torch.from_numpy(numpy.load(input_file))

numpy.load(input_file) fails as pickle cannot load the input

Numpy version is 1.16.2
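
A note on why the error mentions pickle: as far as I understand it, numpy.load looks at the leading bytes of its input, treats a zip signature as an npz archive and the npy magic as an npy file, and otherwise falls back to trying pickle. The sketch below mirrors that dispatch using only the standard library. `sniff_numpy_payload` is a hypothetical helper for diagnosing a payload before loading it, not a NumPy function.

```python
NPY_MAGIC = b"\x93NUMPY"
# Signatures of a regular zip archive and of an empty one
ZIP_PREFIXES = (b"PK\x03\x04", b"PK\x05\x06")


def sniff_numpy_payload(data: bytes) -> str:
    """Classify a payload as 'npz', 'npy', or 'unknown' from its first bytes.

    Hypothetical helper; 'unknown' is the case where a loader would have
    to fall back to something else, such as pickle.
    """
    if data.startswith(ZIP_PREFIXES):
        return "npz"
    if data.startswith(NPY_MAGIC):
        return "npy"
    return "unknown"


if __name__ == "__main__":
    print(sniff_numpy_payload(b"\x93NUMPY\x01\x00v\x00{'descr': '<f4'"))
    print(sniff_numpy_payload(b"\\x93NUMPY1\x93NUMPY\x01F'descr': '<?4'"))
```

Calling a check like this before numpy.load would report the malformed prefix directly instead of surfacing it as a pickle failure.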

I will try the snapshot with npz and let you know if I find something else

Thanks

@mccorby

mccorby commented Apr 8, 2019

Note that the byte[] obtained after doing add in Android already starts with the bytes Python cannot parse

@mccorby

mccorby commented Apr 8, 2019

It looks as if numpyHeaderForNd4j is also adding the magic header for numpy. I've made some progress (still not working, though) by removing the magic header in convertToNumpy.

I'll keep you posted if I progress more

@roessland

roessland commented Apr 10, 2019

I ran into this issue too. Here is some extra info:

    INDArray mat = Nd4j.zeros(3, 3);
    byte[] buf = Nd4j.toNpyByteArray(mat);
    System.out.printf("First 6 bytes are %c %c %c %c %c %c\n", buf[0], buf[1], buf[2], buf[3], buf[4], buf[5]);
    // First 6 bytes are \ x 9 3 N U
    // But should be 0x93 N U M P Y

Hex dump of files produced by Nd4j:

$ hexyl signalmatrix.npy  | head -n 10
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 5c 78 39 33 4e 55 4d 50 ┊ 59 31 93 4e 55 4d 50 59 │\x93NUMP┊Y1×NUMPY│
│00000010│ 01 46 27 64 65 73 63 72 ┊ 27 3a 20 27 3c 3f 34 27 │•F'descr┊': '<?4'│
│00000020│ 2c 20 27 66 6f 72 74 72 ┊ 61 6e 5f 6f 72 64 65 72 │, 'fortr┊an_order│
│00000030│ 27 3a 20 46 61 6c 73 65 ┊ 2c 20 27 73 68 61 70 65 │': False┊, 'shape│

Hex dump of a different file (3x3 identity matrix) saved in Numpy (not from Java):

$ hexyl m.npy 
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 93 4e 55 4d 50 59 01 00 ┊ 76 00 7b 27 64 65 73 63 │×NUMPY•0┊v0{'desc│
│00000010│ 72 27 3a 20 27 3c 69 38 ┊ 27 2c 20 27 66 6f 72 74 │r': '<i8┊', 'fort│
│00000020│ 72 61 6e 5f 6f 72 64 65 ┊ 72 27 3a 20 46 61 6c 73 │ran_orde┊r': Fals│
│00000030│ 65 2c 20 27 73 68 61 70 ┊ 65 27 3a 20 28 33 2c 29 │e, 'shap┊e': (3,)│
│00000040│ 2c 20 7d 20 20 20 20 20 ┊ 20 20 20 20 20 20 20 20 │, }     ┊        │

Looks like the header is added twice in Nd4j.
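
The first hex dump can be checked mechanically. Below is a small sketch, with a made-up helper name `magic_offsets`, that lists every offset at which the real six-byte magic occurs. The input is the first row of the ND4J dump above, copied byte for byte.

```python
MAGIC = b"\x93NUMPY"


def magic_offsets(data: bytes, magic: bytes = MAGIC):
    """Return every offset at which `magic` occurs in `data` (hypothetical helper)."""
    offsets = []
    start = data.find(magic)
    while start != -1:
        offsets.append(start)
        start = data.find(magic, start + 1)
    return offsets


# First 16 bytes of the ND4J-produced file, taken from the hex dump
nd4j_row = bytes.fromhex("5c 78 39 33 4e 55 4d 50 59 31 93 4e 55 4d 50 59")

# First 16 bytes of the NumPy-produced file, taken from the second hex dump
numpy_row = bytes.fromhex("93 4e 55 4d 50 59 01 00 76 00 7b 27 64 65 73 63")

print(magic_offsets(nd4j_row))   # [10]
print(magic_offsets(numpy_row))  # [0]
```

In the ND4J file the real magic only starts at offset 10. The ten bytes before it are the ASCII text `\x93NUMPY1`, that is, the escape sequence written out as characters (5c 78 39 33) rather than the single byte 0x93.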

@AlexDBlack AlexDBlack self-assigned this Apr 10, 2019
@mccorby

mccorby commented Apr 10, 2019

Yes, that is what I saw. It also seems to be affecting the space padding of the header.

@AlexDBlack
Contributor Author

AlexDBlack commented Apr 10, 2019

OK, some progress, but not a solution yet: https://github.com/deeplearning4j/deeplearning4j/pull/7518/files
https://github.com/deeplearning4j/dl4j-test-resources/pull/190

Header was definitely incorrectly added in BaseNativeNDArrayFactory.java (see removed magicPointer in PR).

However, that still leaves us with a difference between numpy-generated and ND4J-generated arrays:
[Image not preserved: comparison of an ND4J-generated and a Numpy-generated array]
First is ND4J generated, second is Numpy generated.

Header and content are ultimately coming from here:

template<typename T>
std::vector<char> cnpy::createNpyHeader(const T *data,
                                        const unsigned int *shape,
                                        const unsigned int ndims,
                                        unsigned int wordSize) {
    std::vector<char> dict;
    dict += "{'descr': '";
    dict += BigEndianTest();
    dict += mapType(typeid(T));
    dict += tostring(wordSize);
    dict += "', 'fortran_order': False, 'shape': (";
    dict += tostring(shape[0]);
    for (int i = 1; i < ndims; i++) {
        dict += ", ";
        dict += tostring(shape[i]);
    }
    if (ndims == 1)
        dict += ",";
    dict += "), }";
    // pad with spaces so that preamble+dict is modulo 16 bytes. preamble is 10 bytes. dict needs to end with \n

And, the wrong type ("<?4" instead of "<f4") from here:

cnpy.cpp is a 3rd party library. Might be old, outdated, or simply wrong?
https://github.com/rogersce/cnpy
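
To make the expected layout concrete, here is a rough Python sketch of what a well-formed npy v1.0 header looks like when built by hand. It follows the steps in the C++ excerpt (dict text, space padding, trailing newline) and prepends a single 10-byte preamble. The function `build_npy_v1_header` and its `align` parameter are assumptions for illustration; this is not the ND4J fix and not NumPy's own writer. The `descr` value is a byte-order character, a type character and the item size, so float32 little-endian is `<f4`.

```python
import struct

MAGIC = b"\x93NUMPY"
PREAMBLE_LEN = 10  # 6 magic bytes + 2 version bytes + 2 header-length bytes


def build_npy_v1_header(descr: str, shape: tuple, align: int = 16) -> bytes:
    """Build preamble + header dict for an npy v1.0 file (illustrative sketch)."""
    dims = ", ".join(str(d) for d in shape)
    if len(shape) == 1:
        dims += ","  # one-element tuples need the trailing comma
    text = "{'descr': '%s', 'fortran_order': False, 'shape': (%s), }" % (descr, dims)
    body = text.encode("latin1")
    # Pad with spaces so preamble + dict + newline is a multiple of `align`
    pad = -(PREAMBLE_LEN + len(body) + 1) % align
    body += b" " * pad + b"\n"
    # Version 1.0, then the header length as unsigned 16-bit little-endian
    return MAGIC + b"\x01\x00" + struct.pack("<H", len(body)) + body


if __name__ == "__main__":
    header = build_npy_v1_header("<f4", (5,))
    print(header[:10].hex(" "), len(header))
```

With `align=64`, `build_npy_v1_header("<i8", (3,), align=64)` yields a header length of 118 (0x76), which matches the `76 00` bytes in the Numpy-generated hex dump earlier in the thread. The 16 in the cnpy comment and the 64 here differ only in how much padding is written.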

@AlexDBlack AlexDBlack added the C++ label Apr 11, 2019
@raver119 raver119 self-assigned this Apr 11, 2019
@AlexDBlack AlexDBlack added this to the Release Milestone milestone Apr 11, 2019
@raver119
Contributor

Fixed. Thanks for highlighting this problem.
