# Reading a Numpy File as Binary

Normally it would be acceptable to simply load the array file in Python with [np.load](https://numpy.org/doc/stable/reference/generated/numpy.load.html). However, Python does not meet the required per-frame time budget for Project Basilisk, so our frame loader for testing Basilisk will need to be written in a language with faster execution times, such as C or C++. Because we can't\* import numpy into C or C++, and because we may end up using a non-C language entirely, we need to know how to read a .npy file as raw binary without access to any of the methods provided by numpy.

**Arete Confidential. Do not distribute.**

### Imports

In [172]:
from glob import glob
import numpy as np

### Variables

In [173]:
PATH = r"Z:\Export\Day_1\Test_1\Shot_1_nuc" #Raw string, so the backslashes are never read as escape sequences

In [174]:
#Clean the PATH variable
PATH = PATH.replace("\\","/") #Use forward slashes as god intended
PATH = PATH.rstrip("/") #Remove any trailing slashes
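
The same cleanup can also be left to the standard library. The sketch below uses `pathlib.PureWindowsPath`, which is an alternative and not what this notebook uses. `raw_path` and `clean_path` are names made up for the example, and a trailing separator is added on purpose to show that it gets dropped.

```python
from pathlib import PureWindowsPath

# Raw string so the backslashes are taken literally. A raw string cannot end
# in a backslash, so the trailing separator is appended separately.
raw_path = r"Z:\Export\Day_1\Test_1\Shot_1_nuc" + "\\"

# as_posix() renders the path with forward slashes and no trailing separator.
clean_path = PureWindowsPath(raw_path).as_posix()
print(clean_path)
```

`PureWindowsPath` only manipulates the string and never touches the disk, so this runs on any operating system.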

### Find and read first .npy file

In [175]:
file = open(sorted(glob(PATH+'/*.npy'))[0],'rb') #glob returns files in arbitrary order, so sort to make "first" repeatable

#### Does the file have a numpy signature?
[According to the numpy format specifications,](http://numpy.org/devdocs/reference/generated/numpy.lib.format.html) the first six bytes of a .npy file are exactly `\x93NUMPY`.

In [176]:
signature = b'\x93NUMPY'
file.seek(0,0)
first_six_bytes = file.read(6)
print("Signature is " + str(signature))
print("First six bytes are " + str(first_six_bytes))
first_six_bytes == signature

Signature is b'\x93NUMPY'
First six bytes are b'\x93NUMPY'


True

#### Version numbering
After the signature, the next two bytes of a .npy file are the major and minor version numbers, respectively. Each one is stored as a single unsigned byte.

In [177]:
file.seek(6,0)
major_version = ord(file.read(1))
minor_version = ord(file.read(1))

print("This array was saved with numpy format version " + str(major_version) + "." + str(minor_version))

This array was saved with numpy format version 1.0
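
The signature and version checks above can be folded into one helper. This is a minimal sketch: `read_magic_and_version` is a hypothetical name, and an in-memory `io.BytesIO` stands in for the real file so the example runs without access to the Z: drive.

```python
import io

MAGIC = b"\x93NUMPY"  # six-byte signature from the format specification

def read_magic_and_version(f):
    """Return (major, minor), or raise ValueError if the signature is wrong."""
    f.seek(0)
    preamble = f.read(8)
    if len(preamble) < 8 or preamble[:6] != MAGIC:
        raise ValueError("not a .npy file: bad or missing signature")
    # Indexing a bytes object yields ints, so ord() is not needed here.
    return preamble[6], preamble[7]

# Hypothetical stand-in for a file opened with open(..., 'rb').
fake_file = io.BytesIO(MAGIC + b"\x01\x00" + b"rest of the file")
print(read_magic_and_version(fake_file))  # (1, 0)
```

Failing early on a bad signature matters more in a C or C++ port, where reading a misidentified file would otherwise produce garbage frames without any error.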


#### Compatibility Note
[From the numpy docs](http://numpy.org/devdocs/reference/generated/numpy.lib.format.html#format-version-2-0), "The version 1.0 format only allowed the array header to have a total size of 65535 bytes. This can be exceeded by structured arrays with a large number of columns. The version 2.0 format extends the header size to 4 GiB. numpy.save will automatically save in 2.0 format if the data requires it, else it will always use the more compatible 1.0 format."

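This means that the version depends on the size of the header, not the size of the file. Unless the array is a structured array with enough fields to push the header past 65535 bytes, you should expect version 1.0 for increased compatibility.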

#### HEADER_LEN
The next 2 bytes form a little-endian unsigned 16-bit integer giving the length of the header in bytes. (In format version 2.0 this field is 4 bytes wide instead.) int.from_bytes() decodes it directly: b'v\x00' is 0x0076, which is 118.

In [178]:
file.seek(8,0)
HEADER_LEN = file.read(2) #Raw bytes, little-endian
header_len = int.from_bytes(HEADER_LEN, byteorder='little')
print(f"This file has a header length of {HEADER_LEN} ({header_len} bytes)")

This file has a header length of b'v\x00' (118 bytes)
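
For a port to another language, `struct` format strings map more directly onto C types than int.from_bytes() does. The sketch below decodes the length field for both the 2-byte and 4-byte layouts. `header_len_and_start` is a hypothetical helper, and treating major version 3 like version 2 is an assumption based on the format documentation.

```python
import struct

def header_len_and_start(major_version, length_field):
    """Decode HEADER_LEN from the bytes that follow the two version bytes.

    Returns (header_len, header_start), where header_start is the file
    offset of the first header byte.
    """
    if major_version == 1:
        # "<H" = little-endian unsigned 16-bit (uint16_t in C)
        (header_len,) = struct.unpack("<H", length_field[:2])
        return header_len, 6 + 2 + 2
    if major_version in (2, 3):
        # "<I" = little-endian unsigned 32-bit (uint32_t in C)
        (header_len,) = struct.unpack("<I", length_field[:4])
        return header_len, 6 + 2 + 4
    raise ValueError(f"unknown .npy major version: {major_version}")

print(header_len_and_start(1, b"v\x00"))          # (118, 10)
print(header_len_and_start(2, b"v\x00\x00\x00"))  # (118, 12)
```

Adding the two returned numbers gives the offset of the first byte of array data.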


#### HEADER
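The next HEADER_LEN bytes describe the format of the array. They hold a Python dictionary literal as text, padded with spaces and terminated by a newline. If you have not decoded HEADER_LEN, you can also just read until you find the first newline byte, since the dictionary as numpy writes it contains no newline of its own.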

In [179]:
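from ast import literal_eval #Only accepts literals, so safer than eval() on bytes read from a file

file.seek(10,0)
HEADER = b''
while True:
    next_char = file.read(1)
    HEADER += next_char
    if next_char in (b'\n', b''): #Stop at the newline, or at end of file instead of looping forever
        break
HEADER_DICT = literal_eval(HEADER.decode('latin1'))
print(HEADER)
print("\n")
for elem in HEADER_DICT:
    print(str(elem) + ": " + str(HEADER_DICT[elem]))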

b"{'descr': '<u2', 'fortran_order': False, 'shape': (1025, 1024), }                                                    \n"


descr: <u2
fortran_order: False
shape: (1025, 1024)
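
A non-Python loader needs three things from this dictionary: the size of one element, the number of elements, and the byte order. The sketch below derives them from the header bytes. `describe_header` is a hypothetical helper, and it deliberately handles only plain integer and float `descr` strings such as `'<u2'` or `'<f8'`, not structured dtypes.

```python
import ast
import math

def describe_header(header_bytes):
    """Parse the header dictionary and derive element size and count."""
    info = ast.literal_eval(header_bytes.decode("latin1"))
    descr = info["descr"]
    if not isinstance(descr, str) or descr[1] not in "iuf":
        raise ValueError(f"unsupported descr in this sketch: {descr!r}")
    itemsize = int(descr[2:])         # bytes per element, e.g. 2 for 'u2'
    count = math.prod(info["shape"])  # total number of elements
    return {
        "byteorder": descr[0],        # '<' little, '>' big, '|' not applicable
        "kind": descr[1],             # 'u' unsigned int, 'i' signed int, 'f' float
        "itemsize": itemsize,
        "count": count,
        "nbytes": itemsize * count,
        "fortran_order": info["fortran_order"],
    }

header = b"{'descr': '<u2', 'fortran_order': False, 'shape': (1025, 1024), }   \n"
info = describe_header(header)
print(info["count"], info["nbytes"])  # 1049600 2099200
```

For this file, that means 1025 x 1024 = 1,049,600 two-byte values, or 2,099,200 bytes of array data following the header.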


#### % 64
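The combined length of the magic string (6 bytes), version (2 bytes), HEADER_LEN field (2 bytes), and HEADER should be evenly divisible by 64, so that the array data starts on a 64-byte boundary. That combined length is also the file offset of the first byte of array data.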

In [180]:
total = (len(first_six_bytes) + 2 + len(HEADER_LEN) + len(HEADER))
total % 64 == 0

True
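
The padding rule is easiest to see from the writing side. The sketch below rebuilds the first bytes of a version 1.0 file from the header text alone, which is also a convenient way to make test fixtures for a loader without numpy. `build_v1_preamble` is a hypothetical helper, not part of numpy.

```python
def build_v1_preamble(header_text, align=64):
    """Return magic + version + HEADER_LEN + padded header for format 1.0."""
    body = header_text.encode("latin1")
    fixed = 6 + 2 + 2  # magic string, version bytes, HEADER_LEN field
    # Number of spaces needed so the total, including the final newline,
    # lands on a multiple of `align`.
    pad = -(fixed + len(body) + 1) % align
    header = body + b" " * pad + b"\n"
    return b"\x93NUMPY" + b"\x01\x00" + len(header).to_bytes(2, "little") + header

text = "{'descr': '<u2', 'fortran_order': False, 'shape': (1025, 1024), }"
preamble = build_v1_preamble(text)
print(len(preamble), len(preamble) % 64, preamble[8:10])  # 128 0 b'v\x00'
```

With the 65-character dictionary from this file, the rule calls for 52 spaces of padding, which gives a 118-byte header and matches the b'v\x00' read from the file earlier.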

#### Array Data
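You don't even want to know how long I was trying to get this to work before I discovered int.from_bytes(). The array data starts immediately after the header. With a descr of '<u2', each element is a 2-byte little-endian unsigned integer, and because fortran_order is False the values are stored row by row.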

In [181]:
file.seek(total,0)
for i in range(10):
    print(int.from_bytes(file.read(2), byteorder='little'))

1
3
0
0
75
0
0
0
3
207
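
Putting the steps together, the whole file can be read without numpy. The sketch below is one possible shape for that, written so each step maps onto a C or C++ equivalent. `read_npy_u2` is a hypothetical helper that only handles C-ordered `'<u2'` data like this file, and the 2x3 array is a hand-built fixture rather than real Basilisk data.

```python
import ast
import io
import struct

def read_npy_u2(f):
    """Read a C-ordered '<u2' .npy stream; return (shape, flat tuple of ints)."""
    if f.read(6) != b"\x93NUMPY":
        raise ValueError("missing .npy signature")
    major, _minor = f.read(2)  # unpacking two bytes yields two ints
    if major == 1:
        (header_len,) = struct.unpack("<H", f.read(2))
    else:
        (header_len,) = struct.unpack("<I", f.read(4))
    header = ast.literal_eval(f.read(header_len).decode("latin1"))
    if header["descr"] != "<u2" or header["fortran_order"]:
        raise ValueError("this sketch only handles C-ordered '<u2' data")
    count = 1
    for dim in header["shape"]:
        count *= dim
    raw = f.read(2 * count)
    if len(raw) != 2 * count:
        raise ValueError("file is shorter than the header promises")
    # "<{count}H" decodes every element in one call.
    return header["shape"], struct.unpack(f"<{count}H", raw)

# Hand-built 2x3 fixture, so the example needs neither numpy nor the Z: drive.
text = b"{'descr': '<u2', 'fortran_order': False, 'shape': (2, 3), }"
pad = -(10 + len(text) + 1) % 64
header = text + b" " * pad + b"\n"
blob = (b"\x93NUMPY\x01\x00" + struct.pack("<H", len(header)) + header
        + struct.pack("<6H", 1, 2, 3, 4, 5, 513))

shape, values = read_npy_u2(io.BytesIO(blob))
rows, cols = shape
print(shape)                 # (2, 3)
print(values[1 * cols + 2])  # 513, the element at row 1, column 2
```

The values come back as one flat sequence. Element (row, col) sits at index `row * cols + col`, which is the same arithmetic a C loader would use on a flat `uint16_t` buffer.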
