# Demo: raw byte manipulations

What is data? It depends on how you (choose to) interpret it! There may be more than one way...
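As a small illustration of this idea, the sketch below reads the same four bytes in three different ways. The byte values are made up for the example and have nothing to do with the image file used later; the variable names are chosen here for illustration.

```python
import numpy as np

# Four arbitrary bytes, chosen for illustration only
raw = bytes([0x44, 0x61, 0x74, 0x61])

as_uint8 = np.frombuffer(raw, dtype=np.uint8)   # four separate 8-bit integers
as_uint32 = np.frombuffer(raw, dtype='<u4')     # one little-endian 32-bit integer
as_text = raw.decode('ascii')                   # four ASCII characters

print(as_uint8)
print(as_uint32)
print(as_text)
```

None of the three readings is more "correct" than the others; which one is useful depends on what the bytes were meant to encode.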

In [None]:
%matplotlib inline

In [None]:
import numpy as np
import matplotlib.pyplot as plt
show_as_hex = np.vectorize(hex)

## The data

We are given a file called `binary_image_file` and the task of visualising it. Unfortunately, the file suffix has been lost, and the only thing our supervisor recalls is that the image size is 324 x 324 pixels.

"No problem", you say, and present your supervisor with the image:

In [None]:
img_file = 'binary_image_file'
img_size = (324, 324)
n_values = img_size[0] * img_size[1]
with open(img_file, 'rb') as fp:
    image = np.fromfile(fp, dtype=np.uint8, count=n_values).reshape(*img_size)

plt.imshow(image)

__THAT'S NOT RIGHT!!__, your supervisor exclaims, __FIX IT!__.

## Information is only accessible if data structure is known

What went wrong? You realise that first of all, the colour of each pixel on a computer screen is represented by a mixture of the three primary colours: red, green and blue. On a hunch, you decide to try reading the data as RGB-triplets instead.

In [None]:
img_size = (324, 324, 3)
n_values = img_size[0] * img_size[1] * img_size[2]
with open(img_file, 'rb') as fp:
    image = np.fromfile(fp, dtype=np.uint8, count=n_values).reshape(*img_size)
    
plt.imshow(image)

__BETTER! But the colours are off and there's a weird offset in the image; fix it!__

## Binary data formats are usually augmented by meta-information known as the 'header'

In this toy example, the image file is in TIFF format. A TIFF file contains information on its contents both before and after the actual pixel colour information. After some investigating, we find that the header of this file consists of precisely 166 bytes of data. Once this offset is taken into consideration, things look right.
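For orientation: the TIFF specification fixes the first 8 bytes of every file as a 2-byte byte-order mark (`II` for little-endian, `MM` for big-endian), the 2-byte number 42, and a 4-byte offset to the first image file directory. The sketch below decodes just that prefix. The helper name `parse_tiff_prefix` is hypothetical, and the example bytes are synthetic rather than read from `binary_image_file`. It does not locate the pixel data; the 166-byte offset used in this demo was found by inspection.

```python
import struct

def parse_tiff_prefix(first_bytes):
    """Decode the fixed 8-byte prefix of a TIFF file (hypothetical helper)."""
    order_mark = first_bytes[:2]
    if order_mark == b'II':
        endian = '<'   # little-endian
    elif order_mark == b'MM':
        endian = '>'   # big-endian
    else:
        raise ValueError('not a TIFF byte-order mark')
    # 2-byte magic number followed by the 4-byte offset of the first directory
    magic, ifd_offset = struct.unpack(endian + 'HI', first_bytes[2:8])
    if magic != 42:
        raise ValueError('unexpected TIFF magic number')
    return endian, ifd_offset

# Synthetic example, not taken from binary_image_file
example = b'II' + struct.pack('<HI', 42, 8)
print(parse_tiff_prefix(example))
```

On a real file, the same function could be applied to the first 8 bytes read with `open(img_file, 'rb').read(8)`.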

### Load the file header and the data corresponding to the image.

In [None]:
header_size = 166

with open(img_file, 'rb') as fp:
    header = np.fromfile(fp, dtype=np.uint8, count=header_size)
    raw_bytes = np.fromfile(fp, dtype=np.uint8, count=n_values)

### Print the header contents, assuming all of it is human-readable ASCII text

(some of it's not!)

In [None]:
# interpret every header byte as a character code and print without line breaks
for c in header:
    print(chr(c), end='')
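Because some header bytes are not printable characters, the raw printout can be hard to read. One possible workaround, sketched here with a hypothetical helper `printable_view` and made-up sample bytes, is to replace every non-printable byte with a placeholder character.

```python
import string

def printable_view(byte_values, placeholder='.'):
    """Render bytes as text, replacing non-printable values (hypothetical helper)."""
    visible = set(string.ascii_letters + string.digits + string.punctuation + ' ')
    chars = (chr(b) for b in byte_values)
    return ''.join(c if c in visible else placeholder for c in chars)

# Made-up sample bytes: a mix of printable and non-printable values
sample = bytes([73, 73, 42, 0, 8, 0, 0, 0, 72, 105])
print(printable_view(sample))
```

The same helper should also accept the `header` array loaded above, since it only needs a sequence of integer byte values.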

### Print some of the image data in hexadecimal notation

* first 10 bytes
* 10 bytes somewhere 'deeper' inside the file

In [None]:
print(show_as_hex(raw_bytes[:10]))
print(show_as_hex(raw_bytes[40440:40450]))

### Print the same image data segments, now interpreted as integers

In [None]:
print(raw_bytes[:10])
print(raw_bytes[40440:40450])
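The two printouts show the same bytes in different notations: hexadecimal is a way of writing a number, not a different number. A minimal sketch with arbitrary example values (not taken from the image file):

```python
values = [0, 15, 16, 255]            # arbitrary example values
as_hex = [hex(v) for v in values]    # the same numbers written in base 16
back = [int(h, 16) for h in as_hex]  # parsing the text recovers the integers

print(as_hex)
print(back)
```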

Each colour channel is one byte 'wide', _i.e._, the amount of each primary colour in every pixel is an integer in the range $0 \rightarrow 2^{8} - 1 = 255$.
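A byte can take $2^{8} = 256$ distinct values, so the largest one is 255. This can be checked directly with NumPy; the variable names below are chosen here for illustration.

```python
import numpy as np

info = np.iinfo(np.uint8)      # integer limits of an unsigned 8-bit type
n_levels = info.max - info.min + 1

print(info.min, info.max)      # smallest and largest representable value
print(n_levels)                # number of distinct intensity levels per channel
```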

### Reshape the 300-thousand-or-so values back into the shape of the image

In [None]:
reshaped_bytes = raw_bytes.reshape(*img_size)

In [None]:
reshaped_bytes.shape
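To see what the reshape does to the byte order, here is a toy sketch with twelve made-up values standing in for a 2 x 2 pixel RGB image. It assumes, as in this demo, that the three colour bytes of each pixel are stored next to each other and that pixels are stored row by row.

```python
import numpy as np

# Twelve made-up byte values standing in for a 2 x 2 pixel RGB image
flat = np.arange(12, dtype=np.uint8)
tiny = flat.reshape(2, 2, 3)   # (rows, columns, colour channels)

print(tiny[0, 0])      # first pixel: bytes 0, 1, 2
print(tiny[0, 1])      # second pixel in the first row: bytes 3, 4, 5
print(tiny[:, :, 0])   # first channel of every pixel
```

The last axis varies fastest, so consecutive bytes in the file end up as the channels of one pixel.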

### Plot

In [None]:
plt.imshow(reshaped_bytes)

### Plot each of the colour channels separately

In [None]:
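fig, axs = plt.subplots(1, 3)
chans = ['red', 'green', 'blue']
for ii in range(3):
    # fix the grey scale to the full byte range so one colourbar is valid for all panels
    im = axs[ii].imshow(reshaped_bytes[:, :, ii], cmap='gray', vmin=0, vmax=255)
    axs[ii].set_title(chans[ii])
# make room on the right for a shared colourbar
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([0.85, 0.3, 0.05, 0.4])
fig.colorbar(im, cax=cbar_ax)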