# Extract Image from Raw Files

This notebook tests code that extracts .tif images from .raw (binary) files. The goal is a "to_tif.py" equivalent for .raw files. Because .raw is not a standard image format, I first describe the structure of a .raw file and then explain how to convert it into a sequence of .tif images.

## 1 The structure of .raw file 

A .raw file is a binary file that stores all the images together with their corresponding image numbers. The structure is illustrated in the following sketch:
<img src='doc-images/raw-structure.png'>
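
The same layout can be written down as a NumPy structured dtype. This is only a sketch: the record layout (one `uint32` label followed by `h*w` `uint16` pixels, little-endian) is inferred from the reading code later in this notebook, and the helper name `frame_dtype` and the tiny 2x3 frame size are made up for illustration.

```python
import numpy as np

def frame_dtype(h, w):
    """Hypothetical helper: one record = uint32 label + h*w uint16 pixels."""
    return np.dtype([('label', '<u4'), ('pixels', '<u2', (h, w))])

# build 4 synthetic frames of 2x3 pixels (made-up data, not a real recording)
h, w = 2, 3
frames = np.zeros(4, dtype=frame_dtype(h, w))
frames['label'] = np.arange(100, 104)
frames['pixels'] = np.arange(4 * h * w).reshape(4, h, w)

# the byte stream a .raw file with this assumed layout would contain
raw_bytes = frames.tobytes()

# read back as uint16: every frame contributes h*w + 2 values
flat = np.frombuffer(raw_bytes, dtype='<u2')
print(flat.size, flat.size % (h * w + 2))
```

Viewed as `uint16`, each record starts with two label words followed by the pixel values, which is why `h*w+2` appears throughout the code below.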

## 2 Rearrange the sequential data into separate images

Reading the whole stack at once is more efficient than iterating over the images one by one in a for loop. I therefore read the whole stack and then rearrange the array as in the following sketch, so that the images can be separated.
<img src='doc-images/seperate.png'>

Then, the sequential "pixel info" is reshaped into the actual dimensions of images (with height and width), as shown below:


<img src="doc-images/reshape.png">
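
The two rearrangement steps can be tried on a small synthetic array. This is a minimal sketch with made-up sizes (3 frames of 2x4 pixels); the variable names `stream` and `rows` are illustrative and the data are just a running counter.

```python
import numpy as np

num_images, h, w = 3, 2, 4

# synthetic stream: per frame, 2 label words followed by h*w pixel words
stream = np.arange(num_images * (h * w + 2), dtype=np.uint16)

# step 1: one row per frame
rows = stream.reshape(num_images, h * w + 2)
labels = rows[:, :2]   # first two words of each row
pixels = rows[:, 2:]   # remaining h*w words of each row

# step 2: give every row of pixels its height and width
images = pixels.reshape(num_images, h, w)
print(labels.shape, images.shape)
```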

Finally, we can save the images as a sequence of .tif files, which are easier to work with.

## 3 Image labels

The image labels (numbers) are saved as `uint32` in the .raw file. When the .raw file is read using `np.fromfile` with `dtype='uint16'`, each `uint32` number appears as two consecutive `uint16` numbers.

For example, in the sample images below, the label of the first image is `a = 60639`. When represented as two `uint16` numbers, it is `b = [60638, 0]`, where `b[0]` holds the low 16 bits and `b[1]` the high 16 bits. To make the image labels consistent with the numbers in the stagePosition.txt file, I use `a = b[0] + b[1] * 2 ** 16 + 1` to convert the two `uint16` numbers to one `uint32`.

**NOTE:** since I only have one data point now, I am not totally sure that this conversion is correct for all the images. Verify the relation when more data are available.
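
The conversion can be checked in isolation. In the sketch below, only the pair `[60638, 0]` comes from the sample data; the other pairs are synthetic and are there to exercise the high word. The function name `words_to_label` is hypothetical, and the `+ 1` offset carries the same uncertainty as stated in the note above. The vectorized variant assumes the file is little-endian.

```python
import numpy as np

def words_to_label(low, high, offset=1):
    """Hypothetical helper: combine two uint16 words into one label.

    Casting to Python int first avoids any uint16 overflow.
    """
    return int(low) + (int(high) << 16) + offset

print(words_to_label(60638, 0))

# vectorized version: reinterpret each little-endian uint16 pair as one uint32
pairs = np.array([[60638, 0], [65535, 0], [0, 1]], dtype='<u2')
labels = pairs.view('<u4').ravel().astype(np.int64) + 1
print(labels)
```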

## 4 Python script implementation

A Python script, `py_files/extractImages.py`, is implemented so that batch processing can run without opening and modifying this notebook. The original working functions are kept here for future development; other test content has been removed.

In [1]:
import os
import numpy as np
from matplotlib import pyplot as plt
from skimage import io

## 5 Wrap in a function

1. Check for necessary files: RawImage.raw, RawImageInfo.txt, ...
2. Read image info from RawImageInfo.txt
3. Read RawImage.raw and save .tif images

In [2]:
def check_necessary_files(folder):
    """
    Check for necessary files: RawImage.raw, RawImageInfo.txt, ...
    """
    required = ('RawImage.raw', 'RawImageInfo.txt')
    return all(os.path.exists(os.path.join(folder, name)) for name in required)

In [3]:
# test check_necessary_files(folder)
folder = '/home/zhengyang/Documents/MATLAB/image-processing/ExtractImage/test-files'
check_necessary_files(folder)

True

In [4]:
def read_raw_image_info(info_file):
    """
    Read image info, such as fps and image dimensions,
    from RawImageInfo.txt
    """
    with open(info_file, 'r') as f:
        a = f.read()
    fps, h, w = a.split('\n')[0:3]
    return int(fps), int(h), int(w)

In [5]:
# test read_raw_image_info(info_file)
info_file = os.path.join(folder, 'RawImageInfo.txt')
read_raw_image_info(info_file)

(80, 1024, 1024)

In [16]:
def raw_to_tif(raw_file, img_dim, save_folder):
    """
    Read RawImage.raw and save .tif images
    
    Args:
    raw_file -- path to the .raw file
    img_dim -- the (h, w) tuple of each frame
    save_folder -- folder where the .tif images are written
    """
    
    
    a = np.fromfile(raw_file, dtype='uint16')
    # sanity check: each frame holds h*w pixel values plus 2 label values,
    # so the length of array a must be an exact multiple of h*w+2
    h, w = img_dim
    assert a.shape[0] % (h*w+2) == 0
    num_images = a.shape[0] // (h*w+2)
    
    folder = os.path.split(raw_file)[0]
    write_log(folder, num_images)
    
    img_in_row = a.reshape(num_images, h*w+2)
    labels = img_in_row[:, :2] # two uint16 words per frame, combined into the image number below
    images = img_in_row[:, 2:]
    images_reshape = images.reshape(num_images, h, w)
    
    # save the images as .tif sequence
    os.makedirs(save_folder, exist_ok=True)
#     for num, img in enumerate(images_reshape):
#         io.imsave(os.path.join(save_folder, '{:04d}.tif'.format(num)), img, check_contrast=False)
    for label, img in zip(labels, images_reshape):
        # cast to Python int first so the sum cannot overflow uint16
        num = int(label[0]) + int(label[1]) * 2 ** 16 + 1
        io.imsave(os.path.join(save_folder, '{:08d}.tif'.format(num)), img, check_contrast=False)

In [17]:
# test raw_to_tif(raw_file, img_dim, save_folder)
raw_file = os.path.join(folder, 'RawImage.raw')
img_dim = (1024, 1024)
save_folder = os.path.join(folder, 'images')
raw_to_tif(raw_file, img_dim, save_folder)
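
The parsing logic can also be tested without real data and without writing any .tif files. The sketch below is not part of `extractImages.py`: `parse_raw` is a hypothetical helper that mirrors the reshaping in `raw_to_tif`, and the input is a tiny synthetic file written to a temporary folder. It assumes little-endian data and the same `+ 1` label offset used above.

```python
import os
import tempfile
import numpy as np

def parse_raw(raw_file, img_dim):
    """Hypothetical helper: return (labels, images) without saving anything."""
    h, w = img_dim
    a = np.fromfile(raw_file, dtype='<u2')
    if a.size % (h * w + 2) != 0:
        raise ValueError('file size is not a multiple of the frame size')
    rows = a.reshape(-1, h * w + 2)
    # widen to int64 before combining the two label words
    labels = rows[:, 0].astype(np.int64) + rows[:, 1].astype(np.int64) * 2 ** 16 + 1
    images = rows[:, 2:].reshape(-1, h, w)
    return labels, images

# synthetic file: 2 frames of 2x3 pixels, every pixel set to 7
h, w = 2, 3
rows = np.zeros((2, h * w + 2), dtype='<u2')
rows[:, 0] = [60638, 60639]   # low label words; high words stay 0
rows[:, 2:] = 7

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'RawImage.raw')
    rows.tofile(path)
    labels, images = parse_raw(path, (h, w))

print(labels, images.shape)
```

Separating parsing from saving in this way keeps the test independent of `skimage` and of the log file.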

In [14]:
# combine the functions
def extract_images(folder):
    if check_necessary_files(folder):
        info_file = os.path.join(folder, 'RawImageInfo.txt')
        raw_file = os.path.join(folder, 'RawImage.raw')
        save_folder = os.path.join(folder, 'images')
        fps, h, w = read_raw_image_info(info_file)
        raw_to_tif(raw_file, (h, w), save_folder)
    else:
        print('Incomplete files')

In [10]:
# test extract_images(folder)
folder = '/home/zhengyang/Documents/GitHub/Python/generic_proc/test_images/extractImages'
extract_images(folder)

In [13]:
def write_log(folder, num_images):
    """
    Write a log file to the folder that contains the .raw file.
    The log records the total number of frames in the .raw file
    and is used to check whether the image extraction is complete.
    
    Args:
    folder -- folder of .raw
    num_images -- total number of frames in .raw
    """
    
    with open(os.path.join(folder, 'log.txt'), 'w') as f:
        f.write("Raw image has {:d} frames".format(num_images))
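
One possible way to use the log is to compare the recorded frame count with the number of .tif files actually written. The function below is a hypothetical sketch, not part of the existing script: the name `extraction_complete` and the default subfolder `images` are assumptions based on how `extract_images` builds its paths. The demo runs on a temporary folder with empty placeholder files.

```python
import os
import re
import tempfile

def extraction_complete(folder, image_subfolder='images'):
    """Hypothetical check: does the .tif count match the count in log.txt?"""
    log_path = os.path.join(folder, 'log.txt')
    image_dir = os.path.join(folder, image_subfolder)
    if not (os.path.isfile(log_path) and os.path.isdir(image_dir)):
        return False
    with open(log_path, 'r') as f:
        match = re.search(r'Raw image has (\d+) frames', f.read())
    if match is None:
        return False
    expected = int(match.group(1))
    found = sum(name.endswith('.tif') for name in os.listdir(image_dir))
    return found == expected

# demo on a temporary folder with two empty placeholder .tif files
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'images'))
    with open(os.path.join(tmp, 'log.txt'), 'w') as f:
        f.write('Raw image has 2 frames')
    for name in ('00000001.tif', '00000002.tif'):
        open(os.path.join(tmp, 'images', name), 'w').close()
    ok_full = extraction_complete(tmp)
    os.remove(os.path.join(tmp, 'images', '00000002.tif'))
    ok_partial = extraction_complete(tmp)

print(ok_full, ok_partial)
```

Because the file names are derived from the image labels, two frames with the same label would overwrite each other, and this check would then report the extraction as incomplete.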