# Parse CIFs

Use the `deepometry.parse` module to convert .CIF files to NumPy arrays. In this example, the .CIF files are stored under `/data/raw/` in subdirectories corresponding to the experiment, day, sample, replicate, and class label. There may be more than one .CIF file per class label. The code in this notebook also collects any `.tif` and `.tiff` files found under the same root.

    /data/raw/
        Experiment 001/
            Day 1/
                Sample A/
                    Replicate 1/
                        Class A/
                            A_foo.cif
                            A_bar.cif
                        Class B/
                            B_foo.cif
                            B_bar.cif
        Experiment 002/
            Day 1/
                Sample A/
                    Replicate 1/
                        Class A/
                            A_foo.cif
                        Class B/
                            B_bar.cif
                            B_foo.cif
        ...

Within each .CIF file, the images of the selected channels of each object are parsed into a NumPy array, i.e. each cell yields one array that holds all of its channels (a 3D tensor). The arrays are stored under `/data/parsed` in subdirectories mirroring the original directory structure. Each array filename consists of a metadata prefix followed by a hexadecimal string.

    /data/parsed/
        Experiment 001/
            Day 1/
                Sample A/
                    Replicate 1/
                        Class A/
                            A__32e88e1ac3a8f44bf8f77371155553b9.npy
                            A__3dc56a0c446942aa0da170acfa922091.npy
                        Class B/
                            B__8068ef7dcddd89da4ca9740bd2ccb31e.npy
        Experiment 002/
            Day 1/
                Sample A/
                    Replicate 1/
                        Class A/
                            A__8348deaa70dfc95c46bd02984d28b873.npy
                        Class B/
                            B__c1ecbca7bd98c01c1d3293b64cd6739a.npy
                            B__c56cfb8e7e7121dd822e47c67d07e2d4.npy
        ...
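
To check that the output tree mirrors the input tree, you can count the `.npy` files in each subdirectory. The sketch below is a minimal illustration using only the standard library; it is not part of `deepometry`. The helper name `count_arrays` and the file names are hypothetical, and the sketch builds a small stand-in tree in a temporary directory so that it runs without real data.

```python
import os
import tempfile
from collections import Counter


def count_arrays(root):
    """Map each subdirectory (relative to root) to its number of .npy files."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        n = sum(1 for name in filenames if name.endswith(".npy"))
        if n:
            counts[os.path.relpath(dirpath, root)] = n
    return counts


# Build a tiny stand-in tree (empty placeholder files, hypothetical names).
with tempfile.TemporaryDirectory() as root:
    class_a = os.path.join(root, "Experiment 001", "Class A")
    class_b = os.path.join(root, "Experiment 001", "Class B")
    for directory, names in [(class_a, ["A__00.npy", "A__01.npy"]),
                             (class_b, ["B__00.npy"])]:
        os.makedirs(directory)
        for name in names:
            open(os.path.join(directory, name), "w").close()

    counts = count_arrays(root)

for directory, n in sorted(counts.items()):
    print(directory, n)
```

On real data, call `count_arrays` with the output directory instead of the temporary one.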

# User settings

In [None]:
input_parse = '/data/raw'
output_parse = '/data/parsed/'

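# Channels to parse, e.g. channels = [0, 1]; None parses all available channels.
channels = None

# Passed to deepometry.parse.parse as `size`.
frame = 55

# Passed to deepometry.parse.parse as `montage_size`.
montage_size = 0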
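
Before running the remaining cells, it can help to sanity-check these values. The sketch below is one possible check and is not part of `deepometry`. The helper name `check_settings` and its rules (a positive `frame`, a non-negative `montage_size`, and `channels` either `None` or a list of non-negative integers) are assumptions based on how the values are used in this notebook.

```python
def check_settings(frame, montage_size, channels):
    """Raise ValueError for implausible settings (hypothetical helper)."""
    # Assumption: the frame size must be a positive integer.
    if int(frame) <= 0:
        raise ValueError("frame must be a positive integer")
    # Assumption: montage_size is zero or a positive integer.
    if int(montage_size) < 0:
        raise ValueError("montage_size must be zero or a positive integer")
    # Assumption: channels is None (all channels) or a list of indices.
    if channels is not None:
        if not all(isinstance(c, int) and c >= 0 for c in channels):
            raise ValueError("channels must be None or a list of non-negative integers")
    return True


# Values matching the settings cell above.
ok = check_settings(frame=55, montage_size=0, channels=None)
print(ok)
```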

# Executable

In [None]:
import glob
import os.path
from itertools import groupby

import bioformats
import javabridge

import deepometry.parse

In [None]:
# Collect input files recursively, one list per supported extension.
pathnames_tif = glob.glob(os.path.join(input_parse, '**', '*.tif'), recursive=True)
pathnames_tiff = glob.glob(os.path.join(input_parse, '**', '*.tiff'), recursive=True)
pathnames_cif = glob.glob(os.path.join(input_parse, '**', '*.cif'), recursive=True)

# Start the Java VM (with the Bio-Formats JARs) only when .cif files are present.
if len(pathnames_cif) > 0:
    javabridge.start_vm(class_path=bioformats.JARS, max_heap_size="8G")

print('Parsing... Please wait!')
for paths in [pathnames_tif, pathnames_tiff, pathnames_cif]:
    # groupby only merges consecutive items, so sort by directory first.
    paths = sorted(paths, key=os.path.dirname)

    for directory, items in groupby(paths, key=os.path.dirname):
        group = list(items)

        # Directory relative to the input root, used as metadata.
        meta_as_path = os.path.relpath(directory, input_parse)

        # Mirror the input directory structure under the output root.
        dest_dir = os.path.join(output_parse, meta_as_path)

        deepometry.parse.parse(
            paths=group,
            output_directory=dest_dir,
            # Metadata prefix: path components joined by underscores.
            meta='_'.join(os.path.normpath(meta_as_path).split(os.path.sep)),
            size=int(frame),
            channels=channels,
            montage_size=int(montage_size)
        )

print('Done.')
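
The grouping and metadata steps of the loop can be tried in isolation on a few made-up paths. The sketch below uses only the standard library and does not call `deepometry`; the root `data/raw` and the file paths are hypothetical. Because `itertools.groupby` only merges consecutive items that share a key, the sketch sorts the paths by directory before grouping.

```python
import os.path
from itertools import groupby

# Hypothetical root, standing in for input_parse.
input_root = os.path.join("data", "raw")

# Made-up paths, deliberately not ordered by directory.
paths = [
    os.path.join(input_root, "Experiment 001", "Class A", "A_foo.cif"),
    os.path.join(input_root, "Experiment 001", "Class B", "B_foo.cif"),
    os.path.join(input_root, "Experiment 001", "Class A", "A_bar.cif"),
]

# Sort by parent directory so that groupby yields one group per directory.
paths = sorted(paths, key=os.path.dirname)

metas = {}
for directory, items in groupby(paths, key=os.path.dirname):
    # Path of the directory relative to the root.
    meta_as_path = os.path.relpath(directory, input_root)
    # Join the path components with underscores to form the metadata string.
    meta = "_".join(os.path.normpath(meta_as_path).split(os.path.sep))
    metas[meta] = [os.path.basename(p) for p in items]

for meta, names in metas.items():
    print(meta, names)
```

Without the sort, the two `Class A` files in this example would end up in separate groups, because a `Class B` path sits between them in the input list.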