# Parse CIFs

Use the `deepometry.parse` module to convert .CIF files to NumPy arrays. In this example, the .CIF files are stored under `/data/raw/`, in nested subdirectories named for the experiment, day, sample, replicate, and class label. Each class label folder may contain more than one .CIF file.

    /data/raw/
        Experiment 001/
            Day 1/
                Sample A/
                    Replicate 1/
                        Class A/
                            A_foo.cif
                            A_bar.cif
                        Class B/
                            B_foo.cif
                            B_bar.cif
        Experiment 002/
            Day 1/
                Sample A/
                    Replicate 1/
                        Class A/
                            A_foo.cif
                        Class B/
                            B_bar.cif
                            B_foo.cif
        ...

Within each .CIF file, the images of the selected channels of each object are parsed into a single NumPy array: one cell yields one array that stacks its channels (a 3D tensor). The arrays are saved under `/data/parsed/`, in subdirectories that mirror the original data structure. Each array filename consists of a metadata prefix followed by a hexadecimal string.

    /data/parsed/
        Experiment 001/
            Day 1/
                Sample A/
                    Replicate 1/
                        Class A/
                            A__32e88e1ac3a8f44bf8f77371155553b9.npy
                            A__3dc56a0c446942aa0da170acfa922091.npy
                        Class B/
                            B__8068ef7dcddd89da4ca9740bd2ccb31e.npy
        Experiment 002/
            Day 1/
                Sample A/
                    Replicate 1/
                        Class A/
                            A__8348deaa70dfc95c46bd02984d28b873.npy
                        Class B/
                            B__c1ecbca7bd98c01c1d3293b64cd6739a.npy
                            B__c56cfb8e7e7121dd822e47c67d07e2d4.npy
        ...
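
The mirroring of the directory structure can be illustrated with a small standalone sketch. It is not part of `deepometry`; the helper name `mirrored_output_dir` is hypothetical and only shows how a folder under the input root maps to the same relative location under the output root.

```python
import os.path


def mirrored_output_dir(input_path, input_root, output_root):
    """Map a folder under input_root to the same relative location under output_root.

    Hypothetical helper for illustration; not a deepometry function.
    """
    relative = os.path.relpath(input_path, start=input_root)
    return os.path.normpath(os.path.join(output_root, relative))


# Example with the folder layout used in this notebook
print(mirrored_output_dir(
    "/data/raw/Experiment 001/Day 1/Sample A/Replicate 1/Class A",
    "/data/raw",
    "/data/parsed/",
))
```

The parsing cell further down reaches the same result by joining the five label names onto `output_dir` directly.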

# User settings

In [None]:
input_dir = '/data/raw'
output_dir = '/data/parsed/'

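# Channels to parse, e.g. channels = [0, 1]; None selects all available channels
channels = None

# Passed to deepometry.parse.parse as `size`
frame = 55

# Passed to deepometry.parse.parse as `montage_size`
montage_size = 0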

# Executable

In [None]:
import glob
import os.path

import bioformats
import javabridge

import deepometry.parse

javabridge.start_vm(class_path=bioformats.JARS, max_heap_size="8G")

In [None]:
# Collect every directory under input_dir (including input_dir itself)
all_subdirs = [x[0] for x in os.walk(input_dir)]

# Unique folder names, sorted alphabetically
possible_labels = sorted({os.path.basename(i) for i in all_subdirs})

# Bookkeeping lists for each metadata level, matched by keyword in the folder name
experiments = [i for i in possible_labels if 'experiment' in i.lower()]
days = [i for i in possible_labels if 'day' in i.lower()]
samples = [i for i in possible_labels if 'sample' in i.lower()]
replicates = [i for i in possible_labels if 'replicate' in i.lower()]
classes = [i for i in possible_labels if 'class' in i.lower()]
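

To make the keyword matching above concrete, here is a minimal standalone sketch that runs on a plain list of folder names instead of a real directory tree. The helper `group_labels` and the constant `LEVEL_KEYWORDS` are hypothetical names introduced for illustration.

```python
LEVEL_KEYWORDS = ("experiment", "day", "sample", "replicate", "class")


def group_labels(dir_names, keywords=LEVEL_KEYWORDS):
    """Group unique folder names by the level keyword they contain (case-insensitive)."""
    unique_names = sorted(set(dir_names))
    return {
        keyword: [name for name in unique_names if keyword in name.lower()]
        for keyword in keywords
    }


# Folder names as they might be collected from the example tree (with repeats)
names = ["raw", "Experiment 001", "Experiment 002", "Day 1", "Sample A",
         "Replicate 1", "Class A", "Class B", "Day 1", "Class A"]

for level, labels in group_labels(names).items():
    print(level, labels)
```

Because the match is a substring test, a folder name that contains two keywords (for example a hypothetical `Sample Day 2`) would land in both lists, so folder names should contain exactly one level keyword each.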

In [None]:
print('Parsing... Please wait!')

for exp in experiments:
    for day in days:
        for sample in samples:
            for rep in replicates:
                for cl in classes:
                    # Only label combinations that exist on disk are parsed
                    folder_path = os.path.join(input_dir, exp, day, sample, rep, cl)

                    if os.path.exists(folder_path):
                        pathnames_tif = glob.glob(os.path.join(folder_path, '*.tif'))
                        pathnames_tiff = glob.glob(os.path.join(folder_path, '*.tiff'))
                        pathnames_cif = glob.glob(os.path.join(folder_path, '*.cif'))

                        # Parse each file type in its own call
                        for paths in [pathnames_tif, pathnames_tiff, pathnames_cif]:
                            if len(paths) > 0:
                                dest_dir = os.path.join(output_dir, exp, day, sample, rep, cl)

                                deepometry.parse.parse(
                                    paths=paths,
                                    output_directory=dest_dir,
                                    meta='_'.join([exp, day, sample, rep, cl]),
                                    size=int(frame),
                                    channels=channels,
                                    montage_size=int(montage_size)
                                )
                                
print('Done.')
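

The nested loops above test every combination of experiment, day, sample, replicate, and class, and skip the ones that do not exist. As an alternative sketch (not the method used in this notebook), a single `os.walk` pass can list only the folders that actually contain image files. The helper `find_image_folders` and the constant `IMAGE_EXTENSIONS` are hypothetical names, and the demo builds a throwaway directory so it runs without real data.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".cif", ".tif", ".tiff")


def find_image_folders(root, extensions=IMAGE_EXTENSIONS):
    """Yield (relative_dir, file_paths) for each folder under root that holds image files."""
    for current_dir, _subdirs, filenames in os.walk(root):
        paths = sorted(
            os.path.join(current_dir, name)
            for name in filenames
            if name.lower().endswith(extensions)
        )
        if paths:
            yield os.path.relpath(current_dir, start=root), paths


# Demo on a temporary tree that imitates one class folder of the example layout
with tempfile.TemporaryDirectory() as root:
    class_dir = os.path.join(
        root, "Experiment 001", "Day 1", "Sample A", "Replicate 1", "Class A"
    )
    os.makedirs(class_dir)
    for name in ("A_foo.cif", "A_bar.cif", "notes.txt"):
        open(os.path.join(class_dir, name), "w").close()

    for relative_dir, paths in find_image_folders(root):
        print(relative_dir, [os.path.basename(p) for p in paths])
```

Unlike the cell above, this sketch returns all matching file types of a folder in one list and does not check that folder names contain the level keywords, so it would need adapting before its results are passed to `deepometry.parse.parse`.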