#### This notebook converts all XML annotation files into TIF annotation masks.
> Note: The notebook can be rerun when more data is added; mask files that already exist are skipped.

---------------------

# Dependencies

This notebook uses the [ASAP](https://github.com/computationalpathologygroup/ASAP/releases) 1.8 package to read large multiresolution TIF files and their XML annotations. [OpenSlide](https://openslide.org/download/) is also required as a dependency.

In [1]:
# add the ASAP bin directory to sys.path so that the multiresolutionimageinterface module can be found
import sys
sys.path.append('/opt/ASAP/bin')
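If the ASAP install path is wrong, the import in the next cell fails with a bare `ModuleNotFoundError`. A minimal sketch of an optional sanity check, assuming ASAP lives under `/opt/ASAP/bin` as above; `ASAP_BIN` and `asap_importable` are hypothetical names, and the check relies only on the standard library's `importlib.util.find_spec`:

```python
import importlib.util
import os
import sys

ASAP_BIN = '/opt/ASAP/bin'  # assumed install location; adjust to your setup

def asap_importable(module_name='multiresolutionimageinterface'):
    """Return True if module_name can be located on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None

if not os.path.isdir(ASAP_BIN):
    print('Warning - ASAP bin directory not found: ' + ASAP_BIN)
if ASAP_BIN not in sys.path:
    sys.path.append(ASAP_BIN)
if not asap_importable():
    print('Warning - multiresolutionimageinterface not found; check the ASAP install path.')
```

The check only locates the module; it does not import it, so a broken native installation can still fail at import time.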

In [2]:
# import other necessary libraries
import multiresolutionimageinterface as mir
import cv2
from tqdm import tqdm_notebook
import os

In [3]:
reader = mir.MultiResolutionImageReader()
annotation_list = mir.AnnotationList()
xml_repository = mir.XmlRepository(annotation_list)

**Data and annotation directories**
> Note: Check that these paths match your local data layout.

In [4]:
dirAnnotations = 'data/annotations/'
dirData = 'data/training/'

Collect a list of the TIF image files. They are spread over subfolders such as 'center_0', 'center_1', and so on, so the data directory is walked recursively. Existing mask files are excluded.

In [5]:
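ImageFiles = []
# walk the data directory recursively: root, subdirectories, files
for root, dirs, files in os.walk(dirData):
    for file in files:
        # keep slide images only; skip masks generated by earlier runs
        if file.endswith('.tif') and 'mask' not in file:
            ImageFiles.append(os.path.join(root, file))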
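The same collection step can also be written with `pathlib`. This is a sketch, not a drop-in requirement: `collect_image_files` is a hypothetical helper, it compares the suffix case-insensitively (so `.TIF` is included), and it matches the `.tif` extension exactly, so `.tiff` files are not picked up.

```python
from pathlib import Path

def collect_image_files(root, suffix='.tif', exclude_marker='mask'):
    """Recursively collect image files under root, skipping mask files.

    suffix is compared against the lower-cased file extension, so pass it
    in lower case. Files whose name contains exclude_marker are skipped.
    """
    return sorted(
        str(p) for p in Path(root).rglob('*')
        if p.is_file()
        and p.suffix.lower() == suffix
        and exclude_marker not in p.name
    )
```

Usage in this notebook would be `ImageFiles = collect_image_files(dirData)`. Sorting the result makes the processing order reproducible between runs.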

The following function creates a TIF annotation mask from an XML file of annotation polygons.
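To illustrate what `label_map` and `conversion_order` in the function do, here is a toy rasterizer in plain Python. It is not ASAP's implementation: rectangles stand in for polygons, 0 is used as background, and it assumes that groups later in `conversion_order` are painted over earlier ones (check the ASAP source to confirm this for `AnnotationToMask`). `paint_mask` is a hypothetical name.

```python
def paint_mask(height, width, groups, label_map, conversion_order):
    """Toy rasterizer.

    groups maps a group name to a list of half-open rectangles
    (row0, col0, row1, col1). Groups are painted in conversion_order,
    so a later group overwrites pixels already labelled by an earlier one.
    """
    mask = [[0] * width for _ in range(height)]  # 0 = background
    for name in conversion_order:
        label = label_map[name]
        for r0, c0, r1, c1 in groups.get(name, []):
            # clip each rectangle to the mask bounds before filling it
            for r in range(max(r0, 0), min(r1, height)):
                for c in range(max(c0, 0), min(c1, width)):
                    mask[r][c] = label
    return mask

# a 'normal' region drawn inside a 'metastases' region
mask = paint_mask(4, 6,
                  {'metastases': [(0, 0, 4, 4)], 'normal': [(1, 1, 3, 3)]},
                  {'metastases': 1, 'normal': 2},
                  ['metastases', 'normal'])
for row in mask:
    print(row)
# [1, 1, 1, 1, 0, 0]
# [1, 2, 2, 1, 0, 0]
# [1, 2, 2, 1, 0, 0]
# [1, 1, 1, 1, 0, 0]
```

Under this assumption, reversing the order would let the outer 'metastases' region overwrite the inner 'normal' one, which is why the order is passed explicitly.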

In [6]:
def CreateAnnotationMask(annotationPath):

    # get only the base name, without directory or file suffix
    fileNamePart = os.path.splitext(os.path.basename(annotationPath))[0]

    # find the corresponding tif file; directories vary, so search the list by exact file name
    tifName = fileNamePart + '.tif'
    matches = [s for s in ImageFiles if os.path.basename(s) == tifName]
    if len(matches) == 0:
        print('Warning - This file is missing from the file list: {0} - skipping.'.format(tifName))
        return
    tifPath = matches[0]

    # skip if the tif file is not found on disk
    if not os.path.isfile(tifPath):
        print('Warning - Could not locate {0} - skipping this annotation file.'.format(tifPath))
        return

    # skip if this mask already exists (only the extension is replaced)
    maskPath = os.path.splitext(tifPath)[0] + '_mask.tif'
    if os.path.isfile(maskPath):
        print('Info - Mask file of {0} already exists - skipping'.format(tifPath))
        return

    # load the annotation polygons from the XML file
    xml_repository.setSource(annotationPath)
    xml_repository.load()
    annotation_mask = mir.AnnotationToMask()

    # open the slide: its dimensions and spacing define the mask geometry
    mr_image = reader.open(tifPath)
    if mr_image is None:
        print('Warning - Could not read {0} - skipping'.format(tifPath))
        return

    # 'metastases' annotations get label 1, 'normal' annotations get label 2,
    # converted in the order given by conversion_order
    label_map = {'metastases': 1, 'normal': 2}
    conversion_order = ['metastases', 'normal']
    annotation_mask.convert(annotation_list,
                            maskPath,
                            mr_image.getDimensions(),
                            mr_image.getSpacing(),
                            label_map,
                            conversion_order)
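The skip-if-exists logic can be factored into small helpers that are easy to test without ASAP. This sketch derives the mask name with `os.path.splitext`, so only the file extension is affected; a plain `str.replace('.tif', '_mask.tif')` would also rewrite any `.tif` that appears in a directory name. `mask_path_for` and `needs_mask` are hypothetical names.

```python
import os

def mask_path_for(tif_path, mask_suffix='_mask'):
    """Insert mask_suffix between the file stem and its extension."""
    stem, ext = os.path.splitext(tif_path)
    return stem + mask_suffix + ext

def needs_mask(tif_path):
    """True if no mask file exists yet next to tif_path."""
    return not os.path.isfile(mask_path_for(tif_path))
```

For example, `mask_path_for('data/v1.tiffs/a.tif')` gives `'data/v1.tiffs/a_mask.tif'`, leaving the directory name untouched.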

Collect all XML annotation files, including those in subfolders.

In [7]:
AnnotationFiles = []
# walk the annotation directory recursively: root, subdirectories, files
for root, dirs, files in os.walk(dirAnnotations):
    for file in files:
        if file.endswith('.xml'):
            AnnotationFiles.append(os.path.join(root, file))
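Before starting the long conversion run, it can be worth checking which annotation files have no matching image. This is a sketch; `pair_annotations` is a hypothetical helper. It matches on exact base names, which is stricter than a substring test (`a.tif` is also a substring of `data.tif`), and it ignores the directory of the XML file, so annotations in subfolders still match.

```python
import os

def pair_annotations(annotation_files, image_files):
    """Match each XML annotation to the TIF image with the same base name.

    Returns (pairs, missing): pairs maps xml path -> tif path, and missing
    lists annotation files with no matching image. If several images share
    a base name, the first one in image_files is used.
    """
    by_stem = {}
    for path in image_files:
        stem = os.path.splitext(os.path.basename(path))[0]
        by_stem.setdefault(stem, path)  # keep the first match only
    pairs, missing = {}, []
    for xml_path in annotation_files:
        stem = os.path.splitext(os.path.basename(xml_path))[0]
        if stem in by_stem:
            pairs[xml_path] = by_stem[stem]
        else:
            missing.append(xml_path)
    return pairs, missing
```

In this notebook, `pairs, missing = pair_annotations(AnnotationFiles, ImageFiles)` followed by printing `missing` gives a quick dry run before any mask is written.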

> Note: Creating the masks may take hours. You may want to run this notebook in the background overnight.
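For an unattended overnight run, one option is to keep going when a single file raises a Python exception and to report all failures at the end. A minimal sketch; `process_all` is a hypothetical helper. It only catches Python exceptions, so a hard crash inside native library code would still stop the kernel.

```python
import traceback

def process_all(paths, worker):
    """Call worker(path) for every path, recording failures instead of stopping.

    Returns a list of (path, 'ExceptionType: message') tuples.
    """
    failures = []
    for path in paths:
        try:
            worker(path)
        except Exception as exc:  # one bad file should not end the whole run
            failures.append((path, '{0}: {1}'.format(type(exc).__name__, exc)))
            traceback.print_exc()  # full traceback goes to stderr for later inspection
    return failures
```

In this notebook it could be called as `failures = process_all(tqdm_notebook(AnnotationFiles, 'Creating masks...'), CreateAnnotationMask)`, which keeps the progress bar.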

In [8]:
for f in tqdm_notebook(AnnotationFiles, 'Creating masks...'):
    print('Annotation file: ' + f)
    CreateAnnotationMask(f)


Annotation file: data/annotations/patient_004_node_4.xml
Info - Mask file of data/training/center_0/patient_004_node_4.tif already exists - skipping
Annotation file: data/annotations/patient_009_node_1.xml
Info - Mask file of data/training/center_0/patient_009_node_1.tif already exists - skipping
Annotation file: data/annotations/patient_010_node_4.xml
Info - Mask file of data/training/center_0/patient_010_node_4.tif already exists - skipping
Annotation file: data/annotations/patient_012_node_0.xml
Info - Mask file of data/training/center_0/patient_012_node_0.tif already exists - skipping
Annotation file: data/annotations/patient_015_node_1.xml
Info - Mask file of data/training/center_0/patient_015_node_1.tif already exists - skipping
Annotation file: data/annotations/patient_015_node_2.xml
Info - Mask file of data/training/center_0/patient_015_node_2.tif already exists - skipping
Annotation file: data/annotations/patient_016_node_1.xml
Info - Mask file of data/training/center_0/patien