### Annotations to Patches (Notebook)

This notebook shows how to create patches for annotations downloaded from CoralNet. It assumes
that you've already downloaded the data from a source, and that it is stored in the
CoralNet_Data directory.

#### Imports

In [None]:
# utils is expected to provide os, pd, tqdm, imread, imsave, and crop
from utils import *

#### Setting the Data Directory

In [None]:
# The root directory for the data
ROOT = "../CoralNet_Data/"

# The sub-directory for a single source; note that you can alter this
# to include multiple sources if you want to.
SOURCE_DIR = ROOT + "3420/"
IMAGE_DIR = SOURCE_DIR + "images/"
LABEL_PATH = SOURCE_DIR + "annotations.csv"

# Check that the paths exist
assert os.path.exists(ROOT)
assert os.path.exists(SOURCE_DIR)
assert os.path.exists(IMAGE_DIR)
assert os.path.exists(LABEL_PATH)

# Create a sub-directory for the patches
PATCH_DIR = SOURCE_DIR + "patches/"
os.makedirs(PATCH_DIR, exist_ok=True)

#### Loading the Annotations

Here the annotations are loaded into a pandas dataframe. The dataframe can then be filtered to
only include annotations made by a user, or by a model with high confidence.

In [None]:
# Read in the annotations
labels_df = pd.read_csv(LABEL_PATH)

# Keep only annotations whose Annotator is 'Imported' (treated here as user-made)
labels_df = labels_df[labels_df['Annotator'] == 'Imported']

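# Subset the dataframe to only include needed columns; take a copy so that
# adding a column later does not trigger pandas' SettingWithCopyWarning
labels_df = labels_df[['Name', 'Row', 'Column', 'Label']].copy()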
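
The text above mentions keeping annotations made by a model with high confidence, which the cell does not do. A minimal sketch of that filter follows. The function `select_annotations`, the `Confidence` column, and the threshold of 90 are assumptions for illustration only; check the header of your own `annotations.csv` for the real confidence column name and scale.

```python
import pandas as pd


def select_annotations(df, trusted_annotators=("Imported",),
                       confidence_col=None, min_confidence=90):
    """Keep rows from trusted annotators, plus rows at or above a confidence threshold.

    `confidence_col` is a hypothetical column name; pass the one your CSV uses.
    """
    keep = df["Annotator"].isin(trusted_annotators)
    if confidence_col is not None and confidence_col in df.columns:
        # NaN confidences compare as False, so they are not kept by this clause
        keep = keep | (df[confidence_col] >= min_confidence)
    return df.loc[keep, ["Name", "Row", "Column", "Label"]].copy()


# Tiny synthetic example (not real CoralNet data)
demo = pd.DataFrame({
    "Name": ["a.jpg", "a.jpg", "b.jpg"],
    "Row": [10, 20, 30],
    "Column": [15, 25, 35],
    "Label": ["coral", "sand", "algae"],
    "Annotator": ["Imported", "robot", "robot"],
    "Confidence": [float("nan"), 95.0, 40.0],
})
selected = select_annotations(demo, confidence_col="Confidence", min_confidence=90)
```

In this synthetic example, the imported row and the high-confidence model row are kept, and the low-confidence model row is dropped.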

#### Getting the Images

The annotation file contains only the names of the images, so we need to build the actual image
paths. For each annotation, we check whether its image exists in the image directory and record
the path in a new column. Annotations whose image is missing are then dropped from the
dataframe.

In [None]:
# For each row in the dataframe, get the name of the image, and provide the path
# to the image in a new column called Image_Path
image_path = []

for i, r in labels_df.iterrows():
    if os.path.exists(IMAGE_DIR + r['Name']):
        image_path.append(IMAGE_DIR + r['Name'])
    else:
        image_path.append(None)

# Set the image path, then filter out any rows that don't have an image path
labels_df['Image_Path'] = image_path
labels_df = labels_df[labels_df['Image_Path'].notnull()]
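
For large annotation files, checking the disk once per row can be slow. One possible alternative, sketched below, lists the image directory a single time and filters with `isin`. The helper name `attach_image_paths` is hypothetical, and the sketch assumes that all images sit directly in the image directory and that file names match exactly, including case.

```python
import os
import tempfile

import pandas as pd


def attach_image_paths(df, image_dir):
    """Keep rows whose image exists in image_dir and add an Image_Path column."""
    available = set(os.listdir(image_dir))       # one directory listing
    out = df[df["Name"].isin(available)].copy()  # drop rows with no image
    out["Image_Path"] = [os.path.join(image_dir, name) for name in out["Name"]]
    return out


# Self-contained demonstration using a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.jpg"), "wb").close()  # empty placeholder file
    demo = pd.DataFrame({
        "Name": ["a.jpg", "missing.jpg"],
        "Row": [1, 2],
        "Column": [3, 4],
        "Label": ["x", "y"],
    })
    resolved = attach_image_paths(demo, tmp)
```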

#### Creating the Patches

Now that we have the annotations, we can create the patches. We'll first create sub-directories
for each of the labels within the patch folder. Then we'll extract patches from images and save
them to the appropriate sub-directory.

In [None]:
# Create a sub-directory for each class label
for label in labels_df['Label'].unique():
    os.makedirs(PATCH_DIR + label, exist_ok=True)

# List of rows holding information for each patch (converted to a dataframe later)
patches_df = []

# Loop through all of the individual images
for image_name in tqdm(labels_df['Name'].unique()):

    # Get the labels for just this image
    image_df = labels_df[labels_df['Name'] == image_name]

    # Open image
    image = imread(image_df['Image_Path'].iloc[0])

    # Crop each patch in the current image dataframe
    for i, r in image_df.iterrows():

        try:
            patch = crop(image, r['Row'], r['Column'], 56)
            name = f"{r['Row']}_{r['Column']}_{r['Label']}_{image_name}"
            path = PATCH_DIR + r['Label'] + "/" + name

            # Save if it doesn't already exist
            if not os.path.exists(path):
                imsave(fname=path, arr=patch)

            patches_df.append([name, r['Label'], path, image_name])

        except Exception:
            # If the patch is out of bounds (or cannot be saved), skip it;
            # catching Exception still lets Ctrl+C interrupt the loop
            continue
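
The `crop` helper comes from `utils` and is not shown in this notebook. To illustrate the idea, here is a hypothetical stand-in, `crop_patch`, which assumes that the last argument is the side length of a square patch centered on the annotation and that out-of-bounds patches raise an error. The real `crop` may behave differently, for example by padding.

```python
import numpy as np


def crop_patch(image, row, col, size):
    """Return a size x size patch centered on (row, col).

    Hypothetical stand-in for utils.crop; raises ValueError when the patch
    would extend past the image border.
    """
    half = size // 2
    top, left = int(row) - half, int(col) - half
    bottom, right = top + size, left + size
    height, width = image.shape[:2]
    if top < 0 or left < 0 or bottom > height or right > width:
        raise ValueError("patch extends past the image border")
    return image[top:bottom, left:right]


# Synthetic 100 x 120 RGB image
image = np.zeros((100, 120, 3), dtype=np.uint8)
patch = crop_patch(image, 50, 60, 56)  # fits inside the image

try:
    crop_patch(image, 5, 5, 56)        # too close to the top-left corner
    out_of_bounds_raised = False
except ValueError:
    out_of_bounds_raised = True
```

Raising a specific exception type for out-of-bounds patches would let the loop above skip only those cases instead of every error.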

#### Creating a CSV File

Finally, we'll create a CSV file that contains the information for each patch. This will be
useful for loading the patches into a dataset for model training.

In [None]:
patches_df = pd.DataFrame(patches_df, columns=['Name', 'Label', 'Path', 'Image_Name'])
patches_df.to_csv(SOURCE_DIR + "patches.csv")
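
When the CSV is later used for training, it can help to split patches by source image, so that patches from the same image do not end up in both the training and validation sets. The sketch below shows one way to do this. The function `split_by_image`, the 20% validation fraction, and the seed are assumptions and not part of this notebook.

```python
import numpy as np
import pandas as pd


def split_by_image(patches, val_fraction=0.2, seed=0):
    """Split a patches dataframe into (train, val) with no image in both."""
    rng = np.random.default_rng(seed)
    images = rng.permutation(patches["Image_Name"].unique())
    n_val = max(1, int(round(len(images) * val_fraction)))
    val_images = set(images[:n_val])
    is_val = patches["Image_Name"].isin(val_images)
    return patches[~is_val], patches[is_val]


# Synthetic stand-in for the saved file; with real data you could instead use
# pd.read_csv(SOURCE_DIR + "patches.csv", index_col=0)
demo = pd.DataFrame({
    "Name": [f"p{i}.jpg" for i in range(10)],
    "Label": ["coral", "sand"] * 5,
    "Path": [f"patches/p{i}.jpg" for i in range(10)],
    "Image_Name": [f"img{i // 2}.jpg" for i in range(10)],  # 2 patches per image
})
train_df, val_df = split_by_image(demo, val_fraction=0.2, seed=0)
```

`index_col=0` is suggested in the comment because `to_csv` writes the dataframe index as the first column by default.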