### Annotations to Patches (Notebook)

This notebook shows how to create image patches from annotations downloaded from CoralNet. It
assumes that you have already downloaded the data for a source and stored it in the
CoralNet_Data directory.

#### Imports

In [1]:
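import os
import warnings
import concurrent.futures
warnings.filterwarnings("ignore")

import pandas as pd
from tqdm import tqdm
from utils import *  # project helpers, e.g. crop_patches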

#### Setting the Data Directory

In [2]:
SOURCE_ID = str(4189)

In [3]:
# The root directory for the data
ROOT = "B://CoralNet_Data//"

# The sub-directory for a single source; note that you can alter this
# to include multiple sources if you want to.
SOURCE_DIR = ROOT + f"{SOURCE_ID}//"
IMAGE_DIR = SOURCE_DIR + "images/"
LABEL_PATH = SOURCE_DIR + "annotations.csv"

# Check that the paths exist
assert os.path.exists(ROOT)
assert os.path.exists(SOURCE_DIR)
assert os.path.exists(IMAGE_DIR)
assert os.path.exists(LABEL_PATH)

# Create a sub-directory for the patches
PATCH_DIR = SOURCE_DIR + "patches/"
os.makedirs(PATCH_DIR, exist_ok=True)

#### Loading the Annotations

Here the annotations are loaded into a pandas dataframe. The dataframe can then be filtered to
only include annotations made by a user, or by a model with high confidence.

In [4]:
# Read in the annotations
labels_df = pd.read_csv(LABEL_PATH)

# Get the annotations made by a user (marked 'Imported' in the Annotator column)
if "Annotator" in labels_df.columns:
    labels_df = labels_df[labels_df['Annotator'] == 'Imported']

# Subset the dataframe to only include needed columns
labels_df = labels_df[['Name', 'Row', 'Column', 'Label']]
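
The text above also mentions keeping machine annotations made with high confidence, which the cell does not do. A minimal sketch of that filter on a toy table follows. The column name `Machine confidence 1`, the annotator value `robot`, and the 90 percent cutoff are assumptions for illustration and may not match your export.

```python
import pandas as pd

# Toy annotations; 'Annotator' and 'Machine confidence 1' are assumed column
# names, and 'robot' is an assumed marker for machine-made annotations.
toy_df = pd.DataFrame({
    "Name": ["a.jpg", "a.jpg", "b.jpg", "b.jpg"],
    "Row": [10, 20, 30, 40],
    "Column": [15, 25, 35, 45],
    "Label": ["Turf", "ZOA", "Turf", "CALG"],
    "Annotator": ["Imported", "robot", "robot", "alice"],
    "Machine confidence 1": [55, 97, 60, 80],
})

CONFIDENCE_THRESHOLD = 90  # hypothetical cutoff, in percent

# Keep every non-machine row, plus machine rows at or above the cutoff
is_machine = toy_df["Annotator"] == "robot"
is_confident = toy_df["Machine confidence 1"] >= CONFIDENCE_THRESHOLD
filtered_df = toy_df[~is_machine | is_confident]

print(filtered_df[["Name", "Row", "Column", "Label"]])
```

Here the low-confidence machine row is dropped and the other three rows are kept.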

In [5]:
labels_df.sample(5)

| | Name | Row | Column | Label |
|---|---|---|---|---|
| 2392 | DSC_8386.JPG | 3444 | 2543 | Turf |
| 8056 | DSC_8246.JPG | 3621 | 1878 | ZOA |
| 25463 | DSC_7740.JPG | 959 | 4144 | Turf |
| 22105 | DSC_7822.JPG | 3761 | 2423 | Turf |
| 11515 | DSC_8125.JPG | 3760 | 3149 | ZOA |


#### Getting the Images

The annotation file contains only the names of the images, so we need to resolve the actual
image paths. For each annotation we check whether its image exists in the image directory, drop
the annotations whose image is missing, and store the path of the image in a new column of the
dataframe.

In [6]:
# For each row in the dataframe, get the name of the image, and provide the path
# to the image in a new column called 'Image Path'
image_path = []

for i, r in labels_df.iterrows():
    if os.path.exists(IMAGE_DIR + r['Name']):
        image_path.append(IMAGE_DIR + r['Name'])
    else:
        image_path.append(None)

# Set the image path, filter any rows that don't have an image path
labels_df['Image Path'] = image_path
labels_df = labels_df[labels_df['Image Path'].notnull()]
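
The loop above checks the disk once per annotation, which can be slow for sources with many points. As a sketch of an alternative, the directory can be listed once and the dataframe filtered with `isin`. The temporary directory and file names below are made up so that the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Build a throwaway image directory so the example is self-contained
with tempfile.TemporaryDirectory() as image_dir:
    for file_name in ["img_1.jpg", "img_2.jpg"]:
        open(os.path.join(image_dir, file_name), "w").close()

    toy_df = pd.DataFrame({
        "Name": ["img_1.jpg", "img_2.jpg", "missing.jpg"],
        "Label": ["Turf", "ZOA", "CALG"],
    })

    # List the directory once instead of checking the disk for every row
    available = set(os.listdir(image_dir))

    # Keep only annotations whose image is present, then build the paths
    toy_df = toy_df[toy_df["Name"].isin(available)].copy()
    toy_df["Image Path"] = [os.path.join(image_dir, n) for n in toy_df["Name"]]

print(toy_df["Name"].tolist())
```

The `.copy()` call makes the filtered dataframe independent of the original, so adding the new column does not trigger a pandas chained-assignment warning.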

In [7]:
labels_df.sample(5)

| | Name | Row | Column | Label | Image Path |
|---|---|---|---|---|---|
| 30302 | DSC_7614.JPG | 2077 | 5426 | Turf | B://CoralNet_Data//4189//images/DSC_7614.JPG |
| 4229 | DSC_8343.JPG | 2794 | 5032 | GORSPP | B://CoralNet_Data//4189//images/DSC_8343.JPG |
| 37669 | DSC_7381.JPG | 40 | 2010 | CALG | B://CoralNet_Data//4189//images/DSC_7381.JPG |
| 11055 | DSC_8139.JPG | 1273 | 533 | ZOA | B://CoralNet_Data//4189//images/DSC_8139.JPG |
| 36690 | DSC_7403.JPG | 2460 | 758 | ZOA | B://CoralNet_Data//4189//images/DSC_7403.JPG |


#### Creating the Patches

Now that we have the annotations, we can create the patches. We'll first create sub-directories
for each of the labels within the patch folder. Then we'll extract patches from images and save
them to the appropriate sub-directory.

In [8]:
# Create a directory for each class label
for label in labels_df['Label'].unique():
    os.makedirs(PATCH_DIR + label, exist_ok=True)

In [10]:
# For this demonstration, keep only the annotations from the first three images
labels_df = labels_df[labels_df['Name'].isin(labels_df['Name'].unique()[0:3])]
labels_df['Name'].unique().shape

(3,)

In [11]:
# List to hold information for patches; converted to a dataframe later
patches_df = []

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(tqdm(
        executor.map(
            lambda image_df: crop_patches(image_df, PATCH_DIR),
            [labels_df[labels_df['Name'] == image_name] for image_name in labels_df['Name'].unique()]
        ),
        total=len(labels_df['Name'].unique())
    ))


for result in results:
    patches_df.extend(result)

100%|██████████| 3/3 [00:06<00:00,  2.18s/it]
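
The `crop_patches` helper comes from `utils` and is not shown in this notebook. As a rough sketch of the core step it presumably performs, the function below crops a square patch centered on an annotation's (Row, Column) position. The name `crop_patch`, the patch size of 224, and the choice to shift patches near the border back inside the image are all assumptions, not the helper's actual behavior.

```python
import numpy as np

def crop_patch(image, row, column, patch_size=224):
    """Crop a square patch centered on (row, column), shifted to stay in bounds.

    Hypothetical helper: the real crop_patches in utils may pad or skip
    annotations near the border instead.
    """
    height, width = image.shape[:2]
    if patch_size > min(height, width):
        raise ValueError("patch_size is larger than the image")
    half = patch_size // 2
    # Clamp the top-left corner so the whole patch lies inside the image
    top = min(max(row - half, 0), height - patch_size)
    left = min(max(column - half, 0), width - patch_size)
    return image[top:top + patch_size, left:left + patch_size]

# Stand-in for a loaded image: height x width x channels
image = np.zeros((1000, 1500, 3), dtype=np.uint8)

center_patch = crop_patch(image, row=500, column=700)
corner_patch = crop_patch(image, row=5, column=1495)
print(center_patch.shape, corner_patch.shape)
```

Shifting the window keeps every patch the same size, at the cost of the annotated point no longer being exactly centered for annotations near the edge.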


#### Creating a CSV File

Finally, we'll create a CSV file that contains the information for each patch. This will be
useful for loading the patches into a dataset for model training.

In [12]:
patches_df = pd.DataFrame(patches_df, columns=['Name', 'Label', 'Path', 'Image Name', 'Image Path'])
patches_df.to_csv(SOURCE_DIR + "patches.csv")
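
As one possible next step, the sketch below reads a patches file back in and holds out whole images for validation. The inline CSV stands in for `patches.csv`, with made-up rows and paths; the split fraction and random seed are arbitrary choices for illustration.

```python
import io

import pandas as pd

# Stand-in for patches.csv; the rows and paths here are made up
csv_text = """,Name,Label,Path,Image Name,Image Path
0,p0.jpg,Turf,patches/Turf/p0.jpg,a.jpg,images/a.jpg
1,p1.jpg,Turf,patches/Turf/p1.jpg,a.jpg,images/a.jpg
2,p2.jpg,ZOA,patches/ZOA/p2.jpg,b.jpg,images/b.jpg
3,p3.jpg,ZOA,patches/ZOA/p3.jpg,b.jpg,images/b.jpg
4,p4.jpg,Turf,patches/Turf/p4.jpg,c.jpg,images/c.jpg
5,p5.jpg,ZOA,patches/ZOA/p5.jpg,c.jpg,images/c.jpg
"""

# index_col=0 restores the index column that to_csv writes by default
patches = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Hold out whole images so patches from one image never straddle the split
image_names = patches["Image Name"].drop_duplicates()
val_images = image_names.sample(frac=0.34, random_state=0)
val_df = patches[patches["Image Name"].isin(val_images)]
train_df = patches[~patches["Image Name"].isin(val_images)]

print(len(train_df), len(val_df))
```

Splitting by image rather than by patch avoids near-duplicate patches from the same photo appearing in both sets.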