### Annotations to Patches (Notebook)

This notebook shows how to create patches for annotations downloaded from CoralNet. It assumes
that you've already downloaded the data from a source, and that it is stored in the
CoralNet_Data directory.

#### Imports

In [1]:
import os
import warnings
import concurrent.futures
import pandas as pd
from tqdm import tqdm
warnings.filterwarnings("ignore")

from ..Tools.Patches import *

#### Setting the Data Directory

In [2]:
SOURCE_ID = str(4189)

In [3]:
# The root directory for the data
ROOT = "B://CoralNet_Data//"

# The sub-directory containing the data for a single source; note that you
# can alter this to include multiple sources if you want to.
SOURCE_DIR = ROOT + f"{SOURCE_ID}//"
IMAGE_DIR = SOURCE_DIR + "images/"
LABEL_PATH = SOURCE_DIR + "annotations.csv"

# Check that the paths exist
assert os.path.exists(ROOT)
assert os.path.exists(SOURCE_DIR)
assert os.path.exists(IMAGE_DIR)
assert os.path.exists(LABEL_PATH)

# Create a sub-directory for the patches
PATCH_DIR = SOURCE_DIR + "patches/"
os.makedirs(PATCH_DIR, exist_ok=True)
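
The same layout can also be built with `pathlib`, which avoids doubled separators in the resulting paths. The following is a minimal sketch only: the helper name `source_paths` is hypothetical and not part of the original notebook, and it runs against a temporary directory rather than the real data drive.

```python
from pathlib import Path
import tempfile


def source_paths(root, source_id):
    """Return the paths used in this notebook for one source (hypothetical helper)."""
    source_dir = Path(root) / str(source_id)
    return {
        "source_dir": source_dir,
        "image_dir": source_dir / "images",
        "label_path": source_dir / "annotations.csv",
        "patch_dir": source_dir / "patches",
    }


# Demonstrate on a throwaway directory rather than a real drive
with tempfile.TemporaryDirectory() as tmp:
    paths = source_paths(tmp, 4189)
    paths["image_dir"].mkdir(parents=True)
    paths["label_path"].touch()

    # Collect anything that is missing instead of stopping at a bare assert
    required = ("source_dir", "image_dir", "label_path")
    missing = [key for key in required if not paths[key].exists()]

    # Create the patch directory, as the cell above does
    paths["patch_dir"].mkdir(exist_ok=True)
    patch_dir_created = paths["patch_dir"].is_dir()
```

Collecting the missing paths in a list makes it possible to print all of them at once, which can be easier to debug than the first failing `assert`.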

#### Loading the Annotations

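Here the annotations are loaded into a pandas dataframe. The dataframe can then be filtered, for
example to keep only annotations made by a user or by a model with high confidence. The cell
below keeps only the imported annotations and the columns needed to create patches.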

In [4]:
# Read in the annotations
labels_df = pd.read_csv(LABEL_PATH)

# Keep only the imported annotations, if the Annotator column is present
if "Annotator" in labels_df.columns:
    labels_df = labels_df[labels_df['Annotator'] == 'Imported']

# Subset the dataframe to only include needed columns
labels_df = labels_df[['Name', 'Row', 'Column', 'Label']]

In [5]:
labels_df.sample(5)

Unnamed: 0,Name,Row,Column,Label
22025,DSC_7824.JPG,3283,3186,Turf
26781,DSC_7716.JPG,1696,2847,Turf
25860,DSC_7732.JPG,242,435,CALG
2911,DSC_8375.JPG,1013,308,CCA1
12945,DSC_8089.JPG,3104,317,ZOA
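
The text above mentions keeping annotations made by a model with high confidence, which the cell does not show. The sketch below illustrates one way this might look on toy data. The column name `Machine confidence 1`, the annotator value `robot`, and the threshold of 90 are all assumptions for illustration; check the columns of your own `annotations.csv` before relying on them.

```python
import pandas as pd

# Toy annotations; 'Machine confidence 1' and 'robot' are assumed names
toy_df = pd.DataFrame({
    "Name": ["a.JPG", "a.JPG", "b.JPG", "b.JPG"],
    "Row": [10, 20, 30, 40],
    "Column": [15, 25, 35, 45],
    "Label": ["Turf", "ZOA", "CCA1", "Sand"],
    "Annotator": ["Imported", "robot", "robot", "Imported"],
    "Machine confidence 1": [None, 95, 60, None],
})

MIN_CONFIDENCE = 90  # arbitrary threshold chosen for illustration

by_user = toy_df["Annotator"] == "Imported"
# Comparisons against NaN evaluate to False, so rows without a score drop out here
by_confident_model = toy_df["Machine confidence 1"] >= MIN_CONFIDENCE

# Keep a row if either condition holds, then subset to the needed columns
filtered_df = toy_df[by_user | by_confident_model]
filtered_df = filtered_df[["Name", "Row", "Column", "Label"]]
```

In this toy example the low-confidence `CCA1` row is dropped and the other three rows are kept.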


#### Getting the Images

The annotation file only contains the names of the images, so we need to get the actual image
paths. For each annotation, we check whether the named image exists in the image directory and
record its path. We then add the paths to the dataframe as a new column and drop any
annotations whose image is missing.

In [6]:
# For each row in the dataframe, get the name of the image and collect the
# path to the image, or None if the file does not exist
image_path = []

for i, r in labels_df.iterrows():
    if os.path.exists(IMAGE_DIR + r['Name']):
        image_path.append(IMAGE_DIR + r['Name'])
    else:
        image_path.append(None)

# Set the image path, filter any rows that don't have an image path
labels_df['Image Path'] = image_path
labels_df = labels_df[labels_df['Image Path'].notnull()]

In [7]:
labels_df.sample(5)

Unnamed: 0,Name,Row,Column,Label,Image Path
35935,DSC_7430.JPG,1136,3759,ZOA,B://CoralNet_Data//4189//images/DSC_7430.JPG
39249,DSC_7322.JPG,1878,2915,ZOA,B://CoralNet_Data//4189//images/DSC_7322.JPG
13662,DSC_8074.JPG,1725,1994,Turf,B://CoralNet_Data//4189//images/DSC_8074.JPG
5937,DSC_8318.JPG,2565,4948,ZOA,B://CoralNet_Data//4189//images/DSC_8318.JPG
1217,DSC_8432.JPG,3084,852,Turf,B://CoralNet_Data//4189//images/DSC_8432.JPG
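
For large annotation files, checking the disk once per row can be slow. A possible alternative, sketched below on toy data, is to list the image directory once and filter with `isin`. The file names and the temporary directory are made up for the example.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as image_dir:
    # Create two empty stand-in image files
    for name in ("DSC_0001.JPG", "DSC_0002.JPG"):
        open(os.path.join(image_dir, name), "w").close()

    toy_df = pd.DataFrame({
        "Name": ["DSC_0001.JPG", "DSC_0002.JPG", "DSC_0003.JPG"],
        "Label": ["Turf", "ZOA", "Sand"],
    })

    # List the directory once, then keep only annotations whose image is present
    available = set(os.listdir(image_dir))
    matched_df = toy_df[toy_df["Name"].isin(available)].copy()

    # Build the path column for the remaining rows
    matched_df["Image Path"] = [
        os.path.join(image_dir, name) for name in matched_df["Name"]
    ]
```

The `.copy()` call makes `matched_df` an independent dataframe, so adding the new column does not trigger a `SettingWithCopyWarning`.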


#### Creating the Patches

Now that we have the annotations, we can create the patches. We'll first create sub-directories
for each of the labels within the patch folder. Then we'll extract patches from images and save
them to the appropriate sub-directory.
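
The `crop_patches` function used below comes from `Tools.Patches` and its implementation is not shown in this notebook. As a rough illustration of the idea, the sketch below computes a crop box centered on an annotation and builds a patch file name. The function names `patch_box` and `patch_name`, the patch size of 224, the border handling, and the image dimensions in the usage note are all assumptions; the file name pattern appears to match the names in the final output table, but the row and column order is inferred, not confirmed.

```python
import os


def patch_box(row, column, image_height, image_width, size=224):
    """Return (left, upper, right, lower) for a crop centered on the point.

    Near an image border the box is shifted inward rather than shrunk,
    unless the image itself is smaller than the requested size.
    """
    half = size // 2
    left = min(max(column - half, 0), max(image_width - size, 0))
    upper = min(max(row - half, 0), max(image_height - size, 0))
    right = min(left + size, image_width)
    lower = min(upper + size, image_height)
    return left, upper, right, lower


def patch_name(image_name, row, column, label):
    """Build a file name of the form <image stem>_<row>_<column>_<label>.png."""
    stem = os.path.splitext(image_name)[0]
    return f"{stem}_{row}_{column}_{label}.png"
```

For a hypothetical 4000 x 6000 image, `patch_box(10, 10, 4000, 6000)` gives `(0, 0, 224, 224)`, so a point in the corner still yields a full-size patch. The tuple uses the same `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts.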

In [8]:
# Create a directory for each class label
for label in labels_df['Label'].unique():
    os.makedirs(PATCH_DIR + label, exist_ok=True)

In [9]:
# List to hold information for patches (converted to a dataframe later)
patches_df = []

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(tqdm(
        executor.map(
            lambda image_df: crop_patches(image_df, PATCH_DIR),
            [labels_df[labels_df['Name'] == image_name] for image_name in labels_df['Name'].unique()]
        ),
        total=len(labels_df['Name'].unique())
    ))


for result in results:
    patches_df.extend(result)

100%|██████████| 1213/1213 [04:12<00:00,  4.80it/s]
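
The list comprehension above filters the full dataframe once per image. A possible alternative is `groupby`, which splits the dataframe into per-image groups in a single pass. The sketch below shows the pattern on toy data; `describe_image` is a hypothetical stand-in for `crop_patches` and simply returns one record per annotation.

```python
import concurrent.futures

import pandas as pd

toy_df = pd.DataFrame({
    "Name": ["a.JPG", "b.JPG", "a.JPG", "c.JPG", "b.JPG"],
    "Label": ["Turf", "ZOA", "Sand", "Turf", "CCA1"],
})


def describe_image(image_df):
    """Stand-in worker: return one record per annotation of a single image."""
    return [(row.Name, row.Label) for row in image_df.itertuples(index=False)]


# One sub-dataframe per image; sort=False keeps first-seen order, like .unique()
image_dfs = [group for _, group in toy_df.groupby("Name", sort=False)]

with concurrent.futures.ThreadPoolExecutor() as executor:
    # executor.map returns results in the same order as its inputs
    results = list(executor.map(describe_image, image_dfs))

# Flatten the per-image lists into one list of records
records = [record for result in results for record in result]
```

Because `executor.map` preserves input order, the flattened records stay grouped by image even though the work runs on several threads.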


#### Creating a CSV File

Finally, we'll create a CSV file that contains the information for each patch. This will be
useful for loading the patches into a dataset for model training.

In [12]:
patches_df = pd.DataFrame(patches_df, columns=['Name', 'Path', 'Label', 'Image Name', 'Image Path'])
patches_df.to_csv(SOURCE_DIR + "patches.csv")

In [13]:
patches_df

Unnamed: 0,Name,Path,Label,Image Name,Image Path
0,DSC_8445_2604_5285_Sand.png,B://CoralNet_Data//4189//patches/Sand/DSC_8445...,Sand,DSC_8445.JPG,B://CoralNet_Data//4189//images/DSC_8445.JPG
1,DSC_8445_2230_3865_Turf.png,B://CoralNet_Data//4189//patches/Turf/DSC_8445...,Turf,DSC_8445.JPG,B://CoralNet_Data//4189//images/DSC_8445.JPG
2,DSC_8445_2580_4270_Turf.png,B://CoralNet_Data//4189//patches/Turf/DSC_8445...,Turf,DSC_8445.JPG,B://CoralNet_Data//4189//images/DSC_8445.JPG
3,DSC_8445_2139_4752_Sand.png,B://CoralNet_Data//4189//patches/Sand/DSC_8445...,Sand,DSC_8445.JPG,B://CoralNet_Data//4189//images/DSC_8445.JPG
4,DSC_8445_2144_5050_Sand.png,B://CoralNet_Data//4189//patches/Sand/DSC_8445...,Sand,DSC_8445.JPG,B://CoralNet_Data//4189//images/DSC_8445.JPG
...,...,...,...,...,...
44123,DSC_7150_553_1251_Turf.png,B://CoralNet_Data//4189//patches/Turf/DSC_7150...,Turf,DSC_7150.JPG,B://CoralNet_Data//4189//images/DSC_7150.JPG
44124,DSC_7150_416_1612_Turf.png,B://CoralNet_Data//4189//patches/Turf/DSC_7150...,Turf,DSC_7150.JPG,B://CoralNet_Data//4189//images/DSC_7150.JPG
44125,DSC_7150_444_2186_Turf.png,B://CoralNet_Data//4189//patches/Turf/DSC_7150...,Turf,DSC_7150.JPG,B://CoralNet_Data//4189//images/DSC_7150.JPG
44126,DSC_7150_409_43_Turf.png,B://CoralNet_Data//4189//patches/Turf/DSC_7150...,Turf,DSC_7150.JPG,B://CoralNet_Data//4189//images/DSC_7150.JPG
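
When the CSV file is read back for training, note that `to_csv` writes the dataframe index as an unnamed first column by default. The sketch below, on made-up toy data and an in-memory buffer, shows one way to restore it with `index_col=0` and to count the patches per class.

```python
import io

import pandas as pd

toy_patches = pd.DataFrame({
    "Name": ["x_1_2_Sand.png", "x_3_4_Turf.png", "y_5_6_Turf.png"],
    "Label": ["Sand", "Turf", "Turf"],
    "Image Name": ["x.JPG", "x.JPG", "y.JPG"],
})

# Round-trip through an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy_patches.to_csv(buffer)  # the index is written as an unnamed first column
buffer.seek(0)

# index_col=0 restores that column as the index rather than 'Unnamed: 0'
loaded = pd.read_csv(buffer, index_col=0)

# Patches per class, useful for spotting class imbalance before training
class_counts = loaded["Label"].value_counts().to_dict()
```

Passing `index=False` to `to_csv` is another option if the index does not need to be kept.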
