### CoralNet Labelset (Notebook)

This notebook walks through the steps of creating a labelset on CoralNet. Creating a labelset
cannot be undone without help from the CoralNet team. Treat this notebook as an example of how
to set up bulk labelset creation, and adapt it to the needs of your own project.

**Do not abuse this tool**

CoralNet uses labelsets to group similar images and annotations under a single class category. A
labelset is created once and shared globally across CoralNet. The label name and short code (a
short, representative string) must each be unique, and a description and thumbnail are required.
The functional group must be one of the following:
```python
"Other Invertebrates",
"Hard coral",
"Soft Substrate",
"Hard Substrate",
"Other",
"Algae",
"Seagrass"
```
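
Because a labelset cannot easily be removed once created, it can help to check each entry locally before submitting it. The following is a minimal sketch, not part of the CoralNet scripts: `validate_labelset` and `REQUIRED_FIELDS` are hypothetical names, and the field names are assumed to mirror the dictionary built later in this notebook.

```python
# Functional groups accepted by CoralNet, as listed above
FUNCTIONAL_GROUPS = {
    "Other Invertebrates",
    "Hard coral",
    "Soft Substrate",
    "Hard Substrate",
    "Other",
    "Algae",
    "Seagrass",
}

# Assumed field names (hypothetical; chosen to match the dict used later on)
REQUIRED_FIELDS = ("Name", "Short Code", "Functional Group",
                   "Description", "Image Path")


def validate_labelset(labelset):
    """Return a list of problems found in a labelset dict (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = labelset.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing field: {field}")
    group = labelset.get("Functional Group")
    if group and group not in FUNCTIONAL_GROUPS:
        problems.append(f"unknown functional group: {group}")
    return problems
```

An empty list means the entry passed these local checks; it does not guarantee that CoralNet will accept it.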

### Imports

In [5]:
import os
import sys

import pandas as pd

sys.path.append("../")

from CoralNet_Labelset import *
from CoralNet_Download import *

### Set Up Authentication

The first step is to authenticate with CoralNet using your username and password.
If you don't have an account, you can create one at https://coralnet.ucsd.edu/.
To avoid typing your credentials on every run, store them in a separate file or
as user/environment variables (`CORALNET_USERNAME` and `CORALNET_PASSWORD`).
If the variables are not set, the cell below prompts for the credentials instead.

In [6]:
# Username
CORALNET_USERNAME = os.getenv("CORALNET_USERNAME")
USERNAME = input("Username: ") if not CORALNET_USERNAME else CORALNET_USERNAME

# Password
CORALNET_PASSWORD = os.getenv("CORALNET_PASSWORD")
PASSWORD = input("Password: ") if not CORALNET_PASSWORD else CORALNET_PASSWORD

try:
    # Authenticate
    authenticate(USERNAME, PASSWORD)
except Exception as e:
    print(e)


###############################################
Authentication
###############################################

NOTE: Authenticating user jordan.pierce@noaa.gov
NOTE: Authentication successful for jordan.pierce@noaa.gov
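
Typing a password through `input` echoes it to the screen. Below is a minimal sketch of the same lookup using the standard library's `getpass`, which hides the typed text. The helper name `get_credentials` and its parameters are illustrative assumptions, not part of the CoralNet scripts.

```python
import os
from getpass import getpass


def get_credentials(env=os.environ, ask_user=input, ask_secret=getpass):
    """Read CoralNet credentials from the environment, prompting only if unset."""
    # Environment variables take priority, as in the cell above
    username = env.get("CORALNET_USERNAME") or ask_user("Username: ")
    # getpass does not echo the typed password
    password = env.get("CORALNET_PASSWORD") or ask_secret("Password: ")
    return username, password
```

The prompt functions are passed in as parameters so the lookup can be exercised without a terminal; in the notebook, calling `get_credentials()` with no arguments would be enough.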


### Source ID

A labelset is always created for use with a specific source. To use this script, you must
therefore provide the ID of the source that will contain the labelset.

In [7]:
# ID of the source that will contain the labelsets
SOURCE_ID = 4098

### Labelsets

In this example, we're creating CoralNet labelsets for each of the labels used in Mission Iconic
 Reefs (MIR). Below we open a table containing information for each label used by MIR.

In [8]:
from VISCORE import *

vpi_table.sample(3)

| Unnamed: 0 | VPI_ID | VPI_label_V3 | VPI_label_V4 | VPI_name | Taxo_high | Fxnl_high | Fxnl_low |
|---|---|---|---|---|---|---|---|
| 19 | 18 | Green_Diploso | Green_diploso | Green_Diplosoma | Tunicata | Invertebrates | Invertebrates |
| 27 | 26 | M_phar | M_pha | Madracis_pharensis | Astrocoeniidae | Hard_coral | Hard_coral |
| 21 | 20 | Icilligorgia | Icilligorgia | Icilligorgia | Octocoral | Soft_coral | Soft_coral |


### Avoid Duplicates

As mentioned above, the label name and short code must be unique. To avoid creating duplicates,
we'll use the `download_coralnet_labelsets` function from the `CoralNet_Download` script to fetch
the labels that already exist on CoralNet.

To do this, we need to create a driver (i.e., a browser) and log in to CoralNet with our
credentials. Then we'll download the most up-to-date labelset list from CoralNet.

In [9]:
# Create a headless browser driver
driver = check_for_browsers(headless=True)

# Store the credentials in the driver
driver.capabilities['credentials'] = {
    'username': USERNAME,
    'password': PASSWORD
}

# Log in to CoralNet
driver, _ = login(driver)


###############################################
Browser
###############################################



[WDM] - Downloading: 100%|██████████| 6.30M/6.30M [00:01<00:00, 6.38MB/s]


NOTE: Using Google Chrome

###############################################
Login
###############################################

NOTE: Successfully logged in for jordan.pierce@noaa.gov


In [10]:
# Get the most up-to-date labelset from CoralNet
driver, coralnet_labelset = download_coralnet_labelsets(driver, None)

# Add a lowercase copy of the Name column for case-insensitive matching
coralnet_labelset['name'] = [n.lower() for n in coralnet_labelset['Name'].values]

coralnet_labelset.sample(3)


###############################################
CoralNet Labelset Dataframe
###############################################

NOTE: Downloading CoralNet Labelset Dataframe


100%|██████████| 7547/7547 [00:00<00:00, 13993.73it/s]

NOTE: Labelset Dataframe saved successfully





| Unnamed: 0 | Label ID | Name | URL | Functional Group | Popularity % | Short Code | Duplicate | Duplicate Notes | Verified | Has Calcification Rates |
|---|---|---|---|---|---|---|---|---|---|---|
| 1726 | 5464 | Pavona spp. massive | https://coralnet.ucsd.edu/label/5464/ | Hard coral | 0 | PAVMA | False | | False | False |
| 6129 | 423 | Caulerpa | https://coralnet.ucsd.edu/label/423/ | Algae | 80 | Caulerpa | False | | True | False |
| 4471 | 841 | Solaster endeca | https://coralnet.ucsd.edu/label/841/ | Other Invertebrates | 0 | Sen | False | | False | False |


### Finding Existing Labelsets

Before creating labelsets on CoralNet, we first identify any existing labelsets that match our
labels. To do this, we'll loop through our list of labels and find those that match the ones
just downloaded from CoralNet.

In [9]:
# A list to hold all CoralNet labelsets for VPI labels
labelsets = []
no_labelsets = []

for i, r in vpi_table.iterrows():

    # First try to get the name, exactly
    name = r['VPI_name'].lower().replace("_", " ")
    labelset = coralnet_labelset[coralnet_labelset['name'] == name]

    # If that doesn't work, try to get name, but slightly modified
    if labelset.empty:
        name = r['VPI_name'].lower().replace("_sp", " spp.")
        labelset = coralnet_labelset[coralnet_labelset['name'] == name]

    if labelset.empty:
        name = r['VPI_name'].lower() + " spp."
        labelset = coralnet_labelset[coralnet_labelset['name'] == name]

    if labelset.empty:
        name = r['VPI_name'].lower().replace("_sp", "")
        labelset = coralnet_labelset[coralnet_labelset['name'] == name]

    # If that doesn't work, try to get the sub-label, exactly
    if labelset.empty:
        label = r['VPI_label_V3'].lower().replace("_", " ")
        labelset = coralnet_labelset[coralnet_labelset['name'] == label]

    if labelset.empty:
        label = r['VPI_label_V4'].lower().replace("_", " ")
        labelset = coralnet_labelset[coralnet_labelset['name'] == label]

    # If the labelset isn't empty, then we found an exact match
    if not labelset.empty:
        r['Label ID'] = labelset['Label ID'].item()
        r['Name'] = labelset['Name'].item()
        r['Short Code'] = labelset['Short Code'].item()
        r['Functional Group'] = labelset['Functional Group'].item()
        labelsets.append(r)
    else:
        # Use fuzzy string matching to find the 5 closest candidate names
        name = r['VPI_name'].lower().replace("_", " ")
        choices = process.extract(name, coralnet_labelset['name'].values, limit=5)
        choice_names = [c[0] for c in choices]

        # Look up the candidate rows (kept in table order, not ranked by score)
        candidates = coralnet_labelset[coralnet_labelset['name'].isin(choice_names)]

        # Record each candidate in numbered columns: Name 1, Label ID 1, ...
        for k, (_, r_) in enumerate(candidates.iterrows(), start=1):
            r[f'Name {k}'] = r_['Name']
            r[f'Label ID {k}'] = r_['Label ID']
            r[f'Functional Group {k}'] = r_['Functional Group']

        no_labelsets.append(r)

pd.DataFrame(no_labelsets).to_csv("./VISCORE/CoralNet_VPI_Labelset_No_Exact_Match.csv")
pd.DataFrame(labelsets).to_csv("./VISCORE/CoralNet_VPI_Labelset_With_Exact_Match.csv")
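
The matching order used in the loop above can be summarized in a small helper. This is a minimal sketch under stated assumptions: `candidate_names`, `find_label_id`, and `closest_names` are hypothetical names, a plain dict stands in for the CoralNet dataframe to keep the example self-contained, and the standard library's `difflib` is swapped in for the fuzzy matcher used in the notebook. The V3/V4 sub-label fallbacks are omitted for brevity.

```python
import difflib


def candidate_names(vpi_name):
    """Return the name variants tried in the loop above, in order, without repeats."""
    base = vpi_name.lower()
    variants = [
        base.replace("_", " "),        # exact name with spaces
        base.replace("_sp", " spp."),  # "_sp" suffix written as " spp."
        base + " spp.",                # genus-level label
        base.replace("_sp", ""),       # "_sp" suffix dropped
    ]
    # dict.fromkeys removes duplicates while keeping the original order
    return list(dict.fromkeys(variants))


def find_label_id(vpi_name, ids_by_name):
    """Return the Label ID of the first matching variant, or None."""
    for variant in candidate_names(vpi_name):
        if variant in ids_by_name:
            return ids_by_name[variant]
    return None


def closest_names(vpi_name, known_names, limit=5):
    """Return up to `limit` known names most similar to the VPI name."""
    query = vpi_name.lower().replace("_", " ")
    return difflib.get_close_matches(query, list(known_names), n=limit, cutoff=0.0)


# Lowercase names and Label IDs taken from the sample rows shown earlier
ids_by_name = {"pavona spp. massive": 5464, "caulerpa": 423, "solaster endeca": 841}
print(find_label_id("Solaster_endeca", ids_by_name))  # 841
```

Labels for which `find_label_id` returns `None` are the ones that need manual review of the closest candidates before a new labelset is created.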

Now we move on to creating labelsets on CoralNet for those labels that do not already exist. In
this next cell, we define the prefix used by MIR and collect the names and short codes already
in use on CoralNet, so that we can check new ones against them.

In [19]:
# Custom Prefix
prefix = "MIR_"

# List of names and short codes already in use
used_names = coralnet_labelset['Name'].values.tolist()
used_short_codes = coralnet_labelset['Short Code'].values.tolist()
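
The short codes themselves come from `get_short_code`, whose implementation is not shown in this notebook. As a rough illustration of the idea only, the sketch below builds a prefixed code from the first letters of a name and resolves collisions with a running number. `make_short_code` is a hypothetical stand-in, and its collision rule is an assumption rather than the behavior of the real function.

```python
def make_short_code(name, prefix, used_codes, length=4):
    """Hypothetical stand-in for get_short_code; not the notebook's implementation."""
    # Keep only letters and digits, uppercased: "Black_coral" -> "BLACKCORAL"
    letters = "".join(ch for ch in name if ch.isalnum()).upper()
    if not letters:
        raise ValueError(f"cannot build a short code from {name!r}")

    # Compare case-insensitively against codes already in use
    taken = {code.upper() for code in used_codes}

    base = letters[:length]
    candidate = f"{prefix}{base}"
    suffix = 1
    # On collision, overwrite the tail of the code with a running number
    while candidate.upper() in taken:
        digits = str(suffix)
        keep = max(length - len(digits), 0)
        candidate = f"{prefix}{base[:keep]}{digits}"
        suffix += 1
    return candidate
```

As in the notebook, the caller would append each returned code to the list of used codes so that later labels cannot reuse it.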

### Creating a Labelset

This next loop creates a short code for each label in the VPI table, and then builds a
dictionary that is passed to the `create_labelset` function.

In [17]:
for i, r in vpi_table.iterrows():

    # This label already has an existing labelset, so skip it
    if r['VPI_name'].lower().replace("_", " ") in coralnet_labelset['name'].values:
        continue

    try:
        # Create a short code for the name
        short_code = get_short_code(r['VPI_name'], prefix, used_short_codes, 4)
        # Add to list so it won't be used again
        used_short_codes.append(short_code)

        # Change the name to MIR Name
        name = f"{prefix}{r['VPI_name']}"
        name = name.replace("_", " ")
        # Check that it doesn't already exist
        assert name not in used_names

    except Exception as e:
        print(f"ERROR: Could not create labelset for {r['VPI_name']}\n{e}")
        continue

    # Get the functional group
    fxnl_group = FUNC_GROUPS_DICT[r['Fxnl_high']]

    # The name of the label, grabbed from VPI_name
    labelset_name = name
    # The unique short code, created from function
    labelset_short = short_code
    # Mapped from VPI to CoralNet
    labelset_func = fxnl_group
    # Boilerplate info, plus some MIR specific info
    labelset_description = DESCRIPTION
    labelset_description += "For MIR Use: \n"
    labelset_description += f"VPI_ID: {r['VPI_ID']}\n"
    labelset_description += f"VPI_label_V3: {r['VPI_label_V3']}\n"
    labelset_description += f"VPI_label_V4: {r['VPI_label_V4']}\n"
    labelset_description += f"VPI_name: {r['VPI_name']}\n"
    labelset_description += f"Taxo_high: {r['Taxo_high']}\n"
    labelset_description += f"Fxnl_high: {r['Fxnl_high']}\n"
    labelset_description += f"Fxnl_low: {r['Fxnl_low']}"
    # Image downloaded from internet
    labelset_thumbnail = "./Figures/MIR_Logo.png"
    labelset_thumbnail = os.path.abspath(labelset_thumbnail)

    # Dict to pass to function
    labelset = {"Name": labelset_name,
                "Short Code": labelset_short,
                "Functional Group": labelset_func,
                "Description": labelset_description,
                "Image Path": labelset_thumbnail}

    driver, success = create_labelset(driver, SOURCE_ID, labelset)

    if success:
        # CoralNet specific names, don't change
        r['Name'] = name
        r['Short Code'] = short_code
        r['Functional Group'] = fxnl_group

        # Append to list
        labelsets.append(r)


NOTE: Navigating to labelset creation page
NOTE: Submitted labelset MIR Black coral, MIR_BLAC
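
The description and dictionary assembled inside the loop could also be factored into helpers, which makes them easier to inspect before anything is submitted. This is a minimal sketch: `build_description`, `build_labelset`, and `VPI_FIELDS` are hypothetical names, and the dictionary keys are copied from the cell above.

```python
import os

# Columns copied from the VPI table into the description, in display order
VPI_FIELDS = ("VPI_ID", "VPI_label_V3", "VPI_label_V4", "VPI_name",
              "Taxo_high", "Fxnl_high", "Fxnl_low")


def build_description(row, boilerplate=""):
    """Join boilerplate text with one 'field: value' line per VPI column."""
    lines = ["For MIR Use:"] + [f"{field}: {row[field]}" for field in VPI_FIELDS]
    return boilerplate + "\n".join(lines)


def build_labelset(row, name, short_code, functional_group, thumbnail, boilerplate=""):
    """Assemble the dict expected by create_labelset (keys as used above)."""
    return {
        "Name": name,
        "Short Code": short_code,
        "Functional Group": functional_group,
        "Description": build_description(row, boilerplate),
        "Image Path": os.path.abspath(thumbnail),
    }
```

Both helpers only read `row[field]`, so they accept either a pandas row from `iterrows()` or a plain dict.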


In [29]:
# Output the updated table for posterity
pd.DataFrame(labelsets).to_csv("./VISCORE/CoralNet_VPI_Labelsets.csv")

### Close the Browser

In [22]:
# quit() ends the whole browser session; close() only closes the current window
driver.quit()