### Download CoralNet (Notebook)

This notebook is useful for experimenting with the functions in
`CoralNet_Download.py`. The script itself is designed to be run from the
command line and lets a user download the images, annotations, labelset, and
model metadata for each source in a list of source IDs.

#### Import packages

In [1]:
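# os is used in the next cell; import it explicitly rather than relying on
# the star import to provide it.
import os

from CoralNet_Download import *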

#### Set up authentication

The first step is to authenticate with CoralNet using your username and
password. If you don't have an account, you can create one at
https://coralnet.ucsd.edu/. To avoid typing your credentials on every run,
store them in a separate file or set them as environment variables; the cell
below reads `CORALNET_USERNAME` and `CORALNET_PASSWORD` and prompts for any
that are missing. If you prefer not to store your credentials, you can also
pass them as arguments when you run the script.

In [2]:
# Username
CORALNET_USERNAME = os.getenv("CORALNET_USERNAME")
USERNAME = input("Username: ") if not CORALNET_USERNAME else CORALNET_USERNAME

# Password
CORALNET_PASSWORD = os.getenv("CORALNET_PASSWORD")
PASSWORD = input("Password: ") if not CORALNET_PASSWORD else CORALNET_PASSWORD

try:
    # Authenticate
    authenticate(USERNAME, PASSWORD)
except Exception as e:
    print(e)

# Set the path to the root directory where you want to save the data for
# each source. The data will be saved in a subdirectory named after the source.
OUTPUT_DIR = "../CoralNet_Data/"

NOTE: Successfully logged in for jordan.pierce@noaa.gov
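
The credential lookup above can also be wrapped in a small helper. The sketch below is only an illustration: `load_credentials` is a hypothetical function, not part of the CoralNet script, and it assumes the same `CORALNET_USERNAME` and `CORALNET_PASSWORD` environment variables used in the cell above. It swaps `input` for `getpass.getpass` when asking for the password so that the password is not echoed while it is typed.

```python
import getpass
import os


def load_credentials(env=None, ask_user=input, ask_secret=getpass.getpass):
    """Return (username, password), preferring environment variables.

    Hypothetical helper for illustration; not part of CoralNet_Download.
    """
    env = os.environ if env is None else env
    # Fall back to an interactive prompt only when a variable is unset or empty.
    username = env.get("CORALNET_USERNAME") or ask_user("Username: ")
    # getpass hides the typed password, unlike input().
    password = env.get("CORALNET_PASSWORD") or ask_secret("Password: ")
    return username, password
```

In the notebook this would be called as `USERNAME, PASSWORD = load_credentials()` before `authenticate(USERNAME, PASSWORD)`.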


#### Download one or multiple sources' data

In this cell we download a source's data given its ID. The only requirements
are that the source exists and that you have permission to access it. The
source ID can be found in the URL of the source's page on CoralNet. You need
to provide your username and password whether the source is public or
private; a private source can be downloaded only if your account has access
to it.
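
As a rough illustration of reading the ID from a source page URL, the sketch below pulls the number out with a regular expression. It assumes source pages follow the pattern `https://coralnet.ucsd.edu/source/<id>/`, which is not stated in this notebook, and `source_id_from_url` is a hypothetical helper rather than a function from the script.

```python
import re

# Assumed URL shape: https://coralnet.ucsd.edu/source/<id>/...
SOURCE_URL_PATTERN = re.compile(r"/source/(\d+)")


def source_id_from_url(url):
    """Return the integer source ID embedded in a source page URL."""
    match = SOURCE_URL_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No source ID found in URL: {url!r}")
    return int(match.group(1))
```

Under that assumption, `source_id_from_url("https://coralnet.ucsd.edu/source/4033/")` gives the integer to put in `SOURCE_IDs`.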

In [None]:
# Next we download the data for each desired source. If you want to download
# data for multiple sources, you can put the source IDs in a list and iterate
# over them, or use multiprocessing/threading to download data simultaneously.

# Source IDs
SOURCE_IDs = [4033]

for source_id in SOURCE_IDs:
    download_data(source_id, USERNAME, PASSWORD, OUTPUT_DIR)

Downloading Metadata...
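
The comment in the cell above mentions threading as a way to download several sources at once. One possible shape for that, using the standard library's `ThreadPoolExecutor`, is sketched below. `download_many` is a hypothetical wrapper, and the sketch assumes the download function can safely run in several threads at the same time, which should be checked before relying on it.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_many(source_ids, download_one, max_workers=4):
    """Call download_one(source_id) for each ID using worker threads.

    Returns a dict mapping each source ID to None on success, or to the
    exception that the call raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, sid): sid for sid in source_ids}
        for future in as_completed(futures):
            sid = futures[future]
            # future.exception() returns None when the call succeeded.
            results[sid] = future.exception()
    return results
```

In this notebook the call might look like `download_many(SOURCE_IDs, lambda sid: download_data(sid, USERNAME, PASSWORD, OUTPUT_DIR))`. Collecting exceptions per source means one failed download does not stop the others.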


#### Download a list of all public sources

If you want to know which sources are available on CoralNet, you can download
a dataframe containing the IDs and names of all public sources. The dataframe
is also saved as a CSV file in the output directory.

In [6]:
public_sources = download_coralnet_sources(USERNAME, PASSWORD, OUTPUT_DIR)
public_sources.sample(5)

Downloading CoralNet Source List...
Source ID list exported successfully.


Unnamed: 0,Source_ID,Source_Name
67,3383,BIO 403
601,1358,Tiles_Bermuda2019
436,2970,NFWF
45,3560,Andrej Fucak
518,3032,Rocky Shore Costa Rica
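
The `Unnamed: 0` column in the output above is what pandas typically shows when a CSV that was written with its index is read back without naming an index column. Whether that is how this dataframe was produced is an assumption. The sketch below uses an in-memory stand-in for the exported file (its real file name is not given here), with rows copied from the sample output, to show how `index_col=0` avoids the extra column when reloading.

```python
import io

import pandas as pd

# In-memory stand-in for the exported CSV, with rows from the sample above.
csv_text = """,Source_ID,Source_Name
67,3383,BIO 403
601,1358,Tiles_Bermuda2019
436,2970,NFWF
"""

# Default read: the unnamed first column is kept as "Unnamed: 0".
with_extra_column = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats the first column as the row index instead.
sources = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra_column.columns))
print(list(sources.columns))
```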


#### Download a list of all public labelsets

You can also download a dataframe containing all public labelsets. The
dataframe contains the name of each labelset, the link to its page on
CoralNet, its functional group, and the other attributes shown in the table
on the labelset page. The dataframe is also saved as a CSV file in the output
directory.

In [10]:
public_labelsets = download_coralnet_labelset(USERNAME, PASSWORD, OUTPUT_DIR)
public_labelsets.sample(5)

Downloading CoralNet Labeset List...
Labelset list exported successfully.


Unnamed: 0,Label ID,Name,URL,Functional Group,Popularity %,Short Code,Duplicate,Duplicate Notes,Verified,Has Calcification Rates
910,5949,Dictiosphaeria versluysii,https://coralnet.ucsd.edu/label/5949/,Hard coral,0,DVER,False,,False,False
5303,2994,Cnidaria: Colonial anemone: zoanthids,https://coralnet.ucsd.edu/label/2994/,Other,40,CC,False,,False,False
2234,2694,0mm Yellow Encrusting Sponge,https://coralnet.ucsd.edu/label/2694/,Other Invertebrates,43,0mmYEnSp,False,,False,False
3121,1648,Crown of Thorns Sea Star,https://coralnet.ucsd.edu/label/1648/,Other Invertebrates,74,COT,False,,True,False
3044,1857,Coral: Black/Octocorals: Massive soft corals,https://coralnet.ucsd.edu/label/1857/,Other Invertebrates,54,OM,True,Duplicate of Soft Coral (massive),False,False


#### Find public sources with desired labelsets

Given a list of desired labelsets, you can identify which public sources
contain them, and then use the resulting source IDs to download those
sources.

In [7]:
# List of desired labelsets
desired_labelsets = ["Acropora"]
subset_labelset = public_labelsets[public_labelsets["Name"].isin(desired_labelsets)]

subset_labelset

Unnamed: 0,Name,URL,Functional_Group,Popularity %,Short_Code,Duplicate,Duplicate Notes,Verified,Has_Calcification_Rates
25,Acropora,https://coralnet.ucsd.edu/label/59/,Hard coral,85,Acropora,False,,True,True
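
Note that `isin` matches names exactly, so the filter above returns only the entry named exactly "Acropora". If you want every entry whose name merely contains a term, `Series.str.contains` is one alternative. The sketch below compares the two on a small dataframe built from rows shown in this notebook's outputs (names and functional groups only); the variable names are illustrative.

```python
import pandas as pd

# Rows copied from the labelset outputs shown in this notebook.
labels = pd.DataFrame(
    {
        "Name": [
            "Acropora",
            "Crown of Thorns Sea Star",
            "0mm Yellow Encrusting Sponge",
            "Cnidaria: Colonial anemone: zoanthids",
        ],
        "Functional Group": [
            "Hard coral",
            "Other Invertebrates",
            "Other Invertebrates",
            "Other",
        ],
    }
)

# isin() keeps only rows whose Name equals one of the listed strings exactly.
exact = labels[labels["Name"].isin(["Acropora"])]
no_match = labels[labels["Name"].isin(["Sponge"])]  # empty: no exact match

# str.contains() keeps rows whose Name includes the substring (case-insensitive here).
partial = labels[
    labels["Name"].str.contains("sponge", case=False, regex=False, na=False)
]
```

Substring matching can return many more rows than exact matching, so it is worth inspecting the result before passing it on to the source lookup.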


In [8]:
desired_sources = get_sources_with(subset_labelset, USERNAME, PASSWORD, OUTPUT_DIR)

Finding sources with desired labelsets...
Source ID List exported successfully.


Use this dataframe as a filter on the public sources dataframe to keep only
the sources with the desired labelsets.

In [9]:
public_sources[public_sources["Source_ID"].isin(desired_sources['Source_ID'])]

Unnamed: 0,Source_ID,Source_Name
10,3551,2Maldives_2Katie
28,3305,AFCD trial species
29,4029,Ahmed Fizal
31,290,AIMS LTMP
50,3424,Ant
...,...,...
642,3577,WAPA RFI
652,3415,Windward
654,3046,WorkshopTest
656,109,WSU West Hawaii
