# IFCB Dashboard to OBIS use case with manual annotation
# Event core with Occurrence extension
Karina Khazmutdinova and Trevor Golden (Axiom), Stace Beaulieu (WHOI)

With thanks to Kasia Kenitz (UCSD) and Heidi Sosik and Joe Futrelle (WHOI)

Prepared for Marine Biological Data Mobilization Workshop March 2022

**This is a prototype for testing purposes only.**

**A protocol is being developed to determine if and when it is appropriate to submit products from automated classification to OBIS.**

Preconditions:
- IFCB Dashboard sample (bin) has autoclass csv file with scores from automated classifier
- IFCB Dashboard sample has been populated with latitude, longitude, and depth
- For autoclass labels: A lookup table has been prepared to match the autoclass labels to the World Register of Marine Species (WoRMS) (and to the SeaBASS namespace for non-biota particles)
- Manual annotations are available for at least a subset of the ROIs in the IFCB sample
- For manual annotation labels: In this workflow the manual annotation database had already matched annotation labels to WoRMS AphiaID identifiers; thus, a script takes each provided AphiaID and retrieves the current valid AphiaID and associated scientific name from the WoRMS API.


This workflow is being developed following several best-practice references:
- Neeley et al. (2021) http://dx.doi.org/10.1575/1912/27377
- EU Horizon 2020 “Best practices and recommendations for plankton imagery data management”
- Horton et al. (2021) https://doi.org/10.3389/fmars.2021.620702

Import Python libraries to access IFCB Dashboard, interact with WoRMS API, and manipulate tabular data

In [1]:
import numpy as np
import pandas as pd
import pyworms # pyworms package is not in Anaconda distribution
import re
import requests


Define functions to acquire sample (bin) numbers and spatiotemporal metadata from IFCB Dashboard

In [2]:
# We adopted some code contributed by Joe Futrelle (WHOI)

def get_metric_timeseries(dataset, metric, start='2000-01-01', end='2100-01-01'):
    url = f'{BASE_URL}/{dataset}/api/feed/{metric}/start/{start}/end/{end}'
    r = requests.get(url)
    r.raise_for_status()  # raise an informative HTTPError on a failed request
    return r.json()


def shorten(fq_pid):
    return re.sub('.*/', '', fq_pid)


def list_bins(dataset, start='2000-01-01', end='2100-01-01'):
    metric_timeseries = get_metric_timeseries(dataset, METRIC, start, end)
    metric_timeseries = sorted(metric_timeseries, key=lambda record: record['date'])
    return [shorten(record['pid']) for record in metric_timeseries]


def latest_bin(dataset):
    return list_bins(dataset)[-1]


def basic_info(pid):
    url = f'{BASE_URL}/api/bin/{pid}?include_coordinates=False'
    r = requests.get(url)
    r.raise_for_status()  # raise an informative HTTPError on a failed request
    record = r.json()
    return {
        'datetime': record['timestamp_iso'],
        'previous_bin': record['previous_bin_id'],
        'next_bin': record['next_bin_id'],
        'latitude': record['lat'],
        'longitude': record['lng'],
        'depth': record['depth'],
        'instrument': record['instrument'],
        'number_of_rois': record['num_images'],
        'ml_analyzed': float(re.sub(' .*', '', record['ml_analyzed'])),
        'concentration': record['concentration'],
        'sample_type': record['sample_type'],
        'cruise': record['cruise'],
        'cast': record['cast'],
        'niskin': record['niskin'],
    }

Define function to select appropriate name/ID pair from autoclass lookup table manually prepared from WoRMS Taxon Match

In [3]:
# presently this acquires the name/ID from the autoclass lookup table
# autoclass lookup table includes SeaBASS namespace for non-biota particles

# ultimately for some taxa expect to use WoRMS API to provide name/ID at higher rank
# in that case consider adding a function that uses identificationQualifier

def get_sci_attrs(r):
    if r['Taxon status'] == 'unaccepted':
        r['scientificNameID'] = "urn:lsid:marinespecies.org:taxname:" + str(r['AphiaID_accepted'])
        r['scientificName'] = r['ScientificName_accepted']
#    if r['Taxon status'] == 'NA': 
#        r['scientificNameID'] = r['LSID'] 
#        r['scientificName'] = r['ScientificName_accepted'] # did not manually populate this column when status NA
# the following else statement applies to all except status unaccepted
    else:
        r['scientificNameID'] = r['LSID'] 
        r['scientificName'] = r['ScientificName']
    return r
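
As a minimal illustration of the rule in `get_sci_attrs`, the sketch below applies the same logic to a two-row toy lookup table. Only the column names come from the function above; the names and AphiaIDs in the rows are invented placeholders, not WoRMS records.

```python
import pandas as pd

def get_sci_attrs(r):
    # Same rule as in the notebook: an unaccepted name is replaced by its accepted name/ID
    if r['Taxon status'] == 'unaccepted':
        r['scientificNameID'] = "urn:lsid:marinespecies.org:taxname:" + str(r['AphiaID_accepted'])
        r['scientificName'] = r['ScientificName_accepted']
    else:
        r['scientificNameID'] = r['LSID']
        r['scientificName'] = r['ScientificName']
    return r

# Hypothetical lookup rows: names and AphiaIDs are placeholders, not WoRMS records
toy_lookup = pd.DataFrame({
    'ScientificName': ['Oldname exampleus', 'Validname exampleus'],
    'LSID': ['urn:lsid:marinespecies.org:taxname:111111',
             'urn:lsid:marinespecies.org:taxname:222222'],
    'Taxon status': ['unaccepted', 'accepted'],
    'AphiaID_accepted': [333333, 222222],
    'ScientificName_accepted': ['Newname exampleus', 'Validname exampleus'],
})

resolved = toy_lookup.apply(get_sci_attrs, axis=1)
print(resolved[['scientificName', 'scientificNameID']])
```

The first row is redirected to its accepted name and ID, while the second keeps the name and LSID from the lookup table.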

Define function to acquire the autoclass scores from the Dashboard and merge the winning autoclass to name/ID pair from lookup table

In [4]:
# Code for this function was developed from an initial prototype by Kathy Qi (EDI Fellow at WHOI)
# https://github.com/klqi/EDI-NES-LTER-2019/tree/master/auto_join

# We would like to split this function into multiple functions
# to accommodate decisions with regard to interpreting the autoclass scores
# and with regard to the particular taxa that the provider would like to provide to OBIS

# We would like to accommodate different thresholds for different taxa

# If subsetting only to particular taxa, we would need to alert the provider re: false positives:
# e.g., consider adding columns to the intermediate data table to report when
# an autoclass label other than the set of targeted labels had a higher autoclass score

# Presently hard coding for the column with autoclass labels
# autoclass list version is class_vNone_SIO_Delmar_mooring_20220225
 

def get_sci_id_name(url, dataset, pid, lookup_path, threshold):
    path = f'{url}/{dataset}/{pid}_class_scores.csv'
    df = pd.read_csv(path)
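    classes = df.iloc[:, 1:]  # score columns; the first column holds the ROI pid
    # winning class per ROI, kept only where its score meets the threshold (presently a single threshold);
    # taking the row maximum before thresholding avoids calling idxmax on all-NaN rows
    winner = classes.idxmax(axis=1).where(classes.max(axis=1) >= threshold)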
    winner.name = 'autoclass'
    winner_df = winner.to_frame()
    lookup = pd.read_csv(lookup_path)
    merged = pd.merge(winner_df, lookup, how="left", left_on="autoclass", right_on="class_vNone_SIO_Delmar_mooring_20220225")
    df_sci_id = merged.apply(get_sci_attrs, axis=1)
    df_sci_id = df_sci_id[["autoclass","scientificName","scientificNameID","Kingdom"]] # retain autoclass label
    occurrence_id = df["pid"]
    new_df = pd.concat([occurrence_id, df_sci_id], axis=1)
    
    return new_df
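
The thresholding step can be illustrated on a toy score table. The sketch below is not taken from the Dashboard: the class names and scores are invented for illustration. It selects the highest-scoring class per ROI and then blanks out winners whose score falls below the threshold, which gives the same result as masking the scores first and then taking the row-wise maximum.

```python
import pandas as pd

# Hypothetical scores for three ROIs and three classes (illustrative values only)
scores = pd.DataFrame(
    {'classA': [0.90, 0.40, 0.10],
     'classB': [0.05, 0.35, 0.80],
     'classC': [0.05, 0.25, 0.10]},
    index=['roi_1', 'roi_2', 'roi_3'],
)
threshold = 0.75

# Highest-scoring class per ROI, kept only where that score meets the threshold
winner = scores.idxmax(axis=1).where(scores.max(axis=1) >= threshold)
winner.name = 'autoclass'
print(winner)
```

Here `roi_2` has no class at or above 0.75, so its winner is left as NaN and it would carry no scientific name after the lookup merge.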

Define function to acquire size data from the Dashboard 

Note that size data will ultimately be included in the Extended Measurement or Fact (EMOF) table

In [5]:
# Code for this function was developed from an initial prototype by Kathy Qi (EDI Fellow at WHOI)
# https://github.com/klqi/EDI-NES-LTER-2019/tree/master/auto_join

# note this presently only retains one of the recommended features

def get_size(url, dataset, pid, lookup_path):
    path = f'{url}/{dataset}/{pid}_features.csv'
    df = pd.read_csv(path)
    size = df["Area"] 
    return size

Define function to merge manual annotation label to name/ID pair generated on-the-fly from WoRMS API

In [6]:
def get_valid_sci_attrs(row):
    # print(row['roi']) # this provides a visual indicator of progress when running this function 
    if not pd.isnull(row['worms']):
        a = int(row['worms'])
        # print(a) # this provides a visual indicator of progress when running this function
        record = pyworms.aphiaRecordByAphiaID(a)
        row['valid_AphiaID'] = record['valid_AphiaID']
        row['valid_name'] = record['valid_name']
        row['valid_kingdom'] = record['kingdom']
    else:
        row['valid_AphiaID'] = np.nan
        row['valid_name'] = np.nan
        row['valid_kingdom'] = np.nan
    return row
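
Because many ROIs share the same AphiaID, one possible refinement is to query WoRMS once per unique identifier and merge the results back. The sketch below is an assumption about how that could look, not part of the workflow: `resolve_unique` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the WoRMS call so the example runs offline.

```python
import pandas as pd

def resolve_unique(aphia_ids, fetch):
    """Build a table of valid name/ID per unique AphiaID, calling fetch once per ID."""
    unique_ids = pd.Series(aphia_ids).dropna().unique()
    rows = []
    for aphia_id in unique_ids:
        record = fetch(int(aphia_id))
        rows.append({
            'worms': float(aphia_id),  # keep the same dtype as the annotation table
            'valid_AphiaID': record['valid_AphiaID'],
            'valid_name': record['valid_name'],
            'valid_kingdom': record['kingdom'],
        })
    return pd.DataFrame(rows, columns=['worms', 'valid_AphiaID', 'valid_name', 'valid_kingdom'])

# Offline stand-in for the WoRMS call; returns placeholder records
calls = []
def fake_fetch(aphia_id):
    calls.append(aphia_id)
    return {'valid_AphiaID': aphia_id, 'valid_name': f'name_{aphia_id}', 'kingdom': 'Chromista'}

manual = pd.DataFrame({'roi': ['r1', 'r2', 'r3', 'r4'],
                       'worms': [148912.0, 109410.0, 109410.0, None]})
lookup = resolve_unique(manual['worms'], fake_fetch)
manual = manual.merge(lookup, how='left', on='worms')
print(manual)
```

In the notebook, `fetch` would be `pyworms.aphiaRecordByAphiaID`, the call already used in `get_valid_sci_attrs`. Rows without an AphiaID simply receive NaN in the three added columns.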

**Initiate the workflow**

Provide path to the IFCB Dashboard and auto-select or specify a sample bin

In [7]:
# This workflow is specifying a particular sample bin for which we have manual annotations

DATASET = 'SIO_Delmar_mooring' 
BASE_URL = 'https://ifcb-data.whoi.edu'
METRIC = 'temperature'
LOOKUP_PATH = 'autoclass_lookup_20220227.csv'
# LAST_PID = latest_bin(DATASET) # Auto-select the most recent sample bin
LAST_PID = 'D20180406T033616_IFCB115'
THRESHOLD = 0.75 # develop further to accommodate different thresholds for different taxa
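
One way the single `THRESHOLD` might later be extended to per-taxon thresholds is sketched below. This is an assumption about a possible design, not the workflow's implementation: the class names, scores, and the `per_class` override are hypothetical.

```python
import pandas as pd

# Hypothetical scores (illustrative values only)
scores = pd.DataFrame(
    {'classA': [0.90, 0.60, 0.10],
     'classB': [0.05, 0.30, 0.80]},
    index=['roi_1', 'roi_2', 'roi_3'],
)
default_threshold = 0.75
per_class = {'classB': 0.90}  # hypothetical stricter threshold for one class

# One threshold per score column, falling back to the default
thresholds = pd.Series({name: per_class.get(name, default_threshold) for name in scores.columns})

# True where a class meets its own threshold
passes = scores.ge(thresholds, axis='columns')

# Best class among those that pass; ROIs with no passing class are left as NaN
eligible = scores.where(passes, -1.0)
winner = eligible.idxmax(axis=1).where(passes.any(axis=1))
print(winner)
```

Note the design choice: the winner is the best class among those that pass their own threshold, so a higher-scoring class that fails a stricter threshold is ignored. That is the situation in which the provider would want to be alerted to possible false positives.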


**Run the workflow**

First generate a data table for the automated classification of a sample bin

This populates the scientific name/ID that matches the winning autoclass

In [8]:
# note: expect NaN values for ROIs whose winning score did not meet the threshold

df = get_sci_id_name(BASE_URL, DATASET, LAST_PID, LOOKUP_PATH, THRESHOLD)
df.head()

Unnamed: 0,pid,autoclass,scientificName,scientificNameID,Kingdom
0,D20180406T033616_IFCB115_00001,amoeba,Protozoa,urn:lsid:marinespecies.org:taxname:5,Protozoa
1,D20180406T033616_IFCB115_00002,flagellate,Protozoa,urn:lsid:marinespecies.org:taxname:5,Protozoa
2,D20180406T033616_IFCB115_00003,Lauderia_annulata,Lauderia annulata,urn:lsid:marinespecies.org:taxname:149135,Chromista
3,D20180406T033616_IFCB115_00004,,,,
4,D20180406T033616_IFCB115_00005,,,,


In [9]:
# checking that non-biota were assigned to SeaBASS PTWG namespace
print(df[df["autoclass"] == 'bead'])

                                pid autoclass scientificName scientificNameID  \
360  D20180406T033616_IFCB115_00362      bead           bead        ptwg:bead   
450  D20180406T033616_IFCB115_00452      bead           bead        ptwg:bead   
727  D20180406T033616_IFCB115_00729      bead           bead        ptwg:bead   
812  D20180406T033616_IFCB115_00814      bead           bead        ptwg:bead   

    Kingdom  
360     NaN  
450     NaN  
727     NaN  
812     NaN  


Read in the query results that were exported from manual annotation database

Populate on-the-fly the scientific name/ID that matches the provided AphiaID

In [10]:
# note: always at least manually spot-check the results when using a script to call the WoRMS API
# taxonomies change; for example, as of 2022-04-01 Family Gymnodiniaceae does not contain Genus Katodinium

manual_df = pd.read_csv('20220314_stace_query_results_sio_delmar_mooring.csv')
manual_df = manual_df.apply(get_valid_sci_attrs, axis=1) # for visual indicator of progress uncomment print in function
manual_df = manual_df.loc[pd.notnull(manual_df['valid_AphiaID'])]
manual_df.head()

Unnamed: 0,roi,class_name,worms,username,valid_AphiaID,valid_name,valid_kingdom
41,D20180404T020721_IFCB115_00054,Thalassiosira,148912.0,rshipe,148912.0,Thalassiosira,Chromista
259,D20180404T203715_IFCB115_00003,Katodinium or Torodinium,109410.0,rshipe,109410.0,Gymnodiniaceae,Chromista
265,D20180404T203715_IFCB115_00025,Katodinium or Torodinium,109410.0,rshipe,109410.0,Gymnodiniaceae,Chromista
270,D20180404T203715_IFCB115_00039,Katodinium or Torodinium,109410.0,rshipe,109410.0,Gymnodiniaceae,Chromista
275,D20180404T203715_IFCB115_00066,Katodinium or Torodinium,109410.0,rshipe,109410.0,Gymnodiniaceae,Chromista


Join the automated results with the manual annotations in a wide-format table as recommended by the OCB PTWG in Neeley et al. (2021)

In [11]:
df_auto_man = pd.merge(df, manual_df, how="left", left_on='pid', right_on='roi')
df_auto_man = df_auto_man.loc[pd.notnull(df_auto_man['valid_AphiaID'])].copy() # copy so that later in-place edits do not act on a slice
df_auto_man.head()

Unnamed: 0,pid,autoclass,scientificName,scientificNameID,Kingdom,roi,class_name,worms,username,valid_AphiaID,valid_name,valid_kingdom
2,D20180406T033616_IFCB115_00003,Lauderia_annulata,Lauderia annulata,urn:lsid:marinespecies.org:taxname:149135,Chromista,D20180406T033616_IFCB115_00003,Bacillariophyceae,148899.0,hsosik,148899.0,Bacillariophyceae,Chromista
12,D20180406T033616_IFCB115_00013,ciliate,Ciliophora,urn:lsid:marinespecies.org:taxname:11,Chromista,D20180406T033616_IFCB115_00013,ciliate,1348.0,hsosik,1348.0,Spirotrichea,Chromista
15,D20180406T033616_IFCB115_00016,cryptophyta,Cryptophyta,urn:lsid:marinespecies.org:taxname:17638,Chromista,D20180406T033616_IFCB115_00016,Euglena,8012.0,hsosik,8012.0,Euglena,Protozoa
18,D20180406T033616_IFCB115_00019,Chaetoceros_socialis,Chaetoceros socialis,urn:lsid:marinespecies.org:taxname:149123,Chromista,D20180406T033616_IFCB115_00019,Chaetoceros,148985.0,kkenitz,148985.0,Chaetoceros,Chromista
20,D20180406T033616_IFCB115_00021,Strombidium_morphotype2,Strombidium,urn:lsid:marinespecies.org:taxname:101195,Chromista,D20180406T033616_IFCB115_00021,ciliate,1348.0,hsosik,1348.0,Spirotrichea,Chromista


The following adds the feature "size" in pixels to the wide-format table as recommended by the OCB PTWG in Neeley et al. (2021)

Note that we did not specify the features file as a precondition for the Dashboard, because it is not required for the output tables in this present workflow


In [12]:
size = get_size(BASE_URL, DATASET, LAST_PID, LOOKUP_PATH)
df_auto_man["size"] = size
df_auto_man.head()

Unnamed: 0,pid,autoclass,scientificName,scientificNameID,Kingdom,roi,class_name,worms,username,valid_AphiaID,valid_name,valid_kingdom,size
2,D20180406T033616_IFCB115_00003,Lauderia_annulata,Lauderia annulata,urn:lsid:marinespecies.org:taxname:149135,Chromista,D20180406T033616_IFCB115_00003,Bacillariophyceae,148899.0,hsosik,148899.0,Bacillariophyceae,Chromista,4431
12,D20180406T033616_IFCB115_00013,ciliate,Ciliophora,urn:lsid:marinespecies.org:taxname:11,Chromista,D20180406T033616_IFCB115_00013,ciliate,1348.0,hsosik,1348.0,Spirotrichea,Chromista,1165
15,D20180406T033616_IFCB115_00016,cryptophyta,Cryptophyta,urn:lsid:marinespecies.org:taxname:17638,Chromista,D20180406T033616_IFCB115_00016,Euglena,8012.0,hsosik,8012.0,Euglena,Protozoa,1139
18,D20180406T033616_IFCB115_00019,Chaetoceros_socialis,Chaetoceros socialis,urn:lsid:marinespecies.org:taxname:149123,Chromista,D20180406T033616_IFCB115_00019,Chaetoceros,148985.0,kkenitz,148985.0,Chaetoceros,Chromista,1027
20,D20180406T033616_IFCB115_00021,Strombidium_morphotype2,Strombidium,urn:lsid:marinespecies.org:taxname:101195,Chromista,D20180406T033616_IFCB115_00021,ciliate,1348.0,hsosik,1348.0,Spirotrichea,Chromista,9497


Transform the intermediate, wide-format data table into an Occurrence table. Note that size will ultimately go into an EMOF table.

Rename column headers to Darwin Core terms and add columns to meet the minimum requirements for an occurrence table for OBIS


In [13]:
df_auto_man['eventID'] = LAST_PID
df_auto_man.rename(columns={'pid': 'occurrenceID'}, inplace=True) # note that ultimately we append label to ROI identifier
df_auto_man["associatedMedia"] = BASE_URL + "/" + DATASET + "/" + df_auto_man["occurrenceID"]+".png"
df_auto_man.rename(columns={'class_name': 'verbatimIdentification'}, inplace=True)
df_auto_man['occurrenceID'] = df_auto_man.apply(lambda x: x['occurrenceID'] + '_' + x['verbatimIdentification'], axis=1)
# we are appending the verbatimIdentification to the ROI identifier
# because in the future we could consider more than 1 occurrence record per ROI
df_auto_man.head()

Unnamed: 0,occurrenceID,autoclass,scientificName,scientificNameID,Kingdom,roi,verbatimIdentification,worms,username,valid_AphiaID,valid_name,valid_kingdom,size,eventID,associatedMedia
2,D20180406T033616_IFCB115_00003_Bacillariophyceae,Lauderia_annulata,Lauderia annulata,urn:lsid:marinespecies.org:taxname:149135,Chromista,D20180406T033616_IFCB115_00003,Bacillariophyceae,148899.0,hsosik,148899.0,Bacillariophyceae,Chromista,4431,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
12,D20180406T033616_IFCB115_00013_ciliate,ciliate,Ciliophora,urn:lsid:marinespecies.org:taxname:11,Chromista,D20180406T033616_IFCB115_00013,ciliate,1348.0,hsosik,1348.0,Spirotrichea,Chromista,1165,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
15,D20180406T033616_IFCB115_00016_Euglena,cryptophyta,Cryptophyta,urn:lsid:marinespecies.org:taxname:17638,Chromista,D20180406T033616_IFCB115_00016,Euglena,8012.0,hsosik,8012.0,Euglena,Protozoa,1139,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
18,D20180406T033616_IFCB115_00019_Chaetoceros,Chaetoceros_socialis,Chaetoceros socialis,urn:lsid:marinespecies.org:taxname:149123,Chromista,D20180406T033616_IFCB115_00019,Chaetoceros,148985.0,kkenitz,148985.0,Chaetoceros,Chromista,1027,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
20,D20180406T033616_IFCB115_00021_ciliate,Strombidium_morphotype2,Strombidium,urn:lsid:marinespecies.org:taxname:101195,Chromista,D20180406T033616_IFCB115_00021,ciliate,1348.0,hsosik,1348.0,Spirotrichea,Chromista,9497,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
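
The identifier construction above can be sketched on two ROIs taken from the tables in this notebook. One detail worth considering is that some annotation labels contain spaces (for example "Katodinium or Torodinium"), which would carry through into the occurrenceID. Replacing whitespace with underscores, as done below, is an assumption for illustration and not a decision made in this workflow.

```python
import pandas as pd

base_url = 'https://ifcb-data.whoi.edu'
dataset = 'SIO_Delmar_mooring'

rois = pd.DataFrame({
    'pid': ['D20180406T033616_IFCB115_00013', 'D20180404T203715_IFCB115_00003'],
    'verbatimIdentification': ['ciliate', 'Katodinium or Torodinium'],
})

# The image URL is built from the bare ROI identifier, before any label is appended
rois['associatedMedia'] = base_url + '/' + dataset + '/' + rois['pid'] + '.png'

# Assumption: whitespace in labels is replaced with underscores to keep identifiers free of spaces
label = rois['verbatimIdentification'].str.replace(r'\s+', '_', regex=True)
rois['occurrenceID'] = rois['pid'] + '_' + label
print(rois[['occurrenceID', 'associatedMedia']])
```

The verbatimIdentification column itself is left untouched, so the original label is still reported as provided.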


In [14]:
df_auto_man.drop(columns=['roi', 'worms', 'autoclass', 'scientificName', 'scientificNameID', 'size', 'Kingdom'], inplace=True)
# we are excluding all automated results from the occurrence table
# because in this use case the automated classification was solely to assist efficiency of manual annotations
# also exclude size
df_auto_man.head()

Unnamed: 0,occurrenceID,verbatimIdentification,username,valid_AphiaID,valid_name,valid_kingdom,eventID,associatedMedia
2,D20180406T033616_IFCB115_00003_Bacillariophyceae,Bacillariophyceae,hsosik,148899.0,Bacillariophyceae,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
12,D20180406T033616_IFCB115_00013_ciliate,ciliate,hsosik,1348.0,Spirotrichea,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
15,D20180406T033616_IFCB115_00016_Euglena,Euglena,hsosik,8012.0,Euglena,Protozoa,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
18,D20180406T033616_IFCB115_00019_Chaetoceros,Chaetoceros,kkenitz,148985.0,Chaetoceros,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
20,D20180406T033616_IFCB115_00021_ciliate,ciliate,hsosik,1348.0,Spirotrichea,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...


In [15]:

df_auto_man.rename(columns={'valid_AphiaID': 'scientificNameID'}, inplace=True) # contents not yet LSID 
df_auto_man.rename(columns={'valid_name': 'scientificName'}, inplace=True)
df_auto_man.rename(columns={'username': 'identifiedBy'}, inplace=True)
df_auto_man['scientificNameID'] = df_auto_man.apply(lambda x: 'urn:lsid:marinespecies.org:taxname:' + str(int(x['scientificNameID'])), axis=1)
df_auto_man.rename(columns={'valid_kingdom': 'kingdom'}, inplace=True)
df_auto_man.head()

Unnamed: 0,occurrenceID,verbatimIdentification,identifiedBy,scientificNameID,scientificName,kingdom,eventID,associatedMedia
2,D20180406T033616_IFCB115_00003_Bacillariophyceae,Bacillariophyceae,hsosik,urn:lsid:marinespecies.org:taxname:148899,Bacillariophyceae,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
12,D20180406T033616_IFCB115_00013_ciliate,ciliate,hsosik,urn:lsid:marinespecies.org:taxname:1348,Spirotrichea,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
15,D20180406T033616_IFCB115_00016_Euglena,Euglena,hsosik,urn:lsid:marinespecies.org:taxname:8012,Euglena,Protozoa,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
18,D20180406T033616_IFCB115_00019_Chaetoceros,Chaetoceros,kkenitz,urn:lsid:marinespecies.org:taxname:148985,Chaetoceros,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
20,D20180406T033616_IFCB115_00021_ciliate,ciliate,hsosik,urn:lsid:marinespecies.org:taxname:1348,Spirotrichea,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
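
The AphiaID-to-LSID conversion can also be written without a row-wise `apply`. The sketch below is one possible vectorized form; `aphia_to_lsid` is a hypothetical helper name, and unlike the cell above it leaves missing AphiaIDs as missing values rather than requiring them to be filtered out beforehand.

```python
import pandas as pd

LSID_PREFIX = 'urn:lsid:marinespecies.org:taxname:'

def aphia_to_lsid(aphia_ids):
    """Convert AphiaIDs (possibly stored as floats) to WoRMS LSID strings."""
    # Nullable integers drop the trailing '.0' while keeping missing values missing
    ids = pd.to_numeric(pd.Series(aphia_ids), errors='coerce').astype('Int64')
    return LSID_PREFIX + ids.astype('string')

lsids = aphia_to_lsid([148899.0, 1348.0, None])
print(lsids)
```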


In [16]:
df_occurrence = df_auto_man
df_occurrence.head()

Unnamed: 0,occurrenceID,verbatimIdentification,identifiedBy,scientificNameID,scientificName,kingdom,eventID,associatedMedia
2,D20180406T033616_IFCB115_00003_Bacillariophyceae,Bacillariophyceae,hsosik,urn:lsid:marinespecies.org:taxname:148899,Bacillariophyceae,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
12,D20180406T033616_IFCB115_00013_ciliate,ciliate,hsosik,urn:lsid:marinespecies.org:taxname:1348,Spirotrichea,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
15,D20180406T033616_IFCB115_00016_Euglena,Euglena,hsosik,urn:lsid:marinespecies.org:taxname:8012,Euglena,Protozoa,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
18,D20180406T033616_IFCB115_00019_Chaetoceros,Chaetoceros,kkenitz,urn:lsid:marinespecies.org:taxname:148985,Chaetoceros,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...
20,D20180406T033616_IFCB115_00021_ciliate,ciliate,hsosik,urn:lsid:marinespecies.org:taxname:1348,Spirotrichea,Chromista,D20180406T033616_IFCB115,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...


Add further columns to meet the minimum requirements for an occurrence table for OBIS

In [17]:

df_occurrence["occurrenceStatus"] = "present"
df_occurrence["basisOfRecord"] = "MachineObservation"
df_occurrence["identificationVerificationStatus"] = "ValidatedByHuman" # EU Horizon 2020 best practice
# because we already excluded all automated results we will not be using identificationVerificationStatus PredictedByMachine
# nor will we be using identificationReferences EU Horizon 2020 best practice if providing automated results
# df_auto_man["identificationReferences"] = "NA" # we could consider inserting NA when providing manual annotations 
df_occurrence["institutionCode"] = "Axiom ROR" # IOOS OBIS Workshop best practice
# Axiom's Research Organization Registry (ROR) is still pending 

# Maybe include
# df["dataGeneralizations"] if only providing concentrations to OBIS (not providing ROIs)

# specify the order of the columns
df_occurrence = df_occurrence[["eventID", "occurrenceID", "associatedMedia", "basisOfRecord", "identificationVerificationStatus", "identifiedBy", \
"verbatimIdentification", "scientificName", "scientificNameID", "kingdom", "occurrenceStatus", "institutionCode"]]
#df = df[["occurrence_id", "scientificName", \
#         "scientificNameID","occurrenceStatus","basisOfRecord"]]
df_occurrence.head()

Unnamed: 0,eventID,occurrenceID,associatedMedia,basisOfRecord,identificationVerificationStatus,identifiedBy,verbatimIdentification,scientificName,scientificNameID,kingdom,occurrenceStatus,institutionCode
2,D20180406T033616_IFCB115,D20180406T033616_IFCB115_00003_Bacillariophyceae,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...,MachineObservation,ValidatedByHuman,hsosik,Bacillariophyceae,Bacillariophyceae,urn:lsid:marinespecies.org:taxname:148899,Chromista,present,Axiom ROR
12,D20180406T033616_IFCB115,D20180406T033616_IFCB115_00013_ciliate,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...,MachineObservation,ValidatedByHuman,hsosik,ciliate,Spirotrichea,urn:lsid:marinespecies.org:taxname:1348,Chromista,present,Axiom ROR
15,D20180406T033616_IFCB115,D20180406T033616_IFCB115_00016_Euglena,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...,MachineObservation,ValidatedByHuman,hsosik,Euglena,Euglena,urn:lsid:marinespecies.org:taxname:8012,Protozoa,present,Axiom ROR
18,D20180406T033616_IFCB115,D20180406T033616_IFCB115_00019_Chaetoceros,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...,MachineObservation,ValidatedByHuman,kkenitz,Chaetoceros,Chaetoceros,urn:lsid:marinespecies.org:taxname:148985,Chromista,present,Axiom ROR
20,D20180406T033616_IFCB115,D20180406T033616_IFCB115_00021_ciliate,https://ifcb-data.whoi.edu/SIO_Delmar_mooring/...,MachineObservation,ValidatedByHuman,hsosik,ciliate,Spirotrichea,urn:lsid:marinespecies.org:taxname:1348,Chromista,present,Axiom ROR
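
Before writing the table out, it may help to confirm that the expected columns are present and fully populated. The sketch below is a hypothetical check: `check_required` is not part of the workflow, and `REQUIRED_OCCURRENCE` is an assumed list drawn from the columns this notebook populates, not an authoritative OBIS checklist.

```python
import pandas as pd

def check_required(table, required):
    """Report required columns that are absent or that contain empty values."""
    missing = [c for c in required if c not in table.columns]
    present = [c for c in required if c in table.columns]
    with_nulls = [c for c in present if table[c].isna().any()]
    return {'missing_columns': missing, 'columns_with_nulls': with_nulls}

# Assumed list for this sketch, based on the terms populated in this notebook
REQUIRED_OCCURRENCE = ['eventID', 'occurrenceID', 'scientificName',
                       'scientificNameID', 'occurrenceStatus', 'basisOfRecord']

# Toy table with one absent column and one empty value (placeholder identifiers)
toy = pd.DataFrame({
    'eventID': ['bin_1', 'bin_1'],
    'occurrenceID': ['bin_1_00001_ciliate', 'bin_1_00002_Euglena'],
    'scientificName': ['Spirotrichea', 'Euglena'],
    'scientificNameID': ['urn:lsid:marinespecies.org:taxname:1348', None],
    'occurrenceStatus': ['present', 'present'],
})

report = check_required(toy, REQUIRED_OCCURRENCE)
print(report)
```

In the notebook the same call could be applied to `df_occurrence` with whatever list of terms the provider settles on.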


Output a comma-separated occurrence table

In [18]:
df_occurrence.to_csv('IFCB_OBIS_occurrence.csv', index=False, na_rep='NaN')

Create the Event table

In [19]:
info = basic_info(LAST_PID)  # fetch the bin metadata once and reuse it

df_event = pd.DataFrame({
    "eventID": [LAST_PID],
    "eventDate": [info["datetime"]],
    "decimalLongitude": [info["longitude"]],
    "decimalLatitude": [info["latitude"]],
    "minimumDepthInMeters": [info["depth"]], # OBIS best practice
    "maximumDepthInMeters": [info["depth"]], # OBIS best practice
    "datasetName": [DATASET], # IOOS OBIS Workshop best practice
    "geodeticDatum": "WGS84", # IOOS OBIS Workshop best practice
    "countryCode": "US" # GBIF best practice
})

# Maybe include
# df["coordinateUncertaintyInMeters"] = TBD 

# specify the order of the columns
df_event = df_event[["datasetName", "eventID", "eventDate", "decimalLongitude", "decimalLatitude", \
                     "countryCode", "geodeticDatum", "minimumDepthInMeters", "maximumDepthInMeters"]]
#df = df[["occurrence_id", "eventDate","decimalLongitude","decimalLatitude","scientificName", \
#         "scientificNameID","occurrenceStatus","basisOfRecord", "geodeticDatum"]]
df_event.head()


Unnamed: 0,datasetName,eventID,eventDate,decimalLongitude,decimalLatitude,countryCode,geodeticDatum,minimumDepthInMeters,maximumDepthInMeters
0,SIO_Delmar_mooring,D20180406T033616_IFCB115,2018-04-06T03:36:16+00:00,-117.3172,32.93,US,WGS84,5.0,5.0
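
Latitude and longitude are easy to transpose when populating Darwin Core terms. A simple range check, sketched below as a hypothetical helper (`coordinate_problems` is not part of the workflow), can flag the case where a longitude has been placed in the latitude field. The coordinates are those reported for this sample bin.

```python
def coordinate_problems(latitude, longitude):
    """Return a list of messages for coordinates outside their valid ranges."""
    problems = []
    if not -90 <= latitude <= 90:
        problems.append(f'decimalLatitude {latitude} is outside [-90, 90]')
    if not -180 <= longitude <= 180:
        problems.append(f'decimalLongitude {longitude} is outside [-180, 180]')
    return problems

# Coordinates of the mooring sample used in this notebook
ok = coordinate_problems(latitude=32.93, longitude=-117.3172)
swapped = coordinate_problems(latitude=-117.3172, longitude=32.93)
print(ok)
print(swapped)
```

This check only catches a swap when the absolute longitude exceeds 90 degrees, so it complements rather than replaces a visual check on a map.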


Output a comma-separated event table

In [20]:
df_event.to_csv('IFCB_OBIS_event.csv', index=False, na_rep='NaN')

Next steps: Include more than one sample; Create EMOF table
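
As a starting point for the EMOF step, the sketch below reshapes the size feature into a long-format table keyed by eventID and occurrenceID. It is an assumption about the eventual layout, not a finished design: the identifiers and sizes are copied from the tables above, the unit follows the description of size "in pixels" in this notebook, and controlled-vocabulary identifiers such as measurementTypeID are omitted.

```python
import pandas as pd

# Two rows copied from the wide-format table above
wide = pd.DataFrame({
    'eventID': ['D20180406T033616_IFCB115', 'D20180406T033616_IFCB115'],
    'occurrenceID': ['D20180406T033616_IFCB115_00013_ciliate',
                     'D20180406T033616_IFCB115_00016_Euglena'],
    'size': [1165, 1139],
})

emof = pd.DataFrame({
    'eventID': wide['eventID'],
    'occurrenceID': wide['occurrenceID'],
    'measurementType': 'Area',        # feature name read by get_size
    'measurementValue': wide['size'],
    'measurementUnit': 'pixels',      # assumption, following the text of this notebook
})
print(emof)
```

Additional features from the Dashboard features file would become additional rows per occurrence rather than additional columns.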