# FathomNet Python API
Experiments with the [fathomnet-py](https://github.com/fathomnet/fathomnet-py) client-side API.

In [None]:
%pip install --user fathomnet 

In [1]:
import fathomnet
from fathomnet.api import boundingboxes

In [4]:
concepts = boundingboxes.find_concepts()
print(len(concepts))

2119


There are over 2000 concepts in FathomNet (2119 in the query above), and many of them are not organisms (for example `tire`, `rope`, or `Benthic Rover`).

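Non-organisms can be filtered out by looking each concept up in the World Register of Marine Species (WoRMS) and keeping only the concepts that return an AphiaID.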

In [59]:
import urllib.parse  # `import urllib` alone does not guarantee urllib.parse is loaded
from typing import Optional

import requests

WORMS_REST = "https://www.marinespecies.org/rest"

def get_worms_id(scientific_name: str) -> Optional[int]:
    """Return the WoRMS AphiaID for a name, or None if there is no usable match."""
    parsed_name = urllib.parse.quote(scientific_name)
    response = requests.get(f"{WORMS_REST}/AphiaIDByName/{parsed_name}", timeout=30)
    if response.status_code == 200:
        # Single match: the response body is the AphiaID itself.
        return response.json()
    if response.status_code == 206:
        # Multiple matches, so get the first accepted ID if one exists,
        # otherwise fall back to the first record returned.
        response = requests.get(f"{WORMS_REST}/AphiaRecordsByName/{parsed_name}", timeout=30)
        if response.status_code == 200:
            records = response.json()
            for record in records:
                if record["status"] == "accepted":
                    return record["AphiaID"]
            return records[0]["AphiaID"]
    # Any other status code means no usable match.
    return None


In [None]:
# Sort out species that have exact matches in WoRMS.
concepts = set(concepts)
matched_concepts_to_id: dict[str, int] = {}  # concept name -> AphiaID
matched_concepts: set[str] = set()
unmatched_concepts: set[str] = set()

In [32]:
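# Consume the set one concept at a time, so re-running this cell after an
# interruption continues with the concepts that have not been looked up yet.
while concepts:
    new_concept = next(iter(concepts))
    aphia_id = get_worms_id(new_concept)  # avoid shadowing the built-in `id`
    concepts.discard(new_concept)
    if aphia_id is None:
        print(f"{new_concept}: NO MATCH")
        unmatched_concepts.add(new_concept)
    else:
        print(f"{new_concept}: {aphia_id}")
        matched_concepts.add(new_concept)
        matched_concepts_to_id[new_concept] = aphia_id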

print("\nMatched Concepts: {}".format(len(matched_concepts)))
print("\nUnmatched Concepts: {}".format(len(unmatched_concepts)))

Desbruyeresia: 391517
tire: NO MATCH
Caryophyllia/Javania: NO MATCH
Caenopedina pulchella: 456763
Tetrorchis erythrogaster: 117868
Hollardia goslinei: 281083
Diastobranchus capensis: 158656
Thrissacanthias penicillatus: 292833
Appendicularia: NO MATCH
Actinernus: 100691
Myroconger gracilis: 281607
inner filter: NO MATCH
Amphianthus sp.: NO MATCH
Clio: 137751
Galiteuthis phyllura: 341807
Tetractinellida: 597812
Cirroteuthis: 153091
Asteronyx: 123578
Bolocera: 100698
paragon: NO MATCH
Iridogorgia bella: 286152
Midwater Respirometry System: NO MATCH
Pseudosagitta maxima: 105445
marine organism: NO MATCH
Parthenopidae: 106761
Acanthamunnopsis milleri: 258647
Narella hypsocalyx: 719480
Cyclothone pallida: 127288
Victorgorgia alba: 1045634
Clausophyidae: 135337
Munidopsis recta: 392592
Echiura "mucus tube": NO MATCH
Archeterokrohnia docrickettsae: 742233
Laqueus: 235265
Funiculina-Halipteris complex: NO MATCH
Phyllodocida: 892
Hydractiniidae: 1601
Lamprogrammus brunswigi: 159133
Acanthascina
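The first pass can also be written as a pure function that takes the lookup as a parameter, so the bookkeeping can be checked without calling WoRMS. This is a minimal sketch: `partition_concepts` and the `fake_worms` table are hypothetical names, and the two IDs in the table are copied from the output above.

```python
from typing import Callable, Iterable, Optional


def partition_concepts(
    concepts: Iterable[str],
    lookup: Callable[[str], Optional[int]],
) -> tuple[dict[str, int], set[str]]:
    """Split concepts into a name -> AphiaID dict and a set of unmatched names.

    `lookup` is any callable returning an AphiaID or None (e.g. get_worms_id).
    """
    matched: dict[str, int] = {}
    unmatched: set[str] = set()
    for concept in sorted(set(concepts)):  # de-duplicate, deterministic order
        aphia_id = lookup(concept)
        if aphia_id is None:
            unmatched.add(concept)
        else:
            matched[concept] = aphia_id
    return matched, unmatched


# Offline usage: a small dict stands in for the WoRMS API.
fake_worms = {"Clio": 137751, "Laqueus": 235265}
matched, unmatched = partition_concepts(["Clio", "tire", "Laqueus"], fake_worms.get)
print(matched)    # {'Clio': 137751, 'Laqueus': 235265}
print(unmatched)  # {'tire'}
```

Unlike the notebook cell, this version does not consume its input, so it cannot resume after an interrupted run; in exchange it has no side effects on `concepts`.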

Example concepts that were unmatched:
- tire
- Group/Group
- Genus (right)
- Genus "subspecies"
- Genus sp. 1
- salp detritus
- can
- wood fall experiment
- Cirrata "egg"
- GenusnaME

Additional filters to run on the currently unmatched concepts:
- Ignore non-capitalized classifications
- Set all characters to lowercase
- Ignore any characters after first space or slash
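The first and third filters, which are the two the second-pass cell applies, can be sketched as a single normalization helper. `normalize_concept` is a hypothetical name; the sketch also strips surrounding whitespace, which the notebook does not do, and leaves out the lowercase step.

```python
import re
from typing import Optional


def normalize_concept(concept: str) -> Optional[str]:
    """Return the WoRMS search term for a concept, or None if it should be ignored.

    Keeps only the text before the first space or slash, and ignores concepts
    whose first character is not an uppercase letter.
    """
    term = re.split(r"[ /]", concept.strip(), maxsplit=1)[0]
    if not term or not term[0].isupper():
        return None
    return term


for name in ["Amphianthus sp.", "Caryophyllia/Javania", "tire", "cf. Hansenothuria sp."]:
    print(f"{name!r} -> {normalize_concept(name)!r}")
# 'Amphianthus sp.' -> 'Amphianthus'
# 'Caryophyllia/Javania' -> 'Caryophyllia'
# 'tire' -> None
# 'cf. Hansenothuria sp.' -> None
```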

In [None]:
print(sorted(matched_concepts))

print(sorted(unmatched_concepts))

In [64]:
# Second-pass filter: retry unmatched concepts using only the text before the
# first space or slash, and skip concepts that do not start with a capital letter.
for new_concept in list(unmatched_concepts):  # iterate over a copy; the set is modified below
    formatted_concept = new_concept.split(" ")[0].split("/")[0]

    # Guard against an empty term (e.g. a concept starting with a space or slash)
    if formatted_concept and formatted_concept[0].isupper():
        # Search for the shortened term on WoRMS
        aphia_id = get_worms_id(formatted_concept)
        if aphia_id is None:
            print(f"{new_concept}: NO MATCH")
        else:
            print(f"{new_concept}: {aphia_id}")
            unmatched_concepts.remove(new_concept)
            matched_concepts.add(new_concept)
            matched_concepts_to_id[new_concept] = aphia_id
    else:
        print(f"{new_concept}: IGNORED")

print("\nMatched Concepts: {}".format(len(matched_concepts)))
print("\nUnmatched Concepts: {}".format(len(unmatched_concepts)))

Octopodinae: NO MATCH
kelp holdfast: IGNORED
tire: IGNORED
inner filter: IGNORED
paragon: IGNORED
detrital aggregate: IGNORED
Midwater Respirometry System: NO MATCH
sheet flow: IGNORED
BED: NO MATCH
marine organism: IGNORED
Medusae: NO MATCH
whale carcass: IGNORED
Funiculina-Halipteris complex: NO MATCH
ADCP: NO MATCH
Sebastomus complex: NO MATCH
bottle: IGNORED
plastic: IGNORED
Tanyostea: NO MATCH
manipulator: IGNORED
medusa carcass: IGNORED
trash: IGNORED
DiplacanthopomaA: NO MATCH
cf. Hansenothuria sp.: IGNORED
TorquaratoridaeB sp. 1: NO MATCH
DiplacanthopomaB: NO MATCH
plastic bag: IGNORED
Eye-in-the-Sea: NO MATCH
Macon: NO MATCH
swing arm: IGNORED
bottle-2: IGNORED
cable spool: IGNORED
Vitreosalpa: NO MATCH
salp detritus: IGNORED
rope: IGNORED
wood fall experiment: IGNORED
trap: IGNORED
boulder: IGNORED
net: IGNORED
narella sp.: IGNORED
dover sole: IGNORED
can: IGNORED
Homerpro: NO MATCH
sinker: IGNORED
Tomopterid eggcase: NO MATCH
Neptunea-Buccinum Complex: NO MATCH
polychaete tu

In [65]:
unmatched_concepts

{'2G Robotics structured light laser',
 '55-gallon drum',
 'ADCP',
 'Actinaria',
 'BED',
 'Bassogigas1',
 'Bassozetus1',
 'Benthic Respiration System',
 'Benthic Rover',
 'Chrysogorgidae',
 'Ctenophore',
 'DeepPIV 1.0',
 'DeepPIV 2.0',
 'DeepPIV 3.0',
 'Detritus Sampler',
 'DiplacanthopomaA',
 'DiplacanthopomaB',
 'DiplacanthopomaC',
 'Doliolenetta',
 'Dye Injector',
 'Eye-in-the-Sea',
 'Fecampiid eggcase',
 'Funiculina-Halipteris complex',
 'Homerpro',
 'Hydrate Synthesis Chamber',
 'Hydromedusae',
 'Ink Dispenser',
 'Krill molt',
 'LRJ complex',
 'Lagrangian sediment trap',
 'Larval Sampler',
 'Laser Raman',
 'Leptocephalus-2',
 'Macon',
 'Medusae',
 'Midwater Respirometry System',
 'Neptunea-Buccinum Complex',
 'Octopodinae',
 'Phyllospadix-Zostera detritus',
 'Push Corer',
 'Roundnose grenadier',
 'Sebastomus complex',
 'Solmunaegina nematophora',
 'Sonardyne beacon',
 'Suction Sampler',
 'Tanyostea',
 'Temperature Gradient Probe',
 'Teuthoidea',
 'Theudoidea',
 'Tomopterid eggcase

In [66]:
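import json

# Combine both passes: matched concepts map to their AphiaID, and unmatched
# concepts map to None, which json.dump writes as null.
full_matches = dict(matched_concepts_to_id)
for concept in unmatched_concepts:
    full_matches[concept] = None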

with open("../data/concept_to_aphia_id.json", 'w') as fp:
    json.dump(full_matches, fp, sort_keys=True)
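Reading the file back recovers the two groups, with `None` restored from JSON `null`. A minimal sketch: `load_concept_ids` is a hypothetical helper, and the demo writes to a temporary file rather than `../data/concept_to_aphia_id.json` so it runs anywhere.

```python
import json
import os
import tempfile
from typing import Optional


def load_concept_ids(path: str) -> tuple[dict[str, int], list[str]]:
    """Read a concept -> AphiaID JSON file and split it into matched and unmatched."""
    with open(path) as fp:
        data: dict[str, Optional[int]] = json.load(fp)
    matched = {name: aphia_id for name, aphia_id in data.items() if aphia_id is not None}
    unmatched = sorted(name for name, aphia_id in data.items() if aphia_id is None)
    return matched, unmatched


# Self-contained demo using a temporary file in place of the notebook's data path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "concept_to_aphia_id.json")
    with open(path, "w") as fp:
        json.dump({"Clio": 137751, "tire": None}, fp, sort_keys=True)
    matched, unmatched = load_concept_ids(path)

print(matched)    # {'Clio': 137751}
print(unmatched)  # ['tire']
```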
    


Some concepts did not find a match because WoRMS has multiple conflicting records for the name, in which case `AphiaIDByName` returns status code 206 instead of a single ID.

Method from the WoRMS REST API:

`/AphiaRecordsByName/{ScientificName}`: "Get one or more matching (max. 50) AphiaRecords for a given name"

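For these names, select the first record whose status is "accepted", falling back to the first record returned, and apply the same formatting as before (drop characters after the first space or slash, and ignore names that are not capitalized). `get_worms_id` was updated to use this method.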
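The record-selection step in the 206 branch of `get_worms_id` can be isolated and checked offline. A sketch, assuming each record is a dict with the `status` and `AphiaID` keys the notebook already reads; `choose_aphia_id` is a hypothetical name and the sample records use made-up IDs. Unlike the notebook's version, it returns `None` for an empty list instead of raising `IndexError`.

```python
from typing import Any, Optional


def choose_aphia_id(records: list[dict[str, Any]]) -> Optional[int]:
    """Pick an AphiaID from a list of AphiaRecordsByName-style records.

    Prefers the first record whose "status" is "accepted"; otherwise falls back
    to the first record. Returns None when there are no records.
    """
    for record in records:
        if record.get("status") == "accepted":
            return record["AphiaID"]
    return records[0]["AphiaID"] if records else None


# Illustrative records with made-up IDs (not real WoRMS data).
records = [
    {"AphiaID": 1, "status": "unaccepted"},
    {"AphiaID": 2, "status": "accepted"},
]
print(choose_aphia_id(records))      # 2
print(choose_aphia_id(records[:1]))  # 1
print(choose_aphia_id([]))           # None
```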

In [57]:
concept_to_group: dict[str, str] = dict()

'Porifera'
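The last cell starts a `concept_to_group` mapping, and its output, `'Porifera'`, is a phylum name. One way to fill such a mapping, assuming the WoRMS endpoint `/AphiaClassificationByAphiaID/{ID}` returns a nested record with `rank`, `scientificname`, and `child` keys (check this against the WoRMS REST documentation), is to walk that tree down to a chosen rank. `group_at_rank` is a hypothetical helper, and the tree below is hand-written, not a real API response.

```python
from typing import Any, Optional


def group_at_rank(node: Optional[dict[str, Any]], rank: str = "Phylum") -> Optional[str]:
    """Walk a nested classification (parent -> "child") and return the name at `rank`."""
    while node:  # stops on None or an empty dict
        if node.get("rank") == rank:
            return node.get("scientificname")
        node = node.get("child")  # assumed to be None or empty at the leaf
    return None


# Hand-written tree shaped like the assumed API response.
tree = {
    "rank": "Superdomain",
    "scientificname": "Biota",
    "child": {
        "rank": "Kingdom",
        "scientificname": "Animalia",
        "child": {"rank": "Phylum", "scientificname": "Porifera", "child": None},
    },
}
print(group_at_rank(tree))           # Porifera
print(group_at_rank(tree, "Class"))  # None
```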