In a process like the SGCN, we follow some specific business rules. We are trying to nail down species for a "National List" in consultation with taxonomic authorities, placing a name on the national list whenever we find a valid record for it in ITIS or WoRMS. Because we trust ITIS first and use WoRMS only as an additional source for taxonomic alignment, we can shortcut the WoRMS consultation and search only for the ITIS leftovers, that is, the names whose ITIS lookup did not succeed. If we wanted to use the information in WoRMS for more than that, we might run all of the species names through this process.
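
The "ITIS first, WoRMS only for leftovers" rule can be sketched as a small fallback function. This is a minimal illustration, not the bispy implementation: `resolve_name`, the two lookup callables, and the in-memory records are all hypothetical stand-ins for the real services.

```python
def resolve_name(name, itis_lookup, worms_lookup):
    """Return (authority, record) from the first authority with a valid record."""
    record = itis_lookup(name)
    if record is not None:
        return "ITIS", record
    # WoRMS is consulted only when ITIS has no valid record.
    record = worms_lookup(name)
    if record is not None:
        return "WoRMS", record
    return None, None


# Hypothetical in-memory stand-ins for the two authorities.
itis_records = {"Name A": {"id": "itis-placeholder"}}
worms_records = {
    "Name A": {"id": "worms-placeholder-a"},
    "Name B": {"id": "worms-placeholder-b"},
}
worms_calls = []  # track which names actually reach WoRMS


def worms_lookup(name):
    worms_calls.append(name)
    return worms_records.get(name)


results = {
    name: resolve_name(name, itis_records.get, worms_lookup)
    for name in ["Name A", "Name B", "Name C"]
}
```

In this sketch only "Name B" and "Name C" ever reach the WoRMS lookup, which is the shortcut described above.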

In [1]:
import pandas as pd
import bispy
import json
from joblib import Parallel, delayed

worms = bispy.worms.Worms()

In [2]:
with open('itis.json', 'r') as f:
    itis_data = json.load(f)

In [3]:
itis_leftovers = [i["parameters"]["Scientific Name"] for i in itis_data if i["processing_metadata"]["status"] != "success"]
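
To make the filter above concrete, here is a small sketch run against made-up records. The record layout mirrors the keys used in the comprehension; the sample names and the `"failure"` status value are assumptions for illustration only. The de-duplication step is an optional extra, not something the notebook does, and it only matters if the same name can appear more than once.

```python
# Hypothetical records shaped like the entries in itis.json.
sample_itis_data = [
    {"parameters": {"Scientific Name": "Name A"},
     "processing_metadata": {"status": "success"}},
    {"parameters": {"Scientific Name": "Name B"},
     "processing_metadata": {"status": "failure"}},
    {"parameters": {"Scientific Name": "Name B"},
     "processing_metadata": {"status": "failure"}},
    {"parameters": {"Scientific Name": "Name C"},
     "processing_metadata": {"status": "failure"}},
]

# Same filter as the notebook cell: keep names whose ITIS lookup did not succeed.
sample_leftovers = [
    rec["parameters"]["Scientific Name"]
    for rec in sample_itis_data
    if rec["processing_metadata"]["status"] != "success"
]

# Optional: drop repeats while preserving order, so each name is searched once.
unique_leftovers = list(dict.fromkeys(sample_leftovers))
```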

In [6]:
len(itis_leftovers)

1645

In [5]:
%%time
# Use joblib to run the WoRMS name searches for the ITIS leftovers in parallel
worms_cache = Parallel(n_jobs=8)(delayed(worms.search)(name) for name in itis_leftovers)

CPU times: user 5.38 s, sys: 442 ms, total: 5.82 s
Wall time: 2min 56s
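
The timing above shows only a few seconds of CPU time against almost three minutes of wall time, so the run is dominated by waiting on network responses. For that kind of workload a thread pool is one alternative way to fan out the requests. The sketch below swaps joblib for the standard library's `ThreadPoolExecutor` and uses a hypothetical `fake_search` function in place of `worms.search`, so it runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_search(name):
    # Hypothetical stand-in for worms.search; returns a placeholder result.
    return {"name": name, "matched": False}


names = ["Name A", "Name B", "Name C"]

# pool.map returns results in the same order as the input names,
# so each result lines up with the name that produced it.
with ThreadPoolExecutor(max_workers=8) as pool:
    cache = list(pool.map(fake_search, names))
```

As with `Parallel`, the results come back in input order, which keeps each cached result aligned with its name.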


In [12]:
with open('worms.json', 'w') as f:
    json.dump(worms_cache, f)