# 1. Using OpenAlex to find taxonomists

### 1.1. Following the RLIT methodology

The European Red List of Insect Taxonomists (RLIT) is an important predecessor to our methodology (https://cloud.pensoft.net/s/mGpyQYUPQOMPs8C). Its authors used Web of Science (WoS) to find articles related to specific insect orders. For Plecoptera, for example, they searched WoS with the following query (https://www.webofscience.com/wos/woscc/summary/5f6f7d2a-89dd-4709-bf49-494b6f2522bc-73383729/relevance/1):

    ALL=(Plecoptera AND (taxonom* OR "new species" OR
    "novel species" OR "checklist" OR "new genus" OR "new genera"))

To see if OpenAlex can find the same or even more articles, we replicate their methodology here.

Some differences between the two methodologies exist. For copyright reasons, OpenAlex only records the abstract as an inverted index: each word of the abstract is a key, and its value is the list of positions at which that word occurs. For this reason, word groups can't be searched exactly: "new species" is searched as "new" and "species", not necessarily adjacent. Furthermore, OpenAlex records neither author keywords nor an equivalent of WoS's "Keywords Plus", which are derived from the references, but it does associate "concepts" with every article. We searched these concepts instead.
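To make the inverted-index limitation concrete, here is a minimal sketch using a made-up one-sentence abstract. The helper names (`build_inverted_index`, `rebuild_abstract`, `contains_all_words`, `contains_phrase`) are illustrative assumptions and not part of OpenAlex or of the `openalex` module used below. The dictionary layout mirrors the `abstract_inverted_index` column visible in the results.

```python
from typing import Dict, List


def build_inverted_index(text: str) -> Dict[str, List[int]]:
    """Map every word to the list of positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for position, word in enumerate(text.split()):
        index.setdefault(word, []).append(position)
    return index


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Put the words back in order using their recorded positions."""
    by_position = {}
    for word, places in inverted_index.items():
        for place in places:
            by_position[place] = word
    return " ".join(by_position[i] for i in sorted(by_position))


def contains_all_words(inverted_index: Dict[str, List[int]], words: List[str]) -> bool:
    """Word-level match: every word occurs somewhere, adjacency is ignored."""
    known = {word.lower() for word in inverted_index}
    return all(word.lower() in known for word in words)


def contains_phrase(inverted_index: Dict[str, List[int]], phrase: str) -> bool:
    """Exact phrase match, only possible after rebuilding the text."""
    return phrase.lower() in rebuild_abstract(inverted_index).lower()


# Made-up abstract: "new" and "species" both occur, but not next to each other.
example = build_inverted_index("A new genus with two species from Europe")
word_match = contains_all_words(example, ["new", "species"])
phrase_match = contains_phrase(example, "new species")
```

In this example the word-level check succeeds while the phrase check fails, which is the kind of extra match a search for "new" and "species" can return compared with the exact WoS phrase.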

We searched the title and the abstract explicitly, each with its own filter, because the simpler "search" function of the OpenAlex API searches the full text as well as the title and abstract.
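As a rough illustration of the difference, the sketch below builds request URLs for the OpenAlex `works` endpoint without sending them. `build_works_url` is a hypothetical helper written for this example; it is not the `openalex.request_works` function used in the notebook, whose implementation is not shown here. The filter names `title.search`, `abstract.search`, `from_publication_date` and `to_publication_date` are OpenAlex filters, and comma-separated filters are combined with AND.

```python
from typing import List
from urllib.parse import urlencode

API_URL = "https://api.openalex.org/works"


def build_works_url(filters: List[str], from_date: str, to_date: str) -> str:
    """Combine field filters and a publication date range into one URL.

    Pass plain, unencoded filter strings: urlencode does the escaping.
    """
    all_filters = list(filters) + [
        "from_publication_date:" + from_date,
        "to_publication_date:" + to_date,
    ]
    return API_URL + "?" + urlencode({"filter": ",".join(all_filters)})


# Explicit search: order in the title AND "checklist" in the abstract.
explicit_url = build_works_url(
    ["title.search:Plecoptera", "abstract.search:checklist"],
    from_date="2011-01-01",
    to_date="2020-12-31",
)

# Simple search: one free-text parameter, no control over the fields used.
simple_url = API_URL + "?" + urlencode({"search": "Plecoptera checklist"})
```

Keeping the fields explicit costs more requests per order, but it keeps matches restricted to the fields named in the filters.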

In [1]:
import numpy as np
import pandas as pd
import pickle
import openalex
import time

In [2]:
# replicating WoS queries

# search every insect order listed in the RLIT
insect_orders = ["Coleoptera", "Hemiptera", "Diptera", "Lepidoptera", "Orthoptera", 
                 "Odonata", "Blattodea", "Ephemeroptera", "Psocodea", "Grylloblattodea", 
                 "Neuroptera", "Mecoptera", "Trichoptera", "Plecoptera", "Dermaptera", 
                 "Mantodea", "Siphonaptera", "Strepsiptera", "Embioptera", "Hymenoptera",
                 "Phasmida", "Raphidioptera", "Isoptera", "Megaloptera", "Thysanoptera",
                 "Zygentoma", "Mantophasmatodea", "Archaeognatha", "Zoraptera"]
insect_articles = pd.DataFrame()

for order in insect_orders:
    start = time.time()
    results = []
    
    # search each of the WoS search terms in abstract or title or concepts
    # the order must also be found in abstract or title (only some orders exist as concepts)
    # the OpenAlex OR operator in search is not usable here because it excludes results that match both search terms
    
    # Plecoptera AND
    for query in ["title.search:"+order+",title.search:%22new species%22", # OR "new species"
                  "title.search:"+order+",abstract.search:new species",
                  "title.search:%22new species%22,abstract.search:"+order,
                  "abstract.search:"+order+" new species", 

                  "title.search:"+order+" AND %22novel species%22", # OR "novel species"
                  "title.search:"+order+",abstract.search:novel species",
                  "title.search:%22novel species%22,abstract.search:"+order,
                  "abstract.search:"+order+" novel species",

                  "title.search:"+order+" AND %22new genus%22", # OR "new genus"
                  "title.search:"+order+",abstract.search:new genus",
                  "title.search:%22new genus%22,abstract.search:"+order,
                  "abstract.search:"+order+" new genus",

                  "title.search:"+order+" AND %22new genera%22", # OR "new genera"
                  "title.search:"+order+",abstract.search:new genera",
                  "title.search:%22new genera%22,abstract.search:"+order,
                  "abstract.search:"+order+" new genera",

                  "title.search:"+order+" AND checklist", # OR "checklist"
                  "title.search:"+order+",abstract.search:checklist",
                  "title.search:checklist,abstract.search:"+order,
                  "abstract.search:"+order+" checklist",

                  "title.search:"+order+" AND taxonomy", # taxonom* (OpenAlex automatically stems)
                  "title.search:"+order+",abstract.search:taxonomy",
                  "title.search:taxonomy,abstract.search:"+order,
                  "abstract.search:"+order+" taxonomy",

                  # concepts
                  "title.search:"+order+",concepts.id:C58642233", # taxonomy
                  "abstract.search:"+order+",concepts.id:C58642233",

                  "title.search:"+order+",concepts.id:C71640776", # taxon
                  "abstract.search:"+order+",concepts.id:C71640776",

                  "title.search:"+order+",concepts.id:C2779356329", # checklist
                  "abstract.search:"+order+",concepts.id:C2779356329",
                 ]:
        articles = openalex.request_works(query,
                                          from_date="2011-01-01", to_date="2020-12-31",
                                          print_number=False)
        results.append(articles)
    
    # combine results and remove duplicates
    order_articles = pd.concat(results, ignore_index=True).drop_duplicates(subset="id", ignore_index=True)
    order_articles["order"] = order
    insect_articles = pd.concat([insect_articles, order_articles])
    
    end = time.time()
    print(f"{order} done in {end - start} seconds")

insect_articles

Coleoptera done in 73.58504891395569 seconds
Hemiptera done in 44.279725790023804 seconds
Diptera done in 59.63000440597534 seconds
Lepidoptera done in 50.811116218566895 seconds
Orthoptera done in 37.57386827468872 seconds
Odonata done in 27.874809980392456 seconds
Blattodea done in 20.895572423934937 seconds
Ephemeroptera done in 27.143210887908936 seconds
Psocodea done in 18.286417245864868 seconds
Grylloblattodea done in 15.649590730667114 seconds
Neuroptera done in 23.050442457199097 seconds
Mecoptera done in 18.829779148101807 seconds
Trichoptera done in 30.594982385635376 seconds
Plecoptera done in 25.255085945129395 seconds
Dermaptera done in 17.95497441291809 seconds
Mantodea done in 20.149827003479004 seconds
Siphonaptera done in 19.847620248794556 seconds
Strepsiptera done in 17.957784175872803 seconds
Embioptera done in 16.079787731170654 seconds
Hymenoptera done in 97.72476530075073 seconds
Phasmida done in 16.535338163375854 seconds
Raphidioptera done in 17.06973004341125

Unnamed: 0,id,doi,title,display_name,relevance_score,publication_year,publication_date,ids,language,primary_location,...,grants,referenced_works,related_works,ngrams_url,abstract_inverted_index,cited_by_api_url,counts_by_year,updated_date,created_date,order
0,https://openalex.org/W2075105050,https://doi.org/10.3897/zookeys.186.2947,New species and distributional records of Aleo...,New species and distributional records of Aleo...,187.25473,2012,2012-04-26,{'openalex': 'https://openalex.org/W2075105050...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",...,[],"[https://openalex.org/W239374901, https://open...","[https://openalex.org/W2011812956, https://ope...",https://api.openalex.org/works/W2075105050/ngrams,"{'The': [0, 118, 155], 'Aleocharinae': [1, 112...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2021, 'cited_by_count': 1}, {'year':...",2023-05-29T01:52:30.286445,2016-06-24,Coleoptera
1,https://openalex.org/W2129944008,https://doi.org/10.11646/zootaxa.2883.1.2,New species and new records of mites of the fa...,New species and new records of mites of the fa...,184.06510,2011,2011-05-19,{'openalex': 'https://openalex.org/W2129944008...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",...,[],"[https://openalex.org/W296414054, https://open...","[https://openalex.org/W2017242324, https://ope...",https://api.openalex.org/works/W2129944008/ngrams,"{'We': [0], 'report': [1], 'on': [2], 'a': [3,...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2022, 'cited_by_count': 2}, {'year':...",2023-05-28T11:03:33.873578,2016-06-24,Coleoptera
2,https://openalex.org/W2048741281,https://doi.org/10.1016/j.cretres.2011.10.010,"Prosolierius, a new mid-Cretaceous genus of So...","Prosolierius, a new mid-Cretaceous genus of So...",166.04407,2012,2012-04-01,{'openalex': 'https://openalex.org/W2048741281...,en,"{'is_oa': False, 'landing_page_url': 'https://...",...,[],"[https://openalex.org/W121843087, https://open...","[https://openalex.org/W2101287629, https://ope...",https://api.openalex.org/works/W2048741281/ngrams,"{'Investigation': [0], 'of': [1, 17, 27, 34, 3...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2023, 'cited_by_count': 1}, {'year':...",2023-06-08T10:21:55.318171,2016-06-24,Coleoptera
3,https://openalex.org/W2516041128,https://doi.org/10.3897/zookeys.610.9361,Twelve new species and fifty-three new provinc...,Twelve new species and fifty-three new provinc...,157.88048,2016,2016-08-11,{'openalex': 'https://openalex.org/W2516041128...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",...,[],"[https://openalex.org/W1964889809, https://ope...","[https://openalex.org/W2067216388, https://ope...",https://api.openalex.org/works/W2516041128/ngrams,"{'One': [0], 'hundred': [1], 'twenty': [2], 's...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2021, 'cited_by_count': 10}, {'year'...",2023-05-29T01:53:39.957429,2016-09-16,Coleoptera
4,https://openalex.org/W2134062392,https://doi.org/10.1603/an10136,A New Species of <i>Laricobius</i> (Coleoptera...,A New Species of <i>Laricobius</i> (Coleoptera...,148.83423,2011,2011-05-01,{'openalex': 'https://openalex.org/W2134062392...,en,"{'is_oa': False, 'landing_page_url': 'https://...",...,[],"[https://openalex.org/W1968197856, https://ope...","[https://openalex.org/W1974991915, https://ope...",https://api.openalex.org/works/W2134062392/ngrams,"{'Abstract': [0], 'Laricobius': [1, 108, 111, ...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2022, 'cited_by_count': 4}, {'year':...",2023-06-07T18:24:44.204020,2016-06-24,Coleoptera
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25,https://openalex.org/W2300829245,https://doi.org/10.1146/annurev-ento-010715-02...,Structure and Evolution of Insect Sperm: New I...,Structure and Evolution of Insect Sperm: New I...,359.53284,2016,2016-03-16,{'openalex': 'https://openalex.org/W2300829245...,en,"{'is_oa': False, 'landing_page_url': 'https://...",...,[],"[https://openalex.org/W607865467, https://open...","[https://openalex.org/W154122946, https://open...",https://api.openalex.org/works/W2300829245/ngrams,"{'This': [0], 'comprehensive': [1], 'review': ...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2023, 'cited_by_count': 1}, {'year':...",2023-05-29T04:26:39.022672,2016-06-24,Zoraptera
26,https://openalex.org/W2540536377,https://doi.org/10.1038/srep36175,Molecular phylogeny of Polyneoptera (Insecta) ...,Molecular phylogeny of Polyneoptera (Insecta) ...,291.37665,2016,2016-10-26,{'openalex': 'https://openalex.org/W2540536377...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",...,[],"[https://openalex.org/W1557992090, https://ope...","[https://openalex.org/W1994597060, https://ope...",https://api.openalex.org/works/W2540536377/ngrams,"{'Abstract': [0], 'The': [1], 'Polyneoptera': ...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2023, 'cited_by_count': 2}, {'year':...",2023-06-08T04:32:28.660091,2016-11-04,Zoraptera
27,https://openalex.org/W2792039220,https://doi.org/10.7717/peerj.5126,A reference cytochrome c oxidase subunit I dat...,A reference cytochrome c oxidase subunit I dat...,129.17558,2018,2018-06-26,{'openalex': 'https://openalex.org/W2792039220...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",...,[{'funder': 'https://openalex.org/F4320321033'...,"[https://openalex.org/W1571552535, https://ope...","[https://openalex.org/W1999884409, https://ope...",https://api.openalex.org/works/W2792039220/ngrams,"{'Metabarcoding': [0], 'is': [1, 179], 'a': [2...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2023, 'cited_by_count': 1}, {'year':...",2023-06-08T23:28:04.670099,2018-03-29,Zoraptera
28,https://openalex.org/W4243684135,https://doi.org/10.7287/peerj.preprints.26662,A reference cytochrome c oxidase subunit I dat...,A reference cytochrome c oxidase subunit I dat...,,2018,2018-03-12,{'openalex': 'https://openalex.org/W4243684135...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",...,[],[],"[https://openalex.org/W1070900, https://openal...",https://api.openalex.org/works/W4243684135/ngrams,"{'Metabarcoding': [0], 'is': [1, 176], 'a': [2...",https://api.openalex.org/works?filter=cites:W4...,[],2023-05-31T19:31:56.669364,2022-05-12,Zoraptera


In [3]:
insect_articles.to_pickle("./data/RLIT_method_openalex_all_insect_articles.pkl")

In [6]:
insect_eu_articles = openalex.filter_eu_articles(insect_articles)
insect_eu_articles

Unnamed: 0,id,doi,title,display_name,relevance_score,publication_year,publication_date,ids,language,primary_location,...,grants,referenced_works,related_works,ngrams_url,abstract_inverted_index,cited_by_api_url,counts_by_year,updated_date,created_date,order
0,https://openalex.org/W2074050863,https://doi.org/10.3897/zookeys.250.3715,Introduction of the Exocelina ekari-group with...,Introduction of the Exocelina ekari-group with...,134.51106,2012,2012-12-13,{'openalex': 'https://openalex.org/W2074050863...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",...,[],"[https://openalex.org/W283890424, https://open...","[https://openalex.org/W2074050863, https://ope...",https://api.openalex.org/works/W2074050863/ngrams,"{'The': [0, 23, 139], 'Exocelina': [1, 42, 46,...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2022, 'cited_by_count': 1}, {'year':...",2023-05-28T11:21:10.003066,2016-06-24,Coleoptera
1,https://openalex.org/W2124627356,https://doi.org/10.3161/000345411x622525,A New Species ofHenosepilachnaLi (Coleoptera: ...,A New Species ofHenosepilachnaLi (Coleoptera: ...,123.24875,2011,2011-12-01,{'openalex': 'https://openalex.org/W2124627356...,en,"{'is_oa': False, 'landing_page_url': 'https://...",...,[],"[https://openalex.org/W252341036, https://open...","[https://openalex.org/W1975229913, https://ope...",https://api.openalex.org/works/W2124627356/ngrams,"{'Abstract.': [0], 'Henosepilachna': [1, 14], ...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2023, 'cited_by_count': 1}, {'year':...",2023-06-11T10:06:06.248646,2016-06-24,Coleoptera
2,https://openalex.org/W2470267224,https://doi.org/10.1017/jpa.2016.51,New species from Late Cretaceous New Jersey am...,New species from Late Cretaceous New Jersey am...,120.11381,2016,2016-05-01,{'openalex': 'https://openalex.org/W2470267224...,en,"{'is_oa': False, 'landing_page_url': 'https://...",...,[],"[https://openalex.org/W1492713357, https://ope...","[https://openalex.org/W1581826312, https://ope...",https://api.openalex.org/works/W2470267224/ngrams,"{'Abstract': [0], 'A': [1], 'new': [2, 105], '...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2023, 'cited_by_count': 1}, {'year':...",2023-06-04T09:42:06.895466,2016-07-22,Coleoptera
3,https://openalex.org/W2595316164,https://doi.org/10.3161/00034541anz2017.67.1.009,"Brochocoleus Zhiyuani, a New Species of Brocho...","Brochocoleus Zhiyuani, a New Species of Brocho...",105.57870,2017,2017-03-16,{'openalex': 'https://openalex.org/W2595316164...,en,"{'is_oa': False, 'landing_page_url': 'https://...",...,[],"[https://openalex.org/W1963912104, https://ope...","[https://openalex.org/W770724554, https://open...",https://api.openalex.org/works/W2595316164/ngrams,"{'Abstract.': [0], 'A': [1], 'new': [2], 'spec...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2022, 'cited_by_count': 1}, {'year':...",2023-05-28T15:53:07.004448,2017-03-23,Coleoptera
4,https://openalex.org/W1981080163,https://doi.org/10.11646/zootaxa.3755.5.5,<strong>A review of Drilini (Coleoptera: Elate...,<strong>A review of Drilini (Coleoptera: Elate...,102.48853,2014,2014-01-24,{'openalex': 'https://openalex.org/W1981080163...,en,"{'is_oa': False, 'landing_page_url': 'https://...",...,[],"[https://openalex.org/W1527744912, https://ope...","[https://openalex.org/W1963956275, https://ope...",https://api.openalex.org/works/W1981080163/ngrams,"{'The': [0], 'species': [1, 15, 66], 'of': [2,...",https://api.openalex.org/works?filter=cites:W1...,"[{'year': 2022, 'cited_by_count': 1}, {'year':...",2023-06-11T13:39:00.243911,2016-06-24,Coleoptera
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8155,https://openalex.org/W3000603933,https://doi.org/10.3390/insects11010051,Molecular Phylogeny and Infraordinal Classific...,Molecular Phylogeny and Infraordinal Classific...,119.16625,2020,2020-01-12,{'openalex': 'https://openalex.org/W3000603933...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",...,[],"[https://openalex.org/W1480665129, https://ope...","[https://openalex.org/W1554287491, https://ope...",https://api.openalex.org/works/W3000603933/ngrams,"{'Zoraptera': [0, 31, 117], 'is': [1], 'a': [2...",https://api.openalex.org/works?filter=cites:W3...,"[{'year': 2023, 'cited_by_count': 2}, {'year':...",2023-06-08T02:15:54.659814,2020-01-23,Zoraptera
8156,https://openalex.org/W2300829245,https://doi.org/10.1146/annurev-ento-010715-02...,Structure and Evolution of Insect Sperm: New I...,Structure and Evolution of Insect Sperm: New I...,359.53284,2016,2016-03-16,{'openalex': 'https://openalex.org/W2300829245...,en,"{'is_oa': False, 'landing_page_url': 'https://...",...,[],"[https://openalex.org/W607865467, https://open...","[https://openalex.org/W154122946, https://open...",https://api.openalex.org/works/W2300829245/ngrams,"{'This': [0], 'comprehensive': [1], 'review': ...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2023, 'cited_by_count': 1}, {'year':...",2023-05-29T04:26:39.022672,2016-06-24,Zoraptera
8157,https://openalex.org/W2792039220,https://doi.org/10.7717/peerj.5126,A reference cytochrome c oxidase subunit I dat...,A reference cytochrome c oxidase subunit I dat...,129.17558,2018,2018-06-26,{'openalex': 'https://openalex.org/W2792039220...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",...,[{'funder': 'https://openalex.org/F4320321033'...,"[https://openalex.org/W1571552535, https://ope...","[https://openalex.org/W1999884409, https://ope...",https://api.openalex.org/works/W2792039220/ngrams,"{'Metabarcoding': [0], 'is': [1, 179], 'a': [2...",https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2023, 'cited_by_count': 1}, {'year':...",2023-06-08T23:28:04.670099,2018-03-29,Zoraptera
8158,https://openalex.org/W4243684135,https://doi.org/10.7287/peerj.preprints.26662,A reference cytochrome c oxidase subunit I dat...,A reference cytochrome c oxidase subunit I dat...,,2018,2018-03-12,{'openalex': 'https://openalex.org/W4243684135...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",...,[],[],"[https://openalex.org/W1070900, https://openal...",https://api.openalex.org/works/W4243684135/ngrams,"{'Metabarcoding': [0], 'is': [1, 176], 'a': [2...",https://api.openalex.org/works?filter=cites:W4...,[],2023-05-31T19:31:56.669364,2022-05-12,Zoraptera


In [7]:
insect_eu_articles.to_pickle("./data/RLIT_method_openalex_EU27_insect_articles.pkl")

A Red List Score was calculated for every insect order by counting the number of articles found for the order and expressing that count as articles per 100 species in the order (this calculation was done later in a spreadsheet):

    (N_pubs / N_species) x 100
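The formula above can be sketched as a small function. The name `red_list_score` and the numbers in the example are assumptions made for illustration only: the species counts per order are not given in this notebook, so the values below are not real data.

```python
def red_list_score(n_pubs: int, n_species: int) -> float:
    """Return the number of articles per 100 species in an order."""
    if n_species <= 0:
        raise ValueError("n_species must be a positive number")
    return 100 * n_pubs / n_species


# Hypothetical order with 50 articles and 200 species (illustrative values).
example_score = red_list_score(n_pubs=50, n_species=200)
```

A higher score means more articles per species for that order.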

In [8]:
# how many articles per order?
for order in insect_orders:
    # count the rows of the EU data frame that belong to this order
    n_articles = (insect_eu_articles["order"] == order).sum()
    print(f"{order}: {n_articles} articles")

Coleoptera: 2393 articles
Hemiptera: 786 articles
Diptera: 1385 articles
Lepidoptera: 957 articles
Orthoptera: 286 articles
Odonata: 202 articles
Blattodea: 39 articles
Ephemeroptera: 129 articles
Psocodea: 19 articles
Grylloblattodea: 5 articles
Neuroptera: 102 articles
Mecoptera: 20 articles
Trichoptera: 180 articles
Plecoptera: 116 articles
Dermaptera: 22 articles
Mantodea: 40 articles
Siphonaptera: 27 articles
Strepsiptera: 14 articles
Embioptera: 7 articles
Hymenoptera: 1262 articles
Phasmida: 10 articles
Raphidioptera: 12 articles
Isoptera: 36 articles
Megaloptera: 13 articles
Thysanoptera: 48 articles
Zygentoma: 15 articles
Mantophasmatodea: 9 articles
Archaeognatha: 11 articles
Zoraptera: 15 articles
