# 0. Replicating the Red List of Insect Taxonomists methodology
The European Red List of Insect Taxonomists is an important predecessor to our methodology (https://cloud.pensoft.net/s/mGpyQYUPQOMPs8C). Its authors used Web of Science (WoS) to find articles on specific insect orders, searching WoS with queries of the following form (https://www.webofscience.com/wos/woscc/summary/5f6f7d2a-89dd-4709-bf49-494b6f2522bc-73383729/relevance/1):

```
ALL=(Plecoptera AND (taxonom* OR "new species" OR
"novel species" OR "checklist" OR "new genus" OR "new genera"))
```
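A minimal sketch of the query's intended semantics on a single piece of text, assuming that a quoted phrase must match as adjacent words and that `*` is a right-hand truncation wildcard. `wos_style_match` is a hypothetical helper, not WoS's actual search engine: it ignores WoS's lemmatization, field indexing, and tokenization rules.

```python
import re

# Hypothetical approximation of the WoS query
# ALL=(<order> AND (taxonom* OR "new species" OR "novel species"
#      OR "checklist" OR "new genus" OR "new genera"))
TAXONOMY_TERMS = ["taxonom*", "new species", "novel species",
                  "checklist", "new genus", "new genera"]


def term_to_regex(term: str) -> str:
    """Turn a query term into a regex: the words of a phrase must be
    adjacent, and a trailing '*' matches any word ending."""
    parts = []
    for word in term.split():
        if word.endswith("*"):
            parts.append(re.escape(word[:-1]) + r"\w*")
        else:
            parts.append(re.escape(word))
    return r"\b" + r"\s+".join(parts) + r"\b"


def wos_style_match(text: str, order: str) -> bool:
    """True if `text` mentions `order` AND at least one taxonomy term."""
    flags = re.IGNORECASE
    if not re.search(r"\b" + re.escape(order) + r"\b", text, flags):
        return False
    return any(re.search(term_to_regex(t), text, flags) for t in TAXONOMY_TERMS)


print(wos_style_match("A new species of Plecoptera from Spain", "Plecoptera"))
print(wos_style_match("New records of Plecoptera, with notes on species", "Plecoptera"))
```

The second call returns `False`: "new" and "species" both occur, but not as the adjacent phrase.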

To see if OpenAlex can find the same or even more articles, we replicate their methodology here.

Some differences between the two approaches remain. For copyright reasons, OpenAlex stores abstracts only as an inverted index: each word of the abstract is a key, and its value lists the position(s) at which that word occurs. Consequently, phrases cannot be searched exactly: "new species" is searched as "new" and "species", which need not be adjacent. Furthermore, OpenAlex records neither author keywords nor WoS's "Keywords Plus" (which are derived from the references), but it does associate "concepts" with every article, so we searched these concepts instead.
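The inverted-index format and its effect on phrase search can be illustrated with a toy example. A minimal sketch, assuming an index shaped like OpenAlex's `abstract_inverted_index` field (`{word: [positions]}`); the helper names are hypothetical, punctuation attached to words is not stripped, and this does not reproduce how OpenAlex's own search tokenizes or stems text. Because the positions are kept, adjacency can still be checked locally once records are downloaded.

```python
# Minimal sketch: rebuild an abstract from an inverted index
# ({word: [positions]}) and contrast bag-of-words with phrase matching.

def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Place every word at each of its recorded positions and join them."""
    positions = {}
    for word, places in inverted_index.items():
        for place in places:
            positions[place] = word
    return " ".join(positions[i] for i in sorted(positions))


def contains_all_words(inverted_index, phrase):
    """Bag-of-words check: every word of `phrase` occurs somewhere."""
    keys = {word.lower() for word in inverted_index}
    return all(word.lower() in keys for word in phrase.split())


def contains_phrase(inverted_index, phrase):
    """Adjacency check: the words of `phrase` occur at consecutive positions."""
    lowered = {}
    for word, places in inverted_index.items():
        lowered.setdefault(word.lower(), set()).update(places)
    words = [w.lower() for w in phrase.split()]
    if any(w not in lowered for w in words):
        return False
    return any(all(start + i in lowered[w] for i, w in enumerate(words))
               for start in lowered[words[0]])


# Toy index for "a new record of a known species"
toy = {"a": [0, 4], "new": [1], "record": [2], "of": [3],
       "known": [5], "species": [6]}
print(rebuild_abstract(toy))                   # a new record of a known species
print(contains_all_words(toy, "new species"))  # True
print(contains_phrase(toy, "new species"))     # False
```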

We searched title and abstract together and explicitly, because the simpler "search" function of the OpenAlex API also searches the full text, not only the title and abstract.
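The idea of searching title and abstract together, and nothing else, can be sketched locally on already downloaded records. A minimal sketch, assuming a DataFrame with OpenAlex's `display_name` and `abstract_inverted_index` fields; `matches_title_or_abstract`, its helpers, and `TERM_GROUPS` are hypothetical, concept matching is left out, and words are compared as a bag, mirroring the inverted-index limitation.

```python
import re

import pandas as pd

# Illustrative term groups: a group matches when all of its words occur,
# in any order, because abstracts are only available as inverted indexes.
TERM_GROUPS = [("taxonom*",), ("new", "species"), ("novel", "species"),
               ("checklist",), ("new", "genus"), ("new", "genera")]


def record_words(title, inverted_index):
    """Lower-cased words from the title plus the abstract's index keys."""
    text = title if isinstance(title, str) else ""
    if isinstance(inverted_index, dict):
        text += " " + " ".join(inverted_index)
    return set(re.findall(r"\w+", text.lower()))


def has_word(words, term):
    """Exact word match, or prefix match for terms ending in '*'."""
    if term.endswith("*"):
        return any(word.startswith(term[:-1]) for word in words)
    return term in words


def matches_title_or_abstract(row, order):
    """Order name AND any term group, searched in title and abstract only."""
    words = record_words(row["display_name"], row["abstract_inverted_index"])
    if order.lower() not in words:
        return False
    return any(all(has_word(words, t) for t in group) for group in TERM_GROUPS)


# Toy records; in practice these would be downloaded OpenAlex works.
records = pd.DataFrame({
    "display_name": ["Stoneflies of Crete", "Plecoptera of the Alps",
                     "Stonefly ecology"],
    "abstract_inverted_index": [
        {"Plecoptera": [0], "new": [1], "to": [2], "science": [3],
         "species": [4]},
        {"A": [0], "checklist": [1]},
        {"Plecoptera": [0], "ecology": [1]},
    ],
})
mask = records.apply(matches_title_or_abstract, axis=1, order="Plecoptera")
print(records.loc[mask, "display_name"].tolist())
```

The first record matches only because "new" and "species" are treated as a bag of words; a phrase search would reject it.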

In [None]:
import pandas as pd
import pickle

In [None]:
insect_eu_articles = pd.read_pickle("../data/rlit/openalex_EU27_articles.pkl")

In [None]:
# how many articles per order?
insect_orders = ["Coleoptera", "Hemiptera", "Diptera", "Lepidoptera", "Orthoptera", 
                 "Odonata", "Blattodea", "Ephemeroptera", "Psocodea", "Grylloblattodea", 
                 "Neuroptera", "Mecoptera", "Trichoptera", "Plecoptera", "Dermaptera", 
                 "Mantodea", "Siphonaptera", "Strepsiptera", "Embioptera", "Hymenoptera",
                 "Phasmida", "Raphidioptera", "Isoptera", "Megaloptera", "Thysanoptera",
                 "Zygentoma", "Mantophasmatodea", "Archaeognatha", "Zoraptera"]

for order in insect_orders:
    n_articles = (insect_eu_articles["order"] == order).sum()
    print(f"{order}: {n_articles} articles")

In [None]:
# Web of Science results for ALL=(Strepsiptera AND (taxonom* OR "new species"
# OR "novel species" OR "checklist" OR "new genus" OR "new genera"))
wos_strepsiptera = pd.read_csv("Strepsiptera_WoS_ALL_EU27_2011-2020.tsv", sep="\t")
wos_strepsiptera

In [None]:
# OpenAlex results for similar query
oa_strepsiptera = insect_eu_articles[insect_eu_articles["order"]=="Strepsiptera"].reset_index(drop=True)
oa_strepsiptera

In [None]:
# in common
set(oa_strepsiptera["display_name"]).intersection(wos_strepsiptera["TI"])
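Comparing titles as exact strings, as in the cell above, means the same work can be missed if its title differs in case, punctuation, HTML markup, or spacing between the two databases. A minimal sketch of a more forgiving comparison, assuming both inputs are iterables of title strings; `normalize_title` and `compare_titles` are hypothetical helpers, and the normalization rules (removing accents, tags, and punctuation) are a judgment call rather than a standard.

```python
import re
import unicodedata


def normalize_title(title):
    """Lower-case; drop HTML tags, accents and punctuation; collapse spaces."""
    if not isinstance(title, str):
        return ""
    title = re.sub(r"<[^>]+>", " ", title)  # e.g. "<i>Stylops</i>"
    title = unicodedata.normalize("NFKD", title)
    title = "".join(c for c in title if not unicodedata.combining(c))
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def compare_titles(oa_titles, wos_titles):
    """Split titles into (in both, only in OpenAlex, only in WoS),
    matching on normalized titles but returning the original strings."""
    oa = {normalize_title(t): t for t in oa_titles}
    wos = {normalize_title(t): t for t in wos_titles}
    oa.pop("", None)   # ignore missing titles
    wos.pop("", None)
    both = sorted(oa[k] for k in oa.keys() & wos.keys())
    only_oa = sorted(oa[k] for k in oa.keys() - wos.keys())
    only_wos = sorted(wos[k] for k in wos.keys() - oa.keys())
    return both, only_oa, only_wos


# Toy titles, not taken from the data.
both, only_oa, only_wos = compare_titles(
    ["Host specialization in <i>Stylops</i>.", "A toy OpenAlex-only title"],
    ["HOST SPECIALIZATION IN STYLOPS", "A toy WoS-only title"],
)
print(both)      # ['Host specialization in <i>Stylops</i>.']
print(only_oa)   # ['A toy OpenAlex-only title']
print(only_wos)  # ['A toy WoS-only title']
```

In this notebook it could be called as `compare_titles(oa_strepsiptera["display_name"], wos_strepsiptera["TI"])`.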

In [None]:
# in oa but not in wos
list(set(oa_strepsiptera["display_name"]) - set(wos_strepsiptera["TI"]))

Why does OpenAlex find these articles where Web of Science does not?


- Traumatic insemination and female counter-adaptation in Strepsiptera (Insecta) -> "novel" and "species" both occur, but not as the phrase "novel species"

- Superparasitism of Eoxenos laboulbenei De Peyerimhoff (Strepsiptera: Mengenillidae) by Idiomacromerus gregarius (Silvestri) (Hymenoptera: Chalcidoidea) in southern Spain -> "new" and "species" both occur, but not as the phrase "new species"

- A needle in a haystack: Mesozoic origin of parasitism in Strepsiptera revealed by first definite Cretaceous primary larva (Insecta) -> indexed in WoS, but not returned by the WoS query because it contains none of the taxonomy terms

- Is †Skleroptera (†Stephanastus) an order in the stemgroup of Coleopterida (Insecta)? -> same as above

- Distinguishing Felsenstein Zone from Farris Zone Using Neural Networks -> same as above

- Morphological and molecular evidence converge upon a robust phylogeny of the megadiverse Holometabola -> same as above; related terms such as "evolution" and "systematic position" are included

Often "phylogeny" *is* included as a keyword, but it is not one of the search terms.

In [None]:
# in wos but not in oa
list(set(wos_strepsiptera["TI"]) - set(oa_strepsiptera["display_name"]))

Why does WoS find these articles when OpenAlex does not?

- 'Diversity of Eocene Ripiphoridae with descriptions of the first species of Pelecotominae and larva of Ripidiinae (Coleoptera)' -> open access, but no n-grams available https://api.openalex.org/works/https://doi.org/10.1093/zoolinnean/zlz062

- 'New ripiphorid beetles in mid-Cretaceous amber from Myanmar (Coleoptera: Ripiphoridae): First Pelecotominae and possible Mesozoic aggregative behaviour in male Ripidiinae' -> closed access, no n-grams http://api.openalex.org/works/https://doi.org/10.1016/j.cretres.2016.08.002

- 'A remarkable diversity of parasitoid beetles (Ripiphoridae) in Cretaceous amber, with a summary of the Mesozoic record of Tenebrionoidea' -> "new species" appears in the abstract; closed access; n-grams available but limited http://api.openalex.org/works/https://doi.org/10.1016/j.cretres.2018.04.019

- 'Host specialization and species diversity in the genus Stylops (Strepsiptera: Stylopidae), revealed by molecular phylogenetic analysis' -> the authors' institution information in OpenAlex is incomplete

- 'A miniaturized beetle larva in Cretaceous Burmese amber: reinterpretation of a fossil "strepsipteran triungulin"' -> no search terms in title or abstract; WoS matched it through the funder name "National Science Fund for Fostering Talents in Basic Research (Special Subjects in Animal Taxonomy)"

- 'The First Complete 3D Reconstruction of a Spanish Fly Primary Larva (Lytta vesicatoria, Meloidae, Coleoptera)' -> same as above

- 'First Sex Pheromone of the Order Strepsiptera: (3R,5R,9R)-3,5,9-Trimethyldodecanal in Stylops melittae Kirby, 1802' -> "taxonomy" appears in the author keywords, which OpenAlex does not record
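Exact title matching can miss the same work when the two databases format its title differently, so the OpenAlex and WoS sets could also be cross-checked on DOIs. A minimal sketch, assuming the WoS tab-delimited export stores DOIs in the `DI` field and the OpenAlex records store them in a `doi` column as `https://doi.org/...` URLs; `normalize_doi` and `doi_overlap` are hypothetical helpers, and records without a DOI are simply ignored.

```python
import pandas as pd

RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/",
                     "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi):
    """Lower-case a DOI and drop a resolver prefix; None if missing."""
    if not isinstance(doi, str) or not doi.strip():
        return None
    doi = doi.strip().lower()
    for prefix in RESOLVER_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def doi_overlap(oa_dois, wos_dois):
    """Return (in both, only in OpenAlex, only in WoS) as sets of DOIs."""
    oa = {d for d in map(normalize_doi, oa_dois) if d}
    wos = {d for d in map(normalize_doi, wos_dois) if d}
    return oa & wos, oa - wos, wos - oa


# Toy DOIs, not real records.
oa_example = pd.Series(["https://doi.org/10.1234/ABC", None])
wos_example = pd.Series(["10.1234/abc", "10.1234/xyz"])
both, only_oa, only_wos = doi_overlap(oa_example, wos_example)
print(both)      # {'10.1234/abc'}
print(only_wos)  # {'10.1234/xyz'}
```

Applied here, this would presumably be `doi_overlap(oa_strepsiptera["doi"], wos_strepsiptera["DI"])`, provided both columns are present in the loaded data.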