# 1. Using OpenAlex to find taxonomists

## 1.2. Preprocessing OpenAlex article data into author data

Previously, we found a list of articles of taxonomic interest. Ultimately, we are interested in their authors, who we assume are taxonomists or at least have relevant expertise in the taxon studied in the paper. Here we extract the author information from the OpenAlex article data.

In [1]:
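```python
import pandas as pd
import matplotlib.pyplot as plt # version 3.5.2
import pickle
import openalex  # project helpers used below: flatten_works, get_authors, get_eu_authors, get_single_authors
```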

In [2]:
```python
# Load the articles of taxonomic interest selected in the previous step
eu_tax_articles = pd.read_pickle("./data/keyword_filtered_articles_EU27_with_taxonomy_concept_in_journal.pkl")
eu_tax_articles = openalex.flatten_works(eu_tax_articles)
eu_tax_articles
```

| | id | doi | title | display_name | publication_year | publication_date | ids | language | primary_location | type | ... | source_issn | source_host_organization | source_host_organization_name | source_host_organization_lineage | source_host_organization_lineage_names | source_type | is_oa | oa_status | oa_url | any_repository_has_fulltext |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://openalex.org/W2282635459 | https://doi.org/10.1071/is15033 | Integrative systematic studies on tardigrades ... | Integrative systematic studies on tardigrades ... | 2016 | 2016-08-31 | {'openalex': 'https://openalex.org/W2282635459... | en | {'is_oa': False, 'landing_page_url': 'https://... | journal-article | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| 1 | https://openalex.org/W1534942643 | https://doi.org/10.1071/is13002 | Morphological and molecular insights on Megalo... | Morphological and molecular insights on Megalo... | 2013 | 2013-01-01 | {'openalex': 'https://openalex.org/W1534942643... | en | {'is_oa': False, 'landing_page_url': 'https://... | journal-article | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | False | closed | | False |
| 2 | https://openalex.org/W2159262135 | https://doi.org/10.1071/is13030 | Two distinct evolutionary lineages of the Asta... | Two distinct evolutionary lineages of the Asta... | 2014 | 2014-01-01 | {'openalex': 'https://openalex.org/W2159262135... | en | {'is_oa': False, 'landing_page_url': 'https://... | journal-article | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | False | closed | | False |
| 3 | https://openalex.org/W2271608622 | https://doi.org/10.1071/is15023 | Molecular evidence for non-monophyly of the pi... | Molecular evidence for non-monophyly of the pi... | 2016 | 2016-01-01 | {'openalex': 'https://openalex.org/W2271608622... | en | {'is_oa': False, 'landing_page_url': 'https://... | journal-article | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | False | closed | | False |
| 4 | https://openalex.org/W2209115793 | https://doi.org/10.1071/is14019 | Mitochondrial DNA analyses reveal widespread t... | Mitochondrial DNA analyses reveal widespread t... | 2015 | 2015-01-01 | {'openalex': 'https://openalex.org/W2209115793... | en | {'is_oa': False, 'landing_page_url': 'https://... | journal-article | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://oa.upm.es/40973/1/INVE_MEM_2015_224830... | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11649 | https://openalex.org/W4253285772 | https://doi.org/10.21684/0132-8077-2019-27-2-1... | MERISTACARUS BOCHKOVI (ACARI, ORIBATIDA, LOHMA... | MERISTACARUS BOCHKOVI (ACARI, ORIBATIDA, LOHMA... | 2019 | 2019-12-30 | {'openalex': 'https://openalex.org/W4253285772... | en | {'is_oa': False, 'landing_page_url': 'https://... | journal-article | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | False | closed | | False |
| 11650 | https://openalex.org/W4253410054 | https://doi.org/10.21684/0132-8077-2021-29-1-3-9 | TAXONOMIC CONTRIBUTION TO THE KNOWLEDGE OF THE... | TAXONOMIC CONTRIBUTION TO THE KNOWLEDGE OF THE... | 2021 | 2021-01-01 | {'openalex': 'https://openalex.org/W4253410054... | en | {'is_oa': False, 'landing_page_url': 'https://... | journal-article | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | False | closed | | False |
| 11651 | https://openalex.org/W4254963337 | https://doi.org/10.21684/0132-8077-2019-27-2-2... | A NEW SYRINGOPHILID MITE (ACARIFORMES: SYRINGO... | A NEW SYRINGOPHILID MITE (ACARIFORMES: SYRINGO... | 2019 | 2019-12-30 | {'openalex': 'https://openalex.org/W4254963337... | en | {'is_oa': True, 'landing_page_url': 'https://d... | journal-article | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | True | bronze | https://doi.org/10.21684/0132-8077-2019-27-2-2... | False |
| 11652 | https://openalex.org/W4255590455 | https://doi.org/10.21684/0132-8077-2019-27-2-1... | HYPOZETES ANDREII (ACARI, ORIBATIDA, TEGORIBAT... | HYPOZETES ANDREII (ACARI, ORIBATIDA, TEGORIBAT... | 2019 | 2019-12-30 | {'openalex': 'https://openalex.org/W4255590455... | en | {'is_oa': True, 'landing_page_url': 'https://d... | journal-article | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | True | bronze | https://doi.org/10.21684/0132-8077-2019-27-2-1... | True |
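
`openalex.flatten_works` is a helper from this project, and its source is not shown here. The output suggests what it does: nested fields such as the journal (`primary_location.source`) and `open_access` reappear as top-level columns (`source_issn`, `source_host_organization_name`, `is_oa`, `oa_status`, ...), while the original nested columns are kept. Below is a minimal sketch of that flattening step with `pandas.json_normalize`, assuming each row holds an OpenAlex work record; the function name `flatten_works_sketch` and the toy record are hypothetical, not the project's code.

```python
import pandas as pd

# Hypothetical toy work record with a few of OpenAlex's nested fields.
works = pd.DataFrame([{
    "id": "https://openalex.org/W1",
    "title": "A new mite species",
    "primary_location": {
        "is_oa": False,
        "source": {"display_name": "Some Journal",
                   "issn": ["1234-5678", "8765-4321"],
                   "type": "journal"},
    },
    "open_access": {"is_oa": True, "oa_status": "green",
                    "oa_url": None, "any_repository_has_fulltext": True},
}])

def flatten_works_sketch(works: pd.DataFrame) -> pd.DataFrame:
    """Copy selected nested fields into top-level columns, keeping the originals."""
    # primary_location.source -> source_<field>; missing locations or sources become empty dicts.
    sources = works["primary_location"].map(
        lambda loc: (loc.get("source") or {}) if isinstance(loc, dict) else {})
    source_cols = pd.json_normalize(sources.tolist()).add_prefix("source_")
    # open_access fields keep their own names (is_oa, oa_status, oa_url, ...).
    oa = works["open_access"].map(lambda x: x if isinstance(x, dict) else {})
    oa_cols = pd.json_normalize(oa.tolist())
    return pd.concat([works.reset_index(drop=True), source_cols, oa_cols], axis=1)

flat = flatten_works_sketch(works)
```

Only two nested fields are flattened here to keep the sketch short; other nested columns could be handled the same way.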


In [3]:
```python
# Inspect the fields of the first article
eu_tax_articles.loc[0]
```

```
id                                                         https://openalex.org/W2282635459
doi                                                         https://doi.org/10.1071/is15033
title                                     Integrative systematic studies on tardigrades ...
display_name                              Integrative systematic studies on tardigrades ...
publication_year                                                                       2016
publication_date                                                                 2016-08-31
ids                                       {'openalex': 'https://openalex.org/W2282635459...
language                                                                                 en
primary_location                          {'is_oa': False, 'landing_page_url': 'https://...
type                                                                        journal-article
open_access                               {'is_oa': True, 'oa_status': 'green',
```

In [4]:
```python
# One row per author per article
authors_eu_tax = openalex.get_authors(eu_tax_articles)
authors_eu_tax
```

| | article_id | author_position | author_id | author_display_name | orcid | raw_affiliation_string | inst_id | inst_display_name | ror | inst_country_code | ... | source_issn | source_host_organization | source_host_organization_name | source_host_organization_lineage | source_host_organization_lineage_names | source_type | is_oa | oa_status | oa_url | any_repository_has_fulltext |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://openalex.org/W2282635459 | first | https://openalex.org/A2528062669 | Matteo Vecchi | https://orcid.org/0000-0002-7995-6827 | Department of Life Sciences University of Mode... | https://openalex.org/I122346577 | University of Modena and Reggio Emilia | https://ror.org/02d4c4y02 | IT | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| 1 | https://openalex.org/W2282635459 | middle | https://openalex.org/A2143576881 | Michele Cesari | https://orcid.org/0000-0001-8857-3791 | Department of Life Sciences University of Mode... | https://openalex.org/I122346577 | University of Modena and Reggio Emilia | https://ror.org/02d4c4y02 | IT | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| 2 | https://openalex.org/W2282635459 | middle | https://openalex.org/A4360091361 | Roberto Bertolani | | Department of Life Sciences University of Mode... | https://openalex.org/I122346577 | University of Modena and Reggio Emilia | https://ror.org/02d4c4y02 | IT | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| 3 | https://openalex.org/W2282635459 | middle | https://openalex.org/A4335463764 | K. Ingemar Jönsson | | School of Education and Environment, Kristians... | https://openalex.org/I193278943 | Kristianstad University | https://ror.org/00tkrft03 | SE | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| 4 | https://openalex.org/W2282635459 | middle | https://openalex.org/A2077559806 | Lorena Rebecchi | https://orcid.org/0000-0002-0702-1846 | Department of Life Sciences University of Mode... | https://openalex.org/I122346577 | University of Modena and Reggio Emilia | https://ror.org/02d4c4y02 | IT | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49494 | https://openalex.org/W4255590455 | middle | https://openalex.org/A4334142574 | Elizabeth A. Hugo-Coetzee | | National Museum, Bloemfontein, South Africa; U... | https://openalex.org/I4210088520 | National Museum | https://ror.org/004qfqh71 | ZA | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | True | bronze | https://doi.org/10.21684/0132-8077-2019-27-2-1... | True |
| 49495 | https://openalex.org/W4255590455 | middle | https://openalex.org/A4353913507 | Alexander A. Khaustov | | X-BIO Institute, Tyumen State University, Tyum... | https://openalex.org/I3020440027 | University of Tyumen | https://ror.org/05vehv290 | RU | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | True | bronze | https://doi.org/10.21684/0132-8077-2019-27-2-1... | True |
| 49496 | https://openalex.org/W4255590455 | last | https://openalex.org/A4355237428 | Jenő Kontschán | | Plant Protection Institute, Centre for Agricul... | https://openalex.org/I4210156273 | Plant Protection Institute | https://ror.org/052t9a145 | HU | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | True | bronze | https://doi.org/10.21684/0132-8077-2019-27-2-1... | True |
| 49497 | https://openalex.org/W4285411966 | first | https://openalex.org/A4300730993 | Fabio Cianferoni | | National Research Council of Italy (CNR) | https://openalex.org/I4210155236 | National Research Council | https://ror.org/04zaypm56 | IT | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | True | bronze | https://doi.org/10.21684/0132-8077-2022-30-1-9... | False |
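
Each row of the table above is one authorship: an author on a given article, with that article's columns repeated. This is why the index labels run to 49497 here versus 11652 for the articles. `openalex.get_authors` is a project helper whose source is not shown; the sketch below assumes it unpacks each work's OpenAlex `authorships` list, whose entries carry `author_position`, an `author` object (`id`, `display_name`, `orcid`), `raw_affiliation_string` and a list of `institutions` (`id`, `display_name`, `ror`, `country_code`). Keeping only the first listed institution is our assumption, and `get_authors_sketch` and the toy work are hypothetical.

```python
import pandas as pd

def get_authors_sketch(works: pd.DataFrame) -> pd.DataFrame:
    """One row per (article, author) pair, with the article's other columns attached."""
    rows = []
    for work in works.itertuples(index=False):
        for authorship in work.authorships:
            author = authorship.get("author") or {}
            # Assumption: keep only the first listed institution (if any).
            institutions = authorship.get("institutions") or []
            inst = institutions[0] if institutions else {}
            rows.append({
                "article_id": work.id,
                "author_position": authorship.get("author_position"),
                "author_id": author.get("id"),
                "author_display_name": author.get("display_name"),
                "orcid": author.get("orcid"),
                "raw_affiliation_string": authorship.get("raw_affiliation_string"),
                "inst_id": inst.get("id"),
                "inst_display_name": inst.get("display_name"),
                "ror": inst.get("ror"),
                "inst_country_code": inst.get("country_code"),
            })
    authors = pd.DataFrame(rows)
    # Repeat the article-level columns (journal, open-access status, ...) on every row.
    articles = works.drop(columns=["authorships"]).rename(columns={"id": "article_id"})
    return authors.merge(articles, on="article_id", how="left")

# Hypothetical toy work with two authorships (placeholder IDs).
works = pd.DataFrame([{
    "id": "https://openalex.org/W1",
    "title": "A new mite species",
    "authorships": [
        {"author_position": "first",
         "author": {"id": "https://openalex.org/A1", "display_name": "Author One",
                    "orcid": "https://orcid.org/0000-0000-0000-0001"},
         "raw_affiliation_string": "Dept. of Zoology, University X",
         "institutions": [{"id": "https://openalex.org/I1", "display_name": "University X",
                           "ror": "https://ror.org/000000000", "country_code": "IT"}]},
        {"author_position": "last",
         "author": {"id": "https://openalex.org/A2", "display_name": "Author Two", "orcid": None},
         "raw_affiliation_string": None,
         "institutions": []},
    ],
}])
authors = get_authors_sketch(works)
```

An author with no listed institution still gets a row, with empty institution fields.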


In [6]:
```python
only_eu_authors = openalex.get_eu_authors(authors_eu_tax)
only_eu_authors
```

| | Index | article_id | author_position | author_id | author_display_name | orcid | raw_affiliation_string | inst_id | inst_display_name | ror | ... | source_issn | source_host_organization | source_host_organization_name | source_host_organization_lineage | source_host_organization_lineage_names | source_type | is_oa | oa_status | oa_url | any_repository_has_fulltext |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | https://openalex.org/W2282635459 | first | https://openalex.org/A2528062669 | Matteo Vecchi | https://orcid.org/0000-0002-7995-6827 | Department of Life Sciences University of Mode... | https://openalex.org/I122346577 | University of Modena and Reggio Emilia | https://ror.org/02d4c4y02 | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| 1 | 1 | https://openalex.org/W2282635459 | middle | https://openalex.org/A2143576881 | Michele Cesari | https://orcid.org/0000-0001-8857-3791 | Department of Life Sciences University of Mode... | https://openalex.org/I122346577 | University of Modena and Reggio Emilia | https://ror.org/02d4c4y02 | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| 2 | 2 | https://openalex.org/W2282635459 | middle | https://openalex.org/A4360091361 | Roberto Bertolani | | Department of Life Sciences University of Mode... | https://openalex.org/I122346577 | University of Modena and Reggio Emilia | https://ror.org/02d4c4y02 | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| 3 | 3 | https://openalex.org/W2282635459 | middle | https://openalex.org/A4335463764 | K. Ingemar Jönsson | | School of Education and Environment, Kristians... | https://openalex.org/I193278943 | Kristianstad University | https://ror.org/00tkrft03 | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| 4 | 4 | https://openalex.org/W2282635459 | middle | https://openalex.org/A2077559806 | Lorena Rebecchi | https://orcid.org/0000-0002-0702-1846 | Department of Life Sciences University of Mode... | https://openalex.org/I122346577 | University of Modena and Reggio Emilia | https://ror.org/02d4c4y02 | ... | 1447-2600\n1445-5226 | https://openalex.org/P4310320302 | CSIRO Publishing | [https://openalex.org/P4310320302] | [CSIRO Publishing] | journal | True | green | https://iris.unimore.it/bitstream/11380/110720... | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 25098 | 49489 | https://openalex.org/W4253410054 | last | https://openalex.org/A2118604301 | Josef Starý | https://orcid.org/0000-0002-9440-4254 | Institute of Soil Biology | https://openalex.org/I4210124224 | Institute of Soil Biology | https://ror.org/02tz8r820 | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | False | closed | | False |
| 25099 | 49490 | https://openalex.org/W4254963337 | first | https://openalex.org/A4357507713 | Maciej Skoracki | | Department of Animal Morphology, Adam Mickiewi... | https://openalex.org/I173161963 | University of Prešov | https://ror.org/02ndfsn03 | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | True | bronze | https://doi.org/10.21684/0132-8077-2019-27-2-2... | False |
| 25100 | 49496 | https://openalex.org/W4255590455 | last | https://openalex.org/A4355237428 | Jenő Kontschán | | Plant Protection Institute, Centre for Agricul... | https://openalex.org/I4210156273 | Plant Protection Institute | https://ror.org/052t9a145 | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | True | bronze | https://doi.org/10.21684/0132-8077-2019-27-2-1... | True |
| 25101 | 49497 | https://openalex.org/W4285411966 | first | https://openalex.org/A4300730993 | Fabio Cianferoni | | National Research Council of Italy (CNR) | https://openalex.org/I4210155236 | National Research Council | https://ror.org/04zaypm56 | ... | 2221-5115\n0132-8077 | https://openalex.org/P4310312765 | University of Tyumen | [https://openalex.org/P4310312765] | [University of Tyumen] | journal | True | bronze | https://doi.org/10.21684/0132-8077-2022-30-1-9... | False |
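
Comparing the last rows before and after `openalex.get_eu_authors` hints at the criterion: the authorships with institutions in ZA and RU (original rows 49494 and 49495) are gone, while the HU row (49496) survives. The helper's source is not shown, so the following is only a sketch of a country-code filter, assuming `inst_country_code` holds ISO 3166-1 alpha-2 codes; `EU27` and `get_eu_authors_sketch` are hypothetical names. It also reproduces the `Index` column, which keeps each row's label in the unfiltered table.

```python
import pandas as pd

# ISO 3166-1 alpha-2 codes of the 27 EU member states. Greece is "GR" in ISO 3166
# (EU publications use "EL"); we assume OpenAlex's country_code follows ISO 3166.
EU27 = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
}

def get_eu_authors_sketch(authors: pd.DataFrame) -> pd.DataFrame:
    """Keep authorship rows whose institution is in an EU27 member state."""
    in_eu = authors["inst_country_code"].isin(EU27)  # missing codes count as non-EU
    # Keep each row's original label in an "Index" column, as in the table above.
    return authors[in_eu].reset_index().rename(columns={"index": "Index"})

# Hypothetical toy rows: the countries seen above plus one missing code.
toy = pd.DataFrame({
    "author_id": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "inst_country_code": ["IT", "SE", "ZA", "RU", "HU", None],
})
eu = get_eu_authors_sketch(toy)
```

Rows without a country code are treated as non-EU here; whether the real helper does the same is not visible from the output.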


In [7]:
```python
# One row per author; reset the index to 0..n-1
single_eu_authors = openalex.get_single_authors(only_eu_authors).reset_index(drop=True)
single_eu_authors
```

| | Index | article_id | author_position | author_id | author_display_name | orcid | raw_affiliation_string | inst_id | inst_display_name | ror | ... | source_issn | source_host_organization | source_host_organization_name | source_host_organization_lineage | source_host_organization_lineage_names | source_type | is_oa | oa_status | oa_url | any_repository_has_fulltext |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39349 | https://openalex.org/W4286715885 | first | https://openalex.org/A2666249949 | Yasmina Marin-Felix | | Helmholtz Centre for Infection Research; Techn... | https://openalex.org/I4210124929 | Helmholtz Centre for Infection Research | https://ror.org/03d0p2685 | ... | 1861-8952\n1617-416X | https://openalex.org/P4310319900 | Springer Science+Business Media | [https://openalex.org/P4310319965, https://ope... | [Springer Nature, Springer Science+Business Me... | journal | True | hybrid | https://link.springer.com/content/pdf/10.1007/... | False |
| 1 | 11446 | https://openalex.org/W4282829982 | last | https://openalex.org/A2105432390 | Marcus Lehnert | https://orcid.org/0000-0002-7202-7734 | Martin-Luther-Universität Halle-Wittenberg, Be... | https://openalex.org/I68956291 | Martin Luther University Halle-Wittenberg | https://ror.org/05gqaka33 | ... | 1179-3163\n1179-3155 | https://openalex.org/P4310321855 | Q15088586 | [https://openalex.org/P4310321855] | [Q15088586] | journal | False | closed | | False |
| 2 | 10509 | https://openalex.org/W3191918968 | middle | https://openalex.org/A2095303932 | Katharina Zacher | https://orcid.org/0000-0001-8897-1255 | Alfred Wegener Institute Helmholtz Centre for ... | https://openalex.org/I127251866 | Alfred Wegener Institute for Polar and Marine ... | https://ror.org/032e6b942 | ... | 1179-3163\n1179-3155 | https://openalex.org/P4310321855 | Q15088586 | [https://openalex.org/P4310321855] | [Q15088586] | journal | False | closed | | False |
| 3 | 13969 | https://openalex.org/W2949939566 | middle | https://openalex.org/A2791682482 | Paul van Els | https://orcid.org/0000-0002-9499-8873 | Groningen Institute for Evolutionary Life Scie... | https://openalex.org/I169381384 | University of Groningen | https://ror.org/012p63287 | ... | 1076-836X\n1063-5157 | https://openalex.org/P4310311648 | Oxford University Press | [https://openalex.org/P4310311647, https://ope... | [University of Oxford, Oxford University Press] | journal | True | green | https://research.rug.nl/files/118029670/syz027... | True |
| 4 | 15848 | https://openalex.org/W2944966715 | last | https://openalex.org/A4374495681 | Nina Rønsted | | Natural History Museum of Denmark, University ... | https://openalex.org/I4210110903 | Natural History Museum Aarhus | https://ror.org/0166x0j30 | ... | 1055-7903\n1095-9513 | https://openalex.org/P4310320990 | Elsevier BV | [https://openalex.org/P4310320990] | [Elsevier BV] | journal | True | hybrid | https://doi.org/10.1016/j.ympev.2019.05.013 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10470 | 25993 | https://openalex.org/W336386924 | first | https://openalex.org/A2645630823 | Mariana Y. Deli Antoni | | | https://openalex.org/I134820265 | Spanish National Research Council | https://ror.org/02gfc7t72 | ... | 1175-5334\n1175-5326 | https://openalex.org/P4310321855 | Q15088586 | [https://openalex.org/P4310321855] | [Q15088586] | journal | False | closed | | False |
| 10471 | 40808 | https://openalex.org/W3173396053 | first | https://openalex.org/A4356957360 | Pedro Pablo Ferrer Gallego | | Servicio de Vida Silvestre, Centro para la Inv... | https://openalex.org/I2802828525 | Generalitat Valenciana | https://ror.org/0097mvx21 | ... | 0040-0262\n1996-8175 | https://openalex.org/P4310320595 | Wiley | [https://openalex.org/P4310320595] | [Wiley] | journal | True | closed | https://onlinelibrary.wiley.com/doi/pdfdirect/... | False |
| 10472 | 32820 | https://openalex.org/W3209978301 | middle | https://openalex.org/A4349394943 | Frédéric Chérot | | Département de l’Etude du Milieu Naturel et Ag... | https://openalex.org/I4210113172 | Service Public de Wallonie | https://ror.org/0215gxf82 | ... | 1175-5334\n1175-5326 | https://openalex.org/P4310321855 | Q15088586 | [https://openalex.org/P4310321855] | [Q15088586] | journal | False | closed | | False |
| 10473 | 16561 | https://openalex.org/W1982107118 | middle | https://openalex.org/A390288074 | Vladimír Vrkoslav | https://orcid.org/0000-0002-5126-8360 | Institute of Organic Chemistry and Biochemistr... | https://openalex.org/I4210145889 | Czech Academy of Sciences, Institute of Organi... | https://ror.org/04nfjn472 | ... | 1055-7903\n1095-9513 | https://openalex.org/P4310320990 | Elsevier BV | [https://openalex.org/P4310320990] | [Elsevier BV] | journal | False | closed | | False |
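
`openalex.get_single_authors` reduces the table to one row per author; judging by the index labels shown, 25,102 authorship rows become 10,474 authors. A minimal sketch with `DataFrame.drop_duplicates` follows, assuming an author is identified by the OpenAlex `author_id`. It keeps each author's first row, whereas the shuffled `Index` values above show that the helper returns rows in a different order, so it may also pick a different representative row. `get_single_authors_sketch` and the toy data are hypothetical. Counting articles per author on the un-deduplicated table is a cheap way to keep information the deduplication discards.

```python
import pandas as pd

def get_single_authors_sketch(eu_authors: pd.DataFrame) -> pd.DataFrame:
    """Collapse authorship rows to one row per OpenAlex author ID (first occurrence kept)."""
    return eu_authors.drop_duplicates(subset="author_id", keep="first").reset_index(drop=True)

# Hypothetical toy input: author A1 appears on two articles.
eu_authors = pd.DataFrame({
    "article_id": ["W1", "W1", "W2", "W3"],
    "author_id": ["A1", "A2", "A1", "A3"],
})
single = get_single_authors_sketch(eu_authors)

# Number of distinct taxonomic articles per author, from the un-deduplicated table.
n_articles = eu_authors.groupby("author_id")["article_id"].nunique()
```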


In [8]:
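```python
# One row per authorship: an author appears once for every article
only_eu_authors.to_pickle("./data/EU27_authors_with_all_taxonomic_articles.pkl")
# One row per author
single_eu_authors.to_pickle("./data/EU27_authors_taxonomic_articles_no_duplicates.pkl")
```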

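We now have the EU27 authors of the taxonomic articles from taxonomic journals, saved in two forms: one row per authorship, so an author appears once for every article (`EU27_authors_with_all_taxonomic_articles.pkl`), and one row per author (`EU27_authors_taxonomic_articles_no_duplicates.pkl`).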
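
Since the two pickles are meant to hold the same authors in two shapes, a quick check after reloading them can catch mistakes in either step. The sketch below assumes that both tables have an `author_id` column with no missing values and that the deduplicated table keeps exactly one row per `author_id`; `check_author_tables` is a hypothetical helper, demonstrated here on toy data in a temporary directory.

```python
import os
import tempfile

import pandas as pd

def check_author_tables(all_path: str, single_path: str) -> None:
    """Reload both pickles and check the invariants we expect between them."""
    all_rows = pd.read_pickle(all_path)
    single = pd.read_pickle(single_path)
    assert single["author_id"].is_unique, "deduplicated table repeats an author"
    assert set(single["author_id"]) == set(all_rows["author_id"]), "author sets differ"
    assert len(single) <= len(all_rows)

# Demonstration on hypothetical toy data written to a temporary directory.
toy_all = pd.DataFrame({"article_id": ["W1", "W1", "W2"], "author_id": ["A1", "A2", "A1"]})
toy_single = toy_all.drop_duplicates(subset="author_id").reset_index(drop=True)
with tempfile.TemporaryDirectory() as tmp:
    all_path = os.path.join(tmp, "all.pkl")
    single_path = os.path.join(tmp, "single.pkl")
    toy_all.to_pickle(all_path)
    toy_single.to_pickle(single_path)
    check_author_tables(all_path, single_path)
    ok = True
```

On the real files, the call would be `check_author_tables("./data/EU27_authors_with_all_taxonomic_articles.pkl", "./data/EU27_authors_taxonomic_articles_no_duplicates.pkl")`.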