# Search for Authors and Retrieve Their Data

## Setting up the Scopus API Key

You should have your Scopus API Key available. If you have not yet requested a key or do not know where to find it, review the documentation in notebook "01".

The first time you run the code cell below, it will ask you to enter your Scopus API Key in a small prompt window (usually appearing near the top of the screen).

![Initial prompt window for entering your API Key](../images/enter_key.png)

Then a second window will appear asking for an institutional token. That is not necessary, so just press Enter.

In [1]:
import pybliometrics
from pybliometrics.scopus.utils import config

Creating config file at C:\Users\F0040RP\.config\pybliometrics.cfg with default paths...
Configuration file successfully created at C:\Users\F0040RP\.config\pybliometrics.cfg
For details see https://pybliometrics.rtfd.io/en/stable/configuration.html.


Now, your Scopus API Key is stored in a configuration file on your computer so you should not need to enter it again, unless you move or delete the configuration file. To find where the configuration file is on your computer, run the following code:

In [6]:
pybliometrics.scopus.utils.constants.CONFIG_FILE

WindowsPath('C:/Users/F0040RP/.config/pybliometrics.cfg')

## Use your Scopus API from within the Dartmouth online network

+ Be sure that you are within the campus network (either on campus or logged into the VPN) so that the API retrieves all requested results.

+ Otherwise some requests will return the error: `Scopus401Error: The requestor is not authorized to access the requested view or fields of the resource`

## Get Information for one single author

Using the [Pybliometrics](https://pybliometrics.readthedocs.io/en/stable/) Python library, we can begin by extracting metadata for a single author. However, unless you have an unusual first and last name combination (like me), you will first need to identify the correct individual. For example, if you search for "Jane Smith", you might need to sift through data for multiple authors named "Jane Smith" and identify the correct matches.

For example, Jane Smith at Dartmouth may be a different person than Jane Smith at Vassar, but she may be the same person as Jane Smith at UNH (Scopus records often have not been aggregated to merge records of the same person when they move to another institution).

To begin, we will search for the [Spanish chemist Rafael Luque, who has been suspended by his institution in Spain for academic impropriety](https://cen.acs.org/research-integrity/Highly-cited-chemist-suspended-claiming-to-be-affiliated-with-Russian-and-Saudi-universities/101/i12). The suspension relates to a highly dubious publication profile (co-authoring 60-70 papers annually) and to accepting salaries as an adjunct scholar at Saudi and Russian universities while still employed in Spain. Those universities wanted his publication and citation record to boost their rankings.

We will first use the **AuthorSearch API** to find the correct Rafael Luque. We will then use the **AuthorRetrieval API** to retrieve information about his documents.

In [5]:
from pybliometrics.scopus import AuthorSearch
lastname = "Luque"
firstname = "Rafael"
au = AuthorSearch(f"AUTHLAST({lastname}) and AUTHFIRST({firstname})")

The AuthorSearch command sends a **request** for information using the search query above. The API then sends a **response** with the requested information, which we have saved in the variable `au`.

Calling `au` on its own only returns a wrapper around that information. To see the authors that matched the query, we need to inspect the object and ask for specific attributes.

In [7]:
?au

Type:           AuthorSearch
String form:   
Search 'AUTHLAST(Luque) and AUTHFIRST(Rafael)' yielded 29 authors as of 2024-04-05:
           Luque, Ra <...> l; AUTHOR_ID:57213514771 (1 document(s))
           Luque, Rafael; AUTHOR_ID:35810760000 (1 document(s))
File:           c:\users\f0040rp\documents\dartlib_rds\projects\bibliometrics\.venv\lib\site-packages\pybliometrics\scopus\author_search.py
Docstring:      <no docstring>
Init docstring:
Interaction with the Author Search API.

:param query: A string of the query.  For allowed fields and values see
              https://dev.elsevier.com/sc_author_search_tips.html.
:param refresh: Whether to refresh the cached file if it exists or not.
                If `int` is passed, cached file will be refreshed if the
                number of days since last modification exceeds that value.
:param download: Whether to download results (if they have not been
                 cached).
:pa

In [9]:
dir(au)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_action',
 '_cache_file_path',
 '_integrity',
 '_json',
 '_mdate',
 '_n',
 '_query',
 '_refresh',
 '_view',
 'authors',
 'get_cache_file_age',
 'get_cache_file_mdate',
 'get_key_remaining_quota',
 'get_key_reset_time',
 'get_results_size']

In [10]:
au.authors

#eid: '9-s2.0-26643003700'
#orcid: '0000-0003-4190-1916'

[Author(eid='9-s2.0-26643003700', orcid='0000-0003-4190-1916', surname='Luque', initials='R.G.', givenname='Rafael Geraldo', affiliation='RUDN University', documents=898, affiliation_id='60015024', city='Moscow', country='Russian Federation', areas='CHEM (644); CENG (622); ENVI (611)'),
 Author(eid='9-s2.0-57194868074', orcid='0000-0002-4671-2957', surname='Luque', initials='R.', givenname='Rafael', affiliation='The University of Chicago', documents=126, affiliation_id='60029278', city='Chicago', country='United States', areas='PHYS (121); EART (115); MULT (6)'),
 Author(eid='9-s2.0-57535563900', orcid='0000-0001-5536-1805', surname='Luque-Baena', initials='R.M.', givenname='Rafael Marcos', affiliation='Universidad de Málaga', documents=100, affiliation_id='60003662', city='Malaga', country='Spain', areas='COMP (161); MATH (43); ENGI (25)'),
 Author(eid='9-s2.0-58220142700', orcid='0000-0003-1963-0523', surname='López-Luque', initials='R.', givenname='Rafael', affiliation='Universidad 

In [11]:
?AuthorSearch

Init signature:
AuthorSearch(
    query: str,
    refresh: Union[bool, int] = False,
    verbose: bool = False,
    download: bool = True,
    integrity_fields: Union[List[str], Tuple[str, ...]] = None,
    integrity_action: str = 'raise',
    count: int = 200,
    **kwds: 

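The fields and values allowed in an author search query are listed at https://dev.elsevier.com/sc_author_search_tips.html. One of them, `AU-ID`, searches by author ID: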

In [14]:
# AU-ID() expects the numeric author ID, not the full EID
#eid = '9-s2.0-26643003700'  # full EID: doesn't work
eid = '26643003700'  # numeric part only: works
au2 = AuthorSearch(f"AU-ID({eid})")

In [15]:
au2.authors

[Author(eid='9-s2.0-26643003700', orcid='0000-0003-4190-1916', surname='Luque', initials='R.G.', givenname='Rafael Geraldo', affiliation='RUDN University', documents=898, affiliation_id='60015024', city='Moscow', country='Russian Federation', areas='CHEM (644); CENG (622); ENVI (611)')]
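The search results report an `eid` such as `9-s2.0-26643003700`, while the `AU-ID()` query above only worked with the numeric part. The cell below is a minimal sketch of converting one into the other. It assumes the author ID is always the run of digits after the last hyphen, as in the output above, and `author_id_from_eid` is a hypothetical helper, not part of pybliometrics.

```python
def author_id_from_eid(eid: str) -> str:
    """Return the numeric author ID at the end of a Scopus author EID."""
    # Assumption: the ID is everything after the last hyphen,
    # e.g. '9-s2.0-26643003700' -> '26643003700'.
    author_id = eid.rsplit("-", 1)[-1]
    if not author_id.isdigit():
        raise ValueError(f"Unexpected EID format: {eid!r}")
    return author_id


# Build the query string that would be passed to AuthorSearch
query = f"AU-ID({author_id_from_eid('9-s2.0-26643003700')})"
print(query)
```

The resulting string can be passed to `AuthorSearch` in the same way as the hand-typed ID.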

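To retrieve a specific piece of information, index into the list of authors and ask for one field. For example, the country of the third author in the original search: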

In [17]:
au.authors[2].country

'Spain'
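Because each result exposes its fields as attributes, the list can also be filtered with ordinary Python. The sketch below does not call the API: it uses a stand-in `Author` namedtuple with only a subset of the fields, filled with the first three records from the output above, so the record type and the filtering rule are assumptions for illustration.

```python
from collections import namedtuple

# Stand-in for the records in au.authors; only a subset of fields is kept.
Author = namedtuple("Author", "eid surname givenname affiliation documents country")

# Values copied from the first three results of the search above.
authors = [
    Author("9-s2.0-26643003700", "Luque", "Rafael Geraldo",
           "RUDN University", 898, "Russian Federation"),
    Author("9-s2.0-57194868074", "Luque", "Rafael",
           "The University of Chicago", 126, "United States"),
    Author("9-s2.0-57535563900", "Luque-Baena", "Rafael Marcos",
           "Universidad de Málaga", 100, "Spain"),
]

# Keep exact surname matches only, dropping compound surnames such as "Luque-Baena".
exact = [a for a in authors if a.surname == "Luque"]

# Among those, pick the record with the most documents.
most_prolific = max(exact, key=lambda a: a.documents)
print(most_prolific.eid, most_prolific.affiliation)
```

With real results, the same list comprehension would be applied to `au.authors` directly. A filter like this only narrows the candidates; confirming the right person still takes a manual check.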

Other ways to narrow down author searches:
* include affiliations or affiliation ids
* include subject areas
* include middle names or initials
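These options can be combined into a single query string. The sketch below only builds the string and does not call the API. `build_author_query` is a hypothetical helper, and the field names `AFFIL`, `AF-ID`, and `SUBJAREA` are assumptions that should be checked against the author search tips page before use.

```python
from typing import Optional


def build_author_query(lastname: str,
                       firstname: str,
                       affiliation: Optional[str] = None,
                       affiliation_id: Optional[str] = None,
                       subject_area: Optional[str] = None) -> str:
    """Assemble an author search query from whichever filters are given."""
    parts = [f"AUTHLAST({lastname})", f"AUTHFIRST({firstname})"]
    # Field names below are assumed; verify them on the search tips page.
    if affiliation:
        parts.append(f"AFFIL({affiliation})")
    if affiliation_id:
        parts.append(f"AF-ID({affiliation_id})")
    if subject_area:
        parts.append(f"SUBJAREA({subject_area})")
    return " and ".join(parts)


# Affiliation ID and subject code taken from the first search result above
query = build_author_query("Luque", "Rafael",
                           affiliation_id="60015024", subject_area="CHEM")
print(query)
```

A middle name or initial can be tried by passing a longer `firstname`, such as `"Rafael Geraldo"`. Whether that matches depends on how the name is recorded in Scopus.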