# Search for Authors and Retrieve Their Data

## Setting up the Scopus API Key

You should have your Scopus API Key available. If you have not yet requested a key or do not know where to find it, review the documentation in notebook "01".

The first time you run the code cell below, it will open a small prompt window (usually appearing near the top of the screen), which asks you to paste in your API Key.

![Initial prompt window for entering your API Key](../images/enter_key.png)

Then, a second window will appear asking for an institutional token. That is not necessary, so just press Enter.

In [None]:
import pybliometrics
from pybliometrics.scopus.utils import config

Your Scopus API Key is now stored in a configuration file on your computer, so you should not need to enter it again unless you move or delete that file. To find where the configuration file is located, run the following code:

In [None]:
pybliometrics.scopus.utils.constants.CONFIG_FILE

## Use your Scopus API from within the Dartmouth online network

+ Be sure that you are within the campus network (either on campus or logged into the VPN) so that the API retrieves all requested results.

+ Otherwise, some requests will return the error: `Scopus401Error: The requestor is not authorized to access the requested view or fields of the resource`

## Types of Bibliometric Data

Most bibliometric data is stored at the document level. That is, bibliometric databases record metadata for each individual article, report, book, or other paper. However, this data can also be aggregated in various ways. Thus, some common types of bibliometric data include:

* document-level data
* author-level summary data, plus document-level data for each document that an author (co-)authored
* publication-level summary data and metrics (measuring the "impact" of a journal, for example, by quantifying the number of citations to its articles)
* institution-level metadata

In this lesson, we will begin with an author's name, distinguish this particular author from others with the same name, and then retrieve data for the documents (co-)authored by this researcher.

## Get Information for one single author

Using the [Pybliometrics](https://pybliometrics.readthedocs.io/en/stable/) Python library, we can begin by extracting metadata for a single author. However, unless you have an unusual first and last name combination (like me), you will first need to identify the correct individual. For example, if you search for "Jane Smith", you might need to sift through data for multiple authors named "Jane Smith" and identify the correct matches.

For example, Jane Smith at Dartmouth may be a different person than Jane Smith at Vassar, but she may be the same person as Jane Smith at UNH (Scopus records often have not been aggregated to merge records of the same person when they move to another institution).

To begin, we will search for the [Spanish chemist Rafael Luque who has been suspended by his institution in Spain for academic impropriety](https://cen.acs.org/research-integrity/Highly-cited-chemist-suspended-claiming-to-be-affiliated-with-Russian-and-Saudi-universities/101/i12) related to a highly dubious publication profile (co-authoring 60-70 papers annually) and for accepting salaries as an adjunct scholar at Saudi and Russian universities (while still employed in Spain), which wanted his publication and citation record to boost their rankings.

We will first use the **AuthorSearch API** to find the correct Rafael Luque. We will then use the **AuthorRetrieval API** to retrieve information about his documents.

In [None]:
from pybliometrics.scopus import AuthorSearch
lastname = "Luque"
firstname = "Rafael"
au = AuthorSearch(f"AUTHLAST({lastname}) and AUTHFIRST({firstname})")

In [None]:
au

The AuthorSearch command sends a **request** for information using the search query above. The API then sends a **response** with the requested information, which we have saved in the variable `au`.

If we call `au` by itself, we receive only a wrapper for the information. To retrieve details about the authors that matched this query, we need to be more specific.

In [None]:
?au

In [None]:
dir(au)

In [None]:
au.authors

#eid: '9-s2.0-26643003700'
#orcid: '0000-0003-4190-1916'

The first entry in the results above, the "Rafael Luque" from RUDN, appears to be our suspiciously prolific author. Note, however, that this Rafael Luque may have multiple records, as Scopus often produces multiple records for individuals who have worked at multiple institutions. For this exercise, let's just retrieve information for the Rafael Luque from RUDN.
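When a search returns many records, it can help to filter them in code rather than by eye. The sketch below assumes each entry in `au.authors` exposes `eid`, `surname`, `givenname`, and `affiliation` attributes (pybliometrics returns namedtuples, so `au.authors[0]._fields` should list the exact names in your version). The records are made-up placeholders so the example runs without an API call, and `filter_by_affiliation` is a hypothetical helper, not part of pybliometrics.

```python
from collections import namedtuple

# Stand-in for the namedtuples in au.authors. The field names are an
# assumption, and real records carry more fields than these four.
Author = namedtuple("Author", ["eid", "surname", "givenname", "affiliation"])

# Placeholder records (not real Scopus data)
sample_authors = [
    Author("9-s2.0-00000000001", "Smith", "Jane", "Example University"),
    Author("9-s2.0-00000000002", "Smith", "Jane", "Sample College"),
    Author("9-s2.0-00000000003", "Smith", "Jane", None),
]

def filter_by_affiliation(authors, keyword):
    """Return records whose affiliation contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    # Skip records with a missing affiliation instead of failing on None
    return [a for a in authors if a.affiliation and keyword in a.affiliation.lower()]

matches = filter_by_affiliation(sample_authors, "university")
for a in matches:
    print(a.eid, a.givenname, a.surname, a.affiliation)
```

With real results, you would pass `au.authors` in place of `sample_authors`.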

In [None]:
full_eid = au.authors[0].eid
full_eid

In [None]:
eid = full_eid.split("-")[-1]
eid
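The `full_eid.split("-")[-1]` step above can be wrapped in a small function that also checks the result looks like an author ID. This is only a sketch: `author_id_from_eid` is a hypothetical helper name, and it assumes the EID always ends in a hyphen followed by digits, as in the example in this notebook.

```python
def author_id_from_eid(full_eid):
    """Return the numeric author ID at the end of an EID such as '9-s2.0-26643003700'."""
    # Keep only the text after the last hyphen
    author_id = full_eid.rsplit("-", 1)[-1]
    if not author_id.isdigit():
        raise ValueError(f"Unexpected EID format: {full_eid!r}")
    return author_id

print(author_id_from_eid("9-s2.0-26643003700"))  # prints 26643003700
```

Raising an error on an unexpected format makes a malformed ID fail early, before it is sent to the API.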

In [None]:
# AU-ID() expects the numeric author ID; the full EID does not work here
au2 = AuthorSearch(f"AU-ID({eid})")

In [None]:
au2.authors

Searching by author ID narrows the results to that specific author's record.

Other ways to narrow down author searches:
* include affiliations or affiliation ids
* include subject areas
* include middle names or initials
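As a rough illustration of the first option, the helper below assembles a query string that adds an affiliation term to the name fields used earlier. It assumes the `AFFIL()` field is available for author searches (check the Scopus Search Tips page for the fields your search supports). `build_author_query` is a hypothetical helper, not part of pybliometrics.

```python
def build_author_query(lastname, firstname=None, affiliation=None):
    """Join the provided search fields into one Scopus author query string."""
    parts = [f"AUTHLAST({lastname})"]
    if firstname:
        parts.append(f"AUTHFIRST({firstname})")
    if affiliation:
        # AFFIL() is assumed here; confirm it against the Search Tips page
        parts.append(f"AFFIL({affiliation})")
    return " and ".join(parts)

query = build_author_query("Smith", "Jane", "Dartmouth")
print(query)  # prints AUTHLAST(Smith) and AUTHFIRST(Jane) and AFFIL(Dartmouth)
```

The resulting string can then be passed to `AuthorSearch(query)` in the same way as the hand-written queries above.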

## Exercise

Search for an author you know well (could be yourself or a colleague!). How hard is it to distinguish their publication record from the records of authors with similar names?

For authors with common names, you can further filter the results by adding in affiliation or other information. See the [Search Tips page](https://dev.elsevier.com/sc_search_tips.html) for more information about these search fields.

In [None]:
lastname = "Mikecz"
firstname = "Jeremy"
au2 = AuthorSearch(f"AUTHLAST({lastname}) and AUTHFIRST({firstname})")

In [None]:
au2.authors

## Author Information Retrieval

In [None]:
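from pybliometrics.scopus import AuthorRetrieval
# Reset the name variables for Rafael Luque (the exercise above may have overwritten them)
lastname = "Luque"
firstname = "Rafael"
eid = '26643003700'
ar = AuthorRetrieval(eid)
ar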

In [None]:
print(ar.indexed_name)
print(ar.affiliation_current)
print("Number of (co-)authored documents:", ar.document_count)
print("Total number of citing items:", ar.citation_count)
print("Total number of citing authors:", ar.cited_by_count)

In [None]:
ar.get_documents()

In [None]:
import pandas as pd

doc_df = pd.DataFrame(ar.get_documents())
doc_df.head()
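One quick check on a publication profile is the number of documents per year. The sketch below assumes each document record has a `coverDate` attribute formatted as `YYYY-MM-DD`; verify the field names in your own results before relying on it. The records are placeholders so the example runs without an API call, and `documents_per_year` is a hypothetical helper.

```python
from collections import Counter, namedtuple

# Stand-in for the records returned by ar.get_documents(). The field names
# are an assumption, and real records carry many more fields.
Document = namedtuple("Document", ["title", "coverDate"])

# Placeholder records (not real Scopus data)
sample_docs = [
    Document("Placeholder title A", "2021-03-15"),
    Document("Placeholder title B", "2021-11-02"),
    Document("Placeholder title C", "2022-06-30"),
]

def documents_per_year(documents):
    """Count documents by the year at the start of each coverDate."""
    # Records with a missing coverDate are skipped
    return Counter(doc.coverDate[:4] for doc in documents if doc.coverDate)

per_year = documents_per_year(sample_docs)
for year in sorted(per_year):
    print(year, per_year[year])
```

With real data, pass `ar.get_documents()` instead of `sample_docs`.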

In [None]:
doc_df.to_csv(f"../data/{lastname}_{firstname}_{eid}_documents.csv", encoding = 'utf-8')

## Exercise

Retrieve information for all documents written by an author of your choosing. Following the code above, place this information into a dataframe and export it as a CSV file.

In [None]:
my_eid = '56749154000'
lastname = "Mikecz"
firstname = "Jeremy"
ar = AuthorRetrieval(my_eid)
doc_df = pd.DataFrame(ar.get_documents())
doc_df.head()
# Use my_eid here so the filename matches the author retrieved in this cell
doc_df.to_csv(f"../data/{lastname}_{firstname}_{my_eid}_mydocuments.csv", encoding = 'utf-8')
