# Scopus API in Python

By Vincent F. Scalfani and Avery Fernandez

The Scopus API, provided by Elsevier, offers programmatic access to a comprehensive database of abstracts and citations from peer-reviewed literature. It supports advanced search capabilities, author and affiliation retrieval, and citation analysis, facilitating a wide range of academic and research applications.

*This tutorial content is intended to help facilitate academic research.*

Please see the following resources for more information on API usage:
- Documentation
  - <a href="http://www.scopus.com" target="_blank">Scopus</a>
  - <a href="https://dev.elsevier.com/scopus.html" target="_blank">Scopus API</a>
  - <a href="https://dev.elsevier.com/api_docs.html" target="_blank">Elsevier API Documentation</a>
  - <a href="https://dev.elsevier.com/use_cases.html" target="_blank">Elsevier API Use Cases</a>
- Terms
  - <a href="https://dev.elsevier.com/api_service_agreement.html" target="_blank">Elsevier API Service Agreement</a>
  - <a href="https://dev.elsevier.com/policy.html" target="_blank">Elsevier API Policy</a>
- Data Reuse
  - <a href="https://www.elsevier.com/about/policies-and-standards/research-data" target="_blank">Elsevier Research Data Policy</a>

_**NOTE:**_ The Scopus API limits requests to a maximum of 2 per second.

*These recipe examples were tested on May 7, 2025.* 

## Setup

### Import Libraries

The following external libraries need to be installed into your environment to run the code examples in this tutorial:
* <a href="https://github.com/psf/requests" target="_blank">requests</a>
* <a href="https://github.com/theskumar/python-dotenv" target="_blank">python-dotenv</a>
* <a href="https://github.com/ipython/ipykernel" target="_blank">ipykernel</a>
* <a href="https://github.com/pandas-dev/pandas" target="_blank">pandas</a>

We import the libraries used in this tutorial below:

In [1]:
import requests
from time import sleep
from pprint import pprint
from dotenv import load_dotenv
import os
import pandas as pd

### Import API Key

An API key is required to access the Scopus API. You can sign up for one at the <a href="https://dev.elsevier.com/apikey/manage" target="_blank">Scopus Developer Portal</a>.

We keep our API key in a separate file, a `.env` file, and use the `dotenv` library to access it. If you use this method, create a file named `.env` in the same directory as this notebook and add the following line to it:

```text
SCOPUS_API_KEY=PUT_YOUR_API_KEY_HERE
```

In [2]:
load_dotenv()
try:
    API_KEY = os.environ["SCOPUS_API_KEY"]
except KeyError:
    print("API key not found. Please set 'SCOPUS_API_KEY' in your .env file.")
else:
    print("Environment and API key successfully loaded.")

Environment and API key successfully loaded.


## 1. Get Author Data

### Number of Records for Author

In [3]:
BASE_URL = "https://api.elsevier.com/content/search/scopus"
params = {
    "query": "AU-ID(55764087400)",
    "apiKey": API_KEY,
    "httpAccept": "application/json"
}

try:
    response = requests.get(BASE_URL, params=params)
    # Raise an error for bad responses
    response.raise_for_status()  
    data = response.json()
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
    data = None

We can take a closer look at the data given back:

In [6]:
pprint(data, depth=1)

{'search-results': {...}}


The data is wrapped in a dictionary under the single key `search-results`.

In [7]:
pprint(data["search-results"], depth=1)

{'entry': [...],
 'link': [...],
 'opensearch:Query': {...},
 'opensearch:itemsPerPage': '25',
 'opensearch:startIndex': '0',
 'opensearch:totalResults': '29'}


Inside the `search-results` dictionary, there are six keys:
* `entry` - the actual data we want
* `link` - a link to the API endpoint
* `opensearch:Query` - the query we used to get the data
* `opensearch:itemsPerPage` - the number of items per page
* `opensearch:startIndex` - the starting index of the items
* `opensearch:totalResults` - the total number of results
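The three `opensearch:` counters are returned as strings, so they need converting to integers before any arithmetic or comparison. A minimal sketch, using a hand-built dictionary that mimics the output shown above (`sample` is an illustrative stand-in, not a live API response):

```python
# Hand-built stand-in for data["search-results"]; values copied from the output above
sample = {
    "opensearch:totalResults": "29",
    "opensearch:startIndex": "0",
    "opensearch:itemsPerPage": "25",
}

total = int(sample["opensearch:totalResults"])
per_page = int(sample["opensearch:itemsPerPage"])

# Records that this first response does not include
remaining = max(total - per_page, 0)
print(f"{per_page} of {total} records on this page; {remaining} remaining")
```

Here the response carries 25 of the 29 matching records.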

In [8]:
pprint(data["search-results"]["entry"], depth=1)

[{...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...},
 {...}]


The `entry` key contains a list of records. We can view the first item in the list like so:

In [9]:
pprint(data["search-results"]["entry"][0], depth=1)

{'@_fa': 'true',
 'affiliation': [...],
 'article-number': '102984',
 'citedby-count': '0',
 'dc:creator': 'Walker K.W.',
 'dc:identifier': 'SCOPUS_ID:85211077014',
 'dc:title': 'Comparing impact of green open access and toll-access '
             'publication in the chemical sciences',
 'eid': '2-s2.0-85211077014',
 'link': [...],
 'openaccess': '0',
 'openaccessFlag': False,
 'pii': 'S0099133324001459',
 'prism:aggregationType': 'Journal',
 'prism:coverDate': '2025-01-01',
 'prism:coverDisplayDate': 'January 2025',
 'prism:doi': '10.1016/j.acalib.2024.102984',
 'prism:issn': '00991333',
 'prism:issueIdentifier': '1',
 'prism:pageRange': None,
 'prism:publicationName': 'Journal of Academic Librarianship',
 'prism:url': 'https://api.elsevier.com/content/abstract/scopus_id/85211077014',
 'prism:volume': '51',
 'source-id': '12791',
 'subtype': 'ar',
 'subtypeDescription': 'Article'}


Each dictionary in the `entry` list holds the metadata for one article. We can load the list into a pandas DataFrame to make it easier to work with.

In [10]:
df = pd.DataFrame(data["search-results"]["entry"])
df

Unnamed: 0,@_fa,link,prism:url,dc:identifier,eid,dc:title,dc:creator,prism:publicationName,prism:issn,prism:volume,...,subtype,subtypeDescription,article-number,source-id,openaccess,openaccessFlag,prism:eIssn,freetoread,freetoreadLabel,pubmed-id
0,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85211077014,2-s2.0-85211077014,Comparing impact of green open access and toll...,Walker K.W.,Journal of Academic Librarianship,991333.0,51,...,ar,Article,102984.0,12791,0,False,,,,
1,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85150443997,2-s2.0-85150443997,Citation Metrics and Boyer’s Model of Scholars...,Gilstrap D.L.,Innovative Higher Education,7425627.0,48,...,ar,Article,,144736,0,False,15731758.0,,,
2,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85174507273,2-s2.0-85174507273,Creating a Scholarly API Cookbook: Supporting ...,Scalfani V.F.,Issues in Science and Technology Librarianship,,2023,...,ar,Article,,19400156823,1,True,10921206.0,"{'value': [{'$': 'all'}, {'$': 'publisherfullg...","{'value': [{'$': 'All Open Access'}, {'$': 'Go...",
3,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85149244960,2-s2.0-85149244960,The current landscape of author guidelines in ...,Parks N.A.,Pure and Applied Chemistry,334545.0,95,...,cp,Conference Paper,,21458,1,True,13653075.0,"{'value': [{'$': 'all'}, {'$': 'publisherhybri...","{'value': [{'$': 'All Open Access'}, {'$': 'Hy...",
4,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85145228753,2-s2.0-85145228753,Visualizing chemical space networks with RDKit...,Scalfani V.F.,Journal of Cheminformatics,,14,...,ar,Article,87.0,19600157322,1,True,17582946.0,"{'value': [{'$': 'all'}, {'$': 'publisherfullg...","{'value': [{'$': 'All Open Access'}, {'$': 'Go...",
5,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85142025984,2-s2.0-85142025984,The Power Law and Emerging and Senior Scholar ...,Bray N.J.,Innovative Higher Education,7425627.0,47,...,ar,Article,,144736,0,False,15731758.0,,,
6,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85132889260,2-s2.0-85132889260,Cheminformatics: data and standards a Pure and...,Scalfani V.F.,Pure and Applied Chemistry,334545.0,94,...,ed,Editorial,,21458,1,True,13653075.0,"{'value': [{'$': 'all'}, {'$': 'publisherfree2...","{'value': [{'$': 'All Open Access'}, {'$': 'Br...",
7,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85120727934,2-s2.0-85120727934,Using NCBI Entrez Direct (EDirect) for Small M...,Scalfani V.F.,Journal of Chemical Education,219584.0,98,...,ar,Article,,24169,0,False,19381328.0,,,
8,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85192077253,2-s2.0-85192077253,Enhancing the Discovery of Chemistry Theses by...,Scalfani V.F.,Issues in Science and Technology Librarianship,,2021,...,ar,Article,,19400156823,1,True,10921206.0,"{'value': [{'$': 'all'}, {'$': 'publisherfullg...","{'value': [{'$': 'All Open Access'}, {'$': 'Go...",
9,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85117202467,2-s2.0-85117202467,Using the linux operating system full-time tip...,Scalfani V.F.,College and Research Libraries News,990086.0,82,...,no,Note,,14239,1,True,21506698.0,"{'value': [{'$': 'all'}, {'$': 'publisherfullg...","{'value': [{'$': 'All Open Access'}, {'$': 'Go...",


In [11]:
# See the columns of the DataFrame
df.columns

Index(['@_fa', 'link', 'prism:url', 'dc:identifier', 'eid', 'dc:title',
       'dc:creator', 'prism:publicationName', 'prism:issn', 'prism:volume',
       'prism:issueIdentifier', 'prism:pageRange', 'prism:coverDate',
       'prism:coverDisplayDate', 'prism:doi', 'pii', 'citedby-count',
       'affiliation', 'prism:aggregationType', 'subtype', 'subtypeDescription',
       'article-number', 'source-id', 'openaccess', 'openaccessFlag',
       'prism:eIssn', 'freetoread', 'freetoreadLabel', 'pubmed-id'],
      dtype='object')

In [12]:
# Number of rows
len(df)

25
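Only 25 rows were loaded even though `opensearch:totalResults` reported 29, because the search returns one page of results per request. The Scopus Search API documentation describes `start` and `count` query parameters for paging. The sketch below assumes those parameters behave as documented and that a page size of 25 is permitted for your API key. `fetch_all_entries`, `get_page`, and `fake_get_page` are hypothetical helpers introduced here for illustration, and the fake page function stands in for the live API so the example runs offline:

```python
def fetch_all_entries(get_page, page_size=25):
    """Collect 'entry' items across pages.

    get_page(start, count) is a hypothetical callable that returns the
    parsed JSON for one page of search results.
    """
    entries = []
    start = 0
    while True:
        results = get_page(start, page_size)["search-results"]
        page = results.get("entry", [])
        entries.extend(page)
        total = int(results["opensearch:totalResults"])
        start += page_size
        # Stop once everything is collected or a page comes back empty
        if start >= total or not page:
            break
    return entries


# Offline stand-in for the API: 29 fake records served page by page
fake_records = [{"dc:identifier": f"SCOPUS_ID:{i}"} for i in range(29)]

def fake_get_page(start, count):
    return {"search-results": {
        "opensearch:totalResults": str(len(fake_records)),
        "entry": fake_records[start:start + count],
    }}

all_entries = fetch_all_entries(fake_get_page)
print(len(all_entries))
```

With the live API, `get_page` could wrap `requests.get(BASE_URL, params={**params, "start": start, "count": count})` and return `response.json()`, with a `sleep` between calls to stay within the rate limit.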

In [13]:
# We can index data from our new DataFrame, df.
# For example, create a list of just the DOIs
dois = df['prism:doi'].tolist()
dois

['10.1016/j.acalib.2024.102984',
 '10.1007/s10755-023-09648-7',
 '10.29173/istl2766',
 '10.1515/pac-2022-1001',
 '10.1186/s13321-022-00664-x',
 '10.1007/s10755-022-09636-3',
 '10.1515/pac-2022-2019',
 '10.1021/acs.jchemed.1c00904',
 '10.29173/istl2566',
 '10.5860/crln.82.9.428',
 '10.1021/acs.iecr.8b02573',
 '10.1021/acs.jchemed.6b00602',
 '10.5062/F4TD9VBX',
 '10.1021/acs.macromol.6b02005',
 '10.1186/s13321-016-0181-z',
 '10.1021/acs.chemmater.5b04431',
 '10.1021/acs.jchemed.5b00512',
 '10.1021/acs.jchemed.5b00375',
 '10.5860/crln.76.9.9384',
 '10.5860/crln.76.2.9259',
 '10.1126/science.346.6214.1258',
 '10.1021/ed400887t',
 '10.1016/j.acalib.2014.03.015',
 '10.5062/F4XS5SB9',
 '10.1021/ma300328u']

In [14]:
# Get a list of article titles
titles = df['dc:title'].tolist()
titles

['Comparing impact of green open access and toll-access publication in the chemical sciences',
 'Citation Metrics and Boyer’s Model of Scholarship: How Do Bibliometrics and Altmetrics Respond to Research Impact?',
 'Creating a Scholarly API Cookbook: Supporting Library Users with Programmatic Access to Information',
 'The current landscape of author guidelines in chemistry through the lens of research data sharing',
 'Visualizing chemical space networks with RDKit and NetworkX',
 'The Power Law and Emerging and Senior Scholar Publication Patterns',
 'Cheminformatics: data and standards a Pure and Applied Chemistry special issue',
 'Using NCBI Entrez Direct (EDirect) for Small Molecule Chemical Information Searching in a Unix Terminal',
 'Enhancing the Discovery of Chemistry Theses by Registering Substances and Depositing in PubChem',
 'Using the linux operating system full-time tips and experiences from a subject liaison librarian',
 'Analysis of the Frequency and Diversity of 1,3-Dial

In [15]:
# Now a list of the cited by count
cited_by = df['citedby-count'].tolist()
cited_by

['0',
 '6',
 '2',
 '2',
 '38',
 '3',
 '0',
 '3',
 '0',
 '0',
 '24',
 '29',
 '7',
 '14',
 '27',
 '7',
 '13',
 '28',
 '0',
 '1',
 '0',
 '114',
 '6',
 '39',
 '48']

In [16]:
# Get sum of cited_by counts
sum([int(x) for x in cited_by])

411
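Because the counts arrive as strings, another option is to convert the column once with `pd.to_numeric` and then use the usual pandas aggregations. A minimal sketch on a small hand-made DataFrame (`sample_df` is illustrative; its three values are the first three counts listed above):

```python
import pandas as pd

# Illustrative stand-in for df, with string counts as returned by the API
sample_df = pd.DataFrame({"citedby-count": ["0", "6", "2"]})

# Convert the string column to integers, then sum
sample_df["citedby-count"] = pd.to_numeric(sample_df["citedby-count"])
total_citations = int(sample_df["citedby-count"].sum())
print(total_citations)
```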

## 2. Get Author Data in a Loop

### Number of Records for Author

In [22]:
# Load a list of author names and Scopus AUIDs
import csv
filename = 'authors.txt'

with open(filename, 'r') as infile:
    rows = csv.reader(infile, delimiter='\t')
    author_list = list(rows)
author_list

[['Emy Decker', '36660678600'],
 ['Lindsey Lowry', '57210944451'],
 ['Karen Chapman', '35783926100'],
 ['Kevin Walker', '56133961300'],
 ['Sara Whitver', '57194760730']]

In [23]:
# Get number of Scopus records for each author
num_records = []
for author, authorID in author_list:

    params = {
        'query': f'AU-ID({authorID})',
        'apiKey': API_KEY,
        'httpAccept': 'application/json'
    }

    try:
        response = requests.get(BASE_URL, params=params)
        sleep(1)  
        # Raise an error for bad responses
        response.raise_for_status()  
        data = response.json()
        number_of_records = data['search-results']['opensearch:totalResults']
        num_records.append([author, authorID, number_of_records])
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        num_records.append([author, authorID, None])

In [24]:
num_records

[['Emy Decker', '36660678600', '22'],
 ['Lindsey Lowry', '57210944451', '8'],
 ['Karen Chapman', '35783926100', '23'],
 ['Kevin Walker', '56133961300', '11'],
 ['Sara Whitver', '57194760730', '7']]

### Download Record Data

In [None]:
# Let's say we want the DOIs and cited by counts in a list
cites = []
for author,authorID in author_list:
    params = {
        'query': f'AU-ID({authorID})',
        'apiKey': API_KEY,
        'httpAccept': 'application/json'
    }

    try:
        response = requests.get(BASE_URL, params=params)
        sleep(1)  
        # Raise an error for bad responses
        response.raise_for_status()  
        data = response.json()
        author_df = pd.DataFrame(data['search-results']['entry'])
        # Get the DOIs and cited by counts
        dois = author_df['prism:doi'].tolist()
        cited_by = author_df['citedby-count'].tolist()
        # Create a list of lists with author, authorID, DOI, and cited by count
        for doi, cited in zip(dois, cited_by):
            cites.append([author, authorID, doi, cited])
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        cites.append([author, authorID, None, None])

In [26]:
# The cites variable is a list of lists with the data
# View data for first 5 records
cites[:5]

[['Emy Decker', '36660678600', '10.1007/s10755-024-09698-5', '0'],
 ['Emy Decker', '36660678600', '10.1016/j.acalib.2024.102858', '1'],
 ['Emy Decker', '36660678600', nan, '0'],
 ['Emy Decker', '36660678600', '10.1016/j.acalib.2022.102648', '0'],
 ['Emy Decker', '36660678600', '10.1016/j.acalib.2022.102634', '3']]

In [None]:
# Add to DataFrame
cites_df = pd.DataFrame(cites)
cites_df

Unnamed: 0,0,1,2,3
0,Emy Decker,36660678600,10.1007/s10755-024-09698-5,0
1,Emy Decker,36660678600,10.1016/j.acalib.2024.102858,1
2,Emy Decker,36660678600,,0
3,Emy Decker,36660678600,10.1016/j.acalib.2022.102648,0
4,Emy Decker,36660678600,10.1016/j.acalib.2022.102634,3
...,...,...,...,...
66,Sara Whitver,57194760730,10.1016/j.acalib.2020.102136,7
67,Sara Whitver,57194760730,10.1353/pla.2020.0019,5
68,Sara Whitver,57194760730,10.1108/RSR-04-2019-0023,5
69,Sara Whitver,57194760730,10.15760/comminfolit.2017.11.1.41,7
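The DataFrame above has default integer column labels (0 to 3). Passing `columns=` when building it gives the columns descriptive names. The names used below are our own choice for illustration, and the two rows are copied from the output above:

```python
import pandas as pd

# Two rows copied from the cites output shown earlier
sample_cites = [
    ["Emy Decker", "36660678600", "10.1007/s10755-024-09698-5", "0"],
    ["Emy Decker", "36660678600", "10.1016/j.acalib.2024.102858", "1"],
]

# Column names are illustrative, not defined by the Scopus API
sample_cites_df = pd.DataFrame(
    sample_cites, columns=["author", "author_id", "doi", "citedby_count"]
)
print(sample_cites_df.columns.tolist())
```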


### Save Record Data to a File

Here is one method for looping over author queries and saving the Scopus document data for each author to its own file.

In [29]:
# Load a list of author names and Scopus AUIDs
import csv
with open('authors.txt') as infile:
    rows = csv.reader(infile, delimiter='\t')
    author_list = list(rows)
author_list

[['Emy Decker', '36660678600'],
 ['Lindsey Lowry', '57210944451'],
 ['Karen Chapman', '35783926100'],
 ['Kevin Walker', '56133961300'],
 ['Sara Whitver', '57194760730']]

In [30]:
# NOTE: This writes one file for each author dataset
for authorName, authorID in author_list:
    # Create new empty DataFrame on each loop
    df = pd.DataFrame()

    # Set up query parameters
    params = {
        'query': f'AU-ID({authorID})',
        'apiKey': API_KEY,
        'httpAccept': 'application/json'
    }

    try:
        # Make the API request
        response = requests.get(BASE_URL, params=params)
        sleep(2)  
        response.raise_for_status()  # Raise an error for bad responses
        data = response.json()

        # Extract the 'entry' data and convert it to a DataFrame
        if 'entry' in data['search-results']:
            df = pd.DataFrame(data['search-results']['entry'])

        # Save to file
        filename = f"{authorName.replace(' ', '_')}_{authorID}_ScopusData.tsv"
        df.to_csv(filename, sep='\t', index=False)

        print(f"Data for {authorName} saved to {filename}")
    except requests.exceptions.RequestException as e:
        print(f"An error occurred for {authorName}: {e}")

Data for Emy Decker saved to Emy_Decker_36660678600_ScopusData.tsv
Data for Lindsey Lowry saved to Lindsey_Lowry_57210944451_ScopusData.tsv
Data for Karen Chapman saved to Karen_Chapman_35783926100_ScopusData.tsv
Data for Kevin Walker saved to Kevin_Walker_56133961300_ScopusData.tsv
Data for Sara Whitver saved to Sara_Whitver_57194760730_ScopusData.tsv


In [31]:
# Load one of the files into pandas
df_author = pd.read_csv('Karen_Chapman_35783926100_ScopusData.tsv', delimiter='\t')
df_author

Unnamed: 0,@_fa,link,prism:url,dc:identifier,eid,dc:title,dc:creator,prism:publicationName,prism:issn,prism:eIssn,...,prism:aggregationType,subtype,subtypeDescription,source-id,openaccess,openaccessFlag,freetoread,freetoreadLabel,pii,article-number
0,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85159073764,2-s2.0-85159073764,The Ideal Review Process Is a Three-Way Street,Ellinger A.D.,Human Resource Development Review,15344843,15526712.0,...,Journal,ar,Article,7100153132,1,True,"{'value': [{'$': 'all'}, {'$': 'publisherhybri...","{'value': [{'$': 'All Open Access'}, {'$': 'Hy...",,
1,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85120603808,2-s2.0-85120603808,Launching chat service during the pandemic: in...,Decker E.N.,Reference Services Review,907324,,...,Journal,ar,Article,144671,0,False,,,,
2,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85106619646,2-s2.0-85106619646,Characteristics of systematic reviews in the s...,Chapman K.,Journal of Academic Librarianship,991333,,...,Journal,ar,Article,12791,0,False,,,S0099133321000872,102396.0
3,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85073936184,2-s2.0-85073936184,"An evaluation of Web of Science, Scopus and Go...",Chapman K.,International Journal of Logistics Management,9574093,17586550.0,...,Journal,ar,Article,19700201449,0,False,,,,
4,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85065484203,2-s2.0-85065484203,BENCHMARKING MARKETING SCHOLAR PRODUCTIVITY,Chapman K.,Marketing Education Review,10528008,21539987.0,...,Journal,ar,Article,21100887523,0,False,,,,
5,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85018240091,2-s2.0-85018240091,The Impact of the Monographs Crisis on the Fie...,Chapman K.,Journal of Academic Librarianship,991333,,...,Journal,ar,Article,12791,0,False,,,S0099133316303305,
6,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:84955449622,2-s2.0-84955449622,IJPDLM’s 45th anniversary: a retrospective bib...,Ellinger A.,International Journal of Physical Distribution...,9600035,,...,Journal,re,Review,144922,0,False,,,,
7,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:84905729060,2-s2.0-84905729060,"Literature of Behavioral Economics, Part 2: Da...",Chapman K.,Behavioral and Social Sciences Librarian,1639269,15444546.0,...,Journal,ar,Article,12881,0,False,,,,
8,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:84887142927,2-s2.0-84887142927,"Literature of Behavioral Economics, Part 1: In...",Chapman K.,Behavioral and Social Sciences Librarian,1639269,15444546.0,...,Journal,ar,Article,12881,0,False,,,,
9,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:81855199796,2-s2.0-81855199796,Benchmarking leading supply chain management a...,Ellinger A.E.,International Journal of Logistics Management,9574093,17586550.0,...,Journal,ar,Article,19700201449,0,False,,,,


In [32]:
# Summary statistics for the citedby-count column
df_author["citedby-count"].describe()

count    23.000000
mean     10.739130
std      12.274206
min       0.000000
25%       4.000000
50%       6.000000
75%      11.500000
max      45.000000
Name: citedby-count, dtype: float64

In [33]:
# Get info about publication titles
df_author['prism:publicationName'].describe()

count                                           23
unique                                          11
top       Behavioral and Social Sciences Librarian
freq                                             5
Name: prism:publicationName, dtype: object

## 3. Get References via a Title Search

### Number of Title Match Records

In [34]:
# Search Scopus for all references containing 'ChemSpider' in the record title
params = {
    "query": "TITLE(ChemSpider)",
    "apiKey": API_KEY,
    "httpAccept": "application/json"
}

try:
    response = requests.get(BASE_URL, params=params)
    response.raise_for_status()  # Raise an error for bad responses
    data = response.json()
    print(data["search-results"]["opensearch:totalResults"])
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

7


In [35]:
# Repeat this in a loop
titleWord_list = ['ChemSpider', 'PubChem', 'ChEMBL', 'Reaxys', 'SciFinder']

# Get number of Scopus records for each title search
num_records_title = []
for titleWord in titleWord_list:
    # Set up query parameters
    params = {
        "query": f"TITLE({titleWord})",
        "apiKey": API_KEY,
        "httpAccept": "application/json"
    }

    try:
        # Make the API request
        response = requests.get(BASE_URL, params=params)
        response.raise_for_status()  # Raise an error for bad responses
        data = response.json()

        # Extract the total number of results
        numt = data["search-results"]["opensearch:totalResults"]

        # Compile saved Scopus data into a list of lists
        num_records_title.append([titleWord, numt])

        # Delay 1 second between API calls to be nice to Elsevier servers
        sleep(1)
    except requests.exceptions.RequestException as e:
        print(f"An error occurred for {titleWord}: {e}")
        num_records_title.append([titleWord, None])

In [36]:
num_records_title

[['ChemSpider', '7'],
 ['PubChem', '102'],
 ['ChEMBL', '64'],
 ['Reaxys', '9'],
 ['SciFinder', '34']]
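To rank the search terms by number of title matches, the string counts need converting to integers first. Sorting the raw strings would compare them character by character. A small sketch using the values shown above (it assumes no count is `None`, that is, no request failed):

```python
# Values copied from the num_records_title output above
sample_counts = [['ChemSpider', '7'], ['PubChem', '102'], ['ChEMBL', '64'],
                 ['Reaxys', '9'], ['SciFinder', '34']]

# Sort in descending order by the integer value of the count
ranked = sorted(sample_counts, key=lambda row: int(row[1]), reverse=True)
print(ranked)
```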

### Download Title Match Record Data

In [37]:
# Download records and create a list of selected metadata
titleWord_list = ['ChemSpider', 'PubChem', 'ChEMBL', 'Reaxys', 'SciFinder']
scopus_title_data = []

for titleWord in titleWord_list:
    # Set up query parameters
    params = {
        "query": f"TITLE({titleWord})",
        "apiKey": API_KEY,
        "httpAccept": "application/json"
    }

    try:
        # Make the API request
        response = requests.get(BASE_URL, params=params)
        # Delay 1 second between API calls to be nice to Elsevier servers
        sleep(1)
        # Raise an error for bad responses
        response.raise_for_status()  
        data = response.json()

        # Extract the list of 'entry' records (empty if there are none)
        entries = data['search-results'].get('entry', [])
        for entry in entries:
            # Extract relevant metadata
            doi = entry.get('prism:doi', None)
            title = entry.get('dc:title', None)
            coverDate = entry.get('prism:coverDate', None)

            # Append to the list
            scopus_title_data.append([titleWord, doi, title, coverDate])
    except requests.exceptions.RequestException as e:
        print(f"An error occurred for {titleWord}: {e}")
        scopus_title_data.append([titleWord, None, None, None])

In [38]:
# Add to DataFrame
scopus_title_data_df = pd.DataFrame(scopus_title_data)
scopus_title_data_df.rename(columns={0:"titleWord",1: "doi",2: "title", 3: "coverDate"},
                            inplace=True)
scopus_title_data_df

Unnamed: 0,titleWord,doi,title,coverDate
0,ChemSpider,10.1039/c5np90022k,Editorial: ChemSpider-a tool for Natural Produ...,2015-08-01
1,ChemSpider,10.1021/bk-2013-1128.ch020,ChemSpider: How a free community resource of d...,2013-01-01
2,ChemSpider,10.1007/s13361-011-0265-y,"Identification of ""known unknowns"" utilizing a...",2012-01-01
3,ChemSpider,10.1002/9781118026038.ch22,Chemspider: A Platform for Crowdsourced Collab...,2011-05-03
4,ChemSpider,10.1021/ed100697w,Chemspider: An online chemical information res...,2010-11-01
...,...,...,...,...
86,SciFinder,,SciFinder not affordable [1],2006-03-13
87,SciFinder,10.1021/ci050481b,SciFinder Scholar 2006: An empirical analysis ...,2006-01-01
88,SciFinder,10.2174/1570163054064693,Exploration tools for drug discovery and beyon...,2005-06-01
89,SciFinder,10.1021/ed082p652,A literature exercise using SciFinder Scholar ...,2005-01-01
