## Using Elsevier APIs

This notebook provides code that you can adapt to use with the [Scopus Search API](https://dev.elsevier.com/documentation/ScopusSearchAPI.wadl) and the [Abstract Retrieval API](https://dev.elsevier.com/documentation/AbstractRetrievalAPI.wadl). It includes:
1. Setting up access to Elsevier APIs.
2. Constructing a search template for sending different kinds of searches to the Scopus Search API (e.g., keyword, author name, ISSN).
3. Using an ISSN search with date limiters to send a larger search to the Scopus Search API, and paging through the results. 
4. Exploring JSON data that is returned from the API. 
5. Sending DOIs to the Abstract Retrieval API to retrieve more robust metadata.

To begin, we need to import a few Python libraries to work with the APIs and the returned data.

In [44]:
import pandas as pd # allows us to work with tabular data
import requests # to send the API requests to Elsevier
import json # to read the JSON data that is returned by the APIs
import pickle # pickle files are a good way to save data for reuse in Python
from datetime import datetime # we'll use datetime to interpret the API response for when our API limit resets

### Elsevier APIs
To use these APIs you'll need to register for an API key via the [Elsevier Developer site](https://dev.elsevier.com/). A note on access from the Elsevier site:

>Anyone can request an API Key to use Elsevier APIs. Access at no charge is available to researchers in academic, public-sector and not-for-profit institutions. Free access is only available for non-commercial use and provided Elsevier's policies for using APIs and the data are honoured.

>Full API access is only granted to researchers affiliated to organisations that have subscriptions to the corresponding Elsevier product.

If you are affiliated with an institution with Elsevier access, make sure you send your API request from an on-campus IP address or use a VPN. Elsevier doesn't offer a way to sign in with campus credentials, so you'll need to run this from within an approved IP range for access to non-open-access content.

Once you have an API key, you can save it to a variable below. Make sure you don't save your API key anywhere public (such as on GitHub).

In [2]:
api_key = '6101173765c12924236cea2dddf74014'
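
One way to keep the key out of a shared notebook is to read it from an environment variable instead of typing it into a cell. The sketch below assumes a variable named `ELSEVIER_API_KEY`; that name and the `load_api_key` helper are choices made for this example, not something Elsevier requires.

```python
import os

def load_api_key(env_var='ELSEVIER_API_KEY'):
    '''Return the API key stored in an environment variable.
    The variable name is an assumption made for this example.'''
    key = os.environ.get(env_var, '').strip()
    if not key:
        raise RuntimeError(f'Set the {env_var} environment variable before running this notebook.')
    return key

# usage (after setting the variable in your shell or system settings):
# api_key = load_api_key()
```

With this approach the notebook can be committed to a public repository without exposing the key.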

### Scopus Search API: Constructing a search query
To add search parameters to our API search we'll start with a few fields. To skip a field, leave its value unchanged: ```0``` for the year fields or an empty string (```''```) for everything else. To use a field in your search, assign it a value.

- See [Elsevier's documentation](https://dev.elsevier.com/sc_search_tips.html) to find field names to add more search parameters and to format your searches below.
- You can also [test Scopus queries](https://dev.elsevier.com/scopus.html#!/Scopus_Search/ScopusSearch) using their interactive API. This is a great way to see how search string parameters are added to the URLs sent to the API.

In [25]:
# search parameters: add values to the dictionary key:value pairs below
# leave any fields that you don't want to use as either 0 or ''

search_string_parameters = {
    'ISSN' : '', # Limit results to a single journal ISSN (add as a string). Leave = '' if no value.
    'AUTHOR-NAME' : '', # Search the author name field. lastname, firstname. e.g., Noble, Safiya
    'KEY' : '', # Add a term to search in the keyword fields (e.g., author keywords and index terms)
    'PUBLISHER' : '', # Add a publisher name. e.g., Springer
    'EXACTSRCTITLE' : 'libraries', # Add keywords that appear in the journal, book, or conference title. e.g., Informatics
    'TITLE' : '', # Add keywords that appear in the article or chapter title. Can use AND, OR, and AND NOT. e.g. cat AND dog 
    'start_year' : 2010, # Limit your search to items published after this year. Leave = 0 if no value. YYYY , e.g., 1995
    'end_year' : 0 # Limit your search to items published before this year. Leave = 0 if no value. YYYY, e.g., 2020
}

The cell below builds a search string from the values entered above. It checks whether each key in ```search_string_parameters``` (e.g., ISSN) has a value assigned to it. If there is a value, it is added to ```search_string``` using the proper syntax, and multiple conditions are joined with AND. If none of the keys have values, ```search_string``` will be empty.

In [26]:
def search_builder(search_string_parameters):
    ''' Concatenates a search string query formatted for the Scopus Search API.
    - search_string_parameters expects a python dictionary with keys aligned to API search fields.
    '''
    clauses = []

    for k, v in search_string_parameters.items():
        if not v:
            continue  # skip fields left as 0 or ''
        if k == 'start_year':
            clauses.append(f'PUBYEAR > {v}')
        elif k == 'end_year':
            clauses.append(f'PUBYEAR < {v}')
        else:
            clauses.append(f'{k}({v})')

    # join the individual clauses with AND; returns '' if no fields were set
    return ' AND '.join(clauses)

In [27]:
search_string = search_builder(search_string_parameters)

In [28]:
print(search_string)

EXACTSRCTITLE(libraries) AND PUBYEAR > 2010
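
Note that `PUBYEAR > 2010` is a strict comparison, so it matches items published from 2011 onward. If you would rather think in inclusive year ranges, a small helper can shift the bounds for you. This is a minimal sketch; `inclusive_year_clause` is a hypothetical name and is not used elsewhere in this notebook.

```python
def inclusive_year_clause(first_year, last_year):
    '''Build a PUBYEAR clause that includes both first_year and last_year.'''
    if first_year > last_year:
        raise ValueError('first_year must not be later than last_year')
    # PUBYEAR > and PUBYEAR < are strict, so widen each bound by one year
    return f'PUBYEAR > {first_year - 1} AND PUBYEAR < {last_year + 1}'

print(inclusive_year_clause(2011, 2020))
# PUBYEAR > 2010 AND PUBYEAR < 2021
```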


Now we'll create two functions that we can use to interact with the Scopus Search API. The first, ```create_url```, uses the ```search_string``` and ```api_key``` variables we defined above to format the URL for an API call. The second, ```connect_to_endpoint```, sends the request to Elsevier and introduces a ```next_``` parameter that we'll use to page through results when there are more than 25 results for our search.

In [34]:
def create_url(search_string):
    '''Accepts a formatted search string that will be added to the Scopus Search API URL. 
    Requires a global api_key variable.
    Formats and returns a URL to send to the Scopus Search API.'''
    
    query = f'{search_string}'
    url_template = 'https://api.elsevier.com/content/search/scopus?query={query}&apiKey={api_key}'
    full_url = url_template.format(query=query, api_key=api_key)
    return full_url

def connect_to_endpoint(full_url, params=None, next_ = '*'):
    '''Accepts a Scopus Search API URL, optional extra query parameters, and a page cursor;
    Sends the request to the Scopus Search API and raises an error if it fails;
    Returns the ['search-results'] portion of r.json() along with the response headers.'''
    
    # copy the parameters so a dictionary passed in by the caller is never modified
    params = dict(params or {})
    params['cursor'] = next_
    r = requests.get(full_url, params=params)
    r.raise_for_status()
    return r.json()['search-results'], r.headers
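
`requests` percent-encodes the query when it sends the request, so the spaces and parentheses in `search_string` need no special handling. As an alternative to placing the key in the URL, Elsevier also accepts it in an `X-ELS-APIKey` request header, which keeps the key out of any URL you print or log. The sketch below only assembles the pieces of such a request without sending it; `build_search_request` and `SEARCH_ENDPOINT` are hypothetical names for this example, and `'YOUR_KEY'` is a placeholder.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.elsevier.com/content/search/scopus'

def build_search_request(search_string, api_key, cursor='*'):
    '''Return (url, params, headers) for a Scopus Search call without sending it.'''
    params = {'query': search_string, 'cursor': cursor}
    headers = {'X-ELS-APIKey': api_key, 'Accept': 'application/json'}
    return SEARCH_ENDPOINT, params, headers

url, params, headers = build_search_request(
    'EXACTSRCTITLE(libraries) AND PUBYEAR > 2010', 'YOUR_KEY')

# preview of the encoded query string (requests may encode a few characters differently)
print(url + '?' + urlencode(params))

# to send it: requests.get(url, params=params, headers=headers)
```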

#### Example Search
Once you've assigned your API key and at least one search field above, and run all of the code preceding, you can send a search to the API. 

In [35]:
full_url = create_url(search_string)
r_json, r_headers = connect_to_endpoint(full_url)

Our function returns the JSON response from the API call (```r_json```) along with the headers from the response (```r_headers```). The latter has some useful information about our API key limits. We're probably most interested in the ```X-RateLimit-Limit``` (how many calls we can make per week to the API), the ```X-RateLimit-Remaining``` (how many we have left in the week), and the ```X-RateLimit-Reset``` (when the week counter resets).


In [45]:
print('Limit:', r_headers['X-RateLimit-Limit'], 
      '\nRemaining:', r_headers['X-RateLimit-Remaining'], 
      '\nResets on:', datetime.fromtimestamp(int(r_headers['X-RateLimit-Reset'])))

Limit: 20000 
Remaining: 19964 
Resets on: 2023-05-10 00:20:57
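
If you check these headers often, it can help to wrap the parsing in a small function. The sketch below assumes the three `X-RateLimit-*` headers are present and that the reset value is a Unix timestamp in seconds, as in the cell above. `rate_limit_status` is a hypothetical helper and the header values are made up for the example. Unlike `datetime.fromtimestamp()` without a timezone, which reports local time, this version reports the reset time in UTC.

```python
from datetime import datetime, timezone

def rate_limit_status(headers):
    '''Summarize the X-RateLimit-* response headers as a dictionary.'''
    reset = datetime.fromtimestamp(int(headers['X-RateLimit-Reset']), tz=timezone.utc)
    return {
        'limit': int(headers['X-RateLimit-Limit']),
        'remaining': int(headers['X-RateLimit-Remaining']),
        'resets_at_utc': reset,
    }

# made-up header values for illustration
example_headers = {
    'X-RateLimit-Limit': '20000',
    'X-RateLimit-Remaining': '19964',
    'X-RateLimit-Reset': '1683678057',
}
status = rate_limit_status(example_headers)
print(status['remaining'], 'requests left; resets at', status['resets_at_utc'].isoformat())
```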


Neither function prints anything, but ```connect_to_endpoint``` calls ```r.raise_for_status()```, so the request succeeded as long as no error was raised. We can confirm by looking at the r_json object that we collected. The keys of the dictionary will show us what kind of data is available.

In [9]:
r_json.keys()

dict_keys(['opensearch:totalResults', 'opensearch:itemsPerPage', 'opensearch:Query', 'cursor', 'link', 'entry'])

Let's first see how many search results there were, and how many of those were returned by our query.

In [10]:
print("Total results:", r_json['opensearch:totalResults'], 
      "\nResults collected:", r_json['opensearch:itemsPerPage'])

Total results: 8851 
Results collected: 25


It looks like we're only getting 25 results per page. We will page through more results in the next section but let's look at some of the data we got back from the query first. We can find that in the ```entry``` key. Since we know there are 25 results, let's just look at the first one to begin:

In [11]:
r_json['entry'][0]

{'@_fa': 'true',
 'link': [{'@_fa': 'true',
   '@ref': 'self',
   '@href': 'https://api.elsevier.com/content/abstract/scopus_id/85152294109'},
  {'@_fa': 'true',
   '@ref': 'author-affiliation',
   '@href': 'https://api.elsevier.com/content/abstract/scopus_id/85152294109?field=author,affiliation'},
  {'@_fa': 'true',
   '@ref': 'scopus',
   '@href': 'https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85152294109&origin=inward'},
  {'@_fa': 'true',
   '@ref': 'scopus-citedby',
   '@href': 'https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85152294109&origin=inward'}],
 'prism:url': 'https://api.elsevier.com/content/abstract/scopus_id/85152294109',
 'dc:identifier': 'SCOPUS_ID:85152294109',
 'eid': '2-s2.0-85152294109',
 'dc:title': 'Academic Map Library Weeding – Thoughts and Guidelines Developed from Two Experiences',
 'dc:creator': 'Chandler M.',
 'prism:publicationName': 'Association of Canadian Map Libraries and Archives Bulletin',
 'prism:issn': '0840933

To get a sense of the fields that we have access to we can also just look at the keys related to each entry:

In [12]:
r_json['entry'][0].keys()

dict_keys(['@_fa', 'link', 'prism:url', 'dc:identifier', 'eid', 'dc:title', 'dc:creator', 'prism:publicationName', 'prism:issn', 'prism:eIssn', 'prism:issueIdentifier', 'prism:pageRange', 'prism:coverDate', 'prism:coverDisplayDate', 'prism:doi', 'citedby-count', 'affiliation', 'prism:aggregationType', 'subtype', 'subtypeDescription', 'source-id', 'openaccess', 'openaccessFlag', 'freetoread', 'freetoreadLabel'])

And we can print specific fields by referencing those keys:

In [13]:
print('Title:', r_json['entry'][0]['dc:title'], 
      '\nCreator:', r_json['entry'][0]['dc:creator'], 
      '\nPublication:', r_json['entry'][0]['prism:publicationName'],
      '\nDate:', r_json['entry'][0]['prism:coverDate'],
      '\nISSN:', r_json['entry'][0]['prism:issn'], 
      '\nDOI:', r_json['entry'][0]['prism:doi'])

Title: Academic Map Library Weeding – Thoughts and Guidelines Developed from Two Experiences 
Creator: Chandler M. 
Publication: Association of Canadian Map Libraries and Archives Bulletin 
Date: 2023-12-01 
ISSN: 08409331 
DOI: 10.15353/ACMLA.N171.5291
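
Not every entry carries every key. In the dataframe built later in this notebook, the `prism:volume` and `pubmed-id` cells are empty for some rows. Indexing with square brackets raises a `KeyError` when a key is absent, so `dict.get()` is a safer way to read optional fields. The sketch below uses a made-up entry; `summarize_entry` is a hypothetical helper.

```python
def summarize_entry(entry, fields=('dc:title', 'dc:creator', 'prism:publicationName',
                                   'prism:coverDate', 'prism:issn', 'prism:doi')):
    '''Return the requested fields, using None for any that are missing.'''
    return {field: entry.get(field) for field in fields}

# made-up entry that lacks a DOI and an ISSN
sample_entry = {'dc:title': 'An example title', 'dc:creator': 'Author A.'}
summary = summarize_entry(sample_entry)
print(summary['dc:title'], '|', summary['prism:doi'])
# An example title | None
```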


A better way to view and work with this data is to add it to a dataframe so we can see all of the articles as rows, with columns for each field. 

In [14]:
df = pd.DataFrame(r_json['entry'])

# check the first three rows
df.head(3)

Unnamed: 0,@_fa,link,prism:url,dc:identifier,eid,dc:title,dc:creator,prism:publicationName,prism:issn,prism:eIssn,...,prism:aggregationType,subtype,subtypeDescription,source-id,openaccess,openaccessFlag,freetoread,freetoreadLabel,prism:volume,pubmed-id
0,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85152294109,2-s2.0-85152294109,Academic Map Library Weeding – Thoughts and Gu...,Chandler M.,Association of Canadian Map Libraries and Arch...,8409331,25612263,...,Journal,ar,Article,27508,1,True,"{'value': [{'$': 'all'}, {'$': 'publisherhybri...","{'value': [{'$': 'All Open Access'}, {'$': 'Hy...",,
1,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85152292606,2-s2.0-85152292606,Learning from a Distance: Assessing the impact...,Mondésir G.,Association of Canadian Map Libraries and Arch...,8409331,25612263,...,Journal,ar,Article,27508,1,True,"{'value': [{'$': 'all'}, {'$': 'publisherhybri...","{'value': [{'$': 'All Open Access'}, {'$': 'Hy...",,
2,True,"[{'@_fa': 'true', '@ref': 'self', '@href': 'ht...",https://api.elsevier.com/content/abstract/scop...,SCOPUS_ID:85152267964,2-s2.0-85152267964,Geographic digital divide - urban/rural issues...,Chandler M.,Association of Canadian Map Libraries and Arch...,8409331,25612263,...,Journal,ar,Article,27508,1,True,"{'value': [{'$': 'all'}, {'$': 'publisherhybri...","{'value': [{'$': 'All Open Access'}, {'$': 'Hy...",,


There are some columns that are left out of the display above (see the ellipsis in the center of the dataframe). Let's take a look at the full column list. 

In [15]:
df.columns

Index(['@_fa', 'link', 'prism:url', 'dc:identifier', 'eid', 'dc:title',
       'dc:creator', 'prism:publicationName', 'prism:issn', 'prism:eIssn',
       'prism:issueIdentifier', 'prism:pageRange', 'prism:coverDate',
       'prism:coverDisplayDate', 'prism:doi', 'citedby-count', 'affiliation',
       'prism:aggregationType', 'subtype', 'subtypeDescription', 'source-id',
       'openaccess', 'openaccessFlag', 'freetoread', 'freetoreadLabel',
       'prism:volume', 'pubmed-id'],
      dtype='object')

We can view a subset of the dataframe to make it easier to scan columns of interest.

In [16]:
df[['dc:title', 'prism:publicationName','prism:coverDate', 'dc:creator']]

| | dc:title | prism:publicationName | prism:coverDate | dc:creator |
|---|---|---|---|---|
| 0 | Academic Map Library Weeding – Thoughts and Gu... | Association of Canadian Map Libraries and Arch... | 2023-12-01 | Chandler M. |
| 1 | Learning from a Distance: Assessing the impact... | Association of Canadian Map Libraries and Arch... | 2023-12-01 | Mondésir G. |
| 2 | Geographic digital divide - urban/rural issues... | Association of Canadian Map Libraries and Arch... | 2023-12-01 | Chandler M. |
| 3 | Bulletin Report – GIS Days 2022 The culminatio... | Association of Canadian Map Libraries and Arch... | 2023-12-01 | Berish F. |
| 4 | A change of art Learning research strategies i... | College and Research Libraries News | 2023-04-01 | Sheets L.A. |
| 5 | Status in academic libraries Seeking solidarit... | College and Research Libraries News | 2023-04-01 | Bignoli C. |
| 6 | Calligraphy art without boundaries Reviving hi... | College and Research Libraries News | 2023-04-01 | Ching S. |
| 7 | What’s missing? The role of community colleges... | College and Research Libraries News | 2023-04-01 | Wacha M. |
| 8 | Charged up What I learned on a day without power | College and Research Libraries News | 2023-04-01 | Deuink A. |
| 9 | 2021 ACRL Academic Library Trends and Statisti... | College and Research Libraries News | 2023-04-01 | Taylor L.R. |
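
Some columns, such as `link`, hold a list of dictionaries instead of a single value. A small function can pull one value out of that list so it can be stored in its own column. The sketch below uses two of the link dictionaries from the first entry shown earlier; `link_by_ref` is a hypothetical helper.

```python
def link_by_ref(links, ref):
    '''Return the '@href' of the first link whose '@ref' matches, or None.'''
    for link in links or []:
        if link.get('@ref') == ref:
            return link.get('@href')
    return None

sample_links = [
    {'@_fa': 'true', '@ref': 'self',
     '@href': 'https://api.elsevier.com/content/abstract/scopus_id/85152294109'},
    {'@_fa': 'true', '@ref': 'scopus',
     '@href': 'https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85152294109&origin=inward'},
]
print(link_by_ref(sample_links, 'scopus'))

# with the dataframe from this notebook:
# df['scopus_url'] = df['link'].apply(lambda links: link_by_ref(links, 'scopus'))
```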


### Query by journal ISSN
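Here's an example where we page through more than 25 search results by asking for all of the articles from a specific ISSN within a date range. Because the ```PUBYEAR >``` and ```PUBYEAR <``` comparisons exclude the years given, a ```start_year``` of 2013 and an ```end_year``` of 2018 return articles published from 2014 through 2017. Let's re-assign our search parameters: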

In [48]:
search_string_parameters = {
    'ISSN' : '23301643', # the ISSN for Journal of the Association for Information Science and Technology (JASIST)
    'AUTHOR-NAME' : '', 
    'KEY' : '', 
    'PUBLISHER' : '', 
    'EXACTSRCTITLE' : '', 
    'TITLE' : '', 
    'start_year' : 2013, 
    'end_year' : 2018 
}
search_string = search_builder(search_string_parameters)
print(search_string)

ISSN(23301643) AND PUBYEAR > 2013 AND PUBYEAR < 2018


This time we want to send our API call repeatedly, once for every page of search results (25 at a time). We can use a while statement to keep calling the API and collecting results until the ```r_json['cursor']['@next']``` value is equal to the ```r_json['cursor']['@current']``` value (meaning there are no more new results behind the @next token).

Before we send each request, we also want to make sure we're following [Elsevier's throttling rates](https://dev.elsevier.com/api_key_settings.html) so that we're not running into our weekly limit or sending more calls per second than are allowed. The default settings for the Scopus Search API are a weekly quota of 20,000 requests and a rate of 9 requests per second. We can import the time module and use time.sleep() to pause for 0.12 seconds on each iteration of the while loop, which keeps us under 9 requests per second. The loop below compares the total number of results with the remaining quota, which is a conservative check because each request returns up to 25 results.
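
To get a rough sense of what a search will cost before running it, you can work out the number of requests and the minimum pause between them. This sketch uses the page size of 25 and the rate of 9 requests per second mentioned above; `paging_plan` is a hypothetical helper. The loop may also make one extra request at the end to confirm that the cursor has stopped advancing.

```python
import math

def paging_plan(total_results, page_size=25, max_requests_per_second=9):
    '''Return (number of requests needed, minimum delay in seconds between requests).'''
    requests_needed = math.ceil(total_results / page_size)
    min_delay = 1 / max_requests_per_second
    return requests_needed, min_delay

requests_needed, min_delay = paging_plan(8851)
print(requests_needed, round(min_delay, 3))
# 355 0.111
```

A pause of 0.12 seconds is slightly longer than the 0.111 second minimum, which leaves a small safety margin.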

In [49]:
import time

In [60]:
# create default values for variable to track during while statement
next_ = '*'
flag = True

# when there are no more results we'll set the flag to false, stopping the while statement
while flag:
    # pause for .12 seconds
    time.sleep(0.12)
    
    #create url and send API call
    full_url = create_url(search_string)
    r_json, r_headers = connect_to_endpoint(full_url, next_ = next_)

    # track number of results
    total_results = int(r_json['opensearch:totalResults'])
    
    # if on first page of results save to new dataframe
    if r_json['cursor']['@current'] == '*':
        print('Collecting', total_results, 'results.')
        print('Limit:', r_headers['X-RateLimit-Limit'], 
          '\nRemaining:', r_headers['X-RateLimit-Remaining'], 
          '\nResets on:', datetime.fromtimestamp(int(r_headers['X-RateLimit-Reset'])))
        df = pd.DataFrame(r_json['entry'])
        
        # if there are more results available than are remaining in your weekly limit, stop the while loop
        if total_results > int(r_headers['X-RateLimit-Remaining']):
            print("\n** Too many results to collect this week - stopping loop. **")
            break 
    
    # if we're on the last page of results, change flag to False and end While statement
    elif r_json['cursor']['@next'] == r_json['cursor']['@current']:
        print('Loop done. Collected', len(df), 'rows.')
        flag = False
    
    # otherwise add result to existing df and continue
    else:
        df_add = pd.DataFrame(r_json['entry'])
        df = pd.concat([df, df_add])
    
    # update the next_ variable for the next iteration through the while statement
    next_ = r_json['cursor']['@next']

Collecting 802 results.
Limit: 20000 
Remaining: 19926 
Resets on: 2023-05-10 00:20:57
Loop done. Collected 802 rows.
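
An alternative way to structure the paging is to gather every page's entries in a plain list and build the dataframe once at the end. The sketch below separates the paging logic from the API call so it can run offline against made-up pages. `collect_entries` and `fake_pages` are hypothetical names, and the assumption that the final page is empty with an unchanged cursor is based on the stopping rule described above, not on Elsevier's documentation.

```python
def collect_entries(fetch_page, start_cursor='*'):
    '''Follow the cursor from page to page and return all entries in one list.
    fetch_page(cursor) must return a dict shaped like the 'search-results' object.'''
    entries = []
    cursor = start_cursor
    while True:
        results = fetch_page(cursor)
        page = results.get('entry', [])
        entries.extend(page)
        next_cursor = results['cursor']['@next']
        # stop when the page is empty or the cursor no longer advances
        if not page or next_cursor == cursor:
            break
        cursor = next_cursor
    return entries

# stand-in for the API: two pages of made-up entries, then an empty page
fake_pages = {
    '*': {'entry': [{'eid': 'a'}, {'eid': 'b'}], 'cursor': {'@current': '*', '@next': 'c1'}},
    'c1': {'entry': [{'eid': 'c'}], 'cursor': {'@current': 'c1', '@next': 'c2'}},
    'c2': {'entry': [], 'cursor': {'@current': 'c2', '@next': 'c2'}},
}
entries = collect_entries(fake_pages.__getitem__)
print(len(entries))
# 3
```

With the functions in this notebook, `fetch_page` could be a small function that sleeps for 0.12 seconds and then returns `connect_to_endpoint(full_url, next_=cursor)[0]`. Calling `pd.DataFrame(entries)` on the result gives a dataframe with a single continuous index, whereas concatenating page by page repeats the index values 0 to 24 for each page.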


We can check the number of rows in the dataframe to make sure it matches the total number of results:

In [61]:
len(df)

802

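We can also call a subset of the dataframe columns to look at interesting metadata, and sort the results by the ```citedby-count``` field. Note that the API returns ```citedby-count``` as text, so this sort compares the values character by character (```'98'``` sorts above ```'100'```). Convert the column to a numeric type first if you need a true ranking of the most highly cited articles.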

In [62]:
df[['dc:title', 'prism:coverDate', 'dc:creator', 'citedby-count']].sort_values(by='citedby-count', ascending=False)[0:10]

| | dc:title | prism:coverDate | dc:creator | citedby-count |
|---|---|---|---|---|
| 23 | The knowledge base and research front of infor... | 2014-05-01 | Zhao D. | 98 |
| 24 | Patent overlay mapping: Visualizing technologi... | 2014-12-01 | Kay L. | 98 |
| 16 | Multidimensional assessment of scholarly resea... | 2015-10-01 | Moed H.F. | 95 |
| 22 | F1000 recommendations as a potential new data ... | 2014-03-01 | Waltman L. | 94 |
| 2 | Comparing grounded theory and topic modeling: ... | 2017-06-01 | Baumer E.P.S. | 94 |
| 3 | Open-access repositories worldwide, 2005-2012:... | 2014-12-01 | Pinfield S. | 92 |
| 24 | When are readership counts as useful as citati... | 2016-01-01 | Maflahi N. | 92 |
| 10 | Map of science with topic modeling: Comparison... | 2016-10-01 | Suominen A. | 91 |
| 10 | User engagement in online News: Under the scop... | 2014-10-01 | Arapakis I. | 90 |
| 18 | Can Mendeley bookmarks reflect readership? A s... | 2016-05-01 | Mohammadi E. | 90 |
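
Because the citation counts arrive as text, a descending sort puts them in character order, not numeric order. The short example below uses made-up counts to show the difference.

```python
counts = ['98', '95', '100', '9', '250']  # made-up citation counts stored as text

print(sorted(counts, reverse=True))
# ['98', '95', '9', '250', '100']   <- character-by-character order

print(sorted(counts, key=int, reverse=True))
# ['250', '100', '98', '95', '9']   <- numeric order
```

In pandas, the equivalent step is to convert the column before sorting, for example `df['citedby-count'] = df['citedby-count'].astype(int)`, and then call `sort_values` as above.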


### Save the results
After collecting data it's a good idea to save it to a pickle file which can be read into Python later on.

In [63]:
with open('api_results.pickle', 'wb') as handle:
    pickle.dump(df, handle, protocol=pickle.HIGHEST_PROTOCOL)

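And here's how you can load the pickle file back into a Python variable to use in a different notebook or in a future session (after you stop this kernel). Only load pickle files that you created or trust, because unpickling a file can run arbitrary code.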

In [64]:
with open('api_results.pickle', 'rb') as handle:
    articles_df = pickle.load(handle)

# check to make sure the pickle file is the exact same as the original dataframe
print('The dataframes are equal:', df.equals(articles_df))

The dataframes are equal: True
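
If you want a copy of the data that other tools can read, the raw entries can also be written to a JSON file with the standard library. This is a minimal sketch with a made-up entry standing in for `r_json['entry']`; the file name and the use of the system temporary folder are choices made for this example.

```python
import json
import os
import tempfile

# made-up stand-in for the list of entries in r_json['entry']
entries = [{'dc:title': 'An example title', 'citedby-count': '3'}]

path = os.path.join(tempfile.gettempdir(), 'api_results_example.json')

# write the entries, keeping non-ASCII characters (e.g., accented names) readable
with open(path, 'w', encoding='utf-8') as handle:
    json.dump(entries, handle, ensure_ascii=False, indent=2)

# read them back and confirm nothing changed
with open(path, encoding='utf-8') as handle:
    reloaded = json.load(handle)

print('The entries are equal:', reloaded == entries)
```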


### Abstract Retrieval API: DOI search

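Let's redefine our create_url and connect_to_endpoint functions to work specifically with DOIs in the Abstract Retrieval API. Running the cell below replaces the Scopus Search versions defined earlier, so re-run that earlier cell if you want to send another search.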

Since the only metadata we need to send to the Abstract Retrieval API is an article DOI, we'll change the create_url function to work with DOIs. We'll also modify connect_to_endpoint to add a header that asks for the data back in JSON format, and we'll remove the need to page through results (each DOI query should only find one match).

In [65]:
def create_url(doi):
    """Accepts a DOI as a string that will be added to the Abstract Retrieval API URL. 
    Requires a global api_key variable.
    Formats and returns a URL to send to the Abstract Retrieval API.
    """
    doi = f'{doi}'
    url_template = 'https://api.elsevier.com/content/abstract/doi/{doi}?&apiKey={api_key}'
    full_url = url_template.format(doi=doi, api_key=api_key)
    return full_url

def connect_to_endpoint(full_url):
    '''Input full_url from create_url function;
    Send request to Scopus Abstract Retrieval API
    Returns r.json response;'''
    
    r = requests.get(full_url, headers =  {'Accept': 'application/json'})
    r.raise_for_status()
    return r.json(), r.headers
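
DOIs occasionally contain characters such as `<`, `>`, or `#` that have a special meaning in URLs. One option is to percent-encode the DOI before placing it in the path. The sketch below is an assumption about how to handle such cases, not a documented Elsevier requirement. `create_doi_url` is a hypothetical helper, the second DOI is made up, and the API key would still need to be added as a query parameter or a header.

```python
from urllib.parse import quote

ABSTRACT_DOI_ENDPOINT = 'https://api.elsevier.com/content/abstract/doi/'

def create_doi_url(doi):
    '''Return the Abstract Retrieval URL for a DOI, percent-encoding unsafe characters.'''
    # keep '/' as-is so the DOI prefix and suffix stay readable
    return ABSTRACT_DOI_ENDPOINT + quote(doi.strip(), safe='/')

print(create_doi_url('10.1109/TIT.2008.928267'))
# https://api.elsevier.com/content/abstract/doi/10.1109/TIT.2008.928267

print(create_doi_url('10.1234/example<1>#a'))  # made-up DOI with special characters
# https://api.elsevier.com/content/abstract/doi/10.1234/example%3C1%3E%23a
```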

You can load your own list of DOIs to work with here. The example below uses a random sample of 25 DOIs from LIS journals over the last 20 years. First we'll load the DOIs from a CSV file.

In [70]:
# random sample of 25 DOIs from LIS journals
dois = pd.read_csv('doi_sample.csv')
dois.head()

| | doi |
|---|---|
| 0 | 10.1109/TIT.2008.928267 |
| 1 | 10.1109/TIT.2004.838092 |
| 2 | 10.1109/TIT.2015.2504967 |
| 3 | 10.1108/GKMC-09-2021-0151 |
| 4 | 10.1080/14778238.2020.1860663 |


In [75]:
dois.loc[0, 'doi']

'10.1109/TIT.2008.928267'

We can test the code using a single DOI from the list. First we'll build the URL:

In [84]:
full_url = create_url(dois.loc[0, 'doi'])
print(full_url)

https://api.elsevier.com/content/abstract/doi/10.1109/TIT.2008.928267?&apiKey=6101173765c12924236cea2dddf74014


Then we can make the API request. You might notice that your API rate limit is different for the Abstract Retrieval API, and that the number of requests you can make for this API doesn't count against the requests you made for the Scopus Search API:

In [129]:
r_json, r_headers = connect_to_endpoint(full_url)
print('Limit:', r_headers['X-RateLimit-Limit'], 
      '\nRemaining:', r_headers['X-RateLimit-Remaining'], 
      '\nResets on:', datetime.fromtimestamp(int(r_headers['X-RateLimit-Reset'])))

Limit: 60000 
Remaining: 59994 
Resets on: 2023-05-10 02:00:26
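
Once a single DOI works, the same call can be repeated for the whole list. Some DOIs may not be found, so it helps to record failures and keep going. The sketch below takes the fetching function as an argument so that it can run offline with a stand-in. `collect_abstracts` and `fake_fetch` are hypothetical names, and the 0.12 second default pause is borrowed from the search section and is not a documented limit for this API.

```python
import time

def collect_abstracts(doi_list, fetch, delay=0.12):
    '''Call fetch(doi) for each DOI and return (records, failures) as dictionaries.'''
    records, failures = {}, {}
    for doi in doi_list:
        try:
            records[doi] = fetch(doi)
        except Exception as err:  # e.g., an HTTP error for a DOI that is not found
            failures[doi] = str(err)
        time.sleep(delay)  # pause between requests
    return records, failures

# stand-in for the API so the sketch runs offline
def fake_fetch(doi):
    if doi.startswith('10.9999/'):
        raise ValueError('404 Not Found')
    return {'coredata': {'prism:doi': doi}}

records, failures = collect_abstracts(
    ['10.1109/TIT.2008.928267', '10.9999/missing'], fake_fetch, delay=0)
print(len(records), 'collected;', len(failures), 'failed')
# 1 collected; 1 failed
```

With the functions defined above, `fetch` could be `lambda doi: connect_to_endpoint(create_url(doi))[0]['abstracts-retrieval-response']`, and the DOI list could be `dois['doi']`.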


Let's take a closer look at the r_json object by listing the dictionary keys:

In [130]:
r_json.keys()

dict_keys(['abstracts-retrieval-response'])

We can load the record into a dataframe, though many of the cells contain nested key:value pairs that are a little difficult to read.

In [131]:
r_json = r_json['abstracts-retrieval-response']
abstract_df = pd.json_normalize(r_json)
abstract_df

Unnamed: 0,affiliation,item.ait:process-info.ait:status.@state,item.ait:process-info.ait:status.@type,item.ait:process-info.ait:status.@stage,item.ait:process-info.ait:date-delivered.@day,item.ait:process-info.ait:date-delivered.@timestamp,item.ait:process-info.ait:date-delivered.@year,item.ait:process-info.ait:date-delivered.@month,item.ait:process-info.ait:date-sort.@day,item.ait:process-info.ait:date-sort.@year,...,coredata.openaccessFlag,coredata.prism:doi,coredata.prism:issn,coredata.prism:startingPage,coredata.dc:identifier,idxterms.mainterm,language.@xml:lang,authkeywords.author-keyword,subject-areas.subject-area,authors.author
0,"[{'affiliation-city': 'College Park', '@id': '...",update,core,S300,12,2022-04-12T17:27:35.000035-04:00,2022,4,15,2008,...,,10.1109/TIT.2008.928267,189448,4372,SCOPUS_ID:51349103946,"[{'$': 'Capacity region', '@weight': 'a', '@ca...",eng,"[{'@_fa': 'true', '$': 'Capacity region'}, {'@...","[{'@_fa': 'true', '$': 'Information Systems', ...","[{'ce:given-name': 'Nan', 'preferred-name': {'..."


Let's print out the cell values from each column to get a better sense of all of the data available in the JSON response.

In [134]:
for col in abstract_df.columns:
    print(col, '\n', abstract_df.loc[0, col], '\n')

affiliation 
 [{'affiliation-city': 'College Park', '@id': '60078684', 'affilname': 'A. James Clark School of Engineering', '@href': 'https://api.elsevier.com/content/affiliation/affiliation_id/60078684', 'affiliation-country': 'United States'}, {'affiliation-city': 'Palo Alto', '@id': '60012708', 'affilname': 'Stanford University', '@href': 'https://api.elsevier.com/content/affiliation/affiliation_id/60012708', 'affiliation-country': 'United States'}] 

item.ait:process-info.ait:status.@state 
 update 

item.ait:process-info.ait:status.@type 
 core 

item.ait:process-info.ait:status.@stage 
 S300 

item.ait:process-info.ait:date-delivered.@day 
 12 

item.ait:process-info.ait:date-delivered.@timestamp 
 2022-04-12T17:27:35.000035-04:00 

item.ait:process-info.ait:date-delivered.@year 
 2022 

item.ait:process-info.ait:date-delivered.@month 
 04 

item.ait:process-info.ait:date-sort.@day 
 15 

item.ait:process-info.ait:date-sort.@year 
 2008 

item.ait:process-info.ait:date-sort.@mont

We can save a subset of the columns to a new dataframe to make the data a little easier to work with:

In [139]:
abstract_df_min = abstract_df[['affiliation',
       'item.bibrecord.head.source.publicationdate.year',
       'item.bibrecord.tail.bibliography.@refcount',
       'coredata.prism:issueIdentifier', 
       'coredata.dc:description', 'coredata.prism:coverDate',
       'coredata.prism:aggregationType', 'coredata.prism:url',
       'coredata.subtypeDescription',
       'coredata.prism:publicationName', 
       'coredata.citedby-count', 'coredata.prism:volume', 
       'coredata.prism:pageRange', 'coredata.dc:title',
       'coredata.openaccessFlag', 'coredata.prism:doi', 'coredata.prism:issn',
       'authors.author']]

In [140]:
abstract_df_min

Unnamed: 0,affiliation,item.bibrecord.head.source.publicationdate.year,item.bibrecord.tail.bibliography.@refcount,coredata.prism:issueIdentifier,coredata.dc:description,coredata.prism:coverDate,coredata.prism:aggregationType,coredata.prism:url,coredata.subtypeDescription,coredata.prism:publicationName,coredata.citedby-count,coredata.prism:volume,coredata.prism:pageRange,coredata.dc:title,coredata.openaccessFlag,coredata.prism:doi,coredata.prism:issn,authors.author
0,"[{'affiliation-city': 'College Park', '@id': '...",2008,13,9,We provide a single-letter characterization fo...,2008-09-15,Journal,https://api.elsevier.com/content/abstract/scop...,Article,IEEE Transactions on Information Theory,24,54,4372-4378,The capacity region of a class of discrete deg...,,10.1109/TIT.2008.928267,189448,"[{'ce:given-name': 'Nan', 'preferred-name': {'..."


In [111]:
# the abstract is available in a few different places
print(r_json['item']['bibrecord']['head']['abstracts'])
print(r_json['coredata']['dc:description'])

We provide a single-letter characterization for the capacity region of a class of discrete degraded interference channels (DDICs). The class of DDICs considered includes the DADIC studied by Benzel in 1979. We show that for the class of DDICs studied, encoder cooperation does not enlarge the capacity region, and therefore, the capacity region of the class of DDICs is the same as the capacity region of the corresponding degraded broadcast channel. © 2008 IEEE.
We provide a single-letter characterization for the capacity region of a class of discrete degraded interference channels (DDICs). The class of DDICs considered includes the DADIC studied by Benzel in 1979. We show that for the class of DDICs studied, encoder cooperation does not enlarge the capacity region, and therefore, the capacity region of the class of DDICs is the same as the capacity region of the corresponding degraded broadcast channel. © 2008 IEEE.
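
Nested fields such as `authors.author` need a little care, because a value like an author's `affiliation` may arrive as a single dictionary when there is one affiliation and as a list when there are several. The sketch below normalizes both shapes before reading them. The sample is trimmed from this record's author data; `as_list` and `author_summary` are hypothetical helpers.

```python
def as_list(value):
    '''Wrap a single item in a list; pass lists through; treat None as empty.'''
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def author_summary(author):
    '''Return the indexed name and affiliation ids for one author dictionary.'''
    return {
        'name': author.get('ce:indexed-name'),
        'affiliation_ids': [aff.get('@id') for aff in as_list(author.get('affiliation'))],
    }

# trimmed sample: the first author has a list of affiliations, the second a single dictionary
sample_authors = [
    {'ce:indexed-name': 'Liu N.', 'affiliation': [{'@id': '60078684'}, {'@id': '60012708'}]},
    {'ce:indexed-name': 'Ulukus S.', 'affiliation': {'@id': '60078684'}},
]
for author in sample_authors:
    print(author_summary(author))
# {'name': 'Liu N.', 'affiliation_ids': ['60078684', '60012708']}
# {'name': 'Ulukus S.', 'affiliation_ids': ['60078684']}
```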


In [107]:
r_json['authors']

{'author': [{'ce:given-name': 'Nan',
   'preferred-name': {'ce:given-name': 'Nan',
    'ce:initials': 'N.',
    'ce:surname': 'Liu',
    'ce:indexed-name': 'Liu N.'},
   '@seq': '1',
   'ce:initials': 'N.',
   '@_fa': 'true',
   'affiliation': [{'@id': '60078684',
     '@href': 'https://api.elsevier.com/content/affiliation/affiliation_id/60078684'},
    {'@id': '60012708',
     '@href': 'https://api.elsevier.com/content/affiliation/affiliation_id/60012708'}],
   'ce:surname': 'Liu',
   '@auid': '36195904400',
   'author-url': 'https://api.elsevier.com/content/author/author_id/36195904400',
   'ce:indexed-name': 'Liu N.'},
  {'ce:given-name': 'Sennur',
   'preferred-name': {'ce:given-name': 'Sennur',
    'ce:initials': 'S.',
    'ce:surname': 'Ulukus',
    'ce:indexed-name': 'Ulukus S.'},
   '@seq': '2',
   'ce:initials': 'S.',
   '@_fa': 'true',
   'affiliation': {'@id': '60078684',
    '@href': 'https://api.elsevier.com/content/affiliation/affiliation_id/60078684'},
   'ce:surname': '