**Accessing and Using OpenAlex**
     
    


Downloading the entire snapshot to your own computer is convenient for bulk analysis, but storing and querying it requires a lot of disk space (about 300 GB) and computational power. When computational resources are limited, accessing OpenAlex through the official API may be a better choice.

In [4]:
import requests
import json

The API (Application Programming Interface) is a convenient way to get OpenAlex data. To use it, we send requests to the server that hosts the OpenAlex data. One way to do that is with the ```requests``` package imported above.

To get a single entity from OpenAlex, we construct a URL of the form:
```
https://api.openalex.org/<entity_name>/<entity_id>
```    
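
The template above can be wrapped in a small helper. This is only a sketch: the names `API_BASE` and `entity_url` are hypothetical and are not part of OpenAlex or of ```requests```.

```python
# Base address taken from the URL template above.
API_BASE = "https://api.openalex.org"


def entity_url(entity_name: str, entity_id: str) -> str:
    """Return the URL of a single entity, e.g. works/W2320228714."""
    # Strip stray slashes so the pieces join cleanly.
    return f"{API_BASE}/{entity_name.strip('/')}/{entity_id.strip('/')}"


print(entity_url("works", "W2320228714"))
# https://api.openalex.org/works/W2320228714
```

The returned string can be passed directly to ```requests.get```.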
    
An API request returns a JSON response, which we parse and save as a Python dictionary object.

As an example, let's retrieve a research work using the API:

In [7]:
work_W2320228714 = requests.get(
    'https://api.openalex.org/works/W2320228714'
).json()

In [8]:
work_W2320228714

{'id': 'https://openalex.org/W2320228714',
 'doi': 'https://doi.org/10.18632/oncotarget.8338',
 'title': 'Upregulation of long intergenic noncoding RNA 00673 promotes tumor proliferation via LSD1 interaction and repression of NCALD in non-small-cell lung cancer',
 'display_name': 'Upregulation of long intergenic noncoding RNA 00673 promotes tumor proliferation via LSD1 interaction and repression of NCALD in non-small-cell lung cancer',
 'publication_year': 2016,
 'publication_date': '2016-03-24',
 'ids': {'openalex': 'https://openalex.org/W2320228714',
  'doi': 'https://doi.org/10.18632/oncotarget.8338',
  'mag': '2320228714',
  'pmid': 'https://pubmed.ncbi.nlm.nih.gov/27027352',
  'pmcid': 'https://www.ncbi.nlm.nih.gov/pmc/articles/5041926'},
 'host_venue': {'id': 'https://openalex.org/V126644158',
  'issn_l': '1949-2553',
  'issn': ['1949-2553'],
  'display_name': 'Oncotarget',
  'publisher': 'Impact Journals, LLC',
  'type': 'publisher',
  'url': 'https://doi.org/10.18632/oncotarget

In [9]:
print('OpenAlex ID: ' + work_W2320228714['id'] + '\n')
print('Title: ' + work_W2320228714['title'] + '\n')
print('Publication Year: ' + str(work_W2320228714['publication_year']) + '\n')
print('First Author: ' + [a['author']['display_name'] for a in work_W2320228714['authorships'] if (a['author_position'] == 'first')][0] + '\n')
print('Cited by: ' + str(work_W2320228714['cited_by_count']))

OpenAlex ID: https://openalex.org/W2320228714

Title: Upregulation of long intergenic noncoding RNA 00673 promotes tumor proliferation via LSD1 interaction and repression of NCALD in non-small-cell lung cancer

Publication Year: 2016

First Author: Xuefei Shi

Cited by: 53
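
The list comprehension used to find the first author raises an ```IndexError``` if a work has no authorship marked ```'first'```. A more defensive variant might look like the sketch below. The helper name `first_author_name` is hypothetical, and `sample_work` is a hand-written stand-in containing only the fields the helper reads, not a real API response.

```python
def first_author_name(work: dict):
    """Return the display name of the first author, or None if there is none."""
    for authorship in work.get("authorships", []):
        if authorship.get("author_position") == "first":
            return authorship.get("author", {}).get("display_name")
    return None


# Hand-written stand-in for a work record (placeholder second author).
sample_work = {
    "authorships": [
        {"author_position": "first", "author": {"display_name": "Xuefei Shi"}},
        {"author_position": "last", "author": {"display_name": "Placeholder Author"}},
    ]
}

print(first_author_name(sample_work))          # Xuefei Shi
print(first_author_name({"authorships": []}))  # None
```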


We can do the same with other entities. For example, let's retrieve a venue, this time a random one:

In [10]:
random_venue = requests.get(
    'https://api.openalex.org/venues/random'
).json()

In [11]:
random_venue

{'id': 'https://openalex.org/V4210211405',
 'issn_l': '0794-067X',
 'issn': ['0794-067X'],
 'display_name': 'Nigeria journal of business administration',
 'publisher': 'African Journals Online',
 'works_count': 15,
 'cited_by_count': 13,
 'is_oa': False,
 'is_in_doaj': False,
 'homepage_url': None,
 'ids': {'openalex': 'https://openalex.org/V4210211405',
  'issn_l': '0794-067X',
  'issn': ['0794-067X']},
 'x_concepts': [{'id': 'https://openalex.org/C144133560',
   'wikidata': 'https://www.wikidata.org/wiki/Q4830453',
   'display_name': 'Business',
   'level': 0,
   'score': 93.3},
  {'id': 'https://openalex.org/C162324750',
   'wikidata': 'https://www.wikidata.org/wiki/Q8134',
   'display_name': 'Economics',
   'level': 0,
   'score': 66.7},
  {'id': 'https://openalex.org/C205649164',
   'wikidata': 'https://www.wikidata.org/wiki/Q1071',
   'display_name': 'Geography',
   'level': 0,
   'score': 26.7},
  {'id': 'https://openalex.org/C127413603',
   'wikidata': 'https://www.wikidata.org

In many cases, instead of getting only one thing at a time, we want to get a list of things. In the code cell below, we ask for all concepts instead of specifying a concept by its ID. The API returns a meta object with details about the query, along with a long list of concepts.

In [12]:
requests.get(
    'https://api.openalex.org/concepts'
).json()

{'meta': {'count': 65073,
  'db_response_time_ms': 15,
  'page': 1,
  'per_page': 25},
 'results': [{'id': 'https://openalex.org/C41008148',
   'wikidata': 'https://www.wikidata.org/wiki/Q21198',
   'display_name': 'Computer science',
   'level': 0,
   'description': 'theoretical study of the formal foundation enabling the automated processing or computation of information, for example on a computer or over a data transmission network',
   'works_count': 41229216,
   'cited_by_count': 202345450,
   'ids': {'openalex': 'https://openalex.org/C41008148',
    'wikidata': 'https://www.wikidata.org/wiki/Q21198',
    'mag': '41008148',
    'wikipedia': 'https://en.wikipedia.org/wiki/Computer%20science',
    'umls_cui': ['C0599726']},
   'image_url': 'https://upload.wikimedia.org/wikipedia/commons/6/6a/Sorting_quicksort_anim.gif',
   'image_thumbnail_url': 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/Sorting_quicksort_anim.gif/100px-Sorting_quicksort_anim.gif',
   'international'
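
The ```meta``` object shows that this response is page 1 with 25 results per page, so it holds only the first 25 of the 65073 concepts. As a rough sketch, the number of pages needed to cover the full list can be worked out from ```meta``` alone. The helper name `page_count` is hypothetical, and how to request later pages is not covered here.

```python
import math


def page_count(meta: dict) -> int:
    """Number of pages needed to cover meta['count'] results."""
    return math.ceil(meta["count"] / meta["per_page"])


# The meta object from the response above.
meta = {"count": 65073, "db_response_time_ms": 15, "page": 1, "per_page": 25}

print(page_count(meta))  # 2603
```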

To get a meaningful list of entity objects, we can add parameters that ```filter```, ```search``` and ```sort``` the returned results.

Filter parameters are formatted as ```filter=attribute:value,attribute2:value2```.
For instance, to get all the level-0 concepts:

In [13]:
level_zero_concepts = requests.get(
    'https://api.openalex.org/concepts?filter=level:0'
).json()

In [14]:
for c in level_zero_concepts['results']:
    print(c['display_name'])

Computer science
Medicine
Chemistry
Psychology
Biology
Political science
Materials science
Art
Business
Geography
Physics
Environmental science
Mathematics
Philosophy
Sociology
History
Geology
Engineering
Economics


Venues that have published more than 1,000 research works:

In [15]:
big_venues = requests.get(
    'https://api.openalex.org/venues?filter=works_count:>1000'
).json()

In [16]:
for v in big_venues['results']:
    print(v['display_name'])

Research Papers in Economics
ChemInform
Social Science Research Network
Lecture Notes in Computer Science
The Lancet
BMJ
Nature
Science
Social Science Research Network
Notes and Queries
PLOS ONE
JAMA
Bulletin of the American Physical Society
Reactions Weekly
Journal of the American Chemical Society
Journal of Biological Chemistry
Scientific American
Choice Reviews Online
Chemical & Engineering News
Journal of physics
Scientific Reports
Blood
Journal of the Acoustical Society of America
Proceedings of the National Academy of Sciences of the United States of America
bioRxiv


Institutions outside the US (the ```!``` prefix negates a value):

In [17]:
non_us_institutions = requests.get(
    'https://api.openalex.org/institutions?filter=country_code:!us'
).json()

In [18]:
for i in non_us_institutions['results']:
    print(i['display_name'])

University of Tokyo
University of Toronto
University of Oxford
University of Cambridge
Tsinghua University
University of British Columbia
Kyoto University
University College London
Imperial College London
Tohoku University
Universidade de São Paulo
Zhejiang University
Osaka University
University of Liège
Shanghai Jiao Tong University
University of Alberta
McGill University
Sapienza University of Rome
National University of Singapore
University of Melbourne
Kyushu University
University of Manchester
University of Queensland
University of Sydney
National Autonomous University of Mexico
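
The three filters above all follow the ```attribute:value``` pattern, so the filter string can be assembled from a dictionary. The sketch below assumes a hypothetical helper named `build_filter`. The last call combines two conditions purely to illustrate the comma-separated form and was not run against the API here.

```python
def build_filter(conditions: dict) -> str:
    """Join attribute/value pairs into 'attribute:value,attribute2:value2'."""
    return ",".join(f"{attribute}:{value}" for attribute, value in conditions.items())


print(build_filter({"level": 0}))              # level:0
print(build_filter({"works_count": ">1000"}))  # works_count:>1000
print(build_filter({"country_code": "!us", "works_count": ">1000"}))
# country_code:!us,works_count:>1000
```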


We can also search within a field by appending ```.search``` to its name in the filter:

In [19]:
authors_named_leibniz = requests.get('https://api.openalex.org/authors?filter=display_name.search:leibniz').json()

In [20]:
for a in authors_named_leibniz['results']:
    print(a['display_name'])

Gottfried Wilhelm Leibniz
Leibniz, Gottfried Wilhelm, Freiherr von
Leibniz Hang
Leibniz-Informationszentrum Wirtschaft
Freiherr von Leibniz
Leibniz, Gottfried Wilhelm, Freiherr von
Gottfried Wilhelm Leibniz
Gottfried Wilhelm Leibniz
Leibniz
Wilhelm Leibniz
Leibniz Universit
Gottfried Wilhelm Leibniz
Leibniz-Informationszentrum Wirtschaft
Otto Leibniz
Leibniz-Informationszentrum Wirtschaft
Gottfried Wilhelm Leibniz
Leibniz, Gottfried Wilhelm, Freiherr von
Gottfried Wilhelm Leibniz
Gottfried Wilhelm Leibniz
Leibniz
Gottfried Wilhelm Leibniz
Gottfried Wilhelm Leibniz
Leibniz, Gottfried Wilhelm, Freiherr von
Leibniz-Institut für Länderkunde Leipzig
Gottfried Wilhelm Leibniz


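Sorting can be combined with filtering. Here we retrieve works published after 2010, ordered by citation count from highest to lowest:

In [21]: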
most_popular_after_2010 = requests.get(
    'https://api.openalex.org/works?sort=cited_by_count:desc&filter=publication_year:>2010'
).json()

In [22]:
for w in most_popular_after_2010['results']:
    print('Work Name: ' + w['display_name'])
    print('Cited Times: ' + str(w['cited_by_count']) + '\n')

Work Name: Deep Residual Learning for Image Recognition
Cited Times: 63980

Work Name: Diagnostic and Statistical Manual of Mental Disorders
Cited Times: 52577

Work Name: Diagnostic and Statistical Manual of Mental Disorders
Cited Times: 42813

Work Name: Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries
Cited Times: 42631

Work Name: R: A language and environment for statistical computing.
Cited Times: 42322

Work Name: Hallmarks of Cancer: The Next Generation
Cited Times: 42182

Work Name: Short-Term Effects of Nose-Only Cigarette Smoke Exposure on Glutathione Redox Homeostasis, Cytochrome P450 1A1/2 and Respiratory Enzyme Activities in Mice Tissues
Cited Times: 36154

Work Name: Deep learning
Cited Times: 35518

Work Name: Fitting Linear Mixed-Effects Models Using<b>lme4</b>
Cited Times: 34517

Work Name: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
Cited Times: 34282

Work Name:
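
The query strings so far were written by hand. As an alternative sketch, a list URL can be built from keyword arguments with the standard library's ```urllib.parse.urlencode```. The helper name `list_url` is hypothetical. The ```safe``` argument keeps ```:```, ```<```, ```>```, ```!``` and ```,``` readable instead of percent-encoding them.

```python
from urllib.parse import urlencode


def list_url(entity_name: str, **params: str) -> str:
    """Build a list-query URL such as .../works?sort=...&filter=..."""
    # Leave the characters used by filter and sort expressions unescaped.
    query = urlencode(params, safe=":<>!,")
    return f"https://api.openalex.org/{entity_name}?{query}"


print(list_url("works", sort="cited_by_count:desc", filter="publication_year:>2010"))
# https://api.openalex.org/works?sort=cited_by_count:desc&filter=publication_year:>2010
```

The result is the same URL that the cell above passes to ```requests.get```.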

On some occasions, we want to group the returned results into facets. The ```group_by``` parameter is available for this task.
For example, to see the distribution of open access status in OpenAlex:

In [23]:
works_oa = requests.get('https://api.openalex.org/works?group_by=oa_status').json()

In [24]:
works_oa

{'meta': {'count': 6, 'db_response_time_ms': 645, 'page': 1, 'per_page': 200},
 'results': [],
 'group_by': [{'key': 'unknown',
   'key_display_name': 'unknown',
   'count': 109642813},
  {'key': 'closed', 'key_display_name': 'closed', 'count': 92427848},
  {'key': 'bronze', 'key_display_name': 'bronze', 'count': 13501332},
  {'key': 'gold', 'key_display_name': 'gold', 'count': 12923090},
  {'key': 'green', 'key_display_name': 'green', 'count': 8469943},
  {'key': 'hybrid', 'key_display_name': 'hybrid', 'count': 4328335}]}
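
Raw counts are easier to compare as proportions. The sketch below converts a ```group_by``` response into shares of the total. The helper name `group_shares` is hypothetical, and the dictionary repeats only the ```group_by``` part of the output above, so no request is needed.

```python
def group_shares(response: dict) -> dict:
    """Map each group key to its share of the total count."""
    groups = response["group_by"]
    total = sum(group["count"] for group in groups)
    return {group["key"]: group["count"] / total for group in groups}


# The group_by section of the response shown above.
works_oa = {
    "group_by": [
        {"key": "unknown", "key_display_name": "unknown", "count": 109642813},
        {"key": "closed", "key_display_name": "closed", "count": 92427848},
        {"key": "bronze", "key_display_name": "bronze", "count": 13501332},
        {"key": "gold", "key_display_name": "gold", "count": 12923090},
        {"key": "green", "key_display_name": "green", "count": 8469943},
        {"key": "hybrid", "key_display_name": "hybrid", "count": 4328335},
    ]
}

for key, share in group_shares(works_oa).items():
    print(f"{key}: {share:.1%}")
```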

We can also combine ```group_by``` with ```filter```. For example, to see the distribution of open access status for journal articles in OpenAlex:

In [25]:
journal_works_oa = requests.get('https://api.openalex.org/works?filter=type:journal-article&group_by=oa_status').json()

In [26]:
journal_works_oa

{'meta': {'count': 6, 'db_response_time_ms': 514, 'page': 1, 'per_page': 200},
 'results': [],
 'group_by': [{'key': 'closed',
   'key_display_name': 'closed',
   'count': 62256401},
  {'key': 'unknown', 'key_display_name': 'unknown', 'count': 24958836},
  {'key': 'gold', 'key_display_name': 'gold', 'count': 12399310},
  {'key': 'bronze', 'key_display_name': 'bronze', 'count': 11803095},
  {'key': 'green', 'key_display_name': 'green', 'count': 5949625},
  {'key': 'hybrid', 'key_display_name': 'hybrid', 'count': 3291267}]}