# Datasets for a given institution

<div style='background:#e7edf7'>
   Query the OpenAlex API to answer the question:
    <blockquote>
        <b><i>How many datasets exist from a given institution?</i></b>
    </blockquote>
</div>
<br>

In [2]:
import pandas as pd
import numpy as np
import requests    # module for executing API calls

## 1. Get datasets from Environment and Climate Change Canada.
The first step in querying OpenAlex is always to build a URL that returns exactly the data we need. To do that, we answer two questions:
1. Which entity type (author, concept, institution, venue, work) do we want data about?  
 --> Since we want metadata about "_datasets_", the entity type is `works`.

2. Which criteria must the works fulfill to fit our purpose?  
    --> We want "_datasets from Environment and Climate Change Canada_", so we filter for works that:
    * were published in the last 10 years (i.e., are recent):  
    `from_publication_date:2012-07-20`
    * are classified as datasets:  
    `type:Dataset`
    * have at least one [authorship](https://docs.openalex.org/about-the-data/work#authorships) affiliation with Environment and Climate Change Canada:  
    `institutions.ror:https://ror.org/026ny0e17`
    * are not [paratext](https://docs.openalex.org/about-the-data/work#is_paratext):  
    `is_paratext:false`

Now we put the URL together from these parts:  
* Starting with the base URL "`https://api.openalex.org/`", we append the entity type: "`https://api.openalex.org/works`"
* All criteria go into the query parameter `filter`, which follows a question mark: "`https://api.openalex.org/works?filter=`"
* Finally, we join the criteria specified above with commas. This becomes our `filter` value: "`https://api.openalex.org/works?filter=institutions.ror:https://ror.org/026ny0e17,type:Dataset,from_publication_date:2012-07-20,is_paratext:false`"

This URL matches all recent datasets from Environment and Climate Change Canada. The API returns them one page at a time (25 results per page by default).
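
The same assembly can be written as a small reusable helper. This is a minimal sketch, assuming the criteria are kept in a dict; `build_works_url` is a hypothetical name, not part of any OpenAlex client. Because OpenAlex combines comma-separated filters with AND, the order of the criteria should not change the result.

```python
BASE_URL = "https://api.openalex.org/"

def build_works_url(criteria: dict) -> str:
    """Build a /works URL whose filter joins key:value pairs with commas."""
    # Each criterion becomes "key:value"; dicts keep insertion order
    filter_value = ",".join(f"{key}:{value}" for key, value in criteria.items())
    return f"{BASE_URL}works?filter={filter_value}"

criteria = {
    "institutions.ror": "https://ror.org/026ny0e17",
    "type": "Dataset",
    "from_publication_date": "2012-07-20",
    "is_paratext": "false",
}
print(build_works_url(criteria))
```

Swapping in another ROR ID is then a one-line change to `criteria`.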

In [3]:
# build the 'filter' parameter
filter_by_institution_id = 'institutions.ror:https://ror.org/026ny0e17'
filter_by_paratext = 'is_paratext:false'
filter_by_type = 'type:Dataset'
filter_by_publication_date = 'from_publication_date:2012-07-20'

all_filters = (filter_by_institution_id, filter_by_paratext, filter_by_type, filter_by_publication_date)
filter_param = f'filter={",".join(all_filters)}'
print(f'filter query parameter:\n  {filter_param}')

# put the URL together
filtered_works_url = f'https://api.openalex.org/works?{filter_param}'
print(f'complete URL:\n  {filtered_works_url} ...')

filter query parameter:
  filter=institutions.ror:https://ror.org/026ny0e17,is_paratext:false,type:Dataset,from_publication_date:2012-07-20
complete URL:
  https://api.openalex.org/works?filter=institutions.ror:https://ror.org/026ny0e17,is_paratext:false,type:Dataset,from_publication_date:2012-07-20 ...


In [4]:
api_response = requests.get(filtered_works_url)
api_response

<Response [200]>
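
A `200` status means the request succeeded, but the body holds only the first page of matches. A sketch of how the remaining pages could be requested, assuming the total number of matches is known (OpenAlex reports it in the response's `meta.count` field) and using the `per_page` (at most 200) and `page` query parameters; `page_urls` is a hypothetical helper. The OpenAlex docs describe cursor paging for result sets beyond the reach of basic paging.

```python
import math

def page_urls(filtered_url: str, total_count: int, per_page: int = 200) -> list[str]:
    """Return one URL per page needed to cover total_count results."""
    # The filtered URL already contains "?", so extra parameters are appended with "&"
    n_pages = math.ceil(total_count / per_page)
    return [f"{filtered_url}&per_page={per_page}&page={page}"
            for page in range(1, n_pages + 1)]

example_url = "https://api.openalex.org/works?filter=type:Dataset"
for url in page_urls(example_url, total_count=450):
    print(url)
```

Each URL could then be passed to `requests.get` in a loop, collecting `['results']` from every page.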

In [5]:
parsed_response = api_response.json()
#parsed_response   # uncomment to see the full (long) data

In [6]:
type(parsed_response)

dict
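
Since `parsed_response` is a plain dict, the answer to the headline question sits in its `meta` block: `meta['count']` is the total number of works matching the filter, while `results` holds only the current page. The sketch below uses a toy dict with the same shape; its values are made up for illustration and are not real OpenAlex output.

```python
def summarize_response(parsed: dict) -> dict:
    """Return the total match count and the number of results on this page."""
    meta = parsed["meta"]
    return {
        "total_matches": meta["count"],          # all works matching the filter
        "on_this_page": len(parsed["results"]),  # at most per_page items
    }

# Toy response shaped like an OpenAlex /works reply (illustrative values only)
toy_response = {
    "meta": {"count": 3, "page": 1, "per_page": 25},
    "results": [{"title": "a"}, {"title": "b"}, {"title": "c"}],
}
print(summarize_response(toy_response))
```

On the real data, `summarize_response(parsed_response)` gives the dataset count for the institution.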

In [17]:
# Title of the first returned dataset (Python indexing starts at 0)
parsed_response['results'][0]['title']

'A framework for validating noninvasive genetic spatial capture-recapture studies for rare and elusive species'

In [18]:
# Publication year of the first dataset
parsed_response['results'][0]['publication_year']

2020
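
Instead of indexing one result at a time, a comprehension can collect the same fields for every dataset on the page. A minimal sketch, using the `title` and `publication_year` keys accessed above; `title_year_rows` is a hypothetical helper and the input list is toy data.

```python
def title_year_rows(results: list[dict]) -> list[tuple]:
    """Collect (publication_year, title) pairs, newest first."""
    rows = [(work.get("publication_year"), work.get("title")) for work in results]
    # Missing years (None) are treated as 0, so they sort last
    return sorted(rows, key=lambda row: row[0] or 0, reverse=True)

toy_results = [
    {"title": "Dataset A", "publication_year": 2018},
    {"title": "Dataset B", "publication_year": 2021},
    {"title": "Dataset C", "publication_year": None},
]
for year, title in title_year_rows(toy_results):
    print(year, title)
```

With the `pandas` import from the first cell, `pd.DataFrame(title_year_rows(parsed_response['results']), columns=['year', 'title'])` turns the rows into a table.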

In [8]:
# Host venue of the first dataset
parsed_response['results'][0]['host_venue']

{'id': None,
 'issn_l': None,
 'issn': None,
 'display_name': 'Authorea',
 'publisher': None,
 'type': 'publisher',
 'url': 'https://doi.org/10.22541/au.158636346.65395646',
 'is_oa': True,
 'version': 'publishedVersion',
 'license': 'cc-by'}
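
`host_venue` is how the API exposed the hosting venue when this notebook was run. Newer OpenAlex responses, as far as we know, describe it under `primary_location`, whose `source` holds the venue. A defensive accessor that tries both layouts is sketched below; the key names under `primary_location` are an assumption to check against the current docs, and `venue_name` is hypothetical.

```python
def venue_name(work: dict):
    """Return the hosting venue's display name, or None if unavailable."""
    # Assumed newer layout: primary_location -> source -> display_name
    location = work.get("primary_location") or {}
    source = location.get("source") or {}
    if source.get("display_name"):
        return source["display_name"]
    # Older layout, as in the output above: host_venue -> display_name
    return (work.get("host_venue") or {}).get("display_name")

print(venue_name({"host_venue": {"display_name": "Authorea"}}))
print(venue_name({"primary_location": {"source": {"display_name": "Zenodo"}}}))
print(venue_name({}))
```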

In [9]:
# Check how many times this work has been cited
parsed_response['results'][0]['cited_by_count']

0
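
A single `cited_by_count` says little on its own; summing it over the page and picking the most-cited work gives a quick overview. A minimal sketch on toy data; `citation_summary` is a hypothetical helper, and on real data it covers only the page that was fetched.

```python
def citation_summary(results: list[dict]) -> dict:
    """Total citations on this page and the title of the most-cited work."""
    if not results:
        return {"total_citations": 0, "most_cited": None}
    counts = [work.get("cited_by_count", 0) for work in results]
    top = max(results, key=lambda work: work.get("cited_by_count", 0))
    return {"total_citations": sum(counts), "most_cited": top.get("title")}

toy_results = [
    {"title": "Dataset A", "cited_by_count": 0},
    {"title": "Dataset B", "cited_by_count": 4},
    {"title": "Dataset C", "cited_by_count": 1},
]
print(citation_summary(toy_results))
```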

In [None]:
## Note: it is unclear whether the query worked as intended.
## The DOI URL of the first returned "dataset" resolves to a publication, not a dataset.
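
One way to follow up on that doubt is to list each result's `type` next to its `doi`, so every DOI can be opened and checked by hand, and to count the `type` values on the page. A minimal sketch; `type_report` is a hypothetical helper, and the toy records (including their DOIs and type strings) are placeholders, not real OpenAlex data.

```python
from collections import Counter

def type_report(results: list[dict]) -> Counter:
    """Print type and DOI for each work and return a count of the type values."""
    for work in results:
        # Opening the DOI URL in a browser shows what it actually resolves to
        print(work.get("type"), work.get("doi"))
    return Counter(work.get("type") for work in results)

toy_results = [
    {"type": "dataset", "doi": "https://doi.org/10.0000/example-a"},
    {"type": "dataset", "doi": "https://doi.org/10.0000/example-b"},
    {"type": "posted-content", "doi": "https://doi.org/10.0000/example-c"},
]
print(type_report(toy_results))
```

On the real data, `type_report(parsed_response['results'])` prints the DOIs to inspect.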