# Exploring the Altmetrics footprint of NCAR Data DOIs 

### Keith E. Maull<sup>1</sup> and Matt Mayernik<sup>2</sup>

##### February 1, 2017

#### NCAR Library, National Center for Atmospheric Research
1. kmaull@ucar.edu 
2. mayernik@ucar.edu

## Summary
NCAR has minted thousands of DOIs for datasets over the last several years.  With increasing interest in Altmetrics (Twitter, blogs, news, etc.), we explore the footprint (if any) of the DOIs that have been minted, starting from a simple question:

* What (if any) is the [Altmetric](https://www.altmetric.com/) profile of the NCAR Dataset DOIs?

This implementation note walks through the basic extraction of all of the NCAR DOIs (prefix `10.5065`) and the lookup of their Altmetric scores.

**EXECUTION REQUIREMENTS**
* active network connection
* Python 2.7.x
* Python [Requests](https://duckduckgo.com/?q=python+requests&t=opera&ia=about) library


## Results

The output files for this code are as follows (these files will not exist if you do not run the code):

| Filename | Description |
|----|----|
| [ncar_data_dois_01262017.txt](./ncar_data_dois_01262017.txt)| The output file with all of the NCAR dataset DOIs |
| [ncar_data_dois_altmetric_scores.txt](./ncar_data_dois_altmetric_scores.txt)  | The output file with each DOI and its Altmetric score.  In this exploration, only 3 DOIs have a score. |

## Extracting all NCAR DOIs from `datacite.org`

Using the DataCite endpoint at `https://search.datacite.org/api`, we execute a simple query to build a list of all of the current NCAR dataset DOIs:

In [1]:
import requests
import json
import time

def make_ncar_data_doi_file(output_file="ncar_data_dois_01262017.txt"):
    # Query every DOI under the NCAR prefix (10.5065) and ask for JSON output
    HTTP_DATACITE_ENDPOINT = "https://search.datacite.org/api?q=*&fq=prefix:10.5065&wt=json&indent=true&fl=doi,title,description,publicationYear,contributor"

    with open(output_file, "w") as fo:
        # start is the page index, rows the page size, and status the
        # number of DOIs returned by the most recent page
        start, status, rows = 0, 1, 500

        # Keep paging until a page comes back empty
        while status > 0:
            r = requests.get(HTTP_DATACITE_ENDPOINT+"&rows={}&start={}".format(rows, start*rows))

            if r.status_code == 200:
                payload = r.json()
                dois = ["{}\n".format(o['doi']) for o in payload['response']['docs']]
                fo.writelines(dois)

                status = len(dois)
                start += 1
                print("{}/[{}]".format(start*rows, status))
            # Pause between requests; a non-200 response retries the same page
            time.sleep(2)
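
The paging logic above can be separated from the HTTP call, which makes the stop condition easier to see and to check offline. The following is a minimal sketch, not the code used for this note: `iter_pages`, `fetch_page`, `max_failures`, and the fake index are hypothetical names introduced for illustration, and the fake fetcher stands in for the DataCite request.

```python
def iter_pages(fetch_page, rows=500, max_failures=3):
    """Yield lists of records, page by page, until an empty page is returned.

    fetch_page(offset, rows) is a caller-supplied (hypothetical) function that
    returns a list of records, or None when the request failed.
    """
    offset, failures = 0, 0
    while True:
        page = fetch_page(offset, rows)
        if page is None:
            # Failed request: retry the same offset, but give up eventually
            failures += 1
            if failures >= max_failures:
                raise RuntimeError("giving up at offset {}".format(offset))
            continue
        failures = 0
        if not page:
            return  # an empty page means every record has been seen
        yield page
        offset += rows


# Offline stand-in for the remote index: 1,200 fake DOIs, no network needed
FAKE_INDEX = ["10.5065/FAKE{:04d}".format(i) for i in range(1200)]

def fake_fetch(offset, rows):
    return FAKE_INDEX[offset:offset + rows]

page_sizes = [len(page) for page in iter_pages(fake_fetch, rows=500)]
print(page_sizes)
```

Bounding the number of consecutive failures is a design choice of this sketch; it avoids looping forever if the endpoint is unreachable.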

## Using the `altmetric` API to get the score

It is a simple matter to get the [altmetric score](https://altmetric.com) from the JSON payload.  For example:

```python
import requests

# Look up a single DOI and print its Altmetric score, if one exists
d = "10.5065/D62J68XR"
r = requests.get("https://api.altmetric.com/v1/doi/{}".format(d))
if r.status_code == 200:
    payload = r.json()
    print(payload['score'])
```
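
The status check in the example matters because most DOIs will not return a score. As a hedged sketch of how the possible outcomes could be told apart, the helper below maps a status code and payload to a label. It assumes that the Altmetric API answers `404` for a DOI it has no data on and `429` when a client is rate limited; both should be confirmed against the current API documentation. The name `interpret_altmetric_response` and the outcome labels are hypothetical.

```python
def interpret_altmetric_response(status_code, payload=None):
    """Return an (outcome, score) pair for one Altmetric lookup.

    The status-code meanings are assumptions about the API's behaviour,
    not something verified in this note.
    """
    if status_code == 200 and payload is not None:
        return ("scored", payload.get("score"))
    if status_code == 404:
        return ("no_data", None)       # assumed: no Altmetric record for this DOI
    if status_code == 429:
        return ("rate_limited", None)  # assumed: too many requests, retry later
    return ("error", None)


print(interpret_altmetric_response(200, {"score": 10}))
print(interpret_altmetric_response(404))
```

Separating "no data" from "error" makes it possible to retry only the lookups that actually failed.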

Next, we want a method that goes through the list of DOIs, looks up each score with the altmetric API, and stores it.  We will use the default input and output files given in the signature of `generate_score_file()`.

In [2]:
import requests
import time

def get_altmetric_score(d):
    r = requests.get("https://api.altmetric.com/v1/doi/{}".format(d))
    if r.status_code == 200:
        return r.json()['score']
    else:
        return None

def generate_score_file(in_file="ncar_data_dois_01262017.txt", 
                        out_file="ncar_data_dois_altmetric_scores.txt",
                        start=0,
                        count=100):    
    with open(in_file) as fi, open(out_file, "a") as fo:
        dois = [l.strip() for l in fi.readlines()][start:start+count]

        for d in dois:
            score = get_altmetric_score(d)
            if score is not None:
                # One "doi,score" record per line
                fo.write("{},{}\n".format(d, score))
                print("{}:<{}>".format(d, score))
            else:
                print("{}:--".format(d))
            # Pause between lookups to stay gentle on the API
            time.sleep(2)
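
Once the score file exists, it can be read back and ranked. The sketch below assumes the file holds one `doi,score` pair per line; `parse_score_lines` is a hypothetical helper, and the sample lines reuse the three scores reported in the results table of this note rather than reading the real file.

```python
def parse_score_lines(lines):
    """Parse 'doi,score' lines into (doi, score) pairs, highest score first."""
    scores = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last comma so the DOI itself is left untouched
        doi, score = line.rsplit(",", 1)
        scores.append((doi, float(score)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Sample records in the assumed file format
sample = [
    "10.5065/D6WD3XH5,0.25\n",
    "10.5065/D6PZ56TX,6.064\n",
    "10.5065/D62J68XR,10\n",
]
ranked = parse_score_lines(sample)
for doi, score in ranked:
    print("{}\t{}".format(doi, score))
```

To use it on the real output, pass an open file handle, for example `parse_score_lines(open("ncar_data_dois_altmetric_scores.txt"))`.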

Now let's put it all together ...

In [None]:
make_ncar_data_doi_file()
generate_score_file(start=0, count=700)

## Sample results

The results of this investigation are not surprising: only a handful of the dataset DOIs have any Altmetric footprint.  Here are a few statistics:

* 4163 total dataset DOIs
* 3 DOIs found with altmetric footprint:

| DOI | Dataset |Score |
|:---:|:-----:|:-----:|
| [10.5065/D6WD3XH5](http://dx.doi.org/10.5065/D6WD3XH5)|NCAR Command Language|0.25|
| [10.5065/D6PZ56TX](http://dx.doi.org/10.5065/D6PZ56TX)|Bridging data lifecycles: Tracking data use via data citations workshop report|6.064|
| [10.5065/D62J68XR](http://dx.doi.org/10.5065/D62J68XR)|Cloud Properties from ISCCP and PATMOS-x Corrected for Spurious Variability Related to Changes in Satellite Orbits, Instrument Calibrations, and Other Factors|10|
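
To put these counts in proportion, the share of DOIs with a footprint can be computed directly. This is a small sketch that assumes all 4163 DOIs were looked up; the run cell above shows a single batch of 700, so the true denominator depends on how many batches were executed.

```python
total_dois = 4163  # total dataset DOIs reported above
with_score = 3     # DOIs found with an Altmetric footprint

# Multiply by 100.0 first so the division is floating point on Python 2.7 too
coverage_pct = 100.0 * with_score / total_dois
print("{:.3f}% of dataset DOIs have an Altmetric score".format(coverage_pct))
```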

Here are the Altmetric badges for the three DOIs, in order of increasing score:

In [25]:
%%html
<script type='text/javascript' src='https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js'></script>
<div class='altmetric-embed' data-badge-type='medium-donut' data-badge-details='right' data-doi="10.5065/D6WD3XH5"></div>

In [26]:
%%html
<script type='text/javascript' src='https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js'></script>
<div class='altmetric-embed' data-badge-type='medium-donut' data-badge-details='right' data-doi="10.5065/D6PZ56TX"></div>

In [27]:
%%html
<script type='text/javascript' src='https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js'></script>
<div class='altmetric-embed' data-badge-type='medium-donut' data-badge-details='right' data-doi="10.5065/D62J68XR"></div>
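
The three cells above differ only in the `data-doi` attribute, so the markup could also be generated from a list of DOIs. The sketch below builds the same `div` elements in Python; `altmetric_badges_html` is a hypothetical helper, and it assumes that loading the embed script once is enough for every badge on the page, which has not been verified here.

```python
EMBED_SCRIPT = ("<script type='text/javascript' "
                "src='https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js'></script>")

def altmetric_badges_html(dois):
    """Return HTML with one embed script tag and one badge div per DOI."""
    divs = [
        "<div class='altmetric-embed' data-badge-type='medium-donut' "
        "data-badge-details='right' data-doi=\"{}\"></div>".format(doi)
        for doi in dois
    ]
    return "\n".join([EMBED_SCRIPT] + divs)


html = altmetric_badges_html([
    "10.5065/D6WD3XH5",
    "10.5065/D6PZ56TX",
    "10.5065/D62J68XR",
])
print(html)
```

In a notebook, the resulting string can be rendered with `IPython.display.HTML(html)` instead of a `%%html` cell.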