In [1]:
%load_ext autoreload
%autoreload 2
import sys
sys.path.append("../modules/orcid-python")
sys.path.append("../modules/pyalm")
import requests
import time
import orcid
import pyalm.pyalm as pyalm
import pyalm.utilities.plossearch as search

##Part A - Collecting DOIs (X points)##

The first part of this exercise shows how to collect DOIs from a different source, a publisher API. Here we use the PLOS Search API as an example because, as discussed in class, the PLOS Lagotto instance has the most information on article level metrics.

We will first show an example of using the provided API wrapper and then you will use this to gather Article Level Metrics information on some authors from Caltech.

In [2]:
# Initiate and populate a query object
query = search.Request('author_affiliate:"California Institute of Technology"')

# Initiate the actual API call and get some results
response = query.get()
response

{u'response': {u'docs': [{u'doi': [u'10.1371/journal.pbio.1001153']},
   {u'doi': [u'10.1371/journal.pone.0026543']},
   {u'doi': [u'10.1371/journal.pone.0029172']},
   {u'doi': [u'10.1371/journal.pone.0046473']},
   {u'doi': [u'10.1371/journal.pbio.1000444']},
   {u'doi': [u'10.1371/journal.ppat.1001225']},
   {u'doi': [u'10.1371/journal.pone.0012353']},
   {u'doi': [u'10.1371/journal.pone.0007757']},
   {u'doi': [u'10.1371/journal.pone.0022201']},
   {u'doi': [u'10.1371/journal.pone.0008793']},
   {u'doi': [u'10.1371/journal.pgen.0020117']},
   {u'doi': [u'10.1371/journal.pone.0035934']},
   {u'doi': [u'10.1371/journal.pcbi.1000349']},
   {u'doi': [u'10.1371/journal.pone.0000787']},
   {u'doi': [u'10.1371/journal.pone.0000749']},
   {u'doi': [u'10.1371/journal.pone.0133682']},
   {u'doi': [u'10.1371/journal.pbio.0040112']},
   {u'doi': [u'10.1371/journal.pone.0021074']},
   {u'doi': [u'10.1371/journal.pone.0015429']},
   {u'doi': [u'10.1371/journal.pone.0045301']},
   {u'doi': [u'10.

This gives 220 DOIs found at PLOS that match the affiliation term "California Institute of Technology". You might want to change the search term to see whether other articles are listed under Caltech or other variations of the name.
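The response is a nested dictionary, and each document's `doi` field is a one-element list. A minimal sketch of flattening it into a plain list of DOI strings follows, using a hand-built sample in the same shape as the output above. The helper name `extract_dois` is an assumption for illustration and is not part of the `plossearch` wrapper.

```python
# Sample shaped like the response printed above (first three documents only).
sample_response = {
    'response': {
        'docs': [
            {'doi': ['10.1371/journal.pbio.1001153']},
            {'doi': ['10.1371/journal.pone.0026543']},
            {'doi': ['10.1371/journal.pone.0029172']},
        ]
    }
}


def extract_dois(response):
    """Return a flat list of DOI strings (hypothetical helper)."""
    docs = response.get('response', {}).get('docs', [])
    # Skip any document that lacks a 'doi' field or has an empty one.
    return [doc['doi'][0] for doc in docs if doc.get('doi')]


sample_dois = extract_dois(sample_response)
print(len(sample_dois))
```

Using `.get()` with defaults means an empty or unexpected response yields an empty list rather than a `KeyError`.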

The search accepts the same terms as the Advanced Search form on the PLOS website (http://www.plosone.org/search/advanced?noSearchFlag), so you can use that form to construct a more advanced query and then pass it to the function above. For instance, a more complex search for Caltech might look like this:

In [14]:
# Initiate and populate a query object
query = search.Request("""
    author_affiliate:"California Institute of Technology"
    OR
    author_affiliate:"Caltech"
                       """)

# Initiate the actual API call and get some results
caltech = query.get()
len(caltech['response']['docs'])

224
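When there are several name variants to try, the query string can be assembled from a list rather than typed by hand. The sketch below assumes only the `field:"value"` and `OR` syntax used in the query above, and that the search treats a single-line query the same as the multi-line version. `build_or_query` is a hypothetical helper, not part of `plossearch`.

```python
def build_or_query(field, values):
    """Join field:"value" clauses with OR (hypothetical helper)."""
    clauses = ['{0}:"{1}"'.format(field, value) for value in values]
    return ' OR '.join(clauses)


variants = ['California Institute of Technology', 'Caltech']
query_string = build_or_query('author_affiliate', variants)
print(query_string)
```

The resulting string could then be passed to `search.Request(query_string)` in the same way as the hand-written query.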

<div class="alert alert-success">
Construct a search that looks for papers from Martin Karplus, Robert Grubbs or Eric Betzig. You should retrieve two articles.
</div>

In [4]:
# Initiate and populate a query object
query = search.Request("""
    author:"Eric Betzig"
    OR
    author:"Robert Grubbs"
    OR
    author:"Martin Karplus"
                       """)

# Initiate the actual API call and get some results
response = query.get()
len(response['response']['docs'])

2

In [9]:
assert len(response['response']['docs']) == 2

##Part D - Collecting and analysing ALM Data##

<div class="alert alert-success">
Based on the example notebooks, obtain Article Level Metrics data on these two articles from the PLOS ALM API. Note that the ALM API wrapper accepts either a single DOI or a list of DOIs, so you will need to construct a list of the two DOIs to pass to the function. Obtain the number of Europe PubMed Central citations for each of the articles.
</div>

In [10]:
# Need to configure the API URL as per the notebook example
pyalm.config.APIS = { 'plos' : {'url': 'http://alm.plos.org/api/v5/articles'},
                      'det'  : {'url' : 'http://det.labs.crossref.org/api/v5/articles'}
                    }

In [11]:
# Create a list of DOIs from the response above. You could either create a new list or use a list comprehension
dois = [doc.get('doi')[0] for doc in response.get('response').get('docs')]
plos_alm = pyalm.get_alm(dois, info='detail', instance='plos')

In [8]:
# Get the title and number of EuPMC citations for each article. Create a list of tuples called cites as follows
# [('title1', citations), ('title2', citations)]
cites = []
for article in plos_alm['articles']:
    cites.append((article.title, article.sources['pmceurope'].metrics.total))
    
print(cites)

[(u'Self-Organization of the <i>Escherichia coli</i> Chemotaxis Network Imaged with Super-Resolution Light Microscopy', 108), (u'A Src-Like Inactive Conformation in the Abl Tyrosine Kinase Domain', 90)]


In [12]:
assert cites[0] == (u'Self-Organization of the <i>Escherichia coli</i> Chemotaxis Network Imaged with Super-Resolution Light Microscopy', 108)
assert cites[1] == (u'A Src-Like Inactive Conformation in the Abl Tyrosine Kinase Domain', 90)

<div class="alert alert-success">
For the first 50 articles returned by the California Institute of Technology affiliation search above, output the number of Europe PMC citations, Facebook posts and tweets. It may take some time for the API to return results for 50 articles.
</div>
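If a single request for 50 articles turns out to be too slow, one option is to split the DOI list into smaller batches and request each batch in turn. The sketch below shows only the batching step, on placeholder identifiers. Whether smaller batches actually help with this API is an assumption, and `chunked` is a hypothetical helper rather than part of `pyalm`.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder identifiers standing in for the 50 DOIs.
example_dois = ['doi-{0}'.format(i) for i in range(50)]
batches = list(chunked(example_dois, 20))
print([len(batch) for batch in batches])
```

Each batch could then be passed to `pyalm.get_alm` in place of the full list, and the returned articles collected into one list.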

In [15]:
# Create a list of the first 50 DOIs and get the ALMs from PLOS API
caldois = [doc['doi'][0] for doc in caltech['response']['docs']][0:50]
cal_alm = pyalm.get_alm(caldois, info='detail', instance='plos')

In [20]:
# Construct a list of tuples called `results` as above with each of the elements required plus the title
results = []
for article in cal_alm['articles']:
    results.append((article.title, 
                   article.sources['pmceurope'].metrics.total,
                   article.sources['facebook'].metrics.total,
                   article.sources['twitter'].metrics.total))
    
results

[(u'Exploring the Structure of Human Defensive Responses from Judgments of Threat Scenarios',
  0,
  0,
  3),
 (u'Spatio-Temporal Differences in Dystrophin Dynamics at mRNA and Protein Levels Revealed by a Novel FlipTrap Line',
  0,
  3,
  2),
 (u'Computational Design of the \u03b2-Sheet Surface of a Red Fluorescent Protein Allows Control of Protein Oligomerization',
  0,
  0,
  2),
 (u'Neural Computations Mediating One-Shot Learning in the Human Brain',
  0,
  28,
  12),
 (u'How Food Controls Aggression in <i>Drosophila</i>', 0, 0, 7),
 (u'The Herpes Virus Fc Receptor gE-gI Mediates Antibody Bipolar Bridging to Clear Viral Antigens from the Cell Surface',
  3,
  0,
  3),
 (u'Genome-Wide Analysis Reveals Coating of the Mitochondrial Genome by TFAM',
  3,
  0,
  7),
 (u'Cooperative Binding', 1, 0, 19),
 (u'Tuning Promoter Strength through RNA Polymerase Binding Site Design in <i>Escherichia coli</i>',
  15,
  0,
  11),
 (u'Enhancement in Motor Learning through Genetic Manipulation of th

In [21]:
assert len(results) == 50
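Once the tuples are collected, they are easy to rank and total. The sketch below uses three rows copied from the output above in place of the full `results` list. The names `sample_results`, `by_tweets` and `totals` are introduced here for illustration only.

```python
# Three rows copied from the output above: (title, EuropePMC, Facebook, tweets).
sample_results = [
    ('How Food Controls Aggression in <i>Drosophila</i>', 0, 0, 7),
    ('Cooperative Binding', 1, 0, 19),
    ('Neural Computations Mediating One-Shot Learning in the Human Brain', 0, 28, 12),
]

# Rank articles by tweet count (index 3), highest first.
by_tweets = sorted(sample_results, key=lambda row: row[3], reverse=True)

# Column totals in the order: citations, Facebook posts, tweets.
totals = tuple(sum(row[i] for row in sample_results) for i in (1, 2, 3))

print(by_tweets[0][0])
print(totals)
```

The same two expressions work unchanged on the full `results` list.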

<div class="alert alert-success">
Find the articles with at least one tweet and obtain the account names behind those tweets. Identify whether any accounts tweeted about more than one article. Note that an account might tweet about the same article twice; we are only interested in cases where the same account tweeted about more than one article. Make a list called `common_tweeters` that contains the account handles of anyone who tweeted about more than one article.
</div>

In [25]:
# Probably easier to iterate through the article objects than to cross-reference from the list we just built,
# but either will work
tweeted = []
for article in cal_alm['articles']:
    if article.sources['twitter'].metrics.total != 0:
        tweeted.append(article)
    
len(tweeted)

18

In [26]:
assert len(tweeted) > 10

In [31]:
# Now obtain all the account names. Look at the example notebook for how to get this information.
unique_accounts = set()
for article in tweeted:
    for tweet in article.sources['twitter'].events:
        unique_accounts.add(tweet['event']['user'])
        
len(unique_accounts)


87

In [33]:
# Now check whether an account occurs tweeting more than one article. This requires a little care and attention. 
common_tweeters = []

for account in unique_accounts:
    count = 0
    for article in tweeted:
        tweeters = [tweet['event']['user'] for tweet in article.sources['twitter'].events]
        if account in tweeters:
            count+=1
            
    if count > 1:
        common_tweeters.append(account)

len(common_tweeters)

2

In [34]:
common_tweeters

[u'uranus_2', u'thebaybot']

In [35]:
assert common_tweeters != []
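The nested loop above rebuilds the list of tweeters for every account. An alternative is to count, for each handle, the number of distinct articles it tweeted, using `collections.Counter`. The sketch below runs on made-up handles and article identifiers rather than the real ALM objects, so the data and the name `tweets_by_article` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical data: article identifier -> handle of each tweet (with repeats).
tweets_by_article = {
    'article-1': ['alice', 'bob', 'alice'],
    'article-2': ['bob', 'carol'],
    'article-3': ['carol', 'bob'],
}

article_counts = Counter()
for handles in tweets_by_article.values():
    # set() makes repeat tweets about one article count only once.
    article_counts.update(set(handles))

# Keep handles seen on more than one article; sort for a stable order.
sample_common = sorted(h for h, n in article_counts.items() if n > 1)
print(sample_common)
```

Here `alice` tweeted twice about the same article and is therefore excluded, which is the distinction the exercise asks for.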