# Researcher citations analysis 
This Python notebook shows how to use the [Dimensions Analytics API](https://www.dimensions.ai/dimensions-apis/) in order to ...

## Prerequisites

This notebook assumes you have installed the [Dimcli](https://pypi.org/project/dimcli/) library and are familiar with the *Getting Started* tutorial.


In [1]:
!pip install dimcli --quiet 

import dimcli
from dimcli.shortcuts import *
import json
import sys
import pandas as pd
import plotly.express as px
if 'google.colab' not in sys.modules:
  # load the plotly.js dependencies, needed by HTML exports of the notebook
  from plotly.offline import init_notebook_mode
  init_notebook_mode(connected=True)

print("==\nLogging in..")
# https://github.com/digital-science/dimcli#authentication
ENDPOINT = "https://app.dimensions.ai"
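if 'google.colab' in sys.modules:
  # on Colab, ask for the Dimensions credentials interactively
  import getpass
  USERNAME = getpass.getpass(prompt='Username: ')
  PASSWORD = getpass.getpass(prompt='Password: ')
  dimcli.login(USERNAME, PASSWORD, ENDPOINT)
else:
  # elsewhere, empty credentials make Dimcli read them from a local dsl.ini file
  USERNAME, PASSWORD = "", ""
  dimcli.login(USERNAME, PASSWORD, ENDPOINT)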
dsl = dimcli.Dsl()

==
Logging in..
Dimcli - Dimensions API Client (v0.7.1)
Connected to: https://app.dimensions.ai - DSL v1.25
Method: dsl.ini file


## 1. Getting outgoing citations (references) for a researcher


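* For background on using lists of IDs in DSL queries, see the [Working with lists](https://api-lab.dimensions.ai/cookbooks/1-getting-started/6-Working-with-lists.html) tutorial.

The example below uses researcher `ur.01372572360.83`, whose publications can be browsed at https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01372572360.83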
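The retrieval cell below splits the referenced IDs into batches of 250 with Dimcli's `chunks_of`, and uses `json.dumps` to render each batch in the bracketed, double-quoted list syntax that the DSL `in` filter accepts. The standalone sketch below mimics both steps without calling the API, so the generated queries can be inspected; `split_into_chunks` and `build_queries` are hypothetical helpers, not part of Dimcli, and the IDs are made up.

```python
import json

def split_into_chunks(items, size):
    """Hypothetical stand-in for Dimcli's chunks_of: yield consecutive slices of `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_queries(pub_ids, size=250):
    """Build one DSL query string per chunk of publication IDs."""
    queries = []
    for chunk in split_into_chunks(pub_ids, size):
        # json.dumps renders a Python list as ["pub.1", "pub.2", ...]
        queries.append(
            f"search publications where id in {json.dumps(chunk)} return publications"
        )
    return queries

ids = [f"pub.{n}" for n in range(1, 601)]  # 600 made-up IDs
queries = build_queries(ids)
print(len(queries))  # 3 (250 + 250 + 100 IDs)
```

Keeping each batch well below the API's limits means that every individual query stays small, while the loop collects the full result set.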


In [62]:
RESID = "ur.01372572360.83" #@param {type:"string"}

query = f"""
search publications 
    where researchers.id="{RESID}" return publications[id+title+reference_ids]
"""

print("===\nQuery:\n", query)     
print("===\nRetrieving authored publications .. ")
researcher_pubs = dsl.query_iterative(query).as_dataframe()


#
# Focus on references: extract all referenced publications and count how many
# of the researcher's publications cite each one.
# Remember that Dimensions only keeps track of references that have a DOI.
#
ref_ids = (
    researcher_pubs.explode("reference_ids")
    # drop rows without a reference ID only (a plain dropna() would also drop rows with e.g. a missing title)
    .dropna(subset=["reference_ids"])
    .groupby("reference_ids", as_index=False)["id"]
    .count()
)
ref_ids = ref_ids.rename(columns={"reference_ids": "id", "id": "citations"})
print("===\nTotal referenced publications: ", len(ref_ids))
print("===\nRetrieving referenced publications, 250 at a time.. \n")

ids_chunks = chunks_of(list(ref_ids.id), 250) 

data = []
for c in ids_chunks:

    res = dsl.query_iterative(f"""
                     search publications
                        where id in {json.dumps(c)}
                        return publications
                    """)
    data += res.publications

# finally, turn the list of publication records into a Dimcli Dataset

results = DslDataset.from_publications_list(data)

===
Query:
 
search publications 
    where researchers.id="ur.01372572360.83" return publications[id+title+reference_ids]

===
Retrieving authored publications .. 
1000 / ...
50 / 50
===
Records extracted: 50
===
Total referenced publications:  182
===
Retrieving referenced publications, 250 at a time.. 

1000 / ...
182 / 182
===
Records extracted: 182
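The `explode`/`groupby` step above counts, for each referenced publication, how many times it appears across the researcher's reference lists. The plain-Python sketch below reproduces that logic on made-up records (`authored` is a hypothetical stand-in for `researcher_pubs`), which makes the two edge cases explicit: publications without a reference list are skipped, and each occurrence of a reference adds one to its count.

```python
from collections import Counter

# Made-up sample: each authored publication with its (possibly missing) reference list
authored = [
    {"id": "pub.A", "reference_ids": ["pub.X", "pub.Y"]},
    {"id": "pub.B", "reference_ids": ["pub.X"]},
    {"id": "pub.C", "reference_ids": None},  # no references recorded
]

def count_references(pubs):
    """Count how many times each referenced ID appears across the authored publications."""
    counts = Counter()
    for pub in pubs:
        # like explode() + dropna(): publications without a reference list contribute nothing
        for ref in pub.get("reference_ids") or []:
            counts[ref] += 1
    return counts

ref_counts = count_references(authored)
print(ref_counts.most_common())  # [('pub.X', 2), ('pub.Y', 1)]
```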


### Most cited publications

In [63]:
references = results.as_dataframe()
references = references.merge(ref_ids, on="id").sort_values('citations', ascending=False)
references['url'] = references.apply(lambda x: dimensions_url(x['id']),  axis=1)
references.head(10)

| | year | id | type | pages | author_affiliations | title | volume | issue | journal.id | journal.title | citations | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 70 | 1986 | pub.1046809313 | article | 89-89 | [[{'first_name': 'Marc H.', 'last_name': 'Raib... | Legged Robots That Balance | 1 | 4 | jour.1033604 | IEEE Intelligent Systems | 8 | https://app.dimensions.ai/details/publication/... |
| 126 | 1977 | pub.1036186182 | article | 95-110 | [[{'first_name': 'H.', 'last_name': 'Hemami', ... | The inverted pendulum and biped stability | 34 | 1-2 | jour.1007478 | Mathematical Biosciences | 5 | https://app.dimensions.ai/details/publication/... |
| 84 | 1984 | pub.1033269614 | article | 75-92 | [[{'first_name': 'Marc H.', 'last_name': 'Raib... | Experiments in Balance with a 3D One-Legged Ho... | 3 | 2 | jour.1041772 | The International Journal of Robotics Research | 5 | https://app.dimensions.ai/details/publication/... |
| 123 | 1977 | pub.1061472102 | article | 452-458 | [[{'first_name': 'H.', 'last_name': 'Hemami', ... | Postural and gait stability of a planar five l... | 22 | 3 | jour.1033500 | IEEE Transactions on Automatic Control | 5 | https://app.dimensions.ai/details/publication/... |
| 89 | 1984 | pub.1062103690 | article | 75-81 | [[{'first_name': 'M. H.', 'last_name': 'Raiber... | Experiments in Balance With a 2D One-Legged Ho... | 106 | 1 | jour.1026140 | Journal of Dynamic Systems Measurement and Con... | 5 | https://app.dimensions.ai/details/publication/... |
| 72 | 1986 | pub.1062622623 | article | 1292-1294 | [[{'first_name': 'M H', 'last_name': 'Raibert'... | Symmetry in running | 231 | 4743 | jour.1346339 | Science | 5 | https://app.dimensions.ai/details/publication/... |
| 146 | 1973 | pub.1019025254 | article | 191-242 | [[{'first_name': 'Miomir', 'last_name': 'Vukob... | Mathematical models of general anthropomorphic... | 17 | 3-4 | jour.1007478 | Mathematical Biosciences | 4 | https://app.dimensions.ai/details/publication/... |
| 145 | 1973 | pub.1034061036 | article | 313-314 | [[{'first_name': 'TERENCE J.', 'last_name': 'D... | Energetic Cost of Locomotion in Kangaroos | 246 | 5431 | jour.1018957 | Nature | 4 | https://app.dimensions.ai/details/publication/... |
| 73 | 1986 | pub.1061308618 | article | 70-82 | [[{'first_name': 'M.', 'last_name': 'Raibert',... | Running on four legs as though they were one | 2 | 2 | jour.1143897 | IEEE Journal on Robotics and Automation | 4 | https://app.dimensions.ai/details/publication/... |
| 112 | 1979 | pub.1062228853 | article | 583-592 | [[{'first_name': 'Cliff', 'last_name': 'Frohli... | Do springboard divers violate angular momentum... | 47 | 7 | jour.1056671 | American Journal of Physics | 3 | https://app.dimensions.ai/details/publication/... |
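`merge` performs an inner join by default, so any referenced ID that the API did not return would silently disappear from `references`. In this run all 182 IDs came back, but a left join with `indicator=True` is a cheap way to check. The sketch below uses toy dataframes (`toy_ref_ids` and `toy_retrieved` are made-up stand-ins for `ref_ids` and the retrieved records, named differently so they do not overwrite the notebook's variables).

```python
import pandas as pd

# Toy stand-ins: counts per referenced ID, and the records the API actually returned
toy_ref_ids = pd.DataFrame({"id": ["pub.1", "pub.2", "pub.3"], "citations": [5, 3, 1]})
toy_retrieved = pd.DataFrame({"id": ["pub.1", "pub.3"], "title": ["First", "Third"]})

# A left join keeps every referenced ID; the _merge column records where each row was found
check = toy_ref_ids.merge(toy_retrieved, on="id", how="left", indicator=True)
missing = check.loc[check["_merge"] == "left_only", "id"].tolist()
print(missing)  # ['pub.2']

# The inner join used in the notebook keeps only IDs present on both sides
toy_references = toy_retrieved.merge(toy_ref_ids, on="id").sort_values("citations", ascending=False)
print(toy_references["id"].tolist())  # ['pub.1', 'pub.3']
```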


In [64]:
# for Colab users: save the results to a Google Sheet
if 'google.colab' in sys.modules:
    MYDF = references
    title = f"Publications cited by {RESID}"
    from google.colab import auth
    auth.authenticate_user()

    import gspread
    from google.auth import default
    from gspread_dataframe import set_with_dataframe

    creds, _ = default()
    gc = gspread.authorize(creds)
    sh = gc.create(title)
    # write to the spreadsheet just created (opening it by title could pick an older sheet with the same name)
    worksheet = sh.sheet1
    set_with_dataframe(worksheet, MYDF)
    spreadsheet_url = "https://docs.google.com/spreadsheets/d/%s" % sh.id
    print(spreadsheet_url)
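The cell above only does something on Colab. Outside Colab, one simple alternative is to write the dataframe to a CSV file with pandas' `DataFrame.to_csv`. `save_table_locally` below is a hypothetical helper sketched for that purpose; in the notebook you would call it as `save_table_locally(references, f"Publications cited by {RESID}")`.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd

def save_table_locally(df, title, folder="."):
    """Hypothetical helper: write `df` to a CSV file named after `title` inside `folder`."""
    # replace runs of characters that are awkward in file names (spaces, slashes, ...) with "_"
    filename = re.sub(r"[^A-Za-z0-9._-]+", "_", title) + ".csv"
    path = Path(folder) / filename
    df.to_csv(path, index=False)
    return path

# Usage on a toy dataframe, written to a temporary folder
demo = pd.DataFrame({"id": ["pub.1046809313"], "citations": [8]})
with tempfile.TemporaryDirectory() as tmp:
    saved = save_table_locally(demo, "Publications cited by ur.01372572360.83", tmp)
    saved_name = saved.name
    roundtrip = pd.read_csv(saved)

print(saved_name)  # Publications_cited_by_ur.01372572360.83.csv
```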

### Most cited authors

* An authors dataframe can be obtained with Dimcli's `as_dataframe_authors` method, which returns one row per author per cited publication.
* We group the authors on three fields: `researcher_id`, `first_name` and `last_name`. The resulting count is the number of cited publications each author contributed to; it is not weighted by how many times each publication was cited.
* Don't be surprised to find the researcher themselves at the top of the list: self-citations are quite common.
* As before, we generate a Dimensions URL for each author, in case you want to follow up on them.
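Because the first and last name are part of the grouping key, one `researcher_id` can be split across several rows when its name is recorded in different forms. The toy example below reproduces the grouping with made-up rows and a plain-Python `Counter` instead of pandas, to show the effect; `author_rows` is a hypothetical stand-in for the authors dataframe.

```python
from collections import Counter

# Made-up rows shaped like as_dataframe_authors() output: one row per author per cited publication
author_rows = [
    {"pub_id": "pub.1", "researcher_id": "ur.R1", "first_name": "Marc H.", "last_name": "Raibert"},
    {"pub_id": "pub.2", "researcher_id": "ur.R1", "first_name": "M. H.", "last_name": "Raibert"},
    {"pub_id": "pub.3", "researcher_id": "ur.R1", "first_name": "Marc H.", "last_name": "Raibert"},
    {"pub_id": "pub.3", "researcher_id": None, "first_name": "J.", "last_name": "Doe"},
]

# Group on the same three fields as the notebook; rows without a researcher_id are dropped
by_name = Counter(
    (r["researcher_id"], r["first_name"], r["last_name"])
    for r in author_rows
    if r["researcher_id"] is not None
)
for key, n in by_name.most_common():
    print(key, n)
```

The same researcher ends up on two rows (counts 2 and 1) only because the first name is spelled differently.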

In [65]:
authors = results.as_dataframe_authors()\
    .groupby(["researcher_id", "first_name", "last_name"], as_index=False)\
    .count().sort_values("pub_id", ascending=False)\
    .dropna(subset=['researcher_id'])[['researcher_id', 'first_name', 'last_name', 'pub_id']]
authors.rename(columns={'pub_id' : 'citations'}, inplace=True)
authors['url'] = authors.apply(lambda x: dimensions_url(x['researcher_id']), axis=1)
authors.head(20)

| | researcher_id | first_name | last_name | citations | url |
|---|---|---|---|---|---|
| 292 | ur.01372572360.83 | Marc H. | Raibert | 10 | https://app.dimensions.ai/discover/publication... |
| 147 | ur.01013753617.38 | H. | Hemami | 9 | https://app.dimensions.ai/discover/publication... |
| 289 | ur.01372572360.83 | M. H. | Raibert | 5 | https://app.dimensions.ai/discover/publication... |
| 404 | ur.07506450371.42 | R. McN. | Alexander | 4 | https://app.dimensions.ai/discover/publication... |
| 142 | ur.010040341511.95 | G. Melvill | Jones | 4 | https://app.dimensions.ai/discover/publication... |
| 176 | ur.01063256350.86 | A. | Takanishi | 3 | https://app.dimensions.ai/discover/publication... |
| 148 | ur.01013753617.38 | Hooshang | Hemami | 3 | https://app.dimensions.ai/discover/publication... |
| 190 | ur.01107565641.99 | Thomas A. | McMahon | 3 | https://app.dimensions.ai/discover/publication... |
| 396 | ur.07423063205.32 | Masahiro | Fujita | 3 | https://app.dimensions.ai/discover/publication... |
| 357 | ur.016432514545.06 | Yoshihiro | Kuroki | 3 | https://app.dimensions.ai/discover/publication... |
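In the output above, `ur.01372572360.83` (Raibert) and `ur.01013753617.38` (Hemami) each appear twice, once per name spelling. A possible refinement, not part of the original notebook, is to group on `researcher_id` alone and to weight each cited publication by its `citations` count from `ref_ids`, so that an author's score reflects how often the researcher cited their work. The sketch uses made-up data and plain Python; `citations_per_pub` and `author_pub_pairs` are hypothetical stand-ins for `ref_ids` and the authors dataframe.

```python
from collections import defaultdict

# Made-up inputs: how often each referenced publication is cited (like ref_ids) ...
citations_per_pub = {"pub.1": 8, "pub.2": 5, "pub.3": 1}
# ... and (pub_id, researcher_id) pairs, like rows of as_dataframe_authors()
author_pub_pairs = [
    ("pub.1", "ur.R1"), ("pub.2", "ur.R1"),  # same researcher, whatever the name spelling
    ("pub.2", "ur.R2"), ("pub.3", "ur.R2"),
]

distinct_pubs = defaultdict(set)
weighted = defaultdict(int)
for pub_id, researcher_id in author_pub_pairs:
    distinct_pubs[researcher_id].add(pub_id)
    # each cited publication contributes as many times as the researcher cited it
    weighted[researcher_id] += citations_per_pub[pub_id]

for rid in sorted(weighted, key=weighted.get, reverse=True):
    print(rid, len(distinct_pubs[rid]), weighted[rid])
```

Both authors contributed to two cited publications, but `ur.R1` scores 13 against 6 for `ur.R2` once the citation counts are taken into account.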


In [66]:
# for Colab users: save the results to a Google Sheet
if 'google.colab' in sys.modules:
    MYDF = authors
    title = f"Authors cited by {RESID}"
    from google.colab import auth
    auth.authenticate_user()

    import gspread
    from google.auth import default
    from gspread_dataframe import set_with_dataframe

    creds, _ = default()
    gc = gspread.authorize(creds)
    sh = gc.create(title)
    # write to the spreadsheet just created (opening it by title could pick an older sheet with the same name)
    worksheet = sh.sheet1
    set_with_dataframe(worksheet, MYDF)
    spreadsheet_url = "https://docs.google.com/spreadsheets/d/%s" % sh.id
    print(spreadsheet_url)

## Conclusion

In this tutorial we have demonstrated how to query for ... using the [Dimensions Analytics API](https://www.dimensions.ai/dimensions-apis/). 

