<a href="https://colab.research.google.com/github/Soul-Jacker/GoogleScholarProfiler/blob/main/Google_Scholar_Profiler.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Google Scholar Profiler

---

This notebook provides an interface for collecting Google Scholar profile data. The current implementation uses the `scholarly` module and returns a limited set of fields from each profile: name, Scholar ID, affiliation, email domain, citation counts, and the h-index and i10-index (all-time and five-year). Users do not need to know how to program in Python to use this notebook. Simply read each section and run its code cell in order. The code is hidden from the user, but each section can be expanded to examine the underlying code.

Searches can be done using either a scholar's name or their Google Scholar ID.  Please note the following:

*   This notebook can only extract Google Scholar profile data *if* a profile exists. For scholars without a Google Scholar profile, users will need other software, such as _Publish or Perish_, to examine their metrics.
*   Searching by Scholar ID is more reliable than searching by name. Because Google Scholar covers all scientific disciplines, many scholars share the same name.

To find a Google Scholar ID, run a quick search with this notebook, which may return the ID if the search succeeds. Otherwise, take the ID from the URL of the scholar's profile page, where it is the value of the `user` parameter. For example, in `https://scholar.google.com/citations?user=Rhw-i9AAAAAJ&hl=en`, the ID is `Rhw-i9AAAAAJ`.
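If you already have a profile URL, pulling out the ID can be scripted, since the ID is the value of the URL's `user` query parameter. This is a minimal sketch using Python's standard `urllib.parse`; `scholar_id_from_url` is a hypothetical helper name, and the sample URL is built from an ID that appears in this notebook's output.

```python
from urllib.parse import urlparse, parse_qs


def scholar_id_from_url(url):
    """Return the Google Scholar ID (the `user` query parameter), or None."""
    params = parse_qs(urlparse(url).query)
    values = params.get("user")
    return values[0] if values else None


print(scholar_id_from_url("https://scholar.google.com/citations?user=Rhw-i9AAAAAJ&hl=en"))
# Rhw-i9AAAAAJ
```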





#### Workspace set-up
This section installs the `gspread` and `scholarly` packages; `pandas` is already available in Colab. Feel free to expand this section to examine the code block. You only need to run it once per session.
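The searches below call `scholarly.fill(author, sections=[...])`, which, to our knowledge, is part of the 1.x `scholarly` API, so it can help to confirm which versions a session actually has. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import version, PackageNotFoundError


def installed_versions(packages):
    """Map each package name to its installed version string, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["scholarly", "gspread", "pandas"]))
```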

```python
!pip install --upgrade gspread -q
```

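```python
# The cells below use the 1.x scholarly API (scholarly.search_author and
# scholarly.fill with sections=...), so install a current release rather than 0.2.1
!pip install --upgrade scholarly -q
```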


```python
# Libraries for querying Google Scholar, handling tables, and accessing Google Sheets
from scholarly import scholarly
import pandas as pd
from google.colab import auth
import gspread
from oauth2client.client import GoogleCredentials
```


## Quick search for a single author
---

Run this code block to search for an author by name. This is a quick way to examine a profile and obtain its `scholar_id` from the output. Enter the scholar's name in the `Scholar_Name` field without any quotes.


```python
#@title Run the single author search { form-width: "25%", display-mode: "form" }

Scholar_Name = "Robert Joseph Taylor" #@param {type:"string"}

# Fields to report (interests are omitted in this quick view)
FIELDS = ['name', 'scholar_id', 'affiliation', 'email_domain', 'citedby',
          'citedby5y', 'hindex', 'hindex5y', 'i10index', 'i10index5y']

# Search profiles by name and take the first match (None if nothing is found)
author = next(scholarly.search_author(Scholar_Name), None)

if author is None:
    print('COULD NOT FIND: ', Scholar_Name)
else:
    # Fetch the profile basics and citation indices
    profile = scholarly.fill(author, sections=['basics', 'indices'])
    # One line per field; missing fields print as None
    single_out = pd.Series({key: profile.get(key) for key in FIELDS})
    print(single_out)
```
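The introduction recommends searching by Scholar ID, but this cell searches by name only. `scholarly` (1.x) also provides `scholarly.search_author_id(...)` for looking up a profile by its ID. One way to let a single input field accept either is to guess which kind of value was entered. The hypothetical helper below does this with a pattern inferred from the IDs in this notebook's output (12 characters of letters, digits, `-`, or `_`). Google does not publish this format, so treat the check as a heuristic.

```python
import re

# Assumption: inferred from IDs such as "Rhw-i9AAAAAJ" and "HE_T7wEAAAAJ" in this
# notebook's output; this is not a documented Google Scholar format.
SCHOLAR_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{12}")


def looks_like_scholar_id(text):
    """Heuristically decide whether the input is a Scholar ID rather than a name."""
    return SCHOLAR_ID_PATTERN.fullmatch(text.strip()) is not None


# In the notebook, an ID could be passed to scholarly.search_author_id(...)
# and a name to scholarly.search_author(...).
for query in ["Rhw-i9AAAAAJ", "Robert Joseph Taylor"]:
    kind = "ID" if looks_like_scholar_id(query) else "name"
    print(query, "->", kind)
```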



## Multiple author search

In this code block, the user can search for every name in the first column of a Google Sheet found [here](https://docs.google.com/spreadsheets/d/13oHxzWswrzJkWUJRHx7Ljwlq50MGWRTlwWwoEDKcgcs/edit?usp=sharing). This is a public sheet for demonstration purposes; the code opens the sheet by its title, `GoogleScholarProfiler_INPUT`. You will be prompted to authenticate: follow the link provided, then copy and paste the access code to run the search. Each name returns only its first matching profile, so check the results carefully, because a common name may match the wrong person.
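The cell reads the sheet with gspread's `get_all_values()`, which returns a list of rows, each a list of strings. The sample run shows a final `retrieving:` line with no name, which suggests a blank row in the sheet. A hypothetical helper like the one sketched here skips blank cells and repeated names before any search is sent; the sample rows are made up for illustration.

```python
def names_from_rows(rows):
    """Return non-blank, de-duplicated names from the first column, in sheet order."""
    seen = set()
    names = []
    for row in rows:
        name = row[0].strip() if row else ""
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


rows = [["Cristina Bares"], ["Terri Friedline"], [""], ["Cristina Bares"]]
print(names_from_rows(rows))  # ['Cristina Bares', 'Terri Friedline']
```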

```python
#@title Run multiple author search {display-mode: "form"}
%load_ext google.colab.data_table
from google.auth import default

# Authenticate, then open the first worksheet of the input sheet by its title
auth.authenticate_user()
creds, _ = default()
gc = gspread.authorize(creds)
worksheet = gc.open('GoogleScholarProfiler_INPUT').sheet1

# get_all_values() returns a list of rows; author names are in the first column.
# Blank rows are skipped.
rows = worksheet.get_all_values()
my_authors = [row[0].strip() for row in rows if row and row[0].strip()]

# Fields kept for each profile, in output column order
FIELDS = ['name', 'scholar_id', 'affiliation', 'email_domain', 'citedby',
          'citedby5y', 'hindex', 'hindex5y', 'i10index', 'i10index5y', 'interests']
records = []

for who in my_authors:
    print('retrieving:     ', who)
    try:
        # Take the first profile returned for this name and fetch its details
        author = next(scholarly.search_author(who))
        filled = scholarly.fill(author, sections=['basics', 'indices'])
        records.append({key: filled.get(key) for key in FIELDS})
    except StopIteration:
        print('COULD NOT FIND: ', who)
    except Exception as err:  # e.g. network errors or requests blocked by Google Scholar
        print('ERROR FOR:      ', who, err)

# One row per unique profile (different names can return the same profile)
df1 = pd.DataFrame(records, columns=FIELDS)
df1 = df1.drop_duplicates(subset=['scholar_id']).reset_index(drop=True)
df1
```




```
retrieving:      Cristina Bares
retrieving:      William Elliott
retrieving:      Terri Friedline
retrieving:      Shawna Lee
retrieving:      Kathryn Maguire-Jack
retrieving:      Sandra Momper
COULD NOT FIND:  Sandra Momper
retrieving:      Emily Nicklett
retrieving:      Brian Perron
retrieving:      Rogerio Pinto
retrieving:      Luke Shaefer
retrieving:      Trina Williams Shanks
COULD NOT FIND:  Trina Williams Shanks
retrieving:      Daphne Watkins
retrieving:      Derek Brown
retrieving:      Sheretta Butler-Barnes
retrieving:      Leopoldo Cabassa
retrieving:      Alexis Duncan
retrieving:      Amy Eyler
retrieving:      Patrick Fowler
retrieving:      Michal Grinstein-Weiss
retrieving:      Jenine Harris
retrieving:      David Patterson Silver Wolf
retrieving:      Carmela Alcántara
COULD NOT FIND:  Carmela Alcántara
retrieving:      Heidi Allen
retrieving:      Jinyu Liu
retrieving:      Desmond Patton
retrieving:      Craig Schwalbe
retrieving:      Elwin Wu
retrieving:
```

|     | name | scholar_id | affiliation | email_domain | citedby | citedby5y | hindex | hindex5y | i10index | i10index5y | interests |
|-----|------|------------|-------------|--------------|---------|-----------|--------|----------|----------|------------|-----------|
| 0 | Cristina Bares | Rhw-i9AAAAAJ | Associate Professor of Social Work, University... | @umich.edu | 825 | 556 | 14 | 13 | 19 | 17 | [adolescent substance use, mental health, ciga... |
| 1 | Keith Crocker | HE_T7wEAAAAJ | The William Elliott Chaired Professor of Insur... | @psu.edu | 5624 | 1509 | 26 | 16 | 32 | 22 | [] |
| 2 | Terri Friedline | eBaFd_8AAAAJ | Associate Professor, University of Michigan Sc... | @umich.edu | 1304 | 984 | 23 | 20 | 31 | 28 | [Financial inclusion] |
| 3 | Shawna J. Lee | DJKxPRwAAAAJ | University of Michigan School of Social Work | @umich.edu | 2922 | 2096 | 31 | 25 | 46 | 44 | [fathering, child welfare, parenting programs,... |
| 4 | Kathryn Maguire-Jack | 3AftVCAAAAAJ | Associate Professor, School of Social Work, Un... | @umich.edu | 1328 | 1290 | 20 | 20 | 29 | 28 | [Child maltreatment, prevention, neighborhoods] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 108 | Mary Elizabeth Collins | nRkUF6kAAAAJ | Professor of Social Welfare Policy, Boston Uni... | @bu.edu | 3418 | 1555 | 29 | 19 | 51 | 36 | [child welfare, aging out of care, youth servi... |
| 109 | Jacqueline Corcoran | ev3VfykAAAAJ | University of Pennsylvania | @upenn.edu | 4587 | 2033 | 33 | 21 | 57 | 37 | [] |
| 110 | Zvi Gellis | MijOFTMAAAAJ | Unknown affiliation |  | 2416 | 1192 | 24 | 19 | 37 | 29 | [Social Work, Geriatrics, Mental Health, Geron... |
| 111 | Lori K. Holleran Steiker | ngnXsQQAAAAJ | Professor, University of Texas | @mail.utexas.edu | 1061 | 734 | 11 | 9 | 12 | 9 | [Youth and Substance Misuse, use disorders and... |
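Name searches return only the first matching profile, and the table shows how that can go wrong: the second name searched, William Elliott, appears to have returned Keith Crocker, whose affiliation mentions the William Elliott Chaired Professorship. A rough check, sketched below with a hypothetical helper, compares the last word of each searched name with the returned profile name. It assumes you keep the searched name next to each result, which the search cell does not do. The substring test is only a heuristic and will misjudge some cases (initials, changed surnames, transliterations).

```python
def last_name_matches(query, returned_name):
    """Heuristic: True if the final word of the searched name appears in the profile name."""
    words = query.split()
    if not words:
        return False
    return words[-1].lower() in returned_name.lower()


# (searched name, returned profile name) pairs taken from the sample output
pairs = [
    ("Cristina Bares", "Cristina Bares"),
    ("William Elliott", "Keith Crocker"),
    ("Shawna Lee", "Shawna J. Lee"),
]

for query, returned in pairs:
    if not last_name_matches(query, returned):
        print("CHECK:", query, "->", returned)
# CHECK: William Elliott -> Keith Crocker
```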


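### Write the results to a Google Sheet

These cells copy the results, drop the `interests` column (its values are lists, which the Sheets API does not accept as a single cell value), and write a header row plus one row per profile to the first worksheet of the spreadsheet titled `GoogleScholarProfiler_OUTPUT`. That spreadsheet must already exist and be accessible to the account you authenticated with, because `gc.open` only opens existing spreadsheets.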

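```python
# Work on a copy so df1 keeps the interests column
dfCombined = df1.copy()
```

```python
# Interests are lists, which cannot be written to a single sheet cell
dfCombined = dfCombined.drop(['interests'], axis=1)
```

```python
# Header row followed by one row per profile; blank out missing values (NaN is not valid JSON)
lOfLists = dfCombined.fillna('').to_numpy().tolist()
headers = dfCombined.columns.to_list()
dataToWrite = [headers] + lOfLists
# Open the existing output sheet and write starting at cell A1
worksheet = gc.open('GoogleScholarProfiler_OUTPUT').sheet1
worksheet.update(range_name='A1', values=dataToWrite)
```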


```
{'spreadsheetId': '12oCjjSktXXs7X3v6WZyGmWB8BW-q04thCqANiGCC4nU',
 'updatedCells': 1140,
 'updatedColumns': 10,
 'updatedRange': 'Sheet1!A1:J114',
 'updatedRows': 114}
```
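The write step drops `interests` because each value is a list. If you would rather keep that column, one option is to convert every cell to a sheet-friendly value before calling `worksheet.update`: lists become comma-separated strings and missing values become empty strings. This is a sketch with hypothetical helper names (`sheet_cell`, `sheet_values`), written in plain Python so it does not depend on a particular pandas version.

```python
import math


def sheet_cell(value):
    """Convert one value to something the Sheets API accepts as a cell value."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""  # NaN is not valid JSON
    if isinstance(value, (list, tuple)):
        return ", ".join(str(v) for v in value)  # e.g. the `interests` lists
    return value


def sheet_values(headers, rows):
    """Header row followed by data rows, with every cell made sheet-safe."""
    return [list(headers)] + [[sheet_cell(v) for v in row] for row in rows]


print(sheet_values(["name", "interests"],
                   [["Terri Friedline", ["Financial inclusion"]], ["Zvi Gellis", None]]))
# [['name', 'interests'], ['Terri Friedline', 'Financial inclusion'], ['Zvi Gellis', '']]
```

In the notebook, this could replace the drop step, for example `dataToWrite = sheet_values(df1.columns, df1.to_numpy().tolist())`, before passing `dataToWrite` to `worksheet.update`.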