# Python CrossRef Journal Query

Carl Huang<br>
2019-02-25

## Motivation 
Are there ways to get regular updates from a journal other than using a headless browser? This short note tries out one Python package: 
Fabio Batalha's <a href="https://github.com/fabiobatalha/crossrefapi">CrossRef API</a>. 

For R users, <a href="https://github.com/ropensci/rcrossref">rcrossref</a> (an R interface to various CrossRef APIs) seems neat. I will stick to Python because I might need to pass the results to my MediaWiki later. 

https://qiita.com/ina111/items/bbdecf9c711cc0bc54d5

I will primarily be dealing with `Works`, since this is the class that covers journal articles (and books):

In [1]:
from crossref.restful import Works
works = Works()

I can look up an article's authors through its DOI:

In [2]:
works.doi('10.1073/pnas.1317670111')['author']

[{'affiliation': [], 'family': 'Enos', 'given': 'R. D.', 'sequence': 'first'}]

I can also query the journal's title:

In [3]:
works.doi('10.1073/pnas.1317670111')['container-title']

['Proceedings of the National Academy of Sciences']
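
Both lookups return plain Python structures (a list of author dicts, a list of titles), so a small helper can turn a record into a readable one-liner. This is a minimal sketch, assuming records have the shape shown in the outputs above. `summarize_work` is a hypothetical name, not part of the `crossref` package, and the sample record is copied by hand from those outputs rather than fetched.

```python
def summarize_work(record):
    """Return 'Family, Given (Journal)' for the first author of a work record."""
    authors = record.get('author', [])
    first = authors[0] if authors else {}
    # Skip whichever name part is missing instead of raising KeyError
    name = ', '.join(part for part in (first.get('family'), first.get('given')) if part)
    # 'container-title' is a list; fall back to an empty string when absent
    journal = (record.get('container-title') or [''])[0]
    return f"{name} ({journal})"


# Hand-copied from the two outputs above (assumption: same shape as a live record)
sample = {
    'author': [{'affiliation': [], 'family': 'Enos', 'given': 'R. D.', 'sequence': 'first'}],
    'container-title': ['Proceedings of the National Academy of Sciences'],
}
summary = summarize_work(sample)
print(summary)
```

Using `.get` with defaults is a deliberate choice here: the outputs in this note show that fields such as `affiliation` can be empty, so it seems safer not to assume every key is present.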

I can also look at a journal's information:

In [4]:
from crossref.restful import Journals
journals = Journals()

# ISSNs of a few political science journals, written in the hyphenated form throughout
Polisci = {
    'AJPS':'0092-5853', 
    'APSR':'0003-0554',
    'CMPS':'0738-8942',
    'IS':'0162-2889',
    'IO':'0020-8183',
    'IMR':'1747-7379'}

journals.journal(Polisci['IMR'])

{'ISSN': ['1747-7379', '0197-9183'],
 'breakdowns': {'dois-by-issued-year': [[1994, 289],
   [1987, 282],
   [1972, 268],
   [1997, 264],
   [1973, 258],
   [2000, 253],
   [1995, 253],
   [1998, 248],
   [1992, 248],
   [1984, 240],
   [1993, 233],
   [1999, 226],
   [1986, 226],
   [1989, 225],
   [1996, 221],
   [1983, 218],
   [1990, 211],
   [1981, 210],
   [1991, 204],
   [1975, 204],
   [2006, 203],
   [1974, 198],
   [1971, 194],
   [1979, 189],
   [1982, 187],
   [1985, 180],
   [1978, 178],
   [1980, 160],
   [1976, 158],
   [2018, 150],
   [1977, 148],
   [1988, 137],
   [1970, 109],
   [2016, 74],
   [2017, 73],
   [1969, 71],
   [2014, 68],
   [2004, 67],
   [2015, 64],
   [2003, 60],
   [2007, 56],
   [2001, 55],
   [2011, 49],
   [2009, 47],
   [1968, 47],
   [2013, 46],
   [2010, 45],
   [2012, 43],
   [2008, 43],
   [2005, 36],
   [2002, 34],
   [1967, 30],
   [2019, 9],
   [1966, 9]]},
 'counts': {'backfile-dois': 7766, 'current-dois': 232, 'total-dois': 7998},
 'cove
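
The `dois-by-issued-year` breakdown arrives ordered by count, not by year. A short sketch of reordering it chronologically and summing recent years follows. It uses only four `[year, count]` pairs copied by hand from the output above; the variable names are illustrative.

```python
# A few [year, count] pairs copied from the output above (not the full list)
dois_by_year = [[2018, 150], [2016, 74], [2017, 73], [2019, 9]]

# Sort chronologically instead of by count
by_year = sorted(dois_by_year, key=lambda pair: pair[0])

# Total DOIs issued from 2017 onwards
recent_total = sum(count for year, count in by_year if year >= 2017)

print(by_year)
print(recent_total)
```

For these four pairs the 2017 to 2019 total is 232, the same number reported as `current-dois` in the output above.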

`.count()` and `.url` might come in handy in some cases:

In [5]:
journals.works(Polisci['IMR']).query('Taiwan').count()

4

In [6]:
journals.works(Polisci['IMR']).query('Taiwan').url

'https://api.crossref.org/journals/1747-7379/works?query=Taiwan'
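
The `.url` value shows that the package is composing an ordinary REST request, so the same address can be built with the standard library alone. The sketch below is an assumption-laden illustration: `journal_works_url` is a hypothetical helper, and the `from-pub-date` filter is taken from my reading of the Crossref REST API rather than from anything tried in this note.

```python
from urllib.parse import urlencode

API_ROOT = 'https://api.crossref.org'


def journal_works_url(issn, query=None, filters=None):
    """Build a /journals/{issn}/works URL with an optional query and filters."""
    params = {}
    if query:
        params['query'] = query
    if filters:
        # Crossref filters are written as comma-separated name:value pairs
        params['filter'] = ','.join(f'{name}:{value}' for name, value in filters.items())
    url = f'{API_ROOT}/journals/{issn}/works'
    # Keep ':' and ',' unescaped so the filter string stays readable
    return f'{url}?{urlencode(params, safe=":,")}' if params else url


query_url = journal_works_url('1747-7379', query='Taiwan')
# Assumed filter name: from-pub-date (works published on or after a date)
recent_url = journal_works_url('1747-7379', filters={'from-pub-date': '2019-01-01'})
print(query_url)
print(recent_url)
```

The first URL reproduces the `.url` output above; the second is the kind of request that could serve the "regular updates" goal from the Motivation section, if the filter behaves as assumed.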

In [7]:
# Note: the name `list` shadows Python's built-in; it is kept because later cells refer to it
list = [i for i in journals.works(Polisci['IMR']).query('Taiwan').select(['author','title','issued','container-title','DOI'])]
list

[{'DOI': '10.1111/j.1747-7379.2007.00091.x',
  'author': [{'affiliation': [{'name': 'Director of the Graduate Institute of Sociology, National Sun Yat-sen University, Taiwan'}],
    'family': 'Wang',
    'given': 'Hong-zen',
    'sequence': 'first'}],
  'container-title': ['International Migration Review'],
  'issued': {'date-parts': [[2007, 9]]},
  'title': ['Hidden Spaces of Resistance of the Subordinated: Case Studies from Vietnamese Female Migrant Partners in Taiwan']},
 {'DOI': '10.1177/019791839202600303',
  'author': [{'affiliation': [{'name': 'University of Cincinnati'}],
    'family': 'Selya',
    'given': 'Roger Mark',
    'sequence': 'first'}],
  'container-title': ['International Migration Review'],
  'issued': {'date-parts': [[1992, 9]]},
  'title': ['Illegal Migration in Taiwan: A Preliminary Overview']},
 {'DOI': '10.1177/0197918318769315',
  'author': [{'affiliation': [],
    'family': 'Rainwater',
    'given': 'Katie',
    'sequence': 'first'},
   {'affiliation': [{'na

## Formatting Table

### Approach 1: DataFrame first, fix later

In [8]:
from pandas import DataFrame 
data = DataFrame(list)
data 

| | DOI | author | container-title | issued | title |
|---|---|---|---|---|---|
| 0 | 10.1111/j.1747-7379.2007.00091.x | [{'family': 'Wang', 'sequence': 'first', 'give... | [International Migration Review] | {'date-parts': [[2007, 9]]} | [Hidden Spaces of Resistance of the Subordinat... |
| 1 | 10.1177/019791839202600303 | [{'family': 'Selya', 'sequence': 'first', 'giv... | [International Migration Review] | {'date-parts': [[1992, 9]]} | [Illegal Migration in Taiwan: A Preliminary Ov... |
| 2 | 10.1177/0197918318769315 | [{'family': 'Rainwater', 'sequence': 'first', ... | [International Migration Review] | {'date-parts': [[2018, 5, 16]]} | [Thai Guestworker Export in Decline] |
| 3 | 10.1111/j.1747-7379.2011.00847.x | [{'family': 'Tsai', 'sequence': 'first', 'give... | [International Migration Review] | {'date-parts': [[2011, 6]]} | [“Foreign Brides” Meet Ethnic Politics in Taiwan] |
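
The "fix later" half of this approach means unpacking the nested lists and dicts column by column once the DataFrame exists. The sketch below shows one way to do that with `Series.map`. It assumes pandas is installed and works on two records abridged by hand from the query output above (only the first author is kept); the name `tidy` is illustrative.

```python
import pandas as pd

# Two records abridged from the query output above (first author only)
records = [
    {'DOI': '10.1177/019791839202600303',
     'author': [{'affiliation': [{'name': 'University of Cincinnati'}],
                 'family': 'Selya', 'given': 'Roger Mark', 'sequence': 'first'}],
     'container-title': ['International Migration Review'],
     'issued': {'date-parts': [[1992, 9]]},
     'title': ['Illegal Migration in Taiwan: A Preliminary Overview']},
    {'DOI': '10.1177/0197918318769315',
     'author': [{'affiliation': [], 'family': 'Rainwater', 'given': 'Katie',
                 'sequence': 'first'}],
     'container-title': ['International Migration Review'],
     'issued': {'date-parts': [[2018, 5, 16]]},
     'title': ['Thai Guestworker Export in Decline']},
]

data = pd.DataFrame(records)

# Pull the first author and the first date-parts entry out of each nested cell
first_author = data['author'].map(lambda authors: authors[0])
date_parts = data['issued'].map(lambda issued: issued['date-parts'][0])

tidy = pd.DataFrame({
    'Last Name': first_author.map(lambda author: author['family']),
    'First Name': first_author.map(lambda author: author['given']),
    'Journal': data['container-title'].map(lambda titles: titles[0]),
    'Year': date_parts.map(lambda parts: parts[0]),
    'Title': data['title'].map(lambda titles: titles[0]),
    'DOI': data['DOI'],
})
print(tidy)
```

Compared with fixing each record before building the DataFrame, this keeps the raw response intact in `data`, at the cost of one `map` call per derived column.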


### Approach 2: Fix first, DataFrame later

In [9]:
columns = ['Last Name', 'First Name', 'Author Title', 'Journal', 'Year', 'Month', 'Title', 'DOI']
rows = []
for item in list: 
    first_author = item['author'][0]
    affiliation = first_author['affiliation']
    date_parts = item['issued']['date-parts'][0]
    rows.append({
        'Last Name': first_author['family'],
        'First Name': first_author['given'],
        # Affiliation of the first author; empty when the record has none
        'Author Title': affiliation[0]['name'] if affiliation else "",
        'Journal': item['container-title'][0],
        'Year': str(date_parts[0]),
        # Guard the month lookup in case a record carries only a year
        'Month': str(date_parts[1]) if len(date_parts) > 1 else "",
        'Title': item['title'][0],
        'DOI': item['DOI'],
    })

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
df = DataFrame(rows, columns=columns)

In [10]:
df

| | Last Name | First Name | Author Title | Journal | Year | Month | Title | DOI |
|---|---|---|---|---|---|---|---|---|
| 0 | Wang | Hong-zen | Director of the Graduate Institute of Sociolog... | International Migration Review | 2007 | 9 | Hidden Spaces of Resistance of the Subordinate... | 10.1111/j.1747-7379.2007.00091.x |
| 1 | Selya | Roger Mark | University of Cincinnati | International Migration Review | 1992 | 9 | Illegal Migration in Taiwan: A Preliminary Ove... | 10.1177/019791839202600303 |
| 2 | Rainwater | Katie | | International Migration Review | 2018 | 5 | Thai Guestworker Export in Decline | 10.1177/0197918318769315 |
| 3 | Tsai | Ming-Chang | Department of Sociology, National Taipei Unive... | International Migration Review | 2011 | 6 | “Foreign Brides” Meet Ethnic Politics in Taiwan | 10.1111/j.1747-7379.2011.00847.x |


In [11]:
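# Emit MediaWiki table markup: one header cell per column, then one row per record
print('{|class="wikitable sortable"')
for column_name in df.columns:
    print('!', column_name)
print('|-')
for i in range(len(df)):
    for j in range(len(df.columns)):
        # .iloc replaces the .ix indexer, which was removed in pandas 1.0
        print('|', df.iloc[i, j])
    if i+1<len(df):
        print("|-")
    else:
        print("|}")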

{|class="wikitable sortable"
! Last Name
! First Name
! Author Title
! Journal
! Year
! Month
! Title
! DOI
|-
| Wang
| Hong-zen
| Director of the Graduate Institute of Sociology, National Sun Yat-sen University, Taiwan
| International Migration Review
| 2007
| 9
| Hidden Spaces of Resistance of the Subordinated: Case Studies from Vietnamese Female Migrant Partners in Taiwan
| 10.1111/j.1747-7379.2007.00091.x
|-
| Selya
| Roger Mark
| University of Cincinnati
| International Migration Review
| 1992
| 9
| Illegal Migration in Taiwan: A Preliminary Overview
| 10.1177/019791839202600303
|-
| Rainwater
| Katie
| 
| International Migration Review
| 2018
| 5
| Thai Guestworker Export in Decline
| 10.1177/0197918318769315
|-
| Tsai
| Ming-Chang
| Department of Sociology, National Taipei University, Taipei, Taiwan
| International Migration Review
| 2011
| 6
| “Foreign Brides” Meet Ethnic Politics in Taiwan
| 10.1111/j.1747-7379.2011.00847.x
|}
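
Since the markup has to be passed on to MediaWiki, it may be more convenient to return it as a string than to print it. Below is a minimal sketch of that idea using only plain lists and dicts. `to_wikitable` is a hypothetical helper, and it assumes no cell contains a `|` character, which would need escaping.

```python
def to_wikitable(columns, rows):
    """Return MediaWiki table markup for a list of row dicts."""
    lines = ['{|class="wikitable sortable"']
    # One header cell per column
    lines += [f'! {name}' for name in columns]
    for row in rows:
        # Each record starts with a row separator, then one cell per column
        lines.append('|-')
        lines += [f'| {row.get(name, "")}' for name in columns]
    lines.append('|}')
    return '\n'.join(lines)


# Two rows taken from the table above, reduced to three columns
markup = to_wikitable(
    ['Last Name', 'Year', 'DOI'],
    [
        {'Last Name': 'Wang', 'Year': '2007', 'DOI': '10.1111/j.1747-7379.2007.00091.x'},
        {'Last Name': 'Selya', 'Year': '1992', 'DOI': '10.1177/019791839202600303'},
    ],
)
print(markup)
```

With a DataFrame, the rows could be supplied as `df.to_dict('records')` and the columns as `df.columns`.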
